Bayes Knowledge: Smart data – not big data: August 2017

Monday, 14 August 2017

The likelihood ratio and its use in the 'grooming gangs' news story

This blog has reported many times previously (see links below) about problems with using the likelihood ratio. Recall that the likelihood ratio is commonly used as a measure of the probative value of some evidence E for a hypothesis H; it is defined as the probability of E given H divided by the probability of E given not H.

There is especially great confusion in its use where we have data for the probability of H given E rather than for the probability of E given H. Look at the somewhat confusing argument here in relation to the offence of 'child grooming' which is taken directly from the book McLoughlin, P. “Easy Meat: Inside Britain’s Grooming Gang Scandal.” (2016):

Given the sensitive nature of the grooming gangs story in the UK and the increasing number of convictions, it is important to get the maths right. The McLoughlin book is the most thoroughly researched work on the subject. What the author of the book is attempting to determine is the likelihood ratio of the evidence E with respect to the hypothesis H where:

H: “Offence is committed by a Muslim” (so not H means “Offence is committed by a non-Muslim”)

E: “Offence is child grooming”

In this case, the population data cited by McLoughlin provides our priors P(H)=0.05 and, hence, P(not H)=0.95. But we also have the data on child grooming convictions that gives us P(H | E)=0.9 and, hence, P(not H | E)=0.1.

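What we do NOT have here is direct data on either P(E | H) or P(E | not H). However, we can still use Bayes' theorem to calculate the likelihood ratio, since P(E | H) = P(H | E) × P(E) / P(H) and P(E | not H) = P(not H | E) × P(E) / P(not H), so that P(E) cancels:

P(E | H) / P(E | not H) = [P(H | E) / P(not H | E)] × [P(not H) / P(H)]

So, in the example we get:

P(E | H) / P(E | not H) = (0.9 / 0.1) × (0.95 / 0.05) = 9 × 19 = 171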

Hence, while the method described in the book is confusing, the conclusion it arrives at is (almost) correct. The slight error in the result, namely 170.94 instead of 171, is caused by the author rounding 10 divided by 95% to 10.53.
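The arithmetic can be checked with a short Python sketch. It assumes only the figures quoted above (P(H)=0.05 and P(H | E)=0.9); the function and variable names are illustrative, and the step that divides 90 by 5% is inferred from the reported result of 170.94 rather than quoted from the book.

```python
from fractions import Fraction


def likelihood_ratio(prior_h, posterior_h):
    """Return P(E|H) / P(E|not H) as posterior odds of H divided by prior odds of H."""
    prior_odds = prior_h / (1 - prior_h)
    posterior_odds = posterior_h / (1 - posterior_h)
    return posterior_odds / prior_odds


# Figures quoted in the post, held as exact fractions to avoid float noise.
prior_h = Fraction(5, 100)       # P(H)
posterior_h = Fraction(90, 100)  # P(H | E)

exact_lr = likelihood_ratio(prior_h, posterior_h)

# Assumed reconstruction of the rounded calculation: 90 / 5% divided by
# (10 / 95% rounded to two decimal places, i.e. 10.53).
numerator = Fraction(90) / Fraction(5, 100)
denominator = round(Fraction(10) / Fraction(95, 100), 2)
rounded_lr = numerator / denominator

print(exact_lr)                       # exact likelihood ratio
print(round(float(rounded_lr), 2))    # value obtained after intermediate rounding
```

Working in odds form makes it clear that only the prior and posterior for H are needed; P(E) never has to be estimated.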


Friday, 11 August 2017

Automatically generating Bayesian networks in analysis of linked crimes

Constructing an effective and complete Bayesian network (BN) for an individual case that involves multiple related pieces of evidence and hypotheses requires a major investment of effort. Hence, generic BNs have been developed for common situations; these only require the underlying probabilities to be adapted. These so-called 'idioms' make it practical to build and use BNs in casework without spending unacceptable amounts of time constructing the network. However, in some situations both the probability tables and the structure of the network depend on case-specific details.

Cases involving multiple linked crimes are examples of such situations. In (deZoete2015) a BN structure was produced for evaluating evidence in cases where a person is suspected of being the offender in multiple, possibly linked, crimes. In (deZoete2017) this work was extended to cover situations with multiple offenders for possibly linked crimes. Although the papers present a methodology for constructing such BNs, the workload involved, together with the possibility of making mistakes in the conditional probability tables, still presents unnecessary difficulties for potential users.

As part of the BAYES KNOWLEDGE project, we have developed GUIs, accessible online, that allow the user to select the parameters that reflect their crime linkage situation (for both single-offender and two-offender crime linkage cases). The associated BN is then automatically generated according to the structures described in (deZoete2015) and (deZoete2017). It is presented visually in the GUI and can be downloaded as a .net file, which can be opened in AgenaRisk or another BN software package. These applications serve both as a tool for those interested in or working with crime linkage problems and as a proof of principle of the added value of such GUIs, which make BNs accessible by removing the effort of constructing every network from scratch.

The GUIs are available from the 'DEMO' tab on the BAYES KNOWLEDGE website and are based on code written in R, a statistical programming language. This automated workflow can reduce the workload for, in this case, forensic statisticians and increase the mutual understanding between researchers and legal professionals.

Jacob de Zoete will be presenting this work at the 10th International Conference on Forensic Inference and Statistics (ICFIS 2017) in Minneapolis, September 2017.

Links

The working demo for single offender multiple crimes
The working demo for two offenders and linked crimes
de Zoete, J., Sjerps, M., Lagnado, D., Fenton, N.E. (2015), "Modelling crime linkage with Bayesian Networks", Science & Justice, 55(3), 209-217. https://doi.org/10.1016/j.scijus.2014.11.005 Pre-publication draft here.
de Zoete, J., Sjerps, M. (2017), "Evaluating evidence in linked crimes with multiple offenders", Science & Justice, 57(3), 228-238. https://doi.org/10.1016/j.scijus.2017.01.003
