Bayes Knowledge: Smart data – not big data: 2015

Tuesday, 8 December 2015

Norman Fenton at Maths in Action Day (Warwick University)

Today Norman Fenton was one of the five presenters at the Mathematics in Action Day at Warwick University - the others included writer and broadcaster Simon Singh and BBC presenter Steve Mould (who is also part of the amazing trio Festival of the Spoken Nerd which features Queen Mary's Matt Parker). The Maths in Action day is specifically targeted at A-Level Maths students and their teachers.

Norman says:

This was probably the biggest live event I have spoken at - an audience of 550 in the massive Butterworth Hall (which has recently hosted Paul Weller and the Style Council, Jools Holland) - so it was quite intimidating. My talk was on "Fallacies of Probability and Risk" (the powerpoint slides are here). I hope to get some photos of the event uploaded shortly.

Butterworth Hall (hopefully some real photos from the event to come)

Friday, 27 November 2015

Another international award for the BBC documentary co-presented by Norman Fenton

Earlier this month I reported that the BBC documentary "Climate Change by Numbers" (that I co-presented) won the American Association for the Advancement of Science (AAAS) Science Journalism Gold Award for "best in-depth TV reporting".

Now the programme has won another prestigious award: the European Science TV and New Media Award for the best Science programme on an environmental issue, 2015.

The new award (see photo below) was presented to BBC Executive Director Jonathan Renouf at a ceremony in Lisbon on 25 November 2015. Jonathan thanked the team involved in the programme, saying:

"I'm absolutely delighted to see the film gain such widespread international recognition. It really is a tribute to the way you managed to bring fresh television insight to a very well trodden subject, and to do it in a way that was genuinely entertaining as well as so innovative. Everyone I've spoken to out here is so impressed with the film. Thank you again for all your hard work, passion and commitment in making the show."

The programme has also recently been screened on TV in a number of other countries. Here is a comprehensive review that appeared in La Monde.

The European Science TV and New Media Award

Wednesday, 11 November 2015

BBC Documentary co-presented by Norman Fenton wins AAAS Science Journalism Gold Award for "best in-depth TV reporting"

1 Dec Update: the programme has now won another award.

In March Norman Fenton reported on his experience of presenting the BBC documentary "Climate Change by Numbers". The programme has won the American Association for the Advancement of Science (AAAS) Science Journalism Gold Award for "best in-depth TV reporting". The summary citation says:

The Gold Award for in-depth television reporting went to a BBC team for a documentary that used clever analogies and appealing graphics to discuss three key numbers that help clarify important questions about the scale and pace of human influence on climate. The program featured a trio of mathematicians who use numbers to reveal patterns in data, assess risk, and help predict the future.

Jonathan Renouf Executive Producer at BBC Science said (to those involved in the making of the programme):

It’s a huge honour to win this award; it’s a global competition, open to programmes in every area of science, and it’s judged by science journalists. I can’t think of a finer and more prestigious endorsement of the research and journalistic rigour that you brought to bear in the film. We all know how difficult it is to make programmes about climate change that tread the line between entertainment, saying something new, and keeping the story journalistically watertight. I’m really thrilled to see your efforts recognised in top scientific circles.

Full details of the awards can be found on the AAAS website.

Friday, 6 November 2015

Update on the use of Bayes in the Netherlands Appeal Court

In July I reported about the so-called Breda 6 case in the Netherlands and how a Bayesian argument was presented in the review of the case. My own view was that the Bayesian argument was crying out for a Bayesian network representation (I provided a model in my article to do that).

Now Richard Gill has told me the following:

Finally there has been a verdict in the 'Breda 6' case. The suspects were (again) found guilty. The court is somewhat mixed with respect to the Bayesian analysis: On the one hand they ruled that Frans Alkmeye had the required expertise, and that he was rightly appointed as a 'Bayesian expert'. On the other hand they ruled that a Bayesian analysis is still too controversial to be used in court. Therefore they disregarded 'the conclusion' of Frans's report. This is a remarkable and unusual formulation in verdicts, the normal wording is that report has been disregarded.

This unusual wording is no accident: If the court would say that they had disregarded the report, they would lie, since actually quite a lot of the Bayesian reasoning is included in their judgment. A number of considerations from Frans's report are fully paraphrased, and sometimes quoted almost verbatim.

Also I noticed that the assessment of certain findings is expressed in a nicely Bayesian manner.

However: Contrary to Frans's assessment, the court still thinks that the original confessions of three of the suspects contain strong evidence. Unfortunately, the case is not yet closed, but has been taken to the high court.

Frans Alkmeye has also been appointed as a Bayesian expert in yet another criminal case.

The ruling that the Bayesian analysis is too controversial is especially disappointing since we have recently been in workshops with Dutch judges who are very keen to use Bayesian reasoning - and even Bayesian networks (in the Netherlands there are no juries so the judges really do have to make the decisions themselves). These judges - along with Frans Alkemade - will be among many of the world's top lawyers, legal scholars, forensic scientists, and mathematicians participating in the Isaac Newton Institute Cambridge Programme on Probability and Statistics in Forensic Science that will take place July-Dec 2016. This is a programme that I have organised along with David Lagnado, David Balding, Richard Gill and Leila Schneps. It derives from our Bayes and the Law consortium which states that, despite the obvious benefits of using Bayes:

The use of Bayesian reasoning in investigative and evaluative forensic science and the law is, however, the subject of much confusion. It is deployed in the adduction of DNA evidence, but expert witnesses and lawyers struggle to articulate the underlying assumptions and results of Bayesian reasoning in a way that is understandable to lay people. The extent to which Bayesian reasoning could benefit the justice system by being deployed more widely, and how it is best presented, is unclear and requires clarification.

One of the core objectives of the 6-month programme is to address this issue thoroughly. Within the programme there are three scheduled workshops:

"The nature of questions arising in court that can be addressed via probability and statistical methods", Tue 30th Aug 2016 - Tue 30th Aug 2016
"Bayesian networks in evidence analysis", Mon 26th Sep 2016 - Thurs 29th Sep 2016
"Statistical methods in DNA analysis and analysis of trace evidence", Mon 7th Nov 2016 - Mon 7th Nov 2016

Monday, 26 October 2015

Cyber security risk of nuclear facilities using Bayesian networks

Scientists from Korea (Jinsoo Shin, Hanseong Son, Rahman Khalilur, and Gyunyoung Heo) have published an article describing their Bayesian network model for assessing cyber security risk of nuclear facilities (using the AgenaRisk tool). It is based on combining two models - one which is process based (considers how well security procedures were followed) and the other which is considers the system architecture (considering vulnerabilities and controls). The full paper is here:

Shin, J., Son, H., Khalil ur, R., & Heo, G. (2015). Development of a cyber security risk model using Bayesian networks. Reliability Engineering & System Safety, 134, 208–217. doi:10.1016/j.ress.2014.10.006

Bayesian Networks for Risk Assessment of Public Safety and Security Mobile Service

A new paper by Matti Peltola and Pekka Kekolahti of the Aalto University (School of Electrical Engineering) in Finland uses Bayesian Networks and the AgenaRisk tool to gain a deeper understanding of the availability of Public Safety and Security (PSS) mobile networks and their service under different conditions. The paper abstract states:

A deeper understanding of the availability of Public Safety and Security (PSS) mobile networks and their service under different conditions offers decision makers guidelines on the level of investments required and the directions to take in order to decrease the risks identified. In the study, a risk assessment model for the existing PSS mobile service is implemented for both a dedicated TETRA PSS mobile network as well as for a commercial 2G/3G mobile network operating under the current risk conditions. The probabilistic risk assessment is carried out by constructing a Bayesian Network. According to the analysis, the availability of the dedicated Finnish PSS mobile service is 99.1%. Based on the risk assessment and sensitivity analysis conducted, the most effective elements for decreasing availability risks would be duplication of the transmission links, backup of the power supply and real-time mobile traffic monitoring. With the adjustment of these key control variables, the service availability can be improved up to the level of 99.9%. The investments needed to improve the availability of the PSS mobile service from 99.1 % to 99.9% are profitable only in highly populated areas. The calculated availability of the PSS mobile service based on a purely commercial network is 98.8%. The adoption of a Bayesian Network as a risk assessment method is demonstrated to be a useful way of documenting different expert knowledge as a common belief about the risks, their magnitudes and their effects upon a Finnish PSS mobile service.

Full reference details:

Peltola, M. J., & Kekolahti, P. (2015). Risk Assessment of Public Safety and Security Mobile Service. In 2015 10th International Conference on Availability, Reliability and Security (pp. 351–359). IEEE. doi:10.1109/ARES.2015.65

Sunday, 18 October 2015

What is the value of missing information when assessing decisions that involve actions for intervention?

This is a summary of the following new paper:

Constantinou AC, Yet B, Fenton N, Neil M, Marsh W "Value of Information analysis for interventional and counterfactual Bayesian networks in forensic medical sciences". Artif Intell Med. 2015 Sep 8 doi:10.1016/j.artmed.2015.09.002. The full pre-publication version can be found here.

Most decision support models in the medical domain provide a prediction about a single key unknown variable, such as whether a patient exhibiting certain symptoms is likely to have (or develop) a particular disease.

However we seek to enhance decision analysis by determining whether a decision based on such a prediction could be subject to amendments on the basis of some incomplete information within the model, and whether it would be worthwhile for the decision maker to seek further information prior to the decision. In particular we wish to incorporate interventional actions and counterfactual analysis, where:

An interventional action is one that can be performed to manipulate the effect of some desirable future outcome. In medical decision analysis, an intervention is typically represented by some treatment, which can affect a patient’s health outcome.
Counterfactual analysis enables decision makers to compare the observed results in the real world to those of a hypothetical world; what actually happened and what would have happened under some different scenario.

The method we use is based on the underlying principle of Value of Information. This is a technique initially proposed in economics for the purposes of determining the amount a decision maker would be willing to pay for further information that is currently unknown within the model.

The type of predictive decision support models to which our work applies are Bayesian networks. These are graphical models which represent the causal or influential relationships between a set of variables and which provide probabilities for each unknown variable.

The method is applied to two real-world Bayesian network models that were previously developed for decision support in forensic medical sciences. In these models a decision maker (such as a probation officer or a clinician) has to determine whether to release a prisoner/patient based on the probability of the (unknown) hypothesis variable: “individual violently reoffends after release”. Prior to deciding on release, the decision maker has the option to simulate various interventions to determine whether an individual’s risk of violence can be managed to acceptable levels. Additionally, the decision maker may have the option to gather further information about the individual. It is possible that knowing one or more of these unobserved factors may lead to a different decision about release.

We used the method to examine the average information gain; that is, what we learn about the importance of the factors that remain unknown within the model. Based on six different sets of experiments with various assumptions we show that:

the average relative percentage gain in terms of Value of Information ranged between 11.45% and 59.91% (where a gain of X% indicates an expected X% relative reduction of the risk of violent reoffence);

the potential amendments in Decision Making, as a result of the expected information gain, ranged from 0% to 86.8% (where an amendment of X% indicates that X% of the initial decisions are expected to have been altered).

The key concept of the method is that if we had known that the individual was, for example, a substance misuser, we would have arranged for a suitable treatment; whereas without having information about substance misuse it is impossible to arrange such a treatment and, thus, we risk not treating the individual in the case where he or she is a substance misuser.

The method becomes useful for decision makers, not only when decision making is subject to amendments on the basis of some unknown risk factors, but also when it is not. Knowing that a decision outcome is independent of one or more unknown risk factors saves us from seeking information about that particular set of risk factors.

This summary can also be found on the Atlas of Science

Thursday, 15 October 2015

Talk: Bayesian networks: why smart data is better than big data

by Prof. Norman Fenton from the School of Electronic Engineering and Computer Science (QMUL)

WHEN: Fri, 16th October 2 - 3 pm

WHERE: People's Palace PP2 (Mile End Campus)

"This talk will provide an introduction to Bayesian networks which, due to relatively recent algorithmic breakthroughs, has become an increasingly popular technique for risk assessment and decision analysis. I will provide an overview of successful applications (including transport safety, medical, law/forensics, operational risk, and football prediction). What is common to all of these applications is that the Bayesian network models are built using a combination of expert judgment and (often very limited) data. I will explain why Bayesian networks ‘learnt’ purely from data – even when ‘big data’ is available - generally do not work well."

All are welcome. The seminar consists of an app. 45 min long lecture and discussion.

In case of any questions, feel free to contact me.

Hope to see you tomorrow,

Judit Petervari

____________________

Judit Petervari

PhD Student

Biological and Experimental Psychology Group
School of Biological and Chemical Sciences
Queen Mary University of London
Mile End Road
E1 4NS London
United Kingdom

E-mail: j.petervari@qmul.ac.uk
Office: G.E. Fogg Building, Room 2.16

Tuesday, 15 September 2015

Yet another flawed statistical study attracts massive unquestioning attention

The Guardian, 29 Sept 2015

A very widely reported story in today’s news (see, for example, the report in the Guardian and this Press release) claims that companies in which there is at least one female executive on the Board (‘gender diverse’ companies) in the US, UK and India outperform companies with male-only executives by a staggering US$655 billion per year. The story is based on a study by Grant Thornton whose representative Francesca Lagerberg concludes:

“The research clearly shows what we have been talking about for a while: that diversity leads to better decision-making”.

As is typical when the results of a statistical study fit a popular narrative, the story attracted massive, unquestioning attention. Unfortunately, while I am sure that most people agree that greater gender diversity in the Boardroom is a worthy objective, based on the ‘full report’ – and in the absence of other data - Lagerberg's claim is simply not supported. In fact, the study exemplifies some of the classic misuses of statistics that we wrote about in the first chapter of our book and highlights yet again the need for proper causal/explanatory models to be used in statistical studies such as these*.

Moreover, using the data in Lagerberg's study it is possible to construct a simple causal model (a Bayesian network) that replicates the results but with provably opposite conclusions: diversity decreases performance.

The full report and BN model are provided here. The model can be run in the free version of AgenaRisk.

*Making such an approach both universally feasible and acceptable is the major objective of BAYES-KNOWLEDGE.

Monday, 31 August 2015

Doctoring Data: Review of a must read book

Book Review:

Malcom Kendrick “Doctoring Data: How to sort out medical advice from medical nonsense” Columbus Publishing Ltd, 2015

Review by Norman Fenton and Martin Neil (a pdf version of this article can be found here)

This is an extremely important (and also entertaining) book that should be mandatory reading not just for anybody interested in finding out about what data-driven medical studies really mean, but also for anybody engaged in any kind of empirical work. What Kendrick shows brilliantly is the extent to which the vast majority of medical recommendations and guidelines are based on data-driven studies that are fundamentally flawed and often corrupt. He highlights how the resulting recommendations and guidelines have led (world-wide) to millions of unnecessarily early deaths, millions of people suffering unnecessary pain, and widespread use of drugs and treatments that do more harm than good (example: statins), as well as wasting billions of taxpayer dollars every year.

As researchers who have been involved in empirical studies in a very wide range of disciplines over many years we believe that much of what he says is also relevant to all of these disciplines (which include most branches of the physical, natural and environmental sciences, computer science, the social sciences, and law). Apart from the cases of deliberate corruption and bias (of which Kendrick provides many medical examples) most of the flaws boil down to a basic misunderstanding of statistics, probability and the scientific method.

There are two notable quotes that Kendrick uses, which we believe sum up most of the problems he identifies:

“When a man finds a conclusion agreeable, he accepts it without argument, but when he finds it disagreeable, he will bring against it all the forces of logic and reason” Thucydides.
“I know that most men, including those at ease with problems of the greatest complexity, can seldom accept even the simplest and most obvious truth if it be such as would oblige them to admit the falsity of conclusions which they have delighted in explaining to colleagues, which they have proudly taught to others, and which they have woven, thread by thread, into the fabric of their lives.” Leo Tolstoy

The first sums up the extent to which results of empirical work are doctored to suit the pre-conceived biases and hopes of those undertaking it (a phenomenon also known as ‘confirmation bias’). The second sums up the extent to which there are ideas that represent the ‘accepted orthodoxy’ in most disciplines that are impossible to challenge even when they are wrong. Those brave enough to challenge the accepted orthodoxy risk ruining their careers in their discipline. Hence, most researchers and practitioners simply accept the orthodoxy without question and help perpetuate flawed or useless ideas in order to get funding and progress their careers. Kendrick describes how these problems lie at the heart of the fundamentally fraudulent peer review system in medicine – which applies to both submitting articles to journals and submitting research grant applications. Once again, we believe that all of the areas of research where we have worked (maths, computer science, forensics, law, and AI) suffer from the same flawed peer review system.

Kendrick is not afraid to challenge the leading figures in medicine, often exposing examples of hypocrisy and corruption. Of special interest to us, however, is that he also challenges the attitude of revered figures in our own discipline. For example, Kendrick highlights two quotes in a recent article by Nobel prize-winner Daniel Kahneman, whose work in the psychology of decision theory and risk is held in the highest esteem.:

“The way scientists try to convince people is hopeless because they present evidence, figures, tables, arguments, and so on. But that’s not how to convince people. People aren’t convinced by arguments, they don’t believe conclusions because they believe in the arguments that they read in favour of them. They’re convinced because they read or hear the conclusions from people they trust. You trust someone and you believe what they say. That’s how ideas are communicated. The arguments come later.”
“Why do I believe global warming is happening? The answer isn’t that I have gone through all the arguments and analysed the evidence – because I haven’t. I believe the experts from the Academy of Sciences. We all have to rely on experts.”

Kendrick notes the problem here:

“In one breath he states that people aren’t convinced by arguments; they’re convinced because they read or hear conclusions from people they trust. Then he says that we all have to rely on experts. But he does not link these two thoughts together to ask the obvious question. Just how, exactly, did the experts come to their conclusions?”

Having presented the BBC documentary on Climate Change by Numbers we also got an insight into the extent to which problems exist there.

As good as the book is (and indeed because of how good it is), we feel the need to highlight some points where we believe Kendrick gets it wrong. There are some statistical/probability errors and over-simplifications, which mostly seem to stem from a lack of awareness of Bayesian probability. For example, he says:

“… although association cannot prove causation, a lack of association does disprove causation”.

This is not true as can be proven by the simple counter example we provide below using a Bayesian network*.

Next we believe Kendrick’s faith in randomised control trials (RCTs) as being the (only) reliable empirical basis for medical decision making is misplaced. Because of Simpson’s paradox and the impossibility of accounting for all confounding variables there is, in principle, no solid basis for believing that the result of any RCT is ‘correct’. As is shown in the article here it is possible, for example, that an RCT can find a drug to be effective compared to a placebo in every possible categorisation of trial participants, yet the addition of a single confounding variable can result in an exact reversal of the results.

So, if we are saying that even RCTs cannot be accepted as valid empirical evidence, does that mean that we are even more pessimistic than Kendrick about the possibility of any useful empirical research? No - and this brings us to our final major area of disagreement with Kendrick’s thesis. In contrast to what Kendrick proposes we believe there is an important role for expert judgment in critical decision-making. In fact, we believe expert judgement is inevitable even if every attempt is made to remove it from an empirical study (it is, for example, impossible to remove expert judgment from the very problem of framing the study and choosing the variables and data to collect). Given the inevitability of expert judgment, we feel it should be made obvious, transparent, and open to refutation by experiment. Any scientist should be as open and honest about their judgment as possible and be prepared to make predictions and be contradicted by data.

By combining expert judgment with data it is possible to get far more reliable empirical results with much less data and effort than required for an RCT. This is essentially what we proposed in our book and which is being further developed in the EU project BayesKnowledge.

*Refuting the assertion “If there is no association (correlation) then there cannot be causation”.

Consider the two hypotheses:

H1: “If there is no association (correlation) then there cannot be causation”.
H2: “If there is causation there must be association (correlation).

Kendrick’s assertion (H1) is, of course, equivalent to H2. We can disprove H2 with a simple counter-example using two Boolean variables a, and b, i.e. whose states are True or False. We do this by introducing a third, latent, unobserved Boolean variable c. Specifically we define the relationship between a,b, and c via the following Bayesian network :

By definition b is completely causally dependent on a. This is because, when c is True the state of b will be the same as the state of a, and when c is False the state of b will be the opposite of the state of a.

However, suppose - as in many real-world situations – that c is both hidden and unobserved (i.e. a typical confounding variable). Also, assume that the priors for the variables a and c are uniform (i.e. 50% of the time they are False and 50% of the time they are True).

Then when a is False there is a 50% chance b is False and a 50% chance b is True. Similarly, when a is True there is a 50% chance b is False and a 50% chance b is True. In other words, what we actually observe is zero association (correlation) despite the underling mechanism being completely (causally) deterministic.

The above BN model can be downloaded here and run using the free version of AgenaRisk

Sunday, 30 August 2015

Using Bayesian networks to assess and manage risk of violent reoffending among prisoners

Fragment of BN model

Probation officers, clinicians, and forensic medical practitioners have for several years sought improved decision support for determining whether and when to release prisoners with mental health problems and a history of violence. It is critical that the risk of violent re-offending is accurately measured and, more importantly, well managed with causal interventions to reduce this risk after release. The well-established 'risk predictors' in this area of research are typically based on statistical regression models and their results are less than convincing. But recent work undertaken at Queen Mary University of London has resulted in Bayesian network (BN) models that not only have much greater accuracy, but which are also much more useful for decision support. The work has been developed as part of a collaboration between the Risk and Information Management group and the medical practitioners of the Violence Prevention Research Unit (VPRU) of the Wolfson Institute of Preventative Medicine.

The (BN) model, called DSVM-P (Decision Support for Violence Management – Prisoners) captures the causal relationships between risk factors, interventions and violence. It also allows for specific risk factors to be targeted for causal intervention for risk management of future re-offending. These decision support features are not available in the previous generation of models used by practitioners and forensic psychiatrists.

Full reference:
Constantinou, A., Freestone M., Marsh, W., Fenton, N. E. , Coid, J. (2015) "Risk assessment and risk management of violent reoffending among prisoners", Expert Systems With Applications 42 (21), 7511-7529. Published version: http://dx.doi.org/10.1016/j.eswa.2015.05.025.
Download Pre-publication draft.

Sunday, 26 July 2015

Winchester Science Festival: Fallacies of Probability and Risk (a report by Norman Fenton)


Norman Fenton at the Winchester Science Festival

I had the privilege of being an invited speaker today at this weekend's annual Winchester Science Festival, presenting a talk on "Fallacies of Probability and Risk" (the slides of which can be downloaed from here).

It's the first time I have spoken at one of these big 'popular science' events and I was very impressed by it. There were people of all different age groups attending both the talks and the various activities in the reception area and the audience was really enthusiastic and responsive.

Norman Fenton during his talk

Judging by the other talk I managed to attend before mine (coincidentally by a former Queen Mary student Marcus Chown** with the same title as his latest book) and the event I attended last night (see below) - it was clear that this festival was a high quality, well organised event.

Festival of the Spoken Nerd: Matt Parker of QM in the centre

I arrived in time last night for a performance of the Festival of the Spoken Nerd. This amazing show consists of three scientists/entertainers: stand-up mathematician Matt Parker (who happens to be Maths Outreach Coordinator at Queen Mary), experiments maestro Steve Mould and geek songstress/physicist Helen Arney. They manage to provide 90 minutes of quality humour and mathematics education at the same time. They were actually previewing their new show which they will performing at the Edinburgh Festival and on a UK tour. I strongly recommend you go to it.

**Marcus's website is here.

Friday, 17 July 2015

The use of Bayes in the Netherlands Appeal Court

Henry Prakken

Norman Fenton, 17 July 2016

There has been an important development on the use of Bayes in the Law in the Netherlands, with what is possibly the first full Bayesian analysis of a major crime in an appeal court there.

The case, referred to as the “Breda 6”, was the 1993 murder of a Chinese woman in Breda in her son’s restaurant. Six young people were convicted of the crime and sentenced to up to 10 years in jail (all have since completed their sentences). In 2012 the advocate general recommended the case be looked at again because it centred on confessions which may have been false.

In the review of the case a Bayesian argument supporting the prosecution case was presented by Frans Alkemade (Update: see below about some concerns Frans has about this article). Frans is actually a physicist who previously used Bayes to analyse a drugs trafficking case, concluding in a report commissioned by the prosecution, that there was not enough evidence for a conviction (the suspect was acquitted). The court requested that Henry Prakken (professor in Legal Informatics and Legal Argumentation at the University of Groningen) respond to the Bayesian argument.

In June, while he was preparing his response, I met Henry at the International Conference on AI and the Law in San Diego. Henry told me about the case and Alkemade's analysis for which the guilty hypothesis was "At least some of the six suspects were involved in the crime, which took place after 4:30 on the night of 3 and 4 july 1993, and which included the luring of the victim to the restaurant by at least some of the female suspects". Alkemade interpreted "involved" in a weak way and Henry said:

"..it could even be no more than just knowing about the crime. In fact, one of my points of criticism was that this guilt hypothesis is not useful for the court, since it is consistent with the innocence of any of the individual suspects (and even with the collective innocence of all three male suspects)."

Among other things, Alkemade focused on two pieces of evidence:

A report by the Criminal Intelligence Unit (CID) of the Dutch police, saying that they had received information from "usually reliable" sources identifying two of the male defendants and one of the female defendants as being involved in the murder**.
The subsequent discovery that two of the female defendants (not mentioned in the CID report and who supposedly knew the three defendants mentioned in the CID report) worked next door to the murder scene.

From Henry's description of the analysis, it seemed that Alkemade did not account for all relevant unknown variables and dependencies*** (also see the update), such as the possibility that the anonymous tip-off may have been both malicious and prompted by the fact that the caller knew the defendants worked next door to the murder scene (making the tip-off more believable). This would mean that the combination of the two pieces of evidence would not have been such an incredible coincidence if the defendants were innocent. So in that sense the Bayesian argument was over-simplistic. On the other hand it was also too complex for lawyers to understand since it was presented 'from first principles' in the sense that all of the detailed Bayesian inference calculations were spelled out. For the reasons we have explained in detail here it seemed like a Bayesian network (BN) model would be far more suitable. I therefore produced - in discussions with Henry - a generic BN model to reason about the impact of anonymous evidence when combined with other evidence that can influence the anonymous tip-off (the model is here and can be run using the free AgenaRisk software).

The intention was not to replicate all of the features of the case but rather to demonstrate the impact of missing dependencies in Alkemade's argument. Indeed, with a range of reasonable assumptions, the BN model pointed to a much lower probability of guilt than suggested by Alkemade's calculations.

Henry presented his response in court last week. He said:

"The court session was sometimes frustrating, since the discussion was fragmentary and sometimes I had the impression that the court felt it was confronted with a battle of the experts without the means to understand who was right."

In my view this case confirms our claim that presenting a Bayesian legal argument from first principles (as Alkemade did) is not a good idea. The very fact that people assume it is necessary to do this for Bayes to be accepted is actually the reason (ironically) that there will continue to be very strong resistance to accepting Bayes in the courtroom. Why? Because it means you are restricted to ludicrously over-simplistic (and normally flawed) models of the case (3 unknowns maximum) because that is the limit of the Bayesian calculations you can do by hand and explain clearly. Our proposed solution is to model (and run) the case properly using a BN and report back on the results stating in lay terms what the model assumptions were and how sensitive the conclusions are to different prior assumptions.

**Henry told me that there was something funny with the CID report, as also noted by the advocate general. According to the CID report, the anonymous informant had also accused the defendants mentioned in the report of committing several other crimes, but in the investigations preceding the revision case the police investigators had not been able to find any confirmation of these other crimes, not even reports of these supposed crimes to the police by the supposed victims. As the advocate general stated in 2012, this casts doubt on the reliability of the CID informants.

***My colleague Richard Gill - who knows Alkemade - says that Alkemade was careful to define his pieces of evidence in such a way that he thinks that he can justify the independence assumptions which he needs in order to at least conservatively bound the Likelihood ratio coming from each piece of evidence in turn.

UPDATE 21 July 2015: Frans has contacted me stating a number of concerns about the above narrative and provided a number of technical insights that I was not aware of. As an expert witness in a case that is still under trial, he does not feel free to discuss any details in public, but once the trial has finished I will provide an updated report that incorporates his comments. What I can confirm, however, is that in order to do the calculations manually Frans could not model dependencies between different pieces of evidence - a major limitation - although he did make clear the limitations and pitfalls of this in his report.

See also:

Review of the use of Bayes in the Law (pdf report)
A General Structure for Legal Arguments About Evidence Using Bayesian Networks (published article)
Assessing evidence and testing appropriate hypotheses (published article)
Barry George case: new insights on the evidence.
Sally Clark case: another statistical oversight
Ben Geen: another possible case of miscarriage of justice and misunderstanding of statistics

Thursday, 9 July 2015

Why target setting leads to poor decision-making

Norman Fenton is the co-author of an article in Nature published today that addresses the issue of improved decision-making in the context of international sustainable development goals. The article pushes for a Bayesian, smart-data approach:

We contend that target-setting is flawed, costly and could have little — or even negative — impact. First, targets may have unintended consequences. For example, education quality as a whole suffered in some countries that diverted resources to early schooling to meet the target of the Millennium Development Goal (MDG) of achieving universal primary education.

Second, target-setting inhibits learning by focusing efforts on meeting the target rather than solving the problem. The milestones are easily manipulated — aims such as halving deaths from road-traffic accidents can trigger misreporting if the performance falls short or encourage underperformance if the goal can be exceeded.

Third, it is costly: development partners will have to reallocate scant resources for a 'data revolution' that will cost an estimated US$1 billion a year.

We advocate a different approach. Governments and the development community need to embrace decision-analysis concepts and tools that have been used for decades in mining, oil, cybersecurity, insurance, environmental policy and drug development.

The approach is based on five principles:

Replace targets with measures of investment return
Model intervention decisions
Integrate expert knowledge
Include uncertainty in predictive models
Measure the most informative variables

Recommendations include the following:

It is a common mistake to assume that 'evidence' is the same as 'data' or that 'subjective' means 'uninformative'. Decision-making should draw on all appropriate sources of evidence. In developing countries where data are sparse, expert knowledge can fill the gaps. For instance, in our assessment of the viability of agroforestry projects in Africa, we used our experience to set ranges on tree-survival rates, costs of raising tree seedlings and farm prices of tree products.
....
Decision theorists and local experts will have to work together to identify relevant variables, causal associations and uncertainties. The most widely accepted method of incorporating knowledge for probability assessment is Bayes' theorem. This updates the likelihood of a belief in some event (such as whether an intervention will reduce poverty) when observing new evidence about the event (such as the occurrence of drought). Bayesian analyses — incorporating historical data and expert judgement — are used in transport and systems-safety assessments, medical diagnosis, operational risk assessment in finance and in forensics, but seldom in development. They should be used, for example, to evaluate the relative risks of competing development interventions.
....
Decision-makers .. should employ probabilistic decision analysis, for example Monte Carlo simulations or Bayesian network models. Provided that such models are developed using properly calibrated expert judgement and decision-focused data, they can incorporate the key factors and outcomes and the causal relationships between them. For instance, simulations for evaluating options for building a water pipeline could take into account rare 'what-if' scenarios, such as a hurricane during development, and predict (with probabilities) the time and cost of implementation and the benefits of improved water supply.

Tuesday, 28 April 2015

The statistics of sex

Sir David Spiegelhalter (left) and Norman Fenton

Norman Fenton, 28 April 2015

Last night I attended the launch of David Spiegelhalter's book "Sex by Numbers"** at the Wellcome Collection in London, which is currently also hosting an exhibition on Sexology.

What makes David's book a very good read is that it not only presents intriguing insights into a broad range of sexual activities, but it does so in a way that explains in lay terms the good, bad and ugly of the underlying statistical methods as well as some of the maths. This includes things like erroneous reporting of sexual habits that falls into the category of prosecutors' fallacy***. There are hundreds of different numbers about sex presented and most get a star rating (ranging from 1* to 4*) based on their reliability; so, for example, the surprisingly high number 48% (births that were formally 'illegitimate' in 2012 in England and Wales) is in the most reliable category (4*), while the number 84% (women emotionally unsatisfied with their relationship) is in the least reliable category (1*). The numbers for favourite sexual positions as presented in the following table are rated as 2*:

	*Women*	*Men*
Man on top	48%	25%
Woman on top	33%	45%
Doggy	15%	25%

To give a feel for the range of numbers the book provides those 80% of 25-34 year-olds who have engaged in oral sex in the last year will be interested to know that 3% is the proportion of recommended daily zinc intake contained in an average ejaculation.

**David was one of my co-presenters on the BBC documentary Climate Change by Numbers. The other co-presenter was Hannah Fry, whose book published in February is called "The Mathematics of Love". I deny the rumours circulating that my next book is to be called "The Risks of Marriage"....

***see here for background on the prosecutors' fallacy

Thursday, 26 March 2015

The risk of flying

Norman Fenton, 26 March 2015

I have just done an interview on BBC Radio Scotland about aircraft safety in the light of the GermanWings crash - which now appears to have been a deliberate act of sabotage by the co-pilot*. I have uploaded a (not very good) recording of it here (mp3 file - it is just under 4 minutes) or here (a more compact m4a file)

Because this type of event is so rare classical frequentist statistics provides no real help when it comes to risk assessment. In fact, it is exactly the kind of risk assessment problem for which you need causal models and expert judgement (as explained in our book) if you want any kind of risk insights.

Irrespective of this particular incident, the interview gave me the opportunity to highlight a very common myth, namely that “flying is the safest form of travel”. If you look at deaths per million travellers then, indeed, there are 50 times as many car deaths as plane deaths. However, this is a silly measure because there are so many more car travellers than plane travellers. So, typically, analysts use deaths per million miles travelled; with respect to this measure car travel is still 'riskier' than air travel, but the death rate is only about twice as high as plane deaths. But this measure is also biased in favour of planes because the average plane journey is much further than the average car journey.

So a much fairer measure is the number of deaths per passenger journey. And for this, the rate of plane deaths is actually three times higher than car deaths; in fact only bikes and motorbikes are worse than planes.

Despite all this there is still a very low probability of a plane journey resulting in fatalities - about 1 in half a million (and much less on commercial flights in Western Europe). However, if we have reason to believe that, say, recent converts to a terrorist ideology have been training and becoming pilots then the probability of the next plane journey resulting in fatalities becomes much higher, despite the past data.

*I had an hour’s notice of the interview and was told what I would be asked. I was actually not expecting to be asked about how to assess the risk of this specific type of incident; I was assuming I would only be asked about aircraft safety risk in general and about the safety record of the A320.

Postscript: Following the interview a colleage asked:
"Did you have the mental issues of the co-pilot on the radar when you replied? "
My response: Interesting question. A few years back we were involved extensively in work with NATS (National Air Traffic Safety) to model/predict risk of mid-air collision over the UK airspace. In particular NATS wanted to know how the probability of a mid-air collision might change given different proposals for changes to the ATM architecture (e.g. ‘adding new ground radar stations’ versus ‘adding new on-board collisions alert systems’). Now - apart from three incidents in the late 1940’s which all involved at least one military jet - there has not been any actual mid-air collisions over UK airspace (so negligible data there) and the proposed technology was ‘new’ (so no directly relevant data there) but there was a LOT of data on "near misses" of different degrees of seriousness and a LOT of expert judgment about the causes and circumstances of the near misses. Hence, we were able with NATS experts to build a very detailed model that could be ‘validated’ against the actual near miss data. What is very interesting are what factors NATS needed in the model. The psychological state and stress of air traffic controllers was included in the model as were certain psychological traits of pilots. It turns out that certain airlines were more likely to be involved in a near-misses primarily because of traits of their pilots.

Tuesday, 24 March 2015

The problem with big data and machine learning

The advent of ‘big data’, coupled with fancy statistical machine learning techniques, is increasingly seducing people to believe that new insights and better predictions can be achieved in a wide range of important applications, without relying on the input of domain experts. The applications range from learning how to retain customers through to learning what makes people susceptible to particular diseases. I have written before about the dangers of this kind of 'learning' from data alone (no matter how 'big' the data is).

Contrary to the narrative being sold by the big data community, if you want accurate predictions and improved, decision-making then, invariably, you need to incorporate human knowledge and judgment. This enables you to build rational causal models based on 'smart' data. The main objections to using human knowledge - that it is subjective and difficult to acquire - are, of course, key drivers of the big data movement. But this movement underestimates the typically very high costs of collecting, managing and analysing big data. So, the sub-optimal outputs you get from pure machine learning do not even come cheap.

To clarify the dangers of relying on big data and machine learning, and to show how smart data and causal modelling (using Bayesian networks) gives you better results, we have collected together the following short stories and examples:

The whole subject of 'smart data' rather than 'big data' is also the focus of the BAYES-KNOWLEDGE project.

Tuesday, 3 March 2015

The Statistics of Climate Change

From left to right: Norman Fenton, Hannah Fry, David Spiegelhalter. Link to the Programme's BBC website

Norman Fenton, 3 March 2015 (This is a cross posting of the article here)

I had the pleasure of being one of the three presenters of the BBC documentary called “Climate Change by Numbers” (first) screened on BBC4 on 2 March 2015.

The motivation for the programme was to take a new look at the climate change debate by focusing on three key numbers that all come from the most recent IPCC report. The numbers were:

0.85 degrees - the amount of warming the planet has undergone since 1880
95% - the degree of certainty climate scientists have that at least half the warming in the last 60 years is man-made
one trillion tonnes - the cumulative amount of carbon that can be burnt, ever, if the planet is to stay below ‘dangerous levels’ of climate change

The idea was to get mathematicians/statisticians who had not been involved in the climate change debate to explain in lay terms how and why climate scientists had arrived at these three numbers. The other two presenters were Dr Hannah Fry (UCL) and Prof Sir David Spiegelhalter (Cambridge) and we were each assigned approximately 25 minutes on one of the numbers. My number was 95%.

Being neither a climate scientist nor a classical statistician (my research uses Bayesian probability rather than classical statistics to reason about uncertainty) I have to say that I found the complexity of the climate models and their underlying assumptions to be daunting. The relevant sections in the IPCC report are extremely difficult to understand and they use assumptions and techniques that are very different to the Bayesian approach I am used to. In our Bayesian approach we build causal models that combine prior expert knowledge with data.

In attempting to understand and explain how the climate scientists had arrived at their 95% figure I used a football analogy – both because of my life-time interest in football and because - along with my colleagues Anthony Constantinou and Martin Neil – we have worked extensively on models for football prediction. The climate scientists had performed what is called an “attribution study” to understand the extent to which different factors – such as human CO2 emissions – contributed to changing temperatures. The football analogy was to understand the extent to which different factors contributed to changing success of premiership football teams as measured by the total number of points they achieved season-by-season. In contrast to our normal Bayesian approach – but consistent with what the climate scientists did – we used data and classical statistical methods to generate a model of success in terms of the various factors. Unlike the climate models which involve thousands of variables we had to restrict ourselves to a very small number of variables (due to a combination of time limitations and lack of data). Specifically, for each team and each year we considered:

Wages (this was the single financial figure we used)
Total days of player injuries
Manager experience
Squad experience
Number of new players

The statistical model generated from these factors produced, for most teams, a good fit of success over the years for which we had the data. Our ‘attribution study’ showed wages was by far the major influence. When wages was removed from the study, the resulting statistical model was not a good fit. This was analogous to what the climate scientists’ models were showing when the human CO2 emissions factor was removed from their models; the previously good fit to temperature was no longer evident. And, analogous to the climate scientists’ 95% derived from their models, we were able to conclude there was a 95% chance that an increase in turnover of 10 per cent would result in at least one extra premiership point. (Update: note that this was a massive simplification to make the analogy. I am certainly not claiming that increasing wages causes an increase in points. If I had had the time I would have explained that in a proper model - like the Bayesian networks we have previously built - wages offered is one of the many factors influencing quality of players that can be bought which, in turn, along with other factors influences performance).

Obviously there was no time in the programme to explain either the details or the limitations of my hastily put-together football attribution study and I will no doubt receive criticism for it (I am preparing a detailed analysis). But the programme also did not have the time or scope to address the complexity of some of the broader statistical issues involved in the climate debate (including issues that lead some climate scientists to claim the 95% figure is underestimated and others to believe it is overestimated). In particular, the issues that were not covered were:

The real probabilistic meaning of the 95% figure. In fact it comes from a classical hypothesis test in which observed data is used to test the credibility of the ‘null hypothesis’. The null hypothesis is the ‘opposite’ statement to the one believed to be true, i.e. ‘Less than half the warming in the last 60 years is man-made’. If, as in this case, there is only a 5% probability of observing the data if the null hypothesis is true, the statisticians equate this figure (called a p-value) to a 95% confidence that we can reject the null hypothesis. But the probability here is a statement about the data given the hypothesis. It is not generally the same as the probability of the hypothesis given the data (in fact equating the two is often referred to as the ‘prosecutors fallacy’, since it is an error often made by lawyers when interpreting statistical evidence).See here and here for more on the limitations of p-values and confidence intervals.
Any real details of the underlying statistical methods and assumptions. For example, there has been controversy about the way a method called principal component analysis was used to create the famous hockey stick graph that appeared in previous IPCC reports. Although the problems with that method were recognised it is not obvious how or if they have been avoided in the most recent analyses.
Assumptions about the accuracy of historical temperatures. Much of the climate debate (such as that concerning the exceptionalness of the recent rate of temperature increase) depends on assumptions about historical temperatures dating back thousands of years. There has been some debate about whether sufficiently large ranges were used.
Variety and choice of models. There are many common assumptions in all of the climate models used by the IPCC and it has been argued that there are alternative models not considered by the IPCC which provide an equally good fit to climate data, but which do not support the same conclusions.

Although I obviously have a bias, my enduring impression from working on the programme is that the scientific discussion about the statistics of climate change would benefit from a more extensive Bayesian approach. Recently some researchers have started to do this, but it is an area where I feel causal Bayesian network models could shed further light and this is something that I would strongly recommend.

Acknowledgements: I would like to thank the BBC team (especially Jonathan Renouf, Alex Freeman, Eileen Inkson, and Gwenan Edwards) for their professionalism, support, encouragement, and training; and my colleagues Martin Neil and Anthony Constantinou for their technical support and advice.

My fee for presenting the programme has been donated to the charity Magen David Adom

Watching the programme as it is screened

menu