
Monday, 6 March 2017

Explaining and predicting football team performance over an entire season


When I was presenting the BBC documentary Climate Change by Numbers and had to explain the idea of a statistical 'attribution study', I used the analogy of determining which factors most affected the performance of Premiership football teams year on year. Because it had to be done in a hurry, my colleague Dr Anthony Constantinou and I did a very crude analysis which focused on a very small number of factors and showed, unsurprisingly, that turnover (i.e. mainly spend on transfers and wages) had the most impact of these.

We weren't happy with the quality of the study and decided to undertake a much more comprehensive analysis as part of the BAYES-KNOWLEDGE project. This project is all about improved decision-making and risk assessment using a probabilistic technique called Bayesian Networks. In particular, the main objective of the project is to produce useful/accurate predictions and assessments in situations where there is not a lot of data available. In such situations the currently fashionable 'big data' machine learning methods do not work; instead we use 'smart-data' - a method that combines the limited data available with expert causal knowledge and real-world 'facts'. The idea of predicting Premiership teams' long-term performance and identifying the key factors explaining changes was a perfect opportunity to both develop and validate the BAYES-KNOWLEDGE method, especially as we had previously done extensive work in predicting individual Premiership match results (see links at bottom).

The results of the study have now been published in one of the premier international AI journals, Knowledge-Based Systems.

The Bayesian Network model in the paper enables us to predict, before a season starts, the total league points a team is expected to accumulate throughout the season (each team plays 38 games in a season, with three points per win and one per draw). The model results compare very favourably against a number of other relevant and different types of models, including some which use far more data. As hoped, the results also provide a novel and comprehensive attribution study of the factors most affecting performance (measured in terms of impact on actual points gained/lost per season). For example, although, unsurprisingly, the largest improvements in performance result from massive increases in spending on new players (an 8.49 point gain), an even greater change - a decrease of up to 16.52 points - results from involvement in the European competitions (especially the Europa League) for teams that have little previous experience of such competitions. Also, something that was very surprising, and that possibly confounds bookies - and gives punters good potential for exploitation - is that promoted teams generate (on average) a staggering increase in performance of 8.34 points relative to the relegated teams they replace. The results in the study also partly address/explain the widely accepted 'favourite-longshot bias' observed in bookies' odds.
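As a toy illustration of the quantity the model predicts (this is not the paper's Bayesian Network model, and the probabilities below are made up), a team's expected season points follow directly from assumed per-match outcome probabilities:

    # Toy sketch only (not the paper's model): expected league points over a
    # 38-game season, with 3 points per win and 1 per draw.
    def expected_season_points(p_win, p_draw, matches=38):
        return matches * (3 * p_win + 1 * p_draw)

    # e.g. a hypothetical mid-table team winning 40% and drawing 30% of its matches
    print(expected_season_points(0.40, 0.30))   # 57.0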

The full reference citation is:
Constantinou, A. C. and Fenton, N. (2017). Towards Smart-Data: Improving predictive accuracy in long-term football team performance. Knowledge-Based Systems, In Press, 2017, http://dx.doi.org/10.1016/j.knosys.2017.03.005
The pre-print version of the paper (pdf) can be found at http://constantinou.info/downloads/papers/smartDataFootball.pdf

We acknowledge the European Research Council (ERC) for funding the research project ERC-2013-AdG339182-BAYES_KNOWLEDGE, and Agena Ltd for software support.


Wednesday, 8 February 2017

Helping US Intelligence Analysts using Bayesian networks


Causal Bayesian networks are at the heart of a major new collaborative research project led by Monash University in Australia, funded by the United States' Intelligence Advanced Research Projects Activity (IARPA). The objective is to help intelligence analysts assess the value of their information. IARPA was set up following the failure of the US intelligence agencies to correctly assess the levels of threat posed by Al Qaeda in 2001 and Iraq in 2003.

The chief investigator at Monash, Kevin Korb, said in an interview in The Australian:
"..quantitative rather than qualitative methods were crucial in judging the value of intelligence.... more quantitative approaches could have helped contain the ebola epidemic by making authorities appreciate the scale of the problem months earlier. They could also build a better assessment of the likelihood of events like gunfire between vessels in the South China Sea, a substantial devaluation of the Venezuelan currency or a new presidential aspirant in Egypt."
Norman Fenton and Martin Neil (both of Agena and Queen Mary University of London) will be working on the project along with colleagues such as David Lagnado and Ulrike Hahn at UCL.  AgenaRisk will be used throughout the project as the Bayesian network platform.


Queen Mary in new £2 million project using Bayesian networks to create intelligent medical decision support systems with real-time monitoring for chronic conditions



UPDATE 9 Feb 2017: Various Research Fellowship and PhD vacancies funded by this project are now advertised. See here.

Queen Mary has been awarded a grant of £1,538,497 (Full economic cost £1,923,122) from the EPSRC towards a major new collaborative project to develop a new generation of intelligent medical decision support systems. The project, called PAMBAYESIAN (Patient Managed Decision-Support using Bayesian Networks) focuses on home-based and wearable real-time monitoring systems for chronic conditions including rheumatoid arthritis, diabetes in pregnancy and atrial fibrillation. It has the potential to improve the well-being of millions of people.

The project team includes both researchers from the School of Electronic Engineering and Computer Science (EECS) and clinical academics from the Barts and the London School of Medicine and Dentistry (SMD). The collaboration is underpinned by extensive research in EECS and SMD, with access to digital health firms that have extensive experience developing patient engagement tools for clinical development (BeMoreDigital, Mediwise, Rescon, SMART Medical, uMotif, IBM UK and Hasiba Medical).

The project is led by Prof Norman Fenton with co-investigators: Dr William Marsh, Prof Paul Curzon, Prof Martin Neil, Dr Akram Alomainy (all EECS) and Dr Dylan Morrissey, Dr David Collier, Professor Graham Hitman, Professor Anita Patel, Dr Frances Humby, Dr Mohammed Huda, Dr Victoria Tzortziou Brown (all SMD). The project will also include four QMUL-funded PhD students.

The three-year project will begin in June 2017.

Background

Patients with chronic diseases must take day-to-day decisions about their care and rely on advice from medical staff to do this. However, regular appointments with doctors or nurses are expensive, inconvenient and not necessarily scheduled when needed. Increasingly, we are seeing the use of low-cost and highly portable sensors that can measure a wide range of physiological values. Such 'wearable' sensors could improve the way chronic conditions are managed. Patients could have more control over their own care if they wished; doctors and nurses could monitor their patients without the expense and inconvenience of visits, except when they are needed. Remote monitoring of patients is already in use for some conditions but there are barriers to its wider use: it relies too much on clinical staff to interpret the sensor readings; patients, confused by the information presented, may become more dependent on health professionals; remote sensor use may then lead to an increase in demand for medical assistance rather than a reduction.

The project seeks to overcome these barriers by addressing two key weaknesses of the current systems:
  1. Their lack of intelligence. Intelligent systems that can help medical staff in making decisions already exist and can be used for diagnosis, prognosis and advice on treatments. One especially important form of these systems uses belief or Bayesian networks, which show how the relevant factors are related and allow beliefs, such as the presence of a medical condition, to be updated from the available evidence. However, these intelligent systems do not yet work easily with data coming from sensors.
  2. Any mismatch between the design of the technical system and the way the people involved - patients and professionals - interact with it.
We will work on these two weaknesses together: patients and medical staff will be involved from the start, enabling us to understand what information is needed by each player and how to use the intelligent reasoning to provide it.
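To give a very rough illustration of the kind of intelligent reasoning described in point 1 above (this is not a PAMBAYESIAN model; the variables and numbers are entirely hypothetical), a single binary sensor reading can update the belief that a patient's chronic condition is flaring up via Bayes' theorem:

    # Minimal sketch (hypothetical numbers, not a PAMBAYESIAN model): updating
    # belief in a flare-up of a chronic condition from one binary sensor alert.
    prior_flare = 0.10                 # assumed prior probability of a flare-up
    p_alert_given_flare = 0.80         # assumed sensor alert rate during a flare-up
    p_alert_given_no_flare = 0.15      # assumed false-alert rate otherwise

    p_alert = (p_alert_given_flare * prior_flare +
               p_alert_given_no_flare * (1 - prior_flare))
    posterior_flare = p_alert_given_flare * prior_flare / p_alert

    print(round(posterior_flare, 3))   # ~0.372: belief raised, but far from certain

A full Bayesian network extends this single update to many related factors and multiple streams of sensor evidence.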

The medical work will be centred on three case studies, looking at the management of rheumatoid arthritis, diabetes in pregnancy and atrial fibrillation (irregular heartbeat). These have been chosen both because they are important chronic diseases and because they are investigated by significant research groups in our Medical School, who are partners in the project. This makes them ideal test beds for the technical developments needed to realise our vision and allow patients more autonomy in practice.

To advance the technology, we will design ways to create belief networks for the different intelligent reasoning tasks, derived from an overall model of medical knowledge relevant to the diseases being managed. Then we will investigate how to run the necessary algorithms on the small computers attached to the sensors that gather the data as well as on the systems used by the healthcare team. Finally, we will use the case studies to learn how the technical systems can integrate smoothly into the interactions between patients and health professionals, ensuring that information presented to patients is understandable, useful and reduces demands on the care system while at the same time providing the clinical team with the information they need to ensure that patients are safe.

Further information: www.eecs.qmul.ac.uk/~norman/projects/PAMBAYESIAN/

This project also complements another Bayesian network-based project - the Leverhulme-funded project "CAUSAL-DYNAMICS (Improved Understanding of Causal Models in Dynamic Decision Making)" - starting January 2017. See CAUSAL-DYNAMICS

Sunday, 1 January 2017

The problem with the likelihood ratio for DNA mixture profiles


We have written many times before (see the links below) about use of the Likelihood Ratio (LR) in legal and forensic analysis.

To recap: the LR is a very good and simple method for determining the extent to which some evidence (such as DNA found at the crime scene matching the defendant) supports one hypothesis (such as "defendant is the source of the DNA") over an alternative hypothesis (such as "defendant is not the source of the DNA"). The previous articles discussed the various problems and misinterpretations surrounding the use of the LR. Many of these arise when the hypotheses are not mutually exclusive and exhaustive. This problem is especially pertinent in the case of 'DNA mixture' evidence, i.e. when some DNA sample relevant to a case comes from more than one person. With modern DNA testing techniques it is common to find DNA samples with multiple (but an unknown number of) contributors. In such cases there is no obvious 'pair' of hypotheses that are mutually exclusive and exhaustive, since we have individual hypotheses such as:
  • H1: suspect + one unknown
  • H2: suspect + one known other 
  • H3: two unknowns
  • H4: suspect + two unknowns 
  • H5: suspect + one known other + one unknown
  • H6: suspect + two known others
  • H7: three unknowns 
  • H8: one known other + two unknowns
  • H9: two known others + one unknown
  • H10: three known others
  • H11:  suspect + three unknowns 
  • etc.
It is typical in such situations to focus on the 'most likely' number of contributors (say n) and then compare the hypothesis "suspect + (n-1) unknowns" with the hypothesis "n unknowns". For example, if there are likely to be 3 contributors then typically the following hypotheses are compared:
  • H1: suspect + two unknowns
  • H2: three unknowns
Now, to compute the LR we have to compute the likelihood of the particular DNA trace evidence E under each of the hypotheses. Generally both of the resulting probability values P(E | H1) and P(E | H2) are extremely small numbers. For example, we might get something like
  • P(E | H1) = 0.00000000000000000001  (10 to the minus 20)
  • P(E | H2) = 0.00000000000000000000000001  (10 to the minus 26)
For a statistician, the size of these numbers does not matter – we are only interested in the ratio (that is precisely what the LR is) and in the above example the LR is very large (one million) meaning that the evidence is a million times more likely to have been observed if H1 is true compared to H2. This seems to be overwhelming evidence that the suspect was a contributor. Case closed?

Apart from the communication problem in court of getting across what this all means (defence lawyers can and do exploit the very low probability of E given H1) and how it is computed, there is an underlying statistical problem with small likelihoods for non-exhaustive hypotheses and I will highlight the problem with two scenarios involving a simple urn example. Superficially, the scenarios seem identical. The first scenario causes no problem but the second one does. The concern is that it is not at all obvious that the DNA mixture problem always corresponds more closely to the first scenario than the second.

In both scenarios we assume the following:

There is an urn with 1000 balls – some of which are white. Suppose W is the (unknown) number of white balls. We have 2 hypotheses:
  • H1: W=100
  • H2:  W=90
We can draw a ball as many times as we like, note its colour and replace it (i.e. sample with replacement). We wish to use the evidence of 10,000 such samples.

Scenario 1: We draw 1001 white balls. In this case, using standard statistical assumptions, we calculate P(E | H1) = 0.013 and P(E | H2) = 0.000032. Both values are small but the LR is large (over 400), strongly favouring H1 over H2.

Scenario 2: We draw 1100 white balls. In this case P(E | H1) = 0.000057 and P(E | H2) < 0.00000001. Again both values are very small but the LR is very large, strongly favouring H1 over H2.

(Note: in both cases we could have chosen a much larger sample and got truly tiny likelihoods, but these values are sufficient to make the point.)
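Under the stated sampling-with-replacement assumptions these likelihoods are just binomial probabilities, so they can be checked directly. A minimal sketch (assuming scipy is available):

    # Sketch: binomial likelihoods and LR for the two urn scenarios
    # (10,000 draws with replacement from an urn of 1000 balls, W of them white).
    from scipy.stats import binom

    n = 10_000                                    # number of draws
    for whites_drawn in (1001, 1100):             # Scenario 1 and Scenario 2
        p_e_h1 = binom.pmf(whites_drawn, n, 100 / 1000)   # H1: W = 100
        p_e_h2 = binom.pmf(whites_drawn, n, 90 / 1000)    # H2: W = 90
        print(whites_drawn, p_e_h1, p_e_h2, p_e_h1 / p_e_h2)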

So in what sense are these two scenarios fundamentally different and why is there a problem?

In scenario 1 not only does the conclusion favouring H1 make sense, but the actual number of white balls drawn is very close to the expected number we would get if H1 were true (in fact, W=100 is the 'maximum likelihood estimate' for the number of white balls). So not only does the evidence point to H1 over H2, but also to H1 over any other hypothesis (and there are 1001 different hypotheses W=0, W=1, W=2, etc.).

In scenario 2 the evidence is actually even more strongly supportive of H1 over H2 than in scenario 1. But this is essentially meaningless because it is virtually certain that BOTH hypotheses are false.

So, returning to the DNA mixture example, it is certainly not sufficient to compare just two hypotheses. The LR of one million in favour of H1 over H2 may be hiding the fact that neither of these hypotheses is true. It is far better to identify as exhaustive a set of hypotheses as is realistically possible and then determine the individual likelihood value of each hypothesis. We can then identify the hypothesis with the highest likelihood value and consider its LR compared to each of the other hypotheses.
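In the urn setting the full set of hypotheses W = 0, 1, ..., 1000 is easy to enumerate, so this exhaustive comparison can be done directly. A minimal sketch for scenario 2 (again assuming numpy and scipy are available):

    # Sketch: likelihood of every hypothesis W = 0..1000 for Scenario 2
    # (1100 white balls in 10,000 draws), then compare the best against H1 and H2.
    import numpy as np
    from scipy.stats import binom

    n, whites_drawn = 10_000, 1100
    W = np.arange(1001)                                # all candidate numbers of white balls
    likelihoods = binom.pmf(whites_drawn, n, W / 1000)
    best = likelihoods.argmax()                        # maximum-likelihood hypothesis (W = 110)
    print(best, likelihoods[best] / likelihoods[100])  # LR of best hypothesis against H1 (W = 100)
    print(likelihoods[best] / likelihoods[90])         # LR of best hypothesis against H2 (W = 90)

The best-supported hypothesis (W = 110) is itself hundreds of times more likely than H1 and astronomically more likely than H2 - exactly the information that a two-hypothesis LR hides.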

Friday, 18 November 2016

Researcher Dr Anthony Constantinou has his identity 'stolen' in the name of convicted multi-millionaire sex molester Anthony Constantinou


Dr Anthony Constantinou is a researcher in Bayesian AI methods at Queen Mary University of London. He currently works on the ERC-funded BAYES-KNOWLEDGE project led by Prof Norman Fenton and has been a recent visitor at the Isaac Newton Institute (University of Cambridge) programme on Probability and Statistics in Forensic Science.

While this Anthony Constantinou is well respected within the AI research community - and has also gained a strong reputation for his work in applying Bayesian methods to football prediction - there is another, much better known, Anthony Constantinou, namely the multi-millionaire son of the tycoon Aristos Constantinou, who was murdered at his luxury home in The Bishops Avenue, London, in 1985. After building up his own business empire, this Anthony Constantinou - dubbed by the media the "UK's Wolf of Wall Street" - has been in the news for all the wrong reasons: first with the fraud investigation into his CWM business and then with his trial and recent conviction for sexual assaults on a number of women who worked for him.


News reports on Anthony Constantinou


Now, incredibly, it has been discovered that social media accounts (Twitter, Facebook, Pinterest, a YouTube channel and a blog) in the name of the convicted Anthony Constantinou are claiming the academic and research achievements of Dr Anthony Constantinou. We have no way of knowing whether these accounts are authentic or whether they were created for malicious reasons by a third party, but somebody has certainly taken a lot of trouble to create this identity theft:
  • this Twitter account with his photo is especially deceptive because it claims Dr Constantinou's achievements but also carries articles with his views on football (Dr Constantinou publishes weekly research-driven Premiership match predictions on his pi-football website).
  • this Blogspot account called 'anthonyisback'
  • this YouTube channel - the PowerPoint videos that appear there confirm somebody has gone to significant effort to carry out the identity fraud.
  • a Facebook account
  • the 'reliable Anthony Constantinou updates' Pinterest site
  • this Twitter account in the name of CWM World




Tuesday, 8 November 2016

Confusion over the Likelihood Ratio


7 Jan 2017: There is an update to this post here.

The 'Likelihood Ratio' (LR) has been dominating discussions at the third workshop  in our Isaac Newton Institute Cambridge Programme Probability and Statistics in Forensic Science.
There have been many fine talks on the subject - and these talks will be available here for those not fortunate enough to be attending.

We have written before (see links at bottom) about some concerns with the use of the LR. For example, we feel there is often a desire to produce a single LR even when there are multiple different unknown hypotheses and dependent pieces of evidence (in such cases we feel the problem needs to be modelled as a Bayesian network) - see [1]. Based on the extensive discussions this week, I think it is worth recapping another of these concerns, namely what happens when the hypotheses are non-exhaustive.

To recap: The LR  is a formula/method that is recommended for use by forensic scientists when presenting evidence - such as the fact that DNA collected at a crime scene is found to have a profile that matches the DNA profile of a defendant in a case. In general, the LR is a very good and simple method for communicating the impact of evidence (in this case on the hypothesis that the defendant is the source of the DNA found at the crime scene).

To compute the LR, the forensic expert is forced to consider the probability of finding the evidence under both the prosecution and defence hypotheses. So, if the prosecution hypothesis Hp is "Defendant is the source of the DNA found" and the defence hypothesis Hd is "Defendant is not the source of the DNA found", then we compute both the probability of the evidence given Hp - written P(E | Hp) - and the probability of the evidence given Hd - written P(E | Hd). The LR is simply the ratio of these two likelihoods, i.e. P(E | Hp) divided by P(E | Hd).

The very act of considering both likelihood values is a good thing to do because it helps to avoid common errors of communication that can mislead lawyers and juries (notably the prosecutor's fallacy). But, most importantly, the LR is a measure of the probative value of the evidence. However, this notion of probative value is where misunderstandings and confusion sometimes arise. In the case where the defence hypothesis is the negation of the prosecution hypothesis (i.e. Hd is the same as "not Hp" as in our example above) things are clear and very powerful because, by Bayes theorem:
  • when the LR is greater than one the evidence supports the prosecution hypothesis (increasingly for larger values) - in fact the posterior odds of the prosecution hypothesis increase by a factor of LR over the prior odds (illustrated numerically in the sketch after this list).
  • when the LR is less than one it supports the defence hypothesis (increasingly as the LR gets closer to zero) - the posterior odds of the defence hypothesis increase by a factor of 1/LR over the prior odds.
  • when the LR is equal to one then the evidence supports neither hypothesis and so is 'neutral' - the posterior odds of both hypotheses are unchanged from their prior odds. In such cases, since the evidence has no probative value, lawyers and forensic experts believe it should not be admissible.
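To make the odds arithmetic in these three cases concrete, here is a minimal numerical sketch (the prior and LR values are purely illustrative):

    # Posterior odds = LR x prior odds (valid when Hp and Hd are mutually
    # exclusive and exhaustive). Prior and LR values are purely illustrative.
    def posterior_probability(prior_prob, lr):
        prior_odds = prior_prob / (1 - prior_prob)
        posterior_odds = lr * prior_odds
        return posterior_odds / (1 + posterior_odds)

    prior = 0.01                          # assumed prior probability of Hp
    for lr in (1000, 1, 0.001):           # supports Hp, neutral, supports Hd
        print(lr, posterior_probability(prior, lr))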
However, things are by no means as clear and powerful when the hypotheses are not exhaustive (i.e. when Hd is not simply the negation of Hp), and in most forensic applications this is the case. For example, in the case of DNA evidence, while the prosecution hypothesis Hp is still "defendant is source of the DNA found", in practice the defence hypothesis Hd is often something like "a person unrelated to the defendant is the source of the DNA found".

In such circumstances the LR can only help us to distinguish which of the two hypotheses is more likely: e.g. when the LR is greater than one the evidence supports the prosecution hypothesis over the defence hypothesis (with larger values leading to increased support). However, unlike the case for exhaustive hypotheses, the LR tells us nothing about the change in odds of the prosecution hypothesis. In fact, it is quite possible for the LR to be very large - i.e. strongly supporting the prosecution hypothesis over the defence hypothesis - even though the posterior probability of the prosecution hypothesis goes down. This rather worrying point is not understood by all forensic scientists (or indeed by all statisticians). Consider the following example (it's a made-up coin-tossing example, but has the advantage that the numbers are indisputable):
Fred claims to be able to toss a fair coin in such a way that about 90% of the time it comes up Heads. So the main hypothesis is
  H1: Fred has genuine skill
To test the hypothesis, we observe him toss a coin 10 times. It comes out Heads each time. So our evidence E is 10 out of 10 Heads. Our alternative hypothesis is:
  H2: Fred is just lucky.

Under standard binomial assumptions, P(E | H1) is about 0.35 while P(E | H2) is about 0.001. So the LR is about 350, strongly in favour of H1.

However, the problem here is that H1 and H2 are not exhaustive. There could be another hypothesis H3: "Fred is cheating by using a double-headed coin". Now, P(E | H3) = 1.

If we assume that H1, H2 and H3 are the only possible hypotheses* (i.e. they are exhaustive) and that the priors are equally likely, i.e. each is equal to 1/3, then the posteriors after observing the evidence E are:

H1: 0.25907     H2: 0.00074        H3: 0.74019

So, after observing the evidence E, the posterior for H1 has actually decreased despite the very large LR in its favour over H2.
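These posterior values are straightforward to reproduce; here is a minimal sketch using the rounded likelihoods quoted above:

    # Posteriors for the coin example, assuming H1, H2 and H3 are exhaustive
    # with equal priors of 1/3, and using the rounded likelihoods from the text.
    likelihoods = {"H1 (skill)": 0.35, "H2 (luck)": 0.001, "H3 (two-headed coin)": 1.0}
    prior = 1 / 3

    total = sum(lik * prior for lik in likelihoods.values())
    for h, lik in likelihoods.items():
        print(h, round(lik * prior / total, 5))   # H1: 0.25907, H2: 0.00074, H3: 0.74019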
In the above example, a good forensic scientist - if considering only H1 and H2 - would conclude by saying something like
"The evidence shows that hypothesis H1 is 350 times more likely than H2, but tells us nothing about whether we should have greater belief in H1 being true; indeed, it is possible that the evidence may much more strongly support some other hypothesis not considered and even make our belief in H1 decrease". 
However, in practice (and I can confirm this from having read numerous DNA and other forensic case reports) no such careful statement is made. In fact, the most common assertion used in such circumstances is:
 "The evidence provides strong support for hypothesis H1"
Such an assertion is not only mathematically wrong but highly misleading. Consider, as discussed above, a DNA case where:

 Hp is "defendant is source of the DNA found"
 Hd is  "a person unrelated to the defendant is the source of the DNA found".

This particular Hd hypothesis is a common and convenient choice for the simple reason that P(E | Hd) is relatively easy to compute (it is the 'random match probability'). For single-source, high-quality DNA this probability can be extremely small - of the order of one in several billion; since P(E | Hp) is equal to 1 in this case, the LR is several billion. But this does NOT provide overwhelming support for Hp, as is often assumed, unless we have been able to rule out all relatives of the defendant as suspects. Indeed, for less than perfect DNA samples it is quite possible for the LR to be of the order of millions but for a close relative to be a more likely source than the defendant.

While confusion and misunderstandings can and do occur as a result of using hypotheses that are not exhaustive, there are many real examples where the choice of such non-exhaustive hypotheses is actually negligent.  The following worrying example is based on a real case (location details changed as an appeal is ongoing):
The suspect is accused of committing a crime at a particular rural location A near his home village in Dorset. The evidence E is soil found on the suspect's car. The prosecution hypothesis Hp is "the soil comes from A". The suspect lives (and drives) near this location but claims he did not drive to that specific spot. To 'test' the prosecution hypothesis a soil expert compares Hp with the hypothesis Hd: "the soil comes from a different rural location". However, the 'different rural location' B happens to be 500 miles away in Perth, Scotland (simply because it is close to where the soil analyst works and he assumes soil from there is 'typical' of rural soil). To carry out the test the expert considers soil profiles of E and of samples from the two sites A and B.

Inevitably the LR strongly favours Hp (i.e. site A) over Hd (i.e. site B); the soil profile on the car - even if the car was never at location A - is going to be much closer to the A profile than the B profile. But we can conclude absolutely nothing about the posterior probability of Hp. The LR is completely useless - it tells us nothing other than that the car was more likely to have been driven in the rural location in Dorset than in a rural location in Perth. Since the suspect had never driven the car outside Dorset this is hardly a surprise. Yet in the case this soil evidence was considered important, since it was wrongly assumed to mean that it "provided support for the prosecution hypothesis".
This example also illustrates, however, why in practice it can be impossible to consider exhaustive hypotheses. For such soil cases, it would require us to consider samples from every possible 'other' location. What an expert like Pat Wiltshire (who is also a participant on the FOS programme) does is to choose alternative sites close to the alleged crime scene and compare the profile of each of those, and the crime scene profile, with the profile from the suspect. While this does not tell us whether the suspect was at the crime scene, it can tell us how much more likely the suspect was to have been there rather than at the nearby sites.

*As pointed out by Joe Gastwirth, there could be other hypotheses, like "Fred uses the double-headed coin but switches to a regular coin after every 9 tosses".

References
  1. Fenton, N. E., Neil, M., and Berger, D., "Bayes and the Law", Annual Review of Statistics and Its Application, Volume 3, 2016 (June), pp 51-77, http://dx.doi.org/10.1146/annurev-statistics-041715-033428. A pre-publication version is here, and here is the Supplementary Material. See also the blog posting.
  2. Fenton, N. E., Berger, D., Lagnado, D., Neil, M., and Hsu, A. (2013). "When 'neutral' evidence still has probative value (with implications from the Barry George Case)", Science and Justice, http://dx.doi.org/10.1016/j.scijus.2013.07.002. A pre-publication version of the article can be found here.


Friday, 7 October 2016

Bayesian Networks and Argumentation in Evidence Analysis


Some of the workshop participants
On 26-29 September 2016 a workshop on "Bayesian Networks and Argumentation in Evidence Analysis" took place at the Isaac Newton Institute, Cambridge. This workshop, which was part of the FOS Programme, was also the first public workshop of the ERC-funded project Bayes-Knowledge (ERC-2013-AdG339182-BAYES_KNOWLEDGE).

The workshop was a tremendous success, attracting many of the world's leading scholars in the use of Bayesian networks in law and forensics. Most of the presentations were filmed and can now be viewed here.

There was also a pre-workshop meeting on 23-24 September where participants focused on an important Dutch case that recently went to appeal. The participants were divided into two groups - one group developed a BN model of the case and the other developed an argumentation/scenarios-based model of the case. We plan to further develop these and write up the results.

Some of the participants at the pre-workshop meeting analysing a specific Dutch case