Statistical analysis

In this video, Dr Gavin Turbett and Dr Vivek Sahajpal challenge the myth of uniqueness of DNA profiles.
It is believed that the entire human genome of each person is unique. However, forensic DNA testing does not test the entire genome. It only tests a very small fraction of it. There is a large multinational research project going on called the 1000 Genomes Project, where they’re looking to examine 1000 DNA profiles at least, and look at all the variation that actually occurs in those thousand people. And some of the reports that have come out of this study have indicated that there are probably something like 700,000 short tandem repeat sequences in the human genome and current forensic DNA test kits only test about 20 of them.
So you can see we’re only testing a very, very small fraction of the entire genome. With the technology that we use at the moment, identical twins and identical triplets will have exactly the same forensic DNA profile. They cannot be told apart with this particular technology. Furthermore, we also expect that close biological relatives, such as siblings, are more likely to have the same DNA profile than two people chosen at random. As the number of loci increases, it provides greater power of discrimination, particularly between close biological relatives. It is also important to remember that what we define as a forensic DNA profile is not static. It actually changes over time.
So 15 years ago, what we defined as a DNA profile was generated with a test kit called Profiler Plus, and that led us to examine nine short tandem repeats plus gender. The test kit we’re using at the moment, PowerPlex 21, allows us to test 20 STR markers plus gender. So what we define as a DNA profile has gone from nine plus sex to 20 plus sex in a fairly short space of time, and it will continue to evolve. There are also complicating factors such as mixed DNA profiles, artefacts, degradation, allelic drop-out, the possibility of laboratory contamination or the possibility of error. These factors can also have a significant impact on the outcome of the results.
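To make the earlier point about the number of loci concrete, here is a minimal Python sketch. It assumes the loci are independent and the two people are unrelated, so the per-locus genotype frequencies can simply be multiplied together. The function name and the frequency of 0.1 at every locus are hypothetical, chosen only for illustration. Real frequencies differ at every locus and in every population, and the calculation does not apply to close relatives or identical twins.

```python
def random_match_probability(genotype_frequencies):
    """Multiply per-locus genotype frequencies.

    Assumes independent loci and unrelated individuals.
    """
    probability = 1.0
    for frequency in genotype_frequencies:
        probability *= frequency
    return probability


# Hypothetical value: a genotype frequency of 0.1 at every locus.
nine_loci = random_match_probability([0.1] * 9)
twenty_loci = random_match_probability([0.1] * 20)

print(f"9 loci:  about {nine_loci:.0e}")
print(f"20 loci: about {twenty_loci:.0e}")
```

Under these assumptions, every extra locus makes a chance match between unrelated people less probable, which is why a 20 locus profile discriminates better than a nine locus one.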
With regard to the statistical evaluation of DNA profiles, there are a number of ways that forensic DNA statistics can be expressed. A widely preferred approach is known as the likelihood ratio, or LR. This is the approach that’s used throughout Australia and New Zealand. A likelihood ratio compares the probability of observing the DNA evidence under two alternative or competing hypotheses. These two hypotheses are mutually exclusive. Only one of them could be true. They can’t both be true, and they are typically framed so that they represent the positions of both the prosecution and the defence in the particular case to explain that DNA evidence.
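The comparison just described can be sketched as a simple division. This is a minimal illustration, not a casework method: the function name and the probabilities are hypothetical, and in practice the two probabilities come from a statistical model of the DNA evidence, which is not shown here.

```python
import math


def likelihood_ratio(p_evidence_if_prosecution, p_evidence_if_defence):
    """Probability of the evidence under the prosecution hypothesis,
    divided by its probability under the defence hypothesis."""
    if not 0 <= p_evidence_if_prosecution <= 1:
        raise ValueError("prosecution probability must lie between 0 and 1")
    if not 0 < p_evidence_if_defence <= 1:
        raise ValueError("defence probability must be above 0 and at most 1")
    return p_evidence_if_prosecution / p_evidence_if_defence


# Hypothetical probabilities, for illustration only.
favours_prosecution = likelihood_ratio(0.5, 0.0005)  # well above 1
favours_defence = likelihood_ratio(0.0005, 0.5)      # well below 1

# On a log10 scale, 1 maps to 0 (neutral), and swapping the two
# probabilities only changes the sign of the result.
print(math.log10(favours_prosecution))
print(math.log10(favours_defence))
```

A value above one means the evidence is more probable under the prosecution hypothesis, a value below one means it is more probable under the defence hypothesis, and a value of exactly one favours neither.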
So we might refer to them as H1, the prosecution hypothesis, and H2, the defence hypothesis, and the likelihood ratio is a scale. It can either be expressed as a number or it can be turned into words, and it’s an expression of support for one or the other of those two propositions. So the level of support for one proposition over the other depends on the amount of information that’s present. So this is a document that ANZPAA-NIFS issued in 2017, and it explains the likelihood ratio. So right in the middle, with a likelihood ratio of exactly one, we have something that is absolutely neutral. It supports neither the H1 nor the H2 hypothesis.
As the scale moves to the right, we see increasing support for hypothesis H1. And if the likelihood ratio goes below one, increasing support for hypothesis H2. And I’ll now show this with some examples. These are the verbal terms that can be used to explain a likelihood ratio. So the likelihood ratio is expressed as the probability of the evidence if H1 is true, divided by the probability of the evidence if H2 is true, and again, it can be expressed either as a number or as a verbal equivalent. And so you can see the two sides of the scale are effectively a mirror image of each other.
But as we start to move in one direction or the other, the wording moves from slight support through moderate support, strong support and very strong support to extremely strong support for one proposition or the other. Here’s an example that might be used to explain a single source DNA profile. So we have a single source DNA crime stain, and it matches a particular individual, the accused person. The prosecution hypothesis H1 will be that the DNA profile is from the accused person. The defence might argue that the DNA profile is not from their client, but is from someone else. The DNA evidence would be assessed statistically, and the outcome would be a statement.
Something along the lines of: the DNA evidence is approximately 10 billion times more likely if the DNA profile originated from the accused than if it had originated from another person. Shown on that scale, you can see the red X there. That number of 10 billion would be far to the right on the H1 side, and it could be verbally expressed as extremely strong support for H1. In this example, we have a mixed DNA profile, and the prosecution hypothesis might be that the two person mixed DNA profile has come from the complainant and the accused person, whereas the defence hypothesis could be that the two person mixed DNA profile has come from the complainant and an unknown person.
Following statistical evaluation, this result might be reported as follows: the DNA evidence is approximately 900 times more likely if the mixed DNA profile originated from the complainant and the accused than if it had originated from the complainant and an unknown person. Shown on that scale, you can see it is still on the H1 side. It is still support for H1, and it could be verbally stated as strong support for H1. But you can see it’s not as extreme as the previous example. In this example here, we may have a very complicated four person DNA profile, and the prosecution might be suggesting that the four person mixed DNA profile has come from the complainant, the accused and two other unknown people.
Whereas the defence might argue that the four person mixed DNA profile has come from the complainant and three unknown people, none of them, of course, being their client. Following statistical evaluation, we might determine that the outcome actually favours H2. So it might be 0.001. And this could then be shown as being moderate support for H2. We would perhaps choose to word something like that as saying the DNA evidence is more likely if the accused is not a contributor, to try and avoid any risk of confusion. An important aspect, as far as the statistical analysis is concerned, is the population databases.
Population databases are used to calculate how common or rare a DNA profile is in a particular population. A database should be large enough to capture the most common alleles of the population and to give reliable allele frequency estimates. Research has indicated that samples of about 200 or more unrelated individuals, collected randomly from a population, will capture most of the alleles present in the population and will yield about 240 to 300 alleles. 200 alleles have become the de facto minimum size for a DNA database. For a population with no existing data, allele frequency data of a closely related or similar population can be used.
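As a rough illustration of how a population database feeds into the statistics, the following Python sketch counts alleles at a single locus and turns them into frequencies. It then estimates how common a genotype is, assuming the population is in Hardy-Weinberg equilibrium. The function names and the four sampled genotypes are hypothetical and far too few for real use. Casework calculations also typically apply further corrections that are left out here.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one locus by counting.

    genotypes: list of (allele, allele) pairs, one pair per sampled person.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def genotype_frequency(frequencies, allele_a, allele_b):
    """Expected genotype frequency, assuming Hardy-Weinberg equilibrium."""
    p = frequencies[allele_a]
    q = frequencies[allele_b]
    if allele_a == allele_b:
        return p * p      # homozygote
    return 2 * p * q      # heterozygote


# Hypothetical sample of four people (eight alleles) at one STR locus.
sample = [(12, 13), (12, 12), (13, 14), (12, 14)]
frequencies = allele_frequencies(sample)

print(frequencies)
print(genotype_frequency(frequencies, 12, 13))
```

An allele that never appears in the sample has no frequency estimate at all in this sketch, which is one reason a database needs to be large enough to capture the alleles present in the population.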
As far as the statistical interpretation is concerned in the Indian scenario, there is limited use of statistical methods for DNA forensics. The main reason has been the lack of population databases of allele frequencies. India being a large and diverse country, databases are not available for many populations. However, in recent years, population genetic studies have been reported and allele frequency data for certain populations is now available. The use of probabilistic statistics is increasing, especially in the field of paternity testing. Still, a lot has to be done to cover all the other aspects of DNA testing. Last but not least regarding statistical interpretation is that the data and the process should be comprehensible to the stakeholders, that is, the police, prosecution and judiciary.
They should be able to appreciate the statistical analysis; only then will it be useful to apply it to forensic DNA analysis.

Are DNA profiles unique? What are likelihood ratios and why are they relevant in forensic DNA profiling? What is the use of a population database? Why is the use of statistical analysis limited in India? Let us explore these questions with Dr Turbett and Dr Sahajpal.

As you’ll recall from the lessons in week 4, we examine only certain loci within the non-coding regions of the DNA, and two unrelated individuals might share alleles at a particular locus. While every individual’s genome is unique, are forensic DNA profiles unique as well? Is it scientifically accurate to use the term ‘match’ while reporting DNA results?

In your jurisdiction, what kind of statistical analysis is conducted in DNA cases? Have you seen terms such as ‘match’, ‘perfect match’ or ‘100% match’ in forensic DNA profiling reports? Share with us in the comments below.

*References for images

Certain images in this video are sourced from ‘Australia New Zealand Policing Advisory Agency, National Institute of Forensic Science: An introductory guide to evaluative reporting (2017)’.

This article is from the free online course Decoding Forensics for Legal Professionals.

Created by
FutureLearn - Learning For Life