P(data | hyp) ≠ P(hyp | data).
The term “pet” is particularly apt in the case of yesterday’s PBS “Here and Now” interview on animal DNA analysis in forensic science. The program’s host interviewed WBUR’s Vicki Croke, who has written an engaging and informative account of “Pet CSI: How Dog and Cat DNA Nabs Bad Guys.” She begins with the following case:
On Sept. 14, 2000, Wayne Shumaker, 58, Corby Myer, 30, and Lynn Ganger, 54—three carpenters building a barn loft at an upscale property near Lakeville, Indiana—were bound and shot execution style during an armed robbery. Less than two years later, the triggerman in the case, Phillip Stroud, was found guilty on all three counts of murder and sentenced to life in prison. The criminal was done in—at least in part—by the dog droppings he had stepped in during the commission of the crime. It turns out that dog feces not only messed up his sneakers, but his defense too. It was a simple mistake that was exploited by the prosecution using some new and very sophisticated science. Samples from Stroud’s sneakers were compared to dog feces at the barn. Through DNA analysis (as they exit, feces snag DNA-carrying epithelial cells from the colon), the specimens turned out to be a perfect match—proof positive that the defendant had been present at the scene of the crime.
In the interview, Ms. Croke elaborated on "proof positive" as follows:
[T]he lab needs to calculate probabilities. How common is this particular pattern of DNA in the wider population? In other words, how likely is it that this hair could have come from any other dog or cat than the one linking the criminal to the crime? In the triple murder case we were talking about, the probability that the feces on the suspect’s sneaker came from any other dog than the one at the scene of the crime was one in ten billion!
Here and Now's webpage thus refers to "a probability test to determine how likely it is that the DNA comes from any other animal in the area."
These characterizations are fairly typical examples of transposition. The data in the triple-murder case are the pair of DNA profiles that are said to match. The hypothesis is that the source of the sneaker DNA is a different dog. We’ll call this the defense hypothesis, def-hyp. Assuming no laboratory error in profiling ever occurs, the probability of the data—the matching profiles—given that they came from different (and unrelated) dogs is the frequency of the profile in the “wider population.” Let’s assume that one in ten billion is a good estimate of that probability. That is,
P(data | def-hyp) = 1/10,000,000,000.
Is “the probability that the [DNA] came from any other dog” also 1/10,000,000,000? Not exactly. This probability is P(def-hyp | data). According to Bayes' rule, it depends not only on P(data | def-hyp), but also on two additional probabilities. For one thing, we need to know the probability of the data given the prosecution’s hypothesis, P(data | pros-hyp). This probability is 1 (if the lab is certain to declare a match when the two samples really contain the same dog's DNA).
Another factor to consider is the prior probability of the defense hypothesis, P(def-hyp). How many alternative dogs could have crossed the defendant’s path in the weeks before the murder? One thousand seems like a lot. If we take the prior odds for the defense hypothesis to be 1,000:1, then the match to the dog doodoo in the barn multiplies these odds by the likelihood ratio P(data | def-hyp) / P(data | pros-hyp) = 1/10,000,000,000, reducing them to 1,000:10,000,000,000 = 1:10,000,000.
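For readers who want to check the arithmetic, here is a minimal Python sketch of Bayes' rule in odds form, using the figures assumed above (prior odds of 1,000:1 for the defense hypothesis, a random-match probability of one in ten billion, and a lab that always declares a true match). The function names `posterior_odds` and `odds_to_probability` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def posterior_odds(prior_odds, p_data_given_def, p_data_given_pros):
    """Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio.

    All odds here are odds in favor of the defense hypothesis.
    """
    likelihood_ratio = Fraction(p_data_given_def) / Fraction(p_data_given_pros)
    return Fraction(prior_odds) * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favor of a hypothesis to its probability."""
    return odds / (1 + odds)


# Assumed inputs, taken from the discussion above.
prior = Fraction(1000, 1)                # prior odds for def-hyp, 1,000:1
p_def = Fraction(1, 10_000_000_000)      # P(data | def-hyp), random-match probability
p_pros = Fraction(1)                     # P(data | pros-hyp), no lab error

post = posterior_odds(prior, p_def, p_pros)
prob = odds_to_probability(post)

print("Posterior odds for def-hyp:", post)   # 1/10000000
print("P(def-hyp | data):", prob)            # 1/10000001
print("P(data | def-hyp):", p_def)           # 1/10000000000
```

Exact fractions are used instead of floats so that the difference between the two conditional probabilities is not blurred by rounding. Under these assumptions the posterior probability of the defense hypothesis is about a thousand times larger than the random-match probability, though both remain tiny.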
What's the moral? Transposition is wrong, but almost everybody, from journalists to jurors, does it. A DNA match to a random, unrelated dog may be a one-in-ten-billion event, but it does not follow that the probability that the defendant stepped on stuff from an unrelated dog is one in ten billion. That said, if the random-match probability is as infinitesimal as one in ten billion, the probability of the defense hypothesis (about an unrelated dog being the source of a true match) is still doggone small. Transposition should be avoided, but it is not always the most grievous of errors.