Sunday, August 31, 2014

Hazard Ratios and Heart Failure

Today’s big news in medicine is a new drug, designated LCZ696 by its manufacturer, Novartis. According to the New York Times, LCZ696 “has shown a striking efficacy in prolonging the lives of people with heart failure and could replace what has been the bedrock treatment for more than 20 years.” [1] Specifically, more than 8,400 patients in 47 countries enrolled in a randomized, double-blind experiment in which they received either LCZ696 or an ACE inhibitor called enalapril (in addition to whatever else their doctors prescribed).

The trial was halted after a median follow-up time of 27 months “because the boundary for an overwhelming benefit with LCZ696 had been crossed.” [2] “By that point, 21.8 percent of those who received LCZ696 had died from a cardiovascular cause or had been hospitalized for worsening heart failure. That figure was 26.5 percent for those receiving enalapril. That represents a 20 percent relative reduction in risk using a statistical measure called the hazard ratio.” [1]

This is good news for patients (if the drug receives regulatory approval and performs as expected in practice). But the account in the Times poses a small statistical puzzle. How does the difference between 21.8 and 26.5 percent translate into “a 20 percent relative reduction in risk”? The average risk across patients dropped by 26.5 – 21.8 = 4.7 percentage points. This absolute reduction is appreciable, but 4.7 percentage points is not 20% of the original 26.5 percent risk of hospitalizations or deaths in the control group (4.7 / 26.5 = 17.7%). What accounts for the discrepancy?

The answer lies in the details of a technique known in biostatistics as survival analysis. The statistical technique is not limited to the analysis of death rates. It can be applied to all sorts of situations involving different times to some outcome. The outcome can be the overruling of a Supreme Court case, the firing of a worker, or the exoneration of a prison inmate sentenced to die, to pick a few examples from forensic statistics.

So what does the 20% “relative reduction in risk” cited in the Times article mean? Well, a hazard function gives the probability (strictly speaking, the instantaneous rate) that if you survive to a given time t (the event in question has not already occurred), the event will occur in the next instant. A hazard ratio is the ratio of the hazard in the treatment group to the hazard in the control group at t. The heart failure study used an estimation procedure known as proportional hazards regression, which assumes that the hazard in one group is a constant proportion of the hazard in the other group. Under this assumption, in a clinical trial where death is the endpoint, the hazard ratio indicates the relative likelihood of death in treated versus control subjects at any given point in time.

Thus, unlike the ordinary relative risk discussed in many court opinions, the “hazard ratio” is not simply the proportion with a disease in an exposed group divided by the proportion in an unexposed group. In the LCZ696 study, the hazard ratio was 0.80, meaning that the probability that a randomly selected patient taking LCZ696 would die from or be hospitalized for heart failure the next day is 80% of the probability for a randomly selected patient taking enalapril. To put it another way, the probability of hospitalization or death tomorrow from heart failure drops by 20% when LCZ696 is substituted for enalapril.

Yet a third formulation is that the odds that a randomly selected patient treated with LCZ696 will be hospitalized or die sooner than a randomly selected control patient are 0.8 (to 1) — that's 4 to 5, corresponding to a probability of 4/9 = 44%. [3]
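The 4/9 figure can be checked numerically. Under proportional hazards, and ignoring censoring, the probability that the treated patient has the event first is HR / (1 + HR). The sketch below assumes exponential event times, chosen only because they give each group a constant hazard; the function `prob_treated_first` and its simulation settings are illustrative, not anything used in the trial.

```python
import random


def prob_treated_first(hazard_ratio, control_rate=1.0, trials=200_000, seed=1):
    """Estimate P(treated event time < control event time) by simulation.

    Assumes exponential event times: the control hazard is control_rate and
    the treated hazard is hazard_ratio * control_rate (proportional hazards).
    """
    rng = random.Random(seed)
    treated_rate = hazard_ratio * control_rate
    wins = 0
    for _ in range(trials):
        t_treated = rng.expovariate(treated_rate)  # treated patient's event time
        t_control = rng.expovariate(control_rate)  # control patient's event time
        if t_treated < t_control:
            wins += 1
    return wins / trials


hr = 0.80
print("closed form:", hr / (1 + hr))
print("simulation: ", prob_treated_first(hr))
```

Both lines should print values near 0.444. The closed form does not really depend on the exponential assumption: with a constant hazard ratio, a monotone rescaling of time turns both survival curves into exponentials without changing which patient fails first.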

How long either patient can expect to live and avoid hospitalization from heart failure is another story. As one article on hazard ratios explains, “[t]he difference between hazard-based and time-based measures is analogous to the odds of winning a race and the margin of victory.” [3] By itself, the hazard ratio picks the winning horse (probably), but it does not give the number of lengths for its expected success.
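Returning to the puzzle posed at the outset: under proportional hazards, the treated group's survival curve equals the control group's curve raised to the power of the hazard ratio, S_treated(t) = S_control(t)^HR. The minimal sketch below assumes, as a simplification, that the Times percentages are cumulative risks at a single common follow-up time; the function name is hypothetical.

```python
def treated_cumulative_risk(control_risk, hazard_ratio):
    """Cumulative risk in the treated group at the same time t, assuming
    proportional hazards, so that S_treated(t) = S_control(t) ** hazard_ratio."""
    control_survival = 1.0 - control_risk
    return 1.0 - control_survival ** hazard_ratio


control = 0.265  # enalapril group, as reported in the Times
hr = 0.80        # reported hazard ratio

treated = treated_cumulative_risk(control, hr)
absolute_drop = control - treated
relative_drop = absolute_drop / control

print(f"treated risk:  {treated:.3f}")
print(f"absolute drop: {absolute_drop:.3f}")
print(f"relative drop: {relative_drop:.3f}")
```

With HR = 0.80, the sketch gives a treated risk of about 21.8% and a relative reduction in cumulative risk of about 17.6%, close to the 17.7% computed from the reported percentages. A 20% cut in the hazard thus coexists with a smaller cut in the proportion of patients who had an event. The agreement should not be pushed too far, since patients in the trial were followed for different lengths of time.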

  1. Andrew Pollack, New Novartis Drug Effective in Treating Heart Failure, N.Y. Times, Aug. 31, 2014, at A4
  2. John J.V. McMurray et al., Angiotensin–Neprilysin Inhibition versus Enalapril in Heart Failure, New Engl. J. Med., Aug. 30, 2014
  3. Spotswood L. Spruance et al., Hazard Ratio in Clinical Trials, 48 Antimicrobial Agents and Chemotherapy 2787 (2004)

Thursday, July 31, 2014

The FBI's Worst Hair Days

An article by Spencer Hsu in yesterday's Washington Post suggests that the FBI lost a tug of war within the Justice Department. In 2012, the Bureau commenced a comprehensive review of the testimony of FBI hair analysts about matches to defendants in criminal cases before 2000. In those pre-DNA-evidence days, microscopic hair comparisons were valuable for seeing whether a suspect could be the source of a hair at a crime scene. (They still are, but the FBI now uses mitochondrial DNA testing to demonstrate a positive association and relies on visual comparison to screen out nonmatching hairs.)

Clearly, an inclusion—that is, two hairs with a similar set of features—was never definitive. Even hairs from the same individual vary in certain respects. But hairs from the same individual are more likely to "match" than hairs from different individuals. Thus, a careful hair analyst should have reported a negative finding as an exclusion and a positive finding with words like "not excluded," "could have," "consistent with," or "match, but."

After it became apparent that the FBI’s analysts were not always being this careful, the Department of Justice agreed to “identify[] historical cases for review where a microscopic hair examination conducted by the FBI was among the evidence in a case that resulted in a conviction ... with the goal of reaching final determinations in the coming months.” That was 2012. The Post article reports that in 2013, the FBI stopped the reviews. It started them back up this month, on orders from the Deputy Attorney General.

The FBI attributes the delay, in part, to “a vigorous debate that occurred within the FBI and DOJ about the appropriate scientific standards we should apply when reviewing FBI lab examiner testimony — many years after the fact.” To get a sense of what this debate might have been about, it may be useful to examine the two specific cases mentioned in the Post article on “forensic errors.”

The Exoneration of Santae Tribble

The article includes an imposing photograph of Santae A. Tribble. The caption explains that Tribble, who was convicted in Washington, D.C., at age 17, “spent 28 years in prison based largely on analysis of hairs found at the scene of a taxi driver’s murder in 1978. More advanced DNA testing showed that none of the hairs used as evidence shared Tribble’s genetic profile. A judge has vacated his conviction and dismissed the underlying charges.”

There is no denying that evidence suggesting that an innocent man is guilty is erroneous, but is it a laboratory error? Some people argue that microscopic hair evidence is unvalidated and that, because it sometimes incriminates innocent people, it should be inadmissible. But if that is correct, why go through the trouble of reviewing all the cases? The FBI could just send out letters in every case saying that the laboratory no longer stands by the unvalidated testimony its examiners gave.

Surely there was (and is) some useful information in microscopic hair comparisons. A 2002 FBI study showed that DNA testing confirmed most visual microscopic associations (almost 90%) on a sample of hairs from casework. For a small minority of hair comparisons, as in Mr. Tribble’s case, microscopy produced false positives. The specificity of the technique, like that of drug tests, tests for strep throat, and so many other things, is not 100%.

Inasmuch as hair comparisons cannot all be summarily dismissed as invalid, what makes the comparison in the Tribble case a departure from what the FBI calls “appropriate scientific standards”? An article from the National Association of Criminal Defense Lawyers (NACDL), which is cooperating in the process of reviewing the cases, describes the criteria as follows:
Error Type 1: The examiner stated or implied that the evidentiary hair could be associated with a specific individual to the exclusion of all others.

Error Type 2: The examiner assigned to the positive association a statistical weight or probability or provided a likelihood that the questioned hair originated from a particular source, or an opinion as to the likelihood or rareness of the positive association that could lead the jury to believe that valid statistical weight can be assigned to a microscopic hair association.

Error Type 3: The examiner cites the number of cases or hair analyses worked in the lab and the number of samples from different individuals that could not be distinguished from one another as a predictive value to bolster the conclusion that a hair belongs to a specific individual.
Which of these errors did the FBI laboratory commit in Mr. Tribble’s case? According to an earlier Post article on the case, “A police dog found a stocking on a sidewalk a block away [from the victim’s body]. Months later, the FBI would report that a single hair inside it matched Tribble’s ‘in all microscopic characteristics.’” Ideally, the analyst would have added that hair from other people also could have matched, or, at the least, defense counsel should have elicited this fact on cross-examination.

No such significant qualifications or caveats emerged. Instead, according to the Innocence Project, the FBI analyst "testified that one of the hairs from the stocking mask linked Tribble to the crime." The National Registry of Exonerations reports that he "said ... the hair in the stocking came from Tribble." Such testimony seems to be an "Error Type 1," although it is not clear from these descriptions whether the "link" was explicitly "to the exclusion of all others."

The latter phrase was extremely popular among analysts of impressions and patterns (like fingerprints and toolmarks) who believed that their discipline studies characteristics that can exist in their particulars in only one object in the universe. Of course, the words "to the exclusion" are logically redundant. If the analyst believed that "the hair ... came from Tribble," then he must have believed that it did not come from anyone else. But one can believe that a named individual is the source of a trace (because that is the most likely conclusion) without believing it is impossible for anyone else to have been the source (which is, I think, what "to the exclusion" was supposed to mean).

Thus, there is an ambiguity in the meaning of an "Error Type 1." How explicit must the analyst be in excluding all other individuals as contributors of the hair? The NACDL's description of the criteria indicates that a literal use of the phrase is not critical. The article illustrates the error with the following hypothetical testimony:
I found brown, Caucasian head hairs on two items of clothing, the sports coat, and a pair of slacks that were reported to me as belonging to (the defendant). Now, these hairs matched in every observable microscopic characteristic to that known hair sample of DEC (the decedent) and consistent with having originated from her. In my opinion, based on my experience in the laboratory and having done 16,000 hair examinations, my opinion is that those hairs came from DEC.
But regardless of whether Tribble's trial testimony included an "Error Type 1" as the FBI has defined the errors, it was excessive. The analyst should have stuck to reporting the results of the comparison and not made a source attribution.

In addition to the analyst's overstated testimony, the prosecutor came vanishingly close to making the “Error Type 2.” He argued in closing that “There is one chance, perhaps for all we know, in 10 million that it could [be] someone else’s hair.”

In the end, however, what exonerated Tribble was not the recognition of the hyperbole of the expert and the prosecutor, but the proof from a DNA test that the hair on the stocking probably worn by the actual murderer was not his.

The Conviction of James Duckett

The second case of "forensic error" discussed in the Post article is the trial of James Duckett, a former police officer in Florida. The Post article cites this case as an example of “the continued inadequacy of officials’ response.”

Duckett was convicted and sentenced to death for sexually assaulting, strangling, and drowning an 11-year-old girl. Unlike Tribble, Duckett has not proved actual innocence. Without such proof, even a letter from the FBI disowning some parts of the testimony in the case may not be a get-out-of-jail card.

The analyst in the case was the now notorious Michael Malone. The Post notes that Malone was "discredited in a 1997 inspector general’s report on misconduct at the FBI lab." This report came about nine years after Duckett's conviction, and Duckett made sure the Florida courts heard about it. At the center of Duckett's latest postconviction motion was a report from an expert who had been hired by the FBI in response to the first OIG report to "review[] many cases—particularly death penalty cases—in which Malone offered expert testimony." This expert was sharply critical of Malone's documentation of his work and the unsupportable "degree of analytical certainty" with which Malone testified about the hairs in Duckett's case.

Would a speedier review on the FBI's part have made a difference? I doubt it, and I have juxtaposed some of the Post’s description of the case with the court’s to indicate why.

The Post:

Duckett, then a rookie police officer in Mascotte, Fla., was convicted of raping and strangling Teresa McAbee, 11, and dumping her into a lake in 1987.

... Malone ... testified at trial that there was a “high degree of probability” that the hair came from Duckett.

Such testimony is scientifically invalid, according to the parameters of the current FBI review, because it claims to associate a hair with a single person “to the exclusion of all others.”

The Florida court denied Duckett’s request for a new hearing on Malone’s hair match. The court noted that there was other evidence of Duckett’s guilt and that the FBI had not entirely abandoned visual hair comparison.

The court:

Malone also explained that hair analysis is not as precise as fingerprints for identifying someone. Malone expressly stated that he could not say that a particular hair came from a specific person to the exclusion of anyone else.

(1) [T]he victim was last seen in Duckett's patrol car; (2) the tire tracks at the murder scene were consistent with those from Duckett's car; (3) no one saw Duckett, the only policeman on duty in Mascotte, from the time he was last seen with the victim until the time he met the victim's mother at the police station; (4) numerous prints of the victim were found on the hood of Duckett's patrol car, although he denied seeing her on the hood; (5) a pubic hair found in the victim's underpants was consistent with Duckett's pubic hair and inconsistent with the others in contact with the victim that evening; and, (6) during a five-month period, Duckett, contrary to department policy, had picked up three young women in his patrol car while on duty and engaged in sexual activity with one and made sexual advances toward the other two.

Of course, the arguably redeeming parts of Malone's testimony and the state's other evidence of guilt do not condone or excuse any foot dragging by the FBI, but they do indicate the complexities that can arise in untangling the consequences of analysts' overstated testimony.


Tuesday, July 29, 2014

A Long Shot Pays Off in Long Island

A family member shouted “we love you” as police took John Bittrolff back to jail. A court in Long Island had just ordered him held without bail on charges of murdering two women over 20 years ago. “Some arrests take a few hours, some days; some take 20 years,” Suffolk County Police Commissioner Edward Webber told reporters.

If police have the killer, it is a success for “familial searching,” the practice of trawling a database for near misses that are especially likely to arise when the sources of traces of DNA recovered from crime scenes or victims are very close relatives of one of the “inhabitants” of the database (convicted offenders or, increasingly, arrestees).

Mr. Bittrolff’s DNA profile was not in the New York database. (He had been arrested, but not convicted, for assault in 1993.) However, last year, his brother, Timothy, had been required to give a sample of DNA after a conviction for violating protective orders. DNA from semen found inside the bodies of both women pointed to a brother of Timothy as the source of that semen. But the two victims were said to have been prostitutes, and Mr. Bittrolff’s counsel have been quick to note that "having sex does not mean killing."

In addition to adding to the modest number of possibly successful “outer directed” database trawls, the case is interesting for some procedural twists involving the acquisition of DNA samples. As in the California “grim sleeper” case, police did not initially seek a court order for a sample of their suspect’s DNA to verify that he was indeed associated with the victims’ bodies. Instead, detectives helped themselves to paper bags of garbage left in front of John Bittrolff's house. Among the plastic cups, drink bottles, toothpicks, straws, crawfish heads, cotton swabs, and bandages, they found DNA from his sons, his brother, and his wife (whom they trailed until they collected a cigarette butt that she tossed from the window of her truck while driving to work). And, on one paper cup, they found a DNA profile that matched the semen.

But the police were not satisfied. They also collected DNA from a cup of water that John Bittrolff drank after his arrest. And even that was not enough. The assistant district attorney (ADA) then applied for a court order to force the twice-DNA-matched suspect to submit to DNA sampling.

Defense lawyers objected that a third sample from Mr. Bittrolff was manifestly unnecessary. The ADA’s response was that prosecutors are entitled to a "judicially approved" DNA sample to present to a grand jury. The court issued the order, and that is where the case stands as of now.

I cannot say that I understand the prosecutor’s reasoning. Unless New York grand jury procedure is very different from the norm, a prosecutor can introduce all manner of evidence without judicial approval. Grand jurors can even rely on unconstitutionally seized evidence without offending the Fourth Amendment.

Was the ADA looking ahead to the trial? Would he want to avoid having to explain the artifices — the “familial searching,” the personal surveillance of family members, and the garbage pull — that the police used to acquire the earlier samples? He might be able to excise all that from the case with a “judicially approved” sample. In any event, the People will present their evidence to the grand jury on Thursday.


The information on the case comes from media reports.
I have taken the liberty of using some words in these articles without quotation marks. For a detailed article on the nature and constitutionality of outer-directed DNA database trawling, see David H. Kaye, The Genealogy Detectives: A Constitutional Analysis of “Familial Searching”, 51 Am. Crim. L. Rev. 109 (2013)

Monday, July 28, 2014

Looking Backwards: How Safe Are Fingerprint Identifications?

Yesterday, I explained why the frequency with which factors like confessions are found in cases of wrongful convictions does not measure the general prevalence of those factors. I questioned one published claim that false confessions occur in at least 25% of all cases. My argument was not that this conclusion is wrong, but rather that the studies of false convictions do not provide data that are directly applicable to estimating prevalence.

My analysis was not confined to confessions. It is based on the fact that the wrongful-conviction studies are retrospective. We take the outcome—a false conviction—and ask what evidence misled the judge or jury. This backwards look reveals the frequency of the type of evidence e given false convictions. The statistic P(e|FC) is equal to the prevalence of e in all cases if there is no association between false convictions and e. 1/ In general, such independence is most unlikely.
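The role of independence can be made concrete with Bayes' rule. In the sketch below, the function name and all the probabilities (a 10% prevalence of e, and false-conviction rates of 2% or 8%) are hypothetical inputs chosen for illustration, not estimates from any study.

```python
def prob_e_given_fc(p_e, p_fc_given_e, p_fc_given_not_e):
    """P(e | FC) by Bayes' rule, from the prevalence of e and the
    rates of false conviction in cases with and without e."""
    p_fc = p_fc_given_e * p_e + p_fc_given_not_e * (1 - p_e)
    return p_fc_given_e * p_e / p_fc


# Independence: e does not change the false-conviction rate.
print(prob_e_given_fc(0.10, 0.02, 0.02))  # about 0.10, the prevalence of e
# Association: e makes a false conviction four times as likely.
print(prob_e_given_fc(0.10, 0.08, 0.02))  # about 0.31
```

Only in the first case does the retrospective statistic P(e|FC) match the prevalence of e; in the second, e is overrepresented among false convictions roughly threefold even though its prevalence is unchanged.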

The flip side of invoking wrongful conviction statistics to conclude that false confessions are common is calling on them to show that fingerprint misidentifications are extremely rare. In United States v. Herrera, 704 F.3d 480 (7th Cir. 2013), Judge Richard Posner wrote that
Of the first 194 prisoners in the United States exonerated by DNA evidence, none had been convicted on the basis of erroneous fingerprint matches, whereas 75 percent had been convicted on the basis of mistaken eyewitness identification. 2/
For this remark, he received some flak. Northwestern University law professor Jay Koehler chastised Judge Posner for ignoring a clear case of misidentification. Koehler wrote that the court’s “claim is inaccurate. Stephan Cowans, who was the 141st person exonerated by postconviction DNA evidence, was famously convicted on the strength of an erroneous fingerprint match.” 3/ However, whether a 0 or instead a 1 belongs in the numerator is not so clear.

Judge Posner cited Greg Hampikian et al., The Genetics of Innocence: Analysis of 194 U.S. DNA Exonerations, 12 Annual Rev. of Genomics and Human Genetics 97, 106 (2011), for the view that there were no erroneous fingerprint matches. Interestingly, this paper gives a larger figure than either 0 or 1. It claims that three “cases ... involving fingerprint testimony were found to be invalid or improper.” 4/ However, none of the “invalid or improper” fingerprinting results misidentified anyone. Rather, “[i]n the 3 cases that were found to be problematic, the analyst in 1 case reported that the fingerprint was unidentifiable when in fact there was a clear print (later discovered and analyzed); in the 2 other cases, police officers who testified did not disclose the fact that there were fingerprints that excluded the exonerees.” 5/ Taking these words at face value, the court could well conclude that none of the exonerations involved false positives from fingerprint comparisons.

However, the fingerprint evidence in the Cowans case involved both concealment and an outright false identification. As Professor Koehler noted, one of the foremost scholars of false convictions, Professor Brandon Garrett of the University of Virginia School of Law, reported the Cowans case as a false positive. Garrett clearly stated that although the Boston Police fingerprint examiner “realized at some point prior to trial that Cowans was excluded, ... he concealed the fact and instead told the jury that the print matched Cowan’s.” 6/ Likewise, along with the national Innocence Project’s co-founder and co-director Peter Neufeld, Professor Garrett explained in an earlier law review article that the trial transcript showed that “Officer LeBlanc misrepresented to the jury that the latent print matched Cowans’s.” 7/ Thus, the Innocence Project serves up 1.7% as the figure for “improper” fingerprint evidence in the first 300 exonerations.

This may seem like much ado about almost nothing. One problem case in a small sample is not significantly different from none. But there is a legal issue lurking here. To ascertain the more appropriate figure we need to specify the purpose of the inquiry. Do we want to estimate the prevalence of all kinds of improper behavior—including perjury (or at least knowing falsehoods uttered or implied) by fingerprint examiners? If so, the Hampikian or Koehler numbers are the candidates for further analysis.
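How little separates one problem case from none can be seen from exact (Clopper-Pearson) one-sided confidence bounds for a binomial proportion. The sketch treats the 194 exonerations as if they were a random sample of something, which they are not, so it speaks only to sampling variability; the helper names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(k, n, alpha=0.05, tol=1e-10):
    """One-sided exact upper confidence bound for a binomial proportion:
    the p at which P(X <= k) falls to alpha, found by bisection."""
    lo, hi = k / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # the bound lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 194
for k in (0, 1):
    print(k, round(upper_bound(k, n), 4))
```

The 95% upper bounds come out at roughly 1.5% for a count of zero and roughly 2.4% for a count of one, and each interval contains the other count's point estimate.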

But Judge Posner was responding to a Daubert challenge to fingerprinting. The question before the Herrera court was whether latent fingerprint examiners can provide valid, seemingly scientific testimony, not whether they can lie or conceal evidence. The rate of unintentional misidentifications therefore is the relevant one, and that rate seems closer to zero (in the exonerations to date) than to 1.7%. 8/

So Judge Posner is not clearly wrong in speaking of zero errors. But what can we legitimately conclude from his observation that "[o]f the first 194 prisoners in the United States exonerated by DNA evidence, none had been convicted on the basis of erroneous fingerprint matches, whereas 75 percent had been convicted on the basis of mistaken eyewitness identification"? Does this comparison prove that latent print examiners are more accurate than eyewitnesses?

Not necessarily. In rape cases, where DNA exonerations are concentrated (because DNA for postconviction testing is more likely to be available), there are more instances of eyewitness identifications than of fingerprint identifications. Even if the probability of a false positive identification is the same for fingerprint examiners as for eyewitnesses, there are fewer opportunities for latent print misidentifications to occur. Consequently, the set of false rape convictions will be disproportionately populated with eyewitness errors. The upshot of this base rate effect is that the relative frequency of the errors with different types of evidence in a sample of wrongful convictions may not reflect the relative accuracy of each type of evidence.
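The base rate effect can be sketched with hypothetical numbers: suppose eyewitness identifications occur in 600 of 1,000 rape cases, latent print identifications in 50, and both kinds of identification are false 5% of the time. To keep the sketch minimal, it assumes every false identification is equally likely to end in a conviction, so only opportunities and error rates matter. None of these figures comes from data.

```python
def share_of_errors(opportunities, fp_rates):
    """Fraction of all false identifications contributed by each evidence type.

    opportunities: number of cases in which each type of evidence is offered
    fp_rates: probability that an identification of that type is false
    """
    errors = {kind: opportunities[kind] * fp_rates[kind] for kind in opportunities}
    total = sum(errors.values())
    return {kind: count / total for kind, count in errors.items()}


# Hypothetical figures per 1,000 rape cases (illustrative, not data).
opportunities = {"eyewitness": 600, "fingerprint": 50}
# Same false-positive rate for both types, by assumption.
fp_rates = {"eyewitness": 0.05, "fingerprint": 0.05}

shares = share_of_errors(opportunities, fp_rates)
for kind, share in shares.items():
    print(f"{kind}: {share:.1%}")
```

Although the two kinds of evidence are equally accurate by construction, eyewitness errors make up about 92% of the false identifications and fingerprint errors about 8%, simply because eyewitnesses had twelve times as many opportunities to err.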

Nonetheless, we still have to ask why it is that no (or almost no) cases of unintentional false positives have emerged in the wrongful-conviction cases. Does not this absence of evidence of error prove that errors are absent? Koehler’s answer is that
The fact that few of the DNA exonerations cases overturned verdicts based on erroneous fingerprint matches says virtually nothing about the accuracy of fingerprint analysis precisely because cases involving fingerprint matches are rarely selected for postconviction DNA analyses. By this flawed logic, one might also infer that polygraph errors are “very rare” because none of the DNA exoneration cases overturned erroneous polygraph testimony. 9/
But gathering latent prints is more common than polygraphing defendants, and Koehler does not document his assertion that cases with fingerprint matches are much more rarely the subject of postconviction DNA testing than are cases with other kinds of evidence. Traditionally, it may have been harder to obtain DNA testing when a reported fingerprint match indicated guilt, but postconviction DNA testing has become more widely available. Indeed, Virginia has pursued a test-them-all approach in convictions (with available DNA) for sexual assaults, homicides, and cases of non-negligent manslaughter from 1973 to 1987. 10/ Nevertheless, a selection effect that creates a bias against the inclusion of reported fingerprint matches in the sample of known false verdicts cannot be dismissed out of hand. Certainly, Virginia’s comprehensive testing is exceptional.

Even so, pointing to a likely selection effect is not the same as assessing its impact. Selecting against fingerprinting cases reduces the value of P(FV|FL), the proportion of detected false verdicts given false latent print matches. At the same time, a reported latent print match is highly persuasive evidence. This boosts the value of P(FV|FL). If Koehler’s selection effect is dominant, we might try out a value such as P(FV|FL) = 0.04. That is, we assume that only 4% of all cases with false latent print matches culminate in detected false convictions. How large a fraction of false matches (out of all declared matches) could be reconciled with the observation that no more than 1% or so of the false convictions established by DNA testing involved an arguable latent fingerprint false positive error?

As explained yesterday, this will depend on other variables, some of which are interrelated. Consider 1,000 cases in which police recover and examine latent prints suitable for comparison in 100 (10%) of them. Suppose that 15 of these examinations (15%) produce false matches, and that (as proposed above) only 4% of these false-match cases terminate in convictions later upended by DNA evidence. The result is about 1 false conviction (15 × 0.04 = 0.6). Now consider the other 900 cases with no fingerprint evidence. If, say, 80% of these cases end in convictions of which 10% are false, 72 other false convictions will accrue. Upon examining the 73 false-conviction cases, one would find a false fingerprint match present in 1/73 (about 1%) of them. Yet, a full 15% of all the fingerprint examinations produced (by hypothesis) false matches.
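The arithmetic of this hypothetical can be laid out step by step. Every input below is one of the hypothetical values in the paragraph above, not an empirical estimate, and the variable names are illustrative.

```python
# Hypothetical inputs from the text (not empirical estimates).
cases = 1000
print_share = 0.10        # cases with latent prints suitable for comparison
false_match_rate = 0.15   # examinations that produce false matches
p_fv_given_fl = 0.04      # P(FV|FL): detected false conviction given a false match
conviction_rate = 0.80    # convictions among the cases without print evidence
false_rate = 0.10         # false convictions among those convictions

print_cases = cases * print_share
fp_false_convictions = print_cases * false_match_rate * p_fv_given_fl
other_false_convictions = (cases - print_cases) * conviction_rate * false_rate

total = fp_false_convictions + other_false_convictions
share = fp_false_convictions / total

print(f"false convictions from false matches: {fp_false_convictions:.1f}")
print(f"other false convictions: {other_false_convictions:.1f}")
print(f"share involving a false print match: {share:.1%}")
print(f"false matches among examinations: {false_match_rate:.0%}")
```

The detected share comes out at about 0.8% (0.6 out of 72.6), even though 15% of the examinations produced false matches.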

Now, I am not contending that any of these hypothetical numbers is realistic. But they do show how a high rate of false fingerprint identification can occur in general casework along with a low rate in the known DNA-based exonerations. Better evidence of the general validity of latent fingerprint analysis than the figures from exonerations should be—and is—available.

  1. By definition, P(e|FC) = P(e & FC) / P(FC). If e and FC are independent, then P(e & FC) = P(e) P(FC), so P(e|FC) = P(e) P(FC) / P(FC) = P(e).
  2. Id. at 487. 
  3. Jonathan J. Koehler, Forensic Fallacies and a Famous Judge, 54 Jurimetrics J. 211, 217 (2014) (note omitted).
  4. Greg Hampikian et al., The Genetics of Innocence: Analysis of 194 U.S. DNA Exonerations, 12 Annual Rev. of Genomics and Human Genetics 97, 106 (2011)
  5. Id.
  6. Brandon L. Garrett, Convicting the Innocent: Where Criminal Prosecutions Go Wrong 107 (2011).
  7. Brandon L. Garrett & Peter J. Neufeld, Invalid Forensic Science Testimony and Wrongful Convictions, 95 Va. L. Rev. 1, 74 (2009).
  8. I say “closer” because it appears that Officer LeBlanc first reported that Cowans’ prints were on a mug at the site of the murder for which he was convicted. According to the Innocence Project, “Cowans' fingerprints were actually compared to themselves and not to the fingerprint on the evidence.” Innocence Project, Wrongful Convictions Involving Unvalidated or Improper Forensic Science that Were Later Overturned through DNA Testing. Independent auditors concluded that LeBlanc knew of his mistake before trial but tried to conceal it. Garrett & Neufeld, supra note 7, at 73–74. Perhaps we should score an initially mistaken analysis that an examiner knows is mistaken (and that would not produce false testimony from an honest analyst and that would be caught by the simple safeguard of blind verification) as half a misidentification?
  9. Koehler, supra note 3, at 217. 
  10. John Roman et al., Urban Institute, Post-conviction DNA Testing and Wrongful Conviction 11–12 (2012).
Related Post: Looking Backwards: How Prevalent Are False Confessions?, July 27, 2014

Sunday, July 27, 2014

Looking Backwards: How Prevalent Are False Confessions?

Studies of false convictions are tremendously important. They can rebut complacent assumptions that such things never happen, and they can shine a light on procedures that should be corrected. But the statistics that emerge from these studies are sometimes misunderstood. What should one make of findings that the evidence that convicted innocent defendants often involved confessions (20%), "invalid" forensic science (60%), or eyewitness identifications (70%)? Does this mean that false confessions occur in 20% of all criminal cases, for example? Or that eyewitnesses are wrong 70% of the time?

Years ago, Professor Roger Park at the University of California's Hastings College of the Law asked this question about eyewitnesses. As he pointed out, something had to go wrong in these cases. That something could be common or rare. In more technical jargon, how can you infer the prevalence of a factor from a retrospective study?

Another law professor, David Harris of the University of Pittsburgh, tried to do just this. In a recent and generally penetrating book on flaws in the criminal justice system, he argued as follows:
[T]he basic data are available for all to see. Of the more than 250 exonerations now on record, 25% involved "innocent defendants who made incriminating statements, delivered outright confessions, or plead guilty." ... Recall that DNA evidence--the basis for nearly all the exonerations to date--is available only for a fraction of all criminal cases; experts estimate that police recover testable biological evidence in only 5 to 10 percent of all cases. [I]n these other 90 to 95% of the cases, we have no reason to think that interrogation tactics work any differently, or any better, than in cases in which police recover DNA evidence. Thus, if false statements by suspects occur in 25 percent of the DNA-testable cases, we should expect a similar percentage in the other 90 to 95 percent of the cases. Put another way, if there is no reason to think that the DNA-based exoneration cases differ from others in the system, they provide us with a window into the whole criminal justice system. And that means that the problem we see--the 25 percent of the DNA cases in which false statements occur--represents the tip of the proverbial iceberg, and rational, conservative assumptions would lead us to believe that we should expect to see false convictions and statements by defendants in 25 percent of all cases.
David A. Harris, Failed Forensics: Why Law Enforcement Resists Science 76-77 (2012) (emphasis in original).

But surely there is a mistake here. By this reasoning, if every exoneration were a case in which an eyewitness identified the defendant, "rational, conservative assumptions would lead us to believe that we should expect to see false convictions and [eyewitness identifications] in 100% of all cases." That hardly seems rational.

What has gone wrong? First, the assumption that cases in which DNA evidence can be recovered are comparable to the large remainder of criminal cases overlooks the fact that convictions are not clearly comparable to nonconvictions (acquittals or dismissals). For the professed equality to hold, defendants who are not convicted would have to be just as likely to confess, and to do so falsely, as are those defendants who are falsely convicted and whose cases can be studied. This necessary premise is implausible. Because confessions cause convictions, the incidence of confessing should be lower in the nonconvicted group. In addition, if the case against a suspect is weak, the police may resort to more coercive tactics to obtain a confession. For such reasons, we would not expect the proportion of false confessions among the false convictions in DNA (or any other batch of) exoneration cases to equal the proportion in all criminal cases.

Bayes' rule also has a role here. Let's stipulate that 25% of all false convictions—not just those of DNA exonerations—involved confessions. This statistic is compatible with many possible ratios of false confessions to true ones in all cases. To prove this, I'll run through some algebra, then give a numerical example, but you can skip the box of algebra if you want.

Let P(FC) be the unknown prevalence of false confessions in all cases. Let P(TC) be the prevalence of true confessions, and P(NC) be the prevalence of the remaining cases, in which the defendant does not confess. Under each of these conditions, there is some probability that the case will end in a guilty verdict (V). (For simplicity, I am going to restrict the analysis to cases without guilty pleas. Discovering that some innocent defendants plead guilty would not be much of a surprise.) Of particular interest are false convictions (FV), in which an innocent defendant is found guilty. For example, the probability of a false conviction given a false confession is P(FV|FC).

Bayes' rule then states that

P(FC|FV) = P(FC)P(FV|FC) / [P(FC)P(FV|FC) + P(TC)P(FV|TC) + P(NC)P(FV|NC)].

Because a true confession comes from a guilty defendant, it cannot lead to a false conviction. Thus P(FV|TC) = 0, and we have

P(FC|FV) = P(FC)P(FV|FC) / [P(FC)P(FV|FC) + P(NC)P(FV|NC)].

We observe P(FC|FV) = 0.25 and want to infer P(FC). Solving for P(FC) yields

P(FC) = kzw / [(1 - k)y],
where k = P(FC|FV), y = P(FV|FC), z = P(NC), and w = P(FV|NC).

Thus the inferred prevalence of false confessions depends not just on the observed value k = P(FC|FV) = 0.25. It also varies with the other conditional probabilities (w and y) and with the prevalence of cases without confessions (z).
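The inversion can be checked numerically. The following Python sketch is an illustration, not part of the original analysis; the function names are hypothetical, and the values y = 0.9 and z = 0.77 are assumptions chosen for illustration rather than data. It holds k = P(FC|FV) fixed at 0.25, varies w = P(FV|NC), and confirms that plugging each implied P(FC) back into the forward form of Bayes' rule recovers k.

```python
def false_confession_prevalence(k: float, y: float, z: float, w: float) -> float:
    """Solve Bayes' rule for P(FC): P(FC) = k*z*w / ((1 - k) * y).

    k = P(FC|FV), y = P(FV|FC), z = P(NC), w = P(FV|NC).
    """
    if not 0 <= k < 1 or y <= 0:
        raise ValueError("need 0 <= k < 1 and y > 0")
    return k * z * w / ((1 - k) * y)


def confession_share_of_false_convictions(p_fc: float, y: float, z: float, w: float) -> float:
    """Forward Bayes' rule for P(FC|FV), using P(FV|TC) = 0."""
    numerator = p_fc * y
    return numerator / (numerator + z * w)


# Hold k = P(FC|FV) at 0.25 and vary w = P(FV|NC).
# y = 0.9 and z = 0.77 are illustrative assumptions, not data.
for w in (0.02, 0.08, 0.20):
    p_fc = false_confession_prevalence(k=0.25, y=0.9, z=0.77, w=w)
    # Round-trip check: the forward formula recovers k = 0.25.
    k_back = confession_share_of_false_convictions(p_fc, y=0.9, z=0.77, w=w)
    print(f"w = {w:.2f}: P(FC) = {p_fc:.4f}, recovered k = {k_back:.2f}")
```

The same observed k = 0.25 is compatible with a false-confession prevalence of roughly 0.6%, 2.3%, or 5.7%, depending only on the assumed w. The case w = 0.08 corresponds to 80% of no-confession cases ending in conviction, with 10% of those convictions false.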
For a numerical example, consider a set of 1000 cases in which 230 defendants (23%) confess. Suppose that 1 in 10, or 23, of these confessions are false, and that 90% of these false-confession cases terminate in convictions. The result is about 21 false convictions. Now consider the other 770 cases with no confessions. If, say, 80% of these cases (616) end in convictions, of which 10% are false, that will add another 62 false convictions. Upon examining the 83 false-conviction cases, one would find confessions present in 21/83 = 25% of them. Yet only 2.3% of all the cases (10% of all the confessions) involved false confessions.
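The arithmetic of the 1000-case example can be tallied directly. This Python sketch only restates the hypothetical rates given in the text (23% confess, 10% of confessions false, 90% of false-confession cases convicted, 80% of no-confession cases convicted with 10% of those convictions false); the variable names are illustrative.

```python
# Tally of the hypothetical 1000-case example; all rates come from the text.
n_cases = 1000
confessions = 230                                # 23% of defendants confess
false_confessions = confessions // 10            # 1 in 10 confessions is false: 23
fv_with_confession = 0.90 * false_confessions    # 90% convicted: 20.7 (about 21)

no_confession = n_cases - confessions            # 770 cases with no confession
convictions_nc = 0.80 * no_confession            # 80% convicted: 616
fv_without_confession = 0.10 * convictions_nc    # 10% of those false: 61.6 (about 62)

total_fv = fv_with_confession + fv_without_confession   # 82.3 (about 83)
share_with_confession = fv_with_confession / total_fv   # P(FC|FV)
prevalence = false_confessions / n_cases                # P(FC)

print(f"false convictions involving a confession: {share_with_confession:.1%}")
print(f"false confessions among all cases:        {prevalence:.1%}")
```

Without rounding, the false convictions total 20.7 + 61.6 = 82.3, so the share involving a confession is about 25.2%, in line with the rounded 21/83 in the text, while false confessions remain only 2.3% of all cases.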

The lesson of this exercise is simply that the proportion of confessions within a set of false-conviction cases cannot, by itself, produce a meaningful estimate of the prevalence of false confessions. Of course, this lesson is not confined to confessions. It applies to any factor that appears (or does not appear) among the known cases of false convictions. If there were no fingerprint identifications in the DNA exoneration cases, would that demonstrate that latent fingerprint identification is highly accurate?

Judge Richard Posner thought so. In United States v. Herrera, 704 F.3d 480 (7th Cir. 2013), he wrote that "[o]f the first 194 prisoners in the United States exonerated by DNA evidence, none had been convicted on the basis of erroneous fingerprint matches, whereas 75 percent had been convicted on the basis of mistaken eyewitness identification." Id. at 487. I will discuss this assertion and its implications in a later posting.

Related Postings

Looking Backwards: How Safe Are Fingerprint Identifications?, July 28, 2014

False Confessions, True Confessions, and the Q Factor, July 2, 2014

Up in Smoke: 5 Million Neonatal Blood Samples Incinerated

Originally posted: Double Helix Law Blog, Mar. 15, 2010

An effort to avoid the pointless destruction of the millions of Guthrie cards maintained by the Texas Department of State Health Services has come to naught. Plaintiffs who sued the department, as well as privacy advocates, initially were open to the idea of preserving all the cards with neonatal bloodspots for future research while seeking consent for their storage from millions of parents. [1]

However, this was not to be. After the state quickly settled the dubious lawsuit, an enterprising but inadequately informed journalist published Internet stories alleging that the department had turned “over hundreds of dried blood samples to the federal government to help build a vast DNA database–a forensics tool designed to identify missing persons and crack cold cases” and that the samples “were forwarded along to the federal government to create a vast DNA database, one that could help crack cold cases and identify missing persons.” [2] In her latest installment of this tall tale, she again writes that the samples will “help identify missing persons and crack cold cases.” [1]

The suggestion that the U.S. military is using the samples to build a database to “crack cold cases” or to identify “missing persons” in this country is preposterous. To summarize a previous posting [2]:

First, the research project is limited to mitochondrial DNA, which rarely is used in forensic investigations because it is not capable of providing specific identification. Second, AFDIL does not maintain any databases of DNA profiles to crack cold cases. Third, even if AFDIL were authorized to maintain a database of civilian DNA profiles for criminal investigations, a collection of nameless mtDNA sequences from de-identified samples would be pretty useless. Finally, the true purpose of the research is clear from the “Federal MtDNA Paper” posted on the Tribune's website. The AFDIL paper explains that the research database, which cannot be used to identify individuals, simply allows geneticists to put estimates of random-match probabilities for mtDNA on a sounder footing. These estimates are necessary to understand the probative value of an mtDNA match in any criminal investigation or trial. They have nothing in particular to do with cold hits or missing persons.

In sum, the research database has virtually no meaningful privacy implications. Some parents might not want their children’s blood samples used to improve the criminal justice system, but that alone is not much of a reason to destroy what the article calls a medical “treasure trove.” The children’s DNA is not going into any military or law-enforcement database for tracking down missing persons or cracking cold cases.

Yet, this fear apparently was the monkey wrench that jammed the effort to preserve the samples while seeking consent. Here are some excerpts from the latest news as described by the same journalist:

[T]he Department of State Health Services . . . agreed in December to destroy the blood spots, after a civil rights attorney and several Texas parents sued the state for storing them for research purposes without permission. But after the court settlement was signed, privacy advocates lobbied the agency for an alternate solution: a research database that would keep the blood spots intact while seeking electronic consent from parents. They got the go-ahead from some key lawmakers and from the lawsuit’s plaintiffs, who pledged to void the settlement, but not from DSHS.

When The Texas Tribune discovered last month that state health officials had turned hundreds of baby blood spots over to a federal Armed Forces lab between 2003 and 2007 to build a mitochondrial DNA database . . . any chance for saving the blood spots fizzled out. All 5 million blood spots were sent to a Houston-area incinerator last week.

“If there was any way the blood spots were going to be saved, the whole thing fell apart at that point,” said state Sen. Bob Deuell, R-Greenville, . . . “When this came out about these specimens going to the military, I said, ‘We’ve lost this one.’”

. . . State health officials say Austin-based national patient privacy advocate Deborah Peel and Deuell, a physician, approached DSHS Commissioner David Lakey early this year about using electronic consents to save the 5 million existing blood spots from destruction. The agency reviewed the idea but never pursued it. . . .

Critics say . . . DSHS conveniently settled the lawsuit before the trial went to the discovery phase, meaning the documents on the federal DNA study were never disclosed to the plaintiffs. (The Tribune obtained the documents on the federal project — designed to build a forensics tool to help identify missing persons and crack cold cases — through Texas open-records laws.) “Unfortunately, that of course confirmed the plaintiffs’ worst fears,” said Peel, founder of the nonprofit advocacy group Patient Privacy Rights.

Peel said the state’s decision not to seek a non-destructive solution is a shame. . . . “We were going to … reach out to those 5 million families and let them know they had an alternative to having their blood spots destroyed,” Peel said. . . .

Deuell said the impression he got from state health officials was that they feared they would be subject to litigation from other parents if they negotiated with the plaintiffs not to destroy the blood spots. . . . “They said, ‘The plaintiffs are just three people out of 5 million. Who’s to say somebody else wouldn’t come back and file a new suit?’”

Harrington [plaintiffs' attorney] said that worry is “utter nonsense.” He said both sides could have gone back to the judge to have a new settlement drafted — one that would’ve protected the agency. “What’s the harm in that?” Harrington asked. “We would have supplemented or amended the settlement. It would have been totally possible.”

But once news broke that some of the blood spots had been turned over to the federal lab — and that the state had no intention of destroying those samples — the plaintiffs’ offer was off the table. Instead, they have demanded that the state get the blood spots back from the federal government, or they’ll file another lawsuit. . . . [1]

Maybe another lawsuit would be a good thing. With competent lawyering and journalism, the people of Texas finally might realize that none of their children’s DNA has found its way into any DNA database for identifying anyone.


1. Emily Ramshaw, DNA Destruction, Tex. Tribune,
2. David H. Kaye, A Texas Tall Tale of “DNA Deception,” Double Helix Law, Mar. 4, 2010

Saturday, July 26, 2014

A Texas Tall Tale of “DNA Deception”

Originally posted: Double Helix Law Blog, Mar. 4, 2010

The Texas Tribune, a “non-profit, nonpartisan public media organization,” broke a remarkable story. The story goes like this. Texas, like every other state, pricks the heels of newborn children for a blood sample. It screens these samples for rare metabolic and genetic diseases and stores spots of blood on a card for each child. As the March of Dimes explains, “[w]hen test results show that the baby has a birth defect, early diagnosis and treatment can make the difference between lifelong disabilities and healthy development.” [1]

As these “Guthrie cards” began to accumulate, it became clear that they might be useful for medical research. In 1994, law professor Jean McEwen and doctor-lawyer Phil Reilly called them “inchoate databases” and found that many laboratories were open to the idea of sharing them — in anonymized form — for research that would benefit the public. [2]

The Texas Department of State Health Services did exactly this. It provided medical researchers with de-identified Guthrie cards to study “the gene involved in club foot, to inspect the DNA of infants who develop childhood cancer, [and] to examine prenatal lead exposure.” [3] For its efforts, the department was sued. It had treated the cards as free for the taking, without going back to parents to obtain explicit permission to release their child’s blood spots (with no name attached). Although the claims were a huge jump from any case law, and even though the legally cognizable damages suffered by any parent whose child’s nameless blood spot made its way to a laboratory are obscure, five plaintiffs alleged violations of the Fourth Amendment, the Texas Constitution, and the common law. On their behalf and seeking to represent a much larger class of plaintiffs, the Texas Civil Rights Project sought declaratory and injunctive relief. [4]

The case promptly settled. The state agreed to destroy millions of cards, to give parents clearer procedures to opt out of the storage of the cards, and to pay $26,000 in attorneys’ fees and costs.

There things might have stayed — but for a journalist’s “review of nine years’ worth of e-mails and internal documents on the Department of State Health Services’ newborn blood screening program.” [3] She found that the state had concealed its involvement in a nefarious and far-reaching military or law-enforcement project. The Texas doctors had turned “over hundreds of dried blood samples to the federal government to help build a vast DNA database — a forensics tool designed to identify missing persons and crack cold cases.” [3] The samples, she repeated, “were forwarded along to the federal government to create a vast DNA database, one that could help crack cold cases and identify missing persons.” [5] The database would be shared worldwide, “for international law enforcement and investigation in the context of homeland security and anti-terrorism efforts.” [3]

Incensed, the lawyer for the five plaintiffs fired off a letter to the governor and the attorney-general. He accused the “TDSHS [of] supplying those blood samples taken from newborn babies to the military, not just for research, but so that the military can build a mitochondria DNA data base, which can be used in part for law enforcement purposes.” [5] He complained that “[t]his … alarming development … raises the specter of the federal government building an international DNA data base,” and he demanded that “within ten (10) days of this letter, you retrieve from the federal government all the blood samples that Texas has sent to the U.S. military and retrieve and destroy all information taken from those samples … .” [5] Indeed, he suddenly realized that this military project was why the state was so willing to settle the case: “‘Sometimes there are slam-dunk cases, but I’d never seen this kind of case settle without discovery,’ says [Jim] Harrington, director of the Texas Civil Rights Project. ‘This explains the mystery of why they gave up so fast.’” [3]

The trouble is that it’s all smoke and no fire. The reporter and the lawyer apparently misread the report of the Armed Forces DNA Identification Laboratory (AFDIL) detailing its efforts to collect and study mitochondrial DNA (mtDNA) from varied people and places. As explained in Chapter 11 of The Double Helix and the Law of Evidence, AFDIL is a world leader in mitochondrial DNA sequencing because the technique is exceedingly valuable in identifying the remains of soldiers missing in action. [6] But mtDNA is not used to “crack cold cases,” at least not by generating cold hits in any law-enforcement database of DNA profiles from possible offenders. The national database (NDIS) maintained by the FBI, the one that actually helps in cracking cold cases, is limited to checking crime-scene samples against STR profiles from the DNA in the cell nucleus. These DNA sequences are wonderful for discriminating among individuals. When a 13-locus match from a crime-scene sample to one of the more than seven million profiles in NDIS pops up, it can constitute a practically conclusive identification of a known individual. [6] And the bigger NDIS is, the more likely it is that the culprit will be in it. This kind of database is “only as valuable as its … size.” [3]

Not so with mtDNA. Everyone in the same maternal line shares the same sequence, and other essentially unrelated maternal lineages might have the same sequences. [6] Moreover, it would be inane to put anonymous sequences — nuclear or mitochondrial — into the database used in searching for cold hits. A hit from a crime-scene sample to a profile from a Guthrie card with no name linked to it would have little or no investigative value. The (nameless) Texas children need not fear being swept up in criminal or terrorist investigations because AFDIL sequenced their anonymous DNA.

But if the federal government does not want the samples for a database that will be used to catch criminals or terrorists, what nefarious international database are these profiles going into? Prosaically, they are part of a scientific, population-genetics database that will be helpful in understanding the significance of a match in an ordinary criminal case. Consider State v. Ware, the very first case with mtDNA evidence. Hairs were found in the bed where a young girl was attacked. [6, chap. 12] The hairs looked similar to the defendant’s under a microscope, but there have been false convictions based on hairs that happen to look similar. (Just check with the Innocence Project.) Nuclear DNA, which could yield well-nigh conclusive results, was absent in the hair shafts, but there were enough mitochondria to get a useful sequence, and this sequence matched the defendant’s. [6]

Because mtDNA just does not have the power of nuclear DNA to differentiate among individuals, however, defense counsel in such cases can object (appropriately) that the evidence is confusing or misleading without statistics on how rare the mitotype in question would be in the general population. How many people would be falsely incriminated by the mtDNA sequence in the case?

By understanding the variations in the mtDNA sequences in different places and populations, scientists can estimate how rare or how common a mitotype that incriminates a suspect might be. Such estimates require reference databases, but the existing forensic-statistical-reference databases, defense counsel and a number of scientists have argued, are too small and full of gaps in the population groups represented. [6] Indeed, the federal government has received considerable flak from the media and a vocal group of scientists, lawyers, and sundry others for its refusal to supply de-identified nuclear-DNA profiles from law-enforcement databases for new studies to supplement the existing statistical-reference databases long used to estimate the probability of random STR-profile matches in criminal cases. [8, 9]
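To make the role of database size concrete, here is a minimal Python sketch of one common way to express the rarity of a haplotype: the counting method (observed count divided by database size), paired with a one-sided 95% Clopper-Pearson upper bound on the frequency. This is an assumption for illustration; it is not taken from the AFDIL report or from any particular laboratory's protocol, and the function names are hypothetical. For a haplotype never observed among n profiles, the bound reduces to 1 - 0.05^(1/n).

```python
import math


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))


def haplotype_frequency(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Counting-method estimate for a haplotype seen x times among n database profiles.

    Returns the point estimate x/n and a one-sided (1 - alpha) Clopper-Pearson
    upper bound: the frequency p at which P(X <= x) = alpha. Because P(X <= x)
    falls as p rises, the bound is found by bisection.
    """
    if not 0 <= x <= n or n <= 0:
        raise ValueError("need 0 <= x <= n and n > 0")
    lo, hi = x / n, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid  # bound lies above mid
        else:
            hi = mid  # bound lies at or below mid
    return x / n, hi


# A haplotype never seen in the reference database (x = 0):
# the bound equals 1 - alpha**(1/n), so it shrinks as the database grows.
for n in (100, 1000, 10000):
    _, upper = haplotype_frequency(0, n)
    print(f"unseen in {n:>6} profiles: 95% upper bound on frequency = {upper:.5f}")
```

For a haplotype absent from the reference database, the upper bound falls from roughly 3% with 100 profiles to roughly 0.03% with 10,000. That is one way to see why small reference databases limit how rare a mitotype can be shown to be.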

In sum, the AFDIL study is a response to a legitimate scientific and legal concern. The federal government (as it should) wants to improve the infrastructure for using mtDNA evidence in court by enlarging the statistical-reference databases. Thus, the AFDIL report — the supposed smoking gun posted on the Tribune's website — is entitled “Development and Expansion of High-quality Control Region Databases to Improve Forensic mtDNA Evidence Interpretation.” As the title indicates, these scientific databases do not generate DNA evidence. They “improve” the “interpretation” of mtDNA evidence from other sources. The very first sentence of the report makes it plain that the databases are for statistical purposes only:
Mitochondrial DNA testing in the forensic context requires appropriate, high-quality population databases for estimating the rarity of questioned haplotypes. However, large forensic mtDNA databases, which adhere to strict guidelines in terms of their generation and maintenance, are not widely available for many regional populations of the United States or most global populations outside of the United States and Western Europe. [7]
After elaborating, the report continues:
In order to address this issue, the Armed Forces DNA Identification Lab (AFDIL) has undertaken a high-throughput control region databasing effort. … Global populations that are currently underrepresented in available forensic mtDNA databases will comprise approximately 25% of the total number of samples. The remaining individuals will represent regional samples of various U.S. populations and global populations that contribute to the overall mtDNA diversity of the U.S. The high-quality mtDNA data generated from these efforts will be publicly available to permit examination of regional mtDNA substructure and admixture, and ultimately to improve our ability to interpret mtDNA evidence. [7]
This population-genetics study is entirely different from building a huge database of mitotypes to generate cold hits. MtDNA does not work well for this purpose, and even if the FBI wanted to do it, anonymous data from AFDIL would be useless. All that those data can do is help investigators, judges and juries better assess the results of a match to a known suspect or defendant. Suggestions that neonatal samples are being put into databases that could result in the unknowing “donors” being swept up in future investigations of crime or terrorism are troubling — but not because they are true.


[1] March of Dimes, Newborn Screening Tests, Mar. 2008,
[2] J. E. McEwen & P. R. Reilly, Stored Guthrie Cards as DNA “Banks,” 55 Am. J. Human Genetics 196-200 (1994), available at
[3] Emily Ramshaw, DNA Deception, Texas Tribune, Feb. 22, 2010, available at, last viewed, March 2, 2010
[4] Beleno v. State Dep’t of Health Serv., Civ. No. SA09CA1088 (W.D. Tex. Mar. 12, 2009) (complaint)
[5] Emily Ramshaw, TribBlog: AG’s Office Fires Back at Blood Spot Attorney, Feb. 22, 2010, available at, last viewed, March 2, 2010
[6] David H. Kaye, The Double Helix and the Law of Evidence (2010)
[7] Jodi A. Irwin et al., Development and Expansion of High-quality Control Region Databases to Improve Forensic mtDNA Evidence Interpretation, 1 Forensic Sci. Int’l: Genetics 154-157 (2007)
[8] David H. Kaye, Trawling DNA Databases for Partial Matches: What Is the FBI Afraid Of?, 19 Cornell J. L. & Public Pol’y 145-171 (2009)
[9] D. E. Krane et al., Time for DNA Disclosure, 326 Science 1631-1632 (2009), DOI: 10.1126/science.326.5960.1631

Related posting:  A Texas Tall Tale of "DNA Deception", July 26, 2014