Tuesday, October 7, 2014

The Supreme Sound of Silence: Same-Sex Marriage and DNA Databases

The big news among Supreme Court watchers is the dog that did not bark in the night — the Court’s denial of petitions for certiorari in seven cases in which lower courts struck down bans on same-sex marriage in Indiana, Wisconsin, Utah, Oklahoma, and Virginia. [1] A denial of a cert petition has no precedential value. It does not mean that the Court approves of the decision below—or that it disapproves of it. It means that, for unstated and often banal reasons (the Court receives some 10,000 petitions a year [4]), no more than three Justices voted in favor of reviewing the decision below. (By convention, it takes four votes to grant the writ that triggers the Court’s review of the case on the merits.)

The Court watchers are treating the denial of the petitions here as a “tacit win to gay marriage” on the theory that, if and when the Court chooses to confront the issue, a majority of states will have sanctioned same-sex marriage, making it more likely that the Court will accept the argument that the Constitution forbids limiting the institution of marriage to couples of the opposite sex. [3]

This predicted dynamic was evident in the Court’s handling of laws requiring routine DNA collection for law enforcement databases. No appellate court ever struck down a law requiring convicted offenders to provide samples, and for some thirty years, the Court invariably denied petitions for review in those cases. Only after Maryland’s highest court essentially invalidated that state’s law providing for DNA collection on arrest did the Supreme Court step in. By that time, every state had a DNA database for convicted offenders, and a majority had extended them to require pre-conviction DNA sampling. Every state signed an amicus brief urging the Court to uphold the practice. The Court split 5–4 on the constitutionality of pre-conviction DNA testing. Had the states and the federal executive branch not presented so unified a front in favor of expansive DNA collection, the outcome could have been different. [2]

References
  1. Amy Howe, Today’s Orders: Same-sex Marriage Petitions Denied, SCOTUSblog, Oct. 6, 2014, 10:41 AM, http://www.scotusblog.com/2014/10/todays-orders-same-sex-marriage-petitins-denied/
  2. David H. Kaye, Why So Contrived? DNA Databases After Maryland v. King, 104 J. Crim. L. & Criminology 535 (2014), available at http://ssrn.com/abstract=2376467
  3. Adam Liptak, Supreme Court Delivers Tacit Win to Gay Marriage, N.Y. Times, Oct. 7, 2014, at A1, http://www.nytimes.com/2014/10/07/us/denying-review-justices-clear-way-for-gay-marriage-in-5-states.html
  4. Robert M. Yablon, Justice Sotomayor and the Supreme Court’s Certiorari Process, 123 Yale L.J. F. 551 (2014), http://yalelawjournal.org/forum/justice-sotomayor-and-the-supreme-courts-certiorari-process.html

Tuesday, September 30, 2014

Bayes in Our Times

Today's New York Times has an article on "a once obscure field known as Bayesian statistics."1/ It is an informative piece by Faye Flam, a science journalist with an uncommonly good grasp of science. But a quantum of confusion infects the effort to contrast "Bayesian statistics" with "the more traditional or 'classical' approach, known as frequentist statistics."

The article presents the solution to the famous Monty Hall problem (known to "classical" probabilists as the three-curtains problem long before its appearance in the TV game show) as especially amenable to "Bayesian statistics." But frequentist thinking works quite well here. In the long run, the strategy of switching beats the strategy of not switching. This is easily proved with classical, objective probabilities.
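To make the frequentist point concrete, here is a minimal simulation (my own sketch in Python, not anything from the Times article). Over many repeated plays, switching wins roughly two-thirds of the time and staying wins roughly one-third, a long-run relative frequency that needs no prior probabilities and no updating.

    import random

    def monty_hall(switch, trials=100000):
        wins = 0
        for _ in range(trials):
            car = random.randrange(3)     # door hiding the car
            pick = random.randrange(3)    # contestant's initial choice
            # The host opens a door that was not picked and does not hide the car.
            opened = next(d for d in range(3) if d != pick and d != car)
            if switch:
                pick = next(d for d in range(3) if d != pick and d != opened)
            wins += (pick == car)
        return wins / trials

    print("stay:  ", monty_hall(switch=False))   # about 1/3
    print("switch:", monty_hall(switch=True))    # about 2/3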

Indeed, it is not clear that the Monty Hall problem is even a problem in statistical inference.2/ There are no statistical (sample) data to consider and no sense in which the use of Bayes' rule to solve the probability problem "counter[s] pure objectivity." How, then, do "[t]he two methods approach the same problem[] from different angles"?

Of course, the Monty Hall problem is nice for illustrating the power of Bayes' rule in working with conditional probabilities. I have used it in this way in my courses, and that may have been the reason it appears in the article. But it does not illustrate the philosophical divide between frequentists and Bayesians.

To this extent, it is disappointing that the Times (but probably not the author) chose to start the online version of the article with a large photograph of Monty Hall captioned "Bayesian statistics can help solve the Monty Hall problem of winning a car." It would have been equally accurate to report that "Frequentist statistics can help solve the Monty Hall problem of winning a car." But that is hardly news fit to print.

Notes

1. Faye D. Flam, The Odds, Continually Updated, N.Y. Times, Sept. 30, 2014, at D1.

2. On the distinction between a "problem of statistical inference or, more simply, a statistics problem," and a probability problem, see, for example, Morris H. DeGroot, Probability and Statistics 257 (1975).


Sunday, August 31, 2014

Hazard Ratios and Heart Failure

Today’s big news in medicine is a new drug, designated LCZ696 by its manufacturer, Novartis. According to the New York Times, LCZ696 “has shown a striking efficacy in prolonging the lives of people with heart failure and could replace what has been the bedrock treatment for more than 20 years.” [1] Specifically, more than 8,400 patients in 47 countries enrolled in a randomized, double-blind experiment in which they received either LCZ696 or an ACE inhibitor called enalapril (in addition to whatever else their doctors prescribed).

The trial was halted after a median follow-up time of 27 months “because the boundary for an overwhelming benefit with LCZ696 had been crossed.” [2] “By that point, 21.8 percent of those who received LCZ696 had died from a cardiovascular cause or had been hospitalized for worsening heart failure. That figure was 26.5 percent for those receiving enalapril. That represents a 20 percent relative reduction in risk using a statistical measure called the hazard ratio.” [1]

This is good news for patients (if the drug receives regulatory approval and performs as expected in practice). But the account in the Times poses a small statistical puzzle. How does the difference between 21.8 percent and 26.5 percent translate into “a 20 percent relative reduction in risk”? The average risk across patients dropped by 26.5 – 21.8 = 4.7 percentage points. This absolute reduction is appreciable, but 4.7 percentage points is not 20% of the original 26.5 percent risk of hospitalization or death in the control group (4.7 / 26.5 = 17.7%). What accounts for the discrepancy?

The answer lies in the details of a technique known in biostatistics as survival analysis. The statistical technique is not limited to the analysis of death rates. It can be applied to all sorts of situations involving different times to some outcome. The outcome can be the overruling of a Supreme Court case, the firing of a worker, or the exoneration of a prison inmate sentenced to die, to pick a few examples from forensic statistics.

So what does the 20% “relative reduction in risk” cited in the Times article mean? Well, a hazard function gives the risk that, if you have survived to a given time t (the event in question has not yet occurred), the event will occur in the next instant. A hazard ratio is the ratio of the hazard in the treatment group to the hazard in the control group at t. The heart failure study used an estimation procedure known as proportional hazards regression, which assumes that the hazard in one group is a constant proportion of the hazard in the other group. Under this assumption, in a clinical trial where death is the endpoint, the hazard ratio indicates the relative likelihood of death in treated versus control subjects at any given point in time.

Thus, unlike the ordinary relative risk discussed in many court opinions, the “hazard ratio” is not simply the proportion with a disease in an exposed group divided by the proportion in an unexposed group. In the LCZ696 study, the hazard ratio was 0.80, meaning that the probability that a randomly selected patient taking LCZ696 would die from or be hospitalized for heart failure the next day is 80% of the probability for a randomly selected patient taking enalapril. To put it another way, the probability of hospitalization or death tomorrow from heart failure drops by 20% when LCZ696 is substituted for enalapril.
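A back-of-the-envelope check (my own sketch, which assumes a constant, exponential hazard in each arm, a stronger assumption than the proportional-hazards model itself requires) shows how a hazard ratio of 0.80 squares with the event rates reported in the Times:

    import math

    p_control = 0.265      # cumulative rate of cardiovascular death or hospitalization, enalapril arm
    hazard_ratio = 0.80    # reported hazard ratio for LCZ696 versus enalapril

    # With a constant hazard, the cumulative event probability is 1 - exp(-H),
    # so the control arm's cumulative hazard H is -ln(1 - p).
    H_control = -math.log(1 - p_control)
    p_treatment = 1 - math.exp(-hazard_ratio * H_control)

    print(round(p_treatment, 3))                     # about 0.218, i.e., 21.8%
    print(round(1 - p_treatment / p_control, 3))     # about 0.177, the 17.7% drop in cumulative risk

In other words, a constant 20% reduction in the instantaneous hazard translates, over the follow-up period, into a reduction of only about 18% in the cumulative event rate, which is why 21.8 versus 26.5 percent is consistent with a hazard ratio of 0.80.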

Yet a third formulation is that the odds that a randomly selected patient treated with LCZ696 will be hospitalized or die sooner than a randomly selected control patient are 0.8 (to 1) — that's 4 to 5, corresponding to a probability of 4/9 = 44%. [3]

How long either patient can expect to live and avoid hospitalization from heart failure is another story. As one article on hazard ratios explains, “[t]he difference between hazard-based and time-based measures is analogous to the odds of winning a race and the margin of victory.” [3] By itself, the hazard ratio picks the winning horse (probably), but it does not give the number of lengths for its expected success.

References
  1. Andrew Pollack, New Novartis Drug Effective in Treating Heart Failure, N.Y. Times, Aug. 31, 2014, at A4
  2. John J.V. McMurray et al., Angiotensin–Neprilysin Inhibition versus Enalapril in Heart Failure, New Engl. J. Med., Aug. 30, 2014
  3. Spotswood L. Spruance et al., Hazard Ratio in Clinical Trials, 48 Antimicrobial Agents and Chemotherapy 2787 (2004)

Thursday, July 31, 2014

The FBI's Worst Hair Days

An article by Spencer Hsu in yesterday's Washington Post suggests that the FBI lost a tug of war within the Justice Department. In 2012, the Bureau commenced a comprehensive review of the testimony of FBI hair analysts about matches to defendants in criminal cases before 2000. In those pre-DNA-evidence days, microscopic hair comparisons were valuable for seeing whether a suspect could be the source of a hair at a crime scene. (They still are, but the FBI now uses mitochondrial DNA testing to demonstrate a positive association and relies on visual comparison to screen out nonmatching hairs.)

Clearly, an inclusion—that is, two hairs with a similar set of features—was never definitive. Even hairs from the same individual vary in certain respects. But hairs from the same individual are more likely to "match" than hairs from different individuals. Thus, a careful hair analyst should have reported a negative finding as an exclusion and a positive finding with words like "not excluded," "could have," "consistent with," or "match, but."

After it became apparent that the FBI’s analysts were not always being this careful, the Department of Justice agreed to “identify[] historical cases for review where a microscopic hair examination conducted by the FBI was among the evidence in a case that resulted in a conviction ... with the goal of reaching final determinations in the coming months.” That was 2012. The Post article reports that in 2013, the FBI stopped the reviews. It started them back up this month, on orders from the Deputy Attorney General.

The FBI attributes the delay, in part, to “a vigorous debate that occurred within the FBI and DOJ about the appropriate scientific standards we should apply when reviewing FBI lab examiner testimony — many years after the fact.” To get a sense of what this debate might have been about, it may be useful to examine the two specific cases mentioned in the Post article on “forensic errors.”

The Exoneration of Santae Tribble

The article includes an imposing photograph of Santae A. Tribble. The caption explains that Tribble, who was convicted in Washington, D.C., at age 17, “spent 28 years in prison based largely on analysis of hairs found at the scene of a taxi driver’s murder in 1978. More advanced DNA testing showed that none of the hairs used as evidence shared Tribble’s genetic profile. A judge has vacated his conviction and dismissed the underlying charges.”

There is no denying that evidence suggesting that an innocent man is guilty is erroneous, but is it a laboratory error? Some people argue that microscopic hair evidence is unvalidated and because it sometimes incriminates innocent people, it should be inadmissible. But if that is correct, why go through the trouble of reviewing all the cases? The FBI could just send out letters in every case saying that the laboratory no longer stands by the unvalidated testimony its examiners gave.

Surely there was (and is) some useful information in microscopic hair comparisons. A 2002 FBI study showed that DNA testing confirmed most visual microscopic associations (almost 90%) in a sample of hairs from casework. For a small minority of hair comparisons—as in Mr. Tribble’s case—microscopy produced false positives. The specificity of the technique—like that of drug tests, tests for strep throat, and so many other things—is not 100%.

Inasmuch as hair comparisons cannot all be dismissed summarily as invalid, what makes the comparison in the Tribble case a departure from what the FBI calls “appropriate scientific standards”? An article from the National Association of Criminal Defense Lawyers (NACDL), which is cooperating in the process of reviewing the cases, describes the criteria as follows:
Error Type 1: The examiner stated or implied that the evidentiary hair could be associated with a specific individual to the exclusion of all others.

Error Type 2: The examiner assigned to the positive association a statistical weight or probability or provided a likelihood that the questioned hair originated from a particular source, or an opinion as to the likelihood or rareness of the positive association that could lead the jury to believe that valid statistical weight can be assigned to a microscopic hair association.

Error Type 3: The examiner cites the number of cases or hair analyses worked in the lab and the number of samples from different individuals that could not be distinguished from one another as a predictive value to bolster the conclusion that a hair belongs to a specific individual.
Which of these errors did the FBI laboratory commit in Mr. Tribble’s case? According to an earlier Post article on the case, “A police dog found a stocking on a sidewalk a block away [from the victim’s body]. Months later, the FBI would report that a single hair inside it matched Tribble’s ‘in all microscopic characteristics.’” Ideally, the analyst would have added that hair from other people also could have matched, or, at the least, defense counsel should have elicited this fact on cross-examination.

No such significant qualifications or caveats emerged. Instead, according to the Innocence Project, the FBI analyst "testified that one of the hairs from the stocking mask linked Tribble to the crime." The National Registry of Exonerations reports that he "said ... the hair in the stocking came from Tribble." Such testimony seems to be an "Error Type 1," although it is not clear from these descriptions whether the "link" was explicitly "to the exclusion of all others."

The latter phrase was extremely popular among analysts of impression and pattern evidence (like fingerprints and toolmarks) who believed that their discipline studies characteristics that can exist in their particulars in only one object in the universe. Of course, the words "to the exclusion" are logically redundant. If the analyst believed that "the hair ... came from Tribble," then he must have believed that it did not come from anyone else. But one can believe that a named individual is the source of a trace (because that is the most likely conclusion) without believing it is impossible for anyone else to have been the source (which, I think, is what "to the exclusion" was supposed to mean).

Thus, there is an ambiguity in the meaning of an "Error Type 1." How explicit must the analyst be in excluding all other individuals as contributors of the hair? The NACDL's description of the criteria indicates that a literal use of the phrase is not critical. The article illustrates the error with the following hypothetical testimony:
I found brown, Caucasian head hairs on two items of clothing, the sports coat, and a pair of slacks that were reported to me as belonging to (the defendant). Now, these hairs matched in every observable microscopic characteristic to that known hair sample of DEC (the decedent) and consistent with having originated from her. In my opinion, based on my experience in the laboratory and having done 16,000 hair examinations, my opinion is that those hairs came from DEC.
But regardless of whether Tribble's trial testimony included an "Error Type 1" as the FBI has defined the errors, it was excessive. The analyst should have stuck to reporting the results of the comparison and not made a source attribution.

In addition to the analyst's overstated testimony, the prosecutor came perilously close to making an “Error Type 2.” He argued in closing that “There is one chance, perhaps for all we know, in 10 million that it could [be] someone else’s hair.”

In the end, however, what exonerated Tribble was not the recognition of the hyperbole of the expert and the prosecutor, but the proof from a DNA test that the hair on the stocking probably worn by the actual murderer was not his.

The Conviction of James Duckett

The second case of "forensic error" discussed in the Post article is the trial of James Duckett, a former police officer in Florida. The Post article cites this case as an example of “the continued inadequacy of officials’ response.”

Duckett was convicted and sentenced to death for sexually assaulting, strangling, and drowning an 11-year-old girl. Unlike Tribble, Duckett has not proved actual innocence. Without such proof, even a letter from the FBI disowning some parts of the testimony in the case may not be a get-out-of-jail card.

The analyst in the case was the now notorious Michael Malone. The Post notes that Malone was "discredited in a 1997 inspector general’s report on misconduct at the FBI lab." This report came about nine years after Duckett's conviction, and Duckett made sure the Florida courts heard about it. At the center of Duckett's latest postconviction motion was a report from an expert who had been hired by the FBI in response to the first OIG report to "review[] many cases—particularly death penalty cases—in which Malone offered expert testimony." This expert was sharply critical of Malone's documentation of his work and the unsupportable "degree of analytical certainty" with which Malone testified about the hairs in Duckett's case.

Would a speedier review on the FBI's part have made a difference? I doubt it and have juxtaposed some of the Post’s description of the case with the court’s to indicate why.


According to the Post:

Duckett, then a rookie police officer in Mascotte, Fla., was convicted of raping and strangling Teresa McAbee, 11, and dumping her into a lake in 1987.

... Malone ... testified at trial that there was a “high degree of probability” that the hair came from Duckett.

Such testimony is scientifically invalid, according to the parameters of the current FBI review, because it claims to associate a hair with a single person “to the exclusion of all others.”

The Florida court denied Duckett’s request for a new hearing on Malone’s hair match. The court noted that there was other evidence of Duckett’s guilt and that the FBI had not entirely abandoned visual hair comparison. In the court’s account, moreover, Malone also explained that hair analysis is not as precise as fingerprints for identifying someone and expressly stated that he could not say that a particular hair came from a specific person to the exclusion of anyone else. As for the other evidence of guilt, the court recounted that:

(1) [T]he victim was last seen in Duckett's patrol car; (2) the tire tracks at the murder scene were consistent with those from Duckett's car; (3) no one saw Duckett, the only policeman on duty in Mascotte, from the time he was last seen with the victim until the time he met the victim's mother at the police station; (4) numerous prints of the victim were found on the hood of Duckett's patrol car, although he denied seeing her on the hood; (5) a pubic hair found in the victim's underpants was consistent with Duckett's pubic hair and inconsistent with the others in contact with the victim that evening; and, (6) during a five-month period, Duckett, contrary to department policy, had picked up three young women in his patrol car while on duty and engaged in sexual activity with one and made sexual advances toward the other two.

Of course, the arguably redeeming parts of Malone's testimony and the state's other evidence of guilt do not excuse any foot-dragging by the FBI, but they do indicate the complexities that can arise in untangling the consequences of analysts' overstated testimony.


Tuesday, July 29, 2014

A Long Shot Pays Off in Long Island

A family member shouted “we love you” as police took John Bittrolff back to jail. A court in Long Island had just ordered him held without bail on charges of murdering two women over 20 years ago. “Some arrests take a few hours, some days; some take 20 years,” Suffolk County Police Commissioner Edward Webber told reporters.

If police have the killer, it is a success for “familial searching” — the practice of trawling a database for near misses, which are especially likely to arise when the source of DNA recovered from a crime scene or victim is a very close relative of one of the “inhabitants” of the database — convicted offenders or, increasingly, arrestees.

Mr. Bittrolff’s DNA profile was not in the New York database. (He had been arrested, but not convicted, for assault in 1993.) However, last year, his brother, Timothy, had been required to give a sample of DNA after a conviction for violating protective orders. DNA from semen found inside the bodies of both women pointed to a brother of Timothy as the source of that semen. But the two victims were said to have been prostitutes, and Mr. Bittrolff’s counsel have been quick to note that "having sex does not mean killing."

In addition to adding to the modest number of possibly successful “outer-directed” database trawls, the case is interesting for some procedural twists involving the acquisition of DNA samples. As in the California “grim sleeper” case, police did not initially seek a court order for a sample of their suspect’s DNA to verify that he was indeed associated with the victims’ bodies. Instead, detectives helped themselves to paper bags of garbage left in front of John Bittrolff's house. Among the plastic cups, drink bottles, toothpicks, straws, crawfish heads, cotton swabs, and bandages, they found DNA from his sons, his brother, and his wife (whom they trailed until they collected a cigarette butt that she tossed from the window of her truck while driving to work). And, on one paper cup, they found a DNA profile that matched the semen.

But the police were not satisfied. They swabbed DNA from a cup of water that John Bittrolff drank from after his arrest. And even that was not enough. The assistant district attorney (ADA) then applied for a court order to force the twice-DNA-matched suspect to submit to DNA sampling.

Defense lawyers objected that a third sample from Mr. Bittrolff was manifestly unnecessary. The ADA’s response was that prosecutors are entitled to a "judicially approved" DNA sample to present to a grand jury. The court issued the order, and that is where the case stands as of now.

I cannot say that I understand the prosecutor’s reasoning. Unless New York grand jury procedure is very different from the norm, a prosecutor can introduce all manner of evidence without judicial approval. Grand jurors can even rely on unconstitutionally seized evidence without offending the Fourth Amendment.

Was the ADA looking ahead to the trial? Would he want to avoid having to explain the artifices — the “familial searching,” the personal surveillance of family members, and the garbage pull — that the police used to acquire the earlier samples? He might be able to excise all that from the case with a “judicially approved” sample. In any event, the People will present their evidence to the grand jury on Thursday.

References

The information on the case comes from contemporaneous media reports.
I have taken the liberty of using some words in these articles without quotation marks. For a detailed article on the nature and constitutionality of outer-directed DNA database trawling, see David H. Kaye, The Genealogy Detectives: A Constitutional Analysis of “Familial Searching”, 51 Am. Crim. L. Rev. 109 (2013)

Monday, July 28, 2014

Looking Backwards: How Safe Are Fingerprint Identifications?

Yesterday, I explained why the frequency with which factors like confessions are found in cases of wrongful convictions does not measure the general prevalence of those factors. I questioned one published claim that false confessions occur in at least 25% of all cases. My argument was not that this conclusion is wrong, but rather that the studies of false convictions do not provide data that are directly applicable to estimating prevalence.

My analysis was not confined to confessions. It was based on the fact that the wrongful-conviction studies are retrospective. We take the outcome—a false conviction (FC)—and ask what evidence misled the judge or jury. This backwards look reveals the frequency of a given type of evidence e among false convictions. The statistic P(e|FC) is equal to the prevalence of e in all cases if there is no association between false convictions and e. 1/ In general, such independence is most unlikely.

The flip side of invoking wrongful conviction statistics to conclude that false confessions are common is calling on them to show that fingerprint misidentifications are extremely rare. In United States v. Herrera, 704 F.3d 480 (7th Cir. 2013), Judge Richard Posner wrote that
Of the first 194 prisoners in the United States exonerated by DNA evidence, none had been convicted on the basis of erroneous fingerprint matches, whereas 75 percent had been convicted on the basis of mistaken eyewitness identification. 2/
For this remark, he received some flak. Northwestern University law professor Jay Koehler chastised Judge Posner for ignoring a clear case of misidentification. Koehler wrote that the court’s “claim is inaccurate. Stephan Cowans, who was the 141st person exonerated by postconviction DNA evidence, was famously convicted on the strength of an erroneous fingerprint match.” 3/ However, whether a 0 or instead a 1 belongs in the numerator is not so clear.

Judge Posner cited Greg Hampikian et al., The Genetics of Innocence: Analysis of 194 U.S. DNA Exonerations, 12 Annual Rev. of Genomics and Human Genetics 97, 106 (2011), for the view that there were no erroneous fingerprint matches. Interestingly, this paper gives a larger figure than either 0 or 1. It claims that three “cases ... involving fingerprint testimony were found to be invalid or improper.” 4/ However, none of the “invalid or improper” fingerprinting results misidentified anyone. Rather, “[i]n the 3 cases that were found to be problematic, the analyst in 1 case reported that the fingerprint was unidentifiable when in fact there was a clear print (later discovered and analyzed); in the 2 other cases, police officers who testified did not disclose the fact that there were fingerprints that excluded the exonerees.” 5/ Taking these words at face value, the court could well conclude that none of the exonerations involved false positives from fingerprint comparisons.

However, the fingerprint evidence in the Cowans case involved both concealment and an outright false identification. As Professor Koehler noted, one of the foremost scholars of false convictions, Professor Brandon Garrett of the University of Virginia School of Law, reported the Cowans case as a false positive. Garrett clearly stated that although the Boston Police fingerprint examiner “realized at some point prior to trial that Cowans was excluded, ... he concealed the fact and instead told the jury that the print matched Cowan’s.” 6/ Likewise, along with the national Innocence Project’s co-founder and co-director Peter Neufeld, Professor Garrett explained in an earlier law review article that the trial transcript showed that “Officer LeBlanc misrepresented to the jury that the latent print matched Cowans’s.” 7/ Thus, the Innocence Project serves up 1.7% as the figure for “improper” fingerprint evidence in the first 300 exonerations.

This may seem like much ado about almost nothing. One problem case in a small sample is not significantly different from none. But there is a legal issue lurking here. To ascertain the more appropriate figure we need to specify the purpose of the inquiry. Do we want to estimate the prevalence of all kinds of improper behavior—including perjury (or at least knowing falsehoods uttered or implied) by fingerprint examiners? If so, the Hampikian or Koehler numbers are the candidates for further analysis.

But Judge Posner was responding to a Daubert challenge to fingerprinting. The question before the Herrera court was whether latent fingerprint examiners can provide valid, seemingly scientific, testimony—not whether they can lie or conceal evidence. The rate of unintentional misidentifications therefore is the relevant one, and that rate seems closer to zero (in the exonerations to date) than to 1.7%. 8/

So Judge Posner is not clearly wrong in speaking of zero errors. But what can we legitimately conclude from his observation that "[o]f the first 194 prisoners in the United States exonerated by DNA evidence, none had been convicted on the basis of erroneous fingerprint matches, whereas 75 percent had been convicted on the basis of mistaken eyewitness identification"? Does this comparison prove that latent print examiners are more accurate than eyewitnesses?

Not necessarily. In rape cases, where DNA exonerations are concentrated (because DNA for postconviction testing is more likely to be available), there are more instances of eyewitness identifications than of fingerprint identifications. Even if the probability of a false positive identification is the same for fingerprint examiners as for eyewitnesses, there are fewer opportunities for latent print misidentifications to occur. Consequently, the set of false rape convictions will be disproportionately populated with eyewitness errors. The upshot of this base rate effect is that the relative frequency of the errors with different types of evidence in a sample of wrongful convictions may not reflect the relative accuracy of each type of evidence.
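A hypothetical illustration may help (the numbers are mine, chosen only to show the base rate effect, not drawn from any study): suppose eyewitness identifications appear in far more rape prosecutions than latent print identifications do, and that both kinds of evidence have exactly the same false positive rate.

    cases = 1000
    eyewitness_cases = 800        # prosecutions with an eyewitness identification (hypothetical)
    print_cases = 100             # prosecutions with a latent print identification (hypothetical)
    false_positive_rate = 0.05    # assumed identical for both kinds of evidence

    false_eyewitness_ids = eyewitness_cases * false_positive_rate   # 40
    false_print_ids = print_cases * false_positive_rate             # 5

    # Even if every false identification produced a detectable wrongful conviction,
    # eyewitness errors would outnumber latent print errors 8 to 1 in the resulting
    # exonerations, although the per-case error rates are identical by construction.
    print(false_eyewitness_ids, false_print_ids)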

Nonetheless, we still have to ask why it is that no (or almost no) cases of unintentional false positives have emerged in the wrongful-conviction cases. Does not this absence of evidence of error prove that errors are absent? Koehler’s answer is that
The fact that few of the DNA exonerations cases overturned verdicts based on erroneous fingerprint matches says virtually nothing about the accuracy of fingerprint analysis precisely because cases involving fingerprint matches are rarely selected for postconviction DNA analyses. By this flawed logic, one might also infer that polygraph errors are “very rare” because none of the DNA exoneration cases overturned erroneous polygraph testimony. 9/
But gathering latent prints is more common than polygraphing defendants, and Koehler does not document his assertion that cases with fingerprint matches are much more rarely the subject of postconviction DNA testing than are cases with other kinds of evidence. Traditionally, it may have been harder to obtain DNA testing when a reported fingerprint match indicated guilt, but postconviction DNA testing has become more widely available. Indeed, Virginia has pursued a test-them-all approach in convictions (with available DNA) for sexual assaults, homicides, and cases of non-negligent manslaughter from 1973 to 1987. 10/ Nevertheless, a selection effect that creates a bias against the inclusion of reported fingerprint matches in the sample of known false verdicts cannot be dismissed out of hand. Certainly, Virginia’s comprehensive testing is exceptional.

Even so, pointing to a likely selection effect is not the same as assessing its impact. Selecting against fingerprinting cases reduces the value of P(FV|FL), the proportion of detected false verdicts given false latent print matches. At the same time, a reported latent print match is highly persuasive evidence. This boosts the value of P(FV|FL). If Koehler’s selection effect is dominant, we might try out a value such as P(FV|FL) = 0.04. That is, we assume that only 4% of all cases with false latent print matches culminate in detected false convictions. How large a fraction of false matches (out of all declared matches) could be reconciled with the observation that no more than 1% or so of the false convictions established by DNA testing involved an arguable latent fingerprint false positive error?

As explained yesterday, this will depend on other variables, some of which are interrelated. Consider 1,000 cases in which police recover and examine latent prints suitable for comparison in 100 (10%) of them. Suppose that 15 of these examinations (15%) produce false matches, and that (as proposed above) only 4% of these false-match cases terminate in convictions later upended by DNA evidence. The result is about 1 false conviction. Now consider the other 900 cases with no fingerprint evidence. If, say, 80% of these cases end in convictions of which 10% are false, 72 other false convictions will accrue. Upon examining the 73 false-conviction cases, one would find false fingerprint matches present in only 1/73 (about 1%) of them. Yet, a full 15% of all the fingerprint matches were (by hypothesis) false positives.
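For readers who want to trace the arithmetic, the same hypothetical can be laid out as a short calculation (the numbers are the illustrative ones above, not estimates of anything):

    cases = 1000
    print_cases = 100                      # 10% with latent prints suitable for comparison
    false_matches = 0.15 * print_cases     # 15 false matches, by hypothesis
    p_detect = 0.04                        # 4% of false-match cases become detected false convictions

    fv_false_match = false_matches * p_detect     # 0.6, "about 1" in the text

    other_cases = cases - print_cases             # 900 cases with no fingerprint evidence
    fv_other = other_cases * 0.80 * 0.10          # 80% convicted, 10% of those falsely = 72

    share = fv_false_match / (fv_false_match + fv_other)
    print(round(share, 3))                 # about 0.008 (rounding 0.6 up to 1 gives 1/73, about 1%)
    print(false_matches / print_cases)     # 0.15: yet, by hypothesis, 15% of the latent print matches were false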

Now, I am not contending that any of these hypothetical numbers is realistic. But they do show how a high rate of false fingerprint identification can occur in general casework along with a low rate in the known DNA-based exonerations. Better evidence of the general validity of latent fingerprint analysis than the figures from exonerations should be—and is—available.

Notes
  1. By definition, P(e|FC) = P(e & FC) / P(FC). If e and FC are independent, then P(e & FC) = P(e)P(FC), so P(e|FC) = P(e)P(FC) / P(FC) = P(e).
  2. Id. at 487. 
  3. Jonathan J. Koehler, Forensic Fallacies and a Famous Judge, 54 Jurimetrics J. 211, 217 (2014) (note omitted).
  4. Greg Hampikian et al., The Genetics of Innocence: Analysis of 194 U.S. DNA Exonerations, 12 Annual Rev. of Genomics and Human Genetics 97, 106 (2011)
  5. Id.
  6. Brandon L. Garrett, Convicting the Innocent: Where Criminal Prosecutions Go Wrong 107 (2011).
  7. Brandon L. Garrett & Peter J. Neufeld, Invalid Forensic Science Testimony and Wrongful Convictions, 95 Va. L. Rev. 1, 74 (2009).
  8. I say “closer” because it appears that Officer LeBlanc first reported that Cowans’ prints were on a mug at the site of the murder for which he was convicted. According to the Innocence Project, “Cowans' fingerprints were actually compared to themselves and not to the fingerprint on the evidence.” Innocence Project, Wrongful Convictions Involving Unvalidated or Improper Forensic Science that Were Later Overturned through DNA Testing. Independent auditors concluded that LeBlanc knew of his mistake before trial but tried to conceal it. Garrett & Neufeld, supra note 7, at 73–74. Perhaps we should score an initially mistaken analysis that an examiner knows is mistaken (and that would not produce false testimony from an honest analyst and that would be caught by the simple safeguard of blind verification) as half a misidentification?
  9. Koehler, supra note 3, at 217. 
  10. John Roman et al., Urban Institute, Post-conviction DNA Testing and Wrongful Conviction 11–12 (2012).
Related Post: Looking Backwards: How Prevalent Are False Confessions?, July 27, 2014

Sunday, July 27, 2014

Looking Backwards: How Prevalent Are False Confessions?

Studies of false convictions are tremendously important. They can rebut complacent assumptions that such things never happen, and they can shine a light on procedures that should be corrected. But the statistics that emerge from these studies are sometimes misunderstood. What should one make of findings that the evidence that convicted innocent defendants often involved confessions (20%), "invalid" forensic science (60%), or eyewitness identifications (70%)? Does this mean that false confessions occur in 20% of all criminal cases, for example? Or that eyewitnesses are wrong 70% of the time?

Years ago, Professor Roger Park at the University of California's Hastings College of the Law asked this question about eyewitnesses. As he pointed out, something had to go wrong in these cases. That something could be common or rare. In more technical jargon, how can you infer the prevalence of a factor from a retrospective study?

Another law professor, David Harris of the University of Pittsburgh, tried to do just this. In a recent and generally penetrating book on flaws in the criminal justice system, he argued as follows:
[T]he basic data are available for all to see. Of the more than 250 exonerations now on record, 25% involved "innocent defendants who made incriminating statements, delivered outright confessions, or plead guilty." ... Recall that DNA evidence--the basis for nearly all the exonerations to date--is available only for a fraction of all criminal cases; experts estimate that police recover testable biological evidence in only 5 to 10 percent of all cases. [I]n these other 90 to 95% of the cases, we have no reason to think that interrogation tactics work any differently, or any better, than in cases in which police recover DNA evidence. Thus, if false statements by suspects occur in 25 percent of the DNA-testable cases, we should expect a similar percentage in the other 90 to 95 percent of the cases. Put another way, if there is no reason to think that the DNA-based exoneration cases differ from others in the system, they provide us with a window into the whole criminal justice system. And that means that the problem we see--the 25 percent of the DNA cases in which false statements occur--represents the tip of the proverbial iceberg, and rational, conservative assumptions would lead us to believe that we should expect to see false convictions and statements by defendants in 25 percent of all cases.
David A. Harris, Failed Forensics: Why Law Enforcement Resists Science 76-77 (2012) (emphasis in original).

But surely there is a mistake here. By this reasoning, if every exoneration were a case in which an eyewitness identified the defendant, "rational, conservative assumptions would lead us to believe that we should expect to see false convictions and [eyewitness identifications] in 100% of all cases." That hardly seems rational.

What has gone wrong? First, the assumption that cases in which DNA evidence can be recovered are comparable to the large remainder of criminal cases overlooks the fact that convictions are not clearly comparable to nonconvictions (acquittals or dismissals). For the professed equality to hold, defendants who are not convicted would have to be just as likely to confess and to do so falsely as are those defendants who are falsely convicted and whose cases can be studied. This necessary premise is implausible. Because confessions cause convictions, the incidence of confessing should be less in the nonconvicted group. In addition, if the case against a suspect is weak, the police may resort to more coercive tactics to obtain a confession. For such reasons, we would not expect the proportion of false confessions among the false convictions in DNA (or any other set of) exoneration cases to equal the proportion in all criminal cases.

Bayes' rule also has a role here. Let's stipulate that 25% of all false convictions—not just those of DNA exonerations—involved confessions. This statistic is compatible with many possible ratios of false confessions to true ones in all cases. To prove this, I'll run through some algebra, then give a numerical example, but you can skip the box of algebra if you want.

Let P(FC) be the unknown prevalence of false confessions in all cases. Let P(TC) be the prevalence of true confessions, and P(NC) be the prevalence of the remainder of cases in which the defendant does not confess. Under each of these conditions, there is some probability that a verdict of guilty (V) will be attained. (For simplicity, I am going to restrict the analysis to cases without guilty pleas. Discovering that some innocent defendants plead guilty would not be much of a surprise.) For example, the probability of a (false) conviction (FV) given a false confession is P(FV|FC).

Bayes' rule then states that

P(FC|FV) = P(FC)P(FV|FC) / [P(FC)P(FV|FC) + P(TC)P(FV|TC) + P(NC)P(FV|NC)].

Because a conviction following a true confession is not a false conviction, P(FV|TC) = 0, and we have

P(FC|FV) = P(FC)P(FV|FC) / [P(FC)P(FV|FC) + P(NC)P(FV|NC)].

We observe P(FC|FV) = 0.25 and want to infer P(FC). Solving for P(FC) yields

P(FC) = kzw / [(1-k)y],
where k = P(FC|FV), y = P(FV|FC), z = P(NC), and w = P(FV|NC).

Thus the prevalence of false confessions depends not just on the observed value k = P(FC|FV) = 0.25. It varies with the other conditional probabilities (w and y) and the prevalence of cases without confessions (z).

For a numerical example, consider a set of 1000 cases in which 230 defendants (23%) confess. Suppose that 1 in 10, or 23, of these confessions are false, and that 90% of these false-confession cases terminate in convictions. The result is about 21 false convictions. Now consider the other 770 cases with no confessions. If, say, 80% of these cases end in convictions of which 10% are false, that will add another 62 false convictions. Upon examining the 83 false-conviction cases, one would find confessions present in 21/83 = 25% of them. Yet, only 2.3% of all the cases (10% of all the confessions) were false confessions.
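A short sketch (mine, simply checking the arithmetic of the example against the formula above) makes the point explicit:

    cases = 1000
    confessions = 230
    false_confessions = 23                   # 1 in 10 confessions is false
    y = 0.90                                 # P(FV|FC): false-confession cases ending in conviction
    z = (cases - confessions) / cases        # P(NC) = 0.77
    w = 0.80 * 0.10                          # P(FV|NC): 80% convicted, 10% of those falsely

    fv_false_conf = false_confessions * y    # about 21 false convictions
    fv_no_conf = (cases - confessions) * w   # about 62 false convictions
    k = fv_false_conf / (fv_false_conf + fv_no_conf)   # P(FC|FV), about 0.25

    prevalence = k * z * w / ((1 - k) * y)   # P(FC) recovered from the formula
    print(round(k, 2), round(prevalence, 3)) # 0.25 and 0.023, i.e., 2.3% of all cases

The formula simply recovers the prevalence built into the example; the point is that an observed 25% rate of confessions among false convictions is consistent with a false-confession prevalence of only about 2% of all cases.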

The lesson of this exercise is simply that the proportion of confessions within a set of cases of false convictions cannot produce a meaningful estimate of their prevalence. Of course, this lesson is not confined to confessions. It applies to any factor that appears (or does not appear) among the known cases of false convictions. If there were no fingerprint identifications in the DNA exoneration cases, would that demonstrate that latent fingerprint identification is highly accurate?

Judge Richard Posner thought so. In United States v. Herrera, 704 F.3d 480 (7th Cir. 2013), he wrote that "[o]f the first 194 prisoners in the United States exonerated by DNA evidence, none had been convicted on the basis of erroneous fingerprint matches, whereas 75 percent had been convicted on the basis of mistaken eyewitness identification." Id. at 487. I will discuss this assertion and its implications in a later posting.

Related Postings

Looking Backwards: How Safe Are Fingerprint Identifications?, July 28, 2014

False Confessions, True Confessions, and the Q Factor, July 2, 2014