Saturday, May 23, 2015

No Relief for Jeffrey MacDonald After FBI Declares It “Exceeded the Limits of Science” with Hair Analysis

It was not yet 3:30 a.m. on February 17, 1970, when tragedy struck Captain Jeffrey MacDonald’s family at 544 Castle Drive, Fort Bragg, North Carolina. His pregnant wife, Colette, “had both her arms broken and was stabbed repeatedly in the chest and neck with a paring knife and an ice pick” (Anthony 2013). Five-year-old Kimberley “was beaten across the head with a club and stabbed multiple times in the neck. Two-year-old Kristen was stabbed over 30 times in the back, chest and neck ... . MacDonald himself received relatively minor injuries except for a single stab wound that punctured his lung” (Ibid.)

MacDonald, who was a surgeon with the Green Berets, spoke of an attack “by four intruders — two white men, a black man and a white woman. He said the woman held a candle and chanted ‘Acid is groovy’ and ‘Kill the pigs’. On the headboard in the marital bedroom the word ‘PIG’ was written in blood.” (Ibid.) It was eerily similar to the depraved murders of Charles Manson’s followers in Los Angeles. “At Roman Polanski's home they killed the director's pregnant wife, Sharon Tate, and with her blood smeared the word ‘PIG’ on a wall.” (Ibid.) Indeed, Army investigators found an article on the Manson murders in the living room.

After an extended preliminary hearing culminated in a report exonerating Captain MacDonald, he left the Army with an honorable discharge and moved to California. But his father-in-law’s relentless pursuit led to the case being placed before a federal grand jury in 1974. An indictment came the next year. In 1979, a federal jury convicted him of the three murders. Appeals and post-conviction motions ensued. The case generated a “small library of books, a TV mini-series, countless documentaries and a forest of newsprint.” (Ibid.)

The latest opinion in this “wilderness of error” (to use the title of the most recent book on the case) is from a federal district court in North Carolina. The court issued this opinion last week, in the midst of an ongoing investigation into FBI reports and testimony about hair comparisons in thousands of cases before 2000. MacDonald’s case is now one of many in which the Department of Justice has confessed error in the presentations of its FBI laboratory personnel who compared hair samples from crime scenes to those of suspects.

Thus, last year, the Department advised MacDonald’s counsel that:
We have determined that the microscopic hair comparison analysis testimony or laboratory report presented in this case included statements that exceeded the limits of science and were, therefore invalid: (1) the examiner stated or implied that the evidentiary hair could be associated with a specific individual to the exclusion of all others—this type of testimony exceeded the limits of science; (2) the examiner assigned to the positive association a statistical weight or probability or provided a likelihood that the questioned hair originated from a particular source, or an opinion as to the likelihood or rareness of the positive association that could lead the jury to believe that valid statistical weight can be assigned to a microscopic hair association—this type of testimony exceeded the limits of science. (A copy of the documents upon which our determination is based is enclosed.) We take no position regarding the materiality of the error in this case.
According to the court, the FBI and the Innocence Project (IP) identified three errors in the lab reports or trial testimony. None of them prompted the court to change an earlier order denying him post-conviction relief.

In light of the perception of award-winning journalists that the FBI “faked an entire field of forensic science” (Lithwick 2015), that the Bureau placed “pseudoscience in the witness box” (ibid.), and that it performed “virtually worthless” analyses (Blakemore 2015), it is worth looking carefully at the descriptions of the self-reported “invalid” science. Not having the FBI-IP report cited by the court at my disposal, I rely solely on the court’s description of it. If this description is accurate and if the report on MacDonald's case is representative, one may want to exercise some caution with respect to the surprising number of FBI reports that are said to exude "junk science" (Editorial 2015).

Hair analysis figured into the MacDonald case in an unusual way. It was not performed to associate MacDonald with the crime scene. He was lying in the house, wounded and apparently floating in and out of consciousness. Hairs in the house — especially ones on or around the bodies of the victims — were significant only because they might have come from the invading Manson-like killers. But visual and microscopic inspections of various hairs from the house did not seem to support MacDonald's extraordinary story. Instead, the features seen in the hairs were consistent with hairs sampled from the MacDonalds themselves.

1

A bedspread on the floor of the master bedroom of the MacDonald home contained a hair entangled with a purple cotton thread. An FBI lab technician mounted the hair on a slide marked “Q96 H (from thread).” Paul Stombaugh, who was in charge of the Chemistry Branch of the Chemistry and Physics Section of the FBI crime laboratory, examined the Q96 thread and hair, and wrote:
Light brown to blond head hairs that microscopically match the K1 head hairs of COLLETE MACDONALD were found in specimens ... Q96.... The Q96 hair was found entangled around a purple cotton sewing thread like that used in the construction of the Q12 pajama top [belonging to defendant]. Further, this hair had bloodlike deposits along its shaft.
The 2014 report found no errors in this 1974 laboratory report or in Stombaugh’s testimony at the 1979 trial that “this hair—in conducting a comparison examination with the comparison microscope—microscopically matched the head hairs of Colette MacDonald.”

On cross-examination, however, defense counsel suggested that it was peculiar that the thread and the hair “were still wrapped around together after four years of having been in the laboratory custody.” He asked, “Doesn't it make a difference to you to find out what treatment or handling a hair would have had before you examined it in the laboratory?” Stombaugh replied that “The hair was not mounted sir, as were many other ones in this submission. We opened the vials up and identified what was inside. If they were hairs, we would mount it on a slide and then they were compared.” The following exchange then occurred:
Q. Mr. Stombaugh, the question was: weren't you concerned with what might have been done to that hair that might possibly lead you to a wrong conclusion unless you found out what they had done with it?
A. Sir, the only conclusion on the hair examination that I was going to make was its origin.
Q. That is pretty serious about whose hair it is. That is a fundamental question you were being asked.
A. That is correct.
This last exchange is what, in the eyes of the Inspector General and the FBI and IP reviewers, moved Stombaugh’s testimony beyond the limits of science—he said he was examining the hair to reach a “conclusion” of some sort about “its origin” and that this was a “fundamental question.” But he never presented any definitive conclusion of identity. Neither did he try to quantify the probability of identity. To be sure, he did state that the hairs had matching colors and microscopic features. But the reviewers did not deem this conclusion improper or unacceptable. Somehow the conclusion became “invalid” because Stombaugh explained that he was not overly concerned with how the hair had come to be entangled with the thread. That question, he said, was not his to consider because his task was strictly limited to ascertaining whether there was a possible association between that hair and the known hair samples. Considering this testimony about “the origin” in context, it hardly seems like an egregious example of “pseudoscience” or the like.

2

The second instance of “invalid science” reported in 2014 was a 1999 laboratory report of Robert Fram, an examiner in the FBI Lab Hairs and Fiber Unit. At this point in the post-conviction proceedings, the district court had ordered the FBI to ship the hairs to the Armed Forces DNA Identification Laboratory for mitochondrial DNA testing. Fram documented the contents of the slides and sample being packed up and sent. During this process, he examined a glass microscope slide marked “19 1/2 L2082 Q96 PMS,” which contained four hairs. He observed that:
A forcibly removed Caucasian head hair found on one of the Q96 resubmitted glass microscope slides . . . exhibits the same microscopic characteristics as hairs in the K2 specimen. Accordingly, this hair is consistent with having originated from KIMBERLY MACDONALD, the identified source of the K2 specimen.
Fram also stated in the report that “[h]air comparisons are not a basis for personal identification.”

Again, condemning these observations as erroneous seems harsh. Although the phrase “consistent with” is far from ideal, no one seems to doubt that the hair truly was “consistent with” the little girl’s, and MacDonald did not contend that it originated from anyone else.

3

In response to MacDonald’s original 1990 Petition for Post Conviction Relief, FBI laboratory analyst Michael Malone studied one hair found near Colette MacDonald. Malone was to become notorious for giving false or dubious testimony in other cases (Earl 2014). In this phase of the MacDonald case in 1991, however, he simply wrote that:
This hair [Q79] was compared to the pubic hair sample of JEFFREY MACDONALD (specimen K22). This hair exhibits the same individual microscopic characteristics as the pubic hairs of JEFFREY MACDONALD, and accordingly is consistent with having originated from JEFFREY MACDONALD.
Like Fram, he added a qualification. But where Fram cautioned that “[h]air comparisons are not a basis for personal identification,” Malone noted that “hair comparisons do not constitute a basis for absolute personal identification.”

Despite the addition of the word “absolute,” on their face, these statements do not seem to “state[] or impl[y] that the evidentiary hair could be associated with a specific individual to the exclusion of all others,” and they do not “assign[] to the positive association a statistical weight or probability or provide[] a likelihood that the questioned hair originated from a particular source.” Finding matching physical features is consistent with the proposition that the hair was MacDonald's. At the same time, “hair comparisons do not constitute a basis for absolute personal identification” -- the match does not exclude everyone else in the world. Thus, Malone's statements do not seem to be scientifically invalid (at least with respect to the two criteria in the Inspector General's letter).

Rather, the legitimate concern is psychological -- without a literal statement that the observed similarities are also consistent with the possibility that the hair was not MacDonald's, the reader might give the match more weight than it logically deserves. This misconstruction of the report by a lay reader is certainly possible, and I would not want reports about hair matches to be written like Malone's and Fram's were. But this objection is different from dismissing the findings as invalid on the theory that the statements in the report logically imply that the only individual in the world who could have been the source of the hair was MacDonald. In reaching the latter conclusion, the Inspector General may have gone too far.

Microscopic hair comparison is only a rough indicator of identity. Many people could share the same characteristics. But this limitation does not make the field fraudulent. Many disease symptoms, for example, are overinclusive when used to make a diagnosis, but that fact does not render them invalid or worthless as diagnostic criteria.

Likewise, the consistency that Malone reported was not sufficient to establish to a near certainty that the hair was MacDonald’s rather than an intruder’s. In fact, the later mitochondrial DNA testing excluded MacDonald, his wife, and his children as the source of the hair. Consequently, Malone’s reported similarity could have been false (if Malone did not make accurate observations, or if he lied about what he observed). Or, perhaps the Q79 hair was physically similar to MacDonald’s, as Malone said, but it nevertheless originated from someone else. As Malone and Fram explicitly stated, physical similarity alone is probative but not definitive of identity.


Saturday, March 21, 2015

The Junk DNA Wars

This month, the New York Times published a report on “the junk DNA wars,” asking “Is Most of Our DNA Garbage?” 1/ Readers of the article (and an anonymous follow-up piece on the reactions appearing in science blogs) 2/ would come away thinking that there is a serious debate in the scientific community over the proposition that “junk DNA” is “mostly functional.”

Without defining terms like “functional” and “junk,” however, it is impossible to know what is in dispute and what is not. The follow-up piece is particularly frustrating. It observes that
Some scientists, like T. Ryan Gregory, a evolutionary biologist ... argue that if DNA is mostly functional, then it’s hard to explain why rather humble species, like the onion, have far more DNA than we do. ...
Those who disputed Gregory’s findings [sic — Gregory did not discover the long-standing C-value paradox 3/ ], including supporters of intelligent design, cited the Encode Project, an N.I.H.-sponsored attempt to catalog the functional elements of the genome. Encode scientists found that 80 percent of the genome had “biochemical functions,” suggesting that there was a lot less junk DNA than scientists had thought. But did “biochemical function” really mean anything?
For many scientists, it didn’t. A University of Toronto biochemist, Larry Moran, wrote that “the general public has been snowed by the Encode publicity campaign and by naïve journalists who have enthusiastically reported that junk DNA is dead.”
But the Times' writers did not explain why “many scientists” are not snowed by the 80% statistic. After reading some of the ENCODE papers and the surrounding (typically hyperbolic) publicity, I concluded that:
The ENCODE papers show that 80% of the genome displays signs of certain types of biochemical activity—even though the activity may be insignificant, pointless, or unnecessary. This 80% includes all of the introns, for they are active in the production of pre-mRNA transcripts. But this hardly means that they are regulatory or otherwise functional. Indeed, if one carries the ENCODE definition to its logical extreme, 100% of the genome is functional—for all of it participates in at least one biochemical process—DNA replication.

That the ENCODE project would not adopt the most extreme biochemical definition is understandable—that definition would be useless. But the ENCODE definition is still grossly overinclusive from the standpoint of evolutionary biology. From that perspective, most estimates of the proportion of “functional” DNA are well under 80%. 4/
In short, evolutionary biologists reject "biochemical function" as a criterion for recognizing "junk" because not every bit of biochemical activity affects the reproductive fitness of organisms. (Nor does biochemical activity per se necessarily influence phenotypes related to the healthy functioning of those organisms.) To the evolutionary biologists, the term “junk DNA” means parts of the genome in which the particular DNA sequences (the order of the base pairs) do not have evolutionary significance. The Times article defines “junk DNA” differently, and vaguely, as “pieces of DNA that do nothing for us.” This is not the scientific definition. In fact, the earliest papers on “junk DNA” proposed that much of it might “do something” for us.

The “junk DNA war” (or rather the confusion about the meaning of the term “junk”) has spilled over into the legal realm. A brief that leading genetics and genomics researchers submitted to the U.S. Supreme Court to clarify the privacy implications of forensic DNA typing tried to address it. 5/ These researchers observed that
  • In genetics, “junk DNA” denotes sequences that lie outside of genes and that are not under detectable selective pressure: that such DNA exists is not in doubt.
  • “Junk” DNA sequences could be biologically useful or interesting yet not be useful for disease diagnosis or prediction.
  • ENCODE data do not reveal that anywhere near 80% of the genome contains medically relevant information.
  • The ENCODE findings indicate that the system that regulates gene expression is exquisitely complex, but they do little to change the status of “junk DNA” in general.
As far as I know, these conclusions have not been contradicted by new studies, but I have not conducted a recent literature review and would be grateful to hear of relevant papers that undermine these observations.

Notes
  1. Carl Zimmer, Is Most of Our DNA Garbage?, N.Y. Times Mag., Mar. 5, 2015 
  2. Re: Is Most of Our DNA Garbage?, N.Y. Times Sunday Mag., Mar. 20, 2015
  3. See Sean R. Eddy, The C-value Paradox, Junk DNA and ENCODE, 22 Current Biology R898 (2012)
  4. David H. Kaye, ENCODE’S “Functional Elements” and the CODIS Loci (Part II. Alice in Genomeland), Forensic Science, Statistics, and the Law, Sept. 18, 2012 (note omitted)
  5. Brief of Genetics, Genomics, and Forensic Science Researchers as Amici Curiae in Support of Neither Party, Maryland v. King, No. 12-204, Dec. 28, 2012, reprinted in part in Henry T. Greely & David H. Kaye, A Brief of Genetics, Genomics and Forensic Science Researchers in Maryland v. King, 53 Jurimetrics J. 43 (2013), available at http://ssrn.com/abstract=2403063. Disclosure statement: I prepared an initial draft of the brief and coordinated the revisions to it.

Thursday, March 5, 2015

The (Lack of) Meaning of the Supreme Court's Disposition of Raynor v. State

Yesterday, Popular Science reported that a “recent refusal by the Supreme Court means that involuntary DNA collection isn't unconstitutional.” This will come as a surprise to the Justices who voted to deny a writ of certiorari to the Maryland Court of Appeals in Raynor v. State, 99 A.3d 753 (Md. 2014).

Raynor is one of many cases in which courts have concluded that the Fourth Amendment prohibition against “unreasonable searches and seizures” does not apply to acquiring and testing naturally shed DNA. This particular case arose when, two years after a reported rape, the victim told police that she suspected Glenn Raynor had attacked her. Raynor agreed to come to a police station to answer questions. At the interview, he declined to provide a DNA sample, but after he left, police took swabs of the armrests of the chair in which he had sat. The trial court denied his motion to suppress evidence of the incriminating match that followed, noting that “if he was so concerned about it, he should have worn a long sleeve shirt.” A conviction and a 100-year sentence of imprisonment followed.

According to the Popular Science article,
Raynor appealed the decision, saying the DNA evidence shouldn't have been used because it was collected without his consent. The appeal made it all the way up to the Supreme Court, which on Monday, the court announced [sic] that it would not hear the case. The Supreme Court did not comment on the denial—and to be fair, they get requests to hear a whole lot of cases every year and have to deny a majority of them—[but] their refusal to hear the case means they stand with the lower court’s majority opinion [which stated that]:
We hold that DNA testing of the 13 identifying junk loci within genetic material, not obtained by means of a physical intrusion into the person’s body, is no more a search for purposes of the Fourth Amendment, than is the testing of fingerprints, or the observation of any other identifying feature revealed to the public—visage, apparent age, body type, skin color.
In fact, the Supreme Court denies some 97% of the petitions it receives from private parties. Any first year law student knows that denying one of these 7,000 or so petitions does not mean that the Court “stand[s] with the lower court’s majority opinion.” It merely means that, for any number of possible reasons, four of the nine Justices did not vote to re-examine the case. In short, although police have been doing such testing time and again over the last twenty years or so, the U.S. Supreme Court has yet to approve — or disapprove — of the constitutionality of the practice.


Tuesday, February 24, 2015

Genetic Determinism and Essentialism on the Electronic Frontier

The latest bit of what the scientific world regards as discredited genetic determinism comes from the Electronic Frontier Foundation (EFF). This is not the first time the EFF has strayed from electronics to genetics, where it seems inclined to overstate scientific findings. 1/ Now the organization wants the Supreme Court to decide whether it is an unreasonable search or seizure for police, without probable cause and a warrant, to acquire and analyze shed DNA for identifying features that might link a suspect to a crime. That is a perfectly reasonable request, although, in the unlikely event that the Court takes this bait, making the case for a Fourth Amendment violation will not be easy.

What is less reasonable, indeed, what many geneticists and bioethicists regard as ill-advised, is to portray DNA as a map of “who we are, where we come from and who we will be.” 2/ My DNA is not who I am. It determines some things about me — my blood type, for example — but not my occupation, my interests, my skills, my criminal record, or my political affiliation. Yet, rather than simply point out that people have legitimate reasons to want to maintain the confidentiality of certain traits or risks that DNA analysis could reveal — such as an inherited form of Alzheimer’s disease — the EFF is concerned that “[r]esearchers have theorized DNA may also determine race, intelligence, criminality, sexual orientation, and even political ideology.” 3/

Of course, researchers have “theorized” almost everything at one time or another. And the prospect that police will collect DNA from a suspect surreptitiously to find out if he is a liberal Democrat or a conservative Republican seems a tad silly. Still, I was curious: Is there really a theory of how genes determine political ideology?

I turned to the news article in a 2012 issue of Nature cited by the EFF. 4/ Nothing in the article gives a theory of genetic determinism for political ideology. The article refers to twin studies that imply genetics plays some role in political behavior. There are some reports of candidate genes from studies that have “yet to be independently replicated.” 5/

As for a theory of how unknown genes might, to some degree, in some settings, influence political ideology, the theory is that some genes affect general attitudes or emotional reactions that could relate in some manner to political ideology. For example,
US conservatives may not seem to have much in common with Iraqi or Italian conservatives, but many political psychologists agree that political ideology can be narrowed down to one basic personality trait: openness to change. Liberals tend to be more accepting of social change than conservatives. ...

Theoretically, a person who is open to change might be more likely to favour gay marriage, immigration and other policies that alter society and are traditionally linked to liberal politics in the United States; personalities leaning towards order and the status quo might support a strong military force to protect a country, policies that clamp down on immigration and bans on same-sex marriage. 6/
These remarks are not a basis for a true friend of the Court to imply that political ideology might be a genetically determined phenotype. 7/

Notes
  1. See David H. Kaye, Dear Judges: A Letter from the Electronic Frontier Foundation to the Ninth Circuit, Forensic Science, Statistics and the Law, Sept. 20, 2012.
  2. Brief of Amicus Curiae Electronic Frontier Foundation in Support of Petitioner on Petition for a Writ of Certiorari, Raynor v. Maryland, No. 14-885, Feb. 18, 2015, at 2.
  3. Id. (note omitted).
  4. Lizzie Buchen, Biology and Ideology: The Anatomy of Politics, 490 Nature 466 (2012).
  5. Id. at 466.
  6. Id. at 468.
  7. For a critical discussion of factual errors and distortions in Supreme Court amicus briefs generally, see Allison Orr Larsen, The Trouble with Amicus Facts, 100 Va. L. Rev. 1757 (2014).

Friday, February 20, 2015

Buza Reloaded: California Supreme Court Grants Review

Yesterday the California Supreme Court granted review in People v. Buza, No. A125542 (Cal. Ct. App., 1st Dist., Dec. 3, 2014), and ordered the Court of Appeal opinion "depublished." A depublication order "is not an expression of the court's opinion of the correctness of the result of the decision or of any law stated in the opinion." Cal. Rules of Court, Rule 8.1125(d) (2015). However, "an opinion of a California Court of Appeal ... that is not ... ordered published must not be cited or relied on by a court or a party in any other action" in California. Rule 8.1115(a).

The California Department of Justice issued an information bulletin advising all state law enforcement agencies that
By operation of state law, the Supreme Court’s order granting review removes the Court of Appeal’s opinion as published authority and prevents citation or reliance on that decision in any other action. As a result of the California Supreme Court’s grant of review of this decision, there is now no state precedent that precludes collection of DNA database samples from adult felony arrestees pursuant to Penal Code section 296.

Penal Code sections 296(a)(2) and 296.1(a) therefore are in full effect and mandate the collection of DNA database samples from all adults arrested for a felony or wobbler offense. All authorized arrestee samples that have been or will be received by the California Department of Justice DNA Data Bank program will be analyzed and uploaded to CODIS.


Sunday, February 15, 2015

"Remarkably Accurate": The Miami-Dade Police Study of Latent Fingerprint Identification (Pt. 2)

A week ago, I noted the Justice Department’s view that a “study of ... latent print examiners ... found that examiners make extremely few errors. Even when examiners did not get an independent second opinion about the decisions, they were remarkably accurate.” 1/ But just how accurate were they?

The police who conducted the study “[p]resented the data to a professor from the Department of Statistics at Florida International University” (p. 39), and this “independent statistician performed a statistical analysis from the data generated” (p. 45). The first table in the report (Table 4, p. 53) contains the following data (in slightly different form):

Table 1. Classifications of Pairs

Examiner's Statement    Nonmates (N)    Mates (M)
–                       953             235
+                       42              2,457
?                       403             446

Here, “+” stands for a positive opinion of identity between a pair of prints (same source), “–” denotes a negative opinion (an exclusion), and “?” indicates a refusal to make either judgment (an inconclusive) even though the examiner initially deemed the prints sufficient for comparison.

What do the numbers in Table 1 mean? As noted in my previous posting, they pertain to the judgments of 109 examiners with regard to various pairings of 80 latent prints with originating friction ridge skin (mates) and nonoriginating skin (nonmates). A total of 3,138 pairs were mates; of these, the examiners reached a positive or negative conclusion in 2,692 instances. Another 1,398 were nonmates; of these, the examiners reached a conclusion in 995 instances. Given that examiners were presented with mates and that they reached a conclusion of some sort, the proportion of matches declared was P(+|M & not-?) = 2,457/2,692 = 91.3%. These were correct matches. For the pairings in which the examiners reached a conclusion, they declared nonmates to match in P(+|N & not-?) = 42/995 = 4.2% of the pairs. These were false positives. With respect to all the comparisons (including the ones that they found to be inconclusive), the true positive rate was P(+|M) = 2,457/3,138 = 78.3%, and the false positive rate was P(+|N) = 42/1,398 = 3.0%. Similar reasoning applies to the exclusions. Altogether, we can write:

Table 2. Conditional Error Rates

           Excluding inconclusives      Including inconclusives
False +    P(+ | N & not-?) = 4.2%      P(+ | N) = 3.0%
False –    P(– | M & not-?) = 8.7%      P(– | M) = 7.5%


These error rates, which are clearly reported in the study, do not strike me as "remarkably small"—especially considering that they include the full spectrum of pairs—easy as well as difficult comparisons. Of course, they do not include blind verification of the conclusions, a matter addressed in another part of the study.
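For readers who want to check the arithmetic, the figures in Table 2 can be reproduced directly from the counts in Table 1. The following is a minimal Python sketch of that computation (the counts are simply the ones tabulated above; the variable names are mine):

# A minimal sketch using the counts from Table 1.
nonmates = {"-": 953, "+": 42, "?": 403}    # 1,398 nonmated pairs
mates    = {"-": 235, "+": 2457, "?": 446}  # 3,138 mated pairs

n_nonmates = sum(nonmates.values())
n_mates = sum(mates.values())

# False positive and false negative proportions, with and without inconclusives.
fp_incl = nonmates["+"] / n_nonmates                       # P(+ | N), about 3.0%
fp_excl = nonmates["+"] / (nonmates["+"] + nonmates["-"])  # P(+ | N & not-?), about 4.2%
fn_incl = mates["-"] / n_mates                             # P(- | M), about 7.5%
fn_excl = mates["-"] / (mates["-"] + mates["+"])           # P(- | M & not-?), about 8.7%

print(f"False +: {fp_excl:.1%} / {fp_incl:.1%}   False -: {fn_excl:.1%} / {fn_incl:.1%}")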

The authors report more reassuring values for “Positive Predictive Value” (PPV) and “Negative Predictive Value” (NPV). These were 98.3% and 92.4%, respectively. But these quantities depend on the proportions of pairs that are mates (69%) and nonmates (31%) in the test pairs. The prevalence of mates in casework—or the “prior probability” in a particular case—might be quite different. 2/
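To see how much these predictive values depend on the 69% figure, one can recompute them for other prevalences while holding the examiners' conditional accuracy fixed at the (rounded) Table 2 values. A sketch, with the alternative prevalences being purely hypothetical and the function name my own:

def ppv_npv(prevalence, tpr=0.783, fpr=0.030, tnr=0.682, fnr=0.075):
    """PPV and NPV for a given prevalence of mated pairs.
    tpr = P(+|M), fpr = P(+|N), tnr = P(-|N), fnr = P(-|M); inconclusives included."""
    ppv = tpr * prevalence / (tpr * prevalence + fpr * (1 - prevalence))
    npv = tnr * (1 - prevalence) / (tnr * (1 - prevalence) + fnr * prevalence)
    return ppv, npv

for prev in (0.69, 0.50, 0.10):
    ppv, npv = ppv_npv(prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")

At the study's 69% prevalence, this reproduces the reported PPV of about 98% (and an unadjusted NPV of roughly 80%, see note 2 below); at a hypothetical prevalence of 10%, the same conditional accuracy yields a PPV of only about 74%.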

A better statistic for thinking about the probative value of an examiner’s conclusion is the likelihood ratio (LR). Are matches declared more frequently when examiners encounter mated pairs than nonmates? How much more frequent are these correct classifications? Are declared exclusions more frequent when examiners encounter nonmates than mates? How much more frequent are these correct classifications?

The LR answers these questions. For declared matches, the LR is P(+|M) / P(+|N) = 0.783 / 0.030 = 26. For declared exclusions, it is P(–|N) / P(–|M) = 9. 3/ These values support the claim that, on average, examiners can distinguish paired mates from paired nonmates. If all the examiners were flipping fair coins to decide, the LRs would be expected to be 1. The examiners did much better than that.
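For concreteness, here is a short sketch of these ratios computed from the Table 1 counts (the versions excluding inconclusive decisions anticipate note 3 below):

# Likelihood ratios from the Table 1 counts.
LR_match = (2457 / 3138) / (42 / 1398)      # P(+|M) / P(+|N), about 26
LR_exclusion = (953 / 1398) / (235 / 3138)  # P(-|N) / P(-|M), about 9

# Excluding inconclusive decisions (see note 3): about 22 and 11, respectively.
LR_match_excl = (2457 / 2692) / (42 / 995)
LR_exclusion_excl = (953 / 995) / (235 / 2692)

print(round(LR_match), round(LR_exclusion), round(LR_match_excl), round(LR_exclusion_excl))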

Nevertheless, claims of overwhelming confidence across the board do not seem to be justified. If examiners were presented with equal numbers of mates and nonmates, one would expect that a declared match would be a correct match in P(M|+) = 26/27 = 96% of the cases in which a match is declared. 4/ Likewise, a declared exclusion would be a correct classification in P(N|–) = 9/10 = 90% of the instances in which an exclusion is declared. The PPV and NPV in the Miami-Dade study are a little bit higher because the prevalence of mates was 69% instead of 50%, and the examiners were cautious — they were less likely to err when making positive identifications than negative ones.

Suppose, however, that in a case of average difficulty, an average examiner declared a match when the defendant had strong evidence that he never had been in the room where the fingerprints were found. Let us say that a judge or juror, on the basis of the non-fingerprint evidence in the case, would assign a probability of 1% rather than 50% or 69% to the hypothesis that the defendant is the source of the latent print. The examiner, properly blinded to this evidence, would not know of this small prior probability. An LR of 26 would raise that 1% prior probability (prior odds of about 1 to 99) to a posterior probability of roughly 21%. Informing the judge or juror of the reported PPV of 98.3% from the study without explaining that it does not imply a “predictive value” of 98.3% in this case would be very dangerous. It would lead the factfinder to regard the examiner’s conclusion as far more powerful than it actually is.
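The updating in this example is just the odds form of Bayes' rule applied to a hypothetical prior; a minimal sketch (the function name and the priors are illustrative):

def posterior(prior, lr):
    """Posterior probability from a prior probability and a likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1 + posterior_odds)

print(posterior(0.50, 26))  # about 0.96 -- the equal-prevalence case above
print(posterior(0.69, 26))  # about 0.98 -- close to the study's reported PPV
print(posterior(0.01, 26))  # about 0.21 -- the hypothetical 1% prior in the text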

Notes

  1. David H. Kaye, "Remarkably Accurate": The Miami-Dade Police Study of Latent Fingerprint Identification (Pt. 1), Forensic Science, Statistics, and the Law,  Feb. 8, 2015
  2. In addition, the NPV has been adjusted upward from 80% “[i]n [that] consideration was given to the number of standards presented to the participant.” P. 53.
  3. Removing nondeclarations of matches or exclusions (inconclusives) from the denominators of the LRs does not change the ratios very much. They become 22 and 11, respectively.
  4. This result follows immediately from Bayes' rule with a prevalence of P(M) = P(N) = 1/2, since P(M|+) = P(+|M) P(M) / [P(+|M) P(M) + P(+|N) P(N)] = P(+|M) / [P(+|M) + P(+|N)] = LR / (LR + 1) = 26/27.

Sunday, February 8, 2015

"Remarkably Accurate": The Miami-Dade Police Study of Latent Fingerprint Identification (Pt. 1)

A week ago (Feb. 2, 2015), the Justice Department issued a press release entitled "Fingerprint Examiners Found to Have Very Low Error Rates." According to the Department:
A large-scale study of the accuracy and reliability of decisions made by latent fingerprint examiners found that examiners make extremely few errors. Even when examiners did not get an independent second opinion about the decisions, they were remarkably accurate. But when decisions were verified by an independent reviewer, examiners had a 0% false positive, or incorrect identification, rate and a 3% false negative, or missed identification, rate. ... “The results from the Miami-Dade team address the accuracy, reliability, and validity in the forensic science disciplines, ...” said Gerald LaPorte, Director of NIJ’s Office of Investigative and Forensic Sciences.
Inasmuch as the researchers -- latent print examiners and a police commander in the Miami-Dade Police Department 1/ -- only studied the performance of 109 latent print examiners, it is not clear how many forensic science disciplines the study actually addresses. Nor is it obvious what "validity" means (beyond "accuracy") in this one activity.

But let's put press releases to the side and look into the study itself. The authors assert that
The foundation of latent fingerprint identification is that friction ridge skin is unique and persistent. Through the examination of all of the qualitative and quantitative features available in friction ridge skin, impressions can be positively identified or excluded to the individual that produced it. 2/
This study does next to nothing to validate this foundation. The premise of uniqueness is very difficult to validate, and this study is limited to "80 latent prints with varying quantity and quality of information from [a grand total of] ten known sources." 3/ But, to its credit, the research does tell us about the ability of one large group of examiners to correctly and reliably pair these particular latent prints to the more complete known prints of the fingers that generated them. Let's see how much it reveals in this regard.

The Test Set

As for the prints used in the experiment, "[a] panel of three International of Association (IAI) certified latent print examiners independently examined and compared the 320 latent prints to the known standards and scored each latent print and subsequent comparison to their known standard according to a rating scale that was designed and used for this research; 80 were selected as the final latent prints to be used for testing purposes." 4/ The purpose of the three independent examinations was to rate the latent-known pairs on a difficulty scale "in order to present the participants with a broad range of latent print examinations that were representative of actual casework." 5/ Although the researchers may well have succeeded in fashioning a test set with pairs of varying difficulty, the report does not explain how they knew that this set was "representative of actual casework" and that "[t]he test sets utilized in this study were similar to the work that participants perform on a daily basis." 6/ Neither did they report how consistently the three uber-experts gauged the difficulty of the pairs.

The Examiners Who Were Tested

It seems that readers of the Miami-Dade report must take on faith the assertion that the test set is "representative of actual casework." In contrast, it is plain that the test subjects are not representative of all caseworkers. Rather than seek a random sample of all practicing latent print examiners -- which would be a difficult undertaking -- the researchers chose a convenience sample. Only "[l]atent print examiners in the United States who were an active member [sic] of the IAI received an email invitation from the MDPD FSB inviting them to participate in this study." 7/ Inasmuch as IAI certification is a mark of distinction, the sampling frame diverges from the population of all examiners. Departing from good statistical practice, the report does not state how large the nonresponse rate for IAI-certified invitees was. If it was high (as seems probable), the sample of examiners is likely to be a biased sample of all IAI-certified examiners.

In addition to soliciting participation from IAI-certified examiners, "[a]pplications were also made available to any qualified latent print examiner, regardless of affiliation with a professional organization." 8/ How this was done is not explained, but in the end, 55% of the subjects were not IAI-certified. 9/

Of course, these features of the sampling method do not deprive the study of all value. The experiment shows what a set of motivated examiners (volunteers) with high representation from IAI-certified examiners achieved when they (1) knew that their performance would be used in a report on the capabilities of their profession, (2) had an unspecified period of time to work, and (3) may not have always worked alone on the test materials. In the next posting on the study, I will describe these results.

Notes

  1. The only description of the authors in the report is on the title page, which identifies them as Igor Pacheco, CLPE (MDPD), Brian Cerchiai, CTPE (MDPD), and Stephanie Stoiloff, MS (MDPD). The International Association for Identification lists the first two authors as certified latent print examiners as of Dec. 4, 2014. Mr. Cerchiai is also an IAI-certified tenprint examiner. The third author is a senior police bureau commander in the Forensic Services Bureau of the Miami-Dade Police Department (MDPD). In July 2012, she testified before the Senate Judiciary Committee on behalf of the International Association of Chiefs of Police that "[f]orensic science is not the floundering profession that some may portray it to be."
  2. Igor Pacheco, Brian Cerchiai & Stephanie Stoiloff, Miami-Dade Research Study for the Reliability of the ACE-V Process: Accuracy & Precision in Latent Fingerprint Examinations, Final Technical Report, Award No. 2010-DN-BX-K268, Dec. 2014 (abstract).
  3. Id. The latent prints were not just from fingers. Some were palm prints.
  4. Id. at 24.
  5. Id. at 27.
  6. Id. at 35.
  7. Id. at 34.
  8. Id. at 35.
  9. Id. at 51.
Related Postings
  • Reports on studies in mainstream journals can be found on this blog under the labels "fingerprint" and "error."