Sunday, May 1, 2016

BIT Global Group's Next Flaky Forensics Conference

BIT Global Group is notable for the scope and intensity of its spam promoting what appear to be parodies of academic or professional conferences. The latest email "on behalf of the organizing committee" cordially invited me "to attend BIT’s 3rd International Congress of Forensics & Police Tech Expo 2016, which will be held during October 27-29, 2016 at International Conference Center, Dalian, China."

The organizing committee (or, as the conference's website designates it, the program committee) consists of one individual, Dr. Xiaodan Mei, President of BIT Group Global Ltd., China. But the "Advisory Board Members" are "coming soon," and already, "the Plenary Session will invite a distinguished international panel including Dr. Henry Lee and many others to talk about the impact of recent forensic scientific and technological breakthroughs around the world."

Indeed, if the website is accurate, the "Renowned Speakers of WCF-2016" who have signed on are
  • "Dr. Henry C. Lee, Chief Emeritus for Scientific Services, University of New Haven, USA" and
  • "Dr. John Zheng Wang, Professor and Director of Certificate Programs, California State University [Long Beach], USA."

On May 16, 2016, Ms. Cherry Dong, Organizing Committee of the Forensics & Police Tech Expo 2016, advised me that
I’m writing to follow up my last invitation letter several weeks ago, it is regretful we couldn’t receive your reply. Now we would like to extend our cordially invitation to you again to join the BIT’s 3rd International Congress of Forensics & Police Tech Expo 2016 ... . Forensic science has attracted enormous attentions ... . Therefore, we establish this focused event ... So please kindly inform us if you are available to give us a presentation in the conference, it will be highly appreciated you can give us a prompt reply with a title and brief (3-5 sentences) summary on your recent work in the area.
The renowned speakers list has grown with the addition of Mathias Gaertner, "Publicly Accredited and Sworn in Expert Witness for Information Technology, Germany"; Eric Kreuter, "a Partner in the Financial Advisory Services Group at Marks Paneth LLP"; Erik Laykin, "Managing Director, Duff & Phelps LLC"; Frank Prieels, "Professor, University of Duesseldorf, Belgium"; Andre Stuart, CEO, "21st Century Forensic Animation"; and Linda Xiao, "Professional Officer (Technical), University of Technology, Australia."

For more information on BIT's "congresses" and practices, see Wikipedia and Flaky Academic Conferences.

Friday, April 29, 2016

False Justice and Prosecutors' Fallacies

False Justice, by Jim and Nancy Petro, is an engaging, first-person tale of a former Ohio Attorney General's involvement in correcting false convictions as well as a summary and refutation of, as the book's subtitle puts it, "Eight Myths that Convict the Innocent." 1/ The book reveals the frustrations that lawyers in the innocence movement know all too well, and it wisely warns prosecutors, police, and the public of pernicious fallacies about criminals and the criminal justice system.

But the book perpetuates a fallacy of a different sort -- a statistical fallacy often called, in legal circles, "the prosecutor's fallacy." 2/ With some types of trace evidence (particularly DNA evidence), it is feasible to estimate the probability of a match between a suspect and the trace evidence given that the suspect is not the source of the trace at the crime scene. We can write this coincidental match probability as P(Match | ~Source).

The problem is that the judge or jury wants to know a different quantity: the probability that a suspect is not the source of the DNA given that there is a DNA match, P(~Source | Match). These two conditional probabilities are conceptually distinct. Sometimes they can be numerically identical or very close to one another, but other times they are not even close. The statistical or logical fallacy consists of naively transforming P(Match | ~Source) into P(~Source | Match).

The first instance of this transposition occurs at page 47, when the Petros quote from a letter intended to persuade a county prosecutor of Clarence Elkins' innocence (and of the guilt of a different inmate in the same cellblock):
We had a very convincing match. In a letter [the Ohio Innocence Project] and Elkin's attorneys ... informed the Summit County prosecutor that newly conducted DNA testing "conclusively exonerates Elkins and implicates Earl Mann in the murder and rapes in which Elkins was convicted." The letter explained that the full profile of the DNA from the girl's panties and Mrs. Judy Johnson's vaginal swab were "consistent with Earl Mann's DNA for full 12-point match."
How convincing was this match? This "full 12-point match" did not involve a random match probability P(Match | ~Source) as small as those for normal STR matches. It came from Y-STR testing. Ordinary forensic STR testing uses loci scattered across different pairs of autosomal chromosomes. For those STRs, estimating the probability of a 12-locus match would involve 24 multiplications of smallish fractions and give rise to tiny match probabilities for any given profile. Not so for forensic Y-STR testing. Y-STRs all lie on a single Y chromosome and are inherited father to son, as one package. Multiplying the population frequencies for individual Y-STRs would not make sense. 3/ Instead of multiplying,
As in all Y-STR DNA analysis, the odds of finding a match are calculated on how many times that specific configuration of markers has been seen in a particular database. In this case, Earl Mann's DNA, in a database of 4,000 samples, matched the crime scene DNA. The letter explained, "Thus far, it ... is a unique Y-STR profile, and there is less than a 1 in 4,000 chance that it is not Earl Mann who left his DNA at the crime scene in the most highly probative areas."
Presumably, "less than ... 1 in 4,000" refers to the fraction 1/4001 that expresses how often the profile has been seen -- only once -- compared to how many Y-STR profiles have been recorded -- 4000 previous profiles plus Mann's. 4/ A 1/4001 "chance that it is not Earl Mann" given that the trace DNA and Mann's have the Y-STR profile is P(~Source | Match). In contrast, 1/4001 is the probability of randomly picking Mann's profile from a population in which 1/4001 profiles are just like Mann's. It is P(Match | ~Source). Elkins' lawyers have transposed. To build their case against Mann, they have committed the prosecutor's fallacy.

Now, Earl Mann was almost certainly guilty -- but not just because he had a matching profile. According to the 2000 Census, Summit County was home to approximately 140,000 men between the ages of 20 and 59. At a rate of 1 man per 4001, we would expect to find 140,000/4001 ≈ 35 of them with matching DNA. Looking at just the Y-STR match, it no longer sounds as if the chance that Mann was not the source is only 1/4001.

In fact, one could argue the chance that Mann was not the source is P(~Source | Match) = 34/35! After all, there were some 35 men in the right age range and locale for whom one could say "the full profile of the DNA from the girl's panties and Mrs. Judy Johnson's vaginal swab were 'consistent with ... for full 12-point match.'" Mann is just one of them. As such, for him, P(Source | Match) = 1/35; hence, P(~Source | Match) = 34/35.  5/
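The counting argument in the last two paragraphs can be checked with a few lines of Python. This is only an illustrative sketch: the function and constant names are hypothetical, and it assumes, as the text does, about 140,000 candidate men, a profile frequency of 1/4001, and an equal chance of being the source for every man who shares the profile.

```python
from fractions import Fraction

# Assumed inputs, taken from the figures quoted in the text.
MEN_IN_COUNTY = 140_000                # approximate men aged 20-59 in Summit County
PROFILE_FREQUENCY = Fraction(1, 4001)  # P(Match | ~Source) from the database count


def expected_matching_men(population: int, frequency: Fraction) -> Fraction:
    """Expected number of men in the population who share the profile."""
    return population * frequency


def prob_not_source_given_match(matching_men: int) -> Fraction:
    """P(~Source | Match) if each matching man is equally likely to be the source."""
    return Fraction(matching_men - 1, matching_men)


expected = expected_matching_men(MEN_IN_COUNTY, PROFILE_FREQUENCY)
matching_men = round(expected)  # rounds the expected count to a whole number of men

print(f"P(Match | ~Source) = {float(PROFILE_FREQUENCY):.5f}")
print(f"Expected matching men = {float(expected):.2f}")
print(f"P(~Source | Match) = {prob_not_source_given_match(matching_men)}")
```

The two conditional probabilities come out several orders of magnitude apart, which is the whole point of distinguishing them.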

The passage quoted above is not the only instance of transposition in False Justice. It occurs just about every time the Petros quote a random match probability. Most of these probabilities are so small that the resulting likelihood ratio would swamp any reasonable prior probability, making the fallacy for particular transpositions somewhat academic. Still, False Justice does not get its description of the meaning of small match probabilities quite right.

  1. Jim Petro & Nancy Petro, 2015. False Justice: Eight Myths that Convict the Innocent. Routledge: New York, NY (rev. ed.).
  2. William C. Thompson & Edward L. Schumann, 1987. Interpretation of Statistical Evidence in Criminal Trials: The Prosecutor's Fallacy and the Defense Attorney's Fallacy. Law and Human Behavior, 11(3): 167-187 (introducing the phrase).
  3. See, e.g., David H. Kaye, 2010. The Double Helix and the Law of Evidence. Harvard Univ. Press: Cambridge, MA.
  4. Another way to estimate the Y-STR profile frequency is more commonly used, but that is tangential to the issue of transposition.
  5. A better way to arrive at P(~Source | Match) is to apply Bayes' rule. That formula yields approximately 34/35 if one assumes that Mann and every other man in Summit County in the age range mentioned has the same prior probability of being the source of the trace DNA and that everyone else in the world has a source probability of zero.

Saturday, April 16, 2016

The Department of Justice's Plan for a Forensic Science Discipline Review

On March 21, the Department of Justice announced to the National Commission on Forensic Science that it will be
expanding its review of forensic testimony by the FBI Laboratory beyond hair matching to widely used techniques such as fingerprint examinations and bullet-tracing. Officials also said that if the initial review finds systemic problems in a forensic discipline, expert testimony could be reviewed from laboratories beyond the FBI that do analysis for DOJ. 1/
The head of the Department's Office of Legal Policy welcomed input from the Commission on the following topics:
  • How to prioritize disciplines
  • Scope of time period
  • Sampling particular types of cases
  • Consideration of inaccuracies
  • Levels of review
  • Legal and/or forensic reviewers
  • External review processes
  • Ensuring community feedback on methodology
  • Duty/process to inform parties 2/
On April 8, the Department quietly posted a Notice of Public Comment Period on the Presentation of the Forensic Science Discipline Review Framework in the Federal Register. The public comment period ends on May 9.

According to an earlier statement of Deputy Attorney General Sally Yates, the review is intended to "advance the practice of forensic science by ensuring DOJ forensic examiners have testified as appropriate in legal proceedings." Obviously, the criteria for identifying what is and is not "appropriate" will be critical. For example, which of the following examples of testimony about glass fragments (or paraphrases of the testimony) would be deemed inappropriate?
  • "In my opinion the refractory indices of the two glasses are consistent and they could have common origin." Varner v. State, 420 So.2d 841 (Ala. Ct. Crim. App. 1982). 
  • "Test comparisons of the glass removed from the bullet and that found in the pane on the back door, through which the unaccounted-for bullet had passed, revealed that all of their physical properties matched, with no measurable discrepancies. Based upon F.B.I. statistical information, it was determined that only 3.8 out of 100 samples could have the same physical properties, based upon the refractive index test alone, which was performed." Johnson v. State, 521 So.2d 1006 (Ala. Ct. Crim. App. 1986).
  • "Bradley was able to opine, to a reasonable degree of scientific certainty, that the glass standard and the third fragment had a 'good probability of common origin.'" People v. Smith, 968 N.E.2d 1271 (Ill. App. Ct. 2012).
  • "Blair Schultz, an Illinois State Police forensic chemist, compared a piece of standard laminated glass from defendant's windshield to a piece of glass from Pranaitis' clothing. He found them to have the same refractive index, which means that the two pieces of glass could have originated from the same source. The likelihood of this match was one in five, meaning that one out of every five pieces of laminated glass would have the same refractive index." People v. Digirolamo, 688 N.E.2d 116 (Ill. 1997).
  • "[O]ne of the glass fragments found in appellant's car was of common origin with glass from the victim's broken garage window. The prosecutor asked her if she were to break one hundred windows at random in Allen County, what would be the percentage of matching specimens she would expect. Over appellant's objection, she answered that if one hundred windows were broken, six of the windows would have the properties she mentioned." Hicks v. State, 544 N.E.2d 500, 504 (Ind. 1989).
Hopefully, the Department will learn from the FBI/DOJ Microscopic Hair Comparison Analysis Review 3/ and
(1) make public the detailed criteria it employs,
(2) use a system with measured reliability for applying these criteria, and
(3) make the transcripts or other materials under review readily available to the public.
  1. Spencer S. Hsu, Justice Department Frames Expanded Review of FBI Forensic Testimony, Wash. Post, Mar. 21, 2016. 
  2. Office of Legal Policy, U.S. Department of Justice, Presentation of the Forensic Science Discipline Framework to the National Commission on Forensic Science, Mar. 21, 2016.
  3. See Ultracrepidarianism in Forensic Science: The Hair Evidence Debacle, Washington & Lee Law Review (online), Vol. 72, No. 2, pp. 227-254, September 2015.
Related Posting
"Stress Tests" by the Department of Justice and the FBI's "Approved Scientific Standards for Testimony and Reports", Feb. 25, 2016

Saturday, April 2, 2016

Sample Evidence: What’s Wrong with ASTM E2548-11 Standard Guide for Sampling Seized Drugs?

Samuel Johnson once observed that “You don't have to eat the whole ox to know that it is tough.” Or maybe he didn't say this, 1/ but the idea applies to many endeavors. One of them is testing of seized drugs. The law needs to—and generally does—recognize the value of surveys and samples in drug and many other kinds of cases. 2/ If the quantity of seized drugs is large, it is impractical and typically unnecessary to test every bit of the materials. Clear guidance on how to select samples from the population of seized matter would be helpful to courts and laboratories alike.

To accomplish this goal, the Chemistry and Instrumental Analysis Subject Area Committee of the Organization of Scientific Area Committees for Forensic Science (OSAC) has recommended the addition of ASTM International’s Standard Guide for Sampling Seized Drugs for Qualitative and Quantitative Analysis (known as ASTM E2548-11) to the National Institute of Standards and Technology (NIST) Registry of Approved Standards. Unfortunately, this "Standard Guide" is vague in its guidance, incomplete and out of date in its references, and nonstandard in its nomenclature for sampling.

The Standard does not purport to prescribe "specific sampling strategies." 3/ Instead, it instructs “[t]he laboratory ... to develop its own strategies” and “recommend[s] that ... key points be addressed.” There are only two key points. 4/ One is that “[s]tatistically selected units shall be analyzed to meet Practice E2329 if statistical inferences are to be made about the whole population.” 5/ But ASTM E2329 merely describes the kinds of analytical tests that can or should be performed on samples. It reveals nothing about how to draw samples from a population. So far, ASTM E2548 offers no guidance about sampling.

The other “key point” is that “[s]ampling may be statistical or non-statistical.” Although tautological (A is either X or not-X), X is not defined, and an explanatory note intensifies the ambiguity. It states that “[f]or the purpose of this guide, the use of the term statistical is meant to include the notion of an approach that is probability-based.” 6/  Does “probability-based” mean probability sampling (the subject of ASTM E105-10)? At least the latter has a well-defined meaning in sampling theory. 7/ It means that every unit in the sampling frame has a known probability of being drawn.

But even if this is what ASTM E2548-11 Standard Guide means by “probability-based,” the phrase is not congruent with "statistical." The note indicates that even sampling that is not “probability-based” still can be considered "statistical sampling." Later parts of the Standard allow inferences to populations to be made from "statistical" samples but not from "non-statistical" ones. Using an undefined notion of "statistical" and "non-statistical" as the fundamental organizing principle departs from conventional statistical terminology and reasoning. The usual understanding of sampling differentiates between probability samples -- for which sampling error readily can be quantified -- and other forms of sampling (whether systematic or ad hoc) -- for which statistical analysis depends on the assumption that the sample is the equivalent of a probability sample.

Thus, the statistical literature on sampling commonly explains that
If the probability of selection for each unit is unknown, or cannot be calculated, the sample is called a non-probability sample. Non-probability samples are often less expensive, easier to run and don't require a frame. [¶] However, it is not possible to accurately evaluate the precision (i.e., closeness of estimates under repeated sampling of the same size) of estimates from non-probability samples since there is no control over the representativeness of the sample. 8/
In contrast, because the ASTM Standard does not focus on probability sampling as opposed to other "statistical sampling," the laboratory personnel (or the lawyer) reading the standard never learns that "it is dangerous to make inferences about the target population on the basis of a non-probability sample." 9/

Indeed, Figure 1 of ASTM E2548 introduces further confusion about "statistical sampling." In this figure, a statistical “sampling plan” is either “Hypergeometric,” “Bayesian,” or “Other probability-based.” But the sampling distribution of a statistic is not a “sampling plan” (although it could inform one). A sampling plan should specify the sample size (or a procedure for stopping the sampling if results on the sampled items up to that point make further testing unnecessary). For sampling from a finite population without replacement, the hypergeometric probability distribution applies to sample-size computations and estimates of sampling error. But how does that make the sampling plan hypergeometric? One type of “sampling plan” would be to draw a simple random sample of a size computed to have a good chance of producing a representative sample. Describing a plan for simple random sampling, stratified random sampling, or any other design as “hypergeometric,” “Bayesian,” or “other” is not helpful.
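To make concrete how the hypergeometric distribution can inform (rather than constitute) a sampling plan, here is a minimal Python sketch of one common sample-size computation. It is not taken from the ASTM Standard. It assumes a simple random sample drawn without replacement, and it asks how many units must be tested, all with positive results, before one can reject at the stated confidence level the hypothesis that fewer than a chosen number of units in the population are positive. The function names and the example figures (100 units, at least 90 positive, 95% confidence) are hypothetical.

```python
from math import comb


def prob_all_sampled_positive(population: int, positives: int, sample_size: int) -> float:
    """Hypergeometric probability that every unit in a simple random sample
    (drawn without replacement) is positive, given the count of positive units."""
    if sample_size > positives:
        return 0.0
    return comb(positives, sample_size) / comb(population, sample_size)


def min_sample_size(population: int, min_positive_units: int, confidence: float) -> int:
    """Smallest n such that, if all n sampled units test positive, the hypothesis
    'fewer than min_positive_units are positive' is rejected at the given confidence."""
    alpha = 1.0 - confidence
    worst_case_positives = min_positive_units - 1  # largest count the hypothesis allows
    for n in range(1, population + 1):
        if prob_all_sampled_positive(population, worst_case_positives, n) <= alpha:
            return n
    return population  # fall back to a census


# Hypothetical seizure of 100 units; claim to support: at least 90 are positive.
print(min_sample_size(population=100, min_positive_units=90, confidence=0.95))
```

Under these assumptions the sketch returns 23 for the example shown. The design that licenses the inference is still simple random sampling. The hypergeometric distribution only supplies the arithmetic.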

Similarly confusing is the figure’s trichotomy of “non-statistical” into the following “plans”: “Square root N,” “Management Directive,” and “Judicial Requirements.” Using the old √N + 1 rule of thumb for determining sample size may be sub-optimal, 10/ but it is “statistical” -- it uses a statistical computation to establish a sample size. So do any judicial or administrative demands to sample a fixed percentage of the population (an approach that a Standard should deprecate). No matter how one determines the sample size, if probability sampling has been conducted, statistical inferences and estimates have the same meaning.

Also puzzling are the assertions that “[a] population can consist of a single unit,” 11/ and that “numerous sampling plans ... are applicable to single and multiple unit populations.” 12/ If a population consists of “a single unit” (as the term is normally used), 13/ then a laboratory that tests this unit has conducted a census. The study design does not involve sampling, so there can be no sampling error.

When it comes to the issue of reporting quantities such as sampling error, the ASTM Standard is woefully inadequate. The entirety of the discussion is this:
7.1 Inferences based on use of a sampling plan and concomitant analysis shall be documented.

8.1 Sampling information shall be included in reports.
8.1.1 Statistically Selected Sample(s)—Reporting statistical inferences for a population is acceptable when testing is performed on the statistically selected units as stated in 6.1 above [that is, according to a standard that is on the NIST Registry with a disclaimer by NIST]. The language in the report must make it clear to the reader that the results are based on a sampling plan.
8.1.2 Non-Statistically Selected Sample(s)—The language in the report must make it clear to the reader that the results apply to only the tested units. For example, 2 of 100 bags were analyzed and found to contain Cocaine.
These remarks are internally problematic. For example, why would an analyst report the population size, the sample size, and the sample data for “non-statistical” samples but not for “statistical” ones?

More fundamentally, to be helpful to the forensic-science and legal communities, a standard has to consider how the results of the analyses should be presented in a report and in court. Should not the full sampling plan be stated — the mechanism for drawing samples (e.g., blinded, which the ASTM Standard calls “black box” sampling, or selecting from numbered samples by a table of random numbers, which it portrays as not “practical in all cases”); the sample size; and the kind of sampling (simple random, stratified, etc.)? It is not enough merely to state that “the results are based on a sampling plan.”
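Selecting from numbered units no longer requires a printed table of random numbers. As a rough sketch only (the function name, the seed value, and the 100-bag example are hypothetical, and nothing here comes from the ASTM Standard), a reproducible simple random sample could be drawn and documented like this:

```python
import random


def draw_simple_random_sample(unit_ids, sample_size, seed):
    """Draw a simple random sample without replacement from labeled units.

    Recording the seed lets a reviewer reproduce the selection later.
    """
    units = list(unit_ids)
    if not 0 <= sample_size <= len(units):
        raise ValueError("sample_size must be between 0 and the number of units")
    rng = random.Random(seed)  # dedicated generator, independent of global state
    return sorted(rng.sample(units, sample_size))


# Hypothetical seizure: 100 bags numbered 1 to 100, of which 23 are to be tested.
bags = range(1, 101)
selected = draw_simple_random_sample(bags, sample_size=23, seed=20160402)
print(selected)
```

A report could then state the population size, the sample size, the selection mechanism, and the seed, so that each bag's probability of inclusion (here 23/100) is known.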

When probability sampling has been employed, a sound foundation for inferences about population parameters will exist. But how should such inference be undertaken and presented? A Neyman-Pearson confidence interval? With what confidence coefficient? A frequentist test of a hypothesis? Explained how? A Bayesian conclusion such as “There is a probability of 90% that the weight of the cocaine in the shipment seized exceeds X”? The ASTM Standard seems to contemplate statements about “[t]he probability that a given percentage of the population contains the drug of interest or is positive for a given characteristic,” but it does not even mention what goes into computing a Bayesian credible interval or the like. 14/
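For a sense of what goes into even the simplest Bayesian statement of this kind, consider the following Python sketch. It is an illustration under strong assumptions that the Standard does not spell out: the population is large enough for binomial sampling to be a reasonable approximation, the prior on the proportion of positive units is uniform (Beta(1, 1)), and every tested unit is positive. The function name and the example numbers are hypothetical.

```python
def posterior_prob_proportion_exceeds(n_tested_all_positive: int, threshold: float) -> float:
    """P(theta > threshold | all n tested units positive) under a uniform prior.

    With a Beta(1, 1) prior and n positives out of n tested, the posterior is
    Beta(n + 1, 1), whose cumulative distribution function is theta ** (n + 1).
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be a proportion between 0 and 1")
    if n_tested_all_positive < 0:
        raise ValueError("the number of tested units cannot be negative")
    return 1.0 - threshold ** (n_tested_all_positive + 1)


# Hypothetical report: 28 units tested, all positive; claim: more than 90% positive.
print(round(posterior_prob_proportion_exceeds(28, 0.90), 3))
```

A different prior, or a single negative result, would change the number. That is why a report needs to disclose these inputs rather than merely assert a probability.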

The OSAC Newsletter proudly states that "[a] standard or guideline that is posted on either Registry demonstrates that the methods it contains have been assessed to be valid by forensic practitioners, academic researchers, measurement scientists, and statisticians through a consensus development process that allows participation and comment from all relevant stakeholders." The experience with ASTM Standards 2548 and 2329 suggests that even before a proposed standard can be approved by a Scientific Area Committee, the OSAC process should provide for a written review of statistical content by a group of statisticians. 15/

Disclosure and disclaimer: I am a member of the OSAC Legal Resource Committee. The information and views presented here do not represent those of, and are not necessarily shared by, NIST, OSAC, any unit within these organizations, or any other organization or individuals.

  1. According to the anonymous webpage Apocrypha: The Samuel Johnson Sound Bite Page, the aphorism is "apocryphal because it's not found in his works, letters, or contemporary biographies about Samuel Johnson. But it is similar to something he once said about Mrs. Montague's book on Shakespeare: 'I have indeed, not read it all. But when I take up the end of a web, and find it packthread, I do not expect, by looking further, to find embroidery.'"
  2. See, e.g., David H. Kaye, David E. Bernstein & Jennifer L. Mnookin, The New Wigmore: A Treatise on Evidence: Expert Evidence (2d ed. 2011); Hans Zeisel & David H. Kaye, Prove It with Figures: Empirical Methods in Law and Litigation (1997).
  3. See ASTM E2548-11, § 4.1. 
  4. Id., § 4.2.
  5. § 4.2.2.
  6. § 4.2.1 (emphasis added).
  7. E.g., Statistics Canada, Probability Sampling, July 23, 2013:
    Probability sampling involves the selection of a sample from a population, based on the principle of randomization or chance. Probability sampling is more complex, more time-consuming and usually more costly than non-probability sampling. However, because units from the population are randomly selected and each unit's probability of inclusion can be calculated, reliable estimates can be produced along with estimates of the sampling error, and inferences can be made about the population.
  8. National Statistical Service (Australia), Basic Survey Design, (emphasis in original).
  9. Id.
  10. See J. Muralimanohar & K. Jaianan, Determination of Effectiveness of the “Square Root of N Plus One” Rule in Lot Acceptance Sampling Using an Operating Characteristic Curve, Quality Assurance Journal, 14(1-2): 33-37, 2011.
  11. § 5.2.2.
  12. § 5.3.
  13. Laboratory and Scientific Section, United Nations Office on Drugs and Crime, Guidelines on Representative Drug Sampling 3 (2009).
  14. Cf. James M. Curran, An Introduction to Bayesian Credible Intervals for Sampling Error in DNA Profiles, Law, Probability and Risk, 4, 115-126, 2005, doi:10.1093/lpr/mgi009
  15. Of course, no process is perfect, but early statistical review can make technical problems more apparent. Cf. Sam Kean, Whistleblower Lawsuit Puts Spotlight On FDA Technical Reviews, Science, Feb. 2, 2012.

Saturday, March 26, 2016

NIST Distances Itself from the First OSAC-approved Forensic Science Standard

On January 11, 2016, a group created by the federal government to develop better standards for forensic science approved -- without changing a single word of substance -- a standard previously promulgated by a committee of ASTM International (formerly the American Society for Testing and Materials). The federally mandated body that showcased this standard is the Organization of Scientific Area Committees for Forensic Science (OSAC). It is supported, administratively and financially, by the Department of Commerce's highly respected National Institute of Standards and Technology (NIST). 1/

The approved standard has the ponderous name of ASTM E2329−14 Standard Practice for Identification of Seized Drugs. To my eye, it looks like an odd choice for the first (and thus far, only) entry in the OSAC Registry of Approved Standards. Why so?

For one thing, this one standard itself seems to approve of nine other ASTM standards -- none of which has been vetted by OSAC. Second, OSAC's Forensic Science Standards Board (FSSB) approved the standard over objections from two of the three OSAC "resource committees" -- its Legal Resource Committee and its Human Factors Committee. 2/ Third, the standard permits definitive conclusions based on standardless, subjective assessments of botanical specimens. Fourth, without discussing or citing any studies of error probabilities for the methods involved, the standard states or suggests that false-positive errors will not occur. Thus, an earlier posting tartly contrasted some of the language in the Standard to the admonition in the 2009 report of the National Research Council Committee on Identifying the Needs of the Forensic Science Community that "[a]ll results for every forensic science method should indicate the uncertainty in the measurements that are made, and studies must be conducted that enable the estimation of those values."

On March 17, 2016, more than two months after the NIST-created OSAC adopted this first standard, NIST issued a public statement disavowing the standard as written because "concerns have been raised that some of the language in the standard is not scientifically rigorous." 3/ Like the National Research Council, NIST appreciates that "no measurement, qualitative or quantitative, should be characterized as without the risk of error or uncertainty." 4/ The statement adds that "NIST and the FSSB have independently asked that ASTM review the language."

The FSSB's action is, in a way, quite puzzling. Why would the FSSB want the organization that already wrote and approved the standard that the FSSB reviewed and adopted as a registry-ready, gold standard to "review" the FSSB-approved standard? It is not as if some new scientific research suddenly undermined the standard, requiring it to be revised. And even if new information had surfaced after the FSSB voted, the appropriate response would have been to take it down until the issue could be resolved.

Moreover, why would NIST and the FSSB "independently" ask for ASTM review when the subcommittee that wanted the standard on the registry already had promised to secure revisions through ASTM? The record before the FSSB included the following response to criticisms filed by the Legal Resource Committee:
"The Seized Drug subcommittee intends to clarify the quoted language pertaining to uncertainty and error during the next ASTM revision of this document." E2329-14 Seized Drugs Response to LRC Comments FINAL.pdf (277K) SAC Chemistry/Instrument Analysis, Jan. 11, 2016
Does the fact that NIST and the FSSB have added their voices to that of the OSAC Subcommittee on Seized Drugs mean that revisions that clearly should have been made before posting a standard to the repository will occur any sooner? And what can justify leaving a standard that admittedly needs "clarification" on the registry pending the requisite rewriting? Cannot laboratories continue to use the ASTM standard to guide them just as they did before the OSAC registry existed?

Whatever the answers to these questions may be, NIST's reservations about the first OSAC standard, although not spelled out in full, were the subject of questions at a recent meeting of the National Commission on Forensic Science. On March 21, 2016, Commissioner Marc LeBeau asked presenters from NIST whether NIST planned to post statements of agreement as well as disagreement for every future OSAC-approved standard. I cannot locate a transcript or videotape of the meeting, but my recollection is that the answer was essentially "no."

No doubt, NIST hopes that the kerfuffle over ASTM E2329−14 is a one-off, but the apparent inclination of OSAC subcommittees to try to import unimproved ASTM standards into the registry does not bode well. The latest example is ASTM E2388-11 Standard Guide for Minimum Training Requirements for Forensic Document Examiners. It is up for consideration as an OSAC standard (and for public comment during the next couple of weeks) even though OSAC has no approved standard on what the document examiners are expected to do once they are trained.

Disclosure and disclaimer: I am a member of the OSAC Legal Resource Committee. The information and views presented here do not represent those of, and are not necessarily shared by, NIST, OSAC, any unit within these organizations, or any other organization or individuals.

  1. See OSAC Subcommittee and Scientific Area Committee (SAC) Chairs, ("OSAC is part of an initiative by NIST and the Department of Justice to strengthen forensic science in the United States. The organization is a collaborative body of more than 500 forensic science practitioners and other experts who represent local, state, and federal agencies; academia; and industry. NIST has established OSAC to support the development and promulgation of forensic science consensus documentary standards and guidelines, and to ensure that a sufficient scientific basis exists for each discipline.").
  2. The third resource committee, the Quality Infrastructure Committee (QIC), does not seem to comment on the substance of proposed standards.
  3. NIST Statement on ASTM Standard E2329-14, Mar. 17, 2016.
  4. That said, the NIST statement cautions that "It is important to note that NIST is not contesting results obtained from seized evidence using the standard." Id.

Monday, March 7, 2016

Hot Paint: Another ASTM Standard (E2937-13) that Needs More Work

A second standard for comparing samples of paint under review for the OSAC Registry is ASTM E2937-13, on "Infrared Spectroscopy in Forensic Paint Examinations." It raises several of the issues previously noted in the broader ASTM E1610-14 "Standard Guide for Forensic Paint Analysis and Comparison."

The standard for IR spectroscopy presupposes that the goal is “to determine whether any significant differences exist between the known and questioned samples,” where a “significant difference” is “a difference between two samples that indicates that the two samples do not have a common origin.” The criminalist then is expected to declare whether “[s]pectra are dissimilar,” “indistinguishable,” or “inconclusive.”

Although categorical judgments have the benefit of simplicity and familiarity, most literature on forensic inference now maintains that analysts should present statements about the weight of the evidence rather than categorical conclusions about source hypotheses. By considering and presenting the degree to which the observations support one hypothesis as compared to another without dictating the conclusion that must be drawn, the analyst supplies the most information. It is not clear whether the standard rejects this view and is intended to preclude experts from using a weight-of-evidence approach to the comparison process.

The categorical approach that the standard adopts is notable for its vagueness. On its face, the definition of “significant difference” permits analysts to declare that differences with almost no discriminating power are so significant that two samples “do not have a common origin.” This lack of guidance arises because any difference that occurs more frequently among two samples with different origins than among two same-source samples “indicates” different origins and hence is “significant.” For example, a difference that arises 1,000 times more often for different-source samples is indicative of different sources. But so is a difference that arises only 10% more often for different-source samples. Both “indicate” non-association. They differ only in the magnitude of the measure of non-association. The 1,000-times-more-often quantity is a strong indication of non-association, whereas the 10% figure is a weak indication. But in both cases, the differences indicate (to some degree) non-association relative to association.
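
To make the point concrete, here is a minimal sketch in Python of the ratio that the paragraph above describes. The function name and the underlying frequencies are hypothetical; they are chosen only to reproduce the "1,000 times more often" and "10% more often" figures and do not come from the standard or from any study of paint spectra.

```python
import math

def likelihood_ratio(p_given_different: float, p_given_same: float) -> float:
    """How many times more often a difference occurs in different-source
    pairs than in same-source pairs."""
    if p_given_same <= 0.0:
        raise ValueError("p_given_same must be positive")
    return p_given_different / p_given_same

# Hypothetical frequencies, chosen only to reproduce the two ratios in the text.
strong = likelihood_ratio(0.5, 0.0005)  # difference seen 1,000 times more often
weak = likelihood_ratio(0.55, 0.5)      # difference seen 10% more often

for label, lr in (("strong", strong), ("weak", weak)):
    # Any ratio above 1 "indicates" different sources under the literal definition.
    indicates = lr > 1.0
    print(f"{label}: ratio = {lr:.4g}, log10 = {math.log10(lr):.3f}, "
          f"'indicates' different sources: {indicates}")
```

Both ratios exceed 1, so both differences would count as “significant” under a literal reading of the definition, even though their logarithms (a common way to express the weight of evidence) differ by almost three orders of magnitude.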

To avoid this looseness, one might try to read “indicates” as connoting “strongly indicates” or “establishes,” but there is no reason to promulgate an ambiguous standard that requires readers in the fields of forensic science and law to struggle to discern and supply its intended meaning. And, if “establishes” is the intended meaning, then more guidance is needed to help analysts determine, on the basis of objective data about the range of differences seen in same-source and in different-source samples, when a difference is “significant” in the sense of discriminating between the former and the latter types of samples. That is, the standard should supply a validated decision rule; it should present the conditional error probabilities of this decision rule; and it should refer specifically to the studies that have validated it. These features of standards are not absolute requirements for admitting scientific evidence, but they would go far toward assuring courts and counsel that the criteria of “known or potential rate of error” and “standards controlling the technique's operation” enumerated in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 594 (1993), militate in favor of admissibility (and persuasiveness of the testimony if a case goes to trial).

Section of the ASTM Standard does not begin to do this. It offers an unbounded “rule of thumb” — “that the positions of corresponding peaks in two or more spectra be within ±5 cm^-1. For sharp absorption peaks one should use tighter constraints. One should critically scrutinize the spectra being compared if corresponding peaks vary by more than 5 cm^-1. Replicate collected spectra may be necessary to determine reproducibility of absorption position.” What is the basis for “critical scrutiny”? How many replicates are necessary? When are they necessary? What is the accuracy of examiners who follow the open-ended “rule of thumb”?
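
A rough sketch of what the “rule of thumb” amounts to in practice may help show how little it settles. The Python below is an illustration only: the function name and the peak positions are hypothetical, and the way it pairs "corresponding" peaks (by sorted order, assuming equal numbers of peaks) is an assumption that the standard does not specify.

```python
def compare_peak_positions(known, questioned, tolerance=5.0):
    """Pair peaks by sorted position and report each shift in cm^-1.

    Pairing by sorted order is an assumption of this sketch; the
    standard does not say how corresponding peaks are to be matched.
    """
    if len(known) != len(questioned):
        raise ValueError("this sketch assumes the same number of peaks in each spectrum")
    rows = []
    for k, q in zip(sorted(known), sorted(questioned)):
        shift = abs(k - q)
        rows.append((k, q, shift, shift <= tolerance))
    return rows

# Hypothetical peak positions (cm^-1), not taken from any real spectrum.
known_peaks = [1725.0, 1450.0, 1150.0]
questioned_peaks = [1728.0, 1457.0, 1149.0]

rows = compare_peak_positions(known_peaks, questioned_peaks)
for k, q, shift, within in rows:
    note = "within tolerance" if within else "needs critical scrutiny"
    print(f"{k:7.1f} vs {q:7.1f}: shift {shift:.1f} cm^-1 ({note})")
```

The code can only flag a shift of more than 5 cm^-1. It cannot say whether the flagged peak makes the spectra “dissimilar,” because the standard supplies no rule for what follows the scrutiny.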

Given the lack of standards for deciding what is “significant,” the definitions of “dissimilar,” “indistinguishable,” and “inconclusive” are indeterminate. They read:
  • 10.7.1 Spectra are dissimilar if they contain one or more significant differences.
  • 10.7.2 Spectra are indistinguishable if they contain no significant differences.
  • 10.7.3 A spectral comparison is inconclusive if sample size or condition precludes a decision as to whether differences are significant.
Inasmuch as any difference can be considered “significant,” the criminalist has no basis in the standard to declare an inclusion, an exclusion, or an inconclusive outcome. This deprives the standard of the legally desirable status under Daubert of “standards controlling the technique's operation.”

Thursday, March 3, 2016

What Is a "Conservative" Method in Forensic Statistics?

Statistical hypothesis testing involves testing a "null hypothesis" against an "alternative hypothesis." If data are not well outside the range of what would be expected if the null hypothesis is true, then that hypothesis cannot be rejected in favor of the specified alternative. It is usually thought that the more demanding the statistical test, the more "conservative" it is. For example, if a researcher claims to have discovered a new treatment that cures cancer, the null hypothesis is that the new therapy does not work. Sticking with this belief retains the status quo (of not using the novel treatment). In this example, the "conservative" thing to do is to insist on a small p-value (results that have a small probability of arising if the treatment is ineffective) before accepting the alternative.

Does this carry over to forensic science? Is it conservative to retain the null hypothesis unless there is strong evidence against it? Consider the following excerpt from an FBI publication on forensic glass comparisons 1/:
A conservative threshold will differentiate all samples from different sources but may also indicate that a difference exists in specimens that are actually from the same source. A high threshold for differentiation may not be able to differentiate all specimens from sources that are genuinely different but will not differentiate specimens that are actually from the same source.
The "conservative" scientific stance therefore tends to support or preserve the prosecution's case. The state can produce a witness who can testify "conservatively" to finding that the broken window at the crime scene is chemically indistinguishable from the bit of glass removed from the defendant's sweatshirt.
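
The trade-off between thresholds that the FBI excerpt describes can be sketched numerically. The Python below is a toy model, not the FBI's procedure: it assumes that the measured difference between two specimens is normally distributed around zero, with a small standard deviation for same-source pairs and a larger one for different-source pairs. The function name, the standard deviations, and the thresholds are all hypothetical.

```python
from statistics import NormalDist

def error_rates(threshold, sd_same, sd_diff):
    """Return (false_differentiation, missed_differentiation).

    Two specimens are "differentiated" when |difference| > threshold.
    Assumes the difference is Normal(0, sd_same) for same-source pairs
    and Normal(0, sd_diff) for different-source pairs (toy model).
    """
    same = NormalDist(0.0, sd_same)
    diff = NormalDist(0.0, sd_diff)
    # Same-source pair wrongly declared different.
    false_differentiation = 2.0 * (1.0 - same.cdf(threshold))
    # Different-source pair left "indistinguishable".
    missed_differentiation = diff.cdf(threshold) - diff.cdf(-threshold)
    return false_differentiation, missed_differentiation

# Hypothetical thresholds and spreads, in arbitrary measurement units.
low = error_rates(threshold=1.0, sd_same=1.0, sd_diff=5.0)
high = error_rates(threshold=3.0, sd_same=1.0, sd_diff=5.0)

for label, (false_diff, missed) in (("low threshold", low), ("high threshold", high)):
    print(f"{label}: same-source pairs differentiated = {false_diff:.4f}, "
          f"different-source pairs not differentiated = {missed:.4f}")
```

Raising the threshold lowers one error rate only by raising the other, so calling either choice "conservative" depends on which error one regards as the more serious.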

On the other hand, a committee of the National Academy of Sciences that studied forensic DNA testing defined "conservative" in terms of impact on a defendant's claim of innocence 2/:
Conservative—favoring the defendant. A conservative estimate is deliberately chosen to be more favorable to the defendant than the best (unbiased) estimate would be.
Plainly, the FBI document's use of "conservative" is difficult to square with the NAS committee's definition of the word. The FBI document treats the hypothesis supporting the prosecution's case as the status quo that should be retained unless there is strong evidence to the contrary.

This use of the prosecution's hypothesis that the broken window is the source of the incriminating fragment as the null hypothesis is not necessarily wrong, but it engenders confusion. The confusion can be dispelled if the presentation of the findings includes a statement of how rare or common "indistinguishable" windows are in a relevant population. Evaluating the data about the glass thus would have two steps. In step 1, the data are classified as "indistinguishable" (or not). In step 2, if the samples are indistinguishable, a random match probability is provided to indicate the probative value of that finding with respect to the hypothesis that the glass originated from the broken window.

Of course, if one could articulate the probability of the data given the hypothesis that the source is the broken window versus the probability of the data given the hypothesis that the glass associated with the defendant had a different origin, this two-step process would not be needed. The expert could present these probabilities.
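
The contrast between the two-step process and the direct presentation of probabilities can be sketched as follows. This Python is a toy illustration under stated assumptions: the function names are hypothetical, the random match probability of 0.01 is invented, and the direct calculation assumes that the measured difference between the two glass samples is normally distributed around zero with a hypothetical standard deviation under each hypothesis.

```python
from statistics import NormalDist

def two_step_ratio(p_match_if_same, random_match_probability):
    """Ratio implied by reporting a match plus a random match probability."""
    return p_match_if_same / random_match_probability

def direct_ratio(difference, sd_same, sd_diff):
    """Probability density of the measured difference under the same-source
    hypothesis divided by its density under the different-source hypothesis
    (toy normal model, both centered on zero)."""
    same = NormalDist(0.0, sd_same)
    diff = NormalDist(0.0, sd_diff)
    return same.pdf(difference) / diff.pdf(difference)

# Step 1 declared the samples "indistinguishable"; step 2 reports how rare that is.
# Both numbers below are hypothetical.
implied = two_step_ratio(p_match_if_same=1.0, random_match_probability=0.01)

# Direct approach: no match/no-match cutoff, just the two probabilities.
small_difference = direct_ratio(0.5, sd_same=1.0, sd_diff=5.0)
large_difference = direct_ratio(4.0, sd_same=1.0, sd_diff=5.0)

print(f"two-step implied ratio: {implied:.1f}")
print(f"direct ratio, small difference: {small_difference:.3f}")
print(f"direct ratio, large difference: {large_difference:.5f}")
```

In the direct approach the ratio moves smoothly from support for a common source to support for different sources as the measured difference grows, so no threshold, "conservative" or otherwise, has to be chosen.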

  1. Maureen C. Bottrell, Forensic Glass Comparison: Background Information Used in Data Interpretation, 11 Forensic Sci. Communications No. 2 (2009).
  2. National Research Council, Committee on The Evaluation of Forensic DNA Evidence: An Update, The Evaluation of Forensic DNA Evidence 215 (1996).