Saturday, April 16, 2016

The Department of Justice's Plan for a Forensic Science Discipline Review

On March 21, the Department of Justice announced to the National Commission on Forensic Science that it will expand its review of forensic testimony by the FBI Laboratory beyond hair matching to widely used techniques such as fingerprint examinations and bullet tracing. Officials also said that if the initial review finds systemic problems in a forensic discipline, expert testimony could be reviewed from laboratories beyond the FBI that perform analyses for DOJ. 1/
The head of the Department's Office of Legal Policy welcomed input from the Commission on the following topics:
  • How to prioritize disciplines
  • Scope of time period
  • Sampling particular types of cases
  • Consideration of inaccuracies
  • Levels of review
  • Legal and/or forensic reviewers
  • External review processes
  • Ensuring community feedback on methodology
  • Duty/process to inform parties 2/
On April 8, the Department quietly posted a Notice of Public Comment Period on the Presentation of the Forensic Science Discipline Review Framework in the Federal Register. The public comment period ends on May 9.

According to an earlier statement of Deputy Attorney General Sally Yates, the review is intended to "advance the practice of forensic science by ensuring DOJ forensic examiners have testified as appropriate in legal proceedings." Obviously, the criteria for identifying what is and is not "appropriate" will be critical. For example, which of the following examples of testimony about glass fragments (or paraphrases of the testimony) would be deemed inappropriate?
  • "In my opinion the refractory indices of the two glasses are consistent and they could have common origin." Varner v. State, 420 So.2d 841 (Ala. Ct. Crim. App. 1982). 
  • "Test comparisons of the glass removed from the bullet and that found in the pane on the back door, through which the unaccounted-for bullet had passed, revealed that all of their physical properties matched, with no measurable discrepancies. Based upon F.B.I. statistical information, it was determined that only 3.8 out of 100 samples could have the same physical properties, based upon the refractive index test alone, which was performed." Johnson v. State, 521 So.2d 1006 (Ala. Ct. Crim. App. 1986).
  • "Bradley was able to opine, to a reasonable degree of scientific certainty, that the glass standard and the third fragment had a 'good probability of common origin.'" People v. Smith, 968 N.E.2d 1271 (Ill. App. Ct. 2012).
  • "Blair Schultz, an Illinois State Police forensic chemist, compared a piece of standard laminated glass from defendant's windshield to a piece of glass from Pranaitis' clothing. He found them to have the same refractive index, which means that the two pieces of glass could have originated from the same source. The likelihood of this match was one in five, meaning that one out of every five pieces of laminated glass would have the same refractive index." People v. Digirolamo, 688 N.E.2d 116 (Ill. 1997).
  • "[O]ne of the glass fragments found in appellant's car was of common origin with glass from the victim's broken garage window. The prosecutor asked her if she were to break one hundred windows at random in Allen County, what would be the percentage of matching specimens she would expect. Over appellant's objection, she answered that if one hundred windows were broken, six of the windows would have the properties she mentioned." Hicks v. State, 544 N.E.2d 500, 504 (Ind. 1989).
Hopefully, the Department will learn from the FBI/DOJ Microscopic Hair Comparison Analysis Review 3/ and
(1) make public the detailed criteria it employs,
(2) use a system with measured reliability for applying these criteria, and
(3) make the transcripts or other materials under review readily available to the public.
Notes
  1. Spencer S. Hsu, Justice Department Frames Expanded Review of FBI Forensic Testimony, Wash. Post, Mar. 21, 2016. 
  2. Office of Legal Policy, U.S. Department of Justice, Presentation of the Forensic Science Discipline Framework to the National Commission on Forensic Science, Mar. 21, 2016.
  3. See Ultracrepidarianism in Forensic Science: The Hair Evidence Debacle, Washington & Lee Law Review (online), Vol. 72, No. 2, pp. 227-254, September 2015.
Related Posting
"Stress Tests" by the Department of Justice and the FBI's "Approved Scientific Standards for Testimony and Reports", Feb. 25, 2016

Saturday, April 2, 2016

Sample Evidence: What’s Wrong with ASTM E2548-11 Standard Guide for Sampling Seized Drugs?

Samuel Johnson once observed that “You don't have to eat the whole ox to know that it is tough.” Or maybe he didn't say this, 1/ but the idea applies to many endeavors. One of them is testing of seized drugs. The law needs to—and generally does—recognize the value of surveys and samples in drug and many other kinds of cases. 2/ If the quantity of seized drugs is large, it is impractical and typically unnecessary to test every bit of the materials. Clear guidance on how to select samples from the population of seized matter would be helpful to courts and laboratories alike.

To accomplish this goal, the Chemistry and Instrumental Analysis Subject Area Committee of the Organization of Scientific Area Committees for Forensic Science (OSAC) has recommended the addition of ASTM International’s Standard Guide for Sampling Seized Drugs for Qualitative and Quantitative Analysis (known as ASTM E2548-11) to the National Institute of Standards and Technology (NIST) Registry of Approved Standards. Unfortunately, this "Standard Guide" is vague in its guidance, incomplete and out of date in its references, and nonstandard in its nomenclature for sampling.

The Standard does not purport to prescribe "specific sampling strategies." 3/ Instead, it instructs “[t]he laboratory ... to develop its own strategies” and “recommend[s] that ... key points be addressed.” There are only two key points. 4/ One is that “[s]tatistically selected units shall be analyzed to meet Practice E2329 if statistical inferences are to be made about the whole population.” 5/ But ASTM E2329 merely describes the kinds of analytical tests that can or should be performed on samples. It reveals nothing about how to draw samples from a population. So far, ASTM E2548 offers no guidance about sampling.

The other “key point” is that “[s]ampling may be statistical or non-statistical.” This statement is tautological (A is either X or not-X), X is never defined, and an explanatory note intensifies the ambiguity. It states that “[f]or the purpose of this guide, the use of the term statistical is meant to include the notion of an approach that is probability-based.” 6/ Does “probability-based” mean probability sampling (the subject of ASTM E105-10)? At least the latter term has a well-defined meaning in sampling theory. 7/ It means that every unit in the sampling frame has a known probability of being drawn.

But even if this is what the ASTM E2548-11 Standard Guide means by “probability-based,” the phrase is not congruent with "statistical." The note indicates that even sampling that is not “probability-based” still can be considered "statistical sampling." Later parts of the Standard allow inferences to populations to be made from "statistical" samples but not from "non-statistical" ones. Using an undefined notion of "statistical" and "non-statistical" as the fundamental organizing principle departs from conventional statistical terminology and reasoning. The usual understanding of sampling differentiates between probability samples -- for which sampling error readily can be quantified -- and other forms of sampling (whether systematic or ad hoc) -- for which statistical analysis depends on the assumption that the sample is the equivalent of a probability sample.

Thus, the statistical literature on sampling commonly explains that
If the probability of selection for each unit is unknown, or cannot be calculated, the sample is called a non-probability sample. Non-probability samples are often less expensive, easier to run and don't require a frame. [¶] However, it is not possible to accurately evaluate the precision (i.e., closeness of estimates under repeated sampling of the same size) of estimates from non-probability samples since there is no control over the representativeness of the sample. 8/
In contrast, because the ASTM Standard does not focus on probability sampling as opposed to other "statistical sampling," the laboratory personnel (or the lawyer) reading the standard never learns that "it is dangerous to make inferences about the target population on the basis of a non-probability sample." 9/
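To make the textbook distinction concrete, here is a minimal sketch in Python (mine, not the Standard's) of probability sampling; the frame of numbered bags is hypothetical:

    # Simple random sampling is the paradigm of probability sampling:
    # every unit in the frame has a known inclusion probability (n/N).
    import random

    frame = [f"bag_{i:03d}" for i in range(1, 101)]  # frame of N = 100 seized units
    n = 10
    sample = random.sample(frame, k=n)  # drawn without replacement
    print(sample)
    print(f"inclusion probability for every unit: {n}/{len(frame)} = {n/len(frame):.2f}")

Because each unit's selection probability is known (here, 0.10), the precision of estimates from such a sample can be quantified; that is exactly what a non-probability sample fails to offer.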

Indeed, Figure 1 of ASTM E2548 introduces further confusion about "statistical sampling." In this figure, a statistical “sampling plan” is either “Hypergeometric,” “Bayesian,” or “Other probability-based.” But the sampling distribution of a statistic is not a “sampling plan” (although it could inform one). A sampling plan should specify the sample size (or a procedure for stopping the sampling if results on the sampled items up to that point make further testing unnecessary). For sampling from a finite population without replacement, the hypergeometric probability distribution applies to sample-size computations and estimates of sampling error. But how does that make the sampling plan hypergeometric? One type of “sampling plan” would be to draw a simple random sample of a size computed to have a good chance of producing a representative sample. Describing a plan for simple random sampling, stratified random sampling, or any other design as “hypergeometric,” “Bayesian,” or “other” is not helpful.
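To illustrate where the hypergeometric distribution actually enters, here is a sketch of the kind of sample-size computation alluded to above. The 90% proportion, the 95% confidence level, and the seizure size are my illustrative choices; ASTM E2548-11 prescribes no such computation:

    # Find the smallest simple-random-sample size n such that, if all n
    # units sampled without replacement from a seizure of N units test
    # positive, one can report with 95% confidence that more than 90%
    # of the seizure is positive.
    from scipy.stats import hypergeom

    def sample_size(N, min_prop=0.90, confidence=0.95):
        K = int(min_prop * N)  # worst case consistent with "at most 90% positive"
        for n in range(1, N + 1):
            # P(all n sampled units positive | exactly K of N are positive)
            p_all_positive = hypergeom.pmf(n, N, K, n)
            if p_all_positive <= 1 - confidence:
                return n
        return N  # nothing short of a census suffices

    print(sample_size(100))  # required n for a 100-unit seizure

The hypergeometric distribution figures only in computing n and the attendant error probability; the sampling plan itself is simple random sampling.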

Similarly confusing is the figure’s trichotomy of “non-statistical” into the following “plans”: “Square root N,” “Management Directive,” and “Judicial Requirements.” Using the old √N + 1 rule of thumb for determining sample size may be sub-optimal, 10/ but it is “statistical” -- it uses a statistical computation to establish a sample size. So do any judicial or administrative demands to sample a fixed percentage of the population (an approach that a Standard should deprecate). No matter how one determines the sample size, if probability sampling has been conducted, statistical inferences and estimates have the same meaning.
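Indeed, the rule is nothing but a statistical formula for sample size, as a quick computation (with hypothetical population sizes) shows:

    import math

    # The "square root of N plus one" rule fixes the sample size by formula,
    # without regard to any inferential goal.
    for N in (50, 100, 500, 1000):
        n = math.isqrt(N) + 1
        print(f"N = {N:4d}: the rule samples {n} units ({100 * n / N:.0f}% of the population)")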

Also puzzling are the assertions that “[a] population can consist of a single unit,” 11/ and that “numerous sampling plans ... are applicable to single and multiple unit populations.” 12/ If a population consists of “a single unit” (as the term is normally used), 13/ then a laboratory that tests this unit has conducted a census. The study design does not involve sampling, so there can be no sampling error.

When it comes to the issue of reporting quantities such as sampling error, the ASTM Standard is woefully inadequate. The entirety of the discussion is this:
7.1 Inferences based on use of a sampling plan and concomitant analysis shall be documented.

8.1 Sampling information shall be included in reports.
8.1.1 Statistically Selected Sample(s)—Reporting statistical inferences for a population is acceptable when testing is performed on the statistically selected units as stated in 6.1 above [that is, according to a standard that is on the NIST Registry with a disclaimer by NIST]. The language in the report must make it clear to the reader that the results are based on a sampling plan.
8.1.2 Non-Statistically Selected Sample(s)—The language in the report must make it clear to the reader that the results apply to only the tested units. For example, 2 of 100 bags were analyzed and found to contain Cocaine.
These remarks are internally problematic. For example, why would an analyst report the population size, the sample size, and the sample data for “non-statistical” samples but not for “statistical” ones?

More fundamentally, to be helpful to the forensic-science and legal communities, a standard has to consider how the results of the analyses should be presented in a report and in court. Should not the full sampling plan be stated — the mechanism for drawing samples (e.g., blinded, which the ASTM Standard calls “black box” sampling, or selecting from numbered samples by a table of random numbers, which it portrays as not “practical in all cases”); the sample size; and the kind of sampling (simple random, stratified, etc.)? It is not enough merely to state that “the results are based on a sampling plan.”

When probability sampling has been employed, a sound foundation for inferences about population parameters will exist. But how should such inferences be undertaken and presented? A Neyman-Pearson confidence interval? With what confidence coefficient? A frequentist test of a hypothesis? Explained how? A Bayesian conclusion such as “There is a probability of 90% that the weight of the cocaine in the shipment seized exceeds X”? The ASTM Standard seems to contemplate statements about “[t]he probability that a given percentage of the population contains the drug of interest or is positive for a given characteristic,” but it does not even mention what goes into computing a Bayesian credible interval or the like. 14/
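For concreteness, here is one sketch of what such a computation could involve: a conjugate beta-binomial model with a uniform prior. The prior, the sample size, and the 90% threshold are all my assumptions; the finite-population analysis in Curran's paper is the more exact treatment:

    # Posterior probability that more than 90% of a seizure is positive,
    # given a uniform Beta(1, 1) prior on the positive proportion and a
    # sample of n units, all of which tested positive.
    from scipy.stats import beta

    n = 25                      # sampled units, all positive
    posterior = beta(1 + n, 1)  # Beta(1 + successes, 1 + failures)
    print(posterior.sf(0.90))   # P(proportion > 0.90 | data), about 0.94

A standard intended to support statements like the one it contemplates would have to settle such modeling choices, which ASTM E2548-11 never mentions.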

The OSAC Newsletter proudly states that "[a] standard or guideline that is posted on either Registry demonstrates that the methods it contains have been assessed to be valid by forensic practitioners, academic researchers, measurement scientists, and statisticians through a consensus development process that allows participation and comment from all relevant stakeholders." The experience with ASTM Standards 2548 and 2329 suggests that even before a proposed standard can be approved by a Scientific Area Committee, the OSAC process should provide for a written review of statistical content by a group of statisticians. 15/

Disclosure and disclaimer: I am a member of the OSAC Legal Resource Committee. The information and views presented here do not represent those of, and are not necessarily shared by, NIST, OSAC, any unit within these organizations, or any other organization or individuals.

Notes
  1. According to the anonymous webpage Apocrypha: The Samuel Johnson Sound Bite Page, the aphorism is "apocryphal because it's not found in his works, letters, or contemporary biographies about Samuel Johnson. But it is similar to something he once said about Mrs. Montague's book on Shakespeare: 'I have indeed, not read it all. But when I take up the end of a web, and find it packthread, I do not expect, by looking further, to find embroidery.'"
  2. See, e.g., David H. Kaye, David E. Bernstein & Jennifer L. Mnookin, The New Wigmore: A Treatise on Evidence: Expert Evidence (2d ed. 2011); Hans Zeisel & David H. Kaye, Prove It with Figures: Empirical Methods in Law and Litigation (1997).
  3. See ASTM E2548-11, § 4.1. 
  4. Id., § 4.2.
  5. § 4.2.2.
  6. § 4.2.1 (emphasis added).
  7. E.g., Statistics Canada, Probability Sampling, July 23, 2013:
    Probability sampling involves the selection of a sample from a population, based on the principle of randomization or chance. Probability sampling is more complex, more time-consuming and usually more costly than non-probability sampling. However, because units from the population are randomly selected and each unit's probability of inclusion can be calculated, reliable estimates can be produced along with estimates of the sampling error, and inferences can be made about the population.
  8. National Statistical Service (Australia), Basic Survey Design, http://www.nss.gov.au/nss/home.nsf/SurveyDesignDoc/B0D9A40C6B27487BCA2571AB002479FE?OpenDocument (emphasis in original).
  9. Id.
  10. See J. Muralimanohar & K. Jaianan, Determination of Effectiveness of the “Square Root of N Plus One” Rule in Lot Acceptance Sampling Using an Operating Characteristic Curve, Quality Assurance Journal, 14(1-2): 33-37, 2011.
  11. § 5.2.2.
  12. § 5.3.
  13. Laboratory and Scientific Section, United Nations Office on Drugs and Crime, Guidelines on Representative Drug Sampling 3 (2009).
  14. Cf. James M. Curran, An Introduction to Bayesian Credible Intervals for Sampling Error in DNA Profiles, Law, Probability and Risk, 4, 115-126, 2005, doi:10.1093/lpr/mgi009.
  15. Of course, no process is perfect, but early statistical review can make technical problems more apparent. Cf. Sam Kean, Whistleblower Lawsuit Puts Spotlight On FDA Technical Reviews, Science, Feb. 2, 2012.

Saturday, March 26, 2016

NIST Distances Itself from the First OSAC-approved Forensic Science Standard

On January 11, 2016, a group created by the federal government to develop better standards for forensic science approved -- without changing a single word of substance -- a standard previously promulgated by a committee of ASTM International (formerly the American Society for Testing and Materials). The federally mandated body that showcased this standard is the Organization of Scientific Area Committees for Forensic Science (OSAC). It is supported, administratively and financially, by the Department of Commerce's highly respected National Institute of Standards and Technology (NIST). 1/

The approved standard has the ponderous name of ASTM E2329−14 Standard Practice for Identification of Seized Drugs. To my eye, it looks like an odd choice for the first (and thus far, only) entry in the OSAC Registry of Approved Standards. Why so?

For one thing, this one standard itself seems to approve of nine other ASTM standards -- none of which have been vetted by OSAC. Second, the Forensic Science Standards Board (FSSB) approved the standard over objections from two of the three OSAC "resource committees" -- its Legal Resources Committee and its Human Factors Committee. 2/ Third, the standard permits definitive conclusions based on standardless, subjective assessments of botanical specimens. Fourth, without discussing or citing any studies of error probabilities for the methods involved, the standard states or suggests that false-positive errors will not occur. Thus, an earlier posting tartly contrasted some of the language in the Standard with the admonition in the 2009 report of the National Research Council Committee on Identifying the Needs of the Forensic Science Community that "[a]ll results for every forensic science method should indicate the uncertainty in the measurements that are made, and studies must be conducted that enable the estimation of those values."

On March 17, 2016, more than two months after the NIST-created OSAC adopted this first standard, NIST issued a public statement disavowing the standard as written because "concerns have been raised that some of the language in the standard is not scientifically rigorous." 3/ Like the National Research Council, NIST appreciates that "no measurement, qualitative or quantitative, should be characterized as without the risk of error or uncertainty." 4/ The statement adds that "NIST and the FSSB have independently asked that ASTM review the language."

The FSSB's action is, in a way, quite puzzling. Why would the FSSB want the organization that already wrote and approved the standard that the FSSB then reviewed and adopted as a registry-ready gold standard to "review" it? It is not as if some new scientific research suddenly undermined the standard, requiring it to be revised. And even if new information had surfaced after the FSSB voted, the appropriate response would have been to take the standard down until the issue could be resolved.

Moreover, why would NIST and the FSSB "independently" ask for ASTM review when the subcommittee that wanted the standard on the registry already had promised to secure revisions through ASTM? The record before the FSSB included the following response to criticisms filed by the Legal Resource Committee:
"The Seized Drug subcommittee intends to clarify the quoted language pertaining to uncertainty and error during the next ASTM revision of this document." E2329-14 Seized Drugs Response to LRC Comments FINAL.pdf (277K), SAC Chemistry/Instrument Analysis, Jan. 11, 2016
Does the fact that NIST and the FSSB have added their voices to that of the OSAC Subcommittee on Seized Drugs mean that revisions that clearly should have been made before posting a standard to the repository will occur any sooner? And what can justify leaving a standard that admittedly needs "clarification" on the registry pending the requisite rewriting? Cannot laboratories continue to use the ASTM standard to guide them just as they did before the OSAC registry existed?

Whatever the answers to these questions may be, NIST's reservations about the first OSAC standard, although not spelled out in full, were the subject of questions at a recent meeting of the National Commission on Forensic Science. On March 21, 2016, Commissioner Marc LeBeau asked presenters from NIST whether NIST planned to post statements of agreement as well as disagreement for every future OSAC-approved standard. I cannot locate a transcript or videotape of the meeting, but my recollection is that the answer was essentially "no."

No doubt, NIST hopes that the kerfuffle over ASTM E2329−14 is a one-off, but the apparent inclination of OSAC subcommittees to try to import unimproved ASTM standards into the registry does not bode well. The latest example is ASTM E2388-11 Standard Guide for Minimum Training Requirements for Forensic Document Examiners. It is up for consideration as an OSAC standard (and for public comment during the next couple of weeks) even though OSAC has no approved standard on what document examiners are expected to do once they are trained.

Disclosure and disclaimer: I am a member of the OSAC Legal Resource Committee. The information and views presented here do not represent those of, and are not necessarily shared by, NIST, OSAC, any unit within these organizations, or any other organization or individuals.

Notes
  1. See OSAC Subcommittee and Scientific Area Committee (SAC) Chairs, https://rticqpub1.connectsolutions.com/content/connect/c1/7/en/events/event/shared/1187757659/speaker_info.html?sco-id=1187765255 ("OSAC is part of an initiative by NIST and the Department of Justice to strengthen forensic science in the United States. The organization is a collaborative body of more than 500 forensic science practitioners and other experts who represent local, state, and federal agencies; academia; and industry. NIST has established OSAC to support the development and promulgation of forensic science consensus documentary standards and guidelines, and to ensure that a sufficient scientific basis exists for each discipline.").
  2. The third resource committee, the Quality Infrastructure Committee (QIC), does not seem to comment on the substance of proposed standards.
  3. NIST Statement on ASTM Standard E2329-14, Mar. 17, 2016, http://www.nist.gov/forensics/nist-statement-on-astm-e2329-14.cfm
  4. That said, the NIST statement cautions that "It is important to note that NIST is not contesting results obtained from seized evidence using the standard." Id.

Monday, March 7, 2016

Hot Paint: Another ASTM Standard (E2937-13) that Needs More Work

A second standard for comparing samples of paint under review for the OSAC Registry is ASTM E2937-13, on "Infrared Spectroscopy in Forensic Paint Examinations." It raises several of the issues previously noted in the broader ASTM E1610-14 "Standard Guide for Forensic Paint Analysis and Comparison."

The standard for IR spectroscopy presupposes that the goal is “to determine whether any significant differences exist between the known and questioned samples,” where a “significant difference” is “a difference between two samples that indicates that the two samples do not have a common origin.” The criminalist then is expected to declare whether “[s]pectra are dissimilar,” “indistinguishable,” or “inconclusive.”

Although categorical judgments have the benefit of simplicity and familiarity, most literature on forensic inference now maintains that analysts should present statements about the weight of the evidence rather than categorical conclusions about source hypotheses. By considering and presenting the degree to which the observations support one hypothesis as compared to another without dictating the conclusion that must be drawn, the analyst supplies the most information. It is not clear whether the standard rejects this view and is intended to preclude experts from using a weight-of-evidence approach to the comparison process.

The categorical approach that the standard adopts is notable for its vagueness. On its face, the definition of “significant difference” permits analysts to declare that differences with almost no discriminating power are so significant that two samples “do not have a common origin.” This lack of guidance arises because any difference that occurs more frequently between samples with different origins than between same-source samples “indicates” different origins and hence is “significant.” For example, a difference that arises 1,000 times more often for different-source samples is indicative of different sources. But so is a difference that arises only 10% more often for different-source samples. Both “indicate” non-association. They differ only in the magnitude of the measure of non-association. The 1,000-times-more-often quantity is a strong indication of non-association, whereas the 10% figure is a weak indication. But in both cases, the differences indicate (to some degree) non-association relative to association.

To avoid this looseness, one might try to read “indicates” as connoting “strongly indicates” or “establishes,” but there is no reason to promulgate an ambiguous standard that requires readers in the fields of forensic science and law to struggle to discern and supply its intended meaning. And, if “establishes” is the intended meaning, then more guidance is needed to help analysts determine, on the basis of objective data about the range of differences seen in same-source and in different-source samples, when a difference is “significant” in the sense of discriminating between the former and the latter types of samples. That is, the standard should supply a validated decision rule; it should present the conditional error probabilities of this decision rule; and it should refer specifically to the studies that have validated it. These features of standards are not absolute requirements for admitting scientific evidence, but they would go far to assuring courts and counsel that the criteria of “known or potential rate of error” and “standards controlling the technique's operation” enumerated in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 594 (1993), militate in favor of admissibility (and persuasiveness of the testimony if a case goes to trial).

Section 10.6.1.1 of the ASTM Standard does not begin to do this. It offers an unbounded “rule of thumb” — “that the positions of corresponding peaks in two or more spectra be within ±5 cm^-1. For sharp absorption peaks one should use tighter constraints. One should critically scrutinize the spectra being compared if corresponding peaks vary by more than 5 cm^-1. Replicate collected spectra may be necessary to determine reproducibility of absorption position.” What is the basis for “critical scrutiny”? How many replicates are necessary? When are they necessary? What is the accuracy of examiners who follow the open-ended “rule of thumb”?
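To see how little the rule of thumb constrains the analyst, consider one literal reading of it as a decision rule; the peak positions and the pairing of "corresponding" peaks below are my hypotheticals:

    # One literal reading of § 10.6.1.1: corresponding absorption peaks in
    # two IR spectra agree if their positions differ by no more than 5 cm^-1.
    # The Standard does not say how to pair peaks, when to tighten the
    # tolerance for sharp peaks, or how many replicates settle reproducibility.
    def peaks_within_tolerance(known, questioned, tol=5.0):
        return all(abs(k - q) <= tol for k, q in zip(known, questioned))

    known_peaks      = [1730.0, 1450.2, 1160.7]  # hypothetical positions (cm^-1)
    questioned_peaks = [1733.4, 1448.1, 1163.0]
    print(peaks_within_tolerance(known_peaks, questioned_peaks))  # True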

Given the lack of standards for deciding what is “significant,” the definitions of “dissimilar,” “indistinguishable,” and “inconclusive” are indeterminate. They read:
  • 10.7.1 Spectra are dissimilar if they contain one or more significant differences.
  • 10.7.2 Spectra are indistinguishable if they contain no significant differences.
  • 10.7.3 A spectral comparison is inconclusive if sample size or condition precludes a decision as to whether differences are significant.
Inasmuch as any difference can be considered “significant,” the criminalist has no basis in the standard to declare an inclusion, an exclusion, or an inconclusive outcome. This deprives the standard of the legally desirable status under Daubert of “standards controlling the technique's operation.”

Thursday, March 3, 2016

What Is a "Conservative" Method in Forensic Statistics?

Statistical hypothesis testing pits a "null hypothesis" against an "alternative hypothesis." If the data are not well outside the range of what would be expected were the null hypothesis true, then that hypothesis cannot be rejected in favor of the specified alternative. It is usually thought that the more demanding the statistical test, the more "conservative" it is. For example, if a researcher claims to have discovered a new treatment that cures cancer, the null hypothesis is that the new therapy does not work. Sticking with this belief retains the status quo (of not using the novel treatment). In this example, the "conservative" thing to do is to insist on a small p-value (results that have a small probability of arising if the treatment is ineffective) before accepting the alternative.
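In the cancer example, the computation might look like the following minimal sketch; the 20% historical remission rate and the trial outcome are invented for illustration:

    # Null hypothesis: the new treatment does no better than a historical
    # 20% remission rate. Reject only if the observed data would be very
    # surprising (small p-value) were the null hypothesis true.
    from scipy.stats import binomtest

    result = binomtest(k=34, n=100, p=0.20, alternative="greater")
    print(result.pvalue)  # well below 0.05, so the null is rejected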

Does this carry over to forensic science? Is it conservative to retain the null hypothesis unless there is strong evidence against it? Consider the following excerpt from an FBI publication on forensic glass comparisons 1/:
A conservative threshold will differentiate all samples from different sources but may also indicate that a difference exists in specimens that are actually from the same source. A high threshold for differentiation may not be able to differentiate all specimens from sources that are genuinely different but will not differentiate specimens that are actually from the same source.
The "conservative" scientific stance therefore tends to support or preserve the prosecution's case. The state can produce a witness who can testify "conservatively" to finding that the broken window at the crime scene is chemically indistinguishable from the bit of glass removed from the defendant's sweatshirt.

On the other hand, a committee of the National Academy of Sciences that studied forensic DNA testing defined "conservative" in terms of impact on a defendant's claim of innocence 2/:
Conservative—favoring the defendant. A conservative estimate is deliberately chosen to be more favorable to the defendant than the best (unbiased) estimate would be.
Plainly, the FBI document's use of "conservative" is difficult to square with the NAS committee's definition of the word. The FBI document treats the hypothesis supporting the prosecution's case as the status quo that should be retained unless there is strong evidence to the contrary.

This use of the prosecution's hypothesis that the broken window is the source of the incriminating fragment as the null hypothesis is not necessarily wrong, but it engenders confusion. The confusion can be dispelled if the presentation of the findings includes a statement of how rare or common "indistinguishable" windows are in a relevant population. Evaluating the data about the glass thus would have two steps. In step 1, the data are classified as "indistinguishable" (or not). If the samples are indistinguishable, then, in step 2, a random match probability is provided to indicate the probative value of this finding with respect to the hypothesis that the glass originated from the broken window.

Of course, if one could articulate the probability of the data given the hypothesis that the broken window is the source, as well as the probability of the data given the hypothesis that the glass associated with the defendant had a different origin, this two-step process would not be needed. The expert could present these probabilities (or their ratio), as sketched below.
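A sketch with invented numbers shows how the two presentations relate:

    # Two-step presentation: declare the samples "indistinguishable," then
    # report a random match probability (step 2). The single-step alternative
    # reports a likelihood ratio for the data under the competing hypotheses.
    # All numbers below are hypothetical.
    p_indist_if_same_source = 0.99  # sensitivity of the comparison protocol
    random_match_prob       = 0.05  # share of unrelated windows that would
                                    # be "indistinguishable"

    likelihood_ratio = p_indist_if_same_source / random_match_prob
    print(f"LR = {likelihood_ratio:.0f}")  # ~20: the findings are about 20
                                           # times more probable if the window
                                           # is the source than if it is not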

References
  1. Maureen C. Bottrell, Forensic Glass Comparison: Background Information Used in Data Interpretation, 11 Forensic Sci. Communications No. 2 (2009)
  2. National Research Council, Committee on The Evaluation of Forensic DNA Evidence: An Update, The Evaluation of Forensic DNA Evidence 215 (1996)

Tuesday, March 1, 2016

"Reasonable Scientific Certainty," the NCFS, the Law of the Courtroom," and that Pesky Passive Voice

In a posting last week about a proposed National Commission on Forensic Science recommendation for the Attorney General to take the position that an expert witness "is not required" to utter words like "reasonable scientific certainty" as a condition for the admission of the testimony, I discussed a set of cases cited in a public comment from a Commissioner. The author of the public comment wrote me that I was mistaken in at least one respect. He does not maintain that "Recommendation #1 would require the Department of Justice to argue for overturning existing law that 'seem[s] to require' these phrases in some forensic-science identification fields" and that attributing this view to him
takes my comment out of context and mischaracterizes it. My comments relating to the cited federal district court opinions stated that 'a number of federal judges apparently endorse -- and some still seem to require -- the use of these phrases in their courtrooms ... . I didn't argue that 'existing law' 'seems to require' these phrases in some forensic-science identification fields.'  Instead, I said that 'a number of federal judges ... seem to require' the use of these phrases. I think that it's fair to say that 'existing law' on this topic and what a given federal district court judge may believe existing law to be, are not necessarily the same thing. (In my experience they can be far from the same thing). Case law does not always (and often does not) translate into the law of the courtroom -- the way trial judges interpret and apply federal rules and case law in trial practice. That was the point I was attempting to make -- not that 'Recommendation #1 would require the Department of Justice to argue for overturning existing law.' That is clearly not the case. However, DOJ attorneys may nevertheless be required to utter those 'magic words' in a given courtroom and in a given case.
It certainly is true that some judges will want the proponents of the evidence to swear to its "reasonable scientific certainty." And, unless a higher court has explicitly ruled that this phraseology should not be used  -- as some have and as is appropriate for the reasons stated in a separate views document that the Commission is developing -- the trial judge may believe that he or she has the legal prerogative to make it "the law of the courtroom." The Commission recommendation recognizes as much, for it explicitly allows the prosecutor to put on such testimony when directed to do so by the judge in the case.

But does this judge's edict really have the status of "law"? Under the crude legal-realist perspective that the law is whatever the court says it is, I suppose one would have to call it "law." However, it seems clearer to call it a judicial practice that is ripe for change (without having to amend any rules of evidence or change any binding case law). To encourage this change, and to the extent that the passive voice in Recommendation 1(b) introduces ambiguity, it would be better for the Commission simply to say that "The Attorney General should direct all attorneys appearing on behalf of the Department of Justice ... (b) to assert the legal position that such terminology should not be required."

Friday, February 26, 2016

Is "Reasonable Scientific Certainty" Unreasonable?

Next month, the National Commission on Forensic Science is expected to vote on a proposal to make three recommendations about the testimonial use of phrases such as "to a reasonable degree of scientific certainty" and "to a reasonable degree of [discipline] certainty":
Recommendation #1: The Attorney General should direct all attorneys appearing on behalf of the Department of Justice (a) to forego use of these phrases when presenting forensic discipline testimony unless directly required by judicial authority as a condition of admissibility for the witness’ opinion or conclusion, and (b) to assert the legal position that such terminology is not required and is indeed misleading.

Recommendation #2: The Attorney General should direct all forensic science service providers and forensic science medical providers employed by Department of Justice not to use such language in reports or couch their testimony in such terms unless directed to do so by judicial authority.

Recommendation #3: The Attorney General should, in collaboration with NIST, direct the OSACs to develop appropriate language that may be used by experts when reporting or testifying about results or findings based on observations of evidence and data derived from evidence.
Most of the public comments have been supportive, 1/ but three days ago, one commissioner submitted a comment arguing that Recommendation #1 would require the Department of Justice to argue for overturning existing law that “seem[s] to require” these phrases in some forensic-science identification fields and that Recommendation #3 asks the Attorney General to take action that exceeds her authority.

[Added 3/1/16: At least, this is what I thought the comment was driving at, but, as explained in a follow-up posting, I was mistaken. Nevertheless, I think the analysis of this point is worth leaving up for general viewing, since it addresses a question that might be raised about the proposal.]

The second point is well taken—the Attorney General has no power to “direct” NIST or the OSAC to act, and NIST supports but does not direct the OSAC structure. However, the notion that any federal district court is legally compelled to condition the admission of expert testimony on an obscure phrase like “reasonable scientific certainty” seems farfetched. Below are excerpts from a comment that I filed with the Commission today explaining my thinking (with minor alterations):

Previous drafts of the final document before the Commission included references to the case law and literature supporting the subcommittee’s view that these recommendations are compatible with the existing law of evidence — that the law does not require experts to use these particular (and problematic) phrases, even though some judges and lawyers expect and even prefer to hear them. 2/ The comments that follow do not try to restate the previous legal analysis or to summarize the legal literature. They respond to the analysis in the Feb. 23 Comment. ...

Nothing in the Comment establishes that, when presented with the relevant legal authority and analysis, any court would find it difficult to accept the position the Department is being asked to take. The cases cited in the Comment do not contradict the proposed position. 3/ Not one of these cases considered whether the testifying expert must testify to “a reasonable degree of [discipline] certainty” as opposed to offering an opinion that the markings on a gun or fingerprint offer strong support for the source conclusion (or some similar less-than-absolutely-certain testimony). In most of them, the defense sought to exclude the source-attribution testimony entirely, on the Daubert ground that science and statistics did not support source attributions to one and only one possible source. The trial judges in these cases agreed that absolute, categorical claims of identity were too extreme. Those assertions are the kind of overclaiming that, Deputy Attorney General Yates announced two days ago, the Department of Justice is seeking to avoid.

As an alternative to scientifically indefensible or overstated claims, the trial judges in the cited cases set an upper bound on the certainty that the expert may express — “reasonable certainty” of one kind or another. Other federal trial judges have set other upper bounds. E.g., United States v. Glynn, 578 F.Supp.2d 567 (S.D.N.Y. 2008) (“the ballistics opinions ... may be stated in terms of ‘more likely than not,’ but nothing more”). No court has dictated one formulaic expression to the exclusion of all other ways to solve the problem of expert and prosecutorial exaggeration. 4/ In every one of the cases cutting back on overclaiming, less categorically certain phrasing from the government’s experts would not have violated the pretrial orders, and the government easily could have requested somewhat different phrasing as long as it did not amount to the kind of overclaiming that the orders were issued to protect against.

United States v. Cazares, 788 F.3d 956 (9th Cir. 2015), the only appellate case that the Comment perceives as demonstrating that “it is an overstatement to categorically claim that the phrase ‘to a reasonable degree of [discipline] certainty’ ‘is not required,’” clearly does not demand the use of this phrase instead of more transparent alternatives. No such alternatives were before the Ninth Circuit. The firearms examiner did not use the phrase “reasonable ballistic certainty,” but instead claimed total “scientific certainty.” Id. at 988. The Assistant U.S. Attorney did the same. Id. The panel excused this testimony and prosecutorial exaggeration as harmless error. 5/ It cited the cases noted in the Comment only to show that less-than-absolute testimony of firearms identification had been held to satisfy the requirements of Daubert. In an obvious dictum, the court of appeals referred to “reasonable ballistic certainty” as “the proper expert characterization of toolmark identification”—not to prescribe these words as the only permissible mode of expressing conclusions across the realm of forensic identification, but only to make the point that, given the expert’s acknowledgment of subjectivity in her analysis and her concession that “[t]here is no absolute certainty in science,” id. at 988, “[a]ny error in this case from the ‘scientific certainty’ characterization was harmless.” Id. at 990.

Moreover, the nature of the disagreement with the observation that “use of the [reasonable degree of scientific or discipline-specific certainty] phrase is not required by law and is primarily a relic of custom and practice” is difficult to fathom. The Comment agrees that “the use of this phrase is not required by the Federal Rules of Evidence.” This is every bit as true in the Ninth Circuit as in the other judicial circuits. What, then, is the basis of the claim that a court is “perhaps” required to insist that an expert use the phrase? The Constitution can override the rules of evidence, but no one can seriously claim that the Constitution conditions expert scientific testimony on a particular form of words — and a potentially misleading mixture of words at that.

In sum, there are courts that find comfort in phrases like "reasonable scientific certainty," and a few courts have fallen back on variants such as "reasonable ballistic certainty" as a response to arguments that identification methods cannot ensure that an association between an object or person and a trace is 100% certain. But it seems fair to say that "such terminology is not required" -- at least not by any existing rule of law.

Notes
  1. E.g., Erin Murphy & Andrea Roth, Public Comment on NCFS Recommendation Re: Reasonable Degree of Scientific Certainty, Feb. 23, 2016, http://www.regulations.gov/#!documentDetail;D=DOJ-LA-2016-0001-0011

  2. These have been moved to a separate "views" document available through a link at https://www.justice.gov/ncfs/reporting-and-testimony. The recommended position is supported not only by the opinions of appellate courts across the country, but also the writings of federal judges, the drafters of the Federal Rules of Evidence, and the authors of the three leading legal treatises on scientific evidence.

  3. If they did, that would be a reason for the Department to advance a position to harmonize a conflict among the U.S. courts of appeals.

  4. For example, in one case cited in the Comment, United States v. Monteiro, 470 F. Supp. 2d 351 (D. Mass. 2006), the trial judge actually granted the defendant’s motion to exclude firearms testimony (unless the government supplemented the record with information establishing compliance with professional standards). The court then presented “reasonable degree of ballistic certainty” testimony as an acceptable way for the expert to testify, but the court’s concern was plainly that “the expert may not testify that there is a match to an exact statistical certainty.” Id. at 375.

    Similarly, in United States v. Ashburn, 88 F. Supp. 3d 239 (E.D.N.Y. 2015), the  court’s concern was testimony “that he is ‘certain’ or ‘100%’ sure of his conclusions that two items match, that a match is to ‘the exclusion of all other firearms in the world,’ or that there is a ‘practical impossibility’ that any other gun could have fired the recovered materials.” Id. at 250. The trial judge settled on “reasonable ballistic certainty” as an acceptable alternative, but not necessarily an exclusive one.

    So too, in United States v. Taylor, 663 F.Supp.2d 1170 (D.N.M. 2009), the district judge wrote that:
    Mr. Nichols will be permitted to give to the jury his expert opinion that there is a match between the .30–.30 caliber rifle recovered from the abandoned house and the bullet believed to have killed Mr. Chunn. However, because of the limitations on the reliability of firearms identification evidence discussed above, Mr. Nichols will not be permitted to testify that his methodology allows him to reach this conclusion as a matter of scientific certainty. Mr. Nichols also will not be allowed to testify that he can conclude that there is a match to the exclusion, either practical or absolute, of all other guns. He may only testify that, in his opinion, the bullet came from the suspect rifle to within a reasonable degree of certainty in the firearms examination field.
    Id. at 1180.

  5. The court of appeals reasoned that “the ‘scientific certainty’ characterization was subject to cross examination which resulted in acknowledgment of subjectivity in the expert's work, [and] the district court properly instructed as to the role of expert testimony and there was substantial evidence otherwise linking the defendants to the . . . murders.” Id. at 990.