J. McKenna, J. Cecil & P. Coukos, "Reference Guide on Forensic DNA Evidence" in Reference Manual on Scientific Evidence (Federal Judicial Center 1994)
 
DNA analysis is based on well-established principles of the wide genetic variability among humans and the presumed uniqueness of an individual's genetic makeup (identical twins excepted). Laboratory techniques for isolating and observing the DNA of human chromosomes have long been used in nonforensic scientific settings. The forensic application of the technique involves comparing a known DNA sample obtained from a suspect with a DNA sample obtained from the crime scene, and often with one obtained from the victim. Such analyses typically are offered to support or refute the claim that a criminal suspect contributed a biological specimen (e.g., semen or blood) collected at a crime scene. For example, an analyst may testify on the basis of a report that includes the following:

Deoxyribonucleic acid (DNA) profiles for [the specific sites tested] were developed from specimens obtained from the crime scene, from the victim, and from the suspect. Based on these results, the DNA profiles from the crime scene match those of the suspect. The probability of selecting at random from the population an unrelated individual having a DNA profile matching the suspect's is approximately 1 in 200,000 in Blacks, 1 in 200,000 in Whites, and 1 in 100,000 in Hispanics.

An objection to this sort of testimony usually comes before the court when the defense moves to exclude the testimony and the report. Such a motion can be made before or during trial, depending on circumstances and the court's rules regarding in limine motions. Before the Supreme Court's decision in Daubert v. Merrell Dow Pharmaceuticals, Inc., such disputes often were aired in hearings devoted to determining whether the theory and techniques of DNA identification were generally accepted by the relevant scientific community and so satisfied the Frye standard. Since Daubert, general acceptance remains an issue but only one of several a court may consider when DNA evidence is offered. Frequently disputed issues include the validity of applying the standard RFLP [restriction fragment length polymorphism] technique to crime samples, the proper interpretation of test results as showing a match, and the appropriate statistical determination of the probability of a coincidental match.

. . .

RFLP analysis is based on the observable variability of human genetic characteristics. Human genetic information (one's genome) is encoded primarily in chromosomal DNA, which is present in most body cells. Except for sperm and egg cells, each DNA-carrying cell contains 46 chromosomes. Forty-four of these are arranged in homologous pairs of one autosome (nonsex chromosome) inherited from the mother and one autosome from the father. The cells also carry two sex chromosomes (an X from the mother and either an X or a Y from the father). Chromosomal DNA sequences vary in length and are made up of four organic bases (adenine (A), cytosine (C), thymine (T), and guanine (G)). A pairs only with T; C pairs only with G. These sequences of base pairs are arranged in long chains that form the twisted double helix, or ladder structure, of DNA. Thus, if the bases on one side of the helix or ladder are represented as CATAGAT, the complementary side would be GTATCTA.
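The pairing rule just described (A with T, C with G) can be illustrated with a minimal sketch. The function name `complement` is an assumption made for illustration; it reads the strand base by base in the same left-to-right order used in the text.

```python
# Pairing rules stated in the text: A pairs only with T; C pairs only with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the base-by-base complement of a strand, in the same order."""
    return "".join(PAIRS[base] for base in strand.upper())

print(complement("CATAGAT"))  # GTATCTA, the example given in the text
```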

Most DNA-carrying cells in a human contain the same information encoded in the approximately 3.3 billion base pairs per set of chromosomes in each cell. More than 99% of the base pairs in human cells are the same for all individuals, which accounts for the many common traits that make humans an identifiable species. The remaining base pairs (about 3 million) are particular to an individual (identical twins excepted), which accounts for most of the wide variation that makes each person unique.



A gene (characteristic DNA sequence) is found at a particular site, or locus, on a particular chromosome. For instance, a gene for eye color is found at the same place or locus on the same chromosome in every individual. Normal individuals have two copies of each gene at a given locus--one from the father and one from the mother. A locus on the DNA molecule where all humans have the same genetic code is called monomorphic. However, genes vary. An individual may receive the genetic code for blue eyes from his or her mother and the genetic code for brown eyes from his or her father. An alternative form of a gene is known as an allele. A locus where the allele differs among individuals is called polymorphic, and the difference is known as a polymorphism.

Although some polymorphisms have been found to govern what makes individuals observably distinct from one another (e.g., eye color), others serve no known function. Among these noncoding DNA regions are some in which certain base pair sequences repeat in tandem many times (e.g., CATCATCAT . . . paired with GTAGTAGTA . . .). This is known as a Variable Number of Tandem Repeats, or a VNTR. The number of base pairs and the sequence of pairs vary from locus to locus on one chromosome and from chromosome to chromosome. RFLP analysis allows scientists to determine the size of a repetitive sequence. Because the length of these sequences (sometimes called band size) of base pairs is highly polymorphic, although not necessarily unique to an individual, comparison of several corresponding sequences of DNA from known (suspect) and unknown (forensic) sources gives information about whether the two samples are from the same source. . . .

First, DNA is extracted from an evidence sample collected at the crime scene. Second, it is digested by a restriction enzyme that recognizes a particular known sequence called a restriction site and cuts the DNA there. The result is many DNA fragments of varying sizes. Third, digested DNA from the crime sample is placed in a well at the end of a lane in an agarose gel, which is a gelatin-like material solidified in a slab about five inches thick. Digested DNA from the suspect is placed in another well on the same gel. Typically, control specimens of DNA fragments of known size, and, where appropriate, DNA specimens obtained from a victim, are run on the same gel. Mild electric current applied to the gel slowly separates the fragments in each lane by length, as shorter fragments travel farther than longer, heavier fragments. This procedure is known as gel electrophoresis.
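The digestion and separation steps can be pictured with a toy sketch. Everything here is an illustrative assumption: the short sequence is invented, the recognition site GAATTC (that of the enzyme EcoRI, which cuts after the first G) is used only as an example because the text does not name an enzyme, and real fragments run to thousands of base pairs.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut dna at every occurrence of site, cut_offset bases into the site."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # whatever remains after the last cut
    return fragments

def gel_order(fragments):
    """Order fragment lengths as they sit in a lane, nearest the well first.

    Shorter fragments travel farther, so the longest stays closest to the well.
    """
    return sorted((len(f) for f in fragments), reverse=True)

sample = "AAGAATTCCCCCGAATTCT"  # invented sequence containing two sites
print(gel_order(digest(sample)))
```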

Fourth, the resulting array of fragments is transferred for manageability to a sheet of nylon by a process known as Southern blotting. Either during or after this transfer, the DNA is denatured ("unzipped") by heating, separating the double helix into single strands. The weak bonds that connect the two strands are susceptible to heat and salinity. The double helix can be unzipped or denatured without disrupting the chain on either side. Fifth, a probe--usually with a radioactive tag--is applied to the membrane. The probe is a single strand of DNA that hybridizes with (binds to) its complementary sequence when it is applied to the samples of denatured DNA. The DNA locus identified by a given probe is found by experimentation, and individual probes often are patented by their developers. Different laboratories may use different probes (i.e., they may test for alleles at different loci). Where different probes are used, test results are not comparable.

Finally, excess hybridization solution is washed off, and the nylon membrane is placed between at least two sheets of photographic film. Over time, the radioactive probe material exposes the film where the biological probe has hybridized with the DNA fragments. The result is an autoradiograph, or an autorad, a visual pattern of bands representing specific DNA fragments. An autorad that shows two bands in a single lane indicates that the source is heterozygous for that locus (i.e., he or she inherited a different allele from each parent). If the autorad shows only one band, the person may be homozygous for that allele (i.e., each parent contributed the same variant of the gene). Together, the two alleles make up the person's genotype (genetic code) for the specific locus associated with the probe.

Once an appropriately exposed autorad is obtained, the probe is washed from the membrane, and the process is repeated with different probes that bind to different sequences of DNA. Three to five probes are typically used, the number depending in part on the amount of testable DNA recovered from the crime sample. The result is a set of autorads, each of which shows the results of one probe. . . . Illustrations of the results of all probes depict an overlay of multiple films or the results from a multilocus probe.

If the two DNA samples are from the same source, and if the laboratory procedures are conducted properly, hybridized DNA fragments of approximately the same length should appear at the same point in the suspect and evidence specimen lanes. If, on visual inspection, the DNA band patterns for the suspect and the evidence sample appear to be aligned on the autorad, this impression is verified by a computerized measurement. If the two bands fall within a specified length, or match window (e.g., 2.5% of band length), a match is declared for that probe or allele. For forensic purposes, a match means that the patterns are consistent with the conclusion that the two DNA samples came from the same source. Taken together, the results of the probes form the DNA profile.
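The match-window comparison can be sketched as follows. The text gives only the example figure of 2.5% of band length; how the window is applied differs among laboratories, so the rule used here (the two measured sizes may differ by no more than 2.5% of their mean) is an assumption for illustration, as are the function names and the band sizes.

```python
def bands_match(size_a, size_b, window=0.025):
    """True if two band sizes (in base pairs) differ by no more than
    the window fraction of their mean size (assumed rule)."""
    mean_size = (size_a + size_b) / 2
    return abs(size_a - size_b) <= window * mean_size

def profile_matches(suspect_bands, evidence_bands, window=0.025):
    """Declare a profile match only if every corresponding pair of bands matches."""
    return len(suspect_bands) == len(evidence_bands) and all(
        bands_match(s, e, window) for s, e in zip(suspect_bands, evidence_bands)
    )

# Hypothetical measured band sizes for three bands.
print(profile_matches([2000, 4500, 1200], [2040, 4480, 1210]))
```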

. . .

If a profile match is declared, it means only that the DNA profile of the suspect is consistent with that of the source of the crime sample. The crime sample may be from the suspect or from someone else whose profile, using the particular probes involved, happens to match that of the suspect. Expert testimony concerning the frequency with which the observed alleles are found in the appropriate comparison population is necessary for the finder of fact to make an informed assessment of the incriminating value of this match. (1)

The frequency with which an individual allele occurs in the comparison population is taken to be the probability of a coincidental match on that allele. These individual probabilities of a coincidental match are combined into an estimate of the probability of a coincidental match on the entire profile. This estimate is interpreted as the probability that a person selected at random from a comparison population would have a DNA profile that matches that of the crime sample. The probability estimate typically provided by a forensic expert cannot be interpreted strictly as the probability that an examiner will declare a match when the samples are actually from different sources. That probability is affected by other factors, the most important of which is the chance of laboratory error.

Differences in scientific opinion arise with respect to two main issues: (1) the appropriate method for computing the estimated probability of a coincidental match of a DNA profile; and (2) the selection of an appropriate comparison population. These issues are addressed below.



A. What Procedure Was Used to Estimate the Probability That the Individual Alleles Match by Coincidence?

Ascertaining an allele's frequency in a given population is essentially an empirical exercise. A sample of individuals is drawn from the designated population, their DNA is examined with genetic probes used in forensic analysis, and a table of frequencies is developed. For example, the FBI has constructed frequency tables using a fixed-bin method, in which standardized size markers are used to define boundaries of bins into which are sorted the fragment sizes observed in a sample population.
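A fixed-bin frequency table of the kind described can be sketched in a few lines. The bin boundaries and fragment sizes below are invented for illustration and are not the FBI's actual size markers; the function name is likewise hypothetical.

```python
from bisect import bisect_right
from collections import Counter

def bin_frequencies(fragment_sizes, boundaries):
    """Sort fragment sizes into bins defined by ascending boundaries and
    return the fraction of the sample falling in each bin.

    Bin 0 holds sizes below the first boundary; the last bin holds sizes
    at or above the final boundary.
    """
    counts = Counter(bisect_right(boundaries, size) for size in fragment_sizes)
    total = len(fragment_sizes)
    return {b: counts.get(b, 0) / total for b in range(len(boundaries) + 1)}

# Hypothetical sample of eight fragment sizes (base pairs) and three markers.
sizes = [500, 1500, 1600, 2500, 3500, 1200, 2999, 800]
print(bin_frequencies(sizes, [1000, 2000, 3000]))
```

The frequency recorded for a bin then serves as the estimated frequency of any allele whose fragment size falls within that bin.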

. . .

Assumptions of classical population genetics are used to estimate the probability that a person chosen at random from the specified population would exhibit the same genotype as the suspect's. The greater the probability of such a coincidental match, the lower the incriminating value of the evidence.



B. How Were the Probability Estimates for Coincidental Matches of Individual Alleles Combined for an Overall Estimate of a Coincidental Match of the Entire DNA Profile?

One of the most difficult and contentious issues in forensic use of DNA evidence is how to estimate the probability that two DNA profiles match by chance. This issue has become especially difficult in federal courts since the Supreme Court's decision in Daubert. . . .

What follows is a description of two techniques for estimating the probability of a coincidental match: the product rule technique and the modified ceiling principle technique, which was recommended by the NRC [National Research Council of the National Academy of Sciences] committee. The probability estimates can vary widely depending on the technique used.



1. Product rule technique

The product rule technique offers the most straightforward method of computing the probability of a matching DNA profile. To compute the probability of a random occurrence of a specific pattern of alleles in a DNA profile, the analyst multiplies the separate estimated probabilities of a random occurrence of each allele in the comparison population. When these individual probabilities are multiplied, the estimated probability of a distinctive pattern occurring at random may be less than one in several hundred thousand.
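The multiplication can be sketched as below. The allele frequencies are invented. The sketch also assumes the standard Hardy-Weinberg genotype formulas from population genetics (2pq for a two-band, heterozygous locus and p squared for a single-band, homozygous locus), a detail the text does not spell out; individual laboratories may treat single-band patterns differently.

```python
from math import prod

def genotype_frequency(allele_freqs):
    """Estimated frequency of the genotype at one locus.

    Two frequencies: heterozygous locus, 2 * p * q.
    One frequency: homozygous locus, p * p.
    """
    if len(allele_freqs) == 2:
        p, q = allele_freqs
        return 2 * p * q
    (p,) = allele_freqs
    return p * p

def profile_frequency(loci):
    """Product rule: multiply per-locus estimates, assuming independence."""
    return prod(genotype_frequency(freqs) for freqs in loci)

# Hypothetical three-probe profile: two heterozygous loci, one homozygous.
estimate = profile_frequency([(0.1, 0.2), (0.05, 0.1), (0.1,)])
print(f"about 1 in {round(1 / estimate):,}")
```

With these invented figures the per-locus estimates are 0.04, 0.01, and 0.01, so the combined estimate is roughly 1 in 250,000.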

The probability estimate resulting from the multiplication assumes that the individual alleles identified by genetic probes are independent of each other. If the probabilities of the individual alleles are not independent (i.e., if certain alleles are likely to occur together in a person), multiplying the individual allele frequencies may underestimate or overestimate the true probability of matching alleles in the chosen population and thereby misstate the incriminating value of the evidence. Critics of the product rule technique contend that in some ethnic subpopulations the alleles identified by commonly used genetic probes are not independent and that using a broad-based comparison population is therefore inappropriate. They prefer estimation techniques that do not require analysts to assume the independence of individual alleles in large comparison populations.



2. Modified ceiling principle technique

The method recommended by the NRC committee involves conservative interpretations of existing population data. As a preliminary test, the laboratory should examine its population database to determine if it contains a sample that matches the profile of the multiple alleles of the crime sample. If no match is found across multiple genetic probes, the expert reports that "the DNA pattern was compared to a database containing N individuals from the population and no match was observed." This is an extremely conservative approach that yields probative information but does not give the fact finder any information concerning the probability of finding a matching profile by chance in the larger population.

The NRC committee has proposed a modified ceiling principle technique, which takes advantage of the computation techniques of population genetics but which includes adjustments that the committee believes make it an "appropriately conservative" approach. Although it noted that recent empirical studies have detected no evidence of a departure from independence within or across commonly used genetic probes, the NRC committee chose "to assume for the sake of discussion that population substructure may exist and provide a method for estimating population frequencies in a manner that adequately accounts for it." . . .

The NRC committee recommends that two adjustments be made to population frequencies derived from existing data to permit conservative estimates of the likelihood of a coincidental matching profile. First, the 95% upper confidence limit for the estimated allele frequency is computed for each of the existing population samples (Black, White, Native American, etc.). This upper bound of the confidence interval is intended to accommodate the uncertainties in current population sampling. Second, the largest of these upper confidence limit estimates, or 10%, whichever is greater, is used to compute the joint probability of a coincidental match on the DNA profile of the crime sample. A lower bound of 10% is intended to address concerns that current population data sets may be substructured in unknown ways that would yield misleading estimates of a coincidental matching profile for members of subpopulations. The resulting probabilities then are multiplied as in the product rule computations. The upper confidence limit from among the existing samples, or a 10% lower bound, is used to provide a probability estimate that errs, if at all, in a conservative direction (i.e., that is more favorable to a suspect).
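The two adjustments can be sketched as follows. The text does not give a formula for the upper confidence limit; the normal-approximation form p + 1.96 * sqrt(p(1 - p)/N) used here is an assumption, as are the function names, the sample sizes, and the frequencies. For simplicity the sketch multiplies the adjusted allele frequencies directly and omits any genotype factors.

```python
from math import prod, sqrt

FLOOR = 0.10  # the 10% lower bound described in the text

def upper_limit(p, n, z=1.96):
    """Upper end of a 95% confidence interval for allele frequency p
    estimated from a sample of size n (normal approximation; assumed)."""
    return p + z * sqrt(p * (1 - p) / n)

def ceiling_frequency(samples):
    """samples: (frequency, sample_size) pairs, one per population database.

    Returns the largest upper confidence limit, or 10%, whichever is greater.
    """
    return max(FLOOR, max(upper_limit(p, n) for p, n in samples))

def ceiling_profile_frequency(alleles):
    """Multiply the ceiling frequencies of all alleles in the profile."""
    return prod(ceiling_frequency(samples) for samples in alleles)

# Hypothetical data for two alleles, each observed in two population samples.
allele_a = [(0.02, 400), (0.05, 300)]  # both limits fall below 10%
allele_b = [(0.15, 400), (0.12, 300)]  # largest limit exceeds 10%
print(ceiling_frequency(allele_a), ceiling_frequency(allele_b))
```

Because every adjusted frequency is at least 10%, the resulting profile estimate can never be smaller than 0.1 raised to the number of alleles considered, which is the sense in which the method errs in the suspect's favor.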



C. What Is the Relevant Comparison Population for Estimating the Probability of a Coincidental Allelic Match?



Disputes over the appropriate comparison population have focused on cases in which the product rule technique has been used. The extent to which the product rule technique may underestimate the probability of a coincidental match has been hotly disputed by population geneticists and other scholars. The dispute centers on disagreements over the adequacy of commonly used comparison populations and the role of racial and ethnic subpopulations in probability estimation. Specifying the appropriate comparison population may be of considerable importance, as the estimates of a coincidental match can vary greatly depending on the population selected. For example, the prevalence of certain alleles may vary greatly across races--some alleles are common in Black populations and infrequent in White populations, and vice versa. If a DNA profile for a Black suspect is compared with frequency estimates based on a White population, the estimated likelihood of a chance match may be in error by some unknown amount. Similarly, if a DNA profile of a member of a subpopulation with a distinct frequency distribution of alleles is compared with frequency estimates based on an inappropriate larger population, an error of unknown magnitude may result. Concern over accuracy of estimates of a coincidental match has focused attention on the assumptions used in selecting a comparison population and the scientific validity of the methods used to estimate the probability of a coincidental match.



1. Is the comparison population consistent with the population of possible sources of the DNA?

If the modified ceiling principle technique is not used, an appropriate comparison population must be designated. Such a designation is guided by the characteristics of the population of individuals who might have been the source of the sample. For example, if a rape victim saw her assailant and described him as White, and there is no more information to implicate a member of a specific subpopulation or group, the comparison population should be those who appear to be White and who were in a position to commit the assault. Comparisons based on members of populations or subpopulations who appear to be non-White therefore would be inappropriate. Similarly, where there is no information indicating the race or ethnicity of the perpetrator, the comparison population should be designated by the characteristics of those in a position to commit the assault. The race or ethnicity of the suspect is irrelevant.

In other cases, however, the pool of alternative suspects may be limited to members of a distinct isolated community or a specific ethnic subgroup. This circumstance has arisen in federal courts where the defendant was a member of a Native American tribe and the crime occurred on the tribal reservation. The typical databases of allele frequencies used for probability estimation address broader population groups, such as Blacks, Whites, Native Americans, Hispanics from the southeastern United States, and Hispanics from the southwestern United States. The extent to which broad racial and cultural comparison populations must correspond to the characteristics of the suspect population turns on the extent to which the distribution of alleles tested for in the forensic analysis differs for the suspect population and the comparison database. This remains a disputed issue among scientists. Difficulty in resolving this issue was responsible, in part, for the NRC committee's recommending a technique that does not require the designation of a suspect population.



2. Does the comparison population conform to characteristics that allow the estimation of the joint occurrence of matching alleles by multiplication of the probabilities of the individual alleles?

The comparison population must conform to assumptions that underlie the technique used to compute the estimate of a coincidental match, or at least conform sufficiently that minor deviations are of little consequence in computing the probability estimates. The computation of the probability of a random match by the product rule technique is based on the assumption that the individual alleles of the DNA profile are independent of one another. According to the principles of population genetics, the independence of alleles may be assumed only where the comparison population mixes freely and mates randomly (with respect to the alleles) such that the distribution of alleles within the comparison population is homogeneous. If the comparison population does not conform to these assumptions, the alleles may not be independent, and the computation of probability estimates may be incorrect.

Opponents of the product rule technique argue that it is inappropriate to use broad racial and cultural characteristics to specify the comparison population, because in reality such groups do not mix freely with respect to relevant genes. Broad racial groups, they argue, disguise subpopulations that would be more appropriate for comparison. More specifically, opponents charge that use of broad racial groups violates the assumption of independence justifying the multiplication of the separate probabilities assigned to each probe. For example, a White comparison group that includes diverse ethnic groups (e.g., persons of Polish, Italian, or Irish descent) may mask differences in the distribution of alleles among subgroups. Opponents charge that no meaningful estimate of the probability of a particular DNA profile can be developed without specifying a suitable subpopulation that meets the demanding assumptions of the product rule technique. When alleles are not independent, as when a comparison group contains a substructure, the product rule technique may underestimate the probability that the forensic and suspect DNA patterns match by coincidence.

Proponents of the product rule technique acknowledge that some substructuring may exist in the comparison populations typically used, but they argue that it is inappropriate to apply the assumption strictly and that the probability estimates generally are accurate in spite of violations of the strict assumption regarding the absence of substructuring. Furthermore, proponents claim that the conservative features of the fixed-bin method more than compensate for any underestimation. The NRC committee notes that what little empirical evidence existed at the time of its report appeared to support this contention.

1. Statistical testimony concerning the likelihood of a DNA profile matching by coincidence is necessary to assess the probative value of the matching profile. . . . See People v. Barney, 8 Cal. App. 4th 798, 817 (Cal. Ct. App. 1992) ("The statistical calculation step is the pivotal element of DNA analysis, for the evidence means nothing without a determination of the statistical significance of a match of DNA patterns."); D. H. Kaye, The Forensic Debut of the National Research Council's DNA Report: Population Structure, Ceiling Frequencies and the Need for Numbers, 34 Jurimetrics J. 369, 381 (1994) ("As a legal matter, a completely unexplained statement of a 'match' should be inadmissible because it is too cryptic to be weighed fairly by the jury, . . . [H]ow to present to a jury valid scientific evidence of a match is a legal rather than a scientific issue falling far outside the domain of the general acceptance test and the fields of statistics and population genetics. Thus, it would not be 'meaningless' to inform the jury that two samples match and that this match makes it more probable, in an amount that is not precisely known, that the DNA in the samples comes from the same person."). But see United States v. Martinez, 3 F.3d 1191, 1199 (8th Cir. 1993) (ruling that admitting evidence of a DNA profile match without evidence concerning the statistical probability of a coincidental match was not reversible error where defendant stipulated that statistical evidence was not required), cert. denied, 114 S. Ct. 734 (1994).
