Science Resources: DNA Technologies
Non-Law-Enforcement Database Searches: Investigative Leads and the Risk of Privacy Exposure
The Role of Direct-to-Consumer Databases and Investigative Genetic Genealogy in Catching the Golden State Killer
The apprehension of Joseph James DeAngelo in April 2018 marked a breakthrough in law enforcement’s use of direct-to-consumer (DTC) genetic databases to generate investigative leads. There has since been a rapid increase in the use of non-law-enforcement DNA databases to search for potential suspects.[1] This search technique is known as investigative or forensic genetic genealogy (IGG/FGG), to distinguish it from extended-family searches using law enforcement databases.[2]
The Department of Justice announced an interim policy on using non-law-enforcement genetic databases to generate investigative leads, recommending their use only for unsolved violent crimes where CODIS searches have failed to produce a probative and confirmed DNA match.[3] The Department of Justice has also provided grants to several prosecutors’ offices to facilitate using the approach.[4] Still, investigative searches of non-law-enforcement databases raise legal and methodological questions.
When investigators search non-law-enforcement databases, they do not need to find exact matches as they do when searching CODIS.[5] Instead, investigators will more likely identify extended family members who lead to potential suspects.[6] Though only a fraction of the U.S. population has directly contributed to DTC genetic databases, a much larger portion of the population is identifiable through IGG because DNA is shared between relatives. Early estimates indicated that at least 60% of U.S. individuals of European ancestry could be identified through relatives in the databases.[7] This percentage has only increased as DTC databases have grown in size.
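The reach described above follows from simple probability: the more relatives a person has, the more likely it is that at least one of them has uploaded data. The sketch below is a deliberately simplified illustration, not the model behind the cited estimate. It assumes each relative is in a database independently and with the same probability, and the function name and the example inputs are hypothetical.

```python
def prob_at_least_one_relative(database_fraction: float, num_relatives: int) -> float:
    """Chance that at least one of `num_relatives` is in the database.

    Simplifying assumption (not from the source): each relative is in the
    database independently with probability `database_fraction`.
    """
    if not 0.0 <= database_fraction <= 1.0:
        raise ValueError("database_fraction must be between 0 and 1")
    if num_relatives < 0:
        raise ValueError("num_relatives must be non-negative")
    # P(no relative present) = (1 - p)^n, so P(at least one) is the complement.
    return 1.0 - (1.0 - database_fraction) ** num_relatives


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the trend.
    for relatives in (10, 50, 200):
        p = prob_at_least_one_relative(0.02, relatives)
        print(f"{relatives:>3} relatives, 2% coverage -> {p:.1%} chance of a match")
```

Under these assumptions, the probability climbs quickly as the number of relatives grows, which is why a database holding a small share of the population can make a much larger share identifiable.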
In the case of DeAngelo, law enforcement searches of the DTC platform identified multiple likely relatives who had uploaded DNA data to the service. Investigators then examined the families of these relatives, narrowing the search using information like age and place of residence. Finally, they collected new DNA samples from suspects to compare against the original forensic sample.
Because investigative genetic genealogical searches are typically only a means for identifying potential leads (unless the perpetrators themselves are in the DTC databases), the full details of the search may not be disclosed before or during trial. Alternatively, detailed disclosure of the search may occur only during a pretrial hearing on the legality of the search. For example, investigators searched the DTC platform GEDmatch as part of their investigation of the Golden State Killer, but it was their search of another site, MyHeritage, that actually identified the familial matches that helped break the case.[8] The searches at MyHeritage and another DTC platform (which returned no leads) were not revealed until after the case had concluded.
Instances of inconclusive or undisclosed IGG searches can be difficult to evaluate for legal and scientific rigor. The Department of Justice’s interim policy on IGG searches states that investigators shall identify themselves as law enforcement and search only in DTC services that give notice to their users that law enforcement may search the database.[9] Though the DeAngelo case predates the Department of Justice’s policy, at the time investigators searched GEDmatch and other sites, these platforms provided no such notice to their users.
Since 2018, GEDmatch has updated its policies and permits users to opt out of law enforcement searches of their data to investigate violent crimes.[10] In at least one instance, however, an investigator sought and was granted a warrant to search the entire GEDmatch database.[11] In that case, investigators had already searched the GEDmatch database before users were permitted to opt out of law enforcement searches, and they sought access to genetic matches that they had already seen.[12] By contrast, Ancestry reports receiving two criminal subpoenas to access the Ancestry DNA database between July 1 and December 31, 2020. Ancestry challenged both requests, and the requests were withdrawn.[13]
Reviewing all attempted IGG searches, not just successful ones, is also important to guard against the disclosure of genetic data and sensitive information. If investigators upload an individual’s genome-wide data to a public database accessible by other users, it might be considered the equivalent of publicly disclosing sensitive personal information. The Department of Justice’s interim policy on IGG searches recommends investigators take steps to limit public access to the forensic sample’s genetic data on these platforms.[14] Still, if multiple IGG searches are performed across various databases, there is increased risk of exposing an individual’s genetic and personal information. Searches that are not disclosed make it difficult to protect an individual’s sensitive information.
Disclosing the full extent of the IGG search is also important for evaluating the scientific details of the search itself. Different DTC services can rely on different analytical tools to identify potential relatives, and the quality of these tools and the extent of their review by outside experts may vary. Identifying potential relatives relies on setting a threshold for genetic relatedness, a measure that varies among pairs with the same relationship and diminishes rapidly as relatives become more distant. The choice of threshold affects the accuracy and specificity of the search results and can generate false leads that bring individuals into unnecessary contact with law enforcement. Reviewing the full extent of the investigative genetic genealogy search is important to ensure that the search is carried out with scientific rigor and in accordance with policy guidelines.
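The rapid decline in relatedness mentioned above can be made concrete with the standard expected-sharing figures for cousins. This is a minimal sketch: it computes only the expected (average) fraction of autosomal DNA shared by full nth cousins, and the cutoff `THRESHOLD` is a hypothetical number chosen for illustration, not a threshold used by any DTC service. Actual sharing between real relatives varies around these averages.

```python
def expected_shared_fraction(cousin_degree: int) -> float:
    """Expected fraction of autosomal DNA shared by full nth cousins.

    Full first cousins share 1/8 on average; each further degree of
    cousinship divides the expectation by 4.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    return 0.5 ** (2 * cousin_degree + 1)


# Hypothetical reporting cutoff, chosen only to illustrate the trade-off.
THRESHOLD = 0.005

if __name__ == "__main__":
    for degree in range(1, 6):
        fraction = expected_shared_fraction(degree)
        status = "above" if fraction >= THRESHOLD else "below"
        print(f"cousin degree {degree}: {fraction:.4%} expected, {status} cutoff")
```

Because relatives of the same degree can share more or less than the average, a cutoff placed near these values may miss some true relatives and admit some matches that are more distant than they appear.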
In addition to reviewing how IGG searches are conducted, it is also important to review how investigators obtained the data necessary to search DTC databases. Specifically, law enforcement DNA profiles rely on very different genetic data (a profile from twenty noncoding STR loci) than do DTC services (data from more than half a million genome-wide SNPs). In cases where investigators used a state or private lab to perform the SNP genotyping on a forensic sample, the lab should show extensive validation of the genotyping process across varying levels of DNA quality and quantity.[15] New research also shows it may be possible to translate STR genotype data into genome-wide SNP data using statistical methods.[16] While these methods have been demonstrated in a research context, they have not been evaluated and standardized for forensic purposes.
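The difference between the two data types described above can be sketched as data structures. In this illustration, the STR profile maps a locus name to a pair of repeat counts, while the SNP data maps a marker identifier to a pair of nucleotides. All allele values and the `rs_example_*` identifiers are made up for illustration; only the locus names D8S1179, TH01, and vWA are real CODIS core loci.

```python
from typing import Dict, Tuple

# STR profile: locus -> (repeat count on each chromosome copy). Values are invented.
StrProfile = Dict[str, Tuple[float, float]]
# SNP genotypes: marker id -> (nucleotide on each chromosome copy). Ids are invented.
SnpGenotypes = Dict[str, Tuple[str, str]]

str_profile: StrProfile = {
    "D8S1179": (13, 14),
    "TH01": (6, 9.3),
    "vWA": (17, 18),
}

snp_genotypes: SnpGenotypes = {
    "rs_example_1": ("A", "G"),
    "rs_example_2": ("C", "C"),
    "rs_example_3": ("T", "C"),
}


def shared_markers(profile: StrProfile, genotypes: SnpGenotypes) -> set:
    """Return marker names present in both datasets."""
    return set(profile) & set(genotypes)


if __name__ == "__main__":
    # The two datasets describe different markers, so nothing lines up directly.
    print("markers in common:", shared_markers(str_profile, snp_genotypes))
```

With no markers in common, a CODIS-style profile cannot simply be uploaded to a DTC service; the forensic sample has to be genotyped again for SNPs or, as the research cited above explores, linked statistically.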
Regardless of which approach is used to generate the SNP data, it is critical to consider the quality and quantity of the original forensic DNA. Many forensic samples are low quantity, degraded, or are mixtures of multiple individuals. These factors can contribute to serious errors in genotyping and identification.
Overall, investigative genetic genealogy and law enforcement’s use of DTC genetic databases can be powerful tools for generating investigative leads. However, the complex, changing landscape of protections for DTC customer data and the potential for undisclosed law enforcement searches put the privacy of DTC customers and their relatives at risk. Close scrutiny of the methodology of these searches can help ensure successful outcomes while protecting individuals’ genetic privacy.
[1] Sara H. Katsanis, Pedigrees and Perpetrators: Uses of DNA and Genealogy in Forensic Investigations, 21 Ann. Rev. Genomics & Hum. Genetics 535 (2020), available at https://doi.org/10.1146/annurev-genom-111819-084213; Natalie Ram et al., Genealogy Databases and the Future of Criminal Investigation, 360 Science 1078 (2018), available at https://doi.org/10.1126/science.aau1083.
[2] Christi J. Guerrini et al., Four Misconceptions About Investigative Genetic Genealogy, 8 J. L. & Biosci., Jan.–June 2021, at 1–18, available at https://doi.org/10.1093/jlb/lsab001; Sci. Working Grp. on DNA Analysis Methods (SWGDAM), Overview of Investigative Genetic Genealogy, Feb. 18, 2020 [hereinafter Sci. Working Grp.], https://1ecb9588-ea6f-4feb-971a-73265dbf079c.filesusr.com/ugd/4344b0_6cc9e7c82ccc4fc0b5d10217af64e31b.pdf.
[3] U.S. Dep’t of Just., Interim Policy: Forensic Genetic Genealogical DNA Analysis and Searching, Sept. 2019 [hereinafter DOJ Interim Policy], https://www.justice.gov/olp/page/file/1204386/download.
[4] U.S. Dep’t of Just., Fact Sheet: Prosecuting Cold Cases Using DNA, Nov. 2023, https://bja.ojp.gov/doc/fs-prosecuting-cold-cases-using-dna.pdf.
[5] Sci. Working Grp., supra note 2.
[6] Yaniv Erlich et al., Identity Inference of Genomic Data Using Long-Range Familial Searches, 362 Science 690 (2018), available at https://doi.org/10.1126/science.aau4832.
[7] Erlich et al., supra note 6.
[8] Paige St. John, The Untold Story of How the Golden State Killer Was Found, L.A. Times (Dec. 8, 2020, 5:00 a.m.), https://www.latimes.com/california/story/2020-12-08/man-in-the-window.
[9] DOJ Interim Policy, supra note 3.
[10] Jocelyn Kaiser, A Judge Said Police Can Search the DNA of 1 Million Americans Without Their Consent. What’s Next?, Science: Sci. & Pol’y (Nov. 7, 2019, 2:40 p.m.), available at https://doi.org/10.1126/science.aba1428.
[11] Kashmir Hill & Heather Murphy, Your DNA Profile Is Private? A Florida Judge Just Said Otherwise, N.Y. Times: Business (Nov. 5, 2019), https://www.nytimes.com/2019/11/05/business/dna-database-search-warrant.html.
[12] Search Warrant for GEDmatch, Orlando Police Dep’t (Fla. 9th Jud. Cir. Ct. June 14, 2019), https://assets.documentcloud.org/documents/6547788/Orlando-PD-Search-Warrant-for-GEDMatch.pdf.
[13] Ancestry, Ancestry Transparency Report, Jan. 2021, https://www.ancestry.com/cs/transparency.
[14] DOJ Interim Policy, supra note 3.
[15] Sci. Working Grp., supra note 2.
[16] Jaehee Kim et al., Statistical Detection of Relatives Typed with Disjoint Forensic and Biomedical Loci, 175 Cell 848 (2018), available at https://doi.org/10.1016/j.cell.2018.09.008; Jaehee Kim & Noah Rosenberg, Record-Matching of STR Profiles with Fragmentary Genomic SNP Data, 31 Eur. J. of Hum. Genet. 11 (2023), available at https://doi.org/10.1038/s41431-023-01430-9.