Science Resources: DNA Technologies
Genetic Algorithms Predict an Individual’s Risk for a Specific Disease, Behavior, or Physical Trait
Big data and artificial intelligence are now commonly used to produce predictive algorithms, improve online retailer suggestions, predict voter behaviors, evaluate the risk of violating bail, and suggest the length of criminal sentences. Genetics has also adopted this methodology, and several new genetic algorithms are designed to read an individual’s genetic information and predict, with a level of uncertainty, physical and behavioral traits.
These algorithms are likely to crop up in cases involving medical diagnoses and possibly consumer protection—the Federal Trade Commission has taken action in the past when marketers have over-advertised the predictive power of consumers’ genetic information.[1] It is also possible that these algorithms will be used as tools for forensic identification, and as supplemental evidence during sentencing.
Polygenic Risk Scores
One genetic algorithm that is increasingly being applied in clinical and direct-to-consumer genetics is the polygenic risk score (PRS).[2] A polygenic risk score seeks to identify the extent to which an individual’s genetics contribute to that individual’s risk of developing a complex and polygenic disease, where numerous genetic variants each make a small and additive contribution to the risk for that disease (Fig. 19A,B).[3]
Figure 19. A polygenic risk score (PRS) leverages data from hundreds of genome-wide genetic variants to determine an individual’s relative risk for a disease. A) Genetic variants (represented as stars) are identified in population genetic databases. Some variants are shared among individuals (common) and others are unique (rare). B) An individual may carry hundreds of common and rare genetic variants, some of which will be associated with a specific disease through research studies like GWAS and which contribute different levels of risk for that disease. C) A PRS sums the risk across all of the disease-associated variants an individual carries to calculate that individual’s risk for a disease relative to the overall population. Adapted from https://www.genome.gov/Health/Genomics-and-Medicine/Polygenic-risk-scores.
Polygenic risk scores leverage data from dozens to thousands of genetic variants in an individual’s genome to estimate an individual’s risk for a specific disease (e.g., type 2 diabetes or depression) or behavior (e.g., risk of aggressive behaviors).[4] The variants used to calculate the PRS are chosen from genome-wide association studies that identify SNPs associated with the particular trait (see Databases of Genetic Information: Research, Commercial, and Forensic).[5]
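The additive calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular clinical or commercial test. The variant identifiers, effect sizes, and reference scores are hypothetical placeholders, and treating an unmeasured variant as carrying zero risk alleles is a simplifying assumption.

```python
# Hypothetical per-variant effect sizes (e.g., log odds ratios from a GWAS).
# These identifiers and numbers are illustrative only, not real study results.
EFFECT_SIZES = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}


def polygenic_risk_score(genotype, effect_sizes):
    """Sum (effect size x number of risk alleles carried) over the scored variants.

    `genotype` maps a variant ID to the count of risk alleles (0, 1, or 2).
    Variants missing from `genotype` are treated as 0 (an assumption).
    """
    score = 0.0
    for variant, beta in effect_sizes.items():
        dosage = genotype.get(variant, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {variant}: {dosage}")
        score += beta * dosage
    return score


def percentile(score, reference_scores):
    """Percentage of reference-population scores at or below `score`."""
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return 100.0 * at_or_below / len(reference_scores)


# Example: two copies of the first risk allele, one of the second, none of the third.
person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}
raw_score = polygenic_risk_score(person, EFFECT_SIZES)

# Hypothetical scores from a reference population, used only for ranking.
reference = [0.0, 0.1, 0.19, 0.5]
rank = percentile(raw_score, reference)
```

The raw sum has no meaning on its own. It becomes informative only when ranked against scores from a reference population, which is why the composition of that population matters.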
The PRS does not, however, incorporate nongenetic factors like age, lifestyle, or environment, which also impact an individual’s overall risk of developing a condition. Nongenetic factors may offer opportunities to ameliorate the risk attributed to genetics (e.g., people with a higher PRS for colorectal cancer could modify their diet or undergo more screening).
Note that a PRS provides only a relative risk score: how a person’s risk compares to that of other people (Fig. 19C).[6] The relative risk reported for an individual should not be confused with the absolute risk for that disease, which also depends on how common the disease is in the whole population. If a person’s genetic profile increases their risk for a disease by 50% relative to that of the broader population, but the prevalence of the disease in the whole population is only 0.1%, that individual’s absolute risk is still only 0.15%.
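The arithmetic behind the 0.1% and 0.15% figures can be made explicit. The function below is a simplified sketch: the name `absolute_risk` is a hypothetical helper, and multiplying the population prevalence by the relative increase is a simplification that assumes the prevalence is a fair stand-in for baseline risk.

```python
def absolute_risk(population_prevalence, relative_increase):
    """Scale a baseline prevalence by a relative risk increase.

    Both arguments are fractions: 0.001 means 0.1%, and 0.5 means a
    50% increase relative to the broader population.
    """
    return population_prevalence * (1.0 + relative_increase)


# The example from the text: 0.1% prevalence, 50% higher relative risk.
risk = absolute_risk(0.001, 0.5)
print(f"{risk:.2%}")  # prints 0.15%
```

A 50% relative increase sounds large, but applied to a rare disease it moves the absolute risk by only 0.05 percentage points.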
Polygenic risk scores may appear in the courts in several different situations. In the past, judges have shown willingness to consider evidence indicating a “genetic predisposition” to certain behaviors when sentencing, although these cases exclusively involve evidence offered by the defense in mitigation, rather than by prosecution as an aggravating factor.[7] A PRS for a complex trait, such as tendency toward aggressive behavior or even a clinical disease like schizophrenia, is becoming another approach to argue for enhanced or mitigated sentencing. In civil litigation, one might also see these scores introduced on the topic of causation.
Direct-to-consumer genetic testing services are also beginning to offer PRS estimates to customers.[8] Depending on how these estimates are marketed, they may or may not require regulatory approval. For example, 23andMe launched a PRS for type 2 diabetes in 2019, but because the PRS was marketed as a low-risk “wellness” product, the company did not seek FDA approval. Other 23andMe PRSs are FDA-approved.[9]
In the future, PRSs are also likely to be used in medical care as diagnostic and treatment tools,[10] an approach the National Institutes of Health has begun to test cautiously and in limited capacity.[11] Inappropriate or ineffective application of these algorithms could lead to medical malpractice cases.
Forensic DNA Phenotyping
Law enforcement agencies are also interested in predictive genetic algorithms as a means of genetic phenotyping, or forensic DNA phenotyping. Just as a PRS turns genome-wide information into a prediction of disease risk, genetic phenotyping turns genome-wide information into a prediction of a person’s appearance.
Direct-to-consumer genetic companies already provide “predictions” of a user’s eye color, hair color, or height. These could become powerful forensic tools, turning an unidentified DNA sample into an investigative lead. There have been some notable instances of this approach,[12] and researchers have used whole-genome sequencing data to predict traits like age, height, and face morphology.[13] However, critics argue that the results generate only vague and general predictions of physical features, doing no better than predictions from nongenetic information.[14]
Cautions About Predictive Genetic Algorithms
Predictive genetic algorithms need to be applied cautiously for several methodological reasons. One concern is that the genetic and trait data used to generate these tools are predominantly of European origin, which can compromise the value of their results when these tools are applied to non-European individuals (see Databases of Genetic Information: Research, Commercial, and Forensic).[15] This will be less of a problem as more data from diverse populations are collected, but the diversity of data used to generate and validate these algorithms will always need to be reviewed.
Predictive algorithms are also sensitive to the genetic data to which they are applied. Forensic labs should show extensive validation of their approach for collecting genetic data from forensic DNA samples, which—unlike clinical or research samples—often vary in quality, quantity, and purity.
It is important to underscore that these predictions are probabilistic, not deterministic. Even traits with a strong genetic component (e.g., height) are still influenced by environmental, behavioral, and stochastic factors.
[1] Press Release, Fed. Trade Comm’n, FTC Approves Final Consent Orders Settling Charges that Companies Deceptively Claimed Their Genetically Customized Nutritional Supplements Could Treat Diseases (May 12, 2014), https://www.ftc.gov/news-events/press-releases/2014/05/ftc-approves-final-consent-orders-settling-charges-companies.
[2] Amit V. Khera et al., Genome-Wide Polygenic Scores for Common Diseases Identify Individuals with Risk Equivalent to Monogenic Mutations, 50 Nature Genetics 1219 (2018), available at https://doi.org/10.1038/s41588-018-0183-z; Polygenic Risk Scores, Nat’l Hum. Genome Resch. Inst. (last updated Aug. 11, 2020), https://www.genome.gov/Health/Genomics-and-Medicine/Polygenic-risk-scores.
[3] Nat’l Hum. Genome Resch. Inst., supra note 2.
[4] Khera et al., supra note 2; Katherine L. Musliner et al., Association of Polygenic Liabilities for Major Depression, Bipolar Disorder, and Schizophrenia with Risk for Depression in the Danish Population, 76 JAMA Psychiatry 516 (2019), available at https://doi.org/10.1001/jamapsychiatry.2018.4166; Tom G. Richardson et al., An Atlas of Polygenic Risk Score Associations to Highlight Putative Causal Relationships Across the Human Phenome, 8:e43657 eLife (2019), available at https://doi.org/10.7554/eLife.43657.
[5] Shing Wan Choi et al., Tutorial: A Guide to Performing Polygenic Risk Score Analyses, 15 Nature Protocols 2759 (2020), available at https://doi.org/10.1038/s41596-020-0353-1.
[6] Nat’l Hum. Genome Resch. Inst., supra note 2.
[7] Nita A. Farahany, Neuroscience and Behavioral Genetics in U.S. Criminal Law: An Empirical Analysis, 2 J. L. & Biosci. 485 (2016), available at https://doi.org/10.1093/jlb/lsv059.
[8] Antonio Regalado, 23andMe Thinks Polygenic Risk Scores Are Ready for the Masses, but Experts Aren’t So Sure, MIT Tech. Rev. (Mar. 8, 2019), https://www.technologyreview.com/2019/03/08/136730/23andme-thinks-polygenic-risk-scores-are-ready-for-the-masses-but-experts-arent-so-sure/.
[9] Tara Goodin, FDA Allows Marketing of First Direct-to-Consumer Tests That Provide Genetic Risk Information for Certain Conditions, FDA News Release (Apr. 6, 2017), available at https://www.fda.gov/news-events/press-announcements/fda-allows-marketing-first-direct-consumer-tests-provide-genetic-risk-information-certain-conditions.
[10] Benjamin Cross, Richard Turner & Munir Pirmohamed, Polygenic Risk Scores: An Overview from Bench to Bedside for Personalised Medicine, 13 Frontiers in Genetics 1000667 (2022), available at https://doi.org/10.3389/fgene.2022.1000667.
[11] Prabarna Ganguly, Press Release, NIH Funds Centers to Improve the Role of Genomics in Assessing and Managing Disease Risk, Nat’l Hum. Genome Resch. Inst. (July 1, 2020), available at https://www.genome.gov/news/news-release/NIH-funds-centers-to-improve-role-of-genomics-in-assessing-and-managing-disease-risk.
[12] Samuel Hodge, Current Controversies in the Use of DNA in Forensic Investigations, 48 Univ. Balt. L. Rev. 39 (2018), available at https://scholarworks.law.ubalt.edu/ublr/vol48/iss1/3/; Michele Casey Van Laan, The Genetic Witness: Forensic DNA Phenotyping, 2 J. Emerging Forensic Sci. Resch. 33 (2017); Erin Murphy, Forensic DNA Typing, 1 Ann. Rev. Criminology 497 (2018), available at https://doi.org/10.1146/annurev-criminol-032317-092127.
[13] Christoph Lippert et al., Identification of Individuals by Trait Prediction Using Whole-Genome Sequencing Data, 114 Proc. Nat’l Acad. Sci. U.S. 10166 (2017), available at https://doi.org/10.1073/pnas.1711125114.
[14] Yaniv Erlich, Major Flaws in “Identification of Individuals by Trait Prediction Using Whole-Genome Sequencing Data”, BioRxiv (Sept. 7, 2017), available at https://doi.org/10.1101/185330.
[15] L. Duncan et al., Analysis of Polygenic Risk Score Usage and Performance in Diverse Human Populations, 10 Nature Commc’ns 1 (2019), available at https://doi.org/10.1038/s41467-019-11112-0; Alicia R. Martin et al., Human Demographic History Impacts Genetic Risk Prediction Across Diverse Populations, 100 Am. J. Hum. Genetics 635 (2017), available at https://doi.org/10.1016/j.ajhg.2017.03.004; Alice B. Popejoy & Stephanie M. Fullerton, Genomics Is Failing on Diversity, 538 Nature 161, 161–164 (2016), available at https://doi.org/10.1038/538161a.