Science Resources: DNA Technologies

Population Biobanks Link Genetic, Demographic, and Trait Information

Population Biobanks

Population biobanks are critical tools in health research that consist of large collections of genetic data and biological samples linked with participants’ demographic, health, and trait information, often through their electronic health records. Biobanks are usually meant to reflect the general public and are not constructed to represent a particular trait or disease, as might be the case in a GWAS cohort. They are meant to serve as a shared resource for multiple independent studies and researchers (Fig. 14).

Figure 14. Diagram of a biobank research system, comprising participants, primary researchers, the biobank, and secondary researchers. Participants are recruited, often through local clinics and doctors but sometimes directly from the biobank, to provide biological samples and health information—including ongoing access to electronic health records. The data are collected, curated, and analyzed as a collective. Secondary researchers, who may not have direct access to patients or participants, can request data and analyses from the biobank after receiving permission from oversight organizations.

Biobank participants often give broad consent at the time of sample collection for their data to be used by multiple studies and researchers focused on health and disease. Participants are not asked to consent to non-research use, such as access for marketing or law enforcement purposes.

Population biobanks may be part of a national project linked to a national health service, or they may be part of an institutional effort led by a large hospital network or commercial group. For example, the United States initiated a national biobank effort in 2015 called All of Us, led by the National Institutes of Health.[1] The effort aims to collect health and genetic data from one million diverse participants across the United States to facilitate research and personalized medicine. As of 2024, All of Us has already fully enrolled over 500,000 participants. Similar projects include the UK Biobank, which has data from over 500,000 participants in the United Kingdom’s national health care system,[2] and deCODE Genetics, which is funded by a commercial pharmaceutical company.[3]

The types of data within population biobanks vary and may change over time. For example, the genetic data collected for All of Us include both genome-wide genotype data and whole-genome sequence data. The UK Biobank began with genome-wide genotype data and is now collecting whole-genome sequence data.

Because they link genetic data with health, demographic, and personally identifiable information, population biobanks carry a high risk of participant reidentification, exposure, and discrimination. Access to biobank data is, however, subject to strict oversight and gatekeeping by the national health institutions that manage the biobanks, by IRBs at smaller institutional biobanks, and by the research community.

Certificates of Confidentiality Protect Research-Participant Information

Large population biobanks and research databases contain genetic data linked with personally identifiable information, including names, dates of birth, residency, and highly sensitive information regarding health and behavior. These genetic databases can be a valuable and important tool in criminal investigations. However, researchers and others are concerned that the use of these databases to compare forensic DNA samples or link individuals with certain behaviors and traits may violate privacy and inhibit valuable research.[4] Certificates of Confidentiality seek to protect research-participant information.

Research databases and information about research participants are shielded from law enforcement searches and compelled disclosure by Certificates of Confidentiality (Certificates). Since the Comprehensive Drug Abuse Prevention and Control Act of 1970 and its subsequent amendments, the Department of Health and Human Services (HHS) has issued Certificates to biomedical, behavioral, clinical, and other researchers.[5]

Importantly, Certificates provide protections that run counter to traditional discovery rules, which makes them unusual and may lead to confusion for judges and lawyers unfamiliar with their structure.

Certificates explicitly protect researchers against compelled disclosure of identifying information about research participants in any civil, criminal, administrative, legislative, or other proceedings. Additionally, the 21st Century Cures Act has significantly strengthened the statutory language around compelled disclosure of data, indicating that the covered information is immune from legal process and not admissible as evidence.[6]

Since 2017, all NIH-funded projects using identifiable, sensitive information are automatically issued Certificates. Certificates of Confidentiality are also granted through other HHS agencies and may be requested by researchers for studies not federally funded.

Although Certificates of Confidentiality offer broad and explicit protections, the different circumstances in which genetic data are collected and shared can complicate these protections, and confusion and hesitancy around these protections can lead to inappropriate disclosure of information.[7]

For example, Certificates are issued to the research institution, not the individual researcher. Though the Certificate granted to the parent institution of a multisite study covers all other sites, in situations where researchers share data across institutions, judges may find that an institution that has received a demand for disclosure or has been ordered by the court to disclose data is not aware of the Certificate, or is aware but unwilling to defend it.[8] If research institutions are unwilling to challenge a court-ordered disclosure, the researchers who run the study may end up in a position where they are forced to comply.[9]

Despite the Certificates’ explicit language, there are only a few cases in which they have been presented as definitive protections for study data; more often they are presented as one aspect among many to be considered.[10] There is very little case law that clearly demonstrates the degree to which courts consider Certificates to protect research-participant data.

Although the 21st Century Cures Act has strengthened protections in some ways, it now permits disclosure of information as required by federal, state, and local laws—except for use in legal proceedings.[11] These disclosures are intended to comply with mandatory public health reporting, but the extent of required disclosures could be interpreted more broadly.

In instances where courts permit disclosing information, they can limit the extent of that disclosure. For example, in Murphy v. Philip Morris Inc., the judge restricted attempts to reidentify study participants from anonymized data, limited the use of information to a particular case, and limited disclosure of information to only certain individuals.[12] In North Carolina v. Bradley, the court ordered that research records be maintained as a sealed record in case of later appeal, limited dissemination of the material to defense and state counsel, and required arguments based on their contents to be made in a separate sealed brief or addendum.[13] The uneven landscape of protections for research-participant data can be confusing for institutions, researchers, law enforcement, and judges, and research participants risk exposure and discrimination if care is not taken.

Lastly, the variable landscape of protections can also be confusing for research participants. A research participant may have a general expectation, from reviewing the research study’s consent form and discussions with research staff, that their data are protected. The full extent of this protection under a Certificate of Confidentiality, however, may be less clear.

Specifically, genetic research data are increasingly being included in individuals’ medical records to enhance both research and medical care.[14] Certificate protections technically apply to all copies of the research data, even copies moved to medical records.[15] But practically speaking, it is unlikely that those responding to a request for medical records, which are not protected from compelled disclosure, would be aware that the included research data are protected by a Certificate of Confidentiality. Certificate protections would require that the research-data portion of the medical record be treated differently from the rest of the medical record.[16] The transfer of identical genetic data from research study to medical record complicates expectations about privacy and legal protections and puts participants at risk of exposure and discrimination.

 

[1] Francis S. Collins & Harold Varmus, A New Initiative on Precision Medicine, 372 New Eng. J. Med. 793 (2015), available at https://doi.org/10.1056/nejmp1500523; All of Us Research Program Investigators, The “All of Us” Research Program, 381 New Eng. J. Med. 668 (2019), available at https://doi.org/10.1056/NEJMsr1809937.

[2] Clare Bycroft et al., The UK Biobank Resource with Deep Phenotyping and Genomic Data, 562 Nature 203 (2018), available at https://doi.org/10.1038/s41586-018-0579-z.

[3] Hakon Hakonarson et al., deCODE genetics, Inc., 4 Pharmacogenomics 209 (2003), available at https://doi.org/10.1517/phgs.4.2.209.22627.

[4] Leslie E. Wolf & Laura M. Beskow, Genomic Databases, Subpoenas, and Certificates of Confidentiality, 21 Genetics in Med. 2681 (2019), available at https://doi.org/10.1038/s41436-019-0592-0.

[5] Policy & Compliance: Certificates of Confidentiality (CoC)—Human Subjects, NIH Grants & Funding, https://grants.nih.gov/policy/humansubjects/coc.htm (last visited February 1, 2024).

[6] Leslie E. Wolf & Laura M. Beskow, New and Improved? 21st Century Cures Act Revisions to Certificates of Confidentiality, 44 Am. J. L. & Med. 343 (2018), available at https://doi.org/10.1177/0098858818789431.

[7] Leslie E. Wolf et al., Certificates of Confidentiality: Protecting Human Subject Research Data in Law and Practice, 43 J. L. Med. & Ethics 594 (2015), available at https://doi.org/10.1111/jlme.12302.

[8] Id.

[9] Laura M. Beskow et al., Certificates of Confidentiality and Compelled Disclosure of Data, 322 Science 1054 (2008), available at https://doi.org/10.1126/science.1164100.

[10] L. E. Wolf et al., supra note 7.

[11] Wolf & Beskow, supra note 4.

[12] Murphy v. Philip Morris Inc., No. CV 99-7155-RAP (JWJx), 2000 U.S. Dist. LEXIS 21128 (C.D. Cal. Mar. 17, 2000). See generally L. E. Wolf et al., supra note 7.

[13] North Carolina v. Bradley, 179 N.C. App. 551, 634 S.E.2d 258 (2006). See generally Beskow et al., supra note 9; L. E. Wolf et al., supra note 7.

[14] Leslie E. Wolf & Laura M. Beskow, Certificates of Confidentiality: Mind the Gap, Utah L. Rev. (2021); Susan M. Wolf et al., Managing Incidental Findings and Research Results in Genomic Research Involving Biobanks and Archived Data Sets, 14 Genetics in Med. 361 (2012), available at https://doi.org/10.1038/gim.2012.23.

[15] Wolf & Beskow, supra note 6; Wolf & Beskow, supra note 14.

[16] Wolf & Beskow, supra note 14.