Science Resources: DNA Technologies
DNA Sequence Variation
Variation in Chromosome Number
Despite the remarkable variation in health, appearance, and behavior observed between people, about 99.9% of the DNA sequence is identical between individuals. Of the 0.1% of DNA that differs, only some sequences are responsible for the diversity we observe. Other variation is due to differences in the epigenome, the collection of biological marks and molecules that help regulate DNA without changing the DNA sequence itself. In general, DNA variation arises from the insertion or deletion of a base, or from the substitution of one base for another (e.g., a C is substituted for an A). Sometimes these changes are quite large, such as the duplication of an entire chromosome. For example, Down syndrome is caused by a chromosomal anomaly in which a person inherits three copies of chromosome 21 instead of the typical two copies. This chromosomal abnormality is known as trisomy 21 (Fig. 4A).
Figure 4. DNA variation occurs at multiple levels of scale, affecting whole chromosomes and single nucleotide bases. A) An individual’s karyotype shows they possess three copies of chromosome 21 instead of the normal two copies. B) The expansion of the huntingtin gene’s CAG-repeat leads to a malformed huntingtin protein and the onset of Huntington’s disease. The expanded huntingtin gene is caused by duplications of the CAG-repeat in tandem. C) Single nucleotide variants (SNVs) and single nucleotide polymorphisms (SNPs, pronounced "snips") create sequence variation between individuals at specific sites in the genome. There are millions of SNVs and SNPs throughout an individual's genome, providing the sequence variation that results in differences in human appearance, health, and behavior.
Variation in Chromosome Structure
Insertions or deletions may also occur at the scale of a single gene or a region of several genes, referred to as gene copy number variation (CNV). For example, duplications of oncogenes (genes that can drive uncontrolled cell growth) are common in many forms of cancer. One such example is the duplication of the gene HER2, which causes overproduction of the HER2 protein and is responsible for a subset of breast cancers. Duplications may also affect much smaller sections of DNA. In Huntington’s disease, for instance, the excessive duplication of a three-nucleotide repeat in the huntingtin gene causes the final protein to be unstable and break apart into toxic fragments (Fig. 4B).
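To make the idea of a tandem repeat expansion concrete, the short Python sketch below counts the longest uninterrupted run of a three-nucleotide motif in a DNA string. The function name `longest_tandem_repeat` and both toy sequences are hypothetical illustrations, not real huntingtin gene data, and real repeat sizing is done with laboratory assays rather than simple string matching.

```python
import re


def longest_tandem_repeat(sequence: str, motif: str = "CAG") -> int:
    """Return the number of back-to-back copies in the longest run of `motif`."""
    # Match one or more consecutive copies of the motif.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [m.group(0) for m in pattern.finditer(sequence.upper())]
    # Each run's length divided by the motif length gives its copy number.
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical toy sequences: the same flanking bases with a short
# and a longer CAG repeat tract between them.
typical = "ATG" + "CAG" * 5 + "CCGCCA"
expanded = "ATG" + "CAG" * 12 + "CCGCCA"

print(longest_tandem_repeat(typical))   # 5
print(longest_tandem_repeat(expanded))  # 12
```

The two sequences differ only in how many times the motif is duplicated in tandem, which is the kind of change illustrated in Fig. 4B.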
Noncoding sequence may also be affected by duplications or deletions. Short tandem repeats (STRs) consist of a short motif of DNA, two to seven bases in length, that is repeated tens or hundreds of times. The exact number of repeats for a specific STR varies between individuals. There are numerous STRs within the human genome. Because STR lengths are easily measured, and the combination of lengths across STRs in an individual is highly specific, STR testing has become the standard for forensic DNA identification. The STRs chosen for forensic identification are all noncoding sequences, so that they cannot be used to infer features of the individual relating to their health, appearance, or behavior.
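The reason a combination of STR lengths is so specific can be sketched in a few lines of Python. The example below assumes each person carries two alleles per STR (one inherited from each parent) and records each allele as a repeat count. The locus names, repeat counts, and function names are all hypothetical, and actual forensic comparison also involves statistical weighting that is not shown here.

```python
from typing import Dict, Tuple

# A profile maps each STR locus to the repeat counts of its two alleles.
# Locus names and counts are made up for illustration.
Profile = Dict[str, Tuple[int, int]]


def matching_loci(a: Profile, b: Profile) -> int:
    """Count shared loci where both profiles carry the same pair of repeat counts."""
    shared = a.keys() & b.keys()
    # Sort each pair so that allele order does not matter.
    return sum(1 for locus in shared if sorted(a[locus]) == sorted(b[locus]))


def profiles_match(a: Profile, b: Profile) -> bool:
    """Two profiles match only if every shared locus agrees."""
    shared = a.keys() & b.keys()
    return bool(shared) and matching_loci(a, b) == len(shared)


reference_sample = {"locus_1": (12, 14), "locus_2": (7, 9), "locus_3": (15, 15)}
sample_a = {"locus_1": (14, 12), "locus_2": (7, 9), "locus_3": (15, 15)}
sample_b = {"locus_1": (12, 14), "locus_2": (8, 9), "locus_3": (15, 16)}

print(profiles_match(reference_sample, sample_a))  # True
print(matching_loci(reference_sample, sample_b))   # 1
```

A single locus can agree between unrelated people by chance, as with `locus_1` in `sample_b`, but agreement across every locus tested becomes increasingly unlikely as more STRs are added.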
Broadly, the types of genetic variation like CNVs, STRs, and smaller insertions or deletions of DNA sequence are known as forms of structural variation. This distinguishes them from other forms of variation that affect single DNA base pairs.
Variation in DNA Base Pairs
DNA variations at the level of single DNA base pairs, such as the substitution of one nucleotide for another, are referred to as single nucleotide variants (SNVs) (Fig. 4C). SNVs may arise spontaneously in human cells or be caused by external insults, like UV radiation. They may be distinct to an individual or passed on to offspring and spread through future generations. If an SNV is passed on to future generations and becomes frequent in the population (for example, 90% of people carry an A at a site and 10% of people carry a C), it is considered a single nucleotide polymorphism (SNP, pronounced “snip”). SNPs are the most common form of genetic variation between individuals. A single individual may carry between 4 and 5 million SNVs that differ from the human reference genome, and hundreds of millions of SNPs have been identified across all human populations.
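The distinction between an SNV and an SNP can be illustrated with a small Python sketch. It assumes the sample and reference sequences are already aligned and of equal length, and it uses a 1% minor allele frequency cutoff for "frequent", which is a commonly used convention rather than a figure stated in this article. All function names and sequences are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def find_snvs(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """List (0-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def minor_allele_frequency(alleles: Sequence[str]) -> float:
    """Frequency of the second most common base observed at one site."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # Only one base observed: no variation at this site.
    return counts.most_common(2)[1][1] / sum(counts.values())


def is_snp(alleles: Sequence[str], threshold: float = 0.01) -> bool:
    """Assumed convention: a site is an SNP if the minor allele reaches the threshold."""
    return minor_allele_frequency(alleles) >= threshold


print(find_snvs("ACGTACGT", "ACGTCCGT"))  # [(4, 'A', 'C')]

common_site = ["A"] * 90 + ["C"] * 10  # the 90% A / 10% C example from the text
rare_site = ["A"] * 999 + ["C"]        # a variant seen in one of 1,000 samples

print(is_snp(common_site))  # True
print(is_snp(rare_site))    # False
```

Under these assumptions, every SNP is an SNV, but a variant seen in only a handful of individuals remains an SNV without qualifying as an SNP.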
Most SNVs occur in the noncoding sequence that makes up most of the genome. SNVs in non-regulatory, noncoding regions can be highly informative about a person's relationship to other individuals. Commercial genetic genealogy and genetic ancestry services make use of these SNVs to link customers to relatives and ancestral regions. Other SNVs occur in noncoding, regulatory sequences of DNA and affect the expression of genes, altering when, for how long, and at what level a gene is turned on or off. The few SNVs that fall in the coding regions of genes may also affect the function of the gene and its corresponding protein, altering how well the protein works or whether it works at all.