Science Resources: DNA Technologies
DNA Basics: Nucleotides, Genes, and the Genome
Deoxyribonucleic acid (DNA) is a complex molecule that contains the genetic code of an organism. It acts as an instruction manual for each individual cell, and it enables the coordinated functioning of a complex organism such as a human. If you imagine any given cell in your body as a factory, the DNA in that cell is the factory’s operating manual, describing what products to make and how and when to make them.
DNA itself is made up of four chemical building blocks, referred to as nucleotide bases (Fig. 1). The bases are adenine, thymine, guanine, and cytosine (abbreviated A, T, G, and C). The bases are strung together in varying combinations, forming two complementary strands in which an A on one strand is always opposite a T on the other strand, and a C is always opposite a G. The strands are held together by hydrogen bonds between the complementary bases, forming base pairs (bp), and they twist around each other into a ladder-like double helix. The order of the bases can be read sequentially, like words in a sentence, to decipher the instructions contained in the DNA. Because the two strands of DNA are complementary, reading the sequence of one strand also reveals the sequence of the opposite strand.
Figure 1. DNA is composed of nucleotide bases. The DNA double helix is formed by two reverse-complementary strands of DNA, each composed of a sequence of nucleotide bases attached to a sugar-phosphate backbone. The strands are complementary because adenine pairs with thymine and cytosine pairs with guanine.
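Because each base has exactly one partner, the sequence of the opposite strand can be computed directly from the sequence of one strand. The short Python sketch below applies the pairing rules from Figure 1; the function names `complement` and `reverse_complement` are illustrative, not part of any particular library. Since the two strands are reverse-complementary, the partner strand is conventionally written as the complement read backwards.

```python
# Pairing rules from Figure 1: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base opposite each position of `strand`, in the same order.

    Raises KeyError for any character other than A, C, G, or T.
    """
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own direction.

    The strands run in opposite directions, so the partner is the
    complement of `strand` reversed.
    """
    return complement(strand)[::-1]


top = "ATGCGTTA"
print(complement(top))          # base opposite each position: TACGCAAT
print(reverse_complement(top))  # partner strand in its own direction: TAACGCAT
```

Applying `reverse_complement` twice returns the original strand, which mirrors the fact that either strand fully determines the other.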
Genes, Coding Sequence, and Regulatory Sequence
DNA contains genes, which include both coding sequences and regulatory sequences. A gene is a segment of DNA that provides instructions for making a specific protein, a molecule that serves a function in your body (Fig. 2). For example, the gene ADH1A codes for the protein alcohol dehydrogenase 1A, one of several alcohol dehydrogenase proteins in humans that are essential for breaking down alcohol in the body.
Figure 2. DNA is organized into genes, which code for proteins that act together to perform cellular functions affecting physical traits. Only a small portion (~2%) of the total DNA is coding sequence that produces proteins. The rest of the genome, interspersed within and between genes, is noncoding sequence. A large portion of noncoding sequence acts as regulatory DNA, controlling when, how much of, and what form of the protein is made.
Humans have about 23,000 protein-coding genes. The location of a gene in the DNA, known as its locus, is consistent between individuals. To function correctly, a gene must have a very specific DNA sequence. Sometimes slight differences in a gene’s sequence occur and still allow the gene to function, contributing to the genetic variation between individuals. Other times, a difference seriously compromises the gene’s function and can lead to disease. The different forms of the same gene are referred to as alleles. More generally, locus can refer to any specific site in the DNA whose sequence varies between individuals, and allele to each version of the sequence found at that site.
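To make the idea of alleles concrete, the sketch below compares two versions of a sequence at the same locus and reports where they differ. The sequences are made up for illustration, and `variant_positions` is a hypothetical helper. It assumes both alleles have the same length and differ only by single-base substitutions; variants that insert or delete bases would first require aligning the two sequences.

```python
def variant_positions(allele_a: str, allele_b: str) -> list[int]:
    """Return 1-based positions where two same-length allele sequences differ.

    Hypothetical helper: handles single-base substitutions only.
    """
    if len(allele_a) != len(allele_b):
        raise ValueError("this sketch only compares sequences of equal length")
    return [i + 1 for i, (a, b) in enumerate(zip(allele_a, allele_b)) if a != b]


# Made-up alleles at the same locus (not real gene data).
allele_1 = "ATGGCTAAC"
allele_2 = "ATGGCCAAC"
print(variant_positions(allele_1, allele_2))  # [6]: T in allele_1, C in allele_2
```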
Within a gene, the sequence itself can be divided into coding and noncoding sequence (Fig. 2). Coding sequences of DNA are read by the cell and used to make a protein. Although this coding sequence is essential, it makes up only about 2% of the total DNA in a cell. The remaining 98% of the DNA is noncoding sequence, interspersed within and between genes.
Noncoding sequence was once pejoratively called junk DNA, but this is a misnomer. In fact, it is estimated that between 25% and 80% of noncoding sequence acts as essential regulatory DNA. Along with the epigenome (the set of biochemical marks and molecules that exists in and around DNA), regulatory DNA controls when, how much of, and what form of a protein is made from a gene. Regulatory DNA may be located near the gene it regulates or very far from it. The gene ADH1A, for example, codes for an alcohol dehydrogenase protein that is made only in the liver, and only during fetal development and infancy. The timing and specificity of the protein’s production are controlled by the epigenome and noncoding regulatory DNA.
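Applying the percentages from the two paragraphs above to one set of chromosomes (about three billion base pairs) gives a rough sense of scale. This is a back-of-envelope sketch: every input is an approximation quoted in the text, so the results are order-of-magnitude figures only. Note that the 25% to 80% regulatory estimate applies to the noncoding portion, not to the whole genome.

```python
GENOME_BP = 3_000_000_000  # approximate size of one set of chromosomes, in base pairs

coding_fraction = 0.02     # about 2% of DNA is coding sequence
coding_bp = GENOME_BP * coding_fraction
noncoding_bp = GENOME_BP * (1 - coding_fraction)

# Estimated share of noncoding sequence acting as regulatory DNA: 25% to 80%.
regulatory_low = noncoding_bp * 0.25
regulatory_high = noncoding_bp * 0.80

print(f"coding:     ~{coding_bp / 1e6:,.0f} million bp")
print(f"noncoding:  ~{noncoding_bp / 1e6:,.0f} million bp")
print(f"regulatory: ~{regulatory_low / 1e6:,.0f} to {regulatory_high / 1e6:,.0f} million bp")
```

The coding portion works out to roughly 60 million base pairs, while the estimated regulatory portion spans roughly 735 million to 2.35 billion base pairs, which shows how wide the uncertainty in that estimate is.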
DNA, Chromosomes, and the Genome
A human cell’s DNA is predominantly stored in a special subunit of the cell called the nucleus (Fig. 3), which is like the command center for the cell. Some DNA is stored in another cellular subunit, the mitochondrion (Fig. 3), which acts like the power plant for the cell. As cells break down during their natural life cycle, they can also release DNA fragments that float freely in the body before being degraded or excreted.
Figure 3. All cells in the body carry an entire, identical genome, organized into chromosomes. Nuclear DNA is organized into 23 pairs of chromosomes, numbered by size, and stored in the nucleus of the cell. The entire human genome is three billion base pairs in size. Mitochondrial DNA, only about 17,000 base pairs in size, is stored within the mitochondria. Thousands of mitochondria, all containing identical mitochondrial genomes, are present in a single cell.
The nuclear DNA in a single human cell totals about six billion base pairs, since the cell holds two sets of chromosomes of roughly three billion base pairs each. Laid out end to end, it would be about six feet (roughly two meters) long. To fit all of this DNA inside the nucleus, each cell packs it into compact, tightly organized structures called chromosomes.
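The length estimate can be checked with simple arithmetic. This sketch assumes the commonly cited spacing of about 0.34 nanometers between adjacent base pairs in the double helix, and it counts both sets of chromosomes in the cell (one inherited from each parent); the roughly six-foot figure corresponds to both sets together.

```python
BP_PER_SET = 3_000_000_000  # one set of 23 chromosomes, in base pairs
SETS_PER_CELL = 2           # one set from each parent
RISE_PER_BP_M = 0.34e-9     # assumed ~0.34 nm between adjacent base pairs
METERS_PER_FOOT = 0.3048

total_bp = BP_PER_SET * SETS_PER_CELL
length_m = total_bp * RISE_PER_BP_M
length_ft = length_m / METERS_PER_FOOT

print(f"{total_bp:,} bp -> {length_m:.2f} m ({length_ft:.1f} ft)")
# 6,000,000,000 bp -> 2.04 m (6.7 ft)
```

A single set of three billion base pairs would be only about one meter long, so the two-meter figure depends on counting both inherited sets.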
Each human cell (except for sperm and egg cells) carries two sets of 23 chromosomes (Fig. 3), each set totaling about 3 billion base pairs. One set is inherited from the father, and one set is inherited from the mother. The chromosomes are numbered from 1 to 22 in order of size (chromosome 1 being the largest), with the additional sex chromosomes X and Y. A genetically female individual typically carries two X chromosomes, and a genetically male individual typically carries one X and one Y chromosome. Unlike other cells, sperm and egg cells carry only a single set of 23 chromosomes.
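The chromosome bookkeeping described above can be modeled as a toy data structure. This sketch tracks chromosome labels only, with no sequence, and the helper names `chromosome_set` and `body_cell` are hypothetical. It shows how a single set of 23 (as in an egg or sperm cell) combines with another set to give the 46 chromosomes of a typical body cell.

```python
AUTOSOMES = [str(n) for n in range(1, 23)]  # chromosomes 1-22, numbered by size


def chromosome_set(sex_chromosome: str) -> list[str]:
    """One set of 23 chromosomes: autosomes 1-22 plus one sex chromosome."""
    if sex_chromosome not in ("X", "Y"):
        raise ValueError("sex chromosome must be 'X' or 'Y'")
    return AUTOSOMES + [sex_chromosome]


def body_cell(from_mother: list[str], from_father: list[str]) -> list[str]:
    """A typical body cell carries both inherited sets: two sets of 23."""
    return from_mother + from_father


egg = chromosome_set("X")    # an egg carries a single set, typically with an X
sperm = chromosome_set("Y")  # a sperm carries a single set, with an X or a Y
cell = body_cell(egg, sperm)

print(len(egg), len(cell))                         # 23 46
print(sorted(c for c in cell if c in ("X", "Y")))  # ['X', 'Y']
```

Passing `chromosome_set("X")` as the father’s contribution instead would model the typical XX pattern.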
The totality of an individual’s DNA (all 23 pairs of chromosomes) is referred to as that individual’s genome. Historically, it was difficult for scientists to study an individual’s complete genome, so they focused on characterizing single genes, the domain of genetics. Today, new technologies allow scientists to examine the entire genome and the interrelationships among genes, a field known as genomics. In practice, the terms genetics and genomics are often used interchangeably.