Detecting Natural Selection (Part 2)
The Organization of the Genes
This is the third of multiple postings I plan to write about detecting natural selection using molecular data (i.e., DNA sequences). The first posting contained a brief introduction and can be found here. The second post described the organization of the genome.
In the last entry I mentioned that the term gene is often used interchangeably with protein coding sequence. In this entry, I will describe the structure of protein coding genes. Only a portion of the gene contains protein coding sequences. We will divide the gene into multiple parts: exons, introns, upstream sequence (the 5’ flanking region), and downstream sequence (the 3’ flanking region). The exons contain the protein coding sequence, and they are separated by introns. Both the introns and the exons are transcribed into RNA. The introns are then spliced out to make messenger RNA (mRNA), and the coding sequence of the mRNA is translated into a protein.
(Note: The majority of life on earth is prokaryotic, and prokaryotic genes generally do not contain introns.)
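To make the transcription and splicing steps concrete, here is a minimal sketch in Python. The toy gene, its exon coordinates, and the function names `transcribe` and `splice` are all made up for illustration; real genes are far longer and splice sites are recognized by sequence signals rather than handed over as coordinates.

```python
# Toy example only: the sequence and exon coordinates below are invented.

def transcribe(dna):
    """Copy the coding strand into RNA by swapping T for U."""
    return dna.replace("T", "U")

def splice(pre_mrna, exons):
    """Join the exon segments, dropping everything in between (the introns).

    `exons` is a list of (start, end) half-open intervals, in order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

# exon 1 (6 nt) + intron (12 nt) + exon 2 (6 nt)
gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
exons = [(0, 6), (18, 24)]

pre_mrna = transcribe(gene)     # introns and exons are both transcribed
mrna = splice(pre_mrna, exons)  # introns removed, exons joined

print(pre_mrna)  # AUGGCCGUAAGUUUUCAGAAAUGA
print(mrna)      # AUGGCCAAAUGA
```

The spliced mRNA is what gets read, three nucleotides at a time, during translation.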
The region upstream of the gene usually contains non-coding sequences that control when and how the coding sequences are transcribed into mRNA (the introns and downstream regions may also contain transcriptional regulatory regions). For more on the regulation of transcription check out this site. For the rest of this discussion, we will refer to two types of sequences: non-coding (introns and upstream and downstream regions) and protein coding.
The protein coding sequence of a gene is made up of sets of three nucleotides called codons. When the mRNA transcribed from a gene is translated into a protein, each codon encodes a single amino acid. Just as nucleotides are the building blocks of DNA, amino acids are the building blocks of proteins. There are 64 different three-nucleotide combinations (4 different nucleotides in combinations of 3, or 4^3 = 64), but there are only 20 amino acids. Three of the 64 codons are stop signals that end translation, which still leaves 61 codons for 20 amino acids. That means that some amino acids are encoded by multiple codons. We refer to this as the redundancy of the genetic code.
The U’s in this figure are the RNA equivalent of the T’s in DNA.
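The codon arithmetic can be checked with a short Python sketch. It builds the standard genetic code from a compact 64-letter string of one-letter amino acid codes (with `*` marking stop codons), ordered with the bases as U, C, A, G. The names `CODON_TABLE` and `translate` are my own choices for this illustration, not part of any library.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the standard genetic code, in UCAG order.
# '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Read an mRNA three nucleotides at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

codons_per_aa = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))            # 64
print(codons_per_aa["L"])          # 6 codons for leucine
print(codons_per_aa["M"])          # 1 codon for methionine
print(translate("AUGGCCAAAUGA"))   # MAK
```

Leucine, with six codons, and methionine, with only one, show how unevenly the redundancy is spread across amino acids.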
The three nucleotides in a codon can each be referred to by their position in the codon: first, second, and third. Because of the redundancy of the genetic code, some codons encode the same amino acid as other codons. The codons that encode the same amino acid tend to have the same first and second nucleotide, but differ at the third codon position. For this reason, many mutations of the third codon position do not lead to a change in the amino acid encoded by the codon.
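We can check the claim about the third position directly. The sketch below tries every possible single-nucleotide change in every amino-acid-encoding codon of the standard genetic code and tallies, for each codon position, how many changes leave the amino acid unchanged. It uses DNA letters (T instead of U). Two choices here are my own assumptions: stop codons are skipped as starting points, and a change that creates a stop codon is counted as changing the amino acid.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def synonymous_counts_by_position():
    """Return (synonymous, total) single-nucleotide changes per codon position."""
    syn = [0, 0, 0]
    total = [0, 0, 0]
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # start only from codons that encode an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a change
                mutant = codon[:pos] + base + codon[pos + 1:]
                total[pos] += 1
                if CODON_TABLE[mutant] == aa:
                    syn[pos] += 1
    return syn, total

syn, total = synonymous_counts_by_position()
for pos in range(3):
    share = syn[pos] / total[pos]
    print(f"position {pos + 1}: {syn[pos]} of {total[pos]} changes keep the amino acid ({share:.0%})")
```

Under these assumptions the third position comes out far ahead of the first, and no change at the second position keeps the amino acid the same.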
Mutations that do not lead to a change in the amino acid encoded by a codon are known as synonymous. Conversely, those that lead to a change in amino acid are called non-synonymous. Next time we will discuss how we can compare synonymous and non-synonymous differences between two coding sequences to infer natural selection.
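As a small preview of what the definitions mean in practice, here is a sketch that labels the differences between two aligned coding sequences as synonymous or non-synonymous. It is deliberately naive: it compares whole codons and counts each differing codon once, so it is not one of the proper estimation methods. The function name `classify_differences` and the two example sequences are invented for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_differences(seq_a, seq_b):
    """Count differing codons as (synonymous, non_synonymous).

    Assumes both sequences are aligned, in frame, and the same length.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be the same length and a multiple of 3")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid, different codon
        else:
            nonsyn += 1   # amino acid changed
    return syn, nonsyn

# Made-up sequences: GCC -> GCT keeps alanine, AAA -> AGA turns lysine into arginine.
seq_a = "ATGGCCAAATTT"
seq_b = "ATGGCTAGATTT"
print(classify_differences(seq_a, seq_b))  # (1, 1)
```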