Saturday, December 24, 2005

Detecting Natural Selection (Part 5)

Allele and Genotype Frequencies in Populations

This is the sixth of multiple postings I plan to write about detecting natural selection using molecular data (i.e., DNA sequences). The first post contained a brief introduction and can be found here. The second post described the organization of the genome, and the third described the organization of genes. The fourth post described codon-based models for detecting selection, and the fifth detailed how relative rates can be used to detect changes in selective pressure.

The previous analytical techniques we have discussed all deal with comparing sequences from different species (or homologous sequences that resulted from gene duplication). From here on out we will be discussing how variation within a species can be used to detect natural selection at the sequence level. To do this we must first address what we expect when there is no selection. This post deals with expected allele and genotype frequencies and changes in allele frequencies between generations (something I have written about before). Subsequent posts will use more recently developed analyses that allow us to detect selection by sampling allele frequencies in a population.

Before we begin discussing how to detect natural selection we must lay out our neutral expectation (i.e., the null hypothesis). Let’s begin by assuming a one-locus, two-allele model, where the two alleles are given by A and a. We can use a Punnett square to determine the expected frequency of each genotype in a mating between two heterozygotes.


      |  A   |  a
  A   |  AA  |  Aa
  a   |  Aa  |  aa

We expect one quarter of the progeny to have the genotype AA, one quarter to have the genotype aa, and half to be heterozygous (Aa). We can extend this model to determine the expected genotype frequencies in a population after one generation of random mating. Let the frequency of allele A in the first generation be p and the frequency of allele a be q (so that p + q = 1).


      |  p    |  q
  p   |  p^2  |  pq
  q   |  pq   |  q^2

As you can see from this table, when there is random mating, a large population, no mutation, no migration, and no natural selection, the genotype frequencies in the second generation are given by the following equations (in terms of allele frequencies in the previous generation). The heterozygote frequency is 2pq because the table contains two pq cells:

  • Freq(AA) = p^2
  • Freq(Aa) = 2pq
  • Freq(aa) = q^2
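The three formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `genotype_frequencies` is made up for this example, and it assumes the idealized conditions listed above (random mating, a large population, no mutation, no migration, no selection).

```python
def genotype_frequencies(p):
    """Expected genotype frequencies after one generation of random mating.

    p is the frequency of allele A; the frequency of allele a is q = 1 - p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must be between 0 and 1")
    q = 1.0 - p
    return {
        "AA": p * p,      # Freq(AA) = p^2
        "Aa": 2 * p * q,  # Freq(Aa) = 2pq (the two pq cells of the table)
        "aa": q * q,      # Freq(aa) = q^2
    }


# With p = q = 0.5 this reproduces the heterozygote cross in the
# Punnett square: one quarter AA, one half Aa, one quarter aa.
for genotype, freq in genotype_frequencies(0.5).items():
    print(genotype, freq)
```

Whatever value of p you pass in, the three frequencies sum to 1, since p^2 + 2pq + q^2 = (p + q)^2.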

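We can then use these formulas to determine the allele frequencies in the second generation. Every allele carried by an AA individual is A, and half of the alleles carried by Aa individuals are A, so the frequency of A is Freq(AA) plus half of Freq(Aa) (and likewise for a). Let p' and q' be the allele frequencies of A and a in the second generation, such that:

  • p' = p^2 + pq
  • q' = q^2 + pq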

As I have previously described, Hardy and Weinberg showed that random mating on its own does not change allele frequencies. We can see that by rearranging the equation for p' given above:

p' = p(p + q)

As mentioned above, p + q = 1, so p' = p, and we have proof that random mating alone does not alter allele frequencies (the frequency of allele a does not change either, because q' = 1 - p', which is equivalent to q' = q).
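The same result can be checked numerically. The snippet below is a minimal sketch, with a made-up function name (`next_allele_frequency`) and an arbitrary starting frequency of 0.3. It builds the genotype frequencies from p, counts the A alleles back up, and repeats this for a few generations.

```python
def next_allele_frequency(p):
    """Frequency of allele A after one generation of random mating."""
    q = 1.0 - p
    freq_AA = p * p
    freq_Aa = 2 * p * q
    # AA individuals carry only A alleles; heterozygotes carry one A
    # out of two, so p' = Freq(AA) + Freq(Aa) / 2 = p^2 + pq.
    return freq_AA + 0.5 * freq_Aa


p = 0.3  # arbitrary starting frequency, chosen only for illustration
for generation in range(1, 6):
    p = next_allele_frequency(p)
    print(f"generation {generation}: p = {p:.6f}")
```

Apart from floating-point rounding, p stays at its starting value in every generation, which is just the algebra above carried out with numbers.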

Natural selection, however, can lead to changes in allele frequencies between generations. I detailed how to determine the expected allele frequencies after selection given the allele frequencies before selection and the fitness of the different genotypes in my post on mean fitness and genetic load. We can also derive the marginal fitness of the alleles (remember, fitness is a measure of the number of progeny left per individual carrying a particular genotype, and in diploid organisms the genotype consists of two alleles), and we get the following results:

  • W_A = pW_AA + qW_Aa
  • W_a = qW_aa + pW_Aa

where W_A and W_a are the marginal fitnesses of alleles A and a, and W_AA, W_Aa, and W_aa are the fitnesses of each of the genotypes (the number of progeny left by an individual carrying that genotype). As you can see, by measuring changes in genotype frequencies from generation to generation we can estimate the fitness of each genotype, and by measuring changes in allele frequencies we can estimate the marginal fitness of each allele.
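To make the marginal fitnesses concrete, here is a small Python sketch. The function names and the fitness values (1.0, 0.9, 0.8) are invented for illustration. The update rule p' = p W_A / W_mean, where W_mean = p W_A + q W_a is the mean fitness, is the standard one-locus selection recursion; it is not written out in this post, so treat its use here as an assumption about the model.

```python
def marginal_fitnesses(p, w_AA, w_Aa, w_aa):
    """Marginal fitnesses of alleles A and a, given the frequency p of A."""
    q = 1.0 - p
    w_A = p * w_AA + q * w_Aa  # W_A = pW_AA + qW_Aa
    w_a = q * w_aa + p * w_Aa  # W_a = qW_aa + pW_Aa
    return w_A, w_a


def allele_frequency_after_selection(p, w_AA, w_Aa, w_aa):
    """Frequency of A after one generation of selection (standard recursion)."""
    q = 1.0 - p
    w_A, w_a = marginal_fitnesses(p, w_AA, w_Aa, w_aa)
    mean_fitness = p * w_A + q * w_a  # equals p^2 W_AA + 2pq W_Aa + q^2 W_aa
    return p * w_A / mean_fitness


# Hypothetical genotype fitnesses in which allele A is favored.
p = 0.5
w_A, w_a = marginal_fitnesses(p, w_AA=1.0, w_Aa=0.9, w_aa=0.8)
p_next = allele_frequency_after_selection(p, w_AA=1.0, w_Aa=0.9, w_aa=0.8)
print("marginal fitness of A:", w_A)
print("marginal fitness of a:", w_a)
print("frequency of A after selection:", p_next)
```

When all three genotype fitnesses are equal, the two marginal fitnesses are equal as well and the function returns p unchanged, which recovers the neutral expectation derived earlier.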

These results provide the theoretical framework for all of population genetics, but they are rarely used to detect selection because more powerful techniques have been developed for molecular data. I still felt it necessary to lay out some of these concepts for you so that you can appreciate what will follow: detecting natural selection using nucleotide sequence polymorphism.

2 Comments:

At 8:07 AM, Blogger Peter Ellis said...

How justifiable is the assumption of random mating?

 
At 1:00 PM, Blogger RPM said...

Non-random mating (such as inbreeding) will only affect genotype frequencies (inbreeding will lead to an excess of homozygotes); this is detectable by comparing the observed genotype frequencies with those expected based on the observed allele frequencies. Natural selection will usually lead to changes in allele frequencies from one generation to the next.

 
