Saturday, December 31, 2005

The Toilet Seat Debate Settled

The Science Creative Quarterly, never one to disappoint, has a mathematical treatment of the great toilet seat debate. For anyone who lives with a member of the opposite sex and shares a single bathroom, it derives who should put the toilet seat up or down, and when. It concludes by suggesting:
"In the morning John leaves the seat up after performing #1. In the evening he puts it down.

"This rule may not be precise but it is simple and approximately equitable; moreover the use of a definite rule sets expectations. The seat is put down in the evening to avoid the notorious 'middle of the night surprise'."

This, of course, assumes that the middle of the night surprise consists of "her falling in". While this is an obvious concern, the middle of the night mistake is not to be ignored, wherein he forgets that the toilet seat is down and attempts to perform #1. I caution anyone who attempts to apply this model to their home restroom use that we men are notorious for our poor aim, and it is highly advisable to give us a larger target at which to shoot.

Wednesday, December 28, 2005

Molecular Creationism

Sahotra Sarkar has a post up at the Post-Genomics blog on why creationists have been targeting molecular biology (an extremely developed field) rather than other fields, such as evo-devo (which has produced very little scientific output relative to more established areas). It's a good read, and Sarkar does a nice job of incorporating where the Disco PR machine falls into this decision.

Update (12/30/2005): Sarkar's post has been deleted, but a cached version is available here.

Why Study Speciation Genes?

I mentioned previously that John Wilkins has gotten me thinking about speciation, and his most recent post on speciation contains a bit of a poke at geneticists studying speciation:

“Some researchers, such as Chung-I Wu . . . seek to find ‘speciation genes’ which are modified through this inadvertent selection. This is, I believe, a category mistake, and a logical fallacy.

“The category mistake is to presume that because a genetic distance causes speciation, it is therefore a gene ‘for’ speciation. But there is no prior specification of genes that cause reproductive isolation. A genetic change may do so, or it may not. Identifying that it has done so is something that can only be done post hoc. And there appears to be no particular genes that cause speciation over large evolutionary distances - there may be an active gene complex in Drosophila which when changed causes reproductive isolation, but it doesn't therefore follow that a homolog of that complex will do the same thing in other flies, or in insects generally, or in all animals, etc. In fact, it doesn't even follow that we will find this is the complex involved in all cases of Drosophila speciation, either.”

I will argue that Drosophila geneticists are not so much interested in finding “speciation genes”, but rather interested in understanding the genetics of speciation. To do so requires finding mutations that allow the species boundary to be surmounted. As I have mentioned previously, good species are reproductively isolated, preventing any genetical analysis of the factors that lead to this isolation. Mutations in the “speciation genes” (especially the extremely useful Hybrid Male Rescue mutation), however, allow researchers to cross individuals from different species and study the genetics of speciation.

Geneticists like to find generalities. That is why we study model organisms; they are easy to work with in a laboratory setting and allow us to extend our discoveries regarding molecular biology, cellular function, development, physiology, etc. to other related taxa (both closely related and more distant relatives). Wilkins makes a valid point that it is difficult to generalize discoveries regarding the genetics of speciation made within one system (compared to the generalizing done in other fields). For example, finding a speciation gene in one species pair does not say anything about that gene’s involvement in speciation in general. I don’t think you will find a single geneticist who would disagree with that.

Other research on the genetics of speciation does provide the potential to generalize across different taxa (finding these “speciation genes” is merely a necessary step in this process for some species pairs). It is difficult to say how common certain patterns are in speciation, but some of the following insights from Drosophila have the potential to be applicable in distantly related taxa.

Researchers have shown that genome rearrangements can influence the distribution of genes responsible for reproductive isolation throughout the genome between sympatric and parapatric species pairs. Some have argued for the same type of relationship between chromosomal inversions and speciation in humans and chimps, but they may have done the analysis incorrectly or even violated one of the assumptions of their model (humans and chimps probably speciated allopatrically). The findings in Drosophila are similar to those observed in Rhagoletis and in sunflowers, suggesting that this model may be common for parapatric speciation.

With the development of high-throughput technologies for studying genome-wide patterns of gene expression in Drosophila, researchers have begun to investigate expression changes in species hybrids. The disproportionate effect of cis regulatory changes between species pairs (compared to trans changes) has not been tested outside of the D. melanogaster species group. This type of finding differs from the discovery of a “speciation gene” because it is a trend observed from the study of a large set of genes mis-expressed in species hybrids. These experiments would not be possible without the use of speciation gene mutants that allow interspecific hybrids to be created.

Wilkins will be happy to know that some researchers are interested specifically in finding which genes are involved in maintaining species boundaries. They do so not for the sake of finding the genes, but for determining how those genes evolve. Some focus on the role of natural selection in speciation. Many others are interested in understanding how genetic changes between species lead to behavioral differences that underlie the prezygotic isolation of the species. And yes, some do study individual speciation genes, but they do so to understand why natural selection would favor the evolution of a particular protein and how it interacts with other gene products to prevent interspecific hybridization.

While the individual genes responsible for speciation may not be the same between different species pairs, “speciation genes” are probably under the same evolutionary forces regardless of which species pair one studies. As researchers discover more speciation genes, it will become possible to determine if certain classes of proteins (such as transcription factors) are disproportionately present within the catalog. Biologists do not study the genetics of speciation simply to find speciation genes; they search for speciation genes (and QTLs) to determine how organisms, the environment, and the genome interact to produce reproductive isolation between species.

Saturday, December 24, 2005

Detecting Natural Selection (Part 5)

Allele and Genotype Frequencies in Populations

This is the sixth of multiple postings I plan to write about detecting natural selection using molecular data (ie, DNA sequences). The first post contained a brief introduction and can be found here. The second post described the organization of the genome, and the third described the organization of genes. The fourth post described codon based models for detecting selection, and the fifth detailed how relative rates can be used to detect changes in selective pressure.

The previous analytical techniques we have discussed all deal with comparing sequences from different species (or homologous sequences that resulted from gene duplication). From here on out we will be discussing how variation within a species can be used to detect natural selection at the sequence level. To do this we must first address what we expect when there is no selection. This post deals with expected allele and genotype frequencies and changes in allele frequencies between generations (something I have written about before). Subsequent posts will use more recently developed analyses that allow us to detect selection by sampling allele frequencies in a population.

Before we begin discussing how to detect natural selection we must lay out our neutral expectation (ie, null hypothesis). Let’s begin by assuming a one locus, two allele model, where the two alleles are given by A and a. We can use a Punnett square to determine the expected frequency of each genotype in a mating between two heterozygotes.
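
For readers who want to check the cross themselves, here is a minimal Python sketch that enumerates the four equally likely gamete combinations of an Aa × Aa cross. It assumes simple Mendelian segregation (each parent passes on either allele with probability ½), and the function name `punnett` is hypothetical, not from any library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(parent1, parent2):
    """Expected offspring genotype frequencies for a one-locus cross.

    Each parent is a two-letter genotype such as "Aa". Each allele is assumed
    to be passed on with probability 1/2 (Mendelian segregation).
    """
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sorting puts "A" before "a", so "aA" and "Aa" count as one heterozygote.
        genotype = "".join(sorted(allele1 + allele2))
        counts[genotype] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


print(punnett("Aa", "Aa"))
# {'AA': Fraction(1, 4), 'Aa': Fraction(1, 2), 'aa': Fraction(1, 4)}
```

Using `Fraction` keeps the ¼, ½, ¼ ratios exact rather than approximate.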

We expect one quarter of the progeny to have the genotype AA, one quarter to have the genotype aa, and half to be heterozygous (Aa). We can extend this model to determine the expected genotype frequencies in a population after one generation of random mating. Let the frequency of allele A in the first generation be p, and let the frequency of allele a be q (p + q = 1).

As you can see from this table, when there is random mating, a large population, no mutation, no migration, and no natural selection, the genotype frequencies in the second generation can be given by the following equations (in terms of allele frequencies in the previous generation):

  • $\text{Freq}(AA) = p^2$
  • $\text{Freq}(Aa) = 2pq$
  • $\text{Freq}(aa) = q^2$
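
The same logic scales up to a whole population. Assuming random union of gametes (which, under random mating, gives the same answer as tallying every mating type), a short Python sketch with a hypothetical helper `hardy_weinberg` reproduces the three formulas:

```python
def hardy_weinberg(p):
    """Genotype frequencies after one generation of random mating.

    Assumes one locus, two alleles (A at frequency p, a at q = 1 - p), a large
    population, and no mutation, migration, or selection.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must be between 0 and 1")
    q = 1.0 - p
    # Random union of gametes: each gamete carries A with probability p.
    return {
        "AA": p * p,      # A from both parents
        "Aa": 2 * p * q,  # A from the mother and a from the father, or vice versa
        "aa": q * q,      # a from both parents
    }


freqs = hardy_weinberg(0.7)
print({genotype: round(f, 4) for genotype, f in freqs.items()})
# {'AA': 0.49, 'Aa': 0.42, 'aa': 0.09}
```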

We can then use these formulas to determine the allele frequencies in the second generation. Let p' and q' be the allele frequencies of A and a in the second generation, such that:

  • $p' = p^2 + pq$
  • $q' = q^2 + pq$

As I have previously described, Hardy and Weinberg showed that random mating on its own does not change allele frequencies. We can see that by rearranging the equation for p' given above:

$p' = p(p + q)$

As mentioned above, p+q =1, so p' = p, and we have proof that random mating alone does not alter allele frequencies (the frequency of allele a does not change because q' = 1-p', which is equivalent to q'=q).
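
The algebra can also be checked numerically. This sketch (the helper name is hypothetical) applies $p' = p^2 + pq$ over several generations and, as expected under random mating with no selection, mutation, migration, or drift, the frequency of A never moves:

```python
def allele_freq_after_random_mating(p):
    """Frequency of A in the offspring generation under random mating alone."""
    q = 1.0 - p
    freq_AA = p * p
    freq_Aa = 2 * p * q
    # p' = Freq(AA) + 1/2 Freq(Aa) = p^2 + pq = p(p + q) = p
    return freq_AA + 0.5 * freq_Aa


p = 0.2
for generation in range(1, 6):
    p = allele_freq_after_random_mating(p)
    print(generation, round(p, 10))  # prints 0.2 every generation
```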

Natural selection, however, can lead to changes in allele frequencies between generations. I detailed how to determine the expected allele frequencies after selection given the allele frequencies before selection and the fitness of the different genotypes in my post on mean fitness and genetic load. We can also derive the marginal fitness of the alleles (remember, fitness is a measure of the number of progeny left per individual carrying a particular genotype, and in diploid organisms the genotype consists of two alleles), and we get the following results:

  • $W_A = pW_{AA} + qW_{Aa}$
  • $W_a = qW_{aa} + pW_{Aa}$

where $W_A$ and $W_a$ are the marginal fitnesses of alleles A and a, and $W_{AA}$, $W_{Aa}$, and $W_{aa}$ are the fitnesses of each of the genotypes (the number of progeny left by an individual carrying that genotype). As you can see, by measuring changes in genotype frequencies from generation to generation we can estimate the fitness of each genotype, and by measuring changes in allele frequencies we can estimate the marginal fitness of each allele.
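
To make the marginal fitnesses concrete, here is a sketch with made-up fitness values ($W_{AA} = 1.0$, $W_{Aa} = 0.9$, $W_{aa} = 0.6$, chosen purely for illustration). It also checks two standard identities not spelled out above: the mean fitness equals $pW_A + qW_a$, and the allele frequency after selection is $pW_A$ divided by the mean fitness, so an allele increases when its marginal fitness exceeds the mean.

```python
def marginal_fitnesses(p, w_AA, w_Aa, w_aa):
    """Marginal fitnesses (W_A, W_a) of alleles A and a under random mating."""
    q = 1.0 - p
    # An A allele sits in an AA individual with probability p and in an Aa
    # individual with probability q; likewise for the a allele.
    w_A = p * w_AA + q * w_Aa
    w_a = q * w_aa + p * w_Aa
    return w_A, w_a


# Hypothetical fitness values, chosen only for illustration.
p, w_AA, w_Aa, w_aa = 0.3, 1.0, 0.9, 0.6
w_A, w_a = marginal_fitnesses(p, w_AA, w_Aa, w_aa)
w_bar = p * w_A + (1 - p) * w_a  # same as p^2 W_AA + 2pq W_Aa + q^2 W_aa
p_next = p * w_A / w_bar          # A rises because W_A exceeds the mean fitness
print(round(w_A, 4), round(w_a, 4), round(w_bar, 4), round(p_next, 4))
# 0.93 0.69 0.762 0.3661
```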

These results provide the theoretical framework for all of population genetics, but they are rarely used to detect selection because more powerful techniques have been developed for molecular data. I still felt it necessary to lay out some of these concepts for you so that you can appreciate what will follow: detecting natural selection using nucleotide sequence polymorphism.

On Bullshit

I just finished reading Harry G. Frankfurt's On Bullshit -- well, "finished" is a bit of a misnomer, as it was more like started and finished within an hour. It's a fun read, but not really worth the $10; it's hard to justify charging that much for a clever essay (not to mention the absurdity of binding a single essay as a hardcover book).

Frankfurt, an emeritus professor of philosophy at Princeton, inspired by the amount of bullshit spewed forth in modern society, sets out to determine what makes a statement bullshit. He concludes that something is bullshit when a person makes a declaration with no regard to the known facts and current evidence (whether or not they think they are lying is irrelevant). Does this remind you of anything or anyone? I found one particular passage quite fitting given the recently concluded proceedings in Dover:

"Bullshit is unavoidable whenever circumstances require someone to talk without knowing what he is talking about. Thus the production of bullshit is stimulated whenever a person's obligations or opportunities to speak about some topic exceed his knowledge of the facts that are relevant to that topic. This discrepancy is common in public life, where people are frequently impelled -- whether by their own propensities or by the demands of others -- to speak extensively about matters of which they are to some degree ignorant."

How to Tell When People Don't Respect Your Field

Take that, Cosmic Variance:

Friday, December 23, 2005

Weekly Random Ten (23 December 2005)

Last Random Ten of the Year Edition

With Christmas, Chanukah, and New Year's going on, I won't have time to post another Weekly Random Ten before the year is out; you can consider this post a wish of happy whatever the hell it is you're celebrating this year. Unless it's Kwanzaa -- that's not a real holiday. If it were, I'd wish you a Killer Kwanzaa.

Happy Festivus to the rest of us. Now, go drink some egg nog, exchange secular holiday season presents, and remember why you only see your family once a year. Oh, and here's your evolgen Random Ten:
  1. Tilt - Berkeley Pier
  2. Fifteen - Landmine
  3. Me First and the Gimme Gimmes - Fire and Rain
  4. Wu-Tang Clan - Gravel Pit
  5. The Von Bondies - Crawl Through the Darkness
  6. Coheed & Cambria - The Crowing
  7. The Bouncing Souls - Born to Lose
  8. Elton John - Rocket Man
  9. Jimi Hendrix - Purple Haze
  10. 88 Fingers Louie - Two Face Bastard

Too Much Law, Not Enough Science

While everyone is engaged in an orgy of Kitzmiller v Dover legal babble, I'd like to write about some actual science. After a few days of an unreliable internet connection (traveling to see family), I'm finally staying with someone with a fast connection that I can plug in to (well, that wasn't meant to be a double entendre). I'll hopefully have some posts up over the next few days. John Wilkins has been writing some good stuff on speciation that has inspired me. While he prefers the philosophical aspects, I may delve into some of the more experimental analyses on the genetics of speciation. I will also get back to my Detecting Natural Selection series, moving on to population genetics.

And, here's a little teaser for you: expect some changes to evolgen in the new year . . .

Tuesday, December 20, 2005

In Case You Didn't Hear . . .

. . . Intelligent Design is out.

I don't think I'll have much else to say about this. As you can see by my recent posts, I'm trying to focus this blog toward science (and ski jumps, television, and bad journalism). Mostly science, though.

Update (21 Dec 2005): The Questionable Authority has posted a roundup of commentary on the Dover decision. Blogarithmicly has some links as well.

Mean Fitness, Genetic Load, and the Misapplication of Population Genetics Metrics

I will probably deal with some of these concepts in my Detecting Natural Selection series, but a couple of letters in the American Scientist (hat tip: Deinonychus antirrhopus) have inspired me to post my opinion. The letters -- one a reply to an article by Paul E. Turner, and the other a reply by Turner to the first letter -- deal with how natural selection shapes the fitness of individuals and populations. In the first letter, Dmitri E. Kourennyi complains:

"In the last paragraph of the 'Cheaters Sometimes Prosper' section, the author mentions an apparent conflict between the prediction of evolutionary game theory in the prisoner's dilemma and the theory of evolution. Cheaters can lower the average fitness of the population. On the other hand, paraphrasing the author's statement, Darwin's theory of evolution suggests that the population becomes better adapted to its environment over time.

"As far as I know, evolutionary theory does not claim that populations are driven toward higher fitness. Evolutionary pressure acts at the level of individuals. As a result, the average fitness of the population usually increases, but it can decrease in some cases, such as when cheaters have an advantage over cooperators and can take over the population."

To which Turner replies:

"In my opinion, Dr. Kourennyi is misreading this sentence in at least two ways. First, his literal reading is that 'steers' is equivalent to 'guides,' where the population is purposefully taken in the direction of increased fitness. As he correctly indicates, natural selection is blind and the process does not drive populations to increased fitness through time. As for Dr. Kourennyi's conclusion, I wrote that '[Prisoner's dilemma] is somewhat counter to Darwin's theory by natural selection.' Selection can favor takeover by cheaters, leading to a surprising (in a Darwinian sense) decrease in mean fitness of the population."

I think they’re both pissing into the wind on this one, as there is no meaningful way to measure the fitness of a population (someone can correct me on this, but please read below for my argument). The two measures I usually equate with population fitness are mean fitness and genetic load. Mean fitness, however, does not have any meaning outside of the context for which it was derived -- as a factor used in determining the expected change in allele frequencies under natural selection. Genetic load, which is calculated using the meaningless statistic mean fitness, is therefore a useless metric.

Let us first examine mean fitness. Under Hardy-Weinberg equilibrium, allele frequencies remain constant from generation to generation and can be used to predict the genotype frequencies. If we have a one locus, two allele system (alleles A and a), where the frequency of A is given by p and the frequency of a is given by q (and q = 1 - p), we get the following genotype frequencies:

  • $\text{Freq}(AA) = p^2$
  • $\text{Freq}(Aa) = 2pq$
  • $\text{Freq}(aa) = q^2$

Now, imagine that one genotype is more fit than the others, so that each genotype has its own fitness: $W_{AA}$, $W_{Aa}$, and $W_{aa}$. The fitness of a genotype can be thought of as the number of offspring left per individual carrying that genotype, so $W_{AA}$ is the number of offspring left by an AA individual. To standardize for differences in the number of individuals carrying each of the three genotypes and the fitness of the different genotypes relative to each other, we use the mean fitness:

$\bar{W} = p^2 W_{AA} + 2pq W_{Aa} + q^2 W_{aa}$

The frequency of each genotype after selection can be calculated using the frequency of each genotype before selection, the fitness of that genotype, and the mean fitness:

  • $\text{Freq}(AA \text{ after selection}) = p^2 W_{AA} / \bar{W}$
  • $\text{Freq}(Aa \text{ after selection}) = 2pq W_{Aa} / \bar{W}$
  • $\text{Freq}(aa \text{ after selection}) = q^2 W_{aa} / \bar{W}$
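
A minimal sketch of the two calculations above, using hypothetical fitness values ($W_{AA} = 1.0$, $W_{Aa} = 0.9$, $W_{aa} = 0.6$, invented for illustration). Note what the mean fitness actually does in the code: it is the constant that makes the post-selection genotype frequencies sum to 1, which is the point argued in the paragraphs that follow.

```python
def mean_fitness(p, w_AA, w_Aa, w_aa):
    """W-bar = p^2 W_AA + 2pq W_Aa + q^2 W_aa."""
    q = 1.0 - p
    return p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa


def genotypes_after_selection(p, w_AA, w_Aa, w_aa):
    """Genotype frequencies after selection, starting from Hardy-Weinberg proportions."""
    q = 1.0 - p
    w_bar = mean_fitness(p, w_AA, w_Aa, w_aa)
    # Dividing by W-bar rescales the weighted frequencies so they sum to 1.
    return {
        "AA": p * p * w_AA / w_bar,
        "Aa": 2 * p * q * w_Aa / w_bar,
        "aa": q * q * w_aa / w_bar,
    }


after = genotypes_after_selection(0.3, 1.0, 0.9, 0.6)  # hypothetical fitnesses
print({genotype: round(f, 4) for genotype, f in after.items()})
# {'AA': 0.1181, 'Aa': 0.4961, 'aa': 0.3858}
```

Multiplying all three fitnesses by the same constant leaves the output unchanged, since only relative fitness enters the calculation.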

By favoring particular genotypes, natural selection leads to changes in allele frequencies between generations. The frequency of the A allele can be calculated using the frequency of the genotypes carrying that allele:

$p = \text{Freq}(AA) + \tfrac{1}{2}\text{Freq}(Aa)$

If we represent the frequency of the A allele before selection as p and after selection as p', we get the following:

$p' = \dfrac{p^2 W_{AA}}{\bar{W}} + \dfrac{1}{2} \cdot \dfrac{2pq W_{Aa}}{\bar{W}}$
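
Iterating this recursion shows natural selection changing allele frequencies over many generations. The sketch below assumes hypothetical additive fitnesses favoring A ($W_{AA} = 1.0$, $W_{Aa} = 0.95$, $W_{aa} = 0.9$) and a starting frequency of 0.01; neither value comes from the post.

```python
def allele_freq_after_selection(p, w_AA, w_Aa, w_aa):
    """p' = p^2 W_AA / W-bar + (1/2)(2pq W_Aa / W-bar)."""
    q = 1.0 - p
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA) / w_bar + 0.5 * (2 * p * q * w_Aa) / w_bar


# Hypothetical additive fitnesses favoring A: W_AA = 1.0, W_Aa = 0.95, W_aa = 0.9.
p = 0.01
trajectory = [p]
for _ in range(200):
    p = allele_freq_after_selection(p, 1.0, 0.95, 0.9)
    trajectory.append(p)

for generation in (0, 50, 100, 150, 200):
    print(generation, round(trajectory[generation], 3))
```

Setting all three fitnesses equal returns p unchanged, recovering the Hardy-Weinberg result.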

This model can be applied to any locus regardless of the number of alleles segregating at that locus (I’ll leave it up to the reader to perform this derivation).
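
For anyone attempting that derivation, it leads to the standard result $p_i' = p_i \sum_j p_j W_{ij} / \bar{W}$, where $W_{ij}$ is the fitness of genotype $A_iA_j$ and $\bar{W} = \sum_i \sum_j p_i p_j W_{ij}$. A sketch under those assumptions (Hardy-Weinberg proportions before selection; the three alleles and their fitnesses are invented for illustration):

```python
def multiallele_selection(p, w):
    """One generation of viability selection at a locus with k alleles.

    p: allele frequencies [p_1, ..., p_k] (summing to 1).
    w: symmetric k x k matrix; w[i][j] is the fitness of genotype A_i A_j.
    Assumes Hardy-Weinberg genotype frequencies before selection.
    """
    k = len(p)
    # Marginal fitness of allele i: average fitness of the genotypes it sits in.
    marginal = [sum(p[j] * w[i][j] for j in range(k)) for i in range(k)]
    w_bar = sum(p[i] * marginal[i] for i in range(k))
    # p_i' = p_i * W_i / W-bar
    return [p[i] * marginal[i] / w_bar for i in range(k)]


# Three hypothetical alleles with hypothetical genotype fitnesses.
p = [0.5, 0.3, 0.2]
w = [
    [1.0, 0.9, 0.8],
    [0.9, 0.7, 0.6],
    [0.8, 0.6, 0.5],
]
print([round(x, 4) for x in multiallele_selection(p, w)])
```

With two alleles this reduces exactly to the two-allele formula for $p'$ given above.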

As you can see, $\bar{W}$ is simply a metric used to account for the fitness of each genotype relative to the others, and is used when we would like to calculate changes in allele and genotype frequencies due to natural selection. It is called “mean fitness”, but it does not measure any meaningful aspect of the population. This did not stop J.B.S. Haldane from using $\bar{W}$ in his equation for genetic load:

$L = \dfrac{W_{\max} - \bar{W}}{W_{\max}}$

where $W_{\max}$ is the theoretical maximum fitness (conventionally, the fitness of the fittest genotype). Haldane decided that mean fitness actually measures something, and that a population with higher mean fitness is, well, more fit. This runs counter to the paradigm of natural selection, in which fitness is measured for individuals, not populations.
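
A sketch of the load calculation, assuming $W_{\max}$ is the fitness of the fittest genotype. The example uses hypothetical overdominant fitnesses ($W_{AA} = 1 - s$, $W_{Aa} = 1$, $W_{aa} = 1 - t$), where the population sits at the stable equilibrium $\hat{p} = t/(s+t)$ and the load works out to $st/(s+t)$, positive simply because homozygotes keep being produced by random mating every generation.

```python
def genetic_load(p, w_AA, w_Aa, w_aa):
    """L = (W_max - W-bar) / W_max for one locus with two alleles.

    Assumption: W_max is taken as the fitness of the fittest genotype.
    """
    q = 1.0 - p
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    w_max = max(w_AA, w_Aa, w_aa)
    return (w_max - w_bar) / w_max


# Hypothetical overdominance: the heterozygote is fittest.
s, t = 0.1, 0.2                  # selection against AA and aa, respectively
w_AA, w_Aa, w_aa = 1 - s, 1.0, 1 - t
p_eq = t / (s + t)               # stable equilibrium frequency of A
print(round(genetic_load(p_eq, w_AA, w_Aa, w_aa), 4))  # 0.0667, i.e. st/(s + t)
```

Because L is a ratio, rescaling every fitness by the same constant leaves it unchanged: it, too, depends only on relative fitness.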

Kourennyi makes a valid point in his letter when he points out that evolutionary theory says nothing about the fitness of populations, but he slips a bit when he claims that cheaters affect the fitness of a population. The fitness of a population does not measure the health or vitality of a population; it simply standardizes an equation for calculating changes in allele frequencies.

Monday, December 19, 2005

A Few Words on Speciation

John Wilkins has been writing about modes of speciation (who would've thunk it?), using Sergey Gavrilets's recent book, Fitness Landscapes and the Origin of Species, as a framework. I have not read the Gavrilets book, but I am familiar with a lot of the literature on speciation -- mostly that dealing with studies of the genetics of speciation in natural populations. Wilkins has been writing about how we define modes of speciation (allopatric, sympatric, chromosomal, etc), and I like how he has framed the issue. Rather than individual types of speciation, Wilkins shows how there is overlap between the different modes and certain modes tend to occur with other modes. Both posts are worth reading, and I'm hoping he keeps them coming.

While I don't study speciation, what I have read in the literature has gotten me thinking about allopatric and sympatric speciation. I prefer to define sympatry and allopatry based on gene flow rather than geographic range (note that these are not mutually exclusive, as gene flow depends on geographical isolation). In this framework two sympatric populations are essentially one single population with complete migration between them (m=0.5). Conversely, two allopatric populations are reproductively isolated (either pre-zygotically or post-zygotically) such that m=0. If 0<m<0.5, the populations are parapatric.
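
A simple symmetric two-population model (an assumption for illustration; the post does not commit to a specific model) shows why m=0.5 amounts to a single population. After one generation of migration the two allele frequencies are identical, whereas for 0<m<0.5 the difference between the populations only shrinks by a factor of (1 - 2m) per generation, and at m=0 it never shrinks. The function `migrate` is hypothetical.

```python
def migrate(p1, p2, m):
    """Allele frequencies after one round of symmetric migration.

    Hypothetical two-population model: a fraction m of each population is
    replaced by migrants from the other population.
    """
    return (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1


for m in (0.0, 0.1, 0.5):
    new_p1, new_p2 = migrate(0.2, 0.8, m)
    # The difference between populations shrinks by a factor of (1 - 2m).
    print(m, round(new_p1, 3), round(new_p2, 3))
```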

In studying speciation in natural populations, we can sample speciating populations at different points in the process. For example, two populations that recently became geographically isolated give a snap-shot of the early stages of speciation. Two populations that are partially reproductively isolated (due to mediocre mate discrimination, incomplete post-zygotic isolation, etc.) show the intermediate stages of speciation. Finally, two populations that have nearly complete reproductive isolation due to genetic factors provide insights into the later stages of speciation. From studying species pairs at different stages, we can increase our understanding of the entire speciation process.

This brings me to the current paradigms regarding sympatric speciation. Ernst Mayr advocated (and most biologists seem to agree) that allopatric speciation (now defined by geography) is the norm and sympatric speciation is very rare. There are some examples of geographical sympatric speciation (the Rhagoletis system is the most accepted), but they are few and far between. Even the sympatric nature of the Rhagoletis system has been called into question -- traditionally by the argument that host shifts do not really constitute sympatric speciation, and more recently by molecular studies that show the genetic potential for speciation originated allopatrically (defined geographically). Unless we can observe the speciation process from beginning to end (I know, I'm sounding like a creationist), we have no way of knowing that speciation was entirely sympatric.

Biologists can only measure the current (and recent) gene flow between populations. It is difficult, if not impossible, to determine whether two populations were ever geographically isolated at some point in the past. The recent work on Rhagoletis, as well as the best model we have for chromosomal speciation with gene flow, suggests that even sympatric speciation requires geographic isolation (or some other form of reproductive isolation) to allow for genetic differentiation between the populations. When these populations come back into contact, reinforcement favors the evolution of pre-zygotic isolating barriers because the hybrids have decreased fitness. If this is the case, we need to redefine the dichotomy, moving away from sympatric and allopatric speciation. Instead, the two modes should be “entirely allopatric” and “allopatry with reinforcement”. This new framework provides two plausible modes of speciation (rather than the extremely unlikely entirely sympatric speciation), but the mechanisms of speciation would still differ between the two modes, so the dichotomy is biologically meaningful.

My Own Personal Launching Pad

Yesterday, before I got pissed off at FOX for airing Cuckoo Bananas' address rather than showing the Family Guy on time, I shoveled some snow in my backyard.

It turns out the hill is neither steep enough nor long enough to build up sufficient speed to get good air. In case you're wondering (and can't tell by the tracks in the snow), I'm a skier, not one of those dopes who sit in the middle of a run taking up space.

The NIH’s Cancer Genomics Project

In a move that you either view as ambitious (if you are a geneticist) or misguided (if you are a cell biologist) the NIH has unveiled a cancer genomics project. Nature reports the two perspectives:

“The project is a potentially huge undertaking that could take 10 years and cost US$1.5 billion. Its proponents say that tallying up all the genetic mutations in cancer cells may reveal new drug targets.

“But opponents argue that cancer biology is too poorly understood to make such a cataloguing approach viable and say that the money would be better spent on basic research into how cancer functions.”

The project will consist of the following steps:

  1. Sample two or three types of tumors from multiple individuals (I’m guessing on the order of hundreds or thousands of cancer patients).
  2. Perform “high-throughput” analysis on the cells -- the Nature article indicates gene expression as one type of analysis, but does not describe any others. Maybe Orac knows of some other analyses.
  3. Sequence about 2,000 genes from each of the tumors. How will the researchers choose these genes? I have no idea, and I don’t know if the heads of this project know.

A Washington Post article describes the different genetic causes of cancer, something the Nature article fails to do:

“Some cancers are caused by a mutation in a single gene that normally keeps a cell from making offspring. Others are caused by the mistaken duplication of a gene that promotes normal cell division, boosting its reproductive capacity to abnormal levels.

“In other cases, entire pieces of chromosomes -- long, gene-bearing strands of DNA inside cells -- break off and reattach to other chromosomes, inducing spurious and unregulated growth signals.

“Still other cancers result when rogue molecules attach themselves to genes whose job is to control cell division. Such “epigenetic” changes are invisible on standard tests that look for mutated genes because the genes themselves are healthy but are being manhandled by the other molecules.”

From the brief descriptions of the project I have been able to find, the proposed NIH research plan will only deal with mutations in single genes. Unless the “high-throughput” analyses that go unnamed consist of karyotyping or in situ probes for duplications, this study won’t even skim the surface of the genetics of cancer. I don’t think you can even do either of these in a high-throughput manner. The epigenetic effects may be picked up by the gene expression analysis, but it will be difficult to distinguish between mutations to regulatory regions (cis effects), mutations to transcription factors that control the expression of the gene (trans effects), and epigenetic effects.

Without more detail, it’s hard to judge the merits of this project. The research will definitely result in some worthwhile discoveries, but it may be wiser to spend the money on more detailed analyses. Just because “high-throughput” works for genome sequencing, doesn’t mean that Francis Collins needs to apply it to every study the National Human Genome Research Institute gets involved in.

Sunday, December 18, 2005

Reason #342 why FOX sucks

They cut into the Family Guy's time slot to show President Cuckoo Bananas' address to the nation about Iraq. I guess they didn't get the memo: Peter Griffin = good, George W. Bush = douchebag.

Fucking assholes.

Proponents of the theory are liars and bad scientists

The New York Times has some more coverage on the Dover Panda Trial, this time consisting of a profile of Judge John E. Jones III. The article contains a paragraph regarding the potential scope of Judge Jones's decision:

"Legal experts said the big question was whether Judge Jones would rule narrowly or more broadly on the merits of teaching intelligent design as science. Proponents of the theory argue that living organisms are so complex that the best explanation is that a higher intelligence designed them."

Other than that, the article does not deal with the issues of the trial, instead focusing on Jones's political background, the attention of the national media, and his sharp wit. The author, Laurie Goodstein, does not bother to explain that intelligent design has no scientific merit, and the arguments by its proponents have been thoroughly disproven. Instead, she uses a description of intelligent design that we have grown so accustomed to reading: “Proponents of the theory . . . so complex . . . intelligently designed.” A simple Google search reveals how common this type of phrase is:

  • New York Times (18 Oct 2005): “Proponents of intelligent design, however, argue that living organisms are so complex that the best explanation is that a higher intelligence designed them.”
  • New York Times (4 Nov 2005): “The More center's lawyers put scientists on the witness stand who argued that intelligent design - the idea that living organisms are so complex that the best explanation is that a higher intelligence designed them - is a credible scientific theory and not religion because it never identifies God as the designer.”
  • New York Times (27 Sep 2005): “Intelligent design is the idea that living organisms are so complex that the best explanation is that some kind of higher intelligence designed them.”
  • MSNBC (8 Nov 2005): “Intelligent design holds that the universe is so complex that it must have been created by a higher power.”

Two of the New York Times articles were written by Goodstein (and the MSNBC article substitutes "universe" for "living organisms"), but it appears that this description has circulated throughout journalistic circles. I don’t have much of a problem with the phrasing (it’s intelligent design in a nutshell -- in fact, it’s all intelligent design has to offer), but it would be nice to see the following:

Proponents of intelligent design argue that living organisms are so complex that they must have been designed by a higher intelligence. They have presented no evidence for their claims, have not experimentally tested their hypothesis, and their arguments are widely rejected amongst biologists.

I don’t even need to read a description of evolutionary theory (it’s so complex that it would take a few paragraphs to describe, and the crappy writers at the Times would probably screw it up), as long as the description of intelligent design is accompanied by the qualifier that it’s absolute bullshit.

Saturday, December 17, 2005

Detecting Natural Selection (Part 4)

Phylogenetics and Relative Rates

This is the fifth of multiple postings I plan to write about detecting natural selection using molecular data (i.e., DNA sequences). The first post contained a brief introduction and can be found here. The second post described the organization of the genome, and the third described the organization of genes. The fourth post described codon-based models for detecting selection.

The simple codon-based model for detecting natural selection that I described previously (dN/dS) involves comparing two homologous sequences. If we have three or more sequences, we can create a rooted phylogeny by using the most distantly related sequence as an outgroup, and four or more sequences allow us to create an unrooted phylogeny with an informative topology (three sequences fit only one unrooted tree). With the analysis of dN and dS we were not concerned with which lineage the substitutions occurred on. In our relative rate analysis we will be determining where in the tree (on which branch) the substitutions occurred.

I will not get into the details of the different algorithms for creating phylogenies, and I will assume that we already know the evolutionary relationship of the sequences we are comparing. If you are interested in learning more about how phylogenies are created, I would recommend starting with this book and following the literature citations therein. I will point out that the length of the branches represents the number of substitutions that have accumulated along a particular lineage. Phylogenies can be created with either DNA sequences or translated protein coding sequences (amino acid sequences) depending on whether the sequences are closely related or not. (DNA sequences are preferred for closely related sequences because they evolve faster and accumulate more substitutions in a shorter period of time, while amino acid sequences are preferred for more distantly related sequences because they evolve more slowly.)
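Because branch lengths are counts of substitutions, a three-sequence tree can be measured directly from pairwise differences. The sketch below is one simple way to do it, not a method prescribed in this series: it computes the proportion of differing sites between aligned sequences, optionally applies the standard Jukes-Cantor correction for multiple substitutions at the same site, and then splits the three pairwise distances into the three branch lengths of the unrooted three-taxon tree. If C is the outgroup, the internal node is the common ancestor of A and B, so `a` and `b` are the two lineages a relative rate test compares. The function names and toy sequences are hypothetical, and the sequences are assumed to be already aligned.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned sites that differ, ignoring sites with a gap ('-')."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor correction: expected substitutions per site given p-distance p."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def three_taxon_branches(d_ab, d_ac, d_bc):
    """Lengths of the branches joining A, B and C to the tree's single internal node."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


if __name__ == "__main__":
    # Hypothetical aligned sequences; C plays the role of the outgroup.
    A = "CAGTACGTAC"
    B = "ACGTAGGTAC"
    C = "ACGTACGTCA"
    d_ab = jukes_cantor(p_distance(A, B))
    d_ac = jukes_cantor(p_distance(A, C))
    d_bc = jukes_cantor(p_distance(B, C))
    print(three_taxon_branches(d_ab, d_ac, d_bc))
```

Real alignments would be far longer than this toy one; with short or very divergent sequences the corrected distances are noisy and can even yield a slightly negative branch, which is usually read as zero.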

Recall from our codon based comparisons that we have, essentially, three different selective scenarios:

  1. Neutral evolution - a sequence is evolving without the constraint or influence of natural selection
  2. Purifying selection or selective constraint - natural selection is acting as a conservative force, restricting the evolution of a sequence
  3. Positive selection - natural selection is driving the evolution of a sequence, causing it to evolve faster than the neutral expectation

Relative rate tests compare the selective constraint along two lineages of a phylogeny. Assuming all other factors are equal (e.g., population size, mutation rate, gametogenesis, generation time), if the selective constraint along both lineages is equal, the branch leading from their common ancestor to sequence A should be the same length as the branch leading to sequence B; a third, more distantly related sequence (the outgroup) tells us where that common ancestor sits. If, however, the selective constraints differ, the branch lengths should be unequal.
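With an outgroup, the comparison of the two branches can be turned into a simple count. The sketch below implements Tajima's relative rate test in its simplest (1D) form, which is one of several such tests and not one named in this post: at each site where A and B differ, the outgroup C indicates which lineage changed (if B matches C, the change is assigned to A's branch, and vice versa), and a chi-square statistic with one degree of freedom compares the two counts. The function name and sequences are hypothetical, and the sequences are assumed to be aligned.

```python
import math


def tajima_relative_rate(seq_a, seq_b, outgroup):
    """Tajima-style 1D relative rate test on three aligned sequences.

    Returns (m1, m2, chi2, p): m1 counts changes assigned to A's lineage,
    m2 counts changes assigned to B's lineage.
    """
    m1 = m2 = 0
    for a, b, c in zip(seq_a, seq_b, outgroup):
        if "-" in (a, b, c) or a == b:
            continue          # gaps and shared states carry no information here
        if b == c:
            m1 += 1           # A has the unique state: change on A's branch
        elif a == c:
            m2 += 1           # B has the unique state: change on B's branch
        # sites where all three differ are ambiguous and skipped
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return m1, m2, chi2, p


if __name__ == "__main__":
    outgroup = "ACGTACGTAC"
    seq_a = "CATGCAGTAC"   # six changes relative to the outgroup
    seq_b = "ACGTACGTAC"   # no changes
    print(tajima_relative_rate(seq_a, seq_b, outgroup))
```

A small p-value says only that the two lineages have accumulated substitutions at different rates; on its own it does not distinguish relaxed constraint, positive selection, or a life-history difference. The chi-square approximation is also unreliable when m1 + m2 is small.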

If there are more substitutions along one lineage than the other, we must invoke some explanation for these differences. In some cases, differences in rates can be attributed to life history differences of the species from which the two sequences come. For example, mice have much shorter life spans than humans, and this has been used to explain differences in rates of evolution between the two lineages. Body size and metabolic rate can also affect rates of evolution, as can whether the sequence is from an X-chromosome or autosome -- autosomes are found equally in males and females, whereas X-chromosomes are disproportionately found in females, and male gametogenesis involves more cell divisions (more potential for mutation) than female gametogenesis.
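The X-versus-autosome effect can be put in numbers with a standard back-of-the-envelope "male-driven evolution" model (an addition here, not something worked out in this post). Assume neutral sequences whose substitution rate tracks the mutation rate, a male mutation rate that is `alpha` times the female rate, and an X chromosome that spends two-thirds of its time in females while an autosome spends half. The expected X-to-autosome rate ratio is then 2(2 + alpha) / (3(1 + alpha)); the second function inverts that relationship. Function names are hypothetical.

```python
def x_to_autosome_ratio(alpha):
    """Expected X/autosome neutral substitution-rate ratio.

    alpha = male mutation rate / female mutation rate (per generation).
    Autosome: (alpha + 1) / 2 in units of the female rate (half the time in each sex).
    X: (alpha + 2) / 3 (one-third of the time in males, two-thirds in females).
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    autosome = (alpha + 1) / 2
    x_chrom = (alpha + 2) / 3
    return x_chrom / autosome


def alpha_from_ratio(ratio):
    """Invert x_to_autosome_ratio; valid for 2/3 < ratio <= 4/3."""
    if not 2 / 3 < ratio <= 4 / 3:
        raise ValueError("ratio outside the range this model can produce")
    return (4 - 3 * ratio) / (3 * ratio - 2)


if __name__ == "__main__":
    for alpha in (1, 2, 5, 10):
        print(alpha, round(x_to_autosome_ratio(alpha), 3))
```

With no male bias (alpha = 1) the ratio is 1; the more male-biased mutation becomes, the more slowly X-linked sequence is expected to evolve relative to autosomal sequence, approaching two-thirds of the autosomal rate.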

We can control for the effects of life history on rate differences by selecting two genes from a single genome (ie, duplicate genes) or obtaining sequences from organisms with similar life histories. If we are interested in comparing sequences from two species with known life history differences, we can sample multiple sequences from each of those species (as well as our third species). Life history should affect all sequences equally, whereas selection should only affect a subset of the sequences.
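The logic of sampling many sequences can be sketched as follows, under stated assumptions: for each gene we already have branch lengths for the two lineages, life history multiplies every gene's ratio by roughly the same factor, and selection moves only a few genes away from that background. Dividing by the median ratio removes the shared life-history factor, and genes that remain far from 1 are candidates for lineage-specific changes in selection. The gene names and the two-fold cutoff are arbitrary choices for illustration, not a statistical test.

```python
import statistics


def flag_lineage_outliers(rate_ratios, fold=2.0):
    """Flag genes whose lineage-1/lineage-2 rate ratio departs from the genome-wide background.

    rate_ratios: dict mapping gene name -> (branch length on lineage 1) /
                 (branch length on lineage 2); all values must be positive.
    fold: hypothetical cutoff; a gene is flagged if it is at least this many
          times faster or slower than the median gene.
    Returns (background_ratio, {gene: ratio relative to background}).
    """
    background = statistics.median(rate_ratios.values())
    flagged = {}
    for gene, ratio in rate_ratios.items():
        relative = ratio / background
        if relative >= fold or relative <= 1 / fold:
            flagged[gene] = relative
    return background, flagged


if __name__ == "__main__":
    # Hypothetical ratios: most genes are 1.5x faster on lineage 1 (a shared
    # life-history effect); gene5 and gene6 stand out from that background.
    ratios = {"gene1": 1.5, "gene2": 1.5, "gene3": 1.5, "gene4": 1.5,
              "gene5": 4.5, "gene6": 0.5}
    print(flag_lineage_outliers(ratios))
```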

If we observe differences in rates along two lineages after controlling for other variables, we conclude that the selective constraints along the two lineages differ. The difference can be due to increased purifying selection along the slowly evolving lineage (the shorter branch) or positive selection along the rapidly evolving lineage (the longer branch). Distinguishing between these two hypotheses requires more information (such as the relative numbers of synonymous and non-synonymous substitutions). Some of the other assays for natural selection that I will describe can also be used to discriminate between increased selective constraint and positive selection.

I am not sure if I will post another entry on comparative and phylogenetic analyses, or if I will move on to discussing nucleotide polymorphism in my next post. If you have any suggestions, or further questions, please post them in the comments.

Friday, December 16, 2005

Weekly Random Ten (16 December 2005)

Odds and Ends Edition

Alright, so it's been a couple of weeks since the last evolgen Weekly Random Ten. I'm sorry. Really, I am. I know how much none of you look forward to reading this worn-out staple of my humble blog. It's also been a while since I published my most recent entry in the Detecting Natural Selection series. I hope to have the next post (on phylogenetics and relative rates) up soon (once I write it). I promise. Really, I do.

As for the post I promised on William Harris's talk, well, that's not gonna happen. Maybe I'll post what I've got written (about half of it), but I can't motivate myself to write any more about that intellectually vacuous drivel like some people. The same goes for he who shall not be named (aka, John Davison . . . oops, I guess I named him), who now has his own blog -- which I refuse to visit or link to (you can find it yourself if you need to see it, just search "I know nothing about chromosomal rearrangements and evolution").

Winter's almost here -- though, judging by the recent snow/ice storm, I'd say it's already here. Don't forget to support our troops in the War on Christmas. I'm gonna wear my Cross-Buster t-shirt along with a Santa cap (y'know, cause the juxtaposition is such irony, and irony is so cool) while listening to this week's evolgen Random Ten.
  1. Blood, Sweat & Tears - I Love You More Than You'll Ever Know
  2. Foghat - Slow Ride
  3. Bouncing Souls - Say Anything
  4. AFI - Advances in Modern Technology
  5. Long Beach Dub Allstars - Grass Cloud
  6. Tilt - Past the Point
  7. Nena - 99 Luftballons
  8. Rancid - Stand Your Ground
  9. The Fraternity of Man - Don't Bogart Me
  10. Stealers Wheel - Stuck in the Middle

The Competitive World of Gene Naming

Many genes have names; most do not. Drosophila geneticists are among the most clever at naming genes. Historically, genes have been named by the phenotype of the first mutation in that gene. For example, the first Drosophila mutation discovered disrupted a pigmentation pathway and caused the fly's eyes to be white (as opposed to the wild-type red hue). Thomas Hunt Morgan's lab named this mutation white, and now the gene is also known as white.

The white gene is a particularly bad example of clever naming, but it does illustrate the naming process. Some of the more fun names include fruitless (which I've blogged about before, and which used to be known as fruity), tinman (mutants have problems with heart development), and Scott of the Antarctic (which makes me think of this). Other research communities have different ways of naming genes. Yeast geneticists are among the least interesting (when it comes to naming genes), using a combination of three letters and a number. The mouse/rat community seems to have dry naming rules as well, but it's hard to tell. One of my all-time favorite gene names pays homage to the second greatest video game character of all time (behind Mario, of course), Sonic the Hedgehog. Mutations in the Drosophila gene hedgehog (which affects segmentation patterning) cause the embryo to look like a balled-up hedgehog. When the homologous gene was identified in zebrafish in 1993, it was named sonic hedgehog after the popular video game.

Whole genome sequencing has added an extra dimension to the naming game. One of the first steps after assembling a completed genome is annotation. This requires identifying all of the protein coding genes, tRNAs, rRNAs, transposable elements, and any other class of sequences in the genome using gene finding algorithms and alignments with closely related known sequences (including other sequenced genomes). The annotation process, however, is drastically different from traditional gene naming. During annotation, genes are assigned unique identifiers (usually some combination of letters and numbers) regardless of whether the gene was named using a classic mutation experiment. Genes that were never named via mutagenesis (or some other molecular analysis of function) only go by their boring annotation identifier, but previously characterized genes get to keep their old names as well.

That brings me to a sad piece of news reported in Nature:
"A cancer research institute has been threatened with legal action by the US branch of Japanese video-game franchise Pokemon, after one of its researchers borrowed the company's trademark to name an oncogene."
The authors of the paper in question use the name Pokemon as a clever abbreviation for the gene's functional description, POK erythroid myeloid ontogenic. Around the time of publication the paper received a fair bit of press, and people began saying things like, "Pokemon causes cancer." Not surprisingly, that got the folks behind the TV show/video game in a tizzy. Pokemon may cause seizures (the TV show, not the gene), but the show does not cause cancer (although aberrant expression of the gene is found in cancer cells).

Wednesday, December 14, 2005

Do Wikis Work?

Nature has examined the efficacy of Wikipedia (actually, accuracy would be a better word). Compared to Encyclopedia Britannica, wikis fare quite well when it comes to science articles:
"The exercise revealed numerous errors in both encyclopaedias, but among 42 entries tested, the difference in accuracy was not particularly great: the average science entry in Wikipedia contained around four inaccuracies; Britannica, about three."
Wikipedia often gets a lot of bad press for its gross errors -- such as people editing their own entries with ulterior motives or providing misinformation -- but the Nature study revealed four major errors from each encyclopedia in the 50 entries examined. Nature sent out unlabeled entries from both Britannica and Wikipedia to experts in fields relating to the entries; these entries included Australopithecus africanus, Cambrian explosion, Dolly, Kin selection, Ernst Mayr, Mutation, and Punctuated equilibrium (the entries that would probably interest regular readers of this blog).

One major complaint about wikis is not the content, but the manner of presentation:
"Editors at Britannica would not discuss the findings, but say their own studies of Wikipedia have uncovered numerous flaws. 'We have nothing against Wikipedia,' says Tom Panelas, director of corporate communications at the company's headquarters in Chicago. 'But it is not the case that errors creep in on an occasional basis or that a couple of articles are poorly written. There are lots of articles in that condition. They need a good editor.'"
The article seems to suggest, however, that what the wikis need is not so much editors as more experts writing articles. While armchair scientists may have some knowledge about a particular field, experts would be able to add an extra dimension that people outside of the research community lack. I can see the Wikipedia entries becoming something like informal reviews distilled for the general public.

Nature also has a short report on this story.

Tangled Bank #43

The Tangled Bank

Tangled Bank #43 is up -- yes, you should check out the linkage.

Tuesday, December 13, 2005

Can Intelligent Designers Inseminate Virgins?

Hooray for Humanzee

John Wilkins points us to the inspiration for King Kong -- a Russian scientist’s failed attempts to create a human-chimpanzee hybrid. The conversation over at Pharyngula shifted to obstacles to hybridization between humans and chimps. PZ Myers thinks that differences in gene regulation between the two species would present a greater barrier to hybridization than genome rearrangements. I agree with him that rearrangements themselves would not pose a problem for hybridization -- many species, including humans, are polymorphic for rearrangements. Also, the difference in chromosome number (humans have 23 pairs, chimps have 24 pairs) is not all that important because it results from a fusion of two chromosomes along the human lineage; both species have equivalent amounts of genomic information, it’s just arranged slightly differently.

The rearrangements may still be important if they harbor genes responsible for the reproductive isolation between the two species. This is well supported in Drosophila and has been suggested in apes, but not without some disagreement. The theory posits that inversions will prevent the transfer of reproductive isolating factors between speciating populations by suppressing recombination. The problem with applying this theory to humans and chimps is that the model assumes range overlap during speciation, such that rearrangements are necessary to prevent gene flow between the two species in regions of the genome containing hybrid incompatibility factors. From what I understand, an important event during hominid-chimp speciation was the emigration of hominids from rain forests to open grasslands, meaning the speciation occurred allopatrically. Hence, we have no reason to believe that the genomic rearrangements differentiating the human and chimp genomes play an important role in the reproductive isolation of the two species.

Human-chimp hybrids and evolutionary intermediates have interested scientists and laypeople alike for quite a long time. One particularly famous case involves a predominantly bipedal chimp named Oliver who was often referred to as a Humanzee (for human-chimpanzee hybrid). Oliver had many behaviors that seemed more human than chimp, leading some people to believe he was either the product of a human-chimp hybridization event or some evolutionary missing link. It turns out, via a simple molecular assay, that Oliver was 100% chimpanzee, and his human-like behaviors were often exaggerated. Oliver represents a good piece of evidence for the role of developmental plasticity in anatomical evolution -- one could imagine that his upright stance could have influenced other members of his community, thereby changing the selective pressures on the morphology of the population.

Studying the genetics of speciation represents a major paradox: the study of genetics (from Mendel to Morgan to current research done today) requires the crossing of different individuals to determine how traits are inherited, but species boundaries prevent those requisite matings. Research on the genetics of speciation can be done using everyone’s favorite model system, Drosophila, by creating mutant flies that can mate across species boundaries. Surprisingly, mutations in single genes can break down those species boundaries, allowing for viable hybridization between D. melanogaster and its close relatives. Geneticists can create inter-specific hybrids using these mutant flies and study the effects of transcriptional regulatory elements on the differences in gene expression (something that would obviously interest Dr. Myers). So far, a few speciation genes have been identified, but their protein products do not fall into a particular class -- they include transcription factors and a component of the nuclear pore complex.

It’s hard to refute the idea that transcriptional regulation is important in speciation, given both the analysis of expression differences between species and the identification of transcription factors that prevent hybridization. I guess I’m going to have to agree with the developmental biologist on this one -- the human-chimpanzee hybridization experiments probably failed due to regulatory differences (along with problems in sperm-egg recognition proteins). Also, God probably wouldn’t have wanted it to happen anyway.

Friday, December 09, 2005

Do These People Work at the Gap?

Individuals who are red-green color blind cannot distinguish between the colors red and green (makes sense). One type of red-green color blindness (deuteranomaly) is due to a recessive mutation on the X-chromosome. Men are more likely to be red-green color blind because they only have one copy of the X-chromosome, whereas women can carry the mutant allele on one copy of the X-chromosome and still have normal color vision if they have a wild-type copy on their other X-chromosome.
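The sex difference follows directly from Hardy-Weinberg proportions. As a minimal sketch, assuming random mating and the same frequency q of the recessive allele in both sexes: a male is affected whenever his single X carries the allele (frequency q), while a female must carry two copies (frequency q squared). The value of q below is hypothetical.

```python
def x_linked_recessive(q):
    """Expected phenotype frequencies for an X-linked recessive allele at frequency q."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    return {
        "affected_males": q,                 # one X: a single copy is enough
        "affected_females": q * q,           # two copies needed
        "carrier_females": 2 * q * (1 - q),  # heterozygous, normal vision
    }


if __name__ == "__main__":
    freqs = x_linked_recessive(0.05)   # hypothetical allele frequency
    print(freqs)
    print("affected males per affected female:",
          freqs["affected_males"] / freqs["affected_females"])
```

The ratio of affected males to affected females works out to 1/q, so the rarer the allele, the more lopsided the sexes become.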

A new study reveals a hidden benefit of red-green color blindness -- the ability to distinguish 15 shades of khaki.
"They identified 15 shades of khaki that fitted the bill, and tested their prediction by showing two sets of subjects - one with deuteranomaly and the other with normal vision - a series of cards carrying pairs of different khaki shades. It proved to be almost impossible for people with normal vision to tell the colours apart."
At first thought, one might imagine this mutation would be beneficial for men picking out slacks at their local department store, but it turns out that the recessive allele may have aided the color-blind males when hunting.
"Simmons hypothesizes that because deuteranomaly is quite common in human populations, the gene responsible may have once provided an evolutionary benefit. For example, it may have helped them spot potential food items in complicated environments such as grass or foliage, he suggests."
I find this hypothesis to be a bit of a stretch. A simple analysis of the frequency of the allele may reveal that it is maintained in mutation-selection equilibrium. For example, some people have suggested that the prevalence of cystic fibrosis is due to a beneficial quality of being heterozygous for a mutation that causes the disease, but it can be adequately explained by the mutation rate from the wild-type allele to the disease causing allele.
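To make the mutation-selection argument concrete, here is a minimal deterministic sketch for an autosomal recessive allele like the cystic fibrosis example (the X-linked colour-blindness case would need a different model, since selection acts on hemizygous males). Assumptions: random mating, an infinite population, selection only against aa homozygotes with fitness 1 - s, and one-way mutation A to a at rate mu. The function names and parameter values are illustrative. The iteration settles near the classic approximation q = sqrt(mu / s), which is exact for a lethal allele (s = 1) in this model.

```python
import math


def next_generation(q, mu, s):
    """Advance the frequency q of recessive allele a by one generation.

    Selection: genotype aa has fitness 1 - s; AA and Aa have fitness 1.
    Mutation: after selection, A mutates to a at rate mu (no back mutation).
    """
    mean_fitness = 1 - s * q * q
    q_selected = q * (1 - s * q) / mean_fitness
    return q_selected + mu * (1 - q_selected)


def simulate_equilibrium(mu, s, generations=20000, q0=0.0):
    """Iterate from q0 until (approximately) mutation-selection balance."""
    q = q0
    for _ in range(generations):
        q = next_generation(q, mu, s)
    return q


if __name__ == "__main__":
    mu = 1e-5   # hypothetical per-generation mutation rate to the disease allele
    for s in (1.0, 0.5):
        print(s, simulate_equilibrium(mu, s), math.sqrt(mu / s))
```

Comparing an observed allele frequency with sqrt(mu / s) is the "simple analysis" in question: if the two are similar, no heterozygote advantage is needed to explain the allele's prevalence.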

A final note on color-blindness and PowerPoint presentations: always make figures in your presentations as black and white as possible. You never know when an audience member (especially an important one) can't distinguish red from green, and you may end up discussing two trend-lines that some people won't be able to tell apart.

J'like dags?

Afarensis reported some of the popular press surrounding the publication of the dog genome. One item from National Geographic's coverage seemed a bit odd:
"Scientists had previously found that about 5 percent of the human genome sequence appears in the mouse genome. The new study shows that 5 percent of the human genome is also shared with dogs."
This made absolutely no sense to me. It sounds like they are saying that there is 5% sequence identity between humans, mice, and dogs -- this is totally erroneous considering the following quote from the mouse genome publication:
"At the nucleotide level, approximately 40% of the human genome can be aligned to the mouse genome. These sequences seem to represent most of the orthologous sequences that remain in both lineages from the common ancestor, with the rest likely to have been deleted in one or both genomes."
From reading the Nature report of the genome sequence, however, I have discovered the true meaning of the five percent:
"A comparative analysis of the human, mouse and dog by Lindblad-Toh et al. showed that about 5% of the human genome is being maintained by natural selection - suggesting that it has some essential function. Almost all of this sequence is also present in the dog genome. Only 1-2% of the genomes encodes proteins, so there would seem to be an additional common set (about 3%) of functional elements in mammalian non-coding DNA. These common sequences may constitute, for example, regulatory elements, structural elements or RNA genes. Notably, such regions are found mostly within the 0.8 Gbp of ancestral sequence common to human, mouse and dog."
I have not read the actual article (I plan to, and if it's interesting I'll blog on it), but it appears that 5% of the genome is more conserved than expected based on neutral evolution. As they mention, genes are expected to evolve more slowly (be more conserved), but there is also a substantial suite of regulatory elements that are also under strong purifying selection.
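As a rough illustration of how "more conserved than expected" can be detected (a generic sketch, not the method used in the genome paper), the code below steps through non-overlapping windows of a pairwise alignment, counts substitutions, and asks whether the count is improbably low under a Poisson model with a neutral rate. The neutral rate is an input that would have to be estimated from sequence assumed to be free of constraint; the window size, cutoff, and function names are hypothetical, and with many windows a multiple-testing correction would also be needed.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def constrained_windows(subs, neutral_rate, window=50, alpha=0.01):
    """Return (start, observed, expected) for windows with too few substitutions.

    subs: sequence of 0/1 flags, one per aligned site (1 = substitution).
    neutral_rate: substitutions per site expected under neutrality (assumed known).
    """
    hits = []
    expected = neutral_rate * window
    for start in range(0, len(subs) - window + 1, window):
        observed = sum(subs[start:start + window])
        if poisson_cdf(observed, expected) < alpha:
            hits.append((start, observed, expected))
    return hits


if __name__ == "__main__":
    # Hypothetical alignment: a window with no substitutions, then one at the neutral rate.
    flags = [0] * 50 + [1] * 15 + [0] * 35
    print(constrained_windows(flags, neutral_rate=0.3))
```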

Thursday, December 08, 2005

Support NIH Funding

The Genetics Society of America is encouraging its members to write to their US Representative. If you are currently doing research or have received support from the NIH, please write to your Representative.
We have a second chance to influence the fiscal year 2006 budget for NIH. Before leaving town for the Thanksgiving Day break, the House rejected the conference agreement reached on the Labor-HHS-Education appropriations bill for fiscal year 2006.

While a number of factors contributed to the defeat of this bill, certainly one of the main reasons was the bill's failure to fund critical health and education programs adequately. NIH's growth, for example, would be held to about $250 million (a 0.7% increase), the smallest increase in more than three decades.

The Senate responded by instructing its conferees to support the Senate recommended amount for NIH ($29.4 billion, a $1 billion increase [3.7% increase]). However, the House is resisting adding any more money for NIH.

Therefore, please urge your Representative to support the Senate passed recommendation of $29.4 billion by clicking here:

Tuesday, December 06, 2005

The Greats of Evolutionary Genetics

I often reference the classic minds of evolutionary genetics (Dobzhansky, Wright, and others), but I tend to leave off one of the most important geneticists of them all, H.J. Muller. Thankfully, James Crow has published a short biography of Muller in Nature Reviews Genetics focusing on Muller's contributions to evolutionary biology.
"Although Hermann Joseph Muller is best remembered for his discovery that X-irradiation induces genetic mutations, for which he won the Nobel Prize, he made many influential contributions to evolutionary biology. Muller was the first to emphasize a gene-centred view of evolution, and he made both experimental and theoretical contributions to our understanding of speciation. He also reached insightful conclusions about how genes interact, how they are acted on by natural selection, and how their evolution is influenced by sexual reproduction and population structure. His influence on genetics and evolution was therefore substantial and wide ranging . . . In fact, Muller's interest in evolution pervaded his entire career."
Muller began his career in Thomas Hunt Morgan's lab at Columbia working with Sturtevant and Bridges. While Sturtevant is best known for constructing the first genetic map, Muller is best known for his work on the physical nature of chromosomes. Muller received his Nobel prize for his work on X-irradiation and mutation, but he also made other discoveries regarding the homology of chromosomes. He showed that the chromosome arms in Drosophila carry the same genes (are homologous) across multiple species. In recognition of this discovery, the chromosome arms are known as "Muller's Elements".

Some of Muller's other contributions to evolutionary genetics include:
  • The importance of duplicate genes (through examination of the Bar locus)
  • The importance of sexual reproduction in reducing genetic load (Muller's Ratchet)
  • A multi-locus model for speciation via hybrid incompatibility factors (Dobzhansky-Muller incompatibilities)
I feel like I should also include some of the low-lights of Muller's career. He spent a lot of time bouncing around between academic institutions. He left the United States for Germany in 1932 (his communist beliefs clashed with the prevailing social climate), only to see the rise of Hitler's Nazi party less than one year later. This led to a move to Russia, where research on evolutionary biology was stifled by Lysenkoism. Eventually, he returned to America, but had difficulty finding a faculty position.
"Having been to Russia, he was branded as a communist, and having spoken out against Lysenko, he was branded as a fascist. With wry amusement, he once said that at least both could not be true."
Muller was finally hired by Indiana University in 1945, and won his Nobel prize in 1946. His work is some of the most important in both genetics and evolutionary biology. If you have access to Nature publications, I suggest you read the entire article -- Crow tells both a human story and the story of a scientist.

What, US worry?

This is more down Chris Mooney's alley, but still:

Sunday, December 04, 2005

The Fatal Flaw

It's that time of the year again, when the snow begins to fall, houses are outlined by white lights, and old white men don brightly colored sports coats. No, I'm not talking about Christmakwanzakah -- it's college football's bowl season! (Also known as "The Single Worst Way to Decide a National Champion".) This year, the Bowl Championship Series (BCS) lucked out and only two teams finished with undefeated records. If only it were always so easy.

In the previous two years, the BCS saw three teams finish undefeated in one season and only one team finish undefeated in the other. In this current system (better than what existed prior to the BCS, worse than what we would have in an ideal world) a mixture of human voters and computer algorithms decide the top two teams in the nation (in addition to ranking the top 25 teams). These two teams play each other in early January for the right to be called the ESPN/USA Today/Coaches Poll/BCS national champion, and get to keep a pretty crystal trophy.

In situations where there are more than two teams that can legitimately claim the right to play in this game, we have what can be lightly referred to as "controversy". This year, only Texas and USC finished the regular season with unblemished records, removing any doubt regarding who should play in the national championship game. This, however, does not mean that the BCS is devoid of controversy, as there are three other high-payout bowl games involved. The participants in the four games (the national championship and the three other games) are determined as follows:
  • 6 automatic invitations: The six conferences involved (the Big East, ACC, SEC, Big Ten, Big 12, Pac-10) each get an automatic invitation for their conference champion.
  • Other automatic invitations: If a team from one of the BCS conferences does not win its conference, but finishes in the top four in the BCS standings, they get an automatic invitation to one of the BCS games. If a team from a non-BCS conference finishes in the top six in the BCS standings, they get an automatic invitation.
  • At large invitations: Finally, if there are any invitations remaining, any other team that finishes in the top twelve in the BCS standings can be invited to a BCS game.
This year, after the automatic invitations were handed out for conference champions, one non-BCS team (Notre Dame) finished sixth in the BCS standings, earning an automatic invitation. Additionally, a team from a BCS conference that did not win its conference (Ohio State) finished fourth in the standings and earned the final automatic bid. This means that there were no at-large invitations available, much to the University of Oregon's dismay. You see, they finished fifth in the final rankings, but were not one of the eight teams chosen to play in a BCS game because all eight invitations were automatic.
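The selection rules listed earlier are mechanical enough to write down as a short program. The sketch below is a simplified model that encodes only the three rules as this post states them and does not attempt to model any other provisions; the `Team` fields, the function names, and the placeholder team names are hypothetical (only the rankings come from the post).

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    rank: int             # final BCS ranking (1 = best)
    bcs_conference: bool  # member of one of the six BCS conferences
    champion: bool        # won its conference


def automatic_bids(teams):
    """Teams earning an automatic invitation under the three rules as stated in this post."""
    bids = []
    for t in teams:
        if t.bcs_conference and t.champion:
            bids.append(t)  # BCS conference champion
        elif t.bcs_conference and t.rank <= 4:
            bids.append(t)  # BCS-conference non-champion in the top four
        elif not t.bcs_conference and t.rank <= 6:
            bids.append(t)  # non-BCS team in the top six
    return bids


def at_large_slots(teams, slots=8):
    """Slots left for at-large teams; a negative value means the rules overflow."""
    return slots - len(automatic_bids(teams))


def champions(ranks):
    # Placeholder names: only the ranks come from the post.
    return [Team(f"champion #{r}", r, True, True) for r in ranks]


# This season as described: champions ranked 1, 2, 3, 7, 10 and 22.
season_2005 = champions([1, 2, 3, 7, 10, 22]) + [
    Team("Ohio State", 4, True, False),
    Team("Oregon", 5, True, False),
    Team("Notre Dame", 6, False, False),
]

# One hypothetical ranking that fits the nine-bid scenario.
nine_bid = champions([1, 5, 7, 8, 9, 10]) + [
    Team("non-champion A", 2, True, False),
    Team("non-champion B", 3, True, False),
    Team("non-BCS team", 6, False, False),
]

if __name__ == "__main__":
    print([t.name for t in automatic_bids(season_2005)])
    print("at-large slots this year:", at_large_slots(season_2005))
    print("at-large slots, nine-bid scenario:", at_large_slots(nine_bid))
```

With this season's rankings the model leaves zero at-large slots (Oregon, fifth but neither a champion nor in the top four, is left out). The nine-bid ranking gives -1, one more automatic qualifier than there are places.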

There isn't much Oregon can do about being left out, but is it possible for more than eight teams to qualify for BCS bowl games? Imagine the following scenario:
  • The top two teams in the BCS poll automatically qualify for the national championship, but one of the teams fails to win its conference (it has happened before) -- 2 bids
  • Another bid is given to the team that won the conference that one of the top two teams did not win, and four other bids are given to each of the other conference winners -- 5 bids
  • One other non-conference champion finishes in the top four (like Ohio State did this year), and a team from a non-BCS conference finishes in the top six (like Notre Dame this year, or Utah last season) -- 2 bids
That's nine bids for eight available positions (there are other possible combinations that I'll leave for the reader to figure out). Remember, being a conference champion says nothing about where you rank in the BCS poll (this year, the six conference champions finished ranked 1, 2, 3, 7, 10, and 22 in the final poll). Considering that all of the possible events needed to cause a total breakdown of the BCS have occurred at least once, this does not seem like it's too much of a stretch.

So, what can the BCS do to remedy this? Next year, there will be an extra game, which means two more teams will qualify (ten total). Is it possible for more than ten teams to automatically qualify for BCS games?
  • The six conference champions automatically qualify, but only one of those teams is ranked in the top six in the BCS rankings, and none are ranked in the top two -- 6 bids
  • Both teams ranked 1 and 2 fail to win their conference championship -- 2 bids
  • One of the following combinations: (a) 2 non-conference champions finish ranked 3 & 4, and at least one non-BCS team finishes ranked 5 or 6; (b) at least one non-conference champion finishes ranked 3 or 4, and 2 non-BCS teams finish ranked 5 & 6 -- at least 3 bids
Ok, so this is more of a stretch than the first scenario, but it's still theoretically possible. What's important is we have shown that there is a fatal flaw in the BCS without even invoking the problems of picking the national champion -- there may simply be more teams automatically qualifying for BCS games than there are spaces for those teams in the games.

Of course, picking only two teams to play for the national championship only works in seasons in which two teams have better records than all other teams, and this only happens when exactly two teams finish with undefeated records. Some people have suggested a "plus one" system, in which an extra game is played after the regular bowl season. This system would not work in seasons in which there are two obvious choices (for example, this year) because it seems absurd to play an extra game -- just let those top two teams play for the national championship.

The only remedy that would work in all scenarios is a simple playoff between the top teams in the nation. This could be a four team playoff with the top 4 teams in the BCS rankings, a 6 team playoff with the top six teams (the top two teams get a first round bye), or an 8 team playoff (picking teams just as they are done with the current system, only eliminating the fatal flaw somehow). Why don't they do this (there are playoffs in every other collegiate sport in the United States, including at the lower levels of college football)? Because the university presidents claim that the additional games due to a playoff would detract from the time the players should devote toward academics. Oh, the bitter-sweet taste of irony!

I apologize if you expected this post to contain anything relating to evolution or genetics. This was far too important not to post.

Saturday, December 03, 2005

Genomics in the Post-Genomics Era

Sahotra Sarkar points us to the new Post-Genomics blog, with an impressive list of contributors. I'm having a hard time pinning down the exact dates of the Pre-Genomics, Genomics, and Post-Genomics eras. Francis Collins and colleagues presented a "blueprint for the genomic era" in 2003. Nature, however, also published a jobs editorial entitled Bioinformatics in a post-genomics age in 1997. It seems illogical that the post-genomics age occurred prior to the genomics era.

So, does "post-genomics" really mean anything? In short, no. The longer answer is, well, not really, but kinda, if you look at it one way. Sort of. I see "post-genomics" as a synonym for "the genomics era". There are, in fact, only two eras. We have the time prior to whole genome sequences, or the pre-genomics era. In the last ten years, we have seen a dramatic increase in the amount of sequence data publicly available, including many whole genome sequences. With multiple genome sequences available, we are now in the (post-) genomics era. Call it whichever you prefer, just make sure you understand that there really isn't a difference between the post-genomics and genomics eras.