Monday, October 10, 2005

Note to Molecular Biologists: Quit Misusing "Homology"

Homology: Similarity between species [or DNA sequences] that results from inheritance of traits from a common ancestor (Freeman and Herron 1998, Evolutionary Analysis).

Similarity: The quality or condition of being similar; resemblance. See Synonyms at likeness (The American Heritage Dictionary of the English Language, Fourth Edition).

I've been reading a lot of papers on the molecular biology and genetics of recombination and chromosomal aberrations. For those who aren't familiar with the literature, the mechanisms behind recombination and genome rearrangements (such as inversions, chromosomal translocations, and segmental deletions) are quite similar. They both begin with a double-strand break (DSB) and subsequent invasion of a highly similar sequence (usually from the homologous chromosome) in order to repair that break. If the repair comes via a homologous chromosome, we get either gene conversion or a crossover event. If, however, a non-allelic region (i.e., a similar sequence from a non-homologous region) is used as the template, we can get a chromosomal aberration that may have biological consequences such as speciation or disease.

An overview of meiotic recombination between two homologous chromosomes, based on the DSB repair model, showing intermediates and the proteins implicated in their formation by genetic and/or molecular criteria. The points at which several proteins act are still under investigation, and some may be required at several steps. Proteins that function in both mitotic and meiotic HR are indicated in bold; all others are unique to meiosis. 1) A DSB is introduced in a DNA duplex by the Spo11 nuclease, likely acting in conjunction with several other proteins that are known to be required for DSB induction. 2) The 5' ends of the break undergo 5' to 3' resection to yield 3'-OH single-stranded tails. 3) One of these single-stranded tails invades a homologous duplex, displacing a D-loop. 4) Several steps resulting in the formation of a bimolecular intermediate with double Holliday junctions follow. At this point, correction of mismatches in heteroduplex DNA can result in gene conversion. 5) Resolution of the intermediate to yield a product with a parental configuration of flanking sequences (non-crossover). 6) Resolution to yield a crossover product.

The molecular biologists who write articles about recombination and DSB repair usually refer to a homologous sequence being used as the template. This is usually the case, as the allele from the homologous chromosome is the most common site of genetic exchange. The problem arises when they refer to degrees of homology, distinguishing between sequences of "high homology" or "very little homology" in different repair pathways. To an evolutionary biologist, this sounds like nails on a chalkboard. Homology refers to common ancestry -- two sequences are either homologous (they share a common ancestor) or they are not. There is no in between. It's kind of like being pregnant: you either are or you are not, and you cannot be "a little bit pregnant."

What these diction-deficient molecular biologists mean when they say "homology" is "sequence identity." Identity can be measured as a proportion or percent, and sequences can be highly similar (sharing, say, 95% of their nucleotides) or highly divergent, with low identity. This scale of identity ranges from zero percent to 100 percent, although determining identity below 25 percent is impossible due to difficulties in alignment. Even though homology is a yes-or-no issue, we could think of a case where we have a probability of homology. This would not refer to the amount of sequence identity between sequences, but rather to the probability that the sequences are homologous. There would be a correct answer (either yes or no), but due to uncertainties in our ability to determine homology, we would be forced to say, for example, that we are 80% sure that these sequences are homologous. I'm not sure if anyone does this.
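
To make the distinction concrete, here is a minimal Python sketch of what "measuring identity" amounts to. It assumes the two sequences have already been aligned to the same length, with "-" marking gaps, and it counts only the columns where neither sequence has a gap. The function name and the gap-handling convention are my own illustrative choices, not any particular tool's definition; alignment programs differ in how they treat gapped columns.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of aligned, gap-free columns in which the two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap (one convention of several).
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# One mismatch in ten aligned columns.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```

The result is a number anywhere on a continuous scale, which is exactly what homology is not: the function says nothing about whether the two sequences share an ancestor.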


At 11:13 AM, Blogger PZ Myers said...

I recall a letter to Science sometime in the late 1980s that made the same point: homology is an inference drawn from the data, while similarity is a measurable property of two sequences.

All the molecular biologists I knew were talking about it, but it didn't change a thing.

At 9:55 PM, Anonymous Anonymous said...

Well, this topic opens a huge can of worms with respect to nomenclature. For example, in this discussion, it would be sloppy language to refer to similarity as a measurable property between two sequences. Identity is a more precise term. Similarity (usually reserved for peptide alignments) refers to an arbitrary scale in physicochemical similarity between amino acid residues. The Grantham Index is frequently used, but you could just as easily use another evolutionary scale. Even when identity is 100%, it is premature to call two sequences homologous until the length of the sequence has been considered, which is what BLAST does with its E-value. Ugh. Then we get into all kindsa debates like: "how did the genome start?", "what is the difference between homology, orthology, and paralogy?", "how do haploids, diploids, tetraploids, or diploidized tetraploids fit into definitions about homology?", etc. Then my head starts to hurt.

I guess almost all the research evolutionary geneticists do (except mutation accumulation experiments or experimental evolution) is based on inferences of shared ancestry rather than firsthand knowledge of that ancestry. Regarding nomenclature, the key is deciding how much doubt is acceptable in those inferences and what names to give those varying levels of doubt.

At 7:17 AM, Blogger RPM said...

Foiled by my own sticklerism for nomenclature! How ironic!! Thank you, anonymous commenter, whoever you may be. Identity is a much better word than similarity.

