The Evolution of Drosophila Non-coding DNA
Peter Andolfatto's recent article on the evolution of non-coding DNA in Drosophila has elicited some comments from the blogosphere. A lot of people are referring to it as an analysis of "junk DNA," which can be traced back to this press release by UCSD and this editor's summary in Nature. I am bothered by the term "junk DNA" even when people put the scare quotes around "junk" implying they don't really mean junk or they are not sure if it really is junk. Let's just call it "non-(protein) coding DNA" and relieve it of all the associations with trash and waste.
I am going to try a new approach to commenting on a research article. I have yet to read the entire paper -- I have skimmed the abstract and looked at some of the figures -- so I am going to comment as I take a closer look at it. From what I understand, Andolfatto has looked at whole genome sequences from closely related Drosophila species and found patterns of sequence divergence and polymorphism in non-coding sequences that are not consistent with the neutral model of molecular evolution. I am hoping he can convince me that these patterns are due to selection on non-coding sequences and not demography or hitchhiking/background-selection due to selection on linked coding sequences.
Previous studies of selection on non-coding sequences have focused on conserved regulatory elements, but this approach does not allow inferences of positive selection on non-coding DNA.
"This finding suggests that taking an approach based on sequence conservation alone may lead to a biased view of regulatory evolution. Functionality of DNA sequences implies that they can be subject to both negative and positive selection. If a significant fraction of divergence between species observed in non-coding DNA is positively selected rather than selectively neutral or constrained, this could lead to underestimates of the functional importance of non-coding DNA and cause researchers to overlook the contribution of arguably the most interesting class of mutations in genome evolution -- those reflecting adaptive differences between populations and species."
Andolfatto took a different approach toward identifying functional non-coding sequences. He looked at coding and non-coding sequences from twelve D. melanogaster individuals and one D. simulans individual from the X-chromosome. He divided the non-coding sequences into 5 classes: 5' UTRs, 3' UTRs, introns, intergenic sequences within 2kb of a gene, and intergenic sequences more than 4kb from a gene. He then calculated a variety of population genetics statistics based on these sequences to determine if any of the non-coding DNA displays signatures of natural selection. I would expect that the UTRs (sequences that are transcribed, but not translated) are under more functional constraint than the intergenic regions and probably also display more signatures of positive selection. I also would expect that the introns would be constrained and have more evidence of positive selection (due to regulatory elements located within), and that the intergenic sequences located closer to genes are under more selection than intergenic sequences further from genes.
So, what does Andolfatto's data suggest? Surprisingly, non-coding DNA is more conserved than silent sites within coding DNA. Silent (or synonymous) sites are nucleotides within coding sequence that can be mutated and not change the amino acid encoded by the codon due to the redundancy of the genetic code. Other research has shown that synonymous sites are under weak selection, but Andolfatto finds that the pattern of polymorphism at non-coding sites resembles that at non-synonymous sites (sites within coding sequences that when mutated lead to a codon encoding a different amino acid) more than synonymous sites. This pretty much rejects the possibility that non-coding sites display patterns of selection because they are linked to coding sequences under selection -- we would expect the same patterns observed at non-coding sites when we look at silent sites because they too would be linked to the selected sequences.
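To make the synonymous/non-synonymous distinction concrete, here is a minimal Python sketch that classifies a single-base change in a codon using the standard genetic code. The function name `classify_substitution` is my own invention for illustration and is not taken from the paper.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, position, new_base):
    """Classify a point mutation (hypothetical helper, for illustration).

    codon:    three-letter DNA codon, e.g. "GGA"
    position: 0, 1, or 2 (which base of the codon changes)
    new_base: the base that replaces the original one
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "non-synonymous"


# GGA and GGC both encode glycine, so a change at the third position is silent.
print(classify_substitution("GGA", 2, "C"))
# GGA (glycine) to AGA (arginine) changes the amino acid.
print(classify_substitution("GGA", 0, "A"))
```

Most silent changes fall at the third codon position, which is why silent sites are often treated as the closest thing to a neutral reference within coding sequence.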
Looking at the relationship between polymorphism and divergence at the synonymous sites, non-synonymous sites, intergenic regions, introns, and UTRs, Andolfatto finds a significant excess of divergence at non-synonymous sites and UTRs. Under a neutral model, we would expect divergence to be a good predictor of the polymorphism at a locus. An excess of divergence relative to polymorphism suggests that positive selection has led to the fixation of mutations in a region. This means that the non-synonymous sites and UTRs are probably under positive selection. Remember, previous studies have been able to identify purifying selection in these types of sequences, but you need the polymorphism data to infer positive selection. By doctoring the data set a bit (eliminating rare variants), Andolfatto also finds evidence for positive selection at the other non-coding sites relative to synonymous sites.
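The comparison of polymorphism to divergence described above is the logic behind a McDonald-Kreitman-style test. The sketch below is my own illustration of that logic, not necessarily the exact procedure used in the paper, and the counts in the example are made up rather than taken from Andolfatto's data. It compares a putatively selected class of sites (say, UTRs) against a reference class (synonymous sites), reports the neutrality index and the simple estimate alpha = 1 - NI for the fraction of divergence driven by positive selection, and runs a two-sided Fisher's exact test on the 2x2 table.

```python
from math import comb


def mk_summary(d_sel, p_sel, d_neu, p_neu):
    """Neutrality index and alpha from a 2x2 divergence/polymorphism table.

    d_sel, p_sel: fixed differences and polymorphisms in the test class
    d_neu, p_neu: fixed differences and polymorphisms in the reference class
    """
    neutrality_index = (p_sel / d_sel) / (p_neu / d_neu)
    alpha = 1 - neutrality_index  # > 0 suggests an excess of divergence
    return neutrality_index, alpha


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts, for illustration only.
ni, alpha = mk_summary(d_sel=40, p_sel=20, d_neu=50, p_neu=50)
p_value = fisher_exact_two_sided(40, 20, 50, 50)
print(ni, alpha, p_value)
```

With these invented numbers the test class has half the polymorphism-to-divergence ratio of the reference class, which is the kind of pattern that gets read as positive selection. Weakly deleterious variants that linger at low frequency inflate polymorphism and push alpha down, which is presumably why dropping rare variants makes a difference.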
The evidence Andolfatto presents paints a picture of sequence evolution where non-coding sequences are both under selective constraint and subject to positive selection that drives mutations to fixation. This is consistent with regulatory regions playing as important a role in adaptive evolution as protein coding sequences. I guess the evo-devo folks have been on to something all this time; it just took a population geneticist to produce the evidence.
Andolfatto, P. 2005. Adaptive evolution of non-coding DNA in Drosophila. Nature. 437: 1149-1152.