In a previous post I mentioned three recent papers from Nature Genetics that deal with detecting and describing polymorphic deletions using SNPs.
I described only one of the papers, because I spent about half of that post giving readers some background on why this is an interesting issue (i.e., the idea that deletions are deleterious).
In this post I will describe one of the other two papers and, hopefully, formulate some sort of unified idea of why we (by we I mean humans, but these findings should be expected in all other eukaryotes) harbor deletion polymorphisms.
I will then add a third (and, probably, final) post on linkage disequilibrium and SNPs.
The approach I previously described uses allele and genotype frequencies to identify clusters of SNPs that are not in Hardy-Weinberg equilibrium. This involves analyzing all of the SNPs in the entire population simultaneously. Another approach, described in the paper from Jonathan Pritchard’s group, uses known relationships of family members (so-called “parent-offspring trios”) to identify SNPs that are transmitted from parents to offspring in a non-Mendelian fashion. They start with the observation that an individual carrying a deletion at a SNP locus on one chromosome will appear to be homozygous when genotyped at that SNP (Altshuler’s group made the same assumption). They then examined the offspring in the parent-offspring trios for SNPs at which the child is homozygous (note: there will be a lot of homozygous SNPs in any one individual) and asked whether the child could have inherited the same allele from each of its parents. For example, if the parents are genotyped as “AA” and “TT” at a particular SNP, and their child is genotyped as “AA”, either there was a mutation in the germline of the TT parent (changing one of the T alleles to A), or the TT parent is actually “T-” (where the “-” means that parent is missing a second copy of the SNP). If one of the parents is T-, then the child’s genotype is actually “A-”, and that child inherited the deletion from the parent with the genotype T-.
Figure 1. Examples of four of the seven types of trio genotype configurations used in this analysis.
The true genetic state of each individual is depicted within his or her pedigree symbol. The called genotype, when it differs from the true genotype, is shown outside the pedigree symbol. The two upper configurations shown here (A and C) both result in mendelian incompatibilities. We define 'Type I mendelian incompatibilities' as those that are compatible with a deletion transmitted from parent to child and 'Type II mendelian incompatibilities' as those that are incompatible with the deletion model. Key to figure: A: mendelian incompatibility, genotypes compatible with a deletion transmitted from the mother; C: mendelian incompatibility, genotypes incompatible with a transmitted deletion; E: no mendelian incompatibility, genotypes compatible with a deletion transmitted from the mother (but not the father); G: no mendelian incompatibility, genotypes incompatible with a transmitted deletion. Candidate deletion regions are runs of consecutive SNPs with at least two Type I mendelian incompatibilities and other SNPs that are compatible with a deletion; all the SNPs must suggest transmission from the same parent. See further details in Methods.
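To make the trio logic concrete, here is a minimal Python sketch of how a single SNP in a single trio could be sorted into the four categories from the figure legend. This is my own simplification, not the authors' code: the function names and the labels `type_I`, `type_II`, `compatible`, and `incompatible` are made up for illustration, genotypes are assumed to be two-letter strings such as "AA" or "AT", and genotyping errors and new mutations are ignored.

```python
def mendelian_consistent(child, mother, father):
    """True if the child's called genotype can be built from one allele of each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def deletion_from(child, parent, other_parent):
    """True if the calls fit a deletion passed from `parent` to the child.

    A deletion carrier (one allele plus "-") is called homozygous, so both the
    child and the transmitting parent must look homozygous, and the child's
    one visible allele must be present in the other parent.
    """
    child_homozygous = child[0] == child[1]
    parent_homozygous = parent[0] == parent[1]
    return child_homozygous and parent_homozygous and child[0] in other_parent


def classify_trio(child, mother, father):
    """Return (label, set of parents who could have transmitted a deletion)."""
    transmitters = set()
    if deletion_from(child, mother, father):
        transmitters.add("mother")
    if deletion_from(child, father, mother):
        transmitters.add("father")
    if mendelian_consistent(child, mother, father):
        label = "compatible" if transmitters else "incompatible"
    else:
        label = "type_I" if transmitters else "type_II"
    return label, transmitters


# The example from the text: parents "TT" and "AA", child called "AA".
print(classify_trio("AA", "TT", "AA"))  # ('type_I', {'mother'})
```

Note that a heterozygous child is always labeled as incompatible with a transmitted deletion at that SNP, because both of its alleles are visible.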
Any run of consecutive SNPs that contained at least two Type I Mendelian incompatibilities, and in which the remaining SNPs were compatible with a deletion transmitted from the same parent, was labeled as a candidate deletion. They examined parent-offspring trios from two populations: a European-derived population (CEU), which harbored 345 deletions, and one from Nigeria (YRI), which harbored 590. Just like the other paper I described, they validated some of the deletions using quantitative PCR, which confirmed all 12 of the candidates they tested. They also used an oligonucleotide microarray to test for false positives in 9 of the offspring at 93 deletions and confirmed all but 13 of the deletions (14% false positives). I won’t get into the details of it, but they assessed their power to detect deletions given the amount of polymorphism in the SNP data set, the spacing of SNPs, and the size of the deletions they were identifying.
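The rule for calling a candidate deletion (a run of consecutive SNPs with at least two Type I incompatibilities, all pointing to the same parent) can be sketched in a few lines of Python. Again, this is only an illustration under my own assumptions: each SNP in one trio is represented as a hypothetical `(label, transmitters)` pair in chromosome order, where the label is one of `type_I`, `type_II`, `compatible`, or `incompatible`, and `transmitters` is the set of parents who could have passed on a deletion at that SNP. Reporting a region as the span from its first to its last Type I SNP is my choice, not necessarily what the authors did.

```python
def candidate_deletions(snps, min_type1=2):
    """Find candidate deletion regions in one trio.

    snps: list of (label, transmitters) pairs in chromosome order.
    Returns a list of (parent, first_type1_index, last_type1_index).
    """
    regions = []
    # A sentinel SNP that fits no deletion closes any run still open at the end.
    padded = list(snps) + [("type_II", set())]
    for parent in ("mother", "father"):
        type1_positions = []
        for i, (label, transmitters) in enumerate(padded):
            fits = label in ("type_I", "compatible") and parent in transmitters
            if fits:
                if label == "type_I":
                    type1_positions.append(i)
            else:
                # The run is broken: keep it only if it had enough Type I SNPs.
                if len(type1_positions) >= min_type1:
                    regions.append((parent, type1_positions[0], type1_positions[-1]))
                type1_positions = []
    return regions


# Toy data (made up): SNPs 1 to 3 form a run pointing to the mother.
toy = [
    ("incompatible", set()),
    ("type_I", {"mother"}),
    ("compatible", {"mother", "father"}),
    ("type_I", {"mother"}),
    ("incompatible", set()),
    ("type_I", {"father"}),
]
print(candidate_deletions(toy))  # [('mother', 1, 3)]
```

The lone Type I SNP at the end of the toy data is not called, because a single incompatibility is just as easily a genotyping error as a deletion.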
The median deletion size was 10.6 kb and 8.5 kb in the CEU and YRI samples, respectively, and the size distribution is L-shaped (many small deletions, and a tail containing the large deletions). Most of the deletions were segregating at low frequencies, and 39% were identified in only one trio. Interestingly, some of the deletions at the same locus appear to have different breakpoints, and some deletions sit on multiple haplotype backgrounds, suggesting that certain loci have been deleted independently in multiple lineages. This is nothing new, as many factors (such as repeat sequences flanking a region) can make a particular locus more prone to deletion and duplication (more on this later).
Finally, they took a closer look at deletions that contained genes (exons and introns) and found 267 genes within their entire sample of deleted regions (201 of which were deletions of coding sequence, and 92 were completely deleted genes). There was a deficiency of SNPs in genic regions within deletions compared to genic regions with no association to a deletion, suggesting that purifying selection against haplotypes carrying deletions of genes decreases the variation at these loci. They assigned each gene to a functional class and found an overrepresentation of genes involved in immunity, sensory perception, cell adhesion and signal transduction in their set of 267 deleted genes. These functional classes are similar to those identified in screens for segmental duplications, genes with signatures of positive selection, and lineage specific gene family expansions.
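I don't describe here which statistical test the authors used for the functional classes, but a common way to ask whether a class is overrepresented in a gene list is a one-sided hypergeometric test. The sketch below is only meant to show the idea. The 267 comes from the paper, but the genome-wide gene count, the class size, and the number of deleted genes in the class are made-up numbers for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric draw.

    N: total number of genes considered
    K: number of those genes that belong to the functional class
    n: number of genes in the list being tested (e.g. deleted genes)
    k: number of genes in the list that belong to the class
    """
    total = comb(N, n)
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Hypothetical numbers: 20,000 genes in total, 1,000 of them in the class,
# and 30 of the 267 deleted genes fall in that class.
p_value = hypergeom_upper_tail(k=30, K=1000, n=267, N=20000)
print(f"P(at least 30 class members among 267 genes) = {p_value:.3g}")
```

A small value means that seeing so many genes from one class among the deleted genes would be unlikely if deletions hit genes at random with respect to function.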
I think the two most interesting findings are the reuse of deletion breakpoint regions (independent origins of the same deletion) and the analysis of functional classes. Many human diseases are the result of the deletion, duplication, or relocation of a particular genomic region. These chromosomal aberrations often occur in somatic cells (i.e., they are not inherited per se; the mutation happens at some point during the individual’s lifetime). There is still some aspect of heritability when it comes to these types of mutations, as you can inherit a predisposition to a certain genetic disease if you get a defective allele from one parent or you inherit a locus that is predisposed to a deleterious mutation. How can you be predisposed to get a particular mutation? Well, if you have some sort of repetitive sequence (transposable element, segmental duplication, etc.) flanking a “disease gene”, that repeat can induce a genomic rearrangement that leads to some deleterious change in that disease gene. The same idea is behind the independent origins of similar deletions that Pritchard’s group proposes.
It appears that certain functional classes of proteins are more prone to rapid evolution, duplication, and deletion. One explanation for the differences in “evolvability” between classes of proteins lies in the differences in purifying selection on different genes. Let’s assume that genes that carry out more important functions are less robust to mutation (both in amino acid sequence and in expression), so that changes to the copy number of those genes will have deleterious effects. Not only will natural selection remove haplotypes that carry a deletion or duplication of such a gene, it will also select against repetitive sequences flanking that gene that would allow the duplication and deletion events to occur. We see this type of pattern when we look at the location of transposable elements (TEs) in a genome: they are clustered in intergenic regions, although this may also be due to the effects of TEs on the expression of nearby genes. If only certain genes can withstand having rearrangement-inducing repeats in their vicinity, then certain functional classes will be duplicated and deleted more often than others. Furthermore, some genes (such as those in large gene families like the odorant receptors) appear to be duplicated especially often, suggesting natural selection may favor repetitive sequence near these genes (in fact, duplicated genes alone can spur on more duplication because they are repetitive sequences themselves).
If we imagine that certain classes of genes are under more purifying selection than other classes, then we can expect to see the same types of genes in the rapidly evolving class regardless of how we measure the rate of evolution (nucleotide sequence, segmental duplication, deletion, flanked by repeats, or any other technique). I hope to finish my discussion of SNPs and deletions with my next post in which I will attempt to write about linkage disequilibrium (a subject that gives me trouble).
Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK. 2006. A high-resolution survey of deletion polymorphism in the human genome. Nat Genet. 38:75-81.
Hinds DA, Kloek AP, Jen M, Chen X, Frazer KA. 2006. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat Genet. 38:82-85.
McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, Barrett JC, Dallaire S, Gabriel SB, Lee C, Daly MJ, Altshuler DM, & The International HapMap Consortium. 2006. Common deletion polymorphisms in the human genome. Nat Genet. 38:86-92.