The applications of genetic linkage and association analysis
Genetic linkage and association analyses are the major tools for identifying the genetic basis of diseases or traits. The primary difference between the two approaches is that linkage analysis examines the relation between the transmission of a locus and the disease/trait within families, whereas association analysis examines the relation between a specific allele and the disease/trait within a population. Over the past three decades, many genetic variants underlying diseases or traits have been identified, and these two methods can detect different types of variants. We introduce important applications of these tools and the relevant findings below.
Genome-wide linkage analysis
Linkage analysis must be applied to family data because it requires information on allele transmission within families. Because that transmission is observed within genetically homogeneous families, linkage analysis is robust to population stratification. In practice, a genome-wide linkage analysis requires genotyping several hundred highly polymorphic microsatellite markers or several thousand well-characterized single nucleotide polymorphism (SNP) markers evenly distributed across the whole genome.
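As an illustration of how allele transmission is turned into evidence for linkage, the following minimal sketch computes a two-point LOD score, the log10 ratio of the likelihood of the observed meioses at recombination fraction θ to the likelihood under free recombination (θ = 0.5). It assumes the simplest textbook setting, in which every meiosis is phase-known and fully informative, so recombinants can be counted directly; the function name and the example counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    Assumption: each meiosis can be scored unambiguously as
    recombinant or non-recombinant between marker and disease locus.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")

    non_recombinants = meioses - recombinants
    # Likelihood of the data at recombination fraction theta.
    like_linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    # Likelihood under no linkage (theta = 0.5).
    like_unlinked = 0.5 ** meioses

    if like_linked == 0.0:
        # Data are impossible at this theta (e.g. recombinants seen at theta = 0).
        return float("-inf")
    return math.log10(like_linked / like_unlinked)


# Hypothetical data: 2 recombinants observed in 20 meioses.
example_lod = lod_score(recombinants=2, meioses=20, theta=0.1)
```

Under these assumptions the score is maximized at θ = recombinants / meioses, and it is 0 at θ = 0.5 by construction. Real pedigrees with unknown phase, missing genotypes, or incomplete penetrance require the full pedigree likelihood rather than this counting shortcut.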
Linkage studies have successfully identified the genetic bases of many Mendelian diseases, such as Huntington’s disease, cystic fibrosis, and early-onset Alzheimer’s disease, which are caused by a mutation in a single gene (Online Mendelian Inheritance in Man (OMIM)). Causal variants of Mendelian diseases often have large effects and are rare in the population. In contrast, any disease caused by the joint effect of multiple genes and/or environmental factors is called a complex disease. Many common diseases, such as diabetes, hypertension, and various cancers, are complex diseases. Although linkage analysis is powerful for identifying the genes underlying Mendelian diseases, it has not been successful for complex diseases.
Genome-wide association analysis
In the past decade, guided by the “common disease, common variant (CD-CV)” hypothesis, many researchers have looked for common variants underlying complex diseases or traits, and the genome-wide association study (GWAS) has been the major approach. According to the National Institutes of Health (NIH), “a genome-wide association study is defined as any study of genetic variation across the entire human genome that is designed to identify genetic associations with observable traits (such as blood pressure or weight), or the presence or absence of a disease or condition”. GWAS relies on the information on millions of SNPs and the patterns of linkage disequilibrium (LD) across the genome provided by the HapMap project, and on the development of microarray technologies that can genotype millions of SNPs quickly and accurately. To date, more than 2,400 GWAS have been published and more than 14,900 SNPs have been reported to be associated with human diseases or traits (GWAS Catalog). However, most of the SNPs identified by GWAS confer small effects and explain only a small proportion of disease risk or of the variation in quantitative traits. This phenomenon is called the “missing heritability” of common disease.
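The single-SNP test at the core of a case-control association study can be sketched as an allelic chi-square test on a 2 × 2 table of allele counts. This is only one of several tests used in practice (genotypic, trend, and regression-based tests are common alternatives), and the function name and counts below are hypothetical. The p-value uses the identity that, for one degree of freedom, the chi-square upper tail equals erfc(√(χ²/2)).

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows are cases/controls, columns are alternate/reference alleles.
    Returns (chi-square statistic, p-value).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs a non-zero total")

    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts for one SNP: 50 cases and 50 controls.
chi2, p_value = allelic_chi_square(case_alt=60, case_ref=40,
                                   control_alt=40, control_ref=60)
```

Because a GWAS repeats such a test for every genotyped SNP, the per-SNP p-values must be judged against a threshold corrected for multiple testing; the choice of threshold is left out of this sketch.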
Applications of next-generation sequencing approaches
Recently, to address the problem of “missing heritability”, more and more researchers have carried out association studies of rare variants, based on the “common disease, rare variant (CD-RV)” hypothesis. At the same time, great advances in “next-generation” sequencing (NGS) technologies have made whole-genome and whole-exome sequencing feasible, which facilitates the identification of rare variants underlying Mendelian or complex diseases. NGS approaches generate a huge amount of sequence data, and identifying the pathogenic mutation(s) among the very large number of variants detected is a critical problem. Furthermore, because the cost of sequencing is still high, researchers in practice have to use extra information to design cost-effective sequencing studies. Accumulating examples suggest that integrating linkage analysis with NGS, or combining GWAS, NGS, and imputation, could be a powerful and cost-effective way to identify the disease-causing variants of Mendelian or complex diseases. Results of these studies are summarized in review papers such as Brunham & Hayden (2013).
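One way linkage information can narrow down sequencing results is to keep only the rare variants that fall inside a previously linked interval. The sketch below illustrates that idea only; the `Variant` record, the function name, the coordinates, and the 1% frequency cut-off are all hypothetical choices, and real pipelines would also filter on functional annotation and segregation with disease in the family.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str          # chromosome label, e.g. "4"
    pos: int            # base-pair position
    allele_freq: float  # population frequency of the alternate allele


def filter_candidates(variants, chrom, start, end, max_freq=0.01):
    """Keep variants inside the linked interval that are rare.

    Assumption: the interval [start, end] on `chrom` comes from a prior
    linkage analysis, and max_freq is an illustrative rarity cut-off.
    """
    return [
        v for v in variants
        if v.chrom == chrom
        and start <= v.pos <= end
        and v.allele_freq <= max_freq
    ]


# Hypothetical variant calls from a sequenced affected individual.
calls = [
    Variant("4", 3_100_000, 0.0005),  # rare, inside the interval
    Variant("4", 3_200_000, 0.2500),  # inside the interval but common
    Variant("4", 9_000_000, 0.0010),  # rare but outside the interval
    Variant("7", 3_150_000, 0.0002),  # rare but on another chromosome
]
candidates = filter_candidates(calls, chrom="4",
                               start=3_000_000, end=3_500_000)
```

Here only the first call survives both filters, which shows how a linked interval can reduce a long variant list to a short candidate list.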