Case-control data for variant interpretation

Article explaining what approaches can be used to investigate case-control data

An increased prevalence of a variant in affected individuals (the case cohort) compared with unaffected individuals (the control cohort) can provide evidence that the variant is pathogenic. Conversely, observing the variant in unaffected individuals can provide evidence that it is benign.

In the ACMG/AMP classification framework (discussed in the step Guidelines for variant classification and interpretation), this evidence is incorporated under population data using the criteria:

  • PS4: the prevalence of the variant in affected individuals is significantly increased compared to the prevalence in controls, and
  • BS2: the variant is observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous), or X-linked (hemizygous) disorder with full penetrance expected at an early age.

Traditionally, case and control cohorts are well phenotyped, and the control cohort has been screened for the associated disease and shows no symptoms.

For large case-control studies, the following two metrics can be used to determine whether the prevalence of the variant in affected individuals is significantly increased compared to controls:

  • Odds ratio. The ratio of the odds of the variant being present in the case-cohort to the odds of the variant being present in the control cohort. A useful resource for calculating odds ratios and confidence intervals is MedCalc.
  • Relative risk. The ratio of the risk of the variant being present in the case-cohort versus the risk of the variant being present in the control cohort.
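
As a rough illustration of the two metrics above, the following Python sketch computes an odds ratio with an approximate 95% confidence interval and a relative risk from a 2x2 table of carrier counts. The function names, the example counts, the Wald (log-scale) interval and the 0.5 continuity correction for zero cells are all assumptions made for this sketch; they are not prescribed by the ACMG/AMP framework or by MedCalc.

```python
import math


def _corrected(a, b, c, d):
    """Add 0.5 to every cell if any cell is zero (an assumed, commonly used
    continuity correction) so that the ratios below remain defined."""
    if 0 in (a, b, c, d):
        return a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return float(a), float(b), float(c), float(d)


def odds_ratio_ci(case_with, case_without, ctrl_with, ctrl_without, z=1.96):
    """Return (odds ratio, lower bound, upper bound).

    The interval is a Wald interval on the log scale; z=1.96 gives an
    approximate 95% confidence interval.
    """
    a, b, c, d = _corrected(case_with, case_without, ctrl_with, ctrl_without)
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def relative_risk(case_with, case_without, ctrl_with, ctrl_without):
    """Proportion of cases carrying the variant divided by the proportion
    of controls carrying it, following the definition given above."""
    a, b, c, d = _corrected(case_with, case_without, ctrl_with, ctrl_without)
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: 20 of 1,000 cases and 5 of 2,000 controls carry the variant.
or_value, ci_low, ci_high = odds_ratio_ci(20, 980, 5, 1995)
rr_value = relative_risk(20, 980, 5, 1995)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f}), RR = {rr_value:.2f}")
```

A confidence interval that excludes 1 is one common way of judging whether the increase in cases is statistically significant. The threshold actually applied should follow the relevant classification guidance.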

Where a well-phenotyped control cohort is not available, other large-scale population datasets, such as gnomAD, can be used. However, a database may not be suitable if it contains large numbers of individuals with the disorder being investigated. For instance, individuals with cardiovascular diseases will be present in gnomAD. Case and control populations should be ethnically matched, and you need to be aware that some populations are poorly represented in many large-scale population datasets.

Screenshot from the DECIPHER webpage (https://www.deciphergenomics.org/sequence-variant/14-23417598-G-A/annotation/disease-cohorts/cardiac/hcm) illustrating the output of a Cardiac FX report.

Figure 1. Cardiac FX cardiac case-cohort data shared through DECIPHER.

For rare diseases, large-scale case-control study data are seldom available and the number of previously identified, unrelated affected individuals can be used as evidence of pathogenicity, as a substitute for traditional case-control data. The specificity and rarity of the phenotype of the affected individuals should be considered, and the variant must not have been reported in the relevant control population(s). The strength of this evidence is less than that determined by traditional case-cohort studies.

Evidence from case-cohort data is especially important for diseases with high genetic heterogeneity (i.e. many different variants cause the disease), as the likelihood of interpreting a variant as pathogenic is often dependent on whether the variant has previously been identified and characterised.

An example is the case-control data collated by Cardiac FX (which hosts data from the Cardiac Variant Interpretation Consortium) for 18 genes associated with cardiomyopathies. Cardiomyopathies are inheritable intrinsic heart muscle diseases with genetic heterogeneity, age-related onset and incomplete penetrance, so variants in these genes are particularly challenging to interpret. For each variant, the allele frequency, allele count, and allele number are reported for hypertrophic cardiomyopathy, dilated cardiomyopathy, and healthy volunteer cohorts. This dataset can be used to determine whether a variant has been observed in a case or control cohort and is available to view in DECIPHER.
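
The three per-cohort values described above are related by allele frequency = allele count / allele number. The Python sketch below shows that calculation, together with a one-sided Fisher's exact test as one possible way of asking whether a variant is enriched in a case cohort. The function names and all counts are hypothetical and are not taken from Cardiac FX or DECIPHER. Treating alleles as independent observations is a simplifying assumption, and the choice of test is this sketch's own rather than a method stated by either resource.

```python
from math import comb


def allele_frequency(allele_count, allele_number):
    """Allele frequency = allele count / allele number for one cohort."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


def fisher_enrichment_p(case_ac, case_an, ctrl_ac, ctrl_an):
    """One-sided Fisher's exact test p-value.

    Probability of seeing at least `case_ac` variant alleles among the case
    alleles if the variant alleles were spread at random across both cohorts
    (upper tail of the hypergeometric distribution).
    """
    total_ac = case_ac + ctrl_ac
    total_an = case_an + ctrl_an
    upper = min(total_ac, case_an)
    favourable = sum(
        comb(total_ac, k) * comb(total_an - total_ac, case_an - k)
        for k in range(case_ac, upper + 1)
    )
    return favourable / comb(total_an, case_an)


# Hypothetical counts for a single variant in two cohorts.
case_ac, case_an = 12, 6000   # case cohort: allele count, allele number
ctrl_ac, ctrl_an = 1, 8000    # healthy volunteers: allele count, allele number

print("case allele frequency:   ", allele_frequency(case_ac, case_an))
print("control allele frequency:", allele_frequency(ctrl_ac, ctrl_an))
print("one-sided p-value:       ", fisher_enrichment_p(case_ac, case_an, ctrl_ac, ctrl_an))
```

A small p-value suggests the variant is seen in cases more often than expected by chance. It does not by itself account for ancestry matching or for how the cohorts were ascertained.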

Screenshot illustrating case data for a variant with the protein change p.(Val1736Ala).

Figure 2. Hereditary cancer case data shared through CanVar-UK.

An example of case data is that shared through the Cancer Predisposition Gene Variant Database (CanVar-UK; from the Cancer Variant Interpretation Group UK) for 16 genes associated with hereditary cancers. Accurate classification of variants in these genes is essential for cancer risk estimation and subsequent patient management. These data have been collated from UK diagnostic labs. Counts of affected probands and tested probands are presented alongside population data from gnomAD v2.1.1, UK Biobank and the 1000 Genomes Project. UK Biobank counts for small insertion/deletion variants are also available and can be filtered by gene from the website homepage. The affected proband counts can be used as evidence for pathogenicity.

Are you still a beginner in the field, or do you already apply case-cohort data in your day-to-day work? Let us know in the comments what your challenges are, or share some tips and tricks for less experienced learners.

© Wellcome Connecting Science
This article is from the free online course Interpreting Genomic Variation: Overcoming Challenges in Diverse Populations.
