How in-house datasets help variant interpretation

Article discussing the importance of in-house datasets

In-house datasets of genomic variations comprise variants identified in individuals sequenced locally at clinical or research centres. Unlike population databases, which primarily aim to catalogue variations in unaffected individuals, in-house datasets inherently capture variants from individuals affected by diseases, often alongside their unaffected family members. The initial crucial step in delivering informed care to patients with suspected genetic conditions involves identifying and clinically interpreting disease-causing variants. In this context, in-house datasets can be of great utility.

When examining a variant among the hundreds of thousands identified in a patient, two immediate questions arise:

  1. Is the variant genuine, or is it an artefact stemming from issues in sequencing chemistry or the subsequent bioinformatic process?
  2. If the variant is genuine, could it potentially lead to disease?

Addressing both questions requires rigorous quality control measures and adherence to guidelines that set out variant interpretation criteria. In both scenarios, locally generated in-house datasets can also prove helpful (Figure 1).

Illustration of how in-house datasets can be useful for variant interpretation. ‘Variants in a sequenced genome’ may be ‘frequently encountered artefacts’ or ‘prevalent benign variants in the local population’, both captured in in-house datasets, while ‘variants present in the control population’ are part of population databases. Conversely, in-house datasets containing ‘locally prevalent disease-causing variants’ and population databases containing ‘variants rare in the control population’ are used to identify ‘candidate disease-causing variants’.

Figure 1. Key advantages of using in-house datasets for identifying potential disease-causing variants, and a comparison of their use with population databases.

First, even after implementing stringent quality control measures, false positive variants remain a common occurrence. These inaccuracies typically stem from problems in sequence detection, alignment, and subsequent variant calling, and can lead investigators astray. While the gold standard for verifying variant authenticity is testing with an alternative method such as Sanger sequencing, in-house datasets offer a valuable complementary resource: they reveal frequently encountered artefacts, enabling such variants to be set aside. This is particularly useful when artefacts are unique to the laboratory’s methodology or arise from the genomic structure of the local population, and are therefore unlikely to be described in the existing literature. Furthermore, local datasets can also be used to train variant calling tools to filter out probable artefacts.
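As a minimal sketch of how a locally maintained record of previous calls might be used this way (the file name, the simple chrom-pos-ref-alt key, and the recurrence threshold below are illustrative assumptions, not a prescribed workflow):

```python
# Minimal sketch: flag candidate variants that recur suspiciously often in an
# in-house collection of previously called variants. A call seen at the same
# site in many unrelated local samples is more likely to be a recurrent
# pipeline artefact (or a common local variant) than a rare pathogenic change.
from collections import Counter


def variant_key(chrom, pos, ref, alt):
    """Normalise a variant into a simple lookup key."""
    return f"{chrom}:{pos}:{ref}>{alt}"


def load_in_house_counts(path):
    """Read a tab-separated file of previously observed calls: chrom, pos, ref, alt."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos, ref, alt = line.rstrip("\n").split("\t")[:4]
            counts[variant_key(chrom, pos, ref, alt)] += 1
    return counts


def flag_probable_artefacts(patient_variants, in_house_counts, min_recurrence=20):
    """Split a patient's variants into 'keep' and 'probable artefact' buckets."""
    keep, probable_artefacts = [], []
    for chrom, pos, ref, alt in patient_variants:
        key = variant_key(chrom, pos, ref, alt)
        (probable_artefacts if in_house_counts[key] >= min_recurrence else keep).append(key)
    return keep, probable_artefacts


if __name__ == "__main__":
    counts = load_in_house_counts("in_house_calls.tsv")          # hypothetical path
    patient = [("chr7", 117559590, "G", "A"), ("chr2", 47805600, "C", "T")]
    keep, artefacts = flag_probable_artefacts(patient, counts)
    print(f"{len(keep)} variants kept, {len(artefacts)} flagged as probable artefacts")
```

In practice, flagged variants would still be reviewed (or confirmed by an orthogonal method such as Sanger sequencing) rather than discarded automatically.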

Secondly, an early step in distinguishing potential rare disease-causing variants from common benign variants is filtering against population databases such as gnomAD. As mentioned in the steps Specific databases for specific countries and Interpreting non-coding variation, this approach aims to exclude high-frequency variants on the assumption that prevalence reduces their likelihood of being disease-causing. While this principle typically holds true, there are numerous instances where disease-causing variants occur at higher frequencies within certain populations. In such cases, in-house datasets can help identify locally prevalent disease-causing variants, particularly in understudied populations, and so prevent their inadvertent exclusion during filtering. Conversely, in-house datasets can also help identify common benign variants within the local population, allowing them to be excluded when prioritising variants for further consideration. This has been demonstrated in a study in which a dataset generated from patients and their family members of Indian origin (referred to as the ‘refined cohort’ in the article) helped the researchers filter out population-specific common benign variants that were not represented in global population databases. This resulted in filtering out an additional 50% of homozygous variants and 37.8% of heterozygous variants prior to clinical correlation.
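A minimal sketch of this filtering logic, assuming each candidate variant has already been annotated with a gnomAD allele frequency and an in-house allele frequency (the thresholds, variant keys, and the locally curated allow-list of known disease-causing variants are illustrative assumptions, not recommended cut-offs):

```python
# Minimal sketch of frequency-based prioritisation combining a population
# database (e.g. gnomAD) with an in-house dataset. Thresholds, variant keys,
# and the allow-list below are hypothetical values for illustration only.

GNOMAD_MAX_AF = 0.01      # exclude variants common in the global population
IN_HOUSE_MAX_AF = 0.05    # exclude variants common in the local population

# Hypothetical locally curated allow-list: disease-causing variants known to be
# unusually frequent in the local population, which must never be filtered out.
LOCAL_PATHOGENIC_ALLOW_LIST = {"chr1:123456:A>T"}


def prioritise(variants):
    """Keep variants rare in both the population database and the in-house
    dataset, plus any variant on the local pathogenic allow-list."""
    kept = []
    for v in variants:  # each v: dict with 'key', 'gnomad_af', 'in_house_af'
        if v["key"] in LOCAL_PATHOGENIC_ALLOW_LIST:
            kept.append(v)   # locally prevalent but known disease-causing
        elif v["gnomad_af"] <= GNOMAD_MAX_AF and v["in_house_af"] <= IN_HOUSE_MAX_AF:
            kept.append(v)   # rare everywhere: worth a closer look
        # otherwise: common in gnomAD or in the local population, likely benign
    return kept


if __name__ == "__main__":
    candidates = [
        {"key": "chr1:123456:A>T", "gnomad_af": 0.004, "in_house_af": 0.12},
        {"key": "chr2:234567:C>G", "gnomad_af": 0.0,   "in_house_af": 0.08},
        {"key": "chr5:345678:C>T", "gnomad_af": 1e-5,  "in_house_af": 0.001},
    ]
    for v in prioritise(candidates):
        print(v["key"])
```

The allow-list is what keeps a locally prevalent pathogenic variant (common in-house but rare or absent in gnomAD) from being discarded, while the in-house frequency threshold removes population-specific benign variants that global databases under-represent.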

After confirming the authenticity of the variants and verifying their rarity in the general population, they undergo a thorough examination to evaluate their biological significance. This step helps ascertain whether the variation is likely responsible for the patient’s disease. All relevant evidence is meticulously gathered and assessed to determine the variant’s pathogenicity, following established guidelines. Subsequently, a comprehensive report is generated for the clinic, providing information to the patients and aiding healthcare providers in making informed decisions about their care.
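As a highly simplified, hypothetical illustration of this evidence-gathering step, the sketch below tallies evidence by strength and produces a tentative call. The code names are loosely inspired by the widely used ACMG/AMP framework, but the combining rule is a deliberate simplification for illustration; real classifications follow the full published guidelines.

```python
# Toy sketch of evidence aggregation for a single variant. The evidence codes
# loosely echo ACMG/AMP-style criteria, but the combining rule below is NOT
# the published guideline logic; it only illustrates the general idea.
from collections import Counter

EVIDENCE_STRENGTH = {
    "PVS1": "very_strong",   # e.g. predicted loss of function in a relevant gene
    "PS3": "strong",         # e.g. well-established functional studies
    "PM2": "moderate",       # e.g. absent or very rare in population databases
    "PP1": "supporting",     # e.g. co-segregation with disease in the family
}


def summarise_evidence(codes):
    """Count how many pieces of evidence fall into each strength tier."""
    return Counter(EVIDENCE_STRENGTH[c] for c in codes if c in EVIDENCE_STRENGTH)


def tentative_class(codes):
    """Simplified combining rule: more and stronger evidence, more confident call."""
    tally = summarise_evidence(codes)
    if tally["very_strong"] >= 1 and tally["strong"] + tally["moderate"] >= 1:
        return "likely pathogenic or pathogenic (review against full guidelines)"
    if tally["strong"] + tally["moderate"] >= 2:
        return "possibly pathogenic (needs further evidence)"
    return "uncertain significance"


if __name__ == "__main__":
    observed = ["PVS1", "PM2", "PP1"]   # evidence for a hypothetical candidate variant
    print(summarise_evidence(observed))
    print(tentative_class(observed))
```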

© Wellcome Connecting Science
This article is from the free online course Interpreting Genomic Variation: Overcoming Challenges in Diverse Populations.
