Specific databases for specific countries
What is a genomic database of variants?
A genomic database of variants is a structured collection of genetic variants, stored electronically, that can be accessed in several ways, most often through a web interface.
What are the uses of genomic databases/references?
We will limit our discussion here to the diagnosis and discovery of monogenic (Mendelian) disorders. Allele frequency is the most clinically useful information derived from genomic databases. Rare monogenic disorders are caused by rare variants, so a disease-causing variant is not expected to be common in healthy populations. This is particularly true if the disease is severe, has high penetrance and has an early age of onset (seen early in life, in children). Mode of inheritance is also taken into consideration: variants that result in an autosomal dominant condition would be expected to be absent from healthy individuals, whereas variants that underlie autosomal recessive disorders would be seen, very rarely, in healthy carriers (heterozygotes) but are unlikely to be seen in a homozygous state. The carrier frequencies for several genetic diseases can also be estimated from these data. The ability to diagnose patients and determine carrier status has a multitude of direct benefits for patients and society: treatment, genetic counselling, family planning, precision medicine and prenatal diagnosis.
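As a rough illustration of how a carrier frequency can be estimated from an allele frequency, the sketch below applies the Hardy-Weinberg relationship (carriers = 2pq, affected homozygotes = q²). Hardy-Weinberg equilibrium is an assumption introduced here, not something the databases guarantee: it presumes random mating, so it will underestimate homozygotes in consanguineous populations. The function names and the example allele frequency are hypothetical.

```python
def carrier_frequency(allele_freq: float) -> float:
    """Expected heterozygote (carrier) frequency, 2pq, under Hardy-Weinberg."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = allele_freq        # frequency of the disease allele
    p = 1.0 - q            # frequency of all other alleles
    return 2.0 * p * q


def affected_frequency(allele_freq: float) -> float:
    """Expected homozygote frequency, q squared, for a recessive allele."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return allele_freq ** 2


# Hypothetical recessive disease allele seen at 1% in a population database.
q = 0.01
print(f"carriers: about 1 in {1 / carrier_frequency(q):.0f}")
print(f"affected: about 1 in {1 / affected_frequency(q):.0f}")
```

The same calculation applied to an allele frequency taken from a poorly matched population would give a misleading carrier estimate, which is one practical reason population-specific frequencies matter.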
Why do we need genomic information for every ethnicity?
Genomic databases are often skewed towards the populations that contributed most of their samples. gnomAD (v4) is the largest publicly available aggregation of allele frequencies, and 77% of the individuals it represents are of European ancestry. This means most populations are underrepresented, including those that make up the majority of the world’s population. Disease-causing variants are also known to be population-specific, for both common and rare diseases. This inequality in genetic data hampers the diagnosis of rare diseases across diverse populations.
How do databases of diverse populations help in determining the variant pathogenicity for monogenic diseases?
While assessing the pathogenicity of a variant, we generally and safely assume that a disease-causing variant would not occur in unaffected individuals, especially for severe conditions with high penetrance. This assumption is tested by checking the allele and genotype frequencies in the population.
- Although a variant that is more frequent in a local population than expected would favour a benign classification, its absence does not by itself favour disease causation.
- In gnomAD, an average individual carries about 200 rare coding variants (allele frequency <0.1%) in their exome. Novel coding variants are more numerous in non-Europeans, in line with their poor representation, and need more evidence to rule out pathogenicity. This underscores the need to aggregate variant data from non-European populations to improve the diagnosis of rare monogenic disorders in these populations.
- Alleles causing Mendelian diseases should be rare in all ethnicities. If a population is underrepresented, more variants in that population are likely to be labelled ‘possibly disease-causing’. Also, some variants that are rare in the best-represented European population might be common in other populations and are likely to be incorrectly scored as pathogenic. Hence it is crucial to have a wider representation of all populations in reference databases.
- Variants with an allele frequency of less than 1% in gnomAD would usually be prioritised for interpretation. Filtering out variants that occur with a high frequency within the same population as the patient can reduce the number of variants considered from 200 to 50.
- Several non-benign variants in ClinVar can be reclassified as benign if they are seen in a healthy local population. This is most effective for autosomal dominant conditions with high penetrance, as the occurrence of the variant in even a single healthy individual would be an argument against pathogenicity.
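The frequency-based filtering described in the points above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a clinical pipeline: the `Variant` record, its field names, the variant labels and the frequencies are all hypothetical, and the 1% cut-off is simply the threshold quoted above. A variant is kept for interpretation only if it is rare both in gnomAD and in a database matching the patient's own population.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    name: str                   # hypothetical label, not a real variant
    gnomad_af: Optional[float]  # allele frequency in gnomAD; None if not observed
    local_af: Optional[float]   # allele frequency in a local database; None if not observed


def is_rare(af: Optional[float], threshold: float) -> bool:
    """Treat a variant that was never observed as rare."""
    return af is None or af < threshold


def prioritise(variants: List[Variant], threshold: float = 0.01) -> List[Variant]:
    """Keep variants that are rare in gnomAD AND in the local population."""
    return [
        v for v in variants
        if is_rare(v.gnomad_af, threshold) and is_rare(v.local_af, threshold)
    ]


# Made-up example data.
variants = [
    Variant("var_A", gnomad_af=0.0002, local_af=None),  # rare everywhere: kept
    Variant("var_B", gnomad_af=0.0005, local_af=0.04),  # rare in gnomAD, common locally: dropped
    Variant("var_C", gnomad_af=0.15, local_af=0.12),    # common everywhere: dropped
    Variant("var_D", gnomad_af=None, local_af=None),    # never observed: kept
]

kept = prioritise(variants)
print([v.name for v in kept])
```

Here `var_B` is the case the local database is meant to catch: it looks rare in a mostly European reference but is common in the patient's own population. Note that `var_D` is kept only because it was never observed, and absence from a small or unrepresentative database is weak evidence on its own.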
Here are some important databases:
Caution while using these databases
Do not assume that population databases include only data on healthy individuals, as it is known that they contain several pathogenic variants. Penetrance of the disease and age-of-onset need to be considered when assessing the allele frequency. Population databases can also contain more than one family member, thus giving skewed allelic data. Finally, do not forget to check the quality of the variants (to avoid considering poor quality variants and variants in pseudogenes) in such resources.
Why is equity in human genomics important to all ethnicities?
Mendelian diseases are caused by pathogenic variants irrespective of the ethnicity in which they occur. Similarly, a variant that occurs frequently in one small ethnic group is likely to be benign in every other population as well. If databases are not inclusive, a variant that appears rare might be classified as pathogenic by mistake, even when it is common in a large population. Moreover, different mutations in the same gene might be responsible for the same disease in different populations (exemplified by founder mutations in consanguineous populations). For instance, cystic fibrosis is commonly caused by a different mutation in patients of European descent (deltaF508 (c.1521_1523delCTT)) than in patients of African descent (3120+1G>A (c.2988+1G>A)). Under-representation of diverse populations in genomic databases thus limits our ability to fully understand the genetic architecture of rare and complex human diseases, and also exacerbates health inequalities.
Have you collaborated on similar projects or do you know other projects that are not on the table? Share with us in the comments.
Interpreting Genomic Variation: Overcoming Challenges in Diverse Populations