In silico prediction tools

Article describing the main in silico prediction tools

In silico prediction tools can be used to provide evidence for a variant being pathogenic or benign, and are particularly useful for annotating missense variants. In the ACMG/AMP classification framework, this evidence is incorporated under computational and predictive data using two criteria: PP3 (multiple lines of computational evidence support a deleterious effect on the gene or gene product, e.g. conservation, evolutionary, splicing impact) and BP4 (multiple lines of computational evidence suggest no impact on the gene or gene product, e.g. conservation, evolutionary, splicing impact).

Many of the algorithms that produce these scores are designed to predict the functional impact of single-nucleotide variants, although tools such as CADD also score multi-nucleotide substitutions and insertions/deletions. In addition, a variety of tools have been shown to perform well in predicting the pathogenicity of in-frame indels. Some tools base their prediction on a single line of evidence. Others, such as REVEL, are metapredictors that combine information from various sources to predict whether an amino acid change will disrupt protein function. Because many tools draw on multiple lines of evidence, and some directly incorporate the output of other in silico tools, the scores from different tools are not always independent (Figure 1).

Table indicating the tools able to use different features: Sequence identity, Orthologues, Protein domains, Predicted nucleotide mutational rate, Pathogenic variation, Benign variation, Epigenetics (CpG), DNA/RNA sequence context, Gene expression, Residue-specific functional evidence, Protein-specific functional evidence, Amino-acid properties (physicochemical change).

Figure 1. In silico pathogenicity predictor feature usage and source. Shading indicates that a category of evidence is used by the tool. Codes within each box indicate that the feature is inherited from another tool. Feature lists were taken from the tools' original publications. C, CADD; D, DANN; F, FATHMM; FC, FitCons; MP, MutPred; MT, MutationTaster; P, PolyPhen-2; S, SIFT; V, VEST. Source: Journal of Medical Genetics.
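The point about non-independent scores can be made concrete with a small sketch. The map below is illustrative and incomplete: it is an assumption written for this example rather than a transcription of Figure 1, and the names `INHERITS_FROM` and `shared_sources` are hypothetical. It records which predictors a tool takes as input and reports what two tools have in common.

```python
# Illustrative, incomplete map of each tool to the other predictors whose
# scores it takes as input. These entries are assumptions for this example,
# not a transcription of Figure 1.
INHERITS_FROM = {
    "REVEL": {"SIFT", "PolyPhen-2", "MutationTaster", "FATHMM", "VEST", "MutPred"},
    "CADD": {"SIFT", "PolyPhen-2"},
    "SIFT": set(),
    "PolyPhen-2": set(),
}


def shared_sources(tool_a, tool_b):
    """Return the predictors both tools rely on, counting each tool as a source of itself."""
    sources_a = INHERITS_FROM[tool_a] | {tool_a}
    sources_b = INHERITS_FROM[tool_b] | {tool_b}
    return sources_a & sources_b


# A non-empty result means the two scores should not be counted as
# independent lines of evidence.
print(sorted(shared_sources("REVEL", "CADD")))
print(sorted(shared_sources("SIFT", "PolyPhen-2")))
```

An empty result only shows that neither tool takes the other's score as input in this map. Two tools may still overlap through shared underlying features such as conservation.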

There are specific in silico tools aimed at predicting the impact of splice variants, such as SpliceAI, which uses a deep neural network to predict splice junctions from an arbitrary pre-mRNA transcript sequence. Pangolin, which also uses a deep neural network, additionally provides tissue-specific predictions (heart, liver, brain, and testis). In some cases the tools' authors suggest score thresholds; in others no thresholds are provided, which makes the scores more challenging to interpret. There are numerous in silico predictors, and the more predictors are used, the harder it becomes to achieve concordance among them. New in silico prediction tools are constantly being published with iterative improvements, so recommendations for specific tools and thresholds can be difficult to provide.

The ClinGen Sequence Variant Interpretation Working Group has provided recommendations for the use of computational tools for missense variant pathogenicity classification. They suggest thresholds for various tools, weighting each score according to the level of support it provides for classifying the variant as pathogenic or benign. Multiple tools reached score thresholds justifying moderate evidence, several reached strong evidence, and one reached a very strong evidence level for benignity for some variants. For CADD, they recommend that scores above 25.3 are evidence of pathogenicity and scores below 22.7 are evidence of benignity; for REVEL, scores above 0.644 are evidence of pathogenicity and scores below 0.290 are evidence of benignity. The working group also recommends using a single tool to avoid biases, such as those introduced by reviewing multiple tools and selecting the strongest evidence for a given variant.
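As a minimal sketch of how the thresholds quoted above might be applied, the function below maps a single tool's score to PP3, BP4, or neither. The names `THRESHOLDS` and `computational_evidence` are hypothetical. The strict comparisons follow the "above" and "below" wording of this article, so how a score exactly on a threshold is handled is an assumption. The sketch does not model the graded evidence strengths (supporting, moderate, strong) that the working group also defines.

```python
from typing import Optional

# Thresholds quoted in the paragraph above, stored as
# (pathogenic if score is above, benign if score is below).
THRESHOLDS = {
    "CADD": (25.3, 22.7),
    "REVEL": (0.644, 0.290),
}


def computational_evidence(tool: str, score: float) -> Optional[str]:
    """Return "PP3", "BP4", or None when the score lies between the two thresholds."""
    pathogenic_above, benign_below = THRESHOLDS[tool]
    if score > pathogenic_above:
        return "PP3"
    if score < benign_below:
        return "BP4"
    # Scores in the intermediate range provide no computational evidence either way.
    return None


print(computational_evidence("REVEL", 0.80))
print(computational_evidence("CADD", 24.0))
```

Calling the function with one chosen tool per variant, rather than trying each tool in turn, reflects the working group's recommendation to use a single tool.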

The Clinical Genome Resource Variant Curation Expert Panels commonly specify which predictors to use when interpreting variants for a specific gene and/or in a specific disease context, and define thresholds for determining evidence for a variant being pathogenic or benign. The recommended tools and thresholds can be used in these situations.

There are particular challenges with the accuracy of in silico tools when classifying variants identified in populations that are under-represented in global genomic datasets. Diverse populations are under-represented in the datasets used to benchmark the tools, and make up a low proportion of the integrated datasets used in the predictions themselves. To enhance prediction accuracy, some tools draw on public databases in which variation has been annotated as benign or pathogenic, but variants that are common in specific ancestries may be annotated as pathogenic in these databases. These issues are likely to lead to in silico tools producing more false positives and false negatives for these populations. As the diversity of global datasets increases, these disparities will become less problematic. A greater emphasis on the use of ethnically matched control populations for variant interpretation can also help to minimise these risks.

© Wellcome Connecting Science
This article is from the free online course Interpreting Genomic Variation: Overcoming Challenges in Diverse Populations.

Created by
FutureLearn - Learning For Life
