Predicted Missense Variant Pathogenicity for Inherited Cardiac Conditions


CardioBoost is a disease-specific machine learning classifier that predicts the pathogenicity of rare (gnomAD allele frequency <=0.1%) missense variants in genes associated with cardiomyopathies and arrhythmias, and that outperforms existing genome-wide prediction tools. Despite this improved performance, we emphasize that it is not a standalone clinical decision tool and does not replace the full ACMG guidelines for clinical variant interpretation (Richards et al. 2015).

To understand more about the limitations of our tool, please read the Limitations section below.




Input query variants

(one variant per line)

e.g. MYH7:c.1988G>A
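The query format above (gene symbol, a colon, then a coding-DNA substitution) can be checked before submission. The following is a minimal sketch, not the app's own parser: the regular expression and the function name `parse_queries` are assumptions made for illustration, and only simple single-nucleotide substitutions of the form shown in the example are accepted.

```python
import re

# Assumed query shape, based only on the example above:
#   GENE:c.<position><ref>><alt>   e.g. MYH7:c.1988G>A
QUERY_PATTERN = re.compile(r"^([A-Za-z0-9]+):c\.(\d+)([ACGT])>([ACGT])$")


def parse_queries(text):
    """Split a block of text into parsed variants and rejected lines.

    Hypothetical helper: returns (parsed, rejected), where parsed holds
    (gene, position, ref, alt) tuples and rejected holds the raw lines
    that did not match the assumed format.
    """
    parsed, rejected = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between variants
        match = QUERY_PATTERN.match(line)
        if match is None:
            rejected.append(line)
            continue
        gene, position, ref, alt = match.groups()
        parsed.append((gene, int(position), ref, alt))
    return parsed, rejected
```

Lines that fail the check are returned rather than silently dropped, so they can be corrected before being entered one per line.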

About CardioBoost

CardioBoost is a disease-specific machine learning classifier that predicts the pathogenicity of rare (gnomAD allele frequency <=0.1%) missense variants in genes associated with cardiomyopathies and arrhythmias, and that outperforms existing genome-wide prediction tools.

The methods and evaluations are described fully in the following publication:

Zhang, X., Walsh, R., Whiffin, N. et al. Disease-specific variant pathogenicity prediction significantly improves variant interpretation in inherited cardiac conditions. Genet Med (2020). https://doi.org/10.1038/s41436-020-00972-3

The source code and data to reproduce our model development and validation analyses can be found on GitHub. The web app was built using Shiny.

 

Recommended Usage

Although we have shown the benefits of the proposed model for gene-disease classification and its superiority over existing genome-wide machine learning tools, we emphasize that CardioBoost is not intended to be used as a standalone clinical decision tool that replaces the full ACMG guidelines (Richards et al. 2015) for clinical variant interpretation. For example, a variant that CardioBoost predicts to be pathogenic with a score higher than 90% should not be interpreted as Pathogenic without integrating other lines of evidence. Instead, in the context of inherited cardiac conditions, the classification by CardioBoost is intended to be used as evidence for criterion PP3 within the ACMG guidelines, as a more reliable and accurate computational tool than genome-wide ones for supporting variant interpretation in cardiomyopathies and arrhythmias.



Limitations

CardioBoost has been found to have higher accuracy than existing tools for classification of known variants in genes associated with inherited cardiac conditions (ICCs). However, for some genes the training and test data remain sparse, and so estimates of performance for those genes have wide confidence intervals. The tool is not intended as a substitute for validated clinical interpretation approaches in any circumstance, and particular care should be taken in considering classifications of variants in genes where gold-standard data are sparse.

In particular, the genes associated with cardiomyopathies that have sparse training and test data are: ACTC1, DES, GLA, LAMP2, MYL2, MYL3, PRKAG2 and PTPN11. Likewise, the following genes associated with inherited arrhythmia syndromes have sparse training and test data: CALM1, CALM2 and CALM3. Confidence in the evaluation of prediction performance for those genes is limited by the number of interpreted variants available for them.



Inherited Cardiac Conditions

We consider two types of conditions:
  • Cardiomyopathies: dilated cardiomyopathy and hypertrophic cardiomyopathy
  • Inherited Arrhythmia Syndromes: Long QT syndrome and Brugada syndrome

Inherited Cardiac Conditions related genes

The following tables list the genes related to these conditions. Only genes with known pathogenic variants in our curated data sets are included.

 

Cardiomyopathies

Gene Symbol | Ensembl Gene ID | Ensembl Transcript ID | Ensembl Protein ID
ACTC1 ENSG00000159251 ENST00000290378 ENSP00000290378
DES ENSG00000175084 ENST00000373960 ENSP00000363071
GLA ENSG00000102393 ENST00000218516 ENSP00000218516
LAMP2 ENSG00000005893 ENST00000200639 ENSP00000200639
LMNA ENSG00000160789 ENST00000368300 ENSP00000357283
MYBPC3 ENSG00000134571 ENST00000545968 ENSP00000442795
MYH7 ENSG00000092054 ENST00000355349 ENSP00000347507
MYL2 ENSG00000111245 ENST00000228841 ENSP00000228841
MYL3 ENSG00000160808 ENST00000395869 ENSP00000379210
PLN ENSG00000198523 ENST00000357525 ENSP00000350132
PRKAG2 ENSG00000106617 ENST00000287878 ENSP00000287878
PTPN11 ENSG00000179295 ENST00000351677 ENSP00000340944
SCN5A ENSG00000183873 ENST00000333535 ENSP00000328968
TNNI3 ENSG00000129991 ENST00000344887 ENSP00000341838
TNNT2 ENSG00000118194 ENST00000367318 ENSP00000356287
TPM1 ENSG00000140416 ENST00000403994 ENSP00000385107


Inherited Arrhythmia Syndromes

Gene Symbol | Ensembl Gene ID | Ensembl Transcript ID | Ensembl Protein ID
CACNA1C ENSG00000151067 ENST00000399655 ENSP00000382563
CALM1 ENSG00000198668 ENST00000356978 ENSP00000349467
CALM2 ENSG00000143933 ENST00000272298 ENSP00000272298
CALM3 ENSG00000160014 ENST00000291295 ENSP00000291295
KCNH2 ENSG00000055118 ENST00000262186 ENSP00000262186
KCNQ1 ENSG00000053918 ENST00000155840 ENSP00000155840
SCN5A ENSG00000183873 ENST00000333535 ENSP00000328968


Classification Criteria

Variant classification is based on the probability of pathogenicity predicted by CardioBoost. In line with the ACMG guidelines, we use Pr>=0.9 as the high-certainty threshold for classifying variants. A variant with a classification probability lower than 90% is considered indeterminate, with a low classification confidence level. In short, a variant is classified according to its predicted pathogenicity:

  • pathogenicity>=0.9: Disease-causing
  • pathogenicity<=0.1: Benign
  • 0.1<pathogenicity<0.9: Variant of Uncertain Significance (VUS)
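The three-way rule above can be written as a small function. This is an illustrative sketch rather than CardioBoost's source code: the function name `classify` and the constant names are assumptions, while the thresholds and labels are taken from the list above.

```python
# Thresholds taken from the classification criteria above.
PATHOGENIC_THRESHOLD = 0.9
BENIGN_THRESHOLD = 0.1


def classify(pathogenicity):
    """Map a predicted pathogenicity in [0, 1] to a classification label.

    Hypothetical helper illustrating the rule; not part of CardioBoost.
    """
    if not 0.0 <= pathogenicity <= 1.0:
        raise ValueError("pathogenicity must be a probability in [0, 1]")
    if pathogenicity >= PATHOGENIC_THRESHOLD:
        return "Disease-causing"
    if pathogenicity <= BENIGN_THRESHOLD:
        return "Benign"
    # Everything strictly between the two thresholds is indeterminate.
    return "Variant of Uncertain Significance (VUS)"
```

Both thresholds are written as explicit constants rather than deriving the lower one as 1 - 0.9, which in floating-point arithmetic is slightly below 0.1 and would misclassify a score of exactly 0.1.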


FAQ

Why does CardioBoost not output predictions on my input list of variants?

There are three main reasons why CardioBoost would not return a prediction:

  • The gene is not one of the disease-related genes described above. Please check the gene lists above.
  • The mutation is not a valid missense change on the gene's canonical transcript (shown in the gene lists above).
  • The variant's gnomAD allele frequency is greater than 0.1%. Such a variant is considered common and is highly likely to be benign for cardiomyopathies and arrhythmias.
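The three checks above can be sketched as a pre-filter. This is a hedged illustration, not the app's logic: the function `reason_for_no_prediction` and its arguments are hypothetical, the gene sets are copied from the tables above, and the caller is assumed to already know whether the change is a valid missense change on the canonical transcript and what its gnomAD allele frequency is.

```python
# Gene symbols copied from the gene tables above.
CARDIOMYOPATHY_GENES = {
    "ACTC1", "DES", "GLA", "LAMP2", "LMNA", "MYBPC3", "MYH7", "MYL2",
    "MYL3", "PLN", "PRKAG2", "PTPN11", "SCN5A", "TNNI3", "TNNT2", "TPM1",
}
ARRHYTHMIA_GENES = {
    "CACNA1C", "CALM1", "CALM2", "CALM3", "KCNH2", "KCNQ1", "SCN5A",
}
MAX_ALLELE_FREQUENCY = 0.001  # 0.1%, the rarity cut-off stated above


def reason_for_no_prediction(gene, is_canonical_missense, gnomad_af):
    """Return why a variant would get no prediction, or None if it passes.

    Hypothetical helper: is_canonical_missense and gnomad_af must be
    supplied by the caller; this sketch does not look them up.
    """
    if gene not in CARDIOMYOPATHY_GENES | ARRHYTHMIA_GENES:
        return "gene is not in the disease-related gene lists"
    if not is_canonical_missense:
        return "not a valid missense change on the canonical transcript"
    if gnomad_af > MAX_ALLELE_FREQUENCY:
        return "gnomAD allele frequency is greater than 0.1%"
    return None
```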


License

The data provided here is available under the ODC Open Database License (ODbL): you are free to share and modify the data provided here as long as you attribute any public use of the database, or works produced from the database; keep the resulting data sets open; and offer your shared or adapted version of the data set under the same ODbL license.

The app is released under a GNU Lesser General Public License v2.1.


Download CardioBoost pathogenicity predictions for all possible rare missense variants in the condition-related genes

Cardiomyopathies variants prediction (tab-delimited text file)
Arrhythmias variants prediction (tab-delimited text file)