Predicted Missense Variant Pathogenicity for Inherited Cardiac Conditions


CardioBoost is a disease-specific machine learning classifier that predicts the pathogenicity of rare (gnomAD allele frequency <=0.1%) missense variants in genes associated with cardiomyopathies and arrhythmias. It outperforms existing genome-wide prediction tools.




Input query variants

(one variant per line)

e.g. MYH7:c.1988G>A
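The query format shown above can be checked before submission with a few lines of Python. This is only a minimal sketch: the pattern is inferred from the single example `MYH7:c.1988G>A` and assumes simple coding-sequence substitutions; the parser used by the web app may accept other forms. The names `QueryVariant`, `parse_query_line`, and `parse_queries` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Assumed pattern, inferred only from the example "MYH7:c.1988G>A":
# gene symbol, ":c.", coding position, reference base, ">", alternate base.
_QUERY = re.compile(r"^([A-Za-z0-9]+):c\.(\d+)([ACGT])>([ACGT])$")


class QueryVariant(NamedTuple):
    gene: str
    cdna_pos: int
    ref: str
    alt: str


def parse_query_line(line: str) -> Optional[QueryVariant]:
    """Parse one query line; return None if it does not fit the assumed pattern."""
    match = _QUERY.match(line.strip())
    if match is None:
        return None
    gene, pos, ref, alt = match.groups()
    if ref == alt:
        # A substitution must change the base.
        return None
    return QueryVariant(gene, int(pos), ref, alt)


def parse_queries(text: str) -> Tuple[List[QueryVariant], List[str]]:
    """Split a one-variant-per-line block into parsed variants and rejected lines."""
    parsed, rejected = [], []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        variant = parse_query_line(raw)
        if variant is None:
            rejected.append(raw.strip())
        else:
            parsed.append(variant)
    return parsed, rejected
```

Returning the rejected lines separately makes it easy to see which entries need correcting before they are pasted into the query box.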

About CardioBoost

The methods and evaluations are described fully in our manuscript available here [TODO: preprint-link]. The source code for the manuscript, along with all data and code necessary to reproduce the analyses, is available on GitHub.

The web app was built using Shiny; its source code is freely available here.

 

Recommended Usage

Although we showed the benefits of the proposed model for gene-disease classification and its superiority over existing genome-wide machine learning tools, we emphasize that CardioBoost is not intended to be used as a standalone clinical decision tool, nor to replace the full ACMG guidelines (Richards et al. 2015) for clinical variant interpretation. For example, a variant that CardioBoost predicts to be pathogenic with a score above 90% should not be interpreted as Pathogenic without integrating other lines of evidence. In the context of inherited cardiac conditions, the clinically relevant classification by CardioBoost is instead intended to be used as PP3 evidence within the ACMG guidelines, as a more reliable and accurate computational tool than genome-wide ones for supporting variant interpretation in cardiomyopathies and arrhythmias.



Inherited Cardiac Conditions

We consider two types of conditions:
  • Cardiomyopathies: dilated cardiomyopathy and hypertrophic cardiomyopathy
  • Inherited Arrhythmias: Long QT syndrome and Brugada syndrome

Genes Related to Inherited Cardiac Conditions

The following tables list the genes related to each condition. Only genes with known pathogenic variants in our curated data sets are included.

 

Cardiomyopathies

Gene Symbol  Ensembl Gene ID  Ensembl Transcript ID  Ensembl Protein ID
ACTC1        ENSG00000159251  ENST00000290378        ENSP00000290378
CSRP3        ENSG00000129170  ENST00000533783        ENSP00000431813
DES          ENSG00000175084  ENST00000373960        ENSP00000363071
GLA          ENSG00000102393  ENST00000218516        ENSP00000218516
LAMP2        ENSG00000005893  ENST00000200639        ENSP00000200639
LMNA         ENSG00000160789  ENST00000368300        ENSP00000357283
MYBPC3       ENSG00000134571  ENST00000545968        ENSP00000442795
MYH7         ENSG00000092054  ENST00000355349        ENSP00000347507
MYL2         ENSG00000111245  ENST00000228841        ENSP00000228841
MYL3         ENSG00000160808  ENST00000395869        ENSP00000379210
PLN          ENSG00000198523  ENST00000357525        ENSP00000350132
PRKAG2       ENSG00000106617  ENST00000287878        ENSP00000287878
PTPN11       ENSG00000179295  ENST00000351677        ENSP00000340944
RBM20        ENSG00000203867  ENST00000369519        ENSP00000358532
SCN5A        ENSG00000183873  ENST00000333535        ENSP00000328968
TAZ          ENSG00000102125  ENST00000299328        ENSP00000299328
TNNI3        ENSG00000129991  ENST00000344887        ENSP00000341838
TNNT2        ENSG00000118194  ENST00000367318        ENSP00000356287
TPM1         ENSG00000140416  ENST00000403994        ENSP00000385107
TTN          ENSG00000155657  ENST00000589042        ENSP00000467141


Inherited Arrhythmia Syndromes

Gene Symbol  Ensembl Gene ID  Ensembl Transcript ID  Ensembl Protein ID
ANK2         ENSG00000145362  ENST00000264366        ENSP00000264366
CACNA1C      ENSG00000151067  ENST00000399655        ENSP00000382563
CALM1        ENSG00000198668  ENST00000356978        ENSP00000349467
CALM2        ENSG00000143933  ENST00000272298        ENSP00000272298
CALM3        ENSG00000160014  ENST00000291295        ENSP00000291295
CAV3         ENSG00000182533  ENST00000343849        ENSP00000341940
KCNE1        ENSG00000180509  ENST00000399289        ENSP00000382228
KCNE2        ENSG00000159197  ENST00000290310        ENSP00000290310
KCNH2        ENSG00000055118  ENST00000262186        ENSP00000262186
KCNJ2        ENSG00000123700  ENST00000535240        ENSP00000441848
KCNQ1        ENSG00000053918  ENST00000155840        ENSP00000155840
SCN5A        ENSG00000183873  ENST00000333535        ENSP00000328968


Classification Criteria

Variant classification is based on the probability of pathogenicity predicted by CardioBoost. In line with the ACMG guidelines, we use Pr>=0.9 as the high-certainty threshold for classifying variants. A variant with a classification probability below 90% is considered indeterminate, with a low classification confidence. In short, a variant is classified by its predicted pathogenicity as follows:

  • pathogenicity>=0.9: Pathogenic/Likely pathogenic
  • pathogenicity<=0.1: Benign/Likely benign
  • 0.1<pathogenicity<0.9: indeterminate
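
The thresholds above can be written as a small function. This is a minimal sketch rather than the app's own code: the function name `classify` is hypothetical, and the indeterminate band is taken to be every score strictly between 0.1 and 0.9.

```python
def classify(pathogenicity: float) -> str:
    """Map a predicted pathogenicity probability to a classification label."""
    # Reject values outside [0, 1]; this comparison is also False for NaN.
    if not 0.0 <= pathogenicity <= 1.0:
        raise ValueError("pathogenicity must be a probability in [0, 1]")
    if pathogenicity >= 0.9:
        return "Pathogenic/Likely pathogenic"
    if pathogenicity <= 0.1:
        return "Benign/Likely benign"
    # Everything strictly between 0.1 and 0.9 has low classification confidence.
    return "indeterminate"
```

Both cutoffs are inclusive, so a score of exactly 0.9 or exactly 0.1 falls into a high-confidence class.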


FAQ

Why does CardioBoost not output predictions for my input list of variants?

There are three main reasons why CardioBoost may not return a prediction:

  • The gene is not among the disease-related genes listed above. Please check the gene lists.
  • The variant is not a valid missense change on the gene's canonical transcript (shown in the gene lists above).
  • The variant's gnomAD allele frequency is greater than 0.1%. Such a variant is considered common and is highly likely to be benign for cardiomyopathies and arrhythmias.
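
The three checks above can be sketched as a pre-filter to run on a variant list before querying. This is an illustration under stated assumptions, not the app's implementation. The gene symbols are copied from the tables on this page. The function name `reason_for_no_prediction` is hypothetical. The `is_valid_missense` flag must come from the caller's own annotation against the canonical transcript. A variant with no gnomAD frequency (`None`) is assumed to be rare.

```python
from typing import Optional

# Gene symbols copied from the two gene tables on this page.
CARDIOMYOPATHY_GENES = frozenset({
    "ACTC1", "CSRP3", "DES", "GLA", "LAMP2", "LMNA", "MYBPC3", "MYH7",
    "MYL2", "MYL3", "PLN", "PRKAG2", "PTPN11", "RBM20", "SCN5A", "TAZ",
    "TNNI3", "TNNT2", "TPM1", "TTN",
})
ARRHYTHMIA_GENES = frozenset({
    "ANK2", "CACNA1C", "CALM1", "CALM2", "CALM3", "CAV3", "KCNE1",
    "KCNE2", "KCNH2", "KCNJ2", "KCNQ1", "SCN5A",
})

MAX_GNOMAD_AF = 0.001  # 0.1%, the rarity cutoff stated on this page


def reason_for_no_prediction(
    gene: str,
    is_valid_missense: bool,
    gnomad_af: Optional[float] = None,
) -> Optional[str]:
    """Return why a variant would get no prediction, or None if it passes all checks."""
    if gene not in CARDIOMYOPATHY_GENES | ARRHYTHMIA_GENES:
        return "gene not in the supported gene lists"
    if not is_valid_missense:
        return "not a valid missense change on the canonical transcript"
    # Assumption: a variant absent from gnomAD (None) counts as rare.
    if gnomad_af is not None and gnomad_af > MAX_GNOMAD_AF:
        return "gnomAD allele frequency above 0.1%"
    return None
```

The checks run in the same order as the list, so the first failing reason is the one reported.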


License

The data provided here are available under the ODC Open Database License (ODbL): you are free to share and modify the data as long as you attribute any public use of the database, or works produced from the database; keep the resulting data sets open; and offer your shared or adapted version of the data set under the same ODbL license.

The app is released under the GNU Lesser General Public License v2.1 [TODO:license-link].


Download CardioBoost pathogenicity predictions for all possible rare missense variants in the condition-related genes

Cardiomyopathy variant predictions (tab-delimited text file)
Arrhythmia variant predictions (tab-delimited text file)
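
Once downloaded, a tab-delimited file can be read with Python's standard `csv` module. This is a minimal sketch: the column layout of the prediction files is not documented on this page, so the default column name `pathogenicity` and the function name `load_predictions` are assumptions. Check the header line of the downloaded file and pass the actual score column name.

```python
import csv
from typing import Dict, Iterable, List


def load_predictions(
    lines: Iterable[str],
    score_column: str = "pathogenicity",  # assumed name; check the file header
) -> List[Dict[str, object]]:
    """Read tab-delimited rows with a header line and convert the score to float."""
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or score_column not in reader.fieldnames:
        raise ValueError(f"column {score_column!r} not found in header")
    rows: List[Dict[str, object]] = []
    for row in reader:
        record: Dict[str, object] = dict(row)
        record[score_column] = float(row[score_column])
        rows.append(record)
    return rows
```

The function accepts any iterable of lines, so an open file handle can be passed directly, for example one opened with `open(path, newline="")`.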