PARALOG Annotator




Input a query variant below, or click here to input a list of variants or upload a VCF file. All variants must be in genome build GRCh37 coordinates.

e.g.
1-115256528-T-G



Paralogue Annotation utilizes information from evolutionarily related proteins, specifically paralogues, to help inform the clinical significance of missense variants associated with human diseases.





Shown below are (missense) variants annotated as Pathogenic/Likely Pathogenic in ClinVar found at the equivalent amino acid residue of other members of the protein family by Paralogue Annotation.






Shown below are all the equivalent amino acid positions across all members of the paralogous family.





Shown below are (missense) variants annotated as Pathogenic/Likely Pathogenic in ClinVar found at the equivalent amino acid residue of other homologous proteins that share a Pfam protein domain.





Paralogue Annotation utilizes information from evolutionarily related proteins, specifically paralogues, to help inform the clinical significance of missense variants associated with human diseases. The original methodology and implementation of Paralogue Annotation on arrhythmia syndrome genes were published here and here. This web app extends Paralogue Annotation exome-wide, using paralogues defined by Ensembl's gene trees and pathogenic/likely pathogenic missense variants defined by ClinVar.
This web app is currently being built using Shiny. The source code is available at https://github.com/ImperialCardioGenetics/Paralogue_Annotation_App.


Frequently Asked Questions (FAQ)


Q. What genome build coordinates do my variants need to be in?
A. Currently only GRCh37 coordinates are supported. We recommend using Ensembl's liftover service for coordinate conversions.

Q. What are the Para_z scores?
A. The Para_z scores are a measure of paralogue conservation independently derived by Lal et al. (2020). Because the paralogue alignments used to generate the scores differ from the alignments used here, you may find that some Para_z scores in your results do not agree with your expectations. The Para_z scores can thus be thought of as a third-party confidence score of paralogue conservation across aligned positions.

Q. How can we use the conservation of Ref/Alt alleles to filter out results?
A. That is not currently available in this version of the web app.

Q. What paralogue alignments do you use here?
A. We utilize protein-level paralogue alignments generated by Ensembl, which were obtained through Compara.

Q. Why do the results for arrhythmia genes from the original Paralogue Annotation and here differ?
A. This is mainly because the original Paralogue Annotation utilized T-COFFEE for the alignments, whereas Ensembl's alignments are generated by CLUSTAL W. Furthermore, the original used variants from HGMD rather than ClinVar.

Q. What formats do my input variants have to be in?
A. Currently, each variant must be submitted as its chromosome, position, reference allele, and alternate allele, in that order and separated by any delimiter (for example “CHROM:POS:REF:ALT”), with one variant per line. VCF files are also accepted.
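
To check a variant list before submitting it, the input format described in this answer can be sketched in Python. This is only an illustration and not the web app's own parser: the names Variant, parse_variant and parse_variants are hypothetical, and it assumes that "any delimiter" means any run of non-alphanumeric characters and that alleles contain only the letters A, C, G, T or N.

```python
import re
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Assumption: a delimiter is any run of non-alphanumeric characters
# (":", "-", tab, space, ...). The web app may be stricter or looser.
_DELIMITER = re.compile(r"[^A-Za-z0-9]+")
# Assumption: alleles are plain nucleotide strings.
_ALLELE = re.compile(r"[ACGTN]+")


def parse_variant(line: str) -> Variant:
    """Parse one CHROM, POS, REF, ALT record split by an arbitrary delimiter."""
    fields = [f for f in _DELIMITER.split(line.strip()) if f]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields (CHROM, POS, REF, ALT), got {len(fields)}: {line!r}")
    chrom, pos, ref, alt = fields
    if not pos.isdigit():
        raise ValueError(f"position is not a positive integer: {pos!r}")
    ref, alt = ref.upper(), alt.upper()
    for allele in (ref, alt):
        if not _ALLELE.fullmatch(allele):
            raise ValueError(f"unexpected allele: {allele!r}")
    return Variant(chrom, int(pos), ref, alt)


def parse_variants(text: str) -> List[Variant]:
    """Parse a block of text with one variant per line, skipping blank lines."""
    return [parse_variant(line) for line in text.splitlines() if line.strip()]
```

For example, parse_variant("1-115256528-T-G") and parse_variant("1:115256528:T:G") both return the same record for chromosome 1, position 115256528, reference T, alternate G. Coordinates are not converted, so they must already be in GRCh37.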


For more details on specific methods or on the code behind Paralogue Annotation, or for any other questions, please email nyl112@ic.ac.uk


This web app is a work in progress; the final version may differ.