DNM data we used in the analysis
`readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/DNM_descrip.rds")`
| | n_sample | n_DNM | descrip |
|---|---|---|---|
| | <dbl> | <dbl> | <chr> |
| affected | 6430 | 6788 | coding autosomal SNV in 17478 LoF-intolerant genes |
| sibling | 2179 | 2149 | coding autosomal SNV in 17478 LoF-intolerant genes |
Annotation categories from the paper in Cell
annota (VEP): annotation categories from the paper in Cell
burden (aff / sib) $= \frac{\text{ASD nonsynonymous SNVs} \,/\, \text{ASD synonymous SNVs}}{\text{control nonsynonymous SNVs} \,/\, \text{control synonymous SNVs}}$
logRR_paper $= \ln\left(1 + \frac{\text{burden\_paper (aff / sib)} - 1}{0.05}\right)$
sep: the columns to the left are derived from the paper; the columns to the right are calculated from the DNM data we used. The estimation is performed after calibration.
burden (obs / bg) $= \frac{\text{observed}}{\text{background}}$
logRR_formula $= \ln\left(1 + \frac{\text{burden (obs / bg)} - 1}{0.05}\right)$
separate_logRR: log RR estimated separately for each annotation
joint_logRR: log RR estimated jointly for annotations 1-5, which were used in that paper.
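The logRR columns in the table below match the natural log of 1 + (burden − 1) / 0.05. One way to read this, which is an assumption rather than something the report states: if a fraction 0.05 of genes are risk genes with relative risk RR and the rest have RR = 1, the overall burden is 0.05·RR + 0.95, and solving for RR gives the expression inside the log. A minimal sketch (the function name `burden_to_logRR` and the argument `risk_frac` are hypothetical):

```r
# Convert a burden ratio into a log relative risk.
# risk_frac is the assumed fraction of risk genes (0.05 in this report).
burden_to_logRR <- function(burden, risk_frac = 0.05) {
  log(1 + (burden - 1) / risk_frac)
}

# Three burden_paper values taken from the table in this report
burden_paper <- c(PTV_Highest = 3.5194598,
                  PTV_Lowest  = 0.9917312,
                  Synonymous  = 1.0000000)

# A burden below 1 gives a negative logRR; a burden of exactly 1 gives 0.
logRR_paper <- burden_to_logRR(burden_paper)
print(round(logRR_paper, 4))
```

Note that the expression is undefined once the burden drops to 1 − 0.05 = 0.95 or below, because the argument of the log is then no longer positive.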
RR and burden
In the TADA-A model, we calibrated the mutation rate using the number of synonymous mutations. A uniform prior of 0.05 was used both to estimate RR and to calculate posterior probabilities.
`readRDS("cell_annota_tadaA_bd_RR.rds")`
| annota (VEP) | burden_paper (aff / sib) | logRR_paper | sep | burden (aff / sib) | burden (obs/bg) | logRR_formula | separate_logRR | joint_logRR |
|---|---|---|---|---|---|---|---|---|
| <fct> | <dbl> | <dbl> | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| PTV_Highest(pLI=0.995-1) | 3.5194598 | 3.9394280 | \| | 6.430804 | 3.621622 | 3.978418 | 3.493790881 | 3.3180267 |
| PTV_Middle(pLI=0.5-0.995) | 1.3462415 | 2.0700008 | \| | 1.649148 | 1.459459 | 2.321327 | 2.209709635 | 0.3995045 |
| PTV_Lowest(pLI=0-0.5) | 0.9917312 | -0.1807738 | \| | 1.195162 | 1.445312 | 2.293166 | 2.081597693 | 1.9613477 |
| Missense_Highest(MPC≥2) | 2.0541788 | 3.0948341 | \| | 2.079337 | 1.567686 | 2.513957 | 2.321866609 | 2.2135460 |
| Missense_Middle(MPC=1-2) | 1.1572499 | 1.4219022 | \| | 1.183534 | 1.222964 | 1.697317 | 1.623570149 | 1.4426293 |
| Missense_Lowest(MPC<1) | 0.9768610 | -0.6213461 | \| | 0.995654 | 1.219199 | 1.683427 | 1.254158652 | NA |
| Synonymous | 1.0000000 | 0.0000000 | \| | 1.008760 | 1.000000 | 0.000000 | 0.003533577 | NA |
For the PTV categories, the differences between the paper's results and ours are partly caused by the absence of INDELs in our model and by differences in the definition of coding regions.
For the MPC≥2 category, the difference is probably caused mainly by the model.
Other annotations
- We derived a set of priors as 1 − FDR from that paper, then used the top 1000 genes to estimate relative risks.
RR
DeepSEA
The results of 217 DeepSEA categories (top 5% mutations) are similar, which can be downloaded here.
We selected 5 significant brain-related categories to refit the model.
`ab[,-5] # estimate separately`
| | separate_logRR | lower_bound | upper_bound | annota |
|---|---|---|---|---|
| | <dbl> | <dbl> | <dbl> | <chr> |
| 9 | 1.752452 | 1.3571405 | 2.147764 | ago_adult_brain.BA4.hg19 |
| 10 | 1.074426 | 0.5180078 | 1.630844 | ago_adult_brain.Cingulate.gyrus.hg19 |
| 11 | 1.479920 | 1.0291704 | 1.930669 | elavl_Adult_brain.all_human_samples.hg19 |
| 12 | 1.427915 | 0.9616153 | 1.894214 | elavl_Adult_brain.BA9_Alzheimer.hg19 |
| 13 | 1.442728 | 0.9821886 | 1.903266 | elavl_Adult_brain.BA9.hg19 |
Annotations apart from DeepSEA
`rbind(a,b) # estimate separately`
| logRR | lower_bound | upper_bound | annota |
|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <chr> |
| 1.7849791 | 1.6493587 | 1.9205995 | coding constraint >90 |
| 1.4374067 | 0.6965003 | 2.1783131 | CLIPdb |
| 1.3173839 | 1.0880712 | 1.5466965 | RADAR_RBP top5% |
| 0.9380841 | 0.4834046 | 1.3927636 | RBP-VarDB |
| 1.1453607 | 0.5565725 | 1.7341488 | ribosnitch |
| 1.9569585 | 1.8443240 | 2.0695929 | MVP |
| 1.7428031 | 1.6302765 | 1.8553297 | primateAI |
| 1.8734926 | 1.7359787 | 2.0110065 | spidex |
| -1.0000000 | -2.3013881 | 0.3013882 | CADD |
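The joint estimation that follows keeps every annotation from this table except CADD, the only row whose interval spans zero. One reading, inferred from the tables rather than stated in the report, is that an annotation counts as significant when its lower bound is above 0. A minimal sketch of that filter on three rows copied from the table (the data frame `est` and the column `significant` are hypothetical names):

```r
# Three rows copied from the separately estimated table above
est <- data.frame(
  annota      = c("coding constraint >90", "CLIPdb", "CADD"),
  logRR       = c(1.7849791, 1.4374067, -1.0000000),
  lower_bound = c(1.6493587, 0.6965003, -2.3013881),
  upper_bound = c(1.9205995, 2.1783131, 0.3013882),
  stringsAsFactors = FALSE
)

# Back-transform from the log scale to the relative-risk scale
est$RR <- exp(est$logRR)

# Assumed criterion: the interval lies entirely above logRR = 0 (RR = 1)
est$significant <- est$lower_bound > 0

kept <- est$annota[est$significant]
print(kept)
```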
`a ### joint estimation of all significant annotations`
| separate_logRR | lower_bound | upper_bound | annota | joint_RR | idx |
|---|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <chr> | <dbl> | <int> |
| 1.7849791 | 1.6493587 | 1.920600 | coding constraint >90 | 0.5993900 | 1 |
| 1.4374067 | 0.6965003 | 2.178313 | CLIPdb | -0.4573503 | 2 |
| 1.3173839 | 1.0880712 | 1.546697 | RADAR_RBP top5% | 0.2315051 | 3 |
| 0.9380841 | 0.4834046 | 1.392764 | RBP-VarDB | 0.4038981 | 4 |
| 1.1453607 | 0.5565725 | 1.734149 | ribosnitch | -0.3492912 | 5 |
| 1.9569585 | 1.8443240 | 2.069593 | MVP | 1.0807593 | 6 |
| 1.7428031 | 1.6302765 | 1.855330 | primateAI | 0.6242487 | 7 |
| 1.8734926 | 1.7359787 | 2.011007 | spidex | 1.1574875 | 8 |
| 1.7524525 | 1.3571405 | 2.147764 | ago_adult_brain.BA4.hg19 | 0.5180716 | 9 |
| 1.0744256 | 0.5180078 | 1.630844 | ago_adult_brain.Cingulate.gyrus.hg19 | -0.7293009 | 10 |
| 1.4799195 | 1.0291704 | 1.930669 | elavl_Adult_brain.all_human_samples.hg19 | -0.1916567 | 11 |
| 1.4279148 | 0.9616153 | 1.894214 | elavl_Adult_brain.BA9_Alzheimer.hg19 | -0.9256815 | 12 |
| 1.4427275 | 0.9821886 | 1.903266 | elavl_Adult_brain.BA9.hg19 | 0.3635234 | 13 |
Risk genes
P: annotations from the paper
O: annotations from the above other sources
misA: BFs of MPC ≥ 2 from the paper
misB: BFs of 1 ≤ MPC < 2 from the paper
dn ptv: BFs of de novo PTVs from the paper
cc ptv: BFs of case-control PTVs from the paper
novel_gene: number of identified genes that are not in the paper's list of 102 risk genes
The paper identified 102 risk genes with q-value < 0.1 when de novo and case-control data were used, and 65 genes when only de novo data were used.
Using the TADA-A model, we identified 47 genes when only de novo SNVs were considered, and 66 genes when the paper's BFs of case-control PTVs were also included.
`### compare the results of the paper with ours`
| Annota | Dataset | Model | other_info | n_risk_genes |
|---|---|---|---|---|
| <fct> | <fct> | <fct> | <fct> | <dbl> |
| P | denovo SNVs/INDELs | TADA+(the paper) | no | 65 |
| P | denovo & case-control SNVs/INDELs | TADA+(the paper) | no | 102 |
| P | denovo SNVs | TADA-A | no | 47 |
| P | denovo SNVs | TADA-A | BFs of cc ptv | 66 |
| | Annota | prior_for_estimate_RR | gene_for_estimate_RR | prior_for_cal_posterior | addtional_BFs_for_posterior | num_gene_identified | novel_gene |
|---|---|---|---|---|---|---|---|
| | <fct> | <fct> | <fct> | <fct> | <fct> | <dbl> | <dbl> |
| 1 | P | uniform | all | uniform | no | 47 | 10 |
| 2 | P | uniform | all | uniform | cc ptv | 66 | 14 |
Putting all selected annotations in TADA-A, we tested four scenarios as follows.
`### four scenarios`
| | Annota | prior_for_estimate_RR | gene_for_estimate_RR | prior_for_cal_posterior | addtional_BFs_for_posterior | num_gene_identified | novel_gene | scenario |
|---|---|---|---|---|---|---|---|---|
| | <fct> | <fct> | <fct> | <fct> | <fct> | <dbl> | <dbl> | <int> |
| 3 | P+O | uniform | all | uniform | no | 60 | 23 | 3 |
| 4 | P+O | uniform | all | uniform | cc ptv | 85 | 33 | 4 |
| 5 | O | 1-FDR from the paper | top1000 | uniform | misA*misB*dn ptv*cc ptv | 217 | 123 | 5 |
| 6 | O | 1-FDR from the paper | top1000 | uniform | no | 29 | 13 | 6 |
We took the result of scenario 3 for further analysis.
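The scenarios above combine Bayes factors (for example misA\*misB\*dn ptv\*cc ptv) with a prior of 0.05 and call genes at q-value < 0.1. A minimal sketch of how such a calculation is commonly done in TADA-style analyses, assuming the usual Bayes-rule conversion from BF to posterior and a Bayesian FDR q-value; the function names and the example BFs are hypothetical, and this is not necessarily the exact code used here:

```r
# Posterior probability of being a risk gene, given a total Bayes factor
# and the prior fraction of risk genes (0.05 in this report).
bf_to_posterior <- function(bf, prior = 0.05) {
  prior * bf / (prior * bf + 1 - prior)
}

# Bayesian FDR q-value: rank genes by posterior (highest first) and take
# the running mean of (1 - posterior) down the ranked list.
bayesian_fdr <- function(pp) {
  ord <- order(pp, decreasing = TRUE)
  q_sorted <- cumsum(1 - pp[ord]) / seq_along(ord)
  q <- numeric(length(pp))
  q[ord] <- q_sorted
  q
}

# Hypothetical per-gene BFs from two independent evidence sources;
# independent sources are combined by multiplying their BFs.
bf_denovo <- c(geneA = 500, geneB = 20, geneC = 1)
bf_cc_ptv <- c(geneA = 4,   geneB = 2,  geneC = 1)
bf_total  <- bf_denovo * bf_cc_ptv

pp <- bf_to_posterior(bf_total)
q  <- bayesian_fdr(pp)
risk_genes <- names(pp)[q < 0.1]
print(risk_genes)
```

With a prior of 0.05, a gene needs a total BF of 19 to reach a posterior of 0.5, which is why adding the case-control PTV BFs can move borderline genes across the q-value threshold.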
Enrichment analysis
- GWAS
- functional gene lists
`readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/report/1-13_brain_gwas_EA.rds")`
| | traits | enrich | pVal | qVal |
|---|---|---|---|---|
| | <fct> | <dbl> | <dbl> | <dbl> |
| 2 | Schizophrenia | 2.271464 | 0.001156453 | 0.006938721 |
| 4 | Attention deficit hyperactivity disorder | 3.711331 | 0.011461353 | 0.034384059 |
| 5 | Educational attainment | 1.774402 | 0.020703425 | 0.041406851 |
| 1 | Autism spectrum disorder | 2.376599 | 0.088642274 | 0.132963411 |
| 3 | Major depressive disorder | 0.000000 | 1.000000000 | 1.000000000 |
| 6 | Height adjusted BMI | 0.000000 | 1.000000000 | 1.000000000 |
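The qVal column in this table is numerically consistent with a Benjamini-Hochberg adjustment of the six p-values. The report does not name the method, so treat that as an inference. A short check using base R's `p.adjust` (the vector name `p_gwas` is hypothetical):

```r
# p-values copied from the GWAS enrichment table above
p_gwas <- c(
  "Schizophrenia"                            = 0.001156453,
  "Attention deficit hyperactivity disorder" = 0.011461353,
  "Educational attainment"                   = 0.020703425,
  "Autism spectrum disorder"                 = 0.088642274,
  "Major depressive disorder"                = 1.000000000,
  "Height adjusted BMI"                      = 1.000000000
)

# Benjamini-Hochberg adjustment across the six traits
q_gwas <- p.adjust(p_gwas, method = "BH")
print(signif(q_gwas, 4))
```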
`a`
| | Geneset | enrich | pVal | n_novel_g | n_novel_g_geneset | n_geneset | n_total_genes | qVal |
|---|---|---|---|---|---|---|---|---|
| | <chr> | <dbl> | <dbl> | <int> | <int> | <int> | <int> | <dbl> |
| 1 | haploinsufficiency_including_all_without_ncscore | 2.6695 | 0.000003 | 123 | 26 | 1384 | 17478 | 0.000010000 |
| 2 | top5%_brainspan_exp.gene | 1.5099 | 0.142765 | 123 | 9 | 847 | 17478 | 0.142765000 |
| 3 | Petrovski_plosgen_RVIS_score_top5_pct | 3.3414 | 0.000004 | 123 | 19 | 808 | 17478 | 0.000010000 |
| 4 | 160210_GO_brain_genes | 1.3777 | 0.044349 | 123 | 28 | 2888 | 17478 | 0.049276667 |
| 5 | AutismKB | 5.6388 | 0.001972 | 123 | 5 | 126 | 17478 | 0.003286667 |
| 7 | sfrai_genes_high_confidence_new_170717 | 12.9180 | 0.010350 | 123 | 2 | 22 | 17478 | 0.014785714 |
| 9 | Darnell_Cell_2011_FMRP_targets | 4.3345 | 0.000000 | 123 | 23 | 754 | 17478 | 0.000000000 |
| 10 | DAWN_new_q0.05 | 7.0944 | 0.000000 | 123 | 34 | 681 | 17478 | 0.000000000 |
| 11 | Kenny_MP_2014_brain_functional | 3.4999 | 0.014426 | 123 | 5 | 203 | 17478 | 0.018032500 |
| 12 | Irimia_cell_2014_neuron_specific_alternative_splicing | 2.2705 | 0.000111 | 123 | 24 | 1502 | 17478 | 0.000222000 |
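The enrich column matches a simple fold enrichment: the fraction of novel genes that fall in the gene set, divided by the fraction of all genes that fall in it. The sketch below reproduces that ratio from the table's own counts. The report does not say which test produced pVal, so the one-sided hypergeometric test here is only an assumed stand-in, and the function names are hypothetical:

```r
# Fold enrichment of a gene set among the novel genes
fold_enrichment <- function(n_hit, n_query, n_set, n_total) {
  (n_hit / n_query) / (n_set / n_total)
}

# Assumed test: probability of seeing at least n_hit set members
# when drawing n_query genes without replacement from n_total genes.
hypergeom_p <- function(n_hit, n_query, n_set, n_total) {
  phyper(n_hit - 1, n_set, n_total - n_set, n_query, lower.tail = FALSE)
}

# Counts copied from the haploinsufficiency and AutismKB rows above
fe_hi <- fold_enrichment(n_hit = 26, n_query = 123, n_set = 1384, n_total = 17478)
fe_kb <- fold_enrichment(n_hit = 5,  n_query = 123, n_set = 126,  n_total = 17478)
p_hi  <- hypergeom_p(n_hit = 26, n_query = 123, n_set = 1384, n_total = 17478)

print(round(c(haploinsufficiency = fe_hi, AutismKB = fe_kb), 4))
```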
Candidate variants
- DNMs in risk genes that are annotated both with a DeepSEA category and as either a coding constraint sequence or a synonymous mutation are candidate variants for further analysis.
`readRDS("2021-01-14_1-FDR_s5_RBP_6788SNV_sf_other_annota_pre_prior_1-13_res.rds")`
| idx | DeepSEA_RBP | genename | chr | start | end | REF | ALT | other_annota |
|---|---|---|---|---|---|---|---|---|
| <int> | <chr> | <chr> | <chr> | <int> | <int> | <chr> | <chr> | <chr> |
| 12 | elavl_Adult_brain.BA9.hg19 | TFAP4 | chr16 | 4312613 | 4312614 | G | A | ccr90, syn |
| 9 | ago_adult_brain.Cingulate.gyrus.hg19 | KDM6B | chr17 | 7755276 | 7755277 | G | A | ccr90, syn |
| 10 | elavl_Adult_brain.all_human_samples.hg19 | MEF2D | chr1 | 156452416 | 156452417 | G | A | ccr90, syn |
| 9 | ago_adult_brain.Cingulate.gyrus.hg19 | FBN1 | chr15 | 48766840 | 48766841 | T | C | ccr90, syn |
| 12 | elavl_Adult_brain.BA9.hg19 | MYO9B | chr19 | 17318043 | 17318044 | A | G | ccr90, syn |
| 8 | ago_adult_brain.BA4.hg19 | DENND4A | chr15 | 66048520 | 66048521 | A | G | ccr90, syn |
| 10 | elavl_Adult_brain.all_human_samples.hg19 | DENND4A | chr15 | 66048520 | 66048521 | A | G | ccr90, syn |
| 8 | ago_adult_brain.BA4.hg19 | SHANK3 | chr22 | 51117093 | 51117094 | C | G | ccr90, syn |
| 8 | ago_adult_brain.BA4.hg19 | SCN1A | chr2 | 166848858 | 166848859 | C | G | ccr90, syn |
| 11 | elavl_Adult_brain.BA9_Alzheimer.hg19 | SCN2A | chr2 | 166231425 | 166231426 | A | T | ccr90, syn |
| 8 | ago_adult_brain.BA4.hg19 | TRAF7 | chr16 | 2226079 | 2226080 | C | T | ccr90, syn |
| 12 | elavl_Adult_brain.BA9.hg19 | GRIN2B | chr12 | 13724821 | 13724822 | C | T | ccr90, syn |
| 12 | elavl_Adult_brain.BA9.hg19 | GRIN2B | chr12 | 13768559 | 13768560 | C | T | ccr90, syn |
| 12 | elavl_Adult_brain.BA9.hg19 | FOXP1 | chr3 | 71021816 | 71021817 | C | T | ccr90, syn |
| 8 | ago_adult_brain.BA4.hg19 | ZC3H14 | chr14 | 89034438 | 89034439 | C | T | ccr90, syn |
| 8 | ago_adult_brain.BA4.hg19 | MED13L | chr12 | 116424951 | 116424952 | C | T | ccr90, syn |
| 12 | elavl_Adult_brain.BA9.hg19 | SCN2A | chr2 | 166237635 | 166237636 | C | T | ccr90, syn |
| 12 | elavl_Adult_brain.BA9.hg19 | ZBTB20 | chr3 | 114070427 | 114070428 | G | T | ccr90, syn |
| 10 | elavl_Adult_brain.all_human_samples.hg19 | TBR1 | chr2 | 162274306 | 162274307 | G | T | ccr90, syn |