DNM data we used in the analysis

readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/DNM_descrip.rds")

A data.frame: 2 × 3

| | n_sample <dbl> | n_DNM <dbl> | descrip <chr> |
|---|---|---|---|
| affected | 6430 | 6788 | coding autosomal SNV in 17478 LoF-intolerant genes |
| sibling | 2179 | 2149 | coding autosomal SNV in 17478 LoF-intolerant genes |

Annotation categories from the paper in Cell

annota (VEP): annotation categories from the paper in Cell

burden (aff / sib) $= \frac{\text{ASD nonsynonymous SNVs} \,/\, \text{ASD synonymous SNVs}}{\text{control (sibling) nonsynonymous SNVs} \,/\, \text{control (sibling) synonymous SNVs}}$

logRR_paper $= \log\left(1 + \frac{\mathrm{burden}_{\mathrm{paper}}\,(\mathrm{aff}/\mathrm{sib}) - 1}{0.05}\right)$

sep: separator column. The columns to its left are derived from the paper; the columns to its right are calculated from the DNM data we used. The estimation is performed after mutation rate calibration.

burden (obs / bg) $= \frac{\text{observed}}{\text{background}}$

logRR_formula $= \log\left(1 + \frac{\mathrm{burden}\,(\mathrm{obs}/\mathrm{bg}) - 1}{0.05}\right)$

separate_logRR: log RR estimated separately for each annotation

joint_logRR: log RR estimated jointly for annotations 1-5, which were the ones used in that paper.

RR and burden

In the TADA-A model, we calibrated the mutation rate using the number of synonymous mutations. A uniform prior of 0.05 was used both to estimate RR and to calculate posterior probabilities.
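The 0.05 prior also links the burden columns to the logRR columns of the table that follows. As a minimal sketch, assuming the relation burden = 1 + 0.05 × (RR − 1), taking the natural log of the implied RR reproduces the logRR_paper values. The function name `burden_to_logRR` is hypothetical and not part of TADA-A.

```r
# Convert a burden (fold enrichment) into a log relative risk,
# assuming a fraction `prior` of genes are risk genes (0.05 in this report).
burden_to_logRR <- function(burden, prior = 0.05) {
  log(1 + (burden - 1) / prior)
}

# burden_paper values copied from the three PTV rows of the table
burden_paper <- c(PTV_Highest = 3.5194598,
                  PTV_Middle  = 1.3462415,
                  PTV_Lowest  = 0.9917312)

logRR <- burden_to_logRR(burden_paper)
print(round(logRR, 4))
```

Under this relation the conversion is undefined when the burden is at or below 0.95, because the implied RR would then be zero or negative.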

readRDS("cell_annota_tadaA_bd_RR.rds")

A data.frame: 7 × 9

| annota (VEP) <fct> | burden_paper (aff / sib) <dbl> | logRR_paper <dbl> | sep <chr> | burden (aff / sib) <dbl> | burden (obs/bg) <dbl> | logRR_formula <dbl> | separate_logRR <dbl> | joint_logRR <dbl> |
|---|---|---|---|---|---|---|---|---|
| PTV_Highest(pLI=0.995-1) | 3.5194598 | 3.9394280 | \| | 6.430804 | 3.621622 | 3.978418 | 3.493790881 | 3.3180267 |
| PTV_Middle(pLI=0.5-0.995) | 1.3462415 | 2.0700008 | \| | 1.649148 | 1.459459 | 2.321327 | 2.209709635 | 0.3995045 |
| PTV_Lowest(pLI=0-0.5) | 0.9917312 | -0.1807738 | \| | 1.195162 | 1.445312 | 2.293166 | 2.081597693 | 1.9613477 |
| Missense_Highest(MPC≥2) | 2.0541788 | 3.0948341 | \| | 2.079337 | 1.567686 | 2.513957 | 2.321866609 | 2.2135460 |
| Missense_Middle(MPC=1-2) | 1.1572499 | 1.4219022 | \| | 1.183534 | 1.222964 | 1.697317 | 1.623570149 | 1.4426293 |
| Missense_Lowest(MPC<1) | 0.9768610 | -0.6213461 | \| | 0.995654 | 1.219199 | 1.683427 | 1.254158652 | NA |
| Synonymous | 1.0000000 | 0.0000000 | \| | 1.008760 | 1.000000 | 0.000000 | 0.003533577 | NA |

For the PTV categories, the differences between the paper's results and ours are partly caused by the absence of indels from our model and by differences in the coding regions considered.

For MPC≥2, the difference may be caused mainly by the model.

Other annotations

  • We derived a set of priors as 1–FDR from that paper, then used the top 1000 genes for estimating relative risks.

RR

DeepSEA

The results for the 217 DeepSEA categories (top 5% of mutations) are similar; they can be downloaded here.

We selected 5 significant brain-related categories to refit the model.

ab[,-5] # estimate separately

A data.frame: 5 × 4

| | separate_logRR <dbl> | lower_bound <dbl> | upper_bound <dbl> | annota <chr> |
|---|---|---|---|---|
| 9 | 1.752452 | 1.3571405 | 2.147764 | ago_adult_brain.BA4.hg19 |
| 10 | 1.074426 | 0.5180078 | 1.630844 | ago_adult_brain.Cingulate.gyrus.hg19 |
| 11 | 1.479920 | 1.0291704 | 1.930669 | elavl_Adult_brain.all_human_samples.hg19 |
| 12 | 1.427915 | 0.9616153 | 1.894214 | elavl_Adult_brain.BA9_Alzheimer.hg19 |
| 13 | 1.442728 | 0.9821886 | 1.903266 | elavl_Adult_brain.BA9.hg19 |

Annotations apart from DeepSEA

rbind(a,b) # estimate separately

A data.frame: 9 × 4

| logRR <dbl> | lower_bound <dbl> | upper_bound <dbl> | annota <chr> |
|---|---|---|---|
| 1.7849791 | 1.6493587 | 1.9205995 | coding constraint >90 |
| 1.4374067 | 0.6965003 | 2.1783131 | CLIPdb |
| 1.3173839 | 1.0880712 | 1.5466965 | RADAR_RBP top5% |
| 0.9380841 | 0.4834046 | 1.3927636 | RBP-VarDB |
| 1.1453607 | 0.5565725 | 1.7341488 | ribosnitch |
| 1.9569585 | 1.8443240 | 2.0695929 | MVP |
| 1.7428031 | 1.6302765 | 1.8553297 | primateAI |
| 1.8734926 | 1.7359787 | 2.0110065 | spidex |
| -1.0000000 | -2.3013881 | 0.3013882 | CADD |
a  ### joint estimation of all significant annotations

A data.frame: 13 × 6

| separate_logRR <dbl> | lower_bound <dbl> | upper_bound <dbl> | annota <chr> | joint_RR <dbl> | idx <int> |
|---|---|---|---|---|---|
| 1.7849791 | 1.6493587 | 1.920600 | coding constraint >90 | 0.5993900 | 1 |
| 1.4374067 | 0.6965003 | 2.178313 | CLIPdb | -0.4573503 | 2 |
| 1.3173839 | 1.0880712 | 1.546697 | RADAR_RBP top5% | 0.2315051 | 3 |
| 0.9380841 | 0.4834046 | 1.392764 | RBP-VarDB | 0.4038981 | 4 |
| 1.1453607 | 0.5565725 | 1.734149 | ribosnitch | -0.3492912 | 5 |
| 1.9569585 | 1.8443240 | 2.069593 | MVP | 1.0807593 | 6 |
| 1.7428031 | 1.6302765 | 1.855330 | primateAI | 0.6242487 | 7 |
| 1.8734926 | 1.7359787 | 2.011007 | spidex | 1.1574875 | 8 |
| 1.7524525 | 1.3571405 | 2.147764 | ago_adult_brain.BA4.hg19 | 0.5180716 | 9 |
| 1.0744256 | 0.5180078 | 1.630844 | ago_adult_brain.Cingulate.gyrus.hg19 | -0.7293009 | 10 |
| 1.4799195 | 1.0291704 | 1.930669 | elavl_Adult_brain.all_human_samples.hg19 | -0.1916567 | 11 |
| 1.4279148 | 0.9616153 | 1.894214 | elavl_Adult_brain.BA9_Alzheimer.hg19 | -0.9256815 | 12 |
| 1.4427275 | 0.9821886 | 1.903266 | elavl_Adult_brain.BA9.hg19 | 0.3635234 | 13 |
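The 13 annotations in the joint fit are the 5 DeepSEA categories plus the 8 other annotations whose separately estimated interval excludes zero; CADD, whose interval spans zero, is absent. A minimal sketch of that filter follows, assuming "significant" means exactly this interval criterion (the report does not state the rule). The values are copied from the separate-estimation table of non-DeepSEA annotations.

```r
# Separately estimated logRR with bounds, copied from the 9-row table
sep_fit <- data.frame(
  annota = c("coding constraint >90", "CLIPdb", "RADAR_RBP top5%", "RBP-VarDB",
             "ribosnitch", "MVP", "primateAI", "spidex", "CADD"),
  logRR = c(1.7849791, 1.4374067, 1.3173839, 0.9380841, 1.1453607,
            1.9569585, 1.7428031, 1.8734926, -1.0000000),
  lower_bound = c(1.6493587, 0.6965003, 1.0880712, 0.4834046, 0.5565725,
                  1.8443240, 1.6302765, 1.7359787, -2.3013881),
  upper_bound = c(1.9205995, 2.1783131, 1.5466965, 1.3927636, 1.7341488,
                  2.0695929, 1.8553297, 2.0110065, 0.3013882),
  stringsAsFactors = FALSE
)

# Assumed rule: keep an annotation when its interval does not contain zero
sep_fit$excludes_zero <- sep_fit$lower_bound > 0 | sep_fit$upper_bound < 0
kept <- sep_fit$annota[sep_fit$excludes_zero]
print(kept)
```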

Risk genes

P: annotations from the paper

O: annotations from the above other sources

misA: BFs of mpc>=2 from the paper

misB: BFs of 2>mpc>=1 from the paper

dn ptv: BFs of denovo ptv from the paper

cc ptv: BFs of case-control ptv from the paper

novel_gene: number of identified genes that are not in the paper's list of 102 risk genes

The paper identified 102 risk genes at q-value < 0.1 when de novo and case-control data were used, and 65 genes when only de novo data were used.

Using the TADA-A model, we identified 47 genes when only de novo SNVs were considered, and 66 genes when the BFs of case-control PTVs from the paper were also included.
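How Bayes factors, the 0.05 prior and the q-value threshold fit together can be sketched as follows. This assumes the usual TADA-style Bayesian FDR (posterior from prior and BF, q-value as the running mean of 1 − posterior over ranked genes) and that BFs from different evidence types are combined by multiplication. The function names and the toy Bayes factors are hypothetical and are not taken from the analysis.

```r
# Posterior probability of being a risk gene, given a Bayes factor and a prior
posterior_from_bf <- function(bf, prior = 0.05) {
  prior * bf / (prior * bf + 1 - prior)
}

# Bayesian FDR q-value: running mean of (1 - posterior),
# with genes ranked from highest to lowest posterior
bayesian_qvalue <- function(pp) {
  ord <- order(pp, decreasing = TRUE)
  q_sorted <- cumsum(1 - pp[ord]) / seq_along(ord)
  q <- numeric(length(pp))
  q[ord] <- q_sorted
  names(q) <- names(pp)
  q
}

# Toy Bayes factors (made up for illustration)
bf_dn <- c(geneA = 100, geneB = 10, geneC = 1)  # de novo evidence
bf_cc <- c(geneA = 2,   geneB = 1,  geneC = 1)  # case-control PTV evidence

pp <- posterior_from_bf(bf_dn * bf_cc)  # combine evidence by multiplying BFs
q <- bayesian_qvalue(pp)
risk_genes <- names(q)[q < 0.1]
print(round(q, 4))
print(risk_genes)
```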

### compare the results of the paper with ours
mg
df[1:2,-8] ### details of TADA-A

A data.frame: 4 × 5

| Annota <fct> | Dataset <fct> | Model <fct> | other_info <fct> | n_risk_genes <dbl> |
|---|---|---|---|---|
| P | denovo SNVs/INDELs | TADA+(the paper) | no | 65 |
| P | denovo & case-control SNVs/INDELs | TADA+(the paper) | no | 102 |
| P | denovo SNVs | TADA-A | no | 47 |
| P | denovo SNVs | TADA-A | BFs of cc ptv | 66 |

A data.frame: 2 × 7

| | Annota <fct> | prior_for_estimate_RR <fct> | gene_for_estimate_RR <fct> | prior_for_cal_posterior <fct> | addtional_BFs_for_posterior <fct> | num_gene_identified <dbl> | novel_gene <dbl> |
|---|---|---|---|---|---|---|---|
| 1 | P | uniform | all | uniform | no | 47 | 10 |
| 2 | P | uniform | all | uniform | cc ptv | 66 | 14 |

Putting all selected annotations in TADA-A, we tested four scenarios as follows.

### four scenarios
df2

A data.frame: 4 × 8

| | Annota <fct> | prior_for_estimate_RR <fct> | gene_for_estimate_RR <fct> | prior_for_cal_posterior <fct> | addtional_BFs_for_posterior <fct> | num_gene_identified <dbl> | novel_gene <dbl> | scenario <int> |
|---|---|---|---|---|---|---|---|---|
| 3 | P+O | uniform | all | uniform | no | 60 | 23 | 3 |
| 4 | P+O | uniform | all | uniform | cc ptv | 85 | 33 | 4 |
| 5 | O | 1-FDR from the paper | top1000 | uniform | misA*misB*dn ptv*cc ptv | 217 | 123 | 5 |
| 6 | O | 1-FDR from the paper | top1000 | uniform | no | 29 | 13 | 6 |

We took the result of the third of these scenarios (labeled 5 in the table: 217 genes identified, 123 of them novel) for further analysis.

Enrichment analysis

  • GWAS
  • functional gene lists
readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/report/1-13_brain_gwas_EA.rds")

A data.frame: 6 × 4

| | traits <fct> | enrich <dbl> | pVal <dbl> | qVal <dbl> |
|---|---|---|---|---|
| 2 | Schizophrenia | 2.271464 | 0.001156453 | 0.006938721 |
| 4 | Attention deficit hyperactivity disorder | 3.711331 | 0.011461353 | 0.034384059 |
| 5 | Educational attainment | 1.774402 | 0.020703425 | 0.041406851 |
| 1 | Autism spectrum disorder | 2.376599 | 0.088642274 | 0.132963411 |
| 3 | Major depressive disorder | 0.000000 | 1.000000000 | 1.000000000 |
| 6 | Height adjusted BMI | 0.000000 | 1.000000000 | 1.000000000 |
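The qVal column of the GWAS table is numerically consistent with a Benjamini-Hochberg adjustment of the six p-values. The report does not name the correction method, so treat this as an inference from the numbers; the sketch below reproduces the column with base R's `p.adjust`.

```r
# p-values copied from the GWAS enrichment table
gwas <- data.frame(
  traits = c("Schizophrenia",
             "Attention deficit hyperactivity disorder",
             "Educational attainment",
             "Autism spectrum disorder",
             "Major depressive disorder",
             "Height adjusted BMI"),
  pVal = c(0.001156453, 0.011461353, 0.020703425, 0.088642274, 1, 1),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment across the six traits
gwas$qVal_BH <- p.adjust(gwas$pVal, method = "BH")
print(gwas)
```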
a

A data.frame: 10 × 8

| | Geneset <chr> | enrich <dbl> | pVal <dbl> | n_novel_g <int> | n_novel_g_geneset <int> | n_geneset <int> | n_total_genes <int> | qVal <dbl> |
|---|---|---|---|---|---|---|---|---|
| 1 | haploinsufficiency_including_all_without_ncscore | 2.6695 | 0.000003 | 123 | 26 | 1384 | 17478 | 0.000010000 |
| 2 | top5%_brainspan_exp.gene | 1.5099 | 0.142765 | 123 | 9 | 847 | 17478 | 0.142765000 |
| 3 | Petrovski_plosgen_RVIS_score_top5_pct | 3.3414 | 0.000004 | 123 | 19 | 808 | 17478 | 0.000010000 |
| 4 | 160210_GO_brain_genes | 1.3777 | 0.044349 | 123 | 28 | 2888 | 17478 | 0.049276667 |
| 5 | AutismKB | 5.6388 | 0.001972 | 123 | 5 | 126 | 17478 | 0.003286667 |
| 7 | sfrai_genes_high_confidence_new_170717 | 12.9180 | 0.010350 | 123 | 2 | 22 | 17478 | 0.014785714 |
| 9 | Darnell_Cell_2011_FMRP_targets | 4.3345 | 0.000000 | 123 | 23 | 754 | 17478 | 0.000000000 |
| 10 | DAWN_new_q0.05 | 7.0944 | 0.000000 | 123 | 34 | 681 | 17478 | 0.000000000 |
| 11 | Kenny_MP_2014_brain_functional | 3.4999 | 0.014426 | 123 | 5 | 203 | 17478 | 0.018032500 |
| 12 | Irimia_cell_2014_neuron_specific_alternative_splicing | 2.2705 | 0.000111 | 123 | 24 | 1502 | 17478 | 0.000222000 |
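The enrich column of the gene-set table equals the fold enrichment (n_novel_g_geneset / n_novel_g) / (n_geneset / n_total_genes). The test behind pVal is not stated in the report; for the sfrai_genes row, the value is consistent with a one-sided hypergeometric tail, so the sketch below uses `phyper` under that assumption. Counts are copied from three rows of the table.

```r
# Counts copied from three rows of the gene-set enrichment table
gs <- data.frame(
  Geneset = c("haploinsufficiency_including_all_without_ncscore",
              "AutismKB",
              "sfrai_genes_high_confidence_new_170717"),
  n_novel_g = c(123, 123, 123),
  n_novel_g_geneset = c(26, 5, 2),
  n_geneset = c(1384, 126, 22),
  n_total_genes = c(17478, 17478, 17478),
  stringsAsFactors = FALSE
)

# Fold enrichment: share of novel genes in the set / share of all genes in the set
gs$enrich <- with(gs, (n_novel_g_geneset / n_novel_g) / (n_geneset / n_total_genes))

# Assumed test: P(overlap >= observed) under the hypergeometric distribution
gs$pVal_hyper <- with(gs, phyper(n_novel_g_geneset - 1, n_geneset,
                                 n_total_genes - n_geneset, n_novel_g,
                                 lower.tail = FALSE))
print(gs[, c("Geneset", "enrich", "pVal_hyper")])
```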

Candidate variants

  • DNMs in risk genes that are annotated both with a DeepSEA category and as coding constraint sequences or synonymous mutations are candidate variants for further analysis.
readRDS("2021-01-14_1-FDR_s5_RBP_6788SNV_sf_other_annota_pre_prior_1-13_res.rds")

A data.frame: 19 × 9

| idx <int> | DeepSEA_RBP <chr> | genename <chr> | chr <chr> | start <int> | end <int> | REF <chr> | ALT <chr> | other_annota <chr> |
|---|---|---|---|---|---|---|---|---|
| 12 | elavl_Adult_brain.BA9.hg19 | TFAP4 | chr16 | 4312613 | 4312614 | G | A | ccr90, syn |
| 9 | ago_adult_brain.Cingulate.gyrus.hg19 | KDM6B | chr17 | 7755276 | 7755277 | G | A | ccr90, syn |
| 10 | elavl_Adult_brain.all_human_samples.hg19 | MEF2D | chr1 | 156452416 | 156452417 | G | A | ccr90, syn |
| 9 | ago_adult_brain.Cingulate.gyrus.hg19 | FBN1 | chr15 | 48766840 | 48766841 | T | C | ccr90, syn |
| 12 | elavl_Adult_brain.BA9.hg19 | MYO9B | chr19 | 17318043 | 17318044 | A | G | ccr90, syn |
| 8 | ago_adult_brain.BA4.hg19 | DENND4A | chr15 | 66048520 | 66048521 | A | G | ccr90, syn |
| 10 | elavl_Adult_brain.all_human_samples.hg19 | DENND4A | chr15 | 66048520 | 66048521 | A | G | ccr90, syn |
| 8 | ago_adult_brain.BA4.hg19 | SHANK3 | chr22 | 51117093 | 51117094 | C | G | ccr90, syn |
| 8 | ago_adult_brain.BA4.hg19 | SCN1A | chr2 | 166848858 | 166848859 | C | G | ccr90, syn |
| 11 | elavl_Adult_brain.BA9_Alzheimer.hg19 | SCN2A | chr2 | 166231425 | 166231426 | A | T | ccr90, syn |
| 8 | ago_adult_brain.BA4.hg19 | TRAF7 | chr16 | 2226079 | 2226080 | C | T | ccr90, syn |
| 12 | elavl_Adult_brain.BA9.hg19 | GRIN2B | chr12 | 13724821 | 13724822 | C | T | ccr90, syn |
| 12 | elavl_Adult_brain.BA9.hg19 | GRIN2B | chr12 | 13768559 | 13768560 | C | T | ccr90, syn |
| 12 | elavl_Adult_brain.BA9.hg19 | FOXP1 | chr3 | 71021816 | 71021817 | C | T | ccr90, syn |
| 8 | ago_adult_brain.BA4.hg19 | ZC3H14 | chr14 | 89034438 | 89034439 | C | T | ccr90, syn |
| 8 | ago_adult_brain.BA4.hg19 | MED13L | chr12 | 116424951 | 116424952 | C | T | ccr90, syn |
| 12 | elavl_Adult_brain.BA9.hg19 | SCN2A | chr2 | 166237635 | 166237636 | C | T | ccr90, syn |
| 12 | elavl_Adult_brain.BA9.hg19 | ZBTB20 | chr3 | 114070427 | 114070428 | G | T | ccr90, syn |
| 10 | elavl_Adult_brain.all_human_samples.hg19 | TBR1 | chr2 | 162274306 | 162274307 | G | T | ccr90, syn |
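The table has one row per variant and DeepSEA category, so a variant hit by several categories appears more than once (the DENND4A variant at chr15:66048520 is listed twice). A minimal sketch of collapsing to one row per variant, using three rows copied from the table; the object names `cand` and `per_variant` are hypothetical.

```r
# Three rows copied from the candidate-variant table
cand <- data.frame(
  DeepSEA_RBP = c("ago_adult_brain.BA4.hg19",
                  "elavl_Adult_brain.all_human_samples.hg19",
                  "ago_adult_brain.BA4.hg19"),
  genename = c("DENND4A", "DENND4A", "SHANK3"),
  chr = c("chr15", "chr15", "chr22"),
  start = c(66048520L, 66048520L, 51117093L),
  end = c(66048521L, 66048521L, 51117094L),
  REF = c("A", "A", "C"),
  ALT = c("G", "G", "G"),
  stringsAsFactors = FALSE
)

# One row per variant, with all matching DeepSEA categories joined by "; "
per_variant <- aggregate(
  DeepSEA_RBP ~ genename + chr + start + end + REF + ALT,
  data = cand,
  FUN = function(x) paste(sort(unique(x)), collapse = "; ")
)
print(per_variant)
```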