Overview of datasets
df3
| Group | org_count | org_percent | filtered_count | filtered_percent | annovar_obs | annovar_obs_percent | bg_af_calibr | burden_af | bg_bf_calibr | burden_bf |
|---|---|---|---|---|---|---|---|---|---|---|
| <chr> | <chr> | <chr> | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| PTV_Highest(pLI=0.995-1) | 366 | 5.13 | 289 | 4.69 | 150 | 3.14 | 52 | 2.8846 | 111 | 1.3514 |
| PTV_Middle(pLI=0.5-0.995) | 164 | 2.30 | 122 | 1.98 | 48 | 1.00 | 75 | 0.6400 | 36 | 1.3333 |
| PTV_Lowest(pLI=0-0.5) | 442 | 6.20 | 313 | 5.08 | 156 | 3.26 | 203 | 0.7685 | 116 | 1.3448 |
| Missense_Highest(MPC≥2) | 354 | 4.96 | 323 | 5.24 | 181 | 3.79 | 208 | 0.8702 | 134 | 1.3507 |
| Missense_Middle(MPC=1-2) | 894 | 12.54 | 789 | 12.80 | 584 | 12.22 | 677 | 0.8626 | 433 | 1.3487 |
| Missense_Lowest(MPC<1) | 3155 | 44.24 | 2804 | 45.48 | 2221 | 46.45 | 2774 | 0.8006 | 1646 | 1.3493 |
| Synonymous | 1756 | 24.62 | 1526 | 24.75 | 1441 | 30.14 | 1441 | 1.0000 | 1068 | 1.3493 |
| Total | 7131 | 100.00 | 6166 | 100.00 | 4781 | 100.00 | NA | NA | NA | NA |
org: the dataset from the paper
- 7131 de novo mutations (DNMs) from 6430 affected individuals
- protein-coding, autosomal
- synonymous + missense + PTV (‘splice_donor_variant’, ‘splice_acceptor_variant’, ‘stop_gained’, ‘frameshift_variant’)
filter: the dataset used in our model
- 6166 DNMs retained from the 7131 DNMs
- filter criteria
  - the variant lies within our coding windows
  - the ALT allele is an SNV
- sample size = 4059
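The two filter criteria above can be sketched as follows. This is only an illustration under stated assumptions: the `Variant` fields, the `windows` layout (chromosome mapped to 1-based inclusive intervals), and the toy coordinates are all hypothetical and are not taken from our actual pipeline.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int   # assumed 1-based position
    ref: str
    alt: str

BASES = {"A", "C", "G", "T"}

def is_snv(v):
    # An SNV here means a single reference base replaced by a different single base.
    return v.ref in BASES and v.alt in BASES and v.ref != v.alt

def in_windows(v, windows):
    # windows: dict mapping chromosome -> list of (start, end), assumed inclusive.
    return any(start <= v.pos <= end for start, end in windows.get(v.chrom, []))

def filter_dnms(variants, windows):
    # Keep only variants that satisfy both criteria.
    return [v for v in variants if is_snv(v) and in_windows(v, windows)]

# Toy data (hypothetical coordinates, for illustration only).
windows = {"chr1": [(100, 200)]}
dnms = [
    Variant("chr1", 150, "A", "G"),   # SNV inside a window: kept
    Variant("chr1", 150, "AT", "A"),  # deletion: dropped
    Variant("chr1", 500, "C", "T"),   # SNV outside every window: dropped
]
kept = filter_dnms(dnms, windows)
print(len(kept), "of", len(dnms), "DNMs kept")
```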
The paper's annotations were produced with VEP, whereas ours come from ANNOVAR, which may account for part of the difference in RR. Obtaining VEP annotations will take a few more days.
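bg_af_calibr (bg_bf_calibr): the expected number of background mutations after (before) calibration
$\text{burden\_af} = \frac{\text{annovar\_obs}}{\text{bg\_af\_calibr}}, \quad \text{burden\_bf} = \frac{\text{annovar\_obs}}{\text{bg\_bf\_calibr}}$
I am currently puzzled that burden_af is below 1 for every group except PTV_Highest and Synonymous.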
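As a quick arithmetic check of the burden definition, the sketch below recomputes both burden columns from the counts in the df3 table and lists the groups whose calibrated burden is below 1. The counts are copied from the table; the variable and function names are illustrative only.

```python
# (annovar_obs, bg_af_calibr, bg_bf_calibr) copied from the df3 table.
rows = {
    "PTV_Highest":      (150, 52, 111),
    "PTV_Middle":       (48, 75, 36),
    "PTV_Lowest":       (156, 203, 116),
    "Missense_Highest": (181, 208, 134),
    "Missense_Middle":  (584, 677, 433),
    "Missense_Lowest":  (2221, 2774, 1646),
    "Synonymous":       (1441, 1441, 1068),
}

def burden(observed, expected):
    # Burden = observed count divided by expected background count.
    return observed / expected

burdens = {
    group: (burden(obs, bg_af), burden(obs, bg_bf))
    for group, (obs, bg_af, bg_bf) in rows.items()
}

# Groups whose burden after calibration falls below 1.
below_one = [group for group, (b_af, _) in burdens.items() if b_af < 1]

for group, (b_af, b_bf) in burdens.items():
    print(f"{group:18s} burden_af={b_af:.4f} burden_bf={b_bf:.4f}")
print("burden_af < 1:", below_one)
```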
Mutation rate calibration
- calibrated using synonymous mutations (n_DNM = 6166, n_sample = 4059)
- the expected number of background synonymous mutations is 1068, whereas the observed number is 1441
- assuming the synonymous burden should equal 1, we took 1441/1068 (≈ 1.349) as the scaling factor for calibration
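The calibration step can be sketched numerically as follows. It uses only the synonymous counts stated above; the `calibrate` helper is a hypothetical name, and applying the same factor uniformly to other categories is an assumption about the procedure rather than something shown here.

```python
expected_syn = 1068   # background synonymous count before calibration
observed_syn = 1441   # observed synonymous count

# Scaling factor chosen so that the synonymous burden becomes exactly 1.
scale = observed_syn / expected_syn

def calibrate(expected_count, scale):
    # Multiply an expected background count by the scaling factor.
    return expected_count * scale

calibrated_syn = calibrate(expected_syn, scale)
burden_before = observed_syn / expected_syn
burden_after = observed_syn / calibrated_syn

print(f"scale = {scale:.4f}")
print(f"burden before = {burden_before:.4f}, after = {burden_after:.4f}")
```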
RR estimated separately and in the paper
- RR estimated separately by our model (the first 4 columns)
- RR reported in the paper (the last 2 columns)
rr2
| logRR | lower_bound | upper_bound | RR | annota | RR_paper | logRR_paper |
|---|---|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <dbl> | <chr> | <dbl> | <dbl> |
| 0.1557098 | -2.765621 | 3.077041 | 1.168487 | annovar_syn | 1.126390 | 0.119018 |
| 2.3474430 | 2.113385 | 2.581501 | 10.458792 | annovar_MPC>=2 | 22.149989 | 3.097837 |
| 1.4654350 | 1.200204 | 1.730666 | 4.329426 | annovar_1<=MPC<2 | 4.179995 | 1.430310 |
| 1.2471260 | 1.107593 | 1.386660 | 3.480326 | annovar_0<=MPC<1 | NA | NA |
| 3.1838690 | 2.913624 | 3.454113 | 24.139971 | annovar_pLI>=0.995 | 50.560417 | 3.923169 |
| 2.2840890 | 1.732580 | 2.835598 | 9.816739 | annovar_0.5<=pLI<0.995 | 6.836726 | 1.922309 |
| 1.7402540 | 1.241026 | 2.239483 | 5.698791 | annovar_0<=pLI<0.5 | NA | NA |
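The RR column in the table is the exponential of logRR. The sketch below reproduces that conversion for two rows and also exponentiates the bounds, assuming (this is not stated in the table) that lower_bound and upper_bound are on the log scale. The function name is illustrative.

```python
import math

def rr_with_interval(log_rr, lower, upper):
    # Convert a log relative risk and its (assumed log-scale) bounds to the RR scale.
    return math.exp(log_rr), math.exp(lower), math.exp(upper)

# Values copied from the rr2 table.
rr_mpc2, lo_mpc2, hi_mpc2 = rr_with_interval(2.3474430, 2.113385, 2.581501)
rr_syn, lo_syn, hi_syn = rr_with_interval(0.1557098, -2.765621, 3.077041)

print(f"MPC>=2: RR = {rr_mpc2:.3f} ({lo_mpc2:.3f}, {hi_mpc2:.3f})")
print(f"syn:    RR = {rr_syn:.3f} ({lo_syn:.3f}, {hi_syn:.3f})")
```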
Number of risk genes
- paper: 65 genes
- our model: 40 genes