Current progress and analysis
- See 1, 2 for details
Next steps
- 100-round simulation (not yet fully finished)
- Finish the CHD part
- Select candidate genes
ASD
simulation
parameters:
- 3 annotations, with simulated log-RR: Lof: ptv > 0.995 (log-RR = 3); union of spidex_low3 and spliceai, without ptv (log-RR = 2); mpc > 2 (log-RR = 1)
- pi = 0.05, sample size N = 6000, num_of_genes = 5000
- results are averaged over 100 simulation rounds
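A minimal sketch of what one simulation round could look like, assuming a TADA-style model: each gene is a risk gene with probability pi, and its de novo count for annotation j is Poisson with mean 2N * mu, multiplied by RR_j = exp(log-RR_j) only for risk genes. The mutation-rate range and the short annotation labels (`lof`, `splice`, `mpc2`) are hypothetical, not taken from the actual pipeline.

```r
# Sketch of one simulation round under an assumed TADA-style model.
# Mutation-rate range and annotation labels are hypothetical.
set.seed(1)

simulate_round <- function(n_genes = 5000, N = 6000, pi = 0.05,
                           log_rr = c(lof = 3, splice = 2, mpc2 = 1)) {
  n_annot <- length(log_rr)
  # Hypothetical per-gene, per-annotation mutation rates
  mu <- matrix(runif(n_genes * n_annot, 1e-6, 1e-5), nrow = n_genes,
               dimnames = list(NULL, names(log_rr)))
  # Latent indicator: each gene is a risk gene with probability pi
  z <- rbinom(n_genes, size = 1, prob = pi)
  # Expected count is 2N * mu, scaled by exp(log_rr[j]) only when z = 1
  lambda <- 2 * N * mu * exp(outer(z, log_rr))
  y <- matrix(rpois(length(lambda), lambda), nrow = n_genes,
              dimnames = dimnames(mu))
  list(z = z, mu = mu, y = y)
}

sim <- simulate_round()
colMeans(sim$y[sim$z == 1, ])  # mean counts per annotation in risk genes
```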
effect size
- sim_logRR lies within the confidence interval [lower_bound, upper_bound] for every annotation
- estimated pi = 0.055
`readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/rr_allinfo.dt.rds")`
| joint_estim_pi | joint_fix_pi0.05 | sim_logRR | annota | separate_fix_pi | upper_bound | lower_bound |
|---|---|---|---|---|---|---|
| 2.60486096 | 2.6759428 | 3.0 | Lof: ptv > 0.995 | 2.732214 | 3.7379131 | 1.72651467 |
| 1.63200522 | 1.6966532 | 2.0 | union of spidex_low3 and spliceai, without ptv | 1.835084 | 2.7070583 | 0.96310953 |
| 0.65504221 | 0.6385113 | 1.0 | mpc2 | 1.208715 | 2.6279897 | -0.21055962 |
| 0.05489308 | NA | 0.1 | pi_estimate | NA | 0.1691957 | -0.05940955 |
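The coverage claim can be checked against the averaged table directly. This sketch only reuses the numbers shown (it does not recompute the intervals) and adds a bias column, joint_estim_pi minus sim_logRR.

```r
# Averaged estimates copied from the table above (annotation rows only)
ci <- data.frame(
  annota         = c("Lof: ptv > 0.995",
                     "union of spidex_low3 and spliceai, without ptv",
                     "mpc2"),
  sim_logRR      = c(3.0, 2.0, 1.0),
  joint_estim_pi = c(2.60486096, 1.63200522, 0.65504221),
  lower_bound    = c(1.72651467, 0.96310953, -0.21055962),
  upper_bound    = c(3.7379131, 2.7070583, 2.6279897),
  stringsAsFactors = FALSE
)
# TRUE when the simulated log-RR lies inside [lower_bound, upper_bound]
ci$covered <- ci$sim_logRR >= ci$lower_bound & ci$sim_logRR <= ci$upper_bound
# Difference between the joint estimate and the simulated value
ci$bias <- ci$joint_estim_pi - ci$sim_logRR
ci[, c("annota", "covered", "bias")]
```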
FDR
The observed FDR is always below the FDR cutoff.
`all_g = rep(4896, 4)`
| all_g | risk_g | pi | FDR_cutoff | g_identified | g_false | FDR |
|---|---|---|---|---|---|---|
| 4896 | 239 | 0.04881536 | 0.2 | 2 | 0 | 0.0000000 |
| 4896 | 239 | 0.04881536 | 0.4 | 6 | 2 | 0.3333333 |
| 4896 | 239 | 0.04881536 | 0.6 | 19 | 10 | 0.5263158 |
| 4896 | 239 | 0.04881536 | 0.8 | 122 | 95 | 0.7786885 |
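The FDR column is g_false / g_identified; the snippet below recomputes it from the table and checks the cutoff claim. The helper `bayes_fdr_select` is hypothetical: it assumes genes are called by Bayesian FDR, ranking them by posterior probability of being a risk gene (PPA) and keeping the top genes whose running mean of (1 - PPA) stays at or below the cutoff. The actual calling rule in the pipeline may differ.

```r
# Counts copied from the FDR table above
fdr_tab <- data.frame(
  FDR_cutoff   = c(0.2, 0.4, 0.6, 0.8),
  g_identified = c(2, 6, 19, 122),
  g_false      = c(0, 2, 10, 95)
)
# Observed FDR: share of called genes that are not simulated risk genes
fdr_tab$FDR <- fdr_tab$g_false / fdr_tab$g_identified
fdr_tab$below_cutoff <- fdr_tab$FDR < fdr_tab$FDR_cutoff
print(fdr_tab)

# Hypothetical helper: Bayesian FDR calling from per-gene posterior
# probabilities of being a risk gene (ppa). Genes are ranked by ppa; the
# q-value of the top k genes is the mean of (1 - ppa) over those k genes.
bayes_fdr_select <- function(ppa, cutoff) {
  ord <- order(ppa, decreasing = TRUE)
  q <- cumsum(1 - ppa[ord]) / seq_along(ord)
  ord[q <= cutoff]  # indices of called genes, most confident first
}
```

Because (1 - PPA) is sorted ascending, the running mean never decreases, so the called set is always a prefix of the ranked list.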
effect size of 14 SNV annotations
https://yfu1116.github.io/project/2021-05-20-fix-pi-VS-estim-pi-effect-size-num-enrich/
effect size of frameshift
- Calibrate the frameshift mutation rate using the number of non-frameshift mutations observed in siblings
`glm(observed_nonfs_count ~ SNV_rate_2N, family = poisson(link = "log"), data = df)`
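A sketch of the calibration fit on simulated data, assuming one row per gene with an SNV-based expected count `SNV_rate_2N` and the observed sibling non-frameshift count. The generating coefficients (-1, 0.8), the covariate range, and the use of the fitted mean as the calibrated rate are assumptions for illustration only.

```r
set.seed(2)
# Hypothetical sibling data: one row per gene
n <- 5000
df <- data.frame(SNV_rate_2N = runif(n, 0, 2))
# Assumed generating model, matching the log-link Poisson GLM above
df$observed_nonfs_count <- rpois(n, exp(-1 + 0.8 * df$SNV_rate_2N))

fit <- glm(observed_nonfs_count ~ SNV_rate_2N,
           family = poisson(link = "log"), data = df)
coef(fit)  # should be close to (-1, 0.8)

# Fitted expected count per gene, hypothetically used as the calibrated rate
df$calibrated_rate <- predict(fit, type = "response")
```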
Effect size estimated with EM:
RR = 22.5, logRR = 3.1
Details: https://yfu1116.github.io/project/2021-05-24-nonfs-rate-fs-gama-Copy1/
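A minimal EM sketch for a single annotation, assuming a two-component Poisson mixture: null genes have counts ~ Poisson(mu), risk genes ~ Poisson(mu * RR), with mixing weight pi. Both M-step updates are closed form. This illustrates the technique only and is not the implementation behind RR = 22.5 (which may use priors or a different likelihood); `em_rr` and its starting values are hypothetical.

```r
# Hypothetical EM for a two-component Poisson mixture.
# y: per-gene counts; mu: expected count per gene under the null.
em_rr <- function(y, mu, pi = 0.1, rr = 2, max_iter = 1000, tol = 1e-8) {
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each gene is a risk gene
    l1 <- pi * dpois(y, mu * rr)
    l0 <- (1 - pi) * dpois(y, mu)
    post <- l1 / (l1 + l0)
    # M-step: closed-form updates for the mixing weight and relative risk
    pi_new <- mean(post)
    rr_new <- sum(post * y) / sum(post * mu)
    converged <- abs(pi_new - pi) < tol && abs(rr_new - rr) < tol
    pi <- pi_new
    rr <- rr_new
    if (converged) break
  }
  list(pi = pi, rr = rr, log_rr = log(rr))
}
```

For the reported estimate, `log(22.5)` is about 3.11, consistent with logRR = 3.1.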
novel genes
`desc = c("baseline (MPC + PTV)",`
| desc | pi_method1 | num_risk_g1 | pi_method2 | num_risk_g2 |
|---|---|---|---|---|
| baseline (MPC + PTV) | fix | 42 | estimate | 54 |
| all SNV annota | fix | 66 | estimate | 81 |
| all SNV annota + frameshift | -- | -- | estimate | 172 |
`comparsion = c("`snv (estim pi)` vs. `snv (fix pi)`",`
| comparsion | risk_g | novel_g | novel_g_enrich_terms | conclusion |
|---|---|---|---|---|
| `snv (estim pi)` vs. `snv (fix pi)` | 81 vs. 66 | 15 | Neurodevelopmental Disorders | `estimate pi` outperforms `fix pi` |
| `frameshift + snv` vs. `snv` | 172 vs. 81 | 93 | Neurodevelopmental Disorders | frameshift model works |
| `frameshift + snv` vs. `baseline` | 172 vs. 54 | 126 | Neurodevelopmental Disorders | other annota helps to identify ND genes |
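If novel_g is computed as the genes called by the first model but not by the second (an assumption), it need not equal the difference in counts. The hypothetical helper below reports both directions on made-up gene names.

```r
# Hypothetical helper, assuming novel_g = genes called by model A but not B
compare_models <- function(risk_a, risk_b) {
  c(n_a   = length(risk_a),
    n_b   = length(risk_b),
    novel = length(setdiff(risk_a, risk_b)),  # called only by A
    lost  = length(setdiff(risk_b, risk_a)))  # called only by B
}

# Toy gene lists with made-up names, only to illustrate the arithmetic
a <- c("G1", "G2", "G3", "G4", "G5")
b <- c("G1", "G2", "G6")
compare_models(a, b)
```

Under that assumption, the table implies that 172 - 93 = 79 genes are shared, so 81 - 79 = 2 genes called by `snv` alone are not called by `frameshift + snv`; likewise 54 - (172 - 126) = 8 baseline genes are not recovered.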
Enrichment analysis for novel ASD risk genes from frameshift + snv vs. baseline (merge genes from the same family)
GO

DisGeNET (gene-disease association)

Additional issue:
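Regarding overlapping genes:
Overlapping intervals were kept in the windows. For some genes, such as the UGT family, most of the exonic region is shared, and the model picks up every one of these overlapping genes. This dilutes the enrichment of the relevant category (Neurodevelopmental Disorders) in the enrichment analysis, so only one gene per overlapping group is kept.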
`UGT1A3`
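One way to keep a single gene per group of overlapping genes, sketched under the assumptions that genes are given as intervals on one chromosome and that the first gene by start position is kept. Gene names and coordinates are made up, and the representative rule is a hypothetical choice. Overlaps are merged transitively: if A overlaps B and B overlaps C, all three fall into one cluster.

```r
# Hypothetical gene intervals on one chromosome (made-up coordinates)
genes <- data.frame(
  gene  = c("GENE_A1", "GENE_A2", "GENE_A3", "GENE_B"),
  start = c(100, 120, 130, 500),
  end   = c(200, 210, 220, 600),
  stringsAsFactors = FALSE
)

collapse_overlaps <- function(genes) {
  genes <- genes[order(genes$start), ]
  cluster <- integer(nrow(genes))
  current <- 1L
  cluster_end <- genes$end[1]
  cluster[1] <- current
  for (i in seq_len(nrow(genes))[-1]) {
    if (genes$start[i] <= cluster_end) {
      # Overlaps the running cluster: extend its right edge
      cluster_end <- max(cluster_end, genes$end[i])
    } else {
      # Gap found: start a new cluster
      current <- current + 1L
      cluster_end <- genes$end[i]
    }
    cluster[i] <- current
  }
  genes$cluster <- cluster
  genes[!duplicated(genes$cluster), ]  # first gene of each cluster
}

collapse_overlaps(genes)
```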
potentially damaging mutations
candidates:
CHD
effect size of all SNV annotations
- ptv 0-0.05 behaves abnormally in the joint estimation; remove it and redo the joint estimation
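A sketch of redoing the joint estimation without one annotation, assuming the Poisson mixture extended to several annotations: a gene's log-likelihood is the sum of per-annotation Poisson log-densities, pi is shared, and each annotation has its own RR. The function `joint_em` and the column name `ptv_0_0.05` are hypothetical; dropping the annotation is just subsetting columns before the fit.

```r
# Hypothetical joint EM. y, mu: genes x annotations matrices.
joint_em <- function(y, mu, pi = 0.1, max_iter = 1000, tol = 1e-8) {
  rr <- rep(2, ncol(y))  # starting RR for every annotation
  n <- nrow(y)
  for (iter in seq_len(max_iter)) {
    # Per-gene log-likelihood under each component, summed over annotations
    mu_risk <- sweep(mu, 2, rr, `*`)
    ll1 <- rowSums(matrix(dpois(y, mu_risk, log = TRUE), nrow = n))
    ll0 <- rowSums(matrix(dpois(y, mu, log = TRUE), nrow = n))
    # E-step in log space to avoid underflow
    a <- log(pi) + ll1
    b <- log(1 - pi) + ll0
    m <- pmax(a, b)
    post <- exp(a - m) / (exp(a - m) + exp(b - m))
    # M-step: shared pi, one closed-form RR per annotation
    pi_new <- mean(post)
    rr_new <- colSums(post * y) / colSums(post * mu)
    done <- abs(pi_new - pi) < tol && max(abs(rr_new - rr)) < tol
    pi <- pi_new
    rr <- rr_new
    if (done) break
  }
  list(pi = pi, log_rr = setNames(log(rr), colnames(y)))
}
```

Usage: `keep <- colnames(y) != "ptv_0_0.05"`, then `joint_em(y[, keep, drop = FALSE], mu[, keep, drop = FALSE])`.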