Notes:
Data structure array
EM rectify
Optim
Annotation 占总数的比值,是否过于conserved, overlap heatmap
Estimation of indel rate
SNV_rate: 2N * summation over all possible mutation rates of bases in window i
Y ~ possion($\mu$)
Y ~ possion($e^{\beta_0 + \beta_1 * SNV_rate + \beta_2 * GC_content}$)
log(obs indel count) ~ $\beta_0$ + $\beta_1$ * SNV_rate + $\beta_2$ * GC_content
$\mu$ indel rates over 2N individuals.
1 | library(data.table) |
17478
| genename | GC_content | SNV_rate | indel_count | length | |
|---|---|---|---|---|---|
| <fct> | <dbl> | <dbl> | <int> | <int> | |
| 1 | OR4F5 | 0.4281050000000000133049 | 0.00001156033999999999950243 | 0 | 918 |
| 2 | SAMD11 | 0.6826832583463340320407 | 0.00007795386489999999845429 | 0 | 3205 |
| 3 | NOC2L | 0.5827767403863509665385 | 0.00009993501699999999762520 | 0 | 5539 |
1 | N=rep(6430,nrow(df)) |
Call:
glm(formula = indel_count ~ SNV_2N + GC_content, family = poisson(link = "log"),
data = df)
Deviance Residuals:
Min 1Q Median
-5.3747723046756075149 -0.2721079831257477433 -0.2543498238112971155
3Q Max
-0.2414373113538021975 7.6654372768663812110
Coefficients:
Estimate Std. Error
(Intercept) -3.55875256114883642411 0.24395625412901122964
SNV_2N 0.32595843840840044159 0.01404195081335555921
GC_content -0.26301045582160398340 0.47999476989334100008
z value Pr(>|z|)
(Intercept) -14.5876699999999992485 < 2e-16 ***
SNV_2N 23.2131900000000008788 < 2e-16 ***
GC_content -0.5479399999999999826 0.58373
---
Signif. codes:
0 ‘***’ 0.001000000000000000020817 ‘**’ 0.01000000000000000020817 ‘*’
0.05000000000000000277556 ‘.’ 0.1000000000000000055511 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 4441.4194643153414290 on 17477 degrees of freedom
Residual deviance: 4263.1231826492557957 on 17475 degrees of freedom
AIC: 5387.1954605381806687
Number of Fisher Scoring iterations: 14frameshift rate
$\frac{the , number , of ,frameshift}{the , number , of , indel}$
in unaffected siblings.
fs_fraction = 0.744186
Estimation of frameshift effect size
- EM
1 | setwd("/storage11_7T/fuy/TADA-A/cell_WES/DNM/indel_rate/test") |
1 | readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/indel_rate/indel_EM.rds") |
| n_genes | gama | log_gm | desc |
|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <fct> |
| 136 | 3.742353998933570125018 | 1.319714824909679951048 | update gama |
| 435 | 36.893193501840499948230 | 3.608027076087319873210 | update gama and pi0 |
Q:
difference between FDR and FDR2
若更新pi0,则初始pi0几乎无影响
- validate gamma by simulation
1 | mg2[1:n_risk_genes,]$sim_fs_ct = rpois(n_risk_genes, |
1 | 100 rounds |
SNV model
1 | pi0 = rep(0.05,n_genes) |
New annotations
- spliceAI
- DS_AG Delta score (acceptor gain)
- DS_AL Delta score (acceptor loss)
- DS_DG Delta score (donor gain)
- DS_DL Delta score (donor loss)
- PPI
- Interactome INSIDER (probably damaging) & MPC > 2
- trap score (no responding)
- deepsea clustering
1 | setwd("/storage11_7T/fuy/TADA-A/annotation/spliceai") |
| logRR | lower_bound | upper_bound | annota |
|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <chr> |
| 2.0294776266614920068321 | 1.1561569801227173925895 | 2.902798273200266621075 | DS_AG (acceptor gain) |
| -0.9999999671218416930074 | -12.3062343578646924413533 | 10.306234423621010165562 | DS_AL (acceptor loss) |
| 2.0682421448743815162175 | 1.2534159904270909535740 | 2.883068299321672078861 | DS_DG (donor gain) |
| 0.3651119660030998637090 | -0.5284987397323961388906 | 1.258722671738595977331 | DS_DL (donor loss) |
1 | readRDS("/storage11_7T/fuy/TADA-A/annotation/PPI/PPI_rr.rds") |
| logRR | lower_bound | upper_bound | annota |
|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <chr> |
| 2.939515000000000100044 | 2.570340999999999986869 | 3.308688000000000073442 | Interactome INSIDER (probably damaging) & MPC > 2 |