Statistics of previous simulated parameters
N=6000, logRR=3,2,1, prior=0.05
100 rounds of simulation
1 readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/org.rr.summary.rds")
ptv>0.995 splice-ptv mpc>2 estim pr
Min. :-3.046 Min. :-1.305 Min. :-5.4747 Min. :0.005745
1st Qu.: 2.297 1st Qu.: 1.451 1st Qu.: 0.5679 1st Qu.:0.039856
Median : 2.923 Median : 1.857 Median : 0.9348 Median :0.062510
Mean : 2.720 Mean : 1.771 Mean : 0.6039 Mean :0.092598
3rd Qu.: 3.329 3rd Qu.: 2.141 3rd Qu.: 1.2976 3rd Qu.:0.086761
Max. : 4.440 Max. : 3.181 Max. : 2.0051 Max. :0.985894 Abnormal values may caused by a small number of risk genes (risk g count) with that annotation.
1 2 3 4 5 library(readxl) f=read_xlsx("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/sub/res/sim_res.xlsx") colnames(f)[8]="overlp g" f=f[order(f$mpc2),] head(f,3)
A tibble: 3 × 12
rd annota obs mut count exp mut count logBurden risk g count annota2 overlp g ptv0.995 spidex_spliceai-ptv mpc2 estim pr
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
dt17 mpc2 62 60.67249 0.02164399 6 ptv995 0 2.592528 0.8751324 -5.474727 0.07493411
dt17 ptv995 20 10.28819 0.66473607 6 spidex_spliceai-ptv 0 2.592528 0.8751324 -5.474727 0.07493411
dt17 spidex_spliceai-ptv 64 57.85016 0.10102689 13 mpc2 1 2.592528 0.8751324 -5.474727 0.07493411
convergence to true parameters at larger sample sizes and larger effect sizes
100 rounds of simulation
sample size=20000
logRR = 10, 9, 8, prior = 0.05
initial values for optim: c(runif(3,3,15), runif(1,0,1))
statistics for estimated parameters 1 readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/0818sim/org.rr.summary.rds")
ptv>0.995 splice-ptv mpc>2 estim pi
Min. : 9.973 Min. :8.992 Min. :7.950 Min. :0.03650
1st Qu.: 9.995 1st Qu.:8.997 1st Qu.:7.984 1st Qu.:0.04140
Median :10.000 Median :8.999 Median :7.994 Median :0.04337
Mean :10.000 Mean :9.000 Mean :7.996 Mean :0.04343
3rd Qu.:10.006 3rd Qu.:9.003 3rd Qu.:8.006 3rd Qu.:0.04583
Max. :10.019 Max. :9.013 Max. :8.030 Max. :0.05029 no abnormal value; a larger number of risk genes with each annotation, e.g.,
1 2 f=read_xlsx("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/0818sim/sub/res/0818sim_res.xlsx") head(f,3)
A tibble: 3 × 12
rd annota obs mut count exp mut count logBurden risk g count annota2 overlp ptv0.995 spidex_spliceai-ptv mpc2 estim pr
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
dt1 mpc2 6761227 202.24164 10.417252 75 ptv995 16 9.986159 8.993375 7.99933 0.04063152
dt1 ptv995 31183 34.29395 6.812659 20 spidex_spliceai-ptv 20 9.986159 8.993375 7.99933 0.04063152
dt1 spidex_spliceai-ptv 6808209 192.83386 10.471811 235 mpc2 73 9.986159 8.993375 7.99933 0.04063152
fdr 1 2 readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/0818sim/org.fdr.check.res.rds") # readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/0818sim/summary.fdr.check.res.rds")
A data.frame: 10 × 2
cutoff.lst fdr.average
<dbl> <dbl>
0.00 0.00000000
0.02 0.01787403
0.05 0.04781413
0.08 0.07768562
0.10 0.09793087
0.15 0.14818145
0.20 0.19850607
0.30 0.29898883
0.50 0.49947964
1.00 0.96111469
FDR calibration under feature selection
10 features, logRR= c(10,9,8,5,4,3,-8,-6,-3,-1)
select annotations 1:6 to perfrom joint estim
100 rounds of simulation
1 readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/10annota_sim/org.sig.rr.summary.rds")
ptv>0.995 splice-ptv mpc>2 annota4
Min. : 9.585 Min. :8.797 Min. :7.139 Min. :4.247
1st Qu.: 9.964 1st Qu.:8.940 1st Qu.:7.860 1st Qu.:4.563
Median :10.015 Median :8.992 Median :8.025 Median :4.689
Mean :10.006 Mean :8.991 Mean :8.012 Mean :4.699
3rd Qu.:10.060 3rd Qu.:9.049 3rd Qu.:8.217 3rd Qu.:4.879
Max. :10.323 Max. :9.120 Max. :8.403 Max. :5.219
annota5 annota6 estim pi
Min. :3.034 Min. :-0.3537 Min. :0.02795
1st Qu.:3.263 1st Qu.: 1.9511 1st Qu.:0.03261
Median :3.493 Median : 2.6265 Median :0.03416
Mean :3.507 Mean : 2.2810 Mean :0.03401
3rd Qu.:3.757 3rd Qu.: 2.9690 3rd Qu.:0.03571
Max. :4.064 Max. : 3.1343 Max. :0.04001 1 readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/10annota_sim/sig.org.fdr.check.res.rds")
A data.frame: 12 × 2
cutoff.lst fdr.average
<dbl> <dbl>
0.00 0.00000000
0.02 0.01303481
0.05 0.04314728
0.08 0.07325138
0.10 0.09291694
0.15 0.14364696
0.20 0.19338758
0.30 0.29221063
0.50 0.49273081
0.70 0.69554809
0.90 0.89860055
1.00 0.98029115
CHD res, PTV 0-0.5, estim logRR in the joint model = -8 ptv 0-0.05 observed mutation=70, expected =49, burden=1.43, 但这些变异所在的基因FDR都大于0.98 (using all SNV annotations),因此在joint estim中effect size≈0
Test if feature selection inflates the FDR
It seems no inflation occured.
1 2 3 4 5 6 7 8 9 type=c("CHD select features","ASD select features","30 annota without selection") num_risk_g_snv = c(19,18,20) num_risk_g_snv_indel = c(33,31,33) frameshift_logRR=c(3.97,3.96,3.95) estim_prior = c(0.022,0.022,0.023) gene_desc=c("baseline","1 diff gene","2 diff genes") df=data.frame(type,num_risk_g_snv,num_risk_g_snv_indel,estim_prior,gene_desc) saveRDS(df,"/storage11_7T/fuy/TADA-A/CHD/cross_valid_CHD_annota.rds") df
A data.frame: 3 × 5
type num_risk_g_snv num_risk_g_snv_indel estim_prior gene_desc
<fct> <dbl> <dbl> <dbl> <fct>
CHD select features 19 33 0.022 baseline
ASD select features 18 31 0.022 1 diff gene
30 annota without selection 20 33 0.023 2 diff genes
RBP_GWAS annotation results of CHD|ASD|DD: https://yfu1116.github.io/project/2021-08-26-all-RR/
effect size of the union of significant catagories > 1, so as the union of non-significant ones
numbers of risk genes 1 readRDS("/storage11_7T/fuy/TADA-A/num_of_risk_g_ASD_CHD_DD.rds")
A data.frame: 3 × 7
type sample_size baseline(MPC+PTV) all_snv estim_pi frameshift_logRR frameshift+all snv
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
CHD 2645 12 22 0.021 3.97 33
ASD 6430 54 86 0.168 2.58 170
DD 31058 391 495 0.101 2.77 507
Assess new genes literature search (CHD) https://github.com/yfu1116/project/blob/master/files/0906indel%2Bsnv_PPI_BP_Disease.xlsx
network analysis Functional enrichments in 33 CHD risk genes’ network:
1 2 3 4 setwd("/storage11_7T/fuy/TADA-A/CHD/33string") library(data.table) f=fread("33BP.tsv")[,-7] head(f)
A data.table: 6 × 7
#term ID term description observed gene count background gene count strength false discovery rate matching proteins in your network (labels)
<chr> <chr> <int> <int> <dbl> <dbl> <chr>
GO:0007507 heart development 10 522 1.06 0.00017 RAF1,SMAD2,NOTCH1,TSC1,ROCK2,PTPN11,CTNNB1,PTEN,SOS1,CHD7
GO:0010628 positive regulation of gene expression 17 2337 0.63 0.00017 IAPP,RAF1,FLT4,SMAD2,NOTCH1,MYRF,NAA15,KMT2D,ROCK2,PTPN11,CTNNB1,ACTB,RPL5,PTEN,AKAP12,CHD7,NSD1
GO:0043410 positive regulation of MAPK cascade 10 543 1.04 0.00017 IAPP,RAF1,FLT4,NOTCH1,ROCK2,PTPN11,CTNNB1,PTEN,AKAP12,GPBAR1
GO:0060322 head development 11 788 0.92 0.00017 RAF1,NOTCH1,TSC1,DSCAML1,PTPN11,CTNNB1,ACTB,PTEN,SOS1,CHD7,RBFOX2
GO:0070374 positive regulation of ERK1 and ERK2 cascade 7 209 1.30 0.00017 IAPP,FLT4,NOTCH1,PTPN11,PTEN,AKAP12,GPBAR1
GO:0070848 response to growth factor 10 524 1.05 0.00017 RAF1,FLT4,SMAD2,NOTCH1,ROCK2,PTPN11,CTNNB1,PTEN,SOS1,RBFOX2
1 2 k=fread("33diseases.tsv")[,-7] k
A data.table: 24 × 7
#term ID term description observed gene count background gene count strength false discovery rate matching proteins in your network (labels)
<chr> <chr> <int> <int> <dbl> <dbl> <chr>
DOID:3490 Noonan syndrome 5 24 2.09 4.92e-06 LZTR1,RAF1,RASA2,PTPN11,SOS1
DOID:0080690 RASopathy 5 30 1.99 6.67e-06 LZTR1,RAF1,RASA2,PTPN11,SOS1
DOID:0060085 Organ system benign neoplasm 7 215 1.29 1.00e-04 LZTR1,TSC1,CTNNB1,ACTB,PTEN,SOS1,CYP21A2
DOID:0060115 Nervous system benign neoplasm 4 29 1.91 2.30e-04 LZTR1,TSC1,CTNNB1,PTEN
DOID:14291 Noonan syndrome with multiple lentigines 3 6 2.47 2.60e-04 RAF1,PTPN11,SOS1
DOID:305 Carcinoma 6 269 1.12 3.60e-03 RAF1,NOTCH1,TSC1,CTNNB1,ACTB,PTEN
DOID:0060090 Central nervous system benign neoplasm 3 20 1.95 4.10e-03 TSC1,CTNNB1,PTEN
DOID:4079 Heart valve disease 3 23 1.89 5.30e-03 NOTCH1,PTPN11,SOS1
DOID:12270 Coloboma 4 84 1.45 6.10e-03 CRIM1,KMT2D,ACTB,CHD7
DOID:0050686 Organ system cancer 8 722 0.82 6.90e-03 RAF1,NOTCH1,TSC1,KMT2D,PTPN11,CTNNB1,ACTB,PTEN
DOID:162 Cancer 9 951 0.75 6.90e-03 RAF1,NOTCH1,TSC1,KMT2D,PTPN11,CTNNB1,ACTB,PTEN,NSD1
DOID:18 Urinary system disease 6 329 1.03 6.90e-03 RAF1,SMAD2,TSC1,KMT2D,CTNNB1,PTEN
DOID:3996 Urinary system cancer 4 89 1.43 6.90e-03 RAF1,KMT2D,CTNNB1,PTEN
DOID:863 Nervous system disease 13 2157 0.55 7.00e-03 LZTR1,NOTCH1,CRIM1,NAA15,TSC1,KMT2D,KIAA0196,COPB2,CTNNB1,ACTB,PTEN,CHD7,GAN
DOID:6420 Pulmonary valve stenosis 2 3 2.60 7.40e-03 PTPN11,SOS1
DOID:3165 Skin benign neoplasm 3 33 1.73 7.60e-03 CTNNB1,ACTB,SOS1
DOID:0050469 Costello syndrome 2 4 2.47 9.30e-03 PTPN11,SOS1
DOID:11054 Urinary bladder cancer 3 40 1.65 1.06e-02 RAF1,KMT2D,PTEN
DOID:1749 Squamous cell carcinoma 3 45 1.60 1.34e-02 RAF1,NOTCH1,PTEN
DOID:4159 Skin cancer 3 51 1.54 1.83e-02 RAF1,CTNNB1,ACTB
DOID:3118 Hepatobiliary disease 4 153 1.19 2.25e-02 NOTCH1,TSC1,CTNNB1,PTEN
DOID:0060037 Developmental disorder of mental health 6 512 0.84 2.83e-02 NAA15,KMT2D,KIAA0196,CTNNB1,ACTB,PTEN
DOID:0080355 Hepatobiliary system cancer 3 63 1.45 2.83e-02 TSC1,CTNNB1,PTEN
DOID:5520 Head and neck squamous cell carcinoma 2 10 2.07 2.83e-02 NOTCH1,PTEN
76 CHD risk genes (FDR < 0.3) were used to perform String network interaction, with minimum required interaction score = 0.9.
yellow: the unknown CHD genes