Statistics of previous simulated parameters

  • N=6000, logRR=3,2,1, prior=0.05
  • 100 rounds of simulation
1
readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/org.rr.summary.rds")
  ptv>0.995        splice-ptv          mpc>2           estim pr       
Min.   :-3.046   Min.   :-1.305   Min.   :-5.4747   Min.   :0.005745  
1st Qu.: 2.297   1st Qu.: 1.451   1st Qu.: 0.5679   1st Qu.:0.039856  
Median : 2.923   Median : 1.857   Median : 0.9348   Median :0.062510  
Mean   : 2.720   Mean   : 1.771   Mean   : 0.6039   Mean   :0.092598  
3rd Qu.: 3.329   3rd Qu.: 2.141   3rd Qu.: 1.2976   3rd Qu.:0.086761  
Max.   : 4.440   Max.   : 3.181   Max.   : 2.0051   Max.   :0.985894  

Abnormal values may caused by a small number of risk genes (risk g count) with that annotation.

1
2
3
4
5
library(readxl)
f=read_xlsx("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/sub/res/sim_res.xlsx")
colnames(f)[8]="overlp g"
f=f[order(f$mpc2),]
head(f,3)
A tibble: 3 × 12
rdannotaobs mut countexp mut countlogBurdenrisk g countannota2overlp gptv0.995spidex_spliceai-ptvmpc2estim pr
<chr><chr><dbl><dbl><dbl><dbl><chr><dbl><dbl><dbl><dbl><dbl>
dt17mpc2 6260.672490.02164399 6ptv995 02.5925280.8751324-5.4747270.07493411
dt17ptv995 2010.288190.66473607 6spidex_spliceai-ptv02.5925280.8751324-5.4747270.07493411
dt17spidex_spliceai-ptv6457.850160.1010268913mpc2 12.5925280.8751324-5.4747270.07493411

convergence to true parameters at larger sample sizes and larger effect sizes

  • 100 rounds of simulation

  • sample size=20000

  • logRR = 10, 9, 8, prior = 0.05

  • initial values for optim: c(runif(3,3,15), runif(1,0,1))

statistics for estimated parameters

1
readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/0818sim/org.rr.summary.rds")
  ptv>0.995        splice-ptv         mpc>2         estim pi      
Min.   : 9.973   Min.   :8.992   Min.   :7.950   Min.   :0.03650  
1st Qu.: 9.995   1st Qu.:8.997   1st Qu.:7.984   1st Qu.:0.04140  
Median :10.000   Median :8.999   Median :7.994   Median :0.04337  
Mean   :10.000   Mean   :9.000   Mean   :7.996   Mean   :0.04343  
3rd Qu.:10.006   3rd Qu.:9.003   3rd Qu.:8.006   3rd Qu.:0.04583  
Max.   :10.019   Max.   :9.013   Max.   :8.030   Max.   :0.05029  

no abnormal value; a larger number of risk genes with each annotation, e.g.,

1
2
f=read_xlsx("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/0818sim/sub/res/0818sim_res.xlsx")
head(f,3)
A tibble: 3 × 12
rdannotaobs mut countexp mut countlogBurdenrisk g countannota2overlpptv0.995spidex_spliceai-ptvmpc2estim pr
<chr><chr><dbl><dbl><dbl><dbl><chr><dbl><dbl><dbl><dbl><dbl>
dt1mpc2 6761227202.2416410.417252 75ptv995 169.9861598.9933757.999330.04063152
dt1ptv995 31183 34.29395 6.812659 20spidex_spliceai-ptv209.9861598.9933757.999330.04063152
dt1spidex_spliceai-ptv6808209192.8338610.471811235mpc2 739.9861598.9933757.999330.04063152

fdr

1
2
readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/0818sim/org.fdr.check.res.rds")
# readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/0818sim/summary.fdr.check.res.rds")
A data.frame: 10 × 2
cutoff.lstfdr.average
<dbl><dbl>
0.000.00000000
0.020.01787403
0.050.04781413
0.080.07768562
0.100.09793087
0.150.14818145
0.200.19850607
0.300.29898883
0.500.49947964
1.000.96111469

FDR calibration under feature selection

  • 10 features, logRR= c(10,9,8,5,4,3,-8,-6,-3,-1)
  • select annotations 1:6 to perfrom joint estim
  • 100 rounds of simulation
1
readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/10annota_sim/org.sig.rr.summary.rds")
  ptv>0.995        splice-ptv         mpc>2         annota4     
Min.   : 9.585   Min.   :8.797   Min.   :7.139   Min.   :4.247  
1st Qu.: 9.964   1st Qu.:8.940   1st Qu.:7.860   1st Qu.:4.563  
Median :10.015   Median :8.992   Median :8.025   Median :4.689  
Mean   :10.006   Mean   :8.991   Mean   :8.012   Mean   :4.699  
3rd Qu.:10.060   3rd Qu.:9.049   3rd Qu.:8.217   3rd Qu.:4.879  
Max.   :10.323   Max.   :9.120   Max.   :8.403   Max.   :5.219  
   annota5         annota6           estim pi      
Min.   :3.034   Min.   :-0.3537   Min.   :0.02795  
1st Qu.:3.263   1st Qu.: 1.9511   1st Qu.:0.03261  
Median :3.493   Median : 2.6265   Median :0.03416  
Mean   :3.507   Mean   : 2.2810   Mean   :0.03401  
3rd Qu.:3.757   3rd Qu.: 2.9690   3rd Qu.:0.03571  
Max.   :4.064   Max.   : 3.1343   Max.   :0.04001  
1
readRDS("/storage11_7T/fuy/TADA-A/cell_WES/DNM/simulation/10annota_sim/sig.org.fdr.check.res.rds")
A data.frame: 12 × 2
cutoff.lstfdr.average
<dbl><dbl>
0.000.00000000
0.020.01303481
0.050.04314728
0.080.07325138
0.100.09291694
0.150.14364696
0.200.19338758
0.300.29221063
0.500.49273081
0.700.69554809
0.900.89860055
1.000.98029115

CHD res, PTV 0-0.5, estim logRR in the joint model = -8

ptv 0-0.05 observed mutation=70, expected =49, burden=1.43, 但这些变异所在的基因FDR都大于0.98 (using all SNV annotations),因此在joint estim中effect size≈0

Test if feature selection inflates the FDR

  • It seems no inflation occured.
1
2
3
4
5
6
7
8
9
type=c("CHD select features","ASD select features","30 annota without selection")
num_risk_g_snv = c(19,18,20)
num_risk_g_snv_indel = c(33,31,33)
frameshift_logRR=c(3.97,3.96,3.95)
estim_prior = c(0.022,0.022,0.023)
gene_desc=c("baseline","1 diff gene","2 diff genes")
df=data.frame(type,num_risk_g_snv,num_risk_g_snv_indel,estim_prior,gene_desc)
saveRDS(df,"/storage11_7T/fuy/TADA-A/CHD/cross_valid_CHD_annota.rds")
df
A data.frame: 3 × 5
typenum_risk_g_snvnum_risk_g_snv_indelestim_priorgene_desc
<fct><dbl><dbl><dbl><fct>
CHD select features 19330.022baseline
ASD select features 18310.0221 diff gene
30 annota without selection20330.0232 diff genes

RBP_GWAS annotation

results of CHD|ASD|DD: https://yfu1116.github.io/project/2021-08-26-all-RR/

  • effect size of the union of significant catagories > 1, so as the union of non-significant ones

numbers of risk genes

1
readRDS("/storage11_7T/fuy/TADA-A/num_of_risk_g_ASD_CHD_DD.rds")
A data.frame: 3 × 7
typesample_sizebaseline(MPC+PTV)all_snvestim_piframeshift_logRRframeshift+all snv
<fct><dbl><dbl><dbl><dbl><dbl><dbl>
CHD 2645 12 220.0213.97 33
ASD 6430 54 860.1682.58170
DD 310583914950.1012.77507

Assess new genes

literature search (CHD)

https://github.com/yfu1116/project/blob/master/files/0906indel%2Bsnv_PPI_BP_Disease.xlsx

network analysis

Functional enrichments in 33 CHD risk genes’ network:

1
2
3
4
setwd("/storage11_7T/fuy/TADA-A/CHD/33string")
library(data.table)
f=fread("33BP.tsv")[,-7]
head(f)
A data.table: 6 × 7
#term IDterm descriptionobserved gene countbackground gene countstrengthfalse discovery ratematching proteins in your network (labels)
<chr><chr><int><int><dbl><dbl><chr>
GO:0007507heart development 10 5221.060.00017RAF1,SMAD2,NOTCH1,TSC1,ROCK2,PTPN11,CTNNB1,PTEN,SOS1,CHD7
GO:0010628positive regulation of gene expression 1723370.630.00017IAPP,RAF1,FLT4,SMAD2,NOTCH1,MYRF,NAA15,KMT2D,ROCK2,PTPN11,CTNNB1,ACTB,RPL5,PTEN,AKAP12,CHD7,NSD1
GO:0043410positive regulation of MAPK cascade 10 5431.040.00017IAPP,RAF1,FLT4,NOTCH1,ROCK2,PTPN11,CTNNB1,PTEN,AKAP12,GPBAR1
GO:0060322head development 11 7880.920.00017RAF1,NOTCH1,TSC1,DSCAML1,PTPN11,CTNNB1,ACTB,PTEN,SOS1,CHD7,RBFOX2
GO:0070374positive regulation of ERK1 and ERK2 cascade 7 2091.300.00017IAPP,FLT4,NOTCH1,PTPN11,PTEN,AKAP12,GPBAR1
GO:0070848response to growth factor 10 5241.050.00017RAF1,FLT4,SMAD2,NOTCH1,ROCK2,PTPN11,CTNNB1,PTEN,SOS1,RBFOX2
1
2
k=fread("33diseases.tsv")[,-7]
k
A data.table: 24 × 7
#term IDterm descriptionobserved gene countbackground gene countstrengthfalse discovery ratematching proteins in your network (labels)
<chr><chr><int><int><dbl><dbl><chr>
DOID:3490 Noonan syndrome 5 242.094.92e-06LZTR1,RAF1,RASA2,PTPN11,SOS1
DOID:0080690RASopathy 5 301.996.67e-06LZTR1,RAF1,RASA2,PTPN11,SOS1
DOID:0060085Organ system benign neoplasm 7 2151.291.00e-04LZTR1,TSC1,CTNNB1,ACTB,PTEN,SOS1,CYP21A2
DOID:0060115Nervous system benign neoplasm 4 291.912.30e-04LZTR1,TSC1,CTNNB1,PTEN
DOID:14291 Noonan syndrome with multiple lentigines 3 62.472.60e-04RAF1,PTPN11,SOS1
DOID:305 Carcinoma 6 2691.123.60e-03RAF1,NOTCH1,TSC1,CTNNB1,ACTB,PTEN
DOID:0060090Central nervous system benign neoplasm 3 201.954.10e-03TSC1,CTNNB1,PTEN
DOID:4079 Heart valve disease 3 231.895.30e-03NOTCH1,PTPN11,SOS1
DOID:12270 Coloboma 4 841.456.10e-03CRIM1,KMT2D,ACTB,CHD7
DOID:0050686Organ system cancer 8 7220.826.90e-03RAF1,NOTCH1,TSC1,KMT2D,PTPN11,CTNNB1,ACTB,PTEN
DOID:162 Cancer 9 9510.756.90e-03RAF1,NOTCH1,TSC1,KMT2D,PTPN11,CTNNB1,ACTB,PTEN,NSD1
DOID:18 Urinary system disease 6 3291.036.90e-03RAF1,SMAD2,TSC1,KMT2D,CTNNB1,PTEN
DOID:3996 Urinary system cancer 4 891.436.90e-03RAF1,KMT2D,CTNNB1,PTEN
DOID:863 Nervous system disease 1321570.557.00e-03LZTR1,NOTCH1,CRIM1,NAA15,TSC1,KMT2D,KIAA0196,COPB2,CTNNB1,ACTB,PTEN,CHD7,GAN
DOID:6420 Pulmonary valve stenosis 2 32.607.40e-03PTPN11,SOS1
DOID:3165 Skin benign neoplasm 3 331.737.60e-03CTNNB1,ACTB,SOS1
DOID:0050469Costello syndrome 2 42.479.30e-03PTPN11,SOS1
DOID:11054 Urinary bladder cancer 3 401.651.06e-02RAF1,KMT2D,PTEN
DOID:1749 Squamous cell carcinoma 3 451.601.34e-02RAF1,NOTCH1,PTEN
DOID:4159 Skin cancer 3 511.541.83e-02RAF1,CTNNB1,ACTB
DOID:3118 Hepatobiliary disease 4 1531.192.25e-02NOTCH1,TSC1,CTNNB1,PTEN
DOID:0060037Developmental disorder of mental health 6 5120.842.83e-02NAA15,KMT2D,KIAA0196,CTNNB1,ACTB,PTEN
DOID:0080355Hepatobiliary system cancer 3 631.452.83e-02TSC1,CTNNB1,PTEN
DOID:5520 Head and neck squamous cell carcinoma 2 102.072.83e-02NOTCH1,PTEN

76 CHD risk genes (FDR < 0.3) were used to perform String network interaction, with minimum required interaction score = 0.9.

yellow: the unknown CHD genes
h404fO.png