Genome-wide association analysis
Haplotype analysis
LCYE associations across seasons
[Table: LCYE associations under a mixed model in individual environments (2002 to 2005) and across environments, with the number of observations per environment]
[Figure: ear diameter (low population structure), panels a to c: cumulative P-value distributions for the Simple, Q, K, Q+K, and GC models]
[Figure: chromosome bins 6.02 to 10.05 showing the positions of carotenoid-pathway genes HYD1, HYD2, IspFg, ZDS, DXSe, IPP1, IPP2, DXSc, and LYCe]
[Pathway: δ-carotene → LCY-b → α-carotene → HYD-e → lutein]
[Figure axis: genetic effect (phenotypic variation explained, %), from 0 (0%) to 1 (17.4%)]
1) Sequence part of the gene across the whole panel
2) Align the sequences using BioLign, BioEdit, or Clustal
3) Estimate the LD of the target gene
4) Look for associations based on LD
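Estimating LD within the target gene comes down to computing pairwise LD between polymorphic sites in the aligned sequences. The sketch below computes r² for two biallelic SNPs. It assumes homozygous inbred lines (each line carries one haplotype, so no phasing is needed), single-letter allele calls, and "N" for missing data; the function name `ld_r2` is hypothetical, and this is not a description of how TASSEL computes LD.

```python
from typing import Sequence


def ld_r2(snp1: Sequence[str], snp2: Sequence[str], missing: str = "N") -> float:
    """Squared allele-frequency correlation (r^2) between two biallelic SNPs.

    Assumes homozygous inbred lines, so the calls at the two sites can be
    paired line by line as haplotypes.
    """
    # Keep only lines scored at both sites.
    pairs = [(a, b) for a, b in zip(snp1, snp2) if a != missing and b != missing]
    if not pairs:
        raise ValueError("no lines scored at both sites")
    n = len(pairs)
    # The first observed allele at each site is the reference allele;
    # any other allele is counted as the alternative allele.
    ref1, ref2 = pairs[0]
    p_a = sum(a == ref1 for a, _ in pairs) / n
    p_b = sum(b == ref2 for _, b in pairs) / n
    p_ab = sum(a == ref1 and b == ref2 for a, b in pairs) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom


# Site scored in Line1..Line9, compared with a hypothetical second site.
print(ld_r2("AGAAGGAGA", "CTCCTTCCC"))
```

Each site is passed as one string of allele calls in a fixed line order, so the Line1 to Line9 genotypes above become "AGAAGGAGA".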
Discussion of some issues in association analysis
1) Candidate-gene strategy
2) Genome-wide strategy
Line | Genotype
Line1 | AA
Line2 | GG
Line3 | AA
Line4 | AA
Line5 | GG
Line6 | GG
Line7 | AA
Line8 | GG
Line9 | AA
Candidate gene selection → Population development → Gene sequencing and phenotyping → Association analysis
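The association step can be illustrated with the simplest possible test: regress the phenotype on a single marker with no correction for population structure or kinship, roughly what the "Simple" model in the later comparisons does. This is a sketch only; the function name is hypothetical, alleles are coded 0/1 (alphabetically second allele = 1), and the returned R² is the share of phenotypic variation explained by the marker.

```python
from statistics import mean


def single_marker_test(genotypes, phenotypes):
    """Regress a phenotype on one biallelic marker (no Q or K correction).

    Returns (allele_effect, r2): with 0/1 coding, the slope equals the
    difference between the two allele-class means, and r2 is the fraction
    of phenotypic variance explained by the marker.
    """
    alleles = sorted(set(genotypes))
    if len(alleles) != 2:
        raise ValueError("marker must be biallelic")
    x = [1.0 if g == alleles[1] else 0.0 for g in genotypes]
    y = [float(v) for v in phenotypes]
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if syy == 0:
        raise ValueError("phenotype has no variance")
    slope = sxy / sxx  # allele substitution effect
    r2 = sxy * sxy / (sxx * syy)  # phenotypic variation explained
    return slope, r2


# Hypothetical trait values for Line1..Line9.
print(single_marker_test(list("AGAAGGAGA"), [10, 12, 9, 11, 13, 12, 10, 14, 9]))
```

Because this model ignores structure and relatedness, it is the one most prone to false positives when the trait is correlated with population structure.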
[Table: distribution of LCYE polymorphisms (SNP216, 3'TE, 5'TE) and HYDB1 polymorphisms (D4, 3'TE) across populations P1, P2, and P3]
[Pathway: lycopene → LCYE → δ-carotene → LCYB → α-carotene; lycopene → LCYB → γ-carotene → LCYB → β-carotene; HYDB acts downstream of β-carotene]
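[Table fragment (site, SNP, MAF, candidate gene; column alignment lost in extraction): SNPs PZA03371.2, PZB01389.1; values 0.110, 0.052, 0.430; candidate genes gn1 (homeobox transcription factor), abi1? (ABA insensitive 1); sites 1383, 1429, 1455, 1486, 1497 with SNPs PZA03637.3, PZA03635.1, PZB01186.1, PZA03573.4, PZA03395.2]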
See another presentation
Estimate the LD of the target gene
Software: TASSEL (demonstrated by Xiaohong). The results can be shown in two ways.
Linkage disequilibrium (LD)
[Diagram: two loci with alleles A/a and B/b]
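For two biallelic loci A/a and B/b, the standard LD measures can be written as follows, where p_{AB} is the frequency of the AB haplotype and p_A, p_a, p_B, p_b are allele frequencies (the notation here is ours, not from the slides). r² is the quantity plotted as R² in the LD decay figures.

```latex
\begin{aligned}
D &= p_{AB} - p_A\,p_B \\
r^2 &= \frac{D^2}{p_A\,p_a\,p_B\,p_b} \\
|D'| &= \frac{|D|}{D_{\max}}, \qquad
D_{\max} =
\begin{cases}
\min(p_A p_b,\ p_a p_B) & \text{if } D > 0,\\
\min(p_A p_B,\ p_a p_b) & \text{if } D < 0.
\end{cases}
\end{aligned}
```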
Reading: review by Yang Xiaohong et al., Acta Agronomica Sinica (作物学报), 2007
The Q + K model gives the best control of Type I error. This matters most when the trait is correlated with population structure (e.g., flowering time).
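As we recall it from Yu et al. (2006), the Q + K mixed model can be written as below: y is the vector of phenotypes; β holds fixed effects other than marker and population effects; α holds the marker (allelic) effects; v holds population effects, with Q the population-membership matrix; u holds random polygenic background effects; e holds residuals; X, S, and Z are incidence matrices; K is the relative kinship matrix; V_g and V_R are the genetic and residual variances. Dropping Zu gives the Q model, dropping Qv gives the K model, and dropping both gives the Simple model. Check the paper before relying on the exact notation.

```latex
y = X\beta + S\alpha + Qv + Zu + e,
\qquad \operatorname{Var}(u) = 2K\,V_g,
\qquad \operatorname{Var}(e) = R\,V_R
```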
Statistical power
Flowering time (High population structure)
Section 3
Association analysis with TASSEL
Several issues worth discussing
1) Allele frequency
2) Haplotype analysis
3) Effect of LD
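For haplotype analysis, a first step is to tabulate multi-SNP haplotypes and their frequencies across the panel. The sketch assumes homozygous inbred lines (genotype equals haplotype, so no phasing is needed), one character per SNP in a fixed order, and "N" for missing calls; the function name is hypothetical.

```python
from collections import Counter


def haplotype_frequencies(lines, missing="N"):
    """Count multi-SNP haplotypes across homozygous inbred lines.

    lines: mapping of line name -> string of allele calls, one character
    per SNP, in the same SNP order for every line. Lines with any missing
    call are skipped. Returns (haplotype, frequency) pairs, most frequent first.
    """
    calls = [h for h in lines.values() if missing not in h]
    if not calls:
        raise ValueError("no complete haplotypes")
    counts = Counter(calls)
    total = len(calls)
    return [(hap, n / total) for hap, n in counts.most_common()]


panel = {"Line1": "AC", "Line2": "GT", "Line3": "AC", "Line4": "AT"}  # hypothetical
print(haplotype_frequencies(panel))
```

Each resulting haplotype can then be treated as a multi-allelic marker in the association test.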
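Allele frequency
The allele frequencies at functional sites often deviate strongly from 1:1, which is consistent with biological logic.
Examples: the VA gene; a drought-tolerance gene.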
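Because functional sites are often far from 1:1, it is worth checking the minor allele frequency (MAF) of each site before testing it. The sketch computes MAF for one biallelic site scored in inbred lines; the 0.05 cutoff in `passes_maf_filter` is an illustrative assumption, not a threshold taken from this material, and both function names are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(calls, missing="N"):
    """MAF at one biallelic site, one allele call per inbred line."""
    counts = Counter(c for c in calls if c != missing)
    if len(counts) != 2:
        raise ValueError("site must have exactly two alleles")
    return min(counts.values()) / sum(counts.values())


def passes_maf_filter(calls, threshold=0.05):
    """True if the site's MAF reaches the (assumed) threshold."""
    return minor_allele_frequency(calls) >= threshold


print(minor_allele_frequency("AGAAGGAGA"))  # site scored in Line1..Line9
```

A very skewed site (low MAF) has few lines carrying the rare allele, so its effect is estimated from a small group and statistical power drops.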
[Pathway: GGPP → PSY → PDS → Z-ISO → ZDS/CRTISO]
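[Table fragment (site, SNP, MAF, candidate gene; column alignment lost in extraction): values 0.056, 0.085, 0.481, 0.061, 0.076; candidate genes set105 (SET domain-containing protein), set104 (SET domain-containing protein), mitochondrial phosphate transporter, zmet3 (DNA cytosine methyltransferase), putative SF16 protein]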
[Figure: LD decay, r² (0 to 1) plotted against physical distance (0 to 2,500 bp)]
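A decay curve like this is usually summarized by averaging r² within distance bins and reporting where the average falls below a cutoff. The sketch assumes pairwise (distance, r²) values have already been computed; the 250 bp bin width and the r² = 0.1 cutoff are illustrative assumptions, and both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def ld_decay(pairs, bin_size=250):
    """Mean r^2 per distance bin.

    pairs: iterable of (distance_bp, r2) for SNP pairs in a gene or region.
    Returns a sorted list of (bin_start_bp, mean_r2).
    """
    bins = defaultdict(list)
    for dist, r2 in pairs:
        bins[int(dist // bin_size) * bin_size].append(r2)
    return [(start, mean(vals)) for start, vals in sorted(bins.items())]


def decay_distance(profile, threshold=0.1):
    """Start of the first bin whose mean r^2 is below threshold, else None."""
    for start, r2 in profile:
        if r2 < threshold:
            return start
    return None


profile = ld_decay([(50, 0.9), (120, 0.7), (300, 0.4), (400, 0.2), (800, 0.05)])
print(profile, decay_distance(profile))
```

Many studies instead fit a nonlinear decay curve to the raw points; binning is simply the easiest version to read.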
Population development
[Figure: extent of LD by chromosome (Chr1 to Chr10 and average), ranging from <1 kb to 5-10 kb depending on the chromosome; about 2-5 kb overall]
Diverse inbred lines are the best choice for developing an association mapping panel.
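[Table fragment (site, SNP, MAF, candidate gene; column alignment lost in extraction): sites 708, 753, 1003; SNPs PZB01400.2, PZB00728.1, LYCE.4; values 0.063, 0.326, 0.313; candidate genes zmAO (aldehyde oxidase), acp (acyl carrier protein), lcye (lycopene epsilon-cyclase); further sites 1257, 1305, 1379 with SNP PZB01482.3]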
Population structure
False positives and statistical power
Section 2
Various association samples
[Figure: association samples a to e, arranged by degree of population structure and familial relatedness]
Yu et al., Nat Genet 38: 203-208 (2006)
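The K in the Q + K model is a matrix of relatedness between lines. As a rough illustration only, the sketch scores relatedness as the proportion of identical allele calls (identity by state) between each pair of inbred lines. The K used in the published analysis may have been estimated differently, and the function name is hypothetical.

```python
from itertools import combinations


def simple_matching_kinship(lines, missing="N"):
    """Pairwise proportion of identical allele calls between inbred lines.

    lines: mapping of line name -> string of allele calls (same SNP order).
    Sites missing in either line are skipped for that pair.
    Returns a dict keyed by (name1, name2) with values in [0, 1].
    """
    names = list(lines)
    k = {(name, name): 1.0 for name in names}
    for a, b in combinations(names, 2):
        shared = [(x, y) for x, y in zip(lines[a], lines[b])
                  if x != missing and y != missing]
        if not shared:
            raise ValueError(f"{a} and {b} share no scored sites")
        same = sum(x == y for x, y in shared)
        k[(a, b)] = k[(b, a)] = same / len(shared)
    return k


print(simple_matching_kinship({"L1": "AAGC", "L2": "AAGT", "L3": "GGAT"}))
```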
Site | SNP | MAF | Candidate or nearest gene(s)
21 | PZB01403.4 | 0.054 | zmAO (aldehyde oxidase)
24 | PZD00056.3 | 0.212 | mads2 (MADS box protein 2)
144 | PZB02194.1 | 0.373 | ivr1 (invertase gene)
221 | PZD00027.3 | 0.090 | zmm16 (putative MADS-domain transcription factor)
307 | PZB00137.1 | 0.420 | pif3 (Phytochrome Interacting Factor 3)
563 | PZA03301.5 | 0.056 | Harpin-induced 1 domain containing protein
[Figure: cumulative distribution of observed P-values (0 to 0.5) for the Simple, Q, K, Q+K, and GC models]
A straight diagonal line indicates an appropriate control of false positives.
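These curves compare the observed cumulative distribution of P-values with the uniform distribution expected when no marker is truly associated. The sketch computes that cumulative fraction on a grid, plus the genomic-control inflation factor λ (the "GC" model rescales test statistics by λ). It assumes two-sided P-values from 1-df tests; the function names and grid are hypothetical.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549  # median of a chi-square distribution with 1 df


def cumulative_p(pvalues, grid=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Fraction of P-values <= each grid point.

    With good false-positive control, each fraction is close to the grid
    value itself, which traces the straight diagonal line.
    """
    n = len(pvalues)
    return [(t, sum(p <= t for p in pvalues) / n) for t in grid]


def genomic_control_lambda(pvalues):
    """Inflation factor: median observed chi-square / expected median."""
    z = NormalDist()
    if any(not 0 < p <= 1 for p in pvalues):
        raise ValueError("P-values must lie in (0, 1]")
    # Convert each two-sided P-value back to a 1-df chi-square statistic.
    chi2 = [z.inv_cdf(1 - p / 2) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


null_p = [(i + 0.5) / 1000 for i in range(1000)]  # evenly spread, no inflation
print(genomic_control_lambda(null_p), cumulative_p(null_p))
```

A λ well above 1 signals an excess of small P-values, typically from uncorrected population structure.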
[Figure: statistical power of the Simple, Q, K, Q+K, and GC models plotted against genetic effect: 0.2 (0.8%), 0.4 (3.3%), 0.6 (7.1%), 0.8 (11.9%), 1 (17.4%) of phenotypic variation explained]