Chapter 16

High-Throughput Methods for SNP Genotyping

Chunming Ding and Shengnan Jin

Abstract

Single nucleotide polymorphisms (SNPs) are ideal markers for identifying genes associated with complex diseases for two main reasons. First, SNPs are densely distributed across the human genome, at roughly one SNP per 500–1,000 base pairs. Second, a large number of commercial platforms are available for semiautomated or fully automated SNP genotyping. These platforms serve different purposes, since they differ in SNP selection, reaction chemistry, signal detection, throughput, cost, and assay flexibility. This chapter gives an overview of some of these platforms. It explains the technology behind each one and, through cross-comparison, identifies the application scenarios for which each is best suited. Readers may find more technical details in the following chapters.

Key words: Whole genome association, fine mapping, single nucleotide polymorphism, copy number variation, haplotyping.

1. Introduction

Single nucleotide polymorphisms (SNPs) are best known as genetic markers in disease-association studies that aim to identify genes associated with complex diseases (1, 2). However, SNPs are also used in many other clinically and biologically important applications (3). A large variety of commercial platforms are available for semiautomated or fully automated SNP genotyping. Depending on the purpose of the study, SNP genotyping can be divided into two domains: whole genome association (WGA) and fine mapping (Fig. 16.1). Most genotyping platforms can be classified accordingly. This chapter briefly explains the principles behind the various platforms and then compares them. This gives readers a quick overview before they delve into the technical details of some of these methods in the following chapters.

A. A. Komar (ed.), Single Nucleotide Polymorphisms, Methods in Molecular Biology 578, DOI 10.1007/978-1-60327-411-1_16, © Humana Press, a part of Springer
Science+Business Media, LLC 2003, 2009

2. Chemistries and Detection Methods for SNP Genotyping

Over the years, a number of chemistries have been developed for distinguishing the two alleles of a SNP. Whether a chemistry is adopted for high-throughput studies depends largely on how well it lends itself to automation. An ideal chemistry must be universally applicable to any SNP, or at least to a substantial proportion of all human SNPs. In addition, a high degree of automation requires a minimum number of genotyping steps. It is probably fair to say that no single SNP genotyping platform is good enough to serve all purposes.

Generally, SNP genotyping chemistries can be divided roughly into two types, based on the key reaction that allows the SNP to be detected: (1) nonenzymatic differential hybridization (see Chapters 18 and 19 in this volume) and (2) enzymatic reactions (see Chapter 23 in this volume).

Fig. 16.1. An overview of platforms with regard to throughput of single nucleotide polymorphisms and sample size (axes: SNP number and sample size). Platforms are selected on the basis of reasonable running costs.

Differential hybridization relies on the different melting temperatures of matched and mismatched probes bound to the target DNA sequences. The Affymetrix SNP microarray employs this principle. For each SNP, it uses four to six probes, each a 25-mer. Affymetrix arrays can achieve a very high density, accommodating millions of probes on a single chip. The newest Affymetrix Human SNP Array 6.0 contains probes for 906,600 SNPs and an additional 946,000 probes for assessing copy number variations (CNVs). All of these probes are hybridized to their target sequences at the same temperature, in the same buffer, and for the same amount of time, which is ideal for automated high-throughput SNP genotyping. However, every probe must discriminate effectively between matched and mismatched targets, and the probe sequences are dictated by the sequence flanking each SNP. Consequently, certain SNPs
with "odd" local sequences cannot be selected, even if they are crucial tagging SNPs, SNPs in regulatory regions, or SNPs that change protein-coding sequences (see Note 1).

Another example of differential hybridization is the TaqMan SNP assay (see Chapters 18 and 19 for details). Each SNP is assayed with two TaqMan probes, one specific for each allele, and the two probes carry different fluorescent dyes. The presence of an allele (or of both alleles in heterozygotes) is detected by the corresponding fluorescence signal(s), generated by 5′-exonuclease cleavage of the probe(s). The main drawback of the TaqMan SNP assay is that it cannot achieve even a modest level of multiplexing. However, a collaboration between Applied Biosystems and BioTrove, using BioTrove's OpenArray platform, has made it possible to run 3,072 TaqMan reactions on a single slide, each with a volume of only 33 nL. This platform may be particularly powerful when a very large number of samples is tested. Biomark (Fluidigm) is another system that miniaturizes TaqMan assays to enable high-throughput genotyping.

For SNP genotyping based on enzymatic selectivity, there are two main types of assays. The first is primer extension (also called single base extension, or minisequencing; see Chapter 23 in this volume). An extension primer that anneals immediately upstream (5′) of a SNP site is extended by one or just a few bases. The SNP is called based on either the incorporated fluorescent nucleotide (SNPstream) or the molecular weight of the extension product (MassArray iPlex Gold assay). These assays have low background noise, since the fidelity with which the enzyme incorporates the correct nucleotide is extremely high.
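As a minimal illustration of how a two-signal readout becomes a genotype call, the Python sketch below assigns a biallelic genotype from two allele-specific intensities. These could be the two dyes of a TaqMan assay or the two labeled nucleotides in a fluorescent single base extension. It assumes background-subtracted signals and uses a simple allele-fraction rule. The function name `call_genotype` and every cut-off value are hypothetical illustrations, not the calling algorithm of any platform named above; in practice, calls are usually made by clustering the signals of many samples.

```python
from typing import Optional


def call_genotype(signal_a: float, signal_b: float,
                  min_total: float = 1.0,
                  homo_cut: float = 0.8,
                  het_low: float = 0.35,
                  het_high: float = 0.65) -> Optional[str]:
    """Call a biallelic SNP from two allele-specific signals (sketch).

    signal_a, signal_b: background-subtracted intensities for allele A and
    allele B (e.g., the two dye channels of an allele-discrimination assay).
    All thresholds are hypothetical defaults chosen for illustration.
    Returns "AA", "AB", "BB", or None for a no-call.
    """
    total = signal_a + signal_b
    if total < min_total:          # too little total signal: failed assay
        return None
    frac_a = signal_a / total      # share of the signal contributed by allele A
    if frac_a >= homo_cut:         # nearly all signal from allele A
        return "AA"
    if frac_a <= 1.0 - homo_cut:   # nearly all signal from allele B
        return "BB"
    if het_low <= frac_a <= het_high:  # balanced signal: heterozygote
        return "AB"
    return None                    # ratio falls between clusters: no-call


if __name__ == "__main__":
    for a, b in [(9.0, 0.5), (4.8, 5.1), (0.3, 7.2), (3.0, 7.0), (0.2, 0.3)]:
        print(f"A={a:4.1f}  B={b:4.1f}  ->  {call_genotype(a, b)}")
```

With these defaults, (9.0, 0.5) is called AA, (4.8, 5.1) AB, and (0.3, 7.2) BB. The pair (3.0, 7.0) falls between clusters and (0.2, 0.3) carries too little signal, so both are no-calls. Narrowing the heterozygote window trades call rate for confidence in each call.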
The second type is based on DNA ligation. One example is the molecular inversion probe technology (4) developed by ParAllele Biosciences, which is now part of Affymetrix and is used in the Affymetrix GeneChip custom SNP kits. Another example is SNPlex (Applied Biosystems). SNPlex achieves up to 48-plex genotyping by including a series of unique ZipCode™ sequences in the allele-specific probes. The corresponding ZipChute™ probes, which differ in length, hybridize to the ZipCode™ sequences and are subsequently separated and detected by capillary electrophoresis.

In general, platforms based on differential hybridization rely entirely on the thermodynamic difference between matched and mismatched probe–target pairs. The selection of analyzable SNPs therefore depends heavily on the local SNP sequence. Platforms based on enzymatic selectivity are less dependent on the local sequence and are likely to be applicable to more SNPs. However, they often involve more steps, which makes full automation more complicated.

3. Genotyping Platforms

3.1. Genotyping Platforms for WGA Studies

In earlier WGA studies, fewer than 100,000 SNPs were commonly analyzed, because including more SNPs was too costly. However, the paradigm has shifted significantly, thanks to (1) detailed HapMap data guiding the selection of tagging SNPs and (2) vastly improved ultra-high-throughput genotyping platforms (in terms of SNP number; see Note 2). At present, the Illumina BeadArray (newest version, High Density Human 1M-Duo) and the Affymetrix SNP microarray (newest version, Human SNP Array 6.0) are the most widely used platforms in WGA studies.

Although both are called "arrays" and have similar throughput, the two platforms differ substantially in many respects. First, they use different methods to discriminate the two alleles of a SNP. The Affymetrix microarray technology uses differential hybridization to a set of 25-mer probes, each matching one of the two SNP alleles. As discussed earlier, this may limit the
selection of SNPs. However, since the human genome contains over five million SNPs, the Affymetrix SNP array can still include close to one million SNPs. The Illumina BeadArray technology, by contrast, uses primer extension to distinguish the two SNP alleles. In principle, the enzymatic fidelity of primer extension is extremely high regardless of the local SNP sequence, so the BeadArray may be less limited in SNP selection. However, additional primer extension and staining steps must be carried out before signals can be detected.

Another important difference between the two platforms is the selection of SNPs. The Illumina system places more emphasis on tagging SNPs than the Affymetrix system does. This may be due to two constraints imposed on the Affymetrix system: (1) the local SNP sequence must be suitable for the universal hybridization condition, and (2) a complexity-reduction step selectively amplifies 200–1,100-bp fragments generated by restriction enzyme digestion. Whether a strictly tagging-SNP-based selection approach is superior to a hybrid approach (half tagging, half random SNPs) is still being debated, and a rigorous comparison is unlikely to be carried out given the prohibitive cost. Moreover, it is still not entirely clear how important SNPs outside the typical haplotype blocks are for identifying genes associated with complex diseases. In any case, as more SNPs become detectable on a single chip, it may be possible to analyze a sufficient number of both tagging and random SNPs simultaneously.

There are other technical differences that may not matter to end users. For example, the Illumina BeadArray layout is unique for each chip, so a decoding step is needed to determine how the beads specific for each SNP are arranged geometrically on the chip. The Affymetrix SNP array uses 25-mers for SNP calling via differential hybridization, whereas the Illumina BeadArray uses 50-mers for target capture and primer extension via
hybridization.

3.2. Genotyping Platforms for Fine Mapping

Fine mapping is defined here as high-density SNP genotyping of selected genomic regions. Fine mapping often follows a large-scale WGA study, to zoom in on candidate genes associated with the disease of interest. Fine-mapping studies differ dramatically from WGA studies in several respects, notably:

1. Far fewer SNPs (e.g., fewer than 1,000) are genotyped.
2. The SNPs chosen depend heavily on the particular disease of interest. A single SNP array (Illumina, Affymetrix, or others) can be used for WGA studies of any disease. In contrast, the SNPs selected for fine mapping of one disease are likely to be mostly different from those selected for fine mapping of another disease.
3. Fine mapping may involve a larger sample size.

In summary, fine mapping requires genotyping a smaller number of SNPs (fewer than 1,000), highly specific to each disease, in a larger number of samples. Once a WGA study has been completed and potential targets have been identified, fine mapping should be performed immediately. Moreover, because any of the selected SNPs could potentially be directly disease causing, it is essential to achieve a high call rate (defined here as the success rate for correctly genotyping the entire SNP panel).
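The call-rate requirement can be made concrete with a small bookkeeping sketch. The Python code below treats the call rate as the fraction of SNP-by-sample genotype attempts that return a call. It computes this per SNP and over the whole panel, and it flags SNPs below a minimum rate. This is a simplification: it measures only whether a call was made, not whether the call was correct, which would require concordance with reference genotypes. The function names, the SNP identifiers, and the 0.95 threshold are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: SNP identifier -> genotype calls across samples,
# with None marking a failed assay (no-call).
GenotypeTable = Dict[str, List[Optional[str]]]


def call_rates(calls: GenotypeTable) -> Tuple[Dict[str, float], float]:
    """Return per-SNP call rates and the overall call rate for the panel.

    Assumes every SNP has at least one sample.
    """
    per_snp: Dict[str, float] = {}
    n_called = n_total = 0
    for snp, genotypes in calls.items():
        called = sum(g is not None for g in genotypes)  # successful calls
        per_snp[snp] = called / len(genotypes)
        n_called += called
        n_total += len(genotypes)
    return per_snp, n_called / n_total


def failing_snps(per_snp: Dict[str, float], min_rate: float = 0.95) -> List[str]:
    """List SNPs whose call rate is below `min_rate` (candidates for redesign)."""
    return sorted(snp for snp, rate in per_snp.items() if rate < min_rate)


if __name__ == "__main__":
    panel: GenotypeTable = {
        "snp1": ["AA", "AB", None, "BB"],
        "snp2": ["AB", "AB", "AA", "AA"],
        "snp3": [None, None, "AB", "BB"],
    }
    per_snp, overall = call_rates(panel)
    print(per_snp)                # {'snp1': 0.75, 'snp2': 1.0, 'snp3': 0.5}
    print(overall)                # 0.75
    print(failing_snps(per_snp))  # ['snp1', 'snp3']
```

In a fine-mapping workflow, SNPs flagged this way would be the candidates for assay redesign.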
Cost is another consideration (see Note 3). For these reasons, a good genotyping platform for fine mapping should achieve a high call rate for all selected SNPs, without time-consuming assay optimization, and at a relatively high multiplex level (e.g., more than 24 SNPs per reaction).

SNP calling based entirely on differential hybridization is unlikely to be very successful in fine mapping. Designing discriminating probes for all 1,000 selected SNPs would be very difficult, because the local sequences of these SNPs may have very different thermodynamic profiles (see Note 4). Possibly for this reason, Affymetrix acquired ParAllele Biosciences for its molecular inversion probe technology, which it uses for custom SNP genotyping arrays. These custom arrays do not rely on differential hybridization for SNP calling.

Platforms based on primer extension and allele-specific ligation are more suitable for fine-mapping applications, and a number of commercial platforms are available (Table 16.1). Since no systematic, direct comparison of these platforms is available, we have to rely on company application notes and on publications reporting the use of each technology for a rough comparison.

Table 16.1
Comparison of fine-mapping genotyping platforms
Platform | Provider | Chemistry | Detection | Number of SNPs | Number of samples | Note | URL
SNP, single nucleotide polymorphism.
a Not true multiplexing; 64 uniplex TaqMan SNP assays in 64 different nanoholes.

Two platforms substantially exceed the arbitrary 1,000-SNP cutoff mentioned earlier. The Illumina iSelect BeadArray uses single base extension, the same underlying chemistry and detection as the High Density Human 1M-Duo array, for genotyping
up to 60,800 user-selected SNPs in 12 samples on a single chip. The Affymetrix GeneChip custom SNP kits use the molecular inversion probe technology acquired from ParAllele Biosciences. These custom arrays can analyze 3,000, 5,000, or 10,000 user-selected SNPs for a single sample. One drawback of both platforms is the turnaround time, since at least 3 months are required for assay design and array delivery.

For a typical fine-mapping project following a WGA study, it may not be necessary to analyze tens of thousands of SNPs. Thus, higher sample throughput at a moderate SNP throughput (fewer than 1,000 SNPs) may be preferable. To this end, several platforms are good choices for fine mapping. These include the MassArray system (Sequenom; see Chapter 20 in this volume), SNPlex (Applied Biosystems), and SNPstream (Beckman Coulter, in collaboration with Orchid Cellmark). All of these platforms can routinely achieve multiplex genotyping at 20-plex or higher, in 96 or 384 separate reactions on a single plate. They are highly flexible in several ways. First, SNP throughput and sample size can be balanced at the user's discretion. Second, the turnaround time for assay design and reagent delivery is much shorter than for the custom arrays from Illumina and Affymetrix (Table 16.1), so failed SNP assays can be redesigned and reordered quickly. Unless the number of SNPs to be analyzed is well above 1,000, these platforms may be the first choice.

4. New Advances and Other Outstanding Issues

There are at least two exciting features of genomic research. One is the constant development of better and more affordable technologies, much like personal computers. The other is the acquisition of new insights into gene structure and function. One such example is copy number variations (CNVs). CNVs are found much less frequently in the human genome than SNPs; the entire human genome probably contains a few thousand to tens of thousands of CNVs. However, these variations involve much larger DNA segments, ranging from a few kilobases to a
few megabases (5). Their importance to human health is illustrated by a number of diseases, such as CHARGE syndrome (6) and Parkinson's disease (7).

The platform suppliers have taken note of the importance of CNVs. Both the Affymetrix Human SNP Array 6.0 and the Illumina High Density Human 1M-Duo offer good coverage for CNV analysis. For example, the Human SNP Array 6.0 targets 3,182 distinct, nonoverlapping segments with an average of 61 probe sets per region. Earlier versions of these platforms have been used for CNV analysis (8–11). It is foreseeable that CNV analysis will be part of most, if not all, WGA studies, and other platforms are likely to follow this trend. Given the limited number of CNVs in the human genome, fine-mapping genotyping platforms may also be useful for validation studies. For example, the MassArray iPlex platform will launch a CNV genotyping application by 2008.

Serious limitations in SNP genotyping remain, however, and at least two are worth mentioning. The first is SNP coverage for different ethnic groups. The coverage statistics provided for the best WGA platforms are based on a very limited number of ethnic groups. For example, CHB (Han Chinese in Beijing) is unlikely to represent all people in China, given that there are 56 distinct ethnic groups in China. It may be necessary to include more SNPs to achieve better coverage of other ethnic groups. The second limitation concerns haplotype analysis. None of the platforms mentioned in this chapter, when used in their standard format, can achieve direct molecular haplotyping; instead, statistical methods are used to infer haplotype information.

Ultimately, the best solution to all of the issues mentioned above may come from fourth-generation sequencing technology (see Note 5 and Chapter 5 in this volume), probably single-molecule based and capable of sequencing a human genome for less than US$1,000. This applies especially to better and more robust identification of genes associated with complex diseases.

5. Summary

Scientists and engineers have come a long way in developing a wide
selection of SNP genotyping platforms. It is now prime time to carry out WGA studies to identify genes associated with complex diseases, potentially yielding biomarkers for disease diagnosis and prognosis, as well as targets for drug development. Both a WGA platform and a fine-mapping platform may be needed for a comprehensive study. The technology will continue to improve to include more SNPs. New technology (e.g., for low-cost whole genome sequencing; see also Chapters 5 and 6 in this volume) will likely appear in the next 5–10 years, and a paradigm shift in WGA studies may happen then.

6. Notes

1. At a fixed hybridization temperature, robust differential hybridization of matched and mismatched targets may not be achievable if the local SNP sequence has very high or very low GC content.
2. Throughput is often defined by the number of SNPs that can be genotyped in one run. This may not be entirely accurate, because in many situations DNA sample throughput may be more important, particularly when SNPs are used as biomarkers for molecular diagnosis.
3. Genotyping 1,000 SNPs in 2,000 samples (a total of two million SNP genotyping assays) is much more costly than genotyping one million SNPs in two samples, even though both produce the same number of genotypes. In addition, since the 1,000 SNPs depend heavily on the disease of interest, custom designs and even assay optimization are needed, which further add to the cost and time.
4. Designing a hybridization-based SNP microarray for 1,000 preselected SNPs is much more difficult than designing one for an arbitrary panel of 1,000 SNPs. In the latter case, the designer can draw on the more than five million SNPs available and choose 1,000 that are located in sequences with similar thermodynamic profiles.
5. We consider slab gel sequencing the first generation, capillary sequencing the second generation, and the Roche 454, Illumina Genome Analyzer, and Applied Biosystems SOLiD platforms the third generation.

Acknowledgements

C.D. is supported by the Stanley Ho Centre for Emerging Infectious Diseases and the Li Ka Shing Institute of Health Sciences.
References

1. Glazier, A. M., Nadeau, J. H. and Aitman, T. J. (2002) Finding genes that underlie complex traits. Science 298, 2345–2349.
2. Becker, K. G., Barnes, K. C., Bright, T. J. and Wang, S. A. (2004) The genetic association database. Nat. Genet. 36, 431–432.
3. Ding, C. (2007) 'Other' applications of single nucleotide polymorphisms. Trends Biotechnol. 25, 279–283.
4. Hardenbol, P., Baner, J., Jain, M., Nilsson, M., Namsaraev, E. A., Karlin-Neumann, G. A., Fakhrai-Rad, H., Ronaghi, M., Willis, T. D., Landegren, U. and Davis, R. W. (2003) Multiplexed genotyping with sequence-tagged molecular inversion probes. Nat. Biotechnol. 21, 673–678.
5. Iafrate, A. J., Feuk, L., Rivera, M. N., Listewnik, M. L., Donahoe, P. K., Qi, Y., Scherer, S. W. and Lee, C. (2004) Detection of large-scale variation in the human genome. Nat. Genet. 36, 949–951.
6. Jongmans, M. C., Admiraal, R. J., van der Donk, K. P., Vissers, L. E., Baas, A. F., Kapusta, L., van Hagen, J. M., Donnai, D., de Ravel, T. J., Veltman, J. A., Geurts van Kessel, A., De Vries, B. B., Brunner, H. G., Hoefsloot, L. H. and van Ravenswaaij, C. M. (2006) CHARGE syndrome: the phenotypic spectrum of mutations in the CHD7 gene. J. Med. Genet. 43, 306–314.
7. Singleton, A. B., Farrer, M., Johnson, J., Singleton, A., Hague, S., Kachergus, J., Hulihan, M., Peuralinna, T., Dutra, A., Nussbaum, R., Lincoln, S., Crawley, A., Hanson, M., Maraganore, D., Adler, C., Cookson, M. R., Muenter, M., Baptista, M., Miller, D., Blancato, J., Hardy, J. and Gwinn-Hardy, K. (2003) alpha-Synuclein locus triplication causes Parkinson's disease. Science 302, 841.
8. Redon, R., Ishikawa, S., Fitch, K. R., Feuk, L., Perry, G. H., Andrews, T. D., Fiegler, H., Shapero, M. H., Carson, A. R., Chen, W., Cho, E. K., Dallaire, S., Freeman, J. L., Gonzalez, J. R., Gratacos, M., Huang, J., Kalaitzopoulos, D., Komura, D., MacDonald, J. R., Marshall, C. R., Mei, R., Montgomery, L., Nishimura, K., Okamura, K., Shen, F., Somerville, M. J., Tchinda, J., Valsesia, A., Woodwark, C., Yang, F., Zhang, J., Zerjal, T., Armengol, L., Conrad, D. F., Estivill, X., Tyler-Smith, C., Carter, N. P., Aburatani, H., Lee, C., Jones, K. W., Scherer, S. W. and Hurles, M. E. (2006) Global variation in copy number in the human genome. Nature 444, 444–454.
9. Bae, J. S., Cheong, H. S., Kim, J. O., Lee, S. O., Kim, E. M., Lee, H. W., Kim, S., Kim, J. W., Cui, T., Inoue, I. and Shin, H. D. (2008) Identification of SNP markers for common CNV regions and association analysis of risk of subarachnoid aneurysmal hemorrhage in Japanese population. Biochem. Biophys. Res. Commun. 373, 593–596.
10. Blauw, H. M., Veldink, J. H., van Es, M. A., van Vught, P. W., Saris, C. G., van der Zwaag, B., Franke, L., Burbach, J. P., Wokke, J. H., Ophoff, R. A. and van den Berg, L. H. (2008) Copy-number variation in sporadic amyotrophic lateral sclerosis: a genome-wide [...]. Lancet Neurol. 7, 319–326.
11. Gunnarsson, R., Staaf, J., Jansson, M., Ottesen, A. M., Goransson, H., Liljedahl, U., Ralfkiaer, U., Mansouri, M., Buhl, A. M., Smedby, K. E., Hjalgrim, H., Syvanen, A. C., Borg, A., Isaksson, A., Jurlander, J., Juliusson, G. and Rosenquist, R. (2008) Screening for copy-number alterations and loss of heterozygosity in chronic lymphocytic leukemia – a comparative study of four differently designed, high resolution microarray platforms. Genes Chromosomes Cancer 47, 697–711.