
Frontier Technology Trends in Bioinformatics


BGI is constructing a supercomputing platform to meet these requirements
Solution for research labs: Cloud computing
From genotype to Phenotype!
mRNA
Protein
Faster with less RAM usage

Half the memory usage of SOAP

43 and 30 times faster for single-end and paired-end reads, respectively
(3) SOAPsnp
- SNP detection for short reads re-sequencing
Outline

• Introduction
• Sequencing technologies
• Bioinformatic analysis
• Applications
  - Tree of Life (de novo)
  - Population evolution and Breeding (Resequencing)
  - Disease (Resequencing)
  - Epigenome
  - Transcriptome
  - Metagenomics
  - Proteomics
[Figure: from genotype to phenotype. Labels: Genome, Genotype, Epigenetics, ncRNA, Metabolome, Intermediate Phenotype, Molecular Phenotype, Phenotype]
(1) SOAP

Single-end reads alignment





• Read length of 25-60 bp
• Ungapped and gapped alignment
• At most two mismatches allowed by default
• One continuous gap of 1 to 3 bp is accepted
• Ungapped hits take precedence over gapped hits
• Because the 3' end of a read shows a much higher number of sequencing errors, SOAP can iteratively trim the low-quality read end and redo the alignment until hits are detected or the remaining sequence is too short
• For multiple equal-best hits, the user can instruct the program to report none, a random one, or all of them
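The single-end rules above can be illustrated with a minimal sketch. It is not SOAP's actual implementation (SOAP uses a seed-and-lookup index); it is a brute-force scan that assumes ungapped alignment only, allows at most two mismatches, and trims the 3' end when no hit is found. The function names, `min_len`, and `trim_step` are hypothetical and chosen only for illustration.

```python
def hamming_hits(read, ref, max_mismatches=2):
    """Return (position, mismatches) for every ungapped placement within the limit."""
    hits = []
    for pos in range(len(ref) - len(read) + 1):
        mm = 0
        for a, b in zip(read, ref[pos:pos + len(read)]):
            if a != b:
                mm += 1
                if mm > max_mismatches:
                    break  # too many mismatches, give up on this placement
        if mm <= max_mismatches:
            hits.append((pos, mm))
    return hits


def align_with_trimming(read, ref, max_mismatches=2, min_len=8, trim_step=2):
    """Try the full read first; on failure trim the 3' end and retry.

    Returns (aligned_length, best_hits), where best_hits holds all
    equal-best placements, or (None, []) once the read is too short.
    """
    current = read
    while len(current) >= min_len:
        hits = hamming_hits(current, ref, max_mismatches)
        if hits:
            best = min(mm for _, mm in hits)
            return len(current), [h for h in hits if h[1] == best]
        current = current[:-trim_step]  # drop bases from the 3' end
    return None, []
```

Returning every equal-best placement leaves the "report none, a random one, or all" decision to the caller, which mirrors the user option described above.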
Algorithm:
Sequencing reads
SOAP Map reads onto reference genome
Use Bayes’ theorem to infer the genotype given the observed allele types and quality scores on each chromosomal site.
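As a rough illustration of the Bayesian step, the sketch below computes genotype posteriors at one site from observed bases and their Phred quality scores. It is a simplified model and not SOAPsnp's published one: it assumes a diploid site, independent observations, errors spread evenly over the three other bases, and a flat prior unless one is supplied. All function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
# The ten unordered diploid genotypes: AA, AC, AG, ..., TT
GENOTYPES = ["".join(g) for g in combinations_with_replacement(BASES, 2)]


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def genotype_posteriors(observations, priors=None):
    """observations: list of (base, phred_quality) pairs at one site."""
    if priors is None:
        # Assumption: flat prior over genotypes
        priors = {g: 1 / len(GENOTYPES) for g in GENOTYPES}
    log_post = {}
    for g in GENOTYPES:
        ll = math.log(priors[g])
        for base, q in observations:
            e = phred_to_error(q)
            # Each of the two alleles is sampled with probability 0.5
            p = sum(0.5 * ((1 - e) if base == allele else e / 3) for allele in g)
            ll += math.log(p)
        log_post[g] = ll
    # Normalize in log space to avoid underflow at deep coverage
    m = max(log_post.values())
    unnorm = {g: math.exp(v - m) for g, v in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


def call_genotype(observations, priors=None):
    """Return the genotype with the highest posterior probability."""
    post = genotype_posteriors(observations, priors)
    return max(post, key=post.get)
```

A call such as `call_genotype([("A", 30)] * 5 + [("G", 30)] * 5)` would weigh the heterozygous genotype against both homozygous ones; passing a `priors` dictionary is where knowledge such as the reference allele could enter.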
Published Bioinformatics Tools
SOAP
- Short Oligonucleotide Alignment Program
• BGI-developed software package
• Website:

• >10,000 users
• SOAPsnp:
Ruiqiang Li, Yingrui Li, Xiaodong Fang, Huanming Yang, Jian Wang, Karsten Kristiansen, Jun Wang. SNP detection for massively parallel whole genome resequencing. Genome Research. 2009
• SOAP2:
Ruiqiang Li, Chang Yu, Yingrui Li, Tak-Wah Lam, Siu-Ming Yiu, Karsten Kristiansen, Jun Wang. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 2009
• SOAPindel, SOAPsv:
coming soon …
• SOAPdenovo:
Ruiqiang Li, Hongmei Zhu, Jue Ruan, et al. De novo assembly of the human genomes with massively parallel short read sequencing. Genome Research. 2009
The rapid revolution of DNA sequencing technology
Planned for commercialization this year
• PacBio (real-time single-molecule sequencing, long reads of 1-10 kb)
• visiGene, AB (real-time single-molecule sequencing)
• Ion Torrent (semiconductor chip that measures pH changes, quite low price)
Technology in early development (nanopore, <$100/genome)
• Several companies, including Illumina, IBM, etc.
a. Sanger sequencing method
b. Next-generation sequencing method
(2) SOAP2
- An improved version
Improvements:

• Uses a Burrows-Wheeler Transform (BWT) compressed index instead of the seed algorithm
• No read length limitation
• Allows more mismatches and longer gaps for long reads
• Supports various input and output file formats
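To show what a BWT index provides, here is a small sketch of BWT construction and exact-match backward search. It is a textbook illustration and not SOAP2's code: it assumes a naive suffix sort and uncompressed occurrence tables, which only suit short example sequences, and it handles exact matches only. The function names are hypothetical.

```python
def bwt_index(text):
    """Build the suffix array, C table and occurrence counts for text + '$'."""
    text += "$"  # sentinel, lexicographically smaller than A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    alphabet = sorted(set(text))
    # c_table[c]: number of characters in text strictly smaller than c
    c_table, total = {}, 0
    for c in alphabet:
        c_table[c] = total
        total += text.count(c)
    # occ[c][i]: occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, c_table, occ


def exact_match(pattern, sa, c_table, occ):
    """Return sorted reference positions where pattern occurs exactly."""
    lo, hi = 0, len(sa)  # half-open interval of suffix-array rows
    for ch in reversed(pattern):  # backward search, last character first
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

Each pattern character costs two table lookups regardless of reference length, which is why the search itself places no limit on read length.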



Output unpaired hits for structural variation (SV) detection

Benchmark

10M single-end Illumina/Solexa reads of length 32 bp aligned against a 5 Mb human genome region (refer to the SOAP paper for details).

Paired-end reads alignment

Align a pair of reads simultaneously

A pair is considered aligned when both reads map in the correct relative orientation and at a proper distance from each other
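The pairing condition can be sketched as a simple check on two candidate hits. This is an illustration under stated assumptions, not SOAP's code: it assumes the two reads should face each other (forward read upstream of the reverse read) and that the acceptable insert size range is supplied by the user. `Hit`, `is_proper_pair`, and `pair_hits` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    pos: int      # leftmost reference coordinate of the aligned read
    length: int   # aligned read length
    strand: str   # '+' or '-'


def is_proper_pair(a, b, min_insert, max_insert):
    """True if the hits face each other and the outer distance is in range."""
    if a.strand == b.strand:
        return False  # wrong orientation: both reads on the same strand
    fwd, rev = (a, b) if a.strand == "+" else (b, a)
    if fwd.pos > rev.pos:
        return False  # reads point away from each other
    insert = rev.pos + rev.length - fwd.pos  # outer distance of the pair
    return min_insert <= insert <= max_insert


def pair_hits(hits1, hits2, min_insert, max_insert):
    """Keep only the combinations of candidate hits that form a proper pair."""
    return [(h1, h2) for h1 in hits1 for h2 in hits2
            if is_proper_pair(h1, h2, min_insert, max_insert)]
```

Combinations rejected by this check are the kind of unpaired hits that can be reported separately for structural variation detection.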
Prior probability of each genotype
Recalibrate sequencing quality score
Calculate likelihood of each genotype
Infer genotype via Bayes’ theorem