
Genome Data Analysis


2 Second-generation sequencing analysis tools
3 Second-generation sequencing platform data
• Illumina HiSeq 2500 (Solexa)
Read length: 250 nt; format: fastq
• ABI SOLiD
Read length: 50 nt; format: csfasta
• Roche GS FLX (454)
Read length: 800~1000 nt; format: sff/fasta
• runAssembly -o outputdir (-large) 1.sff
• Result files
454AllContigs.fna
454LargeContigs.fna
454ReadStatus.txt (Assembled/Singleton/Repeat)
454Contigs.ace
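To get a quick overview of how many reads ended up Assembled, Singleton or Repeat, the status column of 454ReadStatus.txt can be tallied. The sketch below is a minimal illustration, not part of newbler: it assumes the file is tab-separated with one header line, the read accession in column 1 and the status in column 2. The function name `tally_read_status` and the sample rows are hypothetical.

```python
from collections import Counter


def tally_read_status(text: str, has_header: bool = True) -> Counter:
    """Count reads per status, assuming 'accession<TAB>status<TAB>...' rows."""
    rows = [ln for ln in text.splitlines() if ln.strip()]
    if has_header:
        rows = rows[1:]  # assumption: the first non-empty line is a header
    counts: Counter = Counter()
    for row in rows:
        cols = row.split("\t")
        if len(cols) < 2:
            raise ValueError(f"expected at least 2 tab-separated columns: {row!r}")
        counts[cols[1]] += 1
    return counts


# Hypothetical sample content, for illustration only.
SAMPLE = (
    "Accno\tRead Status\n"
    "r1\tAssembled\n"
    "r2\tSingleton\n"
    "r3\tAssembled\n"
    "r4\tRepeat\n"
)

if __name__ == "__main__":
    for status, n in tally_read_status(SAMPLE).most_common():
        print(status, n)
```

For a real run, read the file with `open("outputdir/454ReadStatus.txt").read()` and check the column layout first.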
• SOAPdenovo
/soapdenovo.html
• Velvet
/~zerbino/velvet/
• ABySS
http://www.bcgsc.ca/platform/bioinfo/software/abyss
Small RNA sequencing
2 Second-generation sequencing analysis tools
• More than 1000 analysis tools

/wiki/Software/list
• Routine analysis – calling, quality control, alignment/assembly, SNP/Indel discovery, SNP annotation
• Advanced analysis – functional polymorphism, disease/phenotype, genomic coordinate
* Linux, 64-bit CPU, 4 GB to 256 GB memory
5.3 Solexa data
• *.contig
Contigs file
• *.scafSeq
Scaffolds file
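A common first check on contig and scaffold files is the N50 length. The sketch below is a minimal illustration, assuming the *.contig and *.scafSeq files are FASTA-formatted text; the function names and the toy sequences are hypothetical. N50 is taken here as the length L such that sequences of length >= L cover at least half of the total assembled bases.

```python
from typing import List


def fasta_lengths(text: str) -> List[int]:
    """Return the length of every sequence in FASTA-formatted text."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first '>'
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may span several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Smallest length L such that sequences >= L hold half of all bases."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # not reached for non-empty input


# Toy input with lengths 6, 5, 4, 3 and 2 (the first record spans two lines).
TOY = ">c1\nACG\nTAC\n>c2\nACGTA\n>c3\nACGT\n>c4\nACG\n>c5\nAC\n"

if __name__ == "__main__":
    print(n50(fasta_lengths(TOY)))
```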
5.4 SOLiD data
• Reads correction – SOLiD Accuracy Enhancement Tool (SAET) /gf/project/saet/
• Index reference sequences – 2bwt-builder ref.fa
• Mapping s.fq> -D <ref.fa.index> -o <output>

Paired-end: soap -a <reads1.fq> -b <reads2.fq> -D <ref.fa.index> -o <PE_output> -2 <SE_output> -m <min_insert_size> -x <max_insert_size>
5.6 Gene and Genome Annotation
• De novo prediction
GENSCAN, Augustus
• Homology-based prediction
• Reference gene set
Thank you!
4.3 Solexa data: SOAP2
4.4 SOLiD data: BioScope
4.4 SOLiD data
4.5 454 data: newbler
• runMapping -o outputdir ref.fa 1.sff …
• 454ReadStatus.txt
• Scaffolding
• Gap filling
• Gene and genome annotation
5.1 Routine analysis workflow
5.2 De novo analysis tools
5.3 Solexa data
• Correction tool for SOAPdenovo
/
• Assembly – 1. SOLiD de novo Accessory Tools /gf/project/denovo/
2. Velvet /~zerbino/velvet/
5.5 454 data

short reads: Solexa
long reads: 3730, 454 reads
hybrid reads: short + long reads
• SNP/INDEL Calling
4.2 Routine analysis tools
4.3 Solexa data
• BWA
/

bwa sampe ref.fa aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam
4.3 Solexa data: SAM format
/wiki/SAM
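Each SAM alignment line has 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), followed by optional tags. The sketch below is a minimal illustration of splitting one line and decoding the bitwise FLAG field; the helper names and the sample record are hypothetical, and real pipelines would normally rely on SAMtools or a dedicated library instead.

```python
from typing import Dict, List

SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def parse_sam_line(line: str) -> Dict[str, object]:
    """Split one alignment line into its mandatory fields plus a TAGS list."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    rec: Dict[str, object] = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        rec[key] = int(cols[SAM_FIELDS.index(key)])
    rec["TAGS"] = cols[11:]
    return rec


def decode_flag(flag: int) -> List[str]:
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# Hypothetical record, for illustration only.
SAMPLE = "read1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII\tNM:i:0"

if __name__ == "__main__":
    rec = parse_sam_line(SAMPLE)
    print(rec["RNAME"], rec["POS"], decode_flag(rec["FLAG"]))
```

Header lines (starting with `@`) are not alignment records and should be skipped before calling `parse_sam_line`.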
4.3 Solexa data: SOAP2
4.6 SNP/INDEL Calling
• SAMtools
- /
- $ samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf
- $ bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf
- Output: the VCF format (Variant Call Format)
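To make the VCF layout and the `-D100` option more concrete, the sketch below reads VCF data lines (tab-separated CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...) and drops records whose INFO `DP` value exceeds a maximum read depth. It is only an illustration of that single criterion: vcfutils.pl varFilter applies further filters that are not reproduced here, and the function names and sample records are hypothetical.

```python
from typing import Dict, Iterable, List, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Turn 'DP=30;AF1=0.5;INDEL' into {'DP': '30', 'AF1': '0.5', 'INDEL': True}."""
    out: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True  # keys without '=' are flags
    return out


def filter_by_max_depth(lines: Iterable[str], max_depth: int = 100) -> List[str]:
    """Keep data lines whose INFO DP is absent or <= max_depth."""
    kept: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError("a VCF data line needs at least 8 columns")
        depth = parse_info(cols[7]).get("DP")
        if isinstance(depth, str) and int(depth) > max_depth:
            continue
        kept.append(line)
    return kept


# Hypothetical records, for illustration only.
SAMPLE = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t50\t.\tDP=30;AF1=0.5",
    "chr1\t2000\t.\tC\tT\t99\t.\tDP=250;AF1=1",
    "chr2\t3000\t.\tG\tGA\t40\t.\tINDEL;DP=12",
]

if __name__ == "__main__":
    for record in filter_by_max_depth(SAMPLE, max_depth=100):
        print(record)
```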
4.1 Routine analysis workflow
• Read correction
• Assembly
4.6 SNP/INDEL Calling
• GATK: Genome Analysis Toolkit
– /gatk/
5 Routine de novo analysis
5.1 Routine analysis workflow
• Read correction
• Assembly

short reads: Solexa
long reads: 3730, 454 reads
hybrid reads: short + long reads
3.1 Solexa – fastq format
/wiki/FASTQ_format
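A fastq record occupies four lines: an `@` header, the sequence, a `+` separator and a quality string with one character per base. The sketch below is a minimal illustration of reading such records and converting quality characters to Phred scores. It assumes the Phred+33 offset; older Solexa/Illumina pipelines used an offset of 64, so the `offset` parameter should be checked against the actual data. Function names and the sample read are hypothetical.

```python
from typing import Iterator, List, Tuple


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record number {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Convert a quality string to integer Phred scores (offset is an assumption)."""
    return [ord(ch) - offset for ch in qual]


# Hypothetical read, for illustration only.
SAMPLE = "@read1\nACGT\n+\nII5!\n"

if __name__ == "__main__":
    for read_id, seq, qual in parse_fastq(SAMPLE):
        print(read_id, seq, phred_scores(qual))
```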
3.2 SOLiD – csfasta format
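A csfasta read starts with a known primer base followed by a string of colors (0 to 3), where each color encodes the transition between two adjacent bases. The sketch below is a minimal illustration of translating such a read into base space using the standard two-base encoding (with A=0, C=1, G=2, T=3, the color is the XOR of neighboring bases). The function name is hypothetical. Note that one wrong color corrupts every base after it, which is why mapping is normally done in color space rather than on translated reads.

```python
BASES = "ACGT"  # index of each base doubles as its 2-bit code


def decode_colorspace(read: str) -> str:
    """Translate e.g. 'T0123' to bases; the leading primer base is not returned."""
    if len(read) < 2 or read[0] not in BASES:
        raise ValueError("expected a primer base followed by color calls")
    prev = BASES.index(read[0])
    out = []
    for i, color in enumerate(read[1:]):
        if color not in "0123":
            # A missing call ('.') makes all later bases undeterminable.
            out.append("N" * (len(read) - 1 - i))
            break
        prev ^= int(color)  # next base = previous base XOR color
        out.append(BASES[prev])
    return "".join(out)


if __name__ == "__main__":
    print(decode_colorspace("T0123"))
```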
3.3 fasta format
4 Routine genome analysis
Genome: whole-genome / exome sequencing; targeted deep sequencing; de novo sequencing
Transcriptome: mRNA sequencing
Analyses: SNP
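Data analysis in second-generation sequencing (genome)
1 Types of second-generation sequencing analysis
Genome: whole-genome / exome sequencing; targeted deep sequencing; de novo sequencing
Transcriptome: mRNA sequencing
Analyses: SNP; small InDel; SNP annotation; genome assembly; gene expression; annotation and target prediction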

bwa index algorithm (-a) by reference size:
is: < 2 Gb
bwtsw: > 2 Gb

• Mapping – bwa aln ref.fa short_read.fq > aln_sa.sai
• Output alignments in the SAM format – bwa samse ref.fa aln_sa.sai short_read.fq > aln.sam
• SAMtools
/
• SOAP2
/
• SOAPsnp
/soapsnp.html
4.3 Solexa data: BWA
• Index reference sequences – bwa index -a is/bwtsw ref.fa
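The BWA steps listed in these slides (index, aln, then samse or sampe) can be chained as follows. The sketch below only builds the command lines as strings and does not run BWA. The function name, the default output file names and the exact 2 Gb threshold (taken here as 2,000,000,000 bases) are assumptions for illustration; the subcommands and the `-a is/bwtsw` option are the ones shown in the slides.

```python
from typing import List, Optional

TWO_GB = 2_000_000_000  # assumption: "2 Gb" read as 2e9 bases


def bwa_commands(ref: str, genome_size: int, read1: str,
                 read2: Optional[str] = None) -> List[str]:
    """Return shell command lines for index -> aln -> samse/sampe."""
    algo = "bwtsw" if genome_size > TWO_GB else "is"
    cmds = [f"bwa index -a {algo} {ref}"]
    if read2 is None:
        # Single-end: one aln step, then samse.
        cmds.append(f"bwa aln {ref} {read1} > aln_sa.sai")
        cmds.append(f"bwa samse {ref} aln_sa.sai {read1} > aln.sam")
    else:
        # Paired-end: align each mate separately, then combine with sampe.
        cmds.append(f"bwa aln {ref} {read1} > aln_sa1.sai")
        cmds.append(f"bwa aln {ref} {read2} > aln_sa2.sai")
        cmds.append(
            f"bwa sampe {ref} aln_sa1.sai aln_sa2.sai {read1} {read2} > aln.sam"
        )
    return cmds


if __name__ == "__main__":
    # Hypothetical 4.6 Mb bacterial reference with paired-end reads.
    for cmd in bwa_commands("ref.fa", 4_600_000, "read1.fq", "read2.fq"):
        print(cmd)
```

Printing the commands first makes it easy to review them before pasting into a shell or a job script.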