
Introduction to Using BLAST on the NCBI Website

Human genome statistics
: NCBI’s tool
The scientific method lets us study data we do not understand, by means of comparison.
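BLAST and Molecular Evolution
• BLAST screening: first find similar sequences, then work out the relationships among them
• Divergence times shown in the figure: 3000 Myr, 1000 Myr, 540 Myr
• Homologs shown: MLH1, MutL
• Organisms shown: Human, Fly, Worm, Yeast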
DNA Polymerase Replication
(Figure: chemical structures of the nucleotides)
Traditional molecular techniques will inevitably give way to bioinformatics techniques built around BLAST.
Sanger’s ddNTP Sequencing
What does this sequence mean?
• To identify and annotate sequences
• To evaluate evolutionary relationships
• Other:
  – model genomic structure (e.g., Spidey)
  – check primer specificity in silico
Nucleotide Words
Query: GTACTGGACATGGACCCTACAGGAA
Make a lookup table of words (11-mers):
GTACTGGACAT
TACTGGACATG
ACTGGACATGG
CTGGACATGGA
TGGACATGGAC
GGACATGGACC
GACATGGACCC
ACATGGACCCT
...
WORD SIZE    default   minimum
blastn       11        7
megablast    28        8
Protein Words
Query: GTQITVEDLFYNIATRRKALKN
Word size = 3 (default)
GTQ
TQI
Global vs Local Alignment
Seq1: WHEREISWALTERNOW (16aa)
Seq2: HEWASHEREBUTNOWISHERE (21aa)
Global
Seq1: 1  W--HEREISWALTERNOW 16
         W  HERE
Seq2: 1  HEWASHEREBUTNOWISHERE
Local
Seq1: 1  W--HERE 5
         W  HERE
Seq2: 3  WASHERE 9
Seq1: 1  W--HERE 5
         W  HERE
Seq2: 15 WISHERE 21
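The local alignments above can be reproduced in spirit with a small Smith-Waterman sketch. This is the exact dynamic-programming method that BLAST approximates heuristically, not BLAST itself. The function name and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration; real searches use substitution matrices and affine gap costs.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, top, bottom = smith_waterman("WHEREISWALTERNOW",
                                        "HEWASHEREBUTNOWISHERE")
    print(score)
    print(top)
    print(bottom)
```

Which of the equally good local alignments is reported depends on the scoring values and on how ties are broken, so the printed result need not match the slide character for character.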
The Flavors of BLAST
• Standard BLAST
  – traditional “contiguous” word hit
  – position independent scoring
  – nucleotide, protein and translations (blastn, blastp, blastx, tblastn, tblastx)
BLAST
Basic Local Alignment Search Tool
Lushan Wang 2010.11.24
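Ways to Obtain Biological Information
• 1. Retrieve data starting from biological information (text): Entrez
• 2. Retrieve related information starting from a sequence: BLAST
• In the bioinformatics era, BLAST plays the role that PCR played in the molecular biology era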
– DNA vs DNA: blastn
– DNA translation vs Protein: blastx
– Protein vs Protein: blastp
– Protein vs DNA translation: tblastn
– DNA translation vs DNA translation: tblastx
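The five combinations above amount to a small lookup table. A minimal sketch, assuming hypothetical names (`BLAST_PROGRAMS`, `choose_program`) and the type labels "dna" and "protein":

```python
# (query type, database type, compared as translations?) -> program name
BLAST_PROGRAMS = {
    ("dna", "dna", False): "blastn",
    ("dna", "protein", True): "blastx",
    ("protein", "protein", False): "blastp",
    ("protein", "dna", True): "tblastn",
    ("dna", "dna", True): "tblastx",
}


def choose_program(query_type, db_type, translated=False):
    """Pick the BLAST program for a query/database pair.

    Set translated=True to compare two DNA sets at the protein level (tblastx).
    """
    if query_type != db_type:
        # A DNA/protein comparison always goes through translation.
        translated = True
    try:
        return BLAST_PROGRAMS[(query_type, db_type, translated)]
    except KeyError:
        raise ValueError(
            f"no BLAST program for {query_type} vs {db_type} "
            f"(translated={translated})"
        ) from None


if __name__ == "__main__":
    print(choose_program("dna", "protein"))                # blastx
    print(choose_program("dna", "dna", translated=True))   # tblastx
```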
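Traditional molecular biology vs. modern bioinformatics (figure)
• Traditional molecular biology route: target gene, restriction digestion, recombinant gene, transformation of host bacteria, then protein isolation, purification, and characterization. This takes weeks.
• Modern bioinformatics route: target gene, BLAST against a bioinformatics database, gene family or protein family, then function annotation. This takes minutes.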
• RPS BLAST
  – searches a database of PSSMs
  – tool for conserved domain searches
Basic Local Alignment Search Tool
• Widely used similarity search tool
• Heuristic approach based on the Smith-Waterman algorithm
• Finds best local alignments
• Provides statistical significance
• All combinations (DNA/Protein) of query and database
Basic Local Alignment Search Tool
• Why use sequence similarity?
• BLAST algorithm
• BLAST statistics
• BLAST output
• Examples
Why Do We Need Sequence Similarity Searching?
Program    Query                         Database
blastx     N (translated, 6 frames)      P
tblastn    P                             N (translated, 6 frames)
tblastx    N (translated, 6 frames)      N (translated, 6 frames)
(N = nucleotide, P = protein)
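The translated programs compare protein translations of a nucleotide sequence in all six reading frames (three per strand). A minimal sketch of six-frame translation, assuming the standard genetic code; the function names are illustrative, and codons containing anything other than A, C, G, T are rendered as "X".

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return {frame: protein} for frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frames("GTACTGGACATGGACCCTACAGGAA").items()):
        print(f"{frame:+d} {protein}")
```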
How BLAST Works
• Make lookup table of “words” for query
• Scan database for hits
• Ungapped extensions of hits (initial HSPs)
• Gapped extensions (no traceback)
• Gapped extensions (traceback; alignment details)
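The first three steps (word lookup table, database scan, ungapped extension) can be sketched for nucleotide sequences as follows. This is a simplified illustration, not NCBI's implementation: the function names, the scoring values (match +1, mismatch -3), and the X-drop threshold are assumptions, and the gapped stages are left out.

```python
from collections import defaultdict


def build_word_index(query, word_size=11):
    """Map every word of length word_size in the query to its start positions."""
    index = defaultdict(list)
    for pos in range(len(query) - word_size + 1):
        index[query[pos:pos + word_size]].append(pos)
    return index


def find_hits(index, subject, word_size=11):
    """Scan the subject and return (query_pos, subject_pos) for exact word hits."""
    hits = []
    for s_pos in range(len(subject) - word_size + 1):
        for q_pos in index.get(subject[s_pos:s_pos + word_size], ()):
            hits.append((q_pos, s_pos))
    return hits


def _extend(query, subject, q, s, step, match, mismatch, x_drop):
    """Walk in one direction; return (best score gain, positions kept)."""
    gain = best_gain = 0
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        gain += match if query[q] == subject[s] else mismatch
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain > x_drop:
            break  # score fell too far below the best seen: stop extending
        q += step
        s += step
    return best_gain, best_len


def extend_hit(query, subject, q_pos, s_pos, word_size=11,
               match=1, mismatch=-3, x_drop=5):
    """Ungapped extension of a word hit; return (score, query_start, query_end)."""
    right_gain, right_len = _extend(query, subject, q_pos + word_size,
                                    s_pos + word_size, 1, match, mismatch, x_drop)
    left_gain, left_len = _extend(query, subject, q_pos - 1, s_pos - 1,
                                  -1, match, mismatch, x_drop)
    score = word_size * match + right_gain + left_gain
    return score, q_pos - left_len, q_pos + word_size + right_len


if __name__ == "__main__":
    query = "GTACTGGACATGGACCCTACAGGAA"
    subject = "CCCC" + query + "TTTT"   # toy "database" sequence
    index = build_word_index(query)
    for q_pos, s_pos in find_hits(index, subject)[:3]:
        print(q_pos, s_pos, extend_hit(query, subject, q_pos, s_pos))
```

Neighbouring hits on the same diagonal extend to the same segment, so a real implementation would merge them before the gapped stages.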
Bacteria
Pancreatic carcinoma
Alzheimer’s Disease
Ataxia telangiectasia
Colon cancer
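How do we find the similarity between sequences?
Global vs Local Alignment
• Global alignment: Seq 1 and Seq 2 are aligned along their entire lengths
• Local alignment: only the most similar regions of Seq 1 and Seq 2 are aligned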
Web Access
• Text (e.g., Wang LS, Gao PJ, cellulase, et al.): Entrez
• Sequence: BLAST
• Structure: VAST

Enter sequences here
How can a computer read data that we cannot read?
TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
• www, standalone, and network clients
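For the standalone route, a search can be scripted by assembling a command line. The sketch below assumes the NCBI BLAST+ command-line programs are installed and that a formatted database is available locally. The helper name, the file name `query.fa`, and the database name `nt` are placeholders. The function only builds the argument list, so nothing runs until it is passed to `subprocess.run`.

```python
import subprocess


def build_blast_command(program, query_fasta, database,
                        word_size=None, evalue=10.0, out_path=None):
    """Assemble an argument list for a BLAST+ program such as blastn or blastp."""
    cmd = [program,
           "-query", query_fasta,     # FASTA file holding the query sequence(s)
           "-db", database,           # name of a formatted BLAST database
           "-evalue", str(evalue),    # expectation value threshold
           "-outfmt", "6"]            # tabular output
    if word_size is not None:
        cmd += ["-word_size", str(word_size)]
    if out_path is not None:
        cmd += ["-out", out_path]
    return cmd


def run_blast(cmd):
    """Run the command and return its standard output (requires BLAST+ installed)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Placeholder file and database names; printed rather than executed.
    print(" ".join(build_blast_command("blastn", "query.fa", "nt", word_size=11)))
```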
Translated BLAST (Nucleotide / Protein)
Particularly useful for nucleotide sequences without protein annotations, such as ESTs or genomic DNA
• Megablast
  – optimized for large batch searches
  – can use discontiguous words
• PSI-BLAST
  – constructs PSSMs automatically; uses as query
  – very sensitive protein search
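A PSSM (position-specific scoring matrix) assigns each residue a different score at each alignment column. The toy sketch below builds one from equal-length, already aligned sequences, using simple pseudocounts and a uniform background. These choices and the function names are assumptions for illustration; PSI-BLAST itself uses sequence weighting and real background frequencies.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Return a list of {residue: log2-odds score}, one dict per column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(alphabet)          # uniform background (assumption)
    pssm = []
    for col in range(length):
        column = [seq[col] for seq in aligned]
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in alphabet
        })
    return pssm


def score_segment(pssm, segment):
    """Sum the position-specific scores of a segment as long as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


if __name__ == "__main__":
    pssm = build_pssm(["GTQ", "GTQ", "GSQ"])
    for candidate in ("GTQ", "GSQ", "AAA"):
        print(candidate, round(score_segment(pssm, candidate), 2))
```

Residues seen often in a column score above zero there and unseen residues score below zero, which is what makes a profile search more sensitive than scoring every position with one fixed matrix.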
AACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAAACCCTAAACCCTAACCCTAACCCTAACC ACCCTAACCCCAACCCCAACCCCAACCCCAACCCCAACCCCAACCCTAACCCCTAACCCTAACCCTA CTACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCCTAACCCTAACCCTAACC ACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTCGCGGTACCCTCAGCCGGC CCCGCCCGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAGAGTACCACCGAAATCTGTGCAGAG AACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGGCGCAG CAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGACACATGCTAGCGCGTCGGGGTGGAGGCGTGGCGC CGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGACACATGCTACCGCGTCCAGGGGTGGAGGCG CGCAGGCGCAGAGAGGCGCACCGCGCCGGCGCAGGCGCAGAGACACATGCTAGCGCGTCCAGGGGTG GCGTGGCGCAGGCGCAGAGACGCAAGCCTACGGGCGGGGGTTGGGGGGGCGTGTGTTGCAGGAGCAA CGCACGGCGCCGGGCTGGGGCGGGGGGAGGGTGGCGCCGTGCACGCGCAGAAACTCACGTCACGGTG CGGCGCAGAGACGGGTAGAACCTCAGTAATCCGAAAAGCCGGGATCGACCGCCCCTTGCTTGCAGCC CACTACAGGACCCGCTTGCTCACGGTGCTGTGCCAGGGCGCCCCCTGCTGGCGACTAGGGCAACTGC GCTCTCTTGCTTAGAGTGGTGGCCAGCGCCCCCTGCTGGCGCCGGGGCACTGCAGGGCCCTCTTGCT TGTATAGTGGTGGCACGCCGCCTGCTGGCAGCTAGGGACATTGCAGGGTCCTCTTGCTCAAGGTGTA GCAGCACGCCCACCTGCTGGCAGCTGGGGACACTGCCGGGCCCTCTTGCTCCAACAGTACTGGCGGA TAGGGAAACACCCGGAGCATATGCTGTTTGGTCTCAGTAGACTCCTAAATATGGGATTCCTGGGTTT AGTAAAAAATAAATATGTTTAATTTGTGAACTGATTACCATCAGAATTGTACTGTTCTGTATCCCAC CAATGTCTAGGAATGCCTGTTTCTCCACAAAGTGTTTACTTTTGGATTTTTGCCAGTCTAACAGGTG CCCTGGAGATTCTTATTAGTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTG ATTTTGCTGCATGGCCGGTGTTGAGAATGACTGCGCAAATTTGCCGGATTTCCTTTGCTGTTCCTGC TAGTTTAAACGAGATTGCCAGCACCGGGTATCATTCACCATTTTTCTTTTCGTTAACTTGCCGTCAG