
Bioinformatics lecture slides, L6

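– Broad sense: any of the representations of functional sequence regions listed below. A consensus sequence is a kind of sequence model: it uses the observed frequencies of sequence characters to predict the frequencies of characters that have not yet been observed. Because the model allows partial matches in the query sequence, it is often used to find distant members of a sequence family, which increases the sensitivity of the analysis.
– Narrow sense: a single string with the most likely sequence (+/- wildcards)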
• Evaluating the correctness of a representation
• Main storage databases for motifs/domains and their representations
• How to detect/summarize motifs/domains in sequences
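The narrow-sense consensus sequence defined above (one string holding the most likely residue per position, with wildcards where no residue dominates) can be sketched as follows. This is a minimal illustration, assuming the input sequences are already aligned to equal length; the function name `consensus`, the `threshold` parameter, and the `x` wildcard are choices made for this example, not part of the lecture.

```python
from collections import Counter


def consensus(aligned, threshold=0.5, wildcard="x"):
    """Return a consensus string for equal-length, pre-aligned sequences.

    A column contributes its most frequent residue when that residue reaches
    the given fraction of the column; otherwise it contributes the wildcard.
    (The threshold value and the wildcard character are assumptions.)
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")

    result = []
    for column in zip(*aligned):  # iterate over alignment columns
        residue, count = Counter(column).most_common(1)[0]
        result.append(residue if count / len(column) >= threshold else wildcard)
    return "".join(result)


print(consensus(["ACGT", "ACGA", "ACCT", "TCGT"]))        # ACGT
print(consensus(["ACGT", "AGGA", "ATCC"], threshold=0.6))  # AxGx
```

Raising the threshold produces more wildcards, which trades specificity for tolerance of variable positions.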
Previous lecture
• Tips for BLAST searches
– Evaluate the significance of your results
• View the E-value, bit score, and coverage region
• A reciprocal BLAST is sometimes needed
/cgi/content/full/OC_sigtrans; 2001/113/re22
Question 2
How can these “functional regions” (motifs/domains) be represented in sequences?
Outline for this lecture
• Advanced BLAST
– MegaBLAST is suited to alignments of about 95% identity within the same or closely related species.
– PSI-BLAST is used to retrieve more target sequences and lets the user select sequences to build the PSSM for the next PSI-BLAST iteration.
– PHI-BLAST limits alignments to those that match a pattern in the query.
• Main storage databases for motifs/domains and their representations
• How to detect/summarize motifs/domains in sequences
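The idea behind the PHI-BLAST bullet above, restricting hits to sequences that contain a given pattern, can be illustrated with a small pattern matcher. The sketch below assumes a PROSITE-style pattern syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats); the pattern `C-x(2,4)-C-{P}-[ST]` and the helper names are hypothetical and chosen only for illustration. It shows pattern matching alone, not the BLAST search itself.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = pattern.rstrip(".")          # PROSITE patterns may end with a period
    regex = regex.replace("-", "")       # element separators carry no meaning
    regex = regex.replace("{", "[^").replace("}", "]")  # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")   # repeat counts
    regex = regex.replace("x", ".")      # any residue
    regex = regex.replace("<", "^").replace(">", "$")   # terminus anchors
    return regex


def find_pattern(pattern, sequence):
    """Return every non-overlapping substring of sequence matching the pattern."""
    return [m.group() for m in re.finditer(prosite_to_regex(pattern), sequence)]


print(prosite_to_regex("C-x(2,4)-C-{P}-[ST]"))          # C.{2,4}C[^P][ST]
print(find_pattern("C-x(2,4)-C-{P}-[ST]", "AACGGCAS"))  # ['CGGCAS']
```

The string-replacement approach is deliberately simple; it relies on residues being upper case and `x` being the only lower-case symbol in the pattern.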
Four representations
• Consensus sequence
Generally we call them sequence motifs or sequence domains.
Outline for this lecture
• Terms for functional regions in sequences
• Representations of functional regions (motifs/domains) in sequences
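– a list of the nucleotide/amino acid frequencies, probability values, or weights at each position
A profile gives, for every position of a characteristic sequence region, the frequency, probability value, or weight of each nucleotide or amino acid residue.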
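A profile of this kind can be built directly from an ungapped alignment. The sketch below is one possible construction, not the lecture's own: it assumes DNA sequences of equal length without gaps, adds a pseudocount of 1 per residue, and converts frequencies to log-odds weights against a uniform background of 0.25. All function names and parameter values are illustrative assumptions.

```python
import math
from collections import Counter


def position_frequencies(aligned, alphabet="ACGT", pseudocount=1.0):
    """Per-position residue frequencies (with pseudocounts) for an ungapped alignment."""
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        profile.append({r: (counts[r] + pseudocount) / total for r in alphabet})
    return profile


def log_odds(profile, background=0.25):
    """Convert frequencies to log2-odds weights against a uniform background."""
    return [{r: math.log2(p / background) for r, p in col.items()} for col in profile]


def score(weights, segment):
    """Sum the position-specific weights of a segment as long as the profile."""
    return sum(col[residue] for col, residue in zip(weights, segment))


profile = position_frequencies(["ACGT", "ACGA", "ACCT"])
weights = log_odds(profile)
# A segment resembling the alignment scores higher than an unrelated one.
print(score(weights, "ACGT") > score(weights, "TTTT"))  # True
```

The pseudocount keeps unobserved residues from receiving a weight of minus infinity, which is what lets the profile score partial matches.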
• Hidden Markov Model (HMM)
• Main software for MSA
– ClustalW/X, T-Coffee, MUSCLE…
Lect 6: Discovery, summarization, and prediction of characteristic sequences
凌毅 bioinfo_cau@
Pre-Question
• What is a protein family?
• How are protein families detected?
• Let us find the answer in Wikipedia, the free encyclopedia, using a Google search
– Use the Request ID to recall a recent BLAST result within 24 h
– View the different reports of the BLAST result, such as the taxonomy report, search summary, etc.
– Adjust the BLAST strategy if the results are more/less…
• Sharing common motifs does not by itself mean that sequences have a common ancestor (that is, that they are homologs).
Term
domain
• Also a conserved sequence region, defined as an independent functional or structural unit. The combination of domains in a single protein determines its overall function.
• Can consist of 40-700 residues; average length: 100 residues
Functional regions in sequences
In this lecture, “functional region” is defined broadly: it refers to subsequences of larger sequences that share some common functionality.
Previous lecture II
• Multiple Sequence Alignment
– Definition: three or more sequences are partially or completely aligned. Residues in the same column are taken to be homologous, which carries evolutionary and structural meaning.
– Properties: protein sequences are usually used for MSA; the result is not necessarily a “correct” alignment of the protein family.
– Features: conserved positions are presumably important for maintaining structure/function; conserved regions such as hydrophobicity/hydrophilicity motifs or helix/sheet motifs; consistent patterns of insertions or deletions (indels).
– Uses of MSA: more sensitive in detecting homologs than pairwise alignment; can find functionally/structurally conserved residues or regions in the sequences, …
– Methods to do MSA
  • Exhaustive or heuristic algorithms
    – dynamic programming, progressive, iterative, consistency-based, structure-based
  • Three stages of constructing an MSA using progressive methods
    – global pairwise alignment, building a guide tree, progressively aligning the sequences in the order given by the guide tree
• Terms for functional regions in sequences
• Representations of functional regions (motifs/domains) in sequences
• Evaluating the correctness of a representation
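The first stage of progressive MSA named above, global pairwise alignment, can be sketched with a score-only dynamic-programming recurrence in the style of Needleman-Wunsch. This is a simplified illustration: the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for readability, whereas real tools use substitution matrices and affine gap penalties.

```python
from itertools import combinations


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b under a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# All-against-all scores: the kind of input from which a guide tree is built.
sequences = {"s1": "ACGT", "s2": "AGT", "s3": "ACGT"}
for (name_a, seq_a), (name_b, seq_b) in combinations(sequences.items(), 2):
    print(name_a, name_b, global_alignment_score(seq_a, seq_b))
```

In a full progressive method these pairwise scores would be converted to distances, clustered into a guide tree, and the sequences then aligned in the order the tree dictates.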
Term
motif
• A short conserved region in a DNA or protein sequence that is associated with a distinct function.
• Average length: 10-20, or even shorter
• In proteins, a motif refers to a highly conserved part of a domain, but a domain may or may not include motifs within its boundaries.
Question 1
Which parts of a DNA/protein sequence could form a “functional region”?
Motifs in DNA/RNA
Motifs/domains in protein