Bioinformatics Review

Exam Questions -- Textbook Exercises -- Concept Questions

1. EST 【Expressed Sequence Tag: a randomly selected, partial cDNA sequence that represents its corresponding mRNA. dbEST is a large database of ESTs at GenBank, NCBI.】

2. STS 【Sequence Tagged Site: a short DNA sequence (200 to 500 bp) from a region that has been physically mapped. STSs provide unique landmarks, or identifiers, throughout the genome and are useful as a framework for further sequencing.】

3. Sequence Alignment 【The process of lining up two or more sequences (DNA, RNA or amino acid) to achieve maximal levels of identity (and conservation, in the case of amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology.】

4. Sequence similarity 【A term used in sequence alignment for the proportion of identical DNA bases or amino acid residues between the query sequence and the target sequence.】
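The proportion described in item 4 can be computed directly from two aligned sequences. The following is a minimal sketch, not a standard library routine: the function name `percent_identity`, the `-` gap symbol, and the choice to divide by the number of columns that are not gap-against-gap are all assumptions made for illustration. Other tools may divide by the shorter sequence length or by the full alignment length instead.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return the percentage of alignment columns holding identical residues.

    Assumption: both inputs are rows of the same pairwise alignment,
    so they have equal length and use `gap` for inserted gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns that contain at least one residue
    identical = 0  # columns where both residues are the same
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # a gap-against-gap column carries no information
        compared += 1
        if x == y:
            identical += 1

    if compared == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * identical / compared


if __name__ == "__main__":
    print(percent_identity("ACGT-A", "ACCTGA"))
```

A column in which a residue faces a gap counts as compared but not identical, so gaps lower the reported value.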

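5. Homologous sequences 【Different sequences that arose from a common ancestor through divergent evolution.】

6. Algorithm 【A systematic procedure for solving a problem in a finite number of steps, typically involving a repetition of operations. Once specified, an algorithm can be written in a computer language and run as a program.】

7. Sequence similarity search 【The process of aligning a query sequence against every sequence in a database to retrieve the sequences most similar to it. It is the fastest way to obtain a large amount of useful reference information about the query sequence, which greatly helps further analysis of its structure and function.】

8. Sequence homology analysis 【Adding the sequence under study to a set of sequences that are homologous to it but come from different species, then comparing them all in a multiple sequence comparison to determine how closely the sequence is related to the others.】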

9. Orthologs 【Homologous sequences in different species that arose from a common ancestral gene during speciation; they may or may not be responsible for a similar function.】

10. Paralogs 【Homologous sequences within a single species that arose by gene duplication.】

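11. A position-specific scoring matrix (PSSM) is defined as a table that contains probability information of amino acids or nucleotides at each position of an ungapped multiple sequence alignment.

12. A profile is a PSSM with penalty information regarding insertions and deletions for a sequence family.

13. Nucleic acid sequence prediction 【The process of using computational methods (computer programs) to find the positions and structures of genes and their expression-regulating elements in genomic sequence; it includes gene prediction and prediction of regulatory elements.】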
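The table described in item 11 can be sketched as one probability dictionary per alignment column. This is an illustrative sketch only: the function name `build_pssm`, the DNA alphabet default, and the optional `pseudocount` parameter are assumptions, and real PSSMs usually store log-odds scores against background frequencies rather than raw probabilities.

```python
from collections import Counter


def build_pssm(alignment, alphabet="ACGT", pseudocount=0.0):
    """Return a list with one {symbol: probability} dict per column.

    Assumptions: `alignment` is an ungapped multiple alignment given as
    equal-length strings, and every symbol belongs to `alphabet`.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")

    # Denominator: number of sequences plus the pseudocounts added per symbol.
    total = len(alignment) + pseudocount * len(alphabet)
    pssm = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        column = {s: (counts.get(s, 0) + pseudocount) / total for s in alphabet}
        pssm.append(column)
    return pssm


if __name__ == "__main__":
    for position, column in enumerate(build_pssm(["ACG", "ACG", "AGG", "TCG"])):
        print(position, column)
```

A non-zero `pseudocount` keeps unseen symbols from getting probability zero, which matters when the alignment contains only a few sequences.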

14. ORF 【An open reading frame (ORF) is a protein-coding sequence with no internal stop codon.】
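The ORF definition above can be turned into a simple scan. The sketch below makes several assumptions that the definition does not state: it searches only the forward strand, requires an ATG start codon, uses the standard stop codons TAA, TAG and TGA, and reports each ORF including its stop codon. The name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Return (start, end, sequence) for each ATG...stop ORF on the forward strand.

    `start` and `end` are 0-based, end-exclusive coordinates; the reported
    sequence includes the stop codon. `min_codons` counts the codons
    before the stop codon, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # close the ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    print(find_orfs("GGATGCCCTGATT"))
```

Between the reported start and the final stop there is, by construction, no other in-frame stop codon, which is the property the definition requires.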

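15. Motif 【A motif is a short conserved sequence pattern associated with distinct functions of a protein or DNA. It is often associated with a distinct structural site performing a particular function. A typical motif, such as a Zn-finger motif, is ten to twenty amino acids long.】

16. Domain 【A domain is also a conserved sequence pattern, defined as an independent functional and structural unit. Domains are normally longer than motifs. A domain consists of more than 40 residues and up to 700 residues, with an average length of 100 residues.】

17. Homology Modeling 【If two protein sequences show 25% identity in an alignment spanning more than 80 residues, the two proteins have similar structures; this is the theoretical basis of homology modeling. If, for a sequence of unknown structure (usually called the target sequence), one or more proteins meeting this condition can be found in a library of known structures, the known structure can be used as the structure of the target sequence. The known protein structure that is used is usually called the template structure.】

18. Fold Recognition 【A fold is a structural class of proteins: proteins with similar secondary-structure composition, number, and arrangement are assigned to the same fold class. Protein sequences within one fold class are not necessarily very similar, but they share similar structural features. According to theoretical analysis, the total number of fold classes in nature is fewer than 1000. This knowledge can therefore be used to predict a protein's folded structure, which is the fold recognition method.】

19. GSS (Genome Survey Sequences) 【This division of DDBJ/EMBL/GenBank is very similar to the EST division, except that the sequences are genomic in origin rather than cDNA (mRNA). The GSS division contains (but is not limited to) the following types of data: random genomic sequence fragments, cosmid/BAC/YAC end sequences (these may, but need not, be associated with a chromosome), exon-trapped genomic sequences, and Alu PCR sequences.】

20. HTGS/HTG (High-Throughput Genome Sequences) 【HTG is the HTGS division of DDBJ/EMBL/GenBank. Many sequencing centers around the world are carrying out large-scale sequencing of the human genome and other higher eukaryotic genomes. It is generally considered better to keep the intermediate results of this work in a separate division of the database, because these unfinished records usually contain many gaps, are less accurate, lack annotation, and do not yet meet the standard required of DDBJ/EMBL/GenBank records.】

21. Molecular clock 【A hypothesis that nucleotide or amino acid sequences undergo substitution at a roughly constant rate during evolution. Given a calibration time and the clock rate, the divergence between sequences can then be used to calculate when the molecular changes occurred.】

22. DNA physical map 【The order of the restriction fragments of a DNA strand, that is, the positions of the restriction fragments along the DNA.】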
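The calculation mentioned in the molecular clock entry (item 21) can be sketched in two steps: calibrate a rate from a pair of sequences whose divergence time is known, then apply that rate to another pair. The function names and the numbers in the demo are made up for illustration. The factor of 2 is an assumption that both lineages accumulate substitutions independently since their split; the source does not give a formula.

```python
def calibrate_rate(distance, known_time):
    """Substitutions per site per year, from a pair with a known split time.

    `distance` is the observed divergence in substitutions per site between
    the two sequences; `known_time` is the time since their common ancestor.
    """
    if distance < 0 or known_time <= 0:
        raise ValueError("distance must be >= 0 and known_time must be > 0")
    # Both lineages have been diverging for `known_time`, hence the factor 2.
    return distance / (2.0 * known_time)


def estimate_divergence_time(distance, rate):
    """Time since the common ancestor, assuming a constant `rate`."""
    if distance < 0 or rate <= 0:
        raise ValueError("distance must be >= 0 and rate must be > 0")
    return distance / (2.0 * rate)


if __name__ == "__main__":
    # Hypothetical calibration: 0.10 substitutions/site over 50 million years.
    rate = calibrate_rate(0.10, 50e6)
    print(estimate_divergence_time(0.04, rate))
```

The estimate is only as good as the constant-rate assumption, which is why the entry calls the molecular clock a hypothesis.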

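Exam Questions -- Textbook Exercises -- Fill-in-the-Blank and True/False Questions

1. The three authoritative nucleotide sequence databases are (GenBank) at the US National Center for Biotechnology Information, (the EMBL Nucleotide Sequence Database, or EMBL-Bank) at the European Bioinformatics Institute, and (DDBJ) at the National Institute of Genetics in Japan.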

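2. The (DEFINITION line) of a GenBank record summarizes the biological meaning of the record, including the source organism and the gene/protein name. For a non-coding region it gives a brief description of the sequence's function; for a coding region it states whether the sequence is a partial cds or a complete cds.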

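3. The accession number is the primary key for retrieving a record from the database. It stays with the sequence: when the sequence is updated, for example to correct one nucleotide, the accession number (does not) change. The version identifier has the format (accession.version) and identifies one single, specific nucleotide sequence in the database. When the sequence is updated, the version number is (incremented), running in parallel with the GI number that follows it. Each protein translation of a nucleotide sequence also carries a GI number, and any change to the translated protein results in a new GI number being assigned to that translation.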
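The accession.version format in item 3 is easy to handle programmatically. The sketch below splits such an identifier and shows that an update raises only the version while the accession stays fixed. The helper names are hypothetical, and the identifier `AB000001.2` in the demo is used purely as an illustration of the format, not as a reference to a particular record.

```python
def parse_accession_version(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version as int)."""
    accession, dot, version = identifier.partition(".")
    if not dot or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return accession, int(version)


def next_version(identifier):
    """Return the identifier the sequence would carry after one update.

    The accession is unchanged; only the version number is incremented.
    """
    accession, version = parse_accession_version(identifier)
    return f"{accession}.{version + 1}"


if __name__ == "__main__":
    print(parse_accession_version("AB000001.2"))
    print(next_version("AB000001.2"))
```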

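4. ________ is the basic record format of the GenBank database and one of the most widely used formats for representing biological sequences. (GenBank flatfile, GBFF)

5. The biomedical literature database maintained by NCBI is (PubMed).

6. (Entrez) is the database retrieval tool maintained by NCBI; (SRS) is the database retrieval tool maintained by EBI.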

7. The full name of EST is (Expressed Sequence Tag); its Chinese name is (表达序列标签).

8. Entrez uses three Boolean operators to apply basic restrictions to search terms: (AND), (OR), and (NOT).
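The three operators in item 8 combine into a single query string. The sketch below only assembles that string and does not contact NCBI; the function `entrez_query` and its parameter names are hypothetical helpers, and the search terms in the demo are arbitrary examples. Operators are written in upper case and the OR group is parenthesized so that it is evaluated as a unit.

```python
def entrez_query(all_of=(), any_of=(), none_of=()):
    """Build a Boolean query string from three groups of search terms.

    all_of  -> joined with AND
    any_of  -> joined with OR inside parentheses
    none_of -> each appended with NOT
    """
    parts = []
    if all_of:
        parts.append(" AND ".join(all_of))
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    if not parts:
        raise ValueError("at least one term is needed in all_of or any_of")

    query = " AND ".join(parts)
    for term in none_of:
        query += f" NOT {term}"
    return query


if __name__ == "__main__":
    print(entrez_query(all_of=["BRCA1", "human"],
                       any_of=["mRNA", "cDNA"],
                       none_of=["mouse"]))
```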

9. (BankIt) is the tool for submitting data to GenBank online; large batches of sequences can be submitted with the (Sequin) program.

10. The basic operation in sequence comparison is (alignment).

11. Alignment is carried out from beginning to end of both sequences to find the best possible alignment across the entire length between the two sequences. This kind of alignment is (global alignment).

12. (Local alignment) only finds local regions with the highest level of similarity between the two sequences and aligns these regions without regard for the alignment of the rest of the sequence regions.

13. The quality of a pairwise sequence alignment is expressed as a (score or distance).
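The difference between items 11 and 12, and the score mentioned in item 13, can be seen by computing both kinds of alignment score with dynamic programming. The sketch below follows the usual Needleman-Wunsch (global) and Smith-Waterman (local) recurrences with a linear gap penalty. The scoring values (match 1, mismatch -1, gap -2) and the function name are assumptions chosen for illustration, and only the score is returned, not the alignment itself.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the optimal global or local alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global: leading gaps are penalized, so fill the borders.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0  # best cell seen so far (used for local alignment only)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)  # local: a bad prefix can be dropped
                best = max(best, cell)
            score[i][j] = cell

    # Global: the score covers both full sequences (bottom-right cell).
    # Local: the score is the best-scoring region anywhere in the table.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    x, y = "TTTACGTTTT", "GGGACGTGGG"
    print("global:", alignment_score(x, y))
    print("local: ", alignment_score(x, y, local=True))
```

For two sequences that share only a short common region, the local score stays high while the global score is pulled down by the unrelated flanks, which is the practical reason for choosing one method over the other.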
