2013/5/23
Introduction to Bioinformatics
2013-5

Outline
1. Development history
2. Main research areas
3. Software and tools

1. Development history

1946: The United States built ENIAC, the first fully automatic electronic digital computer.

1955: Frederick Sanger determined the complete amino acid sequence of insulin, work that earned him his first Nobel Prize in Chemistry in 1958.

1965: The first Atlas of Protein Sequence and Structure contained sequence information on 65 proteins.
Dr. Margaret Oakley Dayhoff (1925-1983) was a pioneer in the use of computers in chemistry and biology, beginning with her PhD thesis project in 1948. Her work was multidisciplinary, drawing on chemistry, mathematics, biology and computer science to develop an entirely new field. She is credited today as a founder of the field of bioinformatics.

1965: First use of molecular sequences for evolutionary studies.
One of the founding fathers of the field of molecular evolution.
Zuckerkandl, E. and Pauling, L. (1965). "Molecules as documents of evolutionary history." J. Theor. Biol. 8(2): 357.

1967: Use of molecular sequences to build phylogenetic trees.
Fitch, W. M. and Margoliash, E. (1967). "Construction of phylogenetic trees." Science 155(3760): 279-284.

1970: The Needleman-Wunsch algorithm, which compares the similarity of two sequences over their entire length (global alignment).
Needleman, S. B. and Wunsch, C. D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins." J. Mol. Biol. 48(3): 443-453.

1974: The first protein secondary structure prediction method.
Chou, P. Y. and Fasman, G. D. (1974). "Prediction of protein conformation." Biochemistry 13(2): 222-245.

1981: The Smith-Waterman algorithm, which compares the similarity of two sequences over local regions (local alignment).
Smith, T. F. and Waterman, M. S. (1981). "Identification of common molecular subsequences." J. Mol. Biol. 147: 195-197.

1987: The first efficient multiple sequence alignment procedure, later implemented in CLUSTAL.
Feng, D. and Doolittle, R. F. (1987). "Progressive sequence alignment as a prerequisite to correct phylogenetic trees." J. Mol. Evol. 25(4): 351-360.

1990: BLAST, a tool for searching databases for locally similar sequences.
Altschul, S. F. et al. (1990). "Basic local alignment search tool." J. Mol. Biol. 215(3): 403-410.

1. Development history: the genome projects

1990: The Human Genome Project (HGP) began.
1995: The first bacterial genome, Haemophilus influenzae, was completely sequenced.
1996: The first eukaryotic genome, yeast, was completely sequenced.
1996: The first archaeal genome, Methanococcus jannaschii, was sequenced.
1997 (September): The genome sequence of Escherichia coli K-12 was published.
The E. coli genome is about 4,600 kb and contains about 4,000 genes.
1998: The genome of the first multicellular organism, the nematode C. elegans, was sequenced.
The C. elegans genome is 97 Mbp and contains about 20,000 genes.
2000 (March): The genome of the fruit fly Drosophila melanogaster was sequenced.
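2001 (February 15 and 16): The draft human genome sequence was published in Nature and Science, respectively.

Sources: /Portals/0/Documents/Helicos%20tSMS%20Technology%20Primer.pdf; GOLD: /cgibin/GOLD/index.cgi

Advances in DNA sequencing and computing technology, together with the genome projects, have changed the way biological research is done.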
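The Needleman-Wunsch (1970) and Smith-Waterman (1981) entries in the timeline rest on the same dynamic-programming recurrence. The local version differs in only two ways: it clamps every cell at zero, and it takes the best cell anywhere in the matrix. The Python sketch below computes alignment scores only (no traceback) using the textbook linear-gap formulation. The function name `align_score` and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions, not the parameters used in the original papers.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Best pairwise alignment score with a linear gap penalty.

    local=False: Needleman-Wunsch style global alignment.
    local=True:  Smith-Waterman style local alignment.
    The default scores are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # F[i][j]: best score for a[:i] vs b[:j] (global), or best score of an
    # alignment ending at a[i-1], b[j-1] (local).
    F = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment penalizes leading gaps.
        for i in range(1, n + 1):
            F[i][0] = i * gap
        for j in range(1, m + 1):
            F[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(F[i - 1][j - 1] + s,  # pair a[i-1] with b[j-1]
                       F[i - 1][j] + gap,    # gap in b
                       F[i][j - 1] + gap)    # gap in a
            if local:
                cell = max(cell, 0)          # start a fresh local alignment
                best = max(best, cell)       # best cell anywhere in the matrix
            F[i][j] = cell
    return best if local else F[n][m]
```

For example, `align_score("TTACGTT", "ACG")` returns -5, because the global alignment must pay for four end gaps. `align_score("TTACGTT", "ACG", local=True)` returns 3, because the local alignment keeps only the matching `ACG` core.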
National Center for Biotechnology Information (NCBI)
SCIENCE VOL 295, 1 MARCH 2002

2. Main research areas

2.1 Sequence analysis
• DNA sequences
  – genes that encode polypeptides (proteins)
  – RNA genes
  – regulatory sequences
  – structural motifs
  – repetitive sequences
• Sequence alignment
• Genome annotation
  – gene finding
  – junk DNA

2.2 Computational evolutionary biology
• Trace the evolution of a large number of organisms by measuring changes in their DNA
• Compare entire genomes, which permits the study of more complex evolutionary events:
  – gene duplication
  – horizontal gene transfer
  – prediction of factors important in bacterial speciation
• Build complex computational models of populations
• Reconstruct the tree of life, which these events have made more complex

2.3 Literature analysis
• Employ computational and statistical linguistics to mine the growing body of biological literature. For example:
  – abbreviation recognition: identify the long form and the abbreviation of biological terms
  – named entity recognition: recognize biological terms such as gene names
  – protein-protein interaction: identify from text which proteins interact with which
• This area of research draws on statistics and computational linguistics.
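Abbreviation recognition, listed under 2.3, can be approached with a simple heuristic. For each parenthesized short form, take a few words before it. Then check that the short form's characters appear in those words in order, matched right to left, with its first character starting a word. The Python sketch below is a simplified version of this idea, similar in spirit to the Schwartz-Hearst algorithm. The function names, the short-form pattern and the window of min(|SF| + 5, 2|SF|) words are assumptions for illustration; practical systems add many more checks.

```python
import re
from typing import List, Optional, Tuple

# Assumed short-form pattern: 2-10 characters in parentheses, starting with a letter.
SHORT_FORM = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def match_long_form(short: str, candidate: str) -> Optional[str]:
    """Match the short form's characters right-to-left inside `candidate`.

    Every letter/digit of `short` must occur, in order, in `candidate`,
    and the first character of `short` must begin a word.
    """
    s, c = short.lower(), candidate.lower()
    si, ci = len(s) - 1, len(c) - 1
    while si >= 0:
        ch = s[si]
        if not ch.isalnum():  # skip hyphens etc. in the short form
            si -= 1
            continue
        # Move left until ch is found; for the first short-form character,
        # the match must also sit at the start of a word.
        while ci >= 0 and (c[ci] != ch or
                           (si == 0 and ci > 0 and c[ci - 1].isalnum())):
            ci -= 1
        if ci < 0:
            return None
        si -= 1
        ci -= 1
    # ci now sits just before the match of the first short-form character.
    return candidate[ci + 1:]


def find_abbreviations(text: str) -> List[Tuple[str, str]]:
    """Return (short form, long form) pairs for patterns like 'long form (SF)'."""
    pairs = []
    for m in SHORT_FORM.finditer(text):
        short = m.group(1)
        # Assumed window: at most min(len + 5, 2 * len) words before "(".
        words = text[:m.start()].split()
        window = words[-min(len(short) + 5, 2 * len(short)):]
        long_form = match_long_form(short, " ".join(window))
        if long_form:
            pairs.append((short, long_form))
    return pairs
```

For example, `find_abbreviations("Bioinformatics relies on the National Center for Biotechnology Information (NCBI) databases.")` returns `[("NCBI", "National Center for Biotechnology Information")]`. The right-to-left match keeps the long form as short as possible. The word-start check on the first letter avoids anchoring on a letter in the middle of a word.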