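Gi:40644130 allene oxide cyclase (AOC) [Nicotiana tabacum]
Gi:140083805 cytosolic class II small heat shock protein HSP17.5 [Rosa hybrid cultivar]
Gi:289487897 L-ascorbate peroxidase [Bruguiera gymnorhiza]
For DNA sequences such as ribosomal 16S, 18S, or ITS, search with nucleotide BLAST (blastn). Whether to use megablast, discontiguous megablast, or blastn depends on how similar your sequence is to the sequences in the database. A common approach is to start with blastn, which has the lowest similarity requirement of the three and can therefore find a large number of similar sequences; if you need stricter matches, switch to megablast or one of the other options afterwards.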
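Note, however, that a nucleotide-level search still requires fairly high similarity between the nucleic acid sequences. If the sequence is poorly conserved, for example the genome of a novel RNA virus, a blastn search may return little or nothing; in that case use blastp, blastx, or a similar protein-level search instead.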
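In the stand-alone BLAST+ package, the three nucleotide algorithms are selected with the -task option of the blastn program. The following is a minimal sketch that only builds the command lines, ordered from the most permissive task to the strictest; the helper name build_blastn_command, the file name query.fasta, the database name nt, and the E-value cutoff are assumptions for illustration.

```python
# Hypothetical helper: builds (but does not run) a BLAST+ blastn command line.
# Tasks are listed from the most permissive to the strictest.
BLASTN_TASKS = ("blastn", "dc-megablast", "megablast")


def build_blastn_command(query_fasta, database, task="blastn", evalue=1e-5):
    """Return the argument list for one nucleotide search."""
    if task not in BLASTN_TASKS:
        raise ValueError(f"unknown blastn task: {task!r}")
    return [
        "blastn",
        "-task", task,          # algorithm: blastn, dc-megablast or megablast
        "-query", query_fasta,  # FASTA file holding the query sequence
        "-db", database,        # name of a formatted nucleotide database
        "-evalue", str(evalue),
        "-outfmt", "6",         # tabular output, one hit per line
    ]


if __name__ == "__main__":
    # Placeholder file and database names; adjust to your own setup.
    for task in BLASTN_TASKS:
        print(" ".join(build_blastn_command("query.fasta", "nt", task)))
```

Each returned list can be passed to subprocess.run on a machine where BLAST+ and the database are installed.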
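If the sequence is a protein-coding gene, you can first translate it into protein (note that a sequence can in theory be read in six reading frames) and then search the protein database with each translation using blastp. Alternatively, submit the nucleotide sequence directly to blastx, which automatically translates all six reading frames and searches the protein database with each of them.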
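The six reading frames are the three offsets on the given strand plus the three offsets on its reverse complement. The sketch below illustrates the idea with the standard genetic code; the function names are made up for this example, and it is not the code blastx itself uses.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(sequence):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to protein strings."""
    dna = sequence.upper().replace("U", "T")  # accept RNA input as well
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

A stop codon is written as *; each of the six strings could then be used as a separate blastp query.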
blastp, PSI-BLAST, and PHI-BLAST are all protein-to-protein BLAST comparisons.
1. blastp: the standard protein-to-protein comparison. Standard protein BLAST is designed for protein searches. blastp finds sequences in a protein database that are similar to the query amino acid sequence.
As with the other BLAST programs, the goal is to find regions of similarity.
2. PSI-BLAST: a more sensitive protein-to-protein comparison. Position-Specific Iterated (PSI)-BLAST is designed for more sensitive protein-protein similarity searches than blastp, and is very effective for finding similar proteins in distantly related species or new members of a protein family.
When a standard blastp search fails, or returns only pseudogenes or predicted sequences (annotated as "hypothetical protein" or "similar to..."), try again with PSI-BLAST.
3. PHI-BLAST: Pattern-Hit Initiated BLAST. PHI-BLAST can do a restricted protein pattern search. It is a program that searches a protein database with a protein query.
It reports only those alignments that contain the specific pattern found in the query sequence.
For a detailed description of the PHI pattern syntax, see: /blast/html/PHIsyntax.html

The syntax for patterns in PHI-BLAST follows the conventions of PROSITE. When using the stand-alone program, it is permissible to have multiple patterns in a file, separated by a blank line between patterns. When using the Web page, only one pattern is allowed per query.

Valid protein characters for PHI-BLAST patterns: ABCDEFGHIKLMNPQRSTVWXYZU
Valid DNA characters for PHI-BLAST patterns: ACGT

Other useful delimiters:
[ ] means any one of the characters enclosed in the brackets; e.g., [LFYT] means one occurrence of L or F or Y or T
- means nothing (this is a spacer character used by PROSITE)
x(5) means 5 positions in which any residue is allowed (and similarly for any other single number in parentheses after x)
x(2,4) means 2 to 4 positions where any residue is allowed, and similarly for any other two numbers separated by a comma; the first number should be < the second number
> can occur only at the end of a pattern and means nothing; it may occur before a period
. (another spacer used by PROSITE) may be used at the end of the pattern and means nothing

When using the stand-alone program, the pattern should be in a file, with the first line starting with ID followed by 2 spaces and a text string giving the pattern a name. There should also be a line starting with PA followed by 2 spaces followed by the pattern description. All other PROSITE codes in the first two columns are allowed, but only the HI code, described below, is relevant to PHI-BLAST.

Here is an example from PROSITE.
ID  CNMP_BINDING_2; PATTERN.
AC  PS00889;
DT  OCT-1993 (CREATED); OCT-1993 (DATA UPDATE); NOV-1995 (INFO UPDATE).
DE  Cyclic nucleotide-binding domain signature 2.
PA  [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].
NR  /RELEASE=32,49340;
NR  /TOTAL=57(36); /POSITIVE=57(36); /UNKNOWN=0(0); /FALSE_POS=0(0);
NR  /FALSE_NEG=1; /PARTIAL=1;
CC  /TAXO-RANGE=??EP?; /MAX-REPEAT=2;

The line starting ID gives the pattern a name.
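The delimiter rules listed above map closely onto regular expressions, which gives a quick way to check where a pattern occurs in a query before running a search. The following is a rough sketch using Python's re module; prosite_to_regex and find_pattern are hypothetical helper names, and this is not how PHI-BLAST itself is implemented.

```python
import re

# One pattern element: x, a single residue, or a bracketed set,
# optionally followed by a repeat count such as (5) or (2,4).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern into a Python regular expression."""
    # Trailing '>' and '.' characters mean nothing, so drop them.
    core = pattern.strip().rstrip(">.")
    parts = []
    for element in core.split("-"):  # '-' is only a spacer
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        symbol, low, high = match.groups()
        regex = "." if symbol == "x" else symbol  # x = any residue
        if low is not None:
            count = low if high is None else low + "," + high
            regex += "{" + count + "}"
        parts.append(regex)
    return "".join(parts)


def find_pattern(sequence, pattern):
    """Return 1-based (start, end) positions of every pattern occurrence.

    A lookahead is used so that overlapping occurrences are also reported;
    for variable-length patterns the longest match at each start is given.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(1) + 1, m.end(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    pa = "[LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]."
    print(prosite_to_regex(pa))
    print(find_pattern("MKDELAAHDEL", "[KRHQSA]-[DENQ]-E-L>."))
```

The positions are 1-based and inclusive, the same form as the number pairs on an HI line.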
The lines starting AC, DT, DE, NR, NR, CC are relevant to PROSITE users, but irrelevant to PHI-BLAST. These lines are tolerated, but ignored by PHI-BLAST.

The line starting PA describes the pattern as: one of LIVMF, followed by G, followed by E, followed by any single character, followed by one of GAS, followed by one of LIVM, followed by any 5 to 11 characters, followed by R, followed by one of STAQ, followed by A, followed by any single character, followed by one of LIVMA, followed by any single character, followed by one of STACV. In this case the pattern ends with a period. It can end with nothing after the last specifying symbol, or with any number of > signs or periods or a combination thereof.

Here is another example, illustrating the use of an HI line.
ID  ER_TARGET; PATTERN.
PA  [KRHQSA]-[DENQ]-E-L>.
HI  (19 22)
HI  (201 204)

In this example, the HI lines specify that the pattern occurs twice, once from positions 19 through 22 in the sequence and once from positions 201 through 204 in the sequence. These specifications are relevant when stand-alone PHI-BLAST is used with the seedp option, in which the interesting occurrences of the pattern in the sequence are specified. In this case the HI lines specify which occurrence(s) of the pattern should be used to find good alignments.

In general, the seedp option is more useful than the standard patternp option ONLY when the pattern occurs K > 1 times in the sequence AND the user is interested in matching to J < K of those occurrences. Then using the HI lines enables the user to specify which occurrences are of interest.

1. Basic concepts
Similarity is a measure used in sequence alignment to describe what proportion of all aligned bases or amino acid residues are identical or similar between the query sequence and the target sequence. It is a quantitative judgment.
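As a rough illustration of this definition, the sketch below counts identical and similar columns in a pairwise alignment and divides by the alignment length. Two choices here are assumptions rather than part of the definition above: gap columns are counted in the denominator, and "similar" is decided by a small made-up grouping of amino acids, not by the substitution matrix that BLAST uses.

```python
# Hypothetical grouping of chemically similar amino acids, used only for
# illustration; BLAST decides similarity with a substitution matrix.
SIMILAR_GROUPS = ("ILVM", "FWY", "KRH", "DE", "NQ", "ST")


def alignment_fractions(aligned_a, aligned_b, groups=SIMILAR_GROUPS, gap="-"):
    """Return (identity, similarity) as fractions of the alignment length.

    Both inputs must be rows of the same alignment, so they have equal
    length and use the gap character for insertions and deletions.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and equal in length")
    identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if gap in (a, b):
            continue  # a gap column is neither identical nor similar
        if a == b:
            identical += 1
        elif any(a in group and b in group for group in groups):
            similar += 1
    length = len(aligned_a)  # assumption: gap columns count in the length
    return identical / length, (identical + similar) / length


if __name__ == "__main__":
    identity, similarity = alignment_fractions("MKV-LDE", "MRVALEE")
    print(f"identity {identity:.1%}, similarity {similarity:.1%}")
```

For nucleotide alignments, pass groups=() so that only identical bases are counted.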