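Using Bioinformatics Software (with the MC4R Gene as an Example)
Chapter 1. Finding DNA, mRNA, and Protein Sequences on NCBI
I. This chapter uses the pig melanocortin-4 receptor (MC4R) gene as an example to show how to find DNA, mRNA, and amino acid sequences on NCBI.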
1. First, find the DNA sequence of MC4R.
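Search for "NCBI" in Baidu and open the NCBI home page. In the Search box, enter "MC4R pig", choose Gene from the drop-down menu, and click Search. In the result list, click the first link, whose ID is 397359. The gene page shows that the gene lies on pig chromosome 1. At the lower right of that page, "Go to nucleotide" leads to the nucleotide sequence, which is offered in three formats (circled in red in the screenshot). The two used most often are "FASTA" and "GenBank". The FASTA format is compact: after its header line it contains no position numbers, only the bases, and it is the format needed for sequence alignment and analysis. The GenBank format is more detailed and shows a good deal of additional information, such as the sequence length, the mRNA, introns, exons, the CDS, and the amino acid sequence.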
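To make the difference between the two formats concrete, here is a minimal sketch that turns the numbered sequence lines of a GenBank ORIGIN block into FASTA-style text. The function name `origin_to_fasta` and the header string are hypothetical, and the sketch assumes it is given only numbered sequence lines (not gap markers or other annotations). The two input lines are the first 120 bases of the MC4R record discussed in this step.

```python
import re

def origin_to_fasta(origin_lines, header, width=70):
    """Convert numbered GenBank ORIGIN lines into one FASTA record (sketch)."""
    # Keep only runs of letters: this drops the position numbers and the spaces.
    bases = "".join(re.findall(r"[a-zA-Z]+", " ".join(origin_lines)))
    # FASTA sequence lines are conventionally wrapped to a fixed width.
    wrapped = [bases[i:i + width] for i in range(0, len(bases), width)]
    return ">" + header + "\n" + "\n".join(wrapped)

# First two ORIGIN lines of the pig MC4R record (positions 1-120).
origin_lines = [
    "        1 tcacagactc cccaggactt ggattggtca gaaagaagca gaggaggagc cactgtgcac",
    "       61 attttttttt ccccttcaca caccataaaa atcacagagg caactaacac tcacagcaaa",
]

# The header text is a hypothetical label, not the header NCBI would produce.
fasta = origin_to_fasta(origin_lines, "MC4R_example positions 1-120")
print(fasta)
```

The output starts with a `>` header line followed by bare bases, which is why alignment tools generally accept FASTA directly, while the GenBank view is easier to read by eye.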
Clicking GenBank gives the following record:

Sus scrofa breed mixed chromosome 1, Sscrofa10.2 DNA

LOCUS       NC_010443               2265 bp    DNA     linear   CON 29-SEP-2013
DEFINITION  Sus scrofa breed mixed chromosome 1, Sscrofa10.2.
ACCESSION   NC_010443 REGION: complement(178553488..178555752) GPC_000000583
VERSION     NC_010443.4  GI:347618793
DBLINK      BioProject: PRJNA28993
            Assembly: GCF_000003025.5
KEYWORDS    RefSeq.
SOURCE      Sus scrofa (pig)
  ORGANISM  Sus scrofa
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae;
            Sus.
COMMENT     REFSEQ INFORMATION: The reference sequence is identical to
            CM000812.4.
            On Oct 11, 2011 this sequence version replaced gi:333795951.
            Assembly Name: Sscrofa10.2
            The genomic sequence for this RefSeq record is from the genome
            assembly released by the Swine Genome Sequencing Consortium as
            Sscrofa10.2 in August 2011 (see/Projects/S_scrofa). Sscrofa10.2
            is a mixed assembly of clones and contigs from the whole-genome
            shotgun project AEMK00000000.1.

            ##Genome-Annotation-Data-START##
            Annotation Provider         :: NCBI
            Annotation Status           :: Full annotation
            Annotation Version          :: Sus scrofa Annotation Release 104
            Annotation Pipeline         :: NCBI eukaryotic genome annotation
                                           pipeline
            Annotation Software Version :: 5.1
            Annotation Method           :: Best-placed RefSeq; Gnomon
            Features Annotated          :: Gene; mRNA; CDS; ncRNA
            ##Genome-Annotation-Data-END##
FEATURES             Location/Qualifiers
     source          1..2265
                     /organism="Sus scrofa"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9823"
                     /chromosome="1"
                     /breed="mixed"
     gene            1..2265
                     /gene="MC4R"
                     /note="melanocortin 4 receptor; Derived by automated
                     computational analysis using gene prediction method:
                     BestRefSeq."
                     /db_xref="GeneID:397359"
     mRNA            join(1..681,834..2265)
                     /gene="MC4R"
                     /product="melanocortin 4 receptor"
                     /inference="similar to RNA sequence, mRNA (same
                     species):RefSeq:NM_214173.1"
                     /exception="annotated by transcript or proteomic data"
                     /note="The RefSeq transcript has 2 indels compared to
                     this genomic sequence; Derived by automated computational
                     analysis using gene prediction method: BestRefSeq."
                     /transcript_id="NM_214173.1"
                     /db_xref="GI:55741558"
                     /db_xref="GeneID:397359"
     CDS             join(534..681,834..1685)
                     /gene="MC4R"
                     /inference="similar to AA sequence (same
                     species):RefSeq:NP_999338.1"
                     /exception="annotated by transcript or proteomic data"
                     /note="The RefSeq protein has 1 indel compared to this
                     genomic sequence; Derived by automated computational
                     analysis using gene prediction method: BestRefSeq."
                     /codon_start=1
                     /product="melanocortin receptor 4"
                     /protein_id="NP_999338.1"
                     /db_xref="GI:55741559"
                     /db_xref="GeneID:397359"
                     /translation="MNSTHHHGMHTSLHFWNRSTYGLHSNASEPLGKGYSEGGCYEQL
                     FVSPEVFVTLGVISLLENILVIVAIAKNKNLHSPMYFFICSLAVADMLVSVSNGSETI
                     VITLLNSTDTDAQSFTVNIDNVIDSVICSSLLASICSLLSIAVDRYFTIFYALQYHNI
                     MTVKRVGIIISCIWAVCTVSGVLFIIYSDSSAVIICLITVFFTMLALMASLYVHMFLM
                     ARLHIKRIAVLPGTGTIRQGANMKGAITLTILIGVFVVCWAPFFLHLIFYISCPQNPY
                     CVCFMSHFNLYLILIMCNSIIDPLIYALRSQELRKTFKEIICCYPLGGLCDLSSRY"
ORIGIN
        1 tcacagactc cccaggactt ggattggtca gaaagaagca gaggaggagc cactgtgcac
       61 attttttttt ccccttcaca caccataaaa atcacagagg caactaacac tcacagcaaa
      121 gcttcaggtt gggaactgat tctctctgcg aggcagctga tctgagcatg cgcacacaga
      181 ttcattcttc tcccaatagc acagcagccg ctaggaaaat tattttgaaa agacctgaat
      241 gcattaagac taaagttaaa gtggaagtga gaacaaaata tcaaacagca gactcgacag
      301 agaatgagcg tcttgaagcc taagatttca aagtgatgct aatcagagcc ctacctgaaa
      361 gagactaaaa actccatttc aagcttcgga gcatgtgata tttattcaca acaggcattc
      421 caatttcagc ctcataactt tcagacagat aaagacttgg agaaaatcgc tgaggctacc
      481 tgacccagga gcttaaatca ggtcagaggg gatctcaacc cacctggcgc aggatgaact
      541 caacccatca ccatggaatg catacttctc tccacttctg gaaccgcagc acctacggac
      601 tgcacagcaa tgccagtgag ccccttggaa aagagctact ctgaaggagg atgctacgag
      661 caactttttg tctctcctga ggtgtttgtg actctgggtg tcataagcct gt
          [gap 100 bp]
      813 aaacgacg gcgtctctct gaggtgtttg
      841 tgactctggg tgtcataagc ctgttggaga acattctggt gattgtggcc atagccaaga
      901 acaagaatct gcattcaccc atgtactttt tcatctgtag cctggctgtg gctgatatgc
      961 tggtgagcgt ttccaatggg tcagaaacca ttgtcatcac cctattaaac agcacggaca
     1021 cggacgcaca gagtttcaca gtgaatattg ataatgtcat tgactcagtg atctgtagct
     1081 ccttactcgc ctcaatttgc agcctgcttt cgattgcagt ggacaggtat tttactatct
     1141 tttatgctct ccagtaccat aacattatga cagttaagcg ggttggaatc atcatcagtt
     1201 gtatctgggc agtctgcacg gtgtcgggtg ttttgttcat catttactca gatagcagtg
     1261 ctgttattat ctgcctcata accgtgttct tcaccatgct ggctctcatg gcttctctct
     1321 atgtccacat gttcctcatg gccagactcc acattaagag gatcgccgtc ctcccaggca
     1381 ctggcaccat ccgccaaggt gccaacatga agggggcaat taccctgacc atcttgattg
     1441 gggtctttgt ggtctgctgg gcccccttct tcctccactt aatattctat atctcctgcc
     1501 cccagaatcc atactgtgtg tgcttcatgt ctcactttaa tttgtatctc atcctgatca
     1561 tgtgtaattc catcatcgat cccctgattt atgcactccg gagccaagaa ctgaggaaaa
     1621 ccttcaaaga gatcatctgt tgctatcccc tgggtggcct ctgtgatttg tctagcagat
     1681 attaaatggg gacagaggag acttataaat gcaagcataa gagactttct ccttacacag
     1741 tctggacaat atgcttcaac aacagcattt tcttgtaagg catcagttga gacattctat
     1801 tgtataaatt taagttcgtg attctgctca gtctctgtgt atttttaagg tcttgctacc
     1861 ttttggctgt aaaatgttta tctatactac aggttatagg cacaatggat ttataaaaaa
     1921 gaaaaaagtc cttatgaaaa gttaattaat gtatcttgtc attcgaaagg atttgacaca
     1981 ttgcttgttt tagtaaaatg gaaatcacag tttcattaaa tatatcctaa taaatggttg
     2041 ctaatattac actatacaac gctgaagtgt agaggtttga ttctagcatt gaggggagaa
     2101 atactgaaac aagtgtttaa tcattaaaaa ataagctgaa atttcaacta atttaataaa
     2161 acatgctcat tctccctgtg cagaaggaga aatgaagctt ctactgggag aaaaacagtt
     2221 actaaaaaaa agtgggggga tattttgagt ttgaaaacta tgttt
//

2. Find the mRNA and amino acid sequences.
The first step is the same as for the DNA sequence: open NCBI to reach its home page.
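Before moving on, the coordinates in the MC4R GenBank record from step 1 can be checked with a few lines of code. The sketch below is not an NCBI tool; `parse_location` and `span_length` are hypothetical helper names, and the parser handles only the simple `complement(...)` and `join(...)` forms that appear in that record. It relies on GenBank coordinates being 1-based and inclusive.

```python
import re

def parse_location(location):
    """Return (segments, is_complement) for a simple GenBank location string."""
    is_complement = location.startswith("complement(")
    # Each "start..end" pair becomes one (start, end) tuple of integers.
    segments = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]
    return segments, is_complement

def span_length(segments):
    """Total number of bases covered; coordinates are 1-based and inclusive."""
    return sum(end - start + 1 for start, end in segments)

# Location strings copied from the ACCESSION, mRNA, and CDS lines of the record.
region, on_minus_strand = parse_location("complement(178553488..178555752)")
mrna, _ = parse_location("join(1..681,834..2265)")
cds, _ = parse_location("join(534..681,834..1685)")

print(span_length(region), on_minus_strand)      # 2265 True
print([end - start + 1 for start, end in mrna])  # [681, 1432]
print(span_length(cds))                          # 1000
```

The region length of 2265 agrees with the "2265 bp" on the LOCUS line, and `complement` indicates that the gene is read from the reverse strand of chromosome 1. The two joined CDS segments cover 1000 genomic positions, which is not a multiple of three; the record itself notes that the RefSeq protein has 1 indel compared to this genomic sequence.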