
GenBank  

2009-12-28 11:16:20 | Category: Bioinformatics


    Dennis A. Benson, Ilene Karsch-Mizrachi, David J. Lipman, James Ostell and
    David L. Wheeler*
    National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health,
    Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
    Received September 18, 2007; Accepted October 10, 2007

    ABSTRACT

        GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 260 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov

     

    INTRODUCTION

        GenBank (1) is a comprehensive public database of nucleotide sequences and supporting bibliographic and biological annotation, built and distributed by the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine (NLM), located on the campus of the US National Institutes of Health (NIH) in Bethesda, MD, USA. NCBI builds GenBank primarily from the submission of sequence data from authors and from the bulk submission of expressed sequence tag (EST), genome survey sequence (GSS), and other high-throughput data from sequencing centers. The US Office of Patents and Trademarks also contributes sequences from issued patents. GenBank, the European Molecular Biology Laboratory Nucleotide Sequence Database (EMBL) (2) in Europe, and the DNA Data Bank of Japan (DDBJ) (3) comprise the International Nucleotide Sequence Database Collaboration (INSDC), and are members of a longstanding collaboration in which data is exchanged daily to ensure a uniform and comprehensive collection of sequence information. NCBI makes the GenBank data available at no cost over the Internet, via FTP and via a wide range of Web-based retrieval and analysis services which operate on the GenBank data (4).

     

    ORGANIZATION OF THE DATABASE

        From its inception, GenBank has doubled in size about every 18 months. The traditional GenBank divisions contain over 80 billion nucleotide bases from more than 76 million individual sequences, with 15 million new sequences added in the past year. Contributions from Whole Genome Shotgun (WGS) projects supplement the data in the traditional divisions to bring the total beyond 190 billion bases. Complete genomes (www.ncbi.nlm.nih.gov/Genomes/index.html) continue to represent a rapidly growing segment of the database, with some 200 of more than 570 complete microbial genomes in GenBank deposited over the past year. The number of eukaryote genomes for which coverage and assembly are significant continues to increase as well, with over 190 assemblies now available, including that of the reference human genome.
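        The doubling statement above corresponds to the simple exponential model N(t) = N0 * 2^(t/18), with t in months. The sketch below only illustrates that arithmetic; the function name and the idea of projecting forward are assumptions made for illustration, not something the article computes, and past growth is no guarantee that the rate continues.

```python
def projected_bases(current_bases: float, months_ahead: float,
                    doubling_months: float = 18.0) -> float:
    """Extrapolate database size under a constant doubling time.

    Hypothetical helper: assumes the 'doubles about every 18 months'
    trend quoted in the text keeps holding, which is an assumption.
    """
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return current_bases * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    # 80 billion bases is the figure quoted for the traditional divisions.
    for months in (0, 18, 36):
        print(months, projected_bases(80e9, months))
```

        Under this model the quoted 80 billion bases would reach 160 billion after one doubling period and 320 billion after two.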

     

    Sequence-based taxonomy

        Database sequences are classified and can be queried using a comprehensive sequence-based taxonomy (www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy) developed by NCBI in collaboration with EMBL and DDBJ and with the valuable assistance of external advisers and curators. More than 260 000 named species are represented in GenBank and new species are being added at the rate of over 1700 per month. About 12% of the sequences in GenBank are of human origin and 8% of all sequences are human expressed sequence tags (ESTs). The top species in GenBank in terms of number of bases are Homo sapiens (12.7 billion bases), Mus musculus (8.3 billion), Rattus norvegicus (5.8 billion), Bos taurus (3.8 billion), Zea mays (3.6 billion), Danio rerio (2.8 billion), Sus scrofa (1.9 billion), Oryza sativa (1.5 billion), Strongylocentrotus purpuratus (1.4 billion), Xenopus tropicalis (1.1 billion) and Pan troglodytes (940 million).

     

    GenBank records and divisions

        Each GenBank entry includes a concise description of the sequence, the scientific name and taxonomy of the source organism, bibliographic references and a table of features (www.ncbi.nlm.nih.gov/collab/FT/index.html) listing areas of biological significance, such as coding regions and their protein translations, transcription units, repeat regions and sites of mutations or modifications.

        The files in the GenBank distribution have traditionally been partitioned into divisions that roughly correspond to taxonomic groups such as bacteria (BCT), viruses (VRL), primates (PRI) and rodents (ROD). In recent years, divisions have been added to support specific sequencing strategies. These include divisions for expressed sequence tag (EST), genome survey (GSS), high-throughput genomic (HTG), high-throughput cDNA (HTC) and environmental sample (ENV) sequences, making a total of 18 divisions. For convenience in file transfer, the GenBank data is partitioned into multiple files, currently more than 1300, for the bimonthly GenBank releases on NCBI's FTP site.

        Expressed sequence tags (ESTs). ESTs continue to be a major source of new sequence records and gene sequences, comprising over 25 billion nucleotide bases in GenBank release 161. Over the past year, the number of ESTs has increased by over 19% to a total of 45.5 million sequences representing more than 1370 different organisms. The top organisms represented in the EST division are Homo sapiens (8.1 million records), Mus musculus (4.9 million), Bos taurus (1.5 million), Sus scrofa (1.5 million), Danio rerio (1.4 million) and Arabidopsis thaliana (1.3 million). As part of its daily processing of GenBank EST data, NCBI identifies through BLAST searches all homologies for new EST sequences and incorporates that information into the companion database, dbEST (www.ncbi.nlm.nih.gov/dbEST/index.html) (5). The data in dbEST is processed further to produce the UniGene database (www.ncbi.nlm.nih.gov/sites/entrez?db=unigene) of more than 1.5 million gene-oriented sequence clusters representing over 85 organisms and described more fully in Ref. (4).

        Sequence-tagged sites (STSs), genome survey sequences (GSSs) and environmental sample sequences (ENV). The STS division of GenBank (www.ncbi.nlm.nih.gov/dbSTS/index.html) contains over 930 000 sequences, including anonymous STSs based on genomic sequence as well as gene-based STSs derived from the 3' ends of genes and ESTs. These STS records usually include mapping information.

        The GSS division of GenBank (www.ncbi.nlm.nih.gov/dbGSS/index.html) has grown over the past year by 29% to a total of 21 million records for over 670 organisms and contributes over 13.5 billion nucleotide bases. GSS sequences are the products of as many as 80 different experimental techniques, including metagenomic surveys of sequences arising from biological communities. However, about half of all GSS records are single reads from Bacterial Artificial Chromosomes (BAC-ends) used in a variety of genome sequencing projects. The most highly represented species in the GSS division, including metagenomic surveys, are marine metagenome (2.6 million records), Zea mays (2.1 million), Mus musculus (1.8 million) and Homo sapiens (1.1 million). The human data has been used (www.ncbi.nlm.nih.gov/projects/genome/clone/) along with the STS records in tiling the BACs for the Human Genome Project (6).

        The ENV division of GenBank accommodates non-WGS sequences obtained via environmental sampling methods in which the source organism is unknown. Records in the ENV division contain ENV in the keyword field and use an /environmental_sample qualifier in the source feature. As of GenBank release 161, the ENV division of GenBank contained over 600 000 sequences, comprising 403 million base pairs.

        High-throughput genomic (HTG) and high-throughput cDNA (HTC) sequences. The HTG division of GenBank (www.ncbi.nlm.nih.gov/HTGS/) contains unfinished large-scale genomic records, which are in transition to a finished state (7). These records are designated as Phase 0-3 depending on the quality of the data. Upon reaching Phase 3, the finished state, HTG records are moved into the appropriate organism division of GenBank. As of release 161 of GenBank, the HTG division comprised 18 billion base pairs of sequence, an increase of more than 2 billion bases over the past year.

        The HTC division of GenBank accommodates high-throughput cDNA sequences. HTCs are of draft quality but may contain 5' UTRs and 3' UTRs, partial coding regions and introns. HTC sequences which are finished and of high quality are moved to the appropriate organism GenBank division. GenBank release 161 contained more than 429 000 HTC sequences totaling 570 million bases. A project generating HTC data is described in Ref. (8).

        Whole Genome Shotgun (WGS) sequence. More than 101 billion bases of WGS sequence appear in GenBank as sets of WGS contigs, many of them bearing annotations originating from a single sequencing project. These sequences are issued accession numbers consisting of a 4-letter project ID, followed by a two-digit version number and a 6-digit contig ID. Hence, the WGS accession number AAAA01072744 is assigned to contig number 072744 of the first version of project AAAA.

        Whole Genome Shotgun (WGS) sequencing projects have contributed some 25 million contigs to GenBank, a 39% increase over last year's total. These primary sequences have been used to construct 4.1 million large-scale assemblies of scaffolds and chromosomes. WGS project contigs for Homo sapiens, Pan troglodytes, Macaca mulatta, Equus caballus, Canis familiaris, Drosophila, Saccharomyces and 800 other organisms and environmental samples are available. For a complete list of WGS projects with links to the data, see (www.ncbi.nlm.nih.gov/projects/WGS/WGSprojectlist.cgi).
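        The WGS accession layout described above (a 4-letter project ID, a two-digit version and a 6-digit contig ID) can be split mechanically. The following is a minimal sketch that handles only that layout as stated in the text; the names WgsAccession and parse_wgs_accession are hypothetical and are not part of any NCBI tool.

```python
import re
from typing import NamedTuple


class WgsAccession(NamedTuple):
    project_id: str   # 4-letter project ID, e.g. "AAAA"
    version: int      # two-digit project version, e.g. 1
    contig: str       # 6-digit contig ID, kept as text to preserve zeros


# Layout taken from the text: 4 letters + 2 digits + 6 digits.
_WGS_PATTERN = re.compile(r"^([A-Z]{4})(\d{2})(\d{6})$")


def parse_wgs_accession(accession: str) -> WgsAccession:
    """Split a WGS accession of the form described in the text."""
    match = _WGS_PATTERN.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a WGS accession of the 4+2+6 form: {accession!r}")
    project_id, version, contig = match.groups()
    return WgsAccession(project_id, int(version), contig)


if __name__ == "__main__":
    print(parse_wgs_accession("AAAA01072744"))
```

        For the example in the text, AAAA01072744 splits into project AAAA, version 1 and contig 072744. The contig ID is kept as a string so that its leading zero is not lost.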

        Although WGS project sequences may be annotated, many low-coverage genome projects do not contain annotation. Because these sequence projects are ongoing and incomplete, these annotations may not be tracked from one assembly version to the next and should be considered preliminary.

        Submitters of WGS sequences, and genomic sequences in general, are urged to use a new set of evidence tags of the form /experimental=text and /inference=TYPE:text, where TYPE is one of a number of standard inference types and text is made up of structured text. These new qualifiers replace /evidence=experimental and /evidence=non-experimental, respectively, which are no longer supported.

     

    Special Record types

        Third Party Annotation (TPA). Third Party Annotation (TPA) records support the reporting of published sequence annotation by a scientist other than the original submitter of the primary sequence record in DDBJ/EMBL/GenBank. TPA records fall into one of two categories: experimental, in which case there is direct experimental evidence for the existence of the annotated molecule, and inferential, in which case the experimental evidence is indirect. TPA sequences may be created by assembling a number of primary sequences. The format of a TPA record (e.g. BK000016) is similar to that of a conventional GenBank record but includes the label "TPA:" at the beginning of each Definition Line and the keywords "Third Party Annotation; TPA" in the Keywords field. The Comment field of TPA records lists the primary sequences used to assemble the TPA sequence; the Primary field provides the base ranges of the primary sequences that contribute to the TPA sequence.

        Over 5500 TPA records are contained in GenBank release 161, including 2170 for Drosophila melanogaster, 960 for Homo sapiens, 330 for Oryza sativa and 290 for Mus musculus. TPA sequences are not released to the public until their accession numbers or sequence data and annotation appear in a peer-reviewed biological journal. TPA submissions to GenBank may be made using either BankIt or Sequin. For more information on TPA, see (www.ncbi.nlm.nih.gov/Genbank/TPA.html).

        GenBank CON records for assemblies of smaller records. Although many genomes, such as bacterial genomes, are represented in GenBank as single sequences, it is desirable from the standpoints of data transfer and analysis to break some very long sequences, such as portions of eukaryotic genomes, into smaller segments. In these cases, CON division records for the entire sequence are produced that contain assembly instructions to allow the seamless display and download of the full sequence. Many CON records also include annotations.

    BUILDING THE DATABASE

        The data in GenBank, and the collaborating databases EMBL and DDBJ, is submitted primarily by individual authors to one of the three databases, or by sequencing centers as batches of EST, STS, GSS, HTC, WGS or HTG sequences. Data is exchanged daily with DDBJ and EMBL so that the daily updates from NCBI servers incorporate the most recently available sequence data from all sources.

    Direct electronic submission

        Virtually all records enter GenBank as direct electronic submissions (www.ncbi.nlm.nih.gov/Genbank/index.html), with the majority of authors using the BankIt or Sequin programs. Many journals require authors with sequence data to submit the data to a public database as a condition of publication. GenBank staff can usually assign an accession number to a sequence submission within two working days of receipt, and do so at a rate of almost 1600 per day. The accession number serves as confirmation that the sequence has been submitted and allows readers of articles in which the sequence is cited to retrieve the data. Direct submissions receive a quality assurance review that includes checks for vector contamination, proper translation of coding regions, correct taxonomy and correct bibliographic citations. A draft of the GenBank record is passed back to the author for review before it enters the database.

        Authors may ask that their sequences be kept confidential until the time of publication. Since GenBank policy requires that the deposited sequence data be made public when the sequence or accession number is published, authors are instructed to inform GenBank staff of the publication date of the article in which the sequence is cited in order to ensure a timely release of the data. Although only the submitting scientist is permitted to modify sequence data or annotations, all users are encouraged to report lags in releasing data or possible errors or omissions to GenBank at (update@ncbi.nlm.nih.gov).

        NCBI works closely with sequencing centers to ensure timely incorporation of bulk data into GenBank for public release. GenBank offers special batch procedures for large-scale sequencing groups to facilitate data submission, including the program tbl2asn, described at (www.ncbi.nlm.nih.gov/Sequin/table.html).

        Submission using BankIt. About a third of author submissions are received through NCBI's Web-based data submission tool, BankIt (www.ncbi.nlm.nih.gov/BankIt). Using BankIt, authors enter sequence information directly into a form and add biological annotation such as coding regions or mRNA features. Free-form text boxes, list boxes and pull-down menus allow the submitter to further describe the sequence without having to learn formatting rules or restricted vocabularies. Before creating a draft record in GenBank flat file format for the submitter to review, BankIt validates submissions, flagging many common errors, and checks for vector contamination using a variant of BLAST called Vecscreen.

        BankIt is the tool of choice for simple submissions, especially when only one or a small number of records is to be submitted (7). BankIt can also be used by submitters to update their existing GenBank records.

        Submission using Sequin and tbl2asn. NCBI also offers a standalone multi-platform submission program called Sequin (www.ncbi.nlm.nih.gov/Sequin/index.html) that can be used interactively with other NCBI sequence retrieval and analysis tools. Sequin handles simple sequences such as a cDNA, as well as segmented entries, phylogenetic studies, population studies, mutation studies, environmental samples and alignments for which BankIt and other Web-based submission tools are not well-suited. Sequin has convenient editing and complex annotation capabilities and contains a number of built-in validation functions for quality assurance. In addition, Sequin is able to accommodate large sequences, such as that of the 5.6 Mb Escherichia coli genome, and read in a full complement of annotations via simple tables. Versions for Macintosh, PC and Unix computers are available via anonymous FTP at (ftp.ncbi.nih.gov) in the sequin directory. Once a submission is completed, submitters can e-mail the Sequin file to the address (gb-sub@ncbi.nlm.nih.gov).

        Submitters of large, heavily annotated genomes may find it convenient to use tbl2asn, referenced above under Direct electronic submission, to convert a table of annotations generated via an annotation pipeline into an ASN.1 (Abstract Syntax Notation One) record suitable for submission to GenBank.

        Submission of barcode sequences. The Consortium for the Barcode of Life (CBOL) is an international initiative to develop DNA barcoding as a tool for characterizing species of organisms using a short DNA sequence, usually 648 bp, derived from a portion of the cytochrome oxidase subunit I gene. NCBI, in collaboration with CBOL (www.barcoding.si.edu/index.htm), has created an online tool for the bulk submission of barcode sequences to GenBank (www.ncbi.nlm.nih.gov/BankIt/websub/?tool=barcode) that allows users to upload files containing a batch of sequences with associated source information. It is anticipated that this tool will be used for other types of bulk submissions in the near future.


    Sequence identifiers and accession numbers

        Accession.Version. Each GenBank record, consisting of both a sequence and its annotations, is assigned a unique identifier, the accession number, that is shared across the three collaborating databases (GenBank, DDBJ, EMBL) and remains constant over the lifetime of the record even when there is a change to the sequence or annotation. Each version of the DNA sequence within a GenBank record is also assigned a unique NCBI identifier, called gi, that appears on the VERSION line of GenBank flat file records following the accession number. A third identifier of the form Accession.version, also displayed on the VERSION line of flat file records, contains the information present in both the gi and accession numbers. An entry appearing in the database for the first time has an Accession.version identifier equivalent to the ACCESSION number of the GenBank record followed by .1 to indicate the first version of the sequence for the record, e.g.:

        ACCESSION   AF000001
        VERSION     AF000001.1  GI:987654321

        When a change is made to a sequence in a GenBank record, a new gi number is issued to the sequence and the version extension of the Accession.version identifier is incremented. The accession number for the record as a whole remains unchanged and the older sequence remains available under the old Accession.version identifier and gi. A similar system tracks changes in the corresponding protein translations. These identifiers appear as qualifiers for CDS features in the FEATURES portion of a GenBank entry, e.g. /protein_id=AAA00001.1. Protein sequence translations also receive their own unique gi number, which appears as a second qualifier on the CDS feature, e.g. /db_xref="GI:1233445".
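        The relationship between accession, version and gi described above can be sketched as a small parser for the VERSION line. This is an illustration that assumes the line looks like the example in the text (keyword, Accession.version, then an optional GI field); the names VersionLine, parse_version_line and is_sequence_update are hypothetical, and the second gi value used in the demonstration is made up.

```python
import re
from typing import NamedTuple, Optional


class VersionLine(NamedTuple):
    accession: str        # stable over the lifetime of the record
    version: int          # incremented when the sequence changes
    gi: Optional[int]     # reissued when the sequence changes


# Assumed layout: "VERSION  <accession>.<version>  GI:<number>"
_VERSION_RE = re.compile(
    r"^VERSION\s+([A-Z_0-9]+)\.(\d+)(?:\s+GI:\s*(\d+))?\s*$"
)


def parse_version_line(line: str) -> VersionLine:
    """Parse a flat-file VERSION line of the assumed layout."""
    match = _VERSION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized VERSION line: {line!r}")
    accession, version, gi = match.groups()
    return VersionLine(accession, int(version), int(gi) if gi else None)


def is_sequence_update(old: VersionLine, new: VersionLine) -> bool:
    """True if `new` is a later sequence version of the same record."""
    return old.accession == new.accession and new.version > old.version


if __name__ == "__main__":
    first = parse_version_line("VERSION     AF000001.1  GI:987654321")
    # Hypothetical later version; this gi value is invented for the demo.
    second = parse_version_line("VERSION     AF000001.2  GI:987654999")
    print(first, second, is_sequence_update(first, second))
```

        The comparison keys on the accession, which stays fixed, and the version suffix, which is the part that changes together with the gi.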

     

    Ensuring stable access to sequence data

        A convenient way to share the data among a set of collaborators is to post the data to a locally maintained Web site. However, if original data and updates are not simultaneously submitted to a central repository, significant problems can arise.

        The access lifetime of the data may be reduced. The ephemeral nature of much of the content on the Web is part of the common experience. In one attempt to quantify content lifetime, 360 randomly selected web pages were tracked for a period of four years, and a half-life of only two years was measured for the set (9). While a well-maintained web page can certainly persist for longer than two years, the relatively short half-life reported for this set of pages is worth noting.

        The full biological context of the data may not be realized. Even during the accessible lifetime of locally posted sequence data, the full biological context of a sequence may not be realized if the sequence cannot be conveniently compared to others, perhaps derived from distantly related organisms, that are beyond the scope of the host web page.

        Existing data in heavily used, centralized databases will become outdated. If updates to sequences contained within centralized databases are made to a local page, but not also made to corresponding records in a central database, the newer data will not reach the wider research community and much of its impact will be lost.

        Submission of sequence data to a centralized repository solves these problems. Centralized databases, such as GenBank and the other members of the INSDC, ensure stable access to sequence data by providing versioned releases available by FTP, Web interfaces to a uniform data set and archival redundancy. Combining new data with that of other researchers worldwide within a central database provides a broad biological context that stimulates discovery, while keeping each sequence up to date magnifies the utility of all the sequences in the database.


    RETRIEVING GENBANK DATA

    The Entrez system

        The sequence records in GenBank are accessible via Entrez (www.ncbi.nlm.nih.gov/sites/gquery), a flexible database retrieval system that covers 35 biological databases. Entrez databases contain DNA and protein sequences derived from GenBank and other sources, genome maps, population, phylogenetic and environmental sequence sets, gene expression data, the NCBI taxonomy, protein domain information and protein structures from the Molecular Modeling Database, MMDB (10). Each database is linked to the scientific literature via PubMed and PubMed Central. 

    Associating sequence records with sequencing projects

        The ability to identify all GenBank records submitted by a specific group or those with a particular focus, such as metagenomic surveys, is essential for the analysis of large volumes of sequence data. The use of organism or submitter names as a means to define such a set of sequences is unreliable. The Genome Project Database, developed at NCBI and subsequently adopted across the INSDC, allows sequencing centers to register projects under a unique project identifier, enabling reliable linkage between sequencing projects and the data they produce. A new PROJECT line appearing in GenBank flat files identifies the sequencing projects with which a GenBank sequence record is associated. The PROJECT line may contain multiple identifiers, each consisting of a type and a value, separated by a semicolon. As an example, the PROJECT line below associates a GenBank sequence record with Genome Project (www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj) record 18787.

        PROJECT     GenomeProject:18787

        Genome Project record 18787 provides details of the progress made in the effort to sequence Anolis carolinensis (the green anole) (www.broad.mit.edu/models/anole/). Within the Entrez system, such a sequence record is linked directly to the appropriate Genome Project record; conversely, Genome Project records link back to associated sequence records.
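        A PROJECT line of the kind shown above can be read into (type, value) pairs with a few lines of code. The sketch below assumes, based only on the single example in the text, that each identifier is written as type:value and that multiple identifiers are separated by semicolons; parse_project_line is a hypothetical name and the two-identifier line in the demonstration is invented.

```python
from typing import List, Tuple


def parse_project_line(line: str) -> List[Tuple[str, str]]:
    """Return (type, value) pairs from a flat-file PROJECT line.

    Assumed layout: "PROJECT  <type>:<value>[; <type>:<value> ...]".
    """
    parts = line.split(None, 1)  # keyword, then the rest of the line
    if not parts or parts[0] != "PROJECT":
        raise ValueError(f"not a PROJECT line: {line!r}")
    rest = parts[1] if len(parts) > 1 else ""

    identifiers = []
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        id_type, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"identifier without a ':' separator: {item!r}")
        identifiers.append((id_type.strip(), value.strip()))
    return identifiers


if __name__ == "__main__":
    print(parse_project_line("PROJECT     GenomeProject:18787"))
    # Invented two-identifier line, only to exercise the semicolon split.
    print(parse_project_line("PROJECT     GenomeProject:18787; GenomeProject:12345"))
```

        Values are returned as strings because the text does not say that every identifier type is numeric.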

    BLAST sequence-similarity searching

        Sequence-similarity searches are the most fundamental and frequent type of analysis performed on the GenBank data. NCBI offers the BLAST (www.ncbi.nlm.nih.gov/BLAST/) family of programs to detect similarities between a query sequence and database sequences (11,12). BLAST searches may be performed on NCBI's Web site (13), or via a set of standalone programs distributed by FTP. BLAST is discussed in a separate article in this issue (4).

    Obtaining GenBank by FTP

        NCBI distributes GenBank releases in the traditional flat file format as well as in the ASN.1 format used for internal maintenance. The full bimonthly GenBank release and the daily updates, which also incorporate sequence data from EMBL and DDBJ, are available by anonymous FTP from NCBI at (ftp.ncbi.nih.gov) or (www.ncbi.nlm.nih.gov/Ftp/) as well as from a mirror site at the University of Indiana (ftp://bio-mirror.net/biomirror/genbank/). The full release in flat file format is available as compressed files in the directory genbank, with a non-cumulative set of updates contained in daily-nc. A script is provided in the tools directory of the GenBank FTP site to convert a set of daily updates into a cumulative update.

    MAILING ADDRESS

        GenBank, National Center for Biotechnology Information, Building 38A, Room 3N-301-B, 8600 Rockville Pike, Bethesda, MD 20894, USA. Tel: +1 301 496 2475; Fax: +1 301 480 9241.

    ELECTRONIC ADDRESSES

    www.ncbi.nlm.nih.gov NCBI Home Page.

    gb-sub@ncbi.nlm.nih.gov Submission of sequence data to GenBank.

    update@ncbi.nlm.nih.gov Revisions to, or notification of release of, confidential GenBank entries.

    info@ncbi.nlm.nih.gov General information about NCBI and services.

    CITING GENBANK

        If you use the GenBank database in your published research, we ask that this article be cited.

    ACKNOWLEDGEMENTS

        Funding to pay the Open Access publication charges for this article was provided by the Intramural Research Program of the National Institutes of Health, National Library of Medicine.

        Conflict of interest statement. None declared.

    REFERENCES

    1. Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J. and Wheeler,D.L. (2007) GenBank. Nucleic Acids Res., 35(Database issue), 21-25.

    2. Kulikova,T., Akhtar,R., Aldebert,P., Althorpe,N., Andersson,M., Baldwin,A., Bates,K., Bhattacharyya,S., Bower,L. et al. (2007) EMBL Nucleotide Sequence Database in 2006. Nucleic Acids Res., 35(Database issue), 16-20.

    3. Sugawara,H., Abe,T., Gojobori,T. and Tateno,Y. (2007) DDBJ working on evaluation and classification of bacterial genes in INSDC. Nucleic Acids Res., 35(Database issue), 13-15.

    4. Wheeler,D.L., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K., Chetvernin,V., Church,D.M., DiCuccio,M., Edgar,R. et al. (2008) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., This issue (Database issue).

    5. Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) dbEST database for expressed sequence tags. Nat. Genet., 4, 332-333.

    6. Smith,M.W., Holmsen,A.L., Wei,Y.H., Peterson,M. and Evans,G.A. (1994) Genomic sequence sampling: a strategy for high resolution sequence-based physical mapping of complex genomes. Nat. Genet., 7, 40-47.

    7. Kans,J. and Ouellette,B. (2001) Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, chapter Submitting DNA Sequences to the Databases, John Wiley and Sons, Inc.: New York, NY, pp. 65-81.

    8. Kawai,J., Shinagawa,A., Shibata,K., Yoshino,M., Itoh,M., Ishii,Y., Arakawa,T., Hara,A., Fukunishi,Y. et al. (2001) Functional annotation of a full-length mouse cDNA collection. Nature, 409, 685-690.

    9. Koehler,W. (2002) Web page change and persistence: a four-year longitudinal study. J. Am. Soc. Inf. Sci. Technol., 53, 162-171.

    10. Wang,Y., Addess,K.J., Chen,J., Geer,L.Y., He,J., He,S., Lu,S., Madej,T., Marchler-Bauer,A. et al. (2007) MMDB: annotating protein sequences with Entrez's 3D-structure database. Nucleic Acids Res., 35(Database issue), 298-300.

    11. Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.

    12. Zhang,Z., Schäffer,A.A., Miller,W., Madden,T.L., Lipman,D.J., Koonin,E.V. and Altschul,S.F. (1998) Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res., 26, 3986-3990.

    13. Ye,J., McGinnis,S. and Madden,T.L. (2006) BLAST: improvements for better sequence analysis. Nucleic Acids Res., 34(Web Server issue), 6-9.
