DNA Data Bank of Japan
DNA Database Release 79.0, Sep. 2009,
including 108,593,519 entries, 106,684,379,504 bases
Last published date in the present release: August 28, 2009

-------------------------------------------------------------------------------
Table of contents
-------------------------------------------------------------------------------
1. Introduction
   1.1. Announcement for changes in the present release
   1.2. Announcement for the forthcoming changes
2. DDBJ flat file format
   2.1. LOCUS line
   2.2. DEFINITION line
   2.3. ACCESSION line
   2.4. VERSION line
   2.5. KEYWORDS line
   2.6. SOURCE line
   2.7. REFERENCE line
   2.8. COMMENT line
   2.9. FEATURES line
   2.10. BASE COUNT line
   2.11. ORIGIN line
3. Dataset categories
   3.1. Division categories
   3.2. TPA separated from primary dataset
   3.3. Notice for patent related sequence data
4. DDBJ staff
5. Acknowledgment
6. File categories
7. Sample of the contents in each file
   7.1. Part of the contents in the file 'ddbjbct1.seq'
   7.2. Part of the contents in the accession number index file 'ddbjacc1.idx'
   7.3. Part of the contents in the keyword phrase index file 'ddbjkey1.idx'
   7.4. Part of the contents in the journal citation index file 'ddbjjou1.idx'
   7.5. Part of the contents in the gene name index 'ddbjgen.idx'
8. Release history
9. File list
-------------------------------------------------------------------------------

1. Introduction

The present release contains the newest data prepared by the DNA Data Bank of Japan (DDBJ), GenBank (*), and EMBL-Bank/European Bioinformatics Institute (EMBL-Bank/EBI) as of August 28, 2009. This unified database was made possible thanks to the international collaboration among the three data banks. All the entries have accordingly been annotated using the feature keys common to them.
In 2005, DDBJ, EMBL-Bank and GenBank agreed to call their collaboration "the International Nucleotide Sequence Database Collaboration (INSDC); http://www.insdc.org " and to call the unified nucleotide sequence database "the International Nucleotide Sequence Database (INSD)".

*'GenBank' is a trademark of NIH, USA, and is operated by the National Center for Biotechnology Information (NCBI) at NIH.

This database may be copied and redistributed without permission on the condition that all the statements in this release note are reproduced in each copy. See also '3.3. Notice for patent related sequence data' below.

1.1. Announcement for changes in the present release

A new line, DBLINK, has replaced the PROJECT line in the present DDBJ release.

Following the agreement at the INSD collaborative meeting in 2008, the scope of the project ID has expanded to include projects that are not necessarily targeted at the sequencing of a complete genome. In addition, there are other resources such as the Trace Assembly Archive at the NCBI. Therefore, we have decided to replace the PROJECT line with a new line format, "DBLINK". The replacement is illustrated below;

From the PROJECT line (up to rel. 78);
-------------------------------------------------------------------------------
LOCUS       AP000000             4700000 bp    DNA     circular BCT 27-FEB-2009
DEFINITION  Escherichia coli DDBJ genomic DNA, complete genome.
ACCESSION   AP000000
VERSION     AP000000.1
PROJECT     GenomeProject:99999
KEYWORDS    .
-------------------------------------------------------------------------------

To the DBLINK line (from rel. 79);
-------------------------------------------------------------------------------
LOCUS       AP000000             4700000 bp    DNA     circular BCT 27-FEB-2009
DEFINITION  Escherichia coli DDBJ genomic DNA, complete genome.
ACCESSION   AP000000
VERSION     AP000000.1
DBLINK      Project:99999
KEYWORDS    .
-------------------------------------------------------------------------------

1.2. Announcement for the forthcoming changes

Revision of the DDBJ/EMBL/GenBank Feature Table: Definition:

Following the agreement at the INSD collaborative meeting in 2009, the document, DDBJ/EMBL/GenBank Feature Table: Definition, will be revised in October, 2009. See also '2.9. FEATURES line' below. The revised points are introduced in advance at the following URL;
http://www.ddbj.nig.ac.jp/insdc/icm2009-e.html#ft

At DDBJ, the retrofit for this revision will be completed by the next periodical release, to be published in December 2009. Please note that during this transitional period, some entries will be retrofitted.

2. DDBJ flat file format

The database is a collection of "entries", each of which is a unit of data. Entries submitted to the data banks are processed and published in the DDBJ format for distribution (flat file). The flat file includes the sequence, the information on submitters, references and source organisms, and "feature" information, etc. The items of the DDBJ flat file are explained below;

-------------------------------------------------------------------------------
LOCUS       AB000000                 450 bp    mRNA    linear   HUM 08-JUL-2002
DEFINITION  Homo sapiens GAPD mRNA for glyceraldehyde-3-phosphate
            dehydrogenase, partial cds.
ACCESSION   AB000000
VERSION     AB000000.1
KEYWORDS    .
SOURCE      Homo sapiens
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 450)
  AUTHORS   Mishima,H. and Shizuoka,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-NOV-2000) to the DDBJ/EMBL/GenBank databases.
            Contact:Hanako Mishima
            National Institute of Genetics, DNA Data Bank of Japan; 1111, Yata,
            Mishima, Shizuoka 411-8540, Japan
REFERENCE   2  (sites)
  AUTHORS   Mishima,H., Shizuoka,T. and Fuji,I.
  TITLE     Glyceraldehyde-3-phosphate dehydrogenase expressed in human liver
  JOURNAL   Unpublished (2002)
COMMENT     Human cDNA sequencing project.
FEATURES             Location/Qualifiers
     source          1..450
                     /chromosome="12"
                     /clone="GT200015"
                     /clone_lib="lambda gt11 human liver cDNA (GeneTech.
                     No.20)"
                     /map="12p13"
                     /mol_type="mRNA"
                     /organism="Homo sapiens"
                     /tissue_type="liver"
     CDS             86..>450
                     /codon_start=1
                     /gene="GAPD"
                     /product="glyceraldehyde-3-phosphate dehydrogenase"
                     /protein_id="BAA12345.1"
                     /transl_table=1
                     /translation="MAKIKIGINGFGRIGRLVARVALQSDDVELVAVNDPFITTDYMT
                     YMFKYDTVHGQWKHHEVKVKDSKTLLFGEKEVTVFGCRNPKEIPWGETSAEFVVEYTG
                     VFTDKDKAVAQLKGGAKKV"
BASE COUNT        102 a       119 c       131 g        98 t
ORIGIN
        1 cccacgcgtc cggtcgcatc gcacttgtag ctctcgaccc ccgcatctca tccctcctct
       61 cgcttagttc agatcgaaat cgcaaatggc gaagattaag atcgggatca atgggttcgg
      121 gaggatcggg aggctcgtgg ccagggtggc cctgcagagc gacgacgtcg agctcgtcgc
      181 cgtcaacgac cccttcatca ccaccgacta catgacatac atgttcaagt atgacactgt
      241 gcacggccag tggaagcatc atgaggttaa ggtgaaggac tccaagaccc ttctcttcgg
      301 tgagaaggag gtcaccgtgt tcggctgcag gaaccctaag gagatcccat ggggtgagac
      361 tagcgctgag tttgttgtgg agtacactgg tgttttcact gacaaggaca aggccgttgc
      421 tcaacttaag ggtggtgcta agaaggtctg
//
-------------------------------------------------------------------------------

2.1. LOCUS line

The format of the LOCUS line in the flat file is shown below;

---------   --------
Positions   Contents
---------   --------
01-05       'LOCUS'
06-12       spaces
13-28       Locus name
29-29       space
30-40       Length of sequence, right-justified
41-41       space
42-43       'bp'
44-47       spaces
48-54       DNA, RNA, mRNA, rRNA, tRNA or cRNA, left-justified
55-55       space
56-63       'linear' followed by two spaces, or 'circular'
64-64       space
65-67       The division code (see '3.1. Division categories')
68-68       space
69-79       Date, in the form dd-MMM-yyyy (e.g., 08-JUL-2002)
------------------------------------------------------------------------------

2.2. DEFINITION line

The definition briefly describes the gene(s) in the entry. "DEFINITION" is constructed by each of the three data banks.

2.3. ACCESSION line

This line shows the accession number of the entry.
A unique accession number is issued to the data submitter by each of the three data banks. The accession number is composed of 1 alphabetic character and 5 digits (e.g., A12345) or 2 alphabetic characters and 6 digits (e.g., AB123456). The former style was used in the 1980s; the latter style was introduced later because of the rapid growth of the data.

All the entries designated by the accession numbers with the prefixes given below have been collected and processed by DDBJ; the rest have been handled by GenBank and EMBL/EBI.

-------------------------------------------------------------------------------
C, D, E, AB, AG, AK, AP, AT, AU, AV, BA, BB, BD, BJ, BP, BR, BS, BW, BY, CI,
CJ, DA, DB, DC, DD, DE, DF, DG, DH, DI, DJ, DK, DL, DM, FS, FT
-------------------------------------------------------------------------------

You can find the list of the prefixes of the accession numbers at the following URL;
http://www.ddbj.nig.ac.jp/sub/prefix.html

If multiple entries are merged into one entry, or if an entry is extensively modified after submission, the responsible data bank may assign a new accession number to it. In these cases, the new accession number is called the primary accession number, and the old accession number(s) is/are called the secondary accession number(s). In the flat file, the primary accession number is indicated first, followed by the secondary accession number(s). The updated entry can be found with either the primary or the secondary accession number(s).

2.4. VERSION line

This line consists of an accession number and a version number, like "AB123456.1", in which the digit(s) after the period is the version number. Data released to the public for the first time are given version number "1". The VERSION line was added because a released sequence is sometimes revised by the submitter, so the accession number alone cannot specify the sequence in question, which causes trouble for users. The number is increased by one every time a revised sequence is made public.

2.5. KEYWORDS line

The data banks describe this line, if necessary. In many cases, the categories of the data (EST, HTG etc.), gene names and product names are included in "KEYWORDS".

2.6. SOURCE line

This line shows the scientific name of the organism from which the sequence was obtained, and an organelle type if the sequence is derived from an organelle other than the nucleus.

2.7. REFERENCE line

The information on the submitters and references related to the submitted sequence is indicated in the REFERENCE line.

2.8. COMMENT line

This line holds information about an entry that cannot be described using FEATURES or the other fields.

2.9. FEATURES line

Biological features of submitted sequence data are described with a "Feature" key (the biological nature of the annotated feature), a "Location" (the region of the sequence which corresponds to the Feature), and "Qualifiers" (supplementary information about the Feature). The "Feature" and "Qualifier" keys used in the present release are defined by DDBJ/EMBL/GenBank Feature Table: Definition (Version 8.0 October, 2008). The document is updated every half year. You can find its newest version at the following URL;
http://www.ddbj.nig.ac.jp/FT/full_index.html

2.10. BASE COUNT line

In the BASE COUNT line of the DDBJ flat file, 9 digits are allocated for each of the counts of a (adenine), c (cytosine), g (guanine) and t (thymine). In the case of an RNA sequence, uracil is indicated as "t" according to the rule of the international nucleotide database.

In accordance with the relaxation of the sequence length limitation, GenBank dropped the BASE COUNT line from its flat file format as of GenBank Release 138 (Oct. 2003). DDBJ has decided to maintain the BASE COUNT line in its flat file format because GC content is still important information for characterizing a sequence.

2.11. ORIGIN line

The sequence data starts on the line following ORIGIN. The sequence is written in lower case letters, with a space after every 10 bases and a new line after every 60 bases.
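The BASE COUNT and ORIGIN rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this note, not DDBJ's own software: the function names `count_bases` and `format_origin` are hypothetical, and the 9-column width of the leading position number is an assumption that this note does not state for the ORIGIN line.

```python
def count_bases(sequence):
    """Return the a/c/g/t counts, reporting uracil as "t" (see 2.10)."""
    normalized = sequence.lower().replace("u", "t")
    return {base: normalized.count(base) for base in "acgt"}


def format_origin(sequence, number_width=9):
    """Lay out a sequence as ORIGIN lines: 60 bases per line, 10 per group.

    number_width=9 is an assumed column width, not taken from this note.
    """
    normalized = sequence.lower()
    lines = ["ORIGIN"]
    for start in range(0, len(normalized), 60):
        chunk = normalized[start:start + 60]
        groups = [chunk[i:i + 10] for i in range(0, len(chunk), 10)]
        # Each line starts with the 1-based position of its first base.
        lines.append(f"{start + 1:>{number_width}} " + " ".join(groups))
    lines.append("//")  # entry terminator, as in the sample entries
    return "\n".join(lines)


if __name__ == "__main__":
    demo = "acgt" * 20  # an 80-base toy sequence, not real data
    print(count_bases(demo))
    print(format_origin(demo))
```

Summing the four values returned by `count_bases` gives the sequence length only when the sequence contains no ambiguity codes such as "n"; how such codes are reported is not covered in this note.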
The number at the left of each line is the position of the first base on that line.

3. Dataset categories

There have been a number of genome projects going on worldwide. Among them, human genome projects have probably been the most productive, yielding a large number of ordinary sequences and huge amounts of genome sequences and ESTs (expressed sequence tags). Thus, DDBJ has the human (HUM) division solely for human sequences and the primate (PRI) division for non-human primate sequences, whereas the PRI division of the GenBank database also contains human sequences. Note that other divisions such as EST, GSS, and HTC may also contain human sequences.

The present release is divided into 22 categories of organisms and others. See also '6. File categories' and '9. File list' below. The contents of the 22 categories are shown in the following.

3.1. Division categories

The first 21 divisions are given below;

HUM; human
PRI; primates (other than human)
ROD; rodents
MAM; mammals (other than primates and rodents)
VRT; vertebrates (other than mammals)
INV; invertebrates (animals other than vertebrates)
PLN; plants, fungi, plastids (eukaryotes other than animals)
BCT; bacteria (including both Eubacteria and Archaea)
VRL; viruses
PHG; bacteriophages
ENV; sequences obtained via environmental sampling methods
SYN; synthetic constructs
EST; expressed sequence tags; short single pass cDNA sequences
GSS; genome survey sequences; short single pass genomic sequences
TSA; transcriptome shotgun assemblies
HTC; high throughput cDNA sequences
     Sequences submitted from cDNA sequencing projects, other than ESTs. This division includes unfinished high throughput cDNA sequences, each of which has a 5'UTR and a 3'UTR at its ends and part of a coding region. The sequence may also include introns. When the sequence is finished, it moves to the corresponding taxonomic division.
HTG; high throughput genomic sequences
     Sequences submitted mainly from genome sequencing projects that treat a clone as the sequencing unit.
STS; sequence tagged sites
     Tag sites for genome sequencing. The chromosome, map, and PCR_condition information is mandatory for this division.
PAT; sequence data related to patent application
     The data collected, processed and released by the Japanese Patent Office (JPO), United States Patent and Trademark Office (USPTO), the European Patent Office (EPO), and Korean Intellectual Property Office (KIPO). See also '3.3. Notice for patent related sequence data' below.
UNA; the data not annotated
     The UNA division is not used for recently submitted sequences.
CON; Contig / Constructed
     To join a series of entries, such as those submitted from a genome project, each of the three data banks constructs an entry and assigns an accession number to a large scale sequence dataset. Such entries are classified into the CON division. An entry in the CON division has the information on the joined accession numbers instead of the sequence data. The entries that make up a CON entry have been submitted to other divisions.
     The entries and bases in the CON division are not counted in the released numbers given at the top of the release note.

3.2. TPA separated from primary dataset

TPA (Third Party Annotation) data are also available. The TPA data are a complement to the existing DDBJ/EMBL/GenBank comprehensive database of primary nucleotide sequences, which typically result from direct sequencing of cDNAs, ESTs, genomic DNAs etc. Primary entries are defined to be data for which the submitting group has done the sequencing and annotation and, as 'owner' of these data, has privileges to submit updates/corrections etc.

Primary entries used to build a TPA sequence are those that have been experimentally determined and are publicly available in the DDBJ/EMBL/GenBank databases. They may not be from a proprietary database.

The entries and bases in TPA are not counted in the released numbers given at the top of the release note. See also the following URLs;
http://www.ddbj.nig.ac.jp/sub/tpa-e.html
http://www.insdc.org/TPA.html

3.3. Notice for patent related sequence data

This release includes the PAT division for patent related sequence data as described above. The data were collected, processed and released by the Japanese Patent Office (JPO), United States Patent and Trademark Office (USPTO), European Patent Office (EPO), and Korean Intellectual Property Office (KIPO). The prefixes of accession numbers for the patent related sequence data are shown below;

------------------------------------------
JPO  : E, BD, DD, DJ, DL, DM
KIPO : DI
USPTO: I, AR, DZ, EA, GC, GP
EPO  : A, AX, CQ, CS, FB, GM, GN, HA, HB
------------------------------------------

Note also that unauthorized use of the patented data may cause legal issues for which DDBJ takes no responsibility.

4. DDBJ staff

This release is published by the following DDBJ staff.
Jun Mashima, Hideo Aono, Yoshiyuki Ehara, Mayumi Ejima, Masato Endo, Masahiro Fujimoto, Daisuke Fukuda, Mariko Gojobori, Tatsukazu Hashimoto, Tomohiro Hirai, Fumie Hirata, Nobuhiro Hoshi, Takuya Hosokawa, Tsutomu Ikesaka, Kazuya Kanno, Shingo Kawahara, Tatsuko Kawamoto, Takahiro Kazama, Satoshi Kitadate, Wataru Kodachi, Yuichi Kodama, Junko Kohira, Tomohiro Koike, Takehide Kosuge, Kyungbum Lee, Mika Maki, Haruka Mamiya, Hisako Mashima, Kimiko Mimura, Naoko Murakata, Sachiko Nagira, Masahiko Nagura, Asami Nozaki, Toshihisa Okido, Katsunaga Sakai, Tomonori Sangawa, Satoshi Saruhashi, Makoto Sato, Yukie Shinyama, Rie Sugita, Kimiko Suzuki, Kazuya Takei, Wataru Tanabe, Haru Tsutsui, Hiroaki Yamada, Keisuke Yamamoto, Kenji Yamamoto, Makoto Yamamoto, Emi Yokoyama, Takashi Gojobori, Eli Kaminuma, Osamu Ogasawara, Kosaku Okubo, Toshihisa Takagi and Yasukazu Nakamura Center for Information Biology and DNA Data Bank of Japan National Institute of Genetics Research Organization of Information and Systems Mishima 411-8540, Japan Phone: +81 55 981 6853 FAX: +81 55 981 6849 E-mail: ddbj@ddbj.nig.ac.jp (for general inquiry) ddbjsub@ddbj.nig.ac.jp (for data submission) ddbjupdt@ddbj.nig.ac.jp (for updates and notification of publication) WWW: http://www.ddbj.nig.ac.jp/ (for DDBJ WWW server) http://sakura.ddbj.nig.ac.jp/ (for DDBJ sequence data submission system) 5. Acknowledgment We are grateful to NCBI and EBI for a firm friendship and an excellent collaboration with us. We also thank the Japanese Patent Office for a steady cooperation with us. The operation of DDBJ is supported by the Ministry of Education, Culture, Sports, Science and Technology, and we would gratefully note this here. DDBJ uses the Super-SINET computer network for data collection, data exchange and various services. 6. File categories This release covers 22 categories (see also '3. Dataset categories'.) 
of organisms and others as follows:
------------------------------------------------------------------------------
ddbjbct;  Category for bacteria
ddbjcon;  Category for CON (contig sequences)
ddbjenv;  Category for ENV (environmental samples)
ddbjest;  Category for EST (expressed sequence tags)
ddbjgss;  Category for GSS (genome survey sequences)
ddbjhtc;  Category for HTC (high throughput cDNA sequences)
ddbjhtg;  Category for HTG (high throughput genomic sequences)
ddbjhum;  Category for human
ddbjinv;  Category for invertebrates
ddbjmam;  Category for mammals other than primates and rodents
ddbjpat;  Category for patents
ddbjphg;  Category for phages
ddbjpln;  Category for plants
ddbjpri;  Category for primates other than human
ddbjrod;  Category for rodents
ddbjsts;  Category for STS (sequence tagged sites)
ddbjsyn;  Category for synthetic DNAs
ddbjtpa;  Category for TPA (third party annotation)
ddbjtsa;  Category for TSA (transcriptome shotgun assemblies)
ddbjuna;  Category for unannotated sequences
ddbjvrl;  Category for viruses
ddbjvrt;  Category for vertebrates other than mammals
------------------------------------------------------------------------------

Some of the above categories in the present release are recorded in multiple ddbj***###.seq files, each of which is at most 1.5 GB in size, as follows.

---------------------
ddbjbct :   7 files
ddbjenv :   3 files
ddbjest : 137 files
ddbjgss :  51 files
ddbjhtc :   2 files
ddbjhtg :  22 files
ddbjhum :   6 files
ddbjinv :   3 files
ddbjpat :  13 files
ddbjpln :   6 files
ddbjpri :   2 files
ddbjrod :   5 files
ddbjsts :   3 files
ddbjvrl :   2 files
ddbjvrt :   3 files
ddbjcon :  20 files
---------------------

The index files included in this release are ddbjacc#.idx, ddbjgen.idx, ddbjjou#.idx, and ddbjkey#.idx. See also '9. File list'. All of them except ddbjgen.idx are recorded in multiple ddbj***#.idx files, each of which is at most 1.5 GB in size.

7. Sample of the contents in each file

7.1.
Part of the contents in the file 'ddbjbct1.seq' This shows all pieces of information on one entry in DDBJ format. ------------------------------------------------------------------------------ LOCUS D87069 993 bp mRNA linear BCT 05-OCT-2006 DEFINITION Escherichia coli mRNA for RNA polymerase sigma subunit, truncated form of sigma-38, complete cds. ACCESSION D87069 VERSION D87069.1 KEYWORDS RNA polymerase sigma subunit, truncated form of sigma-38. SOURCE Escherichia coli ORGANISM Escherichia coli Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Escherichia. REFERENCE 1 (bases 1 to 993) AUTHORS Jishage,M. TITLE Direct Submission JOURNAL Submitted (14-AUG-1996) to the DDBJ/EMBL/GenBank databases. Contact:Miki Jishage National Institute of Genetics, Molecular Genetics; Yata 1111, Mishima, Shizuoka 411, Japan REFERENCE 2 (bases 1 to 993) AUTHORS Jishage,M. and Ishihama,A. TITLE Variation in RNA polymerase sigma subunit composition within different stocks of Escherichia coli starin W3110 JOURNAL Unpublished (1996) REFERENCE 3 AUTHORS Ivanova,A., Renshaw,M., Guntaka,R. and Eisenstark,A. TITLE DNA base sequence variability in katF (putative sigma factor) gene Escherichia coli JOURNAL Nucleic Acids Res. 20, 5479-5480 (1992) REFERENCE 4 AUTHORS Takayanagi,Y., Tanaka,K. and Takahashi,H. TITLE Structure of the 5' upstream region and the regulation of the rpoS gene of Escherichia coli JOURNAL Mol. Gen. Genet. 
243, 525-531 (1994) COMMENT FEATURES Location/Qualifiers source 1..993 /db_xref="taxon:562" /mol_type="mRNA" /organism="Escherichia coli" /strain="W3110" CDS 1..810 /note="the gene has four single base changes, resulting in two amino acid substitutions and an amber mutation" /product="RNA polymerase sigma subunit, truncated form of sigma-38" /protein_id="BAA13238.1" /transl_table=11 /translation="MSQNTLKVHDLNEDAEFDENGVEVFDEKALVEYEPSDNDLAEEE LLSQGATQRVLDATQLYLGEIGYSPLLTAEEEVYFARRALRGDVASRRRMIESNLRLV VKIARRYGNRGLALLDLIEEGNLGLIRAVEKFDPERGFRFSTYATWWIRQTIERAIMN QTRTIRLPIHIVKELNVYLRTARELSHKLDHEPSAEEIAEQLDKPVDDVSRMLRLNER ITSVDTPLGGDSEKALLDILADEKENGPEDTTQDDDMKQSIVKWLFELNAK" variation 75 /citation=[3] /replace="t" variation 97 /citation=[3] /replace="t" variation 99 /citation=[3] /replace="t" variation 808 /citation=[3] /replace="t" BASE COUNT 254 a 223 c 291 g 225 t ORIGIN 1 atgagtcaga atacgctgaa agttcatgat ttaaatgaag atgcggaatt tgatgagaac 61 ggagttgagg tttttgacga aaaggcctta gtagaatatg aacccagtga taacgatttg 121 gccgaagagg aactgttatc gcagggagcc acacagcgtg tgttggacgc gactcagctt 181 taccttggtg agattggtta ttcaccactg ttaacggccg aagaagaagt ttattttgcg 241 cgtcgcgcac tgcgtggaga tgtcgcctct cgccgccgga tgatcgagag taacttgcgt 301 ctggtggtaa aaattgcccg ccgttatggc aatcgtggtc tggcgttgct ggaccttatc 361 gaagagggca acctggggct gatccgcgcg gtagagaagt ttgacccgga acgtggtttc 421 cgcttctcaa catacgcaac ctggtggatt cgccagacga ttgaacgggc gattatgaac 481 caaacccgta ctattcgttt gccgattcac atcgtaaagg agctgaacgt ttacctgcga 541 accgcacgtg agttgtccca taagctggac catgaaccaa gtgcggaaga gatcgcagag 601 caactggata agccagttga tgacgtcagc cgtatgcttc gtcttaacga gcgcattacc 661 tcggtagaca ccccgctggg tggtgattcc gaaaaagcgt tgctggacat cctggccgat 721 gaaaaagaga acggtccgga agataccacg caagatgacg atatgaagca gagcatcgtc 781 aaatggctgt tcgagctgaa cgccaaatag cgtgaagtgc tggcacgtcg attcggtttg 841 ctggggtacg aagcggcaac actggaagat gtaggtcgtg aaattggcct cacccgtgaa 901 cgtgttcgcc agattcaggt tgaaggcctg cgccgtttgc gcgaaatcct gcaaacgcag 961 
gggctgaata tcgaagcgct gttccgcgag taa // ------------------------------------------------------------------------------ 7.2. Part of the contents in the accession number index file 'ddbjacc1.idx' The following excerpt from the accession number index file illustrates the format of the index. ------------------------------------------------------------------------------ D00001 ECPBPA BCT X04516 D00002 ECPYRC BCT X04469 D00003 HUMP450M HUM D00003 D00004 FLBFLBL40 VRL D00004 D00005 IBAMEM682 VRL D00005 D00006 BACPNS1981 BCT D00006 D00007 CHKCALGRP VRT D00007 D00008 ECPNTAB BCT X04195 D00009 DROPER1 INV D00009 ------------------------------------------------------------------------------ 7.3. Part of the contents in the keyword phrase index file 'ddbjkey1.idx' Keyword phrases consist of names for gene products and other characteristics of sequence entries. ------------------------------------------------------------------------------ "COAT PROTEIN SMO511347 VRL AJ511347 'TNPA GENE UBA564903 BCT AJ564903 'ZINC-FINGER' MOTIF PRNS53 VRL X60546 (+) MATING TYPE SURFACE PROTEIN ABGPSSP PLN M94861 (1,3 TABETGLUB PLN Z22874 (1,3)-BETA-D-GLUCAN BINDING PROTEIN AJ606470 INV AJ606470 (1,3)BETA-GLUCAN SYNTHASE NCU09275 PLN U09275 (1,4)-BETA-D-ARABINOXYLAN ARABINOFURANOHYDROLASE ANAXHA PLN Z78011 ANTUAXHA PLN Z78010 (1,6)-BETA-GLUCAN BIOSYNTHESIS YSAKRE1A PLN M81588 (1-3)-BETA-GLUCANASE NTSP41AGN PLN X81560 PA13BGPT PLN X57794 (1-3,1-4)-BETA-D-GLUCANASE HVBDG PLN X52572 (1-4)-BETA-MANNAN ENDOHYDROLASE CAR278996 PLN AJ278996 CAR293305 PLN AJ293305 (2',5'-OLIGOISOADENYLATE SYNTHETASE-DEPENDENT) AL138776 HUM AL138776 (2'-5') OLIGO(A) SYNTHASE E16 SSO4G06 EST F14610 (2'-5')OLIGOADENYLATE SYNTHETASE HSA225089 HUM AJ225089 HUMSYN25A HUM D00068 SSA225090 MAM AJ225090 (6')-IB' AMINOGLYCOSIDE ACETYLTRANSFERASE AXY278514 BCT AJ278514 PAE291609 BCT AJ291609 (8,11)-LINOLEOYL DESATURASE COF245938 PLN AJ245938 ------------------------------------------------------------------------------ 7.4. 
Part of the contents in the journal citation index file 'ddbjjou1.idx' The journal citation index file lists all of the citations that appear in the references. ------------------------------------------------------------------------------ (ER) AAPS PHARMSCI. 4 (3), DOI 10.1208/PS040315 (2002) AY170916 ROD AY170916 (ER) AM. J. HUM. GENET. 76 (1) (2004) IN PRESS AY753209S1 HUM AY753209 AY753209S2 HUM AY753210 (ER) ARCH. VIROL. (2004) IN PRESS AF531505 VRL AF531505 AY518899 VRL AY518899 AY518900 VRL AY518900 AY518901 VRL AY518901 AY518902 VRL AY518902 AY518903 VRL AY518903 AY518904 VRL AY518904 AY518905 VRL AY518905 AY518906 VRL AY518906 AY518907 VRL AY518907 AY518908 VRL AY518908 AY518909 VRL AY518909 AY518910 VRL AY518910 AY518911 VRL AY518911 AY518912 VRL AY518912 AY518913 VRL AY518913 AY518914 VRL AY518914 AY518915 VRL AY518915 AY518916 VRL AY518916 AY518917 VRL AY518917 AY518918 VRL AY518918 AY518919 VRL AY518919 AY518920 VRL AY518920 AY518921 VRL AY518921 AY518922 VRL AY518922 AY518923 VRL AY518923 AY518924 VRL AY518924 AY518925 VRL AY518925 AY518926 VRL AY518926 AY518927 VRL AY518927 AY518928 VRL AY518928 AY518929 VRL AY518929 AY518930 VRL AY518930 AY518931 VRL AY518931 AY518932 VRL AY518932 AY521234 VRL AY521234 AY521235 VRL AY521235 AY521236 VRL AY521236 AY521237 VRL AY521237 AY521238 VRL AY521238 (ER) ARTERIOSCLER. THROMB. VASC. BIOL. (2004) IN PRESS AY563557 HUM AY563557 (ER) BIOCHEM. BIOPHYS. RES. COMMUN. 325 (1), 203-214 (2004) AY563137 HUM AY563137 (ER) BIOCHEM. J./10.1042/BJ20030293 HSA496460 HUM AJ496460 ------------------------------------------------------------------------------ 7.5. Part of the contents in the gene name index file 'ddbjgen.idx' This file lists all the gene names that appear in the feature table. 
------------------------------------------------------------------------------ 'ARR BX927156 BCT BX927156 'BGLG BX927156 BCT BX927156 'BGLS BX927148 BCT BX927148 'BGLY' BX927156 BCT BX927156 'BRNQ AF305888 BCT AF305888 'COMK AL591983 BCT AL591983 AL596172 BCT AL596172 'CRCB BX927155 BCT BX927155 'CRTI BX927155 BCT BX927155 'DPPE LDDIPEP BCT Z34898 'FIC BX936398 BCT BX936398 ------------------------------------------------------------------------------ 8. Release history Release Date Entries Bases Comments 79 09/09 108,593,519 106,684,379,504 78 06/09 105,737,359 104,597,360,291 77 03/09 102,099,156 101,765,388,414 76 12/08 98,220,409 98,741,908,446 75 09/08 92,840,037 95,219,505,205 TSA division started 74 06/08 87,903,140 91,294,770,939 73 03/08 83,167,582 86,099,950,395 KIPO inclusion started 72 12/07 79,004,098 82,592,245,487 Most of E-mail addresses discarded 71 09/07 76,273,345 79,706,204,461 70 06/07 72,801,679 76,788,510,646 69 03/07 67,523,680 71,775,679,500 PROJECT line started Indexes for categories terminated 68 12/06 64,267,978 68,259,314,742 1.5 GB storage started 67 09/06 61,144,621 65,443,024,193 66 06/06 58,176,628 62,945,843,881 65 03/06 55,890,995 60,564,721,635 TPA subcategories started 64 12/05 52,272,669 56,098,558,378 Some index files split 63 09/05 47,741,593 52,246,110,341 62 06/05 45,249,444 49,158,155,283 ENV division started Version for release note started 61 03/05 43,118,204 47,099,081,750 Changed style of release note 60 12/04 40,583,945 44,416,752,273 /db_xref="H-inv:**" started 59 09/04 37,926,117 42,245,956,937 58 06/04 34,917,581 39,812,635,108 57 03/04 32,693,678 38,008,449,840 56 12/03 30,405,173 36,079,046,032 55 09/03 27,753,140 34,280,225,489 54 06/03 25,149,821 32,162,041,177 53 02/03 23,250,813 29,711,299,332 52 12/02 20,354,812 26,931,456,316 51 09/02 18,401,358 22,782,404,136 TPA started 50 06/02 17,260,693 20,158,357,982 49 04/02 16,503,157 18,579,627,226 48 01/02 15,016,100 16,197,713,855 47 10/01 13,266,610 
14,145,671,645 46 07/01 12,313,759 13,037,646,166 45 04/01 11,434,113 12,207,092,905 HTC division started 44 01/01 10,165,597 11,136,298,841 43 10/00 8,666,551 10,034,532,698 42 07/00 7,554,995 8,880,721,093 41 04/00 5,962,608 6,409,581,885 CON division started 40 01/00 5,388,125 4,762,696,173 RNA division terminated 39 10/99 4,810,773 3,728,000,562 NID and PID discarded 38 07/99 4,294,369 3,098,519,597 37 03/99 3,311,627 2,375,261,951 VERSION, /protein_id started 36 01/99 3,073,166 2,190,425,560 35 10/98 2,759,261 1,957,341,169 34 07/98 2,412,785 1,708,580,623 33 04/98 2,174,769 1,479,303,279 32 01/98 1,956,669 1,300,950,613 31 10/97 1,731,532 1,139,869,464 Adoption of the unified taxonomy database 30 07/97 1,534,115 992,788,339 NID and PID terminated 29 04/97 1,270,194 841,415,232 28 01/97 1,154,120 756,785,219 HTG division started ORG division terminated 27 10/96 936,697 608,103,057 GSS division started 26 07/96 835,552 551,932,448 25 04/96 744,490 499,300,364 /translation started 24 01/96 637,508 431,771,652 23 10/95 569,757 390,694,350 22 07/95 437,588 322,982,425 HUM division started 21 04/95 274,596 250,875,023 20 01/95 239,689 231,299,557 19 10/94 204,332 205,274,131 18 07/94 185,230 192,473,021 17 04/94 169,957 179,942,209 16 01/94 154,626 165,017,628 15 10/93 131,649 147,224,690 14 07/93 120,350 138,686,333 JPO inclusion started 13 04/93 112,067 129,784,445 12 01/93 97,683 120,815,244 EST division started 11 07/92 65,693 84,839,075 10 01/92 59,317 77,805,556 GenBank/EMBL inclusion started 9 07/91 1,130 2,002,124 8 01/91 879 1,573,442 7 07/90 681 1,154,211 6 01/90 496 841,236 5 07/89 395 679,378 4 01/89 302 535,985 3 07/88 230 345,850 2 01/88 142 199,392 1 07/87 66 108,970 Started with DDBJ only ------------------ Since release 75 ------------------ A new division for assembled mRNA sequences, Transcriptome Shotgun Assembly (TSA), has been included since the release 75. 
With new sequencing technologies in use, INSDC has received many requests to accept assembled EST sequences. These sequence data have become more useful than they used to be, although they may not be correctly assembled or may not exist in nature. Therefore, INSDC decided to collect assembled EST sequences and classify them into the new division 'TSA'. TSA sequences are shotgun assemblies of primary sequences deposited in the EST division of INSDC, Trace Archive (TA) or Short-Read Archive (SRA). Two specific keywords, "TSA" and "Transcriptome Shotgun Assembly", are present in all TSA entries. The new division code, "TSA", is also described in the LOCUS line in all TSA entries.

No format changes in the flat file are anticipated for the TSA division; however, note that TSA entries make use of the same PRIMARY line that is described for the entries in the TPA category (see also '3.2. TPA separated from primary dataset'). The PRIMARY block contains references to the underlying reads/transcripts that are assembled to construct a TSA record. Note that a TSA submission requires the sequence data of the primary transcripts to be submitted to the EST division of INSDC, TA, or SRA. More information about how to submit a TSA entry is provided at the following URL;
http://www.ddbj.nig.ac.jp/sub/tsa-e.html

------------------
Since release 73
------------------
Introduction of the sequence data from the Korean Intellectual Property Office:

The nucleotide sequence data transferred from the Korean Intellectual Property Office (KIPO) have been included in the DDBJ release. See also '3.1. Division categories' and '3.3. Notice for patent related sequence data'.

------------------
Since release 72
------------------
Deletion of E-mail address, phone and fax numbers from DDBJ flat file:

To comply with the Japanese law on the protection of personal information, DDBJ deleted phone and fax numbers and E-mail addresses from the flat files of the entries submitted to DDBJ.
This also helps to protect DDBJ releases against spam mail senders.
DDBJ retrofitted almost all entries submitted to DDBJ (not those submitted
to GenBank or EMBL) by DDBJ periodical release 72.

Previously, the submitter information was described in the JOURNAL line of
REFERENCE 1 as follows;
--------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Taro Mishima, DNA Data Bank of Japan, National Institute of
            Genetics; 1111, Yata, Mishima, Shizuoka 411-8540, Japan
            (E-mail:ddbj@ddbj.nig.ac.jp, URL:http://www.ddbj.nig.ac.jp/,
            Tel:81-12-345-6789, Fax:81-12-345-9876)
--------------------------------------------------------------------------------

After the deletion of the information in question, the DDBJ flat file takes
one of the following two forms;

Type 1: Phone and fax numbers and E-mail address are deleted.
--------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
-------------------------------------------------------------------------------

Type 2: When the submitters wish to keep their contact information
        disclosed, it is described as follows;
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
            E-mail :ddbj@ddbj.nig.ac.jp
            Phone  :81-12-345-6789
            Fax    :81-12-345-9876
-------------------------------------------------------------------------------

------------------
Since release 69
------------------
Introduction of the project ID at PROJECT line in DDBJ flat file:
Following the agreement at the INSD collaborative meeting in 2006, INSDC has
started to assign a project ID to submissions from sequencing projects.
The project ID is described as follows;
----------------------------------------------------------------------------
A unique identifier, assigned at the time of the submission by a sequencing
project that informed INSDC of the submission beforehand. It is recommended
that the submitter quotes the assigned project ID in all communication with
INSDC databases to allow for easier and faster tracking of issues. The
project ID field provides an umbrella identifier that points to all related
sequence data for the project.
----------------------------------------------------------------------------
The PROJECT line contains the INSDC-assigned ID for the sequencing project.
It appears between the VERSION and KEYWORDS lines in DDBJ flat files from
DDBJ periodical release 69 onward, as shown below.
See also '2. DDBJ flat file format'.
----------------------------------------------------------------------------
ACCESSION   AB012345
VERSION     AB012345.1
PROJECT     GenomeProject:123
KEYWORDS    .
----------------------------------------------------------------------------

Termination of providing the index files for each category:
For users logging in to one of our computers (supernig), we used to provide
index files for each category. However, since the computer system in our
institute was replaced with a new one that does not offer a service using
the index files, we stopped providing them.
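
The placement of the PROJECT line described above can be illustrated with a
minimal Python sketch. It assumes the header lines have already been read
into a list of strings and that keyword and value are separated by
whitespace; the function name find_project_id and the regular expression are
hypothetical and are not part of any DDBJ tool. Releases from 79 onward use
the DBLINK line instead (see '1.1. Announcement for changes in the present
release'), which this sketch does not handle.

```python
import re

# Hypothetical pattern for lines such as "PROJECT     GenomeProject:123".
# The exact column layout is an assumption; any run of whitespace is accepted.
PROJECT_RE = re.compile(r"^PROJECT\s+(?P<db>[A-Za-z]+):(?P<id>\d+)\s*$")


def find_project_id(header_lines):
    """Return (database, project_id) from the PROJECT line, or None."""
    for line in header_lines:
        # PROJECT sits between VERSION and KEYWORDS, so stop at KEYWORDS.
        if line.startswith("KEYWORDS"):
            break
        match = PROJECT_RE.match(line)
        if match:
            return match.group("db"), int(match.group("id"))
    return None


sample = [
    "ACCESSION   AB012345",
    "VERSION     AB012345.1",
    "PROJECT     GenomeProject:123",
    "KEYWORDS    .",
]
print(find_project_id(sample))
```

Entries without a PROJECT line simply yield None, so the same function can be
applied to entries that predate release 69.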

------------------
Since release 68
------------------
Split of files:
We changed the maximum file size from 300 MB to 1.5 GB, because network
capacity has increased remarkably. Each file named ddbj***##.seq is at most
1.5 GB in size. See also the sections '6. File categories' and
'9. File list'.

------------------
Since release 65
------------------
Introduction of two types of TPA entries:
Following the decision of ICM 2005, the TPA data set is now classified into
two categories, "TPA:experimental" and "TPA:inferential", to distinguish TPA
annotation supported by wet-lab experimental evidence from annotation that
is inferred. The retrofit to divide TPA entries into the two categories
starts from release 65. You can find the description of the two TPA
categories at the following URLs;
   http://www.ddbj.nig.ac.jp/sub/tpa-e.html
   http://www.insdc.org/TPA.html
See also '3.2. TPA separated from primary dataset'.

------------------
Since release 64
------------------
Split of index files:
Some of the index files (ddbjacc.idx, ddbjjou.idx, and ddbjkey.idx) have
exceeded 2 GB in file size. These have therefore been recorded in multiple
ddbj****.idx files, each of which is at most 1.5 GB in size.
See also 6., 7.2., 7.3., 7.4. and 9.

------------------
Since release 62
------------------
Release version number is introduced:
DDBJ has started to include the item 'version' in its release note, which
indicates a version of its periodical release. It is expressed like '62.0',
in which the digit(s) after the period are the version number. The version
number was added because released data are sometimes revised due to urgent
and necessary corrections. The number is increased by one each time a
revised periodical release is made public before the next release.

Introduction of ENV division:
Recently, submissions of sequences derived from environmental samples have
rapidly increased.
To accommodate such submissions, a new division, ENV, has been created (See
also '3.1. Division categories'). This division contains the sequences
obtained via direct molecular isolation such as PCR, DGGE, or any anonymous
method. In the past, the sequences derived from environmental samples
belonged to taxonomic divisions, mainly BCT. At DDBJ, the retrofit to
transfer relevant entries from taxonomic divisions to the ENV division
starts in the present release, and ends by the next periodical release.
Please note that during this transitional period, some entries that will
eventually be placed in the ENV division will be found in other divisions.

Strand information is removed:
The strand information in the LOCUS line of the flat file has been removed
as shown below. See also '2.1. LOCUS line'.
----------------------------------------------------------------------------
Old (-rel. 61):   44-44   space
                  45-47   spaces, ss- (single-stranded),
                          ds- (double-stranded), or ms- (mixed-stranded)
New (rel. 62-):   44-47   spaces
----------------------------------------------------------------------------

------------------
Since release 61
------------------
The style of release note (this file) has been changed.

Some entries use the sequential format for the secondary accession numbers
in the ACCESSION line, in order to shorten the expression of older secondary
accession numbers. For example;
------------------------------------------------------------------------------
Before; ACCESSION   AB000802 D85885 D85886 D85887
After;  ACCESSION   AB000802 D85885-D85887
------------------------------------------------------------------------------
See also '2.3. ACCESSION line'.

------------------
Since release 60
------------------
The cross-reference to the H-invitational has been included.

------------------
Since release 56
------------------
The three data banks have agreed that the maximum length limitation (350 kb)
of a submitted sequence be relaxed.
The BASE COUNT line of the DDBJ flat file format has been changed in
response to the relaxation of the maximum sequence length restriction per
entry that had been practiced at the DDBJ/EMBL/GenBank International
Nucleotide Sequence Databases. In the BASE COUNT line of the DDBJ flat file,
6 digits had been allocated for each count of a, c, g, t and other bases in
the sequence. In the new flat file format, 9 digits are allocated for each
count of a, c, g and t, while the count of other bases is removed.
In accordance with the relaxation of the sequence length limitation, GenBank
had already dropped the BASE COUNT line from its flat file format as of
GenBank Release 138 (Oct. 2003). We at DDBJ have decided to maintain the
BASE COUNT line in our flat file format, on the view that GC content is
still important information for characterizing the sequence.
The changes in the BASE COUNT line are shown below.
---------------------------------------------------------------------------- Old (-rel. 55): 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 |----|----|----|----|----|----|----|----|----|----|----|----|----|----| BASE COUNT 123456 a 123456 c 123456 g 123456 t 123456 others New (rel. 56-): 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 |----|----|----|----|----|----|----|----|----|----|----|----|----|----| BASE COUNT 123456789 a 123456789 c 123456789 g 123456789 t ----------------------------------------------------------------------------

The SOURCE in the flat file is revisited and revised if necessary in
accordance with the unified taxonomy database common to the three data
banks.

------------------
Since release 54
------------------
The '/sequenced_mol' qualifier has been changed to the '/mol_type'
qualifier. We accordingly completed retrofitting the pertinent entries.
This change was made on the agreement at the INSD collaborative meeting in
2002.

------------------
Since release 51
------------------
The TPA (Third Party Annotation) dataset has been available.
The dataset is a complement to the existing DDBJ/EMBL/GenBank database of
the primary nucleotide sequences which were obtained from direct sequencing
of cDNAs, ESTs, genomic DNAs etc.

The format of the LOCUS line in the flat file has been changed as shown
below to adjust to the GenBank format.
------------------------------------------------------------------------------
Old (-rel. 50): LOCUS AB000001 660 bp DNA PLN 01-FEB-2001
New (rel. 51-): LOCUS AB000001 660 bp DNA linear PLN 01-FEB-2001
------------------------------------------------------------------------------

------------------
Since release 45
------------------
The HTC (High Throughput cDNA) division has been included. This division
holds unfinished high throughput cDNA sequences, each of which has 5'UTR and
3'UTR at both ends and part of a coding region. The sequence may also
include introns. When the sequence is finished later, it moves to the
corresponding taxonomic division. The sequence is accompanied by a keyword,
HTC (High Throughput cDNA), which is dropped when the sequence is finished
and moved to a taxonomic division.

------------------
Since release 41
------------------
The CON division has been included. This division shows the order of related
sequences in a genome, expressed by join and the accession numbers of the
sequences. The contents of the CON division are compiled by the three data
banks, not by the data submitter.

------------------
Since release 40
------------------
The RNA division was terminated. The RNA data have been redistributed
according to the category of the organism. Therefore, you will find a human
RNA sequence, for example, in the HUM division.

------------------
Since release 37
------------------
The three data banks include the item VERSION in the flat file, which
indicates a version of a submitted nucleotide sequence. It is expressed like
AB123456.1, in which the digit(s) after the period are a version number.
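
As a minimal sketch of how the VERSION form described above could be split
into its accession number and version number, the following Python function
may help. The name split_version is hypothetical and is not part of any DDBJ
tool; the only assumption is that the version number is the run of digits
after the last period.

```python
def split_version(identifier):
    """Split an identifier such as 'AB123456.1' into ('AB123456', 1)."""
    # rpartition splits on the last period; sep is empty if there is none.
    accession, sep, version = identifier.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return accession, int(version)


print(split_version("AB123456.1"))
```

An identifier without a period, such as a bare accession number, raises
ValueError rather than being silently given a default version.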
VERSION was added because a released sequence is sometimes revised by the
submitter, so that the accession number alone cannot specify the sequence in
question, which causes trouble for the user. The number is increased by one
each time a revised sequence is made public. Accordingly, the translated
protein sequence is accompanied by a /protein_id, which is expressed like
BAA12345.1, in which the digit(s) after the period are again a version
number. The number is increased by one when the corresponding nucleotide
sequence is revised, the protein sequence is changed as a result, and the
revised protein sequence is made public.

------------------
Since release 31
------------------
We have started adopting the unified taxonomy database to unify the
biological source of the sequence. The database is made up of scientific
names, IDs of unidentified organisms, synthetic constructs etc.

------------------
Since release 30
------------------
NID and PID were terminated. This change was made on the agreement at the
INSD collaborative meeting in 1999.

------------------
Since release 28
------------------
The HTG (High Throughput Genomic sequence) division has been included. This
division was created to accommodate genome project teams which deal with a
clone as a sequencing unit.

We terminated the ORG (Organelle) division. Thus, if you are interested in
human mitochondrial sequences, for example, you are now advised to refer to
the HUM division.

------------------
Since release 27
------------------
The GSS division has been included. GSS stands for Genome Survey Sequence,
which is similar to EST, except that GSS is genomic DNA whereas EST is cDNA.

------------------
Since release 25
------------------
The DDBJ release contains amino acid sequences that were translated from the
corresponding nucleotide sequences of the database.
In the translation we paid close attention to the fact that some species or
organelles use codons different from the universal ones, and used the proper
codon table.

------------------
Since release 22
------------------
The HUM division has been included. Human genome projects have probably been
the most productive and have yielded a large number of sequences. Thus, we
have the human (HUM) division solely for human sequences and the primate
(PRI) division for non-human primate sequences.

------------------
Since release 12
------------------
The EST (Expressed Sequence Tag) division has been included. The number of
ESTs has been increasing at an enormous rate and is expected to grow even
more rapidly in the future. Thus, we created a division for ESTs.

------------------
Since release 10
------------------
The sequences submitted to GenBank or EMBL have been included in the
release.

9. File list

The files in this release are arranged in the following order with
non-labeled format.

-----------------------------------------------------------------------
file name                                                     file size
-----------------------------------------------------------------------
ddbjrel.txt    (DDBJ release note)                                74327
ddbjacc1.idx   (Accession number index file 1)               1499999996
ddbjacc2.idx   (Accession number index file 2)               1500000002
ddbjacc3.idx   (Accession number index file 3)               1236764074
ddbjgen.idx    (Gene name index file)                         156015823
ddbjjou1.idx   (Journal citation index file 1)               1499999844
ddbjjou2.idx   (Journal citation index file 2)               1473835629
ddbjjou3.idx   (Journal citation index file 3)               1487747813
ddbjkey1.idx   (Keyword phrase index file 1)                 1499999962
ddbjkey2.idx   (Keyword phrase index file 2)                 1499999965
ddbjkey3.idx   (Keyword phrase index file 3)                 1253810657
-----------------------------------------------------------------------
file name        number of entries    number of bases         file size
-----------------------------------------------------------------------
ddbjbct1.seq 127817 616136068 1504194045
ddbjbct2.seq 92691
644274342 1499839435 ddbjbct3.seq 387 676083678 1500293207 ddbjbct4.seq 349 665368796 1506250061 ddbjbct5.seq 442 661967404 1499271858 ddbjbct6.seq 101554 607626376 1499002167 ddbjbct7.seq 182398 291429252 840737753 ddbjenv1.seq 579274 401845636 1499000615 ddbjenv2.seq 540085 443559013 1499001487 ddbjenv3.seq 239592 243988089 776440090 ddbjest1.seq 462813 173172874 1499001434 ddbjest2.seq 490594 191924821 1499000585 ddbjest3.seq 498273 206053226 1499000068 ddbjest4.seq 480017 204896085 1499000151 ddbjest5.seq 549956 296796836 1499003177 ddbjest6.seq 562937 339877410 1499002517 ddbjest7.seq 540812 306808734 1499000967 ddbjest8.seq 388277 120596323 1499002224 ddbjest9.seq 495931 213797750 1499000263 ddbjest10.seq 513972 237391841 1499001531 ddbjest11.seq 473101 200493453 1498999981 ddbjest12.seq 358556 127289837 1499002847 ddbjest13.seq 274402 83720942 1499002309 ddbjest14.seq 274770 108867109 1499004393 ddbjest15.seq 390657 182574143 1499002565 ddbjest16.seq 481626 250978272 1499002531 ddbjest17.seq 461738 243108701 1499000700 ddbjest18.seq 449573 244605522 1499001025 ddbjest19.seq 466292 229992032 1499002967 ddbjest20.seq 461212 278047138 1499003560 ddbjest21.seq 474022 286864710 1499000623 ddbjest22.seq 465461 242825685 1499001424 ddbjest23.seq 446673 263158051 1499000190 ddbjest24.seq 516622 284131498 1499002052 ddbjest25.seq 539128 314977880 1499002409 ddbjest26.seq 414027 207962062 1499000525 ddbjest27.seq 432449 261989748 1499000313 ddbjest28.seq 480048 267205470 1499000904 ddbjest29.seq 516719 257572298 1499000820 ddbjest30.seq 450240 241459010 1499002525 ddbjest31.seq 453376 256685112 1499001551 ddbjest32.seq 443162 293081741 1499000454 ddbjest33.seq 409888 289018332 1499002894 ddbjest34.seq 516883 309834416 1499000900 ddbjest35.seq 626881 358860792 1499002167 ddbjest36.seq 470992 301866353 1499002449 ddbjest37.seq 403523 230865186 1499005080 ddbjest38.seq 259092 96576864 1499000755 ddbjest39.seq 258073 106030961 1499001375 ddbjest40.seq 336043 166193676 
1499001651 ddbjest41.seq 468467 266808752 1498999978 ddbjest42.seq 486058 269226586 1499002496 ddbjest43.seq 447197 240040924 1499000313 ddbjest44.seq 500274 286477873 1499000529 ddbjest45.seq 490868 255611588 1499003175 ddbjest46.seq 447033 257710268 1499001136 ddbjest47.seq 544220 285341059 1499000371 ddbjest48.seq 435482 254285192 1499001247 ddbjest49.seq 376295 210084453 1499005565 ddbjest50.seq 262113 131695196 1499002876 ddbjest51.seq 266932 107996390 1499003701 ddbjest52.seq 315866 140904886 1499000775 ddbjest53.seq 433472 248201998 1499000818 ddbjest54.seq 546086 308869736 1499003983 ddbjest55.seq 430177 284556401 1499000635 ddbjest56.seq 438583 245532160 1499000012 ddbjest57.seq 473685 275366896 1499001969 ddbjest58.seq 450666 244100432 1499000104 ddbjest59.seq 458655 266257710 1499002056 ddbjest60.seq 461447 275008745 1499001089 ddbjest61.seq 423144 249051085 1499001875 ddbjest62.seq 500197 329005349 1499000100 ddbjest63.seq 440179 271486904 1499001473 ddbjest64.seq 449731 231214529 1499000466 ddbjest65.seq 447034 277046684 1499000184 ddbjest66.seq 431638 276660186 1499002896 ddbjest67.seq 362303 242476223 1499000134 ddbjest68.seq 448411 240408986 1499000488 ddbjest69.seq 419225 234690194 1499001976 ddbjest70.seq 426702 235520183 1499000894 ddbjest71.seq 449157 240502219 1499001710 ddbjest72.seq 513380 301691126 1499001009 ddbjest73.seq 516874 315743231 1499000140 ddbjest74.seq 584281 335009735 1499002756 ddbjest75.seq 461028 282960931 1499000267 ddbjest76.seq 398659 303840475 1499003067 ddbjest77.seq 507836 291646809 1499004416 ddbjest78.seq 402436 287587942 1499000860 ddbjest79.seq 377765 252537044 1499001644 ddbjest80.seq 379460 270041575 1499003000 ddbjest81.seq 421060 302151070 1499003610 ddbjest82.seq 436702 314609369 1499001640 ddbjest83.seq 466214 310974004 1499001293 ddbjest84.seq 463142 295952499 1499000195 ddbjest85.seq 547021 240019448 1499000968 ddbjest86.seq 563606 275467596 1499002870 ddbjest87.seq 474238 310611946 1499000713 ddbjest88.seq 
521328 304593750 1499002047 ddbjest89.seq 585075 320521393 1499000448 ddbjest90.seq 617778 261191600 1499001367 ddbjest91.seq 526549 302225497 1499002857 ddbjest92.seq 492681 305139519 1499003648 ddbjest93.seq 568652 216586527 1499002343 ddbjest94.seq 522314 296713138 1499001908 ddbjest95.seq 471770 298303192 1499000591 ddbjest96.seq 427883 243806856 1499002590 ddbjest97.seq 670121 143005371 1499001701 ddbjest98.seq 452002 303048077 1499002964 ddbjest99.seq 576071 224294501 1499002033 ddbjest100.seq 495504 301102241 1499000273 ddbjest101.seq 505316 312359641 1499001881 ddbjest102.seq 532044 280227532 1499002176 ddbjest103.seq 583534 218092926 1499000632 ddbjest104.seq 503605 301473085 1499003289 ddbjest105.seq 415086 276950237 1499002422 ddbjest106.seq 480320 287128027 1499000266 ddbjest107.seq 444281 292676148 1499000069 ddbjest108.seq 483130 366286001 1499002017 ddbjest109.seq 416585 266487066 1499001315 ddbjest110.seq 406199 274645005 1499000383 ddbjest111.seq 464773 282580510 1499000925 ddbjest112.seq 448023 295711017 1499001001 ddbjest113.seq 481727 276617651 1499000100 ddbjest114.seq 366562 226688690 1499002975 ddbjest115.seq 468295 241283272 1499000205 ddbjest116.seq 495230 275015030 1499003073 ddbjest117.seq 398979 254422552 1499001470 ddbjest118.seq 481509 293074745 1498999928 ddbjest119.seq 404312 265382910 1499002948 ddbjest120.seq 351449 202788665 1499001620 ddbjest121.seq 449844 111619243 1499002594 ddbjest122.seq 646114 334019471 1499002044 ddbjest123.seq 463057 276632716 1499000817 ddbjest124.seq 533026 275145823 1499000569 ddbjest125.seq 547361 278526182 1499003738 ddbjest126.seq 500364 166170093 1499001326 ddbjest127.seq 451142 76392378 1499000604 ddbjest128.seq 469613 225631459 1499001141 ddbjest129.seq 467988 308730491 1499004003 ddbjest130.seq 430906 299362137 1499001326 ddbjest131.seq 466133 213513938 1499001793 ddbjest132.seq 466846 269728806 1499001609 ddbjest133.seq 484015 301776490 1499001140 ddbjest134.seq 401429 265956719 1499001049 
ddbjest135.seq 459971 266772081 1499000301 ddbjest136.seq 467864 176194693 1499000191 ddbjest137.seq 50112 18137192 154850362 ddbjgss1.seq 482943 349183853 1499003457 ddbjgss2.seq 446041 340655483 1499001825 ddbjgss3.seq 446971 336567923 1499000115 ddbjgss4.seq 564616 274158358 1499001453 ddbjgss5.seq 488000 252334640 1499001554 ddbjgss6.seq 465925 255693052 1498999943 ddbjgss7.seq 389199 193207429 1499001636 ddbjgss8.seq 418615 211086323 1499002543 ddbjgss9.seq 500177 291435466 1499001978 ddbjgss10.seq 557651 311384098 1499002086 ddbjgss11.seq 497462 293022692 1499001705 ddbjgss12.seq 535510 350863477 1499000383 ddbjgss13.seq 522519 366675113 1499001027 ddbjgss14.seq 515467 355187223 1499001723 ddbjgss15.seq 607545 340216873 1499001349 ddbjgss16.seq 605576 372175751 1499001201 ddbjgss17.seq 565570 314441271 1499002382 ddbjgss18.seq 522912 376469102 1499000391 ddbjgss19.seq 511993 342196473 1499000430 ddbjgss20.seq 600937 385664536 1499000198 ddbjgss21.seq 586557 417986657 1499002242 ddbjgss22.seq 537444 313044521 1499001523 ddbjgss23.seq 479536 289009984 1499000321 ddbjgss24.seq 520579 347556793 1499001914 ddbjgss25.seq 532333 338791724 1499003047 ddbjgss26.seq 538345 343749019 1499000448 ddbjgss27.seq 606394 297467632 1499002407 ddbjgss28.seq 555017 266366513 1499002470 ddbjgss29.seq 538753 377366240 1499000443 ddbjgss30.seq 476921 328959344 1499001531 ddbjgss31.seq 485235 378792649 1499002132 ddbjgss32.seq 568968 339410348 1499000554 ddbjgss33.seq 526491 340096779 1499002832 ddbjgss34.seq 483893 344243895 1499001437 ddbjgss35.seq 538164 301109246 1499000835 ddbjgss36.seq 526885 316147341 1499000535 ddbjgss37.seq 510181 260800122 1499002639 ddbjgss38.seq 415836 337629760 1499002795 ddbjgss39.seq 420931 348258425 1499000694 ddbjgss40.seq 428017 342061323 1499000243 ddbjgss41.seq 421863 347314880 1499000151 ddbjgss42.seq 427717 348142225 1499000762 ddbjgss43.seq 421994 340872138 1499000476 ddbjgss44.seq 468855 341855790 1499001217 ddbjgss45.seq 520281 330905609 
1499001034 ddbjgss46.seq 612157 396170664 1499001815 ddbjgss47.seq 598439 418844163 1499001016 ddbjgss48.seq 540858 386571891 1499000701 ddbjgss49.seq 487715 252022530 1499001612 ddbjgss50.seq 494157 410009201 1499001398 ddbjgss51.seq 389303 291673694 1096751028 ddbjhtc1.seq 275179 361535228 1499003908 ddbjhtc2.seq 274968 274936961 974402250 ddbjhtg1.seq 11402 1118291358 1499143107 ddbjhtg2.seq 7563 1118465177 1499263986 ddbjhtg3.seq 5909 1130684567 1499147192 ddbjhtg4.seq 5455 1140379386 1499166048 ddbjhtg5.seq 5317 1144261890 1499122135 ddbjhtg6.seq 5349 1144179544 1499043809 ddbjhtg7.seq 6572 1132526856 1499188015 ddbjhtg8.seq 6868 1143033446 1499217446 ddbjhtg9.seq 6263 1139507876 1499122671 ddbjhtg10.seq 6322 1133245545 1499108340 ddbjhtg11.seq 7025 1123845018 1499007899 ddbjhtg12.seq 7012 1124862105 1499092854 ddbjhtg13.seq 6937 1141722492 1499134558 ddbjhtg14.seq 6986 1135228166 1499149234 ddbjhtg15.seq 6814 1141624402 1499215961 ddbjhtg16.seq 6348 1139258084 1499126717 ddbjhtg17.seq 6577 1139099238 1499104755 ddbjhtg18.seq 8072 1145678393 1499023692 ddbjhtg19.seq 6065 1136618124 1499022806 ddbjhtg20.seq 6683 1158078678 1499042240 ddbjhtg21.seq 6876 1157166060 1499043298 ddbjhtg22.seq 187 20047482 26127543 ddbjhum1.seq 28747 1049673000 1499209619 ddbjhum2.seq 8116 1069615594 1499093894 ddbjhum3.seq 146079 823883646 1499027573 ddbjhum4.seq 21539 1076797031 1499041624 ddbjhum5.seq 273653 538848118 1499008453 ddbjhum6.seq 3552 33738028 52018747 ddbjinv1.seq 236504 704896538 1499000402 ddbjinv2.seq 441921 439338845 1499002704 ddbjinv3.seq 169702 584903472 1246958144 ddbjmam.seq 210277 578488523 1194811601 ddbjpat1.seq 1035380 520096759 1499001025 ddbjpat2.seq 776725 494042969 1499000962 ddbjpat3.seq 744946 347076778 1499000335 ddbjpat4.seq 695832 603038213 1499000553 ddbjpat5.seq 740100 390559394 1499000861 ddbjpat6.seq 737783 333822745 1499000772 ddbjpat7.seq 737883 380955032 1499002291 ddbjpat8.seq 712462 723680903 1498999934 ddbjpat9.seq 1255365 253011396 
1499002635 ddbjpat10.seq 797903 613354904 1498999995 ddbjpat11.seq 930872 456548087 1499000653 ddbjpat12.seq 1250596 273046042 1499000374 ddbjpat13.seq 482971 208061636 760246192 ddbjphg.seq 4539 36215949 88478279 ddbjpln1.seq 128187 907381708 1499000517 ddbjpln2.seq 235383 571115698 1499001553 ddbjpln3.seq 88262 889754293 1499002705 ddbjpln4.seq 282370 602797497 1499002795 ddbjpln5.seq 485408 417938675 1499001114 ddbjpln6.seq 327024 315566739 1046568929 ddbjpri1.seq 49198 1079577322 1499101070 ddbjpri2.seq 27342 77691441 156906734 ddbjrod1.seq 35067 1019158156 1499023601 ddbjrod2.seq 5885 1092044199 1499044575 ddbjrod3.seq 41064 1054126070 1499122119 ddbjrod4.seq 78014 894939942 1499233334 ddbjrod5.seq 206935 146485121 537023406 ddbjsts1.seq 417738 210868731 1499003731 ddbjsts2.seq 339435 238864052 1499001891 ddbjsts3.seq 553547 179847339 1455725561 ddbjsyn.seq 87961 131208388 468050449 ddbjtsa.seq 118887 36334961 245478396 ddbjuna.seq 288 483930 1371025 ddbjvrl1.seq 398276 404136019 1499207625 ddbjvrl2.seq 330947 382102227 1348542891 ddbjvrt1.seq 244308 671926947 1499004374 ddbjvrt2.seq 60544 1028536370 1499052274 ddbjvrt3.seq 296934 669525346 1489092414 ------------------------------------------------------------------------------ Total 108593519 106684379504 388729409918 ddbjtpa.seq 18310 25650052 96485605 ddbjcon1.seq 703761 0 1499025150 ddbjcon2.seq 217086 0 1499000015 ddbjcon3.seq 270285 0 1499007111 ddbjcon4.seq 580323 0 1499000109 ddbjcon5.seq 480787 0 1499001183 ddbjcon6.seq 349219 0 1499003454 ddbjcon7.seq 330565 0 1499001974 ddbjcon8.seq 414478 0 1499913391 ddbjcon9.seq 260749 0 1499004902 ddbjcon10.seq 224486 0 1499005397 ddbjcon11.seq 268111 0 1499000296 ddbjcon12.seq 241865 0 1499004502 ddbjcon13.seq 258656 0 1499004375 ddbjcon14.seq 314508 0 1499000635 ddbjcon15.seq 290243 0 1499003260 ddbjcon16.seq 288637 0 1499000745 ddbjcon17.seq 281313 0 1499002112 ddbjcon18.seq 258272 0 1499002130 ddbjcon19.seq 282865 0 1499001944 ddbjcon20.seq 134370 0 
823369674

The entries and bases in the CON division and the TPA dataset are not
counted in the numbers given at the top of the release note or in the
'Total' row of the table above.
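
The counting rule above can be sketched in Python as follows. The sketch
assumes each sequence file row has the form 'file name, number of entries,
number of bases, file size' separated by whitespace; the function name
sum_file_list is hypothetical, and only four rows copied from the table are
used here, so the result is a partial sum and not the release total.

```python
def sum_file_list(rows):
    """Sum entries and bases over '<name> <entries> <bases> <size>' rows."""
    total_entries = 0
    total_bases = 0
    for row in rows:
        fields = row.split()
        # Skip rulers, headers, the 'Total' row, and index-file rows.
        if len(fields) != 4 or not fields[0].endswith(".seq"):
            continue
        name, entries, bases, _size = fields
        # CON and TPA files are excluded from the release totals.
        if name.startswith(("ddbjcon", "ddbjtpa")):
            continue
        total_entries += int(entries)
        total_bases += int(bases)
    return total_entries, total_bases


rows = [
    "ddbjphg.seq 4539 36215949 88478279",
    "ddbjuna.seq 288 483930 1371025",
    "ddbjtpa.seq 18310 25650052 96485605",
    "ddbjcon20.seq 134370 0 823369674",
]
print(sum_file_list(rows))
```

Applied to every row of the file list, the same function would be expected
to reproduce the 'Total' row for entries and bases, provided the table was
transcribed without loss.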