DNA Data Bank of Japan

                              DNA Database

Release 80.0, Dec. 2009, including 112,314,250 entries, 109,636,862,252 bases
Last published date in the present release: November 20, 2009

------------------------------------------------------------------------------- 
Table of contents
-------------------------------------------------------------------------------

  1. Introduction
    1.1. Announcement for changes in the present release
    1.2. Announcement for the forthcoming changes

  2. DDBJ flat file format
    2.1.  LOCUS line
    2.2.  DEFINITION line
    2.3.  ACCESSION line
    2.4.  VERSION line
    2.5.  KEYWORDS line
    2.6.  SOURCE line
    2.7.  REFERENCE line
    2.8.  COMMENT line
    2.9.  FEATURES line
    2.10. BASE COUNT line
    2.11. ORIGIN line

  3. Dataset categories
    3.1. Division categories
    3.2. TPA separated from primary dataset
    3.3. Notice for patent related sequence data

  4. DDBJ staff

  5. Acknowledgment

  6. File categories

  7. Sample of the contents in each file
    7.1. Part of the contents in the file 'ddbjbct1.seq'
    7.2. Part of the contents in the accession number index file 'ddbjacc1.idx'
    7.3. Part of the contents in the keyword phrase index file 'ddbjkey1.idx'
    7.4. Part of the contents in the journal citation index file 'ddbjjou1.idx'
    7.5. Part of the contents in the gene name index file 'ddbjgen.idx'

  8. Release history

  9. File list
-------------------------------------------------------------------------------

1. Introduction

The present release contains the newest data prepared by the DNA Data Bank of 
Japan (DDBJ), GenBank (*), and EMBL-Bank/European Bioinformatics Institute 
(EMBL-Bank/EBI) as of November 20, 2009.  This unified database was made 
possible thanks to the international collaboration among the three data banks.
All the entries have accordingly been annotated using the feature keys common 
to them.  

In 2005, DDBJ, EMBL-Bank and GenBank agreed to call their collaboration 
"the International Nucleotide Sequence Database Collaboration (INSDC); 
http://www.insdc.org " and to call the unified nucleotide sequence database 
"the International Nucleotide Sequence Database (INSD)".  

*'GenBank' is a trademark of NIH, USA, and is operated by the National Center 
for Biotechnology Information (NCBI) at NIH.

This database may be copied and redistributed without permission on the 
condition that all the statements in this release note are reproduced in each 
copy.  See also '3.3. Notice for patent related sequence data' below.  


1.1. Announcement for changes in the present release

The format of the SOURCE line in the DDBJ flat file has been changed:  

As a result of this change, 1) the order of the organism name and the 
organelle name is reversed, and 2) some DDBJ flat files now include a common 
name, as GenBank flat files do.  The change is shown in detail below.  

----------------
Old (-rel. 79)
----------------
Format:  
SOURCE      <organism_name> [<organelle_code>]
Example:  
SOURCE      Homo sapiens mitochondrion

----------------
New (rel. 80-)
----------------
Format:  
SOURCE      [<organelle_code>] <organism_name> [(<genbank_common_name>)]
Example:  
SOURCE      mitochondrion Homo sapiens (human)

See also '2. DDBJ flat file format'.  
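
As a minimal sketch of how a reader program might cope with both layouts, the 
following Python function splits a SOURCE line written in either the old or 
the new format.  The function name, the returned keys, and the short list of 
organelle names are assumptions made for illustration only; they are not part 
of the DDBJ specification, and organism names that themselves contain 
parentheses are not handled.  

```python
import re

# Illustrative subset only (an assumption); this note does not list all codes.
ORGANELLES = ("mitochondrion", "chloroplast")


def parse_source_line(line, new_format=True):
    """Split a SOURCE line into organelle, organism and common name."""
    body = line[len("SOURCE"):].strip()
    organelle = None
    common = None
    if new_format:
        # rel. 80-: [<organelle_code>] <organism_name> [(<genbank_common_name>)]
        m = re.match(r"^(.*?)\s*\(([^()]*)\)$", body)
        if m:
            body, common = m.group(1), m.group(2)
        for org in ORGANELLES:
            if body.startswith(org + " "):
                organelle, body = org, body[len(org) + 1:]
                break
    else:
        # -rel. 79: <organism_name> [<organelle_code>]
        for org in ORGANELLES:
            if body.endswith(" " + org):
                organelle, body = org, body[:-len(org) - 1]
                break
    return {"organelle": organelle,
            "organism": body.strip(),
            "common_name": common}


print(parse_source_line("SOURCE      mitochondrion Homo sapiens (human)"))
print(parse_source_line("SOURCE      Homo sapiens mitochondrion",
                        new_format=False))
```

Both calls are expected to report the same organism and organelle; only the 
new format can carry a common name.  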


1.2. Announcement for the forthcoming changes

TPA category data will be excluded from the DDBJ periodical release:  

Since September 2002 (DDBJ release 51), DDBJ periodical releases have included 
TPA category data.  However, this is potentially confusing for users, because 
TPA category data are not primary nucleotide sequence data.  Therefore, DDBJ 
will stop including TPA data as of the next periodical release.  

Please note that we will continue to provide the TPA category data released by 
INSDC, independently of the DDBJ periodical release, at the following DDBJ FTP 
site.  
FTP site for TPA data: ftp://ftp.ddbj.nig.ac.jp/ddbj_database/tpa/

See also '3. Dataset categories', '3.2. TPA separated from primary dataset' and 
'6. File categories'.  


2. DDBJ flat file format

The database is a collection of entries; an "entry" is the unit of data.  The 
entries submitted to the data banks are processed and published in the DDBJ 
format for distribution (flat file).  The flat file includes the sequence 
together with information on submitters, references, source organisms, 
"features", etc.  The items of the DDBJ flat file are explained below.  

-------------------------------------------------------------------------------
LOCUS       AB000000                 450 bp    mRNA    linear   HUM 08-JUL-2002
DEFINITION  Homo sapiens GAPD mRNA for glyceraldehyde-3-phosphate
            dehydrogenase, partial cds.
ACCESSION   AB000000
VERSION     AB000000.1
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 
REFERENCE   1  (bases 1 to 450)
  AUTHORS   Mishima,H. and Shizuoka,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-NOV-2000) to the DDBJ/EMBL/GenBank databases.
            Contact:Hanako Mishima
            National Institute of Genetics, DNA Data Bank of Japan; 1111, Yata,
            Mishima, Shizuoka 411-8540, Japan
REFERENCE   2  
  AUTHORS   Mishima,H., Shizuoka,T. and Fuji,I.
  TITLE     Glyceraldehyde-3-phosphate dehydrogenase expressed in human liver
  JOURNAL   Unpublished (2002)
COMMENT     Human cDNA sequencing project.
FEATURES             Location/Qualifiers
     source          1..450
                     /chromosome="12"
                     /clone="GT200015"
                     /clone_lib="lambda gt11 human liver cDNA (GeneTech.
                     No.20)"
                     /map="12p13"
                     /mol_type="mRNA"
                     /organism="Homo sapiens"
                     /tissue_type="liver"
     CDS             86..>450
                     /codon_start=1
                     /gene="GAPD"
                     /product="glyceraldehyde-3-phosphate dehydrogenase"
                     /protein_id="BAA12345.1"
                     /transl_table=1
                     /translation="MAKIKIGINGFGRIGRLVARVALQSDDVELVAVNDPFITTDYMT
                     YMFKYDTVHGQWKHHEVKVKDSKTLLFGEKEVTVFGCRNPKEIPWGETSAEFVVEYTG
                     VFTDKDKAVAQLKGGAKKV"
BASE COUNT          102 a          119 c          131 g           98 t
ORIGIN
        1 cccacgcgtc cggtcgcatc gcacttgtag ctctcgaccc ccgcatctca tccctcctct
       61 cgcttagttc agatcgaaat cgcaaatggc gaagattaag atcgggatca atgggttcgg
      121 gaggatcggg aggctcgtgg ccagggtggc cctgcagagc gacgacgtcg agctcgtcgc
      181 cgtcaacgac cccttcatca ccaccgacta catgacatac atgttcaagt atgacactgt
      241 gcacggccag tggaagcatc atgaggttaa ggtgaaggac tccaagaccc ttctcttcgg
      301 tgagaaggag gtcaccgtgt tcggctgcag gaaccctaag gagatcccat ggggtgagac
      361 tagcgctgag tttgttgtgg agtacactgg tgttttcact gacaaggaca aggccgttgc
      421 tcaacttaag ggtggtgcta agaaggtctg
//
-------------------------------------------------------------------------------
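
The sample entry above ends with a '//' line.  As a rough sketch, assuming 
that every entry is terminated this way and that a top-level keyword starts in 
column 1 and fits within the first 12 columns (both inferred from the sample, 
not stated as rules in this note), a flat file could be split into entries as 
follows.  The function names are hypothetical.  

```python
def split_entries(text):
    """Yield each entry as a list of lines; a '//' line ends an entry."""
    entry = []
    for line in text.splitlines():
        if line.strip() == "//":
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)


def top_level_keywords(entry):
    """Return the keywords of the lines that start in column 1."""
    # Assumption: the keyword occupies at most the first 12 columns.
    return [line[:12].strip() for line in entry
            if line and not line[0].isspace()]


sample = "\n".join([
    "LOCUS       AB000000                 450 bp    mRNA    linear   HUM 08-JUL-2002",
    "ACCESSION   AB000000",
    "SOURCE      Homo sapiens (human)",
    "  ORGANISM  Homo sapiens",
    "ORIGIN",
    "        1 cccacgcgtc cggtcgcatc",
    "//",
])

for entry in split_entries(sample):
    print(top_level_keywords(entry))
```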


2.1. LOCUS line

The format of the LOCUS line in the flat file is shown below:  
---------  --------
Positions  Contents
---------  --------
  01-05    'LOCUS'
  06-12     spaces
  13-28     Locus name
  29-29     space
  30-40     Length of sequence, right-justified
  41-41     space
  42-43     'bp'
  44-47     spaces
  48-54     DNA, RNA, mRNA, rRNA, tRNA or cRNA, left justified
  55-55     space
  56-63     'linear' followed by two spaces, or 'circular'
  64-64     space
  65-67     The division code (see '3.1. Division categories')
  68-68     space
  69-79     Date, in the form dd-MMM-yyyy (e.g., 08-JUL-2002)
------------------------------------------------------------------------------
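
Because the LOCUS line is a fixed-column record, it can be read by slicing.  
The sketch below converts the 1-based positions in the table above into 
Python's 0-based slices.  The function name and the dictionary keys are 
assumptions chosen for this example.  

```python
def parse_locus_line(line):
    """Slice a LOCUS line using the column positions in the table above."""
    if not line.startswith("LOCUS"):
        raise ValueError("not a LOCUS line")
    return {
        "locus_name": line[12:28].strip(),  # positions 13-28
        "length": int(line[29:40]),         # positions 30-40, right-justified
        "molecule": line[47:54].strip(),    # positions 48-54, left justified
        "topology": line[55:63].strip(),    # positions 56-63
        "division": line[64:67],            # positions 65-67
        "date": line[68:79],                # positions 69-79
    }


# Build the sample line field by field so that the column widths are explicit.
sample = ("LOCUS" + " " * 7
          + f"{'AB000000':<16} {450:>11} bp" + " " * 4
          + f"{'mRNA':<7} {'linear':<8} HUM 08-JUL-2002")

print(parse_locus_line(sample))
```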


2.2. DEFINITION line

The definition briefly describes the gene(s) contained in the entry.  
"DEFINITION" is constructed by each of the three data banks.  


2.3. ACCESSION line

This line shows the accession number of the entry.  
A unique accession number is issued to the submitter of sequence data by each 
of the three data banks.  The accession number is composed of one letter and 
five digits (e.g., A12345) or two letters and six digits (e.g., AB123456).  
The former style was used in the 1980s; the latter style was introduced later 
because of the rapid growth of the data.  
All the entries whose accession numbers begin with the prefixes given below 
have been collected and processed by DDBJ, and the rest by GenBank and 
EMBL-Bank/EBI.  

-------------------------------------------------------------------------------
  C, D, E, AB, AG, AK, AP, AT, AU, AV, BA, BB, BD, BJ, BP, BR, BS, BW, BY, 
  CI, CJ, DA, DB, DC, DD, DE, DF, DG, DH, DI, DJ, DK, DL, DM, FS, FT
-------------------------------------------------------------------------------

You can find the list of the accession number prefixes at the following URL:
http://www.ddbj.nig.ac.jp/sub/prefix.html
If multiple entries are merged into one entry, or if an entry is extensively 
modified after submission, the responsible data bank may assign a new 
accession number to it.  In these cases, the new accession number is called 
the primary accession number, and the old accession number(s) is/are 
called the secondary accession number(s).  In the flat file, the primary 
accession number is given first, followed by the secondary accession 
number(s).  You can retrieve the same updated entry with either the primary or 
a secondary accession number.  
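
The two accession number styles and the DDBJ prefix list given in this section 
can be checked mechanically.  The following is a small sketch; the function 
names are assumptions, and the prefix set is simply copied from the list in 
this section, so it reflects this release only.  

```python
import re

# Prefixes listed in this section as collected and processed by DDBJ.
DDBJ_PREFIXES = frozenset("""
C D E AB AG AK AP AT AU AV BA BB BD BJ BP BR BS BW BY
CI CJ DA DB DC DD DE DF DG DH DI DJ DK DL DM FS FT
""".split())

# One letter + five digits, or two letters + six digits.
ACCESSION_RE = re.compile(r"^([A-Z])\d{5}$|^([A-Z]{2})\d{6}$")


def accession_prefix(accession):
    """Return the letter prefix, or None if the style is not recognized."""
    m = ACCESSION_RE.match(accession)
    if m is None:
        return None
    return m.group(1) or m.group(2)


def is_ddbj_accession(accession):
    """True if the accession number carries one of the DDBJ prefixes."""
    return accession_prefix(accession) in DDBJ_PREFIXES


for acc in ("AB000000", "D87069", "X04516"):
    print(acc, is_ddbj_accession(acc))
```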


2.4. VERSION line

This line consists of an accession number and a version number, as in 
"AB123456.1", in which the digit(s) after the period is the version number.  
Data opened to the public for the first time are given version number "1".  
VERSION was added because a released sequence is sometimes revised by the 
submitter, so the accession number alone cannot identify which sequence is 
meant.  The version number is increased by one every time a revised sequence 
is made public.  
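
A VERSION value can be separated into its two parts at the last period.  The 
helper below is a minimal sketch; its name is an assumption made for this 
example.  

```python
def split_version(value):
    """Split 'AB123456.1' into the accession number and the version number."""
    accession, sep, version = value.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version value: {value!r}")
    return accession, int(version)


accession, version = split_version("AB123456.1")
print(accession, version)
```

Keeping the version as an integer makes it straightforward to tell which of 
two records of the same accession number is the more recent revision.  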


2.5. KEYWORDS line

The data banks fill in this line if necessary.  In many cases, the 
categories of the data (EST, HTG etc.), gene names and product names are 
included in "KEYWORDS".  


2.6. SOURCE line

This line shows the scientific name (and the corresponding common name, if 
one is defined as the "GenBank common name" in the taxonomy database) of the 
organism from which the sequence was obtained, and an organelle type if the 
sequence is derived from an organelle other than the nucleus.  


2.7. REFERENCE line

Information on the submitters and on references related to the submitted 
sequence is given in the REFERENCE line.  


2.8. COMMENT line

This line holds information about an entry that cannot be described using 
FEATURES or the other fields.  


2.9. FEATURES line

Biological features of submitted sequence data are described with a 
"Feature" key (the biological nature of the annotated feature), a "Location"
(the region of the sequence which corresponds to the Feature), and "Qualifiers" 
(supplementary information about the Feature).  The "Feature" and "Qualifier" 
keys used in the present release are defined by the DDBJ/EMBL/GenBank Feature 
Table: Definition Version 8.2 (December, 2009).  The document is updated every 
half year.  You can find its newest version at the following URL:
http://www.ddbj.nig.ac.jp/FT/full_index.html


2.10. BASE COUNT line

In the BASE COUNT line of the DDBJ flat file, 9 digits are allocated for each 
of the counts of a (adenine), c (cytosine), g (guanine) and t (thymine).  In 
the case of an RNA sequence, uracil is indicated as "t" according to the rule 
of the international nucleotide sequence databases.  
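
As an illustration, the counts on a BASE COUNT line can be read back and 
compared with counts computed from a sequence.  The sketch below does not 
depend on exact column positions; the function names are assumptions.  
Uracil is folded into "t" to follow the rule described above.  

```python
import re
from collections import Counter


def parse_base_count(line):
    """Return the a/c/g/t counts written on a BASE COUNT line."""
    return {base: int(n)
            for n, base in re.findall(r"(\d+)\s+([acgt])\b", line)}


def count_bases(sequence):
    """Count a, c, g and t in a sequence, reporting uracil as 't'."""
    counts = Counter(sequence.lower().replace("u", "t"))
    return {base: counts[base] for base in "acgt"}


line = "BASE COUNT          102 a          119 c          131 g           98 t"
print(parse_base_count(line))
print(count_bases("acguuACGT"))
```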


2.11. ORIGIN line

The sequence data start on the line following ORIGIN.  The sequence is 
written in lower case letters, separated by a space every 10 bases, with a new 
line every 60 bases.  The number at the left of each line is the position of 
the first base on that line.  
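
The layout described above can be sketched in a few lines of Python.  The 
function names are assumptions, and the 9-column width of the position number 
is inferred from the sample entry in section 2 rather than stated as a rule.  

```python
def format_origin(sequence):
    """Return sequence lines: 60 bases per line, in blocks of 10 bases."""
    seq = sequence.lower()
    lines = []
    for start in range(0, len(seq), 60):
        chunk = seq[start:start + 60]
        blocks = [chunk[i:i + 10] for i in range(0, len(chunk), 10)]
        # Assumption: the position number is right-justified in 9 columns.
        lines.append(f"{start + 1:>9} " + " ".join(blocks))
    return lines


def parse_origin(lines):
    """Join the bases of sequence lines, dropping positions and spaces."""
    return "".join("".join(line.split()[1:]) for line in lines)


sequence = "acgt" * 20  # 80 bases, so two lines are produced
for line in format_origin(sequence):
    print(line)
```

Reading the lines back with parse_origin should return the original sequence, 
which gives a simple round-trip check.  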


3. Dataset categories

A number of genome projects are under way worldwide.  Among them, the human 
genome projects have probably been the most productive, yielding a large 
number of ordinary sequences and huge amounts of genome sequences and ESTs 
(expressed sequence tags).  DDBJ therefore has a human (HUM) division solely 
for human sequences and a primate (PRI) division for non-human primate 
sequences, whereas the PRI division of the GenBank database also contains 
human sequences.  Note that other divisions such as EST, GSS, and HTC may also 
contain human sequences.  
The present release is divided into 22 categories of organisms and others.  See 
also '6. File categories' and '9. File list' below.  The contents of the 22 
categories are described below.  


3.1. Division categories

The first 21 divisions are given below;  

HUM; human  
PRI; primates (other than human) 
ROD; rodents 
MAM; mammals (other than primates and rodents) 
VRT; vertebrates (other than mammals) 
INV; invertebrates (animals other than vertebrates) 
PLN; plants, fungi, plastids (eukaryotes other than animals) 
BCT; bacteria (including both Eubacteria and Archaea) 
VRL; viruses 
PHG; bacteriophages 
ENV; sequences obtained via environmental sampling methods 
SYN; synthetic constructs (artificially constructed sequences) 
EST; expressed sequence tags; short single pass cDNA sequences 
GSS; genome survey sequences; short single pass genomic sequences 
TSA; transcriptome shotgun assemblies 
HTC; high throughput cDNA sequences 
     Sequences submitted from cDNA sequencing projects, other than ESTs.  
     This division includes unfinished high throughput cDNA sequences, 
     each of which has a 5'UTR and a 3'UTR at its ends and part of a coding 
     region.  The sequence may also include introns.  When the sequence is 
     finished later, it moves to the corresponding taxonomic division.  
HTG; high throughput genomic sequences 
     Sequences submitted mainly from genome sequencing projects that treat a 
     clone as the sequencing unit.  
STS; sequence tagged sites 
     Tag sites for genome sequencing.  Chromosome and map information is 
     mandatory for this division.  
PAT; sequence data related to patent applications
     Data collected, processed and released by the Japanese Patent Office 
     (JPO), the United States Patent and Trademark Office (USPTO), the European 
     Patent Office (EPO), and the Korean Intellectual Property Office (KIPO).  
     See also '3.3. Notice for patent related sequence data' below.  
UNA; the data not annotated 
     The UNA division is not used for recently submitted sequences.  
CON; Contig / Constructed 
     To join a series of entries, such as those submitted from a genome 
     project, each of the three data banks constructs an entry and assigns an 
     accession number to the large scale sequence dataset.  Such entries are 
     classified into the CON division.  An entry in the CON division holds the 
     accession numbers of the joined entries instead of the sequence data.  
     The entries joined by a CON entry have been submitted to other 
     divisions.  The entries and bases in the CON division are not counted in 
     the release totals given at the top of the release note.  


3.2. TPA separated from primary dataset

TPA (Third Party Annotation) data are also available.  The TPA data are a 
complement to the existing DDBJ/EMBL-Bank/GenBank comprehensive database of 
primary nucleotide sequences, which typically result from direct sequencing 
of cDNAs, ESTs, genomic DNAs etc.  Primary entries are defined to be data 
for which the submitting group has done the sequencing and annotation, and as 
'owner' of these data has privileges to submit updates/corrections etc.  
Primary entries used to build a TPA sequence are those that have been 
experimentally determined and are publicly available in the DDBJ/EMBL-Bank/
GenBank databases.  They may not be from a proprietary database.  The entries 
and bases in TPA are not counted in the release totals given at the top of 
the release note.  
See also the following URLs:  
http://www.ddbj.nig.ac.jp/sub/tpa-e.html
http://www.insdc.org/TPA.html


3.3. Notice for patent related sequence data

This release includes the PAT division for patent related sequence data, as 
described above.  These data were collected, processed and released by the 
Japanese Patent Office (JPO), the United States Patent and Trademark Office 
(USPTO), the European Patent Office (EPO), and the Korean Intellectual 
Property Office (KIPO).  The prefixes of the accession numbers for the patent 
related sequence data are shown below:  

   ----------------------------------------------
    JPO  : E, BD, DD, DJ, DL, DM
    KIPO : DI
    USPTO: I, AR, DZ, EA, GC, GP
    EPO  : A, AX, CQ, CS, FB, GM, GN, HA, HB, HC
   ----------------------------------------------

Note also that unauthorized use of the patented data may cause legal issues 
for which DDBJ takes no responsibility.
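
The prefix table above lends itself to a simple lookup, for example to flag 
patent related entries before reuse.  This is only a sketch: the function name 
is an assumption, and the table is copied from this release note, so it may 
not cover prefixes assigned later.  

```python
import re

# Prefixes copied from the table in this section.
PATENT_PREFIXES = {
    "JPO": {"E", "BD", "DD", "DJ", "DL", "DM"},
    "KIPO": {"DI"},
    "USPTO": {"I", "AR", "DZ", "EA", "GC", "GP"},
    "EPO": {"A", "AX", "CQ", "CS", "FB", "GM", "GN", "HA", "HB", "HC"},
}


def patent_office(accession):
    """Return the patent office for an accession number, or None."""
    m = re.match(r"^([A-Z]{1,2})\d+$", accession)
    if m is None:
        return None
    prefix = m.group(1)
    for office, prefixes in PATENT_PREFIXES.items():
        if prefix in prefixes:
            return office
    return None


for acc in ("E12345", "DI123456", "AB000000"):
    print(acc, patent_office(acc))
```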


4. DDBJ staff

This release is published by the following DDBJ staff.  

Jun Mashima, Hideo Aono, Yoshiyuki Ehara, Mayumi Ejima, Masato Endo, 
Masahiro Fujimoto, Daisuke Fukuda, Mariko Gojobori, Tatsukazu Hashimoto, 
Tomohiro Hirai, Fumie Hirata, Nobuhiro Hoshi, Tsutomu Ikesaka, 
Fumiyasu Ishikawa, Kazuya Kanno, Shingo Kawahara, Tatsuko Kawamoto, 
Takahiro Kazama, Satoshi Kitadate, Wataru Kodachi, Yuichi Kodama, 
Junko Kohira, Tomohiro Koike, Takehide Kosuge, Fumiko Kubodera, Kyungbum Lee, 
Mika Maki, Haruka Mamiya, Hisako Mashima, Kimiko Mimura, Naoko Murakata, 
Sachiko Nagira, Masahiko Nagura, Asami Nozaki, Toshihisa Okido, 
Katsunaga Sakai, Satoshi Saruhashi, Makoto Sato, Yukie Shinyama, 
Naoki Shiraishi, Rie Sugita, Kimiko Suzuki, Kazuya Takei, Wataru Tanabe, 
Haru Tsutsui, Hiroaki Yamada, Keisuke Yamamoto, Kenji Yamamoto, 
Makoto Yamamoto, Emi Yokoyama, Takashi Gojobori, Eli Kaminuma, 
Osamu Ogasawara, Kosaku Okubo, Toshihisa Takagi and Yasukazu Nakamura


Center for Information Biology and DNA Data Bank of Japan
National Institute of Genetics
Research Organization of Information and Systems

Mishima 411-8540, Japan 
Phone:  +81 55 981 6853
FAX:    +81 55 981 6849
E-mail: ddbj@ddbj.nig.ac.jp  (for general inquiry)
        ddbjsub@ddbj.nig.ac.jp  (for data submission)
        ddbjupdt@ddbj.nig.ac.jp (for updates and notification of publication)
WWW:    http://www.ddbj.nig.ac.jp/ (for DDBJ WWW server)
        http://sakura.ddbj.nig.ac.jp/ 
        (for DDBJ sequence data submission system)


5. Acknowledgment

We are grateful to NCBI and EBI for their firm friendship and excellent 
collaboration.  We also thank the Japanese Patent Office for its steady 
cooperation.  The operation of DDBJ is supported by the Ministry of 
Education, Culture, Sports, Science and Technology, which we gratefully 
acknowledge.  DDBJ uses the Super-SINET computer network for data 
collection, data exchange and various services.  


6. File categories

This release covers 22 categories of organisms and others (see also 
'3. Dataset categories'), as follows:  
------------------------------------------------------------------------------
ddbjbct; Category for bacteria
ddbjcon; Category for CON (contig sequences)
ddbjenv; Category for ENV (environmental samples)
ddbjest; Category for EST (expressed sequence tags)
ddbjgss; Category for GSS (genome survey sequences)
ddbjhtc; Category for HTC (high throughput cDNA sequences)
ddbjhtg; Category for HTG (high throughput genomic sequences)
ddbjhum; Category for human
ddbjinv; Category for invertebrates
ddbjmam; Category for mammals other than primates and rodents
ddbjpat; Category for patents
ddbjphg; Category for phages
ddbjpln; Category for plants
ddbjpri; Category for primates other than human
ddbjrod; Category for rodents
ddbjsts; Category for STS (sequence tagged sites)
ddbjsyn; Category for synthetic DNAs
ddbjtpa; Category for TPA (third party annotation)
ddbjtsa; Category for TSA (transcriptome shotgun assemblies)
ddbjuna; Category for unannotated sequences
ddbjvrl; Category for viruses
ddbjvrt; Category for vertebrates other than mammals
------------------------------------------------------------------------------

Some of the above categories are recorded in multiple ddbj***###.seq files in 
the present release, each of which is at most 1.5 GB in size.  The number of 
files for each such category is as follows.  

---------------------
ddbjbct :   7 files
ddbjenv :   4 files
ddbjest : 139 files
ddbjgss :  54 files
ddbjhtc :   2 files
ddbjhtg :  22 files
ddbjhum :   6 files
ddbjinv :   3 files
ddbjpat :  14 files
ddbjpln :   6 files
ddbjpri :   2 files
ddbjrod :   5 files
ddbjsts :   3 files
ddbjvrl :   3 files
ddbjvrt :   4 files
ddbjcon :  20 files
---------------------

The index files included in this release are ddbjacc#.idx, ddbjgen.idx, 
ddbjjou#.idx, and ddbjkey#.idx.  See also '9. File list'.  All of them except 
ddbjgen.idx are recorded in multiple ddbj***#.idx files, each of which is at 
most 1.5 GB in size.  
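
The file naming patterns above (ddbj***###.seq and ddbj***#.idx) can be taken 
apart with a regular expression.  The sketch below treats the file number as 
optional, since this note shows ddbjgen.idx without one; whether single-file 
sequence categories carry a number is not stated here, so that part is an 
assumption.  The function name is hypothetical.  

```python
import re

# "ddbj" + three-letter category + optional file number + extension.
FILE_NAME_RE = re.compile(r"^ddbj([a-z]{3})(\d*)\.(seq|idx)$")


def parse_file_name(name):
    """Return (category, file number or None, extension), or None."""
    m = FILE_NAME_RE.match(name)
    if m is None:
        return None
    category, number, extension = m.groups()
    return (category, int(number) if number else None, extension)


for name in ("ddbjbct1.seq", "ddbjacc1.idx", "ddbjgen.idx"):
    print(name, parse_file_name(name))
```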


7. Sample of the contents in each file

7.1. Part of the contents in the file 'ddbjbct1.seq'

The following shows one complete entry in DDBJ format.  
------------------------------------------------------------------------------
LOCUS       D87069                   993 bp    mRNA    linear   BCT 05-OCT-2006
DEFINITION  Escherichia coli mRNA for RNA polymerase sigma subunit, truncated
            form of sigma-38, complete cds.
ACCESSION   D87069
VERSION     D87069.1
KEYWORDS    RNA polymerase sigma subunit, truncated form of sigma-38.
SOURCE      Escherichia coli
  ORGANISM  Escherichia coli
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Escherichia.
REFERENCE   1  (bases 1 to 993)
  AUTHORS   Jishage,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (14-AUG-1996) to the DDBJ/EMBL/GenBank databases.
            Contact:Miki Jishage
            National Institute of Genetics, Molecular Genetics; Yata 1111,
            Mishima, Shizuoka 411, Japan
REFERENCE   2  
  AUTHORS   Jishage,M. and Ishihama,A.
  TITLE     Variation in RNA polymerase sigma subunit composition within
            different stocks of Escherichia coli starin W3110
  JOURNAL   Unpublished (1996)
REFERENCE   3  
  AUTHORS   Ivanova,A., Renshaw,M., Guntaka,R. and Eisenstark,A.
  TITLE     DNA base sequence variability in katF (putative sigma factor) gene
            Escherichia coli
  JOURNAL   Nucleic Acids Res. 20, 5479-5480 (1992)
REFERENCE   4  
  AUTHORS   Takayanagi,Y., Tanaka,K. and Takahashi,H.
  TITLE     Structure of the 5' upstream region and the regulation of the rpoS
            gene of Escherichia coli
  JOURNAL   Mol. Gen. Genet. 243, 525-531 (1994)
COMMENT     
FEATURES             Location/Qualifiers
     source          1..993
                     /db_xref="taxon:562"
                     /mol_type="mRNA"
                     /organism="Escherichia coli"
                     /strain="W3110"
     CDS             1..810
                     /note="the gene has four single base changes, resulting
                     in two amino acid substitutions and an amber mutation"
                     /product="RNA polymerase sigma subunit, truncated form of
                     sigma-38"
                     /protein_id="BAA13238.1"
                     /transl_table=11
                     /translation="MSQNTLKVHDLNEDAEFDENGVEVFDEKALVEYEPSDNDLAEEE
                     LLSQGATQRVLDATQLYLGEIGYSPLLTAEEEVYFARRALRGDVASRRRMIESNLRLV
                     VKIARRYGNRGLALLDLIEEGNLGLIRAVEKFDPERGFRFSTYATWWIRQTIERAIMN
                     QTRTIRLPIHIVKELNVYLRTARELSHKLDHEPSAEEIAEQLDKPVDDVSRMLRLNER
                     ITSVDTPLGGDSEKALLDILADEKENGPEDTTQDDDMKQSIVKWLFELNAK"
     variation       75
                     /citation=[3]
                     /replace="t"
     variation       97
                     /citation=[3]
                     /replace="t"
     variation       99
                     /citation=[3]
                     /replace="t"
     variation       808
                     /citation=[3]
                     /replace="t"
BASE COUNT          254 a          223 c          291 g          225 t
ORIGIN      
        1 atgagtcaga atacgctgaa agttcatgat ttaaatgaag atgcggaatt tgatgagaac
       61 ggagttgagg tttttgacga aaaggcctta gtagaatatg aacccagtga taacgatttg
      121 gccgaagagg aactgttatc gcagggagcc acacagcgtg tgttggacgc gactcagctt
      181 taccttggtg agattggtta ttcaccactg ttaacggccg aagaagaagt ttattttgcg
      241 cgtcgcgcac tgcgtggaga tgtcgcctct cgccgccgga tgatcgagag taacttgcgt
      301 ctggtggtaa aaattgcccg ccgttatggc aatcgtggtc tggcgttgct ggaccttatc
      361 gaagagggca acctggggct gatccgcgcg gtagagaagt ttgacccgga acgtggtttc
      421 cgcttctcaa catacgcaac ctggtggatt cgccagacga ttgaacgggc gattatgaac
      481 caaacccgta ctattcgttt gccgattcac atcgtaaagg agctgaacgt ttacctgcga
      541 accgcacgtg agttgtccca taagctggac catgaaccaa gtgcggaaga gatcgcagag
      601 caactggata agccagttga tgacgtcagc cgtatgcttc gtcttaacga gcgcattacc
      661 tcggtagaca ccccgctggg tggtgattcc gaaaaagcgt tgctggacat cctggccgat
      721 gaaaaagaga acggtccgga agataccacg caagatgacg atatgaagca gagcatcgtc
      781 aaatggctgt tcgagctgaa cgccaaatag cgtgaagtgc tggcacgtcg attcggtttg
      841 ctggggtacg aagcggcaac actggaagat gtaggtcgtg aaattggcct cacccgtgaa
      901 cgtgttcgcc agattcaggt tgaaggcctg cgccgtttgc gcgaaatcct gcaaacgcag
      961 gggctgaata tcgaagcgct gttccgcgag taa
//
------------------------------------------------------------------------------


7.2. Part of the contents in the accession number index file 'ddbjacc1.idx' 

The following excerpt from the accession number index file illustrates the
format of the index.  
------------------------------------------------------------------------------
D00001       ECPBPA       BCT X04516
D00002       ECPYRC       BCT X04469
D00003       HUMP450M     HUM D00003
D00004       FLBFLBL40    VRL D00004
D00005       IBAMEM682    VRL D00005
D00006       BACPNS1981   BCT D00006
D00007       CHKCALGRP    VRT D00007
D00008       ECPNTAB      BCT X04195
D00009       DROPER1      INV D00009
------------------------------------------------------------------------------
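
Each line of the excerpt above holds four whitespace-separated fields.  This 
note does not name the columns, so the field names in the sketch below are 
guesses based on the excerpt (in particular, the last column is assumed to be 
the primary accession number of the entry).  

```python
from collections import namedtuple

# Field names are assumptions inferred from the excerpt, not documented names.
AccIndexRecord = namedtuple(
    "AccIndexRecord",
    "accession locus_name division primary_accession",
)


def parse_acc_index_line(line):
    """Split one accession number index line into its four fields."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, found {len(fields)}")
    return AccIndexRecord(*fields)


record = parse_acc_index_line("D00001       ECPBPA       BCT X04516")
print(record)
```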


7.3. Part of the contents in the keyword phrase index file 'ddbjkey1.idx'

Keyword phrases consist of names for gene products and other characteristics
of sequence entries.  
------------------------------------------------------------------------------
"COAT PROTEIN
             SMO511347    VRL AJ511347
'TNPA GENE
             UBA564903    BCT AJ564903
'ZINC-FINGER' MOTIF
             PRNS53       VRL X60546
(+) MATING TYPE SURFACE PROTEIN
             ABGPSSP      PLN M94861
(1,3
             TABETGLUB    PLN Z22874
(1,3)-BETA-D-GLUCAN BINDING PROTEIN
             AJ606470     INV AJ606470
(1,3)BETA-GLUCAN SYNTHASE
             NCU09275     PLN U09275
(1,4)-BETA-D-ARABINOXYLAN ARABINOFURANOHYDROLASE
             ANAXHA       PLN Z78011      ANTUAXHA     PLN Z78010
(1,6)-BETA-GLUCAN BIOSYNTHESIS
             YSAKRE1A     PLN M81588
(1-3)-BETA-GLUCANASE
             NTSP41AGN    PLN X81560      PA13BGPT     PLN X57794
(1-3,1-4)-BETA-D-GLUCANASE
             HVBDG        PLN X52572
(1-4)-BETA-MANNAN ENDOHYDROLASE
             CAR278996    PLN AJ278996    CAR293305    PLN AJ293305
(2',5'-OLIGOISOADENYLATE SYNTHETASE-DEPENDENT)
             AL138776     HUM AL138776
(2'-5') OLIGO(A) SYNTHASE E16
             SSO4G06      EST F14610
(2'-5')OLIGOADENYLATE SYNTHETASE
             HSA225089    HUM AJ225089    HUMSYN25A    HUM D00068
             SSA225090    MAM AJ225090
(6')-IB' AMINOGLYCOSIDE ACETYLTRANSFERASE
             AXY278514    BCT AJ278514    PAE291609    BCT AJ291609
(8,11)-LINOLEOYL DESATURASE
             COF245938    PLN AJ245938
------------------------------------------------------------------------------
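
In the excerpt above, each keyword phrase starts in column 1 and is followed 
by one or more indented lines of references.  As a sketch, assuming that every 
reference is a group of three whitespace-separated fields (read here as name, 
division and accession number, which this note does not define explicitly), 
the index can be loaded into a dictionary.  The function name is 
hypothetical.  

```python
def parse_phrase_index(lines):
    """Map each phrase to the list of reference triplets listed under it."""
    index = {}
    phrase = None
    for line in lines:
        if not line.strip():
            continue
        if not line[0].isspace():
            # A line starting in column 1 opens a new phrase.
            phrase = line.rstrip()
            index.setdefault(phrase, [])
        elif phrase is not None:
            fields = line.split()
            # Assumption: references come in groups of three fields.
            for i in range(0, len(fields) - 2, 3):
                index[phrase].append(tuple(fields[i:i + 3]))
    return index


sample = [
    "(1-3)-BETA-GLUCANASE",
    "             NTSP41AGN    PLN X81560      PA13BGPT     PLN X57794",
    "(2'-5')OLIGOADENYLATE SYNTHETASE",
    "             HSA225089    HUM AJ225089    HUMSYN25A    HUM D00068",
    "             SSA225090    MAM AJ225090",
]

for phrase, refs in parse_phrase_index(sample).items():
    print(phrase, refs)
```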


7.4. Part of the contents in the journal citation index file 'ddbjjou1.idx'

The journal citation index file lists all of the citations that appear in the
references.  
------------------------------------------------------------------------------
(ER) AAPS PHARMSCI. 4 (3), DOI 10.1208/PS040315 (2002)
             AY170916     ROD AY170916
(ER) AM. J. HUM. GENET. 76 (1) (2004) IN PRESS
             AY753209S1   HUM AY753209    AY753209S2   HUM AY753210
(ER) ARCH. VIROL. (2004) IN PRESS
             AF531505     VRL AF531505    AY518899     VRL AY518899
             AY518900     VRL AY518900    AY518901     VRL AY518901
             AY518902     VRL AY518902    AY518903     VRL AY518903
             AY518904     VRL AY518904    AY518905     VRL AY518905
             AY518906     VRL AY518906    AY518907     VRL AY518907
             AY518908     VRL AY518908    AY518909     VRL AY518909
             AY518910     VRL AY518910    AY518911     VRL AY518911
             AY518912     VRL AY518912    AY518913     VRL AY518913
             AY518914     VRL AY518914    AY518915     VRL AY518915
             AY518916     VRL AY518916    AY518917     VRL AY518917
             AY518918     VRL AY518918    AY518919     VRL AY518919
             AY518920     VRL AY518920    AY518921     VRL AY518921
             AY518922     VRL AY518922    AY518923     VRL AY518923
             AY518924     VRL AY518924    AY518925     VRL AY518925
             AY518926     VRL AY518926    AY518927     VRL AY518927
             AY518928     VRL AY518928    AY518929     VRL AY518929
             AY518930     VRL AY518930    AY518931     VRL AY518931
             AY518932     VRL AY518932    AY521234     VRL AY521234
             AY521235     VRL AY521235    AY521236     VRL AY521236
             AY521237     VRL AY521237    AY521238     VRL AY521238
(ER) ARTERIOSCLER. THROMB. VASC. BIOL. (2004) IN PRESS
             AY563557     HUM AY563557
(ER) BIOCHEM. BIOPHYS. RES. COMMUN. 325 (1), 203-214 (2004)
             AY563137     HUM AY563137
(ER) BIOCHEM. J./10.1042/BJ20030293
             HSA496460    HUM AJ496460
------------------------------------------------------------------------------


7.5. Part of the contents in the gene name index file 'ddbjgen.idx'

This file lists all the gene names that appear in the feature table.  
------------------------------------------------------------------------------
'ARR
             BX927156     BCT BX927156
'BGLG
             BX927156     BCT BX927156
'BGLS
             BX927148     BCT BX927148
'BGLY'
             BX927156     BCT BX927156
'BRNQ
             AF305888     BCT AF305888
'COMK
             AL591983     BCT AL591983    AL596172     BCT AL596172
'CRCB
             BX927155     BCT BX927155
'CRTI
             BX927155     BCT BX927155
'DPPE
             LDDIPEP      BCT Z34898
'FIC
             BX936398     BCT BX936398
------------------------------------------------------------------------------


8. Release history

Release  Date      Entries            Bases  Comments
 80     12/09  112,314,250  109,636,862,252  
 79     09/09  108,593,519  106,684,379,504  DBLINK line started 
                                             PROJECT line terminated
 78     06/09  105,737,359  104,597,360,291
 77     03/09  102,099,156  101,765,388,414
 76     12/08   98,220,409   98,741,908,446
 75     09/08   92,840,037   95,219,505,205  TSA division started
 74     06/08   87,903,140   91,294,770,939
 73     03/08   83,167,582   86,099,950,395  KIPO inclusion started
 72     12/07   79,004,098   82,592,245,487  Most of E-mail addresses discarded
 71     09/07   76,273,345   79,706,204,461
 70     06/07   72,801,679   76,788,510,646
 69     03/07   67,523,680   71,775,679,500  PROJECT line started
                                             Indexes for categories terminated
 68     12/06   64,267,978   68,259,314,742  1.5 GB storage started
 67     09/06   61,144,621   65,443,024,193
 66     06/06   58,176,628   62,945,843,881
 65     03/06   55,890,995   60,564,721,635  TPA subcategories started
 64     12/05   52,272,669   56,098,558,378  Some index files split
 63     09/05   47,741,593   52,246,110,341
 62     06/05   45,249,444   49,158,155,283  ENV division started
                                             Version for release note started
 61     03/05   43,118,204   47,099,081,750  Changed style of release note
 60     12/04   40,583,945   44,416,752,273  /db_xref="H-inv:**" started
 59     09/04   37,926,117   42,245,956,937
 58     06/04   34,917,581   39,812,635,108
 57     03/04   32,693,678   38,008,449,840
 56     12/03   30,405,173   36,079,046,032
 55     09/03   27,753,140   34,280,225,489
 54     06/03   25,149,821   32,162,041,177
 53     02/03   23,250,813   29,711,299,332
 52     12/02   20,354,812   26,931,456,316
 51     09/02   18,401,358   22,782,404,136  TPA started
 50     06/02   17,260,693   20,158,357,982
 49     04/02   16,503,157   18,579,627,226
 48     01/02   15,016,100   16,197,713,855
 47     10/01   13,266,610   14,145,671,645
 46     07/01   12,313,759   13,037,646,166
 45     04/01   11,434,113   12,207,092,905  HTC division started
 44     01/01   10,165,597   11,136,298,841
 43     10/00    8,666,551   10,034,532,698
 42     07/00    7,554,995    8,880,721,093
 41     04/00    5,962,608    6,409,581,885  CON division started
 40     01/00    5,388,125    4,762,696,173  RNA division terminated
 39     10/99    4,810,773    3,728,000,562  NID and PID discarded
 38     07/99    4,294,369    3,098,519,597
 37     03/99    3,311,627    2,375,261,951  VERSION, /protein_id started
 36     01/99    3,073,166    2,190,425,560
 35     10/98    2,759,261    1,957,341,169
 34     07/98    2,412,785    1,708,580,623
 33     04/98    2,174,769    1,479,303,279
 32     01/98    1,956,669    1,300,950,613
 31     10/97    1,731,532    1,139,869,464  Adoption of the unified taxonomy 
                                             database
 30     07/97    1,534,115      992,788,339  NID and PID terminated
 29     04/97    1,270,194      841,415,232
 28     01/97    1,154,120      756,785,219  HTG division started
                                             ORG division terminated
 27     10/96      936,697      608,103,057  GSS division started
 26     07/96      835,552      551,932,448
 25     04/96      744,490      499,300,364  /translation started
 24     01/96      637,508      431,771,652
 23     10/95      569,757      390,694,350
 22     07/95      437,588      322,982,425  HUM division started
 21     04/95      274,596      250,875,023
 20     01/95      239,689      231,299,557
 19     10/94      204,332      205,274,131
 18     07/94      185,230      192,473,021
 17     04/94      169,957      179,942,209
 16     01/94      154,626      165,017,628
 15     10/93      131,649      147,224,690
 14     07/93      120,350      138,686,333  JPO inclusion started
 13     04/93      112,067      129,784,445
 12     01/93       97,683      120,815,244  EST division started
 11     07/92       65,693       84,839,075
 10     01/92       59,317       77,805,556  GenBank/EMBL inclusion started
  9     07/91        1,130        2,002,124
  8     01/91          879        1,573,442
  7     07/90          681        1,154,211
  6     01/90          496          841,236
  5     07/89          395          679,378
  4     01/89          302          535,985
  3     07/88          230          345,850
  2     01/88          142          199,392
  1     07/87           66          108,970  Started with DDBJ only


------------------
Since release 79
------------------

A new line, DBLINK, has replaced the PROJECT line:  

Following the agreement at the INSD collaborative meeting in 2008, the scope 
of the project ID has been expanded to include projects that are not 
necessarily targeted at sequencing a complete genome.  In addition, there are 
other resources such as the Trace Assembly Archive at NCBI and the like.  

Therefore, we have decided to replace the PROJECT line with a new line format, 
"DBLINK".  

The replacement is illustrated below.  

From the PROJECT line format (up to release 78): 
-------------------------------------------------------------------------------
LOCUS       AP000000             4700000 bp    DNA     circular BCT 27-FEB-2009
DEFINITION  Escherichia coli DDBJ genomic DNA, complete genome.
ACCESSION   AP000000
VERSION     AP000000.1
PROJECT     GenomeProject:99999
KEYWORDS    .
-------------------------------------------------------------------------------

To the DBLINK line format (release 79 onward): 
-------------------------------------------------------------------------------
LOCUS       AP000000             4700000 bp    DNA     circular BCT 27-FEB-2009
DEFINITION  Escherichia coli DDBJ genomic DNA, complete genome.
ACCESSION   AP000000
VERSION     AP000000.1
DBLINK      Project:99999
KEYWORDS    .
-------------------------------------------------------------------------------
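
The rewrite shown in the two samples above can be sketched in a few lines of 
Python.  This is an illustration only, not the procedure used at DDBJ; the 
function name project_to_dblink is hypothetical, and the sketch assumes that 
the keyword occupies columns 1-12 and that every PROJECT value has the form 
"GenomeProject:<number>", as in the samples.  

```python
import re

# Assumption: "PROJECT" followed by five spaces, then "GenomeProject:<digits>",
# exactly as in the sample entry above.
_PROJECT_RE = re.compile(r"^PROJECT {5}GenomeProject:(\d+)\s*$")


def project_to_dblink(line):
    """Rewrite a release-78-style PROJECT line as a DBLINK line.

    Lines that do not match the assumed PROJECT layout are returned
    unchanged.  Trailing whitespace of a converted line is dropped.
    """
    match = _PROJECT_RE.match(line)
    if match is None:
        return line
    return "DBLINK      Project:" + match.group(1)


old_header = [
    "ACCESSION   AP000000",
    "VERSION     AP000000.1",
    "PROJECT     GenomeProject:99999",
    "KEYWORDS    .",
]
new_header = [project_to_dblink(line) for line in old_header]
```

Only the PROJECT line is touched; the other lines of the header pass through 
as they are.  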


------------------
Since release 75
------------------

A new division for assembled mRNA sequences, Transcriptome Shotgun Assembly 
(TSA), has been included since release 75.  

With new sequencing technologies in use, INSDC has received many requests to 
accept assembled EST sequences.  Such sequence data have become more useful 
than they used to be, although they may not be correctly assembled or may not 
exist in nature.  Therefore, INSDC decided to collect assembled EST sequences 
and to classify them into the new division 'TSA'.  

TSA sequences are shotgun assemblies of primary sequences deposited in the 
EST division of INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA).  
Two specific keywords, "TSA" and "Transcriptome Shotgun Assembly", are present 
in all TSA entries.  The new division code, "TSA", is also given in the LOCUS 
line of all TSA entries.
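
The two markers of a TSA entry described above (the division code in the 
LOCUS line and the two keywords) can be checked with a small Python sketch.  
It is an illustration under stated assumptions: the function name 
is_tsa_entry and the sample entry (accession XX000000) are hypothetical, the 
division code is taken to be the second-to-last field of the LOCUS line, and 
keywords are assumed to be separated by semicolons and closed by a period.  

```python
REQUIRED_KEYWORDS = {"TSA", "Transcriptome Shotgun Assembly"}


def is_tsa_entry(entry_text):
    """Return True if the entry carries the TSA division code and keywords."""
    division = None
    keyword_parts = []
    in_keywords = False
    for line in entry_text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            # Assumption: the division code precedes the date at the line end.
            division = fields[-2] if len(fields) >= 3 else None
            in_keywords = False
        elif line.startswith("KEYWORDS"):
            keyword_parts.append(line[12:])  # value starts at column 13
            in_keywords = True
        elif in_keywords and line.startswith(" "):
            keyword_parts.append(line.strip())  # continuation line
        else:
            in_keywords = False
    joined = " ".join(keyword_parts).strip().rstrip(".")
    keywords = {k.strip() for k in joined.split(";") if k.strip()}
    return division == "TSA" and REQUIRED_KEYWORDS <= keywords


# Hypothetical entry header, made up for this example.
tsa_header = (
    "LOCUS       XX000000                 500 bp    mRNA    linear   TSA 01-JAN-2009\n"
    "KEYWORDS    TSA; Transcriptome Shotgun Assembly.\n"
)
bct_header = (
    "LOCUS       AP000000             4700000 bp    DNA     circular BCT 27-FEB-2009\n"
    "KEYWORDS    .\n"
)
```

With these inputs, is_tsa_entry(tsa_header) is true and 
is_tsa_entry(bct_header) is false.  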

No format changes in the flat file are anticipated for the TSA division.  
Note, however, that TSA entries make use of the same PRIMARY line that is 
described for the entries in the TPA category (see also '3.2. TPA separated 
from primary dataset').  The PRIMARY block contains references to the 
underlying reads/transcripts that are assembled to construct a TSA record.  

Note that a TSA submission requires the sequence data of the primary 
transcripts to be submitted to the EST division of INSDC, TA, or SRA.  More 
information about how to submit a TSA entry is available at the following 
URL: http://www.ddbj.nig.ac.jp/sub/tsa-e.html


------------------
Since release 73
------------------

Introduction of the sequence data from the Korean Intellectual Property Office:  

The nucleotide sequence data transferred from the Korean Intellectual Property 
Office (KIPO) have been included in the DDBJ release.  See also '3.1. Division 
categories' and '3.3. Notice for patent related sequence data'.  


------------------
Since release 72
------------------

Deletion of E-mail addresses, phone and fax numbers from the DDBJ flat file:  

To comply with the Japanese law on the protection of personal information, 
DDBJ has deleted phone numbers, fax numbers and E-mail addresses from the flat 
files of the entries submitted to DDBJ.  This also helps to protect DDBJ 
releases against senders of spam mail.  
By DDBJ periodical release 72, DDBJ had retrofitted most of the entries 
submitted to DDBJ (not those submitted to GenBank or EMBL-Bank).  

Previously, the submitter information was described in the JOURNAL line of 
REFERENCE 1 as follows: 
--------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Taro Mishima, DNA Data Bank of Japan, National Institute of
            Genetics; 1111, Yata, Mishima, Shizuoka 411-8540, Japan
            (E-mail:ddbj@ddbj.nig.ac.jp, URL:http://www.ddbj.nig.ac.jp/,
            Tel:81-12-345-6789, Fax:81-12-345-9876)
--------------------------------------------------------------------------------

After the deletion of the information in question, the DDBJ flat file takes 
one of the following two forms:  

Type 1: Phone and fax numbers and E-mail address are deleted.  
--------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111, 
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
-------------------------------------------------------------------------------

Type 2: When the submitters wish to keep their contact information disclosed, 
it is described as follows: 
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111, 
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
            E-mail :ddbj@ddbj.nig.ac.jp
            Phone  :81-12-345-6789
            Fax    :81-12-345-9876
-------------------------------------------------------------------------------
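
The relation between the two types can be made concrete with a short Python 
sketch: dropping the E-mail, Phone and Fax lines from a Type 2 block yields 
the Type 1 block.  This is an illustration only and not the retrofit 
procedure used at DDBJ; the function name strip_contact_details is 
hypothetical, and the sketch assumes that the three labels always appear at 
the start of their own continuation lines, as in the sample.  

```python
import re

# Labels taken from the Type 2 sample above; treating them as the complete
# set of personal contact fields is an assumption of this sketch.
_CONTACT_LINE = re.compile(r"^\s+(E-mail|Phone|Fax)\s*:")


def strip_contact_details(reference_lines):
    """Return the lines without E-mail, Phone and Fax continuation lines."""
    return [line for line in reference_lines if not _CONTACT_LINE.match(line)]


type2 = [
    "REFERENCE   1  (bases 1 to 1200)",
    "  AUTHORS   Mishima,T.",
    "  TITLE     Direct Submission",
    "  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.",
    "            Contact:Taro Mishima",
    "            DNA Data Bank of Japan, National Institute of Genetics; 1111,",
    "            Yata, Mishima, Shizuoka 411-8540, Japan",
    "            URL    :http://www.ddbj.nig.ac.jp/",
    "            E-mail :ddbj@ddbj.nig.ac.jp",
    "            Phone  :81-12-345-6789",
    "            Fax    :81-12-345-9876",
]
type1 = strip_contact_details(type2)
```

The Contact and URL lines are kept, so type1 ends with the URL line as in the 
Type 1 sample.  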


------------------
Since release 69
------------------

Introduction of the project ID at the PROJECT line in the DDBJ flat file: 
Following the agreement at the INSD collaborative meeting in 2006, INSDC has 
started to assign a project ID to submissions from sequencing projects.  
The project ID is described as follows:  
----------------------------------------------------------------------------
  A unique identifier, assigned at the time of the submission by a sequencing 
  project that informed INSDC of the submission beforehand.  It is recommended 
  that the submitter quotes the assigned project ID in all communication with 
  INSDC databases to allow for easier and faster tracking of issues.  
  The project ID field provides an umbrella identifier that points to all 
  related sequence data for the project.  
----------------------------------------------------------------------------
The PROJECT line contains the INSDC-assigned ID for the sequencing project.  
It appears between the VERSION and KEYWORDS lines in DDBJ flat files from 
DDBJ periodical release 69 onward, as shown below.  See also '2. DDBJ flat 
file format'.  
----------------------------------------------------------------------------
ACCESSION   AB012345
VERSION     AB012345.1
PROJECT     GenomeProject:123
KEYWORDS    .
----------------------------------------------------------------------------


Termination of the index files for each category: 
We used to provide index files for each category to users logging in to one 
of our computers (supernig).  However, the computer system in our institute 
was replaced with a new one that has no service using these index files, so 
we stopped providing them.  


------------------
Since release 68
------------------

Split of files:  
We changed the maximum file size from 300 MB to 1.5 GB, because network 
capacity has increased remarkably.  Each file named ddbj***##.seq has a size 
of at most 1.5 GB.  See also the sections '6. File categories' and 
'9. File list'.  


------------------
Since release 64
------------------

Split of index files:  
In the present release, some of the index files (ddbjacc.idx, ddbjjou.idx, and 
ddbjkey.idx) exceeded 2 GB in size.  These have therefore been split into 
multiple ddbj****.idx files, each of which is at most 1.5 GB in size.  See 
also sections 6., 7.2., 7.3., 7.4. and 9.  


------------------
Since release 62
------------------

Release version number is introduced:  
DDBJ has started to include a 'version' item in its release note, which 
indicates the version of the periodical release.  It is expressed like '62.0', 
in which the digit(s) after the period form the version number.  The version 
number was added because released data are sometimes revised to make urgent 
and necessary corrections.  The number is increased by one every time a 
revised periodical release is made public before the next release.  

Introduction of ENV division:  
Recently, submissions of sequences derived from environmental samples have 
increased rapidly.  To accommodate such submissions, a new division, ENV, 
has been created (See also '3.1. Division categories').  This division contains 
the sequences obtained via direct molecular isolation such as PCR, DGGE, or any 
anonymous method.  In the past, the sequences derived from environmental 
samples belonged to taxonomic divisions, mainly BCT.  At DDBJ, the retrofit to 
transfer relevant entries from taxonomic divisions to the ENV division starts 
in the present release, and ends by the next periodical release.  Please note 
that during this transitional period, some entries to be eventually placed in 
the ENV division will be found in other divisions.  

Strand information is removed:  
The strand information in the LOCUS line of the flat file has been removed as 
shown below.  See also '2.1. LOCUS line'.  
----------------------------------------------------------------------------
Old (-rel. 61):
  44-44     space
  45-47     spaces, ss- (single-stranded), ds- (double-stranded), or 
             ms- (mixed-stranded)
New (rel. 62-):
  44-47     spaces
----------------------------------------------------------------------------
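
The column change above amounts to blanking columns 44-47 of the LOCUS line.  
The following Python sketch illustrates this; it is not the tool used at 
DDBJ.  The helper names make_locus and remove_strand are hypothetical, and 
the column layout used to build the sample line (length right-justified to 
column 40, "bp" in columns 42-43, molecule type from column 48) is an 
assumption chosen to match the LOCUS samples in this note.  

```python
def make_locus(strand, name="AB000001", length=660, mol="DNA",
               topology="linear", division="PLN", date="01-FEB-2001"):
    """Build a LOCUS line with the strand code placed in columns 45-47."""
    return (f"{'LOCUS':<12}{name:<16} {length:>11} bp "
            f"{strand:<3}{mol:<6}  {topology:<8} {division} {date}")


def remove_strand(locus_line):
    """Blank columns 44-47 (1-based) and leave all other columns untouched."""
    padded = locus_line.ljust(47)
    return padded[:43] + " " * 4 + padded[47:]


old_line = make_locus("ds-")      # hypothetical line in the old style
new_line = remove_strand(old_line)
```

Because only four fixed columns are overwritten, the positions of the 
molecule type, division and date are the same before and after the change.  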


------------------
Since release 61
------------------
The style of the release note (this file) has been changed.  

In some entries, consecutive secondary accession numbers in the ACCESSION 
line are now written as a range, in order to shorten the list of secondary 
accession numbers.  For example:
------------------------------------------------------------------------------
Before;
ACCESSION   AB000802 D85885 D85886 D85887
After;
ACCESSION   AB000802 D85885-D85887
------------------------------------------------------------------------------
See also '2.3. ACCESSION line'.  
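
A reader of the flat file may want to turn the range notation back into 
individual accession numbers.  The following Python sketch shows one way to 
do so; the function name expand_accessions is hypothetical, and the sketch 
assumes that both ends of a range share the same letter prefix and the same 
number of digits, as in the example above.  

```python
import re

_ACCESSION_RE = re.compile(r"^([A-Z]+)(\d+)$")


def expand_accessions(accession_line):
    """Expand ranges such as D85885-D85887 into individual accessions."""
    tokens = accession_line.split()
    if tokens and tokens[0] == "ACCESSION":
        tokens = tokens[1:]
    expanded = []
    for token in tokens:
        if "-" not in token:
            expanded.append(token)
            continue
        first, last = token.split("-", 1)
        a, b = _ACCESSION_RE.match(first), _ACCESSION_RE.match(last)
        if (a is None or b is None or a.group(1) != b.group(1)
                or len(a.group(2)) != len(b.group(2))):
            expanded.append(token)  # not a range we recognise: keep as is
            continue
        prefix, width = a.group(1), len(a.group(2))
        for number in range(int(a.group(2)), int(b.group(2)) + 1):
            expanded.append(f"{prefix}{number:0{width}d}")
    return expanded


accessions = expand_accessions("ACCESSION   AB000802 D85885-D85887")
```

For the example line, the result is the same list of four accession numbers 
as in the "Before" form.  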


------------------
Since release 60
------------------
A cross-reference to H-invitational, /db_xref="H-inv:**", has been included.


------------------
Since release 56
------------------
The three data banks have agreed that the maximum length limitation (350 kb)
of a submitted sequence be relaxed.

The BASE COUNT line of the DDBJ flat file format has been changed in response 
to the relaxation of the maximum sequence length per entry that had been 
applied at the DDBJ/EMBL/GenBank International Nucleotide Sequence Databases.  
Previously, 6 digits were allocated in the BASE COUNT line for each of the 
counts of a, c, g, t and other bases in the sequence.  In the new flat file 
format, 9 digits are allocated for each of the counts of a, c, g and t, and 
the count of other bases is removed.  
In accordance with the relaxation of the sequence length limitation, GenBank 
already dropped the BASE COUNT line from its flat file format as of GenBank 
Release 138 (Oct. 2003).  DDBJ has decided to keep the BASE COUNT line in its 
flat file format because GC content is still important information for 
characterizing a sequence.  The changes in the BASE COUNT line are shown below.  
----------------------------------------------------------------------------
Old (-rel. 55): 
    1    6   11   16   21   26   31   36   41   46   51   56   61   66   71
    |----|----|----|----|----|----|----|----|----|----|----|----|----|----|
    BASE COUNT   123456 a 123456 c 123456 g 123456 t 123456 others

New (rel. 56-): 
    1    6   11   16   21   26   31   36   41   46   51   56   61   66   71
    |----|----|----|----|----|----|----|----|----|----|----|----|----|----|
    BASE COUNT    123456789 a    123456789 c    123456789 g    123456789 t
----------------------------------------------------------------------------
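
The new layout can be reproduced with a short Python sketch that counts the 
four bases, writes them in 9-digit fields and derives the GC content that 
motivates keeping the line.  This is an illustration only; the function names 
base_count_line and gc_content are hypothetical, and the field spacing (four 
spaces, a 9-character right-justified count, one space, the base letter) is 
read off the "New" sample above.  

```python
from collections import Counter


def base_count_line(sequence):
    """Format a BASE COUNT line in the release-56 style (a, c, g, t only)."""
    counts = Counter(sequence.lower())
    fields = "".join(f"    {counts[base]:>9} {base}" for base in "acgt")
    return "BASE COUNT" + fields


def gc_content(sequence):
    """Fraction of g and c among the a, c, g, t bases (0.0 if there are none)."""
    counts = Counter(sequence.lower())
    total = sum(counts[base] for base in "acgt")
    return (counts["g"] + counts["c"]) / total if total else 0.0


line = base_count_line("aaccggttn")
gc = gc_content("aaccggttn")
```

Bases other than a, c, g and t (the "n" in the example) are not reported, in 
line with the removal of the "others" field.  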


------------------
Since release 54
------------------
The '/sequenced_mol' qualifier has been replaced by the '/mol_type' qualifier.  
We have accordingly completed retrofitting the pertinent entries.  
This change was based on the agreement at the INSD collaborative meeting in 
2002.


------------------
Since release 51
------------------
The TPA (Third Party Annotation) dataset has been made available.  The dataset 
is a complement to the existing DDBJ/EMBL/GenBank database of the primary 
nucleotide sequences, which were obtained from direct sequencing of cDNAs, 
ESTs, genomic DNAs etc.  

The format of the LOCUS line in the flat file has been changed as shown below 
to conform to the GenBank format.  
------------------------------------------------------------------------------
Old (-rel. 50): 
LOCUS       AB000001      660 bp    DNA             PLN       01-FEB-2001
New (rel. 51-): 
LOCUS       AB000001                 660 bp    DNA     linear   PLN 01-FEB-2001
------------------------------------------------------------------------------
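
A line in the new format can be split into its fields with a short Python 
sketch.  This is an illustration rather than a complete parser: the names 
Locus and parse_locus are hypothetical, and the sketch assumes that all of 
the fields shown in the "New" sample are present and separated by whitespace.  

```python
from typing import NamedTuple


class Locus(NamedTuple):
    name: str
    length: int
    mol_type: str
    topology: str
    division: str
    date: str


def parse_locus(line):
    """Parse a LOCUS line of the release-51 style into a Locus record."""
    fields = line.split()
    # Expected: LOCUS, name, length, "bp", molecule, topology, division, date
    if len(fields) != 8 or fields[0] != "LOCUS" or fields[3] != "bp":
        raise ValueError("unexpected LOCUS line layout: " + repr(line))
    return Locus(fields[1], int(fields[2]), fields[4],
                 fields[5], fields[6], fields[7])


record = parse_locus(
    "LOCUS       AB000001                 660 bp    DNA     linear   PLN 01-FEB-2001"
)
```

The old format has no topology field, so parse_locus raises ValueError for 
the "Old" sample instead of guessing.  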


------------------
Since release 45
------------------
The HTC (High Throughput cDNA) division has been included.  This division 
holds unfinished high throughput cDNA sequences, each of which has 5'UTR and 
3'UTR at both ends and part of a coding region.  The sequence may also include 
introns.  When the sequence is finished later, it is moved to the 
corresponding taxonomic division.  The sequence is accompanied by a keyword, 
HTC (High Throughput cDNA), which is dropped when the sequence is finished and 
moved to a taxonomic division.  


------------------
Since release 41
------------------
The CON division has been included.  This division shows the order of related 
sequences in a genome, which is expressed by join and the accession numbers 
of the sequences.  The contents of the CON division are compiled by the three 
data banks, not by the data submitter.  


------------------
Since release 40
------------------
The RNA division was terminated.  The RNA data have been redistributed 
according to the category of the organism.  Therefore, you will find a human 
RNA sequence, for example, in the HUM division.  


------------------
Since release 37
------------------
The three data banks include the item VERSION in the flat file, which 
indicates the version of a submitted nucleotide sequence.  It is expressed 
like AB123456.1, in which the digit(s) after the period form the version 
number.  VERSION was added because a released sequence is sometimes revised 
by the submitter, so that the accession number alone cannot specify the 
sequence in question, which causes trouble for users.  The number is increased 
by one every time a revised sequence is made public.  

Accordingly, the translated protein sequence is accompanied by a /protein_id, 
which is expressed like BAA12345.1, in which the digit(s) after the period 
again form a version number.  This number is increased by one when the 
corresponding nucleotide sequence is revised, the protein sequence changes as 
a result, and the revised protein sequence is made public.
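
The two version rules above can be summarised in a short Python sketch.  It 
is an illustration only; the function names split_version, bump and revise 
are hypothetical, and the identifiers are the examples used in the text.  

```python
def split_version(identifier):
    """Split 'AB123456.1' into ('AB123456', 1)."""
    accession, sep, version = identifier.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError("not of the form ACCESSION.VERSION: " + repr(identifier))
    return accession, int(version)


def bump(identifier):
    """Return the identifier with its version number increased by one."""
    accession, version = split_version(identifier)
    return f"{accession}.{version + 1}"


def revise(nucleotide_id, protein_id, protein_changed):
    """Apply the rules above when a revised nucleotide sequence is released.

    The nucleotide version always increases; the /protein_id version
    increases only if the translated protein sequence changed as a result.
    """
    new_protein_id = bump(protein_id) if protein_changed else protein_id
    return bump(nucleotide_id), new_protein_id


silent_change = revise("AB123456.1", "BAA12345.1", protein_changed=False)
coding_change = revise("AB123456.1", "BAA12345.1", protein_changed=True)
```

In the first case only the nucleotide version moves to AB123456.2; in the 
second case the /protein_id also moves to BAA12345.2.  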


------------------
Since release 31
------------------
We have started adopting the unified taxonomy database to unify the 
description of the biological source of the sequence.  The database is made up 
of scientific names, IDs of unidentified organisms, synthetic constructs etc.  


------------------
Since release 30
------------------
NID and PID were terminated.  This change was made on the agreement at the 
INSD collaborative meeting in 1999.  


------------------
Since release 28
------------------
The HTG (High Throughput Genomic sequence) division has been included.  This 
division was created to cope with genome project teams which deal with a clone 
as a sequencing unit.  

We terminated the ORG (Organelle) division.  Thus, if you are interested in 
human mitochondrial sequences, for example, you are now advised to refer to 
the HUM division.  


------------------
Since release 27
------------------
The GSS division has been included.  GSS stands for Genome Survey Sequence, 
which is similar to EST, except that GSS is genomic DNA whereas EST is cDNA.  


------------------
Since release 25
------------------
The DDBJ release contains amino acid sequences that were translated from the 
corresponding nucleotide sequences of the database.  In the translation we 
took into account that some species or organelles use a genetic code different 
from the universal one, and used the appropriate codon table.  
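
The effect of choosing the codon table can be illustrated with a small Python 
sketch.  It is not the translation program used at DDBJ; the names STANDARD, 
VERTEBRATE_MITO and translate are hypothetical.  The standard table is built 
from the usual compact 64-letter notation (base order TCAG), and the 
vertebrate mitochondrial code is used as one example of a differing code 
(AGA and AGG are stop codons, ATA codes for Met, TGA codes for Trp).  

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code for codons TTT, TTC, TTA, ... GGG.
STANDARD_AAS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
                "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")

STANDARD = {"".join(codon): aa
            for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}

# Vertebrate mitochondrial code: four codons differ from the standard code.
VERTEBRATE_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def translate(sequence, table=STANDARD):
    """Translate a nucleotide sequence codon by codon ('*' marks a stop).

    Codons containing characters outside TCAG are translated as 'X';
    an incomplete codon at the end is ignored.
    """
    seq = sequence.upper().replace("U", "T")
    return "".join(table.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


nuclear = translate("ATGTGAAGA")
mitochondrial = translate("ATGTGAAGA", VERTEBRATE_MITO)
```

The same nine bases give "M*R" with the standard table and "MW*" with the 
vertebrate mitochondrial table, which is why the table must match the source 
of the sequence.  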


------------------
Since release 22
------------------
The HUM division has been included.  Human genome projects have probably been 
the most productive and have yielded a large number of sequences.  Thus, we 
have the human (HUM) division solely for human sequences and the primate (PRI) 
division for non-human primate sequences.  


------------------
Since release 12
------------------
The EST (Expressed Sequence Tag) division has been included.  The number of 
ESTs has been increasing at an enormous rate and is expected to grow even 
more rapidly in the future.  Thus, we created a division for ESTs.  


------------------
Since release 10
------------------
The sequences submitted to GenBank or EMBL have been included in the release.  


9. File list

The files in this release are arranged in the following order with non-labeled 
format.  

-----------------------------------------------------------------------
file name                                               file size
-----------------------------------------------------------------------
ddbjrel.txt   (DDBJ release note)                           75000
ddbjacc1.idx  (Accession number index file 1)          1499999996
ddbjacc2.idx  (Accession number index file 2)          1500000031
ddbjacc3.idx  (Accession number index file 3)          1381944772
ddbjgen.idx   (Gene name index file)                    163710327
ddbjjou1.idx  (Journal citation index file 1)          1499999831
ddbjjou2.idx  (Journal citation index file 2)          1454122803
ddbjjou3.idx  (Journal citation index file 3)          1418735566
ddbjjou4.idx  (Journal citation index file 4)           258874324
ddbjkey1.idx  (Keyword phrase index file 1)            1499999963
ddbjkey2.idx  (Keyword phrase index file 2)            1499999934
ddbjkey3.idx  (Keyword phrase index file 3)            1413211823
-----------------------------------------------------------------------

file name          number of entries   number of bases  file size
-----------------------------------------------------------------------
ddbjbct1.seq               129209       613638874      1499348792
ddbjbct2.seq                92698       652132311      1509656780
ddbjbct3.seq                  420       676241706      1504463697
ddbjbct4.seq                  341       661552144      1499093696
ddbjbct5.seq                  435       663102613      1500396322
ddbjbct6.seq                  709       653709773      1506705469
ddbjbct7.seq               306911       496269834      1426879484
ddbjenv1.seq               576831       403389390      1499001179
ddbjenv2.seq               556093       417732687      1499000823
ddbjenv3.seq               565941       394191636      1499000077
ddbjenv4.seq               170094        64371100       352881974
ddbjest1.seq               461404       172637580      1499002915
ddbjest2.seq               489028       191392155      1499000466
ddbjest3.seq               496920       205467042      1499000920
ddbjest4.seq               478884       204162986      1498999949
ddbjest5.seq               546754       291205940      1499002329
ddbjest6.seq               551454       337215376      1499001096
ddbjest7.seq               538635       306812523      1499000074
ddbjest8.seq               400326       128158087      1499002885
ddbjest9.seq               489111       209783226      1499001480
ddbjest10.seq              510492       236251653      1499002355
ddbjest11.seq              468827       200490760      1499002619
ddbjest12.seq              370249       131547538      1499000062
ddbjest13.seq              274520        83762061      1499002191
ddbjest14.seq              274923       108605048      1499002676
ddbjest15.seq              382328       177870132      1499000514
ddbjest16.seq              479011       247892799      1499001775
ddbjest17.seq              462383       242653503      1499002390
ddbjest18.seq              451373       248304279      1499002875
ddbjest19.seq              462263       221825009      1499002328
ddbjest20.seq              462463       279232331      1499002415
ddbjest21.seq              468086       286559524      1499001653
ddbjest22.seq              467151       243886628      1499001853
ddbjest23.seq              446563       262673638      1499002463
ddbjest24.seq              503584       277432447      1499001917
ddbjest25.seq              547861       319817725      1499000018
ddbjest26.seq              416846       210553350      1499004614
ddbjest27.seq              432324       254397545      1499004312
ddbjest28.seq              477395       270470220      1499002909
ddbjest29.seq              515302       263384582      1499001478
ddbjest30.seq              452799       242537461      1499001631
ddbjest31.seq              453703       252183270      1499002281
ddbjest32.seq              443568       287884104      1499001866
ddbjest33.seq              409721       292959220      1499000681
ddbjest34.seq              493725       294026400      1499000537
ddbjest35.seq              642292       369730350      1499001184
ddbjest36.seq              471898       299473787      1499002834
ddbjest37.seq              419548       248027878      1499002905
ddbjest38.seq              258434        97094937      1499005072
ddbjest39.seq              259183       104673524      1499002426
ddbjest40.seq              311729       149337517      1499000073
ddbjest41.seq              481658       272673941      1499000927
ddbjest42.seq              480385       266516041      1499001053
ddbjest43.seq              447811       239670072      1499001814
ddbjest44.seq              476863       281211438      1499001852
ddbjest45.seq              512028       258106319      1499001576
ddbjest46.seq              432339       256005292      1499001145
ddbjest47.seq              554728       284750760      1499002345
ddbjest48.seq              428077       243981202      1499003519
ddbjest49.seq              400570       234629213      1499002648
ddbjest50.seq              262884       133947127      1499002674
ddbjest51.seq              267929       109298176      1499001521
ddbjest52.seq              309827       137084057      1499000950
ddbjest53.seq              415953       231398699      1499001731
ddbjest54.seq              557331       317128704      1499000710
ddbjest55.seq              425608       285769731      1499002185
ddbjest56.seq              442664       243880762      1499004219
ddbjest57.seq              474751       279955587      1499002850
ddbjest58.seq              427838       233439127      1499000973
ddbjest59.seq              481013       269844807      1499003044
ddbjest60.seq              447471       274791409      1499000006
ddbjest61.seq              421060       241211479      1499003178
ddbjest62.seq              491602       333612285      1499002109
ddbjest63.seq              447195       274073621      1499000763
ddbjest64.seq              446588       223983272      1499001024
ddbjest65.seq              433523       267891753      1499001269
ddbjest66.seq              434411       280891233      1499001886
ddbjest67.seq              394656       255699918      1499001682
ddbjest68.seq              429679       239367934      1499002079
ddbjest69.seq              425061       236637032      1499003467
ddbjest70.seq              428103       237481211      1499000758
ddbjest71.seq              427786       226516744      1499000673
ddbjest72.seq              513041       298303550      1499001390
ddbjest73.seq              536955       329744498      1499002254
ddbjest74.seq              584333       334571089      1499001759
ddbjest75.seq              448201       280398580      1499000944
ddbjest76.seq              434359       317888954      1499001102
ddbjest77.seq              446580       282431747      1499000647
ddbjest78.seq              467686       291941605      1499003346
ddbjest79.seq              385832       260983985      1499001006
ddbjest80.seq              381209       272885182      1499002440
ddbjest81.seq              403044       277363025      1499001881
ddbjest82.seq              408248       309911176      1499001691
ddbjest83.seq              481764       299845788      1499002468
ddbjest84.seq              436669       317999917      1499001930
ddbjest85.seq              523370       300536593      1499000670
ddbjest86.seq              587250       199377016      1499000340
ddbjest87.seq              500442       311561648      1499002784
ddbjest88.seq              499150       314020304      1499000594
ddbjest89.seq              510460       310276158      1499000763
ddbjest90.seq              660770       302531345      1499002476
ddbjest91.seq              567713       267364474      1499001332
ddbjest92.seq              509626       314080577      1499002310
ddbjest93.seq              504264       279014868      1499001369
ddbjest94.seq              547023       208454257      1499002396
ddbjest95.seq              512732       319174329      1499002295
ddbjest96.seq              416100       236794894      1499001819
ddbjest97.seq              605258       181015265      1499000745
ddbjest98.seq              510085       267085177      1499000617
ddbjest99.seq              565916       210110336      1499000532
ddbjest100.seq             478394       287633671      1499000303
ddbjest101.seq             494765       331331158      1499001279
ddbjest102.seq             510056       300353813      1499002087
ddbjest103.seq             585271       221618333      1499000711
ddbjest104.seq             548024       284031484      1499001664
ddbjest105.seq             423945       268892276      1499000639
ddbjest106.seq             461327       281845613      1499001076
ddbjest107.seq             448137       285259570      1499001886
ddbjest108.seq             465543       337124785      1499002095
ddbjest109.seq             445889       301519732      1499000963
ddbjest110.seq             406158       275837115      1499002273
ddbjest111.seq             450233       292798608      1499000930
ddbjest112.seq             448150       285113756      1499002798
ddbjest113.seq             430618       268035093      1499002248
ddbjest114.seq             443090       248547074      1499001744
ddbjest115.seq             377118       240357665      1499001931
ddbjest116.seq             535477       252177529      1499001099
ddbjest117.seq             444955       274454223      1499002117
ddbjest118.seq             451789       290227258      1499001922
ddbjest119.seq             488577       305122767      1499001821
ddbjest120.seq             301056       178362547      1499002729
ddbjest121.seq             418714       157979994      1499001788
ddbjest122.seq             529704       241408720      1498999966
ddbjest123.seq             603823       309253393      1499000067
ddbjest124.seq             428162       264913874      1499002199
ddbjest125.seq             557125       259846517      1499000313
ddbjest126.seq             524346       281644770      1499002428
ddbjest127.seq             501048       163282677      1499001720
ddbjest128.seq             450320        76907330      1499002599
ddbjest129.seq             469195       227374149      1499000544
ddbjest130.seq             465235       306792744      1499002700
ddbjest131.seq             433984       295648956      1499001997
ddbjest132.seq             467550       233493165      1499000536
ddbjest133.seq             461099       251746195      1499001903
ddbjest134.seq             490313       303285397      1499000406
ddbjest135.seq             409115       269880556      1499000731
ddbjest136.seq             457076       268573061      1499001470
ddbjest137.seq             409620       260018066      1499006282
ddbjest138.seq             330697       178728887      1499000501
ddbjest139.seq             344225       126146468      1088163214
ddbjgss1.seq               478720       345736293      1498999982
ddbjgss2.seq               446202       341981188      1499002065
ddbjgss3.seq               441849       334127565      1499001409
ddbjgss4.seq               563976       272512682      1499002041
ddbjgss5.seq               485446       252775708      1499001419
ddbjgss6.seq               465218       255357356      1499000135
ddbjgss7.seq               390524       193635244      1499000196
ddbjgss8.seq               413528       207709650      1499002022
ddbjgss9.seq               495820       286403440      1499001872
ddbjgss10.seq              553297       308811739      1499000520
ddbjgss11.seq              496018       296305677      1499000429
ddbjgss12.seq              538845       351887502      1499000695
ddbjgss13.seq              514077       354489573      1499001513
ddbjgss14.seq              514040       357852488      1499001317
ddbjgss15.seq              601982       341847636      1499000136
ddbjgss16.seq              608571       364013777      1499001973
ddbjgss17.seq              564906       322446219      1499000063
ddbjgss18.seq              522127       373026712      1499001479
ddbjgss19.seq              511557       339572142      1499001410
ddbjgss20.seq              572413       369213573      1499000370
ddbjgss21.seq              580207       408088681      1499001977
ddbjgss22.seq              545069       331671819      1499003476
ddbjgss23.seq              472749       279843200      1499002156
ddbjgss24.seq              513771       326557684      1499002676
ddbjgss25.seq              535297       357134427      1499000258
ddbjgss26.seq              531008       341108576      1499001342
ddbjgss27.seq              585234       299547302      1499001123
ddbjgss28.seq              575068       278559672      1499001055
ddbjgss29.seq              548317       346699256      1499002254
ddbjgss30.seq              473299       340856099      1499001686
ddbjgss31.seq              484654       378581237      1499001560
ddbjgss32.seq              552162       342544711      1499000694
ddbjgss33.seq              544962       331090945      1499000408
ddbjgss34.seq              468538       358200413      1499002489
ddbjgss35.seq              544053       318329545      1499001062
ddbjgss36.seq              516256       275561391      1499000758
ddbjgss37.seq              553203       289423824      1499002599
ddbjgss38.seq              391830       315140786      1499000527
ddbjgss39.seq              413248       339433287      1499001646
ddbjgss40.seq              426374       344074270      1499000066
ddbjgss41.seq              414645       338703029      1499002658
ddbjgss42.seq              424080       344716913      1499001049
ddbjgss43.seq              420431       340692606      1499001016
ddbjgss44.seq              429490       345699346      1499000632
ddbjgss45.seq              525238       325917340      1499001841
ddbjgss46.seq              582733       369763942      1499000911
ddbjgss47.seq              593037       405848075      1499000260
ddbjgss48.seq              579095       421251944      1499002185
ddbjgss49.seq              448441       246795133      1499000771
ddbjgss50.seq              520064       344140024      1499002312
ddbjgss51.seq              522106       391987025      1499000098
ddbjgss52.seq              518836       415834903      1499002058
ddbjgss53.seq              539144       364771048      1499002485
ddbjgss54.seq              195566       120905205       539805263
ddbjhtc1.seq               275123       360774310      1499000265
ddbjhtc2.seq               276231       277639262       987667416
ddbjhtg1.seq                11401      1118106723      1499008476
ddbjhtg2.seq                 7562      1118206224      1499000303
ddbjhtg3.seq                 5909      1130568515      1499078479
ddbjhtg4.seq                 5460      1140430800      1499337500
ddbjhtg5.seq                 5333      1144058630      1499095416
ddbjhtg6.seq                 5356      1144161975      1499207460
ddbjhtg7.seq                 6578      1132361542      1499177428
ddbjhtg8.seq                 6856      1143130880      1499127577
ddbjhtg9.seq                 6263      1139357283      1499006185
ddbjhtg10.seq                6324      1133166929      1499088723
ddbjhtg11.seq                7038      1123549489      1499014023
ddbjhtg12.seq                7001      1125082547      1499014075
ddbjhtg13.seq                6951      1141441040      1499029226
ddbjhtg14.seq                6972      1135065149      1499048608
ddbjhtg15.seq                6773      1141428421      1499206964
ddbjhtg16.seq                6350      1139131623      1499063961
ddbjhtg17.seq                6595      1139284036      1499248367
ddbjhtg18.seq                8370      1145868803      1499026200
ddbjhtg19.seq                6034      1135505551      1499182889
ddbjhtg20.seq                6650      1158954688      1499034066
ddbjhtg21.seq                6725      1158900026      1499090793
ddbjhtg22.seq                1206       182110542       236617007
ddbjhum1.seq                28858      1048594958      1499026986
ddbjhum2.seq                 8106      1069520945      1499078907
ddbjhum3.seq               146156       823329575      1499051539
ddbjhum4.seq                21614      1075074539      1499090481
ddbjhum5.seq               257552       552590554      1499002309
ddbjhum6.seq                27043        54114569       123876674
ddbjinv1.seq               236355       704880888      1500609987
ddbjinv2.seq               439035       440400289      1499000606
ddbjinv3.seq               231872       661371806      1489009647
ddbjmam.seq                217485       601162981      1242048654
ddbjpat1.seq              1034560       519799563      1499001462
ddbjpat2.seq               776082       493372155      1498999999
ddbjpat3.seq               739411       347619438      1499000375
ddbjpat4.seq               702648       595130668      1499019033
ddbjpat5.seq               733075       398324077      1499000664
ddbjpat6.seq               741114       330134244      1499002416
ddbjpat7.seq               689653       384960021      1499030184
ddbjpat8.seq               795953       529837449      1499000276
ddbjpat9.seq               901072       524578374      1499000971
ddbjpat10.seq              853831       591002271      1499001188
ddbjpat11.seq             1062924       349490134      1499024689
ddbjpat12.seq              907316       553044343      1499002777
ddbjpat13.seq             1289574       204174591      1498999966
ddbjpat14.seq              427788       348052322       983718601
ddbjphg.seq                  5197        39126337        96086137
ddbjpln1.seq               120435       917898994      1499005548
ddbjpln2.seq               242742       561459365      1499015264
ddbjpln3.seq                82583       903114238      1499001046
ddbjpln4.seq               290645       588781520      1499008603
ddbjpln5.seq               469117       429720764      1499001873
ddbjpln6.seq               397593       386597718      1288554313
ddbjpri1.seq                49336      1078473653      1499071017
ddbjpri2.seq                28263        91490437       177043642
ddbjrod1.seq                35308      1018082649      1499205003
ddbjrod2.seq                 5883      1092198781      1499173413
ddbjrod3.seq                41060      1053684471      1499152008
ddbjrod4.seq                78001       892261371      1499142198
ddbjrod5.seq               209340       161549754       564136519
ddbjsts1.seq               416891       210378788      1499003729
ddbjsts2.seq               338718       238537638      1499000438
ddbjsts3.seq               555160       180702534      1464739613
ddbjsyn.seq                 90452       136220312       484363192
ddbjtsa.seq                149952        49551403       308743455
ddbjuna.seq                   290          486282         1376896
ddbjvrl1.seq               396651       403471290      1499002176
ddbjvrl2.seq               366497       424080440      1499154411
ddbjvrl3.seq                 8179         7830515        28673227
ddbjvrt1.seq               242907       676346554      1499002257
ddbjvrt2.seq                61994      1019056563      1499001540
ddbjvrt3.seq               290725       684517330      1499000770
ddbjvrt4.seq                40735        29826512       119257689
------------------------------------------------------------------------------
Total                   112314250    109636862252    401275591695


ddbjtpa.seq                 55263        56513248       243397881
ddbjcon1.seq               257953               0      1499004559
ddbjcon2.seq               254669               0      1499000281
ddbjcon3.seq               539097               0      1499001017
ddbjcon4.seq               400768               0      1499002027
ddbjcon5.seq               341434               0      1499008632
ddbjcon6.seq               267241               0      1499009598
ddbjcon7.seq               451916               0      1499001994
ddbjcon8.seq               298033               0      1499079031
ddbjcon9.seq               238473               0      1500896037
ddbjcon10.seq              228272               0      1499002762
ddbjcon11.seq              242281               0      1499000662
ddbjcon12.seq              267602               0      1499002823
ddbjcon13.seq              289371               0      1499001028
ddbjcon14.seq              317181               0      1499005138
ddbjcon15.seq              275959               0      1499003652
ddbjcon16.seq              276545               0      1499000094
ddbjcon17.seq              285254               0      1499001420
ddbjcon18.seq              266917               0      1499004394
ddbjcon19.seq              265757               0      1499001068
ddbjcon20.seq              100619               0       256569521

The entries and bases in the CON division and the TPA dataset are not counted 
in the numbers given at the top of this release note or in the 'Total' row of 
the table above.
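
The counting rule above can be checked mechanically. The following is a 
minimal sketch, not an official DDBJ tool. It assumes that each row of the 
file list is a file name followed by three integer columns, that the second 
and third columns are entries and bases (as the 'Total' row suggests), and 
that the fourth column is the file size. The names ROW, EXCLUDED, summarize 
and release_total are hypothetical and exist only for this illustration.

```python
import re
from collections import defaultdict

# One row of the file list: 'ddbj' + division + optional number + '.seq',
# then three integer columns (assumed: entries, bases, file size).
ROW = re.compile(r"^ddbj([a-z]+)(\d*)\.seq\s+(\d+)\s+(\d+)\s+(\d+)\s*$")

# Divisions that the release note says are not counted in 'Total'.
EXCLUDED = {"con", "tpa"}


def summarize(lines):
    """Return {division: (entries, bases)} summed over all matching rows."""
    sums = defaultdict(lambda: [0, 0])
    for line in lines:
        m = ROW.match(line.strip())
        if m is None:
            continue  # skip rulers, blank lines and the 'Total' row
        division = m.group(1)
        sums[division][0] += int(m.group(3))
        sums[division][1] += int(m.group(4))
    return {d: (e, b) for d, (e, b) in sums.items()}


def release_total(per_division):
    """Sum entries and bases over every division except CON and TPA."""
    counted = [v for d, v in per_division.items() if d not in EXCLUDED]
    return sum(e for e, _ in counted), sum(b for _, b in counted)


sample = [
    "ddbjhtc1.seq               275123       360774310      1499000265",
    "ddbjhtc2.seq               276231       277639262       987667416",
    "ddbjtpa.seq                 55263        56513248       243397881",
    "ddbjcon20.seq              100619               0       256569521",
    "Total                   112314250    109636862252    401275591695",
]
per_division = summarize(sample)
print(per_division["htc"])          # (551354, 638413572)
print(release_total(per_division))  # (551354, 638413572)
```

Run over the complete file list, release_total should reproduce the entry 
and base counts in the 'Total' row if the assumptions above hold.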