DNA Data Bank of Japan

                              DNA Database

   Release 37, Mar. 1999, including 3,311,627 entries, 2,375,261,951 bases


This database may be copied and redistributed without permission on the 
condition that all the statements in this release note are reproduced in each 
copy.

The present release contains the newest data prepared by the DNA Data Bank of 
Japan (DDBJ), GenBank, and European Molecular Biology Laboratory/European 
Bioinformatics Institute (EMBL/EBI) as of Feb. 16, 1999.  This unified database 
was made possible thanks to the international collaboration among the three
data banks.  All the entries have accordingly been annotated with the feature 
keys common to them. 

All the entries designated by accession numbers with the prefixes "C", "D", 
"E", "AB", "AG", "AP", "AT" and "AU" have been collected and processed by DDBJ, 
and the rest have been prepared by GenBank and EMBL/EBI.  Since a nucleotide 
sequence is often revised by the submitter through replacements, additions and 
deletions of bases, the accession number alone sometimes cannot tell which 
sequence is really in question.  Thus, an additional identifier, called NID, 
was introduced to specify a particular sequence in a series of revised 
sequences.  For the same reason, PID was introduced for translated amino acid 
sequences.
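
The prefix rule above can be checked mechanically.  The following Python 
sketch is only an illustration and is not part of the DDBJ distribution; the 
function names are hypothetical, and it assumes that the prefix is simply the 
run of letters at the start of the accession number.

```python
import re

# Prefixes that this release note attributes to DDBJ.
DDBJ_PREFIXES = {"C", "D", "E", "AB", "AG", "AP", "AT", "AU"}


def accession_prefix(accession: str) -> str:
    """Return the leading letters of an accession number such as 'D87069'."""
    match = re.match(r"[A-Za-z]+", accession.strip())
    if match is None:
        raise ValueError(f"no letter prefix in {accession!r}")
    return match.group(0).upper()


def is_ddbj_accession(accession: str) -> bool:
    """True if the prefix is one of those listed for DDBJ above."""
    return accession_prefix(accession) in DDBJ_PREFIXES


print(is_ddbj_accession("D87069"))  # True
print(is_ddbj_accession("X04516"))  # False
```

Comparing the whole letter run, rather than only the first letter, keeps a 
two-letter prefix such as "AB" distinct from the one-letter prefix "A".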

A number of genome projects are under way worldwide.  Among them, the human 
genome projects have probably been the most productive and have yielded a large 
number of ordinary sequences and huge amounts of ESTs.  Thus, we have the human 
(HUM) division solely for human sequences and the primate (PRI) division for 
non-human primate sequences.  Note that the EST division also contains human 
sequences.

The present release does not have the ORG division.  Thus, if you are interested
in human mitochondrial sequences, for example, you are now advised to refer to 
the HUM division.

This release also includes an independent division (PAT) for patent data.  The 
patent data are those which the Japanese Patent Office (JPO), United States 
Patent and Trademark Office (USPTO), and the European Patent Office (EPO) 
collected and processed.  The accession numbers of the patent data collected 
by the Japanese Patent Office start with the prefix "E", those collected and 
supplied by USPTO and GenBank respectively start with "I", and those collected 
and supplied by EPO and EMBL/EBI respectively start with "A".  The entries with 
the prefixes "I", "A" and "E" were allocated to a file (ddbjpat.seq) in the 
DDBJ format.  Note also that unauthorized use of the patent data may cause 
legal issues for which we have no responsibility.

In this release, the SOURCE in the flat file was revisited and revised if 
necessary in accordance with the unified taxonomy database common to the three 
data banks.

The number of ESTs has been increasing at an enormous rate and is expected 
to grow even more rapidly in the future.  To cope with this situation and to 
handle the data files with the least possible time and effort, we split the 
EST data into eighteen files in the present release: ddbjest1 for entries 
whose accession numbers have the prefixes A to M, ddbjest2 for those with N to 
S, ddbjest3 for those with T to Z, and ddbjest4 to ddbjest18 for those with 
two-letter prefixes.  Files 4 to 17 contain 100,000 entries each, and file 18 
contains the rest.
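
As an illustration of the split described above, the following Python sketch 
maps an accession number to the EST file expected to hold it.  The function 
name is hypothetical.  For two-letter prefixes the note only says that the 
entries are spread over ddbjest4 to ddbjest18 in blocks of 100,000, so the 
sketch returns None for them instead of guessing a file.

```python
import re
from typing import Optional


def est_file_for(accession: str) -> Optional[str]:
    """Return the EST file name for a one-letter-prefix accession number.

    Returns None for two-letter prefixes, because the exact file among
    ddbjest4..ddbjest18 cannot be derived from the prefix alone.
    """
    match = re.match(r"[A-Za-z]+", accession.strip())
    if match is None:
        raise ValueError(f"no letter prefix in {accession!r}")
    letters = match.group(0).upper()
    if len(letters) != 1:
        return None
    if "A" <= letters <= "M":
        return "ddbjest1"
    if "N" <= letters <= "S":
        return "ddbjest2"
    return "ddbjest3"  # T to Z


print(est_file_for("D12345"))    # ddbjest1
print(est_file_for("T12345"))    # ddbjest3
print(est_file_for("AA123456"))  # None
```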

The present release includes the GSS division.  GSS stands for Genome Survey 
Sequence, which is similar to EST, except that GSS is genomic DNA whereas EST 
is cDNA.  This division is divided into six files; each of the first five 
files contains 100,000 entries and the last one contains the rest.  This 
release also includes the High Throughput Genome Sequence (HTGS), which comes 
mainly from genome project teams that deal with a clone as a sequencing unit.

The index files are not presented in this release except for ddbjacc.idx, 
ddbjgen.idx, ddbjjou.idx, and ddbjkey.idx.  Instead, we have included a program 
for generating the index files that are not presented.  For the use of the 
program, see the files seq2indexes.doc, seq2indexes.c, and seq2indexes.h in 
this release.

The present release contains amino acid sequences that were translated from 
the corresponding nucleotide sequences in our database.  In the translation 
we paid close attention to the fact that some species and organelles use a 
genetic code different from the universal one, and used the proper codon 
table.  However, if you find an incorrectly translated codon in a translated 
sequence, please let us know.
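
The translation step can be illustrated with a short Python sketch.  This is 
not the program used by DDBJ; it is a minimal example that builds only the 
standard (universal) codon table and stops at the first stop codon.  A species 
or organelle with a different genetic code would need its own table passed in 
place of STANDARD_TABLE.  The names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in the order t, c, a, g
# (third base varying fastest).  '*' marks a stop codon.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str, table=STANDARD_TABLE) -> str:
    """Translate a coding sequence up to, but not including, the first stop."""
    cds = "".join(cds.lower().split())  # drop the blanks between base blocks
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = table.get(cds[i:i + 3], "X")  # 'X' for ambiguous codons
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First sequence line of the entry D87069 shown in Table 1.
print(translate("atgagtcaga atacgctgaa agttcatgat ttaaatgaag atgcggaatt tgatgagaac"))
# MSQNTLKVHDLNEDAEFDEN
```

The output agrees with the beginning of the /translation qualifier of that 
entry.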

As announced in the previous release note, from this release the three data 
banks add a new item, VERSION, to the flat file, which indicates the version of 
a submitted nucleotide sequence (see Table 1).  It is expressed like 
AB123456.1, in which the digit(s) after the period is the version number.  The 
reason for adding VERSION is that, since a submitted sequence is sometimes 
revised by the submitter, the accession number alone cannot specify the 
sequence in question, which causes trouble for the user.  The number is 
increased by one every time a revised sequence is made public.  Accordingly, 
the translated protein sequence is accompanied by a /protein_id, expressed 
like BAA12345.1, in which the digit(s) after the period is again a version 
number.  This number is increased by one when the corresponding nucleotide 
sequence is revised, the protein sequence changes as a result, and the revised 
protein sequence is made public.  To implement the new numbers synchronously 
among the three data banks, we issued this release earlier than ordinarily 
scheduled.  The present NID and PID will be discontinued in the near future.
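
The accession.version form can be taken apart as in the following Python 
sketch.  It is an illustration only; the names VersionedId, 
parse_versioned_id and next_revision are hypothetical and are not part of any 
DDBJ tool.

```python
from typing import NamedTuple


class VersionedId(NamedTuple):
    accession: str
    version: int

    def __str__(self) -> str:
        return f"{self.accession}.{self.version}"


def parse_versioned_id(text: str) -> VersionedId:
    """Split a VERSION or /protein_id value such as 'D87069.1'."""
    accession, period, version = text.strip().rpartition(".")
    if not period or not accession or not version.isdigit():
        raise ValueError(f"not in accession.version form: {text!r}")
    return VersionedId(accession, int(version))


def next_revision(identifier: VersionedId) -> VersionedId:
    """Identifier expected after one more public revision of the sequence."""
    return VersionedId(identifier.accession, identifier.version + 1)


current = parse_versioned_id("D87069.1")
print(current.accession, current.version)  # D87069 1
print(next_revision(current))              # D87069.2
```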

This release was published by the following DDBJ staff.

General administration
    T. Gojobori, T. Imanishi, Y. Fukuma, A. Watanabe, Y. Ueda, Y. Katsube, 
    K. Okuda, J. Sugiyama, J. Bellgard, H. Tsutsui(hold), Y. Noguchi, R. Chapman 
Database construction
    Y. Tateno, M. Ota, S. Miyazaki(hold), N. Yasuda, Y. Sato, H. Tsutsui, 
    C. Hamamatsu, M. Hirashima, A. Hasegawa, A. Suzuki, Y. Yamamoto, 
    M. Ejima, M. Okaneya, N. Endo, M. Iwase, R. Suzuki, R. Uchida, 
    Y. Shidahara, M. Gojobori, K. Nomura, M. Imma, J. Muroya, N. Ohkubo, 
    A. Shimada
Database software development and management
    H. Sugawara, S. Miyazaki, T. Okayama, S. Misu, T. Mizunuma, Y. Kawanishi,
    K. Goto, Y. Fukuma(hold), M. Kikuchi(hold), T. Futatsuki, H. Hashimoto, 
    H. Harimoto, T. Horiguchi, Y. Minesaki, R. Tanabe, H. Yamamoto, K. Mamiya, 
    T. Takaki, S. Sato, H. Ichinose
System management
    K. Nishikawa, K. Ikeo, T. Ito, A. Murakami, I. Mochizuki, M. Kikuchi, 
    T. Narita, M. Nagura
Editorial and public relations
    N. Saitou, K. Fukami-Kobayashi, M. Horie, Y. Daito, Y. Hattori, T. Kawamoto, 
    S. Nagira, K. Ichikawa


DNA Data Bank of Japan
Center for Information Biology
National Institute of Genetics
Mishima 411-8540, Japan 
Phone:  +81 559 81 6853
FAX:    +81 559 81 6849
E-mail: ddbj@ddbj.nig.ac.jp  (for general inquiry)
        ddbjsub@ddbj.nig.ac.jp  (for data submission)
        ddbjupdt@ddbj.nig.ac.jp (for updates and notification of publication)
WWW:    http://www.ddbj.nig.ac.jp (for DDBJ WWW server)
        http://sakura.ddbj.nig.ac.jp (for DDBJ sequence data submission system 
                                   SAKURA)

Acknowledgement: We are grateful to NCBI and EMBL/EBI for permitting us
to include the data they have collected and processed in the present release.
We also thank the Japanese Patent Office for kindly allowing us to distribute 
the patent data they collected and processed.


DDBJ Database Release History

Release  Date     Entries     Bases          Comments
------------------------------------------------------------------------
 37     03/99   3,311,627   2,375,261,951   VERSION, /protein_id started
 36     01/99   3,073,166   2,190,425,560 
 35     10/98   2,759,261   1,957,341,169
 34     07/98   2,412,785   1,708,580,623
 33     04/98   2,174,769   1,479,303,279
 32     01/98   1,956,669   1,300,950,613
 31     10/97   1,731,532   1,139,869,464   Adoption of the unified taxonomy
                                            database
 30     07/97   1,534,115     992,788,339   NID and PID started
 29     04/97   1,270,194     841,415,232   
 28     01/97   1,154,120     756,785,219   HTG division started
                                            ORG division eliminated
 27     10/96     936,697     608,103,057   GSS division started
 26     07/96     835,552     551,932,448   
 25     04/96     744,490     499,300,364   /translation started
 24     01/96     637,508     431,771,652   
 23     10/95     569,757     390,694,350   
 22     07/95     437,588     322,982,425   HUM division started
 21     04/95     274,596     250,875,023   
 20     01/95     239,689     231,299,557   
 19     10/94     204,332     205,274,131   
 18     07/94     185,230     192,473,021   
 17     04/94     169,957     179,942,209   
 16     01/94     154,626     165,017,628   
 15     10/93     131,649     147,224,690   
 14     07/93     120,350     138,686,333   
 13     04/93     112,067     129,784,445   
 12     01/93      97,683     120,815,244   EST division started
 11     07/92      65,693      84,839,075   
 10     01/92      59,317      77,805,556   GenBank/EMBL inclusion started
  9     07/91       1,130       2,002,124   
  8     01/91         879       1,573,442   
  7     07/90         681       1,154,211   
  6     01/90         496         841,236   
  5     07/89         395         679,378   
  4     01/89         302         535,985   
  3     07/88         230         345,850   
  2     01/88         142         199,392   
  1     07/87          66         108,970   Started with DDBJ only
------------------------------------------------------------------------


This release covers 18 categories of organisms and others as follows:
------------------------------------------------------------------------------
ddbjbct.*** Category for bacteria
ddbjest.*** Category for EST (expressed sequence tag)
ddbjhtg.*** Category for HTG (high throughput genomic sequencing)
ddbjhum.*** Category for human
ddbjgss.*** Category for GSS (Genome Survey Sequence)
ddbjinv.*** Category for invertebrates
ddbjmam.*** Category for mammals other than primates and rodents
ddbjpat.*** Category for patents
ddbjphg.*** Category for phages
ddbjpln.*** Category for plants
ddbjpri.*** Category for primates other than human
ddbjrna.*** Category for RNAs
ddbjrod.*** Category for rodents
ddbjsts.*** Category for STS (sequence tagged site)
ddbjsyn.*** Category for synthetic DNAs
ddbjuna.*** Category for unannotated sequences
ddbjvrl.*** Category for viruses
ddbjvrt.*** Category for vertebrates other than mammals
------------------------------------------------------------------------------


Each category has the following nine files. Note that all the files except 
for ddbj***.seq and ddbj***.sdr may include more than 80 characters in one 
line. If this is the case, the line is folded at every 81st column in the 
file on the distribution tape, which has a fixed record size of 80 bytes.
------------------------------------------------------------------------------
ddbj***.seq  List of entries in DDBJ format, see Table 1.
ddbj***.acc  List of the accession numbers, see Table 2.
ddbj***.aut  List of the authors, see Table 3.
ddbj***.dir  List of the short directory in DDBJ style, see Table 4.
ddbj***.idx  List of indices, see Table 5.
ddbj***.jou  List of the journals, see Table 6.
ddbj***.key  List of the key words, see Table 7.
ddbj***.org  List of the species names, see Table 8.
ddbj***.sdr  List of the short directory in GenBank style, see Table 9.
------------------------------------------------------------------------------
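
The folding rule stated before the file list above (a long line is cut at 
every 81st column) can be sketched in Python as follows.  This is an 
illustration and not the program that wrote the tape; it treats one character 
as one byte and does not pad short records, both of which are assumptions.

```python
from typing import List


def fold_line(line: str, width: int = 80) -> List[str]:
    """Cut one logical line into records of at most `width` characters."""
    if not line:
        return [""]
    return [line[i:i + width] for i in range(0, len(line), width)]


records = fold_line("x" * 170)
print([len(record) for record in records])  # [80, 80, 10]
```

Because the cut is made purely by column, a word, a number or the "->" 
arrow may be divided between two records, as can be seen in several of the 
tables below.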


Table 1. Part of the contents in the file 'ddbjbct.seq'.
This shows all pieces of information on one entry in DDBJ format.
------------------------------------------------------------------------------
LOCUS       D87069        993 bp    mRNA            BCT       07-FEB-1999
DEFINITION  Escherichia coli mRNA for RNA polymerase sigma subunit, truncated
            form of sigma-38, complete cds.
ACCESSION   D87069
NID         d1070184
VERSION     D87069.1
KEYWORDS    RNA polymerase sigma subunit, truncated form of sigma-38.
SOURCE      Escherichia coli (strain:W3110) cDNA to mRNA.
  ORGANISM  Escherichia coli
            Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
            Escherichia.
REFERENCE   1  (bases 1 to 993)
  AUTHORS   Jishage,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (14-AUG-1996) to the DDBJ/EMBL/GenBank databases. Miki
            Jishage, National Institute of Genetics, Molecular Genetics; Yata
            1111, Mishima, Shizuoka 411, Japan (E-mail:mjishage@lab.nig.ac.jp,
            Tel:0559-81-6742, Fax:0559-81-6746)
  STANDARD  full staff_review
REFERENCE   2  (bases 1 to 993)
  AUTHORS   Jishage,M. and Ishihama,A.
  TITLE     Variation in RNA polymerase sigma subunit composition within
            different stocks of Escherichia coli starin W3110
  JOURNAL   Unpublished (1996)
  STANDARD  full staff_review
REFERENCE   3  (sites)
  AUTHORS   Ivanova,A., Renshaw,M., Guntaka,R. and Eisenstark,A.
  TITLE     DNA base sequence variability in katF (putative sigma factor) gene
            Escherichia coli
  JOURNAL   Nucleic Acids Res. 20, 5479-5480 (1992)
  STANDARD  full staff_review
REFERENCE   4  (sites)
  AUTHORS   Takayanagi,Y., Tanaka,K. and Takahashi,H.
  TITLE     Structure of the 5' upstream region and the regulation of the rpoS
            gene of Escherichia coli
  JOURNAL   Mol Gen Genet 243, 525-531 (1994)
  STANDARD  full staff_review
COMMENT     
FEATURES             Location/Qualifiers
     source          1..993
                     /organism="Escherichia coli"
                     /sequenced_mol="cDNA to mRNA"
                     /strain="W3110"
     CDS             1..810
                     /db_xref="PID:d1013928"
                     /note="the gene has four single base changes, resulting
                     in two amino acid substitutions and an amber mutation"
                     /product="RNA polymerase sigma subunit, truncated form of
                     sigma-38"
                     /protein_id="BAA13238.1"
                     /translation="MSQNTLKVHDLNEDAEFDENGVEVFDEKALVEYEPSDNDLAEEE
                     LLSQGATQRVLDATQLYLGEIGYSPLLTAEEEVYFARRALRGDVASRRRMIESNLRLV
                     VKIARRYGNRGLALLDLIEEGNLGLIRAVEKFDPERGFRFSTYATWWIRQTIERAIMN
                     QTRTIRLPIHIVKELNVYLRTARELSHKLDHEPSAEEIAEQLDKPVDDVSRMLRLNER
                     ITSVDTPLGGDSEKALLDILADEKENGPEDTTQDDDMKQSIVKWLFELNAK"
                     /transl_table=11
     mutation        75
                     /citation=[3]
                     /replace="t"
     mutation        97
                     /citation=[3]
                     /replace="t"
     mutation        99
                     /citation=[3]
                     /replace="t"
     mutation        808
                     /citation=[3]
                     /replace="t"
BASE COUNT      254 a    223 c    291 g    225 t      0 others
ORIGIN      
        1 atgagtcaga atacgctgaa agttcatgat ttaaatgaag atgcggaatt tgatgagaac
       61 ggagttgagg tttttgacga aaaggcctta gtagaatatg aacccagtga taacgatttg
      121 gccgaagagg aactgttatc gcagggagcc acacagcgtg tgttggacgc gactcagctt
      181 taccttggtg agattggtta ttcaccactg ttaacggccg aagaagaagt ttattttgcg
      241 cgtcgcgcac tgcgtggaga tgtcgcctct cgccgccgga tgatcgagag taacttgcgt
      301 ctggtggtaa aaattgcccg ccgttatggc aatcgtggtc tggcgttgct ggaccttatc
      361 gaagagggca acctggggct gatccgcgcg gtagagaagt ttgacccgga acgtggtttc
      421 cgcttctcaa catacgcaac ctggtggatt cgccagacga ttgaacgggc gattatgaac
      481 caaacccgta ctattcgttt gccgattcac atcgtaaagg agctgaacgt ttacctgcga
      541 accgcacgtg agttgtccca taagctggac catgaaccaa gtgcggaaga gatcgcagag
      601 caactggata agccagttga tgacgtcagc cgtatgcttc gtcttaacga gcgcattacc
      661 tcggtagaca ccccgctggg tggtgattcc gaaaaagcgt tgctggacat cctggccgat
      721 gaaaaagaga acggtccgga agataccacg caagatgacg atatgaagca gagcatcgtc
      781 aaatggctgt tcgagctgaa cgccaaatag cgtgaagtgc tggcacgtcg attcggtttg
      841 ctggggtacg aagcggcaac actggaagat gtaggtcgtg aaattggcct cacccgtgaa
      901 cgtgttcgcc agattcaggt tgaaggcctg cgccgtttgc gcgaaatcct gcaaacgcag
      961 gggctgaata tcgaagcgct gttccgcgag taa
//
------------------------------------------------------------------------------
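
An entry laid out as in Table 1 can be read with a few lines of Python.  The 
sketch below is an illustration only, not a complete DDBJ-format parser: it 
picks up the ACCESSION, NID and VERSION lines and the bases after ORIGIN, and 
ignores everything else.  The function names are hypothetical.

```python
from collections import Counter

WANTED = ("ACCESSION", "NID", "VERSION")


def parse_entry(text: str) -> dict:
    """Extract a few identifier fields and the sequence from one entry."""
    fields = {}
    blocks = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):  # end of the entry
            break
        if in_origin:
            # Sequence lines: a position number, then blocks of ten bases.
            blocks.extend(line.split()[1:])
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line and not line[0].isspace():
            parts = line.split(None, 1)
            if parts[0] in WANTED and len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
    fields["sequence"] = "".join(blocks)
    return fields


def base_counts(sequence: str) -> Counter:
    """Count each base, for comparison with the BASE COUNT line."""
    return Counter(sequence.lower())
```

For the entry in Table 1, the counts returned by base_counts could be 
compared with the BASE COUNT line, and the length of the sequence with the 
length given in the LOCUS line.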


Table 2. Part of the contents in the file 'ddbjbct.acc'.
The first column refers to the secondary accession number, second column to 
the locus name, and third to the primary accession number. The primary number 
may be the same as the secondary number. They are arranged in the ascending 
order of the secondary accession numbers.
------------------------------------------------------------------------------
D00001 -> ECOPBPAA   X04516
D00002 -> ECOPYRH    X04469
D00006 -> PNS981TET  D00006
D00020 -> COLE2LYS   D00020
D00021 -> COLE31YS   D00021
D00038 -> BRLAM330   D00038
D00066 -> BAC139AC   D00066
D00067 -> ECONANA    M20207
D00069 -> ECOUVRD2   D00069
D00087 -> BACXYNAA   D00087
------------------------------------------------------------------------------
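
A line of the form shown in Table 2 can be split into its three columns as in 
the following Python sketch.  It is an illustration only, with a hypothetical 
function name, and it assumes that the line has not been folded.

```python
from typing import Tuple


def parse_acc_line(line: str) -> Tuple[str, str, str]:
    """Return (secondary accession, locus name, primary accession)."""
    secondary, arrow, rest = line.partition("->")
    if not arrow:
        raise ValueError(f"no '->' in line: {line!r}")
    locus, primary = rest.split()
    return secondary.strip(), locus, primary


secondary, locus, primary = parse_acc_line("D00001 -> ECOPBPAA   X04516")
print(secondary, locus, primary)  # D00001 ECOPBPAA X04516
print(secondary == primary)       # False
```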


Table 3. Part of the contents in the file 'ddbjbct.aut'.
For each author name given to the left of the arrow, the corresponding locus 
name and primary accession number are listed on the right. They are arranged 
in the alphabetical order of the author names.
------------------------------------------------------------------------------
Aan,F. -> STYCRR     X05210
Aan,F. -> STYENZI    M76176
Aaronson,W. -> ECOKPSD    M64977
Aaronson,W. -> ECONEUA    J05023
Abad-Lapuebla,M.A. -> VIBTDHI    D90238
Abdel-Mawgood,A.L. -> CYAPSBHA   X16394
Abdel-Meguid,S.S. -> TRNGDRECM  J01843
Abdelal,A. -> STYCARA    M36540
Abdelal,A. -> STYCARAB   X13200
Abdelal,A.H. -> PSENOSA    M60717
------------------------------------------------------------------------------


Table 4. Part of the short directory in DDBJ style in the file 'ddbjbct.dir'.
For each locus name given in the first column, the corresponding primary 
accession number, molecular type, number of nucleotide pairs, and description 
for the locus are respectively listed. They are arranged in the alphabetical 
order of the locus names.
------------------------------------------------------------------------------
ABCAARAA   M34830 ds-DNA    1624 A.aceti acetic acid resistance protein (aarA)
gene, complete cds.
ABCADHCC   D00635 ds-DNA    4230 A. polyoxogenes alcohol dehydrogenase (EC 
1.1.99.8) and cytochrome c genes.
ABCALDH    D00521 ds-DNA    2683 A.polyoxogenes membrane-bound aldehyde 
dehydrogenase gene, complete cds and flanks.
ABCBCSAA   M37202 ds-DNA    9540 A.xylinum bcs B, bcs C and bcs D genes, 
complete cds and bcs A gene, partial cds.
ABCCELA    M76548 ds-DNA    1165 Acetobacter xylinum UDP pyrophosphorylase 
(celA) gene, complete cds.
ABCCELSYN  X54676 ds-DNA    5363 A. xylinum gene for cellulose biosynthesis
ABCIS1380  D10043 ds-DNA    1665 A.pasteurianus insertion sequence IS1380.
ACAADH1    D90004 ds-DNA    2467 Acetobacter aceti(K6033) alcohol dehydrogenase
subunit gene(adh1).
ACCAAC2    M62833 ds-DNA    1123 Acinetobacter baumannii aminoglycoside 
acetyltransferase (aac2) gene, complete cds.
ACCACEAA   M62822 ds-DNA    1874 A.baumannii chloramphenicol acetyltransferase
(cat) gene, complete cds.
------------------------------------------------------------------------------


Table 5. Part of the contents in the file 'ddbjbct.idx'.
The first column refers to the locus name, the second column to the byte 
offset at which the entry starts, and the third to the byte offset at which it 
ends. They are arranged in the alphabetical order of the locus names.
------------------------------------------------------------------------------
%*****************************
#ABCAARAA       0       3211
#ABCADHCC       3212    10608
#ABCALDH        10609   15864
#ABCBCSAA       15865   29583
#ABCCELA        29584   32289
#ABCCELSYN      32290   40960
#ABCIS1380      40961   44711
#ACAADH1        44712   49357
#ACCAAC2        49358   52395
------------------------------------------------------------------------------
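
The byte offsets in Table 5 allow one entry to be read without scanning the 
whole sequence file.  The Python sketch below is an illustration with 
hypothetical function names.  It assumes that the ending offset is inclusive, 
which is suggested by each entry in the excerpt starting one byte after the 
previous entry ends, and that the offsets refer to the sequence file opened in 
binary mode.

```python
from typing import BinaryIO, Dict, Iterable, Tuple


def load_index(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Read '#LOCUS start end' lines into a dictionary keyed by locus name."""
    index = {}
    for line in lines:
        if not line.startswith("#"):
            continue  # skip the '%***' header line and anything else
        locus, start, end = line[1:].split()
        index[locus] = (int(start), int(end))
    return index


def read_entry(seq_file: BinaryIO, index: Dict[str, Tuple[int, int]],
               locus: str) -> bytes:
    """Return the bytes of one entry, treating the end offset as inclusive."""
    start, end = index[locus]
    seq_file.seek(start)
    return seq_file.read(end - start + 1)


index = load_index([
    "%*****************************",
    "#ABCAARAA       0       3211",
    "#ABCADHCC       3212    10608",
])
print(index["ABCADHCC"])  # (3212, 10608)
```

With a real file, read_entry would be called on a file object obtained with 
open(path, "rb").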


Table 6. Part of the contents in the file 'ddbjbct.jou'.
This gives information on the journal in which sequence data were published.
------------------------------------------------------------------------------
(in) Chaloupka,J. and Krumphanzl,V. (Eds.); Extracellular Enzymes of 
Microorganisms:  129-137, Plenum Press, New York (1987) -> BACAMYABS  M57457
(in) Ganesan,A.T., Chang,S. and Hoch,J.A. (Eds.); Molecular Cloning and Gene 
Regulation in Bacilli:  3-10, Academic Press, New York (1982) -> BACRG16S   
M55011
(in) Ganesan,A.T., Chang,S. and Hoch,J.A. (Eds.); Molecular Cloning and Gene 
Regulation in Bacilli:  3-10, Academic Press, New York (1982) -> BACRG16SA  
M55006
(in) Ganesan,A.T., Chang,S. and Hoch,J.A. (Eds.); Molecular Cloning and Gene 
Regulation in Bacilli:  3-10, Academic Press, New York (1982) -> BACRG16SB  
M55008
(in) Hoch,J.A. and Setlow,P. (Eds.); Molecular Biology of Microbial 
Differentiation:  85-94, American Society for Microbiology, Washington, DC 
(1985) -> BACSPOII   M57606
(in) Holmgren,A. (Ed.); Thioredoxin and Glutaredoxin Systems: Structure and 
Function: 11-19, Unknown name, Unknown city (1986) -> ECOTRXA1   M54881
(in) Kjeldgaard,N.C. and Maaloe,O. (Eds.); Control of ribosome synthesis:  
138-143, Academic Press, New York (1976) -> ECOLAC     J01636
(in) Losick,R. and Chamberlin,M. (Eds.); RNA polymerase:  455-472, Cold 
Spring Harbor Laboratory, Cold Spring Harbor, NY (1976) -> ECOTGY1    K01197
(in) Sikes,C.S. and Wheeler,A.P. (Eds.); Surface reactive peptides and 
polymers. Discovery and commercialization.:  186-200, American Chemical 
Society, Washington, D.C. (1991) -> ECOTGP     J01714
(in) Sund,H. and Blauer,G. (Eds.); Protein-Ligand Interactions:  193-207, 
Walter de Gruyter, New York (1975) -> ECOLAC     J01636
(in) Wu,R. and Grossman,L. (Eds.); Methods in Enzymology, Recombinant DNA, 
part E:  In press, Academic Press, New York, N.Y. (1986) -> PLMCG      M11320
Acta Microbiol. Pol. 35, 175-190 (1986) -> ECOTGG1    M54893
Actinomycetologica 5, 14-17 (1991) -> STMARGG    D00799
Adv. Biophys. 21, 115-133 (1986) -> R10REP     M26840
Adv. Biophys. 21, 175-192 (1986) -> ECONUSAA   M26839
Adv. Enzyme Regul. 21, 225-237 (1983) -> ECOPURFA   M26893
Adv. Exp. Med. Biol. 195, 239-246 (1986) -> ECOAPT     M14040
Agric. Biol. Chem. 50, 2155-2158 (1986) -> ECONANA    M20207
Agric. Biol. Chem. 50, 2771-2778 (1986) -> BRLAM330   D00038
Agric. Biol. Chem. 51, 2019-2022 (1987) -> BACCGT     D00129
Agric. Biol. Chem. 51, 2641-2648 (1987) -> STRSAGP    D00219
Agric. Biol. Chem. 51, 2807-2809 (1987) -> BACPGECR   M35503
Agric. Biol. Chem. 51, 3133-3135 (1987) -> BACXYLAP   D00312
Agric. Biol. Chem. 51, 455-463 (1987) -> BACHDCRY   D00117
Agric. Biol. Chem. 51, 953-955 (1987) -> BACXYNAA   D00087
Agric. Biol. Chem. 52, 1565-1573 (1988) -> BACIP135   D00348
Agric. Biol. Chem. 52, 1785-1789 (1988) -> BACTMR     D00343
Agric. Biol. Chem. 52, 2243-2246 (1988) -> PSEGI      D00342
Agric. Biol. Chem. 52, 399-406 (1988) -> BACAMYEB   M35517
Agric. Biol. Chem. 52, 479-487 (1988) -> ECAPALI    D00217
------------------------------------------------------------------------------


Table 7. Part of the contents in the file 'ddbjbct.key'.
For the locus and accession number given to the right of the arrow, the 
corresponding key words are listed on the left. 
------------------------------------------------------------------------------
A.aceti acetic acid resistance protein (aarA) gene, complete cds.       -> 
ABCAARAA     M34830
acetic acid resistance protein.         -> ABCAARAA     M34830
Cloning of genes responsible for acetic acid resistance in acetobacter aceti   
-> ABCAARAA      M34830
A. polyoxogenes alcohol dehydrogenase (EC 1.1.99.8) and cytochrome c genes.    
-> ABCADHCC      D00635
alcohol dehydrogenase; cytochrome c.    -> ABCADHCC     D00635
Cloning and sequencing of the gene cluster encoding two subunits of membrane-
bound alcohol dehydrogenase from Acetobacter polyoxogenes  -> ABCADHCC     
D00635
These data kindly submitted in computer readable form by: Toshimi Tamaki 
Nakano Central Biochemical Institute 2-6 Nakamura-cho Handa-shi, Aichi-ken 
475 Japan Phone: 0569-21-3331 Fax: 0569-23-8486     -> ABCADHCC     D00635
A.polyoxogenes membrane-bound aldehyde dehydrogenase gene, complete cds and 
flanks.     -> ABCALDH      D00521
aldehyde dehydrogenase gene; ethanol oxidation; membrane-bound enzyme.  -> 
ABCALDH      D00521
Nucleotide sequence of the membrane-bound aldehyde dehydrogenase gene from 
Acetobacter polyoxogenes     -> ABCALDH      D00521
------------------------------------------------------------------------------


Table 8. Part of the contents in the file 'ddbjbct.org'.
For the locus and accession number given to the right of the arrow, the 
corresponding taxonomic names are listed on the left.  They are arranged in 
the alphabetical order of the species names.
------------------------------------------------------------------------------
A. nidulans 6301 DNA. Anacystis nidulans Prokaryota; Bacteria; Gracilicutes; 
Oxyphotobacteria; Cyanobacteria.   -> ANIRUBPS     X00019
A. nidulans DNA, clone pAN4. Anacystis nidulans Prokaryota; Bacteria; 
Gracilicutes; Oxyphotobacteria; Cyanobacteria.    -> ANIRGGX      X00343
A. nidulans DNA. Anacystis nidulans Prokaryota; Bacteria; Gracilicutes; 
Oxyphotobacteria; Cyanobacteria.        -> ANIRGG       X00512
A. polyoxogenes genomic DNA. Acetobacter polyoxogenes Prokaryota; Bacteria; 
Gracilicutes; Scotobacteria; Aerobic rods and cocci; Azotobacteraceae.
-> ABCADHCC     D00635
A. quadruplicatum (strain PR-6) DNA, clone pAQPR1. Agmenellum quadruplicatum 
Prokaryota; Bacteria; Gracilicutes; Oxyphotobacteria; Cyanobacteria.       -> 
AQUPCAB      K02660
A. quadruplicatum (strain PR6) DNA. Agmenellum quadruplicatum Prokaryota; 
Bacteria; Gracilicutes; Oxyphotobacteria; Cyanobacteria.      -> AQUCPCAB     
K02659
A. vinelandii DNA. Azotobacter vinelandii Prokaryota; Bacteria; Gracilicutes; 
Scotobacteria; Aerobic rods and cocci; Azotobacteraceae.  -> AVINIFUSV    
M17349
A.aceti (strain 10-8) DNA, clone pAR1611. Acetobacter aceti Prokaryota; 
Bacteria; Gracilicutes; Scotobacteria; Aerobic rods and cocci; 
Azotobacteraceae.       -> ABCAARAA      M34830
A.actinomycetemcomitans (strain JP2) DNA, clone lambda-OP8. Actinobacillus 
actinomycetemcomitans Prokaryota; Bacteria; Gracilicutes; Scotobacteria; 
Facultatively anaerobic rods; Pasteurellaceae.      -> ACNLKTXN     M27399
A.anitratum DNA, clone pLJD1. Acinetobacter anitratum Prokaryota; Bacteria; 
Gracilicutes; Scotobacteria; Neisseriaceae.         -> ACCCITSYN    M33037
------------------------------------------------------------------------------


Table 9. Part of the short directory in GenBank style in the file 
'ddbjbct.sdr'.
The short directory file contains brief descriptions of all of the sequence 
entries, in GenBank style. 
------------------------------------------------------------------------------
ABCAARAA    A.aceti acetic acid resistance protein (aarA) gene, complete 1624bp
ABCADHCC    A. polyoxogenes alcohol dehydrogenase (EC 1.1.99.8) and      4230bp
ABCALDH     A.polyoxogenes membrane-bound aldehyde dehydrogenase gene,   2683bp
ABCBCSABCD  A.xylinum bcs A, B, C and D genes, complete cds's.           9540bp
ABCCELA     Acetobacter xylinum UDP pyrophosphorylase (celA) gene,       1165bp
ABCCELSYN   A. xylinum gene for cellulose biosynthesis                   5363bp
ABCIS1380   A.pasteurianus insertion sequence IS1380.                    1665bp
ACAADH1     Acetobacter aceti(K6033) alcohol dehydrogenase subunit       2467bp
ACCAAC2     Acinetobacter baumannii aminoglycoside acetyltransferase     1123bp
ACCACEAA    A.baumannii chloramphenicol acetyltransferase (cat) gene,    1874bp
ACCAPHA6    Acinetobacter baumannii aphA-6 gene.                         1170bp
ACCBENABCA  A.calcoaceticus BenA, BenB, BenC, BenD, and BenE proteins   15922bp
ACCCAT      Acinetobacter calcoaceticus cat operon.                     15922bp
ACCCATAM    A.calcoaceticus catA and catM genes, encoding catechol 1,    5537bp
ACCCHMO     Acinetobacter sp. cyclohexanone monooxygenase gene, complete 2128bp
ACCCITSYN   A.anitratum citrate synthase gene, complete cds.             1895bp
------------------------------------------------------------------------------


In addition to the nine files per category, the following five index files are 
included in this release. These files were prepared irrespective of the 18 
categories of taxonomic divisions.

 Accession number index file
 Keyword phrase index file
 Author name index file
 Journal citation index file
 Gene name index file

A brief description is given for each file in the following.


Table 10. Part of the accession number index file 'ddbjacc.idx'.
The following excerpt from the accession number index file illustrates the 
format of the index. Note that an accession number may appear under more than 
one category, for example a taxonomic category and EST or ORG; in such a case 
the entries of both categories, e.g. PRI D12345 and EST D12345, are listed 
under the same accession number D12345.
------------------------------------------------------------------------------
M33790       SHFINVEA   BCT M33790
M33791       BACORF2    BCT M33791
M33792       FTRCPRBCLC ORG X55829 FTRCPRBCLC PLN X55829
M33793       FTRCPPRBCL ORG X55830 FTRCPPRBCL PLN X55830
M33794       ATPCPARRBC ORG X55831 ATPCPARRBC PLN X55831 ATPCPRBCLB ORG X15925
             ATPCPRBCLB PLN X15925
M33796       NRACPNTRBC ORG X55827 NRACPNTRBC PLN X55827
M33797       NRACPRBCL  ORG X55828 NRACPRBCL  PLN X55828
M33798       ACCPCACGH  BCT M33798
M33799       PSETRPEGDC BCT M33799
------------------------------------------------------------------------------
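
The layout of Table 10, including the indented continuation lines, can be 
read as in the following Python sketch.  It is an illustration with a 
hypothetical function name.  It assumes that a line beginning in the first 
column starts a new accession number, that an indented line continues the 
previous one, and that the dashed rule lines have already been removed.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (locus name, category, primary accession)


def parse_accession_index(lines: Iterable[str]) -> Dict[str, List[Triple]]:
    """Group the (locus, category, accession) triples by index key."""
    index: Dict[str, List[Triple]] = {}
    current = None
    for line in lines:
        if not line.strip():
            continue
        tokens = line.split()
        if not line[0].isspace():  # a new accession number in column one
            current = tokens.pop(0)
            index.setdefault(current, [])
        if current is None or len(tokens) % 3 != 0:
            raise ValueError(f"unexpected line: {line!r}")
        for i in range(0, len(tokens), 3):
            index[current].append((tokens[i], tokens[i + 1], tokens[i + 2]))
    return index


index = parse_accession_index([
    "M33794       ATPCPARRBC ORG X55831 ATPCPARRBC PLN X55831 ATPCPRBCLB ORG X15925",
    "             ATPCPRBCLB PLN X15925",
    "M33798       ACCPCACGH  BCT M33798",
])
print(len(index["M33794"]))  # 4
print(index["M33798"])       # [('ACCPCACGH', 'BCT', 'M33798')]
```

The other index files shown below put the key on a line of its own, and the 
key may contain blanks, so they would need a slightly different rule for 
recognizing the key.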


Table 11. Part of the keyword phrase index file in the 'ddbjkey.idx'.
Keyword phrases consist of names for gene products and other characteristics 
of sequence entries. 
------------------------------------------------------------------------------
A CHANNEL
             DROCHA     INV M17155
A COMPONENT
             SQLCVEA    VRL M38183
A LOCUS
             GORGOGOA3  PRI X54375 GORGOGOA4  PRI X54376
A LOCUS ALLELE
             GORA0101   PRI X60258 GORA0201   PRI X60259 GORA0401   PRI X60257
             GORA0501   PRI X60256
A MULTI-GENE FAMILY
             RICGLUTE   PLN D00584
A PROTEIN
             MS2AAR     PHG M25187 ST1APCS    PHG M25396
A SEQUENCE
             HS5TOA30   VRL D00148 HS5TOA31   VRL D00147
------------------------------------------------------------------------------


Table 12. Part of the author name index file in 'ddbjaut.idx'.
The author name index file lists all of the author names that appear in the 
citations. 
------------------------------------------------------------------------------
ABE,A.
             HUMMHDRBWE PRI M27509 HUMMHDRBWF PRI M27510 HUMMHDRBWG PRI M27511
             YSCGAL11A  PLN M22481
ABE,C.
             S85445     BCT S85445
ABE,E.
             M23442     UNA M23442
ABE,H.
             CHKADF     VRT M55660 CHKCOF     VRT M55659
ABE,K.
             CHPCLAC    PRI D11383 CHPIMRF    PRI D11384 CUGCUR09   PLN X64110
             CUGCUR37   PLN X64111 HPCCEXPA   VRL M55970 HPCCPEP1   VRL D10687
             HPCCPEP2   VRL D10688 HPCHABC82  VRL X51587 HPCNS2APA  VRL M55972
             HPCNS2PA   VRL M55971 HPCNS2PB   VRL M55973 HPCNS5PA   VRL M55974
             MUSKE2     ROD M65255 MUSKE2A    ROD M65256 MZECYS     PLN D10622
             RICCPI     PLN J03469 RICGLUTE   PLN D00584 RICLNOCI   PLN J05595
             RICOCS     PLN M29259 RICORYII   PLN X57658 RICOZA     PLN D90406
             RICOZB     PLN D90407 RICOZC     PLN D90408 S54524     PLN S54524
             S54526     PLN S54526 S54530     PLN S54530 S73960     ROD S73960
------------------------------------------------------------------------------
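
Because every index file refers to entries by accession number, a search in
one index can be narrowed with another.  The Python sketch below is an
illustration only.  The dictionaries hold a few entries copied from Tables 11
and 12 (not the complete lists), and the helper name accessions is an
assumption of this example.

```python
# A few entries copied from Table 11 (keyword phrases) and Table 12 (authors).
# Each value is a list of (locus, division, accession) triples.
keyword_index = {
    "A MULTI-GENE FAMILY": [("RICGLUTE", "PLN", "D00584")],
    "A PROTEIN": [("MS2AAR", "PHG", "M25187"), ("ST1APCS", "PHG", "M25396")],
}
author_index = {
    "ABE,A.": [("YSCGAL11A", "PLN", "M22481")],
    "ABE,K.": [
        ("RICCPI", "PLN", "J03469"),
        ("RICGLUTE", "PLN", "D00584"),
        ("RICLNOCI", "PLN", "J05595"),
    ],
}


def accessions(index, key):
    """Return the set of accession numbers listed under key (empty if absent)."""
    return {accession for _locus, _division, accession in index.get(key, [])}


# Entries that carry the keyword phrase and also cite the author.
common = accessions(keyword_index, "A MULTI-GENE FAMILY") & accessions(
    author_index, "ABE,K."
)
print(sorted(common))
```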


Table 13. Part of the journal citation index file in 'ddbjjou.idx'.
The journal citation index file lists all of the citations that appear in the 
references. 
------------------------------------------------------------------------------
ACTA BIOCHIM. BIOPHYS. SIN. 23, 246-253 (1992)
             HUMPLASINS PRI M98056
ACTA BIOCHIM. POL. 24, 301-318 (1977)
             LUPTRFJ    RNA K00345 LUPTRFN    RNA K00346
ACTA BIOCHIM. POL. 26, 369-381 (1979)
             BLYTRNPHE  PLN X02683
ACTA BIOCHIM. POL. 29, 143-149 (1982)
             EMEMTA     ORG M32572 EMEMTA     PLN M32572 EMEMTB     ORG M32573
             EMEMTB     PLN M32573 EMEMTC     ORG M32574 EMEMTC     PLN M32574
             EMEMTD     ORG M32575 EMEMTD     PLN M32575 EMEMTE     ORG M32576
             EMEMTE     PLN M32576
ACTA BIOCHIM. POL. 34, 21-27 (1987)
             LUPNOSP    PLN M32571
------------------------------------------------------------------------------
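
The citation keys shown in Table 13 all follow the pattern "journal volume,
first page-last page (year)".  The Python sketch below splits a key of that
shape into its parts.  It is an illustration only: the pattern is inferred
from the five keys in the table, the name split_citation is an assumption of
this example, and keys of any other shape are reported as None rather than
guessed at.

```python
import re

# Pattern inferred from the keys in Table 13; other shapes may exist.
CITATION = re.compile(
    r"^(?P<journal>.+) (?P<volume>\d+), "
    r"(?P<first_page>\d+)-(?P<last_page>\d+) \((?P<year>\d{4})\)$"
)


def split_citation(key):
    """Return the parts of a citation key as a dict, or None if it does not fit."""
    match = CITATION.match(key.strip())
    if match is None:
        return None
    parts = match.groupdict()
    for name in ("volume", "first_page", "last_page", "year"):
        parts[name] = int(parts[name])
    return parts


parsed = split_citation("ACTA BIOCHIM. POL. 24, 301-318 (1977)")
print(parsed)
```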


Table 14. Part of the gene name index file in 'ddbjgen.idx'.
This file lists all the gene names that appear in the feature table.
------------------------------------------------------------------------------
AACC8
             STMAACC8   BCT M55426
AACC9
             MPUAACC9   BCT M55427
AACT
             HUMA1ACM   PRI K01500 HUMA1ACMA  PRI X00947 HUMA1ACMB  PRI M18035
             HUMAACT1   PRI M18906 HUMAACT2   PRI M22533 HUMAACTA   PRI J05176
AAD
             INTINTORF  BCT L06418 LMOMO229D  BCT X17478
AAD A1
             ENTAAC3VI  BCT M88012
AAD9
             ENEAAD9A   BCT M69221
AADA
             LMOMO229A  BCT X17479 S52249     BCT S52249 SYNAADA    SYN M60473
             TRNTAAB    BCT M55547 TRNTN21CAS BCT M86913
------------------------------------------------------------------------------
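
The gene names in Table 14 appear in ASCII order (a blank sorts before a
digit, and a digit before a letter).  If the whole file is ordered the same
way, which is an assumption based only on this excerpt, all names sharing a
prefix can be located with a binary search.  The Python sketch below is an
illustration only, and the name names_with_prefix is an assumption of this
example.

```python
from bisect import bisect_left

# Gene names in the order in which they appear in Table 14.
GENE_NAMES = ["AACC8", "AACC9", "AACT", "AAD", "AAD A1", "AAD9", "AADA"]

# Python compares strings by code point, which agrees with ASCII order here.
IS_SORTED = GENE_NAMES == sorted(GENE_NAMES)


def names_with_prefix(sorted_names, prefix):
    """Return every name in sorted_names that starts with prefix."""
    start = bisect_left(sorted_names, prefix)
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break  # sorted input: no later name can match
        matches.append(name)
    return matches


print(names_with_prefix(GENE_NAMES, "AAD"))
```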


The files in this release are arranged in the following order, in non-labeled
format.

Release note
    ddbjrel.txt       762 records
Category for bacteria, 54199 entries, 133124032 bases
    ddbjbct.seq      5822424 records
Category for EST1 (expressed sequence tag), 280082 entries, 97699959 bases
    ddbjest1.seq     14777927 records
Category for EST2 (expressed sequence tag), 178588 entries, 65474371 bases
    ddbjest2.seq     11002619 records
Category for EST3 (expressed sequence tag), 215943 entries, 80209486 bases
    ddbjest3.seq     12614576 records
Category for EST4 (expressed sequence tag), 100000 entries, 37323377 bases
    ddbjest4.seq      6086798 records
Category for EST5 (expressed sequence tag), 100000 entries, 41141056 bases 
    ddbjest5.seq      5983967 records
Category for EST6 (expressed sequence tag), 100000 entries, 37300681 bases 
    ddbjest6.seq     6028411 records
Category for EST7 (expressed sequence tag), 100000 entries, 32462843 bases
    ddbjest7.seq     6245303 records
Category for EST8 (expressed sequence tag), 100000 entries, 38755120 bases 
    ddbjest8.seq     6016934 records
Category for EST9 (expressed sequence tag), 100000 entries, 40190676 bases 
    ddbjest9.seq      5939557 records
Category for EST10 (expressed sequence tag), 100000 entries, 38979091 bases 
    ddbjest10.seq     5938137 records
Category for EST11 (expressed sequence tag), 100000 entries, 38770434 bases 
    ddbjest11.seq     5965946 records
Category for EST12 (expressed sequence tag), 100000 entries, 39127203 bases 
    ddbjest12.seq     5946115 records
Category for EST13 (expressed sequence tag), 100000 entries, 38788225 bases 
    ddbjest13.seq     5909848 records
Category for EST14 (expressed sequence tag), 100000 entries, 42111282 bases 
    ddbjest14.seq     5939888 records
Category for EST15 (expressed sequence tag), 100000 entries, 42882792 bases 
    ddbjest15.seq     5954539 records
Category for EST16 (expressed sequence tag), 100000 entries, 40268241 bases 
    ddbjest16.seq     5569363 records
Category for EST17 (expressed sequence tag), 100000 entries, 40950366 bases 
    ddbjest17.seq     5785621 records
Category for EST18 (expressed sequence tag), 92224 entries, 39976563 bases 
    ddbjest18.seq     4976917 records
Category for GSS1 (Genome Survey Sequence), 100000 entries, 48560099 bases
    ddbjgss1.seq      4811428 records
Category for GSS2 (Genome Survey Sequence), 100000 entries, 41195177 bases
    ddbjgss2.seq      5053169 records
Category for GSS3 (Genome Survey Sequence), 100000 entries, 48308206 bases
    ddbjgss3.seq      5202085 records
Category for GSS4 (Genome Survey Sequence), 100000 entries, 51953303 bases
    ddbjgss4.seq      5568733 records
Category for GSS5 (Genome Survey Sequence), 100000 entries, 48662222 bases
    ddbjgss5.seq      5308952 records
Category for GSS6 (Genome Survey Sequence), 18245 entries, 8519628 bases
    ddbjgss6.seq      858905 records
Category for HTG (high throughput genomic sequencing), 1758 entries, 241834192 
bases
    ddbjhtg.seq      4133394 records
Category for human, 91121 entries, 358634878 bases
    ddbjhum.seq      10878989 records
Category for invertebrates, 41925 entries, 158325369 bases
    ddbjinv.seq      4992398 records
Category for mammals, 17687 entries, 16485760 bases
    ddbjmam.seq       1040844 records
Category for patents, 134612 entries, 42349047 bases
    ddbjpat.seq      3811049 records
Category for phages, 1394 entries, 3033907 bases
    ddbjphg.seq       147987 records
Category for plants, 68570 entries, 155968956 bases
    ddbjpln.seq     6135870 records
Category for primates, 4977 entries, 3800669 bases
    ddbjpri.seq       276857 records
Category for RNAs, 4883 entries, 2480449 bases
    ddbjrna.seq      204799 records
Category for rodents, 45407 entries, 64541563 bases
    ddbjrod.seq      3193355 records
Category for STS (sequence tagged site), 64115 entries, 22573045 bases
    ddbjsts.seq     3823501 records
Category for synthetic DNAs, 3238 entries, 7336707 bases
    ddbjsyn.seq       277157 records
Category for unannotated sequences, 785 entries, 652152 bases
    ddbjuna.seq      39569 records
Category for viruses, 65827 entries, 59257968 bases
    ddbjvrl.seq     4124784 records
Category for vertebrates, 26047 entries, 25252856 bases
    ddbjvrt.seq      1554471 records
Accession number index file
    ddbjacc.idx     3331500 records
Keyword phrase index file
    ddbjkey.idx      1325679 records
Journal citation index file
    ddbjjou.idx      1861823 records
Gene name index file
    ddbjgen.idx      300034 records
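
The category lines of the listing above pair an entry count and a base count
with a file name and a record count.  The Python sketch below collects these
figures so that they can be summed.  It is an illustration only: the names
ENTRY and parse_listing are assumptions of this example, and the pattern
tolerates the places where the listing wraps "bases" onto the next line.

```python
import re

# One category line plus the file line that follows it.  \s+ is used between
# fields so that wrapped lines and irregular spacing are accepted.
ENTRY = re.compile(
    r"Category for (?P<name>.+?),\s+(?P<entries>\d+)\s+entries,\s+"
    r"(?P<bases>\d+)\s+bases\s+(?P<file>\S+)\s+(?P<records>\d+)\s+records"
)


def parse_listing(text):
    """Return one dict per category with integer entries, bases and records."""
    rows = []
    for match in ENTRY.finditer(text):
        rows.append({
            "name": match.group("name"),
            "file": match.group("file"),
            "entries": int(match.group("entries")),
            "bases": int(match.group("bases")),
            "records": int(match.group("records")),
        })
    return rows


# Three categories copied from the listing, including the wrapped HTG line.
SAMPLE = """\
Category for GSS6 (Genome Survey Sequence), 18245 entries, 8519628 bases
    ddbjgss6.seq      858905 records
Category for HTG (high throughput genomic sequencing), 1758 entries, 241834192
bases
    ddbjhtg.seq      4133394 records
Category for phages, 1394 entries, 3033907 bases
    ddbjphg.seq       147987 records
"""

rows = parse_listing(SAMPLE)
total_entries = sum(row["entries"] for row in rows)
total_bases = sum(row["bases"] for row in rows)
print(total_entries, total_bases)
```

Applied to the full listing, the summed entries and bases can be compared with
the totals given in the title of this release note.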