DDBJ Amino Acid Sequence Database (DAD)
Release 76.0, Sep. 2016, including
53,463,317 entries, 16,696,790,839 residues

Last published date in the present release: August 26, 2016

-------------------------------------------------------------------------------
Table of contents
-------------------------------------------------------------------------------
 1. Introduction
    1.1. Announcement for changes in the present release
    1.2. Announcement for the forthcoming changes
 2. Format of DAD entries
 3. DAD categories
 4. Citation
 5. Contact information
 6. Disclaimer
 7. DAD file categories
 8. A sample of DAD entries
 9. Release history
10. Statistics of DAD
-------------------------------------------------------------------------------

1. Introduction

This is release 76.0 of the DDBJ Amino Acid Sequence Database (DAD). This
database has been produced by extracting all translated sequences from the
DDBJ periodical release 106.0 and the TPA dataset (August 2016).

1.1. Announcement for changes in the present release

Nothing in particular.

1.2. Announcement for the forthcoming changes

Nothing in particular.

2. Format of DAD entries

The standard format of DAD is almost the same as that of the DDBJ nucleotide
sequence database, except for the points described below.

Accession numbers of DAD entries are written in the lines labeled
"ACCESSION". A DAD accession number consists of a DDBJ accession number and
an integer starting from 1, joined by a hyphen (-). For example, two amino
acid sequences extracted from the DDBJ entry D12345 have the accession
numbers D12345-1 and D12345-2, respectively. The number is useful for
identifying a DAD entry.

An amino acid sequence begins on the line following "BEGIN". Up to sixty
amino acids are written per line. The amino acid sequence is followed by a
double slash (//), which marks the end of the entry.
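As a rough illustration of the two conventions above, the following Python
sketch splits a DAD accession number into its DDBJ accession number and serial
number, and collects the residues found between "BEGIN" and "//". The function
names split_dad_accession and read_sequence are hypothetical and not part of
any DDBJ tool. The sketch also assumes that each sequence line starts with a
position number followed by space-separated blocks of residues, as in the
sample entry in section 8.

```python
import re


def split_dad_accession(dad_accession):
    """Split a DAD accession such as 'D12345-2' into ('D12345', 2)."""
    ddbj_accession, sep, serial = dad_accession.rpartition("-")
    if not ddbj_accession or not sep or not serial.isdigit() or int(serial) < 1:
        raise ValueError(f"not a DAD accession number: {dad_accession!r}")
    return ddbj_accession, int(serial)


def read_sequence(entry_lines):
    """Concatenate the residues between the BEGIN line and the // line."""
    residues = []
    in_sequence = False
    for line in entry_lines:
        stripped = line.strip()
        if stripped == "BEGIN":
            in_sequence = True
        elif stripped == "//":
            break
        elif in_sequence:
            # ASSUMPTION: digits and whitespace on a sequence line are only
            # the position number and block separators, so both are dropped.
            residues.append(re.sub(r"[\d\s]", "", line))
    return "".join(residues)


entry = [
    "ACCESSION   D12345-2",
    "BEGIN",
    "        1 MSMGLEITGT ALAVLGWLGT",
    "//",
]
print(split_dad_accession("D12345-2"))
print(read_sequence(entry))
```

Splitting on the last hyphen keeps the sketch independent of the exact form of
the DDBJ accession number itself.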
The LOCUS line contains the locus name, length of the protein, molecular type
(this is always "PRT"), division name, and date of release of the DNA
counterpart.

The DEFINITION line contains the species name and protein name.

The other parts of a DAD entry, including FEATURES, are almost the same as
those of the corresponding DDBJ entry.

3. DAD categories

DAD entries are classified into 23 categories, adding TPA and TPACON to the
21 categories of the DDBJ periodical release. Please refer to the release note
of the DDBJ release for details (filename: ddbjrel.txt).
Also, there are two types of DAD files for each division: files with the
suffix ".DAD" in the DAD standard format, and files with the suffix
".DAD.fasta" in a FASTA-compatible format.

[DDBJ release note]
ftp://ftp.ddbj.nig.ac.jp/ddbj_database/ddbj/ddbjrel.txt

4. Citation

When you use DAD in your research, we would appreciate it if you would include
a reference to DDBJ in the publications related to that research.

When citing an entry in the DAD database, it is appropriate to give the
protein_id and its accession number. It is also recommended to cite the first
publication in REFERENCE of the entry other than the submitter information.

DDBJ suggests that authors add a reference to DDBJ itself. The following
publication, which describes the recent activities of the DDBJ center, would
be appropriate to cite:

   Mashima J, Kodama Y, Kosuge T, Fujisawa T, Katayama T, Nagasaki H,
   Okuda Y, Kaminuma E, Ogasawara O, Okubo K, Nakamura Y and Takagi T.
   DNA data bank of Japan (DDBJ) progress report.
   Nucleic Acids Res. 44 (Database issue), D51-D57 (2016)
   DOI: 10.1093/nar/gkv1105

The following sentence is an example of how to cite an entry in the DAD
database:
------------------------------------------------------------------------------
"We searched the DAD database (1) by sequence similarities and found an
amino acid sequence (2), with protein_id BAA22986.1 in DDBJ accession number
AB000714, which had significant similarity with ..."

(1) Mashima, J. et al, Nucleic Acids Res. 44(Database issue), D51-D57 (2016).
(2) Katahira, J. et al, J. Biol. Chem. 272, 26652-26658 (1997).
------------------------------------------------------------------------------

5. Contact information

   DNA Data Bank of Japan
   DDBJ Center
   National Institute of Genetics
   Research Organization of Information and Systems
   Mishima 411-8540, Japan
   Phone:  +81 55 981 6853
   FAX:    +81 55 981 6849
   E-mail: ddbj@ddbj.nig.ac.jp
   WWW:    http://www.ddbj.nig.ac.jp/

6. Disclaimer

While DDBJ endeavors to keep its data correct, DDBJ makes no representations
or warranties of any kind about the completeness, accuracy or reliability
with respect to the entries contained in the DAD periodical release. DDBJ also
assumes no legal liability or responsibility for merchantability or fitness
for a particular purpose, or that the use of the sequence data will not
infringe any patent or other rights. Any receipt, reliance or use you place on
such data is therefore strictly at your own risk.

7. DAD file categories

This release covers 23 categories (see also '3. DAD categories')
of organisms and others as follows:
------------------------------------------------------------------------------
 ddbjbct;     Category for bacteria
 ddbjcon;     Category for CON (contigs)
 ddbjenv;     Category for ENV (environmental samples)
 ddbjest;     Category for EST (expressed sequence tags)
 ddbjgss;     Category for GSS (genome survey sequences)
 ddbjhtc;     Category for HTC (high throughput cDNA sequences)
 ddbjhtg;     Category for HTG (high throughput genomic sequences)
 ddbjhum;     Category for human
 ddbjinv;     Category for invertebrates
 ddbjmam;     Category for mammals other than primates and rodents
 ddbjpat;     Category for patents
 ddbjphg;     Category for phages
 ddbjpln;     Category for plants
 ddbjpri;     Category for primates other than human
 ddbjrod;     Category for rodents
 ddbjsts;     Category for STS (sequence tagged sites)
 ddbjsyn;     Category for synthetic DNAs
 ddbjtpa;     Category for TPA (third party annotations)
 ddbjtpacon;  Category for CON (contigs) of TPA (third party annotations)
 ddbjtsa;     Category for TSA (transcriptome shotgun assemblies)
 ddbjuna;     Category for unannotated sequences
 ddbjvrl;     Category for viruses
 ddbjvrt;     Category for vertebrates other than mammals
------------------------------------------------------------------------------

All of the above are recorded in the present release in ddbj***##.DAD files,
as follows.

 file prefix   number of files
 -------------------------------
 ddbjbct            54
 ddbjcon            44
 ddbjenv             2
 ddbjest             1
 ddbjgss             1
 ddbjhtc             1
 ddbjhtg             1
 ddbjhum             2
 ddbjinv             5
 ddbjmam             1
 ddbjpat             1
 ddbjphg             1
 ddbjpln             6
 ddbjpri             1
 ddbjrod             1
 ddbjsts             1
 ddbjsyn             1
 ddbjtpa             1
 ddbjtpacon          1
 ddbjtsa             1
 ddbjuna             1
 ddbjvrl             5
 ddbjvrt             2
 -------------------------------

8. A sample of DAD entries

Below is a typical DAD entry, which may be useful for understanding the format
and contents.

----- ----- ----- -----  sample begin  ----- ----- ----- -----
LOCUS       BAA22986.1               220 aa    PRT              HUM 28-OCT-1997
DEFINITION  Homo sapiens RVP1 protein.
ACCESSION   AB000714-1
PROTEIN_ID  BAA22986.1
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryotae; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
            Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1250)
  AUTHORS   Katahira,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (26-JAN-1997) to the DDBJ/EMBL/GenBank databases.
            Contact:Jun Katahira
            Institute for Microbial Diseases, Osaka University, Department of
            Bacterial Toxinology; 3-1, Yamadaoka, Suita, Osaka 565, Japan
REFERENCE   2
  AUTHORS   Katahira,J., Sugiyama,H., Inoue,N., Horiguchi,Y., Matsuda,M. and
            Sugimoto,N.
  TITLE     Clostridium perfringens enterotoxin utilizes two structurally
            related membrane proteins as functional receptors in vivo
  JOURNAL   J. Biol. Chem. 272, 26652-26658 (1997)
COMMENT
FEATURES             Qualifiers
     source          /db_xref="H-InvDB:HIT000057926"
                     /mol_type="mRNA"
                     /organism="Homo sapiens"
                     /tissue_lib="lung"
     protein         /gene="hRVP1"
                     /transl_table=1
BEGIN
        1 MSMGLEITGT ALAVLGWLGT IVCCALPMWR VSAFIGSNII TSQNIWEGLW MNCVVQSTGQ
       61 MQCKVYDSLL ALPQDLQAAR ALIVVAILLA AFGLLVALVG AQCTNCVQDD TAKAKITIVA
      121 GVLFLLAALL TLVPVSWSAN TIIRDFYNPV VPEAQKREMG AGLYVGWAAA ALQLLGGALL
      181 CCSCPPREKK YTATKVVYSA PRSTGPGASL GTGYDRKDYV
//
----- ----- ----- -----  sample end  ----- ----- ----- -----

9. Release history

------------------
 Since release 50
------------------
The format of the SOURCE line in the DAD flat file has been changed:

As a result of this change, 1) the order of the organism name and the
organelle name is reversed, and 2) some DAD flat files include a common name,
as in GenBank flat files. The change is shown below in detail.

----------------
 Old (-rel. 49)
----------------
Format:
SOURCE      <organism name> [<organelle>]
Example:
SOURCE      Homo sapiens mitochondrion

----------------
 New (rel. 50-)
----------------
Format:
SOURCE      [<organelle>] <organism name> [(<common name>)]
Example:
SOURCE      mitochondrion Homo sapiens (human)

See also '8. A sample of DAD entries'.
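
To illustrate the layout used since release 50, the Python sketch below splits
a SOURCE value into organelle, organism name and common name. The helper name
parse_source is hypothetical and not a DDBJ tool, and the organelle word list
is an assumption made for this example only, since this release note does not
enumerate the organelle names that may appear.

```python
import re

# ASSUMPTION: illustrative organelle names; the release note does not list them.
ORGANELLES = ("mitochondrion", "chloroplast", "plastid")


def parse_source(value):
    """Parse a SOURCE value of the form: [organelle] organism [(common name)]."""
    value = value.strip()
    common_name = None
    # An optional common name is taken from trailing parentheses.
    match = re.match(r"^(.*\S)\s*\(([^()]*)\)$", value)
    if match:
        value, common_name = match.group(1), match.group(2)
    organelle = None
    # An optional organelle is taken from the first word, if it is a known one.
    first, _, rest = value.partition(" ")
    if first in ORGANELLES and rest:
        organelle, value = first, rest.strip()
    return {"organelle": organelle, "organism": value, "common_name": common_name}


print(parse_source("mitochondrion Homo sapiens (human)"))
print(parse_source("Homo sapiens (human)"))
```

A value in the old layout (organelle after the organism name) is not detected
by this sketch and would be returned with the organelle left inside the
organism field.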

------------------
 Since release 45
------------------
A new division, TSA (Transcriptome Shotgun Assembly), has been started:

A new division for assembled mRNA sequences, Transcriptome Shotgun Assembly
(TSA), is included in the present release.

With new sequencing technologies, INSDC has faced many requests to accept
assembled EST sequences. These sequence data have become more useful than
they used to be, although they may not be correctly assembled or may not
exist in nature. Therefore, INSDC decided to collect assembled EST sequences
into the new division 'TSA'.

TSA sequences are shotgun assemblies of primary sequences deposited in the
EST division of INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA).

Two specific keywords, "TSA" and "Transcriptome Shotgun Assembly", are
present in all TSA entries. The new division code, "TSA", is also given in
the LOCUS line of all TSA entries.

No format changes are anticipated for this new division. Note, however, that
TSA entries make use of the same PRIMARY line that is described for entries
in the TPA category. The PRIMARY block contains references to the underlying
reads/transcripts that were assembled to construct a TSA record.

------------------
 Since release 42
------------------
Deletion of E-mail addresses, phone numbers and fax numbers from the DAD
flat file

To comply with the Japanese law on protecting personal information, DDBJ
deletes phone numbers, fax numbers and E-mail addresses from the flat files
of entries submitted to DDBJ. This also helps to protect DAD releases
against SPAM mail senders. DDBJ retrofitted almost all entries submitted to
DDBJ (not to GenBank or EMBL) by the DDBJ periodical release 72.

Before the DAD periodical release 42, the submitter information was given in
the JOURNAL line of REFERENCE 1 as follows:
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Taro Mishima, DNA Data Bank of Japan, National Institute of
            Genetics; 1111, Yata, Mishima, Shizuoka 411-8540, Japan
            (E-mail:ddbj@ddbj.nig.ac.jp, URL:http://www.ddbj.nig.ac.jp/,
            Tel:81-12-345-6789, Fax:81-12-345-9876)
-------------------------------------------------------------------------------

After the deletion of the information in question, the DAD flat file takes
one of the following two forms:

Type 1: Phone and fax numbers and the E-mail address are deleted.
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
-------------------------------------------------------------------------------

Type 2: When the submitters wish to keep their contact information
disclosed, it is given as follows:
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
            E-mail :ddbj@ddbj.nig.ac.jp
            Phone  :81-12-345-6789
            Fax    :81-12-345-9876
-------------------------------------------------------------------------------

------------------
 Since release 40
------------------
The CON division has been included.

CON; Contig / Constructed
To join together a series of entries, such as those submitted from a genome
project, each of the three data banks constructs an entry and assigns an
accession number to a large scale sequence dataset.
Such entries are classified into the CON division.

------------------
 Since release 38
------------------
From the present release, we change the maximum file size to 1.5 GB, because
network capacity has increased remarkably. Each file named ddbj***##.DAD
holds at most 1.5 GB.
See also the section '10. Statistics of DAD'.

------------------
 Since release 32
------------------
Introduction of the ENV division:

Recently, submissions of sequences derived from environmental samples have
rapidly increased. To accommodate such submissions, a new division, ENV, has
been created. This division contains sequences obtained via direct molecular
isolation such as PCR, DGGE, or any anonymous method.

In the past, sequences derived from environmental samples belonged to
taxonomic divisions, mainly BCT. At DDBJ, the retrofit to transfer relevant
entries from taxonomic divisions to the ENV division starts in the present
release and ends by the next periodical release. Please note that during
this transitional period, some entries that will eventually be placed in the
ENV division will be found in other divisions.

------------------
 Since release 30
------------------
"H-InvDB" has been added to db_xref (cross-reference) as a qualifier key.
The following is an example.

FEATURES             Location/Qualifiers
     source          1..5589
                     /clone="hf00223s1"
                     /clone_lib="pBluescriptII SK plus"
                     /db_xref="H-InvDB:HIT000000001"

------------------
 Since release 29
------------------
The GSS division has been included since release 29. GSS stands for Genome
Survey Sequence, which is similar to EST, except that GSS is genomic DNA
whereas EST is cDNA.

------------------
 Since release 21
------------------
1) Some information on introns has been added. It is given as "intron_pos"
   in the Feature/Qualifiers.

   Examples:
   intron_pos 142:1 (2/12)
     means that the 2nd intron among 12 in total is located between the
     1st and 2nd bases of the 142nd codon (amino acid residue).
   intron_pos 228:0 (4/12)
     means that the 4th intron among 12 in total is located between the
     227th and 228th codons (between the 3rd base of the 227th codon and
     the 1st base of the 228th codon).

2) The LOCUS line has been changed. The following is an example and its
   explanation:

LOCUS       BAA21794.1               263 aa    PRT              BCT 05-FEB-1999

Positions  Contents
---------  --------
01-05      'LOCUS'
06-12      spaces
13-28      Locus name
29-29      space
30-40      Length of sequence, right-justified
41-41      space
42-43      'aa'
44-47      spaces
48-53      'PRT'
54-64      spaces
65-67      Division code
68-68      space
69-79      Date, in the form DD-MMM-YYYY (e.g., 15-MAR-1991)
---------------------

3) TPA data have been provided in a separate file (ddbjtpa.DAD).

10. Statistics of DAD

The following are the statistics of this release of DAD.

total number of entries      53,463,317
total length of sequences    16,696,790,839
average length               312
name of longest sequence     CP000108-608  PID:ABB27887.1
length of longest sequence   36,805 aa (CP000108-608)

=========================================================================
file name         no. of entries  no. of amino acids       file size
=========================================================================
ddbjbct1.DAD              326954            99020778      1468006868
ddbjbct2.DAD              498609           151455353      1468026852
ddbjbct3.DAD              578816           183019492      1468008932
ddbjbct4.DAD              596864           180625473      1468008397
ddbjbct5.DAD              456014           146075574      1468010151
ddbjbct6.DAD              439180           139618631      1468009598
ddbjbct7.DAD              423409           132035039      1468006885
ddbjbct8.DAD              457290           141594540      1468010239
ddbjbct9.DAD              399808           125852852      1468007957
ddbjbct10.DAD             340146           109468572      1468008337
ddbjbct11.DAD             387842           120624118      1468009511
ddbjbct12.DAD             375345           118857224      1468011760
ddbjbct13.DAD             382721           121560899      1468010851
ddbjbct14.DAD             423641           134210368      1468010695
ddbjbct15.DAD             438508           138364647      1468007369
ddbjbct16.DAD             433394           135850658      1468007647
ddbjbct17.DAD             449025           140222820      1468006671
ddbjbct18.DAD             552225           173973498      1468006848
ddbjbct19.DAD             479876           149070781      1468007639
ddbjbct20.DAD             466926           143044901      1468007202
ddbjbct21.DAD             428353           133073665      1468007307
ddbjbct22.DAD             369465           114701055      1468007918
ddbjbct23.DAD             276019            85501098      1468008175
ddbjbct24.DAD             266185            82410414      1468008892
ddbjbct25.DAD             304424            94622373      1468006638
ddbjbct26.DAD             422152           129072309      1468008375
ddbjbct27.DAD             464300           144963107      1468009160
ddbjbct28.DAD             425839           136380207      1468010719
ddbjbct29.DAD             438529           139591101      1468007230
ddbjbct30.DAD             454821           141545891      1468010689
ddbjbct31.DAD             444887           136442025      1468010680
ddbjbct32.DAD             406750           125511980      1468008373
ddbjbct33.DAD             422287           129762046      1468008838
ddbjbct34.DAD             405793           126698209      1468006841
ddbjbct35.DAD             429996           135157388      1468008428
ddbjbct36.DAD             423727           134014233      1468007704
ddbjbct37.DAD             398238           125397771      1468006570
ddbjbct38.DAD             409920           127818644      1468010539
ddbjbct39.DAD             418860           130151842      1468010852
ddbjbct40.DAD             392017           121660927      1468008734
ddbjbct41.DAD             420477           131568550      1468006807
ddbjbct42.DAD             425095           132672402      1468007013
ddbjbct43.DAD             335157           104712435      1468007049
ddbjbct44.DAD             363760           114396949      1468006457
ddbjbct45.DAD             343066           106774843      1468010184
ddbjbct46.DAD             361691           112280281      1468010408
ddbjbct47.DAD             350954           110282101      1468009907
ddbjbct48.DAD             357175           111516567      1468007800
ddbjbct49.DAD             341726           105998426      1468007573
ddbjbct50.DAD             432154           129592930      1468006522
ddbjbct51.DAD             620953           185855361      1468006427
ddbjbct52.DAD             694320           208223822      1468007747
ddbjbct53.DAD             759175           204435841      1468007238
ddbjbct54.DAD             509140           159135722       886180673
ddbjcon1.DAD              211300            92664201      1468009604
ddbjcon2.DAD              277475           115178740      1468007387
ddbjcon3.DAD              180514            95301742      1468014326
ddbjcon4.DAD              281465           115382457      1468007115
ddbjcon5.DAD              319345           118076009      1468010597
ddbjcon6.DAD              329709           143015943      1468006640
ddbjcon7.DAD              490767           205617334      1468008263
ddbjcon8.DAD              471668           188433693      1468008174
ddbjcon9.DAD              467540           150327327      1468006796
ddbjcon10.DAD             366514            63996127      1468006561
ddbjcon11.DAD             366473            64025521      1468008979
ddbjcon12.DAD             366540            63948128      1468008017
ddbjcon13.DAD             366515            63927881      1468009199
ddbjcon14.DAD             366554            63926875      1468006538
ddbjcon15.DAD             366449            64197997      1468008315
ddbjcon16.DAD             367205            62737750      1468008901
ddbjcon17.DAD             366036            65142899      1468008472
ddbjcon18.DAD             361510            76614357      1468007189
ddbjcon19.DAD             362516            74303127      1468010258
ddbjcon20.DAD             361688            74245584      1468007256
ddbjcon21.DAD             362462            74238552      1468007927
ddbjcon22.DAD             360945            78103170      1468006899
ddbjcon23.DAD             358183            84503113      1468007657
ddbjcon24.DAD             356788            87232043      1468006478
ddbjcon25.DAD             357542            83885756      1468007058
ddbjcon26.DAD             405418           126528609      1468006521
ddbjcon27.DAD             449350           173801044      1468007915
ddbjcon28.DAD             401356           149896896      1468009701
ddbjcon29.DAD             404027           164406401      1468008504
ddbjcon30.DAD             475353           195538999      1468009054
ddbjcon31.DAD             339581           140600170      1468009660
ddbjcon32.DAD             342222           145503018      1468009355
ddbjcon33.DAD             367390           153967536      1468009534
ddbjcon34.DAD             448805           195391995      1468007519
ddbjcon35.DAD             457863           190176511      1468007850
ddbjcon36.DAD             398302           186327866      1468007667
ddbjcon37.DAD             482529           194383386      1468006872
ddbjcon38.DAD             467316           195349230      1468008338
ddbjcon39.DAD             385364           150636592      1468010602
ddbjcon40.DAD             369226           128792084      1468007284
ddbjcon41.DAD             441406           210233935      1468007735
ddbjcon42.DAD             435323           180039775      1468008106
ddbjcon43.DAD             379146           159038491      1468008809
ddbjcon44.DAD             255811            99211138       782271620
ddbjenv1.DAD              670496           137773661      1468007554
ddbjenv2.DAD              128400            24984589       251084442
ddbjest1.DAD                1163              153762         2567374
ddbjgss1.DAD                3137              962078         8039402
ddbjhtc1.DAD              118172            36081236       431921198
ddbjhtg1.DAD               64443            17653573       263545438
ddbjhum1.DAD              619417           180847875      1468007285
ddbjhum2.DAD              103922            25807060       215600465
ddbjinv1.DAD              584158           177236335      1468006753
ddbjinv2.DAD              690474           179382776      1468008679
ddbjinv3.DAD              698270           150834030      1468007049
ddbjinv4.DAD              658158           131690975      1468007626
ddbjinv5.DAD              348187            99102750       828629153
ddbjmam1.DAD              285686            72493139       605854256
ddbjpat1.DAD              391458           163940387       581436664
ddbjphg1.DAD              386012            80857227       830587441
ddbjpln1.DAD              456321           161029245      1468008239
ddbjpln2.DAD              439590           169975739      1468010650
ddbjpln3.DAD              442091           217501279      1468009908
ddbjpln4.DAD              668287           202849811      1468006839
ddbjpln5.DAD              759292           164925076      1468006617
ddbjpln6.DAD              429962           122902411       886228118
ddbjpri1.DAD               83619            19694171       182445171
ddbjrod1.DAD              221375            68098646       553333799
ddbjsts1.DAD                   9                 812           22053
ddbjsyn1.DAD              162465            58118337       411827504
ddbjtpa1.DAD               63460            25415633       195608198
ddbjtpacon1.DAD            71628            31568870       308879820
ddbjtsa1.DAD              121318            49616887       326360341
ddbjuna1.DAD                 227               39165          388553
ddbjvrl1.DAD              659864           209221238      1468006738
ddbjvrl2.DAD              692397           209523945      1468009853
ddbjvrl3.DAD              631193           201518133      1468006562
ddbjvrl4.DAD              600810           225984440      1468007211
ddbjvrl5.DAD              246429            93486013       575246868
ddbjvrt1.DAD              693201           170689045      1468008396
ddbjvrt2.DAD              494737           109511755       993779938
=========================================================================
Total                   53463317         16696790839    174538784283
=========================================================================
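
The summary figures above can be cross-checked against the per-file table.
The following Python sketch assumes that each table row holds a file name
followed by three integers (number of entries, number of amino acids, file
size). It sums the two ENV rows and recomputes the average length from the
published totals. The names parse_row and average_length are hypothetical,
and treating the reported average as truncated to an integer is an
assumption.

```python
def parse_row(row):
    """Split a table row such as 'ddbjenv1.DAD 670496 137773661 1468007554'."""
    name, entries, residues, size = row.split()
    return name, int(entries), int(residues), int(size)


def average_length(total_residues, total_entries):
    # ASSUMPTION: the reported average is truncated to an integer.
    return total_residues // total_entries


rows = [
    "ddbjenv1.DAD 670496 137773661 1468007554",
    "ddbjenv2.DAD 128400 24984589 251084442",
]
parsed = [parse_row(row) for row in rows]

# Per-category subtotals for the ENV files.
env_entries = sum(fields[1] for fields in parsed)
env_residues = sum(fields[2] for fields in parsed)
print(env_entries, env_residues)

# Average length recomputed from the totals given in this section.
print(average_length(16696790839, 53463317))
```

Because str.split() ignores runs of whitespace, the same parse_row works on
both aligned and single-space-separated rows.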