DDBJ Amino Acid Sequence Database (DAD)

Release 75.0, June 2016, including
50,567,937 entries, 15,768,083,034 residues

Last published date in the present release: May 27, 2016

-------------------------------------------------------------------------------
Table of contents
-------------------------------------------------------------------------------
 1. Introduction
    1.1. Announcement for changes in the present release
    1.2. Announcement for the forthcoming changes
 2. Format of DAD entries
 3. DAD categories
 4. Citation
 5. Contact information
 6. Disclaimer
 7. DAD file categories
 8. A sample of DAD entries
 9. Release history
10. Statistics of DAD
-------------------------------------------------------------------------------

1. Introduction

This is release 75.0 of the DDBJ Amino Acid Sequence Database (DAD). The
database was produced by extracting all translated sequences from DDBJ
periodical release 105.0 and the TPA dataset (May 2016).

1.1. Announcement for changes in the present release

Nothing in particular.

1.2. Announcement for the forthcoming changes

Nothing in particular.

2. Format of DAD entries

The standard format of DAD is almost the same as that of the DDBJ nucleotide
sequence database, except for the points described below.

The accession number of a DAD entry is written on the line labeled
"ACCESSION". A DAD accession number consists of a DDBJ accession number and
an integer starting from 1, joined by a hyphen (-). For example, two amino
acid sequences extracted from the DDBJ entry D12345 have the accession
numbers D12345-1 and D12345-2, respectively. The integer distinguishes the
DAD entries derived from the same DDBJ entry.

The amino acid sequence begins on the line following "BEGIN". Up to sixty
amino acids are written per line. The sequence is followed by a double slash
(//), which marks the end of the entry.
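
The accession and sequence conventions above can be illustrated with a short
Python sketch. It is not part of the DAD distribution: the function names
split_dad_accession and iter_sequences are hypothetical, and the code assumes
that sequence lines carry a leading position number and space-separated
blocks of residues, as in the sample entry in section 8.

```python
import re
from typing import Iterable, Iterator, Tuple


def split_dad_accession(accession: str) -> Tuple[str, int]:
    """Split a DAD accession such as 'D12345-2' into ('D12345', 2)."""
    base, _, index = accession.rpartition("-")
    if not base or not index.isdigit() or int(index) < 1:
        raise ValueError(f"not a DAD accession: {accession!r}")
    return base, int(index)


def iter_sequences(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (accession, sequence) for each entry of a DAD flat file."""
    accession, residues, in_sequence = None, [], False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            # A double slash closes the current entry.
            if accession is not None:
                yield accession, "".join(residues)
            accession, residues, in_sequence = None, [], False
        elif in_sequence:
            # Assumption: drop position numbers and spacing, keep letters.
            residues.append(re.sub(r"[^A-Za-z*]", "", line))
        elif line.startswith("ACCESSION"):
            fields = line.split()
            accession = fields[1] if len(fields) > 1 else None
        elif line.startswith("BEGIN"):
            # The sequence starts on the line after BEGIN.
            in_sequence = True
```

For example, split_dad_accession("D12345-2") returns ("D12345", 2), and
iter_sequences(open(path)) walks a ".DAD" file entry by entry, where path is
whatever local file name the reader has chosen.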

The LOCUS line contains the locus name, the length of the protein, the
molecule type (always "PRT"), the division name, and the release date of the
corresponding DNA entry. The DEFINITION line contains the species name and
the protein name. The other parts of a DAD entry, including FEATURES, are
almost the same as those of the corresponding DDBJ entry.

3. DAD categories

DAD entries are classified into 23 categories: the 21 categories of the DDBJ
periodical release plus TPA and TPACON. Please refer to the release note of
the DDBJ release for details (filename: ddbjrel.txt).

There are two types of DAD files for each division: files with the suffix
".DAD" in the DAD standard format, and files with the suffix ".DAD.fasta" in
a FASTA-compatible format.

  [DDBJ release note]
  ftp://ftp.ddbj.nig.ac.jp/ddbj_database/ddbj/ddbjrel.txt

4. Citation

When you use DAD in your research, we would appreciate it if you would
include a reference to DDBJ in the related publications.

When citing an entry in the DAD database, it is appropriate to give the
protein_id and its accession number. It is also recommended to cite the first
publication in the REFERENCE lines of the entry other than the submitter
information.

DDBJ suggests that authors add a reference to DDBJ itself. The following
publication, which describes the recent activities of the DDBJ Center, is
appropriate to cite:

  Mashima J, Kodama Y, Kosuge T, Fujisawa T, Katayama T, Nagasaki H,
  Okuda Y, Kaminuma E, Ogasawara O, Okubo K, Nakamura Y and Takagi T.
  DNA data bank of Japan (DDBJ) progress report.
  Nucleic Acids Res. 44 (Database issue), D51-D57 (2016)
  DOI: 10.1093/nar/gkv1105

The following sentence is an example of how to cite an entry in the DAD
database:

-----------------------------------------------------------------------------
"We searched the DAD database (1) by sequence similarities and found an
amino acid sequence (2), with protein_id BAA22986.1 in DDBJ accession number
AB000714, which had significant similarity with ..."

(1) Mashima, J. et al., Nucleic Acids Res. 44 (Database issue), D51-D57
    (2016).
(2) Katahira, J. et al., J. Biol. Chem. 272, 26652-26658 (1997).
-----------------------------------------------------------------------------

5. Contact information

  DNA Data Bank of Japan
  DDBJ Center
  National Institute of Genetics
  Research Organization of Information and Systems
  Mishima 411-8540, Japan

  Phone:  +81 55 981 6853
  FAX:    +81 55 981 6849
  E-mail: ddbj@ddbj.nig.ac.jp
  WWW:    http://www.ddbj.nig.ac.jp/

6. Disclaimer

While DDBJ endeavors to keep its data correct, DDBJ makes no representations
or warranties of any kind about the completeness, accuracy or reliability of
the entries contained in the DAD periodical release. DDBJ also assumes no
legal liability or responsibility for merchantability, for fitness for a
particular purpose, or for whether the use of the sequence data infringes any
patent or other rights. Any receipt of, reliance on, or use of such data is
therefore strictly at your own risk.

7. DAD file categories

This release covers 23 categories (see also '3. DAD categories')
of organisms and others, as follows:

------------------------------------------------------------------------------
ddbjbct;     Category for bacteria
ddbjcon;     Category for CON (contigs)
ddbjenv;     Category for ENV (environmental samples)
ddbjest;     Category for EST (expressed sequence tags)
ddbjgss;     Category for GSS (genome survey sequences)
ddbjhtc;     Category for HTC (high throughput cDNA sequences)
ddbjhtg;     Category for HTG (high throughput genomic sequences)
ddbjhum;     Category for human
ddbjinv;     Category for invertebrates
ddbjmam;     Category for mammals other than primates and rodents
ddbjpat;     Category for patents
ddbjphg;     Category for phages
ddbjpln;     Category for plants
ddbjpri;     Category for primates other than human
ddbjrod;     Category for rodents
ddbjsts;     Category for STS (sequence tagged sites)
ddbjsyn;     Category for synthetic DNAs
ddbjtpa;     Category for TPA (third party annotations)
ddbjtpacon;  Category for CON (contigs) of TPA (third party annotations)
ddbjtsa;     Category for TSA (transcriptome shotgun assemblies)
ddbjuna;     Category for unannotated sequences
ddbjvrl;     Category for viruses
ddbjvrt;     Category for vertebrates other than mammals
------------------------------------------------------------------------------

In the present release, each of the above categories is recorded in
ddbj***##.DAD files, with the following number of files per category.

file prefix   number of files
-------------------------------
ddbjbct        49
ddbjcon        43
ddbjenv         2
ddbjest         1
ddbjgss         1
ddbjhtc         1
ddbjhtg         1
ddbjhum         2
ddbjinv         5
ddbjmam         1
ddbjpat         1
ddbjphg         1
ddbjpln         6
ddbjpri         1
ddbjrod         1
ddbjsts         1
ddbjsyn         1
ddbjtpa         1
ddbjtpacon      1
ddbjtsa         1
ddbjuna         1
ddbjvrl         5
ddbjvrt         2
-------------------------------

8. A sample of DAD entries

Below is a typical DAD entry, which may be useful for understanding the
format and contents.

----- ----- ----- ----- sample begin ----- ----- ----- -----
LOCUS       BAA22986.1               220 aa    PRT              HUM 28-OCT-1997
DEFINITION  Homo sapiens RVP1 protein.
ACCESSION   AB000714-1
PROTEIN_ID  BAA22986.1
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryotae; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
            Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1250)
  AUTHORS   Katahira,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (26-JAN-1997) to the DDBJ/EMBL/GenBank databases.
            Contact:Jun Katahira
            Institute for Microbial Diseases, Osaka University, Department
            of Bacterial Toxinology; 3-1, Yamadaoka, Suita, Osaka 565, Japan
REFERENCE   2
  AUTHORS   Katahira,J., Sugiyama,H., Inoue,N., Horiguchi,Y., Matsuda,M.
            and Sugimoto,N.
  TITLE     Clostridium perfringens enterotoxin utilizes two structurally
            related membrane proteins as functional receptors in vivo
  JOURNAL   J. Biol. Chem. 272, 26652-26658 (1997)
COMMENT
FEATURES             Qualifiers
     source          /db_xref="H-InvDB:HIT000057926"
                     /mol_type="mRNA"
                     /organism="Homo sapiens"
                     /tissue_lib="lung"
     protein         /gene="hRVP1"
                     /transl_table=1
BEGIN
        1 MSMGLEITGT ALAVLGWLGT IVCCALPMWR VSAFIGSNII TSQNIWEGLW MNCVVQSTGQ
       61 MQCKVYDSLL ALPQDLQAAR ALIVVAILLA AFGLLVALVG AQCTNCVQDD TAKAKITIVA
      121 GVLFLLAALL TLVPVSWSAN TIIRDFYNPV VPEAQKREMG AGLYVGWAAA ALQLLGGALL
      181 CCSCPPREKK YTATKVVYSA PRSTGPGASL GTGYDRKDYV
//
----- ----- ----- ----- sample end ----- ----- ----- -----

9. Release history

------------------
 Since release 50
------------------
The format of the SOURCE line in the DAD flat file has been changed.
As a result of this change, 1) the order of the organism name and the
organelle name is reversed, and 2) some DAD flat files include a common name,
as GenBank flat files do. The change is shown below in detail.

----------------
 Old (-rel. 49)
----------------
Format:  SOURCE      <organism name> [<organelle name>]
Example: SOURCE      Homo sapiens mitochondrion

----------------
 New (rel. 50-)
----------------
Format:  SOURCE      [<organelle name>] <organism name> [(<common name>)]
Example: SOURCE      mitochondrion Homo sapiens (human)

See also '8. A sample of DAD entries'.
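
The SOURCE layout used since release 50 (an optional organelle name, the
organism name, and an optional common name in parentheses) can be read with a
few lines of Python. This is only a sketch: parse_source, Source and
KNOWN_ORGANELLES are hypothetical names, and the organelle list is an
illustrative assumption, not a vocabulary defined by DDBJ. Common names that
themselves contain parentheses are not handled.

```python
import re
from typing import NamedTuple, Optional

# Assumption: a small, non-exhaustive list used only to recognize a leading
# organelle word; it is not taken from the DAD documentation.
KNOWN_ORGANELLES = ("mitochondrion", "chloroplast", "plastid")


class Source(NamedTuple):
    organelle: Optional[str]
    organism: str
    common_name: Optional[str]


# Body of the line, optionally followed by "(common name)" at the end.
_SOURCE_RE = re.compile(
    r"^SOURCE\s+(?P<body>.*?)(?:\s+\((?P<common>[^()]*)\))?\s*$"
)


def parse_source(line: str) -> Source:
    """Parse a release-50-style SOURCE line into its three parts."""
    match = _SOURCE_RE.match(line)
    if match is None:
        raise ValueError(f"not a SOURCE line: {line!r}")
    body = match.group("body")
    organelle = None
    first, _, rest = body.partition(" ")
    if first in KNOWN_ORGANELLES and rest:
        organelle, body = first, rest.strip()
    return Source(organelle, body, match.group("common"))
```

With the example line from this section,
parse_source("SOURCE      mitochondrion Homo sapiens (human)") returns
Source(organelle="mitochondrion", organism="Homo sapiens",
common_name="human").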

------------------
 Since release 45
------------------
A new division, TSA (Transcriptome Shotgun Assembly), was started.

A new division for assembled mRNA sequences, Transcriptome Shotgun Assembly
(TSA), is included from this release. With new sequencing technologies, INSDC
has faced many requests to accept assembled EST sequences. These sequence
data have become more useful than they used to be, although they may not be
correctly assembled and may not exist in nature. Therefore, INSDC decided to
collect assembled EST sequences into the new division 'TSA'.

TSA sequences are shotgun assemblies of primary sequences deposited in the
EST division of INSDC, the Trace Archive (TA) or the Short-Read Archive
(SRA). Two specific keywords, "TSA" and "Transcriptome Shotgun Assembly", are
present in all TSA entries. The new division code, "TSA", also appears in the
LOCUS line of all TSA entries.

No format changes are anticipated for this new division. Note, however, that
TSA entries use the same PRIMARY line that is described for entries in the
TPA category. The PRIMARY block contains references to the underlying
reads/transcripts that were assembled to construct a TSA record.

------------------
 Since release 42
------------------
Deletion of E-mail addresses, phone numbers and fax numbers from the DAD flat
file

To comply with the Japanese law on the protection of personal information,
DDBJ has deleted phone numbers, fax numbers and E-mail addresses from the
flat files of entries submitted to DDBJ. This also helps protect DAD releases
against spam senders. DDBJ retrofitted most of the entries submitted to DDBJ
(not those submitted to GenBank or EMBL) by DDBJ periodical release 72.

Before DAD periodical release 42, the submitter information was described in
the JOURNAL line of REFERENCE 1 as follows:

-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Taro Mishima, DNA Data Bank of Japan, National Institute of
            Genetics; 1111, Yata, Mishima, Shizuoka 411-8540, Japan
            (E-mail:ddbj@ddbj.nig.ac.jp, URL:http://www.ddbj.nig.ac.jp/,
            Tel:81-12-345-6789, Fax:81-12-345-9876)
-------------------------------------------------------------------------------

After the deletion of the information in question, a DAD flat file is of one
of the following two types.

Type 1: Phone and fax numbers and the E-mail address are deleted.

-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
-------------------------------------------------------------------------------

Type 2: When the submitters wish to keep their contact information disclosed,
it is described as follows:

-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
            E-mail :ddbj@ddbj.nig.ac.jp
            Phone  :81-12-345-6789
            Fax    :81-12-345-9876
-------------------------------------------------------------------------------

------------------
 Since release 40
------------------
The CON division has been included.

CON; Contig / Constructed
To join a series of entries, such as those submitted from a genome project,
each of the three data banks constructs an entry and assigns an accession
number to the large-scale sequence dataset.
Such entries are classified into the CON division.

------------------
 Since release 38
------------------
From release 38, the maximum file size has been 1.5 GB, because network
capacity has increased remarkably. Each ddbj***##.DAD file is at most 1.5 GB
in size. See also section '10. Statistics of DAD'.

------------------
 Since release 32
------------------
Introduction of the ENV division:

Submissions of sequences derived from environmental samples have increased
rapidly. To accommodate such submissions, a new division, ENV, has been
created. This division contains sequences obtained via direct molecular
isolation such as PCR, DGGE, or any anonymous method.

In the past, sequences derived from environmental samples belonged to
taxonomic divisions, mainly BCT. At DDBJ, the retrofit to transfer relevant
entries from taxonomic divisions to the ENV division starts in the present
release and ends by the next periodical release. Please note that during this
transitional period, some entries to be eventually placed in the ENV division
will be found in other divisions.

------------------
 Since release 30
------------------
"H-InvDB" has been added as a database name for the db_xref
(cross-reference) qualifier. The following is an example.

FEATURES             Location/Qualifiers
     source          1..5589
                     /clone="hf00223s1"
                     /clone_lib="pBluescriptII SK plus"
                     /db_xref="H-InvDB:HIT000000001"

------------------
 Since release 29
------------------
The GSS division has been included since release 29. GSS stands for Genome
Survey Sequence, which is similar to EST, except that GSS is genomic DNA
whereas EST is cDNA.

------------------
 Since release 21
------------------
1) Some information on introns has been added. It is given as "intron_pos"
   in the Feature/Qualifiers.

   Examples:
   intron_pos 142:1 (2/12) means that the 2nd intron among 12 in total is
   located between the 1st and 2nd bases of the 142nd codon (amino acid
   residue).
   intron_pos 228:0 (4/12) means that the 4th intron among 12 in total is
   located between the 227th and 228th codons (between the 3rd base of the
   227th codon and the 1st base of the 228th codon).

2) The LOCUS line has been changed. The following is an example and its
   explanation:

LOCUS       BAA21794.1               263 aa    PRT              BCT 05-FEB-1999

   Positions  Contents
   ---------  --------
   01-05      'LOCUS'
   06-12      spaces
   13-28      Locus name
   29-29      space
   30-40      Length of sequence, right-justified
   41-41      space
   42-43      'aa'
   44-47      spaces
   48-53      'PRT'
   54-64      spaces
   65-67      Division code
   68-68      space
   69-79      Date, in the form DD-MMM-YYYY (e.g., 15-MAR-1991)
   ---------------------

3) TPA data have been provided in a separate file (ddbjtpa.DAD).

10. Statistics of DAD

The following are the statistics of this release of DAD.

  total number of entries       50,567,937
  total length of sequences     15,768,083,034
  average length                311
  name of longest sequence      CP000108-608 PID:ABB27887.1
  length of longest sequence    36,805 aa (CP000108-608)

=========================================================================
file name         no. of entries  no. of amino acids     file size
=========================================================================
ddbjbct1.DAD              326925            98952501    1468007780
ddbjbct2.DAD              498029           151273843    1468016349
ddbjbct3.DAD              579527           183222593    1468007570
ddbjbct4.DAD              592946           179042527    1468007656
ddbjbct5.DAD              451813           144733405    1468007020
ddbjbct6.DAD              435221           138929921    1468006412
ddbjbct7.DAD              425083           132515980    1468008167
ddbjbct8.DAD              455051           140726687    1468008107
ddbjbct9.DAD              393517           124337936    1468010434
ddbjbct10.DAD             354949           113185132    1468010645
ddbjbct11.DAD             382787           119317616    1468007391
ddbjbct12.DAD             369072           117061916    1468007962
ddbjbct13.DAD             389342           124006622    1468007426
ddbjbct14.DAD             427320           135201956    1468008525
ddbjbct15.DAD             441906           139044185    1468007289
ddbjbct16.DAD             439819           138012882    1468007637
ddbjbct17.DAD             452550           141009615    1468007501
ddbjbct18.DAD             548479           173563426    1468007477
ddbjbct19.DAD             474740           146399439    1468007909
ddbjbct20.DAD             455471           139637194    1468009433
ddbjbct21.DAD             416788           129723567    1468006650
ddbjbct22.DAD             370299           114597377    1468007787
ddbjbct23.DAD             267805            83059778    1468012088
ddbjbct24.DAD             273447            84638566    1468009105
ddbjbct25.DAD             326018           101015620    1468008903
ddbjbct26.DAD             429245           131087449    1468010448
ddbjbct27.DAD             459336           145337519    1468006452
ddbjbct28.DAD             420951           133734857    1468007476
ddbjbct29.DAD             452491           141859619    1468008214
ddbjbct30.DAD             445689           139917993    1468010031
ddbjbct31.DAD             441623           135894261    1468007211
ddbjbct32.DAD             432057           132335172    1468009758
ddbjbct33.DAD             413006           127912252    1468006910
ddbjbct34.DAD             408345           127334533    1468008275
ddbjbct35.DAD             439417           138085700    1468007855
ddbjbct36.DAD             403276           127228250    1468007356
ddbjbct37.DAD             399020           124741952    1468010035
ddbjbct38.DAD             423470           132198907    1468008088
ddbjbct39.DAD             390182           122994914    1468007782
ddbjbct40.DAD             425105           132837149    1468010018
ddbjbct41.DAD             375591           118510640    1468006706
ddbjbct42.DAD             339202           106424387    1468008343
ddbjbct43.DAD             353630           110516218    1468006982
ddbjbct44.DAD             355223           110223907    1468009691
ddbjbct45.DAD             379295           118783669    1468007001
ddbjbct46.DAD             624413           184727453    1468008112
ddbjbct47.DAD             673262           203848968    1468006857
ddbjbct48.DAD             743536           198917857    1468006919
ddbjbct49.DAD             491818           151273428     915656961
ddbjcon1.DAD              211305            92647799    1468007242
ddbjcon2.DAD              277470           115204529    1468007563
ddbjcon3.DAD              180514            95273627    1468010228
ddbjcon4.DAD              281547           115410248    1468006487
ddbjcon5.DAD              319310           118070880    1468010355
ddbjcon6.DAD              329747           143013568    1468009782
ddbjcon7.DAD              505888           215063571    1468010737
ddbjcon8.DAD              456996           173131760    1468006948
ddbjcon9.DAD              439276           113490170    1468007612
ddbjcon10.DAD             366552            63901695    1468009498
ddbjcon11.DAD             366483            63924161    1468007517
ddbjcon12.DAD             366460            64016643    1468010078
ddbjcon13.DAD             366522            64020656    1468008301
ddbjcon14.DAD             366528            63996468    1468007932
ddbjcon15.DAD             366598            63945513    1468007837
ddbjcon16.DAD             366996            62764502    1468010297
ddbjcon17.DAD             364610            69526824    1468010423
ddbjcon18.DAD             361334            77049342    1468009084
ddbjcon19.DAD             363066            71742502    1468007746
ddbjcon20.DAD             360785            77337138    1468007554
ddbjcon21.DAD             363573            71227756    1468007184
ddbjcon22.DAD             358833            83460153    1468008293
ddbjcon23.DAD             357973            84672458    1468008916
ddbjcon24.DAD             356685            86618863    1468009182
ddbjcon25.DAD             357055            85853436    1468010225
ddbjcon26.DAD             456387           165921557    1468006895
ddbjcon27.DAD             432151           160244751    1468006669
ddbjcon28.DAD             397619           159771833    1468006893
ddbjcon29.DAD             378571           150532021    1468008962
ddbjcon30.DAD             476323           192317679    1468008107
ddbjcon31.DAD             315267           135866853    1468008600
ddbjcon32.DAD             364026           156958251    1468009838
ddbjcon33.DAD             413875           176129086    1468011325
ddbjcon34.DAD             431032           187763094    1468007765
ddbjcon35.DAD             442517           177911518    1468007047
ddbjcon36.DAD             432124           191483323    1468008013
ddbjcon37.DAD             482220           194327982    1468008571
ddbjcon38.DAD             471150           206868292    1468007101
ddbjcon39.DAD             329776           109704044    1468008123
ddbjcon40.DAD             414523           176777785    1468007551
ddbjcon41.DAD             454673           200357397    1468008248
ddbjcon42.DAD             373652           148806066    1468007976
ddbjcon43.DAD             284437           113381151     878449012
ddbjenv1.DAD              676457           138035671    1468007709
ddbjenv2.DAD               90433            16574799     176200417
ddbjest1.DAD                1163              153762       2566494
ddbjgss1.DAD                3137              962078       8039402
ddbjhtc1.DAD              117441            35814992     430534763
ddbjhtg1.DAD               64409            17638928     263318762
ddbjhum1.DAD              619848           180850934    1468007509
ddbjhum2.DAD               90476            23054942     188097106
ddbjinv1.DAD              587167           175426042    1468007785
ddbjinv2.DAD              690022           179062091    1468007359
ddbjinv3.DAD              697043           150386662    1468008232
ddbjinv4.DAD              649331           131883128    1468007117
ddbjinv5.DAD              102306            42898812     219902028
ddbjmam1.DAD              278943            70734900     591638837
ddbjpat1.DAD              391458           163940387     581425593
ddbjphg1.DAD              359871            75098164     778587876
ddbjpln1.DAD              456174           161000910    1468007743
ddbjpln2.DAD              451531           175315107    1468009686
ddbjpln3.DAD              470353           222118249    1468007726
ddbjpln4.DAD              695059           202329840    1468007589
ddbjpln5.DAD              751717           163933707    1468006763
ddbjpln6.DAD              284496            85606441     589133231
ddbjpri1.DAD               82234            19386194     179242300
ddbjrod1.DAD              219100            67525486     548091616
ddbjsts1.DAD                   9                 812         22053
ddbjsyn1.DAD              160046            57285794     406066344
ddbjtpa1.DAD               63335            25386669     195279726
ddbjtpacon1.DAD            71628            31568870     308879820
ddbjtsa1.DAD              121318            49616887     326360341
ddbjuna1.DAD                 227               39165        388056
ddbjvrl1.DAD              659680           209280851    1468007970
ddbjvrl2.DAD              692364           209330676    1468008426
ddbjvrl3.DAD              631565           201812839    1468008458
ddbjvrl4.DAD              599850           225476596    1468006964
ddbjvrl5.DAD              153393            54102230     341183142
ddbjvrt1.DAD              693227           170682829    1468007948
ddbjvrt2.DAD              460610           103342307     927749334
=========================================================================
Total                   50567937         15768083034  164465692646
=========================================================================
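
The per-file statistics above can be re-totalled with a short Python sketch.
It is an illustration only: parse_row, summarize and FileStats are
hypothetical names, and the unit of the "file size" column is not stated in
this document (bytes is an assumption). The last check shows that the
reported average length, 311, agrees with integer division of the release
totals.

```python
from typing import Iterable, NamedTuple


class FileStats(NamedTuple):
    name: str
    entries: int
    residues: int
    file_size: int  # unit not stated in the release note; presumably bytes


def parse_row(row: str) -> FileStats:
    """Parse one row: file name, no. of entries, no. of amino acids, size."""
    name, entries, residues, size = row.split()
    return FileStats(name, int(entries), int(residues), int(size))


def summarize(rows: Iterable[str]) -> FileStats:
    """Sum the numeric columns over all rows that name a ddbj*.DAD file."""
    stats = [parse_row(r) for r in rows if r.strip().startswith("ddbj")]
    return FileStats(
        "Total",
        sum(s.entries for s in stats),
        sum(s.residues for s in stats),
        sum(s.file_size for s in stats),
    )


# Two rows copied from the table above.
rows = [
    "ddbjsts1.DAD 9 812 22053",
    "ddbjuna1.DAD 227 39165 388056",
]
subtotal = summarize(rows)
print(subtotal.entries, subtotal.residues, subtotal.file_size)

# Average length from the release totals, using integer division.
print(15_768_083_034 // 50_567_937)
```

Feeding all table rows to summarize gives a total that can be compared with
the "Total" line of the table.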