DDBJ Amino Acid Sequence Database (DAD)
Release 79.0, June 2017, including
63,670,485 entries, 19,963,617,826 residues

Last published date in the present release: May 26, 2017

-------------------------------------------------------------------------------
Table of contents
-------------------------------------------------------------------------------
 1.   Introduction
 1.1. Announcement for changes in the present release
 1.2. Announcement for the forthcoming changes
 2.   Format of DAD entries
 3.   DAD categories
 4.   Citation
 5.   Contact information
 6.   Disclaimer
 7.   DAD file categories
 8.   A sample of DAD entries
 9.   Release history
10.   Statistics of DAD
-------------------------------------------------------------------------------

1. Introduction

This is release 79.0 of the DDBJ Amino Acid Sequence Database (DAD). The
database was produced by extracting all translated sequences from the
conventional sequence data of DDBJ periodical release 109.0 and from the TPA
data set (May 2017).

1.1. Announcement for changes in the present release

Nothing in particular.

1.2. Announcement for the forthcoming changes

Nothing in particular.

2. Format of DAD entries

The standard format of DAD is almost the same as that of the DDBJ nucleotide
sequence database, except for the points described below.

The accession number of a DAD entry is written on the line labeled
"ACCESSION". A DAD accession number consists of a DDBJ accession number and
an integer starting from 1, joined by a hyphen (-). For example, two amino
acid sequences extracted from DDBJ entry D12345 have the accession numbers
D12345-1 and D12345-2, respectively. This number is useful for identifying a
DAD entry.

The amino acid sequence begins on the line following "BEGIN". Up to sixty
amino acids are written per line. The sequence is followed by a double slash
(//), which marks the end of the entry.
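
The accession and sequence conventions described above can be sketched in
Python as follows. This is a minimal illustration, not a DDBJ tool: the names
DadEntry, split_dad_accession and iter_dad_entries are hypothetical, and the
reader assumes that each keyword ("ACCESSION", "BEGIN", "//") starts at the
beginning of its line, as in the sample entry of section 8.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class DadEntry(NamedTuple):
    """One DAD record, reduced to the fields discussed in section 2."""
    ddbj_accession: str   # accession of the nucleotide entry, e.g. "D12345"
    serial: int           # integer after the hyphen, starting from 1
    sequence: str         # amino acid sequence without numbers or spaces


def split_dad_accession(accession: str) -> Tuple[str, int]:
    """Split "D12345-2" into ("D12345", 2) at the last hyphen."""
    ddbj, hyphen, serial = accession.strip().rpartition("-")
    if not hyphen or not ddbj or not serial.isdigit() or int(serial) < 1:
        raise ValueError(f"not a DAD accession number: {accession!r}")
    return ddbj, int(serial)


def iter_dad_entries(lines: Iterable[str]) -> Iterator[DadEntry]:
    """Yield entries from the lines of a DAD-format flat file."""
    accession: Optional[str] = None
    in_sequence = False
    residues = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # Double slash: end of the entry.
            if accession is not None:
                ddbj, serial = split_dad_accession(accession)
                yield DadEntry(ddbj, serial, "".join(residues))
            accession, in_sequence, residues = None, False, []
        elif in_sequence:
            # Sequence lines carry a position number and blank-separated
            # blocks; keep only the residue letters.
            residues.extend(ch for ch in line if ch.isalpha())
        elif line.startswith("ACCESSION"):
            fields = line.split()
            accession = fields[1] if len(fields) > 1 else None
        elif line.startswith("BEGIN"):
            # The sequence starts on the line after "BEGIN".
            in_sequence = True


if __name__ == "__main__":
    print(split_dad_accession("D12345-2"))
```

Splitting at the last hyphen, rather than matching a fixed accession pattern,
is a deliberate choice here so that the sketch does not have to assume any
particular layout of the DDBJ accession number itself.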

The LOCUS line contains the locus name, the length of the protein, the
molecule type (always "PRT"), the division name, and the release date of the
DNA counterpart. The DEFINITION line contains the species name and the
protein name. The other parts of a DAD entry, including FEATURES, are almost
the same as those of the corresponding DDBJ entry.

3. DAD categories

DAD entries are classified into 23 categories: the 21 divisions of the
conventional sequence data of the DDBJ periodical release, plus TPA and
TPACON. Please refer to the release note of the DDBJ release (filename:
ddbjrel.txt) for details.

There are two types of DAD files for each division: files with the suffix
".DAD" in the DAD standard format, and files with the suffix ".DAD.fasta" in
a FASTA-compatible format.

[DDBJ release note]
  ftp://ftp.ddbj.nig.ac.jp/ddbj_database/ddbj/ddbjrel.txt

4. Citation

If you use DAD in your research, we would appreciate a reference to DDBJ in
the resulting publications.

When citing an entry in the DAD database, give its protein_id and its
accession number. It is also recommended to cite the first publication listed
in the REFERENCE lines of the entry, other than the submitter information.

DDBJ suggests that authors also cite DDBJ itself. The following publication,
which describes the recent activities of the DDBJ center, is appropriate:

  Mashima J, Kodama Y, Fujisawa T, Katayama T, Okuda Y, Kaminuma E,
  Ogasawara O, Okubo K, Nakamura Y and Takagi T.
  DNA Data Bank of Japan.
  Nucleic Acids Res. 45, D25-D31 (2017)
  DOI: 10.1093/nar/gkw1001

The following sentence is an example of how to cite an entry in the DAD
database:
-----------------------------------------------------------------------------
"We searched the DAD database (1) by sequence similarities and found an amino
acid sequence (2), with protein_id BAA22986.1 in DDBJ accession number
AB000714, which had significant similarity with ..."

(1) Mashima, J. et al, Nucleic Acids Res. 45, D25-D31 (2017).
(2) Katahira, J. et al, J. Biol. Chem. 272, 26652-26658 (1997).
------------------------------------------------------------------------------

5. Contact information

  DNA Data Bank of Japan
  DDBJ Center
  National Institute of Genetics
  Research Organization of Information and Systems
  Mishima 411-8540, Japan

  Phone:  +81 55 981 6853
  FAX:    +81 55 981 6849
  E-mail: ddbj@ddbj.nig.ac.jp
  WWW:    http://www.ddbj.nig.ac.jp/

6. Disclaimer

While DDBJ endeavors to keep its data correct, DDBJ makes no representations
or warranties of any kind about the completeness, accuracy or reliability
with respect to the entries contained in the DAD periodical release. DDBJ
also makes no legal liability or responsibility of merchantability or fitness
for a particular purpose or that the use of the sequence data will not
infringe any patent or other rights. Any receipt, reliance or use you place
on such data is therefore strictly at your own risk.

7. DAD file categories

This release covers 23 categories (see also '3. DAD categories')
of organisms and others as follows:
------------------------------------------------------------------------------
ddbjbct;     Category for bacteria
ddbjcon;     Category for CON (contigs)
ddbjenv;     Category for ENV (environmental samples)
ddbjest;     Category for EST (expressed sequence tags)
ddbjgss;     Category for GSS (genome survey sequences)
ddbjhtc;     Category for HTC (high throughput cDNA sequences)
ddbjhtg;     Category for HTG (high throughput genomic sequences)
ddbjhum;     Category for human
ddbjinv;     Category for invertebrates
ddbjmam;     Category for mammals other than primates and rodents
ddbjpat;     Category for patents
ddbjphg;     Category for phages
ddbjpln;     Category for plants
ddbjpri;     Category for primates other than human
ddbjrod;     Category for rodents
ddbjsts;     Category for STS (sequence tagged sites)
ddbjsyn;     Category for synthetic DNAs
ddbjtpa;     Category for TPA (third party annotations)
ddbjtpacon;  Category for CON (contigs) of TPA (third party annotations)
ddbjtsa;     Category for TSA (transcriptome shotgun assemblies)
ddbjuna;     Category for unannotated sequences
ddbjvrl;     Category for viruses
ddbjvrt;     Category for vertebrates other than mammals
------------------------------------------------------------------------------

In the present release, the categories above are recorded in ddbj***##.DAD
files as follows.

file prefix   number of files
-------------------------------
ddbjbct       74
ddbjcon       45
ddbjenv       2
ddbjest       1
ddbjgss       1
ddbjhtc       1
ddbjhtg       1
ddbjhum       2
ddbjinv       6
ddbjmam       1
ddbjpat       1
ddbjphg       1
ddbjpln       6
ddbjpri       1
ddbjrod       1
ddbjsts       1
ddbjsyn       1
ddbjtpa       1
ddbjtpacon    1
ddbjtsa       1
ddbjuna       1
ddbjvrl       5
ddbjvrt       2
-------------------------------

8. A sample of DAD entries

Below is a typical DAD entry, which may be useful for understanding the
format and contents.

----- ----- ----- -----   sample begin  ----- ----- ----- -----
LOCUS       BAA22986.1                220 aa    PRT              HUM 28-OCT-1997
DEFINITION  Homo sapiens RVP1 protein.
ACCESSION   AB000714-1
PROTEIN_ID  BAA22986.1
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryotae; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
            Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1250)
  AUTHORS   Katahira,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (26-JAN-1997) to the DDBJ/EMBL/GenBank databases.
            Contact:Jun Katahira
            Institute for Microbial Diseases, Osaka University, Department
            of Bacterial Toxinology; 3-1, Yamadaoka, Suita, Osaka 565, Japan
REFERENCE   2
  AUTHORS   Katahira,J., Sugiyama,H., Inoue,N., Horiguchi,Y., Matsuda,M.
            and Sugimoto,N.
  TITLE     Clostridium perfringens enterotoxin utilizes two structurally
            related membrane proteins as functional receptors in vivo
  JOURNAL   J. Biol. Chem. 272, 26652-26658 (1997)
COMMENT
FEATURES             Qualifiers
     source          /db_xref="H-InvDB:HIT000057926"
                     /mol_type="mRNA"
                     /organism="Homo sapiens"
                     /tissue_lib="lung"
     protein         /gene="hRVP1"
                     /transl_table=1
BEGIN
        1 MSMGLEITGT ALAVLGWLGT IVCCALPMWR VSAFIGSNII TSQNIWEGLW MNCVVQSTGQ
       61 MQCKVYDSLL ALPQDLQAAR ALIVVAILLA AFGLLVALVG AQCTNCVQDD TAKAKITIVA
      121 GVLFLLAALL TLVPVSWSAN TIIRDFYNPV VPEAQKREMG AGLYVGWAAA ALQLLGGALL
      181 CCSCPPREKK YTATKVVYSA PRSTGPGASL GTGYDRKDYV
//
----- ----- ----- -----    sample end   ----- ----- ----- -----

9. Release history

------------------
 Since release 50
------------------
The format of the SOURCE line in the DAD flat file has been changed:

As a result of this change, 1) the order of the organism name and the
organelle name is reversed and 2) some DAD flat files include a common name,
as GenBank flat files do. The change is shown below in detail.

----------------
 Old (-rel. 49)
----------------
 Format:  SOURCE      <organism name> [<organelle>]
 Example: SOURCE      Homo sapiens mitochondrion

----------------
 New (rel. 50-)
----------------
 Format:  SOURCE      [<organelle>] <organism name> [(<common name>)]
 Example: SOURCE      mitochondrion Homo sapiens (human)

See also '8. A sample of DAD entries'.
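
The SOURCE line change since release 50 can be illustrated with a short
Python sketch. Everything named here is hypothetical (Source,
parse_new_source, old_to_new), and the ORGANELLES tuple is an assumption made
only for this example: the release note does not list the organelle names
that may appear, so a real parser would need the actual vocabulary.

```python
import re
from typing import NamedTuple, Optional

# Assumption: an illustrative, incomplete list of organelle names.
ORGANELLES = ("mitochondrion", "chloroplast")


class Source(NamedTuple):
    organelle: Optional[str]
    organism: str
    common_name: Optional[str]


def _strip_keyword(line: str) -> str:
    """Remove the leading SOURCE keyword and surrounding blanks."""
    text = line.strip()
    if text.startswith("SOURCE"):
        text = text[len("SOURCE"):].strip()
    return text


def parse_new_source(line: str) -> Source:
    """Parse a SOURCE line in the layout used from release 50 onward."""
    text = _strip_keyword(line)
    common_name = None
    # An optional common name is given in parentheses at the end.
    match = re.search(r"\s*\(([^()]*)\)$", text)
    if match:
        common_name = match.group(1)
        text = text[:match.start()].strip()
    organelle = None
    # An optional organelle name precedes the organism name.
    first, _, rest = text.partition(" ")
    if first in ORGANELLES and rest:
        organelle, text = first, rest.strip()
    return Source(organelle, text, common_name)


def old_to_new(line: str, common_name: Optional[str] = None) -> str:
    """Rewrite an old-layout SOURCE line (up to release 49) in the new layout."""
    text = _strip_keyword(line)
    head, _, last = text.rpartition(" ")
    if head and last in ORGANELLES:
        # Old layout: organelle after the organism name; move it to the front.
        text = f"{last} {head.strip()}"
    if common_name:
        text = f"{text} ({common_name})"
    return "SOURCE".ljust(12) + text


if __name__ == "__main__":
    print(old_to_new("SOURCE      Homo sapiens mitochondrion", "human"))
    print(parse_new_source("SOURCE      mitochondrion Homo sapiens (human)"))
```

The common name is passed in explicitly because the old layout does not
carry it; where it would come from is outside the scope of this sketch.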

------------------
 Since release 45
------------------
A new division, TSA (Transcriptome Shotgun Assembly), has been started:

A new division for assembled mRNA sequences, Transcriptome Shotgun Assembly
(TSA), is included from this release. With new sequencing technologies, INSDC
has received many requests to accept assembled EST sequences. These sequence
data have become more useful than they used to be, although they may not be
correctly assembled or may not exist in nature. Therefore, INSDC decided to
collect assembled EST sequences in the new division 'TSA'. TSA sequences are
shotgun assemblies of primary sequences deposited in the EST division of
INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA).

Two specific keywords, "TSA" and "Transcriptome Shotgun Assembly", are
present in all TSA entries. The new division code, "TSA", also appears in the
LOCUS line of all TSA entries.

No format changes are anticipated for this new division; however, note that
TSA entries make use of the same PRIMARY line that is described for entries
in the TPA category. The PRIMARY block contains references to the underlying
reads/transcripts that were assembled to construct a TSA record.

------------------
 Since release 42
------------------
Deletion of E-mail addresses and phone and fax numbers from the DAD flat file

To comply with the Japanese law on the protection of personal information,
DDBJ deletes phone numbers, fax numbers, and E-mail addresses from the flat
files of entries submitted to DDBJ. This also helps protect DAD releases
against senders of spam mail. DDBJ retrofitted almost all entries submitted
to DDBJ (not those submitted to GenBank or EMBL) by DDBJ periodical release
72.

Before DAD periodical release 42, the submitter information was described in
the JOURNAL line of REFERENCE 1 as follows:
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Taro Mishima, DNA Data Bank of Japan, National Institute of
            Genetics; 1111, Yata, Mishima, Shizuoka 411-8540, Japan
            (E-mail:ddbj@ddbj.nig.ac.jp, URL:http://www.ddbj.nig.ac.jp/,
            Tel:81-12-345-6789, Fax:81-12-345-9876)
-------------------------------------------------------------------------------

After the deletion of the information in question, the DAD flat file takes
one of the following two forms:

Type 1: Phone and fax numbers and E-mail address are deleted.
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
-------------------------------------------------------------------------------

Type 2: When the submitters wish to keep their contact information
disclosed, it is described as follows:
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
            E-mail :ddbj@ddbj.nig.ac.jp
            Phone  :81-12-345-6789
            Fax    :81-12-345-9876
-------------------------------------------------------------------------------

------------------
 Since release 40
------------------
The CON division has been included.

CON; Contig / Constructed
  To join a series of entries, such as those submitted from a genome
  project, each of the three data banks constructs an entry and assigns an
  accession number to the large-scale sequence dataset.
  Such entries are classified into the CON division.

------------------
 Since release 38
------------------
As of this release, the maximum file size is changed to 1.5 GB, because
network capacity has increased remarkably. Each file named ddbj***##.DAD is
at most 1.5 GB in size. See also section '10. Statistics of DAD'.

------------------
 Since release 32
------------------
Introduction of the ENV division:

Recently, submissions of sequences derived from environmental samples have
increased rapidly. To accommodate such submissions, a new division, ENV, has
been created. This division contains sequences obtained via direct molecular
isolation such as PCR, DGGE, or any anonymous method.

In the past, sequences derived from environmental samples belonged to the
taxonomic divisions, mainly BCT. At DDBJ, the retrofit to transfer relevant
entries from the taxonomic divisions to the ENV division starts in this
release and ends by the next periodical release. Please note that during
this transitional period, some entries that will eventually be placed in the
ENV division will be found in other divisions.

------------------
 Since release 30
------------------
"H-InvDB" has been added to db_xref (cross-reference) as a qualifier key.
The following is an example.

FEATURES             Location/Qualifiers
     source          1..5589
                     /clone="hf00223s1"
                     /clone_lib="pBluescriptII SK plus"
                     /db_xref="H-InvDB:HIT000000001"

------------------
 Since release 29
------------------
The GSS division has been included since release 29. GSS stands for Genome
Survey Sequence, which is similar to EST, except that GSS is genomic DNA
whereas EST is cDNA.

------------------
 Since release 21
------------------
1) Some information on introns has been added. It is given as "intron_pos"
   in the Feature/Qualifiers.

   Examples:
   intron_pos 142:1 (2/12) means that the 2nd intron among 12 in total is
     located between the 1st and 2nd bases of the 142nd codon (amino acid
     residue).
   intron_pos 228:0 (4/12) means that the 4th intron among 12 in total is
     located between the 227th and 228th codons (between the 3rd base of the
     227th codon and the 1st base of the 228th codon).

2) The LOCUS line has been changed. The following is an example and its
   explanation:

LOCUS       BAA21794.1                263 aa    PRT              BCT 05-FEB-1999

Positions  Contents
---------  --------
01-05      'LOCUS'
06-12      spaces
13-28      Locus name
29-29      space
30-40      Length of sequence, right-justified
41-41      space
42-43      'aa'
44-47      spaces
48-53      'PRT'
54-64      spaces
65-67      Division code
68-68      space
69-79      Date, in the form DD-MMM-YYYY (e.g., 15-MAR-1991)
---------------------

3) TPA data have been provided in a separate file (ddbjtpa.DAD).

10. Statistics of DAD

The following are the statistics of this release of DAD.

  total number of entries          63,670,485
  total length of sequences    19,963,617,826
  average length                          313
  name of longest sequence       CP000108-608   PID:ABB27887.1
  length of longest sequence        36,805 aa   (CP000108-608)

=========================================================================
file name       no. of entries  no. of amino acids     file size
=========================================================================
ddbjbct1.DAD            332675           100546459    1468008028
ddbjbct2.DAD            494687           150366432    1468020199
ddbjbct3.DAD            585960           185359186    1468007532
ddbjbct4.DAD            633379           191161869    1468011752
ddbjbct5.DAD            478540           152214428    1468007803
ddbjbct6.DAD            440092           139990306    1468006942
ddbjbct7.DAD            425040           133358797    1468009153
ddbjbct8.DAD            469487           145322100    1468006633
ddbjbct9.DAD            424088           132676188    1468010291
ddbjbct10.DAD           334189           108658580    1468008780
ddbjbct11.DAD           389130           120194972    1468008163
ddbjbct12.DAD           377721           120413440    1468010475
ddbjbct13.DAD           386099           122293200    1468008735
ddbjbct14.DAD           409359           130032792    1468007005
ddbjbct15.DAD           438871           138030980    1468007516
ddbjbct16.DAD           442850           139037030    1468006418
ddbjbct17.DAD           451556           141619479    1468007717
ddbjbct18.DAD           565805           177858746    1468008860
ddbjbct19.DAD           495728           153842400    1468008778
ddbjbct20.DAD           467770           143345487    1468010102
ddbjbct21.DAD           439162           136321527    1468008410
ddbjbct22.DAD           375848           116732969    1468008737
ddbjbct23.DAD           282310            87551347    1468006555
ddbjbct24.DAD           268645            83146803    1468009670
ddbjbct25.DAD           305870            95073106    1468006650
ddbjbct26.DAD           404821           124229719    1468007938
ddbjbct27.DAD           472777           147729239    1468007134
ddbjbct28.DAD           400571           125703529    1468007465
ddbjbct29.DAD           419721           136095357    1468008763
ddbjbct30.DAD           435452           136482669    1468009312
ddbjbct31.DAD           463270           143625710    1468007418
ddbjbct32.DAD           447881           138420891    1468007794
ddbjbct33.DAD           401264           123470208    1468009945
ddbjbct34.DAD           401701           124249249    1468007199
ddbjbct35.DAD           408921           126713304    1468007496
ddbjbct36.DAD           416158           129926188    1468009433
ddbjbct37.DAD           393681           125602499    1468008669
ddbjbct38.DAD           415013           131573715    1468006443
ddbjbct39.DAD           395195           125473238    1468007123
ddbjbct40.DAD           387017           122695960    1468006583
ddbjbct41.DAD           409331           127970624    1468007494
ddbjbct42.DAD           417939           129589882    1468008746
ddbjbct43.DAD           391588           120147644    1468007696
ddbjbct44.DAD           398579           125053835    1468007068
ddbjbct45.DAD           471085           147738260    1468007893
ddbjbct46.DAD           443927           141401956    1468007509
ddbjbct47.DAD           399722           123913954    1468006811
ddbjbct48.DAD           346702           109593093    1468009156
ddbjbct49.DAD           357387           112217867    1468008464
ddbjbct50.DAD           352908           107122733    1468008943
ddbjbct51.DAD           368297           115384041    1468007236
ddbjbct52.DAD           374806           116166255    1468011270
ddbjbct53.DAD           359104           113305739    1468007035
ddbjbct54.DAD           364967           113870048    1468008766
ddbjbct55.DAD           379506           120840602    1468008004
ddbjbct56.DAD           381476           121212657    1468006537
ddbjbct57.DAD           357700           110814853    1468008560
ddbjbct58.DAD           364580           110711137    1468010446
ddbjbct59.DAD           373603           117508330    1468009726
ddbjbct60.DAD           364373           114808097    1468007565
ddbjbct61.DAD           369394           117075918    1468010184
ddbjbct62.DAD           347236           104879106    1468009544
ddbjbct63.DAD           349995           107858659    1468009703
ddbjbct64.DAD           340919           103780326    1468008410
ddbjbct65.DAD           348401           108619896    1468010690
ddbjbct66.DAD           353484           109616735    1468008817
ddbjbct67.DAD           355442           109388263    1468007051
ddbjbct68.DAD           363211           111468133    1468010511
ddbjbct69.DAD           579328           170277293    1468007694
ddbjbct70.DAD           627972           191127367    1468007503
ddbjbct71.DAD           731948           201674364    1468007380
ddbjbct72.DAD           786581           232864787    1468007569
ddbjbct73.DAD           920703           299371935    1468007675
ddbjbct74.DAD           121425            37035404     235279804
ddbjcon1.DAD            212096            93074871    1468007599
ddbjcon2.DAD            278759           116165139    1468007417
ddbjcon3.DAD            180994            95769762    1468010297
ddbjcon4.DAD            324636           140171380    1468007331
ddbjcon5.DAD            322238           119053181    1468006803
ddbjcon6.DAD            335747           145241503    1468009675
ddbjcon7.DAD            452786           192263710    1468007426
ddbjcon8.DAD            516892           208809944    1468009402
ddbjcon9.DAD            488682           182824958    1468008442
ddbjcon10.DAD           391605            76746974    1468009037
ddbjcon11.DAD           370651            64723404    1468010000
ddbjcon12.DAD           370637            64742452    1468007706
ddbjcon13.DAD           370656            64747395    1468007271
ddbjcon14.DAD           370779            64619665    1468009316
ddbjcon15.DAD           370634            64692683    1468007901
ddbjcon16.DAD           371053            64341106    1468009613
ddbjcon17.DAD           370666            64098166    1468007921
ddbjcon18.DAD           366651            75349045    1468007633
ddbjcon19.DAD           366054            76911908    1468006476
ddbjcon20.DAD           366496            73051575    1468006940
ddbjcon21.DAD           365614            76973696    1468006607
ddbjcon22.DAD           366488            75768461    1468009665
ddbjcon23.DAD           361809            85811032    1468008414
ddbjcon24.DAD           361112            87682663    1468007924
ddbjcon25.DAD           361451            85035905    1468009186
ddbjcon26.DAD           396529           112131735    1468006862
ddbjcon27.DAD           471184           183617504    1468009645
ddbjcon28.DAD           409134           151842875    1468011552
ddbjcon29.DAD           385291           162373644    1468008482
ddbjcon30.DAD           490315           196478802    1468007474
ddbjcon31.DAD           352599           146137669    1468008713
ddbjcon32.DAD           331145           140117631    1468009971
ddbjcon33.DAD           381289           160113787    1468007590
ddbjcon34.DAD           440561           193394873    1468008781
ddbjcon35.DAD           465665           193161794    1468008412
ddbjcon36.DAD           400774           188077450    1468007844
ddbjcon37.DAD           495692           198785379    1468006765
ddbjcon38.DAD           470719           198168786    1468008363
ddbjcon39.DAD           383126           147349975    1468008699
ddbjcon40.DAD           384969           139790133    1468008136
ddbjcon41.DAD           453311           210497954    1468007889
ddbjcon42.DAD           422019           174444780    1468008148
ddbjcon43.DAD           371863           159514680    1468008039
ddbjcon44.DAD           355522           156179569    1468011163
ddbjcon45.DAD           317703           129911315     954133396
ddbjenv1.DAD            679488           140528715    1468008520
ddbjenv2.DAD            216886            40753011     407616517
ddbjest1.DAD              1163              153762       2557007
ddbjgss1.DAD              3137              962078       7898298
ddbjhtc1.DAD            121000            37253340     435647013
ddbjhtg1.DAD             64490            17672393     262977659
ddbjhum1.DAD            631233           184033709    1468006805
ddbjhum2.DAD            163007            43068341     369934795
ddbjinv1.DAD            591497           179125854    1468006995
ddbjinv2.DAD            701270           181555900    1468007286
ddbjinv3.DAD            710050           152934817    1468007288
ddbjinv4.DAD            661996           132291237    1468006610
ddbjinv5.DAD            637015           135537736    1468006964
ddbjinv6.DAD            154927            89544117     339109124
ddbjmam1.DAD            306777            78163864     638404983
ddbjpat1.DAD            391930           164228016     582290210
ddbjphg1.DAD            468194            99119063     977251535
ddbjpln1.DAD            457641           161409472    1468007915
ddbjpln2.DAD            440485           170400356    1468008261
ddbjpln3.DAD            433862           213001063    1468008413
ddbjpln4.DAD            660583           200188365    1468006511
ddbjpln5.DAD            759007           176503316    1468007125
ddbjpln6.DAD            730875           205047125    1454877567
ddbjpri1.DAD             87373            20714054     187259849
ddbjrod1.DAD            229374            70468231     560315631
ddbjsts1.DAD                 9                 812         22011
ddbjsyn1.DAD            245075            85320390     604553355
ddbjtpa1.DAD             64021            25579180     198379924
ddbjtpacon1.DAD          42523            21795983     179418237
ddbjtsa1.DAD            121679            49825316     321844710
ddbjuna1.DAD               229               39813        387792
ddbjvrl1.DAD            668175           212586128    1468007028
ddbjvrl2.DAD            695411           212969149    1468009122
ddbjvrl3.DAD            640411           204540183    1468015924
ddbjvrl4.DAD            614629           230458404    1468007491
ddbjvrl5.DAD            543715           210991444    1298420544
ddbjvrt1.DAD            706987           173992113    1468008080
ddbjvrt2.DAD            577842           128550172    1135002640
=========================================================================
Total                 63670485         19963617826  207866704744
=========================================================================
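
The relation between the per-file rows, the Total line and the "average
length" figure can be checked with a short Python sketch. The names FileStat,
parse_stat_rows and summarize are hypothetical, and the sketch assumes that
each data row consists of four blank-separated fields with a file name ending
in ".DAD". The average is computed by integer division, which reproduces the
reported value: 19,963,617,826 divided by 63,670,485 is about 313.5, so the
figure of 313 above appears to be truncated rather than rounded.

```python
from typing import Iterable, List, NamedTuple, Tuple


class FileStat(NamedTuple):
    name: str        # file name, e.g. "ddbjest1.DAD"
    entries: int     # no. of entries
    residues: int    # no. of amino acids
    file_size: int   # file size as listed in the table


def parse_stat_rows(lines: Iterable[str]) -> List[FileStat]:
    """Collect the per-file rows; rulers, headers and the Total line are skipped."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) == 4 and fields[0].endswith(".DAD"):
            name, entries, residues, size = fields
            rows.append(FileStat(name, int(entries), int(residues), int(size)))
    return rows


def summarize(rows: Iterable[FileStat]) -> Tuple[int, int, int, int]:
    """Return (entries, residues, file size, average length) over the rows."""
    rows = list(rows)
    entries = sum(r.entries for r in rows)
    residues = sum(r.residues for r in rows)
    size = sum(r.file_size for r in rows)
    # Integer division, matching the truncated average in the statistics.
    average = residues // entries if entries else 0
    return entries, residues, size, average


if __name__ == "__main__":
    excerpt = [
        "ddbjest1.DAD              1163              153762       2557007",
        "ddbjgss1.DAD              3137              962078       7898298",
        "ddbjsts1.DAD                 9                 812         22011",
    ]
    print(summarize(parse_stat_rows(excerpt)))
    print(19963617826 // 63670485)  # average length over the whole release
```

Run over the complete table, the first three values returned by summarize can
be compared with the Total line.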