DDBJ Amino Acid Sequence Database (DAD)
Release 80.0, Sep. 2017, including
66,795,627 entries, 20,913,963,738 residues

Last published date in the present release: August 25, 2017
-------------------------------------------------------------------------------
Table of contents
-------------------------------------------------------------------------------
 1. Introduction
    1.1. Announcement for changes in the present release
    1.2. Announcement for the forthcoming changes
 2. Format of DAD entries
 3. DAD categories
 4. Citation
 5. Contact information
 6. Disclaimer
 7. DAD file categories
 8. A sample of DAD entries
 9. Release history
10. Statistics of DAD
-------------------------------------------------------------------------------

1. Introduction

This is release 80.0 of the DDBJ Amino Acid Sequence Database (DAD). The
database was produced by extracting all translated sequences from the
conventional sequence data of DDBJ periodical release 110.0 and from the
TPA data set (August 2017).

1.1. Announcement for changes in the present release

Nothing in particular.

1.2. Announcement for the forthcoming changes

Nothing in particular.

2. Format of DAD entries

The standard format of DAD is almost the same as that of the DDBJ
nucleotide sequence database, except for the points described below.

The accession number of a DAD entry is written on the line labeled
"ACCESSION". A DAD accession number consists of a DDBJ accession number
and an integer starting from 1, joined by a hyphen (-). For example, two
amino acid sequences extracted from the DDBJ entry D12345 have the
accession numbers D12345-1 and D12345-2, respectively. The number is
useful for identifying a DAD entry.

The amino acid sequence starts on the line following "BEGIN". Up to sixty
amino acids are written per line. The sequence is followed by a double
slash (//), which marks the end of the entry.
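The accession and sequence conventions above can be handled with a few lines
of code. The following Python sketch is illustrative only: the function names
and the accession pattern are assumptions made for this example, not part of
any DDBJ tool, and the sequence reader assumes that each sequence line starts
with a numeric position counter, as in the sample entry of section 8.

```python
import re
from typing import Iterable, Tuple

# Assumed shape: "<DDBJ accession>-<integer starting from 1>", e.g. "D12345-1".
_DAD_ACCESSION = re.compile(r"^(?P<parent>[A-Za-z0-9_]+)-(?P<index>[1-9][0-9]*)$")


def split_dad_accession(accession: str) -> Tuple[str, int]:
    """Split a DAD accession into the DDBJ accession and the serial number."""
    match = _DAD_ACCESSION.match(accession.strip())
    if match is None:
        raise ValueError(f"not a DAD accession: {accession!r}")
    return match.group("parent"), int(match.group("index"))


def read_sequence(lines: Iterable[str]) -> str:
    """Collect the residues between the BEGIN line and the // terminator."""
    residues = []
    in_sequence = False
    for line in lines:
        stripped = line.strip()
        if stripped == "//":
            break  # end of the entry
        if in_sequence:
            # Drop the numeric position counter; keep the residue blocks.
            residues.extend(tok for tok in stripped.split() if not tok.isdigit())
        elif stripped == "BEGIN":
            in_sequence = True
    return "".join(residues)
```

For example, split_dad_accession("D12345-2") yields the pair ("D12345", 2),
and read_sequence applied to the lines of one entry returns the residues as
a single string without counters or blanks.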
The LOCUS line contains the locus name, the length of the protein, the
molecular type (always "PRT"), the division name, and the release date of
the DNA counterpart.

The DEFINITION line contains the species name and the protein name.

The other parts of a DAD entry, including FEATURES, are almost the same as
those of the corresponding DDBJ entry.

3. DAD categories

DAD entries are classified into 23 categories: the 21 divisions of the
conventional sequence data of the DDBJ periodical release, plus TPA and
TPACON. Please refer to the release note of the DDBJ release for details
(filename: ddbjrel.txt).

There are two types of DAD files for each division: files with the suffix
".DAD" in the DAD standard format, and files with the suffix ".DAD.fasta"
in a FASTA-compatible format.

[DDBJ release note]
ftp://ftp.ddbj.nig.ac.jp/ddbj_database/ddbj/ddbjrel.txt

4. Citation

When you use DAD in your research, we would appreciate it if you would
include a reference to DDBJ in the related publications.

When citing an entry in the DAD database, it is appropriate to give the
protein_id and its accession number. We also recommend citing the first
publication listed under REFERENCE in the entry, other than the submitter
information.

DDBJ suggests that authors add a reference to DDBJ itself. The following
publication, which describes the recent activities of the DDBJ center, is
appropriate to cite:

  Mashima J, Kodama Y, Fujisawa T, Katayama T, Okuda Y, Kaminuma E,
  Ogasawara O, Okubo K, Nakamura Y and Takagi T.
  DNA Data Bank of Japan.
  Nucleic Acids Res. 45, D25-D31 (2017)
  DOI: 10.1093/nar/gkw1001

The following sentence is an example of how to cite an entry in the DAD
database:

-----------------------------------------------------------------------------
"We searched the DAD database (1) by sequence similarities and found an
amino acid sequence (2), with protein_id BAA22986.1 in DDBJ accession
number AB000714, which had significant similarity with ..."

(1) Mashima, J. et al., Nucleic Acids Res. 45, D25-D31 (2017).
(2) Katahira, J. et al., J. Biol. Chem. 272, 26652-26658 (1997).
-----------------------------------------------------------------------------

5. Contact information

DNA Data Bank of Japan
DDBJ Center
National Institute of Genetics
Research Organization of Information and Systems
Mishima 411-8540, Japan
Phone:  +81 55 981 6853
FAX:    +81 55 981 6849
E-mail: ddbj@ddbj.nig.ac.jp
WWW:    http://www.ddbj.nig.ac.jp/

6. Disclaimer

While DDBJ endeavors to keep its data correct, DDBJ makes no
representations or warranties of any kind about the completeness, accuracy
or reliability of the entries contained in the DAD periodical release.
DDBJ also assumes no legal liability or responsibility regarding
merchantability or fitness for a particular purpose, and does not warrant
that the use of the sequence data will not infringe any patent or other
rights. Any receipt of, reliance on, or use of such data is therefore
strictly at your own risk.

7. DAD file categories

This release covers the following 23 categories of organisms and other
data types (see also '3. DAD categories'):
------------------------------------------------------------------------------
ddbjbct;    Category for bacteria
ddbjcon;    Category for CON (contigs)
ddbjenv;    Category for ENV (environmental samples)
ddbjest;    Category for EST (expressed sequence tags)
ddbjgss;    Category for GSS (genome survey sequences)
ddbjhtc;    Category for HTC (high throughput cDNA sequences)
ddbjhtg;    Category for HTG (high throughput genomic sequences)
ddbjhum;    Category for human
ddbjinv;    Category for invertebrates
ddbjmam;    Category for mammals other than primates and rodents
ddbjpat;    Category for patents
ddbjphg;    Category for phages
ddbjpln;    Category for plants
ddbjpri;    Category for primates other than human
ddbjrod;    Category for rodents
ddbjsts;    Category for STS (sequence tagged sites)
ddbjsyn;    Category for synthetic DNAs
ddbjtpa;    Category for TPA (third party annotations)
ddbjtpacon; Category for CON (contigs) of TPA (third party annotations)
ddbjtsa;    Category for TSA (transcriptome shotgun assemblies)
ddbjuna;    Category for unannotated sequences
ddbjvrl;    Category for viruses
ddbjvrt;    Category for vertebrates other than mammals
------------------------------------------------------------------------------

In the present release, each category is stored in files named
ddbj***##.DAD. The number of files per category is as follows.

file prefix    number of files
-------------------------------
ddbjbct        80
ddbjcon        45
ddbjenv         2
ddbjest         1
ddbjgss         1
ddbjhtc         1
ddbjhtg         1
ddbjhum         2
ddbjinv         6
ddbjmam         1
ddbjpat         1
ddbjphg         1
ddbjpln         7
ddbjpri         1
ddbjrod         1
ddbjsts         1
ddbjsyn         1
ddbjtpa         1
ddbjtpacon      1
ddbjtsa         1
ddbjuna         1
ddbjvrl         6
ddbjvrt         2
-------------------------------

8. A sample of DAD entries

Below is a typical DAD entry, which may help in understanding the format
and contents.

----- ----- ----- ----- sample begin ----- ----- ----- -----
LOCUS       BAA22986.1               220 aa    PRT              HUM 28-OCT-1997
DEFINITION  Homo sapiens RVP1 protein.
ACCESSION   AB000714-1
PROTEIN_ID  BAA22986.1
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryotae; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
            Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1250)
  AUTHORS   Katahira,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (26-JAN-1997) to the DDBJ/EMBL/GenBank databases.
            Contact:Jun Katahira
            Institute for Microbial Diseases, Osaka University, Department of
            Bacterial Toxinology; 3-1, Yamadaoka, Suita, Osaka 565, Japan
REFERENCE   2
  AUTHORS   Katahira,J., Sugiyama,H., Inoue,N., Horiguchi,Y., Matsuda,M. and
            Sugimoto,N.
  TITLE     Clostridium perfringens enterotoxin utilizes two structurally
            related membrane proteins as functional receptors in vivo
  JOURNAL   J. Biol. Chem. 272, 26652-26658 (1997)
COMMENT
FEATURES             Qualifiers
     source          /db_xref="H-InvDB:HIT000057926"
                     /mol_type="mRNA"
                     /organism="Homo sapiens"
                     /tissue_lib="lung"
     protein         /gene="hRVP1"
                     /transl_table=1
BEGIN
        1 MSMGLEITGT ALAVLGWLGT IVCCALPMWR VSAFIGSNII TSQNIWEGLW MNCVVQSTGQ
       61 MQCKVYDSLL ALPQDLQAAR ALIVVAILLA AFGLLVALVG AQCTNCVQDD TAKAKITIVA
      121 GVLFLLAALL TLVPVSWSAN TIIRDFYNPV VPEAQKREMG AGLYVGWAAA ALQLLGGALL
      181 CCSCPPREKK YTATKVVYSA PRSTGPGASL GTGYDRKDYV
//
----- ----- ----- ----- sample end ----- ----- ----- -----

9. Release history

------------------
 Since release 50
------------------
The format of the SOURCE line in DAD flat files has been changed.

As a result of this change, 1) the order of the organism name and the
organelle name is reversed, and 2) some DAD flat files now include a
common name, as in GenBank flat files. The change is shown below in
detail.

----------------
 Old (-rel. 49)
----------------
Format:  SOURCE      <organism name> [<organelle name>]
Example: SOURCE      Homo sapiens mitochondrion

----------------
 New (rel. 50-)
----------------
Format:  SOURCE      [<organelle name>] <organism name> [(<common name>)]
Example: SOURCE      mitochondrion Homo sapiens (human)

See also '8. A sample of DAD entries'.
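The SOURCE layout used since release 50 (organelle name first, then the
organism name, then an optional common name in parentheses) could be split
into its parts roughly as follows. This Python sketch is only an illustration:
the list of organelle names is a hypothetical, incomplete assumption, the
function and class names are invented for this example, and an organism name
that itself ends in a parenthesized term would be misread as having a common
name.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical, incomplete list of organelle prefixes; extend it as needed.
KNOWN_ORGANELLES = ("mitochondrion", "chloroplast", "plastid")


class Source(NamedTuple):
    organelle: Optional[str]
    organism: str
    common_name: Optional[str]


def parse_source(line: str) -> Source:
    """Split a release-50-style SOURCE line into its three parts."""
    text = line.strip()
    if text.startswith("SOURCE"):
        text = text[len("SOURCE"):].strip()

    # An optional common name is the final parenthesized term.
    common_name = None
    match = re.search(r"\s*\(([^()]*)\)$", text)
    if match:
        common_name = match.group(1)
        text = text[:match.start()]

    # An optional organelle name precedes the organism name.
    organelle = None
    for name in KNOWN_ORGANELLES:
        if text.startswith(name + " "):
            organelle = name
            text = text[len(name):].strip()
            break

    return Source(organelle, text, common_name)
```

Applied to the new-format example line, parse_source returns the organelle
"mitochondrion", the organism "Homo sapiens", and the common name "human".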
------------------
 Since release 45
------------------
A new division, TSA (Transcriptome Shotgun Assembly), has been started.

A new division for assembled mRNA sequences, Transcriptome Shotgun
Assembly (TSA), is included in the present release. With new sequencing
technologies, INSDC has faced many requests to accept assembled EST
sequences. These sequence data have become more useful than they used to
be, although the assemblies may be incorrect or may not exist in nature.
Therefore, INSDC decided to collect assembled EST sequences into the new
division 'TSA'.

TSA sequences are shotgun assemblies of primary sequences deposited in the
EST division of INSDC, the Trace Archive (TA) or the Short-Read Archive
(SRA).

Two specific keywords, "TSA" and "Transcriptome Shotgun Assembly", are
present in all TSA entries. The new division code, "TSA", also appears in
the LOCUS line of all TSA entries.

No format changes are anticipated for this new division. Note, however,
that TSA entries use the same PRIMARY line that is described for entries
in the TPA category. The PRIMARY block contains references to the
underlying reads/transcripts that were assembled to construct a TSA
record.

------------------
 Since release 42
------------------
Deletion of e-mail addresses, phone numbers and fax numbers from DAD flat
files

To comply with the Japanese law on the protection of personal information,
DDBJ has deleted phone numbers, fax numbers, and e-mail addresses from the
flat files of entries submitted to DDBJ. This also helps protect DAD
releases against spam senders. DDBJ retrofitted most of the entries
submitted to DDBJ (not those submitted to GenBank or EMBL) by DDBJ
periodical release 72.

Before DAD periodical release 42, the submitter information was described
in the JOURNAL line of REFERENCE 1 as follows:
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Taro Mishima, DNA Data Bank of Japan, National Institute of
            Genetics; 1111, Yata, Mishima, Shizuoka 411-8540, Japan
            (E-mail:ddbj@ddbj.nig.ac.jp, URL:http://www.ddbj.nig.ac.jp/,
            Tel:81-12-345-6789, Fax:81-12-345-9876)
-------------------------------------------------------------------------------

After the deletion of the information in question, a DAD flat file takes
one of the following two forms.

Type 1: Phone and fax numbers and e-mail address are deleted.
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
-------------------------------------------------------------------------------

Type 2: When the submitters wish to keep their contact information
disclosed, it is described as follows.
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
            E-mail :ddbj@ddbj.nig.ac.jp
            Phone  :81-12-345-6789
            Fax    :81-12-345-9876
-------------------------------------------------------------------------------

------------------
 Since release 40
------------------
The CON division has been included.

CON; Contig / Constructed
To join a series of entries, such as those submitted from a genome
project, each of the three data banks constructs an entry and assigns an
accession number to the large-scale sequence dataset.
Such entries are classified into the CON division.

------------------
 Since release 38
------------------
As of this release, the maximum file size was changed to 1.5 GB, because
network capacity has increased remarkably. Each file named ddbj***##.DAD
is at most 1.5 GB in size. See also '10. Statistics of DAD'.

------------------
 Since release 32
------------------
Introduction of the ENV division:

Recently, submissions of sequences derived from environmental samples have
rapidly increased. To accommodate such submissions, a new division, ENV,
has been created. This division contains sequences obtained via direct
molecular isolation such as PCR, DGGE, or any anonymous method.

In the past, sequences derived from environmental samples belonged to
taxonomic divisions, mainly BCT. At DDBJ, the retrofit to transfer the
relevant entries from taxonomic divisions to the ENV division starts in
the present release and ends by the next periodical release. Please note
that during this transitional period, some entries that will eventually be
placed in the ENV division will still be found in other divisions.

------------------
 Since release 30
------------------
"H-InvDB" has been added to db_xref (cross-reference) as a qualifier key.
The following is an example.

FEATURES             Location/Qualifiers
     source          1..5589
                     /clone="hf00223s1"
                     /clone_lib="pBluescriptII SK plus"
                     /db_xref="H-InvDB:HIT000000001"

------------------
 Since release 29
------------------
The GSS division has been included since release 29. GSS stands for
Genome Survey Sequence, which is similar to EST, except that GSS is
genomic DNA whereas EST is cDNA.

------------------
 Since release 21
------------------
1) Some information on introns has been added. It is given as
   "intron_pos" in the Feature/Qualifiers.

   Examples:
   intron_pos 142:1 (2/12)
       means that the 2nd intron among 12 in total is located between the
       1st and 2nd bases of the 142nd codon (amino acid residue).
   intron_pos 228:0 (4/12)
       means that the 4th intron among 12 in total is located between the
       227th and 228th codons (between the 3rd base of the 227th codon and
       the 1st base of the 228th codon).

2) The LOCUS line has been changed. The following is an example and its
   explanation:

LOCUS       BAA21794.1               263 aa    PRT              BCT 05-FEB-1999

   Positions  Contents
   ---------  --------
   01-05      'LOCUS'
   06-12      spaces
   13-28      Locus name
   29-29      space
   30-40      Length of sequence, right-justified
   41-41      space
   42-43      'aa'
   44-47      spaces
   48-53      'PRT'
   54-64      spaces
   65-67      Division code
   68-68      space
   69-79      Date, in the form DD-MMM-YYYY (e.g., 15-MAR-1991)
   ---------------------

3) TPA data have been provided in a separate file (ddbjtpa.DAD).

10. Statistics of DAD

The following are the statistics of this release of DAD.

total number of entries       66,795,627
total length of sequences     20,913,963,738
average length                313
name of longest sequence      CP000108-608  PID:ABB27887.1
length of longest sequence    36,805 aa (CP000108-608)

=========================================================================
file name        no. of entries  no. of amino acids      file size
=========================================================================
ddbjbct1.DAD             340391           102902443     1468006831
ddbjbct2.DAD             493075           149974129     1468008285
ddbjbct3.DAD             588899           186342986     1468008184
ddbjbct4.DAD             624909           193210988     1468007283
ddbjbct5.DAD             523270           162063343     1468009610
ddbjbct6.DAD             444168           142015348     1468009787
ddbjbct7.DAD             436058           136810482     1468008272
ddbjbct8.DAD             437319           136330553     1468009590
ddbjbct9.DAD             485688           148619418     1468010141
ddbjbct10.DAD            327795           106840275     1468007271
ddbjbct11.DAD            380733           119503203     1468008607
ddbjbct12.DAD            401155           126208849     1468009685
ddbjbct13.DAD            347367           110644419     1468008551
ddbjbct14.DAD            380688           121491419     1468007113
ddbjbct15.DAD            467416           147201361     1468007059
ddbjbct16.DAD            452312           141346842     1468011123
ddbjbct17.DAD            387284           123241935     1468011388
ddbjbct18.DAD            548373           171174183     1468008117
ddbjbct19.DAD            556103           174048588     1468009384
ddbjbct20.DAD            453194           139319476     1468008883
ddbjbct21.DAD            474505           145906132     1468009755
ddbjbct22.DAD            385211           119877886     1468007047
ddbjbct23.DAD            350605           108548159     1468010060
ddbjbct24.DAD            269369            83460544     1468011613
ddbjbct25.DAD            278445            86257129     1468010504
ddbjbct26.DAD            348762           107937613     1468009717
ddbjbct27.DAD            460094           141405905     1468006702
ddbjbct28.DAD            424891           134238004     1468009221
ddbjbct29.DAD            430698           136275869     1468007601
ddbjbct30.DAD            414247           134431629     1468009893
ddbjbct31.DAD            454552           141138930     1468008675
ddbjbct32.DAD            455895           140440413     1468008532
ddbjbct33.DAD            423141           130938567     1468008684
ddbjbct34.DAD            379829           116691956     1468009986
ddbjbct35.DAD            424458           130946137     1468009187
ddbjbct36.DAD            399175           124670441     1468006545
ddbjbct37.DAD            412203           129135195     1468010630
ddbjbct38.DAD            419314           132960210     1468009430
ddbjbct39.DAD            412296           131655391     1468007851
ddbjbct40.DAD            380793           121018320     1468009146
ddbjbct41.DAD            397478           124095258     1468009380
ddbjbct42.DAD            411321           129839796     1468009147
ddbjbct43.DAD            409036           126095406     1468008107
ddbjbct44.DAD            394443           121432386     1468009667
ddbjbct45.DAD            421560           133067664     1468006476
ddbjbct46.DAD            456543           143849868     1468008876
ddbjbct47.DAD            443304           139503758     1468009293
ddbjbct48.DAD            390766           121859185     1468009427
ddbjbct49.DAD            340687           106891183     1468009136
ddbjbct50.DAD            352907           109383240     1468010134
ddbjbct51.DAD            349532           107977620     1468006413
ddbjbct52.DAD            359816           111706774     1468007408
ddbjbct53.DAD            380697           118489909     1468006597
ddbjbct54.DAD            364695           114483772     1468011065
ddbjbct55.DAD            371626           116348615     1468007114
ddbjbct56.DAD            379217           120626094     1468006919
ddbjbct57.DAD            381071           120949016     1468008383
ddbjbct58.DAD            362689           112811467     1468006879
ddbjbct59.DAD            375512           113631180     1468007693
ddbjbct60.DAD            369248           115430005     1468009162
ddbjbct61.DAD            365055           114510639     1468007658
ddbjbct62.DAD            384088           120221821     1468009521
ddbjbct63.DAD            353688           110144854     1468007324
ddbjbct64.DAD            333652           100024640     1468009044
ddbjbct65.DAD            356940           109416642     1468010303
ddbjbct66.DAD            337997           102484451     1468007723
ddbjbct67.DAD            344673           107472106     1468012842
ddbjbct68.DAD            349418           109984721     1468008651
ddbjbct69.DAD            355770           107785746     1468008438
ddbjbct70.DAD            363368           110935328     1468008003
ddbjbct71.DAD            361222           112711055     1468006647
ddbjbct72.DAD            393431           120631783     1468006640
ddbjbct73.DAD            350072           107194309     1468008266
ddbjbct74.DAD            336548           105445859     1468008618
ddbjbct75.DAD            387201           121772930     1468007580
ddbjbct76.DAD            639788           187930111     1468008402
ddbjbct77.DAD            683872           207737303     1468007898
ddbjbct78.DAD            755983           196862346     1468007101
ddbjbct79.DAD            849125           266639617     1468006952
ddbjbct80.DAD            864927           279301738     1438312401
ddbjcon1.DAD             212056            93055116     1468013910
ddbjcon2.DAD             278762           116167370     1468012380
ddbjcon3.DAD             180912            95685792     1468013814
ddbjcon4.DAD             324496           140176152     1468008684
ddbjcon5.DAD             322320           119087762     1468006508
ddbjcon6.DAD             335690           145224477     1468006836
ddbjcon7.DAD             452840           192257921     1468008157
ddbjcon8.DAD             516926           208830272     1468009013
ddbjcon9.DAD             488664           182840879     1468006866
ddbjcon10.DAD            391643            76775571     1468009770
ddbjcon11.DAD            370653            64720984     1468008651
ddbjcon12.DAD            370635            64745381     1468009738
ddbjcon13.DAD            370657            64746525     1468007366
ddbjcon14.DAD            370780            64618429     1468010137
ddbjcon15.DAD            370634            64694191     1468008053
ddbjcon16.DAD            371052            64340537     1468007481
ddbjcon17.DAD            370666            64099349     1468008850
ddbjcon18.DAD            366652            75345195     1468007451
ddbjcon19.DAD            366055            76913363     1468010157
ddbjcon20.DAD            366498            73048104     1468009224
ddbjcon21.DAD            365611            76981961     1468008259
ddbjcon22.DAD            366492            75757182     1468009739
ddbjcon23.DAD            361809            85811662     1468009377
ddbjcon24.DAD            361112            87684179     1468006901
ddbjcon25.DAD            361451            85034603     1468008370
ddbjcon26.DAD            396481           112112008     1468008519
ddbjcon27.DAD            471239           183624743     1468008012
ddbjcon28.DAD            409166           151849506     1468008722
ddbjcon29.DAD            385236           162372982     1468008332
ddbjcon30.DAD            489857           196302208     1468008480
ddbjcon31.DAD            352660           146181311     1468009725
ddbjcon32.DAD            330951           139932582     1468006955
ddbjcon33.DAD            381455           160233396     1468007393
ddbjcon34.DAD            440348           193115459     1468007919
ddbjcon35.DAD            465370           193229096     1468008570
ddbjcon36.DAD            401168           188226443     1468009113
ddbjcon37.DAD            495427           198697173     1468008282
ddbjcon38.DAD            470776           198106619     1468007880
ddbjcon39.DAD            383456           147579913     1468011127
ddbjcon40.DAD            384709           139608564     1468007083
ddbjcon41.DAD            453239           210617574     1468008832
ddbjcon42.DAD            422354           174503161     1468008078
ddbjcon43.DAD            371881           159517790     1468006776
ddbjcon44.DAD            355461           156139248     1468007032
ddbjcon45.DAD            312016           129792839      969101970
ddbjenv1.DAD             672812           140793028     1468007095
ddbjenv2.DAD             257461            48342062      483379454
ddbjest1.DAD               1163              153762        2540789
ddbjgss1.DAD               3137              962078        7898298
ddbjhtc1.DAD             121231            37327305      436126314
ddbjhtg1.DAD              64497            17675817      262992500
ddbjhum1.DAD             631238           184034762     1468008410
ddbjhum2.DAD             214546            57960498      518624167
ddbjinv1.DAD             591502           179138736     1468007527
ddbjinv2.DAD             701150           181525119     1468006773
ddbjinv3.DAD             709962           152925577     1468007502
ddbjinv4.DAD             662050           132306592     1468008015
ddbjinv5.DAD             636902           133815048     1468008204
ddbjinv6.DAD             230432           107420492      497848313
ddbjmam1.DAD             311606            79396319      648931542
ddbjpat1.DAD             392001           164266855      582399547
ddbjphg1.DAD             495117           104429265     1036728987
ddbjpln1.DAD             457096           161235769     1468006483
ddbjpln2.DAD             438291           169202169     1468009321
ddbjpln3.DAD             434690           213380308     1468009869
ddbjpln4.DAD             658089           199290405     1468008233
ddbjpln5.DAD             757004           176849120     1468008267
ddbjpln6.DAD             746766           194567645     1468007779
ddbjpln7.DAD             108993            42825880      222596489
ddbjpri1.DAD              88998            21147634      190699069
ddbjrod1.DAD             232152            71310163      565953680
ddbjsts1.DAD                  9                 812          22011
ddbjsyn1.DAD             246150            85752546      606754043
ddbjtpa1.DAD              64049            25601689      198497851
ddbjtpacon1.DAD           42523            21795983      179418237
ddbjtsa1.DAD             121680            49825470      321841979
ddbjuna1.DAD                230               39999         390947
ddbjvrl1.DAD             667923           212959976     1468007015
ddbjvrl2.DAD             691169           213240164     1468008237
ddbjvrl3.DAD             638054           205973961     1468007782
ddbjvrl4.DAD             620040           228061608     1468007242
ddbjvrl5.DAD             613992           240147024     1468007971
ddbjvrl6.DAD              17684             4468197       36164358
ddbjvrt1.DAD             707012           174063976     1468008456
ddbjvrt2.DAD             604264           134459488     1187766506
=========================================================================
Total                  66795627         20913963738   218852202985
=========================================================================
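
The per-file statistics can be cross-checked against the totals with a short
script. The following Python sketch is illustrative only: the function names
are invented for this example, the row pattern is an assumption based on the
file names listed above, and the average is truncated to a whole number
because this release note does not state how the reported average is rounded.

```python
import re
from typing import Dict, Tuple

# Matches "<file name> <entries> <amino acids> <file size>" regardless of the
# spacing or line breaks between the four fields.
ROW = re.compile(r"(ddbj[a-z]+\d+\.DAD)\s+(\d+)\s+(\d+)\s+(\d+)")

Stats = Tuple[int, int, int]


def parse_statistics(text: str) -> Dict[str, Stats]:
    """Map each file name to (entries, amino acids, file size)."""
    return {
        name: (int(entries), int(residues), int(size))
        for name, entries, residues, size in ROW.findall(text)
    }


def column_totals(rows: Dict[str, Stats]) -> Stats:
    """Sum each numeric column over all files."""
    entries = sum(row[0] for row in rows.values())
    residues = sum(row[1] for row in rows.values())
    size = sum(row[2] for row in rows.values())
    return entries, residues, size


def average_length(total_residues: int, total_entries: int) -> int:
    """Average sequence length, truncated to a whole number of residues."""
    return total_residues // total_entries
```

With the totals of this release, average_length(20913963738, 66795627) gives
313, which agrees with the average length reported above.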