DDBJ Amino Acid Sequence Database (DAD)
Release 77.0, Dec. 2016, including
56,608,385 entries, 17,718,176,559 residues

Last published date in the present release: November 25, 2016

-------------------------------------------------------------------------------
Table of contents
-------------------------------------------------------------------------------
 1. Introduction
    1.1. Announcement for changes in the present release
    1.2. Announcement for the forthcoming changes
 2. Format of DAD entries
 3. DAD categories
 4. Citation
 5. Contact information
 6. Disclaimer
 7. DAD file categories
 8. A sample of DAD entries
 9. Release history
10. Statistics of DAD
-------------------------------------------------------------------------------

1. Introduction

This is release 77.0 of the DDBJ Amino Acid Sequence Database (DAD).
This database has been produced by extracting all translated sequences
from the conventional sequence data of DDBJ periodical release 107.0 and
the TPA data set (November 2016).

1.1. Announcement for changes in the present release

Nothing in particular.

1.2. Announcement for the forthcoming changes

Nothing in particular.

2. Format of DAD entries

The standard format of DAD is almost the same as that of the DDBJ
nucleotide sequence database, except for the points described below.

Accession numbers of DAD entries are written on the lines labeled
"ACCESSION". A DAD accession number consists of a DDBJ accession number
and an integer starting from 1, joined by a hyphen (-). For example, two
amino acid sequences extracted from the DDBJ entry D12345 have the
accession numbers D12345-1 and D12345-2, respectively. The number is
useful for identifying a DAD entry.

The amino acid sequence begins on the line following "BEGIN". Up to sixty
amino acids are written on one line. The amino acid sequence is followed
by a double slash (//), which marks the end of the entry.
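
The accession and sequence rules above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not an
official DDBJ tool: the function names split_dad_accession and
read_sequence are hypothetical, and the handling of a leading position
number and blank-separated blocks on sequence lines is assumed from the
sample entry in section 8.

```python
def split_dad_accession(accession):
    """Split a DAD accession such as 'D12345-2' into ('D12345', 2).

    Assumes the form described in section 2: a DDBJ accession number,
    a hyphen, and an integer that starts from 1.
    """
    ddbj, sep, index = accession.strip().rpartition("-")
    if not sep or not ddbj or not index.isdigit() or int(index) < 1:
        raise ValueError("not a DAD accession number: %r" % accession)
    return ddbj, int(index)


def read_sequence(lines):
    """Return the residues found between the BEGIN line and the // line.

    Assumption: sequence lines may carry a leading position number and
    blanks between blocks of residues, so only letters are kept.
    """
    residues = []
    in_sequence = False
    for line in lines:
        stripped = line.strip()
        if stripped == "//":
            break  # end of the entry
        if in_sequence:
            residues.append("".join(ch for ch in stripped if ch.isalpha()))
        elif stripped == "BEGIN":
            in_sequence = True  # the sequence starts on the next line
    return "".join(residues)
```

For example, split_dad_accession("D12345-1") yields the DDBJ accession
"D12345" and the index 1, and read_sequence can be fed the lines of a
single entry read from a ".DAD" file.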

The LOCUS line contains the locus name, the length of the protein, the
molecular type (always "PRT"), the division name, and the release date of
the DNA counterpart. The DEFINITION line contains the species name and
the protein name. The other parts of a DAD entry, including FEATURES, are
almost the same as those of the corresponding DDBJ entry.

3. DAD categories

DAD entries are classified into 23 categories: the 21 divisions of the
conventional sequence data of the DDBJ periodical release, plus TPA and
TPACON. Please refer to the release note of the DDBJ release for details
(filename: ddbjrel.txt).
Each division also comes in two file types: files with the suffix ".DAD"
in the DAD standard format, and files with the suffix ".DAD.fasta" in a
FASTA-compatible format.

[DDBJ release note]
ftp://ftp.ddbj.nig.ac.jp/ddbj_database/ddbj/ddbjrel.txt

4. Citation

If you use DAD in your research, we would appreciate a reference to DDBJ
in the resulting publications.
When citing an entry in the DAD database, it is appropriate to give the
protein_id and its accession number. We also recommend citing the first
publication listed under REFERENCE in the entry, other than the submitter
information.

DDBJ suggests that authors add a reference to DDBJ itself. The following
publication, which describes the recent activities of the DDBJ Center,
would be appropriate to cite:

  Mashima J, Kodama Y, Kosuge T, Fujisawa T, Katayama T, Nagasaki H,
  Okuda Y, Kaminuma E, Ogasawara O, Okubo K, Nakamura Y and Takagi T.
  DNA data bank of Japan (DDBJ) progress report.
  Nucleic Acids Res. 44 (Database issue), D51-D57 (2016)
  DOI: 10.1093/nar/gkv1105

The following sentence is an example of how to cite an entry in the DAD
database:
-----------------------------------------------------------------------------
"We searched the DAD database (1) by sequence similarities and found
an amino acid sequence (2), with protein_id BAA22986.1 in DDBJ accession
number AB000714, which had significant similarity with ..."

(1) Mashima, J. et al., Nucleic Acids Res. 44 (Database issue),
    D51-D57 (2016).
(2) Katahira, J. et al., J. Biol. Chem. 272, 26652-26658 (1997).
-----------------------------------------------------------------------------

5. Contact information

  DNA Data Bank of Japan
  DDBJ Center
  National Institute of Genetics
  Research Organization of Information and Systems
  Mishima 411-8540, Japan

  Phone:  +81 55 981 6853
  FAX:    +81 55 981 6849
  E-mail: ddbj@ddbj.nig.ac.jp
  WWW:    http://www.ddbj.nig.ac.jp/

6. Disclaimer

While DDBJ endeavors to keep its data correct, DDBJ makes no
representations or warranties of any kind about the completeness,
accuracy or reliability of the entries contained in the DAD periodical
release. DDBJ also assumes no legal liability or responsibility regarding
merchantability or fitness for a particular purpose, or that the use of
the sequence data will not infringe any patent or other rights. Any
receipt of, reliance on, or use of such data is therefore strictly at
your own risk.

7. DAD file categories

This release covers 23 categories (see also '3. DAD categories')
of organisms and others as follows:
------------------------------------------------------------------------------
ddbjbct;     Category for bacteria
ddbjcon;     Category for CON (contigs)
ddbjenv;     Category for ENV (environmental samples)
ddbjest;     Category for EST (expressed sequence tags)
ddbjgss;     Category for GSS (genome survey sequences)
ddbjhtc;     Category for HTC (high throughput cDNA sequences)
ddbjhtg;     Category for HTG (high throughput genomic sequences)
ddbjhum;     Category for human
ddbjinv;     Category for invertebrates
ddbjmam;     Category for mammals other than primates and rodents
ddbjpat;     Category for patents
ddbjphg;     Category for phages
ddbjpln;     Category for plants
ddbjpri;     Category for primates other than human
ddbjrod;     Category for rodents
ddbjsts;     Category for STS (sequence tagged sites)
ddbjsyn;     Category for synthetic DNAs
ddbjtpa;     Category for TPA (third party annotations)
ddbjtpacon;  Category for CON (contigs) of TPA (third party annotations)
ddbjtsa;     Category for TSA (transcriptome shotgun assemblies)
ddbjuna;     Category for unannotated sequences
ddbjvrl;     Category for viruses
ddbjvrt;     Category for vertebrates other than mammals
------------------------------------------------------------------------------

In the present release, each of the categories above is stored in files
named ddbj***##.DAD, as follows.

file prefix     number of files
-------------------------------
ddbjbct                 59
ddbjcon                 45
ddbjenv                  2
ddbjest                  1
ddbjgss                  1
ddbjhtc                  1
ddbjhtg                  1
ddbjhum                  2
ddbjinv                  6
ddbjmam                  1
ddbjpat                  1
ddbjphg                  1
ddbjpln                  6
ddbjpri                  1
ddbjrod                  1
ddbjsts                  1
ddbjsyn                  1
ddbjtpa                  1
ddbjtpacon               1
ddbjtsa                  1
ddbjuna                  1
ddbjvrl                  5
ddbjvrt                  2
-------------------------------

8. A sample of DAD entries

Below is a typical DAD entry. It may be useful for understanding the
format and contents of DAD entries.

----- ----- ----- ----- sample begin ----- ----- ----- -----
LOCUS       BAA22986.1                220 aa    PRT              HUM 28-OCT-1997
DEFINITION  Homo sapiens RVP1 protein.
ACCESSION   AB000714-1
PROTEIN_ID  BAA22986.1
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryotae; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
            Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1250)
  AUTHORS   Katahira,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (26-JAN-1997) to the DDBJ/EMBL/GenBank databases.
            Contact:Jun Katahira
            Institute for Microbial Diseases, Osaka University, Department
            of Bacterial Toxinology; 3-1, Yamadaoka, Suita, Osaka 565,
            Japan
REFERENCE   2
  AUTHORS   Katahira,J., Sugiyama,H., Inoue,N., Horiguchi,Y., Matsuda,M. and
            Sugimoto,N.
  TITLE     Clostridium perfringens enterotoxin utilizes two structurally
            related membrane proteins as functional receptors in vivo
  JOURNAL   J. Biol. Chem. 272, 26652-26658 (1997)
COMMENT
FEATURES             Qualifiers
     source          /db_xref="H-InvDB:HIT000057926"
                     /mol_type="mRNA"
                     /organism="Homo sapiens"
                     /tissue_lib="lung"
     protein         /gene="hRVP1"
                     /transl_table=1
BEGIN
        1 MSMGLEITGT ALAVLGWLGT IVCCALPMWR VSAFIGSNII TSQNIWEGLW MNCVVQSTGQ
       61 MQCKVYDSLL ALPQDLQAAR ALIVVAILLA AFGLLVALVG AQCTNCVQDD TAKAKITIVA
      121 GVLFLLAALL TLVPVSWSAN TIIRDFYNPV VPEAQKREMG AGLYVGWAAA ALQLLGGALL
      181 CCSCPPREKK YTATKVVYSA PRSTGPGASL GTGYDRKDYV
//
----- ----- ----- ----- sample end ----- ----- ----- -----

9. Release history

------------------
 Since release 50
------------------
The format of the SOURCE line in the DAD flat file has been changed:

As a result of this change, 1) the order of the organism name and the
organelle name has been changed, and 2) some DAD flat files now include a
common name, as in GenBank flat files. The change is shown in detail
below.

----------------
 Old (-rel. 49)
----------------
Format:
SOURCE      <organism name> [<organelle>]

Example:
SOURCE      Homo sapiens mitochondrion

----------------
 New (rel. 50-)
----------------
Format:
SOURCE      [<organelle>] <organism name> [(<common name>)]

Example:
SOURCE      mitochondrion Homo sapiens (human)

See also '8. A sample of DAD entries'.
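
As a rough illustration of the SOURCE line used since release 50, the
Python sketch below splits such a line into an optional organelle, the
organism name, and an optional common name in parentheses. It rests on
assumptions that this note does not confirm: the function name
parse_source is hypothetical, and the ORGANELLES tuple is a made-up,
incomplete list, since the note does not enumerate the organelle names
that may appear.

```python
import re

# Hypothetical and incomplete list of organelle words; the release note
# does not say which organelle names can occur on a SOURCE line.
ORGANELLES = ("mitochondrion", "chloroplast", "plastid")


def parse_source(line):
    """Split a release-50-style SOURCE line into its three parts."""
    text = line.strip()
    if text.startswith("SOURCE"):
        text = text[len("SOURCE"):].strip()

    # An optional common name is taken from trailing parentheses.
    common_name = None
    match = re.search(r"\s*\(([^()]*)\)$", text)
    if match:
        common_name = match.group(1)
        text = text[:match.start()]

    # An optional organelle is the first word, if it is a known one.
    organelle = None
    first, _, rest = text.partition(" ")
    if first in ORGANELLES and rest:
        organelle, text = first, rest.strip()

    return {"organelle": organelle, "organism": text,
            "common_name": common_name}
```

Applied to the example line above, this gives "mitochondrion" as the
organelle, "Homo sapiens" as the organism, and "human" as the common
name. Lines in the old format (release 49 and earlier) put the organelle
after the organism name and would need different handling.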

------------------
 Since release 45
------------------
A new division, TSA (Transcriptome Shotgun Assembly), has been started:

A new division for assembled mRNA sequences, Transcriptome Shotgun
Assembly (TSA), is included in the present release.
With new sequencing technologies, INSDC has faced many requests to accept
assembled EST sequences. These sequence data have become more useful than
they used to be, although they may not be correctly assembled or may not
exist in nature. Therefore, INSDC decided to collect assembled EST
sequences into the new division 'TSA'.
TSA sequences are shotgun assemblies of primary sequences deposited in
the EST division of INSDC, the Trace Archive (TA) or the Short-Read
Archive (SRA).

Two specific keywords, "TSA" and "Transcriptome Shotgun Assembly", are
present in all TSA entries. The new division code, "TSA", is also given
in the LOCUS line of all TSA entries.
No format changes are anticipated for this new division; however, note
that TSA entries make use of the same PRIMARY line that is described for
entries in the TPA category. The PRIMARY block contains references to the
underlying reads/transcripts that were assembled to construct a TSA
record.

------------------
 Since release 42
------------------
Deletion of E-mail address, phone and fax numbers from the DAD flat file

To comply with the Japanese law on the protection of personal
information, DDBJ deletes phone and fax numbers and E-mail addresses from
the flat files of entries submitted to DDBJ. This also helps protect DAD
releases against spam senders.
DDBJ retrofitted most of the entries submitted to DDBJ (not those
submitted to GenBank or EMBL) by DDBJ periodical release 72.

Before DAD periodical release 42, the submitter information was given in
the JOURNAL line of REFERENCE 1 as follows:
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Taro Mishima, DNA Data Bank of Japan, National Institute of
            Genetics; 1111, Yata, Mishima, Shizuoka 411-8540, Japan
            (E-mail:ddbj@ddbj.nig.ac.jp, URL:http://www.ddbj.nig.ac.jp/,
            Tel:81-12-345-6789, Fax:81-12-345-9876)
-------------------------------------------------------------------------------

After the deletion of the information in question, a DAD flat file takes
one of the following two forms:

Type 1: Phone and fax numbers and E-mail address are deleted.
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
-------------------------------------------------------------------------------

Type 2: When the submitters wish to keep their contact information
        disclosed, it is given as follows:
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
            E-mail :ddbj@ddbj.nig.ac.jp
            Phone  :81-12-345-6789
            Fax    :81-12-345-9876
-------------------------------------------------------------------------------

------------------
 Since release 40
------------------
The CON division has been included.

CON; Contig / Constructed
To join a series of entries, such as those submitted from a genome
project, each of the three data banks constructs an entry and assigns an
accession number to a large-scale sequence dataset.
Such entries are classified into the CON division.

------------------
 Since release 38
------------------
From the present release, the maximum file size has been changed to
1.5 GB, because network capacity has increased remarkably.
Each file named ddbj***##.DAD is at most 1.5 GB in size.
See also '10. Statistics of DAD'.

------------------
 Since release 32
------------------
Introduction of the ENV division:

Recently, submissions of sequences derived from environmental samples
have increased rapidly. To accommodate such submissions, a new division,
ENV, has been created. This division contains sequences obtained via
direct molecular isolation such as PCR, DGGE, or other anonymous methods.
In the past, sequences derived from environmental samples belonged to
taxonomic divisions, mainly BCT.

At DDBJ, the retrofit to transfer relevant entries from taxonomic
divisions to the ENV division starts in the present release, and ends by
the next periodical release. Please note that during this transitional
period, some entries to be eventually placed in the ENV division will be
found in other divisions.

------------------
 Since release 30
------------------
"H-InvDB" has been added to db_xref (cross-reference) as a qualifier key.
The following is an example.

FEATURES             Location/Qualifiers
     source          1..5589
                     /clone="hf00223s1"
                     /clone_lib="pBluescriptII SK plus"
                     /db_xref="H-InvDB:HIT000000001"

------------------
 Since release 29
------------------
The GSS division has been included since release 29.
GSS stands for Genome Survey Sequence, which is similar to EST, except
that GSS is genomic DNA whereas EST is cDNA.

------------------
 Since release 21
------------------
1) Some information on introns has been added. It is given as
   "intron_pos" in the Feature/Qualifiers.

   Examples:
   intron_pos 142:1 (2/12)
       means that the 2nd intron among 12 in total is located between the
       1st and 2nd bases of the 142nd codon (amino acid residue).
   intron_pos 228:0 (4/12)
       means that the 4th intron among 12 in total is located between the
       227th and 228th codons (between the 3rd base of the 227th codon
       and the 1st base of the 228th codon).

2) The LOCUS line has been changed. The following is an example and its
   explanation:

LOCUS       BAA21794.1                263 aa    PRT              BCT 05-FEB-1999

   Positions  Contents
   ---------  --------
   01-05      'LOCUS'
   06-12      spaces
   13-28      Locus name
   29-29      space
   30-40      Length of sequence, right-justified
   41-41      space
   42-43      'aa'
   44-47      spaces
   48-53      'PRT'
   54-64      spaces
   65-67      Division code
   68-68      space
   69-79      Date, in the form DD-MMM-YYYY (e.g., 15-MAR-1991)
   ---------------------

3) TPA data have been provided in a separate file (ddbjtpa.DAD).

10. Statistics of DAD

The following are the statistics of this release of DAD.

total number of entries        56,608,385
total length of sequences      17,718,176,559
average length                 312
name of longest sequence       CP000108-608 PID:ABB27887.1
length of longest sequence     36,805 aa (CP000108-608)

=================================================================
file name        no. of entries  no. of amino acids     file size
=================================================================
ddbjbct1.DAD             327056            99046677    1468008701
ddbjbct2.DAD             498524           151436740    1468014999
ddbjbct3.DAD             579513           183372352    1468008099
ddbjbct4.DAD             602714           182660357    1468006647
ddbjbct5.DAD             456719           146252253    1468008098
ddbjbct6.DAD             443089           140184665    1468008075
ddbjbct7.DAD             423479           132436636    1468007037
ddbjbct8.DAD             458730           141923750    1468007373
ddbjbct9.DAD             402247           126556284    1468008973
ddbjbct10.DAD            335332           108521153    1468008272
ddbjbct11.DAD            385305           118913499    1468006929
ddbjbct12.DAD            368705           117828178    1468008234
ddbjbct13.DAD            390721           123375017    1468010394
ddbjbct14.DAD            421209           133599921    1468006854
ddbjbct15.DAD            432804           136554722    1468009026
ddbjbct16.DAD            433000           135911956    1468006489
ddbjbct17.DAD            445530           139671509    1468006551
ddbjbct18.DAD            555696           174494129    1468007053
ddbjbct19.DAD            484818           150583467    1468006818
ddbjbct20.DAD            463129           141909441    1468006524
ddbjbct21.DAD            433856           134657408    1468008127
ddbjbct22.DAD            371984           115550752    1468007206
ddbjbct23.DAD            279048            86526526    1468006580
ddbjbct24.DAD            266412            82468837    1468011091
ddbjbct25.DAD            302774            94102622    1468007093
ddbjbct26.DAD            405036           124272400    1468007294
ddbjbct27.DAD            470032           146809196    1468009399
ddbjbct28.DAD            420976           133332956    1468008130
ddbjbct29.DAD            446219           143247961    1468007475
ddbjbct30.DAD            455391           141301947    1468007877
ddbjbct31.DAD            453189           139331654    1468008030
ddbjbct32.DAD            405062           125728962    1468008261
ddbjbct33.DAD            412869           126276028    1468006813
ddbjbct34.DAD            409015           127522734    1468008296
ddbjbct35.DAD            432037           135644896    1468007167
ddbjbct36.DAD            418175           131907042    1468008783
ddbjbct37.DAD            403759           127178217    1468009504
ddbjbct38.DAD            402564           126302781    1468009142
ddbjbct39.DAD            412875           128575735    1468006593
ddbjbct40.DAD            388027           118845260    1468007506
ddbjbct41.DAD            419414           132303292    1468006591
ddbjbct42.DAD            420645           133257109    1468008429
ddbjbct43.DAD            430373           135176589    1468007891
ddbjbct44.DAD            343631           107825834    1468006667
ddbjbct45.DAD            362244           113671361    1468009494
ddbjbct46.DAD            343172           106722879    1468006770
ddbjbct47.DAD            360556           112202014    1468009851
ddbjbct48.DAD            366134           115191954    1468010073
ddbjbct49.DAD            354643           111162197    1468006542
ddbjbct50.DAD            348424           112355335    1468007552
ddbjbct51.DAD            362369           114199604    1468008143
ddbjbct52.DAD            364639           110083517    1468009388
ddbjbct53.DAD            367296           114369516    1468007024
ddbjbct54.DAD            361726           113439352    1468008357
ddbjbct55.DAD            606053           178973941    1468006619
ddbjbct56.DAD            657224           199863812    1468007157
ddbjbct57.DAD            729824           191896867    1468007785
ddbjbct58.DAD            851903           265377758    1468007665
ddbjbct59.DAD            551402           180828125     831446615
ddbjcon1.DAD             211313            92677386    1468011001
ddbjcon2.DAD             277476           115175557    1468006956
ddbjcon3.DAD             180516            95299759    1468007453
ddbjcon4.DAD             281425           115370899    1468006975
ddbjcon5.DAD             319360           118081509    1468007368
ddbjcon6.DAD             329694           143011029    1468006747
ddbjcon7.DAD             490797           205631025    1468008381
ddbjcon8.DAD             471681           188435652    1468009208
ddbjcon9.DAD             467538           150329541    1468006715
ddbjcon10.DAD            366514            63996456    1468007293
ddbjcon11.DAD            366473            64025046    1468009073
ddbjcon12.DAD            366540            63948453    1468008572
ddbjcon13.DAD            366515            63928242    1468009761
ddbjcon14.DAD            366555            63926777    1468009873
ddbjcon15.DAD            366449            64197885    1468007980
ddbjcon16.DAD            367205            62737917    1468009736
ddbjcon17.DAD            366036            65142025    1468006942
ddbjcon18.DAD            361510            76614505    1468007193
ddbjcon19.DAD            362516            74303526    1468010179
ddbjcon20.DAD            361688            74245222    1468007237
ddbjcon21.DAD            362462            74237905    1468007327
ddbjcon22.DAD            360945            78103679    1468007885
ddbjcon23.DAD            358183            84502887    1468007038
ddbjcon24.DAD            356790            87230696    1468009487
ddbjcon25.DAD            357541            83887649    1468008364
ddbjcon26.DAD            405406           126524364    1468007006
ddbjcon27.DAD            449343           173805179    1468008221
ddbjcon28.DAD            401365           149884990    1468008441
ddbjcon29.DAD            404014           164402192    1468006765
ddbjcon30.DAD            475345           195540310    1468008234
ddbjcon31.DAD            339596           140603742    1468007271
ddbjcon32.DAD            342212           145499533    1468007401
ddbjcon33.DAD            367399           153969137    1468007079
ddbjcon34.DAD            448786           195381252    1468009245
ddbjcon35.DAD            457847           190176473    1468008031
ddbjcon36.DAD            398299           186318334    1468007215
ddbjcon37.DAD            482545           194403762    1468008211
ddbjcon38.DAD            467327           195349086    1468007944
ddbjcon39.DAD            385366           150632162    1468007364
ddbjcon40.DAD            369206           128781349    1468008314
ddbjcon41.DAD            441214           210171430    1468006592
ddbjcon42.DAD            435426           180076860    1468009068
ddbjcon43.DAD            351402           146128544    1468009547
ddbjcon44.DAD            430517           177277632    1468006736
ddbjcon45.DAD             44435            21324255      85071341
ddbjenv1.DAD             668803           137782678    1468007250
ddbjenv2.DAD             146976            28361695     287130572
ddbjest1.DAD               1163              153762       2567182
ddbjgss1.DAD               3137              962078       8039402
ddbjhtc1.DAD             118750            36339559     433274001
ddbjhtg1.DAD              64444            17654107     263555908
ddbjhum1.DAD             619597           180881114    1468007463
ddbjhum2.DAD             106950            26586681     220956875
ddbjinv1.DAD             583850           177245614    1468008255
ddbjinv2.DAD             690679           179376959    1468006992
ddbjinv3.DAD             698153           150871769    1468008012
ddbjinv4.DAD             649814           129653054    1468007207
ddbjinv5.DAD             610053           145369597    1468006872
ddbjinv6.DAD              94706            59342665     186154671
ddbjmam1.DAD             292032            74232028     619748282
ddbjpat1.DAD             391930           164228016     582320568
ddbjphg1.DAD             433179            91835421     925650413
ddbjpln1.DAD             456326           161025396    1468006834
ddbjpln2.DAD             437418           168879289    1468009264
ddbjpln3.DAD             440539           216212370    1468008894
ddbjpln4.DAD             656799           198413504    1468007799
ddbjpln5.DAD             745486           170907200    1468006799
ddbjpln6.DAD             563277           157312832    1149424561
ddbjpri1.DAD              84165            19837270     183482437
ddbjrod1.DAD             223004            68525287     555460130
ddbjsts1.DAD                  9                 812         22053
ddbjsyn1.DAD             176903            64555069     452604492
ddbjtpa1.DAD              63641            25459458     197659690
ddbjtpacon1.DAD           71628            31568870     308879820
ddbjtsa1.DAD             121330            49627790     326398266
ddbjuna1.DAD                227               39165        388553
ddbjvrl1.DAD             659652           208983968    1468007274
ddbjvrl2.DAD             692211           210047072    1468007250
ddbjvrl3.DAD             630286           201203012    1468007258
ddbjvrl4.DAD             603657           226423031    1468009328
ddbjvrl5.DAD             341207           130380143     817503292
ddbjvrt1.DAD             693729           171053735    1468007456
ddbjvrt2.DAD             532611           117835000    1069710220
=================================================================
Total                  56608385         17718176559  184200396491
=================================================================
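
As a small worked check of the summary figures, the Python sketch below
recomputes the average sequence length from the two totals reported for
this release. The variable names are illustrative only. That the
published value of 312 is the integer part of the quotient, rather than
a rounded value, is an inference from the numbers and is not stated in
this note.

```python
# Totals as reported in the statistics of this release.
total_entries = 56_608_385
total_residues = 17_718_176_559

# Exact quotient, and its integer part (floor division).
exact_average = total_residues / total_entries
integer_average = total_residues // total_entries

# Residues left over after assigning integer_average to every entry.
remainder = total_residues - integer_average * total_entries

print("exact average   :", exact_average)
print("integer part    :", integer_average)
print("remainder (aa)  :", remainder)
```

The integer part agrees with the "average length" line, while the exact
quotient lies just below 313, which is why truncation rather than
rounding is assumed here.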