DDBJ Amino Acid Sequence Database (DAD)
Release 71.0, June 2015,
including 41,675,650 entries, 12,913,734,732 residues

Last published date in the present release: May 29, 2015
-------------------------------------------------------------------------------
Table of contents
-------------------------------------------------------------------------------
1. Introduction
    1.1. Announcement for changes in the present release
    1.2. Announcement for the forthcoming changes
2. Format of DAD entries
3. DAD categories
4. Contact information
5. Disclaimer
6. DAD file categories
7. A sample of DAD entries
8. Release history
9. Statistics of DAD
-------------------------------------------------------------------------------

1. Introduction

This is release 71.0 of the DDBJ Amino Acid Sequence Database (DAD). The
database was produced by extracting all translated sequences from DDBJ
periodical release 101.0 and the TPA dataset (May 2015).

1.1. Announcement for changes in the present release

Nothing in particular.

1.2. Announcement for the forthcoming changes

Nothing in particular.

2. Format of DAD entries

The standard format of DAD is almost the same as that of the DDBJ nucleotide
sequence database, except for the points described below.

Accession numbers of DAD entries are written on the lines labeled
"ACCESSION". A DAD accession number consists of a DDBJ accession number and
an integer starting from 1, joined by a hyphen (-). For example, two amino
acid sequences extracted from DDBJ entry D12345 have the accession numbers
D12345-1 and D12345-2, respectively. The integer identifies an individual
DAD entry.

The amino acid sequence begins on the line after "BEGIN". Up to sixty amino
acids are written on one line. The sequence is followed by a double slash
(//), which marks the end of the entry.
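
The accession and sequence conventions above can be illustrated with a
minimal Python sketch. It is not an official DDBJ tool: the function names
are hypothetical, and it assumes that "BEGIN" and "//" each stand alone on
their own line and that every sequence line starts with a position number
followed by space-separated blocks of residues, as in the sample entry in
section 7.

```python
def split_dad_accession(accession):
    """Split a DAD accession such as 'D12345-2' into ('D12345', 2).

    Assumption: the integer after the last hyphen is the 1-based index
    of the translated sequence within the DDBJ entry.
    """
    ddbj, sep, index = accession.strip().rpartition("-")
    if not (sep and ddbj and index.isascii() and index.isdigit()
            and int(index) >= 1):
        raise ValueError(f"not a DAD accession: {accession!r}")
    return ddbj, int(index)


def extract_sequence(entry_text):
    """Return the residues between the BEGIN line and the closing '//'."""
    residues = []
    in_sequence = False
    for line in entry_text.splitlines():
        stripped = line.strip()
        if stripped == "//":
            break  # end of the entry
        if in_sequence:
            # Keep letters only: this drops the leading position number
            # and the spaces between the blocks of residues.
            residues.append("".join(ch for ch in stripped if ch.isalpha()))
        elif stripped == "BEGIN":
            in_sequence = True
    return "".join(residues)
```

For example, split_dad_accession("D12345-2") returns ("D12345", 2), so the
DDBJ nucleotide entry and the index of the translation can be recovered
from the DAD accession alone.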

The LOCUS line contains the locus name, the length of the protein, the
molecular type (always "PRT"), the division name, and the release date of
the DNA counterpart.

The DEFINITION line contains the species name and the protein name.

The other parts of a DAD entry, including FEATURES, are almost the same as
those of the corresponding DDBJ entry.

3. DAD categories

DAD entries are classified into 23 categories: the 21 categories of the DDBJ
periodical release plus TPA and TPACON. Please refer to the release note of
the DDBJ release for details (filename: ddbjrel.txt).

For each division there are two types of DAD files: files with the suffix
".DAD" in the DAD standard format, and files with the suffix ".DAD.fasta" in
a FASTA-compatible format.

  [DDBJ release note]
  ftp://ftp.ddbj.nig.ac.jp/ddbj_database/ddbj/ddbjrel.txt

4. Contact information

   DNA Data Bank of Japan
   DDBJ Center
   National Institute of Genetics
   Research Organization of Information and Systems
   Mishima 411-8540, Japan

   Phone:  +81 55 981 6853
   FAX:    +81 55 981 6849
   E-mail: ddbj@ddbj.nig.ac.jp
   WWW:    http://www.ddbj.nig.ac.jp/

5. Disclaimer

While DDBJ endeavors to keep its data correct, DDBJ makes no representations
or warranties of any kind about the completeness, accuracy or reliability
with respect to the entries contained in the DAD periodical release. DDBJ
also assumes no legal liability or responsibility for merchantability or
fitness for a particular purpose, or that the use of the sequence data will
not infringe any patent or other rights. Any receipt, reliance or use you
place on such data is therefore strictly at your own risk.

6. DAD file categories

This release covers the following 23 categories of organisms and other data
(see also '3. DAD categories'):

------------------------------------------------------------------------------
ddbjbct;     Category for bacteria
ddbjcon;     Category for CON (contigs)
ddbjenv;     Category for ENV (environmental samples)
ddbjest;     Category for EST (expressed sequence tags)
ddbjgss;     Category for GSS (genome survey sequences)
ddbjhtc;     Category for HTC (high throughput cDNA sequences)
ddbjhtg;     Category for HTG (high throughput genomic sequences)
ddbjhum;     Category for human
ddbjinv;     Category for invertebrates
ddbjmam;     Category for mammals other than primates and rodents
ddbjpat;     Category for patents
ddbjphg;     Category for phages
ddbjpln;     Category for plants
ddbjpri;     Category for primates other than human
ddbjrod;     Category for rodents
ddbjsts;     Category for STS (sequence tagged sites)
ddbjsyn;     Category for synthetic DNAs
ddbjtpa;     Category for TPA (third party annotations)
ddbjtpacon;  Category for CON (contigs) of TPA (third party annotations)
ddbjtsa;     Category for TSA (transcriptome shotgun assemblies)
ddbjuna;     Category for unannotated sequences
ddbjvrl;     Category for viruses
ddbjvrt;     Category for vertebrates other than mammals
------------------------------------------------------------------------------

In the present release, each category is recorded in ddbj***##.DAD files as
follows:

file prefix    number of files
-------------------------------
ddbjbct               35
ddbjcon               40
ddbjenv                1
ddbjest                1
ddbjgss                1
ddbjhtc                1
ddbjhtg                1
ddbjhum                2
ddbjinv                3
ddbjmam                1
ddbjpat                1
ddbjphg                1
ddbjpln                5
ddbjpri                1
ddbjrod                1
ddbjsts                1
ddbjsyn                1
ddbjtpa                1
ddbjtpacon             1
ddbjtsa                1
ddbjuna                1
ddbjvrl                4
ddbjvrt                2
-------------------------------

7. A sample of DAD entries

Below is a typical DAD entry, which may help in understanding the format and
contents.

----- ----- ----- ----- sample begin ----- ----- ----- -----
LOCUS       BAA22986.1               220 aa    PRT              HUM 28-OCT-1997
DEFINITION  Homo sapiens RVP1 protein.
ACCESSION   AB000714-1
PROTEIN_ID  BAA22986.1
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryotae; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
            Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1250)
  AUTHORS   Katahira,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (26-JAN-1997) to the DDBJ/EMBL/GenBank databases.
            Contact:Jun Katahira
            Institute for Microbial Diseases, Osaka University, Department
            of Bacterial Toxinology; 3-1, Yamadaoka, Suita, Osaka 565, Japan
REFERENCE   2
  AUTHORS   Katahira,J., Sugiyama,H., Inoue,N., Horiguchi,Y., Matsuda,M. and
            Sugimoto,N.
  TITLE     Clostridium perfringens enterotoxin utilizes two structurally
            related membrane proteins as functional receptors in vivo
  JOURNAL   J. Biol. Chem. 272, 26652-26658 (1997)
COMMENT
FEATURES             Qualifiers
     source          /db_xref="H-InvDB:HIT000057926"
                     /mol_type="mRNA"
                     /organism="Homo sapiens"
                     /tissue_lib="lung"
     protein         /gene="hRVP1"
                     /transl_table=1
BEGIN
        1 MSMGLEITGT ALAVLGWLGT IVCCALPMWR VSAFIGSNII TSQNIWEGLW MNCVVQSTGQ
       61 MQCKVYDSLL ALPQDLQAAR ALIVVAILLA AFGLLVALVG AQCTNCVQDD TAKAKITIVA
      121 GVLFLLAALL TLVPVSWSAN TIIRDFYNPV VPEAQKREMG AGLYVGWAAA ALQLLGGALL
      181 CCSCPPREKK YTATKVVYSA PRSTGPGASL GTGYDRKDYV
//
----- ----- ----- ----- sample end ----- ----- ----- -----

8. Release history

------------------
 Since release 50
------------------
The format of the SOURCE line in the DAD flat file has been changed:

  As a result of this change, 1) the order of the organism name and the
  organelle name has been reversed, and 2) some DAD flat files now include a
  common name, as GenBank flat files do. The change is shown in detail
  below.

  ----------------
   Old (-rel. 49)
  ----------------
  Format:   SOURCE      <organism name> [<organelle>]
  Example:  SOURCE      Homo sapiens mitochondrion

  ----------------
   New (rel. 50-)
  ----------------
  Format:   SOURCE      [<organelle>] <organism name> [(<common name>)]
  Example:  SOURCE      mitochondrion Homo sapiens (human)

  See also '7. A sample of DAD entries'.
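
The new SOURCE layout described above (an optional organelle, the organism
name, and an optional common name in parentheses) could be split into its
parts roughly as follows. This is only a sketch: parse_source and
ASSUMED_ORGANELLES are hypothetical names, and the organelle vocabulary is
an assumption, because this release note does not enumerate the organelle
names that may appear.

```python
import re

# Assumption: a small illustrative vocabulary. The release note does not
# list the organelle names that can precede the organism name.
ASSUMED_ORGANELLES = ("mitochondrion", "chloroplast", "plastid")


def parse_source(line):
    """Parse a release-50-style SOURCE line into its three parts."""
    fields = line.split(None, 1)
    if len(fields) != 2 or fields[0] != "SOURCE":
        raise ValueError(f"not a SOURCE line: {line!r}")
    rest = fields[1].strip()

    # Optional common name: a trailing "(...)" group.
    common_name = None
    match = re.search(r"\s*\(([^()]*)\)$", rest)
    if match:
        common_name = match.group(1)
        rest = rest[:match.start()]

    # Optional organelle: a leading word from the assumed vocabulary.
    organelle = None
    for name in ASSUMED_ORGANELLES:
        if rest.startswith(name + " "):
            organelle = name
            rest = rest[len(name):].strip()
            break

    return {"organelle": organelle, "organism": rest,
            "common_name": common_name}
```

Applied to the example line "SOURCE      mitochondrion Homo sapiens
(human)", the sketch yields the organelle "mitochondrion", the organism
"Homo sapiens" and the common name "human". A line written in the old
format (organism name first) would not be recognized as having an organelle.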

------------------
 Since release 45
------------------
A new division, TSA (Transcriptome Shotgun Assembly), has been started:

  A new division for assembled mRNA sequences, Transcriptome Shotgun
  Assembly (TSA), is included in the present release.

  With new sequencing technologies, INSDC has received many requests to
  accept assembled EST sequences. These sequence data have become more
  useful than they used to be, although they may not be correctly assembled
  or may not exist in nature. Therefore, INSDC decided to collect assembled
  EST sequences into the new division 'TSA'.

  TSA sequences are shotgun assemblies of primary sequences deposited in the
  EST division of INSDC, the Trace Archive (TA) or the Short-Read Archive
  (SRA). Two specific keywords, "TSA" and "Transcriptome Shotgun Assembly",
  are present in all TSA entries. The new division code, "TSA", also appears
  in the LOCUS line of all TSA entries.

  No format changes are anticipated for this new division; however, note
  that TSA entries use the same PRIMARY line that is described for entries
  in the TPA category. The PRIMARY block contains references to the
  underlying reads/transcripts that were assembled to construct a TSA
  record.

------------------
 Since release 42
------------------
Deletion of e-mail addresses, phone and fax numbers from the DAD flat file

  To comply with the Japanese law on the protection of personal information,
  DDBJ deletes phone numbers, fax numbers and e-mail addresses from the flat
  files of entries submitted to DDBJ. This also helps to protect DAD
  releases against spam senders. DDBJ retrofitted most entries submitted to
  DDBJ (not those submitted to GenBank or EMBL) by DDBJ periodical
  release 72.

  Before DAD periodical release 42, the submitter information was described
  in the JOURNAL line of REFERENCE 1 as follows:
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Taro Mishima, DNA Data Bank of Japan, National Institute of
            Genetics; 1111, Yata, Mishima, Shizuoka 411-8540, Japan
            (E-mail:ddbj@ddbj.nig.ac.jp, URL:http://www.ddbj.nig.ac.jp/,
            Tel:81-12-345-6789, Fax:81-12-345-9876)
-------------------------------------------------------------------------------

  After the deletion of the information in question, a DAD flat file entry
  takes one of the following two forms:

  Type 1: Phone and fax numbers and the e-mail address are deleted.
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
-------------------------------------------------------------------------------

  Type 2: When the submitters wish to keep their contact information
          disclosed, it is described as follows:
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
            E-mail :ddbj@ddbj.nig.ac.jp
            Phone  :81-12-345-6789
            Fax    :81-12-345-9876
-------------------------------------------------------------------------------

------------------
 Since release 40
------------------
The CON division has been included.

  CON; Contig / Constructed

  To join a series of entries, such as those submitted from a genome
  project, each of the three data banks constructs an entry and assigns an
  accession number to a large scale sequence dataset.
  Such entries are classified into the CON division.

------------------
 Since release 38
------------------
Starting with this release, the maximum file size is 1.5 GB, because network
capacity has increased remarkably. Each ddbj***##.DAD file is at most 1.5 GB
in size. See also section '9. Statistics of DAD'.

------------------
 Since release 32
------------------
Introduction of the ENV division:

  Recently, submissions of sequences derived from environmental samples have
  increased rapidly. To accommodate such submissions, a new division, ENV,
  has been created. This division contains sequences obtained via direct
  molecular isolation such as PCR, DGGE, or any anonymous method.

  In the past, sequences derived from environmental samples belonged to
  taxonomic divisions, mainly BCT. At DDBJ, the retrofit to transfer the
  relevant entries from taxonomic divisions to the ENV division starts in
  the present release and will be completed by the next periodical release.
  Please note that during this transitional period, some entries that will
  eventually be placed in the ENV division will still be found in other
  divisions.

------------------
 Since release 30
------------------
"H-InvDB" has been added to db_xref (cross-reference) as a qualifier key.
The following is an example.

FEATURES             Location/Qualifiers
     source          1..5589
                     /clone="hf00223s1"
                     /clone_lib="pBluescriptII SK plus"
                     /db_xref="H-InvDB:HIT000000001"

------------------
 Since release 29
------------------
The GSS division has been included since release 29. GSS stands for Genome
Survey Sequence, which is similar to EST, except that GSS is genomic DNA
whereas EST is cDNA.

------------------
 Since release 21
------------------
1) Some information on introns has been added. It is given as "intron_pos"
   in the Feature/Qualifiers.

   Examples:
     intron_pos 142:1 (2/12)
       means that the 2nd intron among 12 in total is located between the
       1st and 2nd bases of the 142nd codon (amino acid residue).
     intron_pos 228:0 (4/12)
       means that the 4th intron among 12 in total is located between the
       227th and 228th codons (between the 3rd base of the 227th codon and
       the 1st base of the 228th codon).

2) The LOCUS line has been changed. The following is an example and its
   explanation:

LOCUS       BAA21794.1               263 aa    PRT              BCT 05-FEB-1999

   Positions  Contents
   ---------  --------
   01-05      'LOCUS'
   06-12      spaces
   13-28      Locus name
   29-29      space
   30-40      Length of sequence, right-justified
   41-41      space
   42-43      'aa'
   44-47      spaces
   48-53      'PRT'
   54-64      spaces
   65-67      Division code
   68-68      space
   69-79      Date, in the form DD-MMM-YYYY (e.g., 15-MAR-1991)
   ---------------------

3) TPA data have been provided in a separate file (ddbjtpa.DAD).

9. Statistics of DAD

The following are the statistics of this release of DAD.

   total number of entries      41,675,650
   total length of sequences    12,913,734,732
   average length               309 aa
   name of longest sequence     CP000108-608  PID:ABB27887.1
   length of longest sequence   36,805 aa (CP000108-608)

=========================================================================
file name         no. of entries  no. of amino acids         file size
=========================================================================
ddbjbct1.DAD              323706            98040557        1468007618
ddbjbct2.DAD              497913           151254781        1468022171
ddbjbct3.DAD              580353           183863583        1468008049
ddbjbct4.DAD              544337           167290457        1468008069
ddbjbct5.DAD              437950           140231750        1468011194
ddbjbct6.DAD              426680           134083681        1468009451
ddbjbct7.DAD              434144           135344233        1468009878
ddbjbct8.DAD              479584           146636083        1468007872
ddbjbct9.DAD              345288           111762927        1468010662
ddbjbct10.DAD             368063           115739233        1468006584
ddbjbct11.DAD             399696           125622668        1468010831
ddbjbct12.DAD             345848           109972536        1468008194
ddbjbct13.DAD             379673           121478799        1468010033
ddbjbct14.DAD             447623           141083881        1468006912
ddbjbct15.DAD             449359           140750889        1468007160
ddbjbct16.DAD             418903           131916270        1468011471
ddbjbct17.DAD             502570           156515423        1468008870
ddbjbct18.DAD             569717           179238949        1468010425
ddbjbct19.DAD             450797           138647879        1468007453
ddbjbct20.DAD             459582           141494525        1468009378
ddbjbct21.DAD             383164           119300434        1468009215
ddbjbct22.DAD             326421            98812753        1468010392
ddbjbct23.DAD             322594            96845758        1468007422
ddbjbct24.DAD             364068           110870335        1468008198
ddbjbct25.DAD             462217           142300551        1468008185
ddbjbct26.DAD             433186           136480917        1468008158
ddbjbct27.DAD             458908           147311897        1468007257
ddbjbct28.DAD             491645           152017441        1468006635
ddbjbct29.DAD             487168           149608637        1468008348
ddbjbct30.DAD             427894           131325030        1468009298
ddbjbct31.DAD             435372           136000541        1468010405
ddbjbct32.DAD             563833           165994362        1468008641
ddbjbct33.DAD             612156           184082079        1468007289
ddbjbct34.DAD             694689           198912124        1468007953
ddbjbct35.DAD             420317           124537010         795396936
ddbjcon1.DAD              217040            93376752        1468012068
ddbjcon2.DAD              275990           109008234        1468009796
ddbjcon3.DAD              193944            85362277        1468006504
ddbjcon4.DAD              294988           106899059        1468008583
ddbjcon5.DAD              275369           108515203        1468006653
ddbjcon6.DAD              489560           204402865        1468007736
ddbjcon7.DAD              467084           176461997        1468008494
ddbjcon8.DAD              406017            90133042        1468008991
ddbjcon9.DAD              366541            63969168        1468008936
ddbjcon10.DAD             366493            63972757        1468009727
ddbjcon11.DAD             366534            63898903        1468008979
ddbjcon12.DAD             366515            63962806        1468008248
ddbjcon13.DAD             366446            64095403        1468008569
ddbjcon14.DAD             366678            63863616        1468009256
ddbjcon15.DAD             366865            62952043        1468007883
ddbjcon16.DAD             363582            71871313        1468009422
ddbjcon17.DAD             361705            76764572        1468007860
ddbjcon18.DAD             362833            71643878        1468008665
ddbjcon19.DAD             360853            77409426        1468006631
ddbjcon20.DAD             363744            71558432        1468008565
ddbjcon21.DAD             358032            84547027        1468010486
ddbjcon22.DAD             357762            84920893        1468008773
ddbjcon23.DAD             356888            86331360        1468006939
ddbjcon24.DAD             356319            87611139        1468007432
ddbjcon25.DAD             462119           177023514        1468006946
ddbjcon26.DAD             449533           155458417        1468010710
ddbjcon27.DAD             375860           157038344        1468010370
ddbjcon28.DAD             440618           176295648        1468007880
ddbjcon29.DAD             401431           162809956        1468006914
ddbjcon30.DAD             306522           127165317        1468006410
ddbjcon31.DAD             398773           172150528        1468007988
ddbjcon32.DAD             399959           170497341        1468009817
ddbjcon33.DAD             451729           201213874        1468009305
ddbjcon34.DAD             442469           179578264        1468008719
ddbjcon35.DAD             443903           190332659        1468007985
ddbjcon36.DAD             489535           197921230        1468008776
ddbjcon37.DAD             417042           176072556        1468011415
ddbjcon38.DAD             341499           111999682        1468007347
ddbjcon39.DAD             439120           201602115        1468006732
ddbjcon40.DAD              98840            37606659         263937268
ddbjenv1.DAD              633036           125816786        1339100467
ddbjest1.DAD                1163              153762           2563311
ddbjgss1.DAD                2898              925769           7587555
ddbjhtc1.DAD              108271            33722753         411463462
ddbjhtg1.DAD               63975            17454696         256910248
ddbjhum1.DAD              620240           181194556        1468006787
ddbjhum2.DAD               57937            14971964         121111083
ddbjinv1.DAD              576206           170281486        1468006522
ddbjinv2.DAD              693052           180151110        1468007415
ddbjinv3.DAD              667383           162557151        1407086906
ddbjmam1.DAD              243812            61410880         496165470
ddbjpat1.DAD              391101           163766053         579563786
ddbjphg1.DAD              287131            60544726         620114988
ddbjpln1.DAD              462926           163035153        1468007675
ddbjpln2.DAD              459809           194704265        1468007905
ddbjpln3.DAD              579187           215755023        1468006662
ddbjpln4.DAD              699253           202928462        1468008631
ddbjpln5.DAD              560076           135637845        1125340746
ddbjpri1.DAD               75755            17924423         165865662
ddbjrod1.DAD              204519            64089144         518448906
ddbjsts1.DAD                   9                 812             22033
ddbjsyn1.DAD              120656            45132177         324421106
ddbjtpa1.DAD               62661            25192474         190049788
ddbjtpacon1.DAD            71628            31568870         308879820
ddbjtsa1.DAD              120372            49450798         324034624
ddbjuna1.DAD                 214               35721            360833
ddbjvrl1.DAD              665775           209576769        1468006820
ddbjvrl2.DAD              692420           208742664        1468006946
ddbjvrl3.DAD              633015           204286963        1468006712
ddbjvrl4.DAD              366511           134808578         856357479
ddbjvrt1.DAD              693412           170448479        1468007994
ddbjvrt2.DAD              329095            73797178         662084922
=========================================================================
Total                   41675650         12913734732      134089589649
=========================================================================
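
The per-file statistics above can be cross-checked against the summary
figures with a short Python sketch. The function names are hypothetical, and
two points are assumptions rather than statements from this note: that the
reported average length is the total number of residues divided by the
total number of entries with the fraction dropped, and that each data row of
the table consists of exactly four whitespace-separated fields (file name,
entries, amino acids, file size).

```python
from collections import defaultdict


def summarize_by_prefix(rows):
    """Sum the table rows per file prefix (e.g. 'ddbjbct').

    Each row is a string 'file_name entries residues size'. Returns a
    dict mapping prefix -> (files, entries, residues, size).
    """
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for row in rows:
        name, entries, residues, size = row.split()
        # 'ddbjbct10.DAD' -> 'ddbjbct10' -> 'ddbjbct'
        prefix = name.split(".", 1)[0].rstrip("0123456789")
        bucket = totals[prefix]
        bucket[0] += 1
        bucket[1] += int(entries)
        bucket[2] += int(residues)
        bucket[3] += int(size)  # unit not stated in the note
    return {prefix: tuple(values) for prefix, values in totals.items()}


def average_length(total_entries, total_residues):
    """Average sequence length, truncated to a whole number (assumed)."""
    return total_residues // total_entries


rows = [
    "ddbjhum1.DAD 620240 181194556 1468006787",
    "ddbjhum2.DAD 57937 14971964 121111083",
]
print(summarize_by_prefix(rows))
print(average_length(41675650, 12913734732))  # 309
```

With the totals of this release, the truncated quotient is 309, which
agrees with the "average length" reported above. The number of files found
per prefix can likewise be compared with the file counts listed in section
6.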