DDBJ Amino Acid Sequence Database (DAD)

Release 78.0, Mar. 2017, including
60,807,075 entries, 19,069,634,236 residues

Last published date in the present release: February 24, 2017

-------------------------------------------------------------------------------
Table of contents
-------------------------------------------------------------------------------
 1. Introduction
    1.1. Announcement for changes in the present release
    1.2. Announcement for the forthcoming changes
 2. Format of DAD entries
 3. DAD categories
 4. Citation
 5. Contact information
 6. Disclaimer
 7. DAD file categories
 8. A sample of DAD entries
 9. Release history
10. Statistics of DAD
-------------------------------------------------------------------------------

1. Introduction

This is release 78.0 of the DDBJ Amino Acid Sequence Database (DAD). This
database has been produced by extracting all translated sequences from the
conventional sequence data of DDBJ periodical release 108.0 and the TPA data
set (February 2017).

1.1. Announcement for changes in the present release

Nothing in particular.

1.2. Announcement for the forthcoming changes

Nothing in particular.

2. Format of DAD entries

The standard format of DAD is almost the same as that of the DDBJ nucleotide
sequence database, except for the points described below.

Accession numbers of DAD entries are written on the lines labeled "ACCESSION".
A DAD accession number consists of a DDBJ accession number and an integer
starting from 1, joined by a hyphen (-). For example, two amino acid sequences
extracted from the DDBJ entry D12345 have the accession numbers D12345-1 and
D12345-2, respectively. The number is useful for identifying a DAD entry.

An amino acid sequence begins on the line following "BEGIN". Up to sixty amino
acids are written on one line. The amino acid sequence is followed by a double
slash (//), which marks the end of the entry.
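The accession-number and sequence conventions above can be sketched in a few
lines of Python. This is a minimal illustration, not a DDBJ tool: the function
names are hypothetical, and the sketch assumes that sequence lines may carry a
leading position number and blank-separated blocks, as in the sample entry of
section 8.

```python
def split_dad_accession(accession):
    """Split a DAD accession such as 'D12345-2' into ('D12345', 2)."""
    nucleotide_acc, sep, index = accession.rpartition("-")
    if not sep or not nucleotide_acc or not index.isdigit() or int(index) < 1:
        raise ValueError("not a DAD accession: %r" % accession)
    return nucleotide_acc, int(index)


def read_sequence(lines):
    """Collect residues between the BEGIN line and the '//' terminator."""
    residues = []
    in_sequence = False
    for line in lines:
        stripped = line.strip()
        if stripped == "//":
            break  # end of the entry
        if in_sequence:
            # Assumption: position numbers and blanks are layout only,
            # so keep alphabetic characters and drop everything else.
            residues.append("".join(ch for ch in stripped if ch.isalpha()))
        elif stripped.startswith("BEGIN"):
            in_sequence = True
    return "".join(residues)


# Hypothetical, shortened entry used only to exercise the two helpers.
entry_lines = [
    "ACCESSION   AB000714-1",
    "BEGIN",
    "        1 MSMGLEITGT ALAVLGWLGT",
    "       61 MQCKVYDSLL",
    "//",
]
accession = split_dad_accession("AB000714-1")
sequence = read_sequence(entry_lines)
```

The hyphen is split from the right so that only the trailing integer is
treated as the DAD-specific part of the accession number.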
The LOCUS line contains the locus name, the length of the protein, the
molecular type (always "PRT"), the division name, and the release date of the
DNA counterpart. The DEFINITION line contains the species name and the protein
name. The other parts of a DAD entry, including FEATURES, are almost the same
as those of the corresponding DDBJ entry.

3. DAD categories

DAD entries are classified into 23 categories: the 21 divisions of the
conventional sequence data of the DDBJ periodical release, plus TPA and
TPACON. Please refer to the release note of the DDBJ release for details
(filename: ddbjrel.txt). There are two types of DAD files for each division:
files with the suffix ".DAD" in the DAD standard format, and files with the
suffix ".DAD.fasta" in a FASTA-compatible format.

[DDBJ release note]
ftp://ftp.ddbj.nig.ac.jp/ddbj_database/ddbj/ddbjrel.txt

4. Citation

When you use DAD in your research, we would appreciate it if you would include
a reference to DDBJ in the related publications.

When citing an entry in the DAD database, it is appropriate to give the
protein_id and its accession number. It is also recommended to cite the first
publication in REFERENCE of the entry other than the submitter information.

DDBJ suggests that authors add a reference to DDBJ itself. The following
publication, which describes the recent activities of the DDBJ center, is
appropriate to cite:

   Mashima J, Kodama Y, Fujisawa T, Katayama T, Okuda Y, Kaminuma E,
   Ogasawara O, Okubo K, Nakamura Y and Takagi T.
   DNA Data Bank of Japan.
   Nucleic Acids Res. 45, D25-D31 (2017)
   DOI: 10.1093/nar/gkw1001

The following sentence is an example of how to cite an entry in the DAD
database:

-----------------------------------------------------------------------------
"We searched the DAD database (1) by sequence similarities and found an amino
acid sequence (2), with protein_id BAA22986.1 in DDBJ accession number
AB000714, which had significant similarity with ..."

(1) Mashima, J. et al, Nucleic Acids Res. 45, D25-D31 (2017).
(2) Katahira, J. et al, J. Biol. Chem. 272, 26652-26658 (1997).
------------------------------------------------------------------------------

5. Contact information

   DNA Data Bank of Japan
   DDBJ Center
   National Institute of Genetics
   Research Organization of Information and Systems
   Mishima 411-8540, Japan
   Phone:  +81 55 981 6853
   FAX:    +81 55 981 6849
   E-mail: ddbj@ddbj.nig.ac.jp
   WWW:    http://www.ddbj.nig.ac.jp/

6. Disclaimer

While DDBJ endeavors to keep its data correct, DDBJ makes no representations
or warranties of any kind about the completeness, accuracy or reliability with
respect to the entries contained in the DAD periodical release. DDBJ also
makes no legal liability or responsibility of merchantability or fitness for a
particular purpose or that the use of the sequence data will not infringe any
patent or other rights. Any receipt, reliance or use you place on such data is
therefore strictly at your own risk.

7. DAD file categories

This release covers the following 23 categories of organisms and others (see
also '3. DAD categories'):

------------------------------------------------------------------------------
ddbjbct;     Category for bacteria
ddbjcon;     Category for CON (contigs)
ddbjenv;     Category for ENV (environmental samples)
ddbjest;     Category for EST (expressed sequence tags)
ddbjgss;     Category for GSS (genome survey sequences)
ddbjhtc;     Category for HTC (high throughput cDNA sequences)
ddbjhtg;     Category for HTG (high throughput genomic sequences)
ddbjhum;     Category for human
ddbjinv;     Category for invertebrates
ddbjmam;     Category for mammals other than primates and rodents
ddbjpat;     Category for patents
ddbjphg;     Category for phages
ddbjpln;     Category for plants
ddbjpri;     Category for primates other than human
ddbjrod;     Category for rodents
ddbjsts;     Category for STS (sequence tagged sites)
ddbjsyn;     Category for synthetic DNAs
ddbjtpa;     Category for TPA (third party annotations)
ddbjtpacon;  Category for CON (contigs) of TPA (third party annotations)
ddbjtsa;     Category for TSA (transcriptome shotgun assemblies)
ddbjuna;     Category for unannotated sequences
ddbjvrl;     Category for viruses
ddbjvrt;     Category for vertebrates other than mammals
------------------------------------------------------------------------------

In the present release, each category above is recorded in files named
ddbj***##.DAD, with the following number of files per category.

file prefix    number of files
-------------------------------
ddbjbct        69
ddbjcon        45
ddbjenv         2
ddbjest         1
ddbjgss         1
ddbjhtc         1
ddbjhtg         1
ddbjhum         2
ddbjinv         6
ddbjmam         1
ddbjpat         1
ddbjphg         1
ddbjpln         6
ddbjpri         1
ddbjrod         1
ddbjsts         1
ddbjsyn         1
ddbjtpa         1
ddbjtpacon      1
ddbjtsa         1
ddbjuna         1
ddbjvrl         5
ddbjvrt         2
-------------------------------

8. A sample of DAD entries

Below is a typical DAD entry. It may be useful for understanding the format
and contents.

----- ----- ----- ----- sample begin ----- ----- ----- -----
LOCUS       BAA22986.1               220 aa    PRT              HUM 28-OCT-1997
DEFINITION  Homo sapiens RVP1 protein.
ACCESSION   AB000714-1
PROTEIN_ID  BAA22986.1
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryotae; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
            Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1250)
  AUTHORS   Katahira,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (26-JAN-1997) to the DDBJ/EMBL/GenBank databases.
            Contact:Jun Katahira
            Institute for Microbial Diseases, Osaka University, Department of
            Bacterial Toxinology; 3-1, Yamadaoka, Suita, Osaka 565, Japan
REFERENCE   2
  AUTHORS   Katahira,J., Sugiyama,H., Inoue,N., Horiguchi,Y., Matsuda,M. and
            Sugimoto,N.
  TITLE     Clostridium perfringens enterotoxin utilizes two structurally
            related membrane proteins as functional receptors in vivo
  JOURNAL   J. Biol. Chem. 272, 26652-26658 (1997)
COMMENT
FEATURES             Qualifiers
     source          /db_xref="H-InvDB:HIT000057926"
                     /mol_type="mRNA"
                     /organism="Homo sapiens"
                     /tissue_lib="lung"
     protein         /gene="hRVP1"
                     /transl_table=1
BEGIN
        1 MSMGLEITGT ALAVLGWLGT IVCCALPMWR VSAFIGSNII TSQNIWEGLW MNCVVQSTGQ
       61 MQCKVYDSLL ALPQDLQAAR ALIVVAILLA AFGLLVALVG AQCTNCVQDD TAKAKITIVA
      121 GVLFLLAALL TLVPVSWSAN TIIRDFYNPV VPEAQKREMG AGLYVGWAAA ALQLLGGALL
      181 CCSCPPREKK YTATKVVYSA PRSTGPGASL GTGYDRKDYV
//
----- ----- ----- ----- sample end ----- ----- ----- -----

9. Release history

------------------
 Since release 50
------------------

The format of the SOURCE line in the DAD flat file has been changed:

As a result of this change, 1) the order of the organism name and the
organelle name is reversed, and 2) some DAD flat files include a common name,
as GenBank flat files do. The change is shown below in detail.

----------------
 Old (-rel. 49)
----------------
Format:  SOURCE      <organism name> [<organelle>]
Example: SOURCE      Homo sapiens mitochondrion

----------------
 New (rel. 50-)
----------------
Format:  SOURCE      [<organelle>] <organism name> [(<common name>)]
Example: SOURCE      mitochondrion Homo sapiens (human)

See also '8. A sample of DAD entries'.
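As an illustration of the SOURCE line layout used since release 50, the
following Python sketch splits a SOURCE line into an optional organelle, the
organism name, and an optional common name. It is only a sketch under stated
assumptions: the function name is hypothetical, and the small organelle
vocabulary is invented for the example, since this release note does not
enumerate the organelle names that may appear.

```python
import re

# Assumption: a tiny, hypothetical organelle vocabulary for illustration only.
ORGANELLES = ("mitochondrion", "chloroplast", "plastid")

# SOURCE, then the body, then an optional trailing "(common name)".
SOURCE_RE = re.compile(
    r"^SOURCE\s+(?P<body>.*?)(?:\s+\((?P<common>[^()]*)\))?\s*$"
)


def parse_source(line):
    """Parse a release-50-style SOURCE line into its three parts."""
    match = SOURCE_RE.match(line)
    if match is None:
        raise ValueError("not a SOURCE line: %r" % line)
    body = match.group("body")
    organelle = None
    first, _, rest = body.partition(" ")
    if first in ORGANELLES and rest:
        # The organelle name precedes the organism name in the new format.
        organelle, body = first, rest.strip()
    return {
        "organelle": organelle,
        "organism": body,
        "common_name": match.group("common"),
    }


with_organelle = parse_source("SOURCE      mitochondrion Homo sapiens (human)")
without_organelle = parse_source("SOURCE      Homo sapiens (human)")
```

Lines written in the old format (organism name first, organelle last) are not
handled by this sketch.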

------------------
 Since release 45
------------------

A new division, TSA (Transcriptome Shotgun Assembly), is started:

A new division for assembled mRNA sequences, Transcriptome Shotgun Assembly
(TSA), is included in the present release.

With new sequencing technologies, INSDC has faced many requests to accept
assembled EST sequences. These sequence data have become more useful than they
used to be, although they may not be correctly assembled or exist in nature.
Therefore, INSDC decided to collect assembled EST sequences into the new
division 'TSA'. TSA sequences are shotgun assemblies of primary sequences
deposited in the EST division of INSDC, the Trace Archive (TA) or the
Short-Read Archive (SRA).

Two specific keywords, "TSA" and "Transcriptome Shotgun Assembly", are present
in all TSA entries. The new division code, "TSA", is also described in the
LOCUS line in all TSA entries.

No format changes are anticipated for this new division; however, note that
TSA entries make use of the same PRIMARY line that is described for the
entries in the TPA category. The PRIMARY block contains references to the
underlying reads/transcripts that were assembled to construct a TSA record.

------------------
 Since release 42
------------------

Deletion of E-mail address, phone and fax numbers from DAD flat file

To comply with the Japanese law on the protection of personal information,
DDBJ deletes phone and fax numbers and E-mail addresses from the flat files of
entries submitted to DDBJ. This also helps to protect DAD releases against
SPAM mail senders. DDBJ retrofitted almost all entries submitted to DDBJ (not
those submitted to GenBank or EMBL) by DDBJ periodical release 72.

Before DAD periodical release 42, the submitter information was described in
the JOURNAL line at REFERENCE 1 as follows:

-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Taro Mishima, DNA Data Bank of Japan, National Institute of
            Genetics; 1111, Yata, Mishima, Shizuoka 411-8540, Japan
            (E-mail:ddbj@ddbj.nig.ac.jp, URL:http://www.ddbj.nig.ac.jp/,
            Tel:81-12-345-6789, Fax:81-12-345-9876)
-------------------------------------------------------------------------------

After the deletion of the information in question, a DAD flat file is one of
the following two types:

Type 1: Phone and fax numbers and E-mail address are deleted.
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
-------------------------------------------------------------------------------

Type 2: When the submitters wish to keep their contact information disclosed,
it is described as follows:
-------------------------------------------------------------------------------
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Mishima,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-Jan-1990) to the DDBJ/EMBL/GenBank databases.
            Contact:Taro Mishima
            DNA Data Bank of Japan, National Institute of Genetics; 1111,
            Yata, Mishima, Shizuoka 411-8540, Japan
            URL    :http://www.ddbj.nig.ac.jp/
            E-mail :ddbj@ddbj.nig.ac.jp
            Phone  :81-12-345-6789
            Fax    :81-12-345-9876
-------------------------------------------------------------------------------

------------------
 Since release 40
------------------

The CON division has been included.

CON; Contig / Constructed

To join a series of entries, such as those submitted from a genome project,
each of the three data banks constructs an entry and assigns an accession
number to a large scale sequence dataset.
Such entries are classified into the CON division.

------------------
 Since release 38
------------------

From the present release, the maximum file size is changed to 1.5 GB, because
network capacity has increased remarkably. Each file named ddbj***##.DAD is at
most 1.5 GB in size. See also the section '10. Statistics of DAD'.

------------------
 Since release 32
------------------

Introduction of ENV division:

Recently, submissions of sequences derived from environmental samples have
increased rapidly. To accommodate such submissions, a new division, ENV, has
been created. This division contains sequences obtained via direct molecular
isolation such as PCR, DGGE, or any anonymous method. In the past, sequences
derived from environmental samples belonged to taxonomic divisions, mainly
BCT. At DDBJ, the retrofit to transfer relevant entries from taxonomic
divisions to the ENV division starts in the present release and ends by the
next periodical release. Please note that during this transitional period,
some entries to be eventually placed in the ENV division will be found in
other divisions.

------------------
 Since release 30
------------------

"H-InvDB" has been added to db_xref (cross-reference) as a qualifier key.
The following is an example.

FEATURES             Location/Qualifiers
     source          1..5589
                     /clone="hf00223s1"
                     /clone_lib="pBluescriptII SK plus"
                     /db_xref="H-InvDB:HIT000000001"

------------------
 Since release 29
------------------

The GSS division has been included since release 29. GSS stands for Genome
Survey Sequence, which is similar to EST, except that GSS is genomic DNA
whereas EST is cDNA.

------------------
 Since release 21
------------------

1) Some information on introns has been added. It is given as "intron_pos" in
   the Feature/Qualifiers.

   Examples:
   intron_pos 142:1 (2/12) means that the 2nd intron among 12 in total is
   located between the 1st and 2nd bases of the 142nd codon (amino acid
   residue).
   intron_pos 228:0 (4/12) means that the 4th intron among 12 in total is
   located between the 227th and 228th codons (between the 3rd base of the
   227th codon and the 1st base of the 228th codon).

2) The LOCUS line has been changed. The following is an example and its
   explanation:

LOCUS       BAA21794.1               263 aa    PRT              BCT 05-FEB-1999

   Positions  Contents
   ---------  --------
   01-05      'LOCUS'
   06-12      spaces
   13-28      Locus name
   29-29      space
   30-40      Length of sequence, right-justified
   41-41      space
   42-43      'aa'
   44-47      spaces
   48-53      'PRT'
   54-64      spaces
   65-67      Division code
   68-68      space
   69-79      Date, in the form DD-MMM-YYYY (e.g., 15-MAR-1991)
   ---------------------

3) TPA data have been provided in a separate file (ddbjtpa.DAD).

10. Statistics of DAD

The following are the statistics of this release of DAD.

total number of entries        60,807,075
total length of sequences      19,069,634,236
average length                 313
name of longest sequence       CP000108-608 PID:ABB27887.1
length of longest sequence     36,805 aa (CP000108-608)

=========================================================================
file name         no. of entries  no. of amino acids      file size
=========================================================================
ddbjbct1.DAD              327245            99072020     1468007301
ddbjbct2.DAD              499264           151612948     1468039545
ddbjbct3.DAD              578867           183079695     1468008341
ddbjbct4.DAD              623344           188508539     1468008680
ddbjbct5.DAD              468386           149407563     1468008356
ddbjbct6.DAD              442464           140498858     1468009815
ddbjbct7.DAD              412514           129190098     1468006451
ddbjbct8.DAD              468517           145146330     1468008720
ddbjbct9.DAD              417309           130590916     1468009546
ddbjbct10.DAD             326074           106047476     1468009665
ddbjbct11.DAD             385174           119016584     1468010077
ddbjbct12.DAD             374595           119512952     1468007475
ddbjbct13.DAD             375441           118973868     1468006592
ddbjbct14.DAD             402052           128127481     1468008054
ddbjbct15.DAD             440476           138217509     1468006891
ddbjbct16.DAD             428862           134520588     1468006942
ddbjbct17.DAD             432288           135706690     1468008722
ddbjbct18.DAD             564345           177173181     1468009177
ddbjbct19.DAD             507340           158154575     1468009915
ddbjbct20.DAD             452875           138670627     1468007411
ddbjbct21.DAD             447026           138040159     1468008333
ddbjbct22.DAD             375624           116978971     1468006903
ddbjbct23.DAD             293753            91134495     1468010501
ddbjbct24.DAD             265718            82226774     1468007748
ddbjbct25.DAD             279654            86626633     1468006647
ddbjbct26.DAD             400973           123744226     1468006554
ddbjbct27.DAD             473576           147405543     1468009342
ddbjbct28.DAD             418241           131618616     1468007703
ddbjbct29.DAD             421232           135872441     1468009844
ddbjbct30.DAD             420483           131822158     1468008112
ddbjbct31.DAD             445243           139676686     1468008992
ddbjbct32.DAD             445411           137006271     1468008418
ddbjbct33.DAD             410695           126382708     1468008727
ddbjbct34.DAD             393714           120902697     1468008415
ddbjbct35.DAD             409018           127507224     1468006583
ddbjbct36.DAD             393098           122762890     1468009037
ddbjbct37.DAD             405572           128704673     1468007189
ddbjbct38.DAD             408712           128989979     1468006519
ddbjbct39.DAD             395106           125757161     1468006883
ddbjbct40.DAD             387914           122706741     1468009597
ddbjbct41.DAD             387561           121698442     1468006537
ddbjbct42.DAD             419559           130477549     1468009744
ddbjbct43.DAD             387887           119013177     1468007196
ddbjbct44.DAD             397248           124179712     1468007828
ddbjbct45.DAD             440190           137917312     1468007698
ddbjbct46.DAD             437526           140318817     1468007040
ddbjbct47.DAD             419836           129174576     1468007006
ddbjbct48.DAD             354584           112264844     1468006643
ddbjbct49.DAD             346854           109219317     1468008515
ddbjbct50.DAD             363093           110615314     1468009583
ddbjbct51.DAD             348808           109325439     1468010153
ddbjbct52.DAD             369528           114901663     1468008264
ddbjbct53.DAD             357943           112322982     1468007833
ddbjbct54.DAD             350055           109759093     1468007894
ddbjbct55.DAD             369686           118361020     1468008940
ddbjbct56.DAD             376778           119468284     1468011471
ddbjbct57.DAD             339364           102508543     1468007918
ddbjbct58.DAD             367204           114185674     1468009946
ddbjbct59.DAD             363285           114196298     1468009325
ddbjbct60.DAD             362193           114412548     1468009195
ddbjbct61.DAD             345785           105226648     1468006760
ddbjbct62.DAD             337268           104129288     1468011423
ddbjbct63.DAD             332095           103055387     1468009963
ddbjbct64.DAD             528283           156081907     1468006580
ddbjbct65.DAD             622711           187049386     1468007362
ddbjbct66.DAD             707423           205046269     1468006500
ddbjbct67.DAD             771539           220991041     1468008019
ddbjbct68.DAD             993460           321266469     1468007910
ddbjbct69.DAD              53589            15948350      144247983
ddbjcon1.DAD              211322            92725638     1468010962
ddbjcon2.DAD              277491           115183072     1468010038
ddbjcon3.DAD              180507            95296579     1468008078
ddbjcon4.DAD              306064           132971487     1468009138
ddbjcon5.DAD              320527           118598347     1468007937
ddbjcon6.DAD              325508           141897600     1468007559
ddbjcon7.DAD              441247           191490876     1468010875
ddbjcon8.DAD              512693           205816098     1468008192
ddbjcon9.DAD              484054           179809789     1468008009
ddbjcon10.DAD             385537            75020962     1468010165
ddbjcon11.DAD             366497            63981786     1468008408
ddbjcon12.DAD             366463            64023172     1468009727
ddbjcon13.DAD             366499            63997865     1468010141
ddbjcon14.DAD             366573            63947837     1468008790
ddbjcon15.DAD             366503            63995075     1468006951
ddbjcon16.DAD             366844            63657896     1468009898
ddbjcon17.DAD             366598            63288296     1468010304
ddbjcon18.DAD             362974            73330151     1468007599
ddbjcon19.DAD             361828            76578823     1468006555
ddbjcon20.DAD             362697            71714058     1468008050
ddbjcon21.DAD             361073            77100803     1468008282
ddbjcon22.DAD             363312            72821873     1468007404
ddbjcon23.DAD             357931            84552408     1468008696
ddbjcon24.DAD             357163            86624417     1468009260
ddbjcon25.DAD             357496            84552182     1468006652
ddbjcon26.DAD             359803            92794823     1468006917
ddbjcon27.DAD             476464           183897524     1468008073
ddbjcon28.DAD             427085           153870313     1468011066
ddbjcon29.DAD             377192           159377922     1468009030
ddbjcon30.DAD             446532           180736631     1468006509
ddbjcon31.DAD             400643           162190882     1468008178
ddbjcon32.DAD             305082           126379126     1468008482
ddbjcon33.DAD             402352           173835096     1468008231
ddbjcon34.DAD             397731           169814540     1468006860
ddbjcon35.DAD             440026           192731228     1468006779
ddbjcon36.DAD             453064           192141938     1468008379
ddbjcon37.DAD             428278           184393399     1468007514
ddbjcon38.DAD             494184           198820133     1468007837
ddbjcon39.DAD             428351           182372000     1468006909
ddbjcon40.DAD             339393           113986677     1468010550
ddbjcon41.DAD             423366           195830692     1468006921
ddbjcon42.DAD             425646           181691042     1468006406
ddbjcon43.DAD             393983           157466563     1468007047
ddbjcon44.DAD             366603           164111963     1468007165
ddbjcon45.DAD             324354           133597090      979063217
ddbjenv1.DAD              666703           138157466     1468007834
ddbjenv2.DAD              177416            34074936      343690626
ddbjest1.DAD                1163              153762        2567299
ddbjgss1.DAD                3137              962078        8039402
ddbjhtc1.DAD              119991            36773230      435854298
ddbjhtg1.DAD               64453            17661240      263597506
ddbjhum1.DAD              619552           180868580     1468007736
ddbjhum2.DAD              132735            33802334      278399070
ddbjinv1.DAD              580735           176570040     1468007412
ddbjinv2.DAD              689760           179272404     1468007686
ddbjinv3.DAD              699137           151175735     1468007181
ddbjinv4.DAD              651300           130163560     1468007073
ddbjinv5.DAD              614927           129092717     1468007045
ddbjinv6.DAD              156143            90745053      331720247
ddbjmam1.DAD              299746            76321909      635722971
ddbjpat1.DAD              391930           164228016      582279716
ddbjphg1.DAD              447469            94927910      954841093
ddbjpln1.DAD              456458           161068758     1468006556
ddbjpln2.DAD              435441           167896723     1468009122
ddbjpln3.DAD              429052           210820363     1468008808
ddbjpln4.DAD              651079           195607935     1468007831
ddbjpln5.DAD              740888           174843244     1468007015
ddbjpln6.DAD              661985           184536532     1344579089
ddbjpri1.DAD               85871            20324078      186996641
ddbjrod1.DAD              227974            70087387      566514585
ddbjsts1.DAD                   9                 812          22053
ddbjsyn1.DAD              212011            76023130      551203772
ddbjtpa1.DAD               64013            25571499      198897510
ddbjtpacon1.DAD            42523            21795983      179418237
ddbjtsa1.DAD              121331            49629889      326404273
ddbjuna1.DAD                 230               39816         393290
ddbjvrl1.DAD              659630           208962449     1468008043
ddbjvrl2.DAD              686535           209199183     1468007609
ddbjvrl3.DAD              625800           201935540     1468008701
ddbjvrl4.DAD              607241           223703259     1468007795
ddbjvrl5.DAD              448262           172783989     1083700123
ddbjvrt1.DAD              693364           171152400     1468006734
ddbjvrt2.DAD              566018           125472732     1136780762
=========================================================================
Total                   60807075         19069634236   199908023436
=========================================================================
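
The per-file statistics above can be aggregated by category with a short
script. The following Python sketch is illustrative only: the function name
and the regular expression are assumptions about how one might read the table
rows, not part of any DDBJ tool. It also shows that the reported average
length (313) is consistent with the total number of residues divided by the
total number of entries, truncated to an integer.

```python
import re
from collections import defaultdict

# One table row: file name, no. of entries, no. of amino acids, file size.
ROW_RE = re.compile(r"^(ddbj[a-z]+)(\d+)\.DAD\s+(\d+)\s+(\d+)\s+(\d+)$")


def summarize_by_prefix(rows):
    """Return {prefix: [files, entries, residues, size]} for table rows."""
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for row in rows:
        match = ROW_RE.match(row.strip())
        if match is None:
            continue  # skip headers, rules, and the Total line
        prefix = match.group(1)
        entries, residues, size = (int(match.group(i)) for i in (3, 4, 5))
        acc = totals[prefix]
        acc[0] += 1
        acc[1] += entries
        acc[2] += residues
        acc[3] += size
    return dict(totals)


# A few rows copied from the table above.
sample_rows = [
    "file name         no. of entries  no. of amino acids      file size",
    "ddbjenv1.DAD              666703           138157466     1468007834",
    "ddbjenv2.DAD              177416            34074936      343690626",
    "ddbjsts1.DAD                   9                 812          22053",
]
summary = summarize_by_prefix(sample_rows)

# Totals taken from the statistics header of this release.
average_length = 19069634236 // 60807075
```

Running the same function over the whole table should reproduce the file
counts listed in '7. DAD file categories'.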