#############################################################
README for
ftp://ncbi.nlm.nih.gov/refseq/special_requests/

Last updated: August 26, 2008
#############################################################
_________________________________________________________________________

National Center for Biotechnology Information (NCBI)
National Library of Medicine
National Institutes of Health
8600 Rockville Pike
Bethesda, MD 20894, USA
tel: (301) 496-2475
fax: (301) 480-9241
e-mail: info@ncbi.nlm.nih.gov
_________________________________________________________________________

MODIFICATIONS TO THIS FILE:

December 15, 2006
    Added reporting of protein accession, version, and gi to the
    suppressed_final and suppressed_temporary files.

April 27, 2007
    Added documentation for release##.{rna|prot}.accession2source

August 26, 2008
    Added documentation for file: suppressed_models

March 25, 2011
    Added documentation for file: longest_CDS_per_gene.txid9606.2011-03-25

=========================================================================
See the README file and the RefSeq Release notes for more information
about the refseq FTP site:

    ftp://ftp.ncbi.nih.gov/refseq/release/README
    ftp://ftp.ncbi.nih.gov/refseq/release/release-notes/
=========================================================================

This directory area provides additional requested reports that may be
limited in scope and may be provided at different times. Regularly
provided files report removed accessions and the GenBank accessions that
have been used to generate the RefSeq records.

Files provided on a regular basis include:

    release##.prot.accession2source
    release##.rna.accession2source
    secondary_suppressed_temporary
    secondary_suppressed_final
    secondary_public
    suppressed_temporary
    suppressed_final
    suppressed_models
    taxid2speciesname

File names, data content, and format are:

------------------------------------------
release##.rna.accession2source
release##.prot.accession2source
------------------------------------------
These files are provided in conjunction with the regular RefSeq release
cycle. They report the GenBank accession(s) that were used to generate
each RefSeq record. The report is generated for RefSeq transcript and
protein records by processing the RefGeneTracking user object, which can
be found in the RefSeq release ASN.1 format files. This report is not
comprehensive for the proteins because different RefSeq process flows
track data differently.

Update frequency: bi-monthly, with the RefSeq release
Scope: transcripts and proteins in the RefSeq release

Columns:
    tax_id
    RefSeq accession.version
    GenBank accession; multiple accessions are separated with a '|'

File Name:
----------
Update frequency: weekly, normally Tuesday night
Scope: indicated in taxid2speciesname

secondary_public:
    report of secondary accessions, where the primary accession
    is public

secondary_suppressed_temporary:
    report of secondary accessions, where the primary accession
    has been temporarily suppressed

secondary_suppressed_final:
    report of secondary accessions, where the primary accession
    has been permanently suppressed

Columns:
    taxid
    secondary_nucleotideAccession
    secondary_nucVersion
    secondary_nucGI
    secondary_proteinAccession
    secondary_protVersion
    secondary_protGI
    primary_nucleotideAccession
    primary_nucVersion
    primary_nucGI
    primary_proteinAccession
    primary_protVersion
    primary_protGI
    GeneID

File Name:
----------
Update frequency: weekly, normally Tuesday night
Scope: indicated in taxid2speciesname

suppressed_final:
    report of accessions that have been permanently suppressed.
    Note: permanently suppressed accessions can be reinstated; at the
    time of the suppression, reinstatement was not expected.

suppressed_temporary:
    report of accessions that have been temporarily suppressed.
    Note: temporarily suppressed accessions are expected to be
    reinstated at some future date. Reinstatement may be dependent on
    additional data becoming available. As more information becomes
    available over time, a temporary suppression may be updated to
    become a permanent suppression.

suppressed_models:
    report of accessions provided by NCBI's genome annotation pipeline
    that have been permanently suppressed

Columns:
    taxid
    nucleotideAccession
    nucleotide version
    nucleotide gi
    protein accession
    protein version
    protein gi
    GeneID

Note: the associated protein record is also suppressed.

taxid2speciesname:
------------------
Species in scope for these secondary and suppressed reports

columns:
    1. NCBI taxonomic ID value
    2. species name
    3. common name

longest_CDS_per_gene.txid9606.2011-03-25:
-----------------------------------------
The RefSeq nucleotide and protein accession.version with the longest CDS
per GeneID for all human loci. If no RefSeq (NM_/NP_) was available for
the gene, then the model RefSeq (XM_/XP_) with the longest CDS is
provided. This file was generated as a one-time report per user request
on March 25, 2011.

columns:
    1. NCBI taxonomic ID value
    2. Gene ID
    3. RefSeq transcript accession.version
    4. RefSeq protein accession.version
    5. CDS length (in amino acids)
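
The column layouts documented in this file can be read with a few lines
of code. The following Python sketch is illustrative only and is not an
NCBI tool. It assumes the reports are tab-delimited, which this README
does not state, so check a downloaded file before relying on it. The
function and class names are hypothetical, and the accessions in the
demo are made-up placeholders, not real records.

```python
from typing import List, NamedTuple


class SourceRow(NamedTuple):
    """One row of a release##.{rna|prot}.accession2source file."""
    tax_id: int
    refseq_accession: str            # RefSeq accession.version
    genbank_accessions: List[str]    # split on '|'


class LongestCdsRow(NamedTuple):
    """One row of longest_CDS_per_gene.txid9606.2011-03-25."""
    tax_id: int
    gene_id: int
    transcript: str                  # RefSeq transcript accession.version
    protein: str                     # RefSeq protein accession.version
    cds_length_aa: int               # CDS length in amino acids


def _split(line: str, expected: int) -> List[str]:
    # ASSUMPTION: columns are separated by tabs (not stated in the README).
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != expected:
        raise ValueError(
            "expected %d columns, found %d" % (expected, len(fields))
        )
    return fields


def parse_accession2source_line(line: str) -> SourceRow:
    """Parse the three documented columns; the third may hold several
    GenBank accessions separated with '|'."""
    tax_id, refseq, genbank = _split(line, 3)
    accessions = [acc for acc in genbank.split("|") if acc]
    return SourceRow(int(tax_id), refseq, accessions)


def parse_longest_cds_line(line: str) -> LongestCdsRow:
    """Parse the five documented columns of the longest-CDS report."""
    tax_id, gene_id, transcript, protein, length = _split(line, 5)
    return LongestCdsRow(
        int(tax_id), int(gene_id), transcript, protein, int(length)
    )


if __name__ == "__main__":
    # Placeholder rows for demonstration; not taken from a real report.
    print(parse_accession2source_line("9606\tNM_000000.1\tAA000000|BB000000"))
    print(parse_longest_cds_line("9606\t1\tNM_000000.1\tNP_000000.1\t495"))
```

If a downloaded file begins with a header or comment line, skip it
before calling these functions; a row with an unexpected number of
columns raises ValueError rather than being silently misread.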