Universal Protein Resource (UniProt)
====================================

The Universal Protein Resource (UniProt), a collaboration between the European
Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics, and
the Protein Information Resource (PIR), comprises three databases, each
optimized for different uses. The UniProt Knowledgebase (UniProtKB) is the
central access point for extensively curated protein information, including
function, classification and cross-references. The UniProt Reference Clusters
(UniRef) combine closely related sequences into a single record to speed up
sequence similarity searches. The UniProt Archive (UniParc) is a comprehensive
repository of all protein sequences, consisting only of unique identifiers and
sequences.

UniProt RDF Distribution
========================

This directory contains the following files:

- Core datasets of UniProt in RDF/XML format:

  Due to the volume of data, each core dataset is distributed as a collection
  of files that match the following file name patterns:

  uniprotkb_*.rdf.xz    UniProt Knowledgebase (UniProtKB)
  uniref_*.rdf.xz       UniProt Reference clusters (UniRef)
  uniparc_*.rdf.xz      UniProt Sequence archive (UniParc)

  The UniProtKB dataset is split into files based on the top levels of the
  NCBI taxonomy (the file name indicates the classification and ID of the
  taxon) that contain at most 1 million entries. Obsolete entries are provided
  in separate files with at most 10 million entries
  (uniprotkb_obsolete_*.rdf.xz).

  The UniRef dataset is split into files that contain about 100,000 clusters.

  The UniParc dataset is split into files of about 1 GB in size.

- Supporting datasets for UniProt in RDF/XML format:

  citations.rdf.xz    Literature citations
  diseases.rdf.xz     Human diseases
  journals.rdf.xz     Journals which contain articles cited in UniProt
  taxonomy.rdf.xz     Organisms
  keywords.rdf.xz     Keywords
  go.owl.xz           Gene Ontology
  enzyme.rdf.xz       Enzyme classification
  pathways.rdf.xz     Pathways
  locations.rdf.xz    Subcellular locations
  tissues.rdf.xz      Tissues
  databases.rdf.xz    Databases that are linked to from uniprot.rdf.xz
  proteomes.rdf.xz    Proteomes

  For taxonomy and GO, these additional files contain inferred
  rdfs:subClassOf statements:

  taxonomy-hierarchy.rdf.xz
  go-hierarchy.rdf.xz

  For chemical reaction data, Rhea RDF can be downloaded from
  https://ftp.expasy.org/databases/rhea/rdf/

- Classes and properties used in the UniProt RDF distribution:

  core.owl, also includes Cellular components (Organelles)

- Release information:

  RELEASE.metalink or RELEASE.meta4

For more information about UniProt RDF, please see
https://sparql.uniprot.org/

--------------------------------------------------------------------------------
LICENSE
--------------------------------------------------------------------------------
We have chosen to apply the Creative Commons Attribution 4.0 International
(CC BY 4.0) License (https://creativecommons.org/licenses/by/4.0/) to all
copyrightable parts of our databases.

(c) 2002-2024 UniProt Consortium

--------------------------------------------------------------------------------
DISCLAIMER
--------------------------------------------------------------------------------
We make no warranties regarding the correctness of the data, and disclaim
liability for damages resulting from its use. We cannot provide unrestricted
permission regarding the use of the data, as some data may be covered by
patents or other rights.

Any medical or genetic information is provided for research, educational and
informational purposes only.
It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.
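
As a rough illustration of working with the core dataset files described in
the distribution listing, the Python sketch below maps a file name to its
dataset using the documented file name patterns, and streams an xz-compressed
RDF/XML file to count the top-level resources that carry an rdf:about
attribute. The function names (classify, count_described_resources) and the
labels are hypothetical, and it is an assumption that counting top-level
rdf:about elements is a useful proxy for entries in the real files. Only the
Python standard library is used.

```python
import fnmatch
import lzma
import xml.etree.ElementTree as ET

# Patterns come from the file listing in this README; the labels are
# illustrative. The obsolete pattern is listed first because it is also
# matched by the more general uniprotkb_* pattern.
CORE_PATTERNS = [
    ("uniprotkb_obsolete_*.rdf.xz", "UniProtKB (obsolete entries)"),
    ("uniprotkb_*.rdf.xz", "UniProtKB"),
    ("uniref_*.rdf.xz", "UniRef"),
    ("uniparc_*.rdf.xz", "UniParc"),
]

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def classify(file_name):
    """Return the core dataset label for a file name, or None if no pattern matches."""
    for pattern, label in CORE_PATTERNS:
        if fnmatch.fnmatchcase(file_name, pattern):
            return label
    return None


def count_described_resources(source):
    """Count direct children of the root element that have an rdf:about attribute.

    `source` is a path or a binary file object holding xz-compressed RDF/XML.
    The file is decompressed and parsed incrementally, so it is never held
    in memory as a whole.
    """
    about = "{" + RDF_NS + "}about"
    count = 0
    depth = 0
    with lzma.open(source, "rb") as handle:
        for event, elem in ET.iterparse(handle, events=("start", "end")):
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:
                    # A direct child of the root element has just closed.
                    if elem.get(about) is not None:
                        count += 1
                    # Drop the element's children, text and attributes.
                    elem.clear()
    return count
```

For example, classify("uniparc_1.rdf.xz") (a made-up file name) yields
"UniParc", while a supporting file such as taxonomy.rdf.xz yields None because
it matches none of the core patterns.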