118 Positions of miRNAs along the genome are shown. A BLASTN of genomic sequence regions against miRBase sequences is performed, and hits are clustered and filtered by E value. Aligned genomic sequence is then checked for possible secondary structure using RNAfold. If evidence is found that the genomic sequence could form a stable hairpin structure, the locus is used to create a miRNA gene model. The resulting BLAST hit is used as supporting evidence for the miRNA gene. See article. miRBase miRNA 1 {'type' => 'rna'}
9 CpG islands are regions of nucleic acid sequence containing a high number of adjacent cytosine guanine pairs (along one strand). Usually unmethylated, they are associated with promoters and regulatory regions. They are determined from the genomic sequence using a program written by G. Micklem, similar to newcpgreport in the EMBOSS package. CpG islands 1 \N
7 Dust is a program that identifies low-complexity sequences (regions of the genome with a biased distribution of nucleotides, such as a repeat). The Dust module is widely used with BLAST to prevent 'sticky' regions from determining false hits. Low complexity (Dust) 1 \N
94 Alignment of human ESTs (expressed sequence tags) to the genome using the program Est2genome. ESTs are from dbEST Human EST (EST2genome) 1 {'type' => 'est'}
97 Alignment of mouse ESTs (expressed sequence tags) to the genome using the program Est2genome. ESTs are from dbEST Mouse EST (EST2genome) 1 {'type' => 'est'}
98 Alignment of non-human, non-mouse ESTs (expressed sequence tags) to the genome using the program Est2genome. ESTs are from dbEST Other EST (EST2genome) 1 {'type' => 'est'}
1 Transcription start sites predicted by Eponine-TSS. TSS (Eponine) 1 \N
5237 These are short sequence tags from the start sites of polyA transcripts. These ditags were downloaded from the Riken Fantom project and aligned to the genome using Exonerate. Fantom CAGE tags 1 \N
5 Ab initio prediction of protein coding genes by Genscan.
The splice site models used are described in more detail in C. Burge, Modelling dependencies in pre-mRNA splicing signals. 1998 In Salzberg, S., Searls, D. and Kasif, S., eds. Computational Methods in Molecular Biology, Elsevier Science, Amsterdam, 127-163. Genscan prediction 1 \N
15 Markers, or sequence tagged sites (STS), from UniSTS are aligned to the genome using Electronic PCR (e-PCR). Marker 1 \N
114 Protein domains and motifs from the PIR (Protein Information Resource) Superfamily database. PIRSF domain 1 {'type' => 'domain'}
103 Protein domains and motifs in the Pfam database. Pfam domain 1 {'type' => 'domain'}
110 Protein fingerprints (groups of conserved motifs) are aligned to the genome. These motifs come from the PRINTS database. Prints domain 1 {'type' => 'domain'}
2 RepeatMasker is used to find repeats and low-complexity sequences. This track usually shows repeats alone (not low-complexity sequences). Repeats 1 \N
120 Positions of ncRNAs (non-coding RNAs) from the Rfam database are shown. Initial BLASTN hits of genomic sequence to RFAM ncRNAs are clustered and filtered by E value. These hits are supporting evidence for ncRNA genes. RFAM ncRNA gene 1 {'type' => 'rna'}
105 Identification of peptide low complexity sequences by Seg. Low complexity (Seg) 1 \N
109 Prediction of signal peptide cleavage sites by SignalP. Cleavage site (Signalp) 1 \N
106 Protein domains and motifs in the SMART database. SMART domain 1 {'type' => 'domain'}
104 Protein domains and motifs in the SUPERFAMILY database. Superfamily domain 1 {'type' => 'domain'}
6 Tandem Repeats Finder locates adjacent copies of a pattern of nucleotides. Tandem repeats (TRF) 1 \N
113 Protein domains and motifs in the TIGRFAM database. TIGRFAM domain 1 {'type' => 'domain'}
4 Positions of UniGene sequences along the genome. These are determined using TBLASTN of Genscan predictions against UniGene sequences.
Unigene EST cluster 1 {'type' => 'cdna'}
34 Mammalian proteins from the UniProtKB database, positioned on the genome through BLASTP of Genscan-predicted mammalian peptides to UniProtKB proteins. mammal UniProt prot. 1 \N
13 Non-mammal proteins from the UniProtKB database, positioned on the genome through BLASTP of Genscan-predicted (non-mammalian) peptides to UniProtKB proteins. non-mammal UniProt prot. 1 \N
11 Positions of vertebrate mRNAs along the genome. mRNAs are from the EMBL database. Initial alignments are performed using TBLASTN of Genscan-predicted peptides against EMBL mRNAs. EMBL vertebrate cDNA 1 {'type' => 'cdna'}
119 Sequences from various databases are matched to Ensembl transcripts using Exonerate. These are external references, or 'Xrefs'. DNA match 0 \N
5242 Protein coding sequences agreed upon by the Consensus Coding Sequence project, or CCDS. CCDS set 1 {'default' => {'contigviewbottom' => 'normal'},'type' => 'cdna'}
92 Transcripts were annotated by the Ensembl genebuild. Ensembl gene 1 {'colour_key' => '[biotype]_[status]','caption' => 'Ensembl/Havana gene','label_key' => '[text_label] [display_label]','name' => 'Merged Ensembl and Havana Genes','default' => {'contigviewbottom' => 'transcript_label','contigviewtop' => 'gene_label','cytoview' => 'gene_label'},'key' => 'ensembl'}
5238 Gene containing both Ensembl genebuild transcripts and Havana manual curation, see article. Ensembl/Havana merge gene 1 {'colour_key' => '[biotype]_[status]','caption' => 'Ensembl/Havana gene','label_key' => '[text_label] [display_label]','name' => 'Merged Ensembl and Havana Genes','default' => {'contigviewbottom' => 'transcript_label','contigviewtop' => 'gene_label','cytoview' => 'gene_label'},'key' => 'ensembl'}
5239 Transcript where the Ensembl genebuild transcript and the Vega manual annotation have the same sequence, for every base pair. See article.
Ensembl/Havana merge transcript 1 {'colour_key' => '[biotype]_[status]','caption' => 'Ensembl/Havana gene','label_key' => '[text_label] [display_label]','name' => 'Merged Ensembl and Havana Genes','default' => {'contigviewbottom' => 'transcript_label','contigviewtop' => 'gene_label','cytoview' => 'gene_label'},'key' => 'ensembl'}
83 Manually annotated transcripts (determined on a case-by-case basis) from the Havana project. Vega gene 1 {'colour_key' => '[biotype]_[status]','caption' => 'Ensembl/Havana gene','label_key' => '[text_label] [display_label]','name' => 'Merged Ensembl and Havana Genes','default' => {'contigviewbottom' => 'transcript_label','contigviewtop' => 'gene_label','cytoview' => 'gene_label'},'key' => 'ensembl'}
54 Human cDNAs from NCBI RefSeq and EMBL are aligned to the genome using Exonerate. Human RefSeq/EMBL cDNA 0 {'type' => 'cdna'}
45 Homo sapiens 'Expressed Sequence Tags' (ESTs) from dbEST are aligned to the genome using Exonerate. Human EST 0 {'type' => 'est'}
5240 Human proteins from UniProtKB used in the genebuild are aligned to the genome using GeneWise. Human UniProt prot. 1 \N
100 Non-coding RNA (ncRNA) is predicted using sequences from RFAM and miRBase. See article. ncRNA gene 1 {'colour_key' => 'rna_[status]','label_key' => '[text_label] [display_label]','default' => {'contigviewbottom' => 'transcript_label','contigviewtop' => 'gene_label','cytoview' => 'gene_label'}}
112 Prediction of coiled-coil regions in proteins is by Ncoils. Coiled-coils (Ncoils) 1 \N
5241 For various species, proteins from UniProtKB are aligned to the genome with GeneWise. Other sp. prot. 1 \N
108 Protein domains and motifs from the PROSITE profiles database are aligned to the genome. PROSITE profiles 1 {'type' => 'domain'}
107 Protein domains and motifs from the PROSITE patterns database are aligned to the genome. PROSITE patterns 1 {'type' => 'domain'}
8 Prediction of tRNAs in genomic sequence is through tRNAscan-SE. See article.
tRNA 1 \N
111 Prediction of transmembrane helices in proteins by TMHMM. Transmembrane helices 1 \N
101 Immunoglobulin (Ig) and T-cell receptor (TcR) genes were imported from the IMGT database using Exonerate. Ig/T-cell receptor gene 1 {'colour_key' => '[biotype]','label_key' => '[text_label] [display_label]','default' => {'contigviewbottom' => 'transcript_label','contigviewtop' => 'gene_label','cytoview' => 'gene_label'}}
102 Immunoglobulin (Ig) and T-cell receptor (TcR) genes were imported from the IMGT database using Exonerate. Ig and TcR transcripts 0 {'type' => 'cdna'}
5209 Positions of mRNA start and end sequences are shown here, obtained by paired-end ditag (PET) sequencing on ChIP (chromatin immunoprecipitation) samples. Ditags (ChIP-PET) 0 \N
5204 Raw data for ChIP_PET sample alignments. 0 \N
14 First Exon Finder (First EF) predicts positions of the first exons of transcripts, both coding and non-coding, using the sequence to identify features such as CpG islands and promoter regions. First EF 1 \N
5208 Positions of mRNA start and end sequences are shown here, obtained by paired-end ditag (PET) sequencing by GIS (Genome Institute of Singapore). Method described here. Ditags (GIS) 0 \N
5111 See method described here. Ditags (GIS Encode) 0 \N
5203 Raw data. See method described here. Ditags (GIS raw) 0 \N
95 Proteins from the UniProtKB Swiss-Prot database, aligned to the genome. UniProt prot. 1 \N
96 Proteins from the UniProtKB TrEMBL database, aligned to the genome. TrEMBL prot. 1 \N
5254 match Protein 0 \N
5272 See the Vega website for details of the approaches used for the annotation of external Vega genes Vega External gene 1 {'colour_key' => '[gene.logic_name]_[gene.biotype]'}
5249 Microarray probes from Affymetrix (and other manufacturers) are aligned to the genome by Ensembl, if probe sequences are provided. The mapping is a two-step procedure outlined here.
Affymetrix probes 1 \N
5264 Microarray probes from manufacturers are aligned to the genome by Ensembl, if the probe sequences are provided. The mapping is a two-step procedure outlined here. 0 \N
8013 Density of Single Nucleotide Polymorphisms (SNPs) calculated by variation_density.pl (see scripts at the Sanger Centre CVS repository). SNP Density 1 \N
8008 Percentage of repetitive elements for top level sequences (such as chromosomes, scaffolds, etc.) Repeats (percent) 1 \N
8011 Percentage of G/C bases in the sequence. GC content 1 \N
8014 Known gene density as calculated by gene_density_calc.pl. Genes (density) 1 \N
8015 Known gene density as calculated by gene_density_calc.pl. Genes (density) 1 \N
8016 Recombination rates were estimated using the interval program from the LDhat package (McVean, Myers et al. 2004; Auton and McVean 2007) 1KG Recombination hotspots 1 \N
8017 Large Deletions >50nt discovered using the low coverage pilot individuals 1KG Low coverage deletions 1 \N
8018 Large Deletions >50nt discovered using the trio pilot individuals 1KG trio deletions 1 \N
8019 Mobile element insertions discovered in CEU trio CEU trio ME insertion 1 {'colour_key'=>'brown'}
8020 Mobile element insertions discovered in YRI trio YRI trio ME insertion 1 {'colour_key'=>'brown'}
8021 Mobile element insertions discovered in CEU low coverage pilot CEU low coverage ME insertion 1 {'colour_key'=>'brown'}
8022 Mobile element insertions discovered in CHBJPT low coverage pilot CHBJPT low coverage ME insertion 1 {'colour_key'=>'brown'}
8023 Mobile element insertions discovered in YRI low coverage pilot YRI low coverage ME insertion 1 {'colour_key'=>'brown'}
8024 Tandem Duplications discovered in the CEU low coverage pilot CEU TD low coverage 1 \N
8025 Tandem Duplications discovered in the CHBJPT low coverage pilot CHBJPT TD low coverage 1 \N
8026 Tandem Duplications discovered in the YRI low coverage pilot YRI TD low coverage 1 \N
8027 Tandem Duplications discovered in the CEU
trio CEU TD trio 1 \N
8028 Tandem Duplications discovered in the YRI trio YRI TD trio 1 \N
8029 Novel sequence insertions discovered in the CEU trio CEU trio novel insertions 1 \N
8030 Novel sequence insertions discovered in the YRI trio YRI trio novel insertions 1 \N