This directory contains 20120430.sample_summary_with_status.txt, a tab-delimited file that summarizes the current status of each sample, for both exome and low coverage sequencing, from the DCC perspective. All of its alignment data are based on the 20111114 alignment release. The genome size used is 2,750,000,000 bp.

The file contains the following columns:

Sample
Population
Phase: Whether the sample is part of phase 1 or phase 2
Low Coverage Center
LC Meta Data Present: Metadata for this sample's low coverage sequence is present in the ENA database
BP in ENA: The number of base pairs at the ENA for this sample's low coverage sequence
BP at DCC: The number of base pairs at the DCC for this sample's low coverage sequence
Total Bases Aligned: The number of input bases to the 20111114 low coverage alignment
Raw Aligned Coverage: The x coverage for all aligned bases
Coverage at Omni sites: The non-duplicated x coverage at Omni sites
Non Dup Coverage: The total non-duplicated x coverage across the genome
Exome Center
Exome Meta Data Present: As LC Meta Data Present, for exome sequence
BP at ENA: As BP in ENA, for exome sequence
BP at DCC: As BP at DCC, for exome sequence
Total Aligned: As Total Bases Aligned, for exome sequence
Percent of Targets >20x: The percentage of targets covered to >20x, as calculated by CalculateHsMetrics from Picard
LC Complete: Completion status for the low coverage sequence (more than 2.7x non-duplicated coverage is complete)
Exome Complete: Completion status for the exome sequence (a Percent of Targets >20x value above 0.7 is complete)
LC Alignment: Has a low coverage alignment
E Alignment: Has an exome alignment
Failed Genotype QC: Failed the sequence-level genotype QC
Failed Short Indel QC: Failed the short indel overrepresentation QC
Failed VerifyBamID: Failed the VerifyBamID QC
LC Sufficient Raw: Has sufficient raw low coverage data. If the sample is not already complete, this means either that there is no alignment and more than 10GB of raw sequence, or that there is an alignment and at least 2.75GB of new sequence has become available since the last alignment was run.
Exome Sufficient Raw: Has sufficient raw exome data. If the sample is not already complete, this means either that there is no alignment and at least 1GB of raw sequence, or that there is an alignment and at least 500MB of new sequence has become available since the last alignment was run.

The three QC processes that were run are:

Genotype QC: A pre-alignment QC process run for each run_id. The run is subsampled and aligned to the reference, then genotypes are called and compared to known genotypes from the Omni platform.

Short Indel QC: Checks for an overrepresentation of short indels, which may indicate an error in the sequencing process. This check was designed by Li Heng, and the C script can be found in the center_summaries directory.

VerifyBamID: A test from Hyun Min Kang (UMich), http://genome.sph.umich.edu/wiki/VerifyBamID. It is run on the completed alignment files. Like the Genotype QC test, it checks sample identity. It also checks other metrics, such as the number of heterozygous sites and the number of homozygous alternate sites. These measures provide a likelihood that the sequence is contaminated, as opposed to being a sample swap.

A breakdown of the runs that fail these checks for each sample can be found in the center_summaries directory.
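The completion and sufficiency rules above can be expressed as a few small predicates. The Python sketch below is illustrative only, not the DCC's own code. It assumes that x coverage is the number of bases divided by the 2,750,000,000 bp genome size, that "GB" and "MB" mean 10**9 and 10**6 bases, and that the file's header names match the column names listed above exactly. The function names (x_coverage, lc_complete, exome_complete, lc_sufficient_raw, exome_sufficient_raw, read_summary) are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, Optional

# Genome size the DCC uses for coverage figures (stated in this README).
GENOME_SIZE = 2_750_000_000

# Assumption: "GB" and "MB" in the README mean 10**9 and 10**6 bases.
GB = 10**9
MB = 10**6


def x_coverage(bases: int, genome_size: int = GENOME_SIZE) -> float:
    """Approximate x coverage: number of bases divided by the genome size."""
    return bases / genome_size


def lc_complete(non_dup_coverage: float) -> bool:
    """Low coverage is complete above 2.7x non-duplicated coverage."""
    return non_dup_coverage > 2.7


def exome_complete(pct_targets_20x: float) -> bool:
    """Exome is complete when the Percent of Targets >20x value exceeds 0.7."""
    return pct_targets_20x > 0.7


def lc_sufficient_raw(complete: bool, has_alignment: bool,
                      raw_bases: int, new_bases: int) -> Optional[bool]:
    """LC Sufficient Raw rule; returns None for samples already complete,
    since the README only defines the rule for incomplete samples."""
    if complete:
        return None
    if has_alignment:
        # At least 2.75 GB of new sequence since the last alignment.
        return new_bases >= 2.75 * GB
    # No alignment yet: more than 10 GB of raw sequence.
    return raw_bases > 10 * GB


def exome_sufficient_raw(complete: bool, has_alignment: bool,
                         raw_bases: int, new_bases: int) -> Optional[bool]:
    """Exome Sufficient Raw rule; returns None for samples already complete."""
    if complete:
        return None
    if has_alignment:
        # At least 500 MB of new sequence since the last alignment.
        return new_bases >= 500 * MB
    # No alignment yet: at least 1 GB of raw sequence.
    return raw_bases >= 1 * GB


def read_summary(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per sample row from the tab-delimited summary text."""
    yield from csv.DictReader(io.StringIO(text), delimiter="\t")


if __name__ == "__main__":
    # Hypothetical two-row excerpt; header names assumed to match the README.
    sample_text = (
        "Sample\tNon Dup Coverage\tPercent of Targets >20x\n"
        "SAMPLE_A\t3.1\t0.82\n"
        "SAMPLE_B\t2.2\t0.55\n"
    )
    for row in read_summary(sample_text):
        print(row["Sample"],
              lc_complete(float(row["Non Dup Coverage"])),
              exome_complete(float(row["Percent of Targets >20x"])))
```

Run on the made-up excerpt, this prints `SAMPLE_A True True` and `SAMPLE_B False False`. Note that 2.75GB of new sequence corresponds to 1x of the stated genome size, so under these assumptions the LC rule asks for roughly one extra x of coverage before realigning.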