This represents a summary of the current state of play for sequence at the DCC in the 1000 genomes project.

This is based on the 20111114 alignments and the 20120419 sequence index. Coverage assumes a genome size of 2.75 Gb.

Samples are considered complete when they have more than 2.7x non-duplicated coverage for low coverage and more than 70% of exome targets covered to 20x or higher.

This is the current state of play

There are 1721 samples in the phase 1 and 2 assignments
1719 have some data at the DCC
2 samples currently have no data
    HG00867, data from BGI failed the VerifyBamID QC for both exome and low coverage
    HG02128, no low coverage from MPIMG and no exome from Baylor

1336 are complete for both low coverage and exome
32 are complete for low coverage only
217 are complete for exome only

The files in this folder all have the format

Sample
Population
Phase, whether the sample was part of phase 1 or phase 2
Low Coverage Center
LC Meta Data Present, low coverage meta data for the sample is present in the EBI SRA
BP in ENA
BP at DCC
Total Bases Aligned in 20111114
Raw Aligned Coverage
Coverage at Omni sites, this is from the VerifyBamID QC and I believe considers non-duplicated bases at the OMNI sites
Non Dup Coverage
Exome Center
Exome Meta Data Present, exome meta data for the sample is present in the EBI SRA
BP at ENA
BP at DCC
Total Aligned
Percent of Targets >20x, this is from Picard's CalculateHsMetrics

The files themselves are subdivided in several different ways.

20120424.sample_summary.txt contains a line per sample for all samples.

The first part of each filename represents the center that did the sequencing, then you have exome or low coverage to represent the two analysis groups, and finally there is a word describing the status of the samples the file contains:

complete, the samples meet the completion criteria
no_data, the samples have no data at the DCC
data_no_alignment, the samples look to have sufficient data but no alignment has been made yet
sufficient_raw, the samples look to have sufficient raw data but the previous alignment didn't consider all of it and as such is incomplete
insufficient_has_alignment, the samples have been aligned but did not meet the coverage criteria, and no new data is yet available
insufficient_new, the samples have not yet been aligned and do not reach 10 Gb of raw data for low coverage
all_failed, data was submitted but none of it passed the DCC QC
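
The completion criteria described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the script the DCC used. The function names and the returned labels ("both", "low_coverage_only", "exome_only", "neither") are hypothetical, and the sketch assumes the Percent of Targets >20x value is expressed on a 0 to 100 scale.

```python
# Sketch of the completion criteria from this README.
# All names and labels below are hypothetical, chosen for illustration.

GENOME_SIZE_BP = 2.75e9            # genome size assumed for coverage
MIN_LC_NON_DUP_COVERAGE = 2.7      # low coverage: more than 2.7x non dup coverage
MIN_EXOME_PCT_TARGETS_20X = 70.0   # exome: more than 70% of targets at 20x (0-100 scale assumed)


def coverage(bases, genome_size=GENOME_SIZE_BP):
    """Fold coverage for a number of bases, given the assumed genome size."""
    return bases / genome_size


def low_coverage_complete(non_dup_coverage):
    """True when non duplicated coverage is strictly above the threshold."""
    return non_dup_coverage > MIN_LC_NON_DUP_COVERAGE


def exome_complete(pct_targets_20x):
    """True when the percent of targets at 20x is strictly above the threshold."""
    return pct_targets_20x > MIN_EXOME_PCT_TARGETS_20X


def completion_status(non_dup_coverage, pct_targets_20x):
    """Combine both checks into one label per sample."""
    lc_done = low_coverage_complete(non_dup_coverage)
    exome_done = exome_complete(pct_targets_20x)
    if lc_done and exome_done:
        return "both"
    if lc_done:
        return "low_coverage_only"
    if exome_done:
        return "exome_only"
    return "neither"
```

For example, 11 Gb of non-duplicated aligned bases gives coverage(11e9) = 4.0x, which passes the low coverage check, while a sample at exactly 2.7x does not, because the criterion is "more than".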