This README describes the update which occurred on the 10th of September. 4213 sites at which some individuals did not have a proper genotype have been removed from this v5 release. The old call set has been moved to

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20140708_previous_phase3/v4_vcfs/

We have updated the known issues README to cover any outstanding issues which are still present in the dataset.

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/README_known_issues_20140910

This directory contains the final variant call set with phased genotypes for chr1-22, based on the phase3 analysis of the 1000 Genomes sequence data.

We have also removed the genotypes of 31 individuals who are related by blood to samples among the 2504 in the main release. The 31 related samples are listed in 20140625_related_individuals.txt. This was done to ensure we do not over-count alleles when calculating allele frequencies. As a result, a small number of sites now have AC=0, because their rare alleles were only present in one or more of these 31 individuals. The genotypes of these 31 related individuals can be found at:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/related_samples_vcf

This data set is based on the 20130502 sequence freeze and alignments. The analysis was run using only Illumina platform sequence and only considered sequence with read lengths of 70bp or greater.

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/sequence_indices/20130502.analysis.sequence.index
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/alignment_indices/20130502.exome.alignment.index
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/alignment_indices/20130502.low_coverage.alignment.index

This variant set contains 2504 individuals from 26 populations.
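Some downstream analyses may prefer to skip the AC=0 sites mentioned above. The following is a minimal Python sketch of one way to do that, not part of the release tooling. It assumes uncompressed, tab-delimited VCF text with the standard eight fixed columns and an AC entry in the INFO column. The function names allele_counts and drop_ac_zero and the example records are hypothetical.

```python
def allele_counts(info_field):
    """Parse the AC entry of a VCF INFO column into a list of ints."""
    for entry in info_field.split(";"):
        if entry.startswith("AC="):
            return [int(x) for x in entry[3:].split(",")]
    return []  # no AC entry present


def drop_ac_zero(lines):
    """Yield header lines and records that have at least one non-zero AC.

    Records without an AC entry are kept, since nothing can be said
    about them (an assumption of this sketch).
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
        counts = allele_counts(info)
        if not counts or any(c > 0 for c in counts):
            yield line


# Made-up records for illustration only; not taken from the release.
example_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t100\tPASS\tAC=0;AN=5008",
    "1\t200\t.\tC\tT\t100\tPASS\tAC=5;AN=5008",
    "1\t300\t.\tG\tA,T\t100\tPASS\tAC=0,2;AN=5008",
]
kept = list(drop_ac_zero(example_lines))
```

A multiallelic site is kept here as long as any one of its alternate alleles has a non-zero count.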
The list of all the samples in the data set and their population, super population and gender can be found in the file:

integrated_call_samples_v3.20130502.ALL.panel

This file was moved from v2 to v3 on the 9th September, as the consortium moved from using ASN to EAS to refer to East Asian populations and from SAN to SAS to refer to South Asian populations.

A full record of all the sample relations is listed in the ped file. Please note this also lists other individuals who make up part of the Coriell catalog of cell lines.

integrated_call_samples.20130502.ALL.ped

This variant set represents the work of a large number of people and groups. Full details of how it was created will be published in the consortium's next paper. Here we give a brief summary and list many of the tools used.

Many different callers were used to identify the sites in this set. A full set of input vcfs can be found at

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20140502/supporting/input_callsets

A README in that directory describes each input call set.

SNPs, indels and complex short variants were filtered using the SVM method from UMich (part of the GotCloud package), and the SV sites were filtered by the 1000 Genomes SV group.

Biallelic SNPs, indels and large deletions were genotyped using Beagle and phased with the Shapeit2 method from Oxford. MVNCall, also from Oxford, was used to add complex short variants and large structural variants to the scaffolds built by Shapeit2. The genotype likelihoods estimated by Shapeit2 and MVNCall can be found at

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20140502/supporting/genotype_likelihoods

Here are some basic statistics about the variant sites in this release, calculated with bcftools:

The total number of sites in the file is 81271745

The breakdown of these sites is:

number of SNPs: 78136341
number of indels: 3135424
number of others: 58671
number of multiallelic sites: 416023
number of multiallelic SNP sites: 259370
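To illustrate how a breakdown of this kind can be tallied, here is a minimal Python sketch. It is not the bcftools implementation and uses deliberately simplified rules, which are assumptions of this example: an alternate allele is a SNP when REF and ALT are both one base long, an indel when their lengths differ, and "other" when it is symbolic or a same-length multi-base substitution. The function names classify_site and tally_sites and the example records are hypothetical.

```python
from collections import Counter


def classify_site(ref, alt_field):
    """Return the set of categories a site falls into (simplified rules)."""
    alts = alt_field.split(",")
    kinds = set()
    for alt in alts:
        if alt.startswith("<") or alt == "*":
            kinds.add("other")   # symbolic allele
        elif len(ref) == 1 and len(alt) == 1:
            kinds.add("snp")
        elif len(ref) != len(alt):
            kinds.add("indel")
        else:
            kinds.add("other")   # same-length multi-base substitution
    if len(alts) > 1:
        if kinds == {"snp"}:
            kinds.add("multiallelic_snp")  # every ALT allele is a SNP
        kinds.add("multiallelic")
    return kinds


def tally_sites(lines):
    """Count sites per category from tab-delimited VCF text lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        counts["sites"] += 1
        counts.update(classify_site(fields[3], fields[4]))  # REF, ALT
    return counts


# Made-up records for illustration only; not taken from the release.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t100\tPASS\tAC=1",
    "1\t200\t.\tAT\tA\t100\tPASS\tAC=2",
    "1\t300\t.\tC\tG,T\t100\tPASS\tAC=1,1",
    "1\t400\t.\tG\t<CN0>\t100\tPASS\tAC=3",
]
counts = tally_sites(example)
```

In this sketch a site with alternate alleles of different types counts toward more than one category, so the per-category counts need not sum to the total number of sites.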