This readme describes the update which occurred on the 10th of September.

4213 sites at which some individuals did not have a proper genotype have been removed from this v5 release.
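The README does not define "proper genotype". As a minimal sketch, assuming it means a GT call with a missing allele (`.`) or a call that is not diploid on the autosomes (chr1-22), such sites could be flagged in an uncompressed VCF as follows. The function name and example records are hypothetical; this is not the consortium's filtering code.

```python
import re


def sites_with_improper_genotypes(vcf_lines):
    """Return (CHROM, POS) for records where any sample has an improper GT.

    Assumption (not stated in the release notes): "improper" means a GT
    with a missing allele ('.') or a call that is not diploid.
    """
    flagged = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        for sample in fields[9:]:
            gt = sample.split(":")[0]  # GT, when present, is the first FORMAT sub-field
            alleles = re.split(r"[/|]", gt)
            if len(alleles) != 2 or "." in alleles:
                flagged.append((fields[0], int(fields[1])))
                break  # one bad sample is enough to flag the site
    return flagged


records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1",
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0|0\t.|.",
    "1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0\t0|1",
]
print(sites_with_improper_genotypes(records))  # [('1', 200), ('1', 300)]
```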

The old call set has been moved to ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20140708_previous_phase3/v4_vcfs/

We have updated the known issues README to cover the outstanding issues that are still present in the dataset:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/README_known_issues_20140910

This directory contains the final variant call set with phased genotypes for chr1-22 
based on the phase3 analysis of the 1000 Genomes sequence data.

We have also removed the genotypes of 31 individuals who are related by blood to individuals among the 2504 samples in the main release.
The 31 related samples are listed in 20140625_related_individuals.txt. As a result, a small number of sites have AC=0 because their
rare alleles were present only in one or more of these 31 individuals. The related individuals were removed to avoid over-counting allele frequencies.
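To illustrate why dropping the related individuals can leave a site with AC=0, the sketch below recomputes alternate allele counts from the GT field of the remaining samples and reports sites where no ALT allele is observed. It assumes an uncompressed VCF with a `#CHROM` header line and GT as the first FORMAT sub-field; the function name and the sample IDs (`S1`, `S2`, `REL1`) are hypothetical. In practice the IDs to exclude would come from 20140625_related_individuals.txt.

```python
import re


def ac_zero_sites(vcf_lines, removed_samples):
    """Return (CHROM, POS) of records whose ALT alleles are all unobserved
    once the samples in removed_samples are excluded."""
    kept_columns = []
    zero_sites = []
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9; keep those not being removed.
            kept_columns = [i for i, name in enumerate(fields)
                            if i >= 9 and name not in removed_samples]
            continue
        counts = [0] * len(fields[4].split(","))  # one count per ALT allele
        for i in kept_columns:
            gt = fields[i].split(":")[0]
            for allele in re.split(r"[/|]", gt):
                if allele not in (".", "0"):
                    counts[int(allele) - 1] += 1
        if all(c == 0 for c in counts):
            zero_sites.append((fields[0], int(fields[1])))
    return zero_sites


vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tREL1",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|0\t0|1\t0|0",
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0|0\t0|0\t0|1",  # ALT seen only in REL1
    "1\t300\t.\tG\tA,T\t.\tPASS\t.\tGT\t0|2\t0|0\t1|1",
]
print(ac_zero_sites(vcf, {"REL1"}))  # [('1', 200)]
```

In this sketch a multiallelic site is reported only when every ALT allele has a count of zero.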

The genotypes of these 31 related individuals can be found at: 
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/related_samples_vcf

This data set is based on the 20130502 sequence freeze and alignments. The analysis used only
Illumina platform sequence and only considered sequence with read lengths of 70bp or greater.

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/sequence_indices/20130502.analysis.sequence.index
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/alignment_indices/20130502.exome.alignment.index
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/alignment_indices/20130502.low_coverage.alignment.index

This variant set contains 2504 individuals from 26 populations. 

The list of all the samples in the data set and their population, super population and gender can 
be found in the file:

integrated_call_samples_v3.20130502.ALL.panel
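As a minimal sketch, assuming the panel file is tab-separated with a header row naming `sample`, `pop`, `super_pop` and `gender` columns (check the file itself before relying on this), the number of samples per super population could be tallied as follows. The rows in the example are made up for illustration.

```python
import csv
import io
from collections import Counter


def super_population_counts(panel_text):
    """Count samples per super population in panel-style text.

    Assumes a tab-separated header row that contains a 'super_pop' column.
    """
    reader = csv.DictReader(io.StringIO(panel_text), delimiter="\t")
    return Counter(row["super_pop"] for row in reader)


example_panel = (
    "sample\tpop\tsuper_pop\tgender\n"
    "SAMPLE_A\tGBR\tEUR\tmale\n"
    "SAMPLE_B\tGBR\tEUR\tfemale\n"
    "SAMPLE_C\tCHB\tEAS\tfemale\n"
)
print(super_population_counts(example_panel))  # Counter({'EUR': 2, 'EAS': 1})
```

For the real file, the text could be read with `open("integrated_call_samples_v3.20130502.ALL.panel").read()`; since the panel lists every sample in the data set, the counts should sum to 2504.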

This file was updated from v2 to v3 on the 9th of September, when the consortium switched from ASN to EAS to
refer to East Asian populations and from SAN to SAS to refer to South Asian populations.
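Results labelled with the v2 super population codes could be relabelled with a small lookup like the one below. It covers only the two renames described in this README; the table and function names are hypothetical.

```python
# Super population codes renamed between the v2 and v3 panel files.
LEGACY_TO_CURRENT = {"ASN": "EAS", "SAN": "SAS"}


def current_super_pop(code):
    """Map a v2-era super population code to its v3 equivalent.

    Codes that were not renamed (e.g. EUR) are returned unchanged.
    """
    return LEGACY_TO_CURRENT.get(code, code)


print([current_super_pop(c) for c in ["ASN", "SAN", "EUR"]])  # ['EAS', 'SAS', 'EUR']
```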

A full record of all the sample relationships is listed in the PED file. Please note that this file also lists
other individuals who are part of the Coriell catalog of cell lines.

integrated_call_samples.20130502.ALL.ped
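A minimal sketch for extracting parent-child links from the PED file, assuming its first six tab-separated columns follow the standard PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype), that `0` marks an unknown parent, and that the file starts with a single header row. Any additional columns are ignored here, and the IDs in the example are hypothetical.

```python
def parent_child_pairs(ped_text, has_header=True):
    """Return (parent, child) tuples from PED-formatted text.

    Assumes standard PED column order and '0' for unknown parents.
    """
    lines = ped_text.splitlines()
    if has_header:
        lines = lines[1:]  # drop the assumed header row
    pairs = []
    for line in lines:
        if not line.strip():
            continue
        cols = line.split("\t")
        child, father, mother = cols[1], cols[2], cols[3]
        for parent in (father, mother):
            if parent != "0":
                pairs.append((parent, child))
    return pairs


example_ped = (
    "Family ID\tIndividual ID\tPaternal ID\tMaternal ID\tGender\tPhenotype\n"
    "FAM1\tCHILD1\tDAD1\tMOM1\t1\t0\n"
    "FAM1\tDAD1\t0\t0\t1\t0\n"
)
print(parent_child_pairs(example_ped))  # [('DAD1', 'CHILD1'), ('MOM1', 'CHILD1')]
```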

This variant set represents the work of a large number of people and groups. Full details of how it was
created will be published in the consortium's next paper. Here we give a brief summary and list many of
the tools used.

Many different callers were used to identify the sites in this set. The full set of input VCFs can be found at:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20140502/supporting/input_callsets

A README is in the directory describing each input call set.

SNPs, indels and complex short variants were filtered using the SVM method from UMich (part of the 
GotCloud package) and the SV sites were filtered by the 1000 Genomes SV group.

Biallelic SNPs, indels and large deletions were genotyped using Beagle and phased with the Shapeit2
method from Oxford. MVNCall, also from Oxford, was used to add complex short variants and large
structural variants to the scaffolds built by Shapeit2. The genotype likelihoods estimated by Shapeit2 and MVNCall can be found at:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20140502/supporting/genotype_likelihoods

Here are some basic statistics about the variant sites in this release calculated with bcftools:

The total number of sites in the file is 81271745

The breakdown of these sites is:

number of SNPs:	78136341
number of indels:	3135424
number of others:	58671
number of multiallelic sites:	416023
number of multiallelic SNP sites:	259370
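The figures above were produced with bcftools. To show the kind of tallying involved, here is a simplified Python sketch that classifies each site by its REF/ALT alleles. Its rules are an assumption and will not necessarily match bcftools' exact definitions; in particular, "multiallelic SNP site" is taken here to mean a multiallelic site whose ALT alleles are all single-base substitutions. In this sketch a site carrying both a SNP and an indel allele is counted under both types; the category counts above likewise sum to more than the 81271745 total, which suggests bcftools also counts some sites under more than one type.

```python
from collections import Counter


def allele_kind(ref, alt):
    """Crude classification of one ALT allele against REF (an assumption, not bcftools' rules)."""
    if alt.startswith("<") or alt == "*":
        return "other"  # symbolic or spanning-deletion alleles
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-base substitutions


def summarize(vcf_lines):
    """Tally site types over the data lines of an uncompressed VCF."""
    stats = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.split("\t", 5)  # only CHROM..ALT are needed
        ref, alts = fields[3], fields[4].split(",")
        kinds = [allele_kind(ref, alt) for alt in alts]
        stats["sites"] += 1
        for kind in set(kinds):
            stats[kind] += 1  # a mixed site counts once per type it contains
        if len(alts) > 1:
            stats["multiallelic"] += 1
            if all(kind == "snp" for kind in kinds):
                stats["multiallelic_snp"] += 1
    return stats


lines = [
    "1\t100\t.\tA\tG\t.\tPASS\t.",
    "1\t200\t.\tA\tAT\t.\tPASS\t.",
    "1\t300\t.\tC\tT,G\t.\tPASS\t.",
    "1\t400\t.\tG\tGA,T\t.\tPASS\t.",
    "1\t500\t.\tA\t<CN0>\t.\tPASS\t.",
]
stats = summarize(lines)
print(stats["sites"], stats["snp"], stats["indel"], stats["other"])  # 5 3 2 1
```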