This directory contains files associated with the variant calling carried out for phase1 of the 1000 Genomes Project, along with other ancillary files associated with the phase1 analysis.

The phase1 analysis results directory contains a number of subdirectories with different content. These are listed here.

Ancestry Deconvolution

This directory contains information about the local ancestry inference which has been carried out on the admixed populations found in the 1000 Genomes phase1 samples. These are the African Americans (ASW), Colombians (CLM), Mexicans (MXL) and Puerto Ricans (PUR).

Consensus Call Sets

These directories contain the consensus call sets and genotype likelihoods which were used to produce the final integrated release. Please note that the indel file in this directory still contains indels which were subsequently filtered out of our integrated data release as a result of validation efforts. These can be identified by looking at the excluded_indel_sites directory under supporting.

Experimental Validation

This directory contains information about which sites were validated for the different variant types and the results of the validation processes.

Functional Annotation

This contains two directories: annotation_sets, which contains BED and GTF files describing the gene and non-coding annotation which our variant sets were compared with, and annotation_vcfs, which contains the actual variant annotation in VCF format.

Input Call Sets

This directory contains all the union call sets for the SNPs (both low coverage and exome), indels and deletions that make up the integrated release. The directory contains several VCF files. In each file, any variant whose filter column reads PASS should be part of the integrated release.

Integrated Call Sets

This directory contains our final variant calls for the phase1 data sets.
The majority of the data in this directory is identical to what can be found in ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521 but this directory also holds chrY calls for SNPs and deletions and chrMT calls for SNPs.

shapeit2_phased_haplotypes/

This directory contains our final variant calls on the autosomes, rephased using the SHAPEIT2 algorithm from Olivier Delaneau and Jonathan Marchini.

http://mathgen.stats.ox.ac.uk/impute/impute_v2.html

Paper

This directory contains the PDF files of the Nature paper:

An integrated map of genetic variation from 1092 human genomes
Nature 491, 56–65 (01 November 2012)
doi:10.1038/nature11632

The paper is distributed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported licence. Please share our paper appropriately.

Supporting

ancestral_alignments, Ancestral fasta files based on a 32 way alignment from Ensembl 59, built with the Enredo Pecan Ortheus pipeline
axiom_genotypes, Genotypes from the Affymetrix Axiom platform for 1000 Genomes samples
cryptic_relation_analysis, The results of the Cryptic Relatedness Analysis performed by Jim Nemesh at the Broad Institute
excluded_indel_sites, The list of indels which were excluded from the v3 integrated variant release
exome_pull_down, The target coordinates used for both variant calling and the downstream analysis of the exome data
omni_haplotypes, Genotypes from the Illumina Omni 2.5M Chip for 1000 Genomes individuals
accessible_genome_masks, Mask files defining which regions of the genome are more or less accessible to the next generation methods used by the 1000 Genomes Project
variant_gerp_scores, Conservation scores for all SNP and indel variant sites
highly_differentiated_sites, Excel spreadsheets listing highly differentiated sites, both between super populations and between sub populations within a super population

Many of the files in this directory are VCF files.
This is our VCF file naming convention:

Population.region.description.YYYYMMDD.variant_type.analysis_group.[sites|genotypes|haplotypes].vcf.gz

Population, This gives the 3 letter code for the population. If the file represents all possible individuals in the set, ALL is used
region, This is the chromosome name. The whole genome (sometimes this is just the autosomes and chrX) is wgs. The full exome is wex
description, This is a string which describes the file creation/contents
YYYYMMDD, This is a date in the format year month day. This mostly represents the sequence index date that the variant call set is based on. If the file is not based on our alignments, the date should represent when the file was created
variant_type, This describes what sort of variant the file contains: snps, indels or SVs
analysis_group, This states whether the data is based on low coverage, exome or other strategies
[sites|genotypes|haplotypes], This describes whether the file contains just a sites list or additional information (genotypes or haplotypes)
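
The naming convention above can be split into its fields programmatically. The following Python sketch is illustrative only and is not part of the project's tooling. It assumes a file name carries all seven dot-separated fields and that no field (including description) contains a dot; file names that deviate from the convention return None. The names VcfName and parse_vcf_name, the field name file_date, and the example file name are hypothetical.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional


class VcfName(NamedTuple):
    """Fields of a file name that follows the convention (hypothetical type)."""
    population: str
    region: str
    description: str
    file_date: date
    variant_type: str
    analysis_group: str
    content: str  # one of: sites, genotypes, haplotypes


# Assumption: every field is present and none of them contains a dot.
_PATTERN = re.compile(
    r"^(?P<population>[^.]+)\."
    r"(?P<region>[^.]+)\."
    r"(?P<description>[^.]+)\."
    r"(?P<file_date>\d{8})\."
    r"(?P<variant_type>[^.]+)\."
    r"(?P<analysis_group>[^.]+)\."
    r"(?P<content>sites|genotypes|haplotypes)"
    r"\.vcf\.gz$"
)


def parse_vcf_name(filename: str) -> Optional[VcfName]:
    """Return the parsed fields, or None if the name does not fit the pattern."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        # Reject eight-digit strings that are not real calendar dates.
        fields["file_date"] = datetime.strptime(fields["file_date"], "%Y%m%d").date()
    except ValueError:
        return None
    return VcfName(**fields)


if __name__ == "__main__":
    # Hypothetical file name, made up to match the convention exactly.
    name = "ALL.chr20.example_description.20101123.snps.low_coverage.genotypes.vcf.gz"
    print(parse_vcf_name(name))
```

Returning None instead of raising an exception makes it easy to skip files in a directory listing that do not carry every field.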