# README_sv_reference_datasets.md

The Human Genome Structural Variation Consortium is generating data on three trios from the 1000 Genomes sample set.

- Yoruban Trio (YRI)
  - NA19238
  - NA19239
  - NA19240
- Han Chinese Trio (CHS)
  - HG00512
  - HG00513
  - HG00514
- Puerto Rican Trio (PUR)
  - HG00731
  - HG00732
  - HG00733

This readme points to the standard reference datasets used for our analysis.

The reference genome, GRCh38:

[ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/](ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/)

This directory contains the FASTA file for the primary GRCh38 assembly plus alt haplotypes and decoy sequence, all of which are accessioned by GenBank, together with HLA sequence, which was sourced by Heng Li. The HLA sequence comes from the [bwakit package](https://github.com/lh3/bwa/tree/master/bwakit), which uses the [IMGT/HLA database v3.18.0](https://www.ebi.ac.uk/ipd/imgt/hla/docs/version_r3180.html).

This directory also contains GRCh38-mapped SNPs and indels for use in recalibration and indel realignment, and a file with the positions of the modelled centromeric sequence, the chr Y PAR, and the chr 7 heterochromatin.

If you have any questions about this readme, please email igsr-dcc@ebi.ac.uk.
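To see what the reference directory currently holds, its contents can be listed over anonymous FTP. The following is a minimal sketch using only the Python standard library. The helper names `split_ftp_url` and `list_reference_files` are hypothetical and not part of any consortium tooling, and the sketch assumes the server accepts anonymous logins and that the directory path above is still valid.

```python
from ftplib import FTP
from urllib.parse import urlparse

# URL of the GRCh38 reference directory, copied from this readme.
REFERENCE_URL = (
    "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/"
    "GRCh38_reference_genome/"
)


def split_ftp_url(url):
    """Split an ftp:// URL into (host, path). Hypothetical helper."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp":
        raise ValueError(f"expected an ftp:// URL, got: {url}")
    return parsed.hostname, parsed.path


def list_reference_files(url=REFERENCE_URL, timeout=30):
    """Return the entry names in the reference directory.

    Assumes anonymous FTP access is permitted (an assumption, not
    something this readme states). Requires network access.
    """
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # no arguments means an anonymous login
        ftp.cwd(path)    # move into the reference directory
        return ftp.nlst()
```

Calling `list_reference_files()` returns a list of names that can be printed or filtered, for example to pick out the FASTA file before downloading it.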