# README_sv_reference_datasets.md

The Human Genome Structural Variation Consortium is generating data on three trios 
from the 1000 Genomes sample set.

- Yoruban Trio (YRI)
  - NA19238
  - NA19239
  - NA19240
- Han Chinese Trio (CHS)
  - HG00512
  - HG00513
  - HG00514
- Puerto Rican Trio (PUR)
  - HG00731
  - HG00732
  - HG00733

This readme points to the standard reference datasets used for our analysis.

The reference genome is GRCh38:

[ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/](ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/)

This directory contains the FASTA file for the primary GRCh38 assembly plus ALT haplotypes and decoy sequences, 
all of which are accessioned by GenBank, together with HLA sequences sourced by Heng Li. The HLA sequences are taken 
from the [bwakit package](https://github.com/lh3/bwa/tree/master/bwakit), which uses the [IMGT/HLA database v3.18.0](https://www.ebi.ac.uk/ipd/imgt/hla/docs/version_r3180.html).

This directory also contains SNPs and indels mapped to GRCh38 for use in recalibration and indel realignment, 
and a file with the positions of the modelled centromeric sequence, the chrY pseudoautosomal regions (PAR), and the chr7 heterochromatin.
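
To see which files the reference directory currently holds, one option is to list it over anonymous FTP. The following is a minimal sketch, not part of the consortium's tooling: the helper names `split_ftp_url` and `list_reference_files` are hypothetical, and it assumes the server accepts anonymous logins.

```python
from ftplib import FTP
from urllib.parse import urlparse

# URL of the GRCh38 reference directory given in this readme.
REFERENCE_URL = (
    "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/"
    "GRCh38_reference_genome/"
)


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp":
        raise ValueError(f"expected an ftp:// URL, got {url!r}")
    return parsed.hostname, parsed.path


def list_reference_files(url=REFERENCE_URL, timeout=60):
    """Return the entry names in the directory (needs network access)."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()    # anonymous login; assumed to be allowed by the server
        ftp.cwd(path)  # move into the reference directory
        return ftp.nlst()
```

Calling `list_reference_files()` returns the directory entries as a list of strings, which can then be printed or filtered by file extension. Nothing is downloaded; the function only lists names.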

If you have any questions about this readme, please email igsr-dcc@ebi.ac.uk.