identifier PRJEB31736
type bioproject
sameAs
organism
title 30X whole genome sequencing coverage of the 2504 Phase 3 1000 Genome samples.
description We sequenced all 2,504 samples from the 1000 Genomes (1KG) Project to a minimum of 30x mean genome coverage. Although a small number of 1KG samples had previously been sequenced to high coverage, we sequenced all samples to high depth on the latest technology, providing a unified dataset for the next phase of analyses. We processed these samples using the laboratory processes we previously used for the CCDG project, with minor modifications. Specifically, we generated PCR-free sequencing libraries with unique dual indices to avoid index switching, which causes low-level sequencing data contamination on Illumina patterned flow cells. We sequenced these samples on the Illumina NovaSeq 6000 instrument with 2x150bp reads. We believe this instrument represents the future of short-read WGS, and it was important to sequence the 1KG samples in a format consistent with future large-scale sequencing projects. Our automated whole genome sequencing analysis pipeline matches the CCDG and TOPMed recommended best practices. Sequencing reads were aligned to the human reference hs38DH using BWA-MEM v0.7.15. Data were further processed following the GATK best practices (v3.5), producing VCF files in VCF 4.2 format. Single nucleotide variants and indels were called with GATK HaplotypeCaller (v3.5), which generates a single-sample GVCF. Variant Quality Score Recalibration (VQSR) was performed using dbSNP138 so that per-variant quality metrics can be used in downstream variant filtering. Additional information and links to data can be found at http://www.internationalgenome.org/data-portal/data-collection/30x-grch38
data type Other
organization
publication
properties 
{...}
dbXrefs
sra-run ERR3239276, ERR3239277, ERR3239278, ERR3239279, ERR3239280, ERR3239281, ERR3239282, ERR3239283, ERR3239284, ERR3239285, …
sra-submission ERA1783081, ERA1783410, ERA1783865, ERA1784271, ERA1784521, ERA2128006, ERA2128919, ERA2128931, ERA2128933, ERA2128941, …
biosample SAMN00797023, SAMN00797025, SAMN00797044, SAMN00797054, SAMN00797126, SAMN00797154, SAMN00797406, SAMN00797419, SAMN00800258, SAMN00800266, …
sra-study ERP114329
sra-sample SRS000030, SRS000031, SRS000032, SRS000033, SRS000034, SRS000035, SRS000037, SRS000038, SRS000039, SRS000040, …
sra-experiment ERX3266651, ERX3266652, ERX3266653, ERX3266654, ERX3266655, ERX3266656, ERX3266657, ERX3266658, ERX3266659, ERX3266660, …
distribution JSON, JSON-LD
download bioproject.xml (HTTPS, FTP)
status public
visibility unrestricted-access
dateCreated 2019-03-16T00:00:00Z
dateModified 2019-03-16T00:00:00Z
datePublished
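The description states that unique dual indices were used to avoid index switching on patterned flow cells. The toy demultiplexer below is a minimal sketch of why that helps, not Illumina's actual software: the index sequences, sample names and sheet layout are hypothetical, and matching is exact (real demultiplexers usually tolerate a mismatch). Because no i7 or i5 sequence is reused across samples, a read whose index was switched carries a pair that belongs to no sample, so it is set aside instead of being silently assigned elsewhere.

```python
from typing import Dict, Optional, Tuple

IndexPair = Tuple[str, str]  # (i7, i5)

# Hypothetical unique-dual-index sheet: every i7 and every i5 appears only once.
UDI_SHEET: Dict[IndexPair, str] = {
    ("ATTACTCG", "TATAGCCT"): "sample_A",
    ("TCCGGAGA", "ATAGAGGC"): "sample_B",
}

# Hypothetical combinatorial sheet for contrast: sample_C reuses the i7 of
# sample_A and the i5 of sample_B.
COMBINATORIAL_SHEET: Dict[IndexPair, str] = {
    **UDI_SHEET,
    ("ATTACTCG", "ATAGAGGC"): "sample_C",
}


def assign_read(i7: str, i5: str, sheet: Dict[IndexPair, str]) -> Optional[str]:
    """Return the sample owning the observed (i7, i5) pair, or None if no sample does."""
    return sheet.get((i7, i5))


# A sample_A molecule whose i5 was switched to sample_B's i5:
swapped = ("ATTACTCG", "ATAGAGGC")
print(assign_read(*swapped, UDI_SHEET))            # None: the read is discarded
print(assign_read(*swapped, COMBINATORIAL_SHEET))  # sample_C: silent contamination
```

With combinatorial indexing the same switch produces a valid pair, which is exactly the low-level cross-sample contamination the description refers to.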
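The description specifies a minimum of 30x mean genome coverage from 2x150bp reads. A minimal sketch of the arithmetic behind that target uses the standard relation mean depth = (read pairs × 2 × read length) / genome size. The 3.1 Gb genome size is an assumption for illustration (the record does not state one), and the function names are hypothetical. Duplicates, unmapped reads and adapter trimming are ignored, so the result is a lower bound on the sequencing actually needed.

```python
import math

# Assumed genome size for illustration only; the record does not state one.
ASSUMED_GENOME_SIZE_BP = 3.1e9


def read_pairs_for_coverage(target_depth: float, genome_size_bp: float,
                            read_length_bp: int = 150) -> int:
    """Approximate read pairs needed to reach target_depth mean coverage.

    Each pair contributes 2 * read_length_bp sequenced bases. Losses from
    duplicates, unmapped reads and trimming are ignored (lower bound).
    """
    bases_needed = target_depth * genome_size_bp
    bases_per_pair = 2 * read_length_bp
    # Round up: sequencing is done in whole read pairs.
    return math.ceil(bases_needed / bases_per_pair)


def mean_depth(read_pairs: int, genome_size_bp: float, read_length_bp: int = 150) -> float:
    """Inverse relation: mean depth delivered by a given number of read pairs."""
    return read_pairs * 2 * read_length_bp / genome_size_bp


pairs = read_pairs_for_coverage(30, ASSUMED_GENOME_SIZE_BP)
print(pairs)                                      # 310000000
print(mean_depth(pairs, ASSUMED_GENOME_SIZE_BP))  # 30.0
```

Under these assumptions, 30x corresponds to roughly 310 million read pairs (about 93 Gb of sequence) per sample before any losses.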
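The description notes that HaplotypeCaller produces a single-sample GVCF. Besides variant sites, a GATK 3.x GVCF records non-variant stretches as reference blocks: records whose ALT column holds only the symbolic `<NON_REF>` allele and whose INFO field carries an `END` key marking the block's last position. The sketch below extracts those blocks and totals the positions they cover; the helper names and example lines are hypothetical, and it assumes an uncompressed, tab-separated GVCF.

```python
from typing import Iterable, Optional, Tuple


def reference_block(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, end) for a GVCF reference block, else None."""
    if line.startswith("#"):
        return None  # header or meta-information line
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt, info = fields[0], int(fields[1]), fields[3], fields[4], fields[7]
    if alt != "<NON_REF>":
        return None  # variant site: ALT lists real alleles as well as <NON_REF>
    end = pos + len(ref) - 1  # VCF default when INFO has no END key
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "END":
            end = int(value)
    return chrom, pos, end


def reference_bases(lines: Iterable[str]) -> int:
    """Total positions covered by reference blocks (coordinates are inclusive)."""
    total = 0
    for line in lines:
        block = reference_block(line)
        if block is not None:
            _chrom, start, end = block
            total += end - start + 1
    return total
```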
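The description says VQSR was run so that per-variant quality metrics can be used in downstream filtering. In GATK 3.x, applying the recalibration writes a `VQSLOD` score into each record's INFO field and sets FILTER to `PASS` or to a tranche name such as `VQSRTrancheSNP99.90to100.00`. A minimal filtering sketch over VCF 4.2 data lines follows, assuming an uncompressed, tab-separated file; the helper names, example records and optional `VQSLOD` threshold are hypothetical rather than part of the described pipeline.

```python
from typing import Dict, NamedTuple, Optional


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    filter_status: str
    info: Dict[str, str]


def parse_record(line: str) -> VcfRecord:
    """Parse the eight fixed columns of one VCF 4.2 data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, filt, info_str = fields[:8]
    info: Dict[str, str] = {}
    if info_str != ".":
        for entry in info_str.split(";"):
            key, _, value = entry.partition("=")
            info[key] = value  # flag entries (no '=') map to ""
    return VcfRecord(chrom, int(pos), ref, alt, filt, info)


def passes_vqsr(rec: VcfRecord, min_vqslod: Optional[float] = None) -> bool:
    """Keep records VQSR marked PASS, optionally also requiring VQSLOD >= min_vqslod."""
    if rec.filter_status != "PASS":
        return False
    if min_vqslod is None:
        return True
    vqslod = rec.info.get("VQSLOD")
    return vqslod is not None and float(vqslod) >= min_vqslod
```

Relying on FILTER alone keeps the tranche cut chosen when VQSR was applied; the optional threshold only tightens it further.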