identifier PRJEB14744
type bioproject
sameAs
organism
title Additional sequencing coverage for the study by Jolma et al. 2013, "DNA-binding specificities of human transcription factors". This new deeper sequencing data has been used in the publication "Transcription factor family-specific DNA shape readout revealed by quantitative specificity models"
description Here we provide deeper sequencing coverage for the massively parallel sequencing libraries generated in the HT-SELEX experiments described in our 2013 Cell article "DNA-binding specificities of human transcription factors" (Jolma et al. 2013; PMID:23332764). The DNA fragments produced by the experiments were re-pooled with a lower degree of multiplexing (~55 samples per lane vs. ~800 samples in the original study), and these libraries were then sequenced on the Illumina HiSeq 2000 system. We note that the PWM models and all other analyses published in the original article were generated using the earlier data, which is also available through ENA under accession PRJEB3289. Although the data in this study is based on the same experiments and thus leads to very similar results, it has not been scrutinized as extensively as our earlier data and may contain new artifacts derived from, e.g., the different multiplexing scheme. Samples consist of single-read sequencing of synthetic DNA fragments with a fixed-length randomized region, or of samples derived from such an initial library by selection with a sequence-specific DNA-binding protein. Originally, multiple samples with different "barcode" tag sequences were run on the same Illumina sequencing lane, but the released files have already been de-multiplexed, and the constant regions and "barcodes" of each sequence have been removed from the sequencing reads to facilitate use of the data. Barcodes and oligonucleotide designs are indicated in the names of individual entries. Depending on the selection ligand design, the sequences in each of these FASTQ files are 14, 20, 30 or 40 bases long and had different flanking regions on both sides of the sequence.
The names of the sequencing result files are the same as in the previous data for the same experiments and selection cycles, except that the letters ES (Extended Sequencing) have been added to the "experimental batch" field to distinguish them from the original data. The run entries are named in either of the following ways: Example 1) "BCL6B_DBD_ESAC_TGCGGG20NGA_1", where the name is composed of the fields ProteinName_CloneType_Batch_BarcodeDesign_SelectionCycle. This experiment used barcode ligand TGCGGG20NGA, where both of the variable flanking constant regions are indicated as they appeared in the original sequence reads. This ligand has been selected for one round of HT-SELEX using recombinant protein that contained the DNA-binding domain of human transcription factor BCL6B. The name also indicates that the file is based on Extended Sequencing ("ES") of an experiment that was performed in an original batch of experiments named "AC". Example 2) "ES0_TGCGGG20NGA_0", where the name is composed of ES(zero)_BarcodeDesign_(zero). These sequences have been generated by extended sequencing of the initial non-selected pool. The same initial pools have been used in multiple experiments that were in different batches; for example, this background sequence pool is the shared background for all of the following samples: BCL6B_DBD_ESAC_TGCGGG20NGA_1, ZNF784_full_ESAE_TGCGGG20NGA_3, DLX6_DBD_ESY_TGCGGG20NGA_4 and MSX2_DBD_ESW_TGCGGG20NGA_2. This new deeper sequencing data has been used in the publication "Transcription factor family-specific DNA shape readout revealed by quantitative specificity models" by Yang et al. 2016, http://msb.embopress.org/content/13/2/910. Pre-processed data as described by Yang et al. is available at http://rohslab.cmb.usc.edu/MSB2017/
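The naming scheme described above can be illustrated with a minimal Python sketch that splits a run name into its fields and derives the name of the matching non-selected background pool. The names `RunName`, `parse_run_name` and `background_name` are hypothetical helpers introduced here for illustration, not part of any released tool. The sketch assumes that the last four fields never contain underscores and that the batch field of every selected sample starts with "ES".

```python
from typing import NamedTuple, Optional


class RunName(NamedTuple):
    """Fields of a run name; the protein fields are None for the initial pool."""
    protein: Optional[str]
    clone_type: Optional[str]
    original_batch: Optional[str]  # batch without the "ES" prefix, e.g. "AC"
    barcode_design: str            # e.g. "TGCGGG20NGA"
    cycle: int                     # 0 for the non-selected initial pool


def parse_run_name(name: str) -> RunName:
    """Parse either naming form described in the record (illustrative only)."""
    parts = name.split("_")
    # Form 2: ES0_BarcodeDesign_0 (extended sequencing of the initial pool)
    if len(parts) == 3 and parts[0] == "ES0":
        if parts[2] != "0":
            raise ValueError(f"initial pool must have cycle 0: {name!r}")
        return RunName(None, None, None, parts[1], 0)
    # Form 1: ProteinName_CloneType_Batch_BarcodeDesign_SelectionCycle
    # Splitting from the right is an assumption that tolerates underscores
    # in the protein name only.
    fields = name.rsplit("_", 4)
    if len(fields) != 5:
        raise ValueError(f"unrecognized run name: {name!r}")
    protein, clone_type, batch, design, cycle = fields
    if not batch.startswith("ES"):
        raise ValueError(f"batch field lacks the ES prefix: {name!r}")
    return RunName(protein, clone_type, batch[2:], design, int(cycle))


def background_name(run: RunName) -> str:
    """Name of the shared initial pool for a given barcode design."""
    return f"ES0_{run.barcode_design}_0"


run = parse_run_name("BCL6B_DBD_ESAC_TGCGGG20NGA_1")
print(run)
print(background_name(run))
```

Because the background pool is keyed only on the barcode design, all four samples listed above map to the same background entry regardless of their batch.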
data type Other
organization
publication
DNA-binding specificities of human transcription factors.
Transcription factor family-specific DNA shape readout revealed by quantitative specificity models.
properties 
{...}
dbXrefs
sra-run  ERR1535464, ERR1535465, ERR1535466, ERR1535467, ERR1535468, ERR1535469, ERR1535470, ERR1535471, ERR1535472, ERR1535473, ...
sra-submission  ERA675819, ERA675852, ERA675853
biosample  SAMEA4082829, SAMEA4082830, SAMEA4082831, SAMEA4082832, SAMEA4082833, SAMEA4082834, SAMEA4082835, SAMEA4082836, SAMEA4082837, SAMEA4082838, ...
sra-study  ERP016411
sra-sample  ERS1253939, ERS1253940, ERS1253941, ERS1253942, ERS1253943, ERS1253944, ERS1253945, ERS1253946, ERS1253947, ERS1253948, ...
sra-experiment  ERX1606292, ERX1606293, ERX1606294, ERX1606295, ERX1606296, ERX1606297, ERX1606298, ERX1606299, ERX1606300, ERX1606301, ...
distribution JSON, JSON-LD
Download
bioproject.xml  HTTPS FTP
status public
visibility unrestricted-access
dateCreated 2017-02-06T00:00:00Z
dateModified 2017-02-06T00:00:00Z
datePublished