<?xml version="1.0" encoding="UTF-8"?>
<STUDY_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <STUDY accession="ERP023050" alias="ena-STUDY-Earlham Institute-15-05-2017-16:47:53:185-295" center_name="Earlham Institute">
    <IDENTIFIERS>
      <PRIMARY_ID>ERP023050</PRIMARY_ID>
      <EXTERNAL_ID namespace="BioProject">PRJEB20860</EXTERNAL_ID>
      <SUBMITTER_ID namespace="Earlham Institute">ena-STUDY-Earlham Institute-15-05-2017-16:47:53:185-295</SUBMITTER_ID>
    </IDENTIFIERS>
    <DESCRIPTOR>
      <STUDY_TITLE>We compare several technologies for a plant genome sequencing project and offer best-practice guidance for the non-specialist. We show that one can realistically produce a plant assembly with good gene space and contiguity that is usually good enough for downstream analysis.</STUDY_TITLE>
      <STUDY_TYPE existing_study_type="Other"/>
      <STUDY_ABSTRACT>Since the dawn of high-throughput sequencing, there has been an ever-increasing number of genomes to assemble from all kingdoms of life. More recent years have seen the advent of longer-read sequencing technologies, which offer advantages over short reads for assembling and analysing complex, repeat-rich genomes. This paper explores these new technologies using a repeat-rich plant genome. We offer best-practice guidance for the non-specialist working with these new data types and provide insight into the budget, computational resources, expertise and time required for assembly. We show that one can realistically produce a plant assembly with good gene space and contiguity that is usually good enough for downstream analysis. We assess our genome assemblies by synteny to S. tuberosum, local accuracy using BAC sequences, R-gene space as a measure of difficult-to-assemble regions, gene space using core eukaryotic genes, and contiguity measured by N50.</STUDY_ABSTRACT>
      <CENTER_PROJECT_NAME>A critical comparison of technologies for a plant genome sequencing project</CENTER_PROJECT_NAME>
      <STUDY_DESCRIPTION>Since the dawn of high-throughput sequencing, there has been an ever-increasing number of genomes to assemble from all kingdoms of life. More recent years have seen the advent of longer-read sequencing technologies, which offer advantages over short reads for assembling and analysing complex, repeat-rich genomes. This paper explores these new technologies using a repeat-rich plant genome. We offer best-practice guidance for the non-specialist working with these new data types and provide insight into the budget, computational resources, expertise and time required for assembly. We show that one can realistically produce a plant assembly with good gene space and contiguity that is usually good enough for downstream analysis. We assess our genome assemblies by synteny to S. tuberosum, local accuracy using BAC sequences, R-gene space as a measure of difficult-to-assemble regions, gene space using core eukaryotic genes, and contiguity measured by N50.</STUDY_DESCRIPTION>
    </DESCRIPTOR>
    <STUDY_ATTRIBUTES>
      <STUDY_ATTRIBUTE>
        <TAG>ENA-FIRST-PUBLIC</TAG>
        <VALUE>2018-05-21</VALUE>
      </STUDY_ATTRIBUTE>
      <STUDY_ATTRIBUTE>
        <TAG>ENA-LAST-UPDATE</TAG>
        <VALUE>2017-05-15</VALUE>
      </STUDY_ATTRIBUTE>
    </STUDY_ATTRIBUTES>
  </STUDY>
</STUDY_SET>
