ERP114153 PRJEB31582 ena-STUDY-SC-06-03-2019-15:25:12:477-237

Erpetoichthys calabaricus (reedfish)

fErpCal1.1 alternate haplotype

This project provides the genome assembly of Erpetoichthys calabaricus, common name reedfish, based on a sample provided by Byrappa Venkatesh.

The assembly fErpCal1.1 is based on ~51x coverage PacBio Sequel data, ~36x coverage Illumina HiSeqX data from a 10X Genomics Chromium library generated at the Wellcome Sanger Institute, BioNano Saphyr DLE data generated at the Rockefeller University Vertebrate Genome Laboratory, and ~69x coverage HiSeqX data from a Hi-C library prepared by Arima Genomics.

An initial PacBio assembly was made using Falcon-unzip. The primary contigs were then scaffolded using the 10X data with scaff10x, scaffolded further with BioNano hybrid scaffolding, and scaffolded further still using the Hi-C data with SALSA2. Polishing and gap-filling of both the primary scaffolds and haplotigs were performed using the PacBio reads and Arrow, followed by two rounds of Illumina polishing using the 10X data and freebayes. Finally, the assembly was manually improved using gEVAL to correct mis-joins, improve concordance with the BioNano and Hi-C data, and remove retained haplotypic duplication using purge_haplotigs. Chromosomes identified from the Hi-C data have been named in order of size.

The assembly is provided by the Wellcome Sanger Institute and Cambridge University team (https://www.sanger.ac.uk/science/data/vertebrate-genomes-sequencing) of the Vertebrate Genomes Project (http://vertebrategenomesproject.org). The data under this project are made available subject to the Genome10K data use policies (https://genome10k.soe.ucsc.edu/data-use-policies).

ENA-FIRST-PUBLIC 2019-03-06
ENA-LAST-UPDATE 2021-01-19
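
The record states that chromosomes identified from the Hi-C data were named in order of size. A minimal sketch of that kind of naming is shown here, assuming "in order of size" means largest first and that names are simple integers starting at 1; neither detail is confirmed by the record. The function name, scaffold identifiers, and lengths are hypothetical and are not taken from fErpCal1.1.

```python
def name_chromosomes_by_size(scaffold_lengths):
    """Map each scaffold ID to a chromosome name, largest scaffold first.

    Assumption: names are "1", "2", ... in descending order of length.
    Ties are broken by scaffold ID so the result is deterministic.
    """
    ranked = sorted(scaffold_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    return {scaffold_id: str(rank)
            for rank, (scaffold_id, _length) in enumerate(ranked, start=1)}


# Hypothetical scaffold lengths in base pairs (illustrative only).
example_lengths = {
    "scaffold_a": 120_000_000,
    "scaffold_b": 310_000_000,
    "scaffold_c": 95_000_000,
}

names = name_chromosomes_by_size(example_lengths)
for scaffold_id, chrom in sorted(names.items(), key=lambda kv: int(kv[1])):
    print(f"chromosome {chrom}: {scaffold_id} ({example_lengths[scaffold_id]} bp)")
```

Breaking ties by scaffold ID is a design choice of this sketch so that repeated runs give the same names; the record does not say how equal-length scaffolds, if any, were handled.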