ONT RNAseq of DF15: multiple myeloma cell-line

SRX11528791 DF15_WT_ONT_RNAseq SRP330017 PRJNA749325

We performed basecalling on the raw fast5 data using Guppy (v) from Oxford Nanopore Technologies (guppy_basecaller compress-fastq -c dna_r9.4.1_450bps_hac.cfg -x cuda:1) in GPU mode on a GTX 1080 Ti graphics card.

For each read, we identified the barcode and UMI sequence by searching for the polyA region and the flanking regions before and after the barcode/UMI. Accurately sequenced barcodes were identified based on their dual nucleotide complementarity. These unambiguous barcodes were then used as a guide to error-correct the ambiguous barcodes in a second-pass correction. We performed fuzzy searching using a Levenshtein distance of 4 (unless otherwise stated in the figure legend) and replaced each original ambiguous barcode with the matching unambiguous sequence.

A whitelist of barcodes was then generated using UMI-tools whitelist (umi_tools whitelist --bc-pattern=CCCCCCCCCCCCCCCCCCCCCCCCNNNNNNNNNNNNNNNN --set-cell-number=1000) [3]. This whitelist was used to assess the quality of our cell-to-read-count ratio and as input for UMI-tools extract. Next, the barcode and UMI sequence of each read were extracted and placed within the read2 header using UMI-tools extract (umi_tools extract --bc-pattern=CCCCCCCCCCCCCCCCCCCCCCCCNNNNNNNNNNNNNNNN --whitelist=whitelist.txt).

Reads were then aligned with minimap2 [10] (-ax splice -uf --MD --sam-hit-only --junc-bed) to the reference transcriptomes for human (hg38) and mouse (mm10). The resulting SAM file was converted to a BAM file, which was then sorted and indexed using samtools [11]. The transcript name was then added as an XT tag within the BAM file using pysam.

SRS9566055 DF15 DF15_WT_ONT_RNAseq RNA-Seq TRANSCRIPTOMIC SINGLE CELL cDNA PromethION
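
The second-pass barcode correction described in the methods above can be illustrated with a minimal sketch. It is not the authors' code: the function names `levenshtein` and `correct_barcode` are hypothetical, the toy barcodes are shorter than real ones, and the rule of leaving tied matches uncorrected is an assumption, since the record does not say how ties were handled.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows of the table as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def correct_barcode(ambiguous: str,
                    trusted: Iterable[str],
                    max_distance: int = 4) -> Optional[str]:
    """Return the single closest trusted barcode within max_distance.

    Assumption (not stated in the record): if two different trusted
    barcodes are equally close, the read is left uncorrected (None).
    """
    best = None
    best_distance = max_distance + 1
    tied = False
    for candidate in trusted:
        distance = levenshtein(ambiguous, candidate)
        if distance < best_distance:
            best, best_distance, tied = candidate, distance, False
        elif distance == best_distance and candidate != best:
            tied = True
    if best is None or tied:
        return None
    return best


if __name__ == "__main__":
    # Toy 8-nt barcodes, for illustration only.
    trusted_barcodes = ["ACGTACGT", "TTTTGGGG"]
    print(correct_barcode("ACGTACGA", trusted_barcodes))
```

This brute-force comparison against every trusted barcode is quadratic in practice; for a full dataset an indexed or banded search would likely be needed, but the replacement logic stays the same.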