ERP127787 PRJEB43819 ena-STUDY-BOEHRINGER INGELHEIM PHARMA GMBH & CO KG-19-03-2021-20:14:44:188-65 scRNAX: cross-species transfer of high quality 3'UTR annotation for single cell RNA-Seq Background: Unlike model organisms, which have been extensively characterized and have comprehensive gene annotation, non-model organisms are less well studied and suffer from significantly poorer gene annotation, especially in the untranslated regions (UTRs) of their genes. The incomplete or shortened UTR annotation in these organisms, combined with the 3' bias in the read coverage of many single cell RNA-seq protocols, leads to missing reads, underestimation of gene expression values, and problems in the downstream analysis of scRNA-seq datasets from these non-model organisms. Results: We demonstrated that analyzing human scRNA-seq datasets using simulated gene annotation with shortened 3'UTRs resulted in a significant reduction in the percentage of mappable reads and in the number of genes and UMIs detected, which in turn compromised the downstream analysis, especially the identification of cell type markers. Moreover, we generated matched scRNA-seq and bulk RNA-seq data from pig retina, a sample type derived from a non-model organism with poorer UTR annotation. We observed considerable underestimation of gene abundance from the scRNA-seq datasets compared with abundance quantification from the bulk RNA-seq data. In the current study, we developed the scRNAX workflow to improve UTR annotation using comparative genomics approaches, which can be further enhanced by de novo assembly from matched bulk RNA-seq data if available. Applying this workflow to the matched pig datasets resulted in significant improvement in the downstream analysis. Conclusions: We present the scRNAX workflow, a method to improve UTR annotation in poorly annotated non-model organisms, which in turn improves the corresponding scRNA-seq analysis outcomes and facilitates downstream biological interpretation.
scRNAX_workflow ENA-FIRST-PUBLIC 2021-10-28 ENA-LAST-UPDATE 2021-10-28