Two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing

Matthew T. Parker (Lead / Corresponding author), Geoffrey J. Barton, Gordon G. Simpson (Lead / Corresponding author)

Research output: Working paper/Preprint › Preprint

Abstract

Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing; however, the relatively high error rates of long-read technologies can reduce the accuracy of intron identification. Here we present a two-pass approach that combines alignment metrics and machine-learning-derived sequence information to filter spurious examples from splice junctions identified in long-read alignments. The remaining junctions are then used to guide realignment. This method, available in the software package 2passtools (https://github.com/bartongroup/2passtools), improves the accuracy of spliced alignment and transcriptome annotation without requiring orthogonal information from short-read RNAseq or existing annotations.
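As an illustration only, the filtering step between the two alignment passes can be sketched in Python. This is not the 2passtools implementation or its API. The `Junction` record, the `filter_junctions` helper, the `min_support` threshold and the motif list are all hypothetical, and a hand-written rule (canonical splice-site dinucleotides plus a minimum read count) stands in for the machine-learning-derived sequence model described in the abstract.

```python
from dataclasses import dataclass

# Assumption: a small set of commonly observed intron boundary dinucleotides
# (donor, acceptor). The real method learns sequence information instead.
CANONICAL_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


@dataclass(frozen=True)
class Junction:
    """Hypothetical record for a splice junction seen in first-pass alignments."""
    chrom: str
    start: int
    end: int
    strand: str
    read_support: int   # number of first-pass reads containing this junction
    donor: str          # first two intronic bases
    acceptor: str       # last two intronic bases


def filter_junctions(junctions, min_support=2):
    """Keep junctions that pass a toy alignment-metric and sequence check.

    Assumption: `min_support` is an arbitrary illustrative threshold.
    """
    kept = []
    for j in junctions:
        canonical = (j.donor, j.acceptor) in CANONICAL_MOTIFS
        if canonical and j.read_support >= min_support:
            kept.append(j)
    return kept


# Toy first-pass output: one well-supported junction, one likely alignment
# artefact offset by 3 bp, and one canonical but singly supported junction.
first_pass = [
    Junction("chr1", 1000, 1500, "+", 12, "GT", "AG"),
    Junction("chr1", 1003, 1500, "+", 1, "CT", "AG"),
    Junction("chr1", 2000, 2600, "+", 1, "GT", "AG"),
]

# The surviving junctions would be passed to the aligner to guide the
# second-pass realignment.
guide = filter_junctions(first_pass)
```

In this sketch only the first junction survives; the two-pass idea is that the second alignment is then steered towards the retained junctions rather than towards every junction seen in the noisy first pass.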
Original language: English
Place of Publication: Cold Spring Harbor Laboratory
Publisher: bioRxiv
Number of pages: 36
Publication status: Published - 30 May 2020
