Two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing

Matthew T. Parker (Lead / Corresponding author), Geoffrey J. Barton, Gordon G. Simpson (Lead / Corresponding author)

Research output: Contribution to specialist publication › Article

Abstract

Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing full-length RNAs using long reads reveals the true complexity of this processing; however, the relatively high error rates of long-read technologies can reduce the accuracy of intron identification. Here we present a two-pass approach that combines alignment metrics and machine-learning-derived sequence information to filter spurious junctions from the splice junctions identified in long-read alignments. The remaining junctions are then used to guide realignment. This method, available in the software package 2passtools (https://github.com/bartongroup/2passtools), improves the accuracy of spliced alignment and transcriptome annotation without requiring orthogonal information from short-read RNA-seq or existing annotations.
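
The first pass described above (collect junctions from long-read alignments, discard likely-spurious ones, keep the rest to guide realignment) can be illustrated with a minimal sketch. This is not the 2passtools implementation. It assumes alignments are given as (chromosome, 0-based start, CIGAR) tuples, uses read support and minimum exon overhang as stand-ins for the alignment metrics, and substitutes a fixed canonical-motif rule (GT..AG, or CT..AC on the forward strand for minus-strand introns) for the machine-learning sequence model. The thresholds MIN_READS and MIN_OVERHANG, all function names, and the toy data are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical thresholds for illustration only; not values used by 2passtools.
MIN_READS = 2
MIN_OVERHANG = 10

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_alignment(chrom, ref_start, cigar):
    """Return (chrom, start, end, left_overhang, right_overhang) for each N op.

    ref_start is 0-based; intron coordinates are 0-based, half-open.
    Overhangs count aligned bases (M/=/X) in the exon blocks flanking the intron.
    """
    pos = ref_start
    matched = [0]  # aligned bases per exon block
    introns = []
    for length, op in ((int(n), op) for n, op in CIGAR_RE.findall(cigar)):
        if op == "N":  # skipped reference region = candidate intron
            introns.append((pos, pos + length))
            matched.append(0)
            pos += length
        elif op in "M=X":
            matched[-1] += length
            pos += length
        elif op == "D":
            pos += length
        # I, S, H and P do not consume the reference
    return [(chrom, s, e, matched[i], matched[i + 1]) for i, (s, e) in enumerate(introns)]


def is_canonical(genome, chrom, start, end):
    """Stand-in for a learned sequence score: accept GT..AG or its reverse complement CT..AC."""
    seq = genome[chrom]
    return (seq[start:start + 2], seq[end - 2:end]) in {("GT", "AG"), ("CT", "AC")}


def filter_junctions(alignments, genome, min_reads=MIN_READS, min_overhang=MIN_OVERHANG):
    """First pass: aggregate junctions across reads and keep those passing all filters."""
    counts = defaultdict(int)
    best_overhang = defaultdict(int)
    for chrom, ref_start, cigar in alignments:
        for c, s, e, left, right in junctions_from_alignment(chrom, ref_start, cigar):
            key = (c, s, e)
            counts[key] += 1
            best_overhang[key] = max(best_overhang[key], min(left, right))
    return sorted(
        key for key in counts
        if counts[key] >= min_reads
        and best_overhang[key] >= min_overhang
        and is_canonical(genome, *key)
    )


def to_bed(junctions):
    """Format kept junctions as minimal 3-column BED lines for the second pass."""
    return [f"{c}\t{s}\t{e}" for c, s, e in junctions]


# Toy data: one canonical intron at [20, 30) and one non-canonical candidate at [20, 25).
GENOME = {"chr1": "A" * 20 + "GT" + "C" * 6 + "AG" + "A" * 20}
ALIGNMENTS = [
    ("chr1", 5, "15M10N15M"),
    ("chr1", 3, "17M10N12M"),
    ("chr1", 5, "15M5N15M"),
    ("chr1", 5, "15M5N15M"),
]

if __name__ == "__main__":
    print("\n".join(to_bed(filter_junctions(ALIGNMENTS, GENOME))))  # prints chr1<TAB>20<TAB>30
```

Both candidate junctions have two supporting reads, but only the GT..AG intron survives the motif check. A fixed motif rule like this discards genuine non-canonical introns, which a learned sequence model can in principle score more flexibly. The kept junctions would then be supplied to an aligner that accepts reference junctions for the second pass.
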
Original language: English
Number of pages: 36
Specialist publication: bioRxiv
Publication status: Published - 30 May 2020
