Sensitive inference of alignment-safe intervals from biodiverse protein sequence clusters using EMERALD

Andreas Grigorjew, Artur Gynter, Fernando H. C. Dias, Benjamin Buchfink, Hajk-Georg Drost (Lead / Corresponding author), Alexandru I. Tomescu (Lead / Corresponding author)

Research output: Contribution to journal, Article, peer-review

Abstract

Sequence alignments are the foundations of life science research, but most innovation so far focuses on optimal alignments, while information derived from suboptimal solutions is ignored. We argue that one optimal alignment per pairwise sequence comparison is a reasonable approximation when dealing with very similar sequences but is insufficient when exploring the biodiversity of the protein universe at tree-of-life scale. To overcome this limitation, we introduce pairwise alignment-safety to uncover the amino acid positions robustly shared across all suboptimal solutions. We implement EMERALD, a software library for alignment-safety inference, and apply it to 400k sequences from the SwissProt database.
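
To make the idea of alignment-safety concrete, the following is a minimal brute-force sketch, not EMERALD's implementation or API. It assumes a toy scoring scheme (match +1, mismatch -1, linear gap -1) and a hypothetical function name, safe_pairs. It enumerates every global alignment whose score lies within delta of the optimum and keeps only the aligned position pairs that occur in all of them. The enumeration is exponential in the worst case, so it is suitable only for very short sequences.

```python
from functools import lru_cache

# Assumed toy scoring; real protein alignment would use a substitution
# matrix and affine gap costs.
MATCH, MISMATCH, GAP = 1, -1, -1


def safe_pairs(a, b, delta=0):
    """Return the (i, j) pairs aligned in every global alignment of a and b
    whose score is at least (optimal score - delta)."""
    n, m = len(a), len(b)

    @lru_cache(maxsize=None)
    def best_suffix(i, j):
        # Best global alignment score of a[i:] against b[j:].
        if i == n:
            return GAP * (m - j)
        if j == m:
            return GAP * (n - i)
        sub = MATCH if a[i] == b[j] else MISMATCH
        return max(sub + best_suffix(i + 1, j + 1),
                   GAP + best_suffix(i + 1, j),
                   GAP + best_suffix(i, j + 1))

    threshold = best_suffix(0, 0) - delta
    common = None  # intersection of aligned pairs over all accepted alignments

    def walk(i, j, score, pairs):
        nonlocal common
        if i == n and j == m:
            found = set(pairs)
            common = found if common is None else common & found
            return
        # Follow a step only if some completion can still reach the threshold.
        if i < n and j < m:
            sub = MATCH if a[i] == b[j] else MISMATCH
            if score + sub + best_suffix(i + 1, j + 1) >= threshold:
                walk(i + 1, j + 1, score + sub, pairs + [(i, j)])
        if i < n and score + GAP + best_suffix(i + 1, j) >= threshold:
            walk(i + 1, j, score + GAP, pairs)
        if j < m and score + GAP + best_suffix(i, j + 1) >= threshold:
            walk(i, j + 1, score + GAP, pairs)

    walk(0, 0, 0, [])
    return sorted(common)


if __name__ == "__main__":
    print(safe_pairs("AAT", "AT"))           # only co-optimal alignments
    print(safe_pairs("AAT", "AT", delta=2))  # wider suboptimal space
```

With delta=0 only co-optimal alignments are compared; increasing delta admits more suboptimal alignments, so the set of pairs shared by all of them can only stay the same or shrink.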

Original language: English
Article number: 168
Number of pages: 21
Journal: Genome Biology
Volume: 24
Publication status: Published - 17 Jul 2023

Keywords

  • Dynamic programming
  • Needleman-Wunsch algorithm
  • Protein folding
  • Sequence alignment
  • Suboptimal alignments

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics
  • Cell Biology
