Increased coverage obtained by combination of methods for protein sequence database searching

Caleb Webber, Geoffrey J. Barton

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

MOTIVATION: Sequence alignment methods that compare two sequences (pairwise methods) are important tools for the detection of biological sequence relationships. In genome annotation, multiple methods are often run and agreement between methods taken as confirmation. In this paper, we assess the advantages of combining search methods by comparing seven pairwise alignment methods, including three local dynamic programming algorithms (PRSS, SSEARCH and SCANPS), two global dynamic programming algorithms (GSRCH and AMPS) and two heuristic approximations (BLAST and FASTA), individually and by pairwise intersection and union of their result lists at equal p-value cut-offs.

RESULTS: When applied singly, the dynamic programming methods SCANPS and SSEARCH gave significantly better coverage (p=0.01) compared to AMPS, GSRCH, PRSS, BLAST and FASTA. Results ranked by BLAST p-values gave significantly better coverage compared to ranking by BLAST e-values. Of 56 combinations of eight methods considered, 19 gave significant increases in coverage at low error compared to the parent methods at an equal p-value cutoff. The union of results by BLAST (p-value) and FASTA at an equal p-value cutoff gave significantly better coverage than either method individually. The best overall performance was obtained from the intersection of the results from SSEARCH and the GSRCH62 global alignment method. At an error level of five false positives, this combination found 444 true positives, a significant 12.4% increase over SSEARCH applied alone.
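The combination step described in the abstract can be illustrated with a minimal sketch. It assumes each method's output is reduced to a mapping from hit identifier to p-value. The function names, the toy p-values and the `true_relatives` set are hypothetical and are not taken from the paper. The final count shows one reading consistent with the figure of 56 combinations: every unordered pair of eight methods, combined once by intersection and once by union.

```python
from itertools import combinations


def hits_at_cutoff(results, cutoff):
    """Return the identifiers whose p-value is at or below the cutoff."""
    return {hit for hit, p_value in results.items() if p_value <= cutoff}


def combine(results_a, results_b, cutoff):
    """Intersect and unite two result lists at the same p-value cutoff."""
    a = hits_at_cutoff(results_a, cutoff)
    b = hits_at_cutoff(results_b, cutoff)
    return {"intersection": a & b, "union": a | b}


def count_true_false(hits, true_relatives):
    """Count true and false positives against a reference set of relatives."""
    return len(hits & true_relatives), len(hits - true_relatives)


# Hypothetical toy data: hit identifier -> p-value for two search methods.
method_a = {"seq1": 1e-8, "seq2": 1e-4, "seq3": 0.02, "seq4": 0.5}
method_b = {"seq1": 1e-6, "seq3": 1e-3, "seq5": 0.004, "seq4": 0.3}
true_relatives = {"seq1", "seq2", "seq3"}  # assumed reference classification

combined = combine(method_a, method_b, cutoff=0.01)
for name, hits in combined.items():
    tp, fp = count_true_false(hits, true_relatives)
    print(f"{name}: {sorted(hits)} true positives={tp} false positives={fp}")

# Assumed reading of "56 combinations of eight methods":
# 28 unordered pairs, each combined by intersection and by union.
n_combinations = len(list(combinations(range(8), 2))) * 2
print(n_combinations)
```

In this toy case the union recovers more true relatives at the cost of one false positive, while the intersection keeps only the hit both methods agree on. That is the trade-off the abstract evaluates at a fixed error level.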

Original language: English
Pages (from-to): 1397-1403
Number of pages: 7
Journal: Bioinformatics
Volume: 19
Issue number: 11
Publication status: Published - 22 Jul 2003

Keywords

  • Algorithms
  • Amino acid sequence
  • Computer simulation
  • Database management systems
  • Databases, Protein
  • Information storage and retrieval
  • Molecular sequence data
  • Proteins
  • Reproducibility of results
  • Sensitivity and specificity
  • Sequence alignment
  • Sequence analysis, Protein
  • Systems integration
