Evaluation and improvement of multiple sequence methods for protein secondary structure prediction

J A Cuff, G J Barton

Research output: Contribution to journal › Article

477 Citations (Scopus)

Abstract

A new dataset of 396 protein domains is developed and used to evaluate the performance of the protein secondary structure prediction algorithms DSC, PHD, NNSSP, and PREDATOR. The maximum theoretical Q3 accuracy for combination of these methods is shown to be 78%. A simple consensus prediction on the 396 domains, with automatically generated multiple sequence alignments gives an average Q3 prediction accuracy of 72.9%. This is a 1% improvement over PHD, which was the best single method evaluated. Segment Overlap Accuracy (SOV) is 75.4% for the consensus method on the 396-protein set. The secondary structure definition method DSSP defines 8 states, but these are reduced by most authors to 3 for prediction. Application of the different published 8- to 3-state reduction methods shows variation of over 3% on apparent prediction accuracy. This suggests that care should be taken to compare methods by the same reduction method. Two new sequence datasets (CB513 and CB251) are derived which are suitable for cross-validation of secondary structure prediction methods without artifacts due to internal homology. A fully automatic World Wide Web service that predicts protein secondary structure by a combination of methods is available via http://barton.ebi.ac.uk/.
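As a rough illustration of why the 8- to 3-state reduction matters, the sketch below scores one prediction against the same DSSP string reduced in two different ways, and also shows a simple per-residue majority vote. The two mappings, the tie-breaking rule, the function names, and the toy sequences are all assumptions made for this example; they are not taken from the paper and do not reproduce its consensus method or its results.

```python
from collections import Counter

# Two illustrative 8-to-3 reductions (assumed for this sketch, not from the paper).
# Any DSSP state missing from a mapping is treated as coil, "C".
REDUCTION_A = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
REDUCTION_B = {"H": "H", "E": "E"}


def reduce_states(dssp, mapping):
    """Map a DSSP 8-state string to a 3-state string (H, E, C)."""
    return "".join(mapping.get(state, "C") for state in dssp)


def q3(predicted, observed):
    """Percentage of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def consensus(predictions):
    """Per-residue majority vote over several 3-state predictions.

    Ties go to the earliest-listed method whose call is among the tied
    states (an arbitrary rule chosen for this sketch).
    """
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        result.append(next(s for s in column if counts[s] == top))
    return "".join(result)


# Toy data: one DSSP assignment, one 3-state prediction.
observed_dssp = "HHHHGGGTEEEEB-"
predicted = "HHHHHHCCEEEEEC"

observed_a = reduce_states(observed_dssp, REDUCTION_A)
observed_b = reduce_states(observed_dssp, REDUCTION_B)

print(f"Q3 under reduction A: {q3(predicted, observed_a):.1f}%")
print(f"Q3 under reduction B: {q3(predicted, observed_b):.1f}%")
print("Consensus:", consensus(["HHEC", "HEEC", "CEEH"]))
```

The same prediction gets a different apparent Q3 depending only on how the observed states were reduced, which is the reason to compare methods under one reduction.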

Original language: English
Pages (from-to): 508-519
Number of pages: 12
Journal: Proteins: Structure, Function, and Bioinformatics
Volume: 34
Issue number: 4
Publication status: Published - 1 Mar 1999

Fingerprint

Secondary Protein Structure
Proteins
World Wide Web
Web services
Sequence Alignment
Internet
Artifacts

Keywords

  • Algorithms
  • Computer Simulation
  • Databases, Factual
  • Models, Statistical
  • Protein Structure, Secondary
  • Reproducibility of Results
  • Sequence Alignment

Cite this

@article{f12e4e25689e4d5080e75d206ac0546f,
title = "Evaluation and improvement of multiple sequence methods for protein secondary structure prediction",
abstract = "A new dataset of 396 protein domains is developed and used to evaluate the performance of the protein secondary structure prediction algorithms DSC, PHD, NNSSP, and PREDATOR. The maximum theoretical Q3 accuracy for combination of these methods is shown to be 78{\%}. A simple consensus prediction on the 396 domains, with automatically generated multiple sequence alignments gives an average Q3 prediction accuracy of 72.9{\%}. This is a 1{\%} improvement over PHD, which was the best single method evaluated. Segment Overlap Accuracy (SOV) is 75.4{\%} for the consensus method on the 396-protein set. The secondary structure definition method DSSP defines 8 states, but these are reduced by most authors to 3 for prediction. Application of the different published 8- to 3-state reduction methods shows variation of over 3{\%} on apparent prediction accuracy. This suggests that care should be taken to compare methods by the same reduction method. Two new sequence datasets (CB513 and CB251) are derived which are suitable for cross-validation of secondary structure prediction methods without artifacts due to internal homology. A fully automatic World Wide Web service that predicts protein secondary structure by a combination of methods is available via http://barton.ebi.ac.uk/.",
keywords = "Algorithms; Computer Simulation; Databases, Factual; Models, Statistical; Protein Structure, Secondary; Reproducibility of Results; Sequence Alignment",
author = "Cuff, {J A} and Barton, {G J}",
year = "1999",
month = "3",
day = "1",
language = "English",
volume = "34",
pages = "508--519",
journal = "Proteins: Structure, Function, and Bioinformatics",
issn = "0887-3585",
publisher = "Wiley",
number = "4",

}

Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. / Cuff, J A; Barton, G J.

In: Proteins: Structure, Function, and Bioinformatics, Vol. 34, No. 4, 01.03.1999, p. 508-519.

TY - JOUR

T1 - Evaluation and improvement of multiple sequence methods for protein secondary structure prediction

AU - Cuff, J A

AU - Barton, G J

PY - 1999/3/1

Y1 - 1999/3/1

AB - A new dataset of 396 protein domains is developed and used to evaluate the performance of the protein secondary structure prediction algorithms DSC, PHD, NNSSP, and PREDATOR. The maximum theoretical Q3 accuracy for combination of these methods is shown to be 78%. A simple consensus prediction on the 396 domains, with automatically generated multiple sequence alignments gives an average Q3 prediction accuracy of 72.9%. This is a 1% improvement over PHD, which was the best single method evaluated. Segment Overlap Accuracy (SOV) is 75.4% for the consensus method on the 396-protein set. The secondary structure definition method DSSP defines 8 states, but these are reduced by most authors to 3 for prediction. Application of the different published 8- to 3-state reduction methods shows variation of over 3% on apparent prediction accuracy. This suggests that care should be taken to compare methods by the same reduction method. Two new sequence datasets (CB513 and CB251) are derived which are suitable for cross-validation of secondary structure prediction methods without artifacts due to internal homology. A fully automatic World Wide Web service that predicts protein secondary structure by a combination of methods is available via http://barton.ebi.ac.uk/.

KW - Algorithms

KW - Computer Simulation

KW - Databases, Factual

KW - Models, Statistical

KW - Protein Structure, Secondary

KW - Reproducibility of Results

KW - Sequence Alignment

M3 - Article

VL - 34

SP - 508

EP - 519

JO - Proteins: Structure, Function, and Bioinformatics

JF - Proteins: Structure, Function, and Bioinformatics

SN - 0887-3585

IS - 4

ER -