Continuous and discontinuous domains: an algorithm for the automatic generation of reliable protein domain definitions

A S Siddiqui, G J Barton

Research output: Contribution to journal › Article

143 Citations (Scopus)

Abstract

An algorithm is presented for the fast and accurate definition of protein structural domains from coordinate data without prior knowledge of the number or type of domains. The algorithm explicitly locates domains that comprise one or two continuous segments of protein chain. Domains that include more than two segments are also located. The algorithm was applied to a nonredundant database of 230 protein structures and the results compared to domain definitions obtained from the literature, or by inspection of the coordinates on molecular graphics. For 70% of the proteins, the derived domains agree with the reference definitions, 18% show minor differences and only 12% (28 proteins) show very different definitions. Three screens were applied to identify the derived domains least likely to agree with the subjective definition set. These screens revealed a set of 173 proteins, 97% of which agree well with the subjective definitions. The algorithm represents a practical domain identification tool that can be run routinely on the entire structural database. Adjustment of parameters also allows smaller compact units to be identified in proteins.
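The abstract does not say how candidate domains are scored, so the following is only a minimal sketch, under stated assumptions, of the general idea it describes: splitting a chain into a domain built from one or two continuous segments plus the remainder, using coordinates alone. It assumes one representative coordinate per residue (for example C-alpha), an 8 Å contact cutoff and a contact-ratio score. The functions `contact_map`, `split_score`, `candidate_domains` and `best_split` and the `min_len` and `cutoff` parameters are hypothetical illustrations, not the paper's published method.

```python
import math


def contact_map(coords, cutoff=8.0):
    """n x n boolean matrix: residues i != j are 'in contact' when their
    representative atoms (assumed here to be C-alpha) lie within `cutoff`
    angstroms. The 8.0 default is an illustrative assumption."""
    n = len(coords)
    return [[i != j and math.dist(coords[i], coords[j]) <= cutoff
             for j in range(n)] for i in range(n)]


def split_score(contacts, part_a):
    """Hypothetical score for splitting the chain into part A and the rest B:
    (contacts within A / contacts between A and B) *
    (contacts within B / contacts between A and B).
    Compact, weakly connected parts score high."""
    in_a = set(part_a)
    n = len(contacts)
    int_a = int_b = ext = 0
    for i in range(n):
        for j in range(i + 1, n):
            if not contacts[i][j]:
                continue
            if i in in_a and j in in_a:
                int_a += 1
            elif i not in in_a and j not in in_a:
                int_b += 1
            else:
                ext += 1
    if ext == 0:
        # No contacts between the parts: perfectly separable, provided each
        # part has at least one internal contact.
        return math.inf if int_a and int_b else 0.0
    return (int_a / ext) * (int_b / ext)


def candidate_domains(n, min_len=3):
    """Yield residue-index lists for a candidate domain A made of one
    continuous segment [s, e) or two segments [s1, e1) + [s2, e2).
    Each segment of A, the gap between A's two segments, and the
    remainder B as a whole must contain at least `min_len` residues."""
    for s in range(n):
        for e in range(s + min_len, n + 1):
            if n - (e - s) >= min_len:
                yield list(range(s, e))
    for s1 in range(n):
        for e1 in range(s1 + min_len, n + 1):
            for s2 in range(e1 + min_len, n):
                for e2 in range(s2 + min_len, n + 1):
                    part = list(range(s1, e1)) + list(range(s2, e2))
                    if n - len(part) >= min_len:
                        yield part


def best_split(coords, min_len=3, cutoff=8.0):
    """Score every one- and two-segment candidate exhaustively and return
    (domain A, remainder B, score) for the best split. Raises ValueError
    when the chain is too short to yield any candidate (n < 2 * min_len)."""
    contacts = contact_map(coords, cutoff)
    n = len(coords)
    best = max(candidate_domains(n, min_len),
               key=lambda part: split_score(contacts, part))
    best_set = set(best)
    rest = [i for i in range(n) if i not in best_set]
    return best, rest, split_score(contacts, best)
```

On a toy chain where residues 0-3 and 8-11 pack against each other and residues 4-7 sit far away, the best split separates 4-7 from the discontinuous unit 0-3 + 8-11. The exhaustive search is roughly O(n^6) and only suits toy inputs; it also finds a single two-way split, whereas the published algorithm also locates domains of more than two segments and applies three screens that this sketch does not model.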

Original language: English
Pages (from-to): 872-884
Number of pages: 13
Journal: Protein Science
Volume: 4
Issue number: 5
DOI: 10.1002/pro.5560040507
Publication status: Published - May 1995

Keywords

  • Actins
  • Algorithms
  • Computer Graphics
  • Databases, Factual
  • Molecular Structure
  • Protein Conformation
  • Protein Structure, Tertiary
  • Proteins
  • Software
  • Trypsin

Cite this

@article{0e0a51104dde48828e411c0b22f30fd8,
title = "Continuous and discontinuous domains: an algorithm for the automatic generation of reliable protein domain definitions",
abstract = "An algorithm is presented for the fast and accurate definition of protein structural domains from coordinate data without prior knowledge of the number or type of domains. The algorithm explicitly locates domains that comprise one or two continuous segments of protein chain. Domains that include more than two segments are also located. The algorithm was applied to a nonredundant database of 230 protein structures and the results compared to domain definitions obtained from the literature, or by inspection of the coordinates on molecular graphics. For 70{\%} of the proteins, the derived domains agree with the reference definitions, 18{\%} show minor differences and only 12{\%} (28 proteins) show very different definitions. Three screens were applied to identify the derived domains least likely to agree with the subjective definition set. These screens revealed a set of 173 proteins, 97{\%} of which agree well with the subjective definitions. The algorithm represents a practical domain identification tool that can be run routinely on the entire structural database. Adjustment of parameters also allows smaller compact units to be identified in proteins.",
keywords = "Actins, Algorithms, Computer Graphics, Databases, Factual, Molecular Structure, Protein Conformation, Protein Structure, Tertiary, Proteins, Software, Trypsin",
author = "Siddiqui, {A S} and Barton, {G J}",
year = "1995",
month = "5",
doi = "10.1002/pro.5560040507",
language = "English",
volume = "4",
pages = "872--884",
journal = "Protein Science",
issn = "0961-8368",
publisher = "Wiley",
number = "5",

}

Continuous and discontinuous domains : an algorithm for the automatic generation of reliable protein domain definitions. / Siddiqui, A S; Barton, G J.

In: Protein Science, Vol. 4, No. 5, 05.1995, p. 872-84.

TY  - JOUR
T1  - Continuous and discontinuous domains
T2  - an algorithm for the automatic generation of reliable protein domain definitions
AU  - Siddiqui, A S
AU  - Barton, G J
PY  - 1995/5
Y1  - 1995/5
N2  - An algorithm is presented for the fast and accurate definition of protein structural domains from coordinate data without prior knowledge of the number or type of domains. The algorithm explicitly locates domains that comprise one or two continuous segments of protein chain. Domains that include more than two segments are also located. The algorithm was applied to a nonredundant database of 230 protein structures and the results compared to domain definitions obtained from the literature, or by inspection of the coordinates on molecular graphics. For 70% of the proteins, the derived domains agree with the reference definitions, 18% show minor differences and only 12% (28 proteins) show very different definitions. Three screens were applied to identify the derived domains least likely to agree with the subjective definition set. These screens revealed a set of 173 proteins, 97% of which agree well with the subjective definitions. The algorithm represents a practical domain identification tool that can be run routinely on the entire structural database. Adjustment of parameters also allows smaller compact units to be identified in proteins.
AB  - An algorithm is presented for the fast and accurate definition of protein structural domains from coordinate data without prior knowledge of the number or type of domains. The algorithm explicitly locates domains that comprise one or two continuous segments of protein chain. Domains that include more than two segments are also located. The algorithm was applied to a nonredundant database of 230 protein structures and the results compared to domain definitions obtained from the literature, or by inspection of the coordinates on molecular graphics. For 70% of the proteins, the derived domains agree with the reference definitions, 18% show minor differences and only 12% (28 proteins) show very different definitions. Three screens were applied to identify the derived domains least likely to agree with the subjective definition set. These screens revealed a set of 173 proteins, 97% of which agree well with the subjective definitions. The algorithm represents a practical domain identification tool that can be run routinely on the entire structural database. Adjustment of parameters also allows smaller compact units to be identified in proteins.
KW  - Actins
KW  - Algorithms
KW  - Computer Graphics
KW  - Databases, Factual
KW  - Molecular Structure
KW  - Protein Conformation
KW  - Protein Structure, Tertiary
KW  - Proteins
KW  - Software
KW  - Trypsin
U2  - 10.1002/pro.5560040507
DO  - 10.1002/pro.5560040507
M3  - Article
VL  - 4
SP  - 872
EP  - 884
JO  - Protein Science
JF  - Protein Science
SN  - 0961-8368
IS  - 5
ER  -