Protein structural domains

analysis of the 3Dee domains database

Uwe Dengler, Asiim S. Siddiqui, Geoffrey J. Barton (Lead / Corresponding author)

Research output: Contribution to journalArticle

26 Citations (Scopus)

Abstract

The 3Dee database of domain definitions was developed as a comprehensive collection of domain definitions for all three-dimensional structures in the Protein Data Bank (PDB). The database includes definitions for complex, multiple-segment and multiple-chain domains as well as simple sequential domains, organized in a structural hierarchy. Two different snapshots of the 3Dee database were analyzed at September 1996 and November 1999. For the November 1999 release, 7,995 PDB entries contained 13,767 protein chains and gave rise to 18,896 domains. The domain sequences clustered into 1,715 domain sequence families, which were further clustered into a conservative 1,199 domain structure families (families with similar folds). The proportion of different domain structure families per domain sequence family increases from 84% for domains 1-100 residues long to 100% for domains greater than 600 residues. This is in keeping with the idea that longer chains will have more alternative folds available to them. Of the representative domains from the domain sequence families, 49% are in the range of 51-150 residues, whereas 64% of the representative chains over 200 residues have more than 1 domain. Of the representative chains, 8.5% are part of multichain domains. The largest multichain domain in the database has 14 chains and 1,400 residues, whereas the largest single-chain domain has 907 residues. The largest number of domains found in a protein is 13. The analysis shows that over the history of the PDB, new domain folds have been discovered at a slower rate than by random selection of all known folds. Between 1992 and 1997, a constant 1 in 11 new domains deposited in the PDB has shown no sequence similarity to a previously known domain sequence family, and only 1 in 15 new domain structures has had a fold that has not been seen previously. A comparison of the September 1996 release of 3Dee to the Structural Classification of Proteins (SCOP) showed that the domain definitions agreed for 80% of the representative protein chains. However, 3Dee provided explicit domain boundaries for more proteins. 3Dee is accessible on the World Wide Web at http://barton.ebi.ac.uk/servers/3Dee.html.

Original languageEnglish
Pages (from-to)332-344
Number of pages13
JournalProteins: Structure, Function, and Bioinformatics
Volume42
Issue number3
Early online date21 Dec 2000
DOIs
Publication statusPublished - 15 Feb 2001

Fingerprint

Databases
Proteins
Protein Domains
Internet
World Wide Web
History
Servers

Keywords

  • Computational biology
  • Databases, Factual
  • Protein conformation
  • Protein folding
  • Proteins

Cite this

@article{772a7dd88f12419183d35a7f9922ea6d,
title = "Protein structural domains: analysis of the 3Dee domains database",
abstract = "The 3Dee database of domain definitions was developed as a comprehensive collection of domain definitions for all three-dimensional structures in the Protein Data Bank (PDB). The database includes definitions for complex, multiple-segment and multiple-chain domains as well as simple sequential domains, organized in a structural hierarchy. Two different snapshots of the 3Dee database were analyzed at September 1996 and November 1999. For the November 1999 release, 7,995 PDB entries contained 13,767 protein chains and gave rise to 18,896 domains. The domain sequences clustered into 1,715 domain sequence families, which were further clustered into a conservative 1,199 domain structure families (families with similar folds). The proportion of different domain structure families per domain sequence family increases from 84{\%} for domains 1-100 residues long to 100{\%} for domains greater than 600 residues. This is in keeping with the idea that longer chains will have more alternative folds available to them. Of the representative domains from the domain sequence families, 49{\%} are in the range of 51-150 residues, whereas 64{\%} of the representative chains over 200 residues have more than 1 domain. Of the representative chains, 8.5{\%} are part of multichain domains. The largest multichain domain in the database has 14 chains and 1,400 residues, whereas the largest single-chain domain has 907 residues. The largest number of domains found in a protein is 13. The analysis shows that over the history of the PDB, new domain folds have been discovered at a slower rate than by random selection of all known folds. Between 1992 and 1997, a constant 1 in 11 new domains deposited in the PDB has shown no sequence similarity to a previously known domain sequence family, and only 1 in 15 new domain structures has had a fold that has not been seen previously. A comparison of the September 1996 release of 3Dee to the Structural Classification of Proteins (SCOP) showed that the domain definitions agreed for 80{\%} of the representative protein chains. However, 3Dee provided explicit domain boundaries for more proteins. 3Dee is accessible on the World Wide Web at http://barton.ebi.ac.uk/servers/3Dee.html.",
keywords = "Computational biology, Databases, Factual, Protein conformation, Protein folding, Proteins",
author = "Uwe Dengler and Siddiqui, {Asiim S.} and Barton, {Geoffrey J.}",
year = "2001",
month = "2",
day = "15",
doi = "10.1002/1097-0134(20010215)42:3<332::AID-PROT40>3.0.CO;2-S",
language = "English",
volume = "42",
pages = "332--344",
journal = "Proteins: Structure, Function, and Bioinformatics",
issn = "0887-3585",
publisher = "Wiley",
number = "3",

}

Protein structural domains : analysis of the 3Dee domains database. / Dengler, Uwe; Siddiqui, Asiim S.; Barton, Geoffrey J. (Lead / Corresponding author).

In: Proteins: Structure, Function, and Bioinformatics, Vol. 42, No. 3, 15.02.2001, p. 332-344.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Protein structural domains

T2 - analysis of the 3Dee domains database

AU - Dengler, Uwe

AU - Siddiqui, Asiim S.

AU - Barton, Geoffrey J.

PY - 2001/2/15

Y1 - 2001/2/15

N2 - The 3Dee database of domain definitions was developed as a comprehensive collection of domain definitions for all three-dimensional structures in the Protein Data Bank (PDB). The database includes definitions for complex, multiple-segment and multiple-chain domains as well as simple sequential domains, organized in a structural hierarchy. Two different snapshots of the 3Dee database were analyzed at September 1996 and November 1999. For the November 1999 release, 7,995 PDB entries contained 13,767 protein chains and gave rise to 18,896 domains. The domain sequences clustered into 1,715 domain sequence families, which were further clustered into a conservative 1,199 domain structure families (families with similar folds). The proportion of different domain structure families per domain sequence family increases from 84% for domains 1-100 residues long to 100% for domains greater than 600 residues. This is in keeping with the idea that longer chains will have more alternative folds available to them. Of the representative domains from the domain sequence families, 49% are in the range of 51-150 residues, whereas 64% of the representative chains over 200 residues have more than 1 domain. Of the representative chains, 8.5% are part of multichain domains. The largest multichain domain in the database has 14 chains and 1,400 residues, whereas the largest single-chain domain has 907 residues. The largest number of domains found in a protein is 13. The analysis shows that over the history of the PDB, new domain folds have been discovered at a slower rate than by random selection of all known folds. Between 1992 and 1997, a constant 1 in 11 new domains deposited in the PDB has shown no sequence similarity to a previously known domain sequence family, and only 1 in 15 new domain structures has had a fold that has not been seen previously. A comparison of the September 1996 release of 3Dee to the Structural Classification of Proteins (SCOP) showed that the domain definitions agreed for 80% of the representative protein chains. However, 3Dee provided explicit domain boundaries for more proteins. 3Dee is accessible on the World Wide Web at http://barton.ebi.ac.uk/servers/3Dee.html.

AB - The 3Dee database of domain definitions was developed as a comprehensive collection of domain definitions for all three-dimensional structures in the Protein Data Bank (PDB). The database includes definitions for complex, multiple-segment and multiple-chain domains as well as simple sequential domains, organized in a structural hierarchy. Two different snapshots of the 3Dee database were analyzed at September 1996 and November 1999. For the November 1999 release, 7,995 PDB entries contained 13,767 protein chains and gave rise to 18,896 domains. The domain sequences clustered into 1,715 domain sequence families, which were further clustered into a conservative 1,199 domain structure families (families with similar folds). The proportion of different domain structure families per domain sequence family increases from 84% for domains 1-100 residues long to 100% for domains greater than 600 residues. This is in keeping with the idea that longer chains will have more alternative folds available to them. Of the representative domains from the domain sequence families, 49% are in the range of 51-150 residues, whereas 64% of the representative chains over 200 residues have more than 1 domain. Of the representative chains, 8.5% are part of multichain domains. The largest multichain domain in the database has 14 chains and 1,400 residues, whereas the largest single-chain domain has 907 residues. The largest number of domains found in a protein is 13. The analysis shows that over the history of the PDB, new domain folds have been discovered at a slower rate than by random selection of all known folds. Between 1992 and 1997, a constant 1 in 11 new domains deposited in the PDB has shown no sequence similarity to a previously known domain sequence family, and only 1 in 15 new domain structures has had a fold that has not been seen previously. A comparison of the September 1996 release of 3Dee to the Structural Classification of Proteins (SCOP) showed that the domain definitions agreed for 80% of the representative protein chains. However, 3Dee provided explicit domain boundaries for more proteins. 3Dee is accessible on the World Wide Web at http://barton.ebi.ac.uk/servers/3Dee.html.

KW - Computational biology

KW - Databases, Factual

KW - Protein conformation

KW - Protein folding

KW - Proteins

U2 - 10.1002/1097-0134(20010215)42:3<332::AID-PROT40>3.0.CO;2-S

DO - 10.1002/1097-0134(20010215)42:3<332::AID-PROT40>3.0.CO;2-S

M3 - Article

VL - 42

SP - 332

EP - 344

JO - Proteins: Structure, Function, and Bioinformatics

JF - Proteins: Structure, Function, and Bioinformatics

SN - 0887-3585

IS - 3

ER -