Characterization and prediction of protein nucleolar localization sequences

Michelle S. Scott (Lead / Corresponding author), Francois-Michel Boisvert, Mark D. McDowall, Angus I. Lamond, Geoffrey J. Barton

    Research output: Contribution to journal (Article)

    83 Citations (Scopus)

    Abstract

    Although the nucleolar localization of proteins is often believed to be mediated primarily by non-specific retention to core nucleolar components, many examples of short nucleolar targeting sequences have been reported in recent years. In this article, 46 human nucleolar localization sequences (NoLSs) were collated from the literature and subjected to statistical analysis. Of the residues in these NoLSs, 48% are basic, whereas 99% of the residues are predicted to be solvent-accessible, with 42% in alpha-helix and 57% in coil. The sequence and predicted protein secondary structure of the 46 NoLSs were used to train an artificial neural network to identify NoLSs. At a true positive rate of 54%, the predictor's overall false positive rate (FPR) is estimated to be 1.52%, which can be broken down to FPRs of 0.26% for randomly chosen cytoplasmic sequences, 0.80% for randomly chosen nucleoplasmic sequences and 12% for nuclear localization signals. The predictor was used to predict NoLSs in the complete human proteome, and 10 of the highest scoring previously unknown NoLSs were experimentally confirmed. NoLSs are a prevalent type of targeting motif that is distinct from nuclear localization signals and that can be computationally predicted.
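The two kinds of statistic quoted in the abstract (the share of basic residues in a sequence, and true/false positive rates for a predictor) can be illustrated with a minimal Python sketch. This is not the authors' code. The choice of Lys, Arg and His as the basic residues, the function names, the toy peptide and the confusion counts are all assumptions made for illustration only.

```python
# Illustrative sketch only; nothing here is taken from the article itself.
# Assumption: "basic" means lysine (K), arginine (R) and histidine (H).
BASIC_RESIDUES = frozenset("KRH")


def basic_fraction(peptide: str) -> float:
    """Return the fraction of residues in `peptide` that are basic."""
    peptide = peptide.strip().upper()
    if not peptide:
        raise ValueError("empty peptide")
    return sum(residue in BASIC_RESIDUES for residue in peptide) / len(peptide)


def rates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (true positive rate, false positive rate) from confusion counts."""
    return tp / (tp + fn), fp / (fp + tn)


if __name__ == "__main__":
    toy_peptide = "KRKAAGHL"  # made-up sequence, not a real NoLS
    print(f"basic fraction: {basic_fraction(toy_peptide):.2f}")

    # Made-up counts chosen only to show the arithmetic.
    tpr, fpr = rates(tp=27, fn=23, fp=3, tn=197)
    print(f"TPR = {tpr:.2f}, FPR = {fpr:.3f}")
```

An overall FPR such as the one reported is an average over the negative classes weighted by how many sequences of each class were tested. Those weights are not given in the abstract, so the sketch does not attempt to reproduce the 1.52% figure.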

    Original language: English
    Pages (from-to): 7388-7399
    Number of pages: 12
    Journal: Nucleic Acids Research
    Volume: 38
    Issue number: 21
    DOI: 10.1093/nar/gkq653
    Publication status: Published - Nov 2010

    Keywords

    • TARGETING SEQUENCES
    • SIGNAL PEPTIDE
    • NUCLEAR EXPORT
    • MESSENGER-RNA
    • VIRUS TYPE-1
    • DATABASE
    • COMPLEX
    • BINDING
    • IDENTIFICATION
    • DOMAIN

    Cite this

    Scott, Michelle S. ; Boisvert, Francois-Michel ; McDowall, Mark D. ; Lamond, Angus I. ; Barton, Geoffrey J. / Characterization and prediction of protein nucleolar localization sequences. In: Nucleic Acids Research. 2010 ; Vol. 38, No. 21. pp. 7388-7399.
    @article{e55bf4a3357a4132b7ba99a6d944d577,
    title = "Characterization and prediction of protein nucleolar localization sequences",
    keywords = "TARGETING SEQUENCES, SIGNAL PEPTIDE, NUCLEAR EXPORT, MESSENGER-RNA, VIRUS TYPE-1, DATABASE, COMPLEX, BINDING, IDENTIFICATION, DOMAIN",
    author = "Scott, {Michelle S.} and Francois-Michel Boisvert and McDowall, {Mark D.} and Lamond, {Angus I.} and Barton, {Geoffrey J.}",
    year = "2010",
    month = "11",
    doi = "10.1093/nar/gkq653",
    language = "English",
    volume = "38",
    pages = "7388--7399",
    journal = "Nucleic Acids Research",
    issn = "0305-1048",
    publisher = "Oxford University Press",
    number = "21",
    }
