Near real-time processing of proteomics data using Hadoop.

Chris Hillman (Lead / Corresponding author), Yasmeen Ahmad, Mark Whitehorn, Andy Cobley

Research output: Contribution to journal (Article)

5 Citations (Scopus)
181 Downloads (Pure)

Abstract

This article presents a near real-time processing solution using MapReduce and Hadoop. The solution is aimed at some of the data management and processing challenges facing the life sciences community. Research into genes and their product proteins generates huge volumes of data that must be extensively preprocessed before any biological insight can be gained. In order to carry out this processing in a timely manner, we have investigated the use of techniques from the big data field. These are applied specifically to process data resulting from mass spectrometers in the course of proteomic experiments. Here we present methods of handling the raw data in Hadoop, and then we investigate a process for preprocessing the data using Java code and the MapReduce framework to identify 2D and 3D peaks.
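The map-then-reduce shape of the preprocessing described in the abstract can be illustrated with a minimal plain-Java sketch. It is not the authors' implementation and does not use the Hadoop API. It assumes that each raw reading carries a scan number, an m/z value, and an intensity, and that a 2D peak is a strict local intensity maximum along m/z within a single scan. The names `PeakSketch`, `Reading`, `mapByScan`, `reducePeaks`, and the intensity threshold are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PeakSketch {

    /** Hypothetical raw reading: scan number, mass-to-charge ratio (m/z), intensity. */
    static final class Reading {
        final int scan;
        final double mz;
        final double intensity;

        Reading(int scan, double mz, double intensity) {
            this.scan = scan;
            this.mz = mz;
            this.intensity = intensity;
        }
    }

    /** "Map" step: key every reading by scan number, as a mapper emitting (scan, reading) would. */
    static Map<Integer, List<Reading>> mapByScan(List<Reading> readings) {
        Map<Integer, List<Reading>> byScan = new TreeMap<>();
        for (Reading r : readings) {
            byScan.computeIfAbsent(r.scan, k -> new ArrayList<>()).add(r);
        }
        return byScan;
    }

    /**
     * "Reduce" step for one scan: sort by m/z, then keep readings whose intensity
     * is at least minIntensity and strictly greater than both neighbours.
     * (Assumed definition of a 2D peak; the first and last readings are never peaks.)
     */
    static List<Reading> reducePeaks(List<Reading> scanReadings, double minIntensity) {
        List<Reading> sorted = new ArrayList<>(scanReadings);
        sorted.sort(Comparator.comparingDouble((Reading r) -> r.mz));
        List<Reading> peaks = new ArrayList<>();
        for (int i = 1; i < sorted.size() - 1; i++) {
            double prev = sorted.get(i - 1).intensity;
            double cur = sorted.get(i).intensity;
            double next = sorted.get(i + 1).intensity;
            if (cur >= minIntensity && cur > prev && cur > next) {
                peaks.add(sorted.get(i));
            }
        }
        return peaks;
    }

    public static void main(String[] args) {
        // Made-up readings for two scans, purely for illustration.
        List<Reading> readings = new ArrayList<>();
        readings.add(new Reading(1, 500.0, 10.0));
        readings.add(new Reading(1, 500.1, 80.0));
        readings.add(new Reading(1, 500.2, 15.0));
        readings.add(new Reading(2, 500.0, 3.0));
        readings.add(new Reading(2, 500.1, 60.0));
        readings.add(new Reading(2, 500.2, 7.0));

        for (Map.Entry<Integer, List<Reading>> e : mapByScan(readings).entrySet()) {
            for (Reading peak : reducePeaks(e.getValue(), 20.0)) {
                System.out.println("scan " + peak.scan + " peak at m/z " + peak.mz);
            }
        }
    }
}
```

Because each scan is reduced independently, the per-scan work could in principle be spread across a cluster. Under the same assumptions, a 3D peak step would group the 2D peaks by m/z and look for maxima across consecutive scans.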

Original language: English
Pages (from-to): 44-49
Number of pages: 5
Journal: Big Data
Volume: 2
Issue number: 1
Early online date: 14 Feb 2014
DOI: 10.1089/big.2013.0036
Publication status: Published - 14 Mar 2014

Fingerprint

Mass spectrometers
Processing
Information management
Genes
Proteins
Experiments
Proteomics
Hadoop
MapReduce
Big data
Experiment
Gene
Data management
Java
Protein
Life sciences

Cite this

Hillman, Chris; Ahmad, Yasmeen; Whitehorn, Mark; Cobley, Andy. / Near real-time processing of proteomics data using Hadoop. In: Big Data. 2014; Vol. 2, No. 1. pp. 44-49.
@article{83999479b34d43ba87374b98688caa45,
title = "Near real-time processing of proteomics data using Hadoop.",
author = "Chris Hillman and Yasmeen Ahmad and Mark Whitehorn and Andy Cobley",
year = "2014",
month = "3",
day = "14",
doi = "10.1089/big.2013.0036",
language = "English",
volume = "2",
pages = "44--49",
journal = "Big Data",
issn = "2167-6461",
publisher = "Mary Ann Liebert, Inc., publishers",
number = "1",

}

Near real-time processing of proteomics data using Hadoop. / Hillman, Chris (Lead / Corresponding author); Ahmad, Yasmeen; Whitehorn, Mark; Cobley, Andy.

In: Big Data, Vol. 2, No. 1, 14.03.2014, p. 44-49.

TY - JOUR

T1 - Near real-time processing of proteomics data using Hadoop.

AU - Hillman, Chris

AU - Ahmad, Yasmeen

AU - Whitehorn, Mark

AU - Cobley, Andy

PY - 2014/3/14

Y1 - 2014/3/14

U2 - 10.1089/big.2013.0036

DO - 10.1089/big.2013.0036

M3 - Article

VL - 2

SP - 44

EP - 49

JO - Big Data

JF - Big Data

SN - 2167-6461

IS - 1

ER -