Near real-time processing of proteomics data using Hadoop.

Chris Hillman (Lead / Corresponding author), Yasmeen Ahmad, Mark Whitehorn, Andy Cobley

Research output: Contribution to journal › Article › peer-review


This article presents a near real-time processing solution built on MapReduce and Hadoop. The solution addresses some of the data management and processing challenges facing the life sciences community. Research into genes and their protein products generates huge volumes of data that must be extensively preprocessed before any biological insight can be gained. To carry out this processing in a timely manner, we investigated techniques from the big data field and applied them specifically to data produced by mass spectrometers during proteomic experiments. Here we present methods of handling the raw data in Hadoop, and then investigate a process for preprocessing the data using Java code and the MapReduce framework to identify 2D and 3D peaks.
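
As a rough illustration of the kind of per-scan work a map step might perform, the sketch below picks local intensity maxima from a single scan in plain Java. It is not the authors' implementation: it assumes a 2D peak can be approximated as a strict local maximum along the m/z axis that meets an intensity threshold, and the class name `PeakSketch`, the method `findPeaks`, and the threshold value are all hypothetical. Hadoop classes are deliberately left out so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: names and thresholds are assumptions,
// not taken from the article.
final class PeakSketch {

    // Map-style step: given one scan's intensities (ordered by m/z),
    // return the indices of strict local maxima at or above minIntensity.
    static List<Integer> findPeaks(double[] intensity, double minIntensity) {
        List<Integer> peaks = new ArrayList<>();
        // The first and last points have only one neighbour, so they are skipped.
        for (int i = 1; i < intensity.length - 1; i++) {
            boolean risesBefore = intensity[i] > intensity[i - 1];
            boolean fallsAfter = intensity[i] > intensity[i + 1];
            if (risesBefore && fallsAfter && intensity[i] >= minIntensity) {
                peaks.add(i);
            }
        }
        return peaks;
    }

    public static void main(String[] args) {
        // Made-up example values for one scan.
        double[] mz = {400.0, 400.1, 400.2, 400.3, 400.4, 400.5};
        double[] intensity = {10, 80, 20, 15, 60, 5};
        for (int i : findPeaks(intensity, 50.0)) {
            System.out.println("peak at m/z " + mz[i] + ", intensity " + intensity[i]);
        }
    }
}
```

Because each scan is handled independently, a function of this shape could in principle be applied to many scans in parallel, which is the property that makes the problem a candidate for MapReduce. Grouping such per-scan peaks across retention time into 3D peaks would be a separate, reduce-style step that is not shown here.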

Original language: English
Pages (from-to): 44-49
Number of pages: 5
Journal: Big Data
Issue number: 1
Early online date: 14 Feb 2014
Publication status: Published - 14 Mar 2014

