Discovery - University of Dundee - Online Publications

Library & Learning Centre

Near real-time processing of proteomics data using Hadoop.

Research output: Contribution to journal (Article)

Authors

  • Chris Hillman (Lead / Corresponding author)
  • Yasmeen Ahmad
  • Mark Whitehorn
  • Andy Cobley

Info

Original language: English
Pages (from-to): 44-49
Number of pages: 5
Journal: Big Data
Volume: 2
Issue number: 1
Early online date: 14 Feb 2014
State: Published - 14 Mar 2014

Abstract

This article presents a near real-time processing solution using MapReduce and Hadoop. The solution is aimed at some of the data management and processing challenges facing the life sciences community. Research into genes and their product proteins generates huge volumes of data that must be extensively preprocessed before any biological insight can be gained. In order to carry out this processing in a timely manner, we have investigated the use of techniques from the big data field. These are applied specifically to process data resulting from mass spectrometers in the course of proteomic experiments. Here we present methods of handling the raw data in Hadoop, and then we investigate a process for preprocessing the data using Java code and the MapReduce framework to identify 2D and 3D peaks.

Open Access permissions: Open
