Abstract
This article presents a near real-time processing solution using MapReduce and Hadoop, aimed at some of the data management and processing challenges facing the life sciences community. Research into genes and the proteins they encode generates huge volumes of data that must be extensively preprocessed before any biological insight can be gained. To carry out this processing in a timely manner, we have investigated the use of techniques from the big data field, applying them specifically to data produced by mass spectrometers in the course of proteomic experiments. Here we present methods for handling the raw data in Hadoop, and then investigate a process for preprocessing the data using Java code and the MapReduce framework to identify 2D and 3D peaks.
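To make the two-stage peak identification more concrete, the following is a minimal in-memory sketch of two chained map/reduce passes. It is not the authors' implementation and does not use the Hadoop API; in a real job, `map1`/`reduce1` and `map2`/`reduce2` would become Hadoop `Mapper` and `Reducer` classes. Several things here are assumptions for illustration only: the text input format `scan rt mz intensity`, the hypothetical `threshold` and `binWidth` parameters, and the definitions used (a 2D peak as a local intensity maximum along m/z within one scan; a 3D peak as the most intense 2D peak among those sharing an m/z bin across retention time).

```java
import java.util.*;
import java.util.stream.*;

/**
 * Illustrative sketch only: simulates map, shuffle and reduce with Java collections.
 * Input format, threshold and binWidth are hypothetical, not taken from the article.
 */
public class PeakPickingSketch {

    // One raw data point: scan number, retention time (min), m/z, intensity.
    record Point(int scan, double rt, double mz, double intensity) {}

    // Assumed definition: a local intensity maximum along m/z within one scan.
    record Peak2D(int scan, double rt, double mz, double intensity) {}

    // Assumed definition: the apex of the 2D peaks sharing one m/z bin across scans.
    record Peak3D(double mz, double rt, double intensity, int scans) {}

    // Map 1: parse a hypothetical text line "scan rt mz intensity", keyed by scan.
    static Map.Entry<Integer, Point> map1(String line) {
        String[] f = line.trim().split("\\s+");
        Point p = new Point(Integer.parseInt(f[0]), Double.parseDouble(f[1]),
                Double.parseDouble(f[2]), Double.parseDouble(f[3]));
        return Map.entry(p.scan(), p);
    }

    // Reduce 1: within one scan, keep local maxima at or above a noise threshold.
    // Using ">" on the left and ">=" on the right reports a flat top only once.
    static List<Peak2D> reduce1(List<Point> scan, double threshold) {
        List<Point> s = new ArrayList<>(scan);
        s.sort(Comparator.comparingDouble(Point::mz));
        List<Peak2D> peaks = new ArrayList<>();
        for (int i = 0; i < s.size(); i++) {
            Point p = s.get(i);
            double left = i > 0 ? s.get(i - 1).intensity() : 0.0;
            double right = i < s.size() - 1 ? s.get(i + 1).intensity() : 0.0;
            if (p.intensity() >= threshold && p.intensity() > left && p.intensity() >= right) {
                peaks.add(new Peak2D(p.scan(), p.rt(), p.mz(), p.intensity()));
            }
        }
        return peaks;
    }

    // Map 2: key each 2D peak by a fixed-width m/z bin (binWidth is hypothetical).
    static long map2(Peak2D p, double binWidth) {
        return Math.round(p.mz() / binWidth);
    }

    // Reduce 2: within one m/z bin, report the most intense point across retention time.
    static Peak3D reduce2(List<Peak2D> group) {
        Peak2D apex = Collections.max(group, Comparator.comparingDouble(Peak2D::intensity));
        return new Peak3D(apex.mz(), apex.rt(), apex.intensity(), group.size());
    }

    static List<Peak3D> run(List<String> lines, double threshold, double binWidth) {
        // Shuffle 1: group raw points by scan number.
        Map<Integer, List<Point>> byScan = lines.stream().map(PeakPickingSketch::map1)
                .collect(Collectors.groupingBy(Map.Entry::getKey, TreeMap::new,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
        List<Peak2D> peaks2d = byScan.values().stream()
                .flatMap(scan -> reduce1(scan, threshold).stream())
                .collect(Collectors.toList());
        // Shuffle 2: group 2D peaks by m/z bin, in ascending bin order.
        Map<Long, List<Peak2D>> byBin = peaks2d.stream()
                .collect(Collectors.groupingBy(p -> map2(p, binWidth), TreeMap::new,
                        Collectors.toList()));
        return byBin.values().stream().map(PeakPickingSketch::reduce2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> raw = List.of(
                "1 10.0 500.00 5", "1 10.0 500.01 40", "1 10.0 500.02 8",
                "2 10.1 500.00 9", "2 10.1 500.01 90", "2 10.1 500.02 7",
                "3 10.2 500.01 30", "3 10.2 500.02 4", "3 10.2 700.00 60");
        for (Peak3D p : run(raw, 20.0, 0.05)) {
            System.out.println(p);
        }
    }
}
```

Splitting the work this way lets the first pass scale with the number of scans and the second with the number of m/z bins. Fixed-width binning is a simplification: it can split one peak across a bin boundary and ignores whether the scans are consecutive, so a fuller treatment would group by an m/z tolerance instead.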
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 44-49 |
Number of pages | 5 |
Journal | Big Data |
Volume | 2 |
Issue number | 1 |
Early online date | 14 Feb 2014 |
DOIs | |
Publication status | Published - 14 Mar 2014 |