TY - JOUR
T1 - highMLR
T2 - An open-source package for R with machine learning for feature selection in high dimensional cancer clinical genome time to event data
AU - Bhattacharjee, Atanu
AU - Vishwakarma, Gajendra K.
AU - Banerjee, Souvik
AU - Pashchenko, Alexander F.
N1 - © 2022 Elsevier Ltd. All rights reserved.
PY - 2022/12
Y1 - 2022/12
N2 - Machine learning techniques, popularly used as a tool for dimensionality reduction and pattern recognition of features, have been utilized extensively in data mining. In survival analysis, where the primary outcome is the time until a specific event occurs, identifying relevant features for building an efficient prediction model is essential. This is where machine learning can be a suitable option. However, there is an existing gap in utilizing machine learning techniques in high-dimensional survival data due to the non-availability of convenient programming functions and packages. In this article, we have developed an efficient machine learning procedure for analyzing survival data associated with high-dimensional gene expressions. Though there are several R libraries available for performing machine learning, no package support is available to implement machine learning with classification on high-dimensional survival data. highMLR, our developed R package, is capable of implementing machine learning methods on high-dimensional survival data and provides a way of feature selection based on the logarithmic loss function. Several statistical methods for survival analysis have been incorporated into this machine learning algorithm. A high-dimensional gene expression dataset has been analyzed using the proposed R library to show its efficacy in feature selection.
AB - Machine learning techniques, popularly used as a tool for dimensionality reduction and pattern recognition of features, have been utilized extensively in data mining. In survival analysis, where the primary outcome is the time until a specific event occurs, identifying relevant features for building an efficient prediction model is essential. This is where machine learning can be a suitable option. However, there is an existing gap in utilizing machine learning techniques in high-dimensional survival data due to the non-availability of convenient programming functions and packages. In this article, we have developed an efficient machine learning procedure for analyzing survival data associated with high-dimensional gene expressions. Though there are several R libraries available for performing machine learning, no package support is available to implement machine learning with classification on high-dimensional survival data. highMLR, our developed R package, is capable of implementing machine learning methods on high-dimensional survival data and provides a way of feature selection based on the logarithmic loss function. Several statistical methods for survival analysis have been incorporated into this machine learning algorithm. A high-dimensional gene expression dataset has been analyzed using the proposed R library to show its efficacy in feature selection.
KW - Feature selection
KW - Gene expression
KW - High dimension
KW - Machine learning
KW - Survival data
U2 - 10.1016/j.eswa.2022.118432
DO - 10.1016/j.eswa.2022.118432
M3 - Article
SN - 0957-4174
VL - 210
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 118432
ER -