Optimal feature selection for estimating biomass using a genetic algorithm

Anand Vetrivel, Mark Cutler

    Research output: Contribution to conference › Poster

    Abstract

    Whilst the use of SAR and multispectral image texture has shown promise for estimating tropical forest biomass, there remains uncertainty over the optimum number and type of texture features that provide reliable models to estimate biomass across different forest sites. Previous studies have consistently suggested that selecting an appropriate window size for extracting texture is critical, as small window sizes often exaggerate texture whilst larger window sizes create smoothing effects. There is also a variety of image texture features that have previously been correlated with biomass (e.g. wavelet decomposition and GLCM outputs such as entropy, homogeneity and energy). Neural network regression methods allow any number of these variables (and window sizes) to be included in the regression model. However, this increases the dimensionality of the input data, which in turn affects the ability of the network to learn and generalise. It is desirable to restrict the dimensionality of the inputs, particularly when using small samples for predictive modelling, a characteristic of many biomass estimation studies.
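
    As an illustration of the GLCM outputs named above, the following is a minimal sketch of computing entropy, homogeneity and energy for a single texture window. It is not the authors' processing chain: the function name `glcm_features`, the number of grey levels, the pixel offset, and the use of the sum of squared probabilities for energy (some libraries report its square root) are all assumptions made for the example.

```python
import numpy as np

def glcm_features(window, levels=8, dx=1, dy=0):
    """Texture measures from a symmetric, normalised grey-level co-occurrence matrix.

    window : 2D integer array already quantised to values in [0, levels).
    dx, dy : non-negative pixel offset between paired pixels (assumed values).
    """
    window = np.asarray(window)
    rows, cols = window.shape

    # Pair every pixel with its neighbour at the given offset.
    src = window[0:rows - dy, 0:cols - dx]
    dst = window[dy:rows, dx:cols]

    # Count co-occurrences of grey levels (i, j).
    glcm = np.zeros((levels, levels), dtype=float)
    np.add.at(glcm, (src.ravel(), dst.ravel()), 1.0)

    # Make the matrix symmetric and convert counts to probabilities.
    glcm = glcm + glcm.T
    p = glcm / glcm.sum()

    i, j = np.indices(p.shape)
    nonzero = p > 0
    return {
        "entropy": float(-np.sum(p[nonzero] * np.log(p[nonzero]))),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "energy": float(np.sum(p ** 2)),  # assumption: energy taken as sum of p^2
    }
```

    Calling this function on windows of different sizes cut from the same image location would give one candidate feature per measure and window size, which is how the number of inputs grows quickly.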

    Common feature selection methods, such as Principal Components Analysis (PCA), are not always suited to this kind of problem. Here, we compare PCA feature selection with a Genetic Algorithm (GA) approach, using an artificial neural network and Fuzzy c-means separately as fitness functions. A combination of texture features was derived from SAR and multispectral images of three tropical forest sites. The optimum combination of these features for estimating aboveground biomass was evaluated by applying each of the three feature selection methods in turn and then using the selected features as inputs to a neural network that estimated biomass. The correlation between the input training data and the unseen testing data was used as a measure of model performance for estimating biomass.
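
    The general idea of GA-based feature selection can be sketched as follows. This is a simplified illustration under stated assumptions, not the method used in the poster: each candidate solution is a binary mask over the features, and the fitness function here is the leave-one-out correlation of a small ridge regression, swapped in for the neural network and Fuzzy c-means fitness functions described above. The names `loo_correlation` and `ga_select`, and all parameter values, are hypothetical.

```python
import numpy as np

def loo_correlation(X, y, ridge=1e-3):
    """Leave-one-out correlation between predicted and observed y.

    Uses ridge regression as a stand-in model (an assumption; the
    abstract uses a neural network or Fuzzy c-means for fitness).
    """
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        Xt = np.column_stack([np.ones(train.sum()), X[train]])
        penalty = ridge * np.eye(Xt.shape[1])
        penalty[0, 0] = 0.0  # do not penalise the intercept
        w = np.linalg.solve(Xt.T @ Xt + penalty, Xt.T @ y[train])
        preds[i] = np.concatenate(([1.0], X[i])) @ w
    if np.std(preds) == 0:
        return 0.0
    return float(np.corrcoef(preds, y)[0, 1])

def ga_select(X, y, pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    """Search for a feature mask that maximises loo_correlation."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5  # random initial masks

    def fitness(mask):
        if not mask.any():
            return -1.0  # an empty feature set is the worst case
        return loo_correlation(X[:, mask], y)

    def tournament(scores):
        a, b = rng.integers(pop_size, size=2)
        return pop[a] if scores[a] >= scores[b] else pop[b]

    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        children = [pop[int(np.argmax(scores))].copy()]  # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(scores), tournament(scores)
            take_p1 = rng.random(n_feat) < 0.5            # uniform crossover
            child = np.where(take_p1, p1, p2)
            flip = rng.random(n_feat) < mutation_rate     # bit-flip mutation
            children.append(child ^ flip)
        pop = np.array(children)

    scores = np.array([fitness(m) for m in pop])
    best = int(np.argmax(scores))
    return pop[best], float(scores[best])
```

    With sample sizes as small as those typical of biomass studies, a cross-validated fitness of this kind is likely to be noisy, so the selected mask should be checked against data not used during the search.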

    The results indicated that features selected using the GA approach with a neural network as the fitness function produced the strongest relationships with biomass at the three sites (r=0.91, 0.89 and 0.87 for Brazil (n=9), Malaysia (n=9) and Thailand (n=13) respectively), compared to the other GA approach and PCA. In all cases, the texture features and window sizes selected varied, although some commonality in selection between the Malaysia and Brazil sites was noted. Overall, the GA approaches selected features that produced stronger relationships than PCA, with evidence that these approaches hold much promise for determining the optimum set of inputs for biomass estimation models, although much work is still required.
    Original language: English
    Pages: 93-94
    Number of pages: 2
    Publication status: Published - 2012
    Event: Remote Sensing and Photogrammetry Society Annual Conference 2012: Changing How We View the World - University of Greenwich, London, United Kingdom
    Duration: 12 Sept 2012 to 14 Sept 2012

    Conference

    Conference: Remote Sensing and Photogrammetry Society Annual Conference 2012: Changing How We View the World
    Abbreviated title: RSPSoc 2012
    Country/Territory: United Kingdom
    City: London
    Period: 12/09/12 to 14/09/12
