Hybrid feature selection methods for high-dimensional multi-class datasets

Amit Kumar Saxena, Vimal Kumar Dubey, John Wang

Research output: Contribution to journal > Article > Research > peer-review

2 Citations (Scopus)

Abstract

Hybrid methods play an important role in feature selection for the classification of high-dimensional datasets. In this paper, we propose two hybrid methods that combine filter-based feature selection with a genetic algorithm and with a sequential random search method. The first method hybridises information gain and a genetic algorithm. The features are first ranked by information gain, and a user-defined number of top-ranked features is selected. A genetic algorithm is then applied to these selected features to find an optimal feature subset, using two types of fitness function: single-objective and multi-objective. The second method hybridises information gain and sequential random K-nearest neighbour (SRKNN). Information gain is again used to rank the features, and a user-defined number of top-ranked features is selected. A population of binary strings (each covering all of the user-selected features) is generated, and a sequential search is applied to each member of the population to maximise classification accuracy. Both methods are applied to 21 high-dimensional multi-class datasets. The results show that the first method performs well on some datasets and the second method performs well on others. The results obtained by the proposed methods are compared with those reported for other methods.
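
The filter-then-search idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how SRKNN works internally, so the discrete toy features, the Hamming-distance 1-nearest-neighbour classifier, the leave-one-out accuracy estimate, and the bit-flip search rule are all assumptions, as are the function names. The genetic algorithm variant is not shown.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_by_information_gain(rows, labels, k):
    """Filter stage: indices of the k features with the highest information gain."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:k], gains


def loo_1nn_accuracy(rows, labels, features):
    """Leave-one-out accuracy of 1-NN with Hamming distance (an assumed evaluator)."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum(row[f] != rows[j][f] for f in features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def sequential_flip_search(rows, labels, candidates, mask):
    """Search stage (assumed rule): flip one bit at a time, keep it only if accuracy improves."""
    def score(m):
        return loo_1nn_accuracy(rows, labels, [f for f, bit in zip(candidates, m) if bit])

    best = score(mask)
    improved = True
    while improved:
        improved = False
        for pos in range(len(mask)):
            mask[pos] ^= 1
            trial = score(mask)
            if trial > best:
                best, improved = trial, True
            else:
                mask[pos] ^= 1  # undo a flip that did not help
    return mask, best


# Hypothetical toy data: feature 0 determines the class, feature 1 is constant,
# feature 2 is partly informative, feature 3 is noise.
rows = [
    (0, 0, 0, 0),
    (0, 0, 0, 1),
    (1, 0, 0, 0),
    (1, 0, 1, 1),
    (2, 0, 1, 0),
    (2, 0, 1, 1),
]
labels = ["a", "a", "b", "b", "c", "c"]

top, gains = top_k_by_information_gain(rows, labels, k=2)

# Random binary population over the user-selected features, refined one member at a time.
rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in top] for _ in range(4)]
results = [sequential_flip_search(rows, labels, top, list(m)) for m in population]
best_mask, best_score = max(results, key=lambda r: r[1])
selected = [f for f, bit in zip(top, best_mask) if bit]
print("top-k by IG:", top, "selected:", selected, "accuracy:", best_score)
```

On this toy data the filter stage keeps features 0 and 2, and every starting mask converges to feature 0 alone with a leave-one-out accuracy of 1.0. Running the filter first keeps the search space small: the search explores subsets of the k ranked features rather than of the full feature set.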

Original language: English
Pages (from-to): 315-339
Number of pages: 25
Journal: International Journal of Data Mining, Modelling and Management
Volume: 9
Issue number: 4
DOI: 10.1504/IJDMMM.2017.088411
State: Published - 1 Jan 2017

Keywords

  • Classification
  • Filter approach
  • Genetic algorithm
  • High-dimensional dataset
  • Information gain
  • Intelligent mining

Cite this

@article{8ae03f0e81944287bf04024713f68d60,
title = "Hybrid feature selection methods for high-dimensional multi-class datasets",
keywords = "Classification, Filter approach, Genetic algorithm, High-dimensional dataset, Information gain, Intelligent mining",
author = "Saxena, {Amit Kumar} and Dubey, {Vimal Kumar} and Wang, John",
year = "2017",
month = "1",
day = "1",
doi = "10.1504/IJDMMM.2017.088411",
language = "English",
volume = "9",
pages = "315--339",
journal = "International Journal of Data Mining, Modelling and Management",
issn = "1759-1163",
publisher = "Inderscience Publishers",
number = "4",

}

Hybrid feature selection methods for high-dimensional multi-class datasets. / Saxena, Amit Kumar; Dubey, Vimal Kumar; Wang, John.

In: International Journal of Data Mining, Modelling and Management, Vol. 9, No. 4, 01.01.2017, p. 315-339.

TY - JOUR
T1 - Hybrid feature selection methods for high-dimensional multi-class datasets
AU - Saxena, Amit Kumar
AU - Dubey, Vimal Kumar
AU - Wang, John
PY - 2017/1/1
Y1 - 2017/1/1
KW - Classification
KW - Filter approach
KW - Genetic algorithm
KW - High-dimensional dataset
KW - Information gain
KW - Intelligent mining
UR - http://www.scopus.com/inward/record.url?scp=85037674658&partnerID=8YFLogxK
U2 - 10.1504/IJDMMM.2017.088411
DO - 10.1504/IJDMMM.2017.088411
M3 - Article
VL - 9
SP - 315
EP - 339
JO - International Journal of Data Mining, Modelling and Management
JF - International Journal of Data Mining, Modelling and Management
SN - 1759-1163
IS - 4
ER -