Hybrid feature selection methods for high-dimensional multi-class datasets

Amit Kumar Saxena, Vimal Kumar Dubey, John Wang

Research output: Contribution to journal › Article › peer-review

8 Scopus citations


Hybrid methods are important for feature selection when classifying high-dimensional datasets. In this paper, we propose two hybrid methods that combine filter-based feature selection with a genetic algorithm or a sequential random search. The first method hybridises information gain with a genetic algorithm. The features are first ranked by information gain, and a user-defined number of top-ranked features is selected. A genetic algorithm is then applied to these features to select an optimal feature subset, using two types of fitness function: single-objective and multi-objective. The second method hybridises information gain with sequential random K-nearest neighbour (SRKNN). Here too, information gain is used to rank the features, and a user-defined number of top-ranked features is selected. A population of binary vectors over the user-selected features is generated, and a sequential search is applied to each member of the population to maximise classification accuracy. Both methods are applied to 21 high-dimensional multi-class datasets. The results show that the first method performs better on some datasets and the second method on others. The results obtained by the proposed methods are compared with results reported for other methods.
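The filter stage shared by both methods, followed by a wrapper-style search, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The features are assumed to be discrete, the classifier is a leave-one-out 1-nearest-neighbour stand-in, and the search is a single pass of bit flips rather than the paper's genetic algorithm or SRKNN procedure. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """H(labels) minus the weighted entropy of labels within each feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def top_k_features(rows, labels, k):
    """Rank feature indices by information gain and keep the k best."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels)
             for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:k], gains

def loo_1nn_accuracy(rows, labels, mask):
    """Leave-one-out 1-NN accuracy using only the features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i, x in enumerate(rows):
        nearest = min((o for o in range(len(rows)) if o != i),
                      key=lambda o: sum((x[j] - rows[o][j]) ** 2 for j in cols))
        correct += labels[nearest] == labels[i]
    return correct / len(rows)

def sequential_flip_search(rows, labels, mask):
    """Flip each bit in turn; keep a flip only if accuracy strictly improves."""
    best_mask = list(mask)
    best_acc = loo_1nn_accuracy(rows, labels, best_mask)
    for j in range(len(best_mask)):
        candidate = list(best_mask)
        candidate[j] = 1 - candidate[j]
        acc = loo_1nn_accuracy(rows, labels, candidate)
        if acc > best_acc:
            best_mask, best_acc = candidate, acc
    return best_mask, best_acc

# Toy three-class data (hypothetical): feature 0 mirrors the class, feature 1
# is constant, feature 2 separates class 2 only, feature 3 is noise.
rows = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0],
        [1, 0, 0, 1], [2, 0, 1, 0], [2, 0, 1, 1]]
labels = [0, 0, 1, 1, 2, 2]

selected, gains = top_k_features(rows, labels, k=2)   # filter stage
reduced = [[r[j] for j in selected] for r in rows]    # keep top-k columns
mask, accuracy = sequential_flip_search(reduced, labels, [1] * len(selected))
```

The filter stage is cheap and shrinks the search space before the more expensive wrapper search runs, which is the usual motivation for hybrids of this kind. In this sketch the value of `k` plays the role of the user-defined number of features mentioned in the abstract.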

Original language: English
Pages (from-to): 315-339
Number of pages: 25
Journal: International Journal of Data Mining, Modelling and Management
Issue number: 4
State: Published - 2017


  • Classification
  • Filter approach
  • Genetic algorithm
  • High-dimensional dataset
  • Information gain
  • Intelligent mining

