The fisher-markov selector: Fast selecting maximally separable feature subset for multiclass classification with applications to high-dimensional data

Qiang Cheng, Hongbo Zhou, Jie Cheng

Research output: Contribution to journalArticle

92 Citations (Scopus)

Abstract

Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can only obtain local optimum instead of global optimum. Toward the selection of the globally optimal subset of features efficiently, we introduce a new selectorwhich we call the Fisher-Markov selectorto identify those features that are the most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with the sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach for optimizing the measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including mid - dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. The effectiveness of our method is confirmed by experimental results. In pattern recognition and from a model selection viewpoint, our procedure says that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function which in fact can be obtained with an explicit expression.

Original languageEnglish
Article number5611544
Pages (from-to)1217-1233
Number of pages17
JournalIEEE Transactions on Pattern Analysis and Machine Intelligence
Volume33
Issue number6
DOIs
StatePublished - 2 May 2011

Fingerprint

Multi-class Classification
Selector
High-dimensional Data
Pattern recognition
Subset
Objective function
Global Optimum
Pattern Recognition
High-dimensional
Microarrays
Set theory
Gene expression
Learning systems
Feature extraction
Gene Expression Data
Microarray Data
Digit
Sparsity
Model Selection
Feature Selection

Keywords

  • Classification
  • feature subset selection
  • Fisher's linear discriminant analysis
  • high-dimensional data
  • kernel
  • Markov random field

Cite this

@article{7ad41cebf3d042d085274754d6e0c20f,
title = "The fisher-markov selector: Fast selecting maximally separable feature subset for multiclass classification with applications to high-dimensional data",
abstract = "Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can only obtain local optimum instead of global optimum. Toward the selection of the globally optimal subset of features efficiently, we introduce a new selectorwhich we call the Fisher-Markov selectorto identify those features that are the most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with the sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach for optimizing the measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including mid - dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. The effectiveness of our method is confirmed by experimental results. In pattern recognition and from a model selection viewpoint, our procedure says that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function which in fact can be obtained with an explicit expression.",
keywords = "Classification, feature subset selection, Fisher's linear discriminant analysis, high-dimensional data, kernel, Markov random field",
author = "Qiang Cheng and Hongbo Zhou and Jie Cheng",
year = "2011",
month = "5",
day = "2",
doi = "10.1109/TPAMI.2010.195",
language = "English",
volume = "33",
pages = "1217--1233",
journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
issn = "0162-8828",
publisher = "IEEE Computer Society",
number = "6",

}

TY - JOUR

T1 - The fisher-markov selector

T2 - Fast selecting maximally separable feature subset for multiclass classification with applications to high-dimensional data

AU - Cheng, Qiang

AU - Zhou, Hongbo

AU - Cheng, Jie

PY - 2011/5/2

Y1 - 2011/5/2

N2 - Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can only obtain local optimum instead of global optimum. Toward the selection of the globally optimal subset of features efficiently, we introduce a new selectorwhich we call the Fisher-Markov selectorto identify those features that are the most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with the sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach for optimizing the measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including mid - dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. The effectiveness of our method is confirmed by experimental results. In pattern recognition and from a model selection viewpoint, our procedure says that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function which in fact can be obtained with an explicit expression.

AB - Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can only obtain local optimum instead of global optimum. Toward the selection of the globally optimal subset of features efficiently, we introduce a new selectorwhich we call the Fisher-Markov selectorto identify those features that are the most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with the sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach for optimizing the measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including mid - dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. The effectiveness of our method is confirmed by experimental results. In pattern recognition and from a model selection viewpoint, our procedure says that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function which in fact can be obtained with an explicit expression.

KW - Classification

KW - feature subset selection

KW - Fisher's linear discriminant analysis

KW - high-dimensional data

KW - kernel

KW - Markov random field

UR - http://www.scopus.com/inward/record.url?scp=79955444979&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2010.195

DO - 10.1109/TPAMI.2010.195

M3 - Article

C2 - 21493968

AN - SCOPUS:79955444979

VL - 33

SP - 1217

EP - 1233

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

SN - 0162-8828

IS - 6

M1 - 5611544

ER -