Exploiting Fisher and Fukunaga-Koontz Transforms in Chernoff Dimensionality Reduction

Jing Peng, Guna Seetharaman, Wei Fan, Aparna Varde

Research output: Contribution to journal › Article

Abstract

Knowledge discovery from big data demands effective representation of data. However, big data are often characterized by high dimensionality, which makes knowledge discovery more difficult. Many techniques for dimensionality reduction have been proposed, including the well-known Fisher's Linear Discriminant Analysis (LDA). However, the Fisher criterion is incapable of dealing with heteroscedasticity in the data. A technique based on the Chernoff criterion for linear dimensionality reduction has been proposed that is capable of exploiting heteroscedastic information in the data. While the Chernoff criterion has been shown to outperform Fisher's, a clear understanding of its exact behavior is lacking. In this article, we show precisely what can be expected from the Chernoff criterion. In particular, we show that the Chernoff criterion exploits the Fisher and Fukunaga-Koontz transforms in computing its linear discriminants. Furthermore, we show that a recently proposed decomposition of the data space into four subspaces is incomplete. We provide arguments on how best to enrich the decomposition of the data space in order to account for heteroscedasticity in the data. Finally, we provide experimental results validating our theoretical analysis.
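
To make the transforms named in the abstract concrete, the following is a minimal sketch, not the authors' implementation: it assumes Gaussian class-conditional models, and the function names and toy data are hypothetical. It computes the two-class Chernoff distance that the Chernoff criterion maximizes in the projected space, and the Fukunaga-Koontz transform (FKT), which finds one basis that simultaneously diagonalizes both class covariances.

import numpy as np

def chernoff_distance(m1, S1, m2, S2, s=0.5):
    """Chernoff distance between Gaussians N(m1, S1) and N(m2, S2).

    The first term is a Mahalanobis-like mean separation (what Fisher's
    criterion sees); the second is nonzero only when S1 != S2, i.e., it
    captures heteroscedasticity. With s = 0.5 this is the Bhattacharyya
    distance.
    """
    Ss = s * S1 + (1.0 - s) * S2                      # interpolated covariance
    diff = m1 - m2
    mean_term = 0.5 * s * (1.0 - s) * diff @ np.linalg.solve(Ss, diff)
    _, logdet_Ss = np.linalg.slogdet(Ss)
    _, logdet_S1 = np.linalg.slogdet(S1)
    _, logdet_S2 = np.linalg.slogdet(S2)
    cov_term = 0.5 * (logdet_Ss - s * logdet_S1 - (1.0 - s) * logdet_S2)
    return mean_term + cov_term

def fukunaga_koontz(S1, S2):
    """FKT: whiten S1 + S2, then eigendecompose the whitened S1.

    In the whitened space the two class scatters share eigenvectors, and
    their eigenvalue pairs sum to 1, so a direction that is dominant for
    one class is automatically the weakest for the other.
    """
    evals, evecs = np.linalg.eigh(S1 + S2)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T      # (S1 + S2)^(-1/2)
    lam, V = np.linalg.eigh(W @ S1 @ W)               # eigenbasis of whitened S1
    return W @ V, lam                                 # FKT directions, class-1 eigenvalues

# Toy case: equal means, different covariances. Fisher's between-class
# scatter is zero, yet the Chernoff distance is strictly positive and the
# FKT eigenvalues deviate from 1/2: the heteroscedastic information that
# the Chernoff criterion exploits and the Fisher criterion discards.
m = np.zeros(2)
S1 = np.diag([4.0, 0.25])
S2 = np.diag([0.25, 4.0])
print(chernoff_distance(m, S1, m, S2))    # > 0 despite identical means
print(fukunaga_koontz(S1, S2)[1])         # eigenvalues far from 0.5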

Original language: English
Article number: 2499911
Journal: ACM Transactions on Knowledge Discovery from Data
Volume: 7
Issue number: 2
DOIs: 10.1145/2499907.2499911
State: Published - 1 Jan 2013

Fingerprint

  • Data mining
  • Decomposition
  • Discriminant analysis
  • Big data

Keywords

  • Chernoff distance
  • Dimensionality reduction
  • FKT
  • Feature evaluation and selection
  • LDA

Cite this

@article{4c4831ff2d744cc6b11612c7c295baf0,
title = "Exploiting Fisher and Fukunaga-Koontz transforms in Chernoff dimensionality reduction",
abstract = "Knowledge discovery from big data demands effective representation of data. However, big data are often characterized by high dimensionality, which makes knowledge discovery more difficult. Many techniques for dimensionality reduction have been proposed, including the well-known Fisher's Linear Discriminant Analysis (LDA). However, the Fisher criterion is incapable of dealing with heteroscedasticity in the data. A technique based on the Chernoff criterion for linear dimensionality reduction has been proposed that is capable of exploiting heteroscedastic information in the data. While the Chernoff criterion has been shown to outperform Fisher's, a clear understanding of its exact behavior is lacking. In this article, we show precisely what can be expected from the Chernoff criterion. In particular, we show that the Chernoff criterion exploits the Fisher and Fukunaga-Koontz transforms in computing its linear discriminants. Furthermore, we show that a recently proposed decomposition of the data space into four subspaces is incomplete. We provide arguments on how best to enrich the decomposition of the data space in order to account for heteroscedasticity in the data. Finally, we provide experimental results validating our theoretical analysis.",
keywords = "Chernoff distance, Dimensionality reduction, FKT, Feature evaluation and selection, LDA",
author = "Jing Peng and Guna Seetharaman and Wei Fan and Aparna Varde",
year = "2013",
month = "1",
day = "1",
doi = "10.1145/2499907.2499911",
language = "English",
volume = "7",
journal = "ACM Transactions on Knowledge Discovery from Data",
issn = "1556-4681",
publisher = "Association for Computing Machinery (ACM)",
number = "2",

}

Exploiting Fisher and Fukunaga-Koontz transforms in Chernoff dimensionality reduction. / Peng, Jing; Seetharaman, Guna; Fan, Wei; Varde, Aparna.

In: ACM Transactions on Knowledge Discovery from Data, Vol. 7, No. 2, 2499911, 01.01.2013.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Exploiting Fisher and Fukunaga-Koontz transforms in Chernoff dimensionality reduction

AU - Peng, Jing

AU - Seetharaman, Guna

AU - Fan, Wei

AU - Varde, Aparna

PY - 2013/1/1

Y1 - 2013/1/1

N2 - Knowledge discovery from big data demands effective representation of data. However, big data are often characterized by high dimensionality, which makes knowledge discovery more difficult. Many techniques for dimensionality reduction have been proposed, including the well-known Fisher's Linear Discriminant Analysis (LDA). However, the Fisher criterion is incapable of dealing with heteroscedasticity in the data. A technique based on the Chernoff criterion for linear dimensionality reduction has been proposed that is capable of exploiting heteroscedastic information in the data. While the Chernoff criterion has been shown to outperform Fisher's, a clear understanding of its exact behavior is lacking. In this article, we show precisely what can be expected from the Chernoff criterion. In particular, we show that the Chernoff criterion exploits the Fisher and Fukunaga-Koontz transforms in computing its linear discriminants. Furthermore, we show that a recently proposed decomposition of the data space into four subspaces is incomplete. We provide arguments on how best to enrich the decomposition of the data space in order to account for heteroscedasticity in the data. Finally, we provide experimental results validating our theoretical analysis.

AB - Knowledge discovery from big data demands effective representation of data. However, big data are often characterized by high dimensionality, which makes knowledge discovery more difficult. Many techniques for dimensionality reduction have been proposed, including the well-known Fisher's Linear Discriminant Analysis (LDA). However, the Fisher criterion is incapable of dealing with heteroscedasticity in the data. A technique based on the Chernoff criterion for linear dimensionality reduction has been proposed that is capable of exploiting heteroscedastic information in the data. While the Chernoff criterion has been shown to outperform Fisher's, a clear understanding of its exact behavior is lacking. In this article, we show precisely what can be expected from the Chernoff criterion. In particular, we show that the Chernoff criterion exploits the Fisher and Fukunaga-Koontz transforms in computing its linear discriminants. Furthermore, we show that a recently proposed decomposition of the data space into four subspaces is incomplete. We provide arguments on how best to enrich the decomposition of the data space in order to account for heteroscedasticity in the data. Finally, we provide experimental results validating our theoretical analysis.

KW - Chernoff distance

KW - Dimensionality reduction

KW - FKT

KW - Feature evaluation and selection

KW - LDA

UR - http://www.scopus.com/inward/record.url?scp=84896953516&partnerID=8YFLogxK

U2 - 10.1145/2499907.2499911

DO - 10.1145/2499907.2499911

M3 - Article

AN - SCOPUS:84896953516

VL - 7

JO - ACM Transactions on Knowledge Discovery from Data

JF - ACM Transactions on Knowledge Discovery from Data

SN - 1556-4681

IS - 2

M1 - 2499911

ER -