Graph-based iterative hybrid feature selection

Zhong ErHeng, Xie Sihong, Fan Wei, Ren Jiangtao, Jing Peng, Zhang Kun

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Research > peer-review

7 Citations (Scopus)

Abstract

When the number of labeled examples is limited, traditional supervised feature selection techniques often fail because of sample selection bias or unrepresentative samples. To address this, semi-supervised feature selection techniques exploit the statistical information of both labeled and unlabeled examples at the same time. However, the results of semi-supervised feature selection can at times be unsatisfactory, and the culprit is the difficulty of using the unlabeled data effectively. Quite different from both supervised and semi-supervised feature selection, we propose a "hybrid" framework based on graph models. We first apply supervised methods to select a small set of the most critical features from the labeled data. Importantly, these initial features might otherwise be missed when selection is performed on the labeled and unlabeled examples simultaneously. Next, this initial feature set is expanded and corrected with the use of unlabeled data. We formally analyze why the expected performance of the hybrid framework is better than that of both supervised and semi-supervised feature selection. Experimental results demonstrate that the proposed method outperforms both traditional supervised and state-of-the-art semi-supervised feature selection algorithms by at least 10% in accuracy on a number of text and biomedical problems with thousands of features to choose from. Software and datasets are available from the authors.
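The two-stage idea in the abstract (a supervised seed set, then expansion and correction with unlabeled data) can be illustrated with a minimal sketch. This is not the authors' algorithm: the correlation-based scoring, the dense RBF similarity graph, the clamped label propagation, and every function and parameter name below (`abs_corr_scores`, `propagate_labels`, `hybrid_select`, `k_init`, `k_final`, `n_rounds`, `sigma`) are assumptions chosen only to make the idea concrete. Labels are assumed to be in {-1, +1}.

```python
import numpy as np

def abs_corr_scores(X, y):
    """Score each feature by |Pearson correlation| with the target vector y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom

def propagate_labels(X, y_l, n_labeled, sigma=1.0, n_iter=20):
    """Spread labels over a dense RBF graph; labeled nodes are clamped each step.

    Rows 0..n_labeled-1 of X are the labeled examples (labels in {-1, +1}).
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / (W.sum(axis=1, keepdims=True) + 1e-12)  # row-normalized transitions
    f = np.zeros(X.shape[0])
    f[:n_labeled] = y_l
    for _ in range(n_iter):
        f = P @ f
        f[:n_labeled] = y_l  # keep the known labels fixed
    return f

def hybrid_select(X_l, y_l, X_u, k_init=2, k_final=4, n_rounds=3):
    """Hypothetical hybrid selection: supervised seed, then graph-based refinement."""
    X = np.vstack([X_l, X_u])
    n_l = X_l.shape[0]
    # Stage 1: supervised scoring on the labeled examples only.
    selected = np.argsort(-abs_corr_scores(X_l, y_l))[:k_init]
    # Stage 2: build a graph on the current features, propagate labels to the
    # unlabeled examples, then re-score all features against the soft labels.
    for _ in range(n_rounds):
        f = propagate_labels(X[:, selected], y_l, n_l)
        selected = np.argsort(-abs_corr_scores(X, f))[:k_final]
    return np.sort(selected)
```

Re-scoring every feature against the propagated soft labels is what lets the seed set both grow (features missed on the small labeled sample can enter) and be corrected (seed features that do not agree with the graph can drop out). The dense pairwise distance computation is quadratic in the number of examples, so this sketch suits only small data sets.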

Original language: English
Title of host publication: Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008
Pages: 1133-1138
Number of pages: 6
DOIs: https://doi.org/10.1109/ICDM.2008.63
State: Published - 1 Dec 2008
Event: 8th IEEE International Conference on Data Mining, ICDM 2008 - Pisa, Italy
Duration: 15 Dec 2008 – 19 Dec 2008

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 8th IEEE International Conference on Data Mining, ICDM 2008
Country: Italy
City: Pisa
Period: 15/12/08 – 19/12/08

Fingerprint

Feature extraction

Cite this

ErHeng, Z., Sihong, X., Wei, F., Jiangtao, R., Peng, J., & Kun, Z. (2008). Graph-based iterative hybrid feature selection. In Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008 (pp. 1133-1138). [4781237] (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDM.2008.63
ErHeng, Zhong ; Sihong, Xie ; Wei, Fan ; Jiangtao, Ren ; Peng, Jing ; Kun, Zhang. / Graph-based iterative hybrid feature selection. Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008. 2008. pp. 1133-1138 (Proceedings - IEEE International Conference on Data Mining, ICDM).
@inproceedings{34562be81c284efb8f4c72c6bede6ac7,
title = "Graph-based iterative hybrid feature selection",
abstract = "When the number of labeled examples is limited, traditional supervised feature selection techniques often fail because of sample selection bias or unrepresentative samples. To address this, semi-supervised feature selection techniques exploit the statistical information of both labeled and unlabeled examples at the same time. However, the results of semi-supervised feature selection can at times be unsatisfactory, and the culprit is the difficulty of using the unlabeled data effectively. Quite different from both supervised and semi-supervised feature selection, we propose a {"}hybrid{"} framework based on graph models. We first apply supervised methods to select a small set of the most critical features from the labeled data. Importantly, these initial features might otherwise be missed when selection is performed on the labeled and unlabeled examples simultaneously. Next, this initial feature set is expanded and corrected with the use of unlabeled data. We formally analyze why the expected performance of the hybrid framework is better than that of both supervised and semi-supervised feature selection. Experimental results demonstrate that the proposed method outperforms both traditional supervised and state-of-the-art semi-supervised feature selection algorithms by at least 10{\%} in accuracy on a number of text and biomedical problems with thousands of features to choose from. Software and datasets are available from the authors.",
author = "Zhong ErHeng and Xie Sihong and Fan Wei and Ren Jiangtao and Jing Peng and Zhang Kun",
year = "2008",
month = "12",
day = "1",
doi = "10.1109/ICDM.2008.63",
language = "English",
isbn = "9780769535029",
series = "Proceedings - IEEE International Conference on Data Mining, ICDM",
pages = "1133--1138",
booktitle = "Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008",

}

ErHeng, Z, Sihong, X, Wei, F, Jiangtao, R, Peng, J & Kun, Z 2008, Graph-based iterative hybrid feature selection. in Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008., 4781237, Proceedings - IEEE International Conference on Data Mining, ICDM, pp. 1133-1138, 8th IEEE International Conference on Data Mining, ICDM 2008, Pisa, Italy, 15/12/08. https://doi.org/10.1109/ICDM.2008.63

Graph-based iterative hybrid feature selection. / ErHeng, Zhong; Sihong, Xie; Wei, Fan; Jiangtao, Ren; Peng, Jing; Kun, Zhang.

Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008. 2008. p. 1133-1138 4781237 (Proceedings - IEEE International Conference on Data Mining, ICDM).

TY - GEN

T1 - Graph-based iterative hybrid feature selection

AU - ErHeng, Zhong

AU - Sihong, Xie

AU - Wei, Fan

AU - Jiangtao, Ren

AU - Peng, Jing

AU - Kun, Zhang

PY - 2008/12/1

Y1 - 2008/12/1

AB - When the number of labeled examples is limited, traditional supervised feature selection techniques often fail because of sample selection bias or unrepresentative samples. To address this, semi-supervised feature selection techniques exploit the statistical information of both labeled and unlabeled examples at the same time. However, the results of semi-supervised feature selection can at times be unsatisfactory, and the culprit is the difficulty of using the unlabeled data effectively. Quite different from both supervised and semi-supervised feature selection, we propose a "hybrid" framework based on graph models. We first apply supervised methods to select a small set of the most critical features from the labeled data. Importantly, these initial features might otherwise be missed when selection is performed on the labeled and unlabeled examples simultaneously. Next, this initial feature set is expanded and corrected with the use of unlabeled data. We formally analyze why the expected performance of the hybrid framework is better than that of both supervised and semi-supervised feature selection. Experimental results demonstrate that the proposed method outperforms both traditional supervised and state-of-the-art semi-supervised feature selection algorithms by at least 10% in accuracy on a number of text and biomedical problems with thousands of features to choose from. Software and datasets are available from the authors.

UR - http://www.scopus.com/inward/record.url?scp=67049167710&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2008.63

DO - 10.1109/ICDM.2008.63

M3 - Conference contribution

SN - 9780769535029

T3 - Proceedings - IEEE International Conference on Data Mining, ICDM

SP - 1133

EP - 1138

BT - Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008

ER -

ErHeng Z, Sihong X, Wei F, Jiangtao R, Peng J, Kun Z. Graph-based iterative hybrid feature selection. In Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008. 2008. p. 1133-1138. 4781237. (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDM.2008.63