Generalized and heuristic-free feature construction for improved accuracy

Wei Fan, Erheng Zhong, Jing Peng, Olivier Verscheure, Kun Zhang, Jiangtao Ren, Rong Yan, Qiang Yang

Research output: Contribution to conference › Paper › peer-review

22 Scopus citations

Abstract

State-of-the-art learning algorithms accept data in feature vector format as input. Examples belonging to different classes may not always be easy to separate in the original feature space. One may ask: can transforming existing features into a new space reveal significant discriminative information that is not obvious in the original space? First, since there can be an infinite number of ways to extend features, it is impractical to enumerate them and then perform feature selection. Second, evaluating discriminative power on the complete dataset is not always optimal, because features that are highly discriminative on a subset of examples may not be significant when evaluated on the entire dataset. Third, feature construction ought to be automated and general, such that it requires no domain knowledge and its accuracy improvement holds across a large number of classification algorithms. In this paper, we propose a framework to address these problems through the following steps: (1) divide-and-conquer to avoid exhaustive enumeration; (2) local feature construction and evaluation within subspaces of examples where local error is still high and the features constructed thus far still do not predict well; (3) a weighting-rules-based search that is free of domain knowledge and has a provable performance guarantee. Empirical studies indicate that significant improvement (as much as 9% in accuracy and 28% in AUC) is achieved using the newly constructed features over a variety of inductive learners evaluated against a number of balanced, skewed, and high-dimensional datasets. Software and datasets are available from the authors.
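
The abstract's second point, that a feature can be highly discriminative on a subset of examples yet look weak on the whole dataset, can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration only: the 12-example dataset, the product feature `x2 * x3`, the subspace defined by `x1 == 1`, and the simple sign rule. It is not the paper's algorithm, software, or data.

```python
from itertools import product


def constructed_feature(x2, x3):
    """Hypothetical constructed feature: the product of two original features."""
    return x2 * x3


def rule_accuracy(examples):
    """Accuracy of the rule 'predict 1 when the constructed feature is positive'."""
    correct = sum(
        1
        for (_, x2, x3, label) in examples
        if int(constructed_feature(x2, x3) > 0) == label
    )
    return correct / len(examples)


def build_toy_data():
    """Build (x1, x2, x3, label) tuples for two illustrative subspaces."""
    signs = (-1, 1)
    # Subspace A (x1 == 1): the label is fully determined by the sign of x2 * x3.
    region_a = [(1, x2, x3, int(x2 * x3 > 0)) for x2, x3 in product(signs, signs)]
    # Subspace B (x1 == 0): every (x2, x3) pair occurs with both labels,
    # so the constructed feature carries no information there.
    region_b = [
        (0, x2, x3, label)
        for x2, x3 in product(signs, signs)
        for label in (0, 1)
    ]
    return region_a + region_b


data = build_toy_data()
local_examples = [ex for ex in data if ex[0] == 1]

global_acc = rule_accuracy(data)
local_acc = rule_accuracy(local_examples)

print(f"accuracy on the full dataset: {global_acc:.3f}")
print(f"accuracy within the x1 == 1 subspace: {local_acc:.3f}")
```

In this toy setting the constructed feature separates the `x1 == 1` subspace perfectly but is correct on only 8 of the 12 examples overall, which is the kind of gap that motivates evaluating constructed features locally rather than only on the complete dataset.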

Original language: English
Pages: 629-640
Number of pages: 12
State: Published - 2010
Event: 10th SIAM International Conference on Data Mining, SDM 2010 - Columbus, OH, United States
Duration: 29 Apr 2010 → 1 May 2010

Other

Other: 10th SIAM International Conference on Data Mining, SDM 2010
Country/Territory: United States
City: Columbus, OH
Period: 29/04/10 → 1/05/10

Keywords

  • Accuracy improvement
  • Automatic
  • Efficiency
  • Feature construction
