Cross domain distribution adaptation via kernel mapping

Erheng Zhong, Wei Fan, Jing Peng, Kun Zhang, Jiangtao Ren, Deepak Turaga, Olivier Verscheure

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

76 Citations (Scopus)

Abstract

When labeled examples are limited and difficult to obtain, transfer learning employs knowledge from a source domain to improve learning accuracy in the target domain. However, the assumption made by existing approaches, that the marginal and conditional probabilities are directly related between source and target domains, has limited applicability in either the original space or its linear transformations. To solve this problem, we propose an adaptive kernel approach that maps the marginal distribution of target-domain and source-domain data into a common kernel space, and utilize a sample selection strategy to draw conditional probabilities between the two domains closer. We formally show that under the kernel-mapping space, the difference in distributions between the two domains is bounded; and the prediction error of the proposed approach can also be bounded. Experimental results demonstrate that the proposed method outperforms both traditional inductive classifiers and the state-of-the-art boosting-based transfer algorithms on most domains, including text categorization and web page ratings. In particular, it can achieve around 10% higher accuracy than other approaches for the text categorization problem. The source code and datasets are available from the authors.
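The abstract's claim that the distribution difference between domains is bounded in a kernel space can be made concrete with a generic measure of that difference. The sketch below is an illustration only and is not the authors' algorithm: it swaps in the maximum mean discrepancy (MMD) with a Gaussian kernel as one common way to compare two samples in a kernel space. The function names, the `gamma` value, and the synthetic Gaussian data are all assumptions introduced for this example.

```python
import math
import random


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y) with x in xs, y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two samples.

    It is the squared distance between the two samples' mean
    embeddings in the kernel feature space (hypothetical helper,
    not taken from the paper).
    """
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))


def sample(center, n, rng):
    """Draw n points from a unit-variance Gaussian around center."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


# Synthetic stand-ins for source-domain and target-domain data.
rng = random.Random(0)
source = sample([0.0, 0.0], 100, rng)
near_target = sample([0.0, 0.0], 100, rng)  # same distribution as source
far_target = sample([3.0, 3.0], 100, rng)   # shifted distribution

gap_near = mmd_squared(source, near_target, gamma=0.5)
gap_far = mmd_squared(source, far_target, gamma=0.5)
print("similar domains:", gap_near)
print("shifted domains:", gap_far)
```

Under these assumptions the shifted target yields a much larger discrepancy than the target drawn from the same distribution as the source, which is the kind of gap a cross-domain mapping tries to shrink.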

Original language: English
Title of host publication: KDD '09
Subtitle of host publication: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 1027-1035
Number of pages: 9
DOIs: https://doi.org/10.1145/1557019.1557130
State: Published - 9 Nov 2009
Event: 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09 - Paris, France
Duration: 28 Jun 2009 to 1 Jul 2009

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other: 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09
Country: France
City: Paris
Period: 28/06/09 to 1/07/09

Fingerprint

Linear transformations
Websites
Classifiers

Cite this

Zhong, E., Fan, W., Peng, J., Zhang, K., Ren, J., Turaga, D., & Verscheure, O. (2009). Cross domain distribution adaptation via kernel mapping. In KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1027-1035). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). https://doi.org/10.1145/1557019.1557130
Zhong, Erheng ; Fan, Wei ; Peng, Jing ; Zhang, Kun ; Ren, Jiangtao ; Turaga, Deepak ; Verscheure, Olivier. / Cross domain distribution adaptation via kernel mapping. KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2009. pp. 1027-1035 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).
@inproceedings{6210b3db90d84a67b67c09588bcb4f64,
title = "Cross domain distribution adaptation via kernel mapping",
abstract = "When labeled examples are limited and difficult to obtain, transfer learning employs knowledge from a source domain to improve learning accuracy in the target domain. However, the assumption made by existing approaches, that the marginal and conditional probabilities are directly related between source and target domains, has limited applicability in either the original space or its linear transformations. To solve this problem, we propose an adaptive kernel approach that maps the marginal distribution of target-domain and source-domain data into a common kernel space, and utilize a sample selection strategy to draw conditional probabilities between the two domains closer. We formally show that under the kernel-mapping space, the difference in distributions between the two domains is bounded; and the prediction error of the proposed approach can also be bounded. Experimental results demonstrate that the proposed method outperforms both traditional inductive classifiers and the state-of-the-art boosting-based transfer algorithms on most domains, including text categorization and web page ratings. In particular, it can achieve around 10{\%} higher accuracy than other approaches for the text categorization problem. The source code and datasets are available from the authors.",
author = "Erheng Zhong and Wei Fan and Jing Peng and Kun Zhang and Jiangtao Ren and Deepak Turaga and Olivier Verscheure",
year = "2009",
month = "11",
day = "9",
doi = "10.1145/1557019.1557130",
language = "English",
isbn = "9781605584959",
series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
pages = "1027--1035",
booktitle = "KDD '09",

}

Zhong, E, Fan, W, Peng, J, Zhang, K, Ren, J, Turaga, D & Verscheure, O 2009, Cross domain distribution adaptation via kernel mapping. in KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1027-1035, 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, Paris, France, 28/06/09. https://doi.org/10.1145/1557019.1557130

Cross domain distribution adaptation via kernel mapping. / Zhong, Erheng; Fan, Wei; Peng, Jing; Zhang, Kun; Ren, Jiangtao; Turaga, Deepak; Verscheure, Olivier.

KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2009. p. 1027-1035 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).

TY - GEN

T1 - Cross domain distribution adaptation via kernel mapping

AU - Zhong, Erheng

AU - Fan, Wei

AU - Peng, Jing

AU - Zhang, Kun

AU - Ren, Jiangtao

AU - Turaga, Deepak

AU - Verscheure, Olivier

PY - 2009/11/9

Y1 - 2009/11/9

N2 - When labeled examples are limited and difficult to obtain, transfer learning employs knowledge from a source domain to improve learning accuracy in the target domain. However, the assumption made by existing approaches, that the marginal and conditional probabilities are directly related between source and target domains, has limited applicability in either the original space or its linear transformations. To solve this problem, we propose an adaptive kernel approach that maps the marginal distribution of target-domain and source-domain data into a common kernel space, and utilize a sample selection strategy to draw conditional probabilities between the two domains closer. We formally show that under the kernel-mapping space, the difference in distributions between the two domains is bounded; and the prediction error of the proposed approach can also be bounded. Experimental results demonstrate that the proposed method outperforms both traditional inductive classifiers and the state-of-the-art boosting-based transfer algorithms on most domains, including text categorization and web page ratings. In particular, it can achieve around 10% higher accuracy than other approaches for the text categorization problem. The source code and datasets are available from the authors.

AB - When labeled examples are limited and difficult to obtain, transfer learning employs knowledge from a source domain to improve learning accuracy in the target domain. However, the assumption made by existing approaches, that the marginal and conditional probabilities are directly related between source and target domains, has limited applicability in either the original space or its linear transformations. To solve this problem, we propose an adaptive kernel approach that maps the marginal distribution of target-domain and source-domain data into a common kernel space, and utilize a sample selection strategy to draw conditional probabilities between the two domains closer. We formally show that under the kernel-mapping space, the difference in distributions between the two domains is bounded; and the prediction error of the proposed approach can also be bounded. Experimental results demonstrate that the proposed method outperforms both traditional inductive classifiers and the state-of-the-art boosting-based transfer algorithms on most domains, including text categorization and web page ratings. In particular, it can achieve around 10% higher accuracy than other approaches for the text categorization problem. The source code and datasets are available from the authors.

UR - http://www.scopus.com/inward/record.url?scp=70350645469&partnerID=8YFLogxK

U2 - 10.1145/1557019.1557130

DO - 10.1145/1557019.1557130

M3 - Conference contribution

AN - SCOPUS:70350645469

SN - 9781605584959

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 1027

EP - 1035

BT - KDD '09

ER -

Zhong E, Fan W, Peng J, Zhang K, Ren J, Turaga D et al. Cross domain distribution adaptation via kernel mapping. In KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2009. p. 1027-1035. (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). https://doi.org/10.1145/1557019.1557130