TY - GEN
T1 - Cross domain distribution adaptation via kernel mapping
AU - Zhong, Erheng
AU - Fan, Wei
AU - Peng, Jing
AU - Zhang, Kun
AU - Ren, Jiangtao
AU - Turaga, Deepak
AU - Verscheure, Olivier
PY - 2009
Y1 - 2009
N2 - When labeled examples are limited and difficult to obtain, transfer learning employs knowledge from a source domain to improve learning accuracy in the target domain. However, the assumption made by existing approaches, that the marginal and conditional probabilities are directly related between source and target domains, has limited applicability in either the original space or its linear transformations. To solve this problem, we propose an adaptive kernel approach that maps the marginal distribution of target-domain and source-domain data into a common kernel space, and utilizes a sample selection strategy to draw conditional probabilities between the two domains closer. We formally show that under the kernel-mapping space, the difference in distributions between the two domains is bounded; and the prediction error of the proposed approach can also be bounded. Experimental results demonstrate that the proposed method outperforms both traditional inductive classifiers and the state-of-the-art boosting-based transfer algorithms on most domains, including text categorization and web page ratings. In particular, it can achieve around 10% higher accuracy than other approaches for the text categorization problem. The source code and datasets are available from the authors.
AB - When labeled examples are limited and difficult to obtain, transfer learning employs knowledge from a source domain to improve learning accuracy in the target domain. However, the assumption made by existing approaches, that the marginal and conditional probabilities are directly related between source and target domains, has limited applicability in either the original space or its linear transformations. To solve this problem, we propose an adaptive kernel approach that maps the marginal distribution of target-domain and source-domain data into a common kernel space, and utilizes a sample selection strategy to draw conditional probabilities between the two domains closer. We formally show that under the kernel-mapping space, the difference in distributions between the two domains is bounded; and the prediction error of the proposed approach can also be bounded. Experimental results demonstrate that the proposed method outperforms both traditional inductive classifiers and the state-of-the-art boosting-based transfer algorithms on most domains, including text categorization and web page ratings. In particular, it can achieve around 10% higher accuracy than other approaches for the text categorization problem. The source code and datasets are available from the authors.
UR - http://www.scopus.com/inward/record.url?scp=70350645469&partnerID=8YFLogxK
U2 - 10.1145/1557019.1557130
DO - 10.1145/1557019.1557130
M3 - Conference contribution
AN - SCOPUS:70350645469
SN - 9781605584959
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1027
EP - 1035
BT - KDD '09
T2 - 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09
Y2 - 28 June 2009 through 1 July 2009
ER -