TY - GEN
T1 - Universal learning over related distributions and adaptive graph transduction
AU - Zhong, Erheng
AU - Fan, Wei
AU - Peng, Jing
AU - Verscheure, Olivier
AU - Ren, Jiangtao
PY - 2009
Y1 - 2009
AB - The basic assumption that training and test data are drawn from the same distribution is often violated in reality. In this paper, we propose one common solution that covers various scenarios of learning under "different but related distributions" within a single framework. Explicit examples include (a) sample selection bias between training and test data, (b) transfer learning, where no labeled data are available in the target domain, and (c) noisy or uncertain training data. The main motivation is that one could ideally solve as many problems as possible with a single approach. The proposed solution extends graph transduction using the maximum margin principle over unlabeled data. The error of the proposed method is bounded under reasonable assumptions even when the training and testing distributions are different. Experimental results demonstrate that the proposed method improves on traditional graph transduction by as much as 15% in accuracy and AUC across all common situations of distribution difference. Most importantly, it outperforms, by up to 10% in accuracy, several state-of-the-art approaches proposed to address specific categories of distribution difference, e.g., BRSD [1] for sample selection bias and CDSC [2] for transfer learning. The main claim is that adaptive graph transduction is a general and competitive method for handling distribution differences implicitly, without knowing or worrying about their exact type. These include at least sample selection bias, transfer learning, uncertainty mining, and similar problems not yet studied. The source code and datasets are available from the authors.
UR - http://www.scopus.com/inward/record.url?scp=70349943977&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-04174-7_44
DO - 10.1007/978-3-642-04174-7_44
M3 - Conference contribution
AN - SCOPUS:70349943977
SN - 3642041736
SN - 9783642041730
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 678
EP - 693
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2009, Proceedings
T2 - European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2009
Y2 - 7 September 2009 through 11 September 2009
ER -