Universal learning over related distributions and adaptive graph transduction

Erheng Zhong, Wei Fan, Jing Peng, Olivier Verscheure, Jiangtao Ren

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

The basic assumption that "training and test data are drawn from the same distribution" is often violated in reality. In this paper, we propose one common solution that covers various scenarios of learning under "different but related distributions" in a single framework. Explicit examples include (a) sample selection bias between training and test data, (b) transfer learning, where no labeled data is available in the target domain, and (c) noisy or uncertain training data. The main motivation is that one could ideally solve as many problems as possible with a single approach. The proposed solution extends graph transduction using the maximum margin principle over unlabeled data. The error of the proposed method is bounded under reasonable assumptions even when the training and test distributions are different. Experimental results demonstrate that the proposed method improves on traditional graph transduction by as much as 15% in accuracy and AUC in all common situations of distribution difference. Most importantly, it outperforms, by up to 10% in accuracy, several state-of-the-art approaches designed for a specific category of distribution difference, e.g., BRSD [1] for sample selection bias and CDSC [2] for transfer learning. The main claim is that adaptive graph transduction is a general and competitive method for handling distribution differences implicitly, without knowing or worrying about their exact type. These at least include sample selection bias, transfer learning, uncertainty mining, and similar settings that have not yet been studied. The source code and datasets are available from the authors.
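
To make the abstract's description concrete, below is a minimal, illustrative Python sketch of the base step it builds on: graph transduction via standard label propagation over a normalized affinity graph, followed by a margin-based selection over the unlabeled predictions. All names and parameters here (rbf_affinity, graph_transduction, adaptive_margin_select, sigma, alpha, keep) are hypothetical illustrations; the selection helper is only a guess at the spirit of "the maximum margin principle over unlabeled data", not the authors' published algorithm.

```python
# Illustrative sketch: label propagation as the base graph transducer,
# plus a hypothetical margin-based selection step over unlabeled data.
import numpy as np

def rbf_affinity(X, sigma=1.0):
    """Dense RBF affinity matrix over all (labeled + unlabeled) points."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                         # no self-loops
    return W

def graph_transduction(X_l, y_l, X_u, sigma=1.0, alpha=0.99, iters=200):
    """Base transducer: propagate labels from X_l (y_l in {0, 1}) to X_u
    over a normalized affinity graph; returns per-class scores for X_u."""
    X = np.vstack([X_l, X_u])
    n_l = len(y_l)
    W = rbf_affinity(X, sigma)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    Y = np.zeros((len(X), 2))
    Y[np.arange(n_l), y_l] = 1.0                       # clamp labeled seeds
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1.0 - alpha) * Y          # diffuse label mass
    return F[n_l:]

def adaptive_margin_select(F_u, keep=0.5):
    """Hypothetical 'maximum margin' step: keep the unlabeled predictions
    with the largest score margin, i.e. the most confident ones."""
    margin = np.abs(F_u[:, 1] - F_u[:, 0])
    k = max(1, int(keep * len(F_u)))
    idx = np.argsort(-margin)[:k]                      # largest margins first
    return idx, F_u[idx].argmax(axis=1)
```

In a distribution-difference setting, the idea would be to trust (or iterate on) only the high-margin subset returned by the selection step rather than all propagated labels; the graph construction and the keep fraction above are placeholder choices, not values from the paper.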

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2009, Proceedings
Pages: 678-693
Number of pages: 16
Edition: PART 2
DOIs: 10.1007/978-3-642-04174-7_44
State: Published - 19 Oct 2009
Event: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2009 - Bled, Slovenia
Duration: 7 Sep 2009 - 11 Sep 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 5782 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2009
Country: Slovenia
City: Bled
Period: 7/09/09 - 11/09/09


Cite this

Zhong, E., Fan, W., Peng, J., Verscheure, O., & Ren, J. (2009). Universal learning over related distributions and adaptive graph transduction. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2009, Proceedings (PART 2 ed., pp. 678-693). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5782 LNAI, No. PART 2). https://doi.org/10.1007/978-3-642-04174-7_44