Composite kernels for semi-supervised clustering

Carlotta Domeniconi, Jing Peng, Bojun Yan

Research output: Contribution to journal, Article, Research, peer-review

20 Citations (Scopus)

Abstract

A critical problem related to kernel-based methods is how to select optimal kernels. A kernel function must conform to the learning target in order to obtain meaningful results. While solutions to the problem of estimating optimal kernel functions and corresponding parameters have been proposed in a supervised setting, it remains a challenge when no labeled data are available, and all we have is a set of pairwise must-link and cannot-link constraints. In this paper, we address the problem of optimizing the kernel function using pairwise constraints for semi-supervised clustering. We propose a new optimization criterion for automatically estimating the optimal parameters of composite Gaussian kernels, directly from the data and given constraints. We combine our proposal with a semi-supervised kernel-based algorithm to demonstrate experimentally the effectiveness of our approach. The results show that our method is very effective for kernel-based semi-supervised clustering.
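
As a rough illustration of the idea in the abstract, the sketch below builds a composite Gaussian kernel as a weighted sum of Gaussian kernels and picks the kernel widths by grid search against must-link and cannot-link pairs. The scoring rule used here (mean must-link similarity minus mean cannot-link similarity), the equal component weights, and all function names are assumptions made for illustration only. The abstract does not state the paper's actual optimization criterion, and this sketch should not be read as that criterion.

```python
import math
from itertools import product


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def composite_gaussian_kernel(x, y, sigmas, weights):
    """Weighted sum of Gaussian kernels, one per width in `sigmas`."""
    d2 = sq_dist(x, y)
    return sum(w * math.exp(-d2 / (2.0 * s * s)) for w, s in zip(weights, sigmas))


def constraint_score(points, must_link, cannot_link, sigmas, weights):
    """Illustrative stand-in criterion (an assumption, not the paper's):
    mean kernel value over must-link pairs minus mean over cannot-link pairs."""
    def mean_kernel(pairs):
        total = sum(
            composite_gaussian_kernel(points[i], points[j], sigmas, weights)
            for i, j in pairs
        )
        return total / len(pairs)

    return mean_kernel(must_link) - mean_kernel(cannot_link)


def select_sigmas(points, must_link, cannot_link, candidates, n_components=2):
    """Grid search over width combinations, using equal weights (an assumption)."""
    weights = [1.0 / n_components] * n_components
    return max(
        product(candidates, repeat=n_components),
        key=lambda sigmas: constraint_score(
            points, must_link, cannot_link, sigmas, weights
        ),
    )


if __name__ == "__main__":
    # Toy data: two tight groups, with constraints given as index pairs.
    points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    must_link = [(0, 1), (2, 3)]
    cannot_link = [(0, 2), (1, 3)]
    print(select_sigmas(points, must_link, cannot_link, [0.01, 1.0, 100.0]))
```

Only pairwise constraints enter the score, so no class labels are needed. A very small width drives all similarities toward zero and a very large width drives them all toward one, so under this stand-in score an intermediate width separates must-link pairs from cannot-link pairs best.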

Original language: English
Pages (from-to): 99-116
Number of pages: 18
Journal: Knowledge and Information Systems
Volume: 28
Issue number: 1
DOI: 10.1007/s10115-010-0318-8
State: Published - 1 Jul 2011

Keywords

  • Clustering
  • Kernel methods
  • Semi-supervised clustering

Cite this

Domeniconi, Carlotta ; Peng, Jing ; Yan, Bojun. / Composite kernels for semi-supervised clustering. In: Knowledge and Information Systems. 2011 ; Vol. 28, No. 1. pp. 99-116.
@article{7ded6e6a0100482c83297feaad901938,
title = "Composite kernels for semi-supervised clustering",
abstract = "A critical problem related to kernel-based methods is how to select optimal kernels. A kernel function must conform to the learning target in order to obtain meaningful results. While solutions to the problem of estimating optimal kernel functions and corresponding parameters have been proposed in a supervised setting, it remains a challenge when no labeled data are available, and all we have is a set of pairwise must-link and cannot-link constraints. In this paper, we address the problem of optimizing the kernel function using pairwise constraints for semi-supervised clustering. We propose a new optimization criterion for automatically estimating the optimal parameters of composite Gaussian kernels, directly from the data and given constraints. We combine our proposal with a semi-supervised kernel-based algorithm to demonstrate experimentally the effectiveness of our approach. The results show that our method is very effective for kernel-based semi-supervised clustering.",
keywords = "Clustering, Kernel methods, Semi-supervised clustering",
author = "Carlotta Domeniconi and Jing Peng and Bojun Yan",
year = "2011",
month = "7",
day = "1",
doi = "10.1007/s10115-010-0318-8",
language = "English",
volume = "28",
pages = "99--116",
journal = "Knowledge and Information Systems",
issn = "0219-1377",
publisher = "Springer London",
number = "1"
}

TY - JOUR

T1 - Composite kernels for semi-supervised clustering

AU - Domeniconi, Carlotta

AU - Peng, Jing

AU - Yan, Bojun

PY - 2011/7/1

Y1 - 2011/7/1

AB - A critical problem related to kernel-based methods is how to select optimal kernels. A kernel function must conform to the learning target in order to obtain meaningful results. While solutions to the problem of estimating optimal kernel functions and corresponding parameters have been proposed in a supervised setting, it remains a challenge when no labeled data are available, and all we have is a set of pairwise must-link and cannot-link constraints. In this paper, we address the problem of optimizing the kernel function using pairwise constraints for semi-supervised clustering. We propose a new optimization criterion for automatically estimating the optimal parameters of composite Gaussian kernels, directly from the data and given constraints. We combine our proposal with a semi-supervised kernel-based algorithm to demonstrate experimentally the effectiveness of our approach. The results show that our method is very effective for kernel-based semi-supervised clustering.

KW - Clustering

KW - Kernel methods

KW - Semi-supervised clustering

UR - http://www.scopus.com/inward/record.url?scp=79959506150&partnerID=8YFLogxK

U2 - 10.1007/s10115-010-0318-8

DO - 10.1007/s10115-010-0318-8

M3 - Article

VL - 28

SP - 99

EP - 116

JO - Knowledge and Information Systems

JF - Knowledge and Information Systems

SN - 0219-1377

IS - 1

ER -