Learning semantics-preserving distance metrics for clustering graphical data

Aparna Varde, Elke A. Rundensteiner, Carolina Ruiz, Mohammed Maniruzzaman, Richard D. Sisson

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Research > peer-review

7 Citations (Scopus)

Abstract

When mining graphical data, the default Euclidean distance is often used as the notion of similarity. However, it does not adequately capture semantics in our targeted domains, where graphical representations depict the results of scientific experiments. It is seldom known a priori which other distance metric best preserves semantics, which motivates the need to learn such a metric. A technique called LearnMet is proposed here to learn a domain-specific distance metric for graphical representations. The input to LearnMet is a training set of correct clusters of such graphs. LearnMet iteratively compares these correct clusters with those obtained from an arbitrary but fixed clustering algorithm. In the first iteration, a guessed metric is used for clustering. This metric is then refined using the error between the obtained and correct clusters until the error falls below a given threshold. LearnMet is evaluated rigorously in the Heat Treating domain, which motivated this research. Clusters obtained using the learned metric and clusters obtained using Euclidean distance are both compared against the correct clusters over a separate test set. Our results show that the learned metric provides better clusters.
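The iterative loop described in the abstract can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the abstract does not specify the form of the metric, the error measure, or the refinement rule. The sketch assumes a weighted sum of per-component absolute differences as the metric, single-linkage agglomerative clustering as the "arbitrary but fixed" algorithm, the fraction of disagreeing point pairs as the error, and a simple heuristic weight update. All function names, parameters, and the toy data are hypothetical.

```python
from itertools import combinations


def weighted_distance(a, b, weights):
    # Assumed metric form: weighted sum of per-component absolute differences.
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def cluster(points, k, weights):
    # Stand-in for the fixed clustering algorithm: single-linkage
    # agglomerative clustering, merged down to k clusters.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            d = min(weighted_distance(points[i], points[j], weights)
                    for i in clusters[p] for j in clusters[q])
            if best is None or d < best[0]:
                best = (d, p, q)
        _, p, q = best
        clusters[p].extend(clusters[q])
        del clusters[q]  # q > p, so index p is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def pairwise_error(predicted, correct):
    # Assumed error measure: fraction of point pairs whose same-cluster /
    # different-cluster status differs between the two clusterings.
    pairs = list(combinations(range(len(correct)), 2))
    wrong = sum((predicted[i] == predicted[j]) != (correct[i] == correct[j])
                for i, j in pairs)
    return wrong / len(pairs)


def learn_metric(points, correct, k, threshold=0.05, rate=0.1, max_iters=50):
    dims = len(points[0])
    weights = [1.0 / dims] * dims  # guessed initial metric: equal weights
    for _ in range(max_iters):
        predicted = cluster(points, k, weights)
        error = pairwise_error(predicted, correct)
        if error < threshold:
            return weights, error
        delta = [0.0] * dims
        for i, j in combinations(range(len(points)), 2):
            same_pred = predicted[i] == predicted[j]
            if same_pred == (correct[i] == correct[j]):
                continue  # this pair is already handled correctly
            dist = weighted_distance(points[i], points[j], weights)
            if dist == 0:
                continue
            # Wrongly merged pairs should move apart (raise weights);
            # wrongly separated pairs should move closer (lower weights).
            sign = 1.0 if same_pred else -1.0
            for d in range(dims):
                share = weights[d] * abs(points[i][d] - points[j][d]) / dist
                delta[d] += sign * rate * share
        weights = [max(w + dw, 0.0) for w, dw in zip(weights, delta)]
        total = sum(weights) or 1.0
        weights = [w / total for w in weights]
    # Iteration budget exhausted: report the error of the final weights.
    return weights, pairwise_error(cluster(points, k, weights), correct)


# Toy data (hypothetical): the first component carries the cluster
# structure, the second is large-scale noise.
points = [(0.0, 0.0), (0.1, 5.0), (1.0, 0.1), (1.1, 5.1)]
correct = [0, 0, 1, 1]
weights, error = learn_metric(points, correct, k=2)
```

On this toy data the equal-weight starting metric groups points by the noisy second component, and the update shifts weight toward the first component until the obtained clusters match the correct ones. A real evaluation would, as the abstract notes, compare the learned metric against Euclidean distance on a separate test set.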

Original language: English
Title of host publication: Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05
Subtitle of host publication: Mining Integrated Media and Complex Data
Pages: 107-112
Number of pages: 6
DOIs: https://doi.org/10.1145/1133890.1133904
State: Published - 1 Dec 2005
Event: 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data - Chicago, IL, United States
Duration: 21 Aug 2005 to 21 Aug 2005

Publication series

Name: Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data

Other

Other: 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data
Country: United States
City: Chicago, IL
Period: 21/08/05 to 21/08/05

Fingerprint

Semantics
Clustering algorithms
Experiments
Hot Temperature

Keywords

  • clustering
  • distance metric
  • semantic graphical mining

Cite this

Varde, A., Rundensteiner, E. A., Ruiz, C., Maniruzzaman, M., & Sisson, R. D. (2005). Learning semantics-preserving distance metrics for clustering graphical data. In Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data (pp. 107-112). (Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data). https://doi.org/10.1145/1133890.1133904
Varde, Aparna ; Rundensteiner, Elke A. ; Ruiz, Carolina ; Maniruzzaman, Mohammed ; Sisson, Richard D. / Learning semantics-preserving distance metrics for clustering graphical data. Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data. 2005. pp. 107-112 (Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data).
@inproceedings{b32a6fdb09b44b699df6c42d19ad13ec,
title = "Learning semantics-preserving distance metrics for clustering graphical data",
abstract = "In mining graphical data the default Euclidean distance is often used as a notion of similarity. However this does not adequately capture semantics in our targeted domains, having graphical representations depicting results of scientific experiments. It is seldom known a-priori what other distance metric best preserves semantics. This motivates the need to learn such a metric. A technique called LearnMet is proposed here to learn a domain-specific distance metric for graphical representations. Input to LearnMet is a training set of correct clusters of such graphs. LearnMet iteratively compares these correct clusters with those obtained from an arbitrary but fixed clustering algorithm. In the first iteration a guessed metric is used for clustering. This metric is then refined using the error between the obtained and correct clusters until the error is below a given threshold. LearnMet is evaluated rigorously in the Heat Treating domain which motivated this research. Clusters obtained using the learned metric and clusters obtained using Euclidean distance are both compared against the correct clusters over a separate test set. Our results show that the learned metric provides better clusters.",
keywords = "clustering, distance metric, semantic graphical mining",
author = "Varde, Aparna and Rundensteiner, {Elke A.} and Ruiz, Carolina and Maniruzzaman, Mohammed and Sisson, {Richard D.}",
year = "2005",
month = "12",
day = "1",
doi = "10.1145/1133890.1133904",
language = "English",
isbn = "159593216X",
series = "Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data",
pages = "107--112",
booktitle = "Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05",

}

Varde, A, Rundensteiner, EA, Ruiz, C, Maniruzzaman, M & Sisson, RD 2005, Learning semantics-preserving distance metrics for clustering graphical data. in Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data. Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data, pp. 107-112, 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data, Chicago, IL, United States, 21/08/05. https://doi.org/10.1145/1133890.1133904

Learning semantics-preserving distance metrics for clustering graphical data. / Varde, Aparna; Rundensteiner, Elke A.; Ruiz, Carolina; Maniruzzaman, Mohammed; Sisson, Richard D.

Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data. 2005. p. 107-112 (Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data).

TY - GEN

T1 - Learning semantics-preserving distance metrics for clustering graphical data

AU - Varde, Aparna

AU - Rundensteiner, Elke A.

AU - Ruiz, Carolina

AU - Maniruzzaman, Mohammed

AU - Sisson, Richard D.

PY - 2005/12/1

Y1 - 2005/12/1

AB - In mining graphical data the default Euclidean distance is often used as a notion of similarity. However this does not adequately capture semantics in our targeted domains, having graphical representations depicting results of scientific experiments. It is seldom known a-priori what other distance metric best preserves semantics. This motivates the need to learn such a metric. A technique called LearnMet is proposed here to learn a domain-specific distance metric for graphical representations. Input to LearnMet is a training set of correct clusters of such graphs. LearnMet iteratively compares these correct clusters with those obtained from an arbitrary but fixed clustering algorithm. In the first iteration a guessed metric is used for clustering. This metric is then refined using the error between the obtained and correct clusters until the error is below a given threshold. LearnMet is evaluated rigorously in the Heat Treating domain which motivated this research. Clusters obtained using the learned metric and clusters obtained using Euclidean distance are both compared against the correct clusters over a separate test set. Our results show that the learned metric provides better clusters.

KW - clustering

KW - distance metric

KW - semantic graphical mining

UR - http://www.scopus.com/inward/record.url?scp=34548560762&partnerID=8YFLogxK

U2 - 10.1145/1133890.1133904

DO - 10.1145/1133890.1133904

M3 - Conference contribution

SN - 159593216X

SN - 9781595932167

T3 - Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data

SP - 107

EP - 112

BT - Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05

ER -

Varde A, Rundensteiner EA, Ruiz C, Maniruzzaman M, Sisson RD. Learning semantics-preserving distance metrics for clustering graphical data. In Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data. 2005. p. 107-112. (Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data). https://doi.org/10.1145/1133890.1133904