TY - GEN
T1 - Learning semantics-preserving distance metrics for clustering graphical data
AU - Varde, Aparna S.
AU - Rundensteiner, Elke A.
AU - Ruiz, Carolina
AU - Maniruzzaman, Mohammed
AU - Sisson, Richard D.
PY - 2005
Y1 - 2005
N2 - In mining graphical data, the default Euclidean distance is often used as a notion of similarity. However, this does not adequately capture semantics in our targeted domains, which have graphical representations depicting results of scientific experiments. It is seldom known a priori what other distance metric best preserves semantics. This motivates the need to learn such a metric. A technique called LearnMet is proposed here to learn a domain-specific distance metric for graphical representations. Input to LearnMet is a training set of correct clusters of such graphs. LearnMet iteratively compares these correct clusters with those obtained from an arbitrary but fixed clustering algorithm. In the first iteration, a guessed metric is used for clustering. This metric is then refined using the error between the obtained and correct clusters until the error is below a given threshold. LearnMet is evaluated rigorously in the Heat Treating domain, which motivated this research. Clusters obtained using the learned metric and clusters obtained using Euclidean distance are both compared against the correct clusters over a separate test set. Our results show that the learned metric provides better clusters.
KW - clustering
KW - distance metric
KW - semantic graphical mining
UR - http://www.scopus.com/inward/record.url?scp=34548560762&partnerID=8YFLogxK
U2 - 10.1145/1133890.1133904
DO - 10.1145/1133890.1133904
M3 - Conference contribution
AN - SCOPUS:34548560762
SN - 159593216X
SN - 9781595932167
T3 - Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data
SP - 107
EP - 112
BT - Proceedings of the 6th International Workshop on Multimedia Data Mining, MDM '05
T2 - 6th International Workshop on Multimedia Data Mining, MDM '05: Mining Integrated Media and Complex Data
Y2 - 21 August 2005
ER -