Designing semantics-preserving cluster representatives for scientific input conditions

Aparna Varde, Elke A. Rundensteiner, Carolina Ruiz, David C. Brown, Mohammed Maniruzzaman, Richard D. Sisson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

In scientific domains, knowledge is often discovered from experiments by grouping or clustering them based on the similarity of their output. The causes of similarity are analyzed based on the input conditions characterizing a given type of output, i.e., a given cluster. This analysis helps in applications such as decision support in industry. Cluster representatives form at-a-glance depictions for such applications. Randomly selecting a set of conditions in a cluster as its representative is not sufficient since distinct combinations of inputs could lead to the same cluster. In this paper, an approach called DesCond is proposed to design semantics-preserving cluster representatives for scientific input conditions. We define a notion of distance for conditions to capture semantics based on the types of their attributes and their relative importance. Using this distance, methods of building candidate cluster representatives with different levels of detail are proposed. Candidates are compared using the DesCond Encoding proposed in this paper that assesses their complexity and information loss, given user interests. The candidate with the lowest encoding for each cluster is returned as its designed representative. DesCond is evaluated with real data from Materials Science. Evaluation with domain expert interviews and formal user surveys shows that designed representatives consistently outperform randomly selected ones and different candidates suit different users.
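The abstract describes DesCond only at a high level. Below is a minimal Python sketch of the general idea, built on stated assumptions and not taken from the paper. It uses a Gower-style, importance-weighted distance over mixed categorical and numeric attributes. It builds two illustrative candidates at different levels of detail: a single medoid condition and the full list of distinct conditions. It then picks whichever candidate has the lowest cost, where the cost is a weighted sum of complexity and information loss that stands in for the DesCond Encoding. The attribute names, weights, `alpha`/`beta` parameters, and cost formula are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

Value = Union[float, str]
Condition = Dict[str, Value]


def attribute_ranges(conds: List[Condition], weights: Dict[str, float]) -> Dict[str, float]:
    """Span (max - min) of each numeric attribute, used to normalise differences."""
    ranges: Dict[str, float] = {}
    for attr in weights:
        vals = [c[attr] for c in conds]
        if not isinstance(vals[0], str):
            ranges[attr] = float(max(vals) - min(vals))
    return ranges


def condition_distance(a: Condition, b: Condition,
                       weights: Dict[str, float], ranges: Dict[str, float]) -> float:
    """Importance-weighted mixed-type distance in [0, 1] (Gower-style; an assumption)."""
    total = 0.0
    for attr, w in weights.items():
        x, y = a[attr], b[attr]
        if isinstance(x, str):  # categorical attribute: 0 if equal, else 1
            d = 0.0 if x == y else 1.0
        else:  # numeric attribute: difference normalised by the observed range
            span = ranges[attr]
            d = abs(x - y) / span if span > 0 else 0.0
        total += w * d
    return total / sum(weights.values())


def encoding(rep: List[Condition], cluster: List[Condition], weights: Dict[str, float],
             ranges: Dict[str, float], alpha: float, beta: float) -> float:
    """Hypothetical cost: alpha * complexity + beta * information loss."""
    complexity = len(rep) * len(weights)  # number of attribute values shown to the user
    # Information loss: distance from each cluster member to its nearest representative condition.
    loss = sum(min(condition_distance(c, r, weights, ranges) for r in rep) for c in cluster)
    return alpha * complexity + beta * loss


def design_representative(cluster: List[Condition], weights: Dict[str, float],
                          alpha: float = 1.0, beta: float = 10.0) -> Tuple[str, List[Condition]]:
    """Build two illustrative candidates and return the one with the lowest cost."""
    ranges = attribute_ranges(cluster, weights)
    # Candidate 1 (least detail): the medoid, i.e. the member closest to all others.
    medoid = min(cluster, key=lambda c: sum(condition_distance(c, o, weights, ranges)
                                            for o in cluster))
    # Candidate 2 (most detail): every distinct condition, in order of first appearance.
    distinct: List[Condition] = []
    for c in cluster:
        if c not in distinct:
            distinct.append(c)
    candidates = {"medoid": [medoid], "all_distinct": distinct}
    best = min(candidates, key=lambda k: encoding(candidates[k], cluster, weights,
                                                   ranges, alpha, beta))
    return best, candidates[best]


# Hypothetical heat-treating conditions in one cluster; weights encode assumed importance.
cluster: List[Condition] = [
    {"quenchant": "water", "temperature": 20.0, "agitation": "high"},
    {"quenchant": "water", "temperature": 40.0, "agitation": "high"},
    {"quenchant": "brine", "temperature": 30.0, "agitation": "high"},
]
weights = {"quenchant": 3.0, "temperature": 1.0, "agitation": 2.0}

if __name__ == "__main__":
    print(design_representative(cluster, weights, beta=10.0))  # a user who values detail
    print(design_representative(cluster, weights, beta=1.0))   # a user who values brevity
```

With these made-up numbers, `beta=10` selects the full list of distinct conditions and `beta=1` selects the single medoid. This mirrors the abstract's finding that different candidates suit different users. Modelling user interest as the `alpha`/`beta` trade-off is itself an assumption of this sketch.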

Original language: English
Title of host publication: Proceedings of the 15th ACM Conference on Information and Knowledge Management, CIKM 2006
Pages: 708-717
Number of pages: 10
DOI: https://doi.org/10.1145/1183614.1183715
State: Published - 1 Dec 2006
Event: 15th ACM Conference on Information and Knowledge Management, CIKM 2006 - Arlington, VA, United States
Duration: 6 Nov 2006 to 11 Nov 2006

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Other

Other: 15th ACM Conference on Information and Knowledge Management, CIKM 2006
Country: United States
City: Arlington, VA
Period: 6/11/06 to 11/11/06

Fingerprint

Industry
Grouping
Experiment
Decision support
Domain knowledge
Evaluation
Clustering
Relative importance

Keywords

  • Decision trees
  • Distance metrics
  • Domain knowledge
  • Minimum description length
  • Post-processing
  • Visual displays

Cite this

Varde, A., Rundensteiner, E. A., Ruiz, C., Brown, D. C., Maniruzzaman, M., & Sisson, R. D. (2006). Designing semantics-preserving cluster representatives for scientific input conditions. In Proceedings of the 15th ACM Conference on Information and Knowledge Management, CIKM 2006 (pp. 708-717). (International Conference on Information and Knowledge Management, Proceedings). https://doi.org/10.1145/1183614.1183715
Varde, Aparna ; Rundensteiner, Elke A. ; Ruiz, Carolina ; Brown, David C. ; Maniruzzaman, Mohammed ; Sisson, Richard D. / Designing semantics-preserving cluster representatives for scientific input conditions. Proceedings of the 15th ACM Conference on Information and Knowledge Management, CIKM 2006. 2006. pp. 708-717 (International Conference on Information and Knowledge Management, Proceedings).
@inproceedings{39e0ffd891554ea8bf5d7dad2ec3d6f6,
title = "Designing semantics-preserving cluster representatives for scientific input conditions",
abstract = "In scientific domains, knowledge is often discovered from experiments by grouping or clustering them based on the similarity of their output. The causes of similarity are analyzed based on the input conditions characterizing a given type of output, i.e., a given cluster. This analysis helps in applications such as decision support in industry. Cluster representatives form at-a-glance depictions for such applications. Randomly selecting a set of conditions in a cluster as its representative is not sufficient since distinct combinations of inputs could lead to the same cluster. In this paper, an approach called DesCond is proposed to design semantics-preserving cluster representatives for scientific input conditions. We define a notion of distance for conditions to capture semantics based on the types of their attributes and their relative importance. Using this distance, methods of building candidate cluster representatives with different levels of detail are proposed. Candidates are compared using the DesCond Encoding proposed in this paper that assesses their complexity and information loss, given user interests. The candidate with the lowest encoding for each cluster is returned as its designed representative. DesCond is evaluated with real data from Materials Science. Evaluation with domain expert interviews and formal user surveys shows that designed representatives consistently outperform randomly selected ones and different candidates suit different users.",
keywords = "Decision trees, Distance metrics, Domain knowledge, Minimum description length, Post-processing, Visual displays",
author = "Aparna Varde and Rundensteiner, {Elke A.} and Carolina Ruiz and Brown, {David C.} and Mohammed Maniruzzaman and Sisson, {Richard D.}",
year = "2006",
month = "12",
day = "1",
doi = "10.1145/1183614.1183715",
language = "English",
isbn = "1595934332",
series = "International Conference on Information and Knowledge Management, Proceedings",
pages = "708--717",
booktitle = "Proceedings of the 15th ACM Conference on Information and Knowledge Management, CIKM 2006",

}

Varde, A, Rundensteiner, EA, Ruiz, C, Brown, DC, Maniruzzaman, M & Sisson, RD 2006, Designing semantics-preserving cluster representatives for scientific input conditions. in Proceedings of the 15th ACM Conference on Information and Knowledge Management, CIKM 2006. International Conference on Information and Knowledge Management, Proceedings, pp. 708-717, 15th ACM Conference on Information and Knowledge Management, CIKM 2006, Arlington, VA, United States, 6/11/06. https://doi.org/10.1145/1183614.1183715

Designing semantics-preserving cluster representatives for scientific input conditions. / Varde, Aparna; Rundensteiner, Elke A.; Ruiz, Carolina; Brown, David C.; Maniruzzaman, Mohammed; Sisson, Richard D.

Proceedings of the 15th ACM Conference on Information and Knowledge Management, CIKM 2006. 2006. p. 708-717 (International Conference on Information and Knowledge Management, Proceedings).

TY - GEN

T1 - Designing semantics-preserving cluster representatives for scientific input conditions

AU - Varde, Aparna

AU - Rundensteiner, Elke A.

AU - Ruiz, Carolina

AU - Brown, David C.

AU - Maniruzzaman, Mohammed

AU - Sisson, Richard D.

PY - 2006/12/1

Y1 - 2006/12/1

N2 - In scientific domains, knowledge is often discovered from experiments by grouping or clustering them based on the similarity of their output. The causes of similarity are analyzed based on the input conditions characterizing a given type of output, i.e., a given cluster. This analysis helps in applications such as decision support in industry. Cluster representatives form at-a-glance depictions for such applications. Randomly selecting a set of conditions in a cluster as its representative is not sufficient since distinct combinations of inputs could lead to the same cluster. In this paper, an approach called DesCond is proposed to design semantics-preserving cluster representatives for scientific input conditions. We define a notion of distance for conditions to capture semantics based on the types of their attributes and their relative importance. Using this distance, methods of building candidate cluster representatives with different levels of detail are proposed. Candidates are compared using the DesCond Encoding proposed in this paper that assesses their complexity and information loss, given user interests. The candidate with the lowest encoding for each cluster is returned as its designed representative. DesCond is evaluated with real data from Materials Science. Evaluation with domain expert interviews and formal user surveys shows that designed representatives consistently outperform randomly selected ones and different candidates suit different users.

AB - In scientific domains, knowledge is often discovered from experiments by grouping or clustering them based on the similarity of their output. The causes of similarity are analyzed based on the input conditions characterizing a given type of output, i.e., a given cluster. This analysis helps in applications such as decision support in industry. Cluster representatives form at-a-glance depictions for such applications. Randomly selecting a set of conditions in a cluster as its representative is not sufficient since distinct combinations of inputs could lead to the same cluster. In this paper, an approach called DesCond is proposed to design semantics-preserving cluster representatives for scientific input conditions. We define a notion of distance for conditions to capture semantics based on the types of their attributes and their relative importance. Using this distance, methods of building candidate cluster representatives with different levels of detail are proposed. Candidates are compared using the DesCond Encoding proposed in this paper that assesses their complexity and information loss, given user interests. The candidate with the lowest encoding for each cluster is returned as its designed representative. DesCond is evaluated with real data from Materials Science. Evaluation with domain expert interviews and formal user surveys shows that designed representatives consistently outperform randomly selected ones and different candidates suit different users.

KW - Decision trees

KW - Distance metrics

KW - Domain knowledge

KW - Minimum description length

KW - Post-processing

KW - Visual displays

UR - http://www.scopus.com/inward/record.url?scp=34547626954&partnerID=8YFLogxK

U2 - 10.1145/1183614.1183715

DO - 10.1145/1183614.1183715

M3 - Conference contribution

AN - SCOPUS:34547626954

SN - 1595934332

SN - 9781595934338

T3 - International Conference on Information and Knowledge Management, Proceedings

SP - 708

EP - 717

BT - Proceedings of the 15th ACM Conference on Information and Knowledge Management, CIKM 2006

ER -

Varde A, Rundensteiner EA, Ruiz C, Brown DC, Maniruzzaman M, Sisson RD. Designing semantics-preserving cluster representatives for scientific input conditions. In Proceedings of the 15th ACM Conference on Information and Knowledge Management, CIKM 2006. 2006. p. 708-717. (International Conference on Information and Knowledge Management, Proceedings). https://doi.org/10.1145/1183614.1183715