TY - GEN
T1 - Designing semantics-preserving cluster representatives for scientific input conditions
AU - Varde, Aparna S.
AU - Rundensteiner, Elke A.
AU - Ruiz, Carolina
AU - Brown, David C.
AU - Maniruzzaman, Mohammed
AU - Sisson, Richard D.
PY - 2006
Y1 - 2006
AB - In scientific domains, knowledge is often discovered from experiments by grouping or clustering them based on the similarity of their output. The causes of similarity are analyzed based on the input conditions characterizing a given type of output, i.e., a given cluster. This analysis helps in applications such as decision support in industry. Cluster representatives form at-a-glance depictions for such applications. Randomly selecting a set of conditions in a cluster as its representative is not sufficient since distinct combinations of inputs could lead to the same cluster. In this paper, an approach called DesCond is proposed to design semantics-preserving cluster representatives for scientific input conditions. We define a notion of distance for conditions to capture semantics based on the types of their attributes and their relative importance. Using this distance, methods of building candidate cluster representatives with different levels of detail are proposed. Candidates are compared using the DesCond Encoding proposed in this paper that assesses their complexity and information loss, given user interests. The candidate with the lowest encoding for each cluster is returned as its designed representative. DesCond is evaluated with real data from Materials Science. Evaluation with domain expert interviews and formal user surveys shows that designed representatives consistently outperform randomly selected ones and different candidates suit different users.
KW - Decision trees
KW - Distance metrics
KW - Domain knowledge
KW - Minimum description length
KW - Post-processing
KW - Visual displays
UR - http://www.scopus.com/inward/record.url?scp=34547626954&partnerID=8YFLogxK
U2 - 10.1145/1183614.1183715
DO - 10.1145/1183614.1183715
M3 - Conference contribution
AN - SCOPUS:34547626954
SN - 1595934332
SN - 9781595934338
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 708
EP - 717
BT - Proceedings of the 15th ACM Conference on Information and Knowledge Management, CIKM 2006
T2 - 15th ACM Conference on Information and Knowledge Management, CIKM 2006
Y2 - 6 November 2006 through 11 November 2006
ER -