TY - GEN
T1 - Component selection to optimize distance function learning in complex scientific data sets
AU - Varde, Aparna
AU - Bique, Stephen
AU - Rundensteiner, Elke
AU - Brown, David
AU - Liang, Jianyu
AU - Sisson, Richard
AU - Sheybani, Ehsan
AU - Sayre, Brian
PY - 2008
Y1 - 2008
N2 - Analyzing complex scientific data, e.g., graphs and images, often requires comparison of features: regions on graphs, visual aspects of images and related metadata, some features being relatively more important. The notion of similarity for comparison is typically distance between data objects which could be expressed as distance between features. We refer to distance based on each feature as a component. Weights of components representing relative importance of features could be learned using distance function learning algorithms. However, it is seldom known which components optimize learning, given criteria such as accuracy, efficiency and simplicity. This is the problem we address. We propose and theoretically compare four component selection approaches: Maximal Path Traversal, Minimal Path Traversal, Maximal Path Traversal with Pruning and Minimal Path Traversal with Pruning. Experimental evaluation is conducted using real data from Materials Science, Nanotechnology and Bioinformatics. A trademarked software tool is developed as a highlight of this work.
KW - Data Mining
KW - Feature Selection
KW - Multimedia
KW - Scientific Analysis
UR - http://www.scopus.com/inward/record.url?scp=52949093169&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-85654-2_27
DO - 10.1007/978-3-540-85654-2_27
M3 - Conference contribution
AN - SCOPUS:52949093169
SN - 3540856536
SN - 9783540856535
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 269
EP - 282
BT - Database and Expert Systems Applications - 19th International Conference, DEXA 2008, Proceedings
T2 - 19th International Conference on Database and Expert Systems Applications, DEXA 2008
Y2 - 1 September 2008 through 5 September 2008
ER -