Component selection to optimize distance function learning in complex scientific data sets

Aparna Varde, Stephen Bique, Elke Rundensteiner, David Brown, Jianyu Liang, Richard Sisson, Ehsan Sheybani, Brian Sayre

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Research > peer-review

Abstract

Analyzing complex scientific data, e.g., graphs and images, often requires comparison of features: regions on graphs, visual aspects of images and related metadata, some features being relatively more important. The notion of similarity for comparison is typically distance between data objects which could be expressed as distance between features. We refer to distance based on each feature as a component. Weights of components representing relative importance of features could be learned using distance function learning algorithms. However, it is seldom known which components optimize learning, given criteria such as accuracy, efficiency and simplicity. This is the problem we address. We propose and theoretically compare four component selection approaches: Maximal Path Traversal, Minimal Path Traversal, Maximal Path Traversal with Pruning and Minimal Path Traversal with Pruning. Experimental evaluation is conducted using real data from Materials Science, Nanotechnology and Bioinformatics. A trademarked software tool is developed as a highlight of this work.
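The abstract's notion of a component can be illustrated with a minimal sketch. It assumes, purely for illustration, that the overall distance is a weighted sum of per-feature component distances and that component selection amounts to restricting that sum to a chosen subset. The names `feature_gap`, `weighted_distance`, and the toy features `peak` and `slope` are hypothetical and do not come from the paper; the four path traversal approaches are not reproduced here.

```python
from typing import Callable, Dict, Iterable, Optional

# A component maps a pair of data objects to a distance based on one
# feature. This representation is an assumption made for the sketch.
Component = Callable[[dict, dict], float]


def feature_gap(feature: str) -> Component:
    """Build a toy component: absolute difference on one numeric feature."""
    def component(a: dict, b: dict) -> float:
        return abs(a[feature] - b[feature])
    return component


def weighted_distance(
    a: dict,
    b: dict,
    components: Dict[str, Component],
    weights: Dict[str, float],
    selected: Optional[Iterable[str]] = None,
) -> float:
    """Weighted sum of component distances over the selected components.

    When `selected` is None, every component contributes.
    """
    names = components.keys() if selected is None else selected
    return sum(weights[name] * components[name](a, b) for name in names)


# Hypothetical data: two graphs summarized by two numeric features.
components = {"peak": feature_gap("peak"), "slope": feature_gap("slope")}
weights = {"peak": 2.0, "slope": 0.5}  # assumed to be already learned
graph_a = {"peak": 1.0, "slope": 4.0}
graph_b = {"peak": 3.0, "slope": 2.0}

full = weighted_distance(graph_a, graph_b, components, weights)
peak_only = weighted_distance(
    graph_a, graph_b, components, weights, selected=["peak"]
)
print(full, peak_only)  # 5.0 4.0
```

Dropping a component changes the distance and therefore what a learning algorithm can fit, which is why the choice of components affects accuracy, efficiency and simplicity.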

Original language: English
Title of host publication: Database and Expert Systems Applications - 19th International Conference, DEXA 2008, Proceedings
Pages: 269-282
Number of pages: 14
DOI: https://doi.org/10.1007/978-3-540-85654-2_27
State: Published - 6 Oct 2008
Event: 19th International Conference on Database and Expert Systems Applications, DEXA 2008 - Turin, Italy
Duration: 1 Sep 2008 to 5 Sep 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5181 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 19th International Conference on Database and Expert Systems Applications, DEXA 2008
Country: Italy
City: Turin
Period: 1/09/08 to 5/09/08

Fingerprint

Materials science
Bioinformatics
Distance Function
Metadata
Nanotechnology
Learning algorithms
Optimise
Minimal Path
Pruning
Path
Materials Science
Graph in graph theory
Software Tools
Experimental Evaluation
Learning Algorithm
Simplicity
Learning

Keywords

  • Data Mining
  • Feature Selection
  • Multimedia
  • Scientific Analysis

Cite this

Varde, A., Bique, S., Rundensteiner, E., Brown, D., Liang, J., Sisson, R., ... Sayre, B. (2008). Component selection to optimize distance function learning in complex scientific data sets. In Database and Expert Systems Applications - 19th International Conference, DEXA 2008, Proceedings (pp. 269-282). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5181 LNCS). https://doi.org/10.1007/978-3-540-85654-2_27
Varde, Aparna ; Bique, Stephen ; Rundensteiner, Elke ; Brown, David ; Liang, Jianyu ; Sisson, Richard ; Sheybani, Ehsan ; Sayre, Brian. / Component selection to optimize distance function learning in complex scientific data sets. Database and Expert Systems Applications - 19th International Conference, DEXA 2008, Proceedings. 2008. pp. 269-282 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{edccdba844f1400dbfb2dcc8c220e73b,
title = "Component selection to optimize distance function learning in complex scientific data sets",
abstract = "Analyzing complex scientific data, e.g., graphs and images, often requires comparison of features: regions on graphs, visual aspects of images and related metadata, some features being relatively more important. The notion of similarity for comparison is typically distance between data objects which could be expressed as distance between features. We refer to distance based on each feature as a component. Weights of components representing relative importance of features could be learned using distance function learning algorithms. However, it is seldom known which components optimize learning, given criteria such as accuracy, efficiency and simplicity. This is the problem we address. We propose and theoretically compare four component selection approaches: Maximal Path Traversal, Minimal Path Traversal, Maximal Path Traversal with Pruning and Minimal Path Traversal with Pruning. Experimental evaluation is conducted using real data from Materials Science, Nanotechnology and Bioinformatics. A trademarked software tool is developed as a highlight of this work.",
keywords = "Data Mining, Feature Selection, Multimedia, Scientific Analysis",
author = "Aparna Varde and Stephen Bique and Elke Rundensteiner and David Brown and Jianyu Liang and Richard Sisson and Ehsan Sheybani and Brian Sayre",
year = "2008",
month = "10",
day = "6",
doi = "10.1007/978-3-540-85654-2_27",
language = "English",
isbn = "3540856536",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "269--282",
booktitle = "Database and Expert Systems Applications - 19th International Conference, DEXA 2008, Proceedings",

}

Varde, A, Bique, S, Rundensteiner, E, Brown, D, Liang, J, Sisson, R, Sheybani, E & Sayre, B 2008, Component selection to optimize distance function learning in complex scientific data sets. in Database and Expert Systems Applications - 19th International Conference, DEXA 2008, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5181 LNCS, pp. 269-282, 19th International Conference on Database and Expert Systems Applications, DEXA 2008, Turin, Italy, 1/09/08. https://doi.org/10.1007/978-3-540-85654-2_27

Component selection to optimize distance function learning in complex scientific data sets. / Varde, Aparna; Bique, Stephen; Rundensteiner, Elke; Brown, David; Liang, Jianyu; Sisson, Richard; Sheybani, Ehsan; Sayre, Brian.

Database and Expert Systems Applications - 19th International Conference, DEXA 2008, Proceedings. 2008. p. 269-282 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5181 LNCS).

TY  - GEN
T1  - Component selection to optimize distance function learning in complex scientific data sets
AU  - Varde, Aparna
AU  - Bique, Stephen
AU  - Rundensteiner, Elke
AU  - Brown, David
AU  - Liang, Jianyu
AU  - Sisson, Richard
AU  - Sheybani, Ehsan
AU  - Sayre, Brian
PY  - 2008/10/6
Y1  - 2008/10/6
N2  - Analyzing complex scientific data, e.g., graphs and images, often requires comparison of features: regions on graphs, visual aspects of images and related metadata, some features being relatively more important. The notion of similarity for comparison is typically distance between data objects which could be expressed as distance between features. We refer to distance based on each feature as a component. Weights of components representing relative importance of features could be learned using distance function learning algorithms. However, it is seldom known which components optimize learning, given criteria such as accuracy, efficiency and simplicity. This is the problem we address. We propose and theoretically compare four component selection approaches: Maximal Path Traversal, Minimal Path Traversal, Maximal Path Traversal with Pruning and Minimal Path Traversal with Pruning. Experimental evaluation is conducted using real data from Materials Science, Nanotechnology and Bioinformatics. A trademarked software tool is developed as a highlight of this work.
AB  - Analyzing complex scientific data, e.g., graphs and images, often requires comparison of features: regions on graphs, visual aspects of images and related metadata, some features being relatively more important. The notion of similarity for comparison is typically distance between data objects which could be expressed as distance between features. We refer to distance based on each feature as a component. Weights of components representing relative importance of features could be learned using distance function learning algorithms. However, it is seldom known which components optimize learning, given criteria such as accuracy, efficiency and simplicity. This is the problem we address. We propose and theoretically compare four component selection approaches: Maximal Path Traversal, Minimal Path Traversal, Maximal Path Traversal with Pruning and Minimal Path Traversal with Pruning. Experimental evaluation is conducted using real data from Materials Science, Nanotechnology and Bioinformatics. A trademarked software tool is developed as a highlight of this work.
KW  - Data Mining
KW  - Feature Selection
KW  - Multimedia
KW  - Scientific Analysis
UR  - http://www.scopus.com/inward/record.url?scp=52949093169&partnerID=8YFLogxK
U2  - 10.1007/978-3-540-85654-2_27
DO  - 10.1007/978-3-540-85654-2_27
M3  - Conference contribution
SN  - 3540856536
SN  - 9783540856535
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 269
EP  - 282
BT  - Database and Expert Systems Applications - 19th International Conference, DEXA 2008, Proceedings
ER  -

Varde A, Bique S, Rundensteiner E, Brown D, Liang J, Sisson R et al. Component selection to optimize distance function learning in complex scientific data sets. In Database and Expert Systems Applications - 19th International Conference, DEXA 2008, Proceedings. 2008. p. 269-282. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-540-85654-2_27