Classifying idiomatic and literal expressions using vector space representations

Jing Peng, Anna Feldman, Hamza Jazmati

Research output: Contribution to journal › Conference article

3 Citations (Scopus)

Abstract

We describe an algorithm for automatic classification of idiomatic and literal expressions. Our starting point is that idioms and literal expressions occur in different contexts. Idioms tend to violate cohesive ties in local contexts, while literals are expected to fit in. Our goal is to capture this intuition using a vector representation of words. We propose two approaches: (1) Compute the inner product of context word vectors with the vector representing a target expression. Since literal vectors predict local contexts well, their inner product with contexts should be larger than that of idiomatic ones, thereby distinguishing literals from idioms; and (2) Compute literal and idiomatic scatter (covariance) matrices from local contexts in word vector space. Since the scatter matrices represent context distributions, we can then measure the difference between the distributions using the Frobenius norm. We provide experimental results validating the proposed techniques.
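The two scoring strategies outlined in the abstract can be illustrated with a short sketch. The code below is a minimal, hypothetical rendering of the ideas only: it assumes pre-trained word vectors are available as NumPy arrays and that local context words have already been extracted; the vector source, context window size, and final decision rule are not specified in the abstract and are assumptions here, not the authors' exact setup.

# Minimal sketch of the two strategies described in the abstract (illustrative
# only; vector source, context window, and decision rule are assumptions).
import numpy as np

def inner_product_score(target_vec, context_vecs):
    # Approach (1): mean inner product between the target expression vector
    # and its local context word vectors; literal usages are expected to
    # score higher than idiomatic ones.
    return float(np.mean([np.dot(target_vec, c) for c in context_vecs]))

def scatter_matrix(context_vecs):
    # Scatter (covariance) matrix of a set of context word vectors.
    X = np.stack(context_vecs)          # shape: (n_context_words, dim)
    return np.cov(X, rowvar=False)      # shape: (dim, dim)

def scatter_distance(literal_contexts, candidate_contexts):
    # Approach (2): Frobenius-norm distance between the scatter matrices of
    # two context sets, measuring how different their context distributions are.
    diff = scatter_matrix(literal_contexts) - scatter_matrix(candidate_contexts)
    return float(np.linalg.norm(diff, ord="fro"))

How these scores are turned into a final literal/idiomatic decision (e.g., a threshold or a nearest-class comparison against labeled contexts) is not detailed in the abstract.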

Original language: English
Pages (from-to): 507-511
Number of pages: 5
Journal: International Conference Recent Advances in Natural Language Processing, RANLP
Volume: 2015-January
State: Published - 1 Jan 2015
Event: 10th International Conference on Recent Advances in Natural Language Processing, RANLP 2015 - Hissar, Bulgaria
Duration: 7 Sep 2015 - 9 Sep 2015

Cite this

@article{fc70337740c94732ab9cf6ddf84590fa,
title = "Classifying idiomatic and literal expressions using vector space representations",
abstract = "We describe an algorithm for automatic classification of idiomatic and literal expressions. Our starting point is that idioms and literal expressions occur in different contexts. Idioms tend to violate cohesive ties in local contexts, while literals are expected to fit in. Our goal is to capture this intuition using a vector representation of words. We propose two approaches: (1) Compute inner product of context word vectors with the vector representing a target expression. Since literal vectors predict well local contexts, their inner product with contexts should be larger than idiomatic ones, thereby telling apart literals from idioms; and (2) Compute literal and idiomatic scatter (covariance) matrices from local contexts in word vector space. Since the scatter matrices represent context distributions, we can then measure the difference between the distributions using the Frobenius norm. We provide experimental results validating the proposed techniques.",
author = "Jing Peng and Anna Feldman and Hamza Jazmati",
year = "2015",
month = "1",
day = "1",
language = "English",
volume = "2015-January",
pages = "507--511",
journal = "International Conference Recent Advances in Natural Language Processing, RANLP",
issn = "1313-8502",
publisher = "Association for Computational Linguistics (ACL)",

}

Classifying idiomatic and literal expressions using vector space representations. / Peng, Jing; Feldman, Anna; Jazmati, Hamza.

In: International Conference Recent Advances in Natural Language Processing, RANLP, Vol. 2015-January, 01.01.2015, p. 507-511.

Research output: Contribution to journal › Conference article

TY - JOUR

T1 - Classifying idiomatic and literal expressions using vector space representations

AU - Peng, Jing

AU - Feldman, Anna

AU - Jazmati, Hamza

PY - 2015/1/1

Y1 - 2015/1/1

N2 - We describe an algorithm for automatic classification of idiomatic and literal expressions. Our starting point is that idioms and literal expressions occur in different contexts. Idioms tend to violate cohesive ties in local contexts, while literals are expected to fit in. Our goal is to capture this intuition using a vector representation of words. We propose two approaches: (1) Compute inner product of context word vectors with the vector representing a target expression. Since literal vectors predict well local contexts, their inner product with contexts should be larger than idiomatic ones, thereby telling apart literals from idioms; and (2) Compute literal and idiomatic scatter (covariance) matrices from local contexts in word vector space. Since the scatter matrices represent context distributions, we can then measure the difference between the distributions using the Frobenius norm. We provide experimental results validating the proposed techniques.

AB - We describe an algorithm for automatic classification of idiomatic and literal expressions. Our starting point is that idioms and literal expressions occur in different contexts. Idioms tend to violate cohesive ties in local contexts, while literals are expected to fit in. Our goal is to capture this intuition using a vector representation of words. We propose two approaches: (1) Compute inner product of context word vectors with the vector representing a target expression. Since literal vectors predict well local contexts, their inner product with contexts should be larger than idiomatic ones, thereby telling apart literals from idioms; and (2) Compute literal and idiomatic scatter (covariance) matrices from local contexts in word vector space. Since the scatter matrices represent context distributions, we can then measure the difference between the distributions using the Frobenius norm. We provide experimental results validating the proposed techniques.

UR - http://www.scopus.com/inward/record.url?scp=84949797859&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84949797859

VL - 2015-January

SP - 507

EP - 511

JO - International Conference Recent Advances in Natural Language Processing, RANLP

JF - International Conference Recent Advances in Natural Language Processing, RANLP

SN - 1313-8502

ER -