"In God we trust. All others must bring data." - W. Edwards Deming

Using word embeddings to recognize idioms

Research output: Contribution to journal › Conference article › Research › peer-review

Abstract

Expressions such as add fuel to the fire can be interpreted literally or idiomatically depending on the context in which they occur. Many Natural Language Processing applications could improve their performance if idiom recognition were improved. Our approach is based on the idea that idioms violate cohesive ties in local contexts, while literal expressions do not. We propose two approaches: (1) compute the inner product of context word vectors with the vector representing a target expression; since literal vectors predict their local contexts well, their inner product with contexts should be larger than that of idiomatic ones, telling literals apart from idioms; and (2) compute literal and idiomatic scatter (covariance) matrices from local contexts in word vector space; since the scatter matrices represent context distributions, we can measure the difference between the distributions using the Frobenius norm. For comparison, we implement the methods of Fazly et al. (2009), Sporleder and Li (2009), and Li and Sporleder (2010b) and apply them to our data. We provide experimental results validating the proposed techniques.
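The two signals described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: random vectors stand in for trained word embeddings (in the paper these would come from a model such as word2vec), and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical data: random vectors standing in for real word embeddings.
rng = np.random.default_rng(0)
dim = 50
target_vec = rng.normal(size=dim)                 # embedding of the target expression
literal_contexts = rng.normal(size=(100, dim))    # context word vectors around literal uses
idiomatic_contexts = rng.normal(size=(100, dim))  # context word vectors around idiomatic uses

# Approach 1: inner products between context word vectors and the target
# expression. Literal usages are expected to score higher on average.
def mean_inner_product(contexts, target):
    return float(np.mean(contexts @ target))

literal_score = mean_inner_product(literal_contexts, target_vec)
idiomatic_score = mean_inner_product(idiomatic_contexts, target_vec)

# Approach 2: scatter (covariance) matrices of the two context distributions,
# compared via the Frobenius norm of their difference.
def scatter(contexts):
    centered = contexts - contexts.mean(axis=0)
    return centered.T @ centered / len(contexts)

frobenius_distance = np.linalg.norm(
    scatter(literal_contexts) - scatter(idiomatic_contexts), ord="fro"
)
```

In practice the decision rule would threshold these quantities (or feed them to a classifier) to label an occurrence of the expression as literal or idiomatic.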

Original language: English
Pages (from-to): 96-102
Number of pages: 7
Journal: CEUR Workshop Proceedings
Volume: 1743
State: Published - 1 Jan 2016
Event: 3rd Annual International Symposium on Information Management and Big Data, SIMBig 2016 - Cusco, Peru
Duration: 1 Sep 2016 - 3 Sep 2016

Cite this

@article{b08d7e64cb3747d19cdae747955de65f,
title = "Using word embeddings to recognize idioms",
author = "Jing Peng and Anna Feldman",
year = "2016",
month = "1",
day = "1",
language = "English",
volume = "1743",
pages = "96--102",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",

}

Using word embeddings to recognize idioms. / Peng, Jing; Feldman, Anna.

In: CEUR Workshop Proceedings, Vol. 1743, 01.01.2016, p. 96-102.


TY - JOUR

T1 - Using word embeddings to recognize idioms

AU - Peng, Jing

AU - Feldman, Anna

PY - 2016/1/1

Y1 - 2016/1/1

UR - http://www.scopus.com/inward/record.url?scp=85006132043&partnerID=8YFLogxK

M3 - Conference article

VL - 1743

SP - 96

EP - 102

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

SN - 1613-0073

ER -