Automatic idiom recognition with word embeddings

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Expressions such as add fuel to the fire can be interpreted literally or idiomatically, depending on the context in which they occur. Many Natural Language Processing applications could improve their performance if idiom recognition were improved. Our approach is based on the idea that idioms and their literal counterparts do not appear in the same contexts. We propose two approaches: (1) compute the inner product of context word vectors with the vector representing a target expression; since literal vectors predict local contexts well, their inner product with contexts should be larger than that of idiomatic ones, thereby telling literals apart from idioms; and (2) compute literal and idiomatic scatter (covariance) matrices from local contexts in word vector space; since the scatter matrices represent context distributions, we can then measure the difference between the distributions using the Frobenius norm. For comparison, we implement [8, 16, 24] and apply them to our data. We provide experimental results validating the proposed techniques.
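The two scores sketched in the abstract can be illustrated numerically. Below is a minimal sketch in NumPy with randomly generated stand-ins for real word embeddings; the function names, dimensions, and toy data are illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50  # embedding dimensionality (toy choice)

# Hypothetical context-word embeddings: each row is the embedding of one
# word occurring near the target expression, for literal vs. idiomatic uses.
literal_contexts = rng.normal(0.5, 1.0, size=(40, dim))
idiomatic_contexts = rng.normal(-0.5, 1.0, size=(40, dim))

# A stand-in vector for the target expression itself.
target_vec = literal_contexts.mean(axis=0)


def mean_inner_product(contexts, target):
    """Approach (1): average inner product of context vectors with the target."""
    return float((contexts @ target).mean())


def scatter(contexts):
    """Approach (2): scatter (covariance) matrix of the context vectors."""
    centered = contexts - contexts.mean(axis=0)
    return centered.T @ centered / len(contexts)


# Literal usages should score higher under approach (1)...
lit_score = mean_inner_product(literal_contexts, target_vec)
idio_score = mean_inner_product(idiomatic_contexts, target_vec)

# ...and approach (2) measures how far apart the two context
# distributions are, via the Frobenius norm of the scatter difference.
frob = float(np.linalg.norm(
    scatter(literal_contexts) - scatter(idiomatic_contexts), ord="fro"))
```

On this toy data the literal score exceeds the idiomatic one and the Frobenius distance is strictly positive, mirroring the intuition that literal and idiomatic uses inhabit different context distributions.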

Original language: English
Title of host publication: Information Management and Big Data - 2nd Annual International Symposium, SIMBig 2015 and 3rd Annual International Symposium, SIMBig 2016, Revised Selected Papers
Editors: Juan Antonio Lossio-Ventura, Hugo Alatrista-Salas
Publisher: Springer Verlag
Pages: 17-29
Number of pages: 13
ISBN (Print): 9783319552088
DOI: 10.1007/978-3-319-55209-5_2
State: Published - 1 Jan 2017
Event: 3rd Annual International Symposium on Information Management and Big Data, SIMBig 2016 - Cusco, Peru
Duration: 1 Sep 2016 – 3 Sep 2016

Publication series

Name: Communications in Computer and Information Science
Volume: 656 CCIS
ISSN (Print): 1865-0929

Other

Other: 3rd Annual International Symposium on Information Management and Big Data, SIMBig 2016
Country: Peru
City: Cusco
Period: 1/09/16 – 3/09/16

Keywords

  • Distributional semantics
  • Idiom recognition
  • Vector space models
  • Word embeddings

Cite this

Peng, J., & Feldman, A. (2017). Automatic idiom recognition with word embeddings. In J. A. Lossio-Ventura, & H. Alatrista-Salas (Eds.), Information Management and Big Data - 2nd Annual International Symposium, SIMBig 2015 and 3rd Annual International Symposium, SIMBig 2016, Revised Selected Papers (pp. 17-29). (Communications in Computer and Information Science; Vol. 656 CCIS). Springer Verlag. https://doi.org/10.1007/978-3-319-55209-5_2