Automatic idiom recognition with word embeddings

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Scopus citations


Expressions such as add fuel to the fire can be interpreted literally or idiomatically, depending on the context in which they occur. Many Natural Language Processing applications could improve their performance if idiom recognition were improved. Our approach is based on the idea that idioms and their literal counterparts do not appear in the same contexts. We propose two approaches: (1) Compute the inner product of context word vectors with the vector representing a target expression. Since literal vectors predict local contexts well, their inner product with contexts should be larger than that of idiomatic ones, thereby telling literals apart from idioms; and (2) Compute literal and idiomatic scatter (covariance) matrices from local contexts in word vector space. Since the scatter matrices represent context distributions, we can then measure the difference between the distributions using the Frobenius norm. For comparison, we implement the methods of [8, 16, 24] and apply them to our data. We provide experimental results validating the proposed techniques.
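
The two scores described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the 2-dimensional vectors are hand-made stand-ins for trained word embeddings, and averaging the inner products over the context is an assumption, since the abstract does not say how the scores are aggregated or normalized.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mean_inner_product(target, contexts):
    """Approach (1): average inner product of context vectors with the target vector.
    Averaging is an assumption made for this sketch."""
    return sum(dot(target, c) for c in contexts) / len(contexts)

def scatter_matrix(contexts):
    """Approach (2): scatter matrix, the sum of outer products of mean-centered context vectors."""
    n, dim = len(contexts), len(contexts[0])
    mean = [sum(c[i] for c in contexts) / n for i in range(dim)]
    centered = [[c[i] - mean[i] for i in range(dim)] for c in contexts]
    return [[sum(row[i] * row[j] for row in centered) for j in range(dim)]
            for i in range(dim)]

def frobenius_distance(a, b):
    """Frobenius norm of the difference between two same-shaped matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

# Toy data (hypothetical): literal contexts lie close to the target vector,
# idiomatic contexts point elsewhere.
target = [1.0, 0.0]
literal_ctx = [[0.9, 0.1], [1.1, -0.1], [1.0, 0.0]]
idiom_ctx = [[0.0, 1.0], [0.2, -1.0], [-0.2, 0.0]]

literal_score = mean_inner_product(target, literal_ctx)
idiom_score = mean_inner_product(target, idiom_ctx)

s_literal = scatter_matrix(literal_ctx)
s_idiom = scatter_matrix(idiom_ctx)
distance = frobenius_distance(s_literal, s_idiom)

print("literal score:", literal_score)
print("idiom score:", idiom_score)
print("Frobenius distance between scatter matrices:", distance)
```

On this toy data the literal contexts score higher under approach (1), and the Frobenius distance under approach (2) is nonzero because the two context sets are spread differently. With real embeddings, both the decision threshold and the way an unseen usage is compared against the literal and idiomatic scatter matrices would have to be taken from the paper itself.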

Original language: English
Title of host publication: Information Management and Big Data - 2nd Annual International Symposium, SIMBig 2015 and 3rd Annual International Symposium, SIMBig 2016, Revised Selected Papers
Editors: Juan Antonio Lossio-Ventura, Hugo Alatrista-Salas
Publisher: Springer Verlag
Number of pages: 13
ISBN (Print): 9783319552088
State: Published - 2017
Event: 3rd Annual International Symposium on Information Management and Big Data, SIMBig 2016 - Cusco, Peru
Duration: 1 Sep 2016 to 3 Sep 2016

Publication series

Name: Communications in Computer and Information Science
Volume: 656 CCIS
ISSN (Print): 1865-0929


Other: 3rd Annual International Symposium on Information Management and Big Data, SIMBig 2016


Keywords

  • Distributional semantics
  • Idiom recognition
  • Vector space models
  • Word embeddings

