TY - GEN
T1 - Automatic idiom recognition with word embeddings
AU - Peng, Jing
AU - Feldman, Anna
N1 - Publisher Copyright:
© Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
N2 - Expressions such as add fuel to the fire can be interpreted literally or idiomatically depending on the context in which they occur. Many Natural Language Processing applications could improve their performance if idiom recognition were improved. Our approach is based on the idea that idioms and their literal counterparts do not appear in the same contexts. We propose two approaches: (1) compute the inner product of context word vectors with the vector representing a target expression; since literal vectors predict local contexts well, their inner product with context vectors should be larger than that of idiomatic ones, thereby telling literals apart from idioms; and (2) compute literal and idiomatic scatter (covariance) matrices from local contexts in word vector space; since the scatter matrices represent context distributions, we can measure the difference between the distributions using the Frobenius norm. For comparison, we implement [8, 16, 24] and apply them to our data. We provide experimental results validating the proposed techniques.
AB - Expressions such as add fuel to the fire can be interpreted literally or idiomatically depending on the context in which they occur. Many Natural Language Processing applications could improve their performance if idiom recognition were improved. Our approach is based on the idea that idioms and their literal counterparts do not appear in the same contexts. We propose two approaches: (1) compute the inner product of context word vectors with the vector representing a target expression; since literal vectors predict local contexts well, their inner product with context vectors should be larger than that of idiomatic ones, thereby telling literals apart from idioms; and (2) compute literal and idiomatic scatter (covariance) matrices from local contexts in word vector space; since the scatter matrices represent context distributions, we can measure the difference between the distributions using the Frobenius norm. For comparison, we implement [8, 16, 24] and apply them to our data. We provide experimental results validating the proposed techniques.
KW - Distributional semantics
KW - Idiom recognition
KW - Vector space models
KW - Word embeddings
UR - http://www.scopus.com/inward/record.url?scp=85015145960&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-55209-5_2
DO - 10.1007/978-3-319-55209-5_2
M3 - Conference contribution
AN - SCOPUS:85015145960
SN - 9783319552088
T3 - Communications in Computer and Information Science
SP - 17
EP - 29
BT - Information Management and Big Data - 2nd Annual International Symposium, SIMBig 2015 and 3rd Annual International Symposium, SIMBig 2016, Revised Selected Papers
A2 - Lossio-Ventura, Juan Antonio
A2 - Alatrista-Salas, Hugo
PB - Springer Verlag
T2 - 3rd Annual International Symposium on Information Management and Big Data, SIMBig 2016
Y2 - 1 September 2016 through 3 September 2016
ER -