Idioms: Humans or machines, it’s all about context

Manali Pradhan, Jing Peng, Anna Feldman, Bianca Wright

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Expressions can be ambiguous between idiomatic and literal interpretations depending on the context they occur in (“sales hit the roof” vs “hit the roof of the car”). Previous studies suggest that idiomaticity is not a binary property, but rather a continuum, a so-called “scalar phenomenon”, ranging from completely literal to highly idiomatic. This paper reports the results of an experiment in which human annotators rank idiomatic expressions in context on a scale from 1 (literal) to 4 (highly idiomatic). Our experiment supports the hypothesis that idioms fall on a continuum and that one might differentiate between highly idiomatic, mildly idiomatic and weakly idiomatic expressions. In addition, we measure the relative idiomaticity of 11 idiomatic types and compute the correlation between the relative idiomaticity of an expression and the performance of various automatic models for idiom detection. We show that our model, based on ideas from distributional semantics, not only outperforms the previous models, but also correlates positively with the human judgements, which suggests that we are moving in the right direction toward automatic idiom detection.

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers
Editors: Alexander Gelbukh
Publisher: Springer Verlag
Pages: 291-304
Number of pages: 14
ISBN (Print): 9783319771120
State: Published - 2018
Event: 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017 - Budapest, Hungary
Duration: 17 Apr 2017 to 23 Apr 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10761 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017
Country/Territory: Hungary
City: Budapest
Period: 17/04/17 to 23/04/17
