Idioms

Humans or machines, it’s all about context

Manali Pradhan, Jing Peng, Anna Feldman, Bianca Wright

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Abstract

Expressions can be ambiguous between idiomatic and literal interpretation depending on the context they occur in (“sales hit the roof” vs “hit the roof of the car”). Previous studies suggest that idiomaticity is not a binary property, but rather a continuum or the so-called “scalar phenomenon” ranging from completely literal to highly idiomatic. This paper reports the results of an experiment in which human annotators rank idiomatic expressions in context on a scale from 1 (literal) to 4 (highly idiomatic). Our experiment supports the hypothesis that idioms fall on a continuum and that one might differentiate between highly idiomatic, mildly idiomatic and weakly idiomatic expressions. In addition, we measure the relative idiomaticity of 11 idiomatic types and compute the correlation between the relative idiomaticity of an expression and the performance of various automatic models for idiom detection. We show that our model, based on the distributional semantics ideas, not only outperforms the previous models, but also positively correlates with the human judgements, which suggests that we are moving in the right direction toward automatic idiom detection.
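The correlation step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which correlation coefficient was used, so Spearman rank correlation is an assumption here, and the expression names, annotator ratings and model scores are hypothetical placeholders rather than data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def relative_idiomaticity(ratings_by_type):
    """Mean annotator rating per expression type on the 1 (literal) to 4 scale."""
    return {name: mean(scores) for name, scores in ratings_by_type.items()}


# Hypothetical annotator ratings (1 = literal, 4 = highly idiomatic).
ratings = {
    "expression_a": [4, 4, 3, 4],
    "expression_b": [3, 2, 3, 3],
    "expression_c": [2, 2, 1, 2],
    "expression_d": [1, 1, 2, 1],
}
# Hypothetical per-type detection scores for some automatic model.
model_scores = {
    "expression_a": 0.80,
    "expression_b": 0.85,
    "expression_c": 0.60,
    "expression_d": 0.55,
}

idiomaticity = relative_idiomaticity(ratings)
names = sorted(ratings)
rho = spearman([idiomaticity[n] for n in names], [model_scores[n] for n in names])
print(f"Spearman rho between idiomaticity and model score: {rho:.2f}")
```

A rank-based coefficient is used in this sketch because the ratings are ordinal; a positive value would indicate that the hypothetical model does better on the types that annotators judge to be more idiomatic.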

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers
Editors: Alexander Gelbukh
Publisher: Springer Verlag
Pages: 291-304
Number of pages: 14
ISBN (Print): 9783319771120
DOIs: https://doi.org/10.1007/978-3-319-77113-7_23
State: Published - 1 Jan 2018
Event: 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017 - Budapest, Hungary
Duration: 17 Apr 2017 → 23 Apr 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10761 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017
Country: Hungary
City: Budapest
Period: 17/04/17 → 23/04/17

Cite this

Pradhan, M., Peng, J., Feldman, A., & Wright, B. (2018). Idioms: Humans or machines, it’s all about context. In A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers (pp. 291-304). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10761 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-77113-7_23
Pradhan, Manali ; Peng, Jing ; Feldman, Anna ; Wright, Bianca. / Idioms : Humans or machines, it’s all about context. Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers. editor / Alexander Gelbukh. Springer Verlag, 2018. pp. 291-304 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{00c27c7e99024b479fa1f75e150414fe,
title = "Idioms: Humans or machines, it’s all about context",
abstract = "Expressions can be ambiguous between idiomatic and literal interpretation depending on the context they occur in (“sales hit the roof” vs “hit the roof of the car”). Previous studies suggest that idiomaticity is not a binary property, but rather a continuum or the so-called “scalar phenomenon” ranging from completely literal to highly idiomatic. This paper reports the results of an experiment in which human annotators rank idiomatic expressions in context on a scale from 1 (literal) to 4 (highly idiomatic). Our experiment supports the hypothesis that idioms fall on a continuum and that one might differentiate between highly idiomatic, mildly idiomatic and weakly idiomatic expressions. In addition, we measure the relative idiomaticity of 11 idiomatic types and compute the correlation between the relative idiomaticity of an expression and the performance of various automatic models for idiom detection. We show that our model, based on the distributional semantics ideas, not only outperforms the previous models, but also positively correlates with the human judgements, which suggests that we are moving in the right direction toward automatic idiom detection.",
author = "Manali Pradhan and Jing Peng and Anna Feldman and Bianca Wright",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/978-3-319-77113-7_23",
language = "English",
isbn = "9783319771120",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "291--304",
editor = "Alexander Gelbukh",
booktitle = "Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers",
volume = "10761",
}

Pradhan, M, Peng, J, Feldman, A & Wright, B 2018, Idioms: Humans or machines, it’s all about context. in A Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10761 LNCS, Springer Verlag, pp. 291-304, 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017, Budapest, Hungary, 17/04/17. https://doi.org/10.1007/978-3-319-77113-7_23

Idioms : Humans or machines, it’s all about context. / Pradhan, Manali; Peng, Jing; Feldman, Anna; Wright, Bianca.

Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers. ed. / Alexander Gelbukh. Springer Verlag, 2018. p. 291-304 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10761 LNCS).

TY - GEN
T1 - Idioms
T2 - Humans or machines, it’s all about context
AU - Pradhan, Manali
AU - Peng, Jing
AU - Feldman, Anna
AU - Wright, Bianca
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Expressions can be ambiguous between idiomatic and literal interpretation depending on the context they occur in (“sales hit the roof” vs “hit the roof of the car”). Previous studies suggest that idiomaticity is not a binary property, but rather a continuum or the so-called “scalar phenomenon” ranging from completely literal to highly idiomatic. This paper reports the results of an experiment in which human annotators rank idiomatic expressions in context on a scale from 1 (literal) to 4 (highly idiomatic). Our experiment supports the hypothesis that idioms fall on a continuum and that one might differentiate between highly idiomatic, mildly idiomatic and weakly idiomatic expressions. In addition, we measure the relative idiomaticity of 11 idiomatic types and compute the correlation between the relative idiomaticity of an expression and the performance of various automatic models for idiom detection. We show that our model, based on the distributional semantics ideas, not only outperforms the previous models, but also positively correlates with the human judgements, which suggests that we are moving in the right direction toward automatic idiom detection.
UR - http://www.scopus.com/inward/record.url?scp=85055438053&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-77113-7_23
DO - 10.1007/978-3-319-77113-7_23
M3 - Conference contribution
SN - 9783319771120
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 291
EP - 304
BT - Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers
A2 - Gelbukh, Alexander
PB - Springer Verlag
ER -

Pradhan M, Peng J, Feldman A, Wright B. Idioms: Humans or machines, it’s all about context. In Gelbukh A, editor, Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers. Springer Verlag. 2018. p. 291-304. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-77113-7_23