RI: Small: DaRE: Detection and Recognition of Euphemisms

Project Details

Description

To fully understand human language, machines need to be able to recognize and interpret expressions that contain hidden meanings. This project concentrates on euphemisms: mild or indirect phrases used in place of harsher or more offensive ones. Euphemisms are often used to mask profanity or to refer politely to sensitive topics such as death, sex, religion, disability, or personal relationships. People use euphemisms all the time, e.g., 'negative patient outcome', 'between jobs', 'financially fortunate', 'correctional facility', 'friendly fire', or 'sunshine unit'. Different cultures and languages use different euphemisms, and euphemisms change over time. Machines that process human language do not yet understand euphemisms. This project is devoted to making machines understand euphemisms in different languages, thereby contributing to the capabilities of artificial intelligence. Additional benefits include interesting new generalizations about the nature of euphemisms and the training of a diverse cadre of undergraduate and graduate students in highly practical work on a difficult interdisciplinary problem. Montclair State University, a Hispanic-Serving Institution, is known for its diverse student population and its large proportion of first-generation college students. Montclair State University places great emphasis on justice and inclusivity in academia, and this project is no exception.
Detecting and interpreting figurative language is a rapidly growing area in Natural Language Processing (NLP). Unfortunately, the processing of euphemisms has so far received little attention in NLP. The project addresses two problems: 1) the design of algorithms for detecting and interpreting euphemisms, and 2) the interpretability of black-box neural models, by creating a series of new datasets and tasks that explore the embedding space of transformer language models for euphemism recognition.
The key insights are that 1) euphemistic expressions and their paraphrased counterparts differ in the strength of the sentiment they convey; 2) whether an expression receives a euphemistic or non-euphemistic interpretation depends on context; and 3) euphemisms are vaguer than the taboo expressions they replace. The experiments test which linguistic properties of euphemisms deep learning approaches capture, and why. The algorithm developed can detect new euphemisms, not previously recorded in dictionaries, without human intervention. Computational work on euphemisms is important for understanding how the strategic use of language can bias people's perceptions of important and highly contentious actions, and it may suggest ways to de-bias language models. Work on euphemisms also helps reveal which topics are controversial or sensitive in a specific culture. Applying the algorithm to diachronic data and detecting changes in euphemism usage leads to a better understanding of cultural change. The corpora produced are useful for answering questions at the intersection of AI, NLP, linguistics, cultural anthropology, and social psychology. The range of languages covered provides a natural basis for interesting linguistic observations about euphemisms. Since euphemisms are a form of verbal behavior, finding a way to detect and interpret euphemisms automatically may lead to a better understanding of human behavior in general.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status: Active
Effective start/end date: 1/01/23 to 31/12/25

Funding

  • National Science Foundation: $564,126.00
