Turkish Delights: a Dataset on Turkish Euphemisms

Hasan Can Biyik, Patrick Lee, Anna Feldman

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Euphemisms are a form of figurative language relatively understudied in natural language processing. This research extends the current computational work on potentially euphemistic terms (PETs) to Turkish. We introduce the Turkish PET dataset, the first available of its kind in the field. By creating a list of euphemisms in Turkish, collecting example contexts, and annotating them, we provide both euphemistic and non-euphemistic examples of PETs in Turkish. We describe the dataset and methodologies, and also experiment with transformer-based models on Turkish euphemism detection by using our dataset for binary classification. We compare performances across models using F1, accuracy, and precision as evaluation metrics.
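The three evaluation metrics named in the abstract can be illustrated with a minimal sketch for binary classification. It assumes label 1 means "euphemistic" and label 0 means "non-euphemistic", and it scores the positive class only; the abstract does not state the label encoding or the averaging scheme, so both are assumptions here. The function name `binary_metrics` and the toy labels are hypothetical and do not come from the dataset.

```python
def binary_metrics(gold, pred):
    """Return (accuracy, precision, f1) for the positive class (label 1).

    Assumption: 1 = euphemistic, 0 = non-euphemistic. The averaging scheme
    (positive-class scores rather than macro or weighted) is also an assumption.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)

    total = len(gold)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, f1


# Toy labels for illustration only; not taken from the Turkish PET dataset.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
accuracy, precision, f1 = binary_metrics(gold, pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} f1={f1:.3f}")
```

Reporting precision alongside F1 separates false alarms (literal uses flagged as euphemistic) from the overall balance between precision and recall.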

Original language: English
Title of host publication: SIGTURK 2024 - 1st Workshop on Natural Language Processing for Turkic Languages, Proceedings of the Workshop
Editors: Duygu Ataman, Mehmet Oguz Derin, Sardana Ivanova, Abdullatif Koksal, Jonne Saleva, Deniz Zeyrek
Publisher: Association for Computational Linguistics (ACL)
Pages: 71-80
Number of pages: 10
ISBN (Electronic): 9798891761407
State: Published - 2024
Event: 1st Workshop on Natural Language Processing for Turkic Languages, SIGTURK 2024 - Hybrid, Bangkok, Thailand
Duration: 15 Aug 2024 → …

Publication series

Name: SIGTURK 2024 - 1st Workshop on Natural Language Processing for Turkic Languages, Proceedings of the Workshop

Conference

Conference: 1st Workshop on Natural Language Processing for Turkic Languages, SIGTURK 2024
Country/Territory: Thailand
City: Hybrid, Bangkok
Period: 15/08/24 → …
