A Low-budget Tagger for Old Czech

Jirka Hana, Anna Feldman, Katsiaryna Aharodnik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

The paper describes a tagger for Old Czech (1200-1500 AD), a fusional language with rich morphology. The practical restrictions (no native speakers, limited corpora and lexicons, limited funding) make Old Czech an ideal candidate for a resource-light crosslingual method that we have been developing (e.g. Hana et al., 2004; Feldman and Hana, 2010). We use a traditional supervised tagger. However, instead of spending years of effort to create a large annotated corpus of Old Czech, we approximate it by a corpus of Modern Czech. We perform a series of simple transformations to make a modern text look more like a text in Old Czech and vice versa. We also use a resource-light morphological analyzer to provide candidate tags. The results are worse than the results of traditional taggers, but the amount of language-specific work needed is minimal.
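
The transformation step mentioned in the abstract can be pictured as a small set of string rewrite rules applied to Modern Czech word forms before a supervised tagger is trained on them. The Python sketch below is illustrative only. The rule list, the function names `modern_to_old` and `transform_corpus`, and the `NOUN` tag labels are assumptions made for this example. The rules are loosely based on two well-known Czech sound changes (ú > ou, ó > ů) and are not the rules used in the paper.

```python
# Hypothetical rewrite rules: modern spelling -> approximate Old Czech spelling.
# These are illustrative assumptions, not the transformations used by the authors.
RULES = [
    ("ou", "ú"),  # undoes the diphthongization ú > ou
    ("ů", "ó"),   # undoes the change ó > uo > ů
]


def modern_to_old(token: str) -> str:
    """Apply every rewrite rule, in order, to a single word form."""
    for modern, old in RULES:
        token = token.replace(modern, old)
    return token


def transform_corpus(tagged):
    """Rewrite the word forms of (word, tag) pairs; the tags stay unchanged."""
    return [(modern_to_old(word), tag) for word, tag in tagged]


if __name__ == "__main__":
    # Tiny made-up sample; the tag labels are placeholders.
    sample = [("mouka", "NOUN"), ("kůň", "NOUN")]
    print(transform_corpus(sample))
```

Context-free rules like these over-apply (for example, to an "ou" that arises at a prefix boundary), so a real system would presumably need more careful conditions; the sketch only shows how an annotated modern corpus could be reused while its tags are kept intact.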

Original language: English
Title of host publication: Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, LaTeCH 2011 at the 49th Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, ACL-HLT 2011 - Proceedings
Editors: Kalliopi Zervanou, Piroska Lendvai
Publisher: Association for Computational Linguistics (ACL)
Pages: 10-18
Number of pages: 9
ISBN (Electronic): 9781937284046
State: Published - 2011
Event: 5th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, LaTeCH 2011 at the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011 - Portland, United States
Duration: 24 Jun 2011 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

