TY - GEN
T1 - A Low-budget Tagger for Old Czech
AU - Hana, Jirka
AU - Feldman, Anna
AU - Aharodnik, Katsiaryna
N1 - Publisher Copyright: © 2011 Proceedings of the Annual Meeting of the Association for Computational Linguistics. All rights reserved.
PY - 2011
Y1 - 2011
N2 - The paper describes a tagger for Old Czech (1200-1500 AD), a fusional language with rich morphology. The practical restrictions (no native speakers, limited corpora and lexicons, limited funding) make Old Czech an ideal candidate for a resource-light crosslingual method that we have been developing (e.g. Hana et al., 2004; Feldman and Hana, 2010). We use a traditional supervised tagger. However, instead of spending years of effort to create a large annotated corpus of Old Czech, we approximate it by a corpus of Modern Czech. We perform a series of simple transformations to make a modern text look more like a text in Old Czech and vice versa. We also use a resource-light morphological analyzer to provide candidate tags. The results are worse than the results of traditional taggers, but the amount of language-specific work needed is minimal.
UR - http://www.scopus.com/inward/record.url?scp=84867284251&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84867284251
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 10
EP - 18
BT - Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, LaTeCH 2011 at the 49th Annual Meeting of the Association for Computational Linguistics
A2 - Zervanou, Kalliopi
A2 - Lendvai, Piroska
PB - Association for Computational Linguistics (ACL)
T2 - 5th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, LaTeCH 2011 at the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011
Y2 - 24 June 2011
ER -