TY - GEN
T1 - Designing and evaluating a Russian tagset
AU - Sharoff, Serge
AU - Kopotev, Mikhail
AU - Erjavec, Tomaž
AU - Feldman, Anna
AU - Divjak, Dagmar
PY - 2008
Y1 - 2008
AB - This paper reports the principles behind designing a tagset to cover Russian morphosyntactic phenomena, modifications of the core tagset, and its evaluation. The tagset and associated morphosyntactic specifications are based on the MULTEXT-East framework, while the decisions in designing it were aimed at achieving a balance between parameters important for linguists and the possibility to detect and disambiguate them automatically. The final tagset contains about 600 tags and achieves about 95% accuracy on the disambiguated portion of the Russian National Corpus. We have also produced a test set of tagging models and corpora that can be shared with other researchers.
UR - http://www.scopus.com/inward/record.url?scp=85021700467&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85021700467
T3 - Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
SP - 279
EP - 285
BT - Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
PB - European Language Resources Association (ELRA)
T2 - 6th International Conference on Language Resources and Evaluation, LREC 2008
Y2 - 28 May 2008 through 30 May 2008
ER -