TY - GEN
T1 - Experiments in cross-language morphological annotation transfer
AU - Feldman, Anna
AU - Hana, Jirka
AU - Brew, Chris
PY - 2006
Y1 - 2006
N2 - Annotated corpora are valuable resources for NLP which are often costly to create. We introduce a method for transferring annotation from a morphologically annotated corpus of a source language to a target language. Our approach assumes only that an unannotated text corpus exists for the target language and a simple textbook which describes the basic morphological properties of that language is available. Our paper describes experiments with Polish, Czech, and Russian. However, the method is not tied in any way to these languages. In all the experiments we use the TnT tagger ([3]), a second-order Markov model. Our approach assumes that the information acquired about one language can be used for processing a related language. We have found out that even breath-takingly naive things (such as approximating the Russian transitions by Czech and/or Polish and approximating the Russian emissions by (manually/automatically derived) Czech cognates) can lead to a significant improvement of the tagger's performance.
UR - http://www.scopus.com/inward/record.url?scp=33745548153&partnerID=8YFLogxK
U2 - 10.1007/11671299_4
DO - 10.1007/11671299_4
M3 - Conference contribution
AN - SCOPUS:33745548153
SN - 3540322051
SN - 9783540322054
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 41
EP - 50
BT - Computational Linguistics and Intelligent Text Processing - 7th International Conference, CICLing 2006, Proceedings
T2 - 7th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2006
Y2 - 19 February 2006 through 25 February 2006
ER -