Experiments in cross-language morphological annotation transfer

Anna Feldman, Jirka Hana, Chris Brew

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

Annotated corpora are valuable resources for NLP, but they are often costly to create. We introduce a method for transferring annotation from a morphologically annotated corpus of a source language to a target language. Our approach assumes only that an unannotated text corpus exists for the target language and that a simple textbook describing the basic morphological properties of that language is available. Our paper describes experiments with Polish, Czech, and Russian; however, the method is not tied in any way to these languages. In all the experiments we use the TnT tagger ([3]), a second-order Markov model. Our approach assumes that the information acquired about one language can be used for processing a related language. We have found that even breathtakingly naive techniques (such as approximating the Russian transitions by Czech and/or Polish ones and approximating the Russian emissions by manually or automatically derived Czech cognates) can lead to a significant improvement in the tagger's performance.
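
The transfer idea in the abstract can be illustrated with a minimal sketch: a second-order (trigram) hidden Markov tagger whose transition probabilities are estimated from a tagged source-language corpus and whose emission probabilities come from a separate target-language lexicon. This is not the TnT tagger and not the authors' implementation. The function names, the add-one smoothing, the three-tag tagset, and the toy Czech and transliterated Russian data are all assumptions made for illustration.

```python
from collections import defaultdict
from math import log

def train_transitions(tagged_sents):
    """Estimate add-one-smoothed tag-trigram log-probabilities from a tagged corpus."""
    tri, bi, tags = defaultdict(int), defaultdict(int), set()
    for sent in tagged_sents:
        seq = ["<s>", "<s>"] + [tag for _, tag in sent]
        tags.update(seq[2:])
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            tri[a, b, c] += 1
            bi[a, b] += 1
    n = len(tags)

    def logp(a, b, c):
        # log P(c | a, b) with add-one smoothing (an assumption, not TnT's smoothing)
        return log((tri.get((a, b, c), 0) + 1) / (bi.get((a, b), 0) + n))

    return logp, sorted(tags)

def make_emissions(lexicon):
    """Build log-emission scores from a word -> {tag: weight} lexicon."""
    def logp(word, tag):
        entry = lexicon.get(word)
        if entry is None:
            return 0.0  # unknown word: no preference among tags
        p = entry.get(tag, 0.0)
        return log(p) if p > 0 else float("-inf")
    return logp

def viterbi(words, tags, trans_logp, emit_logp):
    """Return the best tag sequence under a trigram HMM."""
    # best[(u, v)] = (score, path) for the best path ending in tags u, v
    best = {("<s>", "<s>"): (0.0, [])}
    for word in words:
        nxt = {}
        for (u, v), (score, path) in best.items():
            for t in tags:
                s = score + trans_logp(u, v, t) + emit_logp(word, t)
                if (v, t) not in nxt or s > nxt[v, t][0]:
                    nxt[v, t] = (s, path + [t])
        best = nxt
    return max(best.values(), key=lambda item: item[0])[1]

# Toy source-language (Czech) corpus; tags: A = adjective, N = noun, V = verb.
SOURCE_CORPUS = [
    [("pes", "N"), ("spí", "V")],
    [("malý", "A"), ("pes", "N"), ("spí", "V")],
    [("velký", "A"), ("dům", "N")],
]

# Hypothetical target-language lexicon (transliterated Russian, invented weights);
# "pech" is ambiguous between a noun and a verb reading.
TARGET_LEXICON = {
    "bolshaja": {"A": 1.0},
    "sobaka": {"N": 1.0},
    "pech": {"N": 0.5, "V": 0.5},
}

trans, tags = train_transitions(SOURCE_CORPUS)
emit = make_emissions(TARGET_LEXICON)
print(viterbi(["bolshaja", "pech"], tags, trans, emit))  # ['A', 'N']
```

In this toy setup the lexicon alone cannot disambiguate "pech"; the choice between noun and verb is made entirely by transition statistics borrowed from the source language, which is the kind of approximation the abstract describes.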

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 7th International Conference, CICLing 2006, Proceedings
Pages: 41-50
Number of pages: 10
DOIs: https://doi.org/10.1007/11671299_4
State: Published - 7 Jul 2006
Event: 7th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2006 - Mexico City, Mexico
Duration: 19 Feb 2006 - 25 Feb 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3878 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 7th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2006
Country: Mexico
City: Mexico City
Period: 19/02/06 - 25/02/06

Cite this

Feldman, A., Hana, J., & Brew, C. (2006). Experiments in cross-language morphological annotation transfer. In Computational Linguistics and Intelligent Text Processing - 7th International Conference, CICLing 2006, Proceedings (pp. 41-50). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3878 LNCS). https://doi.org/10.1007/11671299_4
@inproceedings{f362a217abf4426783f97bdca5ae4c0a,
title = "Experiments in cross-language morphological annotation transfer",
author = "Anna Feldman and Jirka Hana and Chris Brew",
year = "2006",
month = "7",
day = "7",
doi = "10.1007/11671299_4",
language = "English",
isbn = "3540322051",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "41--50",
booktitle = "Computational Linguistics and Intelligent Text Processing - 7th International Conference, CICLing 2006, Proceedings",

}

TY - GEN

T1 - Experiments in cross-language morphological annotation transfer

AU - Feldman, Anna

AU - Hana, Jirka

AU - Brew, Chris

PY - 2006/7/7

Y1 - 2006/7/7

UR - http://www.scopus.com/inward/record.url?scp=33745548153&partnerID=8YFLogxK

U2 - 10.1007/11671299_4

DO - 10.1007/11671299_4

M3 - Conference contribution

AN - SCOPUS:33745548153

SN - 3540322051

SN - 9783540322054

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 41

EP - 50

BT - Computational Linguistics and Intelligent Text Processing - 7th International Conference, CICLing 2006, Proceedings

ER -
