Tagging Portuguese with a Spanish Tagger Using Cognates

Jirka Hana, Anna Feldman, Chris Brew, Luiz Amaral

Research output: Contribution to conference (paper, peer-reviewed)

13 Scopus citations

Abstract

We describe a knowledge- and resource-light system for automatic morphological analysis and tagging of Brazilian Portuguese. We avoid the use of labor-intensive resources, in particular large annotated corpora and lexicons. Instead, we use (i) an annotated corpus of Peninsular Spanish, a language related to Portuguese, (ii) an unannotated corpus of Portuguese, and (iii) a description of Portuguese morphology at the level of a basic grammar book. We extend our earlier work (Hana et al., 2004; Feldman et al., 2006) by proposing an alternative algorithm for cognate transfer that effectively projects the Spanish emission probabilities into Portuguese. Our experiments require minimal new human effort and show a 21% error reduction over even emissions on a fine-grained tagset.
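The abstract does not spell out the cognate-transfer algorithm, so the following is only a minimal sketch of the general idea of projecting emission probabilities across cognates, not the authors' method. It assumes that cognates are detected by normalized edit distance and that each Portuguese word inherits the probability of its closest Spanish cognate. The function names, the `max_norm_dist` threshold, and the toy probabilities are all hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def norm_dist(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer word's length."""
    return edit_distance(a, b) / max(len(a), len(b))


def project_emissions(spanish_emissions, portuguese_vocab, max_norm_dist=0.34):
    """Project P(word | tag) from Spanish onto Portuguese via cognates.

    spanish_emissions: dict mapping tag -> {spanish_word: probability}.
    Returns a dict mapping tag -> {portuguese_word: probability},
    renormalized per tag. The threshold is an assumed value.
    """
    spanish_words = sorted({w for dist in spanish_emissions.values() for w in dist})

    # Pair each Portuguese word with its closest Spanish word, if close enough.
    cognate = {}
    for pt in portuguese_vocab:
        best = min(spanish_words, key=lambda es: norm_dist(pt, es))
        if norm_dist(pt, best) <= max_norm_dist:
            cognate[pt] = best

    projected = {}
    for tag, dist in spanish_emissions.items():
        row = {pt: dist[es] for pt, es in cognate.items() if es in dist}
        total = sum(row.values())
        if total > 0:
            projected[tag] = {pt: p / total for pt, p in row.items()}
    return projected


# Toy data, invented purely for illustration.
spanish_emissions = {
    "NOUN": {"libro": 0.6, "casa": 0.4},
    "VERB": {"canta": 1.0},
}
portuguese_vocab = ["livro", "casa", "canta", "obrigado"]
projected = project_emissions(spanish_emissions, portuguese_vocab)
```

In this toy setup, a word with no sufficiently close Spanish counterpart receives no projected probability. A real system would need a fallback for such words, for example evenly distributed emissions over the tags licensed by a morphological analyzer.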

Original language: English
Pages: 33-40
Number of pages: 8
State: Published - 2006
Event: 2006 International Workshop on Cross-Language Knowledge Induction - Trento, Italy
Duration: 3 Apr 2006 → …

Conference

Conference: 2006 International Workshop on Cross-Language Knowledge Induction
Country/Territory: Italy
City: Trento
Period: 3/04/06 → …
