Abstract
We describe a knowledge- and resource-light system for automatic morphological analysis and tagging of Brazilian Portuguese. We avoid labor-intensive resources, in particular large annotated corpora and lexicons. Instead, we use (i) an annotated corpus of Peninsular Spanish, a language related to Portuguese, (ii) an unannotated corpus of Portuguese, and (iii) a description of Portuguese morphology at the level of a basic grammar book. We extend our earlier work (Hana et al., 2004; Feldman et al., 2006) by proposing an alternative algorithm for cognate transfer that effectively projects the Spanish emission probabilities into Portuguese. Our experiments require minimal new human effort and show a 21% error reduction over even emissions on a fine-grained tagset.
Original language | English |
---|---|
Pages | 33-40 |
Number of pages | 8 |
State | Published - 2006 |
Event | 2006 International Workshop on Cross-Language Knowledge Induction, Trento, Italy (duration: 3 Apr 2006 → …) |
Conference
Conference | 2006 International Workshop on Cross-Language Knowledge Induction |
---|---|
Country/Territory | Italy |
City | Trento |
Period | 3/04/06 → … |