Automatic Identification of Learners’ Language Background based on their Writing in Czech

Katsiaryna Aharodnik, Marco Chang, Anna Feldman, Jirka Hana

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The goal of this study is to investigate whether learners’ written data in highly inflectional Czech can suggest a consistent set of clues for automatic identification of the learners’ L1 background. For our experiments, we use texts written by learners of Czech, which have been automatically and manually annotated for errors. We define two classes of learners: speakers of Indo-European languages and speakers of non-Indo-European languages. We use an SVM classifier to perform the binary classification. We show that non-content-based features perform well on highly inflectional data. In particular, features reflecting errors in orthography are the most useful, yielding about 89% precision and the same recall. A detailed discussion of the best-performing features is provided.
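
To make the reported metrics concrete, the following is a minimal sketch of how precision and recall are computed for a two-class task like the one in the abstract. It is an illustration only and not the authors' evaluation code. The function name `precision_recall`, the class labels `"IE"` and `"non-IE"`, and the toy label lists are all assumptions invented for this example. The abstract does not say whether its figures are per-class or averaged, so the sketch scores a single class.

```python
from typing import Sequence, Tuple


def precision_recall(
    gold: Sequence[str], predicted: Sequence[str], positive: str
) -> Tuple[float, float]:
    """Return (precision, recall) for one class treated as positive."""
    pairs = list(zip(gold, predicted))
    # True positives: the class was predicted and was correct.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: the class was predicted but the gold label differs.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: the gold label is the class but it was not predicted.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy, made-up labels (hypothetical; not data from the paper).
gold = ["IE", "IE", "IE", "non-IE", "non-IE", "non-IE"]
predicted = ["IE", "IE", "non-IE", "non-IE", "non-IE", "IE"]

p, r = precision_recall(gold, predicted, positive="IE")
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision and recall coincide whenever the number of false positives equals the number of false negatives, as in this toy data; whether that explains the matching figures in the abstract is not stated there.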

Original language: English
Title of host publication: 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference
Editors: Ruslan Mitkov, Jong C. Park
Publisher: Asian Federation of Natural Language Processing
Pages: 1428-1436
Number of pages: 9
ISBN (Electronic): 9784990734800
State: Published - 2013
Event: 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Nagoya, Japan
Duration: 14 Oct 2013 → …
