Annotating an Arabic learner corpus for error

Ghazi Abuhakema, Reem Faraj, Anna Feldman, Eileen Fitzpatrick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

This paper describes an ongoing project in which we are collecting a learner corpus of Arabic, developing a tagset for error annotation and performing Computer-aided Error Analysis (CEA) on the data. We adapted the French Interlanguage Database FRIDA tagset (Granger, 2003a) to the data. We chose FRIDA in order to follow a known standard and to see whether the changes needed to move from a French to an Arabic tagset would give us a measure of the distance between the two languages with respect to learner difficulty. The current collection of texts, which is constantly growing, contains intermediate and advanced-level student writings. We describe the need for such corpora, the learner data we have collected and the tagset we have developed. We also describe the error frequency distribution of both proficiency levels and the ongoing work.

Original language: English
Title of host publication: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
Publisher: European Language Resources Association (ELRA)
Pages: 1347-1350
Number of pages: 4
ISBN (Electronic): 2951740840, 9782951740846
State: Published - 1 Jan 2008
Event: 6th International Conference on Language Resources and Evaluation, LREC 2008 - Marrakech, Morocco
Duration: 28 May 2008 to 30 May 2008

Publication series

Name: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008

Other

Other: 6th International Conference on Language Resources and Evaluation, LREC 2008
Country: Morocco
City: Marrakech
Period: 28/05/08 to 30/05/08

Fingerprint

frequency distribution
language
Learner Corpus
student
Error Analysis
Intermediate
Proficiency
Annotation
Data Base
Student Writing
Interlanguage

Cite this

Abuhakema, G., Faraj, R., Feldman, A., & Fitzpatrick, E. (2008). Annotating an Arabic learner corpus for error. In Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008 (pp. 1347-1350). (Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008). European Language Resources Association (ELRA).
Abuhakema, Ghazi ; Faraj, Reem ; Feldman, Anna ; Fitzpatrick, Eileen. / Annotating an Arabic learner corpus for error. Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008. European Language Resources Association (ELRA), 2008. pp. 1347-1350 (Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008).
@inproceedings{2ad2755441e2469187e22350aa060ec0,
title = "Annotating an Arabic learner corpus for error",
abstract = "This paper describes an ongoing project in which we are collecting a learner corpus of Arabic, developing a tagset for error annotation and performing Computer-aided Error Analysis (CEA) on the data. We adapted the French Interlanguage Database FRIDA tagset (Granger, 2003a) to the data. We chose FRIDA in order to follow a known standard and to see whether the changes needed to move from a French to an Arabic tagset would give us a measure of the distance between the two languages with respect to learner difficulty. The current collection of texts, which is constantly growing, contains intermediate and advanced-level student writings. We describe the need for such corpora, the learner data we have collected and the tagset we have developed. We also describe the error frequency distribution of both proficiency levels and the ongoing work.",
author = "Ghazi Abuhakema and Reem Faraj and Anna Feldman and Eileen Fitzpatrick",
year = "2008",
month = "1",
day = "1",
language = "English",
series = "Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008",
publisher = "European Language Resources Association (ELRA)",
pages = "1347--1350",
booktitle = "Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008",

}

Abuhakema, G, Faraj, R, Feldman, A & Fitzpatrick, E 2008, Annotating an Arabic learner corpus for error. in Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008. Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008, European Language Resources Association (ELRA), pp. 1347-1350, 6th International Conference on Language Resources and Evaluation, LREC 2008, Marrakech, Morocco, 28/05/08.

Annotating an Arabic learner corpus for error. / Abuhakema, Ghazi; Faraj, Reem; Feldman, Anna; Fitzpatrick, Eileen.

Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008. European Language Resources Association (ELRA), 2008. p. 1347-1350 (Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008).

TY - GEN

T1 - Annotating an Arabic learner corpus for error

AU - Abuhakema, Ghazi

AU - Faraj, Reem

AU - Feldman, Anna

AU - Fitzpatrick, Eileen

PY - 2008/1/1

Y1 - 2008/1/1

N2 - This paper describes an ongoing project in which we are collecting a learner corpus of Arabic, developing a tagset for error annotation and performing Computer-aided Error Analysis (CEA) on the data. We adapted the French Interlanguage Database FRIDA tagset (Granger, 2003a) to the data. We chose FRIDA in order to follow a known standard and to see whether the changes needed to move from a French to an Arabic tagset would give us a measure of the distance between the two languages with respect to learner difficulty. The current collection of texts, which is constantly growing, contains intermediate and advanced-level student writings. We describe the need for such corpora, the learner data we have collected and the tagset we have developed. We also describe the error frequency distribution of both proficiency levels and the ongoing work.

UR - http://www.scopus.com/inward/record.url?scp=85037170718&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85037170718

T3 - Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008

SP - 1347

EP - 1350

BT - Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008

PB - European Language Resources Association (ELRA)

ER -

Abuhakema G, Faraj R, Feldman A, Fitzpatrick E. Annotating an Arabic learner corpus for error. In Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008. European Language Resources Association (ELRA). 2008. p. 1347-1350. (Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008).