TY - GEN
T1 - Automatic detection of idiomatic clauses
AU - Feldman, Anna
AU - Peng, Jing
PY - 2013
Y1 - 2013
N2 - We describe several experiments whose goal is to automatically identify idiomatic expressions in written text. We explore two approaches to the task: 1) idiom recognition as outlier detection; and 2) supervised classification of sentences. For outlier detection, we apply principal component analysis. Because detecting idioms as lexical outliers does not exploit class label information, in the subsequent experiments we use linear discriminant analysis to obtain a discriminant subspace and then apply a three-nearest-neighbor classifier to measure classification accuracy. We discuss the pros and cons of each approach. All of the approaches are more general than previous algorithms for idiom detection: they neither rely on target idiom types, lexicons, or large manually annotated corpora, nor limit the search space to a particular type of linguistic construction.
AB - We describe several experiments whose goal is to automatically identify idiomatic expressions in written text. We explore two approaches to the task: 1) idiom recognition as outlier detection; and 2) supervised classification of sentences. For outlier detection, we apply principal component analysis. Because detecting idioms as lexical outliers does not exploit class label information, in the subsequent experiments we use linear discriminant analysis to obtain a discriminant subspace and then apply a three-nearest-neighbor classifier to measure classification accuracy. We discuss the pros and cons of each approach. All of the approaches are more general than previous algorithms for idiom detection: they neither rely on target idiom types, lexicons, or large manually annotated corpora, nor limit the search space to a particular type of linguistic construction.
UR - http://www.scopus.com/inward/record.url?scp=84875491615&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-37247-6_35
DO - 10.1007/978-3-642-37247-6_35
M3 - Conference contribution
AN - SCOPUS:84875491615
SN - 9783642372469
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 435
EP - 446
BT - Computational Linguistics and Intelligent Text Processing - 14th International Conference, CICLing 2013, Proceedings
T2 - 14th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2013
Y2 - 24 March 2013 through 30 March 2013
ER -