
Vol 26, No. 3 (May 2009)


Modifying Corpus Annotation to Support the Analysis of Learner Language

Markus Dickinson
Indiana University
Chong Min Lee
Georgetown University

Abstract:
A crucial question in automatically analyzing learner language is which grammatical information is relevant and useful for learner feedback. Based on knowledge about how learner language varies in its grammatical properties, we propose a framework for reusing analyses found in corpus annotation and illustrate its applicability to Korean postpositional particles. Simple transformations of the corpus annotation allow one to quickly use state-of-the-art parsing methods.


KEYWORDS

Korean Postpositional Particles, Learner Language, Dependency Parsing, Treebank Conversion

INTRODUCTION AND MOTIVATION

The goal of many intelligent computer-aided language learning (ICALL) systems is to provide intelligent feedback to learners on their language production (see Heift & Schulze, 2007), and the first step is generally to automatically assign a linguistic analysis to the sentence. This requires a grammatical description of appropriately or inappropriately used constructions. For situations such as describing subject-verb agreement, the grammatical model used is uncontroversial because most grammars have subjects and verbs. This is naturally not the case with all linguistic constructions, and one must carefully consider the appropriate representation for a language in order to support the analysis of learner constructions.

Many approaches to detecting ill-formed input develop a grammar as a part of the process of developing a parser (e.g., Vandeventer Faltin, 2003; Schneider & McCoy, 1998; Menzel & Schröder, 1999). While potentially effective, this is a time-consuming process, limiting the reusability of such methods. Furthermore, there is often a gap between these methods and state-of-the-art parsing methods (e.g., Charniak & Johnson, 2005; Nivre et al., 2007b; Petrov, Barrett, Thibaux, & Klein, 2006). These modern statistical parsers are generally fast, accurate, robust, and reusable. The mechanisms for parsing are separate from the grammar since the grammar is generally learned from a corpus containing syntactic annotation, that is, a treebank (e.g., see Abeillé, 2003). Given a treebank, one can quickly have a working parser. To the extent that corpus annotation can be used for a grammar model, statistical parsing has the potential to speed up the process of analyzing learner data.

However, is the grammatical annotation found in a corpus the most appropriate model to use for analyzing learner data? If not, how can we use the annotation in a fashion which supports analyzing learner language and providing intelligent feedback?


To answer these questions, we first need to make clear what we mean by 'supporting' the analysis of learner language. We want to automatically acquire a model of correct usage, which can be adjusted for learner language. The goal of this paper is thus not to perform error detection or diagnosis of any construction, but rather to provide a framework for obtaining grammatical models from annotated corpora that identify the relevant features for learner language. This viewpoint draws much from the notion of annotation-based processing (Amaral & Meurers, 2007) in which the analysis of learner data is a "process of enriching the learner input with annotations." This slightly different use of the term annotation is based on the same idea of applying linguistic analysis to data. In corpus annotation, the data are annotated by hand, whereas in annotation-based processing, annotation is automatic. The crucial issue is to determine which properties should be annotated, and in this paper we investigate what can be done when the annotation provided in a corpus does not match the annotation desired for automatically analyzing learner language.

To make these issues concrete, we select a language construction in need of automatic analysis for learners, namely Korean postpositional particles. These function similarly to prepositions in English and have correlates in Japanese. Such particles have clear pedagogical needs and thus are the focus of ICALL systems for Korean and Japanese (Dickinson, Eom, Kang, Lee, & Sachs, 2008; Nagata, 1995). Crucially, particles make up a significant portion of learner errors (Ko et al., 2004; Lee, Jang, & Seo, 2009), paralleling preposition errors made by ESL learners (Izumi, Uchimoto, Saiga, Supnithi, & Isahara, 2003; Tetreault & Chodorow, 2008). Thus, we need to determine how to automatically analyze Korean particles in learner language.

BACKGROUND: KOREAN PARTICLES

In Korean, postpositional particles are used to indicate grammatical functions, thematic roles, and the locations of people and objects, as in neun (topic) and i (subject) in (1) (Dickinson et al., 2008).1 In some ways, then, they are similar to English prepositions, but, whereas prepositions are limited in their role as markers of grammatical function (e.g., the dative to), Korean postpositions are wider in scope (similar to other languages, e.g., Basque [de Ilarraza, Gojenola, & Oronoz, 2008]), potentially marking the role of a word in a sentence, adding meaning to a word, or connecting a word to another word or to the whole discourse.

(1) Sumi-neun chaek-i pilyohae-yo

Sumi-top book-sbj need-polite

'Sumi needs a book.'

Since learners of Korean commonly omit a particle or substitute one particle for another (Lee et al., 2009), we might expect learners to make errors as in (2). The noun chaek 'book' must be marked with the subject particle -i/ka2 to indicate that the book is the subject that is needed, as in (1). However, English-speaking learners often use the object particle -eul/reul with the noun as in (2), wrongly suggesting that the verb pilyohaeyo 'need' is a transitive verb.

(2) *Sumi-neun chaek-eul pilyohae-yo

Sumi-top book-obj need-polite

'Sumi needs a book.'


It is clear that particles are difficult for learners to acquire (Ko et al., 2004). Korean locative particles mark distinctions that are not made in English, differentiating, for example, between the location of a static object and the location of a dynamic activity. It is thus no surprise that particles are also difficult to capture in linguistic theory (e.g., see Lee, 2004; Yoon, 2005).

Lee et al. (2009) and Ko et al. (2004) categorize particle errors by learners of Korean into six error types: omission, replacement, addition, malformation, paraphrasing, and spacing. With the exception of malformations (the wrong morphophonemic alternation) and spacing errors, these errors require contextual information to be detected. Of the remaining four types, paraphrasing errors are beyond the scope of most ICALL work (see, however, Bailey & Meurers, 2008), and addition errors require a detailed analysis of complex particles (i.e., more than one particle stacked together). Thus, for this study, we focus on delineating the information needed to detect omission and replacement errors, which together make up over 60% of particle errors made by beginning learners (Lee et al.).

As mentioned above, particles often function as case markers, indicating nominative, accusative, or dative case, as in (3). For these particles, the relationship between the verb and the noun needs to be known.

(3) Sumi-ka Jisu-ege chaek-eul ju-ass-ta.

Sumi-sbj Jisu-dat book-obj give-past-decl

'Sumi gave Jisu a book.'

Another type of syntactic role that particles can indicate is that of modification. This is a situation in which particles function most similarly to prepositions, indicating the type of verbal activity, location of a noun, and so forth. As with prepositions (e.g., see Tetreault & Chodorow, 2008), this means that one needs to know specific lexical, syntactic, and semantic information about the verb and the noun.

Other particles mark connectives, indicate information about a speaker's intention, or add meaning to a sentence, such as the topic marker (see [1] above). Clearly, discourse information is needed for this type of particle (e.g., see Lee, Byron, & Jang, 2005; Hong, 2000). In this paper, we focus on what we refer to as syntactic postpositional particles, those expressing syntactic relations among words, including both argument and adjunct functions. In the section below on a case study of modifying corpus annotation, we fully outline the linguistic properties needed to analyze the usage of these particles in learner language.

USING ANNOTATED CORPORA

To analyze learner language, we could use prebuilt parsers for Korean (e.g., Chung, 2004; Seo, 1993), but these tools are designed for robust analysis and not for learner language. Ungrammatical data have been shown to be a problem for NLP technology used as a part of English learner language analysis (e.g., De Felice & Pulman, 2008; Lee & Knutsson, 2008), and we expect the same for Korean. We want more flexibility to adapt the systems and thus look to training technology from a grammatically annotated corpus.

A potential pitfall in using grammatical corpus annotation to analyze learner language is that the annotation may not be the most appropriate for the task at hand. For example, English corpora often lack agreement features, obviously important for analyzing learner language. In this section we address the general question of obtaining the grammar we want from the annotation we have, and apply it to the specific case of Korean particles in the section on a case study of modifying corpus annotation.

Add Information

A general idea for parsing with treebanks is to extract extra information which is not explicitly encoded in the annotation. The first major way of doing this is to recover linguistic properties which are only implicitly annotated. The recovery of so-called latent annotation has been successfully employed to improve parsing by providing the parser with better information (see Klein & Manning, 2003; Pate & Meurers, 2007). For example, subject and object noun phrases (NPs) are not marked in the Penn Treebank (PTB, Marcus, Santorini, & Marcinkiewicz, 1993), but this distinction can be automatically recovered by including parent annotation in a label. In this case, subject NPs are reannotated as NPˆS, indicating that the parent is S, and object NPs are reannotated as NPˆVP.
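To illustrate, the following sketch (our own illustration, not code from the cited work) adds parent annotation to a small constituency tree represented as (label, children) tuples, so that a subject NP under S becomes NP^S and an object NP under VP becomes NP^VP:

# A minimal sketch of parent annotation: each nonterminal label is
# augmented with its parent's label. Trees are (label, [children]) tuples;
# leaves are plain strings.
def add_parent_annotation(tree, parent=None):
    label, children = tree
    new_label = f"{label}^{parent}" if parent is not None else label
    new_children = [
        add_parent_annotation(c, label) if isinstance(c, tuple) else c
        for c in children
    ]
    return (new_label, new_children)

# Example: a subject NP directly under S
s = ("S", [("NP", ["She"]), ("VP", [("VBZ", ["laughs"])])])
print(add_parent_annotation(s))
# ('S', [('NP^S', ['She']), ('VP^S', [('VBZ^VP', ['laughs'])])])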

Latent annotation is useful as information that goes into training the parser. In this case, it turns out that subject NPs (NPˆS) are more likely to expand as pronouns, and thus including latent annotation in the corpus that the parser trains on improves accuracy. Whichever properties are important for a final analysis can benefit from being included in training, thereby providing more accurate statistics.

Latent properties can also be recovered after parsing, that is, from the resulting tree. This is appropriate when introducing the distinction does not help--or actually hurts--accuracy. Such degradations arise because having more distinctions means we have fewer data about each individual property (see following section). Thus, latent annotation should be introduced into training only as needed.

The second major way to incorporate additional information into corpus annotation is to use one's intuition, by encoding hand-crafted linguistic generalizations (see Dickinson, 2006). For example, if a treebank lacks elements of agreement information, we can use knowledge about pronouns to add some. In (4a), for instance, the PTB tagset does not distinguish which type of personal pronoun (PRP) is used, but we can change the annotation to (4b) because He is always third person singular. This method works best when the number of distinctions to be introduced is small and their inclusion is highly reliable.

(4) a. He/PRP laughs/VBZ

b. He/PRP-3s laughs/VBZ

As another example, for determiner error detection, Nagata, Kawai, Morihiro, and Isu (2006) write rules to add mass/count noun distinctions to a corpus. Finally, one can use an external source to make the desired distinctions. For example, if the PTB tagset (Marcus et al., 1993) lacks a needed distinction between subordinating conjunction and preposition, one can also tag the data using the Brown corpus tagset (Kucera & Francis, 1967), which makes this distinction, and merge the results. For general methods across corpora and languages, this is less desirable because an additional resource may not always be available.
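As an illustration of such hand-crafted retagging, a rule like the one behind (4) could be implemented as a small lookup over pronoun forms; the feature lexicon below is our own assumption, included only to make the idea concrete:

# A minimal sketch (not from the paper) of refining PRP tags with
# person/number features using a small, hand-built lexicon.
PRONOUN_FEATURES = {
    "he": "3s", "she": "3s", "it": "3s",
    "i": "1s", "we": "1p", "they": "3p",
}

def refine_prp(tagged_sentence):
    refined = []
    for word, tag in tagged_sentence:
        feats = PRONOUN_FEATURES.get(word.lower())
        if tag == "PRP" and feats is not None:
            tag = f"PRP-{feats}"   # e.g. He/PRP -> He/PRP-3s, as in (4b)
        refined.append((word, tag))
    return refined

print(refine_prp([("He", "PRP"), ("laughs", "VBZ")]))
# [('He', 'PRP-3s'), ('laughs', 'VBZ')]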

Remove Information

As adding information to the annotation can help, so too can removing information. The more information in the annotation, the more data are needed to obtain accurate statistics about patterns in the annotation. Removing information, often in the form of collapsing distinctions, can result in more effective technology. Additionally, many linguistic properties are not predictive of others. For example, Hana, Feldman, and Brew (2004) demonstrate that verb tense in Russian does not predict noun case. Thus, for POS tagging, they train subtaggers, each of which contains only partial information, and then merge the results for full tagging. Feature-based models are similar in that they examine only predictive information.

In a related situation, a model with less information can provide different patterns, filling in what another model may not have captured. For example, Metcalf and Boyd (2006) train two parsers: the first contains lexical information to capture individual verb subcategorization properties, and the other, less informative, model highlights more general verb subcategorization trends by not including lexical information. By comparing the output of these two models, they are able to identify verb subcategorization errors in the text.

A CASE STUDY OF MODIFYING CORPUS ANNOTATION

The Data

The data we use for our case study come from the Penn Korean Treebank (KTB), version 2.0 (C.-H. Han, N.-R. Han, Ko, & Palmer, 2002), a syntactically annotated corpus of 5,010 sentences (132,040 words) consisting of constituency (i.e., phrasal) annotation. In addition to basic constituents, the annotation also consists of function labels (e.g., subject [SBJ]). We use this popular corpus because we have easy access to it, and it is similar to other Penn corpora (for English, Chinese, and Arabic). One could also explore the Sejong Corpus (Kim, 2006), but the official version was not available when we started this work.

There are two points worth noting about the annotation of the KTB. First, due to Korean's complex morphology (as an agglutinative language), the internal morphemes of a so-called eojeol 'word-phrase' are represented not as separate tokens but as bound morphemes, distinguished by a plus sign between morphemes within a word. Even though each morphological unit is annotated, only the full word is available as a unit in the tree. Since learners can generally be assumed to write full words, we follow this convention and use full words as syntactic units. Secondly, the treebank contains null elements, including traces and empty pronominals. Clearly, people do not write with such empty elements; thus, they have to be removed, as we describe in the section below on acquiring dependencies.

Modeling Correct Particle Usage

Turning to the question of how to tell whether a particle is being used appropriately, we have identified some main questions that need to be addressed by an analysis. The issue is whether these questions can be addressed directly by the annotation, and, if not, how they can be derived (see section below on recovering information from annotation).

Analysis versus annotation

The first question for which the annotation needs to have an answer is straightforward: What is the verb, and what are the surrounding NPs? This is directly available in the annotation, as we can see in an example like (5). The verb is given the label VV, and the surrounding noun phrases are annotated as NPs.3

(5) [KTB bracketed tree, with the verb labeled VV and the surrounding noun phrases labeled NP; graphic not reproduced]

Given that we have a verb and NPs, the next question is whether the annotation indicates which NPs depend upon which verb. This is partially available from the treebank: as can be seen in (5), the subject (SBJ) NP and the object (OBJ) NP are clearly within the projection of the verb. However, in this case and in cases like (6) below, it is not clear which verb they depend on: both humchi 'steal' and ka 'go' are annotated as VV. The annotation needs to mark the head so that the head verb is clear.

Additionally, the object NP in (6) is handled via the empty element *T*. We need to discard such null elements in order to obtain only a surface string. Ideally, the verb will also be connected to the string acting as the object, namely, komunseo 'old document,' which one obtains by following the linking of traces, as shown by the underlined indices.

(6) [KTB bracketed tree in which the object NP is realized as the empty element *T*, coindexed with komunseo 'old document'; graphic not reproduced]

Although it is useful to know simply which words are related, it is even more helpful for particle usage to know what type of relationship a verb and its dependent NPs share. Again, the information is only partially provided because the annotation scheme maintains somewhat coarse function labels. In (7), for instance, we have the relations SBJ and COMP, but COMP is a very general grammatical term for "NP complements that occur with adverbial postposition[s]" (C.-H. Han, N.-R. Han, & Ko, 2001, p. 4) and is realizable by several kinds of particles. Thus, we have to find a way to insert more fine-grained information into the function labels.

(7) [KTB bracketed tree with the function labels SBJ and COMP; graphic not reproduced]

In a related matter, is the particle annotation fine-grained enough to distinguish the different uses that learners have to distinguish? Particles are attached to nouns as subword units, and their annotation is restricted to five tags: PCA (case), PAD (adverbial), PAN (adnominal), PCJ (conjunctive), and PAU (auxiliary, including topics). Given the discussion above in the section on Korean particles, we restrict our attention to the particles expressing syntactic roles, namely, PCA, PAD, and PAN. These labels make important distinctions, such as between arguments and adjuncts (PCA vs. PAN/PAD), and as such should be included, but they are again not rich enough to provide feedback on usage. Consider (8) (where [8c] is a hypothetical example). Whether used correctly or not, all three locative markers (-e, -buteo, and -eso) are labeled PAD and form part of an NP-ADV. The label inventory does not distinguish these cases the way it distinguishes SBJ from OBJ use, largely because these are lexical and semantic differences, an issue we return to in the section below on acquiring dependencies.4

(8) [Three locative examples using -e, -buteo, and -eso, each tagged PAD and forming part of an NP-ADV; graphic not reproduced]

Dependency structures

The annotation we have been describing as desirable is essentially dependency annotation, a common form of annotation to identify grammatical relations between words. An example is shown in (9), with arrows drawn from heads to dependents. Not surprisingly, dependencies have been argued to be appropriate for Korean and Japanese (e.g., Chung, 2004; Seo, 1993; Kudo & Matsumoto, 2000).

(9) [Dependency structure for a Korean sentence, with arrows drawn from heads to dependents; graphic not reproduced]

In fact, for detecting preposition errors in English, grammatical functions are among the most important features (De Felice & Pulman, 2007; Lee & Knutsson, 2008). Chodorow, Tetreault, and Han (2007) use information from the surrounding heads of noun and verb phrases and mention the need to distinguish argument and adjunct uses, all of which is captured in a dependency analysis. The advantage of using a full dependency structure, instead of simple context-based features (Tetreault & Chodorow, 2008), is that a parser has a better chance of accounting for word order variation. This is relevant in that Korean allows for relatively free word order--or scrambling (see Chung, 2004). Additionally, a dependency analysis should provide the relevant grammatical relations for feedback: once an error has been detected, the dependency relations can be consulted to see what type of function the particle has.

Recovering Information from Annotation

Acquiring dependencies

After analyzing Korean particles, we have determined that a dependency representation would be appropriate for learner language, but the treebank contains only constituency annotation. The problem of converting constituency structures to (unlabeled) dependencies is not a new one, however, and such a conversion can be done once one knows what the heads of phrases are (e.g., see Collins, 1999; Nilsson & Hall, 2005). A list of so-called head rules indicates how to determine the head category of a phrase for a particular annotation scheme. Deriving a list of such rules requires only a small amount of knowledge of the annotation scheme. The full list of head rules we use is given in the Appendix.
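To make the conversion concrete, the following sketch shows how head rules can drive the extraction of unlabeled dependencies, in the spirit of Collins (1999). The toy head rules, the use of the NNC tag, and the example tree are our own illustrative assumptions, not the rule set given in the Appendix:

# A minimal sketch of head-rule-based constituency-to-dependency conversion.
# Trees are (label, [children]) tuples; leaves are plain strings.
HEAD_RULES = {          # phrase label -> (search direction, candidate head labels)
    "S":  ("right", ["VP", "VV"]),
    "VP": ("right", ["VV", "VP"]),
    "NP": ("right", ["NNC", "NP"]),
}

def head_word(tree):
    """Return (head word, list of (head, dependent) pairs) for a tree."""
    label, children = tree
    if isinstance(children[0], str):          # preterminal: its word is the head
        return children[0], []
    direction, candidates = HEAD_RULES[label]
    kids = children if direction == "left" else list(reversed(children))
    head_child = next((k for k in kids if k[0] in candidates), kids[0])
    head, deps = head_word(head_child)
    for child in children:
        if child is head_child:
            continue
        dep_head, child_deps = head_word(child)
        deps = deps + child_deps + [(head, dep_head)]   # arrow from head to dependent
    return head, deps

# Toy tree for a simplified "Sumi-ka chaek-eul ju-ass-ta"
tree = ("S", [("NP", [("NNC", ["Sumi-ka"])]),
              ("VP", [("NP", [("NNC", ["chaek-eul"])]),
                      ("VV", ["ju-ass-ta"])])])
print(head_word(tree)[1])
# [('ju-ass-ta', 'chaek-eul'), ('ju-ass-ta', 'Sumi-ka')]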

In addition, once we have dependencies, we still need to remove empty elements in order to obtain only surface strings. We do this after extracting dependency relations, which allows us to obtain dependencies between all and only the actual words in the sentence. This process is straightforward, except for 68 sentences in which a trace is the head; we remove these sentences from the data.
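The following sketch illustrates one way such a cleanup could look; it is an assumption about the procedure for illustration, since the exact implementation is not given here:

# A minimal sketch of dropping null elements after dependency extraction:
# relations whose dependent is a trace are discarded, and sentences in which
# a trace acts as a head are removed entirely.
def is_null(token):
    return token.startswith("*")              # e.g. "*T*" traces, empty pronominals

def clean_dependencies(sentences):
    kept = []
    for deps in sentences:                    # deps: list of (head, dependent) pairs
        if any(is_null(head) for head, _ in deps):
            continue                          # a trace is a head: drop the sentence
        kept.append([(h, d) for h, d in deps if not is_null(d)])
    return kept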

Acquiring grammatical relations

In order to be able to provide relevant feedback to learners, we not only need to know which words are dependent upon which other words, but also the specific relationship they have. In other words, we need dependency labels. The most obvious starting point is simply to use the function labels that are included in the treebank on phrases, namely, SBJ, OBJ, COMP, and ADV.5 While these can be easily extracted, the set of syntactic function labels is too coarse to be able to say whether one particle is being used correctly (see [8] above). We expand the set of function labels by augmenting each relation with more specific information about the type of particle being used.

However, what kind of particle information can be included? On the one hand, we could add the particle's POS tag, to indicate more properties of each particle. But this information is not really any more fine grained than the current labels; for example, the difference between PCA and PAD is close to the difference between OBJ and ADV. On the other hand, we could include each particle name in the relation. This would put information about the type of relation into the label, for example, ADV-ege. Aside from its redundancy, with 59 particle types in the KTB (32 for PCA, PAD, and PAN), using individual particle names means we might not have enough data to obtain accurate statistics.

To approach these issues, we use two strategies: normalizing and thresholding. The intuition behind normalization is that some particles function in the same manner, and their selection relies on nonsyntactic factors such as morphophonemic alternations or pragmatic choice. Thus, we group particles into classes, using linguistic intuition, as shown in Table 1, and treat the class as a label. All other relations receive a generic label. We follow the conventions in the KTB, even though, for example, -europuteo could be considered a stacked particle and -ko could be considered a complementizer.

[Table 1: Particle classes used for normalization; not reproduced]

For PCA, we use the function labels SBJ and OBJ because most PCA particles can be replaced by -ka/i (SBJ) or -reul/eul (OBJ). Adverbial particles (PAD) are not easy to group, however, and thus we use a frequency restriction--or threshold--to focus on particles which appear over 50 times in the corpus, giving us 16 particles. This is similar to work on detecting errors in English prepositions, in which a subset of prepositions is selected for analysis (De Felice & Pulman, 2008; Tetreault & Chodorow, 2008; Gamon et al., 2008). Extending the method to rarer particles would likely require more data.
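A rough sketch of how normalizing and thresholding could be combined when assigning a dependency label follows; the particle classes and counts shown are invented for illustration only (Table 1 gives the actual grouping), and the fallback behavior is our own simplification:

# A minimal sketch of the normalizing-and-thresholding idea.
from collections import Counter

PARTICLE_CLASS = {            # hypothetical normalization classes
    "ka": "SBJ", "i": "SBJ",
    "reul": "OBJ", "eul": "OBJ",
    "ege": "EGE", "hante": "EGE",
}
THRESHOLD = 50                # keep particles seen more than 50 times

def relation_label(function, particle, particle_counts):
    cls = PARTICLE_CLASS.get(particle)
    if cls is not None:                        # normalized class, e.g. ADV-EGE
        return f"{function}-{cls}"
    if particle_counts[particle] > THRESHOLD:  # frequent enough to keep on its own
        return f"{function}-{particle}"
    return function                            # otherwise fall back to the coarse label

counts = Counter({"eso": 120, "kkaji": 12})
print(relation_label("ADV", "eso", counts))    # ADV-eso
print(relation_label("ADV", "kkaji", counts))  # ADV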

Removing Information from Annotation

With these divisions into new dependency labels, our parser can learn the general distribution of the types of particles that learners are attempting to use. However, there is massive redundancy in the labeling; EGE, for example, will be used whenever -ege is encountered. We can remove this redundancy in two ways: the label can return to being coarse grained, or the word token itself can be changed such that it no longer contains the particle. This latter option is what we want because particles are exactly what we expect learners to misuse. The particle they use (if any) may not match what was intended. Whether or not we actually include particles in the corpus, the labels we have should still predict the presence of a particular type of particle.

Thus, we create a second corpus to train from, namely, one which is identical to the dependency-annotated corpus but does not contain the particles of interest. Training a parser on this corpus allows us to capture what might have been meant by a learner because it is less influenced by the actually realized particles. In other words, this model captures the general relations between words, irrespective of which particle is actually used; this is akin to feature-based models which predict the correct preposition based on the surrounding context without using information about the preposition itself (see Tetreault & Chodorow, 2008; De Felice & Pulman, 2008).
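A sketch of how such a particle-free corpus might be derived from KTB-style tokens is given below; the morpheme-splitting convention follows the description of the KTB above, but the example token and the exact procedure are constructed for illustration:

# A minimal sketch of building the particle-free training corpus: the particle
# morphemes are stripped from the word token while the enriched dependency
# label is kept, so the label must be predicted from context alone.
PARTICLE_TAGS = {"PCA", "PAD", "PAN"}        # syntactic particle tags of interest

def strip_particles(word):
    """Drop morphemes tagged as syntactic particles from a KTB-style token."""
    kept = []
    for morph in word.split("+"):            # "+" separates morphemes in an eojeol
        form, _, tag = morph.partition("/")  # "/" separates form and POS label
        if tag not in PARTICLE_TAGS:
            kept.append(morph)
    return "+".join(kept)

print(strip_particles("chaek/NNC+i/PCA"))    # 'chaek/NNC'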

The two models provide a different picture of the data, in some sense aiming to model both the learner's intentions (no particles) and the learner's production (particles). This additional model, while using less information, can supplement the original one in error detection and diagnosis, whether by examining mismatches or as features for a machine learner.

PARSING EXPERIMENTS

The two preceding sections on recovering information from annotation and removing information from annotation showed how the treebank alterations are appropriate for learner language. We now want to show that they can be applied accurately and automatically.

Dependency Parsing

We extract dependency relations from a sentence before training our parser for a number of reasons. The first is simply that, since dependency parsers determine only word-word relations, they are very efficient (see Nivre, 2003; McDonald & Pereira, 2006) and can run in a real-time ICALL setting. Second, with methods devoted to multilingual dependency parsing (Buchholz & Marsi, 2006; Nivre et al., 2007a), using dependency parsing helps ensure a greater degree of applicability to new languages. A final point about training a parser specifically on the properties we intend to annotate is that learning is optimized for those distinctions. For example, if dependencies are desired for a language like Korean, then the parser can learn that word order is not as important a feature for determining the subject as a case marker or specific lexical items.

Evaluation

Experiment details

To evaluate the parser on the KTB, we use tenfold cross-validation: we run the parser 10 times, each time training on nine tenths of the corpus and testing on the remaining tenth. For our experiments, we use the gold standard POS tags found in the treebank; future work should incorporate POS tagging (e.g., Han & Palmer, 2004).
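For clarity, a tenfold split can be sketched as follows; the exact partitioning used in the experiments is not specified, so this is only illustrative:

# A minimal sketch of tenfold cross-validation: train on nine tenths of the
# corpus, test on the remaining tenth, rotating through all ten folds.
def tenfold(sentences):
    for i in range(10):
        test = [s for j, s in enumerate(sentences) if j % 10 == i]
        train = [s for j, s in enumerate(sentences) if j % 10 != i]
        yield train, test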

We run two different sets of experiments to gauge accuracy. The first is to evaluate the parser as a straight dependency parser: are we achieving reasonable accuracy on regular, in-domain language? As a subpart of this evaluation, we also see whether either of our two parsing models is able to correctly assign a head and a relation label for each word. This will tell us whether the two models are providing complementary information, and thus what the potential is for getting the dependency relation correct.

Our second evaluation is to create a small evaluation corpus from the treebank, consisting of 100 sentences with randomly inserted errors (see De Felice & Pulman, 2007). These 100 sentences were removed from the training data; in other words, the data sets are disjoint. With these 100 sentences, we created two sets of data, one with randomly selected substitutions and one with randomly selected omissions. This allows us to see how the parser works irrespective of other learner errors such as misspellings. With such a small data set, we must be careful not to draw too many conclusions, but it can at least demonstrate the potential of the methodology.

For all experiments, we use MaltParser (Nivre et al., 2007b), a freely available, state-of-the-art dependency parser, and we report unlabeled and labeled attachment scores (UAS, LAS), that is, the percentage of words which are correctly attached to the appropriate head (and, for LAS, also assigned the correct relation label). Given that one may select a different parser or even perform the annotation modifications after parsing, the results we present are only indicative of general effectiveness.
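For concreteness, attachment scores can be computed as follows; this is a generic sketch, not tied to MaltParser's own evaluation scripts:

# A minimal sketch of unlabeled/labeled attachment scores. Each parse is a
# list of (head_index, relation_label) entries, one per word, and gold and
# predicted parses are aligned word-for-word.
def attachment_scores(gold, predicted):
    total = correct_head = correct_both = 0
    for (g_head, g_rel), (p_head, p_rel) in zip(gold, predicted):
        total += 1
        if g_head == p_head:
            correct_head += 1                 # counts toward UAS
            if g_rel == p_rel:
                correct_both += 1             # counts toward LAS
    return correct_head / total, correct_both / total

uas, las = attachment_scores([(2, "SBJ"), (0, "ROOT")],
                             [(2, "OBJ"), (0, "ROOT")])
print(uas, las)   # 1.0 0.5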

Parsing results

We have two models, one with particles (Model 1) and one without (Model 2), using the same set of relation labels.6 As we can see in the first row of Table 2, Model 1 serves as an effective dependency parser, comparable to results for other languages (Nivre et al., 2007b).

[Table 2: Unlabeled and labeled attachment scores for Model 1 (with particles) and Model 2 (without particles); not reproduced]

Removing particles, perhaps unsurprisingly, results in a degradation in accuracy, down to 67.15% in labeled attachment. Crucially, though, the model provides complementary information. If we compare the models by examining every word to see whether either of the two models correctly predicts the head and the dependency relation, we find that at least one model is correct 85.33% of the time (compare 67.15% and 81.77% LAS). This tells us the potential accuracy of using both models and confirms that they are capturing different and useful information.

In fact, examining the differences can provide an initial gauge of how the models can inform error detection. From the first experiments, we have 10 sets of training data, and so we can compare each of the 10 trained Model 1 models against its Model 2 counterpart on our 100-sentence evaluation data. When we compare the differences between the models on the evaluation set with substitution errors, we find that the two models on average agree on 2,102 relations and disagree on 423. Most agreements (99%) are for correct usage, while 18.41% of disagreements are particle errors, identifying on average 78 of the 100 errors. This figure of 78% is what is most important to focus on now: even with this rather crude way of comparing models, most of the errors are identified by discrepancies between them. The results for the omission data set show similar trends, with even more errors detected. The models agree on 2,087 relations and disagree on 438, with 99.36% of agreements being correct usage and 19.81% of disagreements being incorrect, and we find 87% of the errors. We have thus outlined a way to identify where to suspect misuse in particle selection.
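A sketch of this disagreement-based flagging follows; it is our illustration of the comparison, not the exact procedure used in the experiments:

# A minimal sketch of using model disagreement as an error signal: positions
# where Model 1 (with particles) and Model 2 (without) assign different heads
# or relations are flagged as places to suspect particle misuse.
def flag_disagreements(parse1, parse2):
    flags = []
    for i, (a, b) in enumerate(zip(parse1, parse2)):
        if a != b:                            # the (head, relation) pairs differ
            flags.append(i)
    return flags

print(flag_disagreements([(2, "SBJ"), (0, "ROOT")],
                         [(2, "OBJ"), (0, "ROOT")]))   # [0]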

SUMMARY AND OUTLOOK

We have shown how the annotation found in a corpus can be adapted for situations requiring analysis of learner language. We examined the specific case of providing a parsing model to provide accurate information about Korean postpositional particles, but the methods of using more or less information are quite general. The ability to use models with differing information allows us to highlight cases which are more likely to be erroneous.

Given that the treebank we used contains newspaper data, a next step is to make the parser more aware of learner data. For statistical parsers, one can perform so-called domain adaptation by self-training (McClosky, Charniak, & Johnson, 2006): we can first run the parser on a diverse corpus of Korean (see Han, Chodorow, & Leacock, 2006, p. 5) and then retrain it on the resulting trees. Interestingly, learning from even a small set of corrected learner sentences can improve performance (Nagata et al., 2006). Alongside this, we must test the methods on a real learner corpus, such as the one described by Lee et al. (2009).

Following that, more thorough error detection and diagnosis needs to be done, such as predicting which particle should have been used or extending the methodology to additional errors arising from stacked particles. The methodology of adapting corpus annotation described here could be used to provide features for machine learning methods (De Felice & Pulman, 2008; Tetreault & Chodorow, 2008; Gamon et al., 2008), rule-based methods (de Ilarraza et al., 2008; Eeg-Olofsson & Knutsson, 2003), or other error detection methods.

NOTES

1 For expository ease, we provide transliterations using the Revised Romanization of Korean. Abbreviations used are: top = topic, sbj = subject, obj = object, dat = dative, and decl = declarative.

2 The distinction between -ka and -i is a simple morphotactic one; likewise for -eul/reul.

3 The remaining examples are from the KTB, represented as bracketed structures; + marks a morpheme boundary, and / marks a POS label. For clarity, some annotation is left out.

4 Note that whichever method of error detection is used, it must allow for more than one correct particle in the same context, similar to English prepositions (see Tetreault & Chodorow, 2008).

5 We do not use the two other function tags, VOC (vocative) and LV (light verb): VOC only appears once, and LV can be collapsed into OBJ for our purposes.

6 An initial experiment for Model 1 with a reduced set of labels showed only marginal improvement.

REFERENCES

Abeillé, A. (Ed.). (2003). Treebanks: Building and using syntactically annotated corpora. Dordrecht: Kluwer Academic Publishers.

Amaral, L., & Meurers, D. (2007). Putting activity models in the driver's seat: Towards a demand-driven NLP architecture for ICALL. Paper presented at EUROCALL, University of Ulster, Coleraine, Northern Ireland.

Bailey, S., & Meurers, D. (2008). Diagnosing meaning errors in short answers to reading comprehension questions. In J. Tetreault, J. Burstein, & R. De Felice (Eds.), Proceedings of the 3rd Workshop on Innovative Use of NLP for Building Educational Applications, held at ACL 2008 (pp. 107-115). Columbus, OH: Association for Computational Linguistics. Retrieved April 10, 2009, from http://aclweb.org/anthology-new/W/W08/W08-0913.pdf

Buchholz, S., & Marsi, E. (2006). CoNLL-X shared task on multilingual dependency parsing. In L. Márquez & D. Klein (Eds.), Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X) (pp. 149-164). New York: Association for Computational Linguistics.

Charniak, E., & Johnson, M. (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In K. Knight, H. T. Ng, & K. Oflazer (Eds.), Proceedings of ACL-05 (pp. 173-180). Ann Arbor, MI: Association for Computational Linguistics.


Chodorow, M., Tetreault, J., & Han, N. (2007). Detection of grammatical errors involving prepositions. In F. Costello, J. Kelleher, & M. Volk (Eds.), Proceedings of the 4th ACL-SIGSEM Workshop on Prepositions (pp. 25-30). Prague, Czech Republic: Association for Computational Linguistics.

Chung, H. (2004). Statistical Korean dependency parsing model based on the surface contextual information. Unpublished doctoral dissertation, Korea University, Seoul.

Collins, M. (1999). Head-driven statistical models for natural language parsing. Unpublished doctoral dissertation, University of Pennsylvania, Philadelphia, PA.

De Felice, R., & Pulman, S. (2007). Automatically acquiring models of preposition use. In F. Costello, J. Kelleher, & M. Volk (Eds.), Proceedings of the 4th ACL-SIGSEM Workshop on Prepositions (pp. 45-50). Prague, Czech Republic: Association for Computational Linguistics.

De Felice, R., & Pulman, S. (2008). A classifier-based approach to preposition and determiner error correction in L2 English. In D. Scott & H. Uszkoreit (Eds.), Proceedings of COLING-08 (pp. 169-176). Manchester, UK: Coling 2008 Organizing Committee.

de Ilarraza, A. D., Gojenola, K., & Oronoz, M. (2008). Detecting erroneous uses of complex postpositions in an agglutinative language. In D. Scott & H. Uszkoreit (Eds.), Proceedings of COLING-08 (pp. 31-34). Manchester, UK: Coling 2008 Organizing Committee.

Dickinson, M. (2006). Rule equivalence for error detection. In J. Hajič & J. Nivre (Eds.), Proceedings of the Fifth Workshop on Treebanks and Linguistic Theories (TLT 2006) (pp. 187-198). Prague, Czech Republic: Institute of Formal and Applied Linguistics.

Dickinson, M., Eom, S., Kang, Y., Lee, C. M., & Sachs, R. (2008). A balancing act: How can intelligent computer-generated feedback be provided in learner-to-learner interactions. Computer Assisted Language Learning, 21, 369-382.

Eeg-Olofsson, J., & Knutsson, O. (2003). Automatic grammar checking for second language learners--The use of prepositions. In E. Rögnvaldsson (Ed.), Proceedings of Nodalida '03. Reykjavik, Iceland: Northern European Association for Language Technology.

Gamon, M., Gao, J., Brockett, C., Klementiev, A., Dolan, W., Belenko, D., et al. (2008). Using contextual speller techniques and language modeling for ESL error correction. In Y. Matsumoto & A. Copestake (Eds.), Proceedings of the International Joint Conference on Natural Language Processing (pp. 449-456). Hyderabad, India: Asian Federation of Natural Language Processing.

Han, C.-H., Han, N.-R., & Ko, E.-S. (2001). Bracketing guidelines for Penn Korean treebank (Technical report, IRCS). Philadelphia, PA: University of Pennsylvania.

Han, C.-H., Han, N.-R., Ko, E.-S., & Palmer, M. (2002). Development and evaluation of a Korean treebank and its application to NLP. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, A. Martin Municio, D. Tapias, et al. (Eds.), Proceedings of LREC-02 (pp. 1635-1642). Las Palmas, Canary Islands, Spain: European Language Resources Association.

Han, C.-H., & Palmer, M. (2004). A morphological tagger for Korean: Statistical tagging combined with corpus-based morphological rule application. Machine Translation, 18, 275-297.

Han, N.-R., Chodorow, M., & Leacock, C. (2006). Detecting errors in English article usage by non-native speakers. Natural Language Engineering, 12, 115-129.

Hana, J., Feldman, A., & Brew, C. (2004). A resource-light approach to Russian morphology: Tagging Russian using Czech resources. In D. Lin & D. Wu (Eds.), Proceedings of EMNLP-04 (pp. 222-229). Barcelona: Association for Computational Linguistics.

Heift, T., & Schulze, M. (2007). Errors and intelligence in computer-assisted language learning: Parsers and pedagogues. New York: Routledge.

Hong, M. (2000). Centering theory and argument deletion in spoken Korean. The Korean Journal of Cognitive Science, 11, 9-24.


Izumi, E., Uchimoto, K., Saiga, T., Supnithi, T., & Isahara, H. (2003). Automatic error detection in the Japanese learners' English spoken data. In E. W. Hinrichs & D. Roth (Eds.), Proceedings of ACL-03 (pp. 145-148). Sapporo, Japan: Association for Computational Linguistics.

Kim, H. (2006). Korean national corpus in the 21st century Sejong project. In Proceedings of the 13th NIJL International Symposium (pp. 49-54). Tokyo: National Institute for Japanese Language.

Klein, D., & Manning, C. D. (2003). Accurate unlexicalized parsing. In E. W. Hinrichs & D. Roth (Eds.), Proceedings of ACL-03 (pp. 423-430). Sapporo, Japan: Association for Computational Linguistics.

Ko, S., Kim, M., Kim, J., Seo, S., Chung, H., & Han, S. (2004). An analysis of Korean learner corpora and errors. Seoul: Hankuk Publishing.

Kucera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.

Kudo, T., & Matsumoto, Y. (2000). Japanese dependency analysis based on support vector machines. In H. Schütze & K.-Y. Su (Eds.), Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (pp. 18-25). Hong Kong: Association for Computational Linguistics.

Lee, J., & Knutsson, O. (2008). The role of PP attachment in preposition generation. In A. Gelbukh (Ed.), Proceedings of CICLing 2008, 9th International Conference on Intelligent Text Processing and Computational Linguistics (pp. 643-654). Haifa, Israel: Springer.

Lee, S.-H. (2004). Case markers and thematic roles. Seoul: Hankuk Publishing.

Lee, S.-H., Byron, D. K., & Jang, S. B. (2005). Why is zero marking important in Korean? In R. Dale, K.-F. Wong, J. Su, & O. Y. Kwong (Eds.), Proceedings of IJCNLP-05 (pp. 588-599). Jeju Island, Korea: Springer.

Lee, S.-H., Jang, S. B., & Seo, S. K. (2009). Annotation of Korean learner corpora for particle error detection. CALICO Journal, 26, 529-544.

Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn treebank. Computational Linguistics, 19, 313-330.

McClosky, D., Charniak, E., & Johnson, M. (2006). Reranking and self-training for parser adaptation. In N. Calzolari, C. Cardie, & P. Isabelle (Eds.), Proceedings of COLING-ACL-06 (pp. 337-344). Sydney, Australia: Association for Computational Linguistics.

McDonald, R., & Pereira, F. (2006). Online learning of approximate dependency parsing algorithms. In D. McCarthy & S. Wintner (Eds.), Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL) (pp. 81-88). Trento, Italy: Association for Computational Linguistics. Retrieved April 13, 2009, from http://aclweb.org/anthology/E06-1011

Menzel, W., & Schröder, I. (1999). Error diagnosis for language learning systems. ReCALL, 11, 20-30.

Metcalf, V., & Boyd, A. (2006). Head-lexicalized PCFGs for verb subcategorization error diagnosis in ICALL. In Workshop on Interfaces of Intelligent Computer-Assisted Language Learning. Columbus, OH.

Nagata, N. (1995). An effective application of natural language processing in second language instruction. CALICO Journal, 13, 47-67.

Nagata, R., Kawai, A., Morihiro, K., & Isu, N. (2006). A feedback-augmented method for detecting errors in the writing of learners of English. In C. Cardie & P. Isabelle (Eds.), Proceedings of the International Conference on Computational Linguistics and Meeting of the Association for Computational Linguistics (pp. 241-248). Sydney, Australia: Association for Computational Linguistics.

Nilsson, J., & Hall, J. (2005). Reconstruction of the Swedish treebank Talbanken (MSI report 05067). Växjö, Sweden: Växjö University, School of Mathematics and Systems Engineering.


Nivre, J. (2003). An efficient algorithm for projective dependency parsing. In H. Bunt (Ed.), Proceedings of the 8th International Workshop on Parsing Technologies (IWPT 03) (pp. 149-160). Nancy, France: Association for Computational Linguistics.

Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilsson, J., Riedel, S., et al. (2007a). The CoNLL 2007 shared task on dependency parsing. In J. Eisner (Ed.), Proceedings of EMNLP-CoNLL 2007 (pp. 915-932). Prague, Czech Republic: Association for Computational Linguistics.

Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryigit, G., Kübler, S. et al. (2007b). MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13, 95-135.

Pate, J., & Meurers, D. (2007). Refining syntactic categories using local contexts--Experiments in unlexicalized PCFG parsing. In S. Kübler, J. Hajič, & K. De Smedt (Eds.), Proceedings of the Sixth Workshop on Treebanks and Linguistic Theories (TLT 2007) (pp. 103-114). Bergen, Norway: Northern European Association for Language Technology.

Petrov, S., Barrett, L., Thibaux, R., & Klein, D. (2006). Learning accurate, compact, and interpretable tree annotation. In N. Calzolari, C. Cardie, & P. Isabelle (Eds.), Proceedings of COLING-ACL-06 (pp. 433-440). Sydney, Australia: Association for Computational Linguistics.

Schneider, D., & McCoy, K. (1998). Recognizing syntactic errors in the writing of second language learners. In C. Boitet & P. Whitelock (Eds.), Proceedings of the Meeting of the Association for Computational Linguistics (pp. 1198-1204). Montreal, Canada: Association for Computational Linguistics.

Seo, K.-J. (1993). A Korean language parser using syntactic dependency relations between word-phrases. Unpublished master's thesis, Korea Advanced Institute of Science and Technology, Daejeon, Korea.

Tetreault, J., & Chodorow, M. (2008). The ups and downs of preposition error detection in ESL writing. In D. Scott & H. Uszkoreit (Eds.), Proceedings of COLING-08 (pp. 865-872). Manchester, UK: Coling 2008 Organizing Committee.

Vandeventer Faltin, A. (2003). Syntactic error diagnosis in the context of computer assisted language learning. Unpublished doctoral dissertation, Université de Genève, Geneva, Switzerland.

Yoon, J. H. (2005). Non-morphological determination of nominal particle ordering in Korean. In L. Heggie & F. Ordonez (Eds.), Clitic and affix combinations: Theoretical perspectives (pp. 239-282). Amsterdam: John Benjamins.


APPENDIX

Head rules

The head rules in Table 3 work as follows: within a rule, we find the leftmost or rightmost element which is a possible head. For S → NP VP, for example, VP is the rightmost possible head and thus is the head.

[Table 3: Head rules; not reproduced]

ACKNOWLEDGMENTS

We would like to thank Sun-Hee Lee, Seok Bae Jang, Soojeong Eom, and Sandra Kübler for their comments at various stages of this work and the anonymous reviewer for his/her helpful suggestions.

AUTHORS' BIODATA

Markus Dickinson is an assistant professor in the Department of Linguistics at Indiana University, specializing in computational linguistics. His research interests include exploring the intersection of corpus annotation and linguistic processing, especially the detection of annotation errors, and research into the automatic analysis of learner language, especially for intelligent computer-assisted language learning applications.

Chong Min Lee is a Ph.D. student in the Department of Linguistics at Georgetown University, specializing in computational linguistics. His research interests are temporal information processing and automatic analysis of learner language. His dissertation will be on the topic of constructing the temporal structure of a document, using temporal relations identified with machine learning methods. An additional avenue of his research is the automatic analysis of Korean learner language.


AUTHORS' ADDRESSES

Markus Dickinson

Department of Linguistics

Indiana University

317 Memorial Hall

1021 E. Third St.

Bloomington, IN 47405

Phone: 812 856 2535

Fax: 812 855 5363

Email: md7@indiana.edu

Chong Min Lee

Department of Linguistics

Georgetown University

ICC 479

37th and O Streets, NW

Washington, DC 20057-1051

Phone: 202 687 5956

Fax: 202 687 6174

Email: cml54@georgetown.edu
