Due to two last minute cancellations, the workshop will end a little sooner than anticipated.
From the earliest days of CALL, researchers have envisioned language-learning applications capable of leveraging some system of linguistic intelligence—provided by natural language processing (NLP) tools and techniques—to generate individualized feedback and provide scaffolding support that adapts to the individual needs of the learner. Early attempts at achieving these lofty goals have been fraught with challenges and limitations, and the extent to which NLP-enhanced systems have impacted language instruction remains minimal (Meurers, 2013). Yet, to suggest that inquiry into intelligent CALL (ICALL) has run its course would be to fall victim to Amara's Law: allowing short-term limitations to persuade us to underestimate the impact of NLP in the long run. In the near future, access to big data will pour fuel on NLP-driven applications, necessitating ICALL’s place on the agenda of instructed SLA research and revealing its potential to contribute significantly to language teaching and learning. This presentation will explore the future of learner analytics and the critical role NLP might play in language instruction and research.
Because of their complex formats and structures, authentic texts have long been considered inappropriate for NLP processing. Only after the data had been cleaned could further processing be performed. NLP tools typically destroy the formatting information that can be helpful in a language learning context, a process that often results in texts bearing no resemblance to the original document. Thanks to the growing standardisation of text formats, especially for texts created on a regular basis, such as periodical publishing on the internet, NLP processing is now coming to terms with established XML formats, so that it will soon be possible to make a clear distinction between text content streams and formatting streams. Only by exploiting the advantages of XML-based formatting can NLP tools be used in a publishing workflow, whereby a text stream is annotated while the original layout is kept intact.
In this talk, we present the procedure and the tools we developed to create a non-destructive workflow, whereby XML documents are linguistically annotated and enriched with encyclopedic information for a selection of named entities, covering persons, places and organisations. The annotation and enrichment procedure, wrapped in a webservice pipeline, was developed as part of the iRead+ project, which aimed to enhance the reading experience on tablet computers for three types of readers: the general reader, the language learner and the struggling reader. The advantage of the NLP workflow resides in the fact that any well-formed XML source document can be annotated and enriched, without loss of typographic formatting information.
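The core idea of such a non-destructive workflow can be illustrated with a minimal sketch: entity mentions in an XML text node are wrapped in annotation elements while the surrounding markup is left untouched. The entity lexicon, the `<ne>` element name and its `type` attribute are all hypothetical stand-ins here, not the iRead+ pipeline's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity lexicon: surface form -> entity type.
ENTITIES = {"Paris": "place", "UNESCO": "organisation"}

def annotate(elem):
    """Wrap the first known entity found in elem.text in a <ne> element,
    recursing into children, without disturbing existing markup."""
    for child in list(elem):
        annotate(child)
    text = elem.text or ""
    for surface, etype in ENTITIES.items():
        idx = text.find(surface)
        if idx == -1:
            continue
        ne = ET.Element("ne", {"type": etype})
        ne.text = surface
        ne.tail = text[idx + len(surface):]  # text after the entity
        elem.text = text[:idx]               # text before the entity
        elem.insert(0, ne)
        break  # one annotation per text node in this sketch

doc = ET.fromstring('<p>Paris is lovely.</p>')
annotate(doc)
print(ET.tostring(doc, encoding="unicode"))
# -> <p><ne type="place">Paris</ne> is lovely.</p>
```

Because the annotation is expressed as inline XML elements rather than a rewritten plain-text stream, any downstream renderer can style or link the entities while the original layout information survives.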
We will describe how the pipeline can be integrated within a language learning platform, and show how the NLP pipeline differs from similar tools developed for language learners, such as WERTi and Fern.
We would like to present an overview of our latest work on adapting a popular CALL application for studying Latin, Greek and Classical Arabic into a platform for automatically collecting data on individual differences and experimentally comparing the effectiveness of different ICALL methods.
Although not everything needed for the study of classical languages is of immediate relevance to the study of modern languages, these ancient languages have accumulated uniquely rich resources for evaluating a variety of ICALL approaches, including the use of annotated corpora for automatic evaluation and feedback. The availability of multiple aligned translations and complete syntactic analyses in dependency treebanks of all the major classical texts permits the automatic generation of learning games and automatic proficiency assessments for both vocabulary acquisition and morpho-syntactic competence. Such comprehensive resources are rarely present for modern languages, but their availability in the classical languages gives us a unique opportunity to evaluate their utility, and thus the benefit of developing comparable resources for other languages. The value of the classical languages for ICALL research is also increased by the relative standardization of the classics curriculum globally, where the student's first encounter with a given word or construction is likely to be in the same text, whether he or she is in Leipzig or Zagreb, Nebraska or Brazil.
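One way aligned glosses support the automatic generation of learning games is sketched below: multiple-choice vocabulary items built from lemma-to-gloss pairs. The toy lexicon and the function name are illustrative assumptions; real items would be drawn from the parallel treebanks and aligned translations described above.

```python
import random

# Toy aligned glosses (hypothetical data; real alignments would come
# from treebanks and aligned translations of classical texts).
ALIGNED = {"puella": "girl", "aqua": "water", "liber": "book", "canis": "dog"}

def make_mcq(lemma, n_distractors=3, rng=None):
    """Build one multiple-choice vocabulary item from aligned glosses."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    correct = ALIGNED[lemma]
    distractors = rng.sample(
        [g for k, g in ALIGNED.items() if k != lemma], n_distractors)
    options = distractors + [correct]
    rng.shuffle(options)
    return {"prompt": lemma, "options": options, "answer": correct}

item = make_mcq("puella")
print(item["prompt"], item["options"])
```

Because the distractors are sampled from the same aligned lexicon, item difficulty can in principle be controlled by sampling semantically or morphologically similar lemmas, which is one of the experimental levers such a platform could vary.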
In collaboration with leading SLA researchers including Brian MacWhinney at Carnegie Mellon and XML application experts including Jonathan Robie, we are accordingly adapting the widely used Alpheios platform to collect "big data" on how different kinds of users interact with the various tools the system provides, and how modifications of the tools themselves can affect proficiency outcomes among different groups of users.
We anticipate depositing raw data in open repositories with enough detail to permit replication.
Grammatical metaphors occupy a dominant place in academic writing (Halliday, 1985). They are among the main features that distinguish academic writing from spoken language and from other registers of writing. This importance warrants extensive investigation of how grammatical metaphors are used and how frequently they occur in published academic writing. For second language (L2) learners, familiarity with grammatical metaphors could help them approximate the writing norms of their target academic domains. The study reported in the present paper seeks to contribute to our knowledge of grammatical metaphors by analyzing patterns of grammatical metaphor use in native and non-native PhD dissertations. By employing computational techniques for analyzing text, this study was able to examine a corpus of 6 PhD dissertations (224,158 tokens) from the discipline of Applied Linguistics. The computational tool developed for this study had an accuracy of 0.67, a precision of 0.56 and a recall of 0.94. Furthermore, the tool had 0.60 Inter-Rater Reliability (IRR) agreement with one human rater and 0.26 with a second human rater; the two human raters had 0.53 IRR agreement with each other. Results show differences between native and non-native dissertations in the frequency of grammatical metaphors per thousand words and in the diversity of grammatical metaphor types.
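For readers unfamiliar with how such evaluation figures are derived, the sketch below computes precision, recall and Cohen's kappa (a common choice for IRR, though the abstract does not name the specific coefficient used) from binary per-token labels. The example labels are made up for illustration.

```python
def precision_recall(gold, pred):
    """Precision and recall for binary labels (1 = grammatical metaphor)."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    return tp / (tp + fp), tp / (tp + fn)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    po = sum(1 for x, y in zip(a, b) if x == y) / n        # observed
    pa1, pb1 = sum(a) / n, sum(b) / n
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)                 # expected by chance
    return (po - pe) / (1 - pe)

gold = [1, 1, 1, 0, 0]   # hypothetical human annotations
pred = [1, 1, 0, 1, 0]   # hypothetical tool output
print(precision_recall(gold, pred))  # -> (0.666..., 0.666...)
print(cohens_kappa(gold, pred))      # -> 0.166...
```

The gap the abstract reports between raw agreement-style scores (recall 0.94) and chance-corrected IRR values (0.26 to 0.60) is exactly what chance correction is designed to expose.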