Learner corpora

from Wikipedia, the free encyclopedia

Learner corpora are large computerized collections of written and/or spoken texts (text corpora) produced by learners of a second or foreign language, which can be analyzed with the help of special software. In foreign language didactics, learner corpora are used, for example, to create and evaluate error typologies for particular groups of language learners and to adapt teaching methods accordingly. Learner corpora are a relatively recent development within corpus linguistics; the first learner corpora were compiled in the late 1980s and early 1990s.

Areas of application of learner corpus analysis

Learner corpora in second language acquisition research

So far, most learner corpus research has been primarily linguistic and descriptive. By comparing learner output with native-speaker data, researchers currently address questions such as the following:

  • Which lexemes, phrases, grammatical phenomena or syntactic structures do learners use too often (overuse), too rarely (underuse), incorrectly (misuse) or idiosyncratically (idiosyncratic use)?
  • In which areas do learners tend to adopt an avoidance strategy, i.e. where do they not exploit the full potential of the target language?
  • In which areas do learners behave like native speakers (native-like), and in which do they not?
  • How can one define the core areas in which learners with a particular mother tongue (L1) do not behave like native speakers and therefore need special help?
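Overuse and underuse are typically established by comparing how often an item occurs in a learner corpus with how often it occurs in a native-speaker corpus of comparable text type, normalized for corpus size. The following Python sketch is illustrative only: it assumes both corpora are already tokenized into word lists, uses the log-likelihood statistic (G²) as the significance test (one common choice among several), and treats 3.84 (p < 0.05 at one degree of freedom) as the cut-off. The function names `log_likelihood` and `compare_usage` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, n1, n2):
    """Log-likelihood (G2) for an item occurring `a` times in corpus 1
    (n1 tokens) and `b` times in corpus 2 (n2 tokens)."""
    # Expected frequencies if the item were equally common in both corpora.
    e1 = n1 * (a + b) / (n1 + n2)
    e2 = n2 * (a + b) / (n1 + n2)
    ll = 0.0
    if a > 0:  # a zero observed count contributes nothing to the sum
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def compare_usage(learner_tokens, native_tokens, threshold=3.84):
    """Label items as 'overuse' or 'underuse' by learners when the
    frequency difference reaches the (assumed) significance threshold."""
    if not learner_tokens or not native_tokens:
        raise ValueError("both corpora must be non-empty")
    learner, native = Counter(learner_tokens), Counter(native_tokens)
    n1, n2 = len(learner_tokens), len(native_tokens)
    results = {}
    for item in set(learner) | set(native):
        a, b = learner[item], native[item]
        ll = log_likelihood(a, b, n1, n2)
        if ll < threshold:
            continue  # difference not significant: neither over- nor underused
        # Direction is decided by relative (size-normalized) frequency.
        label = "overuse" if a / n1 > b / n2 else "underuse"
        results[item] = (label, round(ll, 2))
    return results
```

In practice one might pass the tokenized essays of one L1 group as `learner_tokens` and a comparable native-speaker essay corpus as `native_tokens`. Frequency comparison of this kind only addresses overuse and underuse; misuse and idiosyncratic use generally call for manual or error-annotated analysis.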

A second, and in this context very promising, research area is the analysis and comparison of learner corpora from learners with different mother tongues (Contrastive Interlanguage Analysis). Research in this area asks, for example, to what extent learners' spoken or written L2 production is influenced by their mother tongue. On the one hand, problems specific to a particular L1 background can be identified much more precisely; on the other hand, comparing learner data from different L1 backgrounds reveals problems shared by all learners of a given foreign language. From this, targeted improvements in teaching that address these problems can be derived.
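Contrastive Interlanguage Analysis can be illustrated by running the same learner-versus-native comparison for several L1 subcorpora and then comparing the results: items flagged for every L1 group point to general learner problems, items flagged for only one group to L1-specific ones. The sketch below is a deliberately simplified assumption: it uses a plain frequency-ratio cut-off (hypothetically 2.0) instead of a significance test, considers overuse only, and its function names are hypothetical.

```python
from collections import Counter


def overused(learner_tokens, native_tokens, ratio=2.0):
    """Items whose relative frequency in the learner data is at least
    `ratio` times their relative frequency in the native data.
    Add-one smoothing of the native count (an assumption) keeps the native
    frequency above zero, so items missing from the native corpus are not
    flagged automatically."""
    learner, native = Counter(learner_tokens), Counter(native_tokens)
    n_l, n_n = len(learner_tokens), len(native_tokens)
    return {
        item for item, count in learner.items()
        if count / n_l >= ratio * (native[item] + 1) / n_n
    }


def contrast_by_l1(subcorpora, native_tokens, ratio=2.0):
    """`subcorpora` maps an L1 label to that group's learner tokens.
    Returns (items overused by every L1 group,
             {L1: items overused by that group and no other})."""
    per_l1 = {l1: overused(tokens, native_tokens, ratio)
              for l1, tokens in subcorpora.items()}
    if not per_l1:
        return set(), {}
    # Shared across all L1 backgrounds: candidate general learner problems.
    shared = set.intersection(*per_l1.values())
    specific = {}
    for l1, items in per_l1.items():
        # Union of everything the other groups overuse.
        others = set().union(*(s for k, s in per_l1.items() if k != l1))
        specific[l1] = items - others
    return shared, specific
```

Items that fall into neither category (overused by some but not all groups) are left out of both results; a real study would examine them separately.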

Learner corpora

For English, the four corpora compiled under the direction of the Université Catholique de Louvain in cooperation with various universities around the world, which belong to the so-called "Louvain Family of Corpora", are particularly suitable:

  • LINDSEI (Louvain International Database of Spoken English Interlanguage), spoken learner English
  • ICLE (International Corpus of Learner English), written learner English
and the corresponding native-speaker comparison corpora
  • LOCNEC (Louvain Corpus of Native English Conversation), spoken native English
  • LOCNESS (Louvain Corpus of Native English Essays), written native English

For German, the Humboldt-Universität zu Berlin provides Falko, “an error-annotated learner corpus of German as a foreign language”.

Learners analyzing learner corpora

The most useful application of learner corpora for learners themselves is to combine and compare learner texts with native-speaker texts. Such data-driven learning scenarios (foreign language learning with the help of electronic texts or tools) put learners in the role of researchers and enable them to identify which areas are particularly problematic for learners who share their mother tongue. If learners consciously recognize typical mistakes and find them in a learner corpus, they can work on them and thereby improve their foreign language skills considerably. This applies to L1 interference as well as to idiomatic expressions, inappropriate word choice and similar problems.
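A basic tool in data-driven learning is the keyword-in-context (KWIC) concordance, which lists every occurrence of a word with its surrounding context so that learner and native usage can be compared line by line. The Python sketch below is a minimal illustration, not a specific concordancer: it assumes pre-tokenized text, matches case-insensitively, and its function names and toy sentences are hypothetical rather than taken from any real corpus.

```python
def kwic(tokens, keyword, width=4):
    """Keyword-in-context: return (left, hit, right) triples with up to
    `width` tokens of context on each side of every occurrence.
    Matching is case-insensitive."""
    key = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == key:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


def print_concordance(label, tokens, keyword, width=4):
    """Print right-aligned concordance lines under a heading such as
    'learner' or 'native'."""
    print(f"--- {label} ---")
    for left, hit, right in kwic(tokens, keyword, width):
        print(f"{left:>30} [{hit}] {right}")


if __name__ == "__main__":
    # Hypothetical toy sentences, not taken from any real corpus.
    learner = "we must make more research on this topic".split()
    native = "we must do more research on this topic".split()
    print_concordance("learner", learner, "research")
    print_concordance("native", native, "research")
```

Aligning the keyword in a fixed column makes differences in the left-hand context, here the verb used with "research", easy to spot when the two concordances are read side by side.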

Implications for foreign language didactics

The analysis of learner corpora and the precise description of different levels of learner language have very practical implications for foreign language didactics, since they bear on curriculum design, the preparation of teaching materials and classroom methodology. Results from learner corpus analyses have so far been incorporated into the Longman Dictionary of Contemporary English (LDOCE) (2003) and the Cambridge Advanced Learner's Dictionary (CALD) (2003), both of which contain notes on typical learner errors intended to help learners avoid them.

Literature

  • Christiane Brand, Susanne Kaemmerer: The Louvain International Database of Spoken English Interlanguage (LINDSEI): Compiling the German Component. In: Sabine Braun, Kurt Kohn, Joybrato Mukherjee (eds.), Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods. 2006, pp. 127-140.
  • Sylviane Granger (ed.): Learner English on Computer. Longman, London 1998.
  • Sylviane Granger: A Bird's-eye view of learner corpus research. In: Sylviane Granger, Joseph Hung, Stephanie Petch-Tyson (eds.), Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins, Amsterdam, Philadelphia 2002, pp. 3-33.
  • Sylviane Granger: Computer learner corpus research: current state and future prospects. In: Ulla Connor, Thomas A. Upton (eds.), Applied Corpus Linguistics: A Multidimensional Perspective. Rodopi, Amsterdam, New York 2004, pp. 123-145.
  • Geoffrey Leech: Preface. In: Sylviane Granger (ed.), Learner English on Computer. Longman, London 1998.
  • Gunter R. Lorenz: Adjective Intensification - Learners versus Native Speakers: A Corpus Study of Argumentative Writing. Rodopi, Amsterdam 1999.
  • Nadja Nesselhauf: Learner Corpora and their Potential for Language Teaching. In: John Sinclair (ed.), How to Use Corpora in Language Teaching. John Benjamins, Amsterdam 2004, pp. 125-152.
  • Hakan Ringbom: High frequency verbs in the ICLE corpus. In: Antoinette Renouf (ed.), Explorations in Corpus Linguistics. Rodopi, Amsterdam 1998, pp. 191-200.
  • Cambridge Advanced Learner's Dictionary. Cambridge University Press, Cambridge 2003.
  • Longman Dictionary of Contemporary English. Longman, London 2003.

References and comments

  1. cf. Granger et al. (2002), p. VII
  2. cf. Leech (1998), p. XIV
  3. Classification according to Lorenz (1999)
  4. See Ringbom (1998)
  5. Granger (1998), p. 12f.
  6. cf. Brand / Kaemmerer (2006)
  7. Peter Siemen, Anke Lüdeling, Frank Henrik Müller: FALKO: an error-annotated learner corpus of German (HU Berlin)
  8. Falko
  9. Nesselhauf (2004), p. 126
  10. Granger (2004), pp. 136f.