Text Engineering Software Laboratory
Tesla | |
---|---|
Basic data | |
Developer | University of Cologne |
Operating system | Platform independent |
Programming language | Java |
Category | Natural language processing |
License | Eclipse Public License |
Website | tesla.spinfo.uni-koeln.de |
Tesla (Text Engineering Software Laboratory) is software for carrying out reproducible experiments on textual data. Textual data here means any kind of data that can be represented as a sequence of discrete units.
Tesla has been under development since 2005 at the Institute for Linguistics of the University of Cologne (Department of Linguistic Information Processing). It provides a software environment for researchers who work with texts.
The conceptual focus of the framework is on experimental data and process analysis. It supports researchers in
- selecting different types of texts (e.g. natural language texts or DNA transcriptions) as the basis for their experiments,
- applying established as well as newly developed procedures to these texts, and
- documenting the experiments in a form that allows them to be reproduced and repeated.
Tesla is implemented in Java as a component system built on a client-server architecture. Users manage texts and design experiments in the Eclipse-based client. An experiment consists of the source material to be analyzed (individual texts or text collections) and of components that each perform a specific text-processing task (e.g. tokenization, part-of-speech tagging or sequence alignment). Components can be combined with one another if their interfaces are compatible. A component's interface is defined by the results it generates, which are linked to the raw data (the texts) as annotations. In contrast to comparable systems such as UIMA, the input and output interfaces of Tesla components are largely unrestricted. This allows fine-grained encapsulation of components and makes it possible, for example, to use complex data types (such as graphs or high-dimensional vectors) as annotations.
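The component model described above can be illustrated with a minimal sketch. It is not Tesla's actual API: Tesla itself is written in Java, Python is used here only for brevity, and all names (`Annotation`, `Component`, `run_experiment`, `tokenize`, `token_lengths`) are hypothetical. The sketch assumes that each component declares which annotation kinds it consumes and which kind it produces, that annotations point back into the raw text by character offsets, and that an annotation may carry an arbitrary payload.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Annotation:
    """A result attached to a span of the raw text (hypothetical model)."""
    kind: str          # e.g. "token"
    start: int         # character offset into the raw text, inclusive
    end: int           # character offset into the raw text, exclusive
    value: Any = None  # arbitrary payload: a tag, a vector, a graph, ...


@dataclass(frozen=True)
class Component:
    """A processing step with declared input and output annotation kinds."""
    name: str
    consumes: FrozenSet[str]
    produces: str
    run: Callable[[str, Dict[str, List[Annotation]]], List[Annotation]]


def tokenize(text: str, annotations: Dict[str, List[Annotation]]) -> List[Annotation]:
    """Whitespace tokenizer that records the offsets of every token."""
    tokens, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append(Annotation("token", start, end))
        pos = end
    return tokens


def token_lengths(text: str, annotations: Dict[str, List[Annotation]]) -> List[Annotation]:
    """Derive a new annotation layer from the existing token layer."""
    return [Annotation("length", t.start, t.end, t.end - t.start)
            for t in annotations["token"]]


TOKENIZER = Component("tokenizer", frozenset(), "token", tokenize)
LENGTHS = Component("lengths", frozenset({"token"}), "length", token_lengths)


def run_experiment(text: str, components: List[Component]) -> Dict[str, List[Annotation]]:
    """Run components in order, refusing any whose inputs are not yet available."""
    annotations: Dict[str, List[Annotation]] = {}
    for component in components:
        missing = component.consumes - set(annotations)
        if missing:
            raise ValueError(f"{component.name} is missing inputs: {sorted(missing)}")
        annotations[component.produces] = component.run(text, annotations)
    return annotations


if __name__ == "__main__":
    text = "to be or not"
    result = run_experiment(text, [TOKENIZER, LENGTHS])
    for annotation in result["length"]:
        print(text[annotation.start:annotation.end], annotation.value)
```

Because the raw text is never modified and every result is stored as an annotation layer keyed by its kind, a second component can be chained onto the first only when the layers it needs already exist. This is one simple way to model the requirement that interfaces be compatible.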
Literature
- Jürgen Hermes, Stephan Schwiebert: "Classification of text processing components: The Tesla Role System." In: Fink, Lausen, Seidel and Ultsch: "Advances in Data Analysis, Data Handling and Business Intelligence", Springer Verlag, 2010.
- Jürgen Hermes: "Text processing: design and application." Dissertation, University of Cologne.
- Stephan Schwiebert: "Tesla. A virtual laboratory for experimental computer and corpus linguistics." Dissertation, University of Cologne.