Internet linguistics

Internet linguistics (IL) is a branch of linguistics concerned with the scientific study of language use in the specific online environment. Using linguistic methods, it analyzes linguistic phenomena that arise in and through the Internet.

Introduction

Definition

IL deals “with the use of language on the Internet and thus with a specific communicative context, whose characteristics feed into all analyses. IL is an interface discipline which, as is typical for applied linguistics, combines linguistic approaches with methods from communication and media studies and also raises sociological and psychological questions”. The definition is deliberately broad in order to do justice to the dynamic nature of the object of study. Describing the new modes of communication is also one of the tasks of IL.

Emergence

With the media age, a new field of research has emerged for linguistics (and other scientific disciplines). The first publications on language and communication on the Internet appeared in the 1990s. Examples of early and influential works are “Language and Communication on the Net” by Runkehl / Schlobinski / Siever (1998), “Net Language” by Crystal (2006) and Rosenbaum (1996), and “Phenomena of Chat Communication” by Beißwenger (2002). The linguist David Crystal wrote the first English-language publication on "Internet linguistics". In German-speaking countries, the scholarly discussion of IL can be traced back to the linguists Konstanze Marx and Georg Weidacher, who broadened the field from isolated observations to the use of language on the Internet as a whole.

Overview

IL investigates questions such as: How do language and the Internet influence one another? How is communication organized within the individual Internet services? How does the Internet expand the subject matter of linguistics? Does the specific communication environment bring about change? Accordingly, IL analyzes not only web content itself but also linguistic phenomena that can only be transmitted with the help of Internet technology, such as hashtags, Skype conversations and memes. The term "Internet linguistics" has become established because the lexeme Internet is more widely used in everyday language and encompasses the web, the net, the WWW, and the Internet as a communication space, as a storage medium and as a transmission medium.

IL is not concerned with how linguistics itself is presented on the Internet. Both methodologically and in terms of content, IL has links to other scientific disciplines.

Methods

The methods of IL are instruments for collecting and analyzing digital data with respect to linguistic questions. Although they include traditional linguistic approaches, they are largely interdisciplinary. Methods overlap with those of the related disciplines of corpus linguistics, computational linguistics, and media and communication studies. Because the object of study is dynamic, new methods are constantly being added.

Typical methods and tools of the IL are:

Corpus generation and analysis
Log file analysis
Offline data collection
Online questionnaires

Corpus generation and analysis

Corpora are extensive data collections of spoken and written texts that are also used in traditional linguistics. In internet linguistics, they are characterized by the following:

hypothesis-led investigation of linguistic phenomena
compiled with a view to the research question in order to map possible characteristic properties of the relevant language segment
comprise on the order of several million running words
available electronically
can be analyzed by means of computers and statistical programs and processes
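The last two properties, electronic availability and computer-assisted analysis, can be illustrated with a minimal Python sketch. It counts word tokens and normalizes a raw count to occurrences per million tokens, a common way to compare corpora of different sizes. The three posts are invented, and the simple regular-expression tokenizer is an assumption made for brevity; corpus tools used in practice tokenize far more carefully.

```python
import re
from collections import Counter


def token_frequencies(texts):
    """Count lower-cased word tokens across a collection of texts."""
    counts = Counter()
    for text in texts:
        # \w+ keeps runs of letters (umlauts included), digits and underscores;
        # punctuation, '#' signs and emoticons such as ':-)' are discarded
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


def per_million(counts, word):
    """Normalize a raw count to occurrences per million tokens."""
    total = sum(counts.values())
    return counts[word] * 1_000_000 / total if total else 0.0


# Three invented chat posts standing in for a real corpus
posts = ["lol das ist so cool", "LOL, ich bin afk", "bin gleich wieder da :-)"]
freq = token_frequencies(posts)
print(freq.most_common(2))
print(round(per_million(freq, "lol")))
```

Because this tokenizer drops `:-)` and `#`, it would hide exactly the Internet-specific phenomena discussed further below; a real study would adapt tokenization to its research question.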

The two large existing German-language corpora, DeReKo (Deutsches Referenzkorpus) of the IDS Mannheim and the corpus of the Digital Dictionary of the German Language (DWDS) project at the BBAW, document written contemporary language in general rather than net-based language. Examples of corpora of net-based language are:

the USENET corpus, in English
the Dortmund chat corpus
mediensprache.net, with SMS and nickname corpora
deWaC, the German part of WaCky
the Swiss SMS corpus
the Leipzig and Swiss WhatsApp corpora

Building corpora from Internet data poses particular challenges, including the processing of large data volumes, the citation problem and copyright issues. See also The WWW as a corpus.

For an introduction and further study, see the online courses on basics and tools, the linguistics contributions in the corpus linguistics handbook and the current textbook Internet linguistics. Beyond the web as a corpus, recent publications also address the linguistic analysis of Twitter and social media.

Other helpful projects and tools:

TEI (Text Encoding Initiative), which develops and maintains document formats for encoding texts in digital form.
BootCaT, a tool for creating corpora from the web
WebCorp Live, a linguistic search engine

Log file analysis

In log file analysis, logging programs installed for the purpose collect technical information about data traffic, which is then evaluated. The method is comparable to offline participant observation, which also records content. The test subjects know that their data is being recorded. The method is particularly suitable for online conversations in social media channels and chat rooms.

Labov's observer's paradox poses a challenge for log file analysis: the aim is to observe and record conversations as they would have taken place without observation or recording, yet they can only be obtained by observing them.

Offline data collection

This method is suitable for investigating online phenomena that cannot be explored with any of the methods above, or to whose data researchers have no or only limited access. The research interest often lies, for example, in very personal, emotional communication that takes place primarily in protected or non-public areas of social networks or forums.

The aim is to establish personal offline contact with test subjects in order to motivate them to keep records of their online activity for research purposes.

Marx and Weidacher propose a template for questionnaires or flyers for offline data collection.

Online questionnaires

The questionnaire is a classic survey instrument of empirical social research. In IL it is used, for example, to examine the distribution of specific lexemes or judgments about grammaticality. It has the following advantages and disadvantages, with the advantages predominating:

can be completed and returned quickly (reaching larger groups of test subjects, even distant ones, without major financial outlay)
good response rate compared with postal surveys or surveys run by an interviewer
higher motivation, because respondents choose when to answer
it is not possible to control who completes the questionnaire, how, when and why, or whether it is answered at all by a single person (and not by a group)

The possibilities of web-based linguistic research with large data sets continue to be discussed in specialist scientific publications. All the problems of the classic sociological questionnaire still apply: Is the sample representative? Are the questions clear? Is there bias?

SoSci Survey supports the creation of questionnaires for non-commercial purposes. It also points to other free alternatives for smaller surveys, such as LamaPoll or Voycer, and to open-source tools for running on one's own server, such as LimeSurvey, opensurveypilot and Wextor.

The WWW as a corpus

Dynamic data pool

The main reason the WWW is so interesting for linguistics is that it offers a dynamic and virtually unlimited data pool. This pool is constantly renewed, since users produce and publish utterances every second in Web 2.0. For linguistics, the WWW thus provides a basis for comprehensive investigations of a wide range of linguistic phenomena.

Different text producers and types of text

In principle, the WWW is available to everyone, regardless of social status, education, age, occupation or origin. Users can choose among various social networks and forums, such as Twitter, Facebook, Instagram or Tumblr. These platforms allow members to inform themselves about particular topics, exchange ideas with other users, react to content, communicate and express their own opinions. This diversity produces a wide variety of language material, as users move between different types of text: they can express themselves in private messages as well as in e-mails, comments or status messages.

Anonymity

The supposed anonymity of the Internet makes users less inhibited about expressing themselves freely. This can lead to emotional expressions that would not be produced in other media. Speakers or authors in public media such as newspapers, radio or TV are aware of their audience and know that what they say may have consequences. The anonymity of the Internet, by contrast, produces its own language pool and lulls text producers into a sense of security. That authors feel protected by this anonymity in Web 2.0 can be an advantage for linguistic studies, since linguists thus have a large amount of unfiltered linguistic material at their disposal. In recent years, however, users have increasingly become aware that their utterances on the WWW are public.

Copyright

The use of language material for research purposes raises the question of copyright. The difficulty for IL lies in deciding how to deal with published utterances and to what extent it is legitimate to use language data from the Internet for research purposes.

Quotation as a solution to the copyright problem

In principle, it is possible to use linguistic data for linguistic investigations even if they are protected by copyright. However, the language material must be cited subject to certain conditions:

The place of publication of the quoted material should always be given, along with the date and time where possible. This information is needed not only for legal reasons: linguistic investigations also require precise time specifications.
If users use pseudonyms or nicknames, the data must be anonymized when it is used.
Anonymization is not necessary for real names and public institutions, since these speakers express themselves consciously in public networks and use the WWW to promote themselves, a party or a company.
In screenshots of private chats, for example, real names and images must be made unrecognizable.
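The anonymization requirement for nicknames can be sketched in Python. The sketch assumes that the researcher already has a list of the nicknames occurring in a chat log (the log lines and names below are invented) and that the nicknames begin and end with letters, digits or underscores. Each nickname is replaced consistently by a neutral placeholder, so that who replies to whom remains analyzable. The function name `pseudonymize` is hypothetical, and the sketch does not cover nicknames in images, misspelled names or other identifying details, which still have to be checked manually.

```python
import re


def pseudonymize(messages, nicknames):
    """Replace each listed nickname with a stable placeholder (User1, User2, ...).

    Returns the rewritten messages and the nickname-to-placeholder mapping,
    which should be stored separately from the published data.
    """
    if not nicknames:
        return list(messages), {}
    mapping = {}
    # Longer names first, so that a full nickname wins over a shorter one it contains
    names = sorted(nicknames, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    def substitute(match):
        name = match.group(1)
        if name not in mapping:
            mapping[name] = f"User{len(mapping) + 1}"
        return mapping[name]

    return [pattern.sub(substitute, m) for m in messages], mapping


log = ["anna_b: hi, bin gleich da", "tom99: @anna_b ok", "anna_b: thx"]
anonymized, key = pseudonymize(log, ["anna_b", "tom99"])
for line in anonymized:
    print(line)
```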

Who is the text producer?

One of the challenges is identifying the author. If this fails, doubts may arise about the accuracy and authenticity of the texts, since the material may have been falsified, passed on or duplicated.

The search for relevant data

Given the huge amount of data on the WWW, it is difficult for linguists to find the linguistic material relevant to a particular study. IL also faces the question of when and in what context the language material was produced, since texts on the Internet may have been written years before they were published.

Search engines

Particular care is needed when using search engines such as Google or Yahoo to look for language material on the WWW. According to Noah Bubenhofer, search engines pose the following problems:

Limited query language

Conventional web search engines cannot handle wildcard (placeholder) characters in the way that databases usually do. It is not possible to work with placeholder characters such as apostrophes ('), dashes (–) or ellipses (…) in order to retrieve the inflected forms of particular lexemes.
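What wildcards offer in a local corpus can be sketched with Python's standard `fnmatch` module. This is a simplified stand-in under stated assumptions: shell-style wildcards (`*` for any sequence of characters, `?` for exactly one character, `[...]` for one character from a set) are far less expressive than the query languages of corpus systems, the function name `wildcard_search` is hypothetical, and the token list is invented.

```python
from fnmatch import fnmatchcase


def wildcard_search(tokens, pattern):
    """Return the distinct tokens matching a shell-style wildcard pattern, in first-seen order.

    '*' stands for any sequence of characters, '?' for exactly one character,
    and '[Bb]' for one character out of a set; matching is case-sensitive.
    """
    hits = []
    for token in tokens:
        if fnmatchcase(token, pattern) and token not in hits:
            hits.append(token)
    return hits


# A toy token list; a real study would query a tokenized corpus instead
tokens = "der Blogger bloggt , die Bloggerin bloggte , wir bloggen".split()
print(wildcard_search(tokens, "blogg*"))
print(wildcard_search(tokens, "[Bb]logger*"))
```

The first query collects inflected verb forms of bloggen; the second uses a character set so that capitalized nouns are found despite case-sensitive matching.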

Missing annotation

Unlike existing corpora, search engines provide no annotation. They cannot give any additional information about the phrase searched for; information about prepositions, case (such as the nominative) or articles, for example, remains unavailable.

Representativeness of the recorded websites

Search engines cannot show all relevant content, because they are unable to capture and index everything. Certain content is withheld for legal or political reasons; such content falls under the term deep web. The Chinese government, for example, blocks sites such as Facebook or particular news sites in order to withhold certain information and news from its citizens.

Lack of transparency in indexing and ranking

Search engines try to present the most relevant web pages to the user, which produces a ranking. However, it remains unclear by what criteria the search engines create this ranking. It can be assumed that website operators compete with one another and may therefore manipulate the ranking.

Communication on the Internet as a research subject of the IL

Context factors

The communication situation on the Internet can differ greatly from offline communication. The Internet places people in a changed communication context that tends to be more efficient: the effort required to communicate is decreasing, while the volume of communication is constantly increasing. Technical developments such as tablets and smartphones have contributed greatly to the fact that people are no longer bound by time or place when communicating. This makes interpersonal communication easier.

Technical developments have also produced different forms of interactivity, ranging from quasi-synchronous to asynchronous communication. The degree of interactivity also depends on the situation and on the habits of the communication partners, as communication via WhatsApp shows: since messages can be received immediately after they have been sent, quasi-synchronous communication is possible in principle. However, if one of the partners cannot or does not want to react immediately, the communication becomes asynchronous.

The phenomenon of semi-publicness is also related to communication efficiency. It is difficult to keep control over personal content on the Internet: information originally intended for private purposes can quickly find its way into the public sphere. Depending on privacy settings, strangers can gain intimate insights into other people's private lives via social networks. The boundaries between private and public blur, and a semi-public sphere emerges. Nevertheless, many people use social services and accept this semi-publicness, even when they are critically aware of possible consequences. The basis for it was laid by the new participation options of Web 2.0, which also changed the communication situation.

In Internet communication, communication partners find themselves in a context in which they have to learn rules for dealing with one another that are still in the process of becoming conventionalized. Netiquette offers a starting point for such rules. Multitasking is also relevant to the communication context: younger generations in particular often use many applications on the Internet at the same time, for example watching YouTube videos while sending text messages.

Questions and objects of investigation that arise from the communication situation:

How does the change from Web 1.0 to Web 2.0 affect communication? What influence does the semi-public have in this context?
Does the efficiency of communication affect polite interaction on the Internet? What are the rules for communication? To what extent is netiquette already established?
How does multitasking affect communication?

Technical conditions and consequences for users

The emergence of Internet linguistics was initially accompanied by idealistic expectations similar to those surrounding the emergence of the Internet as a medium. Internet pioneers dreamed of a society growing closer together, exchanging ideas and networking via the Internet's communication channels. Skeptics, however, criticized online communication as unemotional and anonymous. For IL as a research field, what matters most is that channels have emerged through which users exchange ideas and thereby provide material for linguistic analysis. How a particular communication unfolds in detail depends largely on personal, situational and technical factors and must be considered against this background.

At the beginning of IL research, forms of communication such as e-mail and chat were of central importance. Today the focus is mainly on posts from social networks such as Facebook, Twitter or Instagram. Language material from mobile messaging services such as WhatsApp or Threema is also increasingly the subject of Internet-linguistic analyses.

The development from Web 1.0 to Web 2.0 is a basic prerequisite for modern Internet communication. Web 1.0 consisted largely of content created by a few users and consumed passively by many. Web 2.0, by contrast, laid the cornerstone for interactive Internet communication: users create content themselves via online services such as YouTube, Wikipedia and Facebook. They can move along a spectrum from purely private communication (e.g. encrypted e-mails and messenger programs such as Jabber or Signal) to completely public communication (e.g. blogs).

Parallel to the development from Web 1.0 to Web 2.0, the introduction of mobile Internet access has changed modern Internet communication. While services such as WAP or UMTS met with little acceptance in the early 2000s, demand for mobile Internet grew as smartphones spread. On modern devices with advanced touchscreen displays, navigating the Internet is much easier and more flexible than on classic mobile phones.

The changed communication behavior brought about by mobile online services also affects users psychologically. Constant availability creates expectations among those communicating, which can change their interpersonal relationships. By reflecting on their own communication behavior, Internet users can protect themselves from possible information overload. Every Internet user also faces the question of how much private data to make public on the Internet. Disclosing private information can create an attack surface that others may exploit for cyberbullying. IL can examine these phenomena through verbal utterances in order to learn about the consequences of modern Internet communication for its users.

Description of communication on the Internet

The challenge for IL is to describe communication in the context of the Internet adequately. Classic models such as that of Koch / Oesterreicher can no longer fully capture the complexity of Internet communication. The model divides communication into medial and conceptual features that are related to each other: the medial aspect concerns the specific form of realization (phonic or graphic), while the conceptual aspect describes the style of an utterance (spoken or written). Marx / Weidacher give an overview:

Realization ↓ / Conception →	conceptually spoken	conceptually written
phonic	familiar face-to-face conversation	lecture (formulated in advance)
graphic	printed interview	administrative regulation

Phonic realization tends to correlate with characteristics of spoken language at the conceptual level, and graphic realization with characteristics of written language. The following characteristics can be identified:

Spoken communication	Written communication
co-presence of those communicating	spatial and temporal distance
synchronous	asynchronous
interactive, dialogical	non-interactive, monological
tends to be personal	tends to be impersonal
multimodal	monomodal

Internet communication is characterized by features of conceptual orality, also called “oral literacy”, even though it is medially written. Dialogical, quasi-synchronous communication is possible, which the Koch / Oesterreicher model does not take into account. Nor does the model account for cases in which an oral form is converted into a written one or vice versa, as is possible with computer programs.

The model also ignores the role of the medium. This is a serious gap, as the medium can be the decisive factor shaping the nature of communication. The type of medium determines, for example, whether quasi-synchronous communication is possible and how far the communication partners can react to one another and thus interact. Even if participants in a chat or instant-messaging exchange perceive themselves as conversation partners and write in a conceptually oral style, this does not necessarily amount to a conversation. The classic definition of a conversation includes spoken communication and direct contact between those communicating, neither of which is possible in chatting or instant messaging. One of the tasks of IL is therefore to classify communication on the Internet and to find means of describing it adequately.

Textual online communication

From the perspective of text linguistics, the question arises to what extent a dedicated Internet text linguistics is needed to analyze online texts. On the one hand, text structures on the Internet reflect elements and patterns already known from traditional texts, so Internet-specific text linguistics does not appear necessary. On the other hand, prototypical features of conventional texts can be distinguished relatively clearly from typical features of Internet texts:

Features of prototypical conventional texts	Special features of (many) texts on the Internet
stability	fluidity
self-containedness	hypertextuality
linguistic (written) form	multimodality
monologicity	dialogicity

Given these differences, examining Internet texts solely with the theoretical and methodological tools developed for traditional print texts appears to be of limited use. Adapting the theories and methods of text linguistics to texts in a specific online environment, by contrast, appears sensible.

Language on the Internet as a research subject of the IL

Is there an internet language?

Scientists now largely agree that there is no Internet-specific language that is used solely for communication on the Internet. The following arguments speak for this:

First, there is no Internet language with a system of its own. Every user can communicate on the Internet regardless of the language community they belong to; it is not necessary to learn an Internet language as one learns a foreign language. The only exception is the markup language HTML.

Second, a specific Internet language would require a homogeneous language area. The Internet, however, is a hybrid medium made up of different parts: encyclopedias and blogs belong to it just as much as the web presences of companies, products or universities. This thematic heterogeneity alone requires the use of different forms of language that are not used exclusively on the Internet.

This also results in different linguistic styles, ranging from a colloquial style in social networks to an academic style in online publications. All of these styles are shaped by their origins and by professional requirements, and they originate outside the Internet. Moreover, because each user has an individual idiolect, different forms of expression arise even within the same form of communication (chat environments, forums, blogs, etc.).

A final and decisive argument against the existence of an Internet language is the increasing transfer of supposedly Internet-specific phenomena into offline communication. Acronyms such as YOLO (you only live once) and hashtags are also used offline, and emoticons appear on T-shirts or flyers and thus detach themselves from their online environment.

Phenomena in language use and in Internet communication

Although there is no Internet language, Internet-specific phenomena can still be observed in Internet communication. They arose from the particular communication conditions on the Internet and show how language users adapt to the changed communication situation. Since it is often difficult to separate purely online from offline communication, it must be made clear why individual features count as Internet-specific and thus fall within the scope of IL. The following key questions can help:

Where do we find indications of new communication conditions in the data collections of spoken and written texts on the Internet?
Do we find this on all linguistic levels of description?
What new words have emerged and why is this linguistically relevant?
Where does the WWW motivate a change in meaning?
What processes are involved in understanding words that are used differently on the internet?

In the following, individual phenomena of Internet communication will be singled out and discussed.

Emoticons

Emoticons are characters used to represent complex emotional states and actions compactly and effectively. Besides conveying affective information, emoticons carry components of offline communication over into online communication. Smileys can reproduce facial expressions and thus often express feelings and moods (☺, ☹, :-D etc.) that are conveyed through facial expressions in face-to-face communication. They can also replace non-verbal signals such as gestures and pitch of voice. The winking smiley ;-), for example, often accompanies comments that are not meant seriously: whereas pitch of voice signals irony in face-to-face communication, such signals are often replaced by emoticons online. In early SMS communication, smileys built from punctuation marks (e.g. :-)) were mainly used. In messenger services such as WhatsApp, users can choose from a large number of images on a separate virtual keyboard. These characters are divided into categories that include traffic signs and pictures of animals, food or plants, and they can even replace parts of the accompanying text. Emoticons can fulfil various functions relevant to the meaning of an utterance, such as specification, emphasis and mitigation, on the propositional as well as the emotional level.

Vocabulary extensions

The rapid and varied development of the Internet has led to numerous new terms becoming established, adding a new lexical dimension. A lexical gap arose and has been filled with Internet-specific terms, resulting in a massive expansion of vocabulary. These neologisms are also used in offline communication. Such extensions can be categorized in many ways, and the list could be continued indefinitely. The following table gives some examples:

Vocabulary field	New terms
Internet offers	Facebook, Twitter, YouTube, Tumblr, Instagram
(communicative) actions	blog, chat, tweet, google, spam
technology	Intranet, browser, app, cookie, QR code
entertainment	Flash mob, feed, netiquette
economy	Social media marketing, online shop, online banking
society	Digital natives, intranet generation, network community

The changes within an existing vocabulary are not limited to the formation and establishment of new words. The adaptation of internet-specific terms takes place primarily on the grammatical-morphological level. It is noticeable here that, due to the adoption of English elements, some types of word formation occur more frequently that are not typical for German. In addition to composition and derivation, there are also numerous examples of conversion and contamination:

Composition (compounding): e.g. photo-blog.
Derivation: e.g. Blogg-er and, based on this, Blogg-er-in (female blogger).
Conversion (change of part of speech): e.g. English blog → (to) blog; German Blog → bloggen.
Contamination (blending): e.g. vlog (video + blog), Blargon (blog + jargon).

Acronyms and Abbreviations

Acronyms and abbreviations are two further ways of forming new words and are characteristic of Internet communication. The two phenomena overlap strongly: both consist of individual letters, usually the initial letters of two or more words. Either each letter is pronounced separately (e.g. afk), or the whole sequence is pronounced as a word if it is pronounceable (e.g. lol). Acronyms can be inflected, as in asapst (as soon as possible plus the ending -st), and are used mainly in chatting and tweeting, but also in e-mails, because they save space and effort. Using acronyms can also signal membership of the Internet community and thus contribute to socialization processes. A small selection gives an initial overview:

afk	→ away from keyboard (not on the computer)
own	→ actually
fyi	→ for your information
lol	→ laughing out loud
plz / pls	→ please
rofl	→ rolling on the floor laughing
thx	→ thanks
possibly	→ maybe
yolo	→ you only live once
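A lookup table is the simplest way to expand such acronyms when preparing a corpus. The following Python sketch uses only glosses from the list above; the dictionary `ACRONYMS` and the function `expand_acronyms` are illustrative assumptions, not an established resource.

```python
import re

# Glosses taken from the list above; illustrative, not exhaustive
ACRONYMS = {
    "afk": "away from keyboard",
    "fyi": "for your information",
    "lol": "laughing out loud",
    "plz": "please",
    "pls": "please",
    "rofl": "rolling on the floor laughing",
    "thx": "thanks",
    "yolo": "you only live once",
}


def expand_acronyms(text):
    """Replace known acronyms (case-insensitively) with their full forms."""
    def substitute(match):
        word = match.group(0)
        return ACRONYMS.get(word.lower(), word)
    return re.sub(r"\w+", substitute, text)


print(expand_acronyms("thx, bin afk LOL"))
```

A plain lookup cannot handle inflected forms such as asapst, which is returned unchanged, nor ambiguous abbreviations; both would need additional rules.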

Interjections

Interjections, also known as sound words, are written sound signs such as "hmm", "grump", "argh", "ohm", "hihi" etc. They imitate audible events and indicate thought processes or emotions. The most common way of expressing this direct orality in writing is to simulate laughter (haha, ha ha) or to express astonishment, surprise or regret (oh). An equally common interjection is “hm”. This sound word functions primarily as listener feedback: it signals to the other person that one is at the computer, mobile phone or similar device and has read the message. In general, interjections are similar in function to emoticons. To a certain extent, they are written-out smileys and thus a less economical variant.

Iterations

Iteration is the doubling (or multiple repetition) of individual letters or punctuation marks (e.g. halloooo). This feature serves primarily as a compensatory means of indicating prosody or as a substitute for other forms of intensification. A positive emotion, for example, can be intensified by iteration: :-)))). Consistent capitalization, as in "OOOOOOOHHHHHHHHHH", represents volume, a parameter of spoken language. The iteration of characters, by contrast, represents lengthening and intonation contours and is mostly used to mark emphasis. The two strategies can also be combined. Like most linguistic features of Internet communication, iterations can substitute for non-verbal and paraverbal means and offer an economic advantage: doubling letters or punctuation marks takes less effort than paraphrasing.
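Iterations can be located automatically with a regular expression that matches a character repeated several times in a row. The threshold of three repetitions in this Python sketch is an assumption chosen to avoid treating ordinary double letters as iterations, and the function names are hypothetical. Collapsing each run to a single character can help map iterated forms to a base form, but it also damages words with genuine double letters (Kaffeeee becomes Kaffe), so a lookup against a lexicon would need to try both variants.

```python
import re

# A character repeated three or more times in a row. Two repetitions are ignored
# because doubled letters are ordinary spelling (e.g. "hallo", "Boot").
ITERATION = re.compile(r"(.)\1{2,}")


def find_iterations(text):
    """Return (character, run length) for every iterated run in the text."""
    return [(m.group(1), len(m.group(0))) for m in ITERATION.finditer(text)]


def collapse_iterations(text):
    """Reduce each iterated run to a single character, e.g. to look up a base form."""
    return ITERATION.sub(r"\1", text)


print(find_iterations("halloooo :-))))"))
print(collapse_iterations("halloooo :-))))"))
```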

Inflective

Inflectives, or action words, are borrowed from the language of comics. Grammatically, they are verb stems without an inflectional ending. They are usually enclosed in asterisks or angle brackets and used to render paraverbal or non-verbal actions in language. Examples are *wink*, *gag* or *laugh*. In this way, psychological or physical states and actions can be expressed economically without lengthy paraphrase.

E.g.:

the #SPD Bauwurst ... * gag * I feel sick now!

(arijibeer, Twitter, May 30th 2015, 11:18 am)
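Inflectives marked in this way can be extracted with a regular expression. The following Python sketch assumes that inflectives are enclosed in asterisks or angle brackets, with optional spaces inside the delimiters as in the example above; the function name is hypothetical. It will also produce false positives, for instance HTML tags such as <br> or arithmetic like 2 * 3 * 4, so its hits would need manual checking.

```python
import re

# An inflective between asterisks (*grins*, * gag *) or angle brackets (<wink>)
INFLECTIVE = re.compile(r"\*\s*([^*<>]+?)\s*\*|<\s*([^*<>]+?)\s*>")


def find_inflectives(text):
    """Return the inflectives in order of appearance, without their delimiters."""
    return [m.group(1) or m.group(2) for m in INFLECTIVE.finditer(text)]


print(find_inflectives("the #SPD Bauwurst ... * gag * I feel sick now!"))
print(find_inflectives("<grins> na klar *zwinker*"))
```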

Ellipses

Ellipses, i.e. omissions at the syntactic level, occur frequently in Internet communication. Economy of language and saving time play an important role here.

E.g.:

totally sweet though

(justin, Twitter, June 6th 2015, 12:04 pm)

Leetspeak

Leetspeak is a kind of character code in which individual letters or words are replaced by similar-looking digits (e.g. L3375P33k = Leetspeak). It originated in the computer scene, where it served as a kind of secret language that is much harder to decipher. Young users in particular use elements of leetspeak, replacing individual graphemes or syllables with digits, as in n8 (German Nacht, "night") or w8 (wait).
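The substitution principle can be reversed to make leetspeak searchable. The following Python sketch uses a hypothetical, deliberately small substitution table: real leetspeak is far more varied and ambiguous (1 may stand for l or i). Whole-token rules are applied before digit-by-digit decoding, because in forms like n8 a digit stands for a spoken syllable rather than a letter. The names `CHAR_MAP`, `TOKEN_MAP` and `decode_leet` are illustrative.

```python
import re

# Hypothetical, deliberately small substitution table; real leetspeak varies widely
# and is ambiguous ("1" can stand for "l" or "i", "4" for "a" or English "for").
CHAR_MAP = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t"})

# Whole-token rules where a digit stands for a syllable read aloud
# (German "acht" in n8 = Nacht, English "eight" in w8 = wait)
TOKEN_MAP = {"n8": "nacht", "w8": "wait"}


def decode_leet(text):
    """Map leetspeak tokens back to letters: token rules first, then digit by digit."""
    def substitute(match):
        token = match.group(0)
        if token.lower() in TOKEN_MAP:
            return TOKEN_MAP[token.lower()]
        return token.translate(CHAR_MAP)
    return re.sub(r"\w+", substitute, text)


print(decode_leet("1337 h4ck3r, gute n8"))
```

Applied blindly, the digit table also rewrites genuine numbers (decode_leet("2015") returns "2ols"), so in practice it would be restricted to tokens already identified as leetspeak.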

Hashtags

A hashtag is a word or string of characters preceded by the hash (pound) sign #. Hashtags function as meta tags: they serve as keywords within a body of text or as image and video captions, or they are placed before or after a body of text. Services such as Facebook, Instagram and Twitter use hashtags to make it easier to search for terms within the respective network. The keywords range from the names of individuals and events to slogans that originate in the advertising industry or within the user community. They often refer to (media) events. Hashtags may consist of letters, digits or a combination of both.

However, hashtags are used not only as keywords; they can fulfill very different functions. By adding hashtags under pictures on Instagram, for example, users can describe what the photo shows in an extremely economical way. When the hashtag is searched, the picture appears together with the photos of all other users who have published a picture with the same hashtag, so that the poster can be associated with a particular interest group (#lovewins). On Twitter, a hashtag is usually assigned to a specific topic of discussion. Everyone who comments on that topic uses the established hashtag in their tweets, which makes their comments visible to everyone on the platform who is interested in the topic. Users can also refer directly to previously published short messages in this way (#jesuischarlie).
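How hashtags make posts searchable can be illustrated with an inverted index that maps each hashtag to the posts containing it. The Python sketch below assumes a simplified hashtag pattern (# followed by letters, digits or underscores) and case-insensitive matching; real platforms apply their own, more detailed rules. The post IDs and texts are invented for illustration.

```python
import re
from collections import defaultdict

# Simplified pattern (assumption): "#" followed by letters, digits or underscores.
HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(text: str) -> list[str]:
    """Return the hashtags of a post in lower case, so #LoveWins matches #lovewins."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def build_hashtag_index(posts: dict[str, str]) -> dict[str, list[str]]:
    """Map each hashtag to the IDs of all posts that use it (an inverted index)."""
    index = defaultdict(list)
    for post_id, text in posts.items():
        for tag in dict.fromkeys(extract_hashtags(text)):  # each tag once per post
            index[tag].append(post_id)
    return dict(index)


posts = {
    "p1": "Proud today #LoveWins",
    "p2": "#lovewins #pride",
    "p3": "#JeSuisCharlie",
}
index = build_hashtag_index(posts)
print(index["lovewins"])  # ['p1', 'p2']
```

Searching for a hashtag then amounts to a single dictionary lookup, which returns every post that carries it, regardless of who published it.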

Memes

An internet meme is the humorous or sarcastic reaction of the internet community to a (media) event. The visual motif of a meme can be a photo, a work of art or a film, combined with text. In terms of content, the text relates both to the image and, in the broadest sense, to an event that is referred to sarcastically. Users can understand the image and the text only in combination with one another.

Practice-relevant and interdisciplinary aspects of Internet linguistics

Since the IL is an interface discipline and a subfield of applied linguistics, practice-relevant aspects are central to its research. In many areas, the IL's field of research comprises more than questions and results motivated by linguistic interest. For example, a media-science study of the World Wide Web raises questions that can be answered by examining language on the Internet. In this way, the debate about the quality of journalism on the Internet (for example in expert discussions) can also be approached with methods of language analysis.

Social and behavioral sciences can also draw conclusions from analyzing linguistic communication on the Internet; research on cyberbullying is one example.

The linguistic analysis of online communication can also yield insights for other areas of society. One conceivable application concerns how binding linguistic statements are, for example the legal force of contractual commitments, or the question of when online terms and conditions are clear and understandable as required by law.

Because the approach is interdisciplinary, the individual sciences involved cannot be clearly separated. Research on cyberbullying, for example, can use methods from the social sciences, psychology or media theory alongside linguistic ones. The research output of the individual disciplines, which is in turn taken up by other fields, therefore cannot be clearly delimited. What matters is rather an interlocking of methods and results that is practical and oriented towards the research object rather than towards particular disciplines. As in other areas concerned with interpersonal processes, the examination of language can be the starting point for a wide range of topics.

Internet linguistics and data processing

With the development of Web 3.0 and progress in the technical processing and production of language, interfaces arise between linguistic and technical disciplines, both through shared methods and through the adoption of each other's findings.

On the one hand, this concerns the handling of language by information technology, i.e. the field of computational linguistics. On the other hand, language can be viewed as an information carrier that contains information in the form of unstructured data. The technical process of extracting data from natural language is called information extraction. This technology-based handling of language can be divided into automatic language processing and automatic language production.
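A minimal illustration of information extraction: the Python sketch below turns sentences of one fixed, hypothetical form ("X is the capital of Y") into structured (subject, relation, object) triples. The pattern and the relation name capital_of are assumptions made for the example; real information extraction systems rely on many patterns or on statistical models.

```python
import re

# One hand-written, hypothetical pattern for a single relation.
CAPITAL_OF = re.compile(r"\b([A-Z]\w+) is the capital of ([A-Z]\w+)")


def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Turn matching sentences into (subject, relation, object) triples."""
    return [(subj, "capital_of", obj) for subj, obj in CAPITAL_OF.findall(text)]


text = "Berlin is the capital of Germany. Paris is the capital of France."
print(extract_triples(text))
# [('Berlin', 'capital_of', 'Germany'), ('Paris', 'capital_of', 'France')]
```

The unstructured sentence has become structured data that can be stored, queried and compared with other sources.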

Automatic language processing

Automatic language processing plays an important role in human-computer interaction: natural-language input makes it easier to operate a machine, especially a computer. The conditions for this are met above all in interactions with networked devices, which is why this field overlaps with the IL. Examples of such services are the Siri software, Google Now and semantic search engines.

Information extraction is also used to obtain data that can be processed digitally. These data must be structured on the basis of corpora, which generally consist of online texts, so this area also overlaps with the IL. Wikidata and Google's Knowledge Vault are examples of such knowledge bases.

Search engine ranking is a process closely tied to the language published and processed on the web. Text producers try to use language in such a way that their own content is placed higher in the search engine ranking. This field is called search engine optimization. The aim is for the algorithm that automatically assesses the content to classify it favorably. So far, hyperlinks (as evaluated by the PageRank algorithm) have been the most relevant factor. One research area of the IL is the study of language produced under these constraints and of its interaction with language used outside the Internet.
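The core idea of PageRank, that a page is important if important pages link to it, can be sketched as a power iteration. This is a simplified textbook version in Python, not the ranking that Google uses today. The damping factor of 0.85 is the value commonly cited for the original algorithm, and the link graph in the example is invented.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration; links maps a page to the pages it links to."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a base share (the "random jump" part of the model).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this example, page c receives links from both a and b and therefore ends up with the highest score, while b, which no page links to, keeps only the base share.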

Advances in automatic language processing make newer ranking methods possible. In February 2015, for example, Google researchers presented a method for evaluating websites on the basis of their trustworthiness. The factual claims in the text to be evaluated are extracted by information extraction and compared with a knowledge base, which was itself built from a web corpus by information extraction. The degree of agreement between the extracted information and the knowledge base determines the level of trustworthiness.
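The comparison step can be illustrated with a deliberately naive score: the share of extracted facts that agree with a knowledge base. This Python sketch is a strong simplification under stated assumptions; the method described in the cited Google paper (Knowledge-Based Trust) uses a probabilistic model that also accounts for extraction errors. The function name and the triples in the example are invented.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)


def trust_score(extracted: list[Triple], knowledge_base: list[Triple]) -> Optional[float]:
    """Share of checkable extracted facts that agree with the knowledge base.

    A fact counts as checkable if the knowledge base holds any value for its
    (subject, predicate) pair. Returns None if nothing can be checked.
    """
    known = {}
    for subj, pred, obj in knowledge_base:
        known.setdefault((subj, pred), set()).add(obj)
    checkable = [t for t in extracted if (t[0], t[1]) in known]
    if not checkable:
        return None
    agreeing = sum(1 for subj, pred, obj in checkable if obj in known[(subj, pred)])
    return agreeing / len(checkable)


kb = [("Berlin", "capital_of", "Germany"), ("Paris", "capital_of", "France")]
page = [
    ("Berlin", "capital_of", "Germany"),  # agrees with the knowledge base
    ("Paris", "capital_of", "Italy"),     # contradicts it
    ("Rome", "capital_of", "Italy"),      # cannot be checked
]
print(trust_score(page, kb))  # 0.5
```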

Example of mobile automatic translation of written texts: Word Lens

Machine translation services play a special role. Because texts are available internationally and in digital form, and because services from Google or Facebook, for example, translate web content automatically and immediately, the possibility of translation already plays a role when texts are written (translation-friendly writing). Research fields for the IL are, on the one hand, the formal properties of expressions designed in this way and, on the other hand, the influence on language in general. This also includes the use of mobile Internet devices, which make it possible to digitize any kind of utterance and thus bring it into a potential context of automatic language processing and translation.

Examples of this effect are apps such as Word Lens or the mobile version of Google Translate, which can translate written and spoken utterances directly via the camera and microphone.

As a result, some of the texts available on the web are machine-translated and thus products of automatic language production.

Automatic language production

Automatic language production draws on findings from formal linguistics. From an empirical perspective, it is particularly interesting how widespread such texts are, what properties they have and what role they play in human communication.

Examples of such texts are automatic translations, automatically generated press releases and texts produced by autocompletion or autocorrection.

Number of articles in the Waray-Waray Wikipedia over time

The relevance of such texts for communication processes can be demonstrated with various concrete examples. The Wikipedia edition in Waray-Waray, a language of the Philippines, contains around one million articles but has only 88 active users. The reason is that articles were automatically translated from other Wikipedia editions: several programmers rapidly increased the number of articles using bots that perform automatic translations.

Machine-generated press releases have sparked discussion, especially in the USA. In California, for example, the Quakebot software publishes reports on earthquakes in the form of press releases. Another example is the Wikipedia Live Monitor, which automatically generates news reports from Wikipedia updates.
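Automatically generated reports of this kind are often produced by filling a text template with structured data. The Python sketch below is a hypothetical template in that spirit, not Quakebot's actual code or wording. The verbal magnitude classes are a simplified version of commonly used descriptors, and the place name Exampletown and all values are invented.

```python
def magnitude_label(magnitude: float) -> str:
    """Rough verbal class for a magnitude (simplified thresholds, an assumption)."""
    if magnitude < 4.0:
        return "minor"
    if magnitude < 5.0:
        return "light"
    if magnitude < 6.0:
        return "moderate"
    if magnitude < 7.0:
        return "strong"
    return "major"


def earthquake_report(magnitude: float, place: str, time: str, depth_km: float) -> str:
    """Fill a fixed sentence template with values from a data feed."""
    return (
        f"A {magnitude_label(magnitude)} earthquake of magnitude {magnitude:.1f} "
        f"struck near {place} at {time}, at a depth of {depth_km:.1f} km. "
        "This report was generated automatically from seismological data."
    )


print(earthquake_report(4.7, "Exampletown", "6:25 a.m.", 8.0))
```

Because every report follows the same template, such texts share recognizable formal properties, which makes them an object of study for the IL in their own right.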

For studying communication processes in the IL, not only the language products created but also the follow-up discourse and the effects on everyday language are of interest. In SMS and online communication, for example, follow-up communication about wrongly autocorrected utterances can be observed, that is, communication that would otherwise not take place. In addition, the question arises of how natural language develops given that it can be processed and produced by artificial intelligence.
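How such miscorrections can arise is illustrated by a naive autocorrection that replaces every unknown word with the most similar word in a fixed vocabulary. The Python sketch below uses difflib.get_close_matches from the standard library; the vocabulary, the similarity cutoff of 0.6 (the library's default) and the example sentences are assumptions, and real keyboard software uses far more elaborate models.

```python
import difflib


def autocorrect(sentence: str, vocabulary: list[str], cutoff: float = 0.6) -> str:
    """Replace each word missing from the vocabulary with its closest match, if any.

    Names, slang or new coinages that the vocabulary lacks may be "corrected"
    into words the writer never meant.
    """
    corrected = []
    for word in sentence.split():
        if word in vocabulary:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


print(autocorrect("hello wrld", ["hello", "world"]))                      # hello world
print(autocorrect("call mike tomorrow", ["call", "mice", "tomorrow"]))    # call mice tomorrow
```

The second call shows the problem: the name mike is missing from the vocabulary and is turned into mice, the kind of error that prompts a follow-up message.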

Differences in the methodology of linguistics and data processing

With regard to methodology and focus, computer science and linguistics often pursue different approaches, which meet in the IL as an interface discipline. Linguistics examines language, its internal relationships and its structure. Both individual utterances and social aspects are therefore considered relevant, and the focus is on the roles of communicator and recipient. In data processing, the focus is on information that is explicitly encoded in language. Because of the amount of data handled automatically, individual cases fade into the background. If language-internal aspects are considered at all, the relevant ones concern the formal structure of language: in some cases, for example in the minimally supervised learning approach, the focus is only on how certain arguments are named in sentences, without taking the syntactic or semantic dimension of the rest of the sentence into account.

It remains an open question to what extent aspects such as pragmatics and information structure can be taken into account in automatic language processing in the future.
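The narrow focus of minimally supervised approaches can be illustrated with a strongly simplified bootstrapping loop in Python. Starting from one known pair of arguments (a seed), it takes the literal text between them as a pattern and then harvests further pairs that the same text connects. It deliberately ignores the syntax and semantics of the rest of the sentence, which is the limitation described above. The sketch is hypothetical and much simpler than the system in the cited research; the seed and sentences are invented.

```python
import re


def learn_patterns(sentences: list[str], pairs: set[tuple[str, str]]) -> set[str]:
    """Take the literal text between a known subject and object as a pattern."""
    patterns = set()
    for sentence in sentences:
        for subj, obj in pairs:
            start = sentence.find(subj)
            end = sentence.find(obj)
            if start != -1 and end > start + len(subj):
                patterns.add(sentence[start + len(subj):end])
    return patterns


def apply_patterns(sentences: list[str], patterns: set[str]) -> set[tuple[str, str]]:
    """Find new pairs of capitalized words connected by one of the learned patterns."""
    found = set()
    for pattern in patterns:
        regex = re.compile(r"([A-Z]\w+)" + re.escape(pattern) + r"([A-Z]\w+)")
        for sentence in sentences:
            found.update(regex.findall(sentence))
    return found


def bootstrap(sentences: list[str], seeds: set[tuple[str, str]],
              rounds: int = 2) -> set[tuple[str, str]]:
    """Alternate between learning patterns and harvesting new argument pairs."""
    known = set(seeds)
    for _ in range(rounds):
        known |= apply_patterns(sentences, learn_patterns(sentences, known))
    return known


sentences = [
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain.",
    "Rome lies in Italy.",
]
print(sorted(bootstrap(sentences, {("Berlin", "Germany")})))
# [('Berlin', 'Germany'), ('Madrid', 'Spain')]
```

The pair (Rome, Italy) is missed because it is expressed with different wording, which shows how much such a procedure depends on the surface form of the sentence.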

Literature

M. Beißwenger, A. Storrer: Corpora of Computer-Mediated Communication. In: A. Lüdeling, M. Kytö (Eds.): Corpus Linguistics. An International Handbook. De Gruyter, Berlin 2008, ISBN 978-3-11-020733-0, pp. 292–308.

K. Brinker, S. F. Sager: Linguistic Conversation Analysis. An Introduction. 4th revised and supplemented edition. Erich Schmidt Verlag, Berlin 2006, ISBN 3-503-07981-5.

N. Bubenhofer: Language usage patterns. Corpus linguistics as a method of discourse and cultural analysis. 2009, ISBN 978-3-11-021584-7.

D. Crystal: Language and the Internet. Cambridge University Press, Cambridge 2006, ISBN 0-521-86859-9.

C. Dürscheid, K. Frick: Keyboard-to-screen communication yesterday and today: SMS and WhatsApp in comparison. In: A. Mathias, J. Runkehl, T. Siever (Eds.): Languages? Diversity! Language and communication in society and in the media. An online commemorative publication for the anniversary of Peter Schlobinski. In: Networx, No. 64, 2014, ISSN 1619-1021.

M. Gatto: Web as Corpus. Theory and Practice. Bloomsbury, London 2014, ISBN 978-1-4411-5098-1.

K. Marx, G. Weidacher: Internet Linguistics. A textbook and workbook. Narr Verlag, Tübingen 2014, ISBN 978-3-8233-6809-0.

A. Mathias, J. Runkehl, T. Siever (Eds.): Languages? Diversity! Language and communication in society and in the media. An online commemorative publication for the anniversary of Peter Schlobinski. In: Networx, No. 64, 2014, ISSN 1619-1021.

F. Kessler: Instant Messaging. A new form of interpersonal communication. In: Networx, No. 52, 2008, ISSN 1619-1021.

P. Koch, W. Oesterreicher: Writing and language. In: H. Günther, O. Ludwig (Eds.): Writing and Its Use. An Interdisciplinary Handbook of International Research (HSK 10.1). De Gruyter, Berlin / New York 1994, ISBN 978-3-11-011129-3, pp. 587–604.

M. Kresic: Communication theory and Internet. In: Networx, No. 15, 2000.

O. Rosenbaum: Chat slang. Lexicon of Internet language: understanding and using over 3000 terms. Hanser, Munich 1996, ISBN 3-446-18868-1.

P. Schlobinski et al.: Simsen. A pilot study on linguistic and communicative aspects in SMS communication. In: Networx, No. 22, 2011.

T. Siever: Digital world. Communicative consequences and consequences of communication. In: A. Mathias, J. Runkehl, T. Siever (Eds.): Languages? Diversity! Language and communication in society and in the media. An online commemorative publication for the anniversary of Peter Schlobinski. In: Networx, No. 64, 2014, pp. 197–234, ISSN 1619-1021.

A. Storrer: About the effects of the Internet on our language. In: H. Burda, M. Döpfner, B. Hombach, J. Rüttgers (Eds.): 2020: thoughts on the future of the Internet. Klartext Verlag, Essen 2010, ISBN 3-8375-0376-3, pp. 219–224.

V. Thaler: Orality, written form, synchronicity. An analysis of old and new concepts to classify new forms of communication. In: V. Ágel, H. Feilke, A. Linke (Eds.): Zeitschrift für germanistische Linguistik, 35, 2007, pp. 147–182. De Gruyter, Berlin / Boston.

M. Zappavigna: Discourse of Twitter and Social Media. How we use language to create affiliation on the web. Bloomsbury, London 2012, ISBN 978-1-4411-4186-6.

Web links

A reduced redundancy USENET corpus (2005–2011) (English)
Birmingham blog Corpus (English)
COSMAS II - Corpus Search, Management and Analysis System of the Institute for German Language
Dortmund chat corpus from the Technical University of Dortmund
Vocabulary portal of the University of Leipzig
Digital dictionary of the German language from the Berlin-Brandenburg Academy of Sciences
Limas - corpus of the University of Duisburg-Essen
TIGER Corpus of the University of Stuttgart
wacky.sslmit.unibo.it
corpus.byu.edu (English)
The Corpus of Contemporary American English (English)
British National Corpus (English)
WebCorp live (English) search engine for linguistics
Corpora search engine from the University of Leeds

Projects, tools

www.mediensprache.net
www.bubenhofer.com
semtracks research group
socialmedialinguist (English)
Text Encoding Initiative (English)
BootCaT: Simple utilities to bootstrap corpora and terms from the Web (English)
Technical or qualitative problems with search engines

References

Konstanze Marx, Georg Weidacher: Internet Linguistics. A textbook and workbook. Narr Verlag, Tübingen 2014, ISBN 978-3-8233-6809-0.

E. Cölfen, H. Cölfen, U. Schmitz: Linguistics on the Internet: The book on the Net, with CD-ROM. Westdeutscher Verlag, Opladen 1997, ISBN 3-531-12892-2.

M. Gatto: Web as Corpus. Theory and Practice. 2014.

Expansion and maintenance of the corpora of written contemporary language: The German Reference Corpus (DeReKo). Retrieved July 12, 2015.

Resources: Corpora. dwds.de; retrieved July 12, 2015.

A reduced redundancy USENET corpus (2005–2011). psych.ualberta.ca (English); retrieved July 12, 2015.

TU Dortmund: Dortmund chat corpus. Retrieved July 12, 2015.

Torsten Siever: Media analysis: corpora at mediensprache.net. Retrieved July 12, 2015.

wacky.sslmit.unibo.it

sms.linguistik.uzh.ch

whatsup-deutschland.de

whatsup-switzerland.ch (Memento of the original from January 2, 2015 in the Internet Archive)

Noah Bubenhofer: Introduction to Corpus Linguistics: Practical Basics and Tools. bubenhofer.com, retrieved July 19, 2015.

M. Beißwenger, A. Storrer: Corpora of Computer-Mediated Communication. 2008.

M. Zappavigna: Discourse of Twitter and Social Media. 2012.

tei-c.org (Memento of the original from June 12, 2015 in the Internet Archive)

bootcat.sslmit.unibo.it

webcorp.org.uk

D. Gysin et al.: Language and People in Web 2.0: Linguistic Perspectives on YouTube, SchülerVZ & Co. 2012.

Orthmann, 2004.

Britta Juska-Bacher, Chris Biemann, Uwe Quasthoff: Web-based linguistic research: possibilities and limitations when dealing with mass data. In: Linguistik online, Vol. 61, No. 4, September 1, 2013, pp. 7–29, doi:10.13092/lo.61.1274 (bop.unibe.ch, retrieved April 13, 2020).

R. Porst: Questionnaire. A workbook. VS Verlag für Sozialwissenschaften, Wiesbaden 2009, ISBN 978-3-531-16435-9.

soscisurvey.de

lamapoll.de (Memento of the original from July 31, 2015 in the Internet Archive)

Technical or qualitative problems with search engines. suchmaschinenkompetenz.de, retrieved July 19, 2015.

T. Siever: Digital world. Communicative consequences and consequences of communication. In: A. Mathias, J. Runkehl, T. Siever (Eds.): Languages? Diversity! Language and communication in society and in the media. An online commemorative publication for the anniversary of Peter Schlobinski. In: Networx, No. 64, 2014, pp. 197–234.

N. Döring: Psychological consequences of Internet use. In: The citizen in the state. Politics and the Internet, issue 04/2014.

Cf. H. Hanekop (2010): Mobile Internet and local space. Everyday life between local presence and "Always Online". In: Die alte Stadt, issue 02/2010.

S. Pöschl, N. Döring: Access anytime, anywhere, with anyone? Determinants of mobile accessibility in cell phone communication: a review. In: K. Marx, M. Schwarz-Friesel (Eds.): Language and communication in the technical age. How much Internet (v) can our society endure? De Gruyter, Berlin 2012.

P. Koch, W. Oesterreicher: Writing and language. In: H. Günther, O. Ludwig (Eds.): Writing and Its Use. An Interdisciplinary Handbook of International Research (HSK 10.1). De Gruyter, Berlin / New York 1994, ISBN 978-3-11-011129-3, pp. 587–604.

V. Thaler: Orality, written form, synchronicity. An analysis of old and new concepts to classify new forms of communication. In: V. Ágel, H. Feilke, A. Linke (Eds.): Zeitschrift für germanistische Linguistik, 35, 2007, pp. 147–182. De Gruyter, Berlin / Boston.

A. Storrer: About the effects of the Internet on our language. In: H. Burda, M. Döpfner, B. Hombach, J. Rüttgers (Eds.): 2020: thoughts on the future of the Internet. Klartext Verlag, Essen 2010, ISBN 3-8375-0376-3, pp. 219–224.

K. Brinker, S. F. Sager: Linguistic conversation analysis. An introduction. 4th revised and supplemented edition. Erich Schmidt Verlag, Berlin 2006, ISBN 3-503-07981-5.

Arne Ziegler: Text structures of internet-based communication. Do we need media text linguistics? In: Michael Beißwenger, Ludger Hoffmann, Angelika Storrer (Eds.): Internet-based communication. Osnabrück contributions to language theory. OBST, Duisburg 2004, ISBN 3-924110-68-9.

B. Sandig: Text as a prototypical concept. In: M. Mangasser-Wahl (Ed.): Prototype theory in linguistics. Application examples, method reflection, perspectives. Stauffenburg, 2000, ISBN 3-86057-706-9.

R. H. Jones: Technology and sites of display. In: C. Jewitt (Ed.): The Routledge Handbook of Multimodal Analysis. 2009, ISBN 978-0-415-66777-7.

M. Kresic: Communication theory and Internet. In: Networx, No. 15, 2000.

P. Schlobinski et al.: Simsen. A pilot study on linguistic and communicative aspects in SMS communication. In: Networx, No. 22, 2011.

T. Siever: Digital world. Communicative consequences and consequences of communication. In: A. Mathias, J. Runkehl, T. Siever (Eds.): Languages? Diversity! Language and communication in society and in the media. An online commemorative publication for the anniversary of Peter Schlobinski. In: Networx, No. 64, 2014, pp. 197–234, ISSN 1619-1021.

D. Crystal: Language and the Internet. Cambridge University Press, Cambridge 2006, ISBN 0-521-86859-9.

F. Kessler: Instant Messaging. A new form of interpersonal communication. In: Networx, No. 52, 2008, ISSN 1619-1021.

Siever et al., 1998, p. 99.

Kessler, 2008, p. 28.

M. Zappavigna: Discourse of Twitter and Social Media. How we use language to create affiliation on the web. Bloomsbury, London 2012, ISBN 978-1-4411-4186-6.

zeit.de

Quality problems in journalism and their causes. bundestag.de, February 23, 2011; retrieved July 19, 2015.

Why cyberbullying is an issue for linguists. (Memento of the original from July 15, 2015 in the Internet Archive) University of Graz; retrieved July 19, 2015.

The corresponding BGH decision at DeJure: BGH, February 23, 2005, IV ZR 273/03; retrieved July 19, 2015.

Harald Brennecke: Violation of the transparency requirement: clauses in terms and conditions must be clear and understandable. Brennecke.pro; retrieved July 19, 2015.

Vanishing river gorge shows geology in fast forward. New Scientist, August 20, 2014; retrieved July 19, 2015.

Xin Luna Dong et al.: Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources. February 12, 2015 (English), arXiv:1502.03519v1.

Waray-Waray Wikipedia. stats.wikimedia.org; retrieved July 19, 2015.

Nicholas Diakopoulos: Bots on the Beat: How can we instill journalistic ethics in robot reporters? Slate, April 2, 2014 (English); retrieved July 19, 2015.

MHO: Quakebot writes first report about the earthquake in Los Angeles. Heise online, March 18, 2014; retrieved July 19, 2015.

Bea Francis: Twitter who? New app turns Wikipedia into a real time news source. digitaltrends.com, April 16, 2013 (English); retrieved July 19, 2015.

And then it will rain dogs and cats. Le Monde diplomatique, January 8, 2015; retrieved July 19, 2015.

Hans Uszkoreit, Feiyu Xu, Hong Li: Analysis and Improvement of Minimally Supervised Machine Learning for Relation Extraction. (Memento of the original from July 21, 2015 in the Internet Archive) (PDF) German Research Center for Artificial Intelligence; retrieved July 19, 2015.