Dialogue act

A dialogue act, also called a conversational move, describes the function of an utterance in a dialogue with respect to adjacency pairs of utterances between two or more dialogue partners on the pragmatic, semantic and syntactic levels.

Use of the term

In linguistics there is no clear definition of the term "dialogue act", so it is used in several senses:

A dialogue act is a loose designation for speech acts in the context of a dialogue.
A dialogue act is a combination of speech acts with semantic criteria of the utterances.
Dialogue acts are assigned an internal structure belonging to one or more dialogue or communicative functions.

Speech act vs. Dialogue act

A speech act refers to a single utterance. It is a goal-directed intention of a speaker in a dialogue and describes a communication unit that has one or more speech acts.

A dialogue act, on the other hand, also denotes an utterance in a dialogue, but in contrast to the speech act, the preceding or following utterances of a conversation partner can also be taken into account in order to determine the function of the utterance with regard to its pragmatic, semantic and syntactic meaning in the dialogue. The focus is on understanding the context of the dialogue, not just the individual utterances.

Basics

The motivation for modeling dialogue acts is that spoken conversations can be analyzed and the individual utterances labeled with dialogue acts. This enables utterances to be understood, interpreted and answered in a context-appropriate manner, which is a great advantage especially on the pragmatic level. By determining the speaker's underlying intention and by modeling the dialogue acts, natural language can be understood in dialogue. However, a single utterance can combine several functions; this is called a multifunctional utterance. The assignment of a dialogue function is therefore not always unambiguous.

Example: multifunctionality of the utterance "I'm coming tonight."

Dialogue acts: promise, informative statement (information)

Dialogue acts are accordingly functional units that can change the context of the dialogue. In addition, human-machine dialogue can be simplified and implemented with the help of dialogue act analysis.

Dialogue act aspects

Three aspects of a dialogue act determine the contextual meaning of an utterance:

Form of utterance: determines the context of what is spoken or written that is evoked by the dialogue act.
Semantic content: is of particular importance for the new context that arises once the dialogue act has been performed. This new meaning need not have existed before the dialogue act.
Communicative function / dialogue function: determines the significance of the semantic content.

Example utterance: "Is it raining?"

Dialogue function: YES/NO QUESTION
Semantic content: current weather conditions

$\Rightarrow$ The speaker wants to know whether it is raining.
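
The three aspects can be pictured as fields of a small record. The following is only an illustrative sketch: the class name `DialogueAct`, its field names, the label strings and the semantic content given for the second example are assumptions made here, not part of any annotation standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DialogueAct:
    """Hypothetical record for one utterance and its dialogue act aspects."""
    utterance: str          # form of the utterance (surface text)
    semantic_content: str   # what the utterance is about
    functions: frozenset = field(default_factory=frozenset)  # communicative function(s)

    def is_multifunctional(self) -> bool:
        # An utterance is multifunctional when it carries more than one function.
        return len(self.functions) > 1


# The two example utterances from this article, with illustrative labels.
rain = DialogueAct("Is it raining?", "current weather conditions",
                   frozenset({"YES/NO QUESTION"}))
tonight = DialogueAct("I'm coming tonight.", "speaker's arrival tonight",
                      frozenset({"PROMISE", "INFORM"}))

print(rain.is_multifunctional())     # False
print(tonight.is_multifunctional())  # True
```

Storing the functions as a set rather than a single label reflects the point that one utterance may carry several dialogue functions at once.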

Task-oriented vs. non-task-oriented dialogue acts

A distinction can be made between two types of dialogue acts, the task-oriented dialogue act and the non-task-oriented dialogue act.

Task-oriented dialogue acts aim solely at the completion of a task by two or more interlocutors. The participants try to find a way, through the conversation, to solve a certain task or achieve a certain goal, such as transporting oranges from A to B.

Non-task-oriented dialogue acts are often referred to as casual conversational speech. They make up a purely informal conversation between the interlocutors, for example about cars.

DAMSL - Dialog Act Markup in Several Layers

DAMSL is a dialogue annotation scheme developed by the Multiparty Discourse Group at the Discourse Research Initiative (DRI) meeting in Pennsylvania in 1996. This annotation scheme marks utterance characteristics that describe the role of utterances and their relationships with one another. It is designed to analyze task-oriented dialogues between a speaker and a listener. The speaker is the person who initiates the dialogue and produces the utterances, while the listener is the person who reacts to them.

Utterances express the speaker's intention and thus carry a certain content. Since contents and intentions can be of different kinds, an utterance can be described in more detail on several levels. These levels are grouped into the following main categories; not every utterance has to serve all of them:

Communicative status: records whether an utterance can be interpreted and whether it was successfully completed. Speakers can, for example, make mistakes, change the content midway, break off the utterance or talk to themselves.
Information level: represents the semantic content of an utterance. This can include, for example, a request for action or a sequence of actions, attention, understanding or non-understanding of a statement, opening or closing a conversation, or questions to the speaker.
Forward-looking function: indicates how the utterance influences the beliefs and actions of the other person.
Backward-looking function: describes how the utterance relates to the preceding one.

Utterances have certain properties that relate to their structure and content and can therefore provide information about their function. Most of the time, utterances have complex functions that can be characterized by the course of the dialogue and the purpose behind each utterance.
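
To make the layered structure concrete, the four levels can be sketched as optional fields of one annotation record, since not every utterance has to serve every level. This is a hypothetical container for illustration only: the class `DamslAnnotation`, its field names and the method `annotated_layers` are assumptions and do not reproduce the official DAMSL file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DamslAnnotation:
    """Hypothetical record holding the four DAMSL levels for one utterance."""
    speaker: str
    text: str
    communicative_status: Optional[str] = None
    information_level: Optional[str] = None
    forward_looking: Optional[str] = None
    backward_looking: Optional[str] = None

    def annotated_layers(self):
        """Return the names of the levels that carry a tag for this utterance."""
        layers = {
            "communicative_status": self.communicative_status,
            "information_level": self.information_level,
            "forward_looking": self.forward_looking,
            "backward_looking": self.backward_looking,
        }
        return [name for name, tag in layers.items() if tag is not None]


# Tag strings such as "Action-Directive" and "Accept" are used as plain labels.
a = DamslAnnotation("A", "Close the door.", forward_looking="Action-Directive")
b = DamslAnnotation("B", "Yes, please.", backward_looking="Accept")

print(a.annotated_layers())  # ['forward_looking']
print(b.annotated_layers())  # ['backward_looking']
```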

Forward-looking function

This function describes the effect of an utterance on the further dialogue. It is usually very difficult to interpret what effect an utterance should or can have on the listener. For this purpose, the further course of the dialogue is examined in order to determine the function of the utterance. Utterances can, for example, have simple functions, such as informative statements, or characterize calls for action in the immediate or near future.

Backward-looking function

This function describes how the interlocutor reacts to the previous utterance of a speaker. In this case, the previous dialogue act or utterance is considered and the response to it is then characterized retrospectively. In this way, the listener can accept, reject, answer or correct a previous statement. The utterance can, for example, have the function of agreeing, answering, or signaling understanding or misunderstanding.

As described at the beginning, an utterance can combine several dialogue acts at the same time. This is another reason why it is not always easy or unambiguous to assign a particular dialogue act to an utterance.

DAMSL annotation examples

A and B in the examples are the speakers, with B giving possible responses to the context given by A.

Forward-looking function examples:

Info-Request:                         A: Tell me what time it is.
Action-Directive:                     A: Close the door.
Influencing-Addressee-Future-Action:  A: How about going to Joey's Pizza?

Examples of different utterances in a specific context:

Context:      A: Would you like the book and the review?
Accept        B: Yes, please.
Accept-Part   B: I would like the book.
Maybe         B: I have to think about it first.
Reject-Part   B: I don't need the review.
Reject        B: No, thank you.
Hold          B: Do I have to pay for it?

Examples of expressive features:

Task-Info-Request:            A: What times are available?
Communication-Info-Request:   A: What did you say?

Cue model

The cue model is a technical approach for modeling the detection and interpretation of dialogue acts. The idea of the model is that an utterance has specific surface properties, which are represented with the help of different cues. A cue is a simple indicator that belongs to one of the following levels:

Lexical and syntactic cues: based on conversation-analytic traditions (e.g. wh-words and auxiliary verbs in questions)
Prosodic cues: pauses in speech, pitch (rise towards the end of the utterance → question), emphasis ...
Discourse cues: context-related.

It is assumed that the listener uses certain cues to decide how a speaker's utterance is to be interpreted. These cues represent the properties that belong to the respective dialogue act. An utterance can therefore be interpreted using various combinations of cues, each of which is associated with specific dialogue acts with a certain probability. According to the cue model, the knowledge sources used to estimate the dialogue act are the structure of the conversation, prosody, and the lexical and syntactic surface structure of the utterance. The probabilities can be estimated from the cues' frequency of occurrence in a corpus.
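
As a rough illustration of what such cues might look like in practice, the sketch below derives a few indicators for one utterance. Everything in it is an assumption made for the example: the function `extract_cues`, the word lists and the feature names are hypothetical, and the prosodic cue is simply passed in as a flag because plain text carries no pitch information.

```python
# Small, deliberately incomplete word lists (assumptions for this sketch).
WH_WORDS = {"who", "what", "when", "where", "why", "which", "how"}
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did",
               "can", "could", "will", "would", "should", "have", "has"}


def extract_cues(utterance, final_pitch_rise=False, previous_act=None):
    """Return a dictionary of simple cues for one utterance."""
    tokens = utterance.lower().rstrip("?.!").split()
    first = tokens[0] if tokens else ""
    return {
        # lexical and syntactic cues
        "starts_with_wh_word": first in WH_WORDS,
        "starts_with_auxiliary": first in AUXILIARIES,
        # prosodic cue, supplied by the caller
        "final_pitch_rise": final_pitch_rise,
        # discourse cue: the dialogue act of the preceding utterance, if known
        "previous_act": previous_act,
    }


print(extract_cues("Is it raining?", final_pitch_rise=True))
```

A classifier would then associate combinations of such cue values with dialogue acts, using probabilities estimated from an annotated corpus.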

The dialogue acts of a given utterance can be determined with the help of various machine learning methods. For example, they can be learned using hidden Markov models, neural networks or Bayesian classifiers. The procedure for classifying dialogue acts is as follows:

First, the various cue combinations for each dialogue act are learned.
The system then receives an utterance as input and uses the learned model to return the most likely dialogue act.
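
The two steps can be sketched with a very small naive Bayes classifier, one kind of Bayesian classifier, that uses only the words of an utterance as lexical cues. This is a minimal sketch under stated assumptions: the training utterances and act labels are invented toy data, the function names `train` and `classify` are hypothetical, and add-one smoothing is a choice made here for simplicity.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Step 1: count how often each word cue occurs with each dialogue act."""
    act_counts = Counter()
    cue_counts = defaultdict(Counter)
    vocabulary = set()
    for utterance, act in examples:
        act_counts[act] += 1
        for word in utterance.lower().split():
            cue_counts[act][word] += 1
            vocabulary.add(word)
    return act_counts, cue_counts, vocabulary


def classify(utterance, model):
    """Step 2: return the dialogue act with the highest log-probability."""
    act_counts, cue_counts, vocabulary = model
    total = sum(act_counts.values())
    best_act, best_score = None, -math.inf
    for act, count in act_counts.items():
        score = math.log(count / total)  # prior P(act)
        denominator = sum(cue_counts[act].values()) + len(vocabulary)
        for word in utterance.lower().split():
            # add-one smoothing so unseen words do not zero out the score
            score += math.log((cue_counts[act][word] + 1) / denominator)
        if score > best_score:
            best_act, best_score = act, score
    return best_act


# Invented toy corpus, for illustration only.
examples = [
    ("what time is it", "QUESTION"),
    ("where is the book", "QUESTION"),
    ("close the door", "ACTION-DIRECTIVE"),
    ("open the window", "ACTION-DIRECTIVE"),
    ("it is raining", "STATEMENT"),
    ("the book is here", "STATEMENT"),
]
model = train(examples)

print(classify("what is the time", model))   # QUESTION
print(classify("close the window", model))   # ACTION-DIRECTIVE
```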

Often so-called N-gram models (unigrams, bigrams, trigrams) are learned. An example of an N-gram (bigram) for the dialogue act reformulation on the lexical level looks like this:

Example utterance: "You mean"

W (“mean” | “you”): here “you” is the history of “mean”, and W is the probability that “mean” occurs given “you”.
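
A bigram probability of this kind is commonly estimated by relative frequency: the number of times "you" is followed by "mean", divided by the number of times "you" is followed by any word. The sketch below assumes that estimator; the function name `bigram_probability` and the tiny corpus are invented for illustration.

```python
def bigram_probability(corpus, history, word):
    """Relative-frequency estimate of W(word | history) from tokenized utterances."""
    history_count = 0
    bigram_count = 0
    for tokens in corpus:
        # walk over all adjacent word pairs of the utterance
        for previous, current in zip(tokens, tokens[1:]):
            if previous == history:
                history_count += 1
                if current == word:
                    bigram_count += 1
    return bigram_count / history_count if history_count else 0.0


# Invented toy corpus: "you" is followed by "mean" in 2 of 4 cases.
corpus = [
    ["you", "mean", "the", "book"],
    ["you", "mean", "tomorrow"],
    ["do", "you", "want", "it"],
    ["you", "know"],
]

print(bigram_probability(corpus, "you", "mean"))  # 0.5
```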

The individual probabilities of the respective cues are combined to estimate the overall maximum probability of the cue combinations, which determines the most likely dialogue act. To predict a dialogue act that follows another in a dialogue, N-grams over dialogue acts are used to determine the most likely dialogue act given one or more preceding dialogue acts.
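
One simple way to combine the two sources of evidence is to multiply, for each candidate act, the probability of the cues given that act by the bigram probability of the act given the preceding act, and then take the maximum. The sketch below assumes exactly this product rule; the function name `most_likely_act` and all probability values are made up for the example.

```python
def most_likely_act(cue_likelihoods, act_bigrams, previous_act):
    """Pick the act d that maximizes P(d | previous_act) * P(cues | d)."""
    scores = {
        act: act_bigrams.get((previous_act, act), 0.0) * likelihood
        for act, likelihood in cue_likelihoods.items()
    }
    return max(scores, key=scores.get), scores


# Invented numbers: P(cues | act) for the current utterance ...
cue_likelihoods = {"ANSWER": 0.2, "STATEMENT": 0.3}
# ... and dialogue act bigrams P(act | previous act).
act_bigrams = {
    ("QUESTION", "ANSWER"): 0.7,
    ("QUESTION", "STATEMENT"): 0.1,
    ("STATEMENT", "ANSWER"): 0.05,
    ("STATEMENT", "STATEMENT"): 0.6,
}

best_after_question, _ = most_likely_act(cue_likelihoods, act_bigrams, "QUESTION")
best_after_statement, _ = most_likely_act(cue_likelihoods, act_bigrams, "STATEMENT")
print(best_after_question)   # ANSWER
print(best_after_statement)  # STATEMENT
```

The same cues lead to different decisions depending on the preceding act, which is the effect the N-grams over dialogue acts are meant to capture.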

References

Harry Bunt: "Context and Dialogue Control", 1994
Dan Jurafsky: "Pragmatics and Computational Linguistics", 2005
Alexander Clark and Andrei Popescu-Belis: "Multi-level Dialogue Act Tags", 2004
Mark G. Core and James F. Allen: "Coding Dialogs with the DAMSL Annotation Scheme", 1997
James Allen and Mark Core: website for "Draft of DAMSL: Dialog Act Markup in Several Layers", 1997
Nick Webb: "Cue-Based Dialogue Act", 2010
