Gesture recognition


Gesture recognition is the automatic recognition by a computer of gestures made by humans. A branch of computer science deals with algorithms and mathematical methods for recognizing gestures and with the use of gestures for human-computer interaction. In principle, every posture and body movement can represent a gesture; however, the recognition of hand and head gestures is of greatest importance. A variant of gesture recognition is the recognition of so-called mouse gestures.

Definition

With reference to human-computer interaction, Kurtenbach and Hulteen define a gesture as follows: “A gesture is a motion of the body that contains information. Waving goodbye is a gesture. Pressing a key on a keyboard is not a gesture because the motion of a finger on its way to hitting a key is neither observed nor significant. All that matters is which key was pressed.” In contrast, Harling and Edwards forgo the requirement for movement and also understand static hand positions to be gestures. A distinction can be made between systems in which the sensors necessary for detection are located directly on the body of the user and those in which the user is observed by external sensors.

Gesture recognition is an active field of research that tries to integrate gestures into human-computer interaction. It has applications in the control of virtual environments, but also in the translation of sign languages, the remote control of robots, and musical composition.

Recognizing human gestures falls within the more general framework of pattern recognition. In this framework, systems consist of two processes: the representation process and the decision-making process. The representation process converts the raw numerical data into a form adapted to the decision-making process, which then classifies the data.

Gesture recognition systems inherit this structure and add two further processes: the acquisition process, which converts the physical gesture into numerical data, and the interpretation process, which assigns a meaning to the series of symbols resulting from the decision-making process.
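The division into acquisition, representation, decision-making and interpretation can be illustrated with a minimal sketch. The function names, the toy data and the threshold-based decision rule below are illustrative assumptions, not part of any particular system:

```python
from typing import List, Tuple

# Acquisition: convert the physical gesture into numerical data
# (here: a toy sequence of 2D fingertip positions).
def acquire() -> List[Tuple[float, float]]:
    return [(0.0, 0.0), (0.2, 0.1), (0.5, 0.1), (0.9, 0.0)]

# Representation: convert the raw data into a form suited to the decision process
# (here: the total horizontal and vertical displacement).
def represent(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    dx = samples[-1][0] - samples[0][0]
    dy = samples[-1][1] - samples[0][1]
    return dx, dy

# Decision: classify the represented data into a symbol.
def decide(features: Tuple[float, float]) -> str:
    dx, dy = features
    if abs(dx) > abs(dy):
        return "SWIPE_RIGHT" if dx > 0 else "SWIPE_LEFT"
    return "SWIPE_DOWN" if dy > 0 else "SWIPE_UP"

# Interpretation: assign a meaning (an application action) to the symbol.
ACTIONS = {"SWIPE_RIGHT": "next page", "SWIPE_LEFT": "previous page",
           "SWIPE_UP": "scroll up", "SWIPE_DOWN": "scroll down"}

def interpret(symbol: str) -> str:
    return ACTIONS.get(symbol, "no action")

if __name__ == "__main__":
    print(interpret(decide(represent(acquire()))))  # -> "next page"
```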

Hand and arm gestures are most commonly interpreted. They are typically described by four elements: hand configuration, movement, orientation and position. A rough classification can also be made by separating static gestures, which are referred to as hand postures, from dynamic gestures, which are sequences of hand postures.
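A hedged sketch of how the four elements and the static/dynamic distinction might be modelled as data types; the field names and units are assumptions chosen for illustration:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class HandPosture:
    """A static gesture: one snapshot of the hand."""
    configuration: Tuple[float, ...]          # e.g. bending angle of each finger joint
    orientation: Tuple[float, float, float]   # e.g. roll, pitch, yaw of the palm
    position: Tuple[float, float, float]      # hand position in space

@dataclass
class DynamicGesture:
    """A dynamic gesture: a time-ordered sequence of hand postures."""
    postures: List[HandPosture]

    def movement(self) -> Tuple[float, float, float]:
        # Overall movement as displacement between first and last posture.
        p0, p1 = self.postures[0].position, self.postures[-1].position
        return (p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2])
```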

Two main families of gesture detection systems can be distinguished: systems with and without aids worn on the body. In systems with body-worn aids, gestures are recorded by additional devices (sensor gloves, exoskeletons, markers) that directly measure some properties of the gesture, generally the bending angles of the various joints. In aid-free systems, the gesture is recorded from a distance by a sensor (camera, ultrasound). The main advantage of the remote approach is that it does not restrict the user: a gesture can be carried out spontaneously without any prior set-up effort. The main disadvantages are the increased complexity of the processing and the limitation of the detection area. Aid-based methods, on the other hand, are faster and more robust.

Gesture recognition with aids on the body

Most systems based on sensors worn on the body or held in the hand use acceleration or position sensors integrated into data gloves. The disadvantage of glove-based systems is that the user has to put on the glove in order to use the system.

Hand-held systems, such as the Nintendo Wii controller and the BlueWand manufactured by BeeCon, can also be used for gesture input. Both devices are held by the user and contain acceleration sensors to determine the movement of the respective device.
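As an illustration only (the threshold value, the peak count and the toy readings are assumptions), a wand- or controller-style device with a three-axis acceleration sensor could detect a simple "shake" gesture like this:

```python
import math
from typing import Iterable, Tuple

def detect_shake(samples: Iterable[Tuple[float, float, float]],
                 threshold: float = 2.5,   # acceleration magnitude in g (assumed)
                 min_peaks: int = 3) -> bool:
    """Return True if the acceleration magnitude exceeds the threshold
    at least `min_peaks` times, which we take as a shake gesture."""
    peaks = 0
    for ax, ay, az in samples:
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if magnitude > threshold:
            peaks += 1
    return peaks >= min_peaks

# Example: a short burst of strong accelerations is recognized as a shake.
readings = [(0.0, 1.0, 0.1), (3.0, 1.2, 0.0), (-3.1, 0.9, 0.2),
            (2.8, 1.1, -0.1), (0.1, 1.0, 0.0)]
print(detect_shake(readings))  # True
```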

Newer devices such as smartphones and tablet computers mainly use touchscreens that can be operated by swiping gestures. In particular, multi-touch screens can recognize several independent finger presses at the same time, so that, for example, two diagonally placed fingertips can be used to make windows larger or smaller.
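A minimal sketch of the underlying calculation (the coordinate format is an assumption): the scale factor applied to a window follows from the ratio of the current to the initial distance between the two fingertips.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def pinch_scale(start: Tuple[Point, Point], current: Tuple[Point, Point]) -> float:
    """Scale factor implied by a two-finger pinch/spread gesture."""
    def dist(a: Point, b: Point) -> float:
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return dist(*current) / dist(*start)

# Spreading the fingers apart to twice the distance doubles the window size:
print(pinch_scale(((100, 100), (200, 200)), ((50, 50), (250, 250))))  # 2.0
```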

Gesture recognition without aids on the body

Systems with external sensors are mostly camera-based. The cameras are used to take pictures of the user. There are systems with one camera as well as with several cameras; newer systems often work with 3D data obtained either from time-of-flight cameras or from so-called structured-light cameras. Camera-based processes use 2D and 3D image analysis techniques to identify the user's posture. Camera-based gesture recognition is used, for example, in games for the EyeToy, which can be connected to game consoles. A more recent approach is gesture control via stereoscopy. Its advantage is that it works without infrared light and therefore also works outdoors.
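The principle behind the stereoscopic approach can be sketched with the standard depth-from-disparity relation for a rectified camera pair; the focal length and baseline below are assumed example values, not those of any real product:

```python
def depth_from_disparity(disparity_px: float,
                         focal_length_px: float = 700.0,  # assumed focal length in pixels
                         baseline_m: float = 0.06) -> float:  # assumed camera spacing in metres
    """Depth Z = f * B / d for a rectified stereo camera pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# A hand producing a disparity of 42 pixels lies about one metre from the cameras:
print(round(depth_from_disparity(42.0), 2))  # 1.0
```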

When it comes to technical image analysis, a distinction must be made between several approaches. In one approach, a database of relevant gestures is created, each gesture derived from more than 1,000 video analyses. Recorded control gestures are then compared with the database and identified accordingly. This solution is used, for example, by Microsoft with the Xbox in conjunction with the Kinect 3D camera. The analysis can be carried out in two-dimensional space using image and video information; in three-dimensional space one speaks of volumetric computation, where bodies are represented, for example, by NURBS surfaces or polygons. Computing such 3D data in real time is still under development. The disadvantage of this database-based analysis is that comparison against the database requires a lot of computing power. Alternatively, the software works with true skeleton recognition, i.e. the body, hand and/or fingers are recognized from the camera data, mapped onto a simplified skeleton model, and assigned to predefined gestures. This solution promises a much greater variety of gestures and higher precision, but is technically much more demanding.
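A hedged sketch of the database-driven idea; the gesture templates and the use of dynamic time warping as the distance measure are illustrative assumptions, not the method of any specific product. A recorded trajectory is compared with stored templates and assigned to the closest one:

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Dynamic-time-warping distance between two 2D trajectories."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = ((a[i-1][0] - b[j-1][0]) ** 2 + (a[i-1][1] - b[j-1][1]) ** 2) ** 0.5
            cost[i][j] = d + min(cost[i-1][j], cost[i][j-1], cost[i-1][j-1])
    return cost[n][m]

def classify(trajectory: Sequence[Point],
             database: Dict[str, List[Point]]) -> str:
    """Assign the recorded gesture to the closest stored template."""
    return min(database, key=lambda name: dtw_distance(trajectory, database[name]))

templates = {
    "swipe_right": [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)],
    "swipe_up":    [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)],
}
recorded = [(0.0, 0.1), (0.4, 0.1), (0.9, 0.0)]
print(classify(recorded, templates))  # swipe_right
```

The skeleton-based alternative replaces the raw trajectory with joint angles of a simplified skeleton model, but the matching step against predefined gestures follows the same pattern.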

The aim of research and development in the coming years is to implement gesture recognition as embedded software that is platform- and camera-independent and requires little energy, so that it can also be used, for example, in cell phones, tablets or navigation systems.

In 2012, a number of commercial providers announced that they wanted to come onto the market with devices for gesture recognition that should be significantly better than the devices available at the time (especially the Kinect for the Xbox). For example, Samsung presented the Smart TV at CES 2012 in Las Vegas. Another company is LeapMotion, whose promotional video for The Leap was criticized in the community because some scenes were obviously staged. In Germany, gesture control is a topic particularly in the automotive industry, where especially stable and mobile systems are required, such as those manufactured by gestigon, who are also working on an embedded solution. 3D gesture recognition is also popular in the areas of digital signage, media technology, media art and performance. A simple way to use gesture recognition in these areas and, for example, to control other software is Kinetic Space. Other manufacturers include Omek, Softkinetic and Myestro Interactive.

Gesture types

[Image caption: The letter "J" in a Canadian sign language]

A distinction can be made between two types of gestures. With continuous gestures, there is a direct connection between the movement observed by the computer and a state in the computer; for example, a pointer can be controlled by pointing at the screen. Discrete gestures, on the other hand, are limited to sets of unique gestures, each of which is usually associated with one action. An example of discrete gestures is sign language, where each gesture is associated with a particular meaning. On touch-sensitive screens (touchscreens), simple finger movements such as pinching (pinch gesture) or spreading (spread gesture) with two fingers are common.
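The difference can be illustrated with a small sketch; the normalized coordinate convention and the gesture-to-meaning table are assumptions for illustration. A continuous pointing gesture maps directly onto a cursor state, while a discrete gesture is looked up in a fixed set of meanings:

```python
from typing import Tuple

# Continuous gesture: the observed hand position (normalized to 0..1)
# is mapped directly onto a state in the computer, here a cursor position.
def pointer_position(hand_xy: Tuple[float, float],
                     screen: Tuple[int, int] = (1920, 1080)) -> Tuple[int, int]:
    x, y = hand_xy
    return int(x * screen[0]), int(y * screen[1])

# Discrete gesture: a fixed set of unique gestures, each tied to one meaning.
SIGNS = {"thumb_up": "yes", "thumb_down": "no", "flat_hand": "stop"}

def sign_meaning(gesture: str) -> str:
    return SIGNS.get(gesture, "unknown")

print(pointer_position((0.5, 0.25)))  # (960, 270)
print(sign_meaning("flat_hand"))      # stop
```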

Recognition

During the actual recognition of gestures, the information from the sensors flows into algorithms that analyze the raw data and recognize gestures; these are pattern recognition algorithms. In order to remove noise from the input data and to reduce the amount of data, the sensor data are often preprocessed in a first step. Features are then extracted from the input data and serve as input to the classification. Hidden Markov models, artificial neural networks and other techniques, which mostly have their origin in research on artificial intelligence, are often used for this.
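A minimal sketch of such a recognition chain, assuming toy model parameters throughout: the sensor signal is smoothed, reduced to a discrete feature sequence, and then scored against one hidden Markov model per gesture with the forward algorithm; the gesture whose model explains the sequence best is chosen.

```python
import numpy as np

def smooth(signal: np.ndarray, window: int = 3) -> np.ndarray:
    """Preprocessing: moving-average filter to remove noise."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="valid")

def to_symbols(signal: np.ndarray) -> np.ndarray:
    """Feature extraction: quantize the slope into 0 = falling, 1 = flat, 2 = rising."""
    return np.digitize(np.diff(signal), bins=[-0.05, 0.05])

def forward_log_likelihood(obs: np.ndarray, pi: np.ndarray,
                           A: np.ndarray, B: np.ndarray) -> float:
    """Classification: scaled forward algorithm giving the log-likelihood of an
    observation sequence under an HMM (pi: initial state probabilities,
    A: state transition matrix, B: emission probabilities per state)."""
    alpha = pi * B[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        log_p += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return log_p

# Toy models: one HMM per gesture (illustrative parameters only).
models = {
    "raise": (np.array([0.9, 0.1]),                  # pi
              np.array([[0.8, 0.2], [0.2, 0.8]]),    # A
              np.array([[0.1, 0.2, 0.7],             # B: state 0 mostly emits "rising"
                        [0.3, 0.4, 0.3]])),
    "lower": (np.array([0.9, 0.1]),
              np.array([[0.8, 0.2], [0.2, 0.8]]),
              np.array([[0.7, 0.2, 0.1],             # B: state 0 mostly emits "falling"
                        [0.3, 0.4, 0.3]])),
}

raw = np.array([0.0, 0.1, 0.1, 0.3, 0.5, 0.6, 0.9, 1.0])  # noisy rising sensor signal
obs = to_symbols(smooth(raw))
best = max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
print(best)  # raise
```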

References

  1. Kurtenbach, G. and Hulteen, E. A.: "Gestures in Human-Computer Communication". In: The Art of Human-Computer Interface Design, pp. 309–317, 1990.
  2. Harling, P. A. and Edwards, A. D. N.: "Hand Tension as a Gesture Segmentation Cue". In: Progress in Gestural Interaction, pp. 75–88, 1997.
  3. ScienceDirect: Gesture Recognition
  4. Fuhrmann, T., Klein, M. and Odendahl, M.: "The BlueWand as Interface for Ubiquitous and Wearable Computing Environments". In: Proceedings of the European Conference on Personal Mobile Communications, pp. 91–95, 2003.
  5. Pavlovic, V. I., Sharma, R. and Huang, T. S.: "Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review". IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 677–695, 1997.
  6. Pavlovic, V. I., Sharma, R. and Huang, T. S.: "Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review". IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997.
  7. Huang, C. L. and Huang, W. Y.: "Sign Language Recognition Using Model-Based Tracking and a 3D Hopfield Neural Network". Machine Vision and Applications, vol. 10, pp. 292–307, 1998.

External links