Time Warp Edit Distance

from Wikipedia, the free encyclopedia

At Time Warp Edit Distance (TWED) is a distance measure between discrete time series . Compared to other distance measures (e.g. DTW , LCS ), TWED is a metric . In addition, the data points present in the measured time series do not necessarily have to be sampled at the same frequency , that is to say they do not necessarily have to be present at equidistant sampling times, which is, however, implicitly assumed with other distance measures. The TWED algorithm was developed in February 2009 by P.-F. Marteau published.

A typical problem when processing time series is determining the similarity of time series to one another, for example in the context of clustering or classification .

Similar time series can be recognized by humans just by looking. Humans recognize, for example, that two time series that they compare with one another can be arbitrarily similar, even if they are compressed, offset in time or have similar easily recognizable differences.

As an example of similar time series that are shifted relative to one another with respect to the time axis, consider two pedestrians. Both pedestrians cover the same distance and have the same speed. However, one pedestrian starts later than the other. While both of them cover this distance, the value of the altitude meters above sea level at which the pedestrians are should be measured at the same waypoints. After both have covered the distance, the measured values ​​of the altitude can be plotted against a timeline - this results in two time series. The time series of the pedestrian who started later is shifted compared to the time series of the other pedestrian.

As an example of similar time series, one of which is stretched in comparison to the other or the other is compressed in comparison to one, consider a pedestrian and a cyclist. Both cover the same distance, with the cyclist moving faster than the pedestrian. While they both cover this distance, the value of the altitude meters above sea level at which pedestrians or cyclists are located is measured at the same waypoints. After both have covered the distance, the measured values ​​of the altitude can be plotted against a time line - this results in two time series, whereby the time series of the pedestrian is compressed compared to the time series of the cyclist.

Further examples of time series that can be similar, apart from the time reference, are, for example, pressure curves when recording the writing pressure when a (any, but fixed) word is written. The writer can write at different speeds, but the characteristics of his handwriting effectively remain the same. The written words are therefore similar, but distorted over time in relation to the writing pressure. Examples of application can be found in the area of ​​biometrics using so-called signature pads .

A human observer would very likely recognize a great similarity between time series, but a computer not immediately. A computer is dependent on a distance measure in order to determine the distance (and thus to make a statement about a similarity, with "little distance" being associated with a high degree of similarity). This distance can be rigid or elastic. A rigid distance measure, such as the Euclidean distance or the Manhattan distance, could not ignore the differences between time series that result only from stretching or shifting with respect to the time axis - this means that along a time line, data points of the respective time series for each Point in time, assume a data point at the same point in time in the other time series for calculating the distance. If one of the time series is compressed, these rigid distance measures are not suitable for calculating the distance. For this reason, elastic spacing measures such as TWED exist. Described informally, TWED deletes data points from one of the time series in order to align them section by section with the other. The higher the cost of deleting or the further away supposedly suitable data points are (measured along the time axis), the more dissimilar are the time series. While techniques such as re-sampling can also be used to adjust time series of different lengths along the timeline , this additional workload is not required when using elastic distance measures.

TWED and other spacing measures

TWED uses components or concepts that can be found in this way or similar in other distance measures, especially in the Levenshtein distance , which, however, is not primarily intended for measuring the similarity of time series. Over the Levenshtein distance, TWED also has similarities to DTW.

Extensions and applications of TWED

GTWED kernel

Gaussian TWED is, even if the name suggests otherwise, not an "extension" of TWED, but GTWED is an application of TWED as a measure of distance in support vector machines . Here the Euclidean distance measure often used in the Gaussian kernel is replaced by TWED.

Medical applications

There are works in which it is presented how TWED can be used to compare time series recorded by pulse measurement.

Definition, pseudocode, implementation in different programming languages

definition


in which



The recursion is initialized as follows: with by convention.



Implementation in C

An implementation of the TWED algorithm in the C programming language can be downloaded from the author's homepage.

Implementation in Matlab

TWED algorithm:

function [ distance, DP ] = twed( A, timeSA, B, timeSB, lambda, nu )
% [distance, DP] = TWED( A, timeSA, B, timeSB, lambda, nu )
% Berechne Time Warp Edit Distance (TWED) für die Zeitreihen A und B
%
% A      := Werte der Zeitreihe A (e.g. [ 10 2 30 4])
% timeSA := Zeitstempel der Werte von Zeitreihe A (e.g. 1:4)
% B      := Werte der Zeitreihe B
% timeSB := Zeitstempel der Werte von Zeitreihe B
% lambda := Bestrafung für Abstände bei Löschoperationen
% nu     := Bestimmt die Elastizität; nu >=0 erforderlich für Distanzmaß.


    %Überprüfe, ob für jeden Zeitpunkt ein Zeitstempel existiert
    if length(A) ~= length(timeSA)
        warning('Die Länge von A ist ungleich der Länge timeSA')
        return
    end

    if length(B) ~= length(timeSB)
        warning('Die Länge von B ist ungleich der Länge timeSB')
        return
    end

    if nu<0
        warning('nu ist negativ')
        return
    end
    % Padding hinzufügen (Matlab unterstützt keine 0 Indizierung)
    A  = [ 0 A ];
    timeSA = [ 0 timeSA ];
    B  = [ 0 B ];
    timeSB = [ 0 timeSB ];

    % Dieser Algorithmus verwendet dynamische Programmierung
    DP = zeros(length(A),length(B));

    % Initialisiere die DP Matrix, setze die erste Zeile und Spalte auf
    % unendlich
    DP(1,:) = inf;
    DP(:,1) = inf;
    DP(1,1) = 0;

    % Die Länge der Zeitreihe A
    n = length(timeSA);
    % Die Länge der Zeitreihe B
    m = length(timeSB);
    % Berechnung der minimalen Kosten
    for i = 2:n
        for j = 2:m
            cost = Dlp(A(i), B(j));

            % Berechnen und Zwischenspeichern der Kosten der verschiedenen Operationen
            C = ones(3,1) * inf;

            % Löschen in A
            C(1) = DP(i-1,j) +  Dlp(A(i-1),A(i)) + nu * (timeSA(i) - timeSA(i-1)) + lambda;
            % Löschen in B
            C(2) = DP(i,j-1) + Dlp(B(j-1),B(j)) + nu * (timeSB(j) - timeSB(j-1)) + lambda;
            % Belassen von Datenpunkten in beiden Zeitreihen
            C(3) = DP(i-1,j-1) + Dlp(A(i),B(j)) + Dlp(A(i-1),B(j-1)) + ...
                   nu * ( abs( timeSA(i) - timeSB(j) ) + abs( timeSA(i-1) - timeSB(j-1) ) );

            % Wähle die Operation mit den minimalen Kosten und aktualisiere die DP Matrix
            DP(i,j) = min(C);
        end
    end

    % Die Distanz (die Kosten zur Anpassung der Zeitreihen aneinander) befindet sich im höchsten Matrixelement von DP
    distance = DP(n,m);

    % Funktion zur Berechnung des Euklidischen Abstands
    function [cost] = Dlp( A, B)
        cost = sqrt( sum( (A - B).^2 ,2));
    end

end

Backtracking to retrace the most cost-effective path:

function [ path ] = backtracking( DP )
% [ path ] = BACKTRACKING ( DP )
% Berechnung des kostengünstigsten Pfades
% DP := DP Matrix aus der TWED Funktion

    x = size(DP);
    i = x(1);
    j = x(2);

    % In path werden die Indizes des Pfades in umgekehrter Reihenfolge gespeichert
    path = ones(i + j, 2 ) * Inf;

    steps = 1;
    while( i ~= 1 || j ~= 1 )
        path(steps,:) = [i; j];

        C = ones(3,1) * inf;

        % Belassen von Datenpunkten in beiden Zeitreihen
        C(1) = DP(i-1,j-1);
        % Löschen in A
        C(2) = DP(i-1,j);
        % Löschen in B
        C(3) = DP(i,j-1);

        % Berechnung des Indexes mit den geringsten Kosten
        [~,idx] = min(C);

        switch idx
            case 1
                % Belassen von Datenpunkten in beiden Zeitreihen
                i = i-1;
                j = j-1;
            case 2
                % Löschen in A
                i = i-1;
                j = j;
            case 3
                % Löschen in B
                i = i;
                j = j-1;
        end
        steps = steps +1;
    end
    path(steps,:) = [i j];

    % Der Pfad wurde in umgekehrter Reihenfolge berechnet, deswegen muss er umgedreht werden
    path = path(1:steps,:);
    path = path(end:-1:1,:);

end

literature

  • P.-F. Marteau: Time Warp Edit Distance with Stiffness Adjustment for Time Series Matching . In: IEEE Transactions on Pattern Analysis and Machine Intelligence . tape 31 , no. 2 , February 2009, p. 306-318 , doi : 10.1109 / TPAMI.2008.76 .

Individual evidence

  1. Dongyu Zhang: Time Series Classification Using Support Vector Machine with Gaussian Elastic Metric Kernel . In: Pattern Recognition (ICPR), 2010 20th International Conference on . August 2010, p. 29-32 , doi : 10.1109 / ICPR.2010.16 .
  2. Lei Liu, Wangmeng Zuo, Zhang Dongyu, Naimin Li Hongzhi Zhang: Classification of pulses Wrist Blood Flow Signal Using Time Warp Edit Distance . In: ICMB 2010, LNCS 6165 . S. 137-144 , doi : 10.1007 / 978-3-642-13923-9_14 .
  3. P.-F. Marteau: Homepage on the IRISA servers . Retrieved August 26, 2014.