Sequence pattern

from Wikipedia, the free encyclopedia

A sequence pattern is the uniform sequence of elements in transactions . Finding sequence patterns is a method of data mining . To give an example, in transactions involving customer purchases, the following question is asked: “Which items are being bought one after the other?”. The sequence pattern is not to be confused with the association analysis, which asks the question: "Which items are bought together?".

For the examination of sequence patterns, the transaction database must contain not only the elements of the transaction, but also the transaction time and an association characteristic (e.g. customer number).

Principle of sequence pattern recognition

The algorithm for finding sequence patterns is structured as follows:

  1. Sorting the database
    Sorting according to the togetherness characteristic (e.g. customer number) as the primary and transaction time as the secondary key. Structure of the sequences sorted according to what they belong together
  2. Finding the frequent item sets
  3. Transformation of the database
    Only the frequent item quantities are assigned to the customers (only serves to increase efficiency).
  4. Find the sequence pattern
    Frequent item sets are combined to form sequence patterns and checked whether they reach the minimum support (analogous to finding association rules ). It must be noted that a found pattern is not contained in a longer one.

application areas

Bioinformatics : Protein Sequences in DNA Analysis. The DNA consists of four bases (A, C, G, T) and 20 amino acids. The task in many areas of bioinformatics is to find sequences of the same type as long as possible.

Web mining : sequence of visited websites. The sequence of the websites visited, which lead to a successful purchase in a shop or to a cancellation, can be used to improve the website.