Sequence pattern
A sequence pattern is the uniform sequence of elements in transactions . Finding sequence patterns is a method of data mining . To give an example, in transactions involving customer purchases, the following question is asked: “Which items are being bought one after the other?”. The sequence pattern is not to be confused with the association analysis, which asks the question: "Which items are bought together?".
For the examination of sequence patterns, the transaction database must contain not only the elements of the transaction, but also the transaction time and an association characteristic (e.g. customer number).
Principle of sequence pattern recognition
The algorithm for finding sequence patterns is structured as follows:
- Sorting the database
- Sorting according to the togetherness characteristic (e.g. customer number) as the primary and transaction time as the secondary key. Structure of the sequences sorted according to what they belong together
- Finding the frequent item sets
- Transformation of the database
- Only the frequent item quantities are assigned to the customers (only serves to increase efficiency).
- Find the sequence pattern
- Frequent item sets are combined to form sequence patterns and checked whether they reach the minimum support (analogous to finding association rules ). It must be noted that a found pattern is not contained in a longer one.
application areas
Bioinformatics : Protein Sequences in DNA Analysis. The DNA consists of four bases (A, C, G, T) and 20 amino acids. The task in many areas of bioinformatics is to find sequences of the same type as long as possible.
Web mining : sequence of visited websites. The sequence of the websites visited, which lead to a successful purchase in a shop or to a cancellation, can be used to improve the website.
swell
- Data Mining and Data Warehousing - Prof. Andreas Reber (PDF file; 27 kB)