Knowledge Discovery in Databases

from Wikipedia, the free encyclopedia

Knowledge Discovery in Databases (KDD), in German Wissensentdeckung in Datenbanken, extends data mining, a term often used synonymously with it, by adding preparatory investigations and transformations of the data to be evaluated.

The aim of KDD is to identify previously unknown relationships that are relevant to the application domain in existing, mostly large, data sets. In contrast to data mining, KDD as an overall process also includes the preparation of the data and the evaluation of the results.

The term KDD was coined in the scientific community by Gregory Piatetsky-Shapiro. In practice the term data mining is more common, although it traditionally carries negative connotations in statistics.

The sub-steps of the KDD process are:

  1. Provision of background knowledge for the respective specialist area
  2. Definition of the goals of knowledge acquisition
  3. Data selection
  4. Data cleansing
  5. Data reduction (e.g. through transformations)
  6. Selection of a model in which the knowledge found is to be represented
  7. Data mining, the actual data analysis
  8. Interpretation of the knowledge gained

These steps are usually run through several times. A common process model is the Cross-Industry Standard Process for Data Mining (CRISP-DM).
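The sub-steps listed above can be sketched as a small pipeline. The following Python example is only an illustration under stated assumptions: the records, the attribute names (`region`, `spend`), the function names and the choice of a one-dimensional k-means with two clusters as the mining method are all hypothetical and are not prescribed by the KDD process. Steps 1 and 2 (background knowledge and goal definition) appear only as the comment and the `region` parameter.

```python
from statistics import mean

# Hypothetical raw records, invented purely for illustration.
# Assumed goal (steps 1-2): find spending groups among "north" customers.
RAW = [
    {"customer": "a", "region": "north", "spend": 12.0},
    {"customer": "b", "region": "north", "spend": 15.0},
    {"customer": "c", "region": "north", "spend": None},  # missing value
    {"customer": "d", "region": "north", "spend": 14.0},
    {"customer": "e", "region": "north", "spend": 95.0},
    {"customer": "f", "region": "north", "spend": 102.0},
    {"customer": "g", "region": "south", "spend": 11.0},
]


def select(records, region):
    """Step 3, data selection: keep only records relevant to the goal."""
    return [r for r in records if r["region"] == region]


def cleanse(records):
    """Step 4, data cleansing: drop records with a missing value."""
    return [r for r in records if r["spend"] is not None]


def reduce_data(records):
    """Step 5, data reduction: keep one attribute and rescale it to [0, 1]."""
    values = [r["spend"] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero if all values are equal
    return [(r["customer"], (r["spend"] - lo) / span) for r in records]


def mine(points, iterations=10):
    """Steps 6-7: the model is two cluster centres, found by 1-D k-means."""
    values = [v for _, v in points]
    centres = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for name, v in points:
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearest].append((name, v))
        updated = [
            mean(v for _, v in g) if g else c for g, c in zip(groups, centres)
        ]
        if updated == centres:  # assignments are stable, stop early
            break
        centres = updated
    return groups


def interpret(groups):
    """Step 8, interpretation: give the clusters a domain meaning."""
    return {
        "low_spenders": sorted(name for name, _ in groups[0]),
        "high_spenders": sorted(name for name, _ in groups[1]),
    }


def run_kdd(records, region):
    """Chain steps 3 to 8 once; in practice the chain is usually repeated."""
    return interpret(mine(reduce_data(cleanse(select(records, region)))))


result = run_kdd(RAW, region="north")
print(result)
```

With the sample data, customer "c" is dropped during cleansing and customer "g" during selection; the remaining customers fall into two groups. Because the steps are usually run through several times, a real analysis would typically revisit the selection, cleansing or model choice after inspecting the interpretation.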

Software

  • ELKI is a research project of the Ludwig Maximilian University of Munich that provides numerous data mining algorithms (mainly for cluster analysis and outlier detection, but also index structures) for use in teaching and research.
  • KNIME is a freely available open-source tool for interactive data analysis and data mining.
  • RapidMiner is a freely available open-source tool for machine learning, data mining and predictive analytics that supports all steps of the knowledge discovery process, from data selection, data cleansing, data reduction and transformation, through modeling and validation, to visualization and deployment.
  • Splunk is a software platform for text data. The free version is limited to indexing 500 MB per day, and essential analyses such as cluster analysis are reserved for the commercial version.
  • Weka is an open-source tool developed at the University of Waikato. It contains an extensive collection of algorithms for knowledge discovery in databases.
  • Wolfram Alpha is a knowledge database that can be used free of charge and that also supports some data analysis.

Literature

  • Martin Ester, Jörg Sander: Knowledge Discovery in Databases: Techniques and Applications. Springer, Berlin 2000, ISBN 3-540-67328-8.
  • Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth: From Data Mining to Knowledge Discovery in Databases. AI Magazine, American Association for Artificial Intelligence, California 1996, pages 37-54.
  • Paul Alpar, Joachim Niedereichholz: Data Mining in Practice: Procedures and Use Cases for Marketing, Sales, Controlling and Customer Support. Vieweg Verlag, Wiesbaden 2000.