Environment for DeveLoping KDD-Applications Supported by Index-Structures

Figure: ELKI 0.4 visualizing an OPTICS result.
Basic data

Maintainer: Technical University of Dortmund
Developer: Ludwig Maximilian University of Munich
Current version: 0.7.5 (February 15, 2019)
Operating system: Platform independent
Programming language: Java
Category: Data mining, research, mathematics, statistics
License: AGPL (from version 0.4.0 on)
Website: https://elki-project.github.io/

Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI) is a research project that originated at the database systems chair of Professor Hans-Peter Kriegel at the Ludwig Maximilian University of Munich and is now continued at the Technical University of Dortmund.

It is a modular software framework written in Java for knowledge discovery in databases (KDD). It focuses on methods for cluster analysis and outlier detection and on the use of index structures in such methods. As a university research project, it prioritizes extensibility, readability and use in research and teaching over maximum speed or integration with existing business intelligence applications. For example, none of the released versions has an interface to existing industrial database systems, and using the software requires prior knowledge and reading the documentation. The target audience is researchers, students and software developers.

The modular architecture of the software allows numerous combinations of the included algorithms, data types, distance measures and index structures. A newly developed method or distance measure can therefore easily be combined with the existing modules and evaluated. The visualization modules often make it easy to present and compare results. Reusing existing code considerably reduces the effort and time needed to develop such modules, which makes the software a good basis for seminar papers and for diploma and master's theses.
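
The idea of freely combining algorithms with distance measures can be illustrated with a minimal Java sketch. It does not use ELKI's actual API: the class `ModularSketch`, the interface `Distance`, the constants `EUCLIDEAN` and `MANHATTAN`, and the method `kthNeighborDistance` are hypothetical names chosen for this illustration. The sketch scores each point by the distance to its k-th nearest neighbor, a simple outlier score, and accepts any distance measure as a parameter.

```java
import java.util.Arrays;
import java.util.List;

public class ModularSketch {
    /** Hypothetical distance-measure module (not ELKI's real interface). */
    public interface Distance {
        double between(double[] a, double[] b);
    }

    /** Euclidean distance as one interchangeable module. */
    public static final Distance EUCLIDEAN = (a, b) -> {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    };

    /** Manhattan distance as a second interchangeable module. */
    public static final Distance MANHATTAN = (a, b) -> {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    };

    /**
     * Hypothetical algorithm module: the outlier score of each point is the
     * distance to its k-th nearest neighbor. This sketch uses a linear scan;
     * an index structure could replace the scan without changing the caller.
     */
    public static double[] kthNeighborDistance(List<double[]> data, Distance dist, int k) {
        int n = data.size();
        if (k < 1 || k > n - 1) {
            throw new IllegalArgumentException("k must be between 1 and n - 1");
        }
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            double[] d = new double[n - 1];
            int pos = 0;
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    d[pos++] = dist.between(data.get(i), data.get(j));
                }
            }
            Arrays.sort(d);
            scores[i] = d[k - 1]; // k-th smallest distance to another point
        }
        return scores;
    }

    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(
            new double[]{0, 0}, new double[]{0, 1},
            new double[]{1, 0}, new double[]{10, 10});
        // The same algorithm module runs with either distance module.
        System.out.println(Arrays.toString(kthNeighborDistance(data, EUCLIDEAN, 1)));
        System.out.println(Arrays.toString(kthNeighborDistance(data, MANHATTAN, 1)));
    }
}
```

In this sketch, a new distance measure only has to implement the `Distance` interface to be evaluated with the existing scoring method, which mirrors the kind of reuse described above.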

Included algorithms

The following algorithms are included in ELKI (excerpt):

History

Version 0.1 (July 2008) already contained numerous algorithms for cluster analysis and outlier detection, as well as some index structures such as the R*-tree. The focus of the first release was on subspace clustering methods.

Version 0.2 (July 2009) added functionality for time series analysis, in particular distance functions for time series.

Version 0.3 (March 2010) expanded the selection of outlier detection algorithms and visualization modules.

Version 0.4 (August 2011) added numerous methods for detecting spatial outliers in geospatial data.

Version 0.5 (April 2012) focused on the evaluation of cluster analysis results, new visualizations and a few new algorithms.

Version 0.6 (June 2013 / January 2014) came with an extension for 3D parallel coordinates and additional algorithms.

Version 0.7 (August 2015) added uncertain data types and algorithms for analyzing uncertain data.

Version 0.7.5 (February 2019) added further clustering algorithms, outlier detection methods, evaluation measures and index structures.

Awards

ELKI started as the implementation of Arthur Zimek's doctoral thesis, which received the “SIGKDD Doctoral Dissertation Award 2009 Runner-up” from the Association for Computing Machinery for its contributions to correlation clustering. The algorithms published in the course of the dissertation (4C, COPAC, HiCO, ERiC, CASH), together with a few precursors and comparison methods, are available in ELKI.

The demonstration of version 0.4 with its geo-outlier extensions at the Symposium on Spatial and Temporal Databases 2011 won the conference's “Best Demonstration Paper Award”.

Related applications

  • KNIME (Konstanz Information Miner) - a project of the University of Konstanz for interactive data analysis, based on Eclipse.
  • RapidMiner - an application available in free and commercial editions, with a focus on machine learning.
  • Scikit-learn - a Python project with machine learning methods.
  • WEKA - a similar project from the University of Waikato, with a focus on classification algorithms.
