Checkmk

from Wikipedia, the free encyclopedia
Checkmk
Basic data

Developer: tribe29 GmbH (formerly Mathias Kettner GmbH)
Initial release: 2008
Current version: 1.6.0p6 (November 11, 2019)
Operating system: Linux
Programming language: Python, C++
Category: IT infrastructure monitoring
License: GPLv2 and other open source licenses, Checkmk Enterprise License
German language: Yes
Website: https://checkmk.de

Checkmk (stylized: checkmk) is software written in Python and C++ for monitoring IT infrastructure. It is used to monitor servers, networks, applications, public clouds, containers, storage, databases and environmental sensors.

Checkmk is available in three editions: an open source edition ("Checkmk Raw Edition", CRE), a commercial Enterprise Edition ("Checkmk Enterprise Edition", CEE) and a commercial edition for managed service providers ("Checkmk Managed Services Edition", CME). These editions are available for a number of platforms, in particular for various versions of Debian, Ubuntu, SLES and Red Hat/CentOS, as well as a Docker image. In addition, physical appliances of different sizes and a virtual appliance are offered; they simplify administration of the underlying operating system through a graphical interface and enable high-availability setups.

The agents that Checkmk uses to collect data on many systems are available for 11 platforms, including Windows.

History

Checkmk was created in 2008 as a shell script for inetd that replaced conventional monitoring agents, and was published under the GPL in April 2009. It was originally built on Nagios and extended it with several components. The open source edition is still based on a Nagios core, which it bundles with other open source components into one system.

Over the years, the commercial editions of Checkmk have developed into a completely independent monitoring system that has replaced all essential components with its own, including a purpose-built monitoring core. Most developments for the commercial editions, in particular all plug-ins, are also incorporated into the Checkmk Raw Edition.

While Checkmk was previously designed specifically for monitoring large, heterogeneous on-premises environments, since version 1.5+ (1.5p12) it also supports monitoring AWS, Azure, Docker and Kubernetes.

Checkmk is developed by tribe29 GmbH, based in Munich, which operated under the name Mathias Kettner GmbH until April 16, 2019. With the name change, the former spelling "Check_MK" was changed to "Checkmk".

tribe29 GmbH follows an open core business model. The open source edition is available under various open source licenses, mostly GPLv2, while large parts of the commercial editions are covered by the proprietary "Checkmk Enterprise License".

Product

Checkmk combines three types of IT monitoring:

  • Status-based monitoring, which records the status ("health") of a device or application using threshold values.
  • Metric-based monitoring, which enables time series to be recorded, graphed and analyzed. For this purpose, the CEE has its own HTML5-based graphing system as well as an integration with Grafana.
  • Log- and event-based monitoring, in which important events are filtered out and can be used to trigger actions.

To provide broad coverage, every Checkmk edition currently ships with about 1,700 of its own plug-ins, all licensed under GPLv2. These plug-ins are maintained as part of the product and are regularly supplemented with new plug-ins or extensions to existing ones. Existing legacy Nagios plug-ins can also be integrated.
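Legacy Nagios plug-ins can be connected because they follow a simple convention: the exit code carries the state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), and the first output line carries a status text, optionally followed by `|` and performance data. The following is a minimal sketch of such a plug-in in Python; the disk-usage check, the function names (`evaluate`, `check_disk`, `main`) and the thresholds are illustrative assumptions, not part of Checkmk.

```python
import shutil
import sys

# Classic Nagios plug-in exit codes: the state is carried by the exit code.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value: float, warn: float, crit: float) -> int:
    """Map a measured value onto a state using fixed upper thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> tuple:
    """Hypothetical example check: percentage of used space on `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN - cannot read {path}: {exc}"
    used = 100.0 * usage.used / usage.total
    state = evaluate(used, warn, crit)
    # Performance data after "|": label=value[UOM];warn;crit;min;max
    perfdata = f"used={used:.1f}%;{warn:g};{crit:g};0;100"
    return state, f"{LABELS[state]} - {used:.1f}% of {path} used | {perfdata}"


def main() -> None:
    """Entry point: print one status line and exit with the state code."""
    state, line = check_disk()
    print(line)
    sys.exit(state)
```

Calling `main()` from a script (for example under an `if __name__ == "__main__":` guard) turns this into an executable plug-in whose exit code the monitoring core interprets as the service state.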

To significantly simplify setup and operation, all Checkmk components are delivered fully integrated. Rule-based 1:n configuration (one rule applies to many hosts or services) and a high degree of automation considerably speed up work. The automations include:

  • Automated host detection (optional, where relevant)
  • Automated services detection
  • Automated configuration of many plug-ins using preconfigured threshold values and rules
  • Automatic agent updates (a CEE feature)
  • Automatic and dynamic configuration that allows volatile services with a lifespan of only a few seconds to be included, as in Kubernetes environments (from CEE v1.6)
  • Automatic discovery of tags & labels from sources such as Kubernetes, AWS and Azure (from CEE v1.6)
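The 1:n principle means that one rule describes a setting for any number of hosts, selected by conditions such as host tags or folders, instead of configuring each host individually. The sketch below models this in Python under simplified assumptions: a hypothetical `Rule` with tag and folder conditions, and first-match semantics (rules are checked from top to bottom and the first matching rule wins), which the Checkmk documentation describes for many, but not all, rule sets.

```python
from dataclasses import dataclass, field
from typing import Any, List, Set


@dataclass
class Host:
    name: str
    folder: str        # e.g. "linux/db/eu"
    tags: Set[str]


@dataclass
class Rule:
    value: Any                                             # e.g. (warn, crit) thresholds
    required_tags: Set[str] = field(default_factory=set)   # host must carry all of these
    folder: str = ""                                       # "" matches every folder


def matches(rule: Rule, host: Host) -> bool:
    """A rule applies to hosts in its folder (or a subfolder) that carry all required tags."""
    in_folder = (
        rule.folder == ""
        or host.folder == rule.folder
        or host.folder.startswith(rule.folder + "/")
    )
    return in_folder and rule.required_tags <= host.tags


def effective_value(rules: List[Rule], host: Host, default: Any = None) -> Any:
    """Evaluate rules top to bottom; the first matching rule determines the value."""
    for rule in rules:
        if matches(rule, host):
            return rule.value
    return default
```

With two rules, stricter thresholds for production database hosts followed by a catch-all default, every host in the environment receives a value without any per-host configuration.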

There are also playbooks for configuration and deployment tools such as Ansible and Salt.

Checkmk is often used in very large, distributed environments in which many locations (e.g. 300 Faurecia locations) and/or well over 100,000 devices (e.g. at Edeka) are monitored. This is made possible, among other things, by the fact that Checkmk's Micro Core consumes far fewer CPU resources than, for example, Nagios, and therefore delivers significantly higher performance on the same hardware. In addition, non-persistent data is kept in RAM, which significantly improves access times.

Components

Monitoring core ("Checkmk Microcore - CMC")

The commercial editions of Checkmk use their own monitoring core written in C++, which performs significantly better than a Nagios core. In addition, from version 1.6 onwards, it can dynamically handle short-lived objects such as containers, because, unlike the Nagios core, it does not need to be restarted when the configuration changes. The open source "Checkmk Raw Edition" currently still uses a Nagios core.

Configuration & Check Engine

Checkmk provides its own service discovery and configuration generation. It also carries out checks in a distinctive way: in each check interval, each host is contacted only once. The results are then passed to the monitoring core as passive checks. This significantly reduces the load both on the monitoring server and on the monitored hosts. Checkmk uses different methods to access the data of target systems. These include agents installed on the target system; "special agents", which run on the monitoring server and communicate with the API of the target system; SNMP, for monitoring network devices and printers, for example; and HTTP/TCP, for communicating with web and Internet services. By default, Checkmk follows the "pull" principle: the monitoring system explicitly queries the data, so that it can quickly detect when a system suddenly fails and no longer responds to queries. Alternatively, "push" mode can be configured, in which the system sends its data directly to Checkmk or to an intermediate host.
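The "one contact per host" principle can be illustrated with the Checkmk agent protocol: the classic Linux agent typically answers on TCP port 6556 with a single plain-text dump whose parts are introduced by section headers such as `<<<df>>>` or `<<<mem>>>`, and many services are then derived from that one dump. The sketch below assumes an unencrypted agent reachable on that port; the function names are hypothetical, and Checkmk's real fetchers and parsers are considerably more elaborate.

```python
import socket
from typing import Dict, List


def fetch_agent_output(host: str, port: int = 6556, timeout: float = 10.0) -> str:
    """Pull the complete agent output over a single TCP connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the agent closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def split_sections(agent_output: str) -> Dict[str, List[str]]:
    """Group the lines of one agent dump by their <<<section>>> headers."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in agent_output.splitlines():
        if line.startswith("<<<") and line.endswith(">>>"):
            # Headers may carry options, e.g. <<<local:sep(0)>>>; keep only the name.
            name = line[3:-3].split(":", 1)[0]
            current = sections.setdefault(name, [])
        elif current is not None:
            current.append(line)
    return sections
```

Each section can then feed several services (one per file system from `df`, for instance), which is why a single connection per interval is enough.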

Data interface ("Livestatus")

Livestatus is the most important interface in Checkmk. It provides live access to all data on the monitored hosts and services. The data is read directly from RAM, which avoids slow disk access and allows fast queries without overloading the system. Access is via a simple text-based protocol and is possible from any programming language without a special library.
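Because the protocol is plain text, a client needs nothing more than a socket. The sketch below assumes the Livestatus Query Language (LQL) with `GET`, `Columns:`, `Filter:` and `OutputFormat:` headers, terminated by an empty line, and a Unix socket at the usual OMD location; the site name `mysite` and the helper names are hypothetical.

```python
import json
import socket
from typing import Sequence

# Hypothetical site "mysite": in OMD-based installations the Livestatus socket
# usually lives at /omd/sites/<site>/tmp/run/live.
SOCKET_PATH = "/omd/sites/mysite/tmp/run/live"


def build_query(table: str, columns: Sequence[str], filters: Sequence[str] = ()) -> str:
    """Assemble an LQL query; an empty line terminates the request."""
    lines = [f"GET {table}", "Columns: " + " ".join(columns)]
    lines += [f"Filter: {f}" for f in filters]
    lines.append("OutputFormat: json")
    return "\n".join(lines) + "\n\n"


def run_query(query: str, path: str = SOCKET_PATH) -> list:
    """Send one query over the Unix socket and decode the JSON answer."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks))
```

On a monitoring server, `run_query(build_query("hosts", ["name", "state"], ["state = 1"]))` would return the name and state of every host currently in state DOWN as a list of rows.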

Web GUI ("Multisite")

Multisite is Checkmk's web GUI. Besides fast page rendering, it offers user-definable views and dashboards, distributed monitoring by combining several monitoring instances via Livestatus, integration of NagVis, built-in LDAP connectivity, access to status data via a web service and much more. Dashboards and views can be tailored to individual users or user groups, for example vSphere-specific views for VMware admins. The web GUI is currently available in German and English.

Web Administration ("WATO")

WATO, the web administration tool, allows a Checkmk system to be administered entirely through the browser. This includes managing users, roles, groups, time periods and more. A role concept allows fine-grained permissions to be assigned, and existing role-based access controls (LDAP, Active Directory) can be used for this. WATO is rule-based, so the configuration remains manageable even in complex environments and the configuration effort stays low. Automatic discovery and configuration, as well as automated agent updates, further accelerate setup. CMDBs can also be integrated via an HTTP API to speed up configuration.
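A CMDB-driven import can be pictured as a small script that creates hosts through WATO's HTTP interface. The sketch below assumes the Web API of the Checkmk 1.x series (`webapi.py` with `action=add_host`, the credentials of an automation user in `_username`/`_secret`, and a JSON document in the `request` form field); the server URL, site name, folder and attributes are hypothetical, and the parameter names should be checked against the documentation of the installed version. Changes made this way typically still have to be activated in WATO.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_add_host_request(base_url: str, username: str, secret: str,
                           hostname: str, folder: str, attributes: dict) -> tuple:
    """Build URL and form body for the assumed Web API action "add_host"."""
    # Assumption: automation credentials are passed as URL parameters (1.x Web API).
    params = urlencode({"action": "add_host", "_username": username, "_secret": secret})
    url = f"{base_url}/check_mk/webapi.py?{params}"
    payload = {"hostname": hostname, "folder": folder, "attributes": attributes}
    body = urlencode({"request": json.dumps(payload)}).encode("ascii")
    return url, body


def send(url: str, body: bytes, timeout: float = 30.0) -> dict:
    """POST the request (passing data= makes urllib use POST) and decode the JSON reply."""
    with urlopen(Request(url, data=body), timeout=timeout) as resp:
        return json.loads(resp.read())
```

A CMDB export would then loop over its records, build one request per host and pass it to `send`.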

Notification system

Several notification channels can be set up for each user and configured differently based on rules. For example, e-mails can be sent around the clock, while SMS messages are sent only for important problems during on-call hours, without having to create multiple artificial users. Notifications can go to all teams or to specific ones, for example only to the storage admins in the event of a failed hard drive. Duplicate notifications are merged so that no user is notified twice via the same channel. Users can also configure their own notifications. In distributed environments, alert messages can be managed centrally. Actions can be triggered automatically for identified problems (alert handlers), for example via scripts. Integrations with e-mail and SMS gateways as well as with communication and IT service management tools such as Slack, Jira, PagerDuty, OpsGenie, VictorOps and ServiceNow ship with Checkmk.
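The routing described above can be modeled as a list of rules that are each checked against a problem, with duplicates per contact and channel removed. The Python sketch below is a simplified, hypothetical model (the rule fields, the severity ranking and the time windows are assumptions) and not Checkmk's actual notification engine; it shows how e-mail around the clock, SMS only for critical problems outside office hours, and a storage-team rule can coexist without artificial users.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Tuple

OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3
# Severity ranking used for comparisons (assumption: CRIT is the most severe).
RANK = {OK: 0, WARN: 1, UNKNOWN: 2, CRIT: 3}


@dataclass(frozen=True)
class NotificationRule:
    contact: str
    channel: str                  # e.g. "email" or "sms"
    min_state: int = WARN         # notify only from this severity upwards
    start: time = time(0, 0)      # time window; may wrap around midnight
    end: time = time(23, 59, 59)
    service_prefix: str = ""      # "" matches every service


def in_window(now: time, start: time, end: time) -> bool:
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end  # window spans midnight, e.g. 18:00-08:00


def route(rules: List[NotificationRule], service: str, state: int,
          now: time) -> List[Tuple[str, str]]:
    """Return (contact, channel) pairs to notify; each pair appears at most once."""
    targets: List[Tuple[str, str]] = []
    for rule in rules:
        if RANK[state] < RANK[rule.min_state]:
            continue
        if not in_window(now, rule.start, rule.end):
            continue
        if not service.startswith(rule.service_prefix):
            continue
        key = (rule.contact, rule.channel)
        if key not in targets:  # suppress duplicate notifications per channel
            targets.append(key)
    return targets
```

Keeping the deduplication in one place means overlapping rules can be written freely without a user receiving the same alert twice on the same channel.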

Business intelligence

The BI module is integrated into the graphical user interface. Using rules, it aggregates the individual status data of many hosts and services into the overall status of business processes, reflecting their dependence on complex applications and on IT infrastructure elements. It can also be used to represent applications composed of microservices, which in turn consist of Kubernetes pods and deployments. In addition, worst-case scenarios can be simulated in real time, and historical states can be analyzed to understand the causes of degraded performance.
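Rule-based aggregation can be sketched as a tree: leaves carry the states of individual hosts or services, and each inner node combines its children's states with an aggregation function such as "worst state" or "best state". The Python sketch below assumes the severity order OK < WARN < UNKNOWN < CRIT that Checkmk uses for worst-state calculations; the class names and the example tree are hypothetical. Changing a leaf and re-evaluating the tree corresponds to the what-if simulation mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3
# Assumed severity order for aggregation: OK < WARN < UNKNOWN < CRIT.
RANK = {OK: 0, WARN: 1, UNKNOWN: 2, CRIT: 3}


def worst(states: List[int]) -> int:
    return max(states, key=RANK.__getitem__)


def best(states: List[int]) -> int:
    """Suitable for redundant components: the node is fine if one child is fine."""
    return min(states, key=RANK.__getitem__)


@dataclass
class Leaf:
    name: str
    state: int


@dataclass
class Aggregate:
    name: str
    children: List[Union["Aggregate", Leaf]]
    function: Callable[[List[int]], int] = worst


def evaluate(node: Union[Aggregate, Leaf]) -> int:
    """Recursively compute the state of a node from its children."""
    if isinstance(node, Leaf):
        return node.state
    return node.function([evaluate(child) for child in node.children])
```

For example, a web shop that depends on an application server and on a redundant database pair stays at WARN while one database is down, and turns CRIT only when the simulated failure of the second database is added.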

Event Console

The Event Console integrates the processing of log messages and SNMP traps into monitoring. It is configured using a flexible set of rules that decides whether incoming messages are discarded and how they are classified. It can count, correlate and expect messages, rewrite them and more. Identical entries can be combined into a single event (e.g. several failed logins) to keep the events manageable. It also has a built-in syslog daemon that receives messages directly on port 514 and an SNMP trap receiver that receives traps on port 162.
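The rule-based classification and the merging of identical entries can be sketched as follows. This is a hypothetical, simplified model (the rule fields, first-match evaluation and the grouping key are assumptions, not the Event Console's actual rule format): a rule either discards a message or assigns it a state, and messages that match the same rule with the same captured values, such as repeated failed logins for one user, are counted as a single event.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ECRule:
    name: str
    pattern: str           # regular expression, may contain capture groups
    state: Optional[int]   # None means: discard matching messages


def process(rules: List[ECRule], messages: List[str]) -> Dict[Tuple, dict]:
    """Classify messages with the first matching rule and merge identical events."""
    events: Dict[Tuple, dict] = {}
    for msg in messages:
        for rule in rules:
            match = re.search(rule.pattern, msg)
            if match:
                break
        else:
            continue  # no rule matched: ignored in this sketch
        if rule.state is None:
            continue  # the rule says: drop the message
        # Messages with the same rule and the same captured groups form one event.
        key = (rule.name, match.groups())
        event = events.setdefault(key, {"state": rule.state, "count": 0, "first": msg})
        event["count"] += 1
    return events
```

Three failed logins for `root` thus end up as one event with a count of 3 instead of three separate entries.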

Metric graphing

The commercial Checkmk editions use their own metrics and graphing system. It allows time series metrics to be analyzed over long periods with interactive HTML5 graphs, at a maximum resolution of one second. Data can be imported from a variety of data sources and metric formats (JSON, XML, SNMP, etc.) and kept in space-efficient long-term storage.
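Space-efficient long-term storage usually rests on consolidation: recent data is kept at full resolution, while older data is condensed into coarser time buckets, as in round-robin databases. At one-second resolution a single metric produces 86,400 values per day; consolidated into one-minute buckets, only 1,440 remain, 60 times fewer. The Python function below is a generic illustration of that idea under these assumptions and does not reproduce Checkmk's actual storage format.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def consolidate(samples: Sequence[Tuple[int, float]], step: int,
                func: Callable[[List[float]], float] = mean) -> List[Tuple[int, float]]:
    """Downsample (timestamp, value) pairs into buckets of `step` seconds."""
    buckets: Dict[int, List[float]] = {}
    for ts, value in samples:
        # Align each timestamp to the start of its bucket.
        buckets.setdefault(ts - ts % step, []).append(value)
    # One consolidated value per bucket, ordered by bucket start time.
    return [(start, func(values)) for start, values in sorted(buckets.items())]
```

Storing both an average and a maximum per bucket (`func=max`) is a common choice, so that short peaks remain visible after consolidation.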

Alternatively, Graphite or InfluxDB can be connected via an export interface. As of CEE 1.5p16, a data source plug-in for Grafana is also available, which makes Checkmk data directly available for visualization in Grafana. The Checkmk Raw Edition continues to use PNP4Nagios as its graphing system.
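To illustrate what such an export involves: Graphite's plaintext protocol expects one line per value in the form `metric.path value timestamp`, usually sent to the Carbon listener on TCP port 2003. The sketch below shows how the metrics of a Checkmk service might be mapped onto that format; the `checkmk.<host>.<service>.<metric>` naming scheme and the helper names are hypothetical choices and not necessarily what Checkmk's export interface uses.

```python
import re
import socket
import time
from typing import Dict, Optional

PREFIX = "checkmk"  # hypothetical top-level metric namespace


def _clean(part: str) -> str:
    """Graphite uses "." as path separator, so replace it and other special characters."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", part)


def graphite_lines(host: str, service: str, metrics: Dict[str, float],
                   timestamp: Optional[float] = None) -> str:
    """Render metrics as newline-terminated '<path> <value> <timestamp>' lines."""
    ts = int(time.time() if timestamp is None else timestamp)
    return "".join(
        f"{PREFIX}.{_clean(host)}.{_clean(service)}.{_clean(name)} {value} {ts}\n"
        for name, value in metrics.items()
    )


def send_to_graphite(payload: str, server: str, port: int = 2003) -> None:
    """Carbon's plaintext listener conventionally uses TCP port 2003."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(payload.encode("ascii"))
```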

Reporting

Reporting provides PDF reports, either ad hoc or generated automatically at regular intervals. It includes an availability analysis that, with a single click, shows the state history over any period of time together with the calculated availability. Availability calculations can exclude times outside the monitoring period, adjust the resolution or ignore short intervals. In addition, reporting covers SLA reporting, which can be used to monitor complex SLAs. Reporting is only available in the commercial editions of Checkmk.
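At its core, availability is simple arithmetic over state intervals: time in OK divided by monitored time. The sketch below assumes that intervals outside the monitoring period are marked as not monitored and excluded from the denominator, and that "ignoring short intervals" means counting problem phases below a threshold as OK; Checkmk's own options are more fine-grained, and the function name is hypothetical. For a day with one unmonitored hour and a five-minute outage, it yields 82,500 s / 82,800 s ≈ 99.64 %.

```python
from typing import List, Optional, Tuple

OK = 0

# (start, end, state) in seconds; state None = not monitored
Interval = Tuple[float, float, Optional[int]]


def availability_percent(intervals: List[Interval], ignore_shorter_than: float = 0.0) -> float:
    """Share of monitored time spent in OK, in percent."""
    up = total = 0.0
    for start, end, state in intervals:
        duration = end - start
        if state is None:
            continue  # outside the monitoring period: excluded from the calculation
        total += duration
        # Assumption: problem intervals shorter than the threshold count as OK.
        if state == OK or duration < ignore_shorter_than:
            up += duration
    return 100.0 * up / total if total else 100.0
```

An SLA check then reduces to comparing the result with the agreed target, for example `availability_percent(day) >= 99.5`.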

Hardware-software inventory

The hardware/software inventory records all hardware and software installed on devices and systems. This makes it possible, for example, to monitor changes to hardware and software, to check whether security updates are installed, and to supplement static data with dynamic values from monitoring, such as current hard disk usage. There is a deep integration with the configuration management database (CMDB) i-doit, which enables CMDB data to be exchanged with monitoring data.
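Monitoring changes to hardware and software comes down to comparing two inventory snapshots of the same host. The sketch below assumes a hypothetical flattened representation (dotted paths mapped to values), whereas Checkmk stores inventory data as nested trees; it reports added, removed and changed entries.

```python
from typing import Any, Dict, Tuple

Inventory = Dict[str, Any]  # flattened paths, e.g. "software.packages.openssl.version"


def diff_inventory(old: Inventory, new: Inventory) -> Tuple[dict, dict, dict]:
    """Return (added, removed, changed) entries between two inventory snapshots."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}
    return added, removed, changed
```

A non-empty result can then be turned into a service state, for example WARN when a package disappears or a memory module changes unexpectedly.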

References

  1. github.com.
  2. Checkmk EULA. tribe29 GmbH. Retrieved May 31, 2019.
  3. Use cases. tribe29 GmbH. Retrieved June 15, 2019.
  4. Checkmk editions. tribe29 GmbH. Retrieved November 27, 2015.
  5. Download version. tribe29 GmbH. Retrieved July 10, 2019.
  6. Monitoring Agents. tribe29 GmbH. Retrieved June 12, 2019.
  7. Mathias Kettner (check_mk). In: Meet The Community. Nagios Enterprises. August 17, 2009. Archived from the original on January 6, 2012. Retrieved November 27, 2015.
  8. Götz Rieger: Simply Nagios - network monitoring with OMD and Check_MK. In: c't. November 3, 2012, p. 190 (introduction online, retrieved November 27, 2015).
  9. Mathias Huber: Nagios extension Check_mk in version 1.1.10. Linux-Magazin. March 9, 2011. Retrieved November 27, 2015.
  10. Peter Siering: Check_MK monitoring system in a fresh version 1.4.0. heise online. May 31, 2017. Retrieved May 31, 2017.
  11. Mathias Kettner: The Check_MK Micro Core. Retrieved June 14, 2018.
  12. Checkmk community announcement 1.5 Plus (1.5.p12). tribe29 GmbH. February 17, 2019. Retrieved July 11, 2019.
  13. tribe29 - Our Story. tribe29 GmbH. Retrieved June 14, 2019.
  14. Grafana Data Source Plugin. April 17, 2019. Retrieved July 9, 2019.
  15. Automated service detection. tribe29 GmbH. Retrieved February 17, 2017.
  16. Monitoring dynamic environments. tribe29 GmbH. Retrieved May 7, 2019.
  17. Ansible integration with Checkmk. May 1, 2019. Retrieved May 8, 2019.
  18. Salt integration with Checkmk. May 2, 2019. Retrieved May 9, 2019.
  19. Checkmk worldwide at Faurecia. October 23, 2018. Retrieved October 23, 2018.
  20. EDEKA lecture. May 12, 2017. Retrieved May 12, 2017.
  21. Heike Jurzik, Marcel Arentz: vSphere Monitoring with Checkmk. Linux-Magazin, July 1, 2019. Retrieved July 2, 2019.