Geocoding

from Wikipedia, the free encyclopedia

The term geocoding refers to a (complex) process in which postal addresses are checked against a reference database, corrected where necessary, and given a spatial reference. After the process, geocoded addresses should be as complete and correct as possible in structure and location, and should carry geo keys and/or x, y coordinates. Through geo keys and/or coordinates, postal addresses and their additional information (e.g. from a CRM system) receive a direct spatial reference (via x, y) and/or an indirect one (via a geo key).

Geocoding is a central basis for enriching addresses with additional spatial data and for spatially evaluating additional address information such as purchase data.

Introduction

Geocoding of addresses is increasingly used outside the geographic disciplines as well, because demand is growing steadily for (a) spatial analyses of addresses (e.g. customer density, distance to the nearest point), (b) spatial analyses of their additional information (e.g. sales per customer), and (c) enrichment of addresses with additional spatial variables (e.g. building type, building density, neighborhood information). Because geocoding is a complex, error-prone process, this article sets out the points that need particular attention in geocoding, here for Germany.

The input file

The addresses to be geocoded should be as current and complete as possible in the following address structure:

5-digit postcode, place/municipality, street, house number / house number suffix

Example: 51145, Köln, Planckstr. 14

Note: A missing address component, e.g. a missing house number suffix, can lead to geocoding errors (see also Geocoding quality). Geocoding systems are usually not designed for historical addresses, because the geo-reference inventory against which geocoding is carried out is kept as up to date and precise as possible and therefore does not include, for example, historical street names. For this reason, historical addresses should first undergo postal cleansing before they are geocoded.
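The address structure above can be checked before geocoding. The following is a minimal sketch, assuming input lines are comma-separated exactly as in the example; the `Address` type, the regular expression and `parse_address` are hypothetical names introduced for illustration, not part of any particular geocoder.

```python
import re
from typing import NamedTuple, Optional


class Address(NamedTuple):
    postcode: str
    place: str
    street: str
    house_number: str
    house_number_suffix: str  # empty string if the address has none


# Assumed layout: "<5-digit postcode>, <place>, <street> <number>[suffix]"
_ADDRESS_RE = re.compile(
    r"^\s*(?P<postcode>\d{5})\s*,\s*(?P<place>[^,]+?)\s*,\s*"
    r"(?P<street>.+?)\s+(?P<number>\d+)\s*(?P<suffix>[A-Za-z]?)\s*$"
)


def parse_address(line: str) -> Optional[Address]:
    """Split one input line into its components, or return None if a
    component (e.g. the house number) is missing or malformed."""
    m = _ADDRESS_RE.match(line)
    if m is None:
        return None
    return Address(
        m["postcode"], m["place"], m["street"], m["number"], m["suffix"].lower()
    )
```

Lines for which `parse_address` returns `None` can be set aside for manual review rather than passed on to the geocoder.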

The georeferencing file / database

During the geocoding process, the geocoding software compares incoming addresses (from the input file) with a geo-reference file (usually held in a database) to which the software has access. The comparison is a logical search for the best possible 1:1 match between the input address and the geo-reference file. Ideally, the reference file contains all postal addresses at a specific point in time, i.e. all postcodes and postal locations at a specific point in time (postal territorial status) as well as all places and municipalities with their districts, streets, house numbers and house number suffixes at a specific point in time (official territorial status). Note: The postal territorial statuses differ from the official territorial statuses.
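The comparison described above can be illustrated with an exact-match lookup on normalized keys. This is only a sketch under simplifying assumptions: the normalization rules, the `REFERENCE` dictionary and its coordinates are made-up placeholders, and real geocoders use considerably more elaborate matching logic.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]          # (postcode, street, house number)
Coordinate = Tuple[float, float]    # (latitude, longitude)


def normalize_street(street: str) -> str:
    """Reduce common spelling variants to one form (assumed rules)."""
    s = street.strip().lower().replace("ß", "ss")
    if s.endswith("str."):
        s = s[:-4] + "strasse"
    return s.replace(" ", "")


def make_key(postcode: str, street: str, house_number: str) -> Key:
    number = house_number.strip().lower().replace(" ", "")
    return (postcode.strip(), normalize_street(street), number)


# Hypothetical reference inventory; the coordinate is a placeholder,
# not a real house coordinate.
REFERENCE: Dict[Key, Coordinate] = {
    make_key("51145", "Planckstraße", "14"): (50.86, 7.12),
}


def match(postcode: str, street: str, house_number: str) -> Optional[Coordinate]:
    """Return the coordinate of the 1:1 match, or None if there is none."""
    return REFERENCE.get(make_key(postcode, street, house_number))
```

With these rules, "Planckstr." and "Planckstraße" resolve to the same key, while an unknown house number yields no match.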

Furthermore, the geo-reference file contains the geo keys and geo-coordinates associated with the territories, which are written to the output file alongside the address matches. Standardized geo keys are the 5-digit postcode, which can be used to refer to the 5-digit postcode map, and the official municipality key (AGS for short). Both territories change during the year, which is why it is important to know during the geocoding process which sources (postal and/or official) are contained in the geo-reference file and with which territorial status. The postal file, which in addition to the postcodes also includes postal locations and their named streets, is available with territorial statuses issued during the year. The official structures, including the (official) geo keys of the districts and municipalities, their boroughs, city districts and quarters, their named streets and their buildings with addresses, are usually available only once a year.

The coordinates for each address in the geo-reference inventory usually come from the land registry offices (the so-called house coordinates, HKs); alternatively, some come from private-sector providers such as Deutsche Telekom, the navigation data providers and Google, or from OpenStreetMap (OSM). Note: The house coordinates of the land registry offices are updated annually but do not have a clear, uniform official territorial status. This must be established for the reference file accordingly, e.g. data status 10/18, territorial status 12/17. The same applies to the private-sector providers and OSM.

The output file

The output file of the geocoding should contain the incoming addresses and/or a unique ID, the address matches found, the quality of the geocoding, the time of geocoding (timestamp), and metadata on the reference file.
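One possible shape for a single output record is sketched below. The field names and the `make_result` helper are assumptions made for illustration; the two status fields follow the "data status / territorial status" convention used for the reference file.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass
class GeocodingResult:
    record_id: str                  # unique ID of the input record
    input_address: str              # address as delivered
    matched_address: Optional[str]  # address match, None if not found
    quality: str                    # quality label assigned by the geocoder
    geocoded_at: str                # timestamp in ISO 8601, UTC
    reference_data_status: str      # e.g. "10/18"
    reference_area_status: str      # e.g. "12/17"


def make_result(record_id: str, input_address: str,
                matched_address: Optional[str], quality: str,
                data_status: str, area_status: str) -> GeocodingResult:
    """Build one output record and stamp it with the current UTC time."""
    timestamp = datetime.now(timezone.utc).isoformat()
    return GeocodingResult(record_id, input_address, matched_address,
                           quality, timestamp, data_status, area_status)
```

Calling `asdict` on a result yields a plain dictionary that can be written out as one row of a CSV or JSON output file.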

Geocoding quality

The assessment of the quality with which addresses were geocoded (or not geocoded) is decisive for judging the quality of all subsequent processes. These include (a) all subsequent spatial analyses of the addresses and their additional information and (b) the enrichment of the addresses with additional spatial data. Item (a) also includes the intersection with other geographic areas such as the geo-grid (e.g. INSPIRE 100 × 100 m).

The quality report of a geocoding run must provide information about the completeness and error rate of the input file (absolute and/or as a percentage) and the match probability for each address (input address to reference address). For addresses that were not matched, it must state the spatial level at which an assignment was made nonetheless (example: the house number was missing, but the street center could be determined). It must also state the positional quality of the assigned coordinate (e.g. building entrance, building center, interpolated house coordinate, street section center) and the deviation of the assigned location from the building or the real address location.
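The fallback to a coarser spatial level and the absolute and percentage figures can be sketched as follows. The level names in `MatchLevel` and the two helper functions are hypothetical; actual geocoders define their own quality codes.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, Tuple


class MatchLevel(Enum):
    HOUSE = "house"        # full address match
    STREET = "street"      # house number missing or unknown, street found
    POSTCODE = "postcode"  # only the postcode area could be assigned
    NONE = "none"          # no assignment possible


def match_level(postcode_found: bool, street_found: bool,
                house_found: bool) -> MatchLevel:
    """Return the finest spatial level at which an address was assigned."""
    if postcode_found and street_found and house_found:
        return MatchLevel.HOUSE
    if postcode_found and street_found:
        return MatchLevel.STREET
    if postcode_found:
        return MatchLevel.POSTCODE
    return MatchLevel.NONE


def summarize(levels: Iterable[MatchLevel]) -> Dict[str, Tuple[int, float]]:
    """Count addresses per level as (absolute, percentage)."""
    counts = Counter(levels)
    total = sum(counts.values())
    return {
        level.value: (counts[level],
                      100.0 * counts[level] / total if total else 0.0)
        for level in MatchLevel
    }
```

A summary of this kind covers only the match level; positional quality and deviation from the real address location would need separate fields.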

The greatest influencing factors are the quality (up-to-dateness, completeness, correctness) of the input file and the quality of the geo-reference file (up-to-dateness, completeness, correctness). Furthermore, the geocoding logic with which the input file is compared with the georeference file plays a central role.

Geocoding software / systems

Postal addresses are geocoded with special geocoding systems, also called geocoders for short, which are available as offline and online services. With online geocoding, addresses are transmitted to the service, which raises data protection concerns.

The quality of a geocoder's result depends largely on which reference file and which geocoding logic are used. Because these differ from one geocoder to another (there is no standard here), different geocoders usually produce output files of differing quality from the same input file.

Inverse address geocoding

Inverse geocoding of addresses (also known as reverse geocoding) describes the opposite direction: starting from geo-coordinates, the closest possible address is determined. Inverse address geocoding is playing an increasingly important role because of the growing volume of GPS coordinates, e.g. from cell phones.
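A simple way to illustrate this is a nearest-neighbor search over reference addresses using the haversine great-circle distance. This is a sketch under stated assumptions: the reference list is a plain list of hypothetical (address, latitude, longitude) tuples, the Earth is treated as a sphere of radius 6,371 km, and production systems would typically use a spatial index instead of a linear scan.

```python
import math
from typing import List, Optional, Tuple

ReferencePoint = Tuple[str, float, float]  # (address, latitude, longitude)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a sphere of radius 6,371 km."""
    radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def reverse_geocode(lat: float, lon: float,
                    reference: List[ReferencePoint]
                    ) -> Optional[Tuple[str, float]]:
    """Return (address, distance in meters) of the nearest reference
    address, or None if the reference list is empty."""
    if not reference:
        return None
    address, rlat, rlon = min(
        reference, key=lambda p: haversine_m(lat, lon, p[1], p[2])
    )
    return address, haversine_m(lat, lon, rlat, rlon)
```

Returning the distance alongside the address lets the caller reject matches that lie too far from the input coordinate.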
