Enterprise Search

from Wikipedia, the free encyclopedia

Enterprise search, or company-wide search, is a sub-area of information retrieval. It denotes computer-aided, content-oriented searching with the help of an internal company search engine, which indexes content using so-called crawlers.

The search is usually not performed live on the original data sources but on the search index. This index primarily covers internal data sources, such as documents from various databases and entries from file systems.

Hits, that is, the documents found, are displayed as a text excerpt (“snippet”) in the context of the search query. This preview lets users quickly assess the relevance of the results. Continuous indexing of the individual data sources ensures that the result set is up to date.

From a company's point of view, the benefit of enterprise search is that it supports employees in finding work-relevant information.

Functionality

In most cases, search engines consist of three main components: a crawling/indexing engine, a query engine and a ranking/relevancy engine.

The crawling/indexing engine retrieves the documents and data from the sources and stores this information in an efficiently searchable structure. It also creates the document caches that are used to display the document preview in the result view. The query engine searches the index for hits and creates a list of the results. The ranking/relevancy engine is responsible for sorting, that is, ordering the hits.
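
The interplay of the three components can be illustrated with a minimal sketch. It assumes an in-memory inverted index and a deliberately naive ranking by term frequency; the class `TinySearchEngine` and its method names are hypothetical and do not come from any particular product, and real ranking engines use far more elaborate relevance models.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinySearchEngine:
    """Hypothetical toy engine: indexing, querying and ranking in one class."""

    def __init__(self):
        # term -> {doc_id: number of occurrences in that document}
        self.postings = defaultdict(dict)
        # doc_id -> original text, kept as a cache for result previews
        self.cache = {}

    def index(self, doc_id, text):
        """Indexing step: cache the text and update the inverted index."""
        self.cache[doc_id] = text
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = count

    def query(self, query_text):
        """Query step: ids of documents that contain every query term."""
        terms = tokenize(query_text)
        if not terms:
            return set()
        candidate_sets = [set(self.postings.get(t, {})) for t in terms]
        return set.intersection(*candidate_sets)

    def rank(self, query_text, doc_ids):
        """Ranking step: sort hits by summed term frequency, highest first."""
        terms = tokenize(query_text)

        def score(doc_id):
            return sum(self.postings.get(t, {}).get(doc_id, 0) for t in terms)

        # Ties are broken by document id to keep the order deterministic.
        return sorted(doc_ids, key=lambda d: (-score(d), d))

    def search(self, query_text):
        return self.rank(query_text, self.query(query_text))


engine = TinySearchEngine()
engine.index("a", "Travel expenses policy: travel by train")
engine.index("b", "Expenses report for travel")
engine.index("c", "Cafeteria menu")
print(engine.search("travel expenses"))  # ['a', 'b']
```

Document "a" ranks first because the query terms occur three times in it and only twice in "b". The sketch does not remove stale postings when a document is indexed again, which a continuously updated index would have to handle.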

As a rule, a web browser is used as the interface and the results are displayed in a similar form to that of Internet search engines.

Interfaces

Many enterprise search vendors offer adapters or connectors for widespread corporate applications so that their content can be made available in the search solution. In addition to querying the customer database directly, plug-ins for group e-mail applications and for content or document management systems are typical. Integration as a separate file system (network drive) is also often possible. “Federated search” connectors are also common: they pass the search query on to a target system and then integrate the partial results obtained into the overall results.
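
A federated search of this kind can be sketched as follows. The sketch assumes that every connector is a plain function that takes the query and returns (title, score) pairs, and that the scores of all target systems are already on a comparable scale, which is rarely true in practice. The names `federated_search` and `Connector` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a connector maps a query string to (title, score) pairs.
Connector = Callable[[str], List[Tuple[str, float]]]


def federated_search(query: str, connectors: Dict[str, Connector], limit: int = 10):
    """Pass the query to every target system and merge the partial results."""
    merged = []
    for source, connector in connectors.items():
        try:
            partial = connector(query)
        except Exception:
            # A failing target system should not break the whole search.
            continue
        for title, score in partial:
            merged.append({"source": source, "title": title, "score": score})
    # Highest score first; source and title break ties deterministically.
    merged.sort(key=lambda hit: (-hit["score"], hit["source"], hit["title"]))
    return merged[:limit]
```

Skipping a failing connector is a design choice made here for robustness; a production system would more likely log the failure and tell the user that the result list is incomplete.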

Components

A general distinction is made between front end and back end.

In addition to the individual connectors, the back end typically contains the crawler, the indexer and the parser for the search queries submitted by the various front ends. These queries are forwarded to the actual search engine, which returns the information from the indexed data.

The front end generally allows greater freedom of design. It can be a simple input field, or it can offer more convenience, for example by suggesting a correction for a suspected typing error, displaying related subject areas, or letting users navigate through a tag cloud or a faceted classification. Progressively narrowing the number of hits by adding further criteria to the search query or by choosing a sub-term (for example along a taxonomy tree) is also referred to as drill-down. Formatting the results (for example, splitting them across several pages) is typically also done in the front end. The front end usually also includes all pure convenience functions, such as the ability to save search queries and run them again later.
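
Facet navigation and drill-down can be sketched in a few lines. The sketch assumes that each hit is a dictionary of metadata fields; the field names `type` and `department` and both function names are made up for illustration.

```python
from collections import Counter


def facet_counts(hits, field):
    """Count how many hits carry each value of the given metadata field."""
    return Counter(hit[field] for hit in hits if field in hit)


def drill_down(hits, **criteria):
    """Keep only the hits whose metadata matches every given criterion."""
    return [
        hit for hit in hits
        if all(hit.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical result set for the query "budget".
sample_hits = [
    {"title": "Budget 2023", "type": "spreadsheet", "department": "finance"},
    {"title": "Budget memo", "type": "document", "department": "finance"},
    {"title": "Budget talk", "type": "document", "department": "sales"},
]

# The counts are what a front end would show next to each facet value.
type_facets = facet_counts(sample_hits, "type")

# Each additional criterion narrows the result set further.
narrowed = drill_down(sample_hits, type="document", department="finance")
```

Here `type_facets` counts two documents and one spreadsheet, and `narrowed` contains only the hit titled "Budget memo".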

Comparison of corporate search and Internet search

Enterprise search and Internet search use broadly similar techniques and algorithms. Both rely on crawlers, and both share large indexes and the sorting of hits by relevance.

The following differences exist:

Security
To protect information and data against unauthorized access, those responsible must explicitly release their data sources for searching. Access to the information sought must comply with the applicable company rules and data protection guidelines. Integrated rights management ensures that users find only the data they are allowed to access. The permissions that users hold on files and folders must therefore be enforced within the company to prevent misuse of data inside and outside it.
Link structure
The ranking is not influenced by the parameter "number of links to a document". Because relevance cannot be determined from links, metadata gains considerably in importance in enterprise search. Some applications and sources also have their own indexes; using them improves the performance of the search engine and saves valuable processing resources.
Sources
The searchable data comes not only from web servers but also from various other storage locations. These include network drives, intranets, applications, e-mail systems, local data and removable media such as USB sticks or CD-ROM drives.
Content
Content is not optimized or manipulated for indexing by a search engine, and there is no spam. Both structured and unstructured data are therefore suitable for use.
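
The rights management mentioned in the list above is often implemented by filtering the result list against the user's permissions. The following is a minimal sketch under the assumption that the index stores, for each document, the set of groups allowed to read it; `IndexedDocument`, `allowed_groups` and `trim_results` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDocument:
    doc_id: str
    title: str
    # Assumption: groups allowed to read the document, copied from the source system.
    allowed_groups: frozenset = frozenset()


def trim_results(hits, user_groups):
    """Return only the hits that share at least one group with the user."""
    user_groups = set(user_groups)
    # Documents without any allowed group are never shown (deny by default).
    return [hit for hit in hits if hit.allowed_groups & user_groups]
```

The sketch denies access by default: a document whose permissions were not captured during indexing stays hidden instead of being shown to everyone.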

Comparison between company search engine and database

In contrast to databases, whose purpose is to manage structured content, search engines are mainly used to make unstructured content accessible. There is also a big difference in the number of sources searched: enterprise search can search several different sources, whereas database queries are usually limited to one. The query language for search architectures is much simpler, since keywords can simply be entered and no database query language such as SQL is needed. In addition, search engines are many times faster: a query usually takes at most one second, whereas complex database queries can take several hours.

Current situation

In the latest update of their study The Diverse and Exploding Digital Universe, the market researchers at IDC forecast a veritable explosion in the amount and variety of digital information. According to the study, the volume of digital information is growing by 60 percent annually and should reach around 1,800 exabytes by 2011 (one exabyte is 10 to the power of 18 bytes), which would be ten times as much as in 2006.

According to IDC, individuals are responsible for 70 percent of this data growth. Nevertheless, the IT departments of organizations and companies are involved in storing, providing, transmitting and protecting around 85 percent of the resulting data. This rapidly growing and multifaceted flood of data confronts IT managers with an unprecedented level of complexity. Out of necessity, many companies try to keep the uncontrolled growth in check with uniform, central systems for data management and storage. According to Juergen Lange, however, document management system (DMS) solutions reach their limits very quickly. As a consequence, it is becoming increasingly difficult for employees to get the information they need.

As a result, searching for and finding information is becoming a key factor in the survival of companies. Compliance with security-relevant regulations plays a key role here. While this should be a matter of course for enterprise search solutions, most of the search engine software offered free of charge has gaps: after installation, such programs create a complete table of contents in a database on the computer, in which they store data content and application behavior. These search engines then officially transmit the reports to the outside world.

Providers of such solutions give assurances that they transmit no personal data, only movement and behavioral data, but the data protection guidelines they apply usually remain their secret. As a result, the security mechanisms of many companies are often ineffective once such software is installed. An index that covers the first ten thousand words of a document often reproduces its complete content. Such knowledge held outside the German or European legal area carries an incalculable business risk; theft of and trade in information is a lucrative market.

Compared with the USA, Germany and Europe have relatively little know-how and expertise in enterprise search solutions. Only a few German companies and European research projects have mastered this key technology. Politicians are called upon here to support German SMEs. In addition, the courts must urge foreign providers to respect national and European data protection guidelines.

Literature

  • Martin White: Making Search Work. Implementing Web, Intranet and Enterprise Search. Facet Publishing, London 2007, ISBN 978-1-85604-602-2.
  • Juergen Lange: Flood of data - curse or blessing? How to find information easily and securely with Enterprise Search. A strategic tool for companies. Frankfurter Allgemeine Buch, Frankfurt am Main 2009, ISBN 978-3-89981-196-4.
  • Julian Bahrs: Enterprise Search - Search engines for company content. In: Dirk Lewandowski (ed.): Handbook Internet search engines. User orientation in science and practice. Akademische Verlagsgesellschaft AKA, 2009, ISBN 978-3-89838-607-4, pp. 329-355.

References

  1. Udo Kruschwitz, Charlie Hull: Searching the Enterprise. In: Foundations and Trends in Information Retrieval, 11, 2017, pp. 1–142, doi:10.1561/1500000053
  2. The Diverse and Exploding Digital Universe (Memento of the original from April 4, 2013 in the Internet Archive) (PDF; 452 kB)