Desktop search

A desktop search, also called file search, searches the entire local computer, i.e. its hard drives and SSDs. For mobile devices, one speaks more generally of file system search. If network drives and file servers in a computer network are searched as well, the term network search (network-wide search) is used. All graphical operating systems include a desktop search, and there are also dedicated desktop search programs.

In contrast, web search engines such as Google, Bing or Yahoo are used to search the Internet.

"Classic" search and indexed search

A distinction is made between classic search, which searches the file system directly on every query, and index-based search, in which all of the computer's data is read, filtered and indexed at regular intervals and the results are stored in a database. A query then searches only this database, not the entire file system, so hits can be displayed almost in real time.
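To make the difference concrete, here is a minimal Python sketch that assumes file-name search only and uses an in-memory dictionary as the "database". The function names `classic_search`, `build_index` and `indexed_search` are hypothetical. The classic variant walks the directory tree on every query. The index-based variant walks it once, stores each word of each file name together with its paths, and afterwards answers queries with a dictionary lookup that does not touch the disk.

```python
import os
import re
import tempfile
from collections import defaultdict

# Words are runs of letters/digits: "Holiday-plan.odt" -> {"holiday", "plan", "odt"}.
WORD = re.compile(r"[a-z0-9]+")


def name_words(name):
    return set(WORD.findall(name.lower()))


def classic_search(root, word):
    """Classic search: walk the whole tree again for every single query."""
    word = word.lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if word in name_words(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


def build_index(root):
    """Index-based search, step 1: read the tree once (e.g. periodically)."""
    index = defaultdict(set)  # word -> set of paths whose file name contains it
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            for word in name_words(name):
                index[word].add(path)
    return index


def indexed_search(index, word):
    """Index-based search, step 2: answer from the index, no disk access."""
    return sorted(index.get(word.lower(), ()))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        for rel in ("holiday_photos.txt", "report-2011.pdf",
                    os.path.join("sub", "Holiday-plan.odt")):
            open(os.path.join(root, rel), "w").close()
        index = build_index(root)
        print(classic_search(root, "holiday"))
        print(indexed_search(index, "holiday"))
```

Both functions return the same hits. The indexed variant, however, pays the cost of walking the file system only when the index is rebuilt, not on every query.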

The forerunner and pioneer of index-based desktop search systems is the Unix tool locate. This classic command-line program searches an index of the file system that is updated at regular intervals by a cron job. Originally, locate only supported searching for file names. Over time, however, it has been extended considerably, including support for regular expressions. Even today (2011) it is an important tool. Compared to modern desktop search, however, locate has a decisive disadvantage: it only covers file names, not their context, file types or content (pure file search). Nor does locate react to changes in the file system in real time. It sees them only after the file system index has been updated, so the results it returns may be out of date.
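The behaviour described for locate, including the stale results, can be imitated in a few lines. This is a minimal Python sketch and not the real locate/updatedb code: the hypothetical functions `updatedb` and `locate` keep the snapshot in a plain list, whereas real implementations store a compact database file that a cron job refreshes.

```python
import os
import re
import tempfile


def updatedb(root):
    """Snapshot every path below root, as a periodic (cron) job would."""
    snapshot = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            snapshot.append(os.path.join(dirpath, name))
    return snapshot


def locate(snapshot, pattern, regex=False):
    """Match paths only; file contents and file types are never looked at."""
    if regex:
        compiled = re.compile(pattern)
        return [p for p in snapshot if compiled.search(p)]
    return [p for p in snapshot if pattern in p]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "notes.txt"), "w").close()
        db = updatedb(root)                        # index taken at this moment
        open(os.path.join(root, "new.txt"), "w").close()
        os.remove(os.path.join(root, "notes.txt"))
        print(locate(db, "new.txt"))               # empty: index predates the file
        print(locate(db, "notes.txt"))             # still listed although deleted
        print(locate(updatedb(root), "new.txt"))   # found after re-indexing
```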

Searching the hard drive anew for every query was very common in the past, for example with find on Unix systems. Such a search creates a very high system load because, depending on the query, every file name, and sometimes the content of every file, must be compared with the search pattern. This high load, especially on battery-powered computers, together with the resulting waiting time, is hardly acceptable in practice, particularly since searches are usually limited to a defined group of files (e.g. all IRC logs). In addition, this type of search is rather unspecific and either cannot be extended at all or only with knowledge beyond beginner level (e.g. the obvious combination of grep with find). With grep under Unix, find under IBM PC DOS and Windows, and similar programs, genuine content-based search (full-text search) became possible, albeit largely limited to plain text files.
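For comparison, a brute-force content search in the style of `find ROOT -name '*.log' -exec grep -l PATTERN {} +` can be sketched in Python as below. It is a simplified illustration, and the function name `find_grep` is hypothetical. Every file whose name matches is opened and scanned again on each query, which is where the system load and waiting time come from. The returned counter of files read makes that cost visible.

```python
import fnmatch
import os
import re
import tempfile


def find_grep(root, name_glob, pattern):
    """Return (matching paths, number of files read) for one query.

    Like find + grep: select files by name, then scan each one's content.
    Nothing is cached, so the next query repeats all of this work.
    """
    regex = re.compile(pattern)
    hits, files_read = [], 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, name_glob):
            path = os.path.join(dirpath, name)
            files_read += 1
            # errors="replace" keeps undecodable bytes from aborting the scan
            with open(path, encoding="utf-8", errors="replace") as fh:
                if any(regex.search(line) for line in fh):
                    hits.append(path)
    return sorted(hits), files_read


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        samples = [("a.log", "ERROR disk full\n"), ("b.log", "all fine\n"), ("c.txt", "ERROR\n")]
        for name, text in samples:
            with open(os.path.join(root, name), "w", encoding="utf-8") as fh:
                fh.write(text)
        print(find_grep(root, "*.log", r"ERROR"))
```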

Therefore, specialized desktop search systems have been developed. They automatically index a predefined group of files (for example all files in the respective user's home directory, or all files in the file system that the user can read). They filter, analyze and sort the contents of these files according to type (for example OpenDocument, PDF, image files, emails etc.) and store the results in their index. The index is informed about changed and newly created files either by the applications themselves or by monitoring the file system (for example via inotify).
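The indexing loop just described (type-specific filters, an index, and change notification) might look roughly like the following Python sketch. Several assumptions apply. Only plain-text filters are implemented: the `FILTERS` table is hypothetical, and a real system would plug in extractors for PDF, OpenDocument, e-mail and so on. Other files are indexed by name only. Instead of inotify, which is Linux-specific, `refresh()` polls modification times, a portable but slower substitute.

```python
import os
import re
import tempfile
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def extract_plain_text(path):
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()


# Hypothetical filter table: file extension -> text extractor.
# A real indexer would register filters for PDF, OpenDocument, e-mail, images...
FILTERS = {".txt": extract_plain_text, ".md": extract_plain_text}


class Indexer:
    def __init__(self, root):
        self.root = root
        self.postings = defaultdict(set)  # term -> paths containing the term
        self.mtimes = {}                  # path -> mtime at last indexing

    def _index_file(self, path):
        # The file name is always indexed; the content only if a filter exists.
        text = os.path.basename(path)
        extractor = FILTERS.get(os.path.splitext(path)[1].lower())
        if extractor is not None:
            text += " " + extractor(path)
        for term in set(TOKEN.findall(text.lower())):
            self.postings[term].add(path)

    def _forget(self, path):
        for paths in self.postings.values():
            paths.discard(path)

    def refresh(self):
        """Re-index new and changed files and drop deleted ones.

        Polling stands in for inotify-style change notifications here.
        """
        current = {}
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                current[path] = os.stat(path).st_mtime_ns
        for path in self.mtimes.keys() - current.keys():
            self._forget(path)
        for path, mtime in current.items():
            if self.mtimes.get(path) != mtime:
                self._forget(path)
                self._index_file(path)
        self.mtimes = current

    def search(self, term):
        return sorted(self.postings.get(term.lower(), ()))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "minutes.txt"), "w", encoding="utf-8") as fh:
            fh.write("Budget meeting")
        indexer = Indexer(root)
        indexer.refresh()
        print(indexer.search("budget"))
```

Queries only read `postings`. The comparatively expensive work of walking the tree and extracting text happens in `refresh()`, which a real system would trigger from change notifications rather than run on demand.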

The user then queries the index using functions built into applications or provided by dedicated programs. Internally, access often takes place through a uniform, standardized database interface (e.g. SQL), which is sometimes also made directly available to the user. This highly specific (for example through regular expressions) and fast (apparently almost instantaneous) search represents a considerable gain in productivity and time. Integrated into the standard file dialogs, it lets users find and select files quickly and efficiently from the search results without spending a long time navigating the file system.
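A query through such an SQL interface, including regular expressions, can be sketched with Python's built-in `sqlite3` module. The `files` table layout, the sample rows and the function `find_files` are hypothetical. SQLite only understands the `REGEXP` operator once the application registers a `regexp()` function, which is done here with Python's `re` module.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite rewrites "X REGEXP Y" as regexp(Y, X): the pattern comes first.
    return value is not None and re.search(pattern, value) is not None


def open_index():
    con = sqlite3.connect(":memory:")
    con.create_function("regexp", 2, regexp)
    con.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, type TEXT, size INTEGER, content TEXT)"
    )
    return con


def find_files(con, file_type, pattern):
    """Combine a metadata filter with a regular-expression full-text filter."""
    rows = con.execute(
        "SELECT path FROM files WHERE type = ? AND content REGEXP ? ORDER BY path",
        (file_type, pattern),
    )
    return [path for (path,) in rows]


if __name__ == "__main__":
    con = open_index()
    con.executemany(
        "INSERT INTO files VALUES (?, ?, ?, ?)",
        [
            ("/home/anna/report.pdf", "pdf", 1200, "Quarterly report 2011"),
            ("/home/anna/notes.txt", "text", 300, "report draft 2011"),
            ("/home/anna/invoice.pdf", "pdf", 800, "Invoice no. 4711"),
        ],
    )
    print(find_files(con, "pdf", r"\b20\d\d\b"))
```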

Indexing external storage locations in distributed systems is more laborious with regard to synchronization, because these locations may not be permanently available. It also requires additional rights management (permission for indexing by other users). Solutions ranging from proxy-server-based to centralized web approaches are suitable here. The same applies to web-based external data storage (cloud computing).

Types of desktop search

System-wide search: Tools of this type include Lookeen Search, Superior Search, Strigi, Meta Tracker, Beagle, Exalead, xfriend, Copernic Desktop Search, formerly Google Desktop, Windows Search, Yahoo Desktop Search, Spotlight, and free and open-source software such as Open Semantic Search, Lucene or Recoll. This type of search aims to provide the most comprehensive and universal view of the files on a system. Support for Google Desktop was discontinued in 2011, so plugins for it are no longer available.
Specific search: Small search fields built into applications are used to search lists, collections, databases etc. (search function). These kinds of search are increasingly being integrated into system-wide search applications.
Absolute search: Every piece of information that a person can interpret is shown in the search. Every application that integrates its information into the search provides an interface to the search application. This method can also be integrated directly into the operating system's file system layer to provide even more intuitive access and to enable associative file management. Functionally, associative file management often overlaps with desktop search, and the two technologies frequently go hand in hand. Current examples of this method include:

Search functions in KDE, which have grown increasingly together since the development of version 4.0. Numerous search functions and interfaces have been developed for KDE, and they cooperate better and better. One example is the close connection between Katapult, Kat and Beagle, and applications such as Amarok.
Search functions in BeOS / Zeta, where associative file management provides both abstraction and convenient access.
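The interface idea behind absolute search, in which each application contributes its own information, can be illustrated with a small provider registry. The sketch below is hypothetical and is not the API of KDE, BeOS or any other system mentioned above. Every application implements a `search()` method and registers with a central hub, and the hub merges all hits into one result list.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Hit:
    source: str  # which application supplied the hit
    title: str


class SearchProvider(Protocol):
    """What an application must offer to take part in the central search."""

    def search(self, query: str) -> Iterable[Hit]: ...


class SearchHub:
    def __init__(self) -> None:
        self._providers: List[SearchProvider] = []

    def register(self, provider: SearchProvider) -> None:
        self._providers.append(provider)

    def search(self, query: str) -> List[Hit]:
        # Ask every registered application and concatenate the answers.
        hits: List[Hit] = []
        for provider in self._providers:
            hits.extend(provider.search(query))
        return hits


class ListProvider:
    """Hypothetical application exposing a simple list of item titles."""

    def __init__(self, name: str, items: List[str]) -> None:
        self.name = name
        self.items = items

    def search(self, query: str) -> List[Hit]:
        q = query.lower()
        return [Hit(self.name, item) for item in self.items if q in item.lower()]


if __name__ == "__main__":
    hub = SearchHub()
    hub.register(ListProvider("contacts", ["Anna Schmidt", "Bob Miller"]))
    hub.register(ListProvider("music", ["Anna's Song", "Blue in Green"]))
    for hit in hub.search("anna"):
        print(hit.source, "->", hit.title)
```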

Opportunities and dangers

The central storage and mapping of a computer's data makes finding it far more intuitive and opens up new ways of working with the data. However, it also harbors dangers. The data is stored centrally and, if some providers (e.g. Google) have their way, even on servers on the Internet. This creates potential for abuse by industrial spies, crackers and governments. For them, such a central data pool is a tempting resource for investigating crimes, but it can also be misused for preventive surveillance of individuals.

Functionality

All of these systems are based on a database that stores the meta information, together with suitable interfaces/APIs for accessing this database. The information in this database can then be presented to the user in various ways.

Representation via search programs

In programming terms, this method is simple and robust. The search is carried out using separate application programs. No serious interventions in the underlying operating system are necessary, but accordingly there is no intuitive integration, for example into file dialogs and the like.

Examples of this method include Open Semantic Desktop Search, Recoll, Lookeen, Google Desktop Search, regain, DocFetcher, xfriend, Windows Search, Copernic Desktop Search, Yahoo Desktop Search, exalead desktop free or Beagle (in its basic configuration).

Integration into the file system layer (associative file management)

Current developments are tending towards associative file management, i.e. representation via the file system layer or the virtual file system (VFS) of the operating system. This can be done in several ways, each with its own advantages and disadvantages. Such integration enables particularly intuitive operation, since ideally the user does not have to distinguish between found data and hierarchically stored data.
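One way to picture "no distinction between found and hierarchically stored data" is a virtual folder whose contents are not stored but computed from a saved query over file metadata. The Python sketch below is a hypothetical model, not a real VFS. `MetadataStore` stands in for the metadata database, and a `VirtualFolder` re-runs its query each time it is listed, so newly indexed files appear in it automatically.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Attrs = Dict[str, str]


@dataclass
class MetadataStore:
    """Stand-in for the desktop search database: path -> metadata."""
    files: Dict[str, Attrs] = field(default_factory=dict)

    def add(self, path: str, **attrs: str) -> None:
        self.files[path] = attrs


@dataclass
class VirtualFolder:
    """A 'folder' defined by a saved query instead of by stored entries."""
    store: MetadataStore
    query: Callable[[Attrs], bool]

    def listdir(self) -> List[str]:
        # Evaluated on every listing, so the contents are always current.
        return sorted(p for p, attrs in self.store.files.items() if self.query(attrs))


if __name__ == "__main__":
    store = MetadataStore()
    store.add("/music/so_what.mp3", genre="jazz", artist="Miles Davis")
    store.add("/docs/thesis.odt", author="Anna")
    jazz = VirtualFolder(store, lambda attrs: attrs.get("genre") == "jazz")
    print(jazz.listdir())
    store.add("/downloads/take_five.mp3", genre="jazz", artist="Dave Brubeck")
    print(jazz.listdir())  # the new file appears without being moved anywhere
```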

A mature implementation of this kind can be found in the BeOS-based operating systems, where it was built on the BFS file system via the operating system's VFS. It offers very fast, user-friendly desktop search from within all applications and has proven extremely stable.

The large free desktop environments KDE and GNOME, as well as the Finder in Apple's macOS, have long offered comparable services via their virtual file systems. Referencing files on the basis of their meta information appears to be an essential approach for the future and could make working with computers considerably easier, especially for inexperienced users. Integrating these functions via such a VFS offers developers numerous advantages. However, it also requires that applications wishing to use these functions do not access the file system directly but go through the corresponding system libraries that use the VFS. There are now approaches that integrate these VFS systems directly into the file system, under Linux for example via FUSE.

Microsoft worked for a time on WinFS, an SQL-based file system layer that would have been ideal for system-wide desktop search. However, the deep integration into the Windows VFS that Microsoft had intended led to unexpected problems, and in June 2006 Microsoft announced that the project would be discontinued.

Search engine worked similarly, but avoided some potential problems by displaying its data in the form of a network drive. The company no longer exists.

References

[1] Open Semantic Search Engine

[2] Recoll Desktop Search

[3] 7.9.2011 Google Desktop: Search tool for PCs is discontinued (Memento from October 24, 2012 in the Internet Archive)

[4] Open Source Desktop Search Engine

[5] Recoll Desktop Search

[6] Lookeen Desktop Search

[7] regain

[8] DocFetcher