Full-text research
Full-text research (often also called full-text search) is the finding of words or groups of words in a large number of files, of the same or different types, on a computer, a server, or the Internet. The material to be searched, mostly text, is indexed beforehand with indexing tools that are either built into the search program or independent of it.
Full-text research combined with full-text indexing is increasingly used to obtain information and to locate it in both known and unknown documents, provided they are available on the media being searched (see also Google). Full-text search serves to find, discover, and extract previously unknown, non-trivial, and important information from large amounts of unstructured text or files, and is therefore also an important part of text mining. It offers an immediate answer to a specific question when systems such as document management or data mining are not available.
In the context of databases, full-text search means that, in addition to the usual SQL query, which requires knowledge of the field structure, searches can also be carried out independently of individual fields.
Full-text search appeared in the mid-1970s. Traditionally, systems were used in which a person had recorded keywords for the text or metadata files that were to be found later (intellectual classification systems). In databases, certain fields were assigned an index so that they could be searched more quickly. If necessary, the database model was adapted accordingly. However, these procedures were for the most part no longer feasible in many areas, since such costly and time-consuming manual work scales poorly to larger data collections. Among others, the search engine Yahoo failed with such an approach in the mid-1990s. In the mid-1970s, however, new search types such as phrase search and wildcard search, as well as ranking procedures, were introduced alongside the classic word search in order to meet the increasing requirements.
Another possibility opened up in relational databases with the introduction of field types that can hold longer texts, such as Memo (Microsoft Access), BLOB (MySQL), or varchar (other SQL databases). If the documents are stored in such fields, the indexing of database tables, which often takes place anyway, can be combined with wildcard (placeholder) searches in the corresponding SQL queries.
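A wildcard query against a text field can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module and an in-memory database; the table name `documents`, the column name `body`, the sample rows, and the helper `search_like` are assumptions made for this example and do not come from any particular system.

```python
import sqlite3

# Hypothetical table and sample rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO documents (body) VALUES (?)",
    [
        ("Full-text search finds words in large collections of files.",),
        ("An index speeds up lookups on a database field.",),
        ("Wildcard search matches parts of a word.",),
    ],
)


def search_like(connection, term):
    """Return the ids of all rows whose body contains `term` as a substring."""
    # '%' on both sides means "any characters before and after the term".
    # Any '%' or '_' inside `term` itself would also act as a wildcard.
    pattern = f"%{term}%"
    rows = connection.execute(
        "SELECT id FROM documents WHERE body LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]


print(search_like(conn, "search"))  # [1, 3]
print(search_like(conn, "INDEX"))   # [2]  (SQLite's LIKE ignores ASCII case by default)
```

In many database systems a pattern that starts with `%` cannot make use of an ordinary field index, so queries of this kind tend to scan every row and become slow on large tables.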
The problem was that SQL queries required knowledge of the relevant syntax, which only a few specialists had. Retrieval systems were therefore developed that, with appropriate training, were suitable for a larger group of users.
With technical progress in information technology and increasing processing speeds, it has become possible to apply full-text search to more files and to larger files. In addition, the entire original text is prepared for fast later retrieval in such a way that every document containing at least one word of the search query can be found. Full-text indexing is used for this, for example in the form of an inverted file. However, this means that documents which match the topic being sought but use other words, for example synonyms, are not found. Nowadays this problem is addressed using ontologies.
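The idea of an inverted file can be sketched in a few lines. This is a simplified illustration, not the layout of any specific search product; the function names `tokenize`, `build_inverted_index`, and `search_any` and the three sample documents are assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_any(index, query):
    """Return the sorted ids of documents containing at least one query word."""
    hits = set()
    for word in tokenize(query):
        hits |= index.get(word, set())
    return sorted(hits)


# Hypothetical sample documents.
documents = {
    1: "The car is parked outside",
    2: "An automobile was sold yesterday",
    3: "Bicycles are parked here",
}
index = build_inverted_index(documents)

print(search_any(index, "car"))         # [1]
print(search_any(index, "car parked"))  # [1, 3]
```

The query for "car" does not return document 2, although "automobile" covers the same topic. This is the synonym limitation described above.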
Depending on the system used, the following search options, among others, may be available:
- Search for a specific word, case-sensitive or not
- Phrase search, for example "With Wikipedia you can"
- Boolean operators: "and", "or", "not"
- Proximity search: words or phrases that are "n" words apart (with a PDF index, fewer than 3 pages apart)
- Wildcard (placeholder) search:
  - "?" for a single letter, for example Ma?er = Mayer / Maier / Mauer ...
  - "*" for any number of letters
  - within database tables with SQL: "SELECT text FROM table WHERE text LIKE '%search term%'"
- Fuzzy or fault-tolerant search
- Thesaurus / synonym search
- Natural language search with relevance sorting: "Find all IT articles in Wikipedia"
- Combinations of the above options
- Macro search: a way to run recurring search queries using predefined macros
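Some of the options listed above can be sketched with the Python standard library. The following is a rough illustration under simplifying assumptions (plain word lists, no index, no ranking); the function names and sample data are hypothetical and do not describe how any particular search system implements these features.

```python
import difflib
import fnmatch
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def wildcard_search(words, pattern):
    """Return words matching a pattern: '?' is one character, '*' any run."""
    # fnmatch also gives '[...]' a special meaning in patterns.
    return [w for w in words if fnmatch.fnmatchcase(w, pattern)]


def phrase_search(text, phrase):
    """Return True if the words of `phrase` occur consecutively in `text`."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def proximity_search(text, first, second, max_distance):
    """Return True if both words occur at most `max_distance` words apart."""
    words = tokenize(text)
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= max_distance for i in first_pos for j in second_pos)


def fuzzy_search(words, term, cutoff=0.8):
    """Return words similar to `term`, tolerating small spelling errors."""
    return difflib.get_close_matches(term, words, n=5, cutoff=cutoff)


names = ["Mayer", "Maier", "Mauer", "Meier", "Mayr"]
sentence = "With Wikipedia you can find many articles"
vocabulary = ["wikipedia", "index", "search", "database"]

print(wildcard_search(names, "Ma?er"))                      # ['Mayer', 'Maier', 'Mauer']
print(wildcard_search(names, "Ma*"))                        # ['Mayer', 'Maier', 'Mauer', 'Mayr']
print(phrase_search(sentence, "Wikipedia you can"))         # True
print(proximity_search(sentence, "wikipedia", "find", 3))   # True
print(proximity_search(sentence, "wikipedia", "find", 2))   # False
print(fuzzy_search(vocabulary, "wikipdia"))                 # ['wikipedia']
```

Each function scans its whole input, which is acceptable for a demonstration but would usually be replaced by index-based lookups for large collections.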