Inverse document frequency

from Wikipedia, the free encyclopedia

The inverse document frequency (IDF) is used in information retrieval to assess how suitable a word or term is for indexing documents, that is, how well it discriminates between them.

A word that occurs frequently, but in only a few documents, is better suited as an index term than one that occurs in almost every document or only very rarely. Together with the term frequency (see tf-idf measure), the inverse document frequency is used to weight words during automatic indexing.

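The inverse document frequency can be calculated as

$$\mathrm{idf}_t = \log \frac{N}{n_t}$$

where $N$ denotes the number of documents and $n_t$ the number of documents that contain the term $t$. As the document frequency $n_t$ increases, the fraction $N / n_t$ becomes smaller, and with it the inverse document frequency.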
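
As a rough illustration, the calculation can be sketched in Python, assuming the common form idf = log(N / n_t), with N the number of documents and n_t the number of documents containing the term. The function name, the toy document collection, and the choice of the natural logarithm are assumptions made for this example; the text above does not fix a logarithm base, and a different base only rescales the values.

```python
import math


def inverse_document_frequency(term, documents):
    """Return log(N / n_t) for `term`, or None if no document contains it.

    `documents` is a list of sets of words (a hypothetical toy representation).
    """
    total = len(documents)  # N: number of documents
    containing = sum(1 for doc in documents if term in doc)  # n_t
    if containing == 0:
        # The fraction N / n_t is undefined when the term never occurs.
        return None
    return math.log(total / containing)  # natural logarithm (assumed base)


# Toy collection: each document is reduced to its set of distinct words.
documents = [
    {"the", "cat", "sat"},
    {"the", "dog", "barked"},
    {"the", "cat", "purred"},
    {"the", "bird", "sang"},
]

for term in ("the", "cat", "bird"):
    print(term, inverse_document_frequency(term, documents))
```

In this collection "the" appears in every document, so its fraction is 1 and its inverse document frequency is 0. "bird" appears in a single document and receives the largest value of the three, with "cat" in between.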

See also