hebOCR


Basic data

Developer: Yaacov Zamir
Current version: 0.11-rc1 (2011-01-15)
Operating system: Linux (macOS)
Programming language: C, C++, Python
Category: Text recognition software
License: GPL 3 (free software)
Website: https://github.com/yaacov/hebocr

hebOCR (formerly HOCR) is free text recognition (OCR) software for Hebrew script, developed by Yaacov Zamir. It is intended especially for ancient religious texts and poetry. It is published as free software, together with its source code, under the GPL. The core of the software is the libhocr program library, which is written in C and C++. Two user interfaces are built on it: hocr-gtk, written by Yuval Tanny in Python with GTK+, which offers a graphical user interface, and the command-line program hocr, which has more capabilities and is intended for automation. qHocr is an additional, external Qt-based graphical frontend. The library also has Python and Perl bindings, through which it can be controlled from scripts, for example.

hebOCR can process texts with Nikud (vowel pointing), which is very important for Hebrew poetry, and can handle complex page layouts. It can read the range of image file formats that GTK+ supports (including PNG, JPEG, TIFF and BMP). In addition, a preprocessing step lets it automatically detect and correct skewed text, deal with specks in the original, and process originals that are very dark, very light or color-cast. Recognition results are output as UTF-8 encoded plain text or in the HTML-based hOCR format.
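The hOCR format mentioned above stores each recognized element as an HTML element whose class names its type (for example ocr_line or ocrx_word) and whose title attribute carries properties such as the bounding box. The following is a minimal sketch, using only the Python standard library, of how such output could be read back. The sample markup, the class HocrWordParser and the functions parse_bbox and extract_words are hypothetical and written for illustration; the sample is not actual hebOCR output, and the sketch assumes that word elements are span elements containing no nested tags.

```python
from html.parser import HTMLParser

# Hypothetical hOCR fragment for illustration only; not actual hebOCR output.
SAMPLE_HOCR = """<html><body>
<div class="ocr_page" title="bbox 0 0 800 600">
<span class="ocr_line" title="bbox 100 50 700 90">
<span class="ocrx_word" title="bbox 560 50 700 90">שלום</span>
<span class="ocrx_word" title="bbox 400 50 540 90; x_wconf 90">עולם</span>
</span>
</div>
</body></html>"""


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    # Properties in the title attribute are separated by semicolons.
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs for every element with class ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            self._in_word = True
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes word spans contain no nested tags, so the first
        # closing span after a word start ends that word.
        if tag == "span" and self._in_word:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._in_word = False


def extract_words(hocr_text):
    """Parse hOCR markup and return a list of (text, bbox) tuples."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE_HOCR):
        print(text, bbox)
```

Because Hebrew is written from right to left, the first word of a line has the largest x coordinates, which is why the sample lists the word at x = 560 before the word at x = 400.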

The first version (0.2.0) was released on August 14, 2005. At the beginning of December 2005, hocr 0.4.6 was included in the unstable branch of the Linux distribution Debian, and in the same month in the testing branch. In Ubuntu, hocr packages have been available since the June 2006 release (Dapper Drake).

Web links

Commons: HOCR - collection of images, videos and audio files

References

  1. hocr.berlios.de/documentation/html (archived from the original on July 10, 2009, in the Internet Archive)
  2. packages.qa.debian.org/h/hocr/news/20051211T224905Z.html
  3. packages.qa.debian.org/h/hocr/news/20051223T220806Z.html
  4. launchpad.net/ubuntu/+source/hocr