CuneiForm

from Wikipedia, the free encyclopedia
CuneiForm
Basic data

Developer: Cognitive Technologies
Current version: 1.1.0 (April 19, 2011)
Operating system: Windows (Linux and FreeBSD ports available)
Programming language: C++, C
Category: Text recognition
License: BSD (free software)
German-language support: Yes
Website: launchpad.net

CuneiForm (English for "cuneiform script") is text recognition software for printed documents from the Russian company Cognitive Technologies (president Olga Anatolyevna Uskova). It is now available as free software.

Features

CuneiForm recognizes printed documents, but not handwriting or similar material, and provides language models for more than 20 languages. It also recognizes complicated table structures well. Results can be saved as RTF, HTML or plain ASCII text, or exported directly to the word processor Word or the spreadsheet program Excel. CuneiForm preserves the document structure and fonts and supports batch processing.

History

CuneiForm was once the market leader in Russia (competing with ABBYY's FineReader) and was bundled with some scanners.

In 1993, Cognitive Technologies signed an OEM contract with the Canadian Corel Corporation. The contract allowed the recognition library to be integrated into the CorelDRAW package, which included it from version 3.0 onward.

In 1996, OCR CuneiForm'96 was released. It was the first text recognition package to use an adaptive recognition method, i.e. a method that combines multifont and font-specific recognition. The software builds an internal replica of the fonts used in the source document from characters that are rendered in recognizable quality. Because the software adapts itself dynamically during recognition, this also allows poorly rendered characters to be recognized. The method significantly increases recognition accuracy.

In 1997, the use of neural networks in recognition was introduced.

Since 1999, the software has been able to preserve the layout of the source document by recreating the arrangement of its elements in the output.

As part of a program intended to make text recognition technology available to everyone, Cognitive Technologies announced on April 2, 2008 that it would eventually release the software entirely as free software. As a first step, after several years without development progress, a freeware version was published on December 12, 2007. In addition, a free web-based text recognition service was launched in June 2008.

As investor and project coordinator, Cognitive Technologies intends to promote the development of a new version of the software. Since the beginning of April 2008, the core of the recognition engine has been freely available under the simplified BSD license, which permits commercial use. On August 30, 2009, the source code of the original user interface was also released.

Cuneiform Linux

Jussi Pakkanen has created a cross-platform version of the software that compiles and runs on Linux, BSD, macOS and Windows. These independent developments are to be merged eventually into Cognitive Technologies' main branch. It is a command-line-only version. Through its integration of ImageMagick it can read a wide variety of file formats; otherwise only uncompressed Windows Bitmap (BMP) files are supported. From version 0.5 onward, the software can also produce output in the hOCR markup format.
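hOCR is an HTML-based format in which OCR results are stored as ordinary elements: the `class` attribute names the layout unit (for example `ocr_line`), and the `title` attribute carries properties such as `bbox x0 y0 x1 y1`. The following rough sketch shows how a downstream tool might consume such output. It uses only Python's standard `html.parser` to collect each `ocr_line` together with its bounding box and text. It assumes that lines are marked with the class `ocr_line` and carry a `bbox` property. `HocrLineParser` and `parse_hocr_lines` are hypothetical names, and the elements CuneiForm actually emits should be checked against its real output.

```python
from html.parser import HTMLParser
import re

# Elements that never receive an end tag in HTML; they must not affect depth.
VOID_TAGS = {"br", "img", "meta", "link", "hr", "input"}
# Assumption: a line's title attribute contains "bbox x0 y0 x1 y1".
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class HocrLineParser(HTMLParser):
    """Hypothetical helper: collects (bbox, text) for every ocr_line element."""

    def __init__(self):
        super().__init__()
        self.lines = []          # list of (bbox tuple or None, text)
        self._depth = 0          # nesting depth of currently open non-void tags
        self._line_depth = None  # depth at which the current ocr_line opened
        self._bbox = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self._depth += 1
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if self._line_depth is None and "ocr_line" in classes:
            match = BBOX_RE.search(attributes.get("title") or "")
            self._bbox = tuple(int(v) for v in match.groups()) if match else None
            self._line_depth = self._depth
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._line_depth is not None and self._depth == self._line_depth:
            # The ocr_line element closes here: join its text, normalize whitespace.
            text = " ".join("".join(self._buf).split())
            self.lines.append((self._bbox, text))
            self._line_depth = None
        self._depth -= 1

    def handle_data(self, data):
        if self._line_depth is not None:
            self._buf.append(data)


def parse_hocr_lines(markup):
    """Return a list of (bbox, text) pairs for the ocr_line elements in markup."""
    parser = HocrLineParser()
    parser.feed(markup)
    parser.close()
    return parser.lines
```

Text nested inside a line, for example in word-level `span` elements, is concatenated and its whitespace is normalized. Characters that are split into separate elements with no whitespace between them would therefore be joined without spaces.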

Front ends

  • YAGF is a Qt 4-based graphical user interface that can read images directly from a scanner via XSane and perform spell checking with libaspell.
  • Cuneiform-Qt is another Qt-based front end.
  • OCRFeeder provides a complete desktop OCR solution (scanning, image processing, page layout analysis and preservation, proofreading, ...) that can also use CuneiForm as a backend.
  • WatchOCR is a free OCR server for PDFs. It uses CuneiForm to create searchable PDFs from PDFs that contain scanned images. Through a web interface, WatchOCR can be configured to automatically convert newly scanned PDFs in a specific folder into searchable PDFs. WatchOCR is available as a Deb package for Ubuntu and as a preconfigured live CD.

Using a script (xsane2cunei), CuneiForm can also be integrated into the XSane scanning software. The command-line program hocr2pdf can use CuneiForm's hOCR output to make image-only PDF files machine-searchable. The command-line tools pdfsandwich and pdfocr automate this process. The Archivista document management system also uses CuneiForm and hocr2pdf to make PDFs machine-searchable.
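The hOCR-to-PDF step described above can be scripted. The following Python sketch chains the two programs with `subprocess.run`. First, `cuneiform` writes hOCR. Then `hocr2pdf` reads that hOCR on standard input, together with the scanned image, and writes a PDF. The options used here (`cuneiform -f hocr -o FILE IMAGE` and `hocr2pdf -i IMAGE -o PDF < FILE`) are assumed from the usage example in the ExactImage hocr2pdf documentation and should be checked against the installed versions. The `-l` language option is passed only when a language code is given. `build_commands`, `make_searchable_pdf` and all file names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_commands(image, pdf, hocr, language=None):
    """Return the two command lines of the pipeline (hypothetical helper).

    Assumption: option names follow the ExactImage hocr2pdf usage example.
    """
    ocr_cmd = ["cuneiform"]
    if language:
        # Assumed cuneiform language option, e.g. "eng" or "ger".
        ocr_cmd += ["-l", language]
    ocr_cmd += ["-f", "hocr", "-o", str(hocr), str(image)]
    # hocr2pdf takes the image via -i and reads the hOCR from standard input.
    pdf_cmd = ["hocr2pdf", "-i", str(image), "-o", str(pdf)]
    return ocr_cmd, pdf_cmd


def make_searchable_pdf(image, pdf, hocr=None, language=None, run=subprocess.run):
    """Run cuneiform, then hocr2pdf; `run` is injectable for testing."""
    image, pdf = Path(image), Path(pdf)
    hocr = Path(hocr) if hocr else image.with_suffix(".hocr")
    ocr_cmd, pdf_cmd = build_commands(image, pdf, hocr, language)
    run(ocr_cmd, check=True)           # step 1: image -> hOCR file
    with hocr.open("rb") as stdin:     # step 2: hOCR on stdin -> PDF
        run(pdf_cmd, stdin=stdin, check=True)
    return pdf
```

A call such as `make_searchable_pdf("scan.tif", "scan.pdf")` writes `scan.hocr` next to the image and then produces `scan.pdf`. Passing `check=True` makes the script stop with an exception if either program fails.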
