Distributed Proofreaders (abbreviated DP or PGDP) is a web-based project that supports the international Project Gutenberg. It was launched in 2000 by Charles Franks. Volunteers use it to proofread books that have been scanned for Project Gutenberg. So far, around 33,500 texts have been proofread.
The workload for each individual proofreader is kept as low as possible by dividing scanned books into individual pages. Following a brute-force approach (here: as many volunteers as possible, each of whom may read only a single page out of the thousands available for proofreading), the project achieves the largest possible total output.
The principle is the same as in distributed computing. The crucial difference is that, instead of a very large number of computers linked to one another via the Internet, a potentially unlimited number of people cooperate via the Internet and, through their proofreading, digitize hundreds of books in a short time.
The roughly 1,400 currently active participants organize themselves voluntarily into teams based on origin or interests. Team Germany, for example, already has almost 500 members working at all levels of DP.
Process of global book digitization
There are basically three phases in the process.
- In the initialization phase, a book is selected by an experienced, long-standing proofreader. The selected book must be in the public domain. The original project follows United States copyright law (texts published up to 1922), while Distributed Proofreaders Europe applies the rule that is largely uniform across Europe: the author of the book must have died more than 70 years ago.
- The initiator then scans every page of the book. The scans cover the entire book, i.e. the cover, table of contents, text and illustrations.
- The pages are then processed with OCR (optical character recognition) software. This produces a first raw text, which still contains a great many errors.
- The data is then uploaded to the Distributed Proofreaders website and put up for discussion in the forum as a new project proposal. After a positive vote, the project is activated for proofreading and becomes available on the website alongside other projects from around the world.
Rounds 1 to 3 ("proofing")
When a volunteer opens a project, one page of the book is displayed: the scanned original page (as an image) appears in the upper half of the screen and the OCR text in the lower half. The proofreader reads the text on the original page and compares it with the OCR text (raw text), correcting scanning errors and adding special characters.
This actual "proofing" takes place in two or three rounds, with each page being processed by different participants in successive rounds. Only experienced proofreaders are admitted to the higher rounds.
Rounds 4 and 5 ("formatting")
In the fourth and fifth rounds, formatting is added (e.g. italics, headings, footnotes). While the entry barrier for the fourth round is relatively low, only experienced participants have access to the fifth round (the second formatting round).
Post-processing
The previously separate pages of the text are then automatically combined into a single document. An experienced proofreader who has attained "post-processor" status completes the layout, including the illustrations: the post-processor adapts and improves them and fills any gaps in the text, then checks that the document agrees completely with the original work. In addition to the mandatory plain-text version, other formats, especially HTML, can also be produced.
This completes the project. The digitized work is published on the Project Gutenberg server (not to be confused with the commercial provider Projekt Gutenberg-DE). Any Internet user can now download and read it, so the work is available to the whole world.
Significance of Distributed Proofreaders
Over time, Distributed Proofreaders (DP) became the largest source of e-texts for Project Gutenberg, and in 2002 it became an official part of Project Gutenberg. As of March 2017, around 33,500 texts had been published through Distributed Proofreaders, up from 19,500 in January 2011. The texts are not limited to any particular subject area; they include, for example, literature, science, sheet music, magazines and popular non-fiction.
On March 9, 2007, Distributed Proofreaders announced the completion and publication of its first 10,000 texts. To celebrate this milestone and to show the diversity of the books edited at DP, a selection of 15 titles was published together:
- Slave Narratives, Oklahoma (A Folk History of Slavery in the United States From Interviews with Former Slaves) by Work Projects Administration (English)
- by Powell, John Wesley (English)
- by Caldecott, Randolph [Illustrator] (English)
- by Serpa Pinto (Portuguese)
- by Smith, E. E. ("Doc") (English)
- by Spyri, Johanna (English)
- by Spyri, Johanna (German)
- by Punch
- by Evelyn, John (English)
- by Thérèse de Dillmont (English)
- by Francisco Ernantez Arana (fl. 1582), translated and edited by Daniel G. Brinton (1837–1899) (English with Central American Indian)
- by Richard Runciman Terry (1864–1938) (English)
- by Burkett, Charles William (English)
- by Carolus Linnaeus (Carl von Linné) (Latin)
Web links
- http://www.pgdp.net - Home page of the original site founded by Charles Franks. Edits texts in all languages that use the Latin alphabet, provided they were published before 1923. The largest and most active DP site.
- http://www.pgdpcanada.net/c/default.php - Distributed Proofreaders Canada. Edits texts published after 1923, provided the author died at least 50 years ago. The newest DP site.
- http://dp.rastko.net/de - Distributed Proofreaders Europe. Edits texts in all European languages. Currently not very active.