Spell checker

From Wikipedia, the free encyclopedia
{{Refimprove|date=July 2008}}
In [[computing]], a '''spell checker''' is an [[application software|application program]] that flags words in a document that may not be [[spelling|spelled]] correctly. Spell checkers may be stand-alone programs capable of operating on a block of text, or they may be part of a larger application, such as a [[word processor]], [[email client]], electronic [[dictionary]], or [[search engine]].

<div class="thumb tright">
<div class="thumbinner" style="width:252px;">
<div style="width:240px; font-family: arial; font-size: 12px; font-weight: bold; background-color: #ffffff">
Eye have a spelling chequer,<br />
It came with my Pea Sea.<br />
It plane lee marks four my revue<br />
Miss Steaks I can knot sea.

Eye strike the quays and type a word <br />
And weight four it two say <br />
Weather eye am write oar wrong <br />
It tells me straight a weigh.

Eye ran this poem threw it,<br />
Your shore real glad two no.<br />
Its vary polished in it's weigh.<br />
My chequer tolled me sew.

A chequer is a bless thing,<br />
It freeze yew lodes of thyme.<br />
It helps me right all stiles of righting,<br />
And aides me when eye rime.

Each frays come posed up on my screen<br />
Eye trussed too bee a joule.<br />
The chequer pours o'er every word<br />
Two cheque sum spelling rule.
</div>
<div class="thumbcaption">
An unsophisticated spell checker will find little or no fault with this poem because it checks words in isolation. A more sophisticated spell checker will make use of word [[n-gram]]s to consider the context in which a word occurs.</div>
</div>
</div>

==Operation==
Simple spell checkers operate on individual words by comparing each of them against the contents of a [[dictionary]], possibly performing [[stemming]] on the word. If the word is not found, it is considered to be an error, and an attempt may be made to suggest the word that was most likely intended. One such suggestion algorithm is to list those words in the dictionary having a small [[Levenshtein distance]] from the original word.
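
The lookup-and-suggest procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of any particular product: the function names <code>levenshtein</code> and <code>check_word</code>, the toy word list, and the cutoff of two edits are all hypothetical, and stemming is omitted. Comparing an unknown word against every dictionary entry is linear in the size of the dictionary, which larger checkers would typically avoid with some form of index.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def check_word(word, dictionary, max_distance=2):
    """Return [] if word is in the dictionary; otherwise return suggestions
    within max_distance edits, sorted by distance and then alphabetically."""
    w = word.lower()
    if w in dictionary:
        return []
    scored = [(levenshtein(w, entry), entry) for entry in dictionary]
    return [entry for dist, entry in sorted(scored) if dist <= max_distance]


# Toy word list for illustration only.
DICTIONARY = {"spell", "spelling", "checker", "word", "words", "sell", "smell"}

print(check_word("checker", DICTIONARY))  # []
print(check_word("speling", DICTIONARY))  # ['spelling']
```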

When a word that is not in the dictionary is encountered, most spell checkers provide an option to add that word to a list of known exceptions that should not be flagged.
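
An exception list can be as simple as a second set of words consulted alongside the dictionary. The Python sketch below uses a hypothetical <code>SpellChecker</code> class whose names are illustrative only; a real program would also need to save the list between sessions, which is not shown.

```python
class SpellChecker:
    """Hypothetical checker combining a fixed dictionary with a user's
    list of known exceptions (an "Add to dictionary" or "Ignore" option)."""

    def __init__(self, dictionary):
        self.dictionary = {w.lower() for w in dictionary}
        self.exceptions = set()  # words the user has chosen to accept

    def add_exception(self, word):
        self.exceptions.add(word.lower())

    def is_flagged(self, word):
        w = word.lower()
        return w not in self.dictionary and w not in self.exceptions


checker = SpellChecker({"the", "spell", "checker"})
print(checker.is_flagged("Aspell"))  # True: not yet known
checker.add_exception("Aspell")
print(checker.is_flagged("Aspell"))  # False: now an accepted exception
```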

==Design==
A spell checker customarily consists of two parts:
# A set of routines for scanning text and extracting words, and
# An algorithm for comparing the extracted words against a known list of correctly spelled words (i.e., the dictionary).
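
The two parts can be illustrated with a short Python sketch. The regular expression used for word extraction is an assumption suited only to plain English text (runs of letters with optional internal apostrophes), and the function names are hypothetical. Each flagged word is reported with its character offset, which a user interface could use to highlight it in place.

```python
import re
from typing import Iterator, List, Tuple

# Part 1: scanning. Letters with optional internal apostrophes count as one
# word (an assumption that suits plain English text only).
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def extract_words(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (character offset, word) for each word in text."""
    for match in WORD_RE.finditer(text):
        yield match.start(), match.group()

# Part 2: comparison against the list of correctly spelled words.
def find_misspellings(text: str, dictionary: set) -> List[Tuple[int, str]]:
    return [(pos, word) for pos, word in extract_words(text)
            if word.lower() not in dictionary]

DICTIONARY = {"a", "spell", "checker", "flags", "words", "doesn't", "it"}
print(find_misspellings("A spell chekcer flags wrods, doesn't it?", DICTIONARY))
# [(8, 'chekcer'), (22, 'wrods')]
```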

The scanning routines sometimes include language-dependent algorithms for handling [[morphology (linguistics)|morphology]]. Even for a lightly inflected language like [[English language|English]], word extraction routines will need to handle such phenomena as [[contraction (grammar)|contraction]]s and [[possessive (linguistics)|possessive]]s. It is unclear whether morphological analysis provides a significant benefit.
[http://company.yandex.ru/articles/iseg-las-vegas.html]
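
As a rough sketch of the handling of possessives and contractions, a checker might look up both the token as written and a form with any possessive ending removed. The helper names, and the choice to store contractions such as "don't" directly in the word list, are assumptions made for illustration; genuine morphological analysis is considerably more involved.

```python
def lookup_forms(token):
    """Forms of one extracted token to look up in the word list.
    A possessive ending is stripped ("checker's" -> "checker",
    "checkers'" -> "checkers"); contractions such as "don't" are kept
    whole and are assumed to be listed in the dictionary itself."""
    t = token.lower().replace("\u2019", "'")  # treat a curly apostrophe as a straight one
    forms = [t]
    if t.endswith("'s"):
        forms.append(t[:-2])
    elif t.endswith("s'"):
        forms.append(t[:-1])
    return forms

def is_known(token, dictionary):
    return any(form in dictionary for form in lookup_forms(token))

DICTIONARY = {"checker", "checkers", "don't", "it's"}
for token in ["checker's", "checkers'", "don\u2019t", "chequer's"]:
    print(token, is_known(token, DICTIONARY))  # True, True, True, False
```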

The word list might contain just a list of words, or it might also contain additional information, such as hyphenation points or lexical and grammatical attributes.
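
A word list that carries extra information can be modeled as a mapping from each word to a record. In the hypothetical Python sketch below, each <code>Entry</code> stores hyphenation points as character offsets together with a part-of-speech tag; the field names and sample entries are illustrative assumptions, not the format of any real dictionary. Membership testing works exactly as with a plain word list.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Entry:
    word: str
    hyphenation: List[int] = field(default_factory=list)  # offsets where a line may be broken
    part_of_speech: str = ""                               # a lexical/grammatical attribute

def hyphenate(entry: Entry) -> str:
    """Show the word with its permitted hyphenation points marked by '-'."""
    pieces, start = [], 0
    for point in entry.hyphenation:
        pieces.append(entry.word[start:point])
        start = point
    pieces.append(entry.word[start:])
    return "-".join(pieces)

# Illustrative entries only.
LEXICON = {
    "spelling": Entry("spelling", [5], "noun"),
    "word": Entry("word", [], "noun"),
}

print("spelling" in LEXICON)           # True: plain membership test still works
print(hyphenate(LEXICON["spelling"]))  # spell-ing
```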

As an adjunct to these two components, the program's [[user interface]] allows users to approve replacements and modify the program's operation.

One exception to the above paradigm is spell checkers that rely solely on statistical information, for instance [[n-gram]]s. This approach usually requires considerable effort to obtain sufficient statistical information and may require much more runtime storage. These methods are not currently in general use. In some cases, spell checkers use a fixed list of misspellings and suggestions for those misspellings; this less flexible approach is often used in paper-based correction methods, such as the ''see also'' entries of encyclopedias.
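
One possible form of purely statistical checking uses character [[n-gram]]s instead of a dictionary: trigram frequencies are collected from a body of text, and a word containing a trigram that was never (or only rarely) seen is treated as suspicious. The Python sketch below assumes a tiny training corpus and a threshold of one occurrence, both chosen only for illustration; a usable model would need a very large corpus, which is where the storage cost comes from.

```python
from collections import Counter

def trigrams(word):
    """Character trigrams, with padding so word boundaries are represented."""
    padded = f"  {word.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train(corpus_words):
    """Count every trigram occurring in the training words."""
    counts = Counter()
    for w in corpus_words:
        counts.update(trigrams(w))
    return counts

def suspicious(word, counts, min_count=1):
    """Flag a word if any of its trigrams occurred fewer than min_count times."""
    return any(counts[t] < min_count for t in trigrams(word))

corpus = "the spell checker flags words that may not be spelled correctly".split()
counts = train(corpus)
print(suspicious("spelled", counts))  # False: every trigram was seen
print(suspicious("spqlled", counts))  # True: "spq", "pql" and "qll" were never seen
```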

==History==
Spell checkers first became widely available on mainframe computers in the late 1970s. A group of six linguists from [[Georgetown University]] developed the first spell-check system for IBM<ref>[http://cled.georgetown.edu/faculty/ Faculty & Staff: The Center for Language, Education & Development<!-- Bot generated title -->]</ref>. The first spell checkers for personal computers appeared for [[CP/M]] and [[TRS-80]] computers in 1980, followed by packages for the [[IBM PC]] after it was introduced in 1981. Developers such as Maria Mariani, Soft-Art, Microlytics, Proximity, Circle Noetics, and Reference Software rushed [[Original equipment manufacturer|OEM]] packages or end-user products into the rapidly expanding software market, primarily for the PC but also for [[Apple Macintosh]], [[VAX]], and [[Unix]]. On PCs, these spell checkers were standalone programs, many of which could be run in [[Terminate and Stay Resident|TSR]] mode from within word-processing packages on machines with sufficient memory.

However, the market for standalone packages was short-lived, as by the mid-1980s developers of popular word-processing packages like [[WordStar]] and [[WordPerfect]] had incorporated spell checkers in their packages, mostly licensed from the above companies, who quickly expanded support from just [[English language|English]] to [[Europe]]an and eventually even [[Asian language]]s. This required increasing sophistication in the morphology routines of the software, particularly with regard to heavily inflected languages like [[Hungarian language|Hungarian]] and [[Finnish language|Finnish]]. Although the size of the word-processing market in a country like [[Iceland]] might not have justified the investment of implementing a spell checker, companies like WordPerfect nonetheless strove to localize their software for as many national markets as possible as part of their global [[marketing]] strategy.

Recently, spell checking has moved beyond word processors. The [[web browser]] [[Firefox]] 2.0 supports spell checking of user-written content, such as Wikitext being edited or text written on many [[webmail]] sites, [[blogs]], and [[social networking]] websites. The web browsers [[Konqueror]] and [[Opera (web browser)|Opera]], the email client [[Kmail]] and the [[instant messaging]] [[client (computing)|client]] [[Pidgin (software)|Pidgin]] also offer spell checking support, transparently using [[GNU Aspell]] as their engine.
[[Mac OS X]] now has spell checking in virtually all bundled applications, and many third-party applications take advantage of this as well; Safari, Mail, and iChat, among others, have spell-checking capability.

== Functionality ==
The first spell checkers were "verifiers" instead of "correctors": they offered no suggestions for incorrectly spelled words. This was helpful for [[typos]] but less helpful for logical or phonetic errors. The challenge developers faced was offering useful suggestions for misspelled words, which requires reducing words to a skeletal form and applying pattern-matching algorithms.
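
The idea of a skeletal form can be illustrated with a deliberately crude key: keep the first letter, drop later vowels, and skip any consonant identical to the previous kept letter. Dictionary words are indexed by their key, so a misspelling that shares a key with a correct word retrieves it as a suggestion. The <code>skeleton</code> rule below is a hypothetical simplification for illustration and does not reproduce the key used by any specific product.

```python
from collections import defaultdict

VOWELS = set("aeiouy")

def skeleton(word):
    """Hypothetical skeletal key: keep the first letter, drop later vowels,
    and skip a consonant identical to the previous kept letter."""
    w = word.lower()
    if not w:
        return ""
    key = [w[0]]
    for ch in w[1:]:
        if ch in VOWELS or ch == key[-1]:
            continue
        key.append(ch)
    return "".join(key)

def build_index(dictionary):
    """Map each skeletal key to the dictionary words that share it."""
    index = defaultdict(list)
    for entry in dictionary:
        index[skeleton(entry)].append(entry)
    return index

index = build_index(["spelling", "spilling", "separate", "receive"])
print(skeleton("spelling"))         # splng
print(index[skeleton("speling")])   # ['spelling', 'spilling']
print(index[skeleton("seperate")])  # ['separate']
print(index[skeleton("recieve")])   # ['receive']
```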

It might seem logical that where spell-checking dictionaries are concerned, "the bigger, the better," so that correct words are not marked as incorrect. In practice, however, an optimal size for English appears to be around 90,000 entries. With more entries than this, misspelled words may be skipped because they match other, rarely used words in the dictionary. For example, a linguist might determine on the basis of [[corpus linguistics]] that the word ''[[baht]]'' is more frequently a misspelling of ''bath'' or ''bat'' than a reference to the Thai currency. Hence, it would typically be more useful if a few people who write about Thai currency were slightly inconvenienced than if the spelling errors of the many more people who discuss baths were overlooked.
[[Image:Spell check.PNG|right|thumb|A screenshot of the Abiword spell checker]]
The first MS-DOS spell checkers were mostly used in proofing mode from within word processing packages. After preparing a document, a user scanned the text looking for misspellings. Later, however, batch processing was offered in such packages as [[Oracle Corporation|Oracle]]'s short-lived [[CoAuthor]]. This allowed a user to view the results after a document was processed and correct only the words that he or she knew to be wrong. When memory and processing power became abundant, spell checking was performed in the background in an interactive way, as in Sector Software's Spellbound program, released in 1987, and in [[Microsoft Word]] since Word 95.

In recent years, spell checkers have become increasingly sophisticated; some are now capable of recognizing simple [[grammatical]] errors. However, even at their best, they rarely catch all the errors in a text (such as [[homonym]] errors) and will flag [[neologism]]s and foreign words as misspellings.

== Spell-checking other languages ==

English is unusual in that most words used in formal writing have a single spelling that can be found in a typical dictionary, with the exception of some jargon and modified words. In many languages, however, words are frequently combined in new ways; in German, for example, compound nouns are frequently coined from other existing nouns. Some scripts do not clearly separate one word from another, requiring word-splitting algorithms. Each of these presents unique challenges to non-English language spell checkers.
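
Compound splitting can be sketched as a dynamic-programming search for the segmentation of a word into the fewest dictionary entries; the same idea applies to text written in scripts without spaces. The Python sketch below assumes a tiny hypothetical German word list and ignores linking elements (such as the ''s'' in ''Arbeitsplatz''), which a real German checker would have to handle.

```python
def split_compound(word, dictionary):
    """Split word into the fewest dictionary entries, or return None.
    best[i] holds the shortest segmentation found for word[:i]."""
    w = word.lower()
    best = [None] * (len(w) + 1)
    best[0] = []
    for end in range(1, len(w) + 1):
        for start in range(end):
            piece = w[start:end]
            if best[start] is not None and piece in dictionary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(w)]

# Hypothetical toy word list.
GERMAN = {"donau", "dampf", "schiff", "haus", "tür"}
print(split_compound("Donaudampfschiff", GERMAN))  # ['donau', 'dampf', 'schiff']
print(split_compound("Haustür", GERMAN))           # ['haus', 'tür']
print(split_compound("Hausbot", GERMAN))           # None
```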

==Context-sensitive spell checkers==

Recently, research has focused on developing algorithms that are capable of recognizing a misspelled word, even if the word itself is in the vocabulary, based on the context of the surrounding words. Not only does this allow words such as those in the poem above to be caught, but it also mitigates the detrimental effect of enlarging dictionaries, allowing more words to be recognized. The most common errors caught by such a system are [[homophone]] errors, such as the bold words in the following sentence:
:'''Their''' coming '''too''' '''sea''' if '''its''' '''reel'''.

The most successful algorithm to date is Andrew Golding and Dan Roth's "Winnow-based spelling correction algorithm" <ref>[http://www.springerlink.com/content/u13k033301184r82/ SpringerLink - Journal Article<!-- Bot generated title -->]</ref>, published in 1999, which is able to recognize about 96% of context-sensitive spelling errors, in addition to ordinary non-word spelling errors. A [[Context sensitive user interface|context-sensitive]] spell checker appears in [[Microsoft Office 2007]]<ref>[http://blogs.msdn.com/correcteurorthographiqueoffice/archive/2006/06/05/617653.aspx CorrecteurOrthographiqueOffice : Contextual spelling in the 2007 Microsoft Office system<!-- Bot generated title -->]</ref>.
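
Golding and Roth's method trains a Winnow-based classifier on features of the surrounding context and is not reproduced here. As a much simpler illustration of the same goal, the Python sketch below swaps in word-bigram counts: for each word that belongs to a set of commonly confused words, it checks whether another member of the set fits the neighboring words better in a reference corpus. The confusion sets, the corpus, and the scoring rule are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical confusion sets: words that are easily substituted for one another.
CONFUSION_SETS = [{"their", "there", "they're"}, {"to", "too", "two"}, {"sea", "see"}]

def bigram_counts(corpus_text):
    """Count adjacent word pairs in a whitespace-tokenized reference corpus."""
    words = corpus_text.lower().split()
    return Counter(zip(words, words[1:]))

def fit(prev, word, nxt, counts):
    """How often word appears next to the given neighbors in the corpus."""
    return counts[(prev, word)] + counts[(word, nxt)]

def check_context(sentence, counts):
    """Return (index, written, suggested) for each confusable word whose
    alternative fits the neighboring words strictly better."""
    words = sentence.lower().split()
    flags = []
    for i, word in enumerate(words):
        prev = words[i - 1] if i > 0 else "<s>"
        nxt = words[i + 1] if i + 1 < len(words) else "</s>"
        for group in CONFUSION_SETS:
            if word in group:
                # sorted() makes tie-breaking deterministic
                best = max(sorted(group), key=lambda c: fit(prev, c, nxt, counts))
                if fit(prev, best, nxt, counts) > fit(prev, word, nxt, counts):
                    flags.append((i, word, best))
    return flags

# Toy reference corpus; a real system would need far more text.
CORPUS = "i want to see it . we want to see the sea . it is too late"
counts = bigram_counts(CORPUS)

print(check_context("i want too see it", counts))  # [(2, 'too', 'to')]
print(check_context("we saw the see", counts))     # [(3, 'see', 'sea')]
```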

== Criticism ==

Some critics of technology and computers have attempted to link spell checkers to a loss of skill in writing, reading, and speaking. They claim that the convenience of computers has made people lazy, so that written work is often not proofread beyond a single pass by a spell checker. Supporters respond that these changes may actually benefit society by making writing and learning new languages more accessible to the general public, and that the skills lost to automated spell checking are being replaced by better skills, such as faster and more efficient research. Other supporters of technology point out that these skills are not being lost among people who need and use them regularly, such as authors, critics, and language professionals<ref>Baase, Sara. A Gift of Fire: Social, Legal, and Ethical Issues for Computing and the Internet. 3. Upper Saddle River: Prentice Hall, 2007. Pages 357-358. ISBN 0-13-600848-8.</ref>.

A good example of the problem with relying entirely on spell checkers is the "[http://www.paulhensel.org/teachspell.html Spell-checker Poem]" shown above, which was originally composed by Dr. Jerrold H. Zar in 1991, with help from Mark Eckman<ref>http://grammar.about.com/od/spelling/a/spellcheck.htm. Accessed on July 31, 2008.</ref>. The original poem was 225 words long and contained 123 incorrectly used words. Most spell checkers accept the poem as valid, even though a reader can tell at a glance that most of its words are used incorrectly.

== See also ==
*[[Nearest neighbor (pattern recognition)]]
*[[Record linkage problem]]
*[[Spelling suggestion]]
*[[Grammar checker]]
*{{selfref|[[:Category:Spelling checking programs]]|}}

==References==
<references />

== External links ==
<!-- Please don't gum this up with your favorite spelling checker. Just one link should hopefully be enough. -->
*[http://citeseer.ist.psu.edu/context/167352/0 Computer Programs for Detecting and Correcting Spelling Errors]

[[Category:Spelling checking programs|*]]
[[Category:Text editor features]]
[[Category:Spelling]]

[[af:Speltoetser]]
[[da:Stavekontrol]]
[[de:Rechtschreibprüfung]]
[[fr:Correcteur (informatique)]]
[[it:Controllo ortografico]]
[[lb:Spellchecker]]
[[ms:Penyemak ejaan]]
[[nl:Spellingscontrole]]
[[ja:スペルチェッカ]]
[[nn:Retteprogram]]
[[pl:Korektor pisowni]]
[[ru:Система проверки правописания]]
[[fi:Oikeinkirjoituksen tarkistin]]
[[ta:சொல் திருத்தி]]
[[wa:Coridjrece]]
[[zh:拼写检查]]

Revision as of 20:31, 6 October 2008