Wikipedia:Bots/Requests for approval/Spelian

This is an old revision of this page, as edited by WikipedianProlific at 23:17, 21 March 2007 (→Discussion: reply.). It may differ significantly from the current revision.

Operator: WikipedianProlific

Automatic or Manually Assisted: Automatic but supervised.

Programming Language(s): AWB

Function Summary: Replacing common misspellings which cannot be the result of anything other than unintentional error.

Edit period(s) (e.g. Continuous, daily, one time run): Daily run lasting several hours.

Edit rate requested: 4 to 5 edits per minute.

Already has a bot flag (Y/N): No


Spelian is a supervised, AWB-based bot intended to trawl Wikipedia pages and correct specific, recurring spelling errors. For example, the word prominent is very commonly misspelt as prominant, which is incorrect, and it is highly unlikely that the misspelling is ever intentional. Using AWB, the operator (WikipedianProlific) selects a word and does a Google search for it. Five of the resulting pages are then selected at random as a sample to check that the misspelling is not intentional. As a rough guide, some misspellings have around 100 occurrences while others have 2000. AWB is then set up and run automatically, allowing Spelian to trawl through the offending articles and correct them as it goes. This process is much faster than a user manually checking every single page prior to editing. AWB will not make any automatic changes other than changing the spelling of the one word being run, because on occasion AWB can reformat or alter words, pages and links for the worse. As a precaution, records of all lists run will be kept in the extremely unlikely event that a mass revert is required. To ensure that intentional misspellings aren't picked up, the bot will have the page 'list of common misspellings' removed from its list, and word selection will be based on strict criteria. These criteria can be found on the bot's userpage.
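The workflow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how AWB works internally: the function names (`correct_word`, `pick_sample`, `run`) and the in-memory `pages` dictionary are assumptions made for the example. It replaces one misspelling as a whole word only, picks a random sample of titles for a manual check, and keeps a log of changed pages that could support a mass revert.

```python
import random
import re


def build_pattern(misspelling):
    """Match the misspelling only as a whole word, ignoring case."""
    return re.compile(r"\b" + re.escape(misspelling) + r"\b", re.IGNORECASE)


def match_case(original, replacement):
    """Copy simple capitalisation (UPPER or Capitalised) from the original word."""
    if original.isupper():
        return replacement.upper()
    if original[0].isupper():
        return replacement[0].upper() + replacement[1:]
    return replacement


def correct_word(text, misspelling, correction):
    """Replace one misspelling and nothing else; return (new_text, count)."""
    pattern = build_pattern(misspelling)
    return pattern.subn(lambda m: match_case(m.group(0), correction), text)


def pick_sample(titles, size=5, seed=None):
    """Choose up to `size` titles at random for a manual check before the run."""
    rng = random.Random(seed)
    return rng.sample(titles, min(size, len(titles)))


def run(pages, misspelling, correction):
    """Apply the correction to every page; return new pages and a revert log."""
    corrected, log = {}, []
    for title, text in pages.items():
        new_text, count = correct_word(text, misspelling, correction)
        corrected[title] = new_text
        if count:
            # Record every changed page so the run can be reverted later.
            log.append((title, count))
    return corrected, log
```

Because the pattern is anchored on word boundaries, a longer word such as "prominantly" is left untouched, which mirrors the stated rule that only the one word being run is changed.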

Discussion

My position on automatic spelling correction bots is that they should always be run fully manually, to avoid breaking product names, template calls and links, or even subverting the meaning of the text. Sometimes an incorrect spelling is appropriate (e.g. in articles about bad spelling, or in almost any sort of article where scientific terms or quotes use a bad spelling), hence my feeling that such a bot needs to be *fully manual*. Martinp23 18:34, 21 March 2007 (UTC)

I see there is a potential for an automatic spelling replacer to make an erroneous alteration to an article, and hence the policy of usually not allowing such bots. However, Spelian specifically targets misspelt words which are extremely unlikely to be the product of anything other than unintentional user error, rather than targeting a blanket of generic spelling mistakes. A good example of a word that Spelian might target is pejorative. It is very often misspelt perjorative due to user misunderstanding of the correct spelling. It is also highly unlikely (as much as one can be sure) that the word is an intentional misspelling. I appreciate that some users will have concerns about a high-volume bot like this. Would you support a trial run of perhaps 3 or 4 words, each of which has no more than, say, 50-100 erroneous occurrences? Lists of the changed pages will be kept just in case an automatic mass revert is needed. Thanks. WikipedianProlific(Talk) 18:46, 21 March 2007 (UTC)
Note that I am not a WP:BAG member, so I would be unable to approve a trial off my own bat. As a suggestion, however, would it be possible for you to do a "dry run" of the bot on the word pejorative across 1000 pages, outputting just a list of the pages which it could edit if allowed, without actually correcting the pages? This should provide us with a quick and easy list of pages which could be affected on a run, without any potential collateral damage. On another note, do you have a rough list of words which you plan to correct in the near future? Martinp23 22:19, 21 March 2007 (UTC)
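A dry run of the kind suggested above might look roughly like the following Python sketch. It is not an AWB feature or setting: the `dry_run` function, its `limit` and `context` parameters, and the (title, text) input format are all hypothetical, chosen only to show the idea of reporting candidate pages, with a little surrounding text for review, while editing nothing.

```python
import re


def dry_run(pages, misspelling, limit=1000, context=30):
    """List pages that contain the misspelling without changing any of them.

    `pages` is an iterable of (title, text) pairs; scanning stops after
    `limit` pages. Returns a list of (title, snippets) for every page the
    bot could edit, where each snippet shows the match in context.
    """
    pattern = re.compile(r"\b" + re.escape(misspelling) + r"\b", re.IGNORECASE)
    report = []
    for scanned, (title, text) in enumerate(pages):
        if scanned >= limit:
            break
        # Collect the text around each occurrence so a reviewer can judge it.
        snippets = [
            text[max(0, m.start() - context): m.end() + context]
            for m in pattern.finditer(text)
        ]
        if snippets:
            report.append((title, snippets))
    return report
```

The snippets make it easier to spot cases such as quotations, where the misspelling may be deliberate, before any page is actually changed.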
This is likely a bad idea. A manual bot is quite reasonable and probably required. You need to look at the misspelled word in context to determine whether or not the spelling mistake is really a mistake in each specific case. A heuristic like the one you want to use is good, but it won't be perfect, and that's the problem. We don't want to be introducing errors that would be really difficult to catch. -- RM 22:41, 21 March 2007 (UTC)
I do indeed have a list, Martinp23. I actually ran Perjorative --> pejorative manually the other day, and not a single occurrence was anything other than a misspelling. My list is derived from suitable words on Lists of common misspellings. I also appreciate the concerns you have, RM, about the potential risk to presently correct articles, and I have two thoughts on that. Firstly, has anyone tried this before? It may not be as bad as we think it could be, because it is not really a spell checker, more a word replacer out to catch common mistakes. Provided the words are thought about, I think it should be fine. I'm just asking for a trial period in which to test the theory and see what we come out with. It might be that we're very happy with the results. Secondly, let's say it does produce one error. It wouldn't be the first bot out there to produce one or two anomalies. But if it corrects, say, 10,000 articles before it makes that mistake, is it justified? I'm not so sure, but it is something worth mulling over, I think. Comments appreciated, thanks. WikipedianProlific(Talk) 22:53, 21 March 2007 (UTC)
I don't know how much you know about this issue, and considering how often it comes up there should probably be a FAQ written by someone. The problem is that a spelling mistake is sometimes justified, such as in a quotation. I'm not the expert on this issue, but there are other subtle cases where a replacement would be a bad thing. The problem with these types of edits is that they are very hard to detect: reverting from a correctly spelled word to an incorrectly spelled one seems counter-intuitive, so few editors will think to do it. We do not want this type of error when manual spell checkers (which have broad community support) are preferred over automatic bots (which have little if any community support). We don't outright ban auto-spellcheck bots because we believe that some day someone may come up with a bot that is intelligent enough to handle all the cases. If you have not reviewed the previous spellcheck bot requests, I can find you the links if you'd like. -- RM 23:07, 21 March 2007 (UTC)
I have come across several, but I do genuinely think this has some good things to offer. The key to it is carefully selecting the right words to run. However, I see that there are good points on both sides of the argument. Ideally, a trial run on a controlled set of no more than about 300 pages would show what the bot's potential is. However, I can see why even that may be more than WP:BAG are willing to give at this time due to the hazards it presents. WikipedianProlific(Talk) 23:17, 21 March 2007 (UTC)