Wikipedia:Bots/Requests for approval

From Wikipedia, the free encyclopedia
Kotepho (talk | contribs)
Essjay (talk | contribs)

Revision as of 22:19, 16 July 2006

New to bots on Wikipedia? Read these primers!

To run a bot on the English Wikipedia, you must first get it approved. Follow the instructions below to add a request. If you are not familiar with programming, consider asking someone else to run a bot for you.

 Instructions for bot operators

Current requests for approval

  1. This bot is supposed to be manually run.
  2. I will run XaxaBot once, before examining the output and making any code changes. If I write code to upload pages, I will upload one page per test run.
  3. This is a python bot written in the pywikipedia framework.
  4. This bot is only for sandbox purposes. If I'm going to write a bot to help Project Echo, I need to know it's going to work before I submit it formally. I'm a bad programmer (methodologically speaking), so I need to test each line or function of code before moving on. As for importance, I will also be using this bot to test code for bots for the Bot Request page.

More comments: In its current state, it fails the usefulness requirement, but that will necessarily change in time. I can vouch for its harmlessness because I meticulously check output for errors before running a real test that includes an upload action. It will not be a server hog because I will happily use database dumps (they're as good as the real thing, right?).

Basically, just a SandBot. Xaxafrad 07:01, 16 July 2006 (UTC)[reply]
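The run-once-then-inspect workflow described above (examine the output, make code changes, and only upload one page per test run) can be sketched as a dry-run switch. This is a hypothetical illustration, not XaxaBot's actual code; the function name and file naming are assumptions:

```python
def save_page(title, text, dry_run=True):
    """Write generated wikitext to a local file for review when dry_run
    is set; only a live run would upload to the wiki."""
    if dry_run:
        path = "review-%s.txt" % title.replace(" ", "_")
        with open(path, "w") as f:
            f.write(text)
        return "saved %s for manual review" % path
    # A live upload (e.g. via the pywikipedia framework) would go here,
    # and only after the reviewed output looks correct.
    raise NotImplementedError("upload only after the output checks out")
```

Keeping `dry_run` as the default means a forgotten flag produces a file to read rather than an edit to revert.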

Perhaps the test wiki would be a better place to run this -- Tawker 18:50, 16 July 2006 (UTC)[reply]
If you'll only be running it in your sandbox at this point, there is no need to request approval. Do your testing in your sandbox, get it working, and then bring it back to us once it's ready to do what it's programmed to do. And be sure to tell us what it is that it's going to do (you mention Project Echo, but didn't really say what it's going to *do* for the project). Essjay (Talk) 14:44, 16 July 2006 (UTC)[reply]


  • What: orphaning (by linking in most cases) fair use images outside of ns:0
  • How: Based on lists generated by SQL queries, reviewed to exclude some potentially OK pages (Portals, over which there is some debate; Wikipedia:Today's featured article and its subpages; Wikipedia:Recent additions; etc.), using replace.py
  • How often: Probably in batches of ~500 pages, grouped by the type of image (such as {{albumcover}}): a list of images to remove and a list of pages to edit. With 10747 images and 19947 pages it may still take a while. Once this group is done, updates will depend on the frequency of database dumps and/or whenever the toolserver works again and I can wrangle someone into running a report/get an account.

Kotepho 09:19, 12 July 2006 (UTC)[reply]
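The "orphaning (by linking)" edit that replace.py would make here amounts to inserting a colon inside the image syntax, which turns an embedded image into a plain link to its description page. A minimal sketch of that one substitution (the function is illustrative, not Kotepho's actual replace.py configuration):

```python
import re

def orphan_image(wikitext, image_name):
    """Convert embedded uses of one image into links by prefixing a
    colon: [[Image:X.jpg|...]] becomes [[:Image:X.jpg|...]]."""
    pattern = re.compile(r"\[\[(Image:%s)" % re.escape(image_name))
    return pattern.sub(r"[[:\1", wikitext)
```

The colon form is ordinary wikitext: a leading colon before a namespace that would otherwise transclude or embed makes the link textual instead.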

This one looks like it's going to be a magnet for complaints from people who don't understand image use policy, but it does sound necessary. I'd start with a very well-written FAQ page, and leave a talk page message saying what the bot did and why it did it, before I would run/approve it -- Tawker 21:36, 15 July 2006 (UTC)[reply]
Durin's page is quite good. Kotepho 21:49, 16 July 2006 (UTC)[reply]
Isn't this similar in at least some ways to what OrphanBot does? I'd like to hear Carnildo's comments on this, given that he runs OrphanBot and is on the approvals group. Essjay (Talk) 14:52, 16 July 2006 (UTC)[reply]
It is basically the same role. The main reason I brought it up is that, looking at OrphanBot's talk page over time, it does get a lot of complaints (I suspect it's mostly people not familiar with policy, however) - it's just something FRAC. I have no problems with the bot personally and I'll give it the green light for a trial run; I just wanted to make sure Kotepho knew what deep water (complaints-wise) this is :) -- Tawker 18:52, 16 July 2006 (UTC)[reply]
Well, someone has to do it, and I'd likely do at least some of them by hand so the complaints will come either way. Kotepho 21:49, 16 July 2006 (UTC)[reply]


Requests to add a task to an already-approved bot

LDBot

Yo, per the request of Joturnner, I've expanded the functionality of LDBot to create the new subpages for the Current Events portal. An example edit can be seen here.

I'm not sure if I can approve my own bot, as I am in the approvals group, but I see no problem with it, as it's behaving just as the AFD functionality does. It'll create the page at midnight EASTERN, rather than UTC like the AFD functions.

Any questions, or concerns, let me know. --lightdarkness (talk) 18:48, 16 July 2006 (UTC)[reply]

  • Speedy approved, no problems whatsoever. Essjay (Talk) 22:19, 16 July 2006 (UTC)[reply]

Bots in a trial period

Between February and April this year, I made a large number of typo-fixing edits (approximately 12,000 in total). All of these were done manually – every edit was checked before saving – although I have written software similar to AutoWikiBrowser to assist with the process. This software is designed specifically for spellchecking and so, while not as flexible as AWB, has a number of advantages. It reports the changes made in the edit summary, can check articles very quickly (in less than a second), and can easily switch between different corrections (for example, "ther" could be "there", "the" or "other") in a way that AWB cannot. Central to this is a list of over 5000 common errors that I have compiled from various sources, including our own list of common misspellings, the AutoCorrect function of Microsoft Office, other users' AWB settings, and various additions of my own. As I mentioned, I have done an extensive amount of editing with the aid of this software, using my main account. I have recently made further improvements to the software; over the last couple of days I have made a few edits to test these improvements, and I am now satisfied that everything works.

While I believe Wikipedia is now so heavily used that (a) no one person could hog the servers even if they wanted to, and (b) the Recent Changes page is more or less unusable anyway, a couple of users have expressed concerns about the speed of these edits (which reached 10 per minute during quiet periods). Most notably, Simetrical raised the issue during my RfA. As I stated in my response to his question, I was not making any spellchecking edits at that time, but I explained that I would request bot approval should I decide to make high-speed edits in the future. That time has now come; I have created User:GurchBot, and I request permission to resume exactly what I was doing in April, but under a separate account. I will leave the question of whether a bot flag is necessary to you; I am not concerned one way or the other.

Thanks – Gurch 19:45, 15 July 2006 (UTC)[reply]
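The "ther" example above is what separates this tool from a blind find-and-replace: one misspelling can map to several candidate corrections, so a human chooses each time. A toy sketch of such a table (the entries are illustrative, not taken from Gurch's actual 5000-entry list):

```python
# One misspelling may have several plausible corrections, so the
# software must present choices rather than substitute automatically.
CORRECTIONS = {
    "ther": ["there", "the", "other"],
    "teh": ["the"],
    "recieve": ["receive"],
}

def candidates(word):
    """Return the possible corrections for a word, or [] if it is not
    a recognized misspelling."""
    return CORRECTIONS.get(word.lower(), [])
```

A single-candidate entry can still be offered for one-keystroke confirmation, preserving the every-edit-is-checked guarantee.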

As long as you are checking it yourself and ignoring the "sic"s, it seems good to me. Alphachimp talk 23:54, 15 July 2006 (UTC)[reply]
Yes, I check every edit before I save it, and I ignore [sic] when I see it. I have incorrectly fixed a couple of [sic]s in the past because I (the fallible human) failed to spot them; one of my improvements has been to add [sic] detection to the software so it can alert me to this, and hopefully make an error less likely in future – Gurch 10:03, 16 July 2006 (UTC)[reply]
  • I don't have any issue with this, provided you aren't doing any of the spelling corrections that tend to cause problems, such as changes from Commonwealth English to American English and vice versa. As long as it's only correcting spelling errors and doesn't touch spelling variations, it should be fine. I'd like to see a week's trial (which is standard) to get a good idea of exactly what will be taking place, and also for users to add their comments. A week's trial is approved, please report back this time next week. Essjay (Talk) 14:47, 16 July 2006 (UTC)[reply]
    I have never corrected spelling variations, regional or otherwise – being from the UK, I have long since given up and accepted all variants as equally permissible anyway. If you wish, I can upload the entire list and replace the (now out-of-date) User:Gurch/Reports/Spelling; I will probably do this at some point anyway. I won't be around in a week's time, so you can expect to hear from me in a month or so. For now, you can take this to be representative of what I will be doing – Gurch 16:11, 16 July 2006 (UTC)[reply]

I would like permission to run a bot to tag newly created copyvio articles with {{db-copyvio}} (although I would only tag with {{nothing}} and exit so I can look over the edit until I am confident in its accuracy in identifying copyvios). The bot is written in perl, although it calls replace.pl (from pywikimediabot). Once I work out the bugs, I would want to have the bot running continuously. -- Where 01:44, 12 July 2006 (UTC)[reply]

How do you intend to gather the "newly created copyvio articles"? — xaosflux Talk 03:00, 12 July 2006 (UTC)[reply]
The bot watches on the RC feed at browne.wikimedia.org. Every new article is downloaded, and the text is run through a yahoo search to see if there are any matches outside of Wikipedia. -- Where 04:12, 12 July 2006 (UTC)[reply]
But what if the text is a GFDL or PD source, or quotes a GFDL/PD source?--Konstable 04:52, 12 July 2006 (UTC)[reply]
Also, how about fair use quotes? --WinHunter (talk) 05:59, 12 July 2006 (UTC)[reply]
  • Wouldn't it be better to report potential copyvios (at an IRC channel, and at WP:AIV or a similar page for non-IRC folks) instead of just tagging them outright? Also, you could use Copyscape, similar to how the Spanish Wikipedia implemented this idea. Try talking to User:Orgullomoore for ideas. Titoxd(?!?) 06:35, 12 July 2006 (UTC)[reply]
    • Yes, I suppose since the bot is bound to have a large number of false detections of copyvios it would be best to report it in a way other than simply tagging articles for speedy deletion. I like Titoxd's idea of listing the possible copyvios on a page similar to AIV (later, perhaps, I can implement an IRC notification bot if this goes okay). I looked at Copyscape, however, and it will only allow 50 scans per month unless I pay them money, which I am not willing to do. Thanks for your time! -- Where 14:44, 12 July 2006 (UTC)[reply]
      • Again, ask Orgullomoore. He runs more than just 50 scans a month, so you two might be able to work something out. Titoxd(?!?) 05:31, 13 July 2006 (UTC)[reply]
  • What would be best is if it put a notice on the talk page "This article might be a copyvio" and added that article to a daily list (in the bot's userspace) of suspected copyvios. Then humans could use their judgement to deal with them properly... overall I think it would speed things up tremendously, since we'd have all the likely copyvios in one place. It should probably avoid testing any phrases in quotation marks, but other than that, I don't think it would pick up a huge number of false positives. In my experience with newpage patrol, for every 99 copyvios there's maybe 1 article legitimately copied from a PD/GPL site. Like I said earlier, it's rather amazing that we don't have a bot doing this already, and I'm glad someone's developing it finally. Contact me if you need any non-programming help with testing. --W.marsh 21:30, 12 July 2006 (UTC)[reply]
    • The problem with putting a notice on a talk page would be that it would create a large number of talk pages for deleted articles; that being said, if you still think it is a good idea, I will trust your judgement and implement it anyway once I am confident in the bot's accuracy. Also, just out of curiosity, what do you think is wrong with searching for exact phrases? (when I was not testing for exact phrases, the bot claimed that a page was a copyvio of a webpage that listed virtually every word in the English language). Thanks for your suggestions, and your time. -- Where 23:02, 12 July 2006 (UTC)[reply]
Oh, you're probably right about the talkpages, I hadn't thought of that. For the other thing, I mean that it shouldn't search for phrases that were originally in quotation marks in the article being tested, since those are probably quotations that might be fair use. But it should definitely search for other exact phrases from the article on Google/Yahoo whatever. By the way, I think Google limits you to 3,000 searches/day, Yahoo might too... not sure if that will have an impact. --W.marsh 23:09, 12 July 2006 (UTC)[reply]
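The quote-exclusion idea can be sketched as a phrase extractor that strips quoted passages before building search queries. The function and the 8-word window are assumptions for illustration, not the bot's actual code:

```python
import re

def candidate_phrases(text, window=8):
    """Split article text into fixed-length word phrases for a web
    search, dropping double-quoted passages first, since those may be
    fair-use quotations rather than copied text."""
    unquoted = re.sub(r'"[^"]*"', " ", text)
    words = unquoted.split()
    # Non-overlapping windows; a tail shorter than the window is skipped.
    return [" ".join(words[i:i + window])
            for i in range(0, max(len(words) - window + 1, 1), window)]
```

Each returned phrase would then be submitted as an exact-match query, so a dictionary-style page listing nearly every English word no longer matches.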
I got the impression that Yahoo was more lenient than Google. But if worse comes to worst, I will just have to use the web interface rather than the API (which should allow me unlimited searches). -- Where 23:31, 12 July 2006 (UTC)[reply]
This seems like a good idea, but the only concern I would have is that the process be supervised by a non-bot (i.e. human, hopefully). Tagging the talk page or on an IRC channel seems like a good idea; admins would simply have to remember to check those often and make sure that the bot is accurate. Thanks! Flcelloguy (A note?) 05:11, 13 July 2006 (UTC)[reply]
I agree; the bot will have a fair amount of errors because of the concerns voiced above. Thus, the bot will edit only one page, which will be outside article space. This page would contain a listing of suspected copyvios found by the bot. During the trial period, I would set the bot to edit a page in my userspace; if the bot is successful, perhaps the page could be moved to the Wikipedia namespace. Does that address your concern? If not, I'm open to suggestions :) -- Where 18:05, 13 July 2006 (UTC)[reply]
I like this idea in general. My only concern is that even with liberal filters it could create a massive, unmanageable backlog. Have you tried to estimate how many pages per day/week would this generate? Misza13 T C 19:10, 13 July 2006 (UTC)[reply]
I have not done so yet; however, based on tests so far, I would estimate that the backlog would be manageable. It is hard to tell for sure, though, without a trial. Thus, I just started the bot so it commits to a file, and does not touch Wikipedia. When I finish this trial, I will be able to give an estimation of how many suspected copyvios it finds per day. -- Where 19:29, 13 July 2006 (UTC)[reply]
I just did a 36-minute test, in which 4 potential copyvios were identified. If I did the calculations correctly, this would mean that 160 potential copyvios would be identified on a daily basis (assuming that the rate of copyvios is constant, which is obviously not the case). This is a lot, but should be manageable (especially if A8 is amended). Also, I should be able to reduce the number of false identifications with time. Two of the items identified were not copyvios; one was from a Wikipedia mirror, and I am still examining the cause of the other one. -- Where 21:53, 13 July 2006 (UTC)[reply]
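The extrapolation above checks out: scaling the 36-minute sample to a full day is a single ratio, under the stated (and admittedly unrealistic) constant-rate assumption.

```python
def daily_rate(hits, sample_minutes):
    """Linearly extrapolate detections per day from a short sample,
    assuming a constant rate."""
    return hits * 24 * 60 / sample_minutes

# 4 hits in a 36-minute window scale to 4 * 1440 / 36 = 160 per day.
```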
Yes, having the bot edit one page and list the alerts there would alleviate my concerns. The test is also quite interesting, though I would like to see a longer test - maybe 24 or 48 hours? 36 minutes may not provide enough data to reliably estimate the daily output. Thanks! Flcelloguy (A note?) 23:56, 13 July 2006 (UTC)[reply]
Okay; I am starting another test and will have it run overnight. -- Where 00:08, 14 July 2006 (UTC)[reply]

The bot is currently listing possible copyvios to User:Where/cp as it finds them. -- Where 01:56, 15 July 2006 (UTC)[reply]

Suggestion: could you change the listing format (see below)?
  • That's the current format
Good idea! The bot now uses that format. Thanks! -- Where 15:14, 15 July 2006 (UTC)[reply]
New format looks better, but of the 3 items listed on there right now, none are actionable, see comments per item on that page. — xaosflux Talk 01:03, 16 July 2006 (UTC)[reply]
Thanks :). I removed the items. -- Where 01:48, 16 July 2006 (UTC)[reply]

Proposed disambiguation bot, manually assisted, running m:Solve_disambiguation.py. I will be using this to work on the backlog at Wikipedia:Disambiguation pages with links; bot assisted disambiguation is substantially more efficient than any other method. The bot will run from the time it gets approval into the foreseeable future. --RobthTalkCleanup? 16:20, 13 June 2006 (UTC)[reply]
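The core edit solve_disambiguation.py makes, once the human operator picks the intended target, is rewriting a link to the disambiguation page as a piped link. A simplified sketch of that single step (the real pywikipedia script also handles piped links, redirects, and capitalization, which this does not):

```python
def retarget_link(wikitext, dab_title, target_title):
    """Rewrite a bare link to a disambiguation page as a piped link to
    the chosen target, keeping the displayed text unchanged."""
    return wikitext.replace("[[%s]]" % dab_title,
                            "[[%s|%s]]" % (target_title, dab_title))
```

Because the displayed text is preserved, the article's prose is untouched; only the link destination changes, which is what makes assisted disambiguation so much faster than manual editing.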

I see no reason we couldn't have a trial run, at least. robchurch | talk 20:44, 1 July 2006 (UTC)[reply]
Thanks. I'll start running it at low speed in the next couple of days. --RobthTalk 04:04, 2 July 2006 (UTC)[reply]
(In response to a request for a progress report): I've made a small run, which went quite well, but limits on my disposable time have prevented me from making any larger runs just yet. --RobthTalk 01:07, 10 July 2006 (UTC)[reply]
  • No problem, trial extended, keep us informed and report back when you have enough done for us to make a final decision. Essjay (TalkConnect) 08:24, 12 July 2006 (UTC)[reply]


Approved

Approved, not flagged

Approved, flagged