Wikipedia:AutoWikiBrowser/Regular expression: Difference between revisions
Help via external links
- http://www.regular-expressions.info/
- http://perldoc.perl.org/perlre.html
Revision as of 00:40, 29 July 2008
This is the Regular expressions subsection of the user manual for AutoWikiBrowser.
Chapters: Core · Database scanner · Find and replace · Regular expressions · General fixes
Regular expression definitions
Regular expressions | |
Anchors | |
^ | Start of string |
\A | Start of string |
$ | End of string |
\Z | End of string |
\b | Word boundary |
\B | Not word boundary |
\< | Start of word |
\> | End of word |
Character Classes | |
\c | Control character |
\s | White space |
\S | Not white space |
\d | Digit |
\D | Not digit |
\w | Word |
\W | Not word |
\x | Hexadecimal digit |
\O | Octal digit |
Quantifiers | |
* | 0 or more |
+ | 1 or more |
? | 0 or 1 |
{3} | Exactly 3 |
{3,} | 3 or more |
{2,4} | 2, 3 or 4 |
Escape Character | |
\ | Escape Character |
Metacharacters (must be escaped) | |
Metacharacter | Metacharacter escaped |
^ | \^ |
$ | \$ |
( | \( |
) | \) |
< | \< |
. | \. |
* | \* |
+ | \+ |
? | \? |
[ | \[ |
{ | \{ |
\ | \\ |
| | \| |
> | \> |
Special Characters | |
\n | New line |
Groups and Ranges | |
Note: Ranges are inclusive | |
. | Any character except new line (\n) |
(a|z) | a or z |
( ) | Capture group (captures anything between "(" and ")") |
[def] | Range d or e or f |
[^abc] | Range not a or b or c |
[a-q] | Letter between a and q |
[A-Q] | Upper case letter between A and Q |
[0-7] | Digit between 0 and 7 |
String Replacement | |
$1 - returns sam | (sam) (max) (pete) |
$2 - returns max | (sam) (max) (pete) |
$3 - returns pete | (sam) (max) (pete) |
Sample Patterns | |
Regex pattern | Will Match |
([A-Za-z0-9-]+) | Letters, numbers and hyphens |
(\d{1,2}\/\d{1,2}\/\d{4}) | Date 3/24/2008 or 03/24/2008 |
\[\[\d{4}\]\] | 4 digit number wiki link [[2008]] |
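The sample patterns and the $1-style replacements in the table above can be tried outside AWB. The following is a minimal sketch in Python, used here only for illustration: it assumes Python's re module treats these simple patterns the same way AWB's regex engine does, and it writes group references as \1, \2, \3 where AWB's Replace With box uses $1, $2, $3. The function names are hypothetical.

```python
import re

# Sample patterns copied from the table above.
WORD = re.compile(r"([A-Za-z0-9-]+)")            # letters, numbers and hyphens
DATE = re.compile(r"(\d{1,2}\/\d{1,2}\/\d{4})")  # 3/24/2008 or 03/24/2008
YEAR_LINK = re.compile(r"\[\[\d{4}\]\]")         # [[2008]]


def find_dates(text):
    """Return every date-like substring matched by the DATE pattern."""
    return DATE.findall(text)


def find_year_links(text):
    """Return every four-digit wiki link such as [[2008]]."""
    return YEAR_LINK.findall(text)


def reorder_names(text):
    """Swap three captured names; AWB's "$3 $2 $1" is r"\\3 \\2 \\1" here."""
    return re.sub(r"(sam) (max) (pete)", r"\3 \2 \1", text)


print(find_dates("Born 3/24/2008, moved 03/24/2008."))  # ['3/24/2008', '03/24/2008']
print(find_year_links("In [[2008]] and [[08]]."))       # ['[[2008]]']
print(reorder_names("sam max pete"))                    # pete max sam
```

Note that [[08]] is not matched because \d{4} requires exactly four digits between the brackets.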
Regular expression examples
Regular expression examples (Regex) | |
Description: | Search for flagicon template and remove |
Find: | \{\{.*?flagicon.*?\|.*?\}\} |
Replace With: | (nothing) |
Example of text to search: | {{flagicon|USA}} [[United States]] |
Result: | [[United States]] |
Comments: |
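To see exactly what the flagicon rule removes, the find pattern can be run against the sample text outside AWB. This is a minimal sketch in Python, assuming Python's re module handles this pattern the same way AWB does; remove_flagicons is a hypothetical helper name, not part of AWB.

```python
import re

# Find pattern copied from the table above; "Replace With" is empty.
FLAGICON = re.compile(r"\{\{.*?flagicon.*?\|.*?\}\}")


def remove_flagicons(text):
    """Delete every substring matched by the flagicon pattern."""
    return FLAGICON.sub("", text)


# The template is removed, but the space that followed it is kept.
print(repr(remove_flagicons("{{flagicon|USA}} [[United States]]")))
# ' [[United States]]'

# The leading .*? lets a match begin at an earlier "{{" on the same line.
print(repr(remove_flagicons("{{cite}} {{flagicon|USA}} [[United States]]")))
# ' [[United States]]'
```

In this sketch the second call also removes {{cite}}, because the match starts at the first {{ on the line and the lazy .*? extends until flagicon is found. It is worth previewing the changes before saving when other templates share a line with flagicon.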
Tips and tricks
User made shortcut editing macros
You can make your own shortcut editing macros. When you edit a page, you can enter your short-cut macro keys anywhere in the page where you want AWB to act on them.
For example, suppose you are examining a page in the AWB edit box and you see numerous places that need routine edits: adding {{fact}}, inserting line breaks <br>, commenting out entire lines <!-- comment -->, inserting state names, adding <ref>Insert footnote text here</ref>, inserting level 2, 3, or even 4 headlines, and so on. All of this can be done by creating your own short-cut macro keys.
- The process
- Create a rule. See Find and replace, Advanced.
- Edit your page in the edit box. Insert your short-cut editing macro key(s) anywhere in the page you want AWB to make the change(s) for you.
- Re-parse the page. Right-click on the edit box and select Re-parse from the context pop-up menu. AWB will then re-examine your page, find your short-cut macro key(s), and perform the action you specified in the rule.
A short-cut macro key can have any name, but it is best to make it unique so that it will not interfere with any other change that AWB may find and suggest. For that reason, /// followed by a few lowercase characters that you can easily remember works best (lowercase is used so that you do not have to use the Shift key). You can then enter the short-cut macro keys you create into the page manually or by using the "paste more" function of the edit box context menu. Three '/' characters are used so that AWB will not confuse a key with web addresses (URLs) in a page when re-parsing.
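The reason for the third slash can be checked with a small experiment. The sketch below, in Python and for illustration only, compares a hypothetical two-slash key //br with the three-slash key ///br on a line that also contains a URL; apply_key and the sample URL are made up for this example.

```python
import re


def apply_key(key, replacement, text):
    """Replace every literal occurrence of a short-cut key in the text."""
    return re.sub(re.escape(key), replacement, text)


page = "Source: http://brown.example/fox ///br next line"

# A two-slash key also matches the "//br" inside "http://brown...".
print(apply_key("//br", "<br />", page))
# Source: http:<br />own.example/fox /<br /> next line

# The three-slash key leaves the URL alone.
print(apply_key("///br", "<br />", page))
# Source: http://brown.example/fox <br /> next line
```

A URL scheme is followed by only two slashes, so a key that starts with three slashes is unlikely to occur inside an ordinary web address.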
Examples:
Create a rule as a regular expression.
User made short-cut editing macros examples | |
Short-cut key: | ///col |
Name: | Comment out entire line |
Find: | ///col(.*) |
Replace With: | <!-- $1 --> |
Example before re-parsing: | ///colThe quick brown fox jumps over the lazy dog |
Result after re-parsing: | <!-- The quick brown fox jumps over the lazy dog --> |
Comments: | |
Short-cut key: | ///br |
Name: | Insert line break |
Find: | ///br |
Replace With: | <br /> |
Example before re-parsing: | Eat some more///br of these soft French buns///br and drink some tea |
Result after re-parsing: | Eat some more<br /> of these soft French buns<br /> and drink some tea |
Comments: | |
Short-cut key: | ///fac |
Name: | Insert {{fact}} with current date |
Find: | ///fac |
Replace With: | {{fact|date={{subst:CURRENTMONTHNAME}} {{subst:CURRENTYEAR}}}} |
Example before re-parsing: | The quick brown fox jumps over the lazy dog///fac |
Result after re-parsing: | The quick brown fox jumps over the lazy dog{{fact|date={{subst:CURRENTMONTHNAME}} {{subst:CURRENTYEAR}}}} |
Comments: |
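The three example rules can be checked together outside AWB. The following is a minimal sketch in Python, not a model of how AWB itself re-parses a page: the RULES list, the reparse function, and the order in which the rules run are assumptions made for illustration, and $1 from the Replace With field is written as \1 because that is how Python spells a group reference.

```python
import re

# (find, replace) pairs taken from the tables above.
RULES = [
    (r"///col(.*)", r"<!-- \1 -->"),
    (r"///br", "<br />"),
    (r"///fac", "{{fact|date={{subst:CURRENTMONTHNAME}} {{subst:CURRENTYEAR}}}}"),
]


def reparse(text):
    """Apply each rule in turn, roughly imitating a re-parse of the edit box."""
    for find, replace in RULES:
        text = re.sub(find, replace, text)
    return text


print(reparse("///colThe quick brown fox jumps over the lazy dog"))
# <!-- The quick brown fox jumps over the lazy dog -->
print(reparse("Eat some more///br of these soft French buns///br and drink some tea"))
# Eat some more<br /> of these soft French buns<br /> and drink some tea
print(reparse("The quick brown fox jumps over the lazy dog///fac"))
# The quick brown fox jumps over the lazy dog{{fact|date={{subst:CURRENTMONTHNAME}} {{subst:CURRENTYEAR}}}}
```

Because . does not match a new line, (.*) in the ///col rule captures only the rest of the current line, which is why the key comments out a single line rather than the rest of the page.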