Cross-site scripting

Cross-site scripting (XSS) is a type of computer security vulnerability typically found in web applications that allows malicious web users to inject code into the web pages viewed by other users. Examples of such code include HTML code and client-side scripts. An exploited cross-site scripting vulnerability can be used by attackers to bypass access controls such as the same origin policy. Vulnerabilities of this kind have been exploited to craft powerful phishing attacks and browser exploits. As of 2007, cross-site scripting vulnerabilities on websites accounted for roughly 80% of all documented security vulnerabilities.^[1] Often during an attack "everything looks fine" to the end user.^[2]

Background

In general, cross-site scripting holes can be seen as vulnerabilities present in web pages which allow attackers to bypass security mechanisms. By finding clever ways of injecting malicious scripts into web pages, an attacker can gain elevated access privileges to sensitive page content, session cookies, and a variety of other objects.

Cross-site scripting was originally referred to as CSS,^[2] although this usage has been largely discontinued (due to confusion with Cascading Style Sheets).

Several high-profile security vulnerabilities followed Netscape's introduction of the JavaScript language in 1995.^[3] Netscape came to recognize some of the security risks of allowing a Web server to send executable code to a browser (even if only in a browser sandbox), and introduced the same origin policy in Netscape Navigator version 2.^[4] One key problem is the case where users have more than one browser window open at once. In some instances, a script from one page should be allowed to access data from another page or object, but in others this should be strictly forbidden, because a malicious website could attempt to steal sensitive information. The policy forbids browsers to load a script across the boundary of the current "Window" object^[5] unless the script originated from the same domain, over the same protocol, and through the same port if a port is specified.^[4] The policy was intended to allow interaction between objects and pages while, in theory, preventing a malicious Web site from accessing sensitive data in another browser window. Unfortunately, browser vendors implemented the policy in different ways, and the result was unpredictable behavior.^[5] The policy also had loopholes: for example, an HTML element embedded in a page or resource at the origin host may link to a script hosted elsewhere, and the browser will load that script when it loads the page.^[5]
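The origin comparison the policy relies on can be sketched in Python as a check on three parts of a URL: protocol (scheme), host, and port. This is a minimal sketch of the rule in its modern form, not Netscape's implementation, which browsers implemented inconsistently. The function names `origin` and `same_origin` are hypothetical, and the sketch assumes default ports of 80 for http and 443 for https when a URL does not specify one.

```python
from urllib.parse import urlsplit

# Assumption: default ports for the two schemes this sketch handles.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, host, port) tuple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # Fall back to the scheme's default port when none is written in the URL.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)  # hostname is already lowercased

def same_origin(url_a, url_b):
    """True only if protocol, host and port all match."""
    return origin(url_a) == origin(url_b)

print(same_origin("http://example.com/a.html", "http://example.com:80/b.js"))  # True
print(same_origin("http://example.com/", "https://example.com/"))              # False: protocol differs
print(same_origin("http://example.com/", "http://evil.example.net/"))          # False: host differs
print(same_origin("http://example.com/", "http://example.com:8080/"))          # False: port differs
```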

Since then, similar access-control policies have been adopted in other browsers and client-side scripting languages to protect end users from malicious Web sites, but these policies may depend on users themselves to guide access control according to their preferences. For example, an interpreter could partition what a script may access and what protocols it may invoke,^[6] or digital signatures might identify scripts and their source to the user or user agent before a script can load.^[7]

Types

As of May 7, 2008, three distinct types of XSS vulnerability were known.

DOM-based (Type 0)

The Document Object Model (DOM) is a standard object model for representing HTML or XML and related formats. This form of XSS vulnerability has also been referred to as local cross-site scripting; although it is by no means new, a 2005 paper (DOM-Based cross-site scripting) defines its characteristics well. With DOM-based cross-site scripting vulnerabilities, the problem exists within a page's client-side script itself. For instance, if a piece of JavaScript reads a URL request parameter and uses it to write some HTML to its own page without encoding it using HTML entities, an XSS hole will likely be present, since the written data will be re-interpreted by the browser as HTML, which could include additional client-side script.
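The vulnerable pattern can be modeled in Python for illustration. In a real page this logic runs as JavaScript in the browser and writes into the document; here the hypothetical functions `greeting_unsafe` and `greeting_safe` simply return the HTML such a script would write, assuming a page that greets the visitor using a `name` parameter taken from its own URL.

```python
from html import escape
from urllib.parse import urlsplit, parse_qs

def read_name(page_url):
    # parse_qs also decodes percent-escapes such as %3C back into "<".
    return parse_qs(urlsplit(page_url).query).get("name", [""])[0]

def greeting_unsafe(page_url):
    # Copies the parameter straight into markup: the DOM-based XSS pattern.
    return "<p>Hello, " + read_name(page_url) + "</p>"

def greeting_safe(page_url):
    # Encodes HTML special characters before the text becomes markup.
    return "<p>Hello, " + escape(read_name(page_url)) + "</p>"

url = "http://example.com/welcome.html?name=%3Cscript%3Ealert(1)%3C/script%3E"
print(greeting_unsafe(url))  # <p>Hello, <script>alert(1)</script></p>
print(greeting_safe(url))    # <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>
```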

In practice, exploiting such a hole is very similar to exploiting a non-persistent vulnerability (see below), except in one very important situation. Because of the way older versions of Internet Explorer (IE) treat client-side script in objects located in the "local zone" (for instance, on the client's local hard drive), an XSS hole of this kind in a local page can result in remote execution vulnerabilities. For example, if an attacker hosts a malicious website that contains a link to a vulnerable page on a client's local system, a script could be injected and would run with the privileges of that user's browser on their system. (Local HTML pages are commonly installed with standard software packages, including Internet Explorer.) This bypasses the entire client-side sandbox, not just the cross-domain restrictions that are normally bypassed with XSS exploits. The Local Machine Zone Lockdown in IE6 on Windows XP Service Pack 2, and in IE7, closes this hole.

Non-Persistent (Type 1)

This kind of cross-site scripting hole is also referred to as a reflected vulnerability, and is by far the most common type. These holes show up when data provided by a web client is used immediately by server-side scripts to generate a page of results for that user. If unvalidated user-supplied data is included in the resulting page without HTML encoding, client-side code can be injected into the dynamic page. A classic example is a site search engine: if one searches for a string that includes some HTML special characters, the search string will often be redisplayed on the result page to indicate what was searched for, or at least placed in the text box for easier editing. If any occurrence of the search terms is not HTML-entity encoded, an XSS hole will result.
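A minimal sketch of a search page that encodes the search terms in both places where they are redisplayed, using Python's `html.escape`. The function name `render_search_results` and the page layout are assumptions; `quote=True` matters here because one copy of the terms lands inside a double-quoted `value` attribute.

```python
from html import escape

def render_search_results(query):
    """Echo the search terms twice: as page text and inside the text box."""
    safe = escape(query, quote=True)  # encodes &, <, > and both quote characters
    return (
        "<p>Results for: " + safe + "</p>\n"
        '<input type="text" name="q" value="' + safe + '">'
    )

# A query that tries to close the value attribute and start a script.
page = render_search_results('"><script>alert(1)</script>')
print(page)
```

Every occurrence goes through the same encoding call, which is the point of the paragraph above: a single unencoded copy would be enough to open the hole.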

At first blush, this does not appear to be a serious problem, since users can only inject code into their own pages. However, with a small amount of social engineering, an attacker could convince a user to follow a malicious URL that injects code into the results page, giving the attacker full access to that page's content. Because exploiting these holes generally requires some social engineering (as it normally does for Type 0 vulnerabilities too), many programmers have dismissed them as unimportant. This misconception is sometimes applied to XSS holes in general (even though this is only one type of XSS), and there is often disagreement in the security community about the importance of cross-site scripting vulnerabilities.

Persistent (Type 2)

This type of XSS vulnerability is also referred to as a stored or second-order vulnerability, and it allows the most powerful kinds of attacks. A Type 2 XSS vulnerability exists when data provided to a web application by a user is first stored persistently on the server (in a database, filesystem, or other location) and later displayed to users in a web page without being encoded using HTML entities. A classic example is an online message board where users are allowed to post HTML-formatted messages for other users to read.

These vulnerabilities are usually more significant than other types because an attacker's malicious script is stored and rendered repeatedly, for every user who views the affected page. This could hit a large number of users with little need for social engineering, and the web application could even be infected by a cross-site scripting virus.

The methods of injection can vary a great deal, and an attacker may not need to use the web application itself to exploit such a hole. Any data received by the web application (via email, system logs, etc.) that can be controlled by an attacker must be encoded before it is redisplayed in a dynamic page; otherwise an XSS vulnerability of this type could result.
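A minimal sketch of the encode-before-redisplay rule for stored data, using Python's built-in `sqlite3` module and `html.escape`. The message-board schema and the function names `post_message` and `render_board` are hypothetical.

```python
import sqlite3
from html import escape

# Hypothetical message-board storage, kept in memory for the example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (author TEXT, body TEXT)")

def post_message(author, body):
    # Store the text exactly as received; the "?" placeholders keep it out of the SQL.
    db.execute("INSERT INTO messages (author, body) VALUES (?, ?)", (author, body))

def render_board():
    # Encode every stored field at display time, whatever its original source.
    rows = db.execute("SELECT author, body FROM messages ORDER BY rowid").fetchall()
    return "\n".join(
        "<div><b>" + escape(author) + "</b>: " + escape(body) + "</div>"
        for author, body in rows
    )

post_message("mallory", "<script>steal()</script>")
print(render_board())
```

Storing the raw text and encoding it on output is one common design choice: data that arrives by some other route (an imported log line, an email) is covered by the same encoding step in `render_board`.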

Exploit scenarios

Attackers intending to exploit cross-site scripting vulnerabilities must approach each class of vulnerability differently. For each class, a specific attack vector is described here. (The names below come from the cast of characters commonly used in computer security.)

DOM-based attack

Mallory sends Alice the URL of a maliciously constructed web page (via email or another mechanism).
Alice clicks on the link.
The malicious web page's JavaScript opens a vulnerable HTML page installed locally on Alice's computer.
The vulnerable HTML page contains JavaScript, which executes in the local zone of Alice's computer.
Mallory's malicious script now may run commands with the privileges Alice holds on her own computer.

Non-Persistent

Alice often visits a particular website, which is hosted by Bob. Bob's website allows Alice to log in with a username/password pair and store sensitive information, such as billing information.
Mallory observes that Bob's website contains a reflected XSS vulnerability.
Mallory crafts a URL to exploit the vulnerability, and sends Alice an email, making it look as if it came from Bob (i.e., the email is spoofed).
Alice visits the URL provided by Mallory while logged into Bob's website.
The malicious script embedded in the URL executes in Alice's browser as if it came directly from Bob's server. The script can be used to send Alice's session cookie to Mallory. Mallory can then use the session cookie to steal sensitive information available to Alice (authentication credentials, billing information, etc.) without Alice's knowledge.

Persistent

Bob hosts a web site which allows users to post messages and other content to the site for later viewing by other members.
Mallory notices that Bob's website is vulnerable to a Type 2 XSS attack.
Mallory posts a message, controversial in nature, which may encourage many other users of the site to view it.
Upon merely viewing the posted message, site users' session cookies or other credentials could be taken and sent to Mallory's webserver without their knowledge.
Later, Mallory logs in as other site users and posts messages on their behalf.

The preceding examples merely represent common methods of exploitation and are not meant to encompass all vectors of attack.

Real-world examples

Hundreds of cross-site scripting vulnerabilities have been documented publicly. A few examples illustrating the different types of holes are listed here.

An example of a DOM-based vulnerability was once found in an error page produced by Bugzilla where JavaScript was used to write the current URL, through the document.location variable, to the page without any filtering or encoding. In this case, an attacker who controlled the URL might have been able to inject script, depending on the behavior of the browser in use. This vulnerability was fixed by encoding the special characters in the document.location string prior to writing it to the page.
A well-known example of non-persistent XSS: two XSS vulnerabilities in the Google.com website were identified and published by Yair Amit in December 2005. The vulnerabilities allowed an attacker to impersonate legitimate members of Google's services or to mount a phishing attack. The publication presented an obscure way to bypass common XSS countermeasures by using UTF-7 encoded payloads.
Two DOM-based XSS vulnerabilities were exploited humorously in August 2006, through a fake news summary claiming that President Bush had appointed a 9-year-old boy as chairperson of the Information Security Department. The claim was backed up with links to cbsnews.com and www.bbc.co.uk, both of which were vulnerable to separate XSS holes that allowed the attackers to inject an article of their choosing.
An example of a persistent vulnerability was found in Hotmail by Marc Slemko in October 2001; it allowed an attacker to steal a user's Microsoft .NET Passport session cookies. The exploit consisted of sending a Hotmail user a malicious email containing malformed HTML. Hotmail's script-filtering code failed to remove the broken HTML, and Internet Explorer's parsing algorithm interpreted the malicious code. The problem was quickly fixed, but multiple similar problems were later found in Hotmail and other Passport sites.
Netcraft announced on June 16, 2006 that a security flaw in the PayPal web site was being actively exploited by fraudsters to steal credit card numbers and other personal information belonging to PayPal users. The issue was reported to Netcraft via its own anti-phishing toolbar. Soon after, PayPal reported that a "change in some of the code" on the PayPal website had removed the vulnerability.
On October 13, 2005, the Samy worm exploited a security flaw in MySpace, resulting in over one million friend requests being made to its creator's profile. The worm exploited a persistent vulnerability and used multiple XMLHttpRequests to propagate itself.
An XSS vulnerability in Community Architect Guestbook, which could be exploited to conduct script insertion attacks, was disclosed by Susam Pal on April 19, 2006. As a result, many free web-hosting services that used the guestbook were vulnerable to such attacks.
On November 8, 2006, Rajesh Sethumadhavan discovered a persistent vulnerability in the social networking site Orkut that made it possible for Orkut members to inject HTML and JavaScript into their profiles [1]. Rodrigo Lacerda used this vulnerability to create a cookie-stealing script, known as the Orkut Cookie Exploit, which was injected into the Orkut profiles of the attacking member(s). By merely viewing these profiles, unsuspecting targets had the communities they owned transferred to a fake account of the attacker. On December 12, Orkut fixed the vulnerability.
On October 10, 2007, the website of the Australian Liberal Party had a Type 2 security hole exploited, resulting in a photograph of then Prime Minister John Winston Howard being captioned with a lewd suggestion.[2][3]
On April 19, 2008, the Community Forum for the Barack Obama presidential campaign was exploited to redirect traffic to the Hillary Clinton website.[4]

Avoiding XSS vulnerabilities

Reliable avoidance of cross-site scripting vulnerabilities currently requires the encoding of all HTML special characters in potentially malicious data. This is generally done just before display by web applications (or client-side script), and many programming languages have built-in functions or libraries which provide this encoding (in this context, also called quoting or escaping).

An example of this kind of quoting is shown below, from within the Python interpreter:

~> python
Python 2.3.5 (#2, Aug 30 2005, 15:50:26) 
Type "help", "copyright", "credits" or "license" for more information.
>>> import cgi

>>> print "<script>alert('xss');</script>"
<script>alert('xss');</script>

>>> print cgi.escape("<script>alert('xss');</script>")
&lt;script&gt;alert('xss');&lt;/script&gt;

Here, the first print statement produces executable client-side script, whereas the second print statement outputs a string which is an HTML-quoted version of the original script. The quoted versions of these characters will appear as literals in a browser, rather than with their special meaning as HTML tags. This prevents any script from being injected into HTML output, but it also prevents any user-supplied input from being formatted with benign HTML.
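The session above uses Python 2.3. In Python 3 the `cgi.escape` function no longer exists (it was deprecated in Python 3.2 and removed in 3.8); its replacement, `html.escape`, encodes both quote characters by default in addition to `<`, `>` and `&`. A short equivalent:

```python
from html import escape

payload = "<script>alert('xss');</script>"
print(payload)          # printed unchanged: a browser would run this as script
print(escape(payload))  # &lt;script&gt;alert(&#x27;xss&#x27;);&lt;/script&gt;
```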

The underlying problem with trying to avoid XSS vulnerabilities is that every situation is different, and the needs and issues change with the context. For instance, if user input is placed in the src attribute of an image, cgi.escape() is not sufficient. Suppose a picture is added to a page of pictures in this fashion:

 <img src='$url'>

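An attacker could enter "doesntexist.jpg' onerror='alert(document.cookie)" to add an event handler that runs when the browser fails to load "doesntexist.jpg", executing the code. This works because cgi.escape() does not encode the single quote that closes the attribute value.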
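The difference can be reproduced with Python 3's `html.escape`, whose `quote` parameter controls whether quote characters are encoded. With `quote=False` it encodes only `&`, `<` and `>`, as `cgi.escape` did by default, so the attacker's single quote still closes the `src` attribute early. The variable names are illustrative.

```python
from html import escape

url = "doesntexist.jpg' onerror='alert(document.cookie)"

# Encoding only &, < and > leaves the single quote intact.
weak = "<img src='" + escape(url, quote=False) + "'>"
# Encoding quotes as well keeps the whole input inside the attribute value.
strong = "<img src='" + escape(url, quote=True) + "'>"

print(weak)    # <img src='doesntexist.jpg' onerror='alert(document.cookie)'>
print(strong)  # <img src='doesntexist.jpg&#x27; onerror=&#x27;alert(document.cookie)'>
```

In the weak version the browser sees a separate `onerror` attribute; in the strong version the browser simply fails to load an oddly named image.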

If one were to implement a function like cgi.escape() (which comes with Python), it would be best to convert the <, >, &, " and ' characters to their equivalent HTML entities.
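A minimal hand-written version of such a function might look like the following. The name `escape_html` is hypothetical; in practice Python 3's `html.escape` already does this (it writes `&#x27;` rather than `&#39;` for the single quote, an equivalent numeric reference). The key design point is ordering: `&` has to be encoded first.

```python
# "&" must be replaced first; otherwise the "&" inside entities produced
# for the other characters would itself be encoded a second time.
REPLACEMENTS = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#39;"),
]

def escape_html(text):
    """Replace the five HTML special characters with entity references."""
    for char, entity in REPLACEMENTS:
        text = text.replace(char, entity)
    return text

print(escape_html("""<a href="x" title='y'>Tom & Jerry</a>"""))
# &lt;a href=&quot;x&quot; title=&#39;y&#39;&gt;Tom &amp; Jerry&lt;/a&gt;
```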

As stated above, the unfortunate consequence of this fix is that users are prevented from embedding non-malicious HTML into pages. Because HTML standards do not provide any simple mechanism to disable client-side scripts in specific portions of a web-page, it is difficult to reliably cleanse script from normal HTML.

The most reliable method is for web applications to parse the HTML, strip tags and attributes that do not appear in a whitelist, and generate valid HTML.
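The parse, filter and regenerate approach can be sketched with the `HTMLParser` class from Python's standard `html.parser` module. The whitelist below (a few formatting tags plus links with an `href`), the rule that only http and https links survive, and the choice to discard the contents of `script` and `style` elements are all assumptions made for illustration. This is not a complete or hardened sanitizer; a production application would be better served by a maintained sanitization library.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attributes allowed on that tag.
ALLOWED = {"p": set(), "b": set(), "i": set(), "em": set(), "strong": set(),
           "a": {"href"}}
DROP_CONTENT = {"script", "style"}  # their text is discarded, not displayed

class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the regenerated HTML
        self.open_tags = []  # whitelisted tags emitted but not yet closed
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # drop the tag itself; its text still reaches handle_data
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            # Assumption: only absolute http(s) links are accepted as href values.
            if name == "href" and not value.lower().startswith(("http://", "https://")):
                continue
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # ignore end tags with no matching emitted start tag
        # Close any whitelisted tags still open inside this one, innermost first.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

def sanitize(fragment):
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()  # flush any text the parser is still buffering
    while parser.open_tags:  # close whatever the input left open
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)

print(sanitize('<p onclick="evil()">Hi <b>there</b><script>steal()</script> '
               '<a href="javascript:evil()">x</a></p>'))
```

Text is re-encoded and attribute values are re-quoted on output, so the result is regenerated from parsed nodes rather than passed through character by character.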

Simplified filtering methods (e.g. just removing known dangerous tags, or operating on the characters of the input rather than on parsed nodes) can be circumvented with malformed HTML code or non-standard attributes that may contain script. A similar attack targeted Hotmail in October 2001.
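To illustrate why character-level filtering is fragile, consider a hypothetical filter, `naive_filter`, that deletes the literal strings `<script>` and `</script>`. Because it makes a single pass over the characters, an attacker can nest the forbidden tag inside itself so that removing it reassembles a working tag:

```python
def naive_filter(text):
    # Removes each literal occurrence once; the result is never re-scanned.
    return text.replace("<script>", "").replace("</script>", "")

payload = "<scr<script>ipt>alert(1)</scr</script>ipt>"
print(naive_filter(payload))  # <script>alert(1)</script>
```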

Other forms of mitigation

The easiest way to eliminate XSS vulnerabilities is to encode (HTML-quote) all user-supplied HTML special characters, thereby preventing them from being interpreted as HTML. Unfortunately, users of many kinds of web applications (commonly forums and webmail) wish to use some of the features HTML provides. Some web applications (such as MySpace, MediaWiki, and most forum software) attempt to identify malicious HTML constructs and neutralize them, either by removing or by encoding them. But due to the flexibility and complexity of HTML and related standards, and the continuous addition of new features, it is almost impossible to be certain that every possible injection has been eliminated. To eliminate certain injections, any server-side algorithm must either reject broken HTML, understand how every browser will interpret broken HTML, or (preferably) repair the HTML into well-formed markup using techniques akin to those of HTML Tidy.

Besides content filtering, other methods of XSS mitigation are also commonly used. One example is cookie security. Many web applications rely on session cookies for authentication between individual HTTP requests, and because client-side scripts generally have access to these cookies, most simple XSS exploits are written to steal them. To mitigate this particular threat (though not the XSS problem in general), many web applications tie session cookies to the IP address of the user who originally logged in, and only permit that IP to use that cookie. This is effective in most situations (if an attacker is only after the cookie), but it breaks down when the attacker is behind the same NATed IP address or web proxy as the victim. Internet Explorer also supports a cookie flag called HttpOnly, which allows a web server to set a cookie that is unavailable to client-side scripts. Support for HttpOnly was added in Mozilla Firefox 2.0.0.5.^[8] While this is a useful feature, it does not prevent the use of XSS to perform cross-site request forgery attacks.
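Both cookie-related measures can be sketched with Python's standard library: `http.cookies.SimpleCookie` can set the HttpOnly flag on a cookie, and a session table can record the IP address that logged in. The in-memory `SESSIONS` dictionary and the functions `start_session` and `session_is_valid` are hypothetical, and the IP check inherits the NAT and proxy weakness described above.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session table: token -> IP address that logged in.
SESSIONS = {}

def start_session(client_ip):
    token = secrets.token_hex(16)
    SESSIONS[token] = client_ip
    cookie = SimpleCookie()
    cookie["session"] = token
    cookie["session"]["httponly"] = True  # ask the browser to hide it from scripts
    cookie["session"]["path"] = "/"
    return token, cookie.output()         # a Set-Cookie header line

def session_is_valid(token, client_ip):
    # Accept the cookie only from the IP address it was issued to.
    return SESSIONS.get(token) == client_ip

token, header = start_session("203.0.113.5")
print(header)
print(session_is_valid(token, "203.0.113.5"))   # True
print(session_is_valid(token, "198.51.100.7"))  # False
```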

An additional common mitigation is input validation of all potentially malicious data sources. This is a common theme in application development (even outside of web development) and is generally very useful. For instance, if a form accepts a field that is supposed to contain a phone number, a server-side routine could remove all characters other than digits, parentheses, and dashes, so that the result cannot contain a script. (Incidentally, this can also help prevent other injection attacks, such as SQL injection.) While effective for most types of input, there are times when an application, by design, must be able to accept special HTML characters such as '<' and '>'. In these situations, HTML entity encoding is the only option.
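The phone-number example can be written as a single regular-expression substitution. The function name `clean_phone_number` is hypothetical, and the allowed character set follows the text exactly (digits, parentheses and dashes), so spaces are removed as well:

```python
import re

def clean_phone_number(raw):
    # Drop every character that is not a digit, a parenthesis or a dash.
    return re.sub(r"[^0-9()-]", "", raw)

print(clean_phone_number("(555) 123-4567"))            # (555)123-4567
print(clean_phone_number("<script>alert(1)</script>"))  # (1)
```

Whatever survives the filter cannot contain `<` or `>`, so it cannot form a tag, even though (as the second call shows) the filter does not prove the input was a real phone number.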

Finally, some web applications are written to operate (sometimes optionally) without any client-side scripts. This allows users, if they choose, to disable scripting in their browsers before using the application. In this way, even if malicious client-side scripts were inserted unescaped into a page, users would not be susceptible to XSS attacks. Unfortunately, external content can still be loaded into the page with tags like <iframe> or <object>, which is often enough to trick users.

Many browsers can be configured to disable client-side scripts on a per-domain basis. If scripting is allowed by default, this approach is of limited value, since it blocks bad sites only after the user knows that they are bad, which is too late. Functionality that blocks all scripting and external inclusions by default and then lets the user enable them on a per-domain basis is more effective. This has long been possible in Internet Explorer (since version 4) by configuring its so-called "Security Zones", and in Opera since version 9 using its "Site Specific Preferences", with a somewhat better interface. A user-friendly solution for Firefox and other Gecko-based browsers is the open-source NoScript extension, which also includes specific anti-XSS protection.

The most significant problem with blocking all scripts on all websites by default is a substantial reduction in functionality and responsiveness (client-side scripting can be much faster than server-side scripting because it does not need to contact a remote server, and the page or frame does not need to be reloaded). Another problem with script blocking is that most users do not understand it and, if the protection were disabled by default, would not know how to use it to secure their browsers properly. A further drawback is that many insecure sites do not work without client-side scripting, forcing users to disable protection for those sites and opening their systems to the threat.

Related vulnerabilities

There are several classes of vulnerabilities or attack techniques which are related, and worth mentioning:

Cross-zone scripting vulnerabilities, which exploit "zone" concepts in software and usually execute code with greater privileges.
HTTP header injection vulnerabilities, which can be used to create cross-site scripting conditions in addition to allowing attacks such as HTTP response splitting.
Cross-site request forgery (CSRF/XSRF) is almost the opposite of XSS, in that rather than exploiting the user's trust in a site, the attacker exploits the site's trust in the client software, submitting requests that the site believes come from its own authenticated users.
SQL injection vulnerabilities, which exploit a security vulnerability occurring in the database layer of an application. When user input is incorrectly filtered, arbitrary SQL statements can be executed by the application.

Notes

^ During the second half of 2007, 11,253 site-specific cross-site vulnerabilities were documented by XSSed, compared to 2,134 "traditional" vulnerabilities documented by Symantec, in "Symantec Internet Security Threat Report: Trends for July-December 2007 (Executive Summary)" (PDF). Symantec Corp. April 2008. pp. 1–2. Retrieved 2008-05-11.
^ Rafail, Jason (2001). "Cross-Site Scripting Vulnerabilities" (PDF). CERT Coordination Center, Carnegie Mellon University. Retrieved 2008-05-27.
^ Champeon, Steve (April 6, 2001). "JavaScript: How Did We Get Here?". O'Reilly Media. Retrieved 2008-05-27.
^ Ruderman, Jesse (maintainer). "The Same Origin Policy". Mozilla. Retrieved 2008-05-27.
^ Powell, Thomas and Schneider, Fritz. JavaScript: The Complete Reference (2nd ed.). McGraw-Hill/Osborne. ISBN 0072253576. Retrieved 2008-05-27.
^ Mayer, Alain. "Security Policy and Access Control". USITS '99 Conference Proceedings via Usenix. Retrieved 2008-05-27.
^ Taylor, John (maintainer). "JavaScript Security in Mozilla". Mozilla. Retrieved 2008-05-27.
^ "Branch Firefox 2.0.0.5 fixlist (NOW RELEASED)".
