The K-Zone: An introduction to cross-site scripting (XSS) vulnerabilities

What is the problem?

Cross-site scripting, which has become known as `XSS' (because the abbreviation CSS is already in use for something else), is a way for a malicious person to cause his victim to execute scripts (JavaScript, etc.) in the victim's Web browser. There are various types of XSS attack, but this article will focus on what have become known as `type 1' attacks. The characteristic of a type 1 attack is that it allows the malicious person to cause his victim to run a script as if it had come from a legitimate site. It is this impersonation of a trusted, legitimate site that creates a particular problem for the people who run the legitimate site. Normally a Web browser will not allow scripts associated with one site to get access to data associated with a different site. However, by presenting a script that appears to be from a legitimate site, XSS circumvents this crucial security feature in the browser, and makes data associated with the legitimate site accessible to the script. It is this site-to-site data transfer that has given rise to the name `cross-site scripting'.

What the malicious script can do depends on the nature of the legitimate site, and the data it handles. The script will not be able to run anything on the server because the attack, by its very nature, is initiated against a user's browser. In addition, the script will not be able to do more than the browser allows that kind of script to do. For example, the malicious person will not be able to cause the victim's browser to run JavaScript that will do something that JavaScript itself is incapable of doing. In most cases, attacks have focused on reading cookies containing personal data. Since many legitimate sites do, in fact, store user data in cookies, the ability of XSS attacks to read cookies from legitimate sites is potentially a very serious problem for those sites.

Who is vulnerable?

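For a type 1 XSS attack to succeed, several elements must all be in place. Chief among them, as the following sections explain, are a reflective URL on the legitimate site and a way for the malicious person to get the victim to invoke it.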

What is a reflective URL?

The existence of reflective URLs is crucial for a type 1 XSS attack to succeed. In short, a reflective URL is a URL which, when requested from the application on the Web server, causes the server to send the browser an HTML page containing some text specified in the URL itself.

For example, consider an application that presents the user's browser with a form containing a single input element for data entry. When the user hits a submit button, this form data is sent to the server in (usually) a POST request. Suppose further that, with some kinds of input, the submission itself cannot be processed, and the application redraws the input form containing the unprocessable data. This situation has all the elements of a reflective URL.

A concrete case may make this clearer.

Suppose the application presents a form asking the user for his credit card number. If the user submits the form, and has entered a sensible credit card number (right number of digits, etc.), the application moves on to the next stage of processing. But if the user enters a badly-formatted credit card number, the application redraws the page with a message to that effect, and with the user's previous entry in the input field, ready for him to edit.

A reflective URL for this form would specify a request parameter that not only was an invalid credit card number, but also contained sufficient text to cause the application to put that text into the page.

If you want to try this yourself, it is straightforward enough using a simple JSP page. The example below will run on the Tomcat Web server, among others. All you need to do is place this JSP into a directory that the Web server understands as appropriate for JSP pages, and invoke it from a browser. This JSP example simulates the kind of thing that goes on in more-or-less all Web-based applications that process user input.

<%
// Read the "data" request parameter; treat a missing parameter as empty.
String data = request.getParameter("data");
if (data == null) data = "";
%>

<%-- The parameter is written into the page unchecked; this is the weakness being demonstrated. --%>
<form action="#" method="post">
<input name="data" value="<%=data%>"/><br/>
<input type="submit"/>
</form>

In this simple script, the user input is, in fact, not processed at all; but that is irrelevant -- in a type 1 XSS attack it is what happens to unprocessable input that is usually important. Note that in the example JSP page, the form contains an element called data, and that any request parameter called data is written into the form without any further checks. In a `real' application, of course, the application would have checked the input, and only re-presented the form if the input needed to be edited.

For the sake of simplicity, I assume that the JSP is called xsstest.jsp, and is installed in the root directory of the server's document hierarchy. Now, suppose we invoke the JSP above using a URL with the following form:

http://host:port/xsstest.jsp?data="><script>alert('You are running a script')</script><"

If your Web browser is anything like mine, your browser will run the script

alert('You are running a script')

and pop up a box containing this message. Of course, this is a harmless script; but I could have made the script do anything that JavaScript supports. More to the point, the script would appear to the browser as if it had come from the server hosting the JSP page -- which, in a sense, it has.

This works because the request parameter data, which contains an HTML script sequence, has been `injected' into the application's output to the browser. The browser does not care that the injected script sits in the middle of a data entry form: this is a perfectly legitimate place for JavaScript.
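
To see the injection on paper rather than in a browser, the substitution the JSP performs can be imitated in a few lines of plain Java. This is only a sketch: the class name ReflectionDemo and the method render are my own inventions for illustration, and the method reproduces just the one input line of the form, not the whole page.

```java
public class ReflectionDemo {

    // Imitates the JSP: the raw parameter is pasted, unchecked,
    // into the value attribute of the input element.
    static String render(String data) {
        if (data == null) data = "";
        return "<input name=\"data\" value=\"" + data + "\"/>";
    }

    public static void main(String[] args) {
        // The same parameter value as in the example URL.
        String payload = "\"><script>alert('You are running a script')</script><\"";
        System.out.println(render(payload));
    }
}
```

Running this prints

<input name="data" value=""><script>alert('You are running a script')</script><""/>

The leading "> in the parameter closes the value attribute and the input tag early, so what follows is read by the browser as an ordinary script element.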

How is the reflective URL invoked?

For an XSS attack to succeed, there must be a way for the malicious person to cause the victim to invoke the reflective URL on the legitimate site. This ought to be difficult but, because most e-mail clients now process HTML, it is now disturbingly easy -- the malicious person sends the victim an HTML e-mail containing a link to the reflective URL, and the victim clicks it. Of course the victim doesn't have to click the link -- the skill for the malicious person is to spoof the e-mail in such a way that the victim will want to do so. Sadly, experience has shown that this is not at all difficult.

How can XSS attacks be prevented?

There is no single, straightforward way to prevent XSS exploitation. The application must examine every request parameter submitted to ensure that it cannot inject executable code into the HTML page emitted by the application, while still accepting all legitimate input.

This is straightforward to accomplish -- although tedious for the application developer -- with prescriptive forms of input like credit card numbers. Since a credit card number can only legitimately contain spaces, dashes, and digits, the application can safely replace anything but these characters with harmless ones (spaces, for example) before issuing the page containing the credit card number to edit. But where the user's input can legitimately contain characters like >, < and " there is potentially a problem. These characters cannot be overwritten without breaking the functionality of the application.
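
For the credit card case, the filtering described above might look something like the following Java sketch. The class name CardNumberFilter and the method sanitize are hypothetical, and I am assuming that digits, spaces and dashes are the only characters the application wants to keep; everything else is overwritten with a space before the value is written back into the page.

```java
public class CardNumberFilter {

    // Keep digits, spaces and dashes; overwrite every other
    // character with a space so no markup survives.
    static String sanitize(String input) {
        if (input == null) return "";
        return input.replaceAll("[^0-9 -]", " ");
    }

    public static void main(String[] args) {
        // A well-formed entry passes through unchanged.
        System.out.println(sanitize("1234-5678 9012"));
        // An injection attempt is reduced to digits and spaces.
        System.out.println("[" + sanitize("12\"><b>") + "]");
    }
}
```

The application would call something like sanitize on the request parameter at the point where the JSP writes it into the value attribute. Note that this only controls what is re-emitted to the browser; deciding whether the number is actually valid is a separate job.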

Many application developers have taken the view that it is better to limit the functionality of the application than to risk an XSS attack. Such applications simply overwrite all suspect characters before emitting user input back to the browser.
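
A blunt filter of this kind can be sketched in Java as follows. The names SuspectCharFilter and overwrite are made up for the example, and the set of `suspect' characters is an assumption: I have used only the three mentioned above, and a real application may well need to treat others (the single quote, for instance) in the same way, depending on where in the page the input is written.

```java
public class SuspectCharFilter {

    // Overwrite <, > and " with spaces. Legitimate uses of these
    // characters are lost, which is the price of this approach.
    static String overwrite(String input) {
        if (input == null) return "";
        return input.replaceAll("[<>\"]", " ");
    }

    public static void main(String[] args) {
        String payload = "\"><script>alert('You are running a script')</script><\"";
        // With the brackets and quotes gone, the text can no longer
        // close the attribute or open a script element.
        System.out.println(overwrite(payload));
    }
}
```

The cost is plain to see: a user who legitimately types a < b will get a   b back.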

Summary

XSS attacks are a problem for legitimate Web sites that host applications, so long as those applications are vulnerable to reflective URLs. The only way to prevent XSS attacks completely is to remove all reflective URLs from the application, which will not usually be all that straightforward. To do this requires filtering all user input that is re-emitted to an HTML page, to ensure that it does not contain character sequences that will cause a script to be invoked.
©1994-2006 Kevin Boone, all rights reserved