XForms - Prevent Injection in HTML Area

This project has been implemented. For the documentation, see: HTML cleanup in the User's Guide.


When the HTML editor (either the YUI RTE or FCK) sends a value to the server, that value is sent in an Ajax request, inside an <xxforms:event> element of the form:

<xxforms:event name="xxforms-value-change-with-focus-change" source-control-id="xhtml-editor">
    Some &lt;b&gt;text&lt;/b&gt; here
</xxforms:event>

Right now, the server takes the value as-is and inserts it into the node to which the <xforms:input> is bound. This opens the door to JavaScript injection: either through the editor (for example by pasting code that includes JavaScript) or by doctoring the Ajax request, a malicious user could include JavaScript in the HTML inserted into the instance, which could then be shown to another user if a page displays it with an <xforms:output mediatype="text/html">.

The goal is to prevent any possible injection and, while at it, clean the HTML we get from the HTML editor. The server-side Java code that handles a new value coming from an HTML area will be modified to:
  1. Run JTidy on the HTML. This will transform the HTML into XML (or XHTML). The JTidy library is already bundled with Orbeon Forms. There is a utility function that can be used here (which calls JTidy): XFormsUtils.htmlStringToDocument().
  2. Run an XSLT file which only keeps a known subset of HTML that is considered to be safe. For now, we can keep all the text, and elements such as <b>, <i>, <ol>, <ul>, <li>, <p>, <span>. We can always add other elements to keep in the XSLT in the future. The path to the XSLT file can be hardcoded in the Java code, and the file can be stored under src/resources-packaged/ops/xforms.
  3. Serialize the output of the XSLT transformation back into the node as a string.
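Steps 2 and 3 above could be sketched roughly as follows, using only the XSLT processor that ships with the JDK. This is an illustration under stated assumptions, not the actual Orbeon Forms code: the class name HtmlWhitelist, the method clean(), and the inline stylesheet are hypothetical (the real stylesheet would live in a file), and the input stands in for the XML that step 1 would produce, assumed here to be well-formed with elements in no namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, for illustration only.
class HtmlWhitelist {

    // Hypothetical whitelist stylesheet (inlined here; the project would load it from a file).
    // - Allowed elements are re-created WITHOUT their attributes, so onclick etc. are dropped.
    // - <script> and <style> are removed together with their content.
    // - Any other element is unwrapped: its children are processed, the element itself is not kept.
    // - Text is copied by the built-in XSLT template rule.
    private static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='b|i|ol|ul|li|p|span'>"
        + "<xsl:element name='{local-name()}'><xsl:apply-templates/></xsl:element>"
        + "</xsl:template>"
        + "<xsl:template match='script|style'/>"
        + "<xsl:template match='*'><xsl:apply-templates/></xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the whitelist to well-formed XML and serializes the result as a string.
    static String clean(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("HTML cleanup failed", e);
        }
    }

    public static void main(String[] args) {
        // Input assumed to be what step 1 (JTidy) would hand over, reduced to a <body> wrapper.
        String dirty = "<body>Some <b onclick=\"evil()\">text</b><script>alert(1)</script> here</body>";
        System.out.println(clean(dirty));
    }
}
```

With this stylesheet, the <script> element and the onclick attribute in the sample input should not survive, while the <b> element and the surrounding text are kept. Unwrapping unknown elements rather than deleting them is one possible design choice; it preserves the user's text even when the markup around it is rejected.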
A few tests that we should be able to run once this is implemented (the control-xhtml-area.xhtml sandbox example can be used as a test bed):
  • Generally speaking, the control should behave the same way it does now. If you write "Some text here" in the control, with the word "text" in bold, the value in the instance should be "Some <b>text</b> here", just as it is now.
  • We should be able to paste some "dirty" HTML from Word and have it cleaned up on the server side.
  • We should be able to simulate a JavaScript injection by doctoring the Ajax request to add some JavaScript inside the <xxforms:event>, and the JavaScript should be removed on the server.