Indexing Form Data

NOTE: This page is a work in progress for the next version of Orbeon Forms.

Rationale

When using relational databases, XML data is stored as a CLOB. When creating, loading, updating, or deleting a form, queries only involve one CLOB, and thus performance isn't an issue. Queries on multiple CLOBs pose more of a problem. Those are in particular done from summary pages, to extract data show in columns of the summary page, or when users perform a search. Right now, those queries require XPath expressions to run on the XML, which isn't efficient with XML stored in CLOBs.

Implementation

Form Runner persistence model

Data referred by those XPath expressions is extracted from the form data when saved, yielding a set of name-value pairs. This is done by Form Runner's persistence model, so this part is common to all the persistence layers. The name-value pairs are passed to the persistence layer in the PUT request, and an envelope is added around the form data to pass down name-value pairs to a specific persistence layer. The envelope's header contains the name-value pairs we want indexed, under a <fr:index> element. Name-value pairs are added with the name stored as an element name, as in <last-name>Smith</last-name>, since this also how they appear in the form data.

<fr:data>
    <fr:head>
        <fr:index>
            <last-name>Smith</last-name>
            <company-name>Acme</company-name>
        </fr:index>
    </fr:head>
    <fr:body>
        <form>
            <personal-information>
                <last-name>Smith</last-name>
                <first-name>John</first-name>
            </personal-information>
            <company>
                <company-name>Acme</company-name>
            </company>
        </form>
    </fr:body>
</fr:data>

SQL persistence layers

  1. An id column is added to the orbeon_form_data table, populated on insert from a sequence generator, so it can be referenced in other tables.
  2. We introduce a new table orbeon_form_data_index with columns data_id, name, and value.
  3. Upon create and update, the persistence layer adds rows to orbeon_form_data_index based on the key-value pairs under <fr:index>.
  4. Upon search, the name is already provided to the persistence layer, e.g. <query name="first-name" path="..."/>. The search implementations are updated to use query orbeon_form_data_index based on the name, instead of running the XPath expression in the XML data.

Other persistence layers

The eXist and MongoDB persistence layers are modified to take into account the new envelope, specifically extracting the data under <fr:body> in the create and update operations, and ignoring the rest of the envelope.

Service to recreate the index

For backward compatibility (customers having existing data, without an orbeon_form_data_index table), and support cases where a column is added to the search or summary page, a service to recreate the index is provided. The service:
  1. Removes the content of orbeon_form_data_index.
  2. Retrieves all the forms definitions (as the home page already does).
  3. For each form, figures the XPath expressions to the element (as done by the Form Runner persistence model).
  4. For each form form / data in the database, based on the information gathered in #3, it extracts the values from the form data, and adds rows to  orbeon_form_data_index.