Combining autocomplete, XQuery and Lucene based search


The Problem

Users need to select a coded item from a large set, or even from multiple sets, of coded items. A simple starts-with() or contains() search does not work because the sets contain many similar terms. As an example, consider these four items taken from a list of about 80,000 pharmaceutical products:


INJVLST AQUA AD INJECTABILIA AMPUL  1ML
INJVLST AQUA AD INJECTABILIA AMPUL  2ML
INJVLST AQUA AD INJECTABILIA AMPUL  5ML
INJVLST AQUA AD INJECTABILIA AMPUL 10ML


In this case, starts-with() will get you nowhere and contains() isn't helpful either. We need to be able to provide multiple search terms, which will be combined into a more sophisticated query.
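
To make the limitation concrete, here is a minimal sketch in plain XQuery 1.0 (no eXist or Lucene involved) run against the four items above. The helper local:matches-all is hypothetical and only emulates the idea of combining several word-prefix terms; it is not how a Lucene index works internally.

```xquery
xquery version "1.0";

declare variable $items := (
    'INJVLST AQUA AD INJECTABILIA AMPUL  1ML',
    'INJVLST AQUA AD INJECTABILIA AMPUL  2ML',
    'INJVLST AQUA AD INJECTABILIA AMPUL  5ML',
    'INJVLST AQUA AD INJECTABILIA AMPUL 10ML'
);

(: Hypothetical helper: true when every search term is a prefix
   of at least one word of $item (case-insensitive). :)
declare function local:matches-all($item as xs:string, $terms as xs:string*)
        as xs:boolean {
    let $words := tokenize(lower-case($item), '\s+')
    return
        every $term in $terms satisfies
            (some $word in $words satisfies starts-with($word, $term))
};

<comparison>
    <!-- a single prefix or substring cannot tell the items apart -->
    <starts-with>{count($items[starts-with(., 'INJVLST AQUA')])}</starts-with>
    <contains>{count($items[contains(., 'AMPUL')])}</contains>
    <!-- several short terms combined narrow the list down -->
    <multi-term>{$items[local:matches-all(., ('in', 'aq', '2m'))]}</multi-term>
</comparison>
```

For these four items both single-term expressions count 4, while the multi-term match returns only the 2ML item.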

The Solution

Use the autocomplete control in combination with XQuery and Lucene index based searching. This is not only an elegant solution to the above problem, it is also very fast.

The data

In our case we have a set of vocabulary files. These files are contained in a collection in the eXist database: /db/XML/vocab.


Each file is identified by its object identifier (OID). For instance, the file containing the ISO language codes is named 1.0.639.1.xml and it looks like this:

<codeSystem xmlns="urn:hl7-org:v3">
    <name>ISO 639-1</name>
    <desc>ISO language codes</desc>
    <code code="nl" codeSystem="1.0.639.1" displayName="Dutch"/>
    <code code="en" codeSystem="1.0.639.1" displayName="English"/>
    .....
</codeSystem>

The index

First we need to configure the Lucene index. Indexes are defined in collection configuration files (collection.xconf). These files are placed in a collection hierarchy under /db/system/config that mirrors the database structure, so our index definition for the /db/XML/vocab collection needs to be placed in /db/system/config/db/XML/vocab.


This is the content of our collection.xconf file:


<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:hl7="urn:hl7-org:v3">
        <lucene>
            <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
            <text qname="hl7:code"/>
            <text qname="@displayName"/>
            <text qname="@codeSystem"/>
        </lucene>
    </index>
</collection>


The index element carries the HL7 namespace declaration that is referenced in the Lucene configuration. Indexes are configured for the code element and for the codeSystem and displayName attributes. Note that the namespace prefix is not used with the attributes. Although the code element does not contain any text, an index is created for it because it provides context for the attributes. Note: Don't forget to rebuild the index (using the eXist client) if you change the definition.
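
The note above mentions rebuilding the index with the eXist client. As an alternative sketch, a reindex can also be triggered from XQuery. This assumes the code runs inside eXist, that the xmldb module is available under its usual prefix (as in the query later on this page), and that the account running it has sufficient rights, typically DBA.

```xquery
xquery version "1.0";

(: Rebuilds the indexes configured for the vocabulary collection.
   Runs only inside eXist; xmldb:reindex() returns a boolean
   indicating whether the reindex succeeded. :)
xmldb:reindex('/db/XML/vocab')
```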

The query

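xquery version "1.0";
declare namespace hl7="urn:hl7-org:v3";

<result>
    {
        (: split the lower-cased search string into terms; 'test' is the default :)
        let $searchTerms := tokenize(lower-case(
            request:get-parameter('searchString', 'test')), '\s')

        (: maximum number of results to return :)
        let $maxResults := 50

        (: one wildcard clause per term; every clause must match :)
        let $query := <query>
            <bool>
                {
                    for $term in $searchTerms
                    return
                    <wildcard occur="must">{concat($term,'*')}</wildcard>
                }
            </bool>
        </query>
        return

        (: search the displayName index and keep the first $maxResults hits :)
        for $code in subsequence(xmldb:xcollection('/db/XML/vocab')
                     //hl7:code[ft:query(@displayName,$query)],1,$maxResults)

        order by $code/@displayName
        return
        (: build a new code element so that the result is not in the HL7 namespace :)
        <code code="{$code/@code}" codeSystem="{$code/@codeSystem}"
        displayName="{normalize-space($code/@displayName)}"/>
    }
</result>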

Everything is contained within a result element; this ensures that the response we get will always be well-formed XML. The first step is to get the query terms from the request parameter. We use the tokenize function to split the searchString into a sequence of terms. Whitespace ('\s') is used as the delimiter because the space bar is the biggest and easiest-to-reach key on the keyboard. The variable maxResults holds the maximum number of results returned; if necessary, it could be obtained from a request parameter. Next we build a Lucene query element from the sequence of search terms: for each term in searchTerms a wildcard element is constructed and an asterisk is appended to the term.

If the searchString is 'in aq 2m' this will result in the following query:


<query>
    <bool>
        <wildcard occur="must">in*</wildcard>
        <wildcard occur="must">aq*</wildcard>
        <wildcard occur="must">2m*</wildcard>
    </bool>
</query>
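
The query-building step can be tried on its own, outside eXist, because it uses only standard XQuery 1.0 functions. The sketch below wraps it in a hypothetical function local:build-query and, as a variation on the original, applies normalize-space() before splitting. With the plain '\s' delimiter, two consecutive spaces in the input yield an empty term and therefore a bare * wildcard clause; normalizing first avoids that.

```xquery
xquery version "1.0";

(: Hypothetical helper: builds the Lucene query element
   from a raw search string. :)
declare function local:build-query($searchString as xs:string)
        as element(query) {
    (: normalize-space() trims the string and collapses runs of whitespace,
       so splitting on a single space never yields an empty term :)
    let $terms := tokenize(normalize-space(lower-case($searchString)), ' ')
    return
        <query>
            <bool>
            {
                for $term in $terms
                return <wildcard occur="must">{concat($term, '*')}</wildcard>
            }
            </bool>
        </query>
};

(: extra spaces are deliberate :)
local:build-query('  in   aq 2m ')
```

Despite the extra spaces in the input, this produces the same three wildcard clauses (in*, aq*, 2m*) as the query shown above.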


Now that we have our query it is time to put it to work:

for $code in subsequence(xmldb:xcollection('/db/XML/vocab')//hl7:code[ft:query(@displayName,$query)],1,$maxResults)

It says: give me all the hl7:code elements in the collection /db/XML/vocab where the displayName attribute contains words beginning with 'in' and words beginning with 'aq' and words beginning with '2m', but never give me more than the first 50 results. The subsequence() function limits the number of results, which is essential when working with large sets. The xmldb:xcollection() function is used to limit the search to this specific collection. The key to using Lucene based search is the ft:query() function:

//hl7:code[ft:query(@displayName,$query)]

It is used like any other XPath predicate and takes the nodes to search and the query as arguments. Next the results are ordered by @displayName and the code elements are returned. Note that the ordering is applied only to the results that subsequence() lets through, at most 50. One could also just return the element as is (return $code), but then it would be in the HL7 namespace; constructing a new code element avoids that.
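
The text notes that maxResults could be obtained from a request parameter. A minimal sketch of how that might look follows. The request parameter name maxResults and the helper local:max-results are hypothetical and not part of the original query. The helper is plain XQuery 1.0, so it can be tried outside eXist.

```xquery
xquery version "1.0";

(: Hypothetical helper: turns a raw parameter value into a usable limit.
   Falls back to $default when the value is missing, is not an integer,
   or is not positive. :)
declare function local:max-results($raw as xs:string?, $default as xs:integer)
        as xs:integer {
    if ($raw castable as xs:integer)
    then
        if (xs:integer($raw) gt 0) then xs:integer($raw) else $default
    else $default
};

(: Inside eXist the raw value would come from the request, for example:
   local:max-results(request:get-parameter('maxResults', ())[1], 50)
   The literal values below just exercise the helper. :)
<limits>
    <limit>{local:max-results('20', 50)}</limit>
    <limit>{local:max-results('abc', 50)}</limit>
    <limit>{local:max-results((), 50)}</limit>
</limits>
```

The three limit elements contain 20, 50 and 50. The nested if is deliberate: XQuery does not guarantee the evaluation order of the operands of "and", so the cast is only attempted once the castable test has passed.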


The form

To make this work we first need three instances:


<!-- instance for selected code -->
<xforms:instance id="selected-code-instance">
    <root>
        <code code="" codeSystem="" displayName=""/>
    </root>
</xforms:instance>
<!-- instance for code search string -->
<xforms:instance id="code-search-instance">
    <instance>
        <searchFor/>
    </instance>
</xforms:instance>
<!-- instance for code itemset -->
<xforms:instance id="code-itemset-instance">
    <result>
        <code/>
    </result>
</xforms:instance>


There is one instance for the selected code element, one for the search string, and one for the itemset that is to be displayed in the autocomplete control.

We also need a submission to call the XQuery and update the itemset:


<xforms:submission id="update-codes" ref="instance('code-search-instance')"
                   action="/exist/rest/db/xquery/code.xq?searchString={instance('code-search-instance')/searchFor}"
                   method="get" instance="code-itemset-instance" replace="instance"/>


This submission uses a very nice feature of the eXist database: if the URL of an HTTP request points to a stored XQuery, the query is executed and its result is returned. The action takes the search string from the 'code-search-instance' and passes it to the code.xq query as a request parameter (searchString).
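
Search strings normally contain spaces, so it helps to see what the requested URL looks like once the value is percent-encoded. The sketch below uses the standard encode-for-uri() function. The helper local:search-url is hypothetical, and whether the XForms engine encodes the value substituted into the action attribute on its own is not covered here.

```xquery
xquery version "1.0";

(: Hypothetical helper: builds the REST URL for a given search string. :)
declare function local:search-url($searchString as xs:string) as xs:string {
    concat('/exist/rest/db/xquery/code.xq?searchString=',
           encode-for-uri($searchString))
};

local:search-url('in aq 2m')
(: /exist/rest/db/xquery/code.xq?searchString=in%20aq%202m :)
```

Requesting such a URL directly, for example in a browser against a running eXist instance, is also a quick way to check the query independently of the form.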

Finally we have the code for the autocomplete control:


<fr:autocomplete ref="instance('selected-code-instance')//code/@code" id="codeSearch" dynamic-itemset="true"
                 max-results-displayed="50">
    <!-- React to user selecting from list -->
    <xforms:action ev:event="xforms-value-changed">
        <xforms:setvalue ref="instance('selected-code-instance')/code/@codeSystem"
                         value="instance('code-itemset-instance')/code[@code=instance('selected-code-instance')//code/@code]/@codeSystem"/>
        <xforms:setvalue ref="instance('selected-code-instance')/code/@displayName"
                         value="instance('code-itemset-instance')/code[@code=instance('selected-code-instance')//code/@code]/@displayName"/>
    </xforms:action>
    <!-- React to user searching -->
    <xforms:action ev:event="fr-search-changed">
        <xxforms:variable name="search-value" select="event('fr-search-value')"/>
        <xxforms:variable name="make-suggestion" select="string-length($search-value) >= 3 and not(instance('code-itemset-instance')//code[@displayName=$search-value])"/>
        <xforms:action if="$make-suggestion">
            <!-- Update itemset -->
            <xforms:setvalue ref="instance('code-search-instance')/searchFor" value="$search-value"/>
            <xforms:send submission="update-codes"/>
        </xforms:action>
        <xforms:action if="not($make-suggestion) and not(instance('code-itemset-instance')//code[@displayName=$search-value])">
            <!-- Delete itemset -->
            <xforms:delete nodeset="instance('code-itemset-instance')/code"/>
        </xforms:action>
    </xforms:action>
    <xforms:itemset nodeset="instance('code-itemset-instance')/code">
        <xforms:label ref="@displayName"/>
        <xforms:value ref="@code"/>
    </xforms:itemset>
</fr:autocomplete>


This code builds on the dynamic-itemset example from the autocomplete documentation. The control is bound to code/@code in the 'selected-code-instance'. At the bottom we have the itemset definition: the value is bound to the code attribute and the label is bound to the displayName attribute.

There are two actions: one for handling the selection of an item and one for handling the search.

The action reacting to the 'xforms-value-changed' event sets the values of the @codeSystem and @displayName attributes based on the selected code. This approach may seem a bit verbose, but it allows for greater flexibility. The interesting part is the action that reacts to the 'fr-search-changed' event. This is also based on the dynamic-itemset example, with one important addition: we check that the search value is not already present as a displayName in our 'code-itemset-instance'. This is necessary because the selected displayName can contain characters that modify the Lucene search and thus change the 'code-itemset-instance'. Remember that the itemset is used for obtaining the attribute values.

The following two screenshots show the control in action.

[Screenshot: part of the list after five keystrokes]

[Screenshot: only one item remains at the eighth keystroke]

Just eight keystrokes are needed to select the code out of a total of 92,159 codes in 273 code systems.