The Problem

Users need to select a coded item from a large set, or even from multiple sets, of coded items. A simple starts-with() or contains() type of search will not work because the sets contain many similar terms. As an example, consider these four items taken from a list of about 80,000 pharmaceutical products:
    INJVLST AQUA AD INJECTABILIA AMPUL 1ML
    INJVLST AQUA AD INJECTABILIA AMPUL 2ML
    INJVLST AQUA AD INJECTABILIA AMPUL 5ML
    INJVLST AQUA AD INJECTABILIA AMPUL 10ML
starts-with() will get you nowhere, and contains() isn't helpful either. We need to be able to provide multiple search terms, which are then combined into a more sophisticated query.

The Solution

Use the autocomplete control in combination with XQuery and Lucene index-based searching. This is not only an elegant solution to the above problem, it is also very fast.

The data

In our case we have a set of vocabulary files. These files are contained in a collection in the eXist database: /db/XML/vocab. Each file is identified by its object identifier (OID). For instance, the file containing the ISO language codes is named 1.0.639.1.xml and looks like this:

    <codeSystem xmlns="urn:hl7-org:v3">
        <name>ISO 639-1</name>
        <desc>ISO language codes</desc>
        <code code="nl" codeSystem="1.0.639.1" displayName="Dutch"/>
        <code code="en" codeSystem="1.0.639.1" displayName="English"/>
        .....
    </codeSystem>

The index

First we need to configure the Lucene index. Indexes are defined in collection configuration files (collection.xconf). These files are placed in a collection hierarchy under /db/system/config that mirrors the database structure, so our index definition for the /db/XML/vocab collection needs to be placed in /db/system/config/db/XML/vocab.

This is the content of our collection.xconf file:
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index xmlns:hl7="urn:hl7-org:v3">
            <lucene>
                <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
                <text qname="hl7:code"/>
                <text qname="@displayName"/>
                <text qname="@codeSystem"/>
            </lucene>
        </index>
    </collection>

The query

    xquery version "1.0";
    declare namespace hl7="urn:hl7-org:v3";

    <result>
    {
        (: 'searchString' is the assumed name of the request parameter;
           the '\s' delimiter follows the description of the query :)
        let $searchTerms := tokenize(lower-case(request:get-parameter('searchString', '')), '\s')
        let $maxResults := xs:integer('50')
        let $query :=
            <query>
                <bool>
                {
                    for $term in $searchTerms
                    return <wildcard occur="must">{concat($term,'*')}</wildcard>
                }
                </bool>
            </query>
        return
            for $code in subsequence(xmldb:xcollection('/db/XML/vocab')//hl7:code[ft:query(@displayName,$query)],1,$maxResults)
            order by $code/@displayName
            return <code code="{$code/@code}" codeSystem="{$code/@codeSystem}" displayName="{normalize-space($code/@displayName)}"/>
    }
    </result>
Everything is contained within a result element; this ensures that the response is always well-formed XML, even when nothing matches. The first step is to get the search terms from the request parameter. Whitespace (the regular expression '\s') is used as the delimiter for the search terms because the space bar is the biggest and easiest to reach key on the keyboard. We use the tokenize() function to split the searchString into a sequence of terms. The variable maxResults holds the maximum number of results returned; if necessary, it could be obtained from a request parameter instead.

Next we build a Lucene query element from the sequence of search terms. For each term in searchTerms a wildcard element is constructed and an asterisk is appended to the term. If the searchString is 'in aq 2m', this will result in the following query:
    <query>
        <bool>
            <wildcard occur="must">in*</wildcard>
            <wildcard occur="must">aq*</wildcard>
            <wildcard occur="must">2m*</wildcard>
        </bool>
    </query>
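The query-building step described above can be sketched as a small standalone function. This is only an illustration: local:build-query is a hypothetical helper that does not appear in the original query. The one deliberate difference is that it normalizes whitespace before tokenizing, because tokenize() with '\s' returns an empty string between two consecutive spaces, and an empty term would turn into a bare '*' wildcard.

```xquery
xquery version "1.0";

(: Hypothetical helper, not part of the original query: builds the Lucene
   query element from a raw search string. :)
declare function local:build-query($searchString as xs:string?) as element(query) {
    (: normalize-space() trims the string and collapses runs of whitespace,
       so tokenizing on '\s+' never yields an empty term :)
    let $terms := tokenize(lower-case(normalize-space($searchString)), '\s+')
    return
        <query>
            <bool>{
                for $term in $terms
                return <wildcard occur="must">{concat($term, '*')}</wildcard>
            }</bool>
        </query>
};

(: extra spaces are tolerated; the three terms are 'in', 'aq' and '2m' :)
local:build-query('  in  aq 2m ')
```

For the input shown, the function produces the same three wildcard elements as the query above. For an empty or whitespace-only search string it returns a bool element without children, so a caller would probably want to skip the search entirely in that case.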
Now that we have our query, it is time to put it to work:

    for $code in subsequence(xmldb:xcollection('/db/XML/vocab')//hl7:code[ft:query(@displayName,$query)],1,$maxResults)

It says: give me all the hl7:code elements in the collection /db/XML/vocab where the displayName attribute contains words beginning with 'in' and words beginning with 'aq' and words beginning with '2m', but never give me more than the first 50 results. The subsequence() function limits the number of results; this is essential when working with large sets. The xmldb:xcollection() function is used to limit the search to this specific collection. The key to using Lucene-based search is the ft:query() function:

    //hl7:code[ft:query(@displayName,$query)]

It is used like any other XPath predicate and takes the nodes to search and the query as arguments. Next, the results are ordered by @displayName and the code elements are returned. One could also just return the element as is (return $code), but then it would be in the HL7 namespace; constructing a new code element, as the query does, returns it in no namespace.
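The description of the query mentions that maxResults could be obtained from a request parameter. A minimal sketch of that idea follows, assuming a parameter named 'maxResults' (a hypothetical name, not taken from the original) and keeping 50 as both the default and the upper bound, so that a client cannot ask for an unbounded result set.

```xquery
xquery version "1.0";

(: 'maxResults' is a hypothetical parameter name; 50 is the limit used in
   the query above. [1] guards against the parameter being sent twice. :)
let $requested := request:get-parameter('maxResults', '50')[1]
let $maxResults :=
    if ($requested castable as xs:integer and xs:integer($requested) gt 0)
    then min((xs:integer($requested), 50))
    else 50
return $maxResults
```

The resulting value can be passed to subsequence() exactly as before. Anything that is not a positive integer falls back to the default instead of raising an error.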



