
XForms - Fields Spell Checking

This page describes a project which has been implemented in Orbeon Forms, although differently than described here.
See the documentation for the existing spell checker component.



The goal of this project is to create a new widget, with the UI implemented in XBL, which can be included in a form to spell check all the fields on that form. Initially, the component just shows a button ("Spell check fields"). When the button is activated, the component shows a dialog similar to the one shown by Word or by the standard Mac spell checker, and spell checks the content of all the fields.

The implementation has two parts:
  • Client-side – The UI is implemented as an XBL component. Users add the "Spell check fields" button to their page by including the XBL component. The XBL component then uses <xforms:submission> to call a service which does the spell checking.
  • Server-side – A service is implemented that does the spell checking.
    • The service is implemented in XPL.
    • It uses a new Spell Check processor (oxf:spell-check) which is implemented using Jazzy.
    • The service is implemented in Form Runner, but the component can be used outside of FR. The URI of the service could be /fr/service/spell-check, following the pattern established by existing services used in FR for persistence, internationalization, creating PDF, and other purposes. Services are defined in the FR page-flow.xml.
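
To make the contract between the component and the service concrete, here is a minimal stand-in for the service, written in Python. It is only a sketch: it uses a tiny in-memory word set and Python's difflib for suggestions in place of Jazzy and the oxf:spell-check processor, and the names spell_check and dictionary are hypothetical. The element names follow the formats described in the XML API section.

```python
import difflib
import re
import xml.etree.ElementTree as ET


def spell_check(request_xml, dictionary):
    """Return a <texts> response listing the misspelt words of each fragment.

    `dictionary` is a set of lower-case known words (an assumption of this
    sketch; the real service would use a Jazzy dictionary instead).
    """
    request = ET.fromstring(request_xml)
    response = ET.Element("texts")
    candidates = sorted(dictionary)
    for text in request.findall("text"):
        # Collect each unknown word once per fragment, in order of appearance.
        unknown = []
        for word in re.findall(r"[^\W\d_]+", text.text or ""):
            if word.lower() not in dictionary and word not in unknown:
                unknown.append(word)
        if not unknown:
            continue  # fragments without errors are left out of the response
        out = ET.SubElement(response, "text", id=text.get("id"))
        for word in unknown:
            error = ET.SubElement(out, "spelling-error")
            ET.SubElement(error, "misspelt-word").text = word
            # difflib stands in for the suggestion engine of a real checker.
            for guess in difflib.get_close_matches(word.lower(), candidates, n=3):
                ET.SubElement(error, "suggestion").text = guess
    return ET.tostring(response, encoding="unicode")
```

Calling spell_check() with a request that contains one misspelt fragment and one correct fragment returns a response with a single <text> element, for the misspelt fragment only.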

Component UI

The dialog that shows up when the spell checking is started can look like the standard Mac spell checking dialog shown in the following screenshot.
  • From this dialog, we would remove the Guess, Learn, and Forget buttons.
  • As we spell check the fields, the "current field" (the one in which the current misspelt word appears) is highlighted by adding a CSS class, which for instance adds a colored border to the field. The component also makes sure that this field is visible in the viewport (using the focus() method on that field).
  • Since we are not highlighting the misspelt word in the field itself, we show, towards the top of the dialog, the context in which the misspelt word appears (e.g. two words before and two words after, with the misspelt word in the middle shown in bold).
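
The context shown at the top of the dialog could be computed along the following lines. This is a sketch in Python rather than the XPath the component would actually use. The function name contexts, the window parameter, and the use of <b> to mark the bold word are assumptions made for illustration.

```python
import re

# A "word" here is a run of letters; punctuation is dropped from the context.
WORD = re.compile(r"[^\W\d_]+")


def contexts(text, misspelt, window=2):
    """Yield one context string per occurrence of `misspelt` in `text`.

    Each context holds up to `window` words before and after the misspelt
    word, which is wrapped in <b>...</b>. Matching is case-sensitive.
    """
    words = WORD.findall(text)
    for i, word in enumerate(words):
        if word == misspelt:
            before = words[max(0, i - window):i]
            after = words[i + 1:i + 1 + window]
            yield " ".join(before + ["<b>" + word + "</b>"] + after)
```

For example, list(contexts("The reckless trock driver just changed lane", "trock")) returns ['The reckless <b>trock</b> driver just']. A word that occurs twice in the same field yields two contexts, one per occurrence.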



Implementation:
  1. The XBL component first gets the list of fields to spell check. It does so with a new XForms function which returns the effective IDs of all the controls in the page.
  2. For each ID, it gets the node to which the control is bound, and takes the string value of that node (this is the value it will spell check). For this purpose, a new XForms function will be provided, which returns the node bound to the control with a given effective ID.
  3. The list of pairs of control ID / control value is used to create the document sent to the spell checking XML API (see below).
  4. The language used for spell checking (to be specified in calls to the spell checking XML API) is retrieved from the lang attribute on the <html> element. If not present, "en" will be used. (This is the same logic we use to internationalize the YUI date picker.)
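
Steps 3 and 4 can be sketched as follows. The component itself would build this document with XForms. The Python below only illustrates the shape of the result, and build_request and its parameters are hypothetical names.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name with the predefined xml: prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_request(controls, html_lang=None):
    """Build the <texts> request from (control id, control value) pairs.

    `html_lang` is the value of the lang attribute on <html>, or None when
    that attribute is absent, in which case "en" is used.
    """
    texts = ET.Element("texts")
    texts.set(XML_LANG, html_lang or "en")
    for control_id, value in controls:
        text = ET.SubElement(texts, "text", id=control_id)
        text.text = value
    return ET.tostring(texts, encoding="unicode")
```

For instance, build_request([("your-vehicle", "Nice grey sparty car.")]) produces a document whose root carries the default language "en" and which holds one <text> element per control.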

XML API

For simplicity, the REST API implemented by the spell check service uses the same format as the input and output documents of the new oxf:spell-check processor. The XPL that implements the service therefore stays simple: it just runs the oxf:spell-check processor.

Input

  • The input is a series of text fragments.
  • Each fragment has an ID.
  • The language used for spell checking is specified by the xml:lang attribute on the root element.
  • The content of each control to be spell checked will be passed by the component to the REST service as a text fragment.
  • The value of each control can be passed straight inside a <text> element, except for HTML areas. For an HTML area, the component needs to strip the markup and pass only the text. After the Prevent Injection project is implemented, this will be simpler, as we'll know that the value of the control is well-formed markup: we'll be able to take the value of the control, parse it with saxon:parse(), and get the string value of the parsed document.
For instance:

<texts xml:lang="en">
    <text id="your-vehicle">Nice grey sparty car.</text>
    <text id="other-vehicle">Blue rusted truck.</text>
    <text id="accident">
        The reckless trock driver just changed lane without looking and bumped
        into my car. He must have either been drunk, or must have won his
        driving license at the lottery, or both.
    </text>
</texts>
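
For HTML areas, the markup-stripping step mentioned above could look like the following sketch. It uses Python's lenient html.parser as a stand-in for the saxon:parse() approach, so it also accepts markup that is not well-formed. The names TextExtractor and html_to_text are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the character data of an HTML fragment, ignoring all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and similar
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup):
    """Return the text of `markup` with whitespace runs collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    # Like the string value of a parsed document, text nodes are concatenated
    # as-is, so words separated only by tags (e.g. "</p><p>") run together.
    return " ".join("".join(parser.parts).split())
```

With this, html_to_text("<p>Nice <b>grey</b> sparty car.</p>") returns 'Nice grey sparty car.', which can then be placed in a <text> element like the value of any other control.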

Output

  • The output lists the spelling errors for each text fragment.
  • For each spelling error, it gives the misspelt word and a (possibly empty) list of suggestions.
  • If the same word is misspelt in multiple text fragments, a <spelling-error> is listed for each fragment. This is somewhat redundant, but it lets the XBL component work fragment by fragment, considering for each fragment only the words misspelt in that fragment.
  • If the word trock appears twice in a fragment, there will be only one <spelling-error> in the response. The component will still ask the user the question twice, as the right word for the first occurrence might be truck and the right word for the second occurrence might be trick.
For instance:

<texts>
    <text id="your-vehicle">
        <spelling-error>
            <misspelt-word>sparty</misspelt-word>
            <suggestion>sporty</suggestion>
            <suggestion>party</suggestion>
        </spelling-error>
    </text>
    <text id="accident">
        <spelling-error>
            <misspelt-word>trock</misspelt-word>
            <suggestion>track</suggestion>
            <suggestion>trick</suggestion>
            <suggestion>truck</suggestion>
        </spelling-error>
    </text>
</texts>
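
On the client side, the response has to be expanded into one question per occurrence of each misspelt word. The sketch below shows one way to do this in Python. The function name questions and the values mapping (control ID to the text that was sent) are assumptions, not part of the design above.

```python
import re
import xml.etree.ElementTree as ET


def questions(response_xml, values):
    """Return one (control id, offset, word, suggestions) tuple per occurrence.

    `values` maps each control id to the text that was sent for it. Within a
    fragment, questions are ordered by position in the text.
    """
    result = []
    for text in ET.fromstring(response_xml).findall("text"):
        control_id = text.get("id")
        value = values[control_id]
        found = []
        for error in text.findall("spelling-error"):
            word = error.findtext("misspelt-word")
            suggestions = [s.text for s in error.findall("suggestion")]
            # One <spelling-error> can correspond to several occurrences.
            for match in re.finditer(r"\b" + re.escape(word) + r"\b", value):
                found.append((control_id, match.start(), word, suggestions))
        result.extend(sorted(found, key=lambda q: q[1]))
    return result
```

If the accident fragment contained the word trock twice, the single <spelling-error> for trock would produce two tuples with different offsets, so the user could pick truck for one and trick for the other.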

Choosing a Spell Checking Library

We'd like the library to have dictionaries for French, German, Spanish, and Czech. The following libraries are being considered:
  • Jazzy
    • Licensed under LGPL. This is good.
    • It is used by a number of projects.
    • It hasn't been updated since 2005.
    • It only comes with an English dictionary.
    • For English, the dictionary from the Wordlist project can be used.
    • JazzyDicts provides a converter from the MySpell dictionary format.
    • A large number of dictionaries can be downloaded from the OpenOffice Wiki, and they can be converted to the "word list" format used by Jazzy with JazzyDicts.
  • JOrtho
    • Licensed under GPL. This is a show-stopper.
    • JOrtho is a more active project. As of February 2009, it was last updated just one month earlier.
    • Dictionaries are available out of the box for Arabic, German, Spanish, French, Italian, English, Russian, and Polish.
    • Dictionaries are based on the Wiktionary project.

Future Improvements

  • A form author might not want all the fields on the page to be spell checked. An address field is a good example, as addresses often use names which are not found in a dictionary. We would add a way for form authors to exclude some controls, so they are not checked. This could be done:
    • By passing a list of control IDs to the XBL component.
    • By adding a class on the controls themselves (the XBL component will then check for the existence of that class).
    • By adding a class on the controls themselves (the XBL component will then check for the existence of that class).
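
Both exclusion mechanisms can be combined in a single filtering step before the request is built. The sketch below is only illustrative: the function name controls_to_check, the dictionary representation of a control, and the class name no-spell-check are all hypothetical.

```python
def controls_to_check(controls, excluded_ids=(), excluded_class="no-spell-check"):
    """Return the controls that should be sent to the spell checker.

    Each control is a dict with an "id" key and an optional "classes" key
    holding a space-separated list of CSS classes.
    """
    excluded = set(excluded_ids)
    return [
        control
        for control in controls
        if control["id"] not in excluded
        and excluded_class not in control.get("classes", "").split()
    ]
```

A control is skipped if its ID is in the list passed to the component or if it carries the marker class, so form authors can use whichever mechanism is more convenient.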