Implementation Notes - XForms Engine Static XPath Dependency Analysis


STATUS: as of 2011-02, an implementation of XPath dependencies in the UI and in the model is in place in the XForms engine. You can enable it with a property.

How the implementation works

Saxon PathMap

The implementation leverages the Saxon PathMap implementation:
  • for a given XPath expression, a series of paths is produced
  • the representation is optimized as a tree in which shared prefixes are merged, e.g. /a/(b | c) instead of separate /a/b and /a/c paths
  • most Saxon built-in functions support PathMap, but not all
Orbeon uses a slightly modified version of PathMap as the built-in Saxon one did not support some functionality.
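
As a rough illustration of the tree idea only, here is a minimal sketch in Python. PathNode and its methods are hypothetical names and do not reflect the Saxon PathMap API; the sketch only shows how /a/b and /a/c can share the /a prefix.

```python
# Hypothetical sketch: NOT the Saxon PathMap API, only the shared-prefix tree idea.
class PathNode:
    def __init__(self):
        self.children = {}  # step name -> PathNode

    def add(self, steps):
        """Add a path given as a list of step names, reusing existing prefixes."""
        node = self
        for step in steps:
            node = node.children.setdefault(step, PathNode())
        return node

    def paths(self, prefix=()):
        """Enumerate all root-to-leaf paths as tuples of step names."""
        if not self.children:
            return [prefix] if prefix else []
        result = []
        for name, child in sorted(self.children.items()):
            result.extend(child.paths(prefix + (name,)))
        return result


root = PathNode()
root.add(["a", "b"])
root.add(["a", "c"])
# The tree holds a single "a" node with two children, i.e. /a/(b | c).
```

Enumerating root.paths() yields the two flat paths again, so nothing is lost by storing the tree form.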

XPathAnalysis

A core class is XPathAnalysis, which is passed an expression (String or Expression), and extracts:
  • whether dependencies could be computed (not every expression can be handled)
  • set of dependent paths
  • set of returnable paths
  • set of dependent models
  • set of dependent instances
  • set of returnable instances
For example: instance('people')/person[../age ge 21] returns:
  • yes
  • dependent path: instance('people')/age
  • returnable path: instance('people')/person
  • dependent model: whichever model instance 'people' is in
  • dependent instance: "people"
  • returnable instance: "people"
Analyzed expressions are always rooted at an instance. Paths are represented in a canonical way:
  • the instance id is a prefixed id (for XBL support)
  • path elements are Saxon NamePool numbers, so that QName comparison is trivial
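
The shape of such a result could look like the following Python sketch. This is not the actual XPathAnalysis class: the Analysis and NamePool names, the field names, and the model id "my-model" are assumptions made for illustration. NamePool here is a tiny stand-in that interns names as integers, which is what makes name comparison cheap.

```python
from dataclasses import dataclass


class NamePool:
    """Hypothetical stand-in for a name pool: interns each QName as an integer."""

    def __init__(self):
        self._codes = {}

    def code(self, qname):
        # The same name always maps to the same integer.
        return self._codes.setdefault(qname, len(self._codes))


@dataclass(frozen=True)
class Analysis:
    """Hypothetical result record; field names are assumptions."""
    figured_out: bool
    dependent_paths: frozenset = frozenset()
    returnable_paths: frozenset = frozenset()
    dependent_models: frozenset = frozenset()
    dependent_instances: frozenset = frozenset()
    returnable_instances: frozenset = frozenset()


pool = NamePool()
# Result for instance('people')/person[../age ge 21]; a path is
# (prefixed instance id, name code, name code, ...).
analysis = Analysis(
    figured_out=True,
    dependent_paths=frozenset({("people", pool.code("age"))}),
    returnable_paths=frozenset({("people", pool.code("person"))}),
    dependent_models=frozenset({"my-model"}),  # assumed model id
    dependent_instances=frozenset({"people"}),
    returnable_instances=frozenset({"people"}),
)
```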

Bindings and values

A control can have a binding and/or a value.

Consider this instance:

<people>
  <person>
    <name>Mary</name>
    <age>20</age>
  </person>
  <person>
    <name>Bob</name>
    <age>27</age>
  </person>
</people>

Consider a top-level group:

<xf:group id="my-group" ref="person[../age ge 21]">

The binding is a path relative to something: the in-scope XPath evaluation context. Say this is instance('people').

So for each binding / value, the analysis for the context and for the in-scope variables is made available.

Here, first, an XPathAnalysis is computed for instance('people').

Then, the expression person[../age ge 21] is analyzed:
  • the XPathAnalysis for instance('people') is cloned and passed
  • the expression for person[../age ge 21] is added to the PathMap for instance('people')
  • the result is a new PathMap object equivalent to having analysed instance('people')/person[../age ge 21] directly
  • then all the paths and dependencies are inferred from the PathMap
  • the final result is a new XPathAnalysis object
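
The composition step can be sketched as follows, assuming paths are plain tuples of steps. The append_steps function is a hypothetical helper, not part of the engine; it copies the context path before extending it, which mirrors the cloning described above, and it resolves a ".." step by dropping the last step.

```python
def append_steps(context_path, steps):
    """Return a new path: a copy of context_path extended with steps.

    A ".." step removes the last step. The context path is never modified.
    """
    path = list(context_path)  # copy, so the context's analysis stays untouched
    for step in steps:
        if step == "..":
            if len(path) <= 1:
                raise ValueError("cannot navigate above the instance root")
            path.pop()
        else:
            path.append(step)
    return tuple(path)


context = ("instance('people')",)
# The binding returns "person" nodes relative to the context.
returnable = append_steps(context, ["person"])
# The predicate "../age" is evaluated relative to each "person".
dependent = append_steps(returnable, ["..", "age"])
```

With this, returnable is instance('people')/person and dependent is instance('people')/age, which matches the result of analyzing the full expression directly.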
Now what would cause the binding for my-group to require an update? Clearly, either:
  • a structural change to the instance
  • if no structural change has taken place, a change to the value of instance('people')/age
Currently, any structural change in a model invalidates all bindings and values touching that model. (We can do much better in the future.)

The value change is handled easily. If, between two refreshes, any change is made to the value of the node instance('people')/age, the binding needs to be reevaluated. Otherwise, no evaluation is needed.

Note that even if the value of node instance('people')/person changes, the binding need not change.

The algorithm to determine if a binding must change is:
  • for each value change, keep track of the changed nodes
  • at node change time, or at refresh time, compute a canonical path
  • at refresh time, for each binding
    • if there was a structural change, reevaluate it
    • if the binding was not analyzed, reevaluate it
    • if any value change path intersects the binding expression's dependent paths, the binding must be updated
    • otherwise, the binding does not need reevaluation
For a value, the algorithm is the same except that returnable paths are also checked.
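
The decision made at refresh time can be sketched like this. The function name, the dict-based analysis, and the use of strings for canonical paths are assumptions; "intersects" is taken here to mean that a changed canonical path is equal to one of the analyzed paths, which may be narrower than what the engine does.

```python
def must_reevaluate(structural_change, analysis, changed_paths, is_value=False):
    """Decide whether a binding (or a value, if is_value) must be reevaluated.

    analysis is None when the expression could not be analyzed; otherwise it is
    a dict with "dependent" and "returnable" sets of canonical paths.
    """
    if structural_change:
        return True  # any structural change invalidates the binding
    if analysis is None:
        return True  # not analyzed: be conservative
    paths = set(analysis["dependent"])
    if is_value:
        paths |= set(analysis["returnable"])  # values also check returnable paths
    return any(changed in paths for changed in changed_paths)


analysis = {
    "dependent": {"instance('people')/age"},
    "returnable": {"instance('people')/person"},
}
age_changed = {"instance('people')/age"}
person_changed = {"instance('people')/person"}
```

Here must_reevaluate(False, analysis, age_changed) is true, while must_reevaluate(False, analysis, person_changed) is false for a binding and true when is_value=True.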

Model binds and MIPs

Model binds and MIPs are supported as well. The purpose is:
  • determine if an instance is touched by binds, calculate/computed MIPs, and validation MIPs [DONE]
  • not re-evaluate a MIP if not needed [DONE]
  • automatically skip model recalculate if no calculate/computed MIPs [DONE]
  • automatically skip model revalidate if no validate MIPs [DONE]
  • automatically skip model rebuilds [TODO]
  • partial model rebuilds [TODO]
  • automatically skip model recalculate if no calculate/computed dependencies [TODO]
  • automatically skip model revalidate if no validate dependencies [TODO]
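
The two "skip if no MIPs" items can be illustrated with a small sketch. The grouping of MIP names into computed and validation sets below is an assumption made for illustration and may not match the engine's actual classification; binds are modeled as plain dicts from MIP name to XPath expression.

```python
# Assumed grouping of MIP names, for illustration only.
COMPUTED_MIPS = {"calculate", "relevant", "readonly"}
VALIDATION_MIPS = {"constraint", "type", "required"}


def can_skip(binds):
    """Given binds as dicts {MIP name: expression}, report which steps can be skipped."""
    used = {mip for bind in binds for mip in bind}
    return {
        "recalculate": not (used & COMPUTED_MIPS),
        "revalidate": not (used & VALIDATION_MIPS),
    }


# A model with only a constraint never needs recalculate, but needs revalidate.
only_constraint = can_skip([{"constraint": ". ge 0"}])
```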

Expressions supported

Not every XPath expression can be supported, especially if nothing is known about the XML document.

The following is easy to support:
  • child:: axis with named node
  • preceding-sibling:: / following-sibling:: with named node
Parent / ancestor axes can be handled in some cases:
  • e.g. instance('instance')/a/../b
  • instance('instance')/a/b/a/(c/ancestor::a | d)
In general, the following is not supported:
  • expressions containing "*" (some cases can be handled)
  • expressions containing "//"

Implementation notes

Main files

The implementation is partially done in Scala:

Static XForms document initialization

Processing order during static XForms document initialization:
  • top-level
    • top-level model DOMs are extracted and Model objects created
    • top-level controls DOM is extracted
    • top-level model DOMs from top-level controls are extracted and Model objects created
    • top-level Model objects are analyzed
      • variables
      • binds
    • top-level controls DOM is processed
      • each element is tested for an XBL binding, if so is processed (see below)
      • ControlsAnalysis is created for every control element (including those with XBL bindings)
  • elements with XBL bindings
    • shadow content is produced
    • xbl:implementation model DOMs are extracted and Model objects created
    • models within shadow content are extracted and Model objects created
    • Model objects within shadow tree controls DOM are analyzed
    • shadow tree controls DOM is processed recursively
  • all controls binding and value analysis are obtained

Next steps

  • P1:
    • structural dependencies (detect modified subtrees) [TODO]
      • e.g. binding to "person" need not be reevaluated for any structural change to nodes under "person"
    • improved RRR dependencies [TODO]
    • check possible bug:
      • "XForms Controls:$fr-resources/detail/items/page-size/item" analyzed="false" -> see failing unit test
  • P2: more support
    • bugs
      • [ #315618 ] XPath analysis: expressions of type /foo/bar are not han
    • performance:
      • P2: static LHHA value stored in static state [TODO]
    • all XPath functions: produce proper maps where possible
      • xxf:binding() [TODO]
      • xxf:binding-context() [TODO]
      • xxf:component-context() [TODO]
      • index() [TODO]
      • case() [TODO]
      • xxf:index() [TODO]
      • MORE
    • AVT value [TODO]
  • P3: even more support
    • skip control tree diffs (to produce Ajax response) if not needed [TODO]
    • can some schema information be used or even inferred from the actual instances? [TODO]
  • DONE
    • P1: correctness [DONE]
      • rationale
        • enabling dependencies must not cause existing forms to malfunction
        • all non-analyzable expressions must be marked as !ok
    • mark as !ok
      • @context on controls [DONE]
      • @bind on controls [DONE]
      • all unsupported XPath functions [DONE]
      • model variables [NOW SUPPORTED IN VIEW]
      • xxf:sequence [DONE]
      • AVT value [DONE]
    • complete LHHA analysis [DONE]
    • xxf:sequence [DONE]
    • support for @bind on controls [DONE]
    • support for @context everywhere [DONE]
    • XPath functions
      • xxf:repeat-current() [DONE]
      • xxforms:context() [DONE]
      • xxf:serialize()/saxon:serialize() [DONE]
      • context() [DONE]
    • support for itemsets dependencies [DONE]
    • model variables exposed by binds [DONE]
    • xxforms:instance() (also static analysis to determine model + prefixed ids) [DONE]


March 2010 experiment - notes

Starting point

  • form with large repeat has 2737 controls requiring update, taking about 1500 ms
  • this update takes place even if only one value of a leaf control, such as a checkbox, is changed by the user

Experiment

The idea of the experiment is to determine if, in a short amount of development time, it is possible to develop a small dependency system that works on this use case.

Simple UI example:

<xforms:group ref="instance('main')">
    <xforms:repeat nodeset="department">
        <xforms:repeat nodeset="employee">
            <xforms:input ref="."/>
        </xforms:repeat>
    </xforms:repeat>
</xforms:group>

This is how things are meant to work:
  • at static analysis time
    • XPath expression analysis performed
    • analysis is 100% static
    • analysis is based on the Saxon PathMap code
  • a marking mechanism keeps track of modified element/attribute values between refreshes
    • it is not necessary to keep every individual node, only the paths to nodes, e.g. only one marking is needed for:
      • instance()/company[2]/employee[1] and
      • instance()/company[1]/employee[7]
      • -> canonically store instance('foobar')/company/employee
  • during refresh
    • UI bindings are considered from root to leaf
    • for each UI binding expression (@ref, @nodeset)
      • static XPath analysis of the expression is retrieved
        • the result
          • is a series of paths starting at an instance root
          • is stored in XFStaticState
      • each path changed in the instance is compared to the list of binding paths
        • instance('foobar')/company/employee impacts itself
        • instance('foobar')/company/employee impacts instance('foobar')/company/employee/name
          • however, value controls cannot bind to complex content, so this case doesn't actually occur for bindings!
        • instance('foobar')/company/employee doesn't impact instance('foobar')/company
      • the binding is updated
        • if there is intersection, the binding is recomputed (XPath evaluation)
        • if there is no intersection, the binding is kept (no XPath evaluation)
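
The marking and comparison rules above can be sketched as follows, assuming paths are kept as strings. Both helpers are hypothetical: canonical only strips positional predicates such as [2], and impacts implements the rule that a changed path affects itself and anything below it, but not its ancestors.

```python
import re


def canonical(path):
    """Drop positional predicates such as [2] so that sibling nodes share one mark."""
    return re.sub(r"\[\d+\]", "", path)


def impacts(changed, binding):
    """True if the changed path equals the binding path or is an ancestor of it."""
    c, b = changed.split("/"), binding.split("/")
    return b[:len(c)] == c


# Two distinct changed nodes collapse to a single canonical mark.
marks = {
    canonical("instance('foobar')/company[2]/employee[1]"),
    canonical("instance('foobar')/company[1]/employee[7]"),
}
employee = "instance('foobar')/company/employee"
```

A binding is then recomputed only if impacts(mark, binding_path) holds for at least one mark collected since the last refresh.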
Phase 1 limitations:
  • structural changes cause a full rebuild every time
  • upward axes are not supported
  • UI cannot make use of model variables
  • @context is not handled
  • @bind is not handled
  • @model is not handled

Implementation notes

  • branch saxon9 has port of Saxon 9.1
  • for UI variables support
    • variables need to be represented as controls so they can hold a binding between refreshes
      • done: XXFormsVariableControl
    • PathMap needs a way of being notified of variables as expression roots
      • Saxon might have to be patched for this
