
Pipelines - Core XPL Engine Improvements Proposals

Note: This page describes an Orbeon Forms project, not a feature which is currently part of Orbeon Forms.

Pipeline keys, validities, reads, and caching

Inefficient unread input scenario

In Form Builder, we have identified an issue:
  • oxf:xslt has a "request" input connected to the output of an oxf:request processor
  • the stylesheet may not actually read the "request" input
  • in that case, oxf:request returns a null key, because it doesn't read its config input upon getKey()
  • therefore, the XSLT result is not cacheable
Current code uses oxf:null-serializer to work around the issue:

<!-- Get request information -->
<!-- Noscript parameter -->
<p:processor name="oxf:request">
    <p:input name="config">
        <config>
            <include>/request/parameters/parameter[starts-with(name, 'fr-noscript')]</include>
        </config>
    </p:input>
    <p:output name="data" id="request"/>
</p:processor>

<!--
    DO NOT REMOVE THIS UNLESS YOU REALLY KNOW WHAT YOU ARE DOING! This is in place to make sure we read the
    #request output above. components.xsl below may not read it at times, which causes oxf:request to never cache
    its output, leading to oxf:xforms-to-xhtml's input to not be cacheable. Tricky.
-->
<p:processor name="oxf:null-serializer">
    <p:input name="data" href="#request"/>
</p:processor>

<!-- Apply UI components -->
<p:processor name="oxf:unsafe-xslt">
    <p:input name="data" href="#themed-data"/>
    <p:input name="instance" href="#instance"/>
    <p:input name="request" href="#request"/>
    <p:input name="config" href="components/components.xsl"/>
    <p:output name="data" id="after-components"/>
</p:processor>
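To make the failure mode concrete, here is a simplified model in Java. It assumes, as the scenario above describes, that the XSLT output key combines the keys of all its inputs, and that oxf:request only knows its key once its config has been read. Output, RequestOutput, ConstantOutput, and XsltOutput are hypothetical stand-ins rather than Orbeon Forms classes, and a null key means "not cacheable".

```java
import java.util.List;

class UnreadInputDemo {
    public static void main(String[] args) {
        RequestOutput request = new RequestOutput("fr-noscript");
        XsltOutput xslt = new XsltOutput(List.of(new ConstantOutput("themed-data"), request));

        System.out.println(xslt.getKey()); // null: nothing has read #request yet

        request.read(); // roughly what the oxf:null-serializer workaround achieves
        System.out.println(xslt.getKey()); // xslt|const:themed-data|request:fr-noscript
    }
}

// Hypothetical simplified output: a null key means "not cacheable".
interface Output {
    String getKey();
    String read();
}

// Models oxf:request: its key depends on its config, which it only reads in read().
final class RequestOutput implements Output {
    private final String config;
    private String key; // stays null until read() has run

    RequestOutput(String config) { this.config = config; }

    @Override public String getKey() { return key; }

    @Override public String read() {
        key = "request:" + config;
        return "<request/>";
    }
}

// Models an input whose key is always known, such as a static document.
final class ConstantOutput implements Output {
    private final String value;

    ConstantOutput(String value) { this.value = value; }

    @Override public String getKey() { return "const:" + value; }
    @Override public String read() { return value; }
}

// Models the XSLT output: its key combines the keys of all its inputs.
final class XsltOutput implements Output {
    private final List<Output> inputs;

    XsltOutput(List<Output> inputs) { this.inputs = inputs; }

    @Override public String getKey() {
        StringBuilder key = new StringBuilder("xslt");
        for (Output input : inputs) {
            String inputKey = input.getKey();
            if (inputKey == null) return null; // one missing input key makes the whole result uncacheable
            key.append('|').append(inputKey);
        }
        return key.toString();
    }

    // This stylesheet never reads its "request" input, so nothing here sets the request key.
    @Override public String read() { return "<result/>"; }
}
```

In this model, any read of #request, such as the one the oxf:null-serializer performs, makes the request key available; the proposal below aims to have processors do this themselves inside getKey().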

The question is whether we can fix this, for example:
  • is it reasonable to read and cache oxf:request's config input upon getKey()?
    • oxf:url-generator does not seem to do this either

See next section for more details.

Determining what causes an input not to be cacheable scenario

[2010-07-06]

The XForms engine performs much better if the XForms document is cacheable, because this allows it to reuse data structures, including the results of static analysis.

When the XForms document is not cacheable, it is often hard to figure out why. While analyzing this, we found that the current split between key, validity, and read is problematic for the same reasons shown above:
  • some inputs are not read and therefore do not return a key
  • because some configuration inputs are not read, there is often no useful information about why the key is null
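One way to make null keys less mysterious is to return, together with the key, the name of the first input that had no key. The sketch below assumes a processor's key is built from its inputs' keys, in input order; KeyResult and combine() are hypothetical names, not part of the Orbeon Forms API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NullKeyDiagnosisDemo {
    public static void main(String[] args) {
        Map<String, String> inputKeys = new LinkedHashMap<>(); // preserves input order, allows null values
        inputKeys.put("data", "k1");
        inputKeys.put("request", null); // e.g. an oxf:request output that was never read
        inputKeys.put("config", "k3");

        KeyResult result = KeyResult.combine(inputKeys);
        System.out.println(result.key != null ? result.key : "not cacheable: " + result.reason);
        // not cacheable: input "request" returned a null key
    }
}

// Hypothetical: a key plus, when the key is null, an explanation of why.
final class KeyResult {
    final String key;
    final String reason;

    private KeyResult(String key, String reason) {
        this.key = key;
        this.reason = reason;
    }

    static KeyResult combine(Map<String, String> inputKeys) {
        StringBuilder key = new StringBuilder();
        for (Map.Entry<String, String> entry : inputKeys.entrySet()) {
            if (entry.getValue() == null) {
                // Record which input is responsible instead of returning a bare null.
                return new KeyResult(null, "input \"" + entry.getKey() + "\" returned a null key");
            }
            if (key.length() > 0) key.append('|');
            key.append(entry.getKey()).append('=').append(entry.getValue());
        }
        return new KeyResult(key.toString(), null);
    }
}
```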

Proposal for a solution

Facts:
  • processors often read a "configuration" input and, if possible, store in cache an object associated with that input
    • oxf:url-generator: stores a Config object containing URL and other configuration settings
    • oxf:xslt: stores compiled stylesheet and URL dependencies
    • oxf:xinclude: stores URL dependencies
    • etc.
  • this typically calls ProcessorImpl.readCacheInputAsSAX()
  • if a processor output's getKey() returns null, it is almost certain that read() will then be called
The main idea is that, upon getKey(), a processor that needs its configuration input to produce a key might as well call read() on its configuration input if it is not in cache. This way, it has a chance of producing a key right away instead of the caller later having to call getKey() again.
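A processor following this idea could look like the sketch below: getKey() obtains the configuration (from cache if possible, otherwise by reading it), and keeps a non-cacheable configuration in per-execution state so that read() does not read the input a second time. Cache, ConfigInput, and ConfigurableProcessor are hypothetical simplifications; the real engine's cache and input APIs differ.

```java
import java.util.HashMap;
import java.util.Map;

class EagerConfigDemo {
    public static void main(String[] args) {
        Cache cache = new Cache();
        ConfigInput config = new ConfigInput(null, "<config>fr-noscript</config>"); // null key: not cacheable

        ConfigurableProcessor processor = new ConfigurableProcessor(config, cache);
        System.out.println(processor.getKey()); // a key is available before read()
        System.out.println(processor.read());
        System.out.println("config reads: " + config.reads); // config reads: 1
    }
}

// Hypothetical stand-in for the engine's object cache.
final class Cache {
    private final Map<String, String> entries = new HashMap<>();
    String find(String key) { return entries.get(key); }
    void add(String key, String value) { entries.put(key, value); }
}

// Hypothetical configuration input; a null key means its content cannot be cached.
final class ConfigInput {
    private final String key;
    private final String content;
    int reads = 0;

    ConfigInput(String key, String content) { this.key = key; this.content = content; }
    String getKey() { return key; }
    String read() { reads++; return content; }
}

final class ConfigurableProcessor {
    private final ConfigInput configInput;
    private final Cache cache;
    private String transientConfig; // config read during getKey() that could not be cached

    ConfigurableProcessor(ConfigInput configInput, Cache cache) {
        this.configInput = configInput;
        this.cache = cache;
    }

    // Reads the config if needed so that a key can be produced right away.
    String getKey() {
        return "processor:" + config();
    }

    // Finds the config in cache or in transient state, so the input is not read twice.
    String read() {
        return "output built with " + config();
    }

    private String config() {
        if (transientConfig != null) return transientConfig;
        String configKey = configInput.getKey();
        if (configKey != null) {
            String cached = cache.find(configKey);
            if (cached != null) return cached;
        }
        String config = configInput.read();
        if (configKey != null) cache.add(configKey, config); // shared across executions
        else transientConfig = config;                       // kept for this execution only
        return config;
    }
}
```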

Benefits:
  • this solves the scenarios above
    • inefficient unread input scenario
    • determining what causes an input not to be cacheable scenario
  • this avoids calling getKey() twice (before AND after read())
  • could this also increase the chance of finding data in cache?
Implementation steps:
  • all processors that read configuration inputs to determine their output key must
    • read that configuration input upon getKey() if not already available from cache
    • use the configuration information to return a key
    • if, after reading, the configuration input cannot be stored in cache, store it in transient state so that read() will find it (otherwise two reads will happen, which is inefficient and not allowed)
  • ProcessorImpl.readCacheInputAsSAX()
    • like now
      • call getKey() before reading
      • if found in cache, return object
      • if not found in cache
        • read()
        • try to store in cache if key is not null
    • main difference
      • do not try to call getKey() again a second time after reading
  • tracing
    • update it to record key and read information clearly
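The proposed readCacheInputAsSAX() flow can be sketched as follows, assuming inputs whose getKey() already performs any reading it needs, as in the processor-side steps above. InputCache, CacheableInput, CountingInput, and readCacheInput() are hypothetical names; this is not the actual ProcessorImpl code, and the build function stands in for producing an object such as a compiled stylesheet.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ReadCacheInputDemo {
    public static void main(String[] args) {
        InputCache cache = new InputCache();
        CountingInput input = new CountingInput("stylesheet:components.xsl", "<xsl:stylesheet/>");

        cache.readCacheInput(input, String::length);
        cache.readCacheInput(input, String::length);
        System.out.println(input.keyCalls + " key calls, " + input.reads + " read"); // 2 key calls, 1 read
    }
}

// Hypothetical minimal input: getKey() may return null.
interface CacheableInput {
    String getKey();
    String read();
}

final class InputCache {
    private final Map<String, Object> entries = new HashMap<>();

    <T> T readCacheInput(CacheableInput input, Function<String, T> build) {
        String key = input.getKey(); // called once, before reading
        if (key != null && entries.containsKey(key)) {
            @SuppressWarnings("unchecked")
            T cached = (T) entries.get(key);
            return cached; // found in cache: no read at all
        }
        T result = build.apply(input.read()); // not found: read and build the object
        if (key != null) entries.put(key, result); // store only when the key was known up front
        // Unlike the current code, getKey() is not called a second time after read().
        return result;
    }
}

// Counts calls so the single-getKey() behavior can be observed.
final class CountingInput implements CacheableInput {
    private final String key;
    private final String content;
    int keyCalls = 0;
    int reads = 0;

    CountingInput(String key, String content) { this.key = key; this.content = content; }

    @Override public String getKey() { keyCalls++; return key; }
    @Override public String read() { reads++; return content; }
}
```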

Related proposal: conditional read

Once the improvement to getKey() above is implemented, it is possible to implement a conditional read:
  • if key is not null, ask cache for any object with that key
  • if found, call long readIfNeeded(long validity, XMLReceiver xmlReceiver)
    • this passes a resolved validity timestamp and a receiver
    • if any validity along the chain is newer than the validity passed, output to xmlReceiver is produced
    • the method returns the new validity
    • the cache is updated
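The conditional read can be sketched with plain long timestamps as validities. readIfNeeded() below follows the proposed signature, except that a Consumer&lt;String&gt; stands in for XMLReceiver; Source, ConditionalProducer, and ConditionalCache are hypothetical names. As a simplifying assumption, a missing cache entry is treated as validity Long.MIN_VALUE, so the same call also performs the initial read.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class ConditionalReadDemo {
    public static void main(String[] args) {
        Source a = new Source(100, "a");
        Source b = new Source(200, "b");
        ConditionalCache cache = new ConditionalCache();
        ConditionalProducer producer = new ConditionalProducer(List.of(a, b));

        System.out.println(cache.get("key", producer)); // ab (initial read)
        System.out.println(cache.get("key", producer)); // ab (from cache, no output produced)
        b.lastModified = 300;
        b.content = "B";
        System.out.println(cache.get("key", producer)); // aB (b is newer than the cached validity)
    }
}

// Hypothetical data source with a last-modified timestamp.
final class Source {
    long lastModified;
    String content;
    int reads = 0;

    Source(long lastModified, String content) { this.lastModified = lastModified; this.content = content; }
    String read() { reads++; return content; }
}

final class ConditionalProducer {
    private final List<Source> chain;

    ConditionalProducer(List<Source> chain) { this.chain = chain; }

    // Produces output only if some validity along the chain is newer than the one passed in,
    // and returns the chain's current validity.
    long readIfNeeded(long validity, Consumer<String> receiver) {
        long newest = Long.MIN_VALUE;
        for (Source source : chain) newest = Math.max(newest, source.lastModified);
        if (newest > validity) {
            for (Source source : chain) receiver.accept(source.read());
        }
        return newest;
    }
}

final class ConditionalCache {
    private static final class Entry {
        final long validity;
        final String value;
        Entry(long validity, String value) { this.validity = validity; this.value = value; }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    String get(String key, ConditionalProducer producer) {
        Entry entry = entries.get(key);
        long cachedValidity = entry == null ? Long.MIN_VALUE : entry.validity;
        StringBuilder output = new StringBuilder();
        long newValidity = producer.readIfNeeded(cachedValidity, output::append);
        if (entry == null || newValidity > cachedValidity) {
            entries.put(key, new Entry(newValidity, output.toString())); // update the cache
            return output.toString();
        }
        return entry.value; // nothing newer: reuse the cached object
    }
}
```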


Implemented projects

Support for comments in pipelines

NOTE: This feature has been implemented as of 2010-06-28. See: Handling XML Comments.

Current situation:
Next steps:
  • Refactor to also support the SAX LexicalHandler
  • Possibly: create an interface extending both ContentHandler and LexicalHandler
  • Start with candidate methods in ProcessorImpl:
    • ProcessorOutput.read()
    • ProcessorImpl.ProcessorOutputImpl.readImpl()
    • ProcessorImpl.readInputAsSAX()
  • Some processors might have to be modified
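A minimal sketch of such an interface, using only the JDK's SAX API: a hypothetical CommentAwareReceiver extends both ContentHandler and LexicalHandler, so one object can receive element events and comments. The parser reports comments only to a LexicalHandler registered through the standard http://xml.org/sax/properties/lexical-handler property. CommentAwareReceiver and CommentCollector are illustrative names, not Orbeon Forms types.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;

class CommentReceiverDemo {
    public static void main(String[] args) throws Exception {
        CommentCollector collector = collect("<config><!-- keep me --><a/></config>");
        System.out.println(collector.elements); // [config, a]
        System.out.println(collector.comments); // [ keep me ]
    }

    static CommentCollector collect(String xml) throws Exception {
        CommentCollector collector = new CommentCollector();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(collector);
        // Comments are only reported to a LexicalHandler registered through this property.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", collector);
        reader.parse(new InputSource(new StringReader(xml)));
        return collector;
    }
}

// Hypothetical: one receiver type for both element events and lexical events such as comments.
interface CommentAwareReceiver extends ContentHandler, LexicalHandler {}

// DefaultHandler2 provides no-op implementations of both ContentHandler and LexicalHandler.
final class CommentCollector extends DefaultHandler2 implements CommentAwareReceiver {
    final List<String> elements = new ArrayList<>();
    final List<String> comments = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        elements.add(qName); // qName is reported even by a non-namespace-aware parser
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length));
    }
}
```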
