Introduction
Converters are processors that convert XML documents from one format to another. For
example, the standard HTML converter documented below converts an XML document into
an HTML document. This HTML document can then be sent to a web browser using the HTTP serializer, or attached to an
email with the Email processor.
Converters typically have a data output containing the converted
document.
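As a sketch of the chain described above, the following pipeline converts its input to HTML and passes the converted document on to the HTTP serializer. The processor name oxf:http-serializer, the empty configurations, and the id values are assumptions made for illustration; adapt them to your setup.

```xml
<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:param type="input" name="data"/>
    <!-- Convert the XML infoset into HTML text (all configuration defaults) -->
    <p:processor name="oxf:html-converter">
        <p:input name="config">
            <config/>
        </p:input>
        <p:input name="data" href="#data"/>
        <p:output name="data" id="converted"/>
    </p:processor>
    <!-- Assumed processor name: send the converted document to the browser -->
    <p:processor name="oxf:http-serializer">
        <p:input name="config">
            <config/>
        </p:input>
        <p:input name="data" href="#converted"/>
    </p:processor>
</p:config>
```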
Standard converters
The standard converters convert XML infosets (the XML documents that circulate in
Orbeon Forms pipelines) into text according to standard output methods
defined by the XSLT specification. They convert to the following formats:
- XML: a standard XML document
- HTML: a standard HTML document
- XHTML: a standard XHTML document
- Text: any text document
The resulting text is sent to the data output. It is embedded in an XML
document as specified by the text
document format.
Configuration
The configuration of the standard converters consists of the following optional
elements:
| Element | Purpose | Default |
| --- | --- | --- |
| method | XSLT output method (one of xml, html, xhtml or text) | xml, html or text, depending on the serializer; or [SINCE: 2011-02-16] custom serializer as specified in a configuration property |
| content-type | Content type hint specified on the output document element | Specific to each serializer |
| encoding | Encoding hint specified on the output document element | utf-8 |
| version | HTML or XML version number | 4.01 for HTML (ignored for XML, which always outputs 1.0) |
| public-doctype | The public doctype | "-//W3C//DTD HTML 4.01 Transitional//EN" for HTML, none otherwise |
| system-doctype | The system doctype | "http://www.w3.org/TR/html4/loose.dtd" for HTML, none otherwise |
| omit-xml-declaration | Specifies whether the XML declaration must be omitted | false for XML and HTML (i.e. a declaration is output by default), ignored otherwise |
| standalone | If true, specifies standalone="yes" in the document declaration. If false, specifies standalone="no" in the document declaration. If missing, no standalone attribute is produced. For more information about standalone document declarations, please refer to the relevant section of the XML specification. In most cases, this does not need to be specified. | not specified for XML, ignored otherwise |
| indent | Specifies whether the output is indented. This means that line breaks may be inserted between adjacent elements. The actual level of indentation is specified with the indent-amount configuration element. | true (ignored for text method) |
| indent-amount | Specifies the number of indentation spaces | 1 (ignored for text method) |
Example:
<config>
<content-type>text/html</content-type>
<encoding>utf-8</encoding>
<version>4.01</version>
<public-doctype>-//W3C//DTD HTML 4.01//EN</public-doctype>
<system-doctype>http://www.w3.org/TR/html4/strict.dtd</system-doctype>
<indent-amount>4</indent-amount>
</config>
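For comparison, here is a minimal sketch of a configuration aimed at XML output. It only uses elements described in the configuration reference above; the particular combination of values is illustrative, and the elements are listed in the same order as in that reference in case the order matters.

```xml
<config>
    <!-- Explicitly select the XSLT xml output method -->
    <method>xml</method>
    <encoding>utf-8</encoding>
    <!-- Do not output the XML declaration -->
    <omit-xml-declaration>true</omit-xml-declaration>
    <!-- Do not insert line breaks between adjacent elements -->
    <indent>false</indent>
</config>
```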
XML converter
The XML converter outputs an XML document that conforms to the XSLT xml
output method. By default, the output is indented with no spaces and encoded using
the UTF-8 character set. The default MIME content type is
application/xml. The following is a simple XML converter example:
<p:processor name="oxf:xml-converter">
<p:input name="config">
<config>
<content-type>application/xml</content-type>
<encoding>iso-8859-1</encoding>
<version>1.0</version>
</config>
</p:input>
<p:input name="data" href="oxf:/my-xml-document.xml"/>
<p:output name="data" id="xml-document"/>
</p:processor>
This is an example of output produced by the XML converter:
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string" content-type="application/xml; charset=iso-8859-1">
<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
<claim xmlns="http://orbeon.org/oxf/examples/bizdoc/claim">
  <insured-info>
    <general-info>
      <name-info>
        <title-prefix>Dr.</title-prefix>
        <last-name>Doe</last-name>
        <first-name>John</first-name>
        <title-suffix/>
      </name-info>
      <address>
        <address-detail>
          <street-name>N Columbus Dr.</street-name>
          <street-number>511</street-number>
          <unit-number/>
        </address-detail>
        <city>Chicago</city>
        <state-province>IL</state-province>
        <postal-code>60611</postal-code>
        <country>USA</country>
        <email>jdoe@acme.org</email>
      </address>
    </general-info>
    <person-info>
      <gender-code>M</gender-code>
      <birth-date>1972-10-01</birth-date>
      <marital-status-code>C</marital-status-code>
      <occupation>Manager</occupation>
    </person-info>
    <family-info>
      <children>
        <child>
          <birth-date>2003-02-02</birth-date>
          <first-name>Marco</first-name>
        </child>
        <child>
          <birth-date/>
          <first-name/>
        </child>
      </children>
      <comments>No comments at this point!</comments>
    </family-info>
    <claim-info>
      <accident-type>FOOT</accident-type>
      <accident-date>2004-07-06</accident-date>
      <rate/>
    </claim-info>
  </insured-info>
</claim>
</document>
HTML converter
The HTML converter outputs an HTML document that conforms to the XSLT
html output method. By default, the doctype is set to HTML
4.01 Transitional and the content is indented with no spaces and encoded
using the UTF-8 character set. The default content type is
text/html. The following is a simple HTML converter example:
<p:processor xmlns:p="http://www.orbeon.com/oxf/pipeline" name="oxf:html-converter">
<p:input name="config">
<config>
<content-type>text/html</content-type>
<encoding>iso-8859-1</encoding>
<public-doctype>-//W3C//DTD HTML 4.01 Transitional//EN</public-doctype>
<version>4.01</version>
</config>
</p:input>
<p:input name="data">
<html>
<head>
<title>My HTML document</title>
</head>
<body>
<p>
This is the content of the HTML document.
</p>
</body>
</html>
</p:input>
<p:output name="data" id="html-document"/>
</p:processor>
This is an example of output produced by the HTML converter:
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string" content-type="text/html; charset=iso-8859-1">
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>My HTML document</title>
</head>
<body>
<p>
This is the content of the HTML document.
</p>
</body>
</html>
</document>
[SINCE 2012-06-19]
HTML version 5 is now supported and you can output an HTML 5 doctype simply with the following configuration:
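A minimal sketch of such a configuration, assuming that setting the version element to 5.0 is what selects the HTML 5 doctype (the exact value is an assumption and is not confirmed by the text above):

```xml
<p:processor xmlns:p="http://www.orbeon.com/oxf/pipeline" name="oxf:html-converter">
    <p:input name="config">
        <config>
            <!-- Assumed value: request HTML 5 output -->
            <version>5.0</version>
        </config>
    </p:input>
    <p:input name="data" href="#my-html-document"/>
    <p:output name="data" id="html5-document"/>
</p:processor>
```

The ids my-html-document and html5-document are placeholders.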
Text converter
The Text converter outputs a text document that conforms to the XSLT text
output method. By default, the output is encoded using the UTF-8 character set. This
serializer is typically useful for pipelines generating Comma Separated Value
(CSV) files. The default content type is text/plain. The following
is a simple Text converter example:
<p:processor xmlns:p="http://www.orbeon.com/oxf/pipeline" name="oxf:text-converter">
<p:input name="config">
<config/>
</p:input>
<p:input name="data">
<document>
This is just plain text. It will be output without the <em>text</em> and <em>em</em> elements.
</document>
</p:input>
<p:output name="data" id="text-document"/>
</p:processor>
This is an example of output produced by the Text converter:
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string" content-type="text/plain; charset=utf-8">
This is just plain text. It will be output without the text and em
elements.
</document>
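For the CSV use case mentioned above, a configuration might look like the following sketch. The text/csv content type, the sample rows, and the csv-document id are illustrative assumptions; in a real pipeline the data input would usually come from an earlier processor that has already produced the comma-separated text.

```xml
<p:processor xmlns:p="http://www.orbeon.com/oxf/pipeline" name="oxf:text-converter">
    <p:input name="config">
        <config>
            <!-- Override the default text/plain content type hint -->
            <content-type>text/csv</content-type>
            <encoding>utf-8</encoding>
        </config>
    </p:input>
    <p:input name="data">
        <!-- Only the text content is output; the document element itself is dropped -->
        <document>last-name,first-name,city
Doe,John,Chicago</document>
    </p:input>
    <p:output name="data" id="csv-document"/>
</p:processor>
```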
Custom serializer classes
[SINCE: 2011-02-16]
Setting custom serializers
The <method> element of oxf:xml-converter, oxf:html-converter, and oxf:text-converter supports setting a qualified name pointing to a custom Java class to perform the serialization:
<method>p:org.orbeon.saxon.event.XML1252Emitter</method>
NOTE: The prefix can map to any namespace but must be in scope for the <method> element.
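The following sketch shows one way to keep the prefix in scope: it is declared directly on the config element. The ser prefix, its namespace URI, and the ids are hypothetical; according to the note above, any namespace can be used.

```xml
<p:processor xmlns:p="http://www.orbeon.com/oxf/pipeline" name="oxf:xml-converter">
    <p:input name="config">
        <!-- Hypothetical prefix and namespace, declared so that it is in scope for <method> -->
        <config xmlns:ser="http://example.org/serializers">
            <method>ser:org.orbeon.saxon.event.XML1252Emitter</method>
        </config>
    </p:input>
    <p:input name="data" href="#my-document"/>
    <p:output name="data" id="converted"/>
</p:processor>
```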
The XML converter also supports the default-method configuration property, which lets you choose the serializer class to use when no <method> element is specified:
<property as="xs:QName" processor-name="oxf:xml-converter" name="default-method"
value="oxf:org.orbeon.saxon.event.XML1252Emitter"/>
NOTE: The prefix can map to any namespace but must be in scope for the <property> element.
Built-in custom serializers
By default, two such serializers are available:
- org.orbeon.saxon.event.XML1252Emitter
- org.orbeon.saxon.event.HTML1252Emitter
These serializers are the same as the standard XML and HTML serializers, except that they convert any character in the range 127-159 to non-control, graphic Unicode characters, as if these were in the Windows CP-1252 character set.
In cases where data was incorrectly converted from Windows CP-1252 to Unicode, these serializers provide a quick way of fixing up data sent to the browser.
For example, to configure the XForms engine to fix up characters for both the initial page display and Ajax requests, set these properties in properties-local.xml:
<property as="xs:QName" processor-name="oxf:xml-converter" name="default-method"
value="oxf:org.orbeon.saxon.event.XML1252Emitter"/>
<property as="xs:QName" processor-name="oxf:html-converter" name="default-method"
value="oxf:org.orbeon.saxon.event.HTML1252Emitter"/>
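For the oxf and xs prefixes to be in scope, they must be declared on the property elements or on an ancestor. A sketch of a complete file, assuming a properties root element and the namespace URIs shown (both are assumptions here; check them against your existing properties-local.xml):

```xml
<properties xmlns:xs="http://www.w3.org/2001/XMLSchema"
            xmlns:oxf="http://www.orbeon.com/oxf/processors">
    <!-- The oxf prefix used in both the processor-name and value attributes is declared on the root element -->
    <property as="xs:QName" processor-name="oxf:xml-converter" name="default-method"
              value="oxf:org.orbeon.saxon.event.XML1252Emitter"/>
    <property as="xs:QName" processor-name="oxf:html-converter" name="default-method"
              value="oxf:org.orbeon.saxon.event.HTML1252Emitter"/>
</properties>
```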
To-XML converter
The To-XML Converter produces a parsed XML document from a binary document format.
Configuration
The data input of the To-XML Converter follows the binary document format. Its data output
is an XML document.
The mandatory config input consists of the following optional elements:
| Element | Purpose | Default |
| --- | --- | --- |
| validate | Whether to perform validation at the time of parsing | false |
| handle-xinclude | Whether to handle XInclude at the time of parsing | false |
Example
This is an example of use:
<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline">
<p:param type="output" name="data"/>
<!-- Read an XSLT document and output it in binary format -->
<p:processor name="oxf:url-generator">
<p:input name="config">
<config>
<url>parsing-view.xsl</url>
<mode>binary</mode>
</config>
</p:input>
<p:output name="data" id="xml-file-as-binary"/>
</p:processor>
<!-- Serialize back the binary format to XML while performing XInclude -->
<p:processor name="oxf:to-xml-converter">
<p:input name="data" href="#xml-file-as-binary"/>
<p:input name="config">
<config>
<handle-xinclude>true</handle-xinclude>
</config>
</p:input>
<p:output name="data" ref="data"/>
</p:processor>
</p:config>
Plain HTML/XHTML converters
[SINCE 2012-07-31]
These converters take an XHTML input document and perform the following:
- remove all elements not in the XHTML namespace
- remove all attributes in a namespace
- remove the prefix of all XHTML elements
- remove all other namespace information on elements
- for XHTML
  - add the XHTML namespace as default namespace on the root element
  - all elements in the document are in the XHTML namespace
- otherwise
  - don't output any namespace declaration
  - all elements in the document are in no namespace
These processors ensure that clean XHTML or HTML output (depending on file serialization) is produced.
Examples:
<p:processor name="oxf:plain-html-converter">
<p:input name="data" href="#rewritten-data"/>
<p:output name="data" id="xhtml-data"/>
</p:processor>
<p:processor name="oxf:plain-xhtml-converter">
<p:input name="data" href="#rewritten-data"/>
<p:output name="data" id="xhtml-data"/>
</p:processor>
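To illustrate the rules listed above, consider the following hypothetical input. The f prefix and its namespace are made up for the example, and the result shown afterwards is derived by applying the rules by hand, not captured from a real run.

```xml
<xh:html xmlns:xh="http://www.w3.org/1999/xhtml" xmlns:f="http://example.org/foreign">
    <xh:body>
        <!-- f:flag is an attribute in a namespace; class is not -->
        <xh:p f:flag="x" class="intro">Hello</xh:p>
        <!-- f:widget is an element outside the XHTML namespace -->
        <f:widget/>
    </xh:body>
</xh:html>
```

Under those rules, oxf:plain-xhtml-converter would be expected to produce:

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
    <body>
        <p class="intro">Hello</p>
    </body>
</html>
```

With oxf:plain-html-converter, the expected result is the same tree without the xmlns declaration, so that every element is in no namespace.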
XSL-FO converter
The XSL-FO Converter produces PDF documents from an XSL-FO description of the page. The default
content type is application/pdf .
The resulting binary stream is sent to the data output. It is embedded
in an XML document as specified by the binary document format.
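A minimal sketch of how the converter might be called from a pipeline. The processor name oxf:xslfo-converter and the empty config input are assumptions, since neither is stated above; the inline XSL-FO document is a bare-bones single-page example.

```xml
<!-- Assumed processor name and config input -->
<p:processor xmlns:p="http://www.orbeon.com/oxf/pipeline" name="oxf:xslfo-converter">
    <p:input name="config">
        <config/>
    </p:input>
    <p:input name="data">
        <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
            <!-- One A4 page layout with a body region -->
            <fo:layout-master-set>
                <fo:simple-page-master master-name="page" page-height="29.7cm" page-width="21cm" margin="2cm">
                    <fo:region-body/>
                </fo:simple-page-master>
            </fo:layout-master-set>
            <fo:page-sequence master-reference="page">
                <fo:flow flow-name="xsl-region-body">
                    <fo:block>Hello, PDF!</fo:block>
                </fo:flow>
            </fo:page-sequence>
        </fo:root>
    </p:input>
    <!-- Binary document format containing the PDF -->
    <p:output name="data" id="pdf-document"/>
</p:processor>
```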