Introduction
This document describes the functionality to be implemented by the
XSLT-FO converter processor. The XSLT-FO converter reads an XSL-FO
document and renders the resulting pages to a specified output format.
The processor uses the Apache FOP (Formatting Objects Processor)
library as its main component.
XSL-FO is an XML vocabulary that is used to specify
pagination and other styling for page layout output. The acronym
“FO” stands for Formatting Objects. XSL-FO defines a set of XML
elements that describe the way pages are set up. The contents
of the pages are filled from flows. There can be static flows that
appear on every page (for headers and footers) and the main flow,
which fills the body of the page. See chapter two for more
information about the XSL-FO compliance of the processor.
When to use this processor:
If your input data is heavily based on XML.
The processor uses the standard XSL-FO file format as input, lays the
content out into pages, then renders it to the requested output. One
great advantage of using XSL-FO as input is that XSL-FO is itself an
XML file, which means that it can be conveniently created from a
variety of sources. The most common method is to convert semantic
XML to XSL-FO using an XSLT transformation.
If you want to support output formats such as PDF, RTF, PNG, SVG, TXT and so on.
See http://xmlgraphics.apache.org/fop/0.95/output.html for the output
formats supported by Apache FOP.
If you do NOT need to generate PDFs of 1000+ pages every time.
The processor is based on the Apache FOP library, which is known to be
comparatively slow. It is not a good fit when you need to create 1000+
pages in a very short time.
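The XSLT route mentioned above (semantic XML converted to XSL-FO) can be sketched with the XSLT 1.0 engine that ships with the JDK (`javax.xml.transform`). This is only an illustration. The semantic vocabulary (`article`, `title`, `para`), the class name `SemanticToFo` and the stylesheet are hypothetical. The resulting XSL-FO string is the kind of document that would be fed to this processor's data input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: converts a made-up semantic vocabulary
// (<article>, <title>, <para>) into XSL-FO using the JDK's XSLT engine.
class SemanticToFo {

    // Minimal XSLT 1.0 stylesheet: one page master, one flow, one block per element.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/article'>"
        + "<fo:root><fo:layout-master-set>"
        + "<fo:simple-page-master master-name='simple' page-height='29.7cm' page-width='21cm'>"
        + "<fo:region-body/></fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference='simple'>"
        + "<fo:flow flow-name='xsl-region-body'><xsl:apply-templates/></fo:flow>"
        + "</fo:page-sequence></fo:root>"
        + "</xsl:template>"
        + "<xsl:template match='title'>"
        + "<fo:block font-size='18pt'><xsl:value-of select='.'/></fo:block>"
        + "</xsl:template>"
        + "<xsl:template match='para'>"
        + "<fo:block font-size='12pt'><xsl:value-of select='.'/></fo:block>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the semantic XML and returns the XSL-FO as a string.
    static String toFo(String semanticXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter fo = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(semanticXml)),
                              new StreamResult(fo));
        return fo.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(toFo("<article><title>XML 1.0</title><para>Hello, FO.</para></article>"));
    }
}
```

In an Orbeon pipeline, the same transformation would more naturally be an XSLT processor step placed before this converter. The standalone Java form only shows what the transformation produces.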
Libraries
The following additional libraries,
apart from the ones shipped with Orbeon, are used to implement
this component:
Release 3.7_beta1 of Orbeon uses FOP version 0.93.
Input document: config
Purpose
This input document sets all configuration options for the
XSLT-FO processor.
Config option | Data type | Description | Default
config/content-type | String | Specifies the output format. Currently only application/pdf, application/rtf, text/richtext and text/rtf are supported. | application/pdf
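The defaulting rule in the table above (and in step 2 of the processing description) can be sketched as follows. The class and method names are hypothetical. Trimming whitespace and matching case-insensitively are assumptions, and so is mapping PDF to `xs:base64Binary` and the RTF variants to `xs:string`. That last mapping is inferred from the output examples.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper mirroring the config table: resolves config/content-type,
// falling back to application/pdf for missing or unsupported values.
class ContentTypes {
    static final String DEFAULT = "application/pdf";
    static final Set<String> SUPPORTED =
        Set.of("application/pdf", "application/rtf", "text/richtext", "text/rtf");

    // Assumption: surrounding whitespace is ignored and matching is case-insensitive.
    static String resolve(String requested) {
        if (requested == null) {
            return DEFAULT;
        }
        String normalized = requested.trim().toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(normalized) ? normalized : DEFAULT;
    }

    // Assumption: PDF output is binary (xs:base64Binary); the RTF variants are text (xs:string).
    static String xsiType(String requested) {
        return DEFAULT.equals(resolve(requested)) ? "xs:base64Binary" : "xs:string";
    }

    public static void main(String[] args) {
        for (String ct : new String[] {"text/rtf", "image/png", null}) {
            System.out.println(ct + " -> " + resolve(ct) + " (" + xsiType(ct) + ")");
        }
    }
}
```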
Namespace
All nodes in this document should be in
the following namespace: http://orbeon.ictu.nl/oxf/xslfo/config
XML Schema definition
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://orbeon.ictu.nl/oxf/xslfo/config"
           xmlns="http://orbeon.ictu.nl/oxf/xslfo/config"
           elementFormDefault="qualified">
    <xs:element name="config">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="content-type" type="xs:string"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
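Example document
<config xmlns="http://orbeon.ictu.nl/oxf/xslfo/config">
    <content-type>application/pdf</content-type>
</config>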
Input document: data
Purpose
This input document contains the
XSL-FO content.
Namespace
All nodes in this document should be in
the following namespace: http://www.w3.org/1999/XSL/Format
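A namespace check on the data input compares the namespace URI and local name of the root element, not its prefix. The sketch below is a hypothetical pre-check written with the JDK DOM parser. It is not the processor's actual validation, which the error-handling section attributes to FOP (`org.apache.fop.fo.ValidationException`).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical pre-check: is the data input rooted in fo:root
// (namespace http://www.w3.org/1999/XSL/Format), whatever prefix is used?
class FoNamespaceCheck {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    static boolean isFoDocument(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // without this, getNamespaceURI() returns null
        Element root = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
        // Compare namespace URI and local name, never the prefix.
        return FO_NS.equals(root.getNamespaceURI()) && "root".equals(root.getLocalName());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isFoDocument("<fo:root xmlns:fo='" + FO_NS + "'/>"));         // true
        System.out.println(isFoDocument("<root xmlns='" + FO_NS + "'/>"));               // true
        System.out.println(isFoDocument("<fo:root xmlns:fo='http://example.com/fo'/>")); // false
    }
}
```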
XML Schema definition
http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/foschema/fop.xsd?view=co
Example document
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <!-- fo:layout-master-set defines, in its children, the page layout:
             the pagination and layout specifications.
             Page masters describe the intended subdivisions of a page
             and the geometry of these subdivisions.
             In this case there is only one simple-page-master, which
             defines the layout for all pages of the text. -->
        <!-- layout information -->
        <fo:simple-page-master master-name="simple"
                               page-height="29.7cm" page-width="21cm"
                               margin-top="1cm" margin-bottom="2cm"
                               margin-left="2.5cm" margin-right="2.5cm">
            <fo:region-body margin-top="3cm"/>
            <fo:region-before extent="3cm"/>
            <fo:region-after extent="1.5cm"/>
        </fo:simple-page-master>
    </fo:layout-master-set> <!-- end: defines page layout -->

    <!-- start page-sequence: here comes the text (contained in flow objects).
         A page-sequence can contain different fo:flows. The value of the
         master-reference attribute refers to the page layout that is used
         to lay out the text contained in this page-sequence. -->
    <fo:page-sequence master-reference="simple">
        <!-- start fo:flow: each flow is targeted at one (and only one)
             of the following:
               xsl-region-body   (usually: normal text)
               xsl-region-before (usually: header)
               xsl-region-after  (usually: footer)
               xsl-region-start  (usually: left margin)
               xsl-region-end    (usually: right margin)
             ('usually' applies here to languages with left-to-right and
             top-down writing direction, such as English.)
             In this case there is only one target: xsl-region-body. -->
        <fo:flow flow-name="xsl-region-body">
            <!-- each paragraph is encapsulated in a block element; the
                 attributes of the block define font-family, font-size,
                 line-height, etc. -->

            <!-- this defines a title -->
            <fo:block font-size="18pt" font-family="sans-serif"
                      line-height="24pt" space-after.optimum="15pt"
                      background-color="blue" color="white"
                      text-align="center" padding-top="3pt">
                Extensible Markup Language (XML) 1.0
            </fo:block>

            <!-- this defines normal text -->
            <fo:block font-size="12pt" font-family="sans-serif"
                      line-height="15pt" space-after.optimum="3pt"
                      text-align="justify">
                The Extensible Markup Language (XML) is a subset of SGML that
                is completely described in this document. Its goal is to enable
                generic SGML to be served, received, and processed on the Web in
                the way that is now possible with HTML. XML has been designed for
                ease of implementation and for interoperability with both SGML
                and HTML.
            </fo:block>

            <!-- this defines normal text -->
            <fo:block font-size="12pt" font-family="sans-serif"
                      line-height="15pt" space-after.optimum="3pt"
                      text-align="justify">
                The Extensible Markup Language (XML) is a subset of SGML that
                is completely described in this document. Its goal is to enable
                generic SGML to be served, received, and processed on the Web in
                the way that is now possible with HTML. XML has been designed for
                ease of implementation and for interoperability with both SGML
                and HTML.
            </fo:block>
        </fo:flow> <!-- closes the flow element -->
    </fo:page-sequence> <!-- closes the page-sequence -->
</fo:root>
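In the example, the page-sequence finds its page layout through `master-reference="simple"`, which must match the `master-name` of a page master declared in `fo:layout-master-set` (a `fo:simple-page-master` or a `fo:page-sequence-master`). The hypothetical lint below checks only that link, using the JDK DOM parser. It is not part of the processor and does not check anything else in the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical lint: every fo:page-sequence/@master-reference must name a page master
// (fo:simple-page-master or fo:page-sequence-master) declared in fo:layout-master-set.
class MasterReferenceCheck {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    static List<String> unresolvedReferences(String foXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(foXml)));

        // Collect the master-name of every declared page master.
        Set<String> masterNames = new HashSet<>();
        for (String kind : new String[] {"simple-page-master", "page-sequence-master"}) {
            NodeList masters = doc.getElementsByTagNameNS(FO_NS, kind);
            for (int i = 0; i < masters.getLength(); i++) {
                masterNames.add(((Element) masters.item(i)).getAttribute("master-name"));
            }
        }

        // Report each page-sequence whose master-reference matches none of them.
        List<String> unresolved = new ArrayList<>();
        NodeList sequences = doc.getElementsByTagNameNS(FO_NS, "page-sequence");
        for (int i = 0; i < sequences.getLength(); i++) {
            String ref = ((Element) sequences.item(i)).getAttribute("master-reference");
            if (!masterNames.contains(ref)) {
                unresolved.add(ref);
            }
        }
        return unresolved;
    }

    public static void main(String[] args) throws Exception {
        String fo = "<fo:root xmlns:fo='" + FO_NS + "'><fo:layout-master-set>"
            + "<fo:simple-page-master master-name='simple'><fo:region-body/></fo:simple-page-master>"
            + "</fo:layout-master-set>"
            + "<fo:page-sequence master-reference='simple'/>"
            + "<fo:page-sequence master-reference='missing'/></fo:root>";
        System.out.println(unresolvedReferences(fo)); // [missing]
    }
}
```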
Processing
The high-level internal workings of the XSLT-FO converter processor
are illustrated in the following table:
Step | Description | Notes
1 | Generate an internal Config object from the config input document, or retrieve it from Orbeon's cache. | Use the org.orbeon.oxf.processor.ProcessorImpl.readCacheInputAsObject() method to cache the config object.
2 | Determine the ContentHandlerOutputStream object based on the given content type. If no content type or an invalid one is given, application/pdf is used. | Uses the internal function getContentHandlerOutputStream(), which contains the determination algorithm. The ContentHandlerOutputStream class extends OutputStream and wraps the ContentHandler of the processor output.
3 | Create the document element. | Call ContentHandlerOutputStream.startDocument().
4 | Read the input data (XSL-FO content) as SAX and send it to the FOP component. The binary/text stream produced by FOP is then sent to the ContentHandlerOutputStream object. | Uses the Apache FOP 0.95 component and the internal function readInput().
5 | Close the document element. | Call ContentHandlerOutputStream.close().
6 | Provide the result to the processor output. |
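Steps 3 to 5 in the table above describe an output stream that wraps a SAX ContentHandler. The sketch below shows one way such a class could work, using only the JDK. It is a hypothetical, simplified stand-in, not the actual ContentHandlerOutputStream. The class name `SketchContentHandlerOutputStream` and its `startDocument()` method are assumptions, and only the binary (base64) case is covered. The `render` demo uses the JDK's identity `TransformerHandler` as the downstream ContentHandler, so the resulting SAX events can be inspected as XML text. In the real processor, FOP would write its rendered bytes into the stream in step 4.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.util.Base64;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical, simplified stand-in for ContentHandlerOutputStream (binary case only):
// bytes written by the renderer are buffered and emitted, base64-encoded, as the text
// of a <document> element sent to a SAX ContentHandler.
class SketchContentHandlerOutputStream extends OutputStream {
    static final String XS = "http://www.w3.org/2001/XMLSchema";
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    private final ContentHandler handler;
    private final String contentType;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean closed;

    SketchContentHandlerOutputStream(ContentHandler handler, String contentType) {
        this.handler = handler;
        this.contentType = contentType;
    }

    // Step 3: start the SAX document and open <document xsi:type=... content-type=...>.
    void startDocument() throws SAXException {
        handler.startDocument();
        handler.startPrefixMapping("xs", XS);
        handler.startPrefixMapping("xsi", XSI);
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute(XSI, "type", "xsi:type", "CDATA", "xs:base64Binary");
        atts.addAttribute("", "content-type", "content-type", "CDATA", contentType);
        handler.startElement("", "document", "document", atts);
    }

    // Step 4: the renderer (FOP in the real processor) writes its output here.
    @Override public void write(int b) { buffer.write(b); }
    @Override public void write(byte[] b, int off, int len) { buffer.write(b, off, len); }

    // Step 5: emit the base64 text, close <document> and end the SAX document.
    @Override public void close() throws IOException {
        if (closed) return;
        closed = true;
        try {
            char[] text = Base64.getEncoder().encodeToString(buffer.toByteArray()).toCharArray();
            handler.characters(text, 0, text.length);
            handler.endElement("", "document", "document");
            handler.endPrefixMapping("xsi");
            handler.endPrefixMapping("xs");
            handler.endDocument();
        } catch (SAXException e) {
            throw new IOException(e);
        }
    }

    // Demo: serialize the SAX events to a string with the JDK's identity TransformerHandler.
    static String render(byte[] payload, String contentType) throws Exception {
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler();
        StringWriter xml = new StringWriter();
        serializer.setResult(new StreamResult(xml));
        try (SketchContentHandlerOutputStream out =
                 new SketchContentHandlerOutputStream(serializer, contentType)) {
            out.startDocument();
            out.write(payload);
        }
        return xml.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("%PDF-1.4\n".getBytes("US-ASCII"), "application/pdf"));
    }
}
```

Buffering until `close()` keeps the base64 text correct, because the whole payload is encoded in one pass. The cost is memory proportional to the document size. A streaming variant would have to encode in multiples of 3 bytes, or wrap the sink with `Base64.getEncoder().wrap(...)`, so that chunk boundaries do not introduce padding.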
Output document: data
Purpose
The resulting binary/text stream is sent to the data output, embedded
in an XML document:
<document xsi:type="[xs:string|xs:base64Binary]"
content-type="[application/pdf|application/rtf|text/richtext|text/rtf]">
[resulting binary/text stream]
</document>
Namespace
None.
XML Schema definition
None.
Example document
Example of PDF output data:
<document xsi:type="xs:base64Binary"
content-type="application/pdf">
JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PAovUHJvZHVjZXIgKEFwYWNoZSBGT1AgVmVyc2lvbiAw
LjkzKQovQ3JlYXRpb25EYXRlIChEOjIwMDkwNDI4MTQyOTQ2KzAyJzAwJykKPj4KZW5kb2JqCjUg
MCBvYmoKPDwgL04gMwovTGVuZ3RoIDEwIDAgUgovRmlsdGVyIC9GbGF0ZURlY29kZSAKPj4Kc3Ry
ZWFtCnicnZZ3VFPZFofPvTe9UJIQipTQa2hSAkgNvUiRLioxCRBKwJAAIjZEVHBEUZGmCDIo4ICj
Q5GxIoqFAVGx6wQZRNRxcBQblklkrRnfvHnvzZvfH/d+a5+9z91n733WugCQ/IMFwkxYCYAMoVgU
4efFiI2LZ2AHAQzwAANsAOBws7NCFvhGApkCfNiMbJkT+Be9ug4g+fsq0z+MwQD/n5S5WSIxAFCY
jOfy+NlcGRfJOD1XnCW3T8mYtjRNzjBKziJZgjJWk3PyLFt89pllDznzMoQ8GctzzuJl8OTcJ+ON
ORK+jJFgGRfnCPi5Mr4mY4N0SYZAxm/ksRl8TjYAKJLcLuZzU2RsLWOSKDKCLeN5AOBIyV/w0i9Y
zM8Tyw/FzsxaLhIkp4gZJlxTho2TE4vhz89N54vFzDAON40j4jHYmRlZHOFyAGbP/FkUeW0ZsiI7
2Dg5ODBtLW2+KNR/Xfybkvd2ll6Ef+4ZRB/4w/ZXfpkNALCmZbXZ+odtaRUAXesBULv9h81gLwCK
...
</document>
Example of RTF output data:
<document xsi:type="xs:string"
content-type="application/rtf">
{\rtf1 \ansi
{\colortbl;
\red0\green0\blue0;
\red255\green255\blue255;
\red255\green0\blue0;
\red0\green255\blue0;
\red0\green0\blue255;
\red0\green255\blue255;
\red255\green0\blue255;
...
</document>
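A consumer that needs the raw file (for example, to save a .pdf) has to reverse the embedding shown in these examples. The sketch below is a hypothetical helper that does this. It assumes the actual output declares the `xs` and `xsi` prefixes, which the examples above omit but which a namespace-aware parser requires. It also assumes that `xs:string` text should be written out as UTF-8. The first 12 characters of the PDF example, `JVBERi0xLjQK`, decode to the PDF header `%PDF-1.4` followed by a newline.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical consumer-side helper: turns the processor's <document> output back
// into raw bytes (e.g. to write a .pdf or .rtf file).
class OutputDocumentDecoder {
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    static byte[] decode(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to read xsi:type by namespace
        Element document = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
        String type = document.getAttributeNS(XSI, "type");
        String text = document.getTextContent();
        if (type.endsWith(":base64Binary")) {
            // The MIME decoder skips line breaks such as those in the PDF example.
            return Base64.getMimeDecoder().decode(text);
        }
        // xs:string (RTF). Assumption: the text is re-encoded as UTF-8.
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String pdf = "<document xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " xmlns:xsi='" + XSI + "' xsi:type='xs:base64Binary'"
            + " content-type='application/pdf'>JVBERi0x\nLjQK</document>";
        System.out.print(new String(decode(pdf), StandardCharsets.US_ASCII)); // %PDF-1.4
    }
}
```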
Error handling
The following errors are handled as described below:
Error | Handling
Input documents are not valid XML | The processor throws an org.orbeon.oxf.common.ValidationException.
Input documents do not comply with the XSD | Input config: the processor throws an org.orbeon.oxf.common.ValidationException. Input data: the processor throws an org.apache.fop.fo.ValidationException.
Input documents do not have the right namespace | Input config: the processor throws an org.orbeon.oxf.common.ValidationException. Input data: the processor throws an org.apache.fop.fo.ValidationException.
The Apache FOP component has a problem parsing the XSL-FO data | The processor throws an org.apache.fop.apps.FOPException.
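For the config input, the "does not comply with the XSD" and "wrong namespace" cases can be reproduced with the JDK's `javax.xml.validation` API and the schema from the config section. The schema's targetNamespace is http://orbeon.ictu.nl/oxf/xslfo/config. This is a hypothetical sketch. The processor wraps such failures in `org.orbeon.oxf.common.ValidationException`, whereas the sketch only returns the validator's error description (or `null` when the config is valid).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch: validate a config input against the config XSD with the
// JDK's javax.xml.validation API.
class ConfigValidator {
    // Same schema as in the "Input document: config" section.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://orbeon.ictu.nl/oxf/xslfo/config'"
        + " xmlns='http://orbeon.ictu.nl/oxf/xslfo/config'"
        + " elementFormDefault='qualified'>"
        + "<xs:element name='config'><xs:complexType><xs:sequence>"
        + "<xs:element name='content-type' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static final Schema SCHEMA = loadSchema();

    private static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("invalid config schema", e);
        }
    }

    // Returns null when the config is valid, otherwise a description of the first error.
    static String validate(String configXml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(configXml)));
            return null;
        } catch (SAXException | IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        String ns = "http://orbeon.ictu.nl/oxf/xslfo/config";
        // Valid: prints null
        System.out.println(validate("<config xmlns='" + ns + "'>"
            + "<content-type>application/rtf</content-type></config>"));
        // Wrong namespace: prints true (an error was reported)
        System.out.println(validate("<config xmlns='http://www.orbeon.com/oxf/xslfo/config'>"
            + "<content-type>application/rtf</content-type></config>") != null);
    }
}
```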
Usage in a pipeline
Within a pipeline, the processor can be used as follows:
<p:processor name="oxf:xslfoconverter" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="config">
        <config xmlns="http://orbeon.ictu.nl/oxf/xslfo/config">
            <content-type>application/pdf</content-type>
        </config>
    </p:input>
    <p:input name="data" href="oxf:/fo/simple.fo"/>
    <p:output name="data" id="document"/>
</p:processor>
<p:processor name="oxf:http-serializer" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="config">
        <config/>
    </p:input>
    <p:input name="data" href="#document"/>
</p:processor>
References and Documentation