
XSLFO Converter

Introduction

This document describes the functionality to be implemented by the XSLFO converter processor.

The XSLFO converter reads an XSL-FO file and renders the resulting pages to a specified output. The processor uses the Apache FOP (Formatting Objects Processor) library as its main component.

XSL-FO is an XML vocabulary used to specify pagination and other styling for page layout output. The acronym “FO” stands for Formatting Objects. XSL-FO defines a set of XML elements that describe how pages are set up. The contents of the pages are filled from flows: static flows appear on every page (for headers and footers), and the main flow fills the body of the page. See the XSL-FO compliance link under References and Documentation for more information about the XSL-FO compliance of the processor.

When to use this processor:

  • If your input data is heavily based on XML.
    The processor takes the standard XSL-FO file format as input, lays the content out into pages, then renders it to the requested output. A major advantage of XSL-FO as input is that it is itself XML, so it can be conveniently created from a variety of sources. The most common method is to convert semantic XML to XSL-FO using an XSLT transformation.

  • If you want to support output formats such as PDF, RTF, PNG, SVG and TXT.
    See http://xmlgraphics.apache.org/fop/0.95/output.html for the supported formats.

  • If you do NOT need to generate PDFs of 1000+ pages on every run.
    The processor is based on the Apache FOP library, which is known to be slow. It is not a good fit for cases where you need to process and create 1000+ pages in a very short time.
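
The conversion from semantic XML to XSL-FO mentioned in the first bullet can be sketched with the XSLT engine that ships with the JDK. This is only an illustration: the `article`, `title` and `para` elements, the class name `ArticleToFo` and the stylesheet itself are assumptions made for this example and are not part of the processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ArticleToFo {
    // Hypothetical stylesheet: maps <article><title/><para/>...</article> to XSL-FO.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + "    xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "  <xsl:template match='/article'>"
        + "    <fo:root>"
        + "      <fo:layout-master-set>"
        + "        <fo:simple-page-master master-name='simple'"
        + "            page-height='29.7cm' page-width='21cm'>"
        + "          <fo:region-body/>"
        + "        </fo:simple-page-master>"
        + "      </fo:layout-master-set>"
        + "      <fo:page-sequence master-reference='simple'>"
        + "        <fo:flow flow-name='xsl-region-body'>"
        + "          <fo:block font-size='18pt'><xsl:value-of select='title'/></fo:block>"
        + "          <xsl:for-each select='para'>"
        + "            <fo:block font-size='12pt'><xsl:value-of select='.'/></fo:block>"
        + "          </xsl:for-each>"
        + "        </fo:flow>"
        + "      </fo:page-sequence>"
        + "    </fo:root>"
        + "  </xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the given semantic XML and returns the XSL-FO as a string.
    static String toFo(String articleXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(articleXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            toFo("<article><title>Hello</title><para>World</para></article>"));
    }
}
```

The resulting XSL-FO document is what would be passed to the data input of the processor.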


Libraries

The following additional libraries, apart from the ones shipped with Orbeon, are used in the realisation of this component:

Release 3.7_beta1 of Orbeon uses FOP version 0.93.

Input document: config

Purpose

This input document sets all configuration options for the XSLFO converter processor.

| Config option | Data type | Description | Default |
|---|---|---|---|
| config/content-type | String | Specifies the output format. Currently only application/pdf, application/rtf, text/richtext and text/rtf are supported. | application/pdf |


 

Namespace

All nodes in this document should be in the following namespace: http://orbeon.ictu.nl/oxf/xslfo/config

XML Schema definition

<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://orbeon.ictu.nl/oxf/xslfo/config"
           xmlns="http://orbeon.ictu.nl/oxf/xslfo/config"
           elementFormDefault="qualified">
    <xs:element name="config">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="content-type" type="xs:string" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
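
As an illustration of how a config document could be checked against this schema, here is a minimal sketch using the standard `javax.xml.validation` API from the JDK. The class name `ConfigSchemaCheck` and the inlined copy of the schema are assumptions made for this example; the processor itself may validate its inputs differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ConfigSchemaCheck {
    static final String NS = "http://orbeon.ictu.nl/oxf/xslfo/config";

    // Inlined copy of the schema above (single quotes to keep the Java string readable).
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + "    targetNamespace='" + NS + "' xmlns='" + NS + "'"
        + "    elementFormDefault='qualified'>"
        + "  <xs:element name='config'>"
        + "    <xs:complexType>"
        + "      <xs:sequence>"
        + "        <xs:element name='content-type' type='xs:string'/>"
        + "      </xs:sequence>"
        + "    </xs:complexType>"
        + "  </xs:element>"
        + "</xs:schema>";

    // Returns true when the config is well-formed and valid against the schema.
    static boolean isValid(String configXml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(configXml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or schema violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String config = "<config xmlns='" + NS + "'>"
            + "<content-type>application/rtf</content-type></config>";
        System.out.println(isValid(config));
    }
}
```

Note that the schema requires the content-type element, so an empty config element is rejected by this check.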

Example document

<config xmlns="http://orbeon.ictu.nl/oxf/xslfo/config">
    <content-type>application/rtf</content-type>
</config>
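
The following sketch shows one way the content type could be read from such a config document, falling back to application/pdf when the value is missing or not one of the supported types. The class name `ContentTypeResolver` and the use of the JDK DOM parser are assumptions for illustration; the source does not show how the processor reads its config.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class ContentTypeResolver {
    static final String NS = "http://orbeon.ictu.nl/oxf/xslfo/config";
    static final String DEFAULT = "application/pdf";
    static final List<String> SUPPORTED = Arrays.asList(
        "application/pdf", "application/rtf", "text/richtext", "text/rtf");

    // Returns the configured content type, or the default when it is absent or unsupported.
    static String resolve(String configXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(NS, "content-type");
            if (nodes.getLength() == 0) {
                return DEFAULT;
            }
            String value = nodes.item(0).getTextContent().trim();
            return SUPPORTED.contains(value) ? value : DEFAULT;
        } catch (Exception e) {
            throw new IllegalArgumentException("Config is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(
            "<config xmlns='" + NS + "'><content-type>application/rtf</content-type></config>"));
    }
}
```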

Input document: data

Purpose

This input document contains the XSL-FO content.

Namespace

All nodes in this document should be in the following namespace: http://www.w3.org/1999/XSL/Format

XML Schema definition

http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/foschema/fop.xsd?view=co

Example document

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <fo:layout-master-set>
    <!-- fo:layout-master-set defines in its children the page layout:
         the pagination and layout specifications
         - page-masters: have the role of describing the intended subdivisions
           of a page and the geometry of these subdivisions
         In this case there is only a simple-page-master which defines the
         layout for all pages of the text
    -->
    <!-- layout information -->
    <fo:simple-page-master master-name="simple"
                           page-height="29.7cm"
                           page-width="21cm"
                           margin-top="1cm"
                           margin-bottom="2cm"
                           margin-left="2.5cm"
                           margin-right="2.5cm">
      <fo:region-body margin-top="3cm"/>
      <fo:region-before extent="3cm"/>
      <fo:region-after extent="1.5cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- end: defines page layout -->


  <!-- start page-sequence
       here comes the text (contained in flow objects)
       the page-sequence can contain different fo:flows
       the attribute value of master-reference refers to the page layout
       which is to be used to lay out the text contained in this
       page-sequence -->
  <fo:page-sequence master-reference="simple">

    <!-- start fo:flow
         each flow is targeted
         at one (and only one) of the following:
         xsl-region-body (usually: normal text)
         xsl-region-before (usually: header)
         xsl-region-after (usually: footer)
         xsl-region-start (usually: left margin)
         xsl-region-end (usually: right margin)
         ['usually' applies here to languages with left-right and top-down
         writing direction like English]
         in this case there is only one target: xsl-region-body
    -->
    <fo:flow flow-name="xsl-region-body">

      <!-- each paragraph is encapsulated in a block element
           the attributes of the block define
           font-family and size, line-height etc. -->

      <!-- this defines a title -->
      <fo:block font-size="18pt"
                font-family="sans-serif"
                line-height="24pt"
                space-after.optimum="15pt"
                background-color="blue"
                color="white"
                text-align="center"
                padding-top="3pt">
        Extensible Markup Language (XML) 1.0
      </fo:block>


      <!-- this defines normal text -->
      <fo:block font-size="12pt"
                font-family="sans-serif"
                line-height="15pt"
                space-after.optimum="3pt"
                text-align="justify">
        The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to
        enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML
        has been designed for ease of implementation and for interoperability with both SGML and HTML.
      </fo:block>

      <!-- this defines normal text -->
      <fo:block font-size="12pt"
                font-family="sans-serif"
                line-height="15pt"
                space-after.optimum="3pt"
                text-align="justify">
        The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to
        enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML
        has been designed for ease of implementation and for interoperability with both SGML and HTML.
      </fo:block>

    </fo:flow> <!-- closes the flow element -->
  </fo:page-sequence> <!-- closes the page-sequence -->
</fo:root>


Processing

The high-level internal workings of the XSLFO converter processor are illustrated in the following table:

| Step | Description | Notes |
|---|---|---|
| 1 | Generate an internal Config object, or retrieve it from Orbeon's cache, using the config input document. | Use the org.orbeon.oxf.processor.ProcessorImpl.readCacheInputAsObject() method to cache the config object. |
| 2 | Determine the ContentHandlerOutputStream object based on the given content type. If no content type or an invalid content type is given, application/pdf is used. | Uses the internal function getContentHandlerOutputStream(), which contains the determination algorithm. The ContentHandlerOutputStream object extends the OutputStream class and wraps the ContentHandler of the processor output. |
| 3 | Create the document element. | Call to the ContentHandlerOutputStream->startDocument function. |
| 4 | Read the input data (XSL-FO content) as SAX and send it to the FOP component. The binary/text stream produced by FOP is then sent to the ContentHandlerOutputStream object. | Uses the Apache FOP 0.95 component. Uses the internal function readInput(). |
| 5 | Close the document element. | Call to the ContentHandlerOutputStream->close function. |
| 6 | Provide the result to the processor output. | |
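
Steps 2 to 5 rely on an output stream that wraps the SAX ContentHandler of the processor output. The source does not show that class, so the following is only a sketch of the idea for the binary (Base64) case. The class name `Base64ContentHandlerOutputStream` is hypothetical, and buffering all bytes until close() is a simplification chosen for this example; a real implementation would more likely encode and emit the data in chunks.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.Base64;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class Base64ContentHandlerOutputStream extends OutputStream {
    private static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";
    private static final String XS = "http://www.w3.org/2001/XMLSchema";

    private final ContentHandler handler;
    private final String contentType;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    Base64ContentHandlerOutputStream(ContentHandler handler, String contentType) {
        this.handler = handler;
        this.contentType = contentType;
    }

    // Step 3: opens the <document> element on the wrapped ContentHandler.
    void startDocument() {
        try {
            handler.startDocument();
            handler.startPrefixMapping("xsi", XSI);
            handler.startPrefixMapping("xs", XS);
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute(XSI, "type", "xsi:type", "CDATA", "xs:base64Binary");
            atts.addAttribute("", "content-type", "content-type", "CDATA", contentType);
            handler.startElement("", "document", "document", atts);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 4: the renderer writes its binary output here; bytes are buffered.
    @Override
    public void write(int b) {
        buffer.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        buffer.write(b, off, len);
    }

    // Step 5: emits the buffered bytes as Base64 text and closes the <document> element.
    @Override
    public void close() {
        try {
            char[] chars =
                Base64.getEncoder().encodeToString(buffer.toByteArray()).toCharArray();
            handler.characters(chars, 0, chars.length);
            handler.endElement("", "document", "document");
            handler.endPrefixMapping("xs");
            handler.endPrefixMapping("xsi");
            handler.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, the rendering component would receive the stream as its output target between the calls to startDocument() and close().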



Output document: data

Purpose

The resulting binary/text stream is sent to the data output. It's embedded in an XML document:

<document xsi:type="[xs:string|xs:base64Binary]" content-type="[application/pdf|application/rtf|text/richtext|text/rtf]">
    [resulting binary/text stream]
</document>


Namespace

None.

XML Schema definition

None.

Example document

Example of PDF output data:

<document xsi:type="xs:base64Binary" content-type="application/pdf">
JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PAovUHJvZHVjZXIgKEFwYWNoZSBGT1AgVmVyc2lvbiAw
LjkzKQovQ3JlYXRpb25EYXRlIChEOjIwMDkwNDI4MTQyOTQ2KzAyJzAwJykKPj4KZW5kb2JqCjUg
MCBvYmoKPDwgL04gMwovTGVuZ3RoIDEwIDAgUgovRmlsdGVyIC9GbGF0ZURlY29kZSAKPj4Kc3Ry
ZWFtCnicnZZ3VFPZFofPvTe9UJIQipTQa2hSAkgNvUiRLioxCRBKwJAAIjZEVHBEUZGmCDIo4ICj
Q5GxIoqFAVGx6wQZRNRxcBQblklkrRnfvHnvzZvfH/d+a5+9z91n733WugCQ/IMFwkxYCYAMoVgU
4efFiI2LZ2AHAQzwAANsAOBws7NCFvhGApkCfNiMbJkT+Be9ug4g+fsq0z+MwQD/n5S5WSIxAFCY
jOfy+NlcGRfJOD1XnCW3T8mYtjRNzjBKziJZgjJWk3PyLFt89pllDznzMoQ8GctzzuJl8OTcJ+ON
ORK+jJFgGRfnCPi5Mr4mY4N0SYZAxm/ksRl8TjYAKJLcLuZzU2RsLWOSKDKCLeN5AOBIyV/w0i9Y
zM8Tyw/FzsxaLhIkp4gZJlxTho2TE4vhz89N54vFzDAON40j4jHYmRlZHOFyAGbP/FkUeW0ZsiI7
2Dg5ODBtLW2+KNR/Xfybkvd2ll6Ef+4ZRB/4w/ZXfpkNALCmZbXZ+odtaRUAXesBULv9h81gLwCK
...
</document>
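
A consumer of the data output can recover the original bytes by Base64-decoding the text content of the document element. The sketch below assumes the text has already been extracted from the element; the class name `OutputDocumentDecoder` is hypothetical. It uses the JDK MIME decoder because that decoder ignores the line breaks that appear in the Base64 text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class OutputDocumentDecoder {
    // Decodes the Base64 text content of the <document> element into raw bytes.
    // The MIME decoder skips line separators and other non-Base64 characters.
    static byte[] decode(String base64Text) {
        return Base64.getMimeDecoder().decode(base64Text);
    }

    // Checks for the "%PDF-" marker that starts every PDF file.
    static boolean looksLikePdf(byte[] bytes) {
        byte[] marker = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (bytes.length < marker.length) {
            return false;
        }
        for (int i = 0; i < marker.length; i++) {
            if (bytes[i] != marker[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Start of the first line of the PDF example above.
        byte[] bytes = decode("JVBERi0xLjQK\nJaqrrK0K\n");
        System.out.println(looksLikePdf(bytes));
    }
}
```

The decoded bytes can then be written to a file or sent to a client as they are.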


Example of RTF output data:

<document xsi:type="xs:string" content-type="application/rtf">
{\rtf1 \ansi
{\colortbl;
\red0\green0\blue0;
\red255\green255\blue255;
\red255\green0\blue0;
\red0\green255\blue0;
\red0\green0\blue255;
\red0\green255\blue255;
\red255\green0\blue255;
...
</document>


Error handling

The following errors should be handled as described:

| Error | Handling |
|---|---|
| Input documents are not valid XML | The processor throws an org.orbeon.oxf.common.ValidationException. |
| Input documents do not comply with the XSD | Input config: the processor throws an org.orbeon.oxf.common.ValidationException. Input data: the processor throws an org.apache.fop.fo.ValidationException. |
| Input documents do not have the right namespace | Input config: the processor throws an org.orbeon.oxf.common.ValidationException. Input data: the processor throws an org.apache.fop.fo.ValidationException. |
| The Apache FOP component has a problem parsing the XSL-FO data | The processor throws an org.apache.fop.apps.FOPException. |


Usage in a pipeline


Within a pipeline the processor can be used as follows:

<p:processor name="oxf:xslfoconverter" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="config">
        <config xmlns="http://orbeon.ictu.nl/oxf/xslfo/config">
            <content-type>application/pdf</content-type>
        </config>
    </p:input>
    <p:input name="data" href="oxf:/fo/simple.fo"/>
    <p:output name="data" id="document"/>
</p:processor>

<p:processor name="oxf:http-serializer" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="config">
        <config/>
    </p:input>
    <p:input name="data" href="#document"/>
</p:processor>


References and Documentation

  • Orbeon Processor API: http://www.orbeon.com/ops/doc/reference-processor-api

  • Apache FOP XSL-FO XSD: http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/foschema/fop.xsd?view=co

  • Apache FOP implementation examples: http://xmlgraphics.apache.org/fop/0.95/embedding.html

  • How to build Javadoc: http://xmlgraphics.apache.org/fop/dev/api-doc.html

  • XSL-FO compliance: http://xmlgraphics.apache.org/fop/compliance.html

  • Supported formats and limitations: http://xmlgraphics.apache.org/fop/0.95/output.html

  • Apache FOP known issues: http://xmlgraphics.apache.org/fop/0.95/knownissues_overview.html

Attachments

  • xslfoconverter-1.5-rc3-SNAPSHOT-jar-with-dependencies.jar (9777k), Alessandro Vernet, May 19, 2009, 5:27 AM

  • xslfoconverter-1.5-rc3-SNAPSHOT-src.zip (29k), Alessandro Vernet, May 19, 2009, 7:01 AM