Understanding Report XML Files
This section describes some of the details of the DataVision XML file
format.
There are a few general parts to each XML file: the report, the data source
description, parameter definitions, formula definitions, and the report
sections that contain the fields and strings to output.
The DTD file report.dtd is a mostly
accurate representation of a legal report file. DataVision uses the MinML2
parser to parse its XML files. The DTD snippets below may not be completely
up-to-date. When in doubt, see the DTD file.
All sizes used in a report file are in points (72 points = 1 inch). All
coordinates and lengths are floating point, but may be truncated by layout
engines. Colors are stored as strings of the form ``red;green;blue;alpha''.
Each component is an integer from 0 to 255.
For an example report XML file that uses a JDBC database connection, see
postgresql.xml in the examples directory. For an
example that uses a character-separated file as a data source, see
charsep.xml in the same directory.
Elements
report
The ``report'' element is the root element of the XML. The attribute
``dtd-version'' defines the version of the report XML format used to create
this report XML file. If this attribute is missing, its value is assumed to
be ``0.1''.
The report element's remaining attributes (``name'', ``title'', and
``author'') are values that may be edited and displayed in the report by
using Special Fields.
See also the ``description'' element.
paper
This element's ``name'' attribute describes the paper size used by this
report. The name attribute's value must be one of the strings found in the
paper.properties file contained in the
DataVision.jar JAR file.
jimm/properties/paper.properties is copied into
DataVision.jar; that is the version used by DataVision when it
is running. If you need to change the paper size values, edit the file
paper.properties and then replace it in the JAR file by using
the commands
> cd jimm/properties
> jar uf ../../lib/DataVision.jar paper.properties
description
This element contains the descriptive text entered in the Report Summary
dialog.
source
The ``source'' element contains elements describing the data source this
report will use to read data. Just like a report's output can be sent to
any one of a number of layout engines, a report's input can come from one
of a number of data sources. The ``source'' element contains the element
that defines the data source and the element that defines the query. The data sources implemented are database, a JDBC database
connection, and charsep, a text
file whose lines are rows and columns are separated by a character such as
comma or tab.
The data source can be overridden on the command line. See Running DataVision from the Command
Line.
database
The ``database'' element describes the database that is used in the report.
See Connecting to the Database,
for how to set these values.
driverClassName is the JDBC driver class name.
connInfo is the magic connection info string used by the database
driver to connect to your database. It probably includes the database name
(for example, ``jobs'' or ``salesdata'').
name is the name of the database.
username is your database user name.
charsep
The ``charsep'' element describes the contents of a text file whose lines
contain rows of data and whose columns are separated by a character
(typically a comma or a tab). The ``sep-char'' attribute determines the
character. The separator character can be overridden on the command line.
See Running DataVision from the Command
Line.
The report needs to know the name and type of each column in a
character-separated file. The ``charsep'' element must contain either a
list of ``column'' elements or a ``metadata-url'' element.
column
The ``column'' element's attributes include ``name'', ``type'', and
optionally ``date-format''. The type must be one of ``string'', ``number'',
or ``date''.
A date column may optionally specify the ``date-format'' string which
describes how the dates in that column are formatted. The default format is
``yyyy-MM-dd''. The format string follows the same rules as the date
formatting string for a report field. See The Format Dialog for an
explanation of the date formatting string.
Here is an example charsep data source:
<!-- The column names happen to be the same as database -->
<!-- table.column names, but that's just a coincidence. -->
<!-- OK, I'm lazy. -->
<charsep sep-char=",">
<!-- We need to define column names and types. For date -->
<!-- columns, we need to define a string that will be -->
<!-- used to parse the string so we can turn it into a -->
<!-- java.lang.Date object. In this example, you could -->
<!-- skip the date-format string; the default parse -->
<!-- string (yyyy-MM-dd) is the same string. -->
<column name="office.name" type="string" />
<column name="jobs.id" type="number" />
<column name="jobs.title" type="string" />
<column name="jobs.hourly rate" type="number" />
<column name="jobs.post_date" type="date"
date-format="yyyy-MM-dd" />
</charsep>
metadata-url
If the ``charsep'' element contains a ``metadata-url'' element, that means
the metadata is stored in an external XML file. The ``metadata-url''
element looks like this:
<!-- A charsep data source whose metadata is described -->
<!-- by an external XML file. -->
<charsep sep-char=",">
<metadata-url>http://localhost/foo/metadata.xml</metadata-url>
</charsep>
Note that the URL can be a file URL or any other legal URL.
The metadata XML file (in this example, metadata.xml) must
conform to the DTD found in metadata.dtd, which is quite
simple. In that file, the root ``metadata'' element contains a list of
``column'' elements. Here is a complete example of a metadata XML file:
<!-- See metadata.dtd -->
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<!-- We need to define column names and types. For date -->
<!-- columns, we need to define a string that will be -->
<!-- used to parse the string so we can turn it into a -->
<!-- java.lang.Date object. In this example, you could -->
<!-- skip the date-format string; the default parse -->
<!-- string (yyyy-MM-dd) is the same string. -->
<column name="office.name" type="string" />
<column name="jobs.ID" type="number" />
<column name="jobs.title" type="string" />
<column name="jobs.hourly rate" type="number" />
<column name="jobs.post_date" type="date"
date-format="yyyy-MM-dd" />
</metadata>
query
The ``query'' element must come after the ``database'' or ``charsep''
element. It contains up to three elements: the joins, the sorts, and the
where clause. The report uses these and the list of fields it contains to
create a SQL query.
Each ``join'' element represents one of the links between tables that the
user specifies.
There is a ``sort'' element for each group's field and for any other sorts
the user specifies.
Finally, the ``where'' where clause element specifies an additional
user-specified clause to use. The text of the where clause is contained in
a CDATA element. When generating SQL, the text becomes part of the SQL
WHERE clause. When used with a charsep data source, the text is evaluated
as a Ruby script and the result is used to determine if a record is used or
ignored.
parameters
Parameters are defined and appear in the XML before they are used. Each
parameter has an id, a type (string, numeric, or date), a name, a question
to display when asking the user for a value, an arity, and an optional
default value or values.
Types include ``string'', ``numeric'', ``boolean'', and ``date''.
The arity attribute describes how many values are
allowed in the answer. The arity may be ``single'', ``range'',
``list-single'', or ``list-multiple''.
Default values for each parameter are contained as text within ``default''
elements.
formulas
Formulas are defined and appear in the XML before they are used. Each
formula has an id, a name, and formula text. The formula text is Ruby code.
See Ruby for a brief description
of the language.
Report Sections
A report may have zero or more parameters, formulas, report headers,
report footers, page headers, page footers, groups, and detail sections.
Each section may have zero or more fields and lines.
This DTD snippet says it best:
<!ELEMENT report (database query parameters?
formulas? headers? footers?
page? groups? details)
>
<!ELEMENT headers section*>
<!ELEMENT footers section*>
<!ELEMENT page headers? footers?>
<!ELEMENT group headers? footers?>
<!ATTLIST group column CDATA #REQUIRED>
<!ELEMENT details section*>
<!ELEMENT section (field|line)*>
Fields
Fields display text, including database values. Fields must have a
bounds rectangle, and can have borders and formatting descriptions.
<!ELEMENT field (bounds|format|border)*>
<!ATTLIST field type (column
|formula
|parameter
|text
|special)
#REQUIRED
value CDATA #REQUIRED
>
<!ATTLIST field visible (true|false)
>
The order of the bounds, format, and border elements is not important. I
don't know how to describe that with DTD, though.
There are six types of fields that a report can display: text, database
columns, parameters, formulas, subtotals, and special fields. The ``value''
attribute determines what is displayed.
text Text fields display the value attribute verbatim.
column Column fields display the value of the database column named
by the value attribute (for example, ``member.member id''). Yes, table and
column names may contain spaces.
parameter Parameter fields display the value obtained by
substituting the parameter value whose id is contained in this field's
value attribute. (Remember, parameters are defined separately, and each one
has an id.) The underlying parameter values are retrieved from the user
each time the report is run.
formula Formula fields display the value obtained by executing
the formula whose id is contained in this field's value attribute.
(Remember, formulas are defined separately, and each one has an id.)
subtotal Subtotal fields display the running subtotal for the
numeric database column, numeric parameter field, or formula field whose id
is contained in this field's value attribute.
special Special fields display one of the following values, named in
the value attribute. These values are described in Special Values.
- report.name
- report.title
- report.author
- report.description
- report.date
- report.row
- page.number
- group.count
Lines
Lines, though not yet supported in the Design Window GUI, have been part
of the report XML since the beginning of DataVision. They are simple, with
a thickness, color, and visibility flag. Lines contain two end points.
<!ELEMENT line point point>
<!-- Color is stored as "red;green;blue;alpha" -->
<!ATTLIST line thickness CDATA
color CDATA
visible (true|false)
>
Suppression Procs
A suppression proc determines if a section should be displayed or not. See
Section Suppression
Procs for an explanation.
The ``suppression-proc'' element is contained within a ``section'' element.
It has an optional ``hide'' attribute and optionally contains a formula
containing Ruby code. When the ``hide'' attribute is ``true'', the section
is always hidden. The default value of ``hide'' is ``false''.
The formula contains Ruby code that must return a boolean (true/false)
value. For every row, the code is run to determine if the entire section
should be hidden for that row only. When the code returns true, the section
is hidden.
Here is an example ``suppression-proc'' element that uses the ``hide''
attribute to always hide the section:
<suppression-proc hide="true" />
Here is an example that uses a formula to determine when to hide a section:
<suppression-proc>
<formula id="3" name=""><![CDATA[({jobs.ID} & 1) == 0]]></formula>
</suppression-proc>
Field Formats
A field's format describes how it will be displayed. Optional format
attributes elements specify a field's font family name, alignment, font
size, color, attributes (bold, italic, or underline), wrapping (I'm not
sure this works yet), and formatting (for numeric and date fields). The
attributes bold, italic, and underline are boolean and may optionally take
the values ``true'' or ``false'' (the default is ``true'' if the attribute
is specified).
Color is stored as a string of the form ``red;green;blue;alpha''. Each
value is an integer in the range 0 - 255.
The format string (an attribute of the format element; confusing, eh?)
determines the print format of numeric and date fields. For a complete
description of the format string, see The Format Dialog.
Here is an example field that displays some text, right justified.
<field id="42" type="text" value="My Very Own String">
<bounds x="0" y="0" width="100" height="20" />
<format bold italic underline size="24" align="right" />
</field>