flexml - generate validating XML processor and applications from DTD
flexml [-ASHDvdnLXV] [-sskel] [-ppubid] [-uuri] [-rrootags] [-aactions] name[.dtd]
Flexml reads name.dtd which must be a DTD (Document Type Definition) describing the format of XML (Extensible Markup Language) documents, and produces a ``validating'' XML processor with an interface to support XML applications. Proper applications can be generated optionally from special ``action files'', either for linking or textual combination with the processor.
The generated processor will only validate documents that conform strictly to the DTD, without extending it. More precisely, in practice we restrict XML rule [28] to
[28r] doctypedecl ::= '<!DOCTYPE' S Name S ExternalID S? '>'
where the ExternalID denotes the DTD used. (One might say, in fact, that flexml implements ``non-extensible'' markup. :)
The generated processor is a flex(1) scanner, by default named name.l, with a corresponding C header file name.h for separate compilation of generated applications. Optionally, flexml takes an actions file with per-element actions and produces a C file with element functions for an XML application, with entry points called from the XML processor. It can also fold the XML application into the XML processor to make stand-alone XML applications, but this prevents sharing of the processor between applications.
In OPTIONS we list the possible options, in ACTION FILE FORMAT we explain how to write applications, in COMPILATION we explain how to compile produced processors and applications into executables, and in BUGS we list the current limitations of the system before giving standard references.
Flexml takes the following options.
yylineno. (This is off by default as the performance overhead is significant.)
PUBLIC with the identifier pubid instead of SYSTEM, the default.
DOCTYPE header, to the specified uri (the default is the DTD name).
Action files, passed to the -a option, are XML documents conforming to the DTD flexml-act.dtd which is the following:
<!ELEMENT actions ((top|start|end)*,main?)>
<!ENTITY % C-code "(#PCDATA)">
<!ELEMENT top %C-code;>
<!ELEMENT start %C-code;>
<!ATTLIST start tag NMTOKEN #REQUIRED>
<!ELEMENT end %C-code;>
<!ATTLIST end tag NMTOKEN #REQUIRED>
<!ELEMENT main %C-code;>
The elements should be used as follows:
top
start
``tag'' attribute. The ``%C-code;'' component should be C code suitable for inclusion in a C block (i.e., within {...} so it may contain local variables); furthermore the following extensions are available:
{attribute}: Can be used to access the value of the attribute as set with attribute=value in the start tag.
In C, {attribute} will be interpreted depending on the declaration of the attribute. If the attribute is declared as an enumerated type like
<!ATTLIST attrib (alt1 | alt2 |...) ...>
then the C attribute value is of an enumerated type with the elements written {attribute=alt1}, {attribute=alt2}, etc.; furthermore an unset attribute has the ``value'' {!attribute}. If the attribute is not an enumeration then {attribute} is a null-terminated C string (of type char*) and {!attribute} is NULL.
end
``tag'' attribute; also here the ``%C-code;'' component should be C code suitable for inclusion in a C block. In case the element has ``Mixed'' contents, i.e., was declared to permit #PCDATA, then the following variable is available:
{#PCDATA}: Contains the text (#PCDATA) of the element as a null-terminated C string (of type char*). In case the Mixed contents element actually mixed text and child elements then pcdata contains the plain concatenation of the text fragments as one string.
main
The ``main'' element can contain the C main function of the XML application. Normally the main function should include (at least) one call of the XML processor:
yylex(): Invokes the XML processor produced by flex(1) on the XML document found on the standard input (actually the yyin file handle: see the manual for flex(1) for information on how to change this as well as the name yylex).
If no main action is provided then the following is used:
int main() { exit(yylex()); }
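To read from a named file rather than standard input, a main action can point yyin at an open file before calling yylex(). The following is only a minimal sketch under stated assumptions: it relies on the standard flex globals yyin and yylex(), process_file is a hypothetical helper name, and the stand-in definitions of yyin and yylex() exist solely so the fragment compiles by itself. In a real build they would be replaced by extern declarations and the flex-generated scanner.

```c
#include <stdio.h>
#include <stdlib.h>

/* ASSUMPTION: in a real build these two come from the flex-generated
 * scanner and would be declared here as
 *     extern FILE *yyin;
 *     extern int yylex(void);
 * The stand-ins below only make this sketch compile on its own. */
FILE *yyin;

int yylex(void)
{
    /* Stand-in: read yyin to the end and pretend the document was fine. */
    int c;
    while ((c = getc(yyin)) != EOF)
        ;
    return 0;
}

/* Hypothetical helper: run the processor on a named file instead of
 * standard input. Returns the yylex() status, or EXIT_FAILURE if the
 * file cannot be opened. */
int process_file(const char *path)
{
    int status;

    yyin = fopen(path, "r");
    if (yyin == NULL) {
        perror(path);
        return EXIT_FAILURE;
    }
    status = yylex();
    fclose(yyin);
    return status;
}
```

A main element could then call process_file(argv[1]) when a file name is given and fall back to a plain yylex() call otherwise.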
It is advisable to use XML <![CDATA[ ... ]]> sections for the C code to make sure that all characters are properly passed to the output file.
Finally note that Flexml handles empty elements <tag/> as equivalent to <tag></tag>.
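As a rough mental model of the attribute rules for start actions described above, the following plain C sketch mimics the two cases: an enumerated attribute with a distinct unset value, and a string attribute that is NULL when unset. The element name note, the attributes kind and title, and every C identifier here are hypothetical; the code flexml actually generates may look quite different. In an action file the same tests would presumably be written with the brace notation, for example comparing {kind} with {kind=warn}, or {title} with {!title}.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an attribute declared as (info | warn). */
enum note_kind {
    NOTE_KIND_UNSET,   /* plays the role of {!kind} */
    NOTE_KIND_INFO,    /* plays the role of {kind=info} */
    NOTE_KIND_WARN     /* plays the role of {kind=warn} */
};

/* Map the enumerated value to a label; an unset value gets a fallback. */
const char *kind_label(enum note_kind kind)
{
    switch (kind) {
    case NOTE_KIND_INFO: return "info";
    case NOTE_KIND_WARN: return "warn";
    default:             return "(unset)";
    }
}

/* A non-enumerated attribute behaves like a char* that is NULL when unset. */
const char *title_or_default(const char *title)
{
    return title != NULL ? title : "(untitled)";
}

/* What a start action for the hypothetical note element might do. */
void report_note(enum note_kind kind, const char *title)
{
    printf("note kind=%s title=%s\n", kind_label(kind), title_or_default(title));
}
```

The point of the sketch is that every attribute access in an action should handle the unset case explicitly, since an unset string attribute is NULL and cannot be passed to printf-style functions safely.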
The following make(1) file fragment shows how one can compile flexml-generated programs:
# Programs.
FLEXML = flexml -v

# Generate linkable XML processor with header for application.
%.l %.h: %.dtd
	$(FLEXML) $<

# Generate C source from flex scanner.
%.c: %.l
	$(FLEX) -Bs -o"$@" "$<"

# Generate XML application C source to link with processor.
# Note: The dependency must be of the form "appl.c: appl.act proc.dtd".
%.c: %.act
	$(FLEXML) -D -a $^

# Direct generation of stand-alone XML processor+application.
# Note: The dependency must be of the form "appl.l: appl.act proc.dtd".
%.l: %.act
	$(FLEXML) -A -a $^
The present version of flexml is to be considered in ``early beta'' state, so bugs should be expected (and the author would like to hear about them). Here are some known restrictions that we hope to overcome in the future:
ID type attributes are not validated for uniqueness; IDREF and IDREFS attributes are not validated for existence.
ENTITY and ENTITIES attribute types are not supported.
NOTATION declarations are not supported.
xml:-attributes are treated like any other attributes; in particular xml:space should be supported.
pcdata. It should not.
pcdata of the parent.
flex(1), Extensible Markup Language (XML) 1.0 (W3C Recommendation REC-xml-1998-0210).
Flexml was written by Kristoffer Høgsbro Rose, <krisrose@debian.org>.
The program is Copyright (c) 1999 Kristoffer Rose (all rights reserved) and distributed under the GNU General Public License (GPL, also known as ``copyleft'', which clarifies that the author provides absolutely no warranty for flexml and ensures that flexml is and will remain available for all uses, even commercial).
I am grateful to NTSys (France) for supporting the development of flexml. Finally, I extend my sincere thanks to Jef Poskanzer, Vern Paxson, and the rest of the flex maintainers and GNU developers for a great tool.