Subject: Re: XML config file
To: None <tech-userlevel@NetBSD.org>
From: Terry Moore <tmm@mcci.com>
List: tech-userlevel
Date: 07/05/2006 00:55:05
On Tue, 4 Jul 2006, Magnus Eriksson wrote:
> In that case, go with XML all the way. Convert all config files in the
> whole system to XML, have command line tools that manipulate XML the way we
> now use grep, awk, etc to manipulate text, include an XML parser library in
> the base system that anyone can use in their programs, include good
> documentation on all the above, etc.
My experience has been that UNIX config files are textual
representations of relational database tables. As such, the UNIX
text tools are very good at doing the normal kinds of database
operations that one needs to do.
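
To make the table-as-relation point concrete, here is a minimal Python sketch of a select, a join, and a projection over two colon-separated files. The rows are made-up samples in the passwd(5) and group(5) field layout, and the function names are invented for illustration; the selection part alone is roughly what `awk -F: '$7 == "/bin/sh" {print $1}'` does.

```python
# Illustrative rows only; the field layout follows passwd(5) and group(5).
PASSWD = """\
root:*:0:0:Charlie &:/root:/bin/sh
alice:*:1000:100:Alice Example:/home/alice:/bin/ksh
bob:*:1001:100:Bob Example:/home/bob:/bin/sh
"""
GROUP = """\
wheel:*:0:root
users:*:100:
"""


def rows(text, sep=":"):
    """Treat each non-empty line as one tuple of the relation."""
    return [line.split(sep) for line in text.splitlines() if line]


def select_project_join(passwd_text, group_text, shell):
    """Select users by shell, join on gid, project (login, group name)."""
    # Index the group relation by its gid column (field 3 of group(5)).
    group_by_gid = {g[2]: g[0] for g in rows(group_text)}
    return [
        (u[0], group_by_gid.get(u[3], "?"))  # projection after the join
        for u in rows(passwd_text)
        if u[6] == shell                     # selection on the shell column
    ]


result = select_project_join(PASSWD, GROUP, "/bin/sh")
print(result)
```

Every step works on whole lines and fixed column positions, which is why the line-oriented tools fit this kind of data so well.
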
XML files, by contrast, are typically textual representations of
trees, not of n-place relations. As such, the UNIX text tools are
fairly weak at processing them directly for database-like operations.
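
A small sketch of why line-oriented matching falls short on a tree: the matching line and the value one actually wants sit on different lines, so one needs a parser to relate them. The XML layout and element names below are hypothetical, chosen only for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of two user records; element names are invented.
DOC = """\
<users>
  <user>
    <name>alice</name>
    <shell>/bin/ksh</shell>
  </user>
  <user>
    <name>bob</name>
    <shell>/bin/sh</shell>
  </user>
</users>
"""


def grep(text, needle):
    """Line-oriented match, the way grep would do it."""
    return [line.strip() for line in text.splitlines() if needle in line]


def users_with_shell(text, shell):
    """Tree-aware match: <user> nodes whose <shell> child equals shell."""
    root = ET.fromstring(text)
    return [
        user.findtext("name")
        for user in root.iter("user")
        if user.findtext("shell") == shell
    ]


line_hits = grep(DOC, "/bin/sh")              # finds the field, not the user
tree_hits = users_with_shell(DOC, "/bin/sh")  # finds the user
print(line_hits)
print(tree_hits)
```

The line match returns only the `<shell>` line; recovering the user name requires walking the tree.
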
I've used things like xml_grep to work around these limitations. For
me, they still don't work very well for this purpose. They're big and
slow, they require perl, and they still want to produce XML wrappings,
which are NOT fine when one is trying to present information to
people or transform it into C code. I know about the various
transformation tools like Saxon etc -- I use them when transformation
work needs to be done repetitively in production. Then the
notational weight doesn't matter as much.
But a LOT of sysadmin, and a LOT of managing large projects, involves
answering quick questions or performing quick transformations that
are one-off (possibly replicated across a cloud of similar systems in
a network).
The ability to apply massive regular transformations to the sysadmin
files is one of the reasons I find Unix a lot easier to administer
than Windows. There are tools/methodologies based on xmlpath (e.g.,
various perl-ish things) that do the same thing, but these are very
heavy and not suited for quick command line "calculus". Furthermore,
there is no accepted standard for which tool to use for these kinds of
jobs in XML, so it seems to me that going that route will in fact lead
to further fragmentation within the Unix world.
If one wants to use the XML thing and yet not abandon one of the
great strengths of Unix, one needs a tool to convert arbitrary text
databases (with their schema -- different from the XML DTD) to and
from the XML representation.
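
A rough sketch of what such a converter might look like, assuming a very simple schema description (a record name, a separator, and an ordered list of field names). The `SCHEMA` dictionary and both function names are hypothetical, and the sample rows are made up; a real tool would also have to deal with comments, escaping, and malformed lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema for a group(5)-style table.
SCHEMA = {
    "record": "group",
    "sep": ":",
    "fields": ["name", "passwd", "gid", "members"],
}

# Made-up sample rows.
GROUP = "wheel:*:0:root\nusers:*:100:\n"


def table_to_xml(text, schema, root_tag="table"):
    """Turn each line into one record element with one child per field."""
    root = ET.Element(root_tag)
    for line in text.splitlines():
        if not line:
            continue
        record = ET.SubElement(root, schema["record"])
        values = line.split(schema["sep"])
        for field, value in zip(schema["fields"], values):
            ET.SubElement(record, field).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_table(xml_text, schema):
    """Inverse direction: one output line per record, fields in schema order."""
    root = ET.fromstring(xml_text)
    out = []
    for record in root.findall(schema["record"]):
        values = [record.findtext(f, default="") for f in schema["fields"]]
        out.append(schema["sep"].join(values) + "\n")
    return "".join(out)


xml_form = table_to_xml(GROUP, SCHEMA)
round_trip = xml_to_table(xml_form, SCHEMA)
print(xml_form)
print(round_trip, end="")
```

Note that the schema here describes column order and separators, which is a different kind of information from what an XML DTD carries.
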
If one does that, I suggest that the arbitrary text database should
actually be the normative form in most cases, and the XML
transformation is then useful for people and/or apps that need to
deal with this.
None of this is to say that XML proplists are not fine for their
purpose. But I am arguing that they may not fit all needs, since
proplists are representing trees of information, possibly embedded in
regular table-like iterations.
I might even speculate that one of the reasons that Unix shell (or
awk or ..) plus the line-oriented text tools are so efficient is that
they combine an adequate procedural framework with a conceptually
adequate relational database framework. (One can argue about
notation in any of the procedural languages. Of course a flat text
file is not an efficient database representation for huge
databases. I'm talking about notation as a tool of thought, not
about the most efficient use of compute cycles.)
To the extent that XML gets in the way of thinking about the problem
(because one has to deal with all the introduced notation, and one
loses the tools one is used to), use of XML will not increase productivity.
To summarize my experience: Tables still have their uses, and they
are different than XML files. Tables represent sequences of tuples,
each with identical structure. XML represents trees. Flat text is a
good way to represent tables, both as a notation and for performing
quick transformations. Flat text is not a great way to represent trees.
This leads to the rules that I currently follow:
If a config file is representing a table, represent it as such.
If a config file is representing a tree, consider XML.
If a config file is a table of mostly identical tuples, some needing
to contain trees, then consider a table, with embedded (one-line) XML
[or references to XML stored separately] to represent the tree part;
or use XML throughout, but then be prepared to build special
extraction tools so one can do the table-like operations that are
likely when the top-level data structure is fundamentally a table.
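
The third rule could be sketched as follows: a tab-separated table whose last column holds a one-line XML tree. The file format, device names, and option elements are all invented for the example. Table-like questions stay line-oriented, and only the tree column ever needs an XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical format: name, kind, then a one-line XML tree, tab-separated.
DEVICES = (
    "wd0\tdisk\t<opts><cache mode='wb'/><spindown secs='600'/></opts>\n"
    "wm0\tnet\t<opts><mtu>1500</mtu></opts>\n"
    "sd0\tdisk\t<opts><cache mode='wt'/></opts>\n"
)


def parse(text):
    """Yield (name, kind, tree) per line; only the last field is parsed as XML."""
    for line in text.splitlines():
        name, kind, tree = line.split("\t", 2)
        yield name, kind, ET.fromstring(tree)


# Table-like question: which rows are disks? Only the flat columns are used.
disks = [name for name, kind, _ in parse(DEVICES) if kind == "disk"]

# Tree-like question: what cache mode does each disk use?
cache_modes = {
    name: tree.find("cache").get("mode")
    for name, kind, tree in parse(DEVICES)
    if kind == "disk"
}

print(disks)
print(cache_modes)
```

Because each record stays on one line, grep, sort, and cut still work on the first two columns without knowing anything about XML.
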
--Terry