Subject: Re: misc/32221 (NetBSD's web documentation is not valid HTML)
To: mishka@netbsd.org, netbsd-bugs@netbsd.org
From: mishka@netbsd.org
List: netbsd-bugs
Date: 03/22/2006 15:47:22
Synopsis: NetBSD's web documentation is not valid HTML
State-Changed-From-To: open->analyzed
State-Changed-By: mishka@netbsd.org
State-Changed-When: Wed, 22 Mar 2006 15:47:21 +0000
State-Changed-Why:
Working on this PR, I found that the DocBook and Website XSLT stylesheets are
very likely to produce valid HTML output. Analysing our HTML files, I found
that most pages are invalid for the following reasons:
1) "Stray" XML namespace declarations. Note that these declarations
also incorrectly affect the output of some other tags, e.g. <br></br>
is produced where it must be just <br>.
2) Missing DOCTYPE declarations.
3) Possibly an incorrect HTML schema.
4) Possibly some other reasons.
5) CSS validity problems.
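As a rough first step toward spotting problems (1) and (2) automatically, a
heuristic check might look like the Python sketch below. It rests on
assumptions: the regular expressions and the name `find_html_problems` are
hypothetical. It only looks for xmlns attributes, <br></br> pairs and a leading
DOCTYPE. It is no substitute for a real validator such as the one used by w3.org.

```python
import re

# Heuristic patterns for problems (1) and (2) above. These are illustrative
# regular expressions only; they do not parse or validate HTML.
XMLNS_ATTR = re.compile(r'\sxmlns(?::\w+)?\s*=\s*"[^"]*"', re.IGNORECASE)
EMPTY_BR_PAIR = re.compile(r'<br\s*>\s*</br\s*>', re.IGNORECASE)
DOCTYPE = re.compile(r'\s*<!DOCTYPE\s+html', re.IGNORECASE)


def find_html_problems(text: str) -> list[str]:
    """Return human-readable problems found in the text of one HTML page."""
    problems = []
    # re.match anchors at the start, so the DOCTYPE must come first
    # (leading whitespace is tolerated).
    if not DOCTYPE.match(text):
        problems.append("missing DOCTYPE declaration")
    for m in XMLNS_ATTR.finditer(text):
        problems.append(f"stray XML namespace declaration: {m.group().strip()}")
    for _ in EMPTY_BR_PAIR.finditer(text):
        problems.append("XML-style <br></br> (HTML 4.01 wants just <br>)")
    return problems
```

A hypothetical "make htmllint" target could call this once per generated page
and fail if any list comes back non-empty.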
I hope I've fixed (1) and (2). Most of our XML files can now be used as
the source for valid HTML pages. For example, the NetBSD Guide is now a valid
HTML 4.01 Transitional document.
Regarding (3) and (4), IMHO we should:
1) have a way to check the validity of all HTML pages (some sort
of "make htmllint"). The validator engine used by w3.org is available
for download from their site. I wonder whether it is packaged in
pkgsrc, so that we could include it in our toolchain.
2) since valid DocBook/Website XML documents should result in valid
HTML, we should have a way to validate our XML pages (as with
HTML, some sort of "make xmllint"). For now, you can try validating
your own XML files as follows:
a) set {XML,SGML}_CATALOG_FILES to "$HTDOCS/share/xml/catalog-common.xml
$HTDOCS/share/xml/catalog.xml $LOCALBASE/share/xml/catalog" (space
separated list).
b) run xmllint(1) as follows:
xmllint --noout --nonet --xinclude --catalogs --valid FILE_NAME
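Steps a) and b) could be wrapped into a "make xmllint"-style helper. The Python
sketch below assumes that HTDOCS and LOCALBASE are passed in explicitly. The
names `xmllint_invocation` and `validate` are hypothetical. The helper reuses
exactly the catalog list and xmllint flags given above.

```python
import os
import subprocess

# The flags from step b) above.
XMLLINT_FLAGS = ["--noout", "--nonet", "--xinclude", "--catalogs", "--valid"]


def xmllint_invocation(htdocs: str, localbase: str, filename: str):
    """Return (argv, env) for validating one XML file with xmllint(1)."""
    # Space-separated catalog list from step a).
    catalogs = " ".join([
        f"{htdocs}/share/xml/catalog-common.xml",
        f"{htdocs}/share/xml/catalog.xml",
        f"{localbase}/share/xml/catalog",
    ])
    env = dict(os.environ)
    # Step a) sets both variables to the same list.
    env["XML_CATALOG_FILES"] = catalogs
    env["SGML_CATALOG_FILES"] = catalogs
    return ["xmllint", *XMLLINT_FLAGS, filename], env


def validate(htdocs: str, localbase: str, filenames: list[str]) -> bool:
    """Run xmllint on every file and return True only if all of them pass."""
    ok = True
    for name in filenames:
        argv, env = xmllint_invocation(htdocs, localbase, name)
        # xmllint reports problems on stderr and exits non-zero on errors.
        if subprocess.run(argv, env=env).returncode != 0:
            ok = False
    return ok
```

Keeping command construction separate from execution makes it easy to print the
exact command when a page fails.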
The second part is really broken, because we're using Simplified DocBook
as the backend for Website. It doesn't have <sect[1-6]> and many other
widely used elements (I don't know exactly which, but I'm sure :-). Because
all this would add weight to our toolchain and bind us even more to
XML/DocBook, we must talk with <hrs> about our website again.
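To estimate how many of our pages use elements that Simplified DocBook lacks,
one could count the <sect1>...<sect6> opening tags in each XML source. The
sketch below is a crude assumption-based scan. `count_sect_elements` is a
hypothetical name. It uses a regular expression on raw text, so it does not
handle comments, CDATA or XIncluded files.

```python
import re
from collections import Counter

# Matches opening tags <sect1> through <sect6>, with or without attributes.
# Closing tags such as </sect2> are not matched.
SECT_TAG = re.compile(r"<(sect[1-6])(?=[\s>/])")


def count_sect_elements(xml_text: str) -> Counter:
    """Count opening <sectN> tags, which Simplified DocBook does not have."""
    return Counter(m.group(1) for m in SECT_TAG.finditer(xml_text))
```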
(5) can be eliminated completely, and very easily. All the problems are listed here:
http://jigsaw.w3.org/css-validator/validator?uri=http%3A%2F%2Fwww.netbsd.org%2FNetBSD.css&usermedium=all
<grant> and <keihan> are responsible for these errors.
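If we want to check other stylesheets the same way, the validator link could be
generated rather than typed by hand. The sketch below uses a hypothetical
function name, `css_validator_url`. It only uses the `uri` and `usermedium`
query parameters that appear in the URL above.

```python
from urllib.parse import urlencode

# Base address of the W3C CSS validator, taken from the URL above.
CSS_VALIDATOR = "http://jigsaw.w3.org/css-validator/validator"


def css_validator_url(stylesheet_uri: str, usermedium: str = "all") -> str:
    """Build a W3C CSS validator link for the given stylesheet URI."""
    # urlencode percent-encodes ':' and '/' in the stylesheet URI.
    query = urlencode({"uri": stylesheet_uri, "usermedium": usermedium})
    return f"{CSS_VALIDATOR}?{query}"
```

For http://www.netbsd.org/NetBSD.css, this reproduces the link given above.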