Subject: Re: WWW query engine bug (was Query-PR)
To: None <leonard@dstc.edu.au>
From: None <Chris_G_Demetriou@NIAGARA.NECTAR.CS.CMU.EDU>
List: current-users
Date: 02/21/1996 03:02:11
> > But the question is: how can you tell 'intentional' html from
> > something that just looks like HTML? (and, what impact does that have
> > on the software used to spit out PRs?)
>
> Force people to insert <HTML>...</HTML> around their text if they
> don't want tags to be converted to &amp;, &lt; and &gt;.
You didn't answer the second question: what impact does that have?
I don't think that's workable, for several reasons:
(1) PRs are sent as e-mail messages, and for the most part
look like e-mail messages. How can you put the <HTML> tag
before your headers, so that HTML in the headers is
handled correctly? (e.g. in an X-Organization: header...)
(2) the PR machinery appears to mangle some submissions
in ways that are not obvious to me, e.g. reordering
some headers. How are people supposed to set
things up so that they work right?
(3) if the user does a 'long-range' <html>, perhaps one
which is never closed, how does the scanner deal with
that? Some of the PRs are gigantic, and I think it's
unreasonable to have to parse a PR completely
before processing any of it. I wanted to write it
as a filter, which more or less rules out dealing with
this.
(4) this still doesn't solve the problem! The user can
_still_ supply bad HTML! (It's for this reason that
I do basic sanity checks on the PRs... however,
I can't catch things like hanging italicization, because
of (3)...)
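
To make (3) concrete, here is a rough sketch in Python. It is for
illustration only and is not the actual query engine code; the function
names and the rule that <HTML> and </HTML> sit on lines of their own are
assumptions. The first function is the plain filter: it escapes each line
as it arrives and keeps no state. The second adds the proposed opt-in
region, and shows that once a line-at-a-time filter sees an <HTML> that
is never closed, everything after it goes out unescaped.

```python
import html
from typing import Iterable, Iterator


def escape_lines(lines: Iterable[str]) -> Iterator[str]:
    """Escape &, < and > one line at a time, with no state between lines."""
    for line in lines:
        # quote=False leaves quote characters alone; only &, < and > change.
        yield html.escape(line, quote=False)


def escape_outside_html(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical opt-in variant: lines between <HTML> and </HTML> pass raw.

    Assumes the markers appear alone on a line (an assumption made for this
    sketch). A streaming filter cannot look ahead, so an unclosed <HTML>
    leaves the rest of the input unescaped.
    """
    raw = False
    for line in lines:
        marker = line.strip().lower()
        if marker == "<html>":
            raw = True
            continue
        if marker == "</html>":
            raw = False
            continue
        yield line if raw else html.escape(line, quote=False)
```

Either one can be run as a Unix filter with something like
sys.stdout.writelines(escape_lines(sys.stdin)), so a gigantic PR is never
held in memory. Neither one validates the raw HTML, which is the
problem in (4).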
cgd