tech-userlevel archive
Re: [PATCH] replace 0xA0 to whitespace in plain text files (part 2)
On Thu, 11 Sep 2008, der Mouse wrote:
> I do not want UTF-8; if I want to use Unicode, it seems
> much saner to me to use streams of hexdecets rather than encoding
> hexdecets into octet streams with a funky variable-length encoding.
Unicode is a 21-bit character set (or 31-bit in some old versions).
The 16-bit encoding (UTF-16) is just as funky and variable-length as
the 8-bit encoding (UTF-8): any code point above U+FFFF needs a
surrogate pair of two hexdecets.
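A minimal Python sketch of the UTF-16 point, assuming standard UTF-16 over the code points U+0000..U+10FFFF. `utf16_units` is a hypothetical helper, not part of any NetBSD tool. It encodes one code point as 16-bit units, so anything outside the Basic Multilingual Plane comes out as two units.

```python
def utf16_units(cp: int) -> list[int]:
    """Hypothetical helper: encode one Unicode scalar value as UTF-16 code units."""
    # Surrogate code points (U+D800..U+DFFF) are reserved for the pairs below.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        return [cp]  # Basic Multilingual Plane: a single 16-bit unit
    v = cp - 0x10000  # 20 bits left to split across two units
    high = 0xD800 | (v >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 | (v & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return [high, low]


for cp in (0x41, 0xA0, 0x1F600):
    units = utf16_units(cp)
    # Cross-check against Python's own big-endian UTF-16 codec.
    assert chr(cp).encode("utf-16-be") == b"".join(u.to_bytes(2, "big") for u in units)
    print(f"U+{cp:04X}: {len(units)} unit(s) {[hex(u) for u in units]}")
```

U+0041 and U+00A0 each fit in one hexdecet, while U+1F600 becomes the pair 0xD83D 0xDE00. A reader of a hexdecet stream still has to handle variable-length sequences, just as with octets.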
> >> Not that I care so much, but is NetBSD supposed to have its files
> >> in Latin1? Is that supposed to be the source character set, or
> >> what?
> > I think that simply is the practical reality.
> I agree.
I think the default should be either ASCII or UTF-8. Other encodings
are too ambiguous. For example, when you see an octet outside the ASCII
range that is not part of a valid UTF-8 sequence, do you guess that it's
iso-8859-1, iso-8859-2, iso-8859-whatever, or something else entirely?
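The ambiguity can be made concrete with a small Python sketch, assuming Python 3's built-in codecs. `classify` is a hypothetical helper. It can recognise ASCII and well-formed UTF-8, but for any other octet stream it can only report that the encoding is unknown, because the same octet means a different character in each ISO 8859 part.

```python
def classify(data: bytes) -> str:
    """Hypothetical helper: report 'ascii', 'utf-8', or 'unknown 8-bit'."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict mode raises on any ill-formed sequence
    except UnicodeDecodeError:
        return "unknown 8-bit"
    return "utf-8"


print(classify(b"plain text"))
print(classify("caf\u00e9".encode("utf-8")))
print(classify(b"\xa1"))  # a lone 0xA1 cannot start a UTF-8 sequence

# The same octet decodes to a different character under each ISO 8859 part.
for enc in ("iso-8859-1", "iso-8859-2", "iso-8859-5"):
    ch = b"\xa1".decode(enc)
    print(enc, f"U+{ord(ch):04X}")
```

The check only works in one direction. Some Latin-1 text is also well-formed UTF-8 (the octets 0xC3 0xA9 read as "Ã©" in Latin-1 but as "é" in UTF-8), so a stream that passes is probably, not certainly, UTF-8.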
> I think the default should be Latin-1, except that I also think tools
> such as wc should, by default, not complain about invalid Latin-1,
> instead sticking with the traditional behaviour of operating on bytes
> rather than characters.
I was talking about the default encoding used for source code and text
files supplied with the OS. How tools should behave is a different
question, but I share your concerns.
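The distinction between counting bytes and counting characters, in the spirit of `wc -c` versus `wc -m`, can be sketched in Python. `wc_counts` is a hypothetical helper, not NetBSD's `wc`. It assumes UTF-8 as the character encoding. When the input is not valid in that encoding, it falls back to the traditional byte-oriented count instead of reporting an error.

```python
def wc_counts(data: bytes, encoding: str = "utf-8") -> tuple[int, int]:
    """Hypothetical helper: return (bytes, characters) for a buffer.

    If data is not valid in `encoding`, characters are counted as bytes
    instead of raising an error (the traditional byte-oriented behaviour).
    """
    nbytes = len(data)
    try:
        nchars = len(data.decode(encoding))
    except UnicodeDecodeError:
        nchars = nbytes  # fall back to one "character" per octet
    return nbytes, nchars


print(wc_counts("na\u00efve".encode("utf-8")))  # valid UTF-8: chars < bytes
print(wc_counts(b"na\xefve"))                   # Latin-1 "naïve": invalid UTF-8
```

The byte count is always defined. The character count depends on an assumed encoding, which is why the fallback avoids complaining about input that was never meant to be UTF-8.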
--apb (Alan Barrett)