Subject: Re: UTF8 and nroff manpages
To: None <tech-misc@NetBSD.org>
From: Alan Barrett <apb@cequrux.com>
List: tech-misc
Date: 09/22/2005 17:23:38
On Thu, 22 Sep 2005, David Brownlee wrote:
> If locale is en_US.UTF-8 then manpages do not display '-' as
> '-', but instead as the hex sequence e2 88 92. You can easily
> see this by comparing the output of
> nroff -mandoc /usr/share/man/man1/ls.1
> with LC_CTYPE=en_US.UTF-8 and undefined.
>
> I can understand for text formatting that e2 88 92 may be a
> 'better' Unicode entity for a hyphen, but for a manpage it's
> very much not.
Actually, that's the UTF-8 encoding of the Unicode character U+2212 (minus
sign). It's not a hyphen at all. Hyphen would be U+2010, and the plain ASCII
'-' is U+002D (hyphen-minus).
$ printf "0xe2, 0x88, 0x92" | recode utf8/x..dump
UCS2 Mne Description
2212 -2 minus sign
$ printf "-" | recode ascii..dump
UCS2 Mne Description
002D - hyphen-minus
$ printf "0x2010" | recode ucs2/x2..dump
UCS2 Mne Description
2010 -1 hyphen
$ printf "0x2010" | recode ucs2/x2..utf8/x
0xE2, 0x80, 0x90
$
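
If recode is not installed, the byte sequences above can be cross-checked
with a rough sketch like the one below. It assumes only a POSIX shell,
printf and od. The helper name utf8_hex is made up for illustration, and
the octal escapes are just the byte values from the recode output written
in octal (0xe2 = \342, and so on).

```shell
# utf8_hex: hypothetical helper, not part of any standard tool.
# Reads bytes on stdin and prints them as space-separated lowercase hex.
utf8_hex() {
    # The unquoted $(...) lets the shell collapse od's padding and
    # line breaks into single spaces.
    echo $(od -An -tx1)
}

# U+2212 MINUS SIGN, the sequence seen in the formatted manpage
printf '\342\210\222' | utf8_hex        # e2 88 92

# U+2010 HYPHEN
printf '\342\200\220' | utf8_hex        # e2 80 90

# U+002D HYPHEN-MINUS, the plain ASCII '-'
printf '%s' '-' | utf8_hex              # 2d
```

Piping the nroff output through the same helper (or through od directly)
should show which of the three actually ends up on the terminal.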
> Does anyone have any ideas?
Sorry, no.
--apb (Alan Barrett)