NetBSD-Users archive
Re: Unicode to ASCII
Silas wrote:
> Bob Proulx wrote:
> > iconv -f UTF-8 -t ASCII//TRANSLIT <filein >fileout
>
> It seems it is not possible on NetBSD 9.0 iconv :-(
It looks like //TRANSLIT is a GNU glibc extension not available in
NetBSD's version of libc. Sorry.
> $ echo 'pão' | iconv -f UTF-8 -t ASCII//TRANSLIT
> iconv: iconv_open(ASCII//TRANSLIT, UTF-8): Invalid argument
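A script that has to run on both NetBSD and GNU systems could probe for the suffix before relying on it. This is only a minimal sketch. The function name iconv_has_translit is hypothetical, and the sketch assumes only that iconv exits non-zero when iconv_open rejects the target name, as in the error above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (status 0) if this system's iconv accepts the
# GNU "//TRANSLIT" suffix, fail (status 1) otherwise.
iconv_has_translit() {
    # Empty input should be enough, assuming (as the error above suggests)
    # that the target name is rejected when the converter is opened,
    # before any data is read.
    if iconv -f UTF-8 -t ASCII//TRANSLIT </dev/null >/dev/null 2>&1; then
        return 0
    fi
    return 1
}

if iconv_has_translit; then
    echo "iconv supports //TRANSLIT"
else
    echo "iconv does not support //TRANSLIT"
fi
```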
I can use iconv to convert from one codeset to another, but it doesn't
know how to transliterate. Transliteration isn't mentioned anywhere in
its documentation.
man iconv
-t Specifies the destination codeset name as to_name.
And that is all it says. So it can change codesets:
$ echo 'pão' | iconv -f UTF-8 -t LATIN1 | od -tx1 -c
0000000 70 e3 6f 0a
p 343 o \n
I passed the output through od to show the ã as the single byte e3 in
LATIN1, avoiding the mishmash it would make here in what will be a
UTF-8 mailing. But I can show that it converts back:
$ echo 'pão' | iconv -f UTF-8 -t LATIN1 | iconv -f LATIN1 -t UTF-8
pão
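The same round trip can be checked by a script instead of by eye. This is a minimal sketch, and latin1_roundtrip_ok is a hypothetical name. It compares the hex dump of the input with the hex dump after UTF-8 to LATIN1 to UTF-8. Text containing characters that Latin-1 cannot hold should be reported as failing, whether the local iconv stops with an error or substitutes a replacement character.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the UTF-8 text in $1 survives a
# UTF-8 -> LATIN1 -> UTF-8 round trip byte for byte.
latin1_roundtrip_ok() {
    rt_orig=$(printf '%s\n' "$1" | od -An -tx1)
    rt_back=$(printf '%s\n' "$1" |
        iconv -f UTF-8 -t LATIN1 2>/dev/null |
        iconv -f LATIN1 -t UTF-8 2>/dev/null |
        od -An -tx1)
    [ -n "$rt_orig" ] && [ "$rt_orig" = "$rt_back" ]
}

# "pão" written as UTF-8 octal escapes so the script itself stays ASCII.
word=$(printf 'p\303\243o')

# Dump the LATIN1 bytes: the a-tilde becomes the single byte e3.
printf '%s\n' "$word" | iconv -f UTF-8 -t LATIN1 | od -An -tx1

if latin1_roundtrip_ok "$word"; then
    echo "round trip OK"
else
    echo "round trip changed the text"
fi
```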
> Is there something that could be installed from pkgsrc (or another
> iconv implementation) to make it work?
For transliteration it looks like you would need the GNU version of
iconv. Sorry!
https://manpages.debian.org/buster/manpages/iconv.1.en.html
-t to-encoding, --to-code=to-encoding
Use to-encoding for output characters.
If the string //IGNORE is appended to to-encoding, characters that
cannot be converted are discarded and an error is printed after
conversion.
If the string //TRANSLIT is appended to to-encoding, characters
being converted are transliterated when needed and possible. This
means that when a character cannot be represented in the target
character set, it can be approximated through one or several
similar looking characters. Characters that are outside of the
target character set and cannot be transliterated are replaced
with a question mark (?) in the output.
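Where GNU iconv can't be installed, a much cruder approximation can be built on what NetBSD's iconv already does, as shown above. The approach is to convert to LATIN1 and then use tr to map accented Latin-1 letters to their unaccented ASCII base letters. This is a sketch under stated assumptions, not a substitute for //TRANSLIT. The function name to_ascii is hypothetical, and the mapping table covers only accented letters in ISO-8859-1. Anything else left outside ASCII (ß, æ, × and so on) becomes '?', loosely mirroring the GNU behavior quoted above for characters that cannot be transliterated. Input characters outside Latin-1 are left to iconv itself, which may stop with an error or substitute a replacement character depending on the implementation.

```shell
#!/bin/sh
# Hypothetical helper: crude UTF-8 -> ASCII "transliteration" for text that
# fits in ISO-8859-1. Reads stdin, writes stdout.
to_ascii() {
    iconv -f UTF-8 -t LATIN1 |
    # Map accented Latin-1 letters (given as octal byte values) to their base
    # letters. Both sets list 57 characters, in the same order.
    LC_ALL=C tr \
        '\300-\305\307\310-\313\314-\317\320\321\322-\326\330\331-\334\335\340-\345\347\350-\353\354-\357\360\361\362-\366\370\371-\374\375\377' \
        'AAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaaceeeeiiiidnoooooouuuuyy' |
    # Replace every byte still outside 7-bit ASCII with '?'.
    LC_ALL=C tr -c '\000-\177' '[?*]'
}

# "pão" as UTF-8 octal escapes; prints "pao".
printf 'p\303\243o\n' | to_ascii
```

Usage would then be to_ascii <filein >fileout, the same shape as the iconv command at the top of the thread.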
Bob