NetBSD-Users archive
Re: Unicode to ASCII
Todd Gruhn wrote:
> I extracted the "text" from a large PDF using a NetBSD prog called
> pdftotext(1).
pdftotext is really awesome. I find that "pdftotext -layout" does a truly
excellent job with most of the PDF files I have to deal with from banks and
the like.
> I got the desired ASCII text, but it has many occurrences of the sequence
> \x{80}\x{9c} ... \x{80}\x{9d}
Do you know what charset that is in natively?
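A guess, not something the original poster confirmed: the \x{80}\x{9c} and \x{80}\x{9d} pairs match the last two bytes of the UTF-8 encodings of the curly double quotes U+201C and U+201D (E2 80 9C and E2 80 9D), with the leading E2 byte shown some other way by whatever displayed them. The minimal Python sketch below prints those encodings so they can be compared with the file, and adds a crude heuristic (the hypothetical guess_charset helper) that assumes the text is either UTF-8 or CP1252.

```python
def guess_charset(data: bytes) -> str:
    """Crude heuristic: UTF-8 if the bytes decode cleanly, otherwise assume CP1252."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"


# Show the curly double quotes as they appear once encoded in UTF-8.
for ch in ("\u201c", "\u201d"):
    print(f"U+{ord(ch):04X} -> {ch.encode('utf-8').hex(' ')}")
```

Plain ASCII also decodes as UTF-8, and a CP1252 file can occasionally happen to be valid UTF-8, so treat the result as a hint rather than an answer.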
> Is there a nice and universal utility that can convert these to ASCII chars?
> Someone mentioned EMACS... What about in pkgsrc?
I'll be honest: I did not check, but on another system I routinely use
"iconv" for this kind of thing. I will cross my fingers and hope it is
available in pkgsrc.
iconv -f UTF-8 -t ASCII//TRANSLIT <filein >fileout
That assumes UTF-8 in and ASCII out; depending on where the text came from,
you may need a different code set or code page instead, like this:
iconv -f CP1252 -t UTF-8 <filein >fileout
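The "//TRANSLIT" suffix is an extension (GNU libiconv and glibc support it) rather than part of POSIX, so not every iconv will accept it. As a fallback, here is a minimal Python sketch of the same idea: decode the input from a given encoding (UTF-8 or CP1252, for instance), replace common typographic characters with ASCII stand-ins, and strip accents via Unicode NFKD decomposition. The script name to_ascii.py and the PUNCT table are hypothetical and cover only a handful of characters; anything with no ASCII equivalent becomes "?".

```python
import sys
import unicodedata

# Hypothetical table of typographic characters and their ASCII stand-ins.
PUNCT = {
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes (E2 80 9C / E2 80 9D in UTF-8)
    "\u2013": "-", "\u2014": "--",  # en dash, em dash
    "\u2026": "...",                # horizontal ellipsis
    "\u00a0": " ",                  # no-break space
}


def to_ascii(data: bytes, encoding: str = "utf-8") -> str:
    """Decode `data` from `encoding` and transliterate the result to ASCII."""
    text = data.decode(encoding, errors="replace")  # undecodable bytes become U+FFFD
    out = []
    for ch in text:
        if ch in PUNCT:
            out.append(PUNCT[ch])
            continue
        # NFKD splits accented letters into base letter + combining mark;
        # keep whatever ASCII survives, otherwise fall back to "?".
        ascii_part = unicodedata.normalize("NFKD", ch).encode("ascii", "ignore").decode("ascii")
        out.append(ascii_part or "?")
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 to_ascii.py FILE [ENCODING] > fileout  (ENCODING defaults to utf-8)
    enc = sys.argv[2] if len(sys.argv) > 2 else "utf-8"
    with open(sys.argv[1], "rb") as f:
        sys.stdout.write(to_ascii(f.read(), enc))
```

For example, "python3 to_ascii.py filein > fileout" for UTF-8 input, or "python3 to_ascii.py filein cp1252 > fileout" for CP1252 input. Extending PUNCT is the place to handle any other characters that turn up in the extracted text.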
Even if incomplete, hopefully this is still useful.
Bob