NetBSD-Users archive
Re: Ideas for stripping tags from document
Hi,
On Sat, Jan 16, 2021 at 01:45:45PM -0500, Todd Gruhn wrote:
> I have a large document (18,000L). It is full of tags such as <93>
> ,<94> , <95> .
>
> If I view the doc in a PERL editor I see \x{93} , \x{94} , \x{95} ...
Ahem - are you sure (have you looked at a few of them with hexdump -C)?
Your perl editor displays \x{93}, your other editor <93>, in reality
they might be just one octet with that value.
Sounds like Windows-1252, where 0x93, 0x94 and 0x95 are “, ” and •, respectively.
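
If hexdump output is tedious to scan across 18,000 lines, the same check can be sketched in Python. This is only an illustration: the names SUSPECTS, count_suspect_bytes and report are made up for this example, and it assumes the file fits in memory.

```python
from collections import Counter

# The three byte values from the thread and what they mean in Windows-1252.
SUSPECTS = {
    0x93: "left double quotation mark",
    0x94: "right double quotation mark",
    0x95: "bullet",
}


def count_suspect_bytes(data: bytes) -> dict:
    """Return how many times each suspect byte value occurs in data."""
    counts = Counter(data)  # iterating over bytes yields integers 0..255
    return {value: counts[value] for value in SUSPECTS}


def report(path: str) -> None:
    """Print the per-byte counts for the file at path (hypothetical helper)."""
    with open(path, "rb") as fh:  # binary mode: no decoding takes place
        raw = fh.read()
    for value, n in count_suspect_bytes(raw).items():
        print(f"0x{value:02x} ({SUSPECTS[value]}): {n}")
```

Calling report("foo") on the document should show non-zero counts if the "tags" are really single octets. If every count is zero, the file probably contains the literal characters <93> and so on, and a charset conversion would not help.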
> Is there a pkg or command to strip these tags and leave the text ?
In that case I'd try
iconv -f windows-1252 -t utf-8 < foo > bar
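
Should iconv not be at hand, roughly the same conversion can be sketched in Python. The function names here are invented for the example, and it assumes the whole file really is Windows-1252. Python's cp1252 codec rejects the few byte values that encoding leaves undefined (0x81, for instance), so a decode error would suggest the guess about the encoding is wrong.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Decode bytes as Windows-1252 and re-encode them as UTF-8."""
    return data.decode("cp1252").encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes and write the UTF-8 version to dst."""
    with open(src, "rb") as fin:
        raw = fin.read()
    with open(dst, "wb") as fout:
        fout.write(cp1252_to_utf8(raw))
```

With the file names from the command above, that would be convert_file("foo", "bar"). The quotes and bullets are kept as proper UTF-8 characters rather than stripped, which is usually what is wanted.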
Regards,
-is