NetBSD-Users archive
Re: sending/receiving UTF-8 characters from terminal to program
Date: Fri, 20 Jan 2023 08:55:45 +0000 (UTC)
From: RVP <rvp%SDF.ORG@localhost>
Message-ID: <4dd21c1f-f5c3-c3ba-96d8-cab73a0b433%SDF.ORG@localhost>
| Both /bin/sh and bash output UTF-8 if given Unicode code-
| points in the form `\uNNNN'. So,
I believe bash takes your current locale into account
when doing that, whereas neither /bin/sh nor /usr/bin/printf
does; they simply emit UTF-8 unconditionally. This kind of
difference is (partly) why POSIX is not including the \u (or \U)
escape sequences in $'...' quoted strings in Issue 8.
Another reason is how the end of the NNNN is detected: is it always
exactly 4 hex digits (or 8 for \U), or any number up to 4 (or
8) if followed by a non-hex char, or as many hex chars
as exist? To be portable (as input) such a string needs to
have exactly 4 (8) hex digits, and be followed by something
which is not a hex digit. The closing ' is often useful
there; it can always be followed immediately by $' to
resume quoting again (or just ' or " if those are adequate).
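
A small sketch of that quoting pattern, assuming a shell that accepts \u
inside $'...' at all (such as the bash and NetBSD /bin/sh discussed in
this thread). The variable names `word` and `line` are only illustrative,
and what actually gets printed still depends on the shell and the locale:

```shell
# Exactly four hex digits, then close the $'...' string at once, so the
# following 'a' (itself a hex digit) cannot be read as part of the code point.
word=$'n\u00E9''ant'

# Same idea when the text that follows needs $'...' escapes of its own:
# close the quote, then reopen it immediately with $'.
line=$'caf\u00E9'$'\tbar'

printf '%s\n' "$word"
printf '%s\n' "$line"
```

Writing $'n\u00E9ant' instead would leave it up to the shell whether the
code point is U+00E9 or something built from E9a.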
But that's just the input; you also need to be using a
locale with a UTF-8 character encoding to get predictable output.
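
As a rough sketch of how one might check this, assuming locale(1) and
od(1) are available: the first command reports the character encoding of
the current locale, and the second sidesteps \u entirely by spelling out
the UTF-8 bytes of U+00E9 (0xc3 0xa9) as octal escapes, which any POSIX
printf accepts regardless of locale:

```shell
# Report the character encoding of the current locale, if locale(1) exists.
command -v locale >/dev/null 2>&1 && locale charmap

# Locale-independent fallback: emit the UTF-8 bytes of U+00E9 directly
# as octal escapes (0xc3 = \303, 0xa9 = \251) and dump them in hex.
printf 'n\303\251z' | od -An -tx1
```

The od output should show the four bytes 6e c3 a9 7a, matching the
hexdump in the quoted message.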
kre
|
| $ printf 'néz' | hexdump -C
| 00000000 6e c3 a9 7a |n..z|
| 00000004
| $ printf $'n\uE9z' | hexdump -C
| 00000000 6e c3 a9 7a |n..z|
| 00000004
| $
|
| If that works, then check those UTF-8 bytes against whatever the
| terminal emulator generated from your keystrokes for the `é'
| in `néz'.
|
| -RVP
|