IETF-SSH archive


Re: UTF8





On Friday, February 18, 2005 03:41:30 PM +0100 Niels Möller <nisse%lysator.liu.se@localhost> wrote:

markus%gyger.org@localhost (Markus Gyger) writes:

Niels Moeller writes:
> Please leave the file system issues out of it for now. What's of
> primary importance are the core drafts, and those deal with
> usernames and passwords in utf8 form, *not* file names.

Are there any directions on what encoding SSH_MSG_CHANNEL_REQUEST
with the "exec" request type should use for the command string
(for "subsystem" it is specified but there seems to be no info
for "exec" in connect-23)?

I've never thought about the encoding issues for this string before.
It's not specified. Note that also the channel data is often text, and
its encoding is also not specified at all by the ssh protocols. In
practice,

  * You have to guess what character system is used by the default
    session.

  * This will probably be a pain if local and remote systems use
    different character sets (say latin-1/utf8).

  * It's easier to let the client environment adapt to the server
    environment conventions than vice versa.

But for now, users have to ensure manually that they are using the
same character set locally and remotely. If you e.g. want to log in to
a server that lives in utf8-land, and your local environment doesn't
use utf8, then you'd better use some local terminal emulator that
understands utf8.
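To make the latin-1/utf8 mismatch concrete, here is a minimal sketch of what each side sees when the two ends disagree. The sample string and the variable names are assumptions chosen for illustration; nothing here is part of the ssh protocol, which passes the bytes through unchanged.

```python
# Illustrative sketch only: the sample name and variable names are assumptions.
name = "Möller"

# The remote side lives in "utf8-land" and writes UTF-8 bytes to the channel.
utf8_bytes = name.encode("utf-8")

# A local terminal that assumes latin-1 maps each byte to one character,
# so the two-byte UTF-8 sequence for "ö" shows up as two characters.
shown_on_latin1_terminal = utf8_bytes.decode("latin-1")
print(shown_on_latin1_terminal)  # MÃ¶ller

# The reverse mismatch: latin-1 bytes are not valid UTF-8 here, so a
# UTF-8 terminal can only show a replacement character (U+FFFD) for "ö".
latin1_bytes = name.encode("latin-1")
shown_on_utf8_terminal = latin1_bytes.decode("utf-8", errors="replace")
print(shown_on_utf8_terminal)
```

In both directions the bytes on the wire are intact; only the interpretation at the terminal differs, which is why matching the local terminal to the remote convention is enough to fix it.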

It's interesting that I was just thinking about this very problem earlier this week. Ultimately, I came to the conclusion that it's probably best not to try to specify the character set of channel data, because operational experience shows that what we have now actually works fairly well, and trying to "fix" it may well create more problems than it would solve.


Fundamentally, I see ssh channels as doing two things:

(1) allowing the user to have a local terminal on a remote machine

(2) carrying non-textual protocols like sftp or X11 or arbitrary traffic
   via port-forwarding

I think it's pretty non-controversial that for (2), we don't want the ssh protocol modifying the data stream.

I'd like to suggest that case (1) is really the same thing. In this case we're not carrying _text_; we're carrying communication between a terminal and programs using that terminal, which is actually a non-textual protocol that happens to carry a lot of text.

The ssh protocol already provides a means to inform the server software of the user's terminal type; servers (or the application software they run) will use this information to determine things like which control sequences are valid for the terminal, including which character sets are available and how to switch between them. It would be pretty sad if an application switched character sets and stopped working because the ssh client and/or server didn't notice the change.

Even worse, for a shell we actually can't tell the difference between (1) and (2). For an interactive login (1) is probably the case, but scp runs a binary protocol over a shell connection, which is actually (2).
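As a rough illustration of why this matters, the sketch below imagines a "helpful" layer that treats all session channel data as latin-1 text and re-encodes it as UTF-8. The function `naive_transcode` and the sample payload are hypothetical; no ssh implementation is being described here.

```python
def naive_transcode(channel_data: bytes) -> bytes:
    """Hypothetical layer that assumes channel data is latin-1 text
    and re-encodes it as UTF-8 before passing it on."""
    return channel_data.decode("latin-1").encode("utf-8")


# A few bytes standing in for a binary protocol such as the one scp
# runs over a shell connection.
binary_payload = bytes([0x00, 0x7F, 0x80, 0xFF])
transcoded = naive_transcode(binary_payload)

# Bytes below 0x80 pass through, but each byte >= 0x80 becomes two bytes,
# so lengths and offsets in the binary stream no longer match.
print(len(binary_payload), len(transcoded))  # 4 6
print(transcoded)  # b'\x00\x7f\xc2\x80\xc3\xbf'
```

Pure ASCII traffic would survive such a layer unchanged, which is exactly what would make the corruption hard to notice until a binary transfer went through it.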


So if someone wants to specify a set of extensions that define a UTF-8 based network virtual terminal, I'd not object. But changing the default behavior of the existing session channel is probably a bad idea.

-- Jeff


