How to treat utf8 text with overlong utf8 sequences?

To: ietf-ssh%netbsd.org@localhost
Subject: How to treat utf8 text with overlong utf8 sequences?
From: nisse%lysator.liu.se@localhost (Niels Möller)
Date: 02 Oct 2004 15:18:19 +0200

What do you think about overlong ("non-minimum form") utf8 sequences
being sent in the various utf8 strings in the protocol?

It matters most for utf8 strings that are displayed to the user,
e.g. the prompt strings in SSH_MSG_USERAUTH_INFO_REQUEST, where the
specification recommends control character filtering.

I prefer doing the control filtering before converting the data to the
local character set, because it's pretty well defined which
ucs4/unicode values are control characters (namely u0000-u001f,
u007f-u009f).
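
As a minimal sketch of filtering on code points before any charset
conversion, using exactly the two ranges given above: the function names
and the "?" replacement are illustrative assumptions, not something the
ssh drafts prescribe, and a real client may well want to let newline or
tab through.

```python
def is_control(cp: int) -> bool:
    """True for u0000-u001f and u007f-u009f, the ranges named above."""
    return 0x00 <= cp <= 0x1F or 0x7F <= cp <= 0x9F


def filter_controls(text: str, replacement: str = "?") -> str:
    """Replace every control character in already-decoded text.

    The replacement character is an arbitrary choice for this sketch.
    """
    return "".join(replacement if is_control(ord(ch)) else ch for ch in text)


if __name__ == "__main__":
    # An ESC hidden in a prompt string is replaced before display.
    print(filter_controls("Password\x1b[2J: "))
```

Because the test is on code points, it only catches ESC if the decoder
has already mapped every encoding of ESC to the value 0x1b, or rejected
the ones it does not accept.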

If we allow overlong control character sequences, then e.g. ESC can
be represented in utf8 as

  0x1b, (0xc0 0x9b), (0xe0 0x80 0x9b) ... or (0xfc 0x80 0x80 0x80 0x80 0x9b)
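
The byte sequences above can be reproduced with a short sketch that
forces a code point into an n-byte sequence of the RFC 2279 layout (up
to six bytes), whether or not n is the minimum length. The helper name
encode_with_length is hypothetical and exists only for this
illustration.

```python
def encode_with_length(cp: int, n: int) -> bytes:
    """Encode cp as an n-byte RFC 2279 style sequence, minimal or not."""
    if n == 1:
        if cp > 0x7F:
            raise ValueError("code point does not fit in one byte")
        return bytes([cp])
    if not 2 <= n <= 6:
        raise ValueError("n must be between 1 and 6")
    # An n-byte sequence carries 7 - n bits in the lead byte and
    # 6 bits in each of the n - 1 continuation bytes.
    payload_bits = (7 - n) + 6 * (n - 1)
    if cp >> payload_bits:
        raise ValueError("code point does not fit in n bytes")
    lead_marker = (0xFF << (8 - n)) & 0xFF  # n one bits, then a zero bit
    out = [lead_marker | (cp >> (6 * (n - 1)))]
    for i in range(n - 2, -1, -1):
        out.append(0x80 | ((cp >> (6 * i)) & 0x3F))
    return bytes(out)


if __name__ == "__main__":
    for n in range(1, 7):
        encoded = encode_with_length(0x1B, n)
        print(n, " ".join(f"0x{b:02x}" for b in encoded))
```

Only the one-byte form is the minimal encoding of ESC; the other five
are the overlong forms in question.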

Filtering gets easier if I can check for overlong sequences in the
utf8 string at an early stage, and treat any that I find as a protocol
error.
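
One way such an early check could look, as a sketch only: decode the
string to code points and compare each value against the smallest value
that genuinely needs a sequence of that length. ProtocolError is a
hypothetical stand-in for whatever the implementation does on a
protocol error, and the six-byte limit follows the RFC 2279 layout.

```python
# Smallest code point that genuinely needs an n-byte sequence (index = n).
MIN_FOR_LENGTH = [None, 0x0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000]


class ProtocolError(Exception):
    """Hypothetical exception standing in for a protocol error."""


def sequence_length(lead: int) -> int:
    """Length announced by a lead byte, or 0 if it cannot start a sequence."""
    if lead < 0x80:
        return 1
    for n in range(2, 7):
        # An n-byte lead byte starts with n one bits followed by a zero bit.
        if lead >> (7 - n) == (1 << (n + 1)) - 2:
            return n
    return 0


def decode_strict(data: bytes) -> list:
    """Decode to code points; reject malformed, truncated or overlong input."""
    out = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        if n == 0 or i + n > len(data):
            raise ProtocolError("bad lead byte or truncated sequence")
        if n == 1:
            cp = data[i]
        else:
            cp = data[i] & (0x7F >> n)  # payload bits of the lead byte
            for b in data[i + 1:i + n]:
                if b & 0xC0 != 0x80:
                    raise ProtocolError("bad continuation byte")
                cp = (cp << 6) | (b & 0x3F)
            if cp < MIN_FOR_LENGTH[n]:
                raise ProtocolError("overlong sequence")
        out.append(cp)
        i += n
    return out
```

With this in front of the control character filter, the filter only
ever sees ESC as the value 0x1b; every other spelling of it has already
been refused.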

Much the same question applies to the utf8 encoding of ud800-udfff
(surrogates) and the non-characters ufffe and uffff, which are also not
supposed to ever occur in valid utf8 text.
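
For comparison, here is a sketch of the same policy built on an
existing strict decoder. Python's built-in utf-8 codec already refuses
overlong forms and encoded surrogates, but it accepts ufffe and uffff,
so those need a separate check. The function name check_utf8_field and
the use of ValueError are assumptions made for this illustration.

```python
NONCHARACTERS = (0xFFFE, 0xFFFF)


def check_utf8_field(data: bytes) -> str:
    """Return the decoded text, or raise ValueError for input we refuse."""
    try:
        # Strict decoding: overlong sequences and surrogates raise here.
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("invalid utf8 sequence") from exc
    for ch in text:
        # The codec lets the non-characters through, so test them by value.
        if ord(ch) in NONCHARACTERS:
            raise ValueError("non-character in utf8 string")
    return text


if __name__ == "__main__":
    samples = [
        "M\u00f6ller".encode("utf-8"),  # ordinary text
        b"\xed\xa0\x80",                # utf8 encoding of ud800
        b"\xef\xbf\xbe",                # utf8 encoding of ufffe
    ]
    for sample in samples:
        try:
            print("ok:", check_utf8_field(sample))
        except ValueError as exc:
            print("rejected:", exc)
```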

RFC 2279 does not address these questions, as far as I can see.

I'm tempted to treat any use of overlong or otherwise invalid utf8
strings that I receive from the remote end as a protocol error.

* Do you think that is a reasonable thing to do?

* Does it violate the ssh specification?

* Will it cause any interoperability problems in practice?

Regards,
/Niels

Follow-Ups:
- Re: How to treat utf8 text with overlong utf8 sequences?
  - From: Niels Möller
- Re: How to treat utf8 text with overlong utf8 sequences?
  - From: Derek Fawcus
- Re: How to treat utf8 text with overlong utf8 sequences?
  - From: Simon Josefsson
- Re: How to treat utf8 text with overlong utf8 sequences?
  - From: Simon Tatham
