IETF-SSH archive


Re: UTF8 in SFTP (was: solving the SFTP text mode issue)



On Tue, May 14, 2002 at 11:02:23AM -0400, Richard Whalen wrote:
> The text transfer mechanism in the SSH File Transfer Protocol should define
> a single method of encoding the line end to remove any ambiguity.  Systems
> that encode line breaks differently from the specified method would be
> responsible for scanning the data and performing the necessary substitution.

OK, I now realize there is no point in allowing all three of CR, LF, and
CR LF, so let's just specify LF.
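
A minimal sketch of what "just specify LF" could mean for an implementation,
assuming the whole buffer is available at once. The function names to_wire
and from_wire are made up for illustration and are not part of any draft.

```python
def to_wire(local_bytes: bytes) -> bytes:
    """Convert CR LF and lone CR line ends to the single wire form, LF."""
    # Replace CR LF first so the pair does not turn into two line breaks.
    return local_bytes.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def from_wire(wire_bytes: bytes, native_eol: bytes) -> bytes:
    """Convert LF line ends from the wire into the local convention."""
    return wire_bytes.replace(b"\n", native_eol)
```

A streaming implementation would also have to handle a CR LF pair that is
split across two data packets, which this sketch ignores.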

> One additional thing to note: When a file is transferred in Text mode, the
> size information reported for a file must be considered to be an estimate as
> computing the exact size may consume too many resources or use too much time
> to process the command in a timely manner.  The only way to determine that
> all of the text from the file has been retrieved is through the receipt of
> end of file status when there is a request for data.

That seems reasonable.
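
A rough sketch of the client side of that rule: ignore the reported size and
keep reading until end of file. The read_chunk callable is a hypothetical
stand-in for a sequential read request; here it returns None where a real
server would answer with an end-of-file status.

```python
from typing import Callable, Optional


def read_all_text(read_chunk: Callable[[int], Optional[bytes]],
                  chunk_size: int = 32768) -> bytes:
    """Read a text-mode file to the end without trusting its reported size."""
    parts = []
    while True:
        data = read_chunk(chunk_size)
        if data is None:  # stand-in for an end-of-file status reply
            break
        parts.append(data)
    return b"".join(parts)
```

The reported size is still useful as a hint, for example to size a progress
bar, but it never decides when the loop ends.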

Joseph Galbraith wrote:
> > If the SSH_FXF_TEXT flag is set during open, and the server
> > will send file content in UTF-8, it should respond with
> > status code SSH_FX_OK_UTF8 (status code 9) instead of
> > SSH_FX_OK.

Instead of the above, how about a separate flag, SSH_FXF_UTF8, and a status
code, SSH_FX_UTF8_NOT_SUPPORTED? It seems likely that a client will
sometimes be unable to determine how to convert a file it wants to upload
into UTF-8 as well. SSH_FXF_TEXT would then mean only that LF is used for
line ends.
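
A sketch of how a server might treat the two flags independently. The
numeric values of SSH_FXF_TEXT, SSH_FXF_UTF8, and
SSH_FX_UTF8_NOT_SUPPORTED below are placeholders chosen for illustration;
nothing in this thread assigns them. The helper open_status is likewise
hypothetical.

```python
SSH_FX_OK = 0
SSH_FX_UTF8_NOT_SUPPORTED = 9   # placeholder value, not assigned anywhere

SSH_FXF_TEXT = 0x00000040       # placeholder bit: LF line ends only
SSH_FXF_UTF8 = 0x00000080       # placeholder bit: content converted to UTF-8


def open_status(pflags: int, can_convert_to_utf8: bool) -> int:
    """Pick the status for an open request carrying the given flags."""
    if pflags & SSH_FXF_UTF8 and not can_convert_to_utf8:
        # The charset conversion was asked for but cannot be done.
        return SSH_FX_UTF8_NOT_SUPPORTED
    # SSH_FXF_TEXT alone only asks for LF line ends, so it cannot fail here.
    return SSH_FX_OK
```

Keeping the flags separate means a side that cannot do the charset
conversion can still ask for SSH_FXF_TEXT alone and get LF line ends.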

> > If the server will encode filenames in UTF-8, it should
> > include the following extension data in its VERSION
> > packet (if and only if the client's INIT packet specified
> > a version >= 3.)
> > 
> >   "filename-utf8"        # extension name
> >   ""                     # no extension data

Why shouldn't this extension apply to both sides symmetrically? We can say
that if both sides support this extension, all filenames will be in UTF-8;
otherwise, each side uses its native encoding.
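
A small sketch of the symmetric rule, assuming each side has the set of
extension names it advertised and the set it received. The function name
filename_encoding and the native encoding names are illustrative only.

```python
FILENAME_UTF8 = "filename-utf8"  # extension name from the proposal


def filename_encoding(local_exts: set, remote_exts: set, native: str) -> str:
    """Return the encoding to use for filenames on this connection."""
    if FILENAME_UTF8 in local_exts and FILENAME_UTF8 in remote_exts:
        return "utf-8"
    # Either side lacks the extension, so fall back to the native encoding.
    return native
```

Both peers evaluate the same condition on the same two sets, so they reach
the same answer without any further round trip.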


