On Thu, Sep 11, 2008, David Laight wrote:

That sucks big-time. It makes me think even more that UTF-8 is completely inappropriate for a system-wide locale on any unix system. Clearly some documents and strings can be in UTF-8, but that has to be a known property of the string. It isn't appropriate that any string a program obtains can be assumed to be UTF-8.

But at least we could make the UTF-8 encoding explicit by including the BOM (byte order mark) at the beginning of such a file. It is the byte sequence 0xEF 0xBB 0xBF. Vim has support for handling it automatically; see e.g. http://www.nabble.com/utf8-BOM-td16427974.html

IMO, UTF-8 should not be the default encoding (in the absence of an explicit marker); we had better stay with latin1.

Joachim
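
A minimal sketch of the convention suggested above, in Python: treat data as UTF-8 only when it starts with the BOM bytes 0xEF 0xBB 0xBF, and fall back to latin1 otherwise. The function name `decode_with_bom_marker` is hypothetical and only for illustration; `codecs.BOM_UTF8` is the standard-library constant for that byte sequence.

```python
import codecs


def decode_with_bom_marker(data: bytes) -> str:
    """Decode as UTF-8 only if the explicit BOM marker is present.

    Hypothetical helper: without the marker, the data is assumed
    to be latin1, as proposed in the message above.
    """
    bom = codecs.BOM_UTF8  # b'\xef\xbb\xbf'
    if data.startswith(bom):
        # Marker present: strip it and decode the rest as UTF-8.
        return data[len(bom):].decode("utf-8")
    # No marker: stay with latin1, which accepts any byte sequence.
    return data.decode("latin-1")


# The same text, once marked as UTF-8 and once as plain latin1.
marked = codecs.BOM_UTF8 + "Grüße".encode("utf-8")
unmarked = "Grüße".encode("latin-1")
print(decode_with_bom_marker(marked))
print(decode_with_bom_marker(unmarked))
```

Note the cost of this rule: UTF-8 data that lacks the BOM is decoded as latin1 without any error, so its non-ASCII characters come out garbled.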