Subject: Re: codeconv v3 - kernel code set recoding engine
To: Noriyuki Soda <soda@sra.co.jp>
From: PER4MANCE, J. Dolecek <jdolecek@per4mance.cz>
List: tech-kern
Date: 03/07/2000 16:27:31
Noriyuki Soda wrote:
> No. It is better to use different codeconv_t.
> For example:
> (1) VFAT vs SJIS userland.
> codeconv_t *k2u = codeconv_open("UTF-16LE", "SJIS");
> codeconv_t *u2k = codeconv_open("SJIS", "UTF-16LE");
> (2) SJIS MS-DOS fs (not VFAT, but FAT) vs UTF-8 userland:
> codeconv_t *k2u = codeconv_open("SJIS", "UTF-8");
> codeconv_t *u2k = codeconv_open("UTF-8", "SJIS");
Does FAT really use SJIS? Or EUC-encoded names? I always thought that
FAT supports only a subset of ASCII, namely [A-Z0-9_-?] plus one dot.
Oh god :(
> (3) NFSv4 with UTF-8 vs SJIS userland
> codeconv_t *k2u = codeconv_open("UTF-8", "SJIS");
> codeconv_t *u2k = codeconv_open("SJIS", "UTF-8");
> I think there is no reason to use one codeconv_t for opposite
> direction conversion.
As I said, I thought it would be convenient. That's the only
reason I've done it this way for now :)
> No, it does cost.
> There are cases where only one-direction conversion is needed.
But typically the caller needs conversion in both directions,
so why not provide what is commonly needed?
Furthermore, separate codeconv_enc() & codeconv_dec() (or whatever
they would be named) provide better type checking, FWIW.
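Both positions can coexist: a bidirectional handle can be a thin wrapper
holding two one-way converters. Below is a minimal sketch, assuming a
userland side in ISO-8859-1 and a kernel side in UTF-16 code units. The
names latin1_enc(), latin1_dec() and bidir_conv_t are hypothetical and are
not the proposed codeconv API. Because enc and dec take distinct pointer
types, passing buffers in the wrong direction draws a compiler diagnostic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: names are illustrative, not the codeconv API.
 * User side: ISO-8859-1 bytes; kernel side: UTF-16 code units. */

/* user -> kernel: each Latin-1 byte maps to the code unit of equal value */
static int latin1_enc(const unsigned char *src, size_t len,
                      uint16_t *dst, size_t cap, size_t *outlen)
{
    if (len > cap)
        return -1;
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
    *outlen = len;
    return 0;
}

/* kernel -> user: fails on code units that Latin-1 cannot represent */
static int latin1_dec(const uint16_t *src, size_t len,
                      unsigned char *dst, size_t cap, size_t *outlen)
{
    if (len > cap)
        return -1;
    for (size_t i = 0; i < len; i++) {
        if (src[i] > 0xFF)
            return -1;
        dst[i] = (unsigned char)src[i];
    }
    *outlen = len;
    return 0;
}

/* One handle carrying both directions.  enc and dec have distinct pointer
 * types, so swapping the buffers is caught at compile time instead of
 * depending on a correct "direction" argument at run time. */
typedef struct {
    int (*enc)(const unsigned char *, size_t, uint16_t *, size_t, size_t *);
    int (*dec)(const uint16_t *, size_t, unsigned char *, size_t, size_t *);
} bidir_conv_t;

static const bidir_conv_t latin1_utf16 = { latin1_enc, latin1_dec };
```

A caller needing only one direction can still call latin1_enc() or
latin1_dec() directly, so the wrapper costs nothing when unused.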
> IMHO, passing endianness is the wrong abstraction. Why is passing
> endianness needed when a more general function like iconv(3) doesn't need it?
I imagine there might be other options that could be "configurable"
per codeconv and usable for several code sets. But using a unique
code set name (like "Unicode-LE") is also OK.
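With byte order folded into the code set name, the lookup by name alone
decides how code units are read, and no separate endianness argument is
needed. A minimal sketch, assuming a small built-in table; struct codeset,
codeset_lookup() and codeset_get16() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: byte order is part of the code set name. */
enum byteorder { BO_LE, BO_BE };

struct codeset {
    const char     *name;
    enum byteorder  order;
};

static const struct codeset codesets[] = {
    { "UTF-16LE", BO_LE },
    { "UTF-16BE", BO_BE },
};

/* Look up a code set by name; returns NULL if it is unknown. */
static const struct codeset *codeset_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(codesets) / sizeof(codesets[0]); i++)
        if (strcmp(codesets[i].name, name) == 0)
            return &codesets[i];
    return NULL;
}

/* Read one 16-bit code unit at p in the code set's byte order. */
static uint16_t codeset_get16(const struct codeset *cs, const unsigned char *p)
{
    if (cs->order == BO_LE)
        return (uint16_t)(p[0] | (p[1] << 8));
    return (uint16_t)((p[0] << 8) | p[1]);
}
```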
> It makes sense to use/share the same function and implementation for NTFS
> and Joliet extension.
> But it doesn't make sense to implement it on codeconv layer.
To me, it makes good sense: codeconv has all the information it needs.
It knows both the "source" and "target" code sets, so it knows best how to
compare individual codes in a string.
> Case-folded comparison is much more difficult than you thought.
> For example, I've heard that there is a difference between MS-Windows
> 98 and MS-Windows NT about filename comparison. (e.g. handling of
> Cyrillic characters)
Well, we don't need to emulate case comparison as done by specific
operating systems; we can do it right :) The only place where the code
depends on case-folded comparison is NTFS: file names in an NTFS
directory are indexed case-insensitively.
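For NTFS, "doing it right" amounts to mapping each UTF-16 code unit
through an upcase table before comparing. A real NTFS volume carries its
own $UpCase table; in this minimal sketch a small rule covering ASCII and
basic Cyrillic stands in for it, which is an assumption for illustration
only. upcase() and name_cmp() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the volume's $UpCase table (assumption: ASCII and basic
 * Cyrillic only; a real table covers all 65536 code units). */
static uint16_t upcase(uint16_t c)
{
    if (c >= 'a' && c <= 'z')
        return (uint16_t)(c - 0x20);
    if (c >= 0x0430 && c <= 0x044F)     /* U+0430..U+044F -> U+0410..U+042F */
        return (uint16_t)(c - 0x20);
    if (c >= 0x0450 && c <= 0x045F)     /* U+0450..U+045F -> U+0400..U+040F */
        return (uint16_t)(c - 0x50);
    return c;
}

/* Case-insensitive comparison of two UTF-16 names, strcmp()-style:
 * negative, zero or positive. */
static int name_cmp(const uint16_t *a, size_t alen,
                    const uint16_t *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    for (size_t i = 0; i < n; i++) {
        uint16_t ua = upcase(a[i]), ub = upcase(b[i]);
        if (ua != ub)
            return ua < ub ? -1 : 1;
    }
    if (alen == blen)
        return 0;
    return alen < blen ? -1 : 1;
}
```

Since the table lives on the volume, codeconv would only need to supply
the conversion; the comparison itself works on UTF-16 code units.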
I wouldn't be surprised if MS didn't do the case comparison correctly
under Win9X ;-/ But since MS Windows 95/98 supports only VFAT and
cd9660/Joliet, we don't need to care, AFAICS.
> If you combine case-folded comparison feature with codeconv layer,
> you cannot use following codeconv_t:
> codeconv_t *cc = codeconv_open("SJIS", "UTF-16LE");
> rather, you have to use this:
> codeconv_t *cc = codeconv_open("SJIS", "UTF-16LE-Win95");
> for Windows 98
> codeconv_t *cc = codeconv_open("SJIS", "UTF-16LE-WinNT");
>
> Do you really want to do this?
If Win95 Unicode and WinNT Unicode really are different, we need to do
this anyway, as you noted in a follow-up mail.
Jaromir