Subject: Re: CMSG_* problems
To: None <tech-userlevel@NetBSD.org, tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 02/14/2007 03:48:39
> If you're willing to assume NetBSD then you don't need anything added
> to NetBSD headers, you can write your own macros (which I assume you
> have already done) and simply reference the internals of the NetBSD
> data structs.

I haven't built my own macros, no.  I've been applying CMSG_DATA to a
struct cmsghdr allocated as a struct cmsghdr rather than overlaid onto
a buffer-of-bytes, then using the resulting pointer to obtain a
distance by which I advance my pointer into the buffer.  A bit ugly
(because it potentially uses a pointer past the end of an object), but
less ugly to my eye than overlaying the cmsghdr onto the data buffer.
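
A rough sketch of the approach described above, for concreteness.  The
helper names cmsg_data_offset() and cmsg_data_in_buffer() are made up
for illustration and are not part of any system header; the sketch
assumes only the standard CMSG_DATA macro from <sys/socket.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: distance in bytes from the start of a struct
 * cmsghdr to its data area, obtained by applying CMSG_DATA to a real,
 * properly aligned struct cmsghdr rather than to the byte buffer.
 */
static size_t
cmsg_data_offset(void)
{
    struct cmsghdr hdr;

    /* Only an address is computed here; nothing past hdr is read. */
    return (size_t)((unsigned char *)CMSG_DATA(&hdr) -
        (unsigned char *)&hdr);
}

/*
 * Hypothetical helper: copy the header out of the byte buffer at p
 * (no overlay, so no alignment requirement on p) and return a pointer
 * to the data that follows it.
 */
static const unsigned char *
cmsg_data_in_buffer(const unsigned char *p, struct cmsghdr *out)
{
    assert(p != NULL && out != NULL);
    memcpy(out, p, sizeof *out);
    return p + cmsg_data_offset();
}
```

This still depends on the buffer using the same header-to-data padding
that CMSG_DATA applies, which is exactly the knowledge that is not
otherwise exported.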

>> [...], and - at least to my own eye! - macros such as I proposed
>> permit significantly cleaner code.
> More portable code (for one kind of portability issue yes), but
> cleaner, no.  They build in application knowledge of the kind of data
> structs that the application shouldn't have.  That is, you know that
> there's an integer offset that you can add to something to get the
> next header, and that you can increment that integer offset to move
> from one header to another.

That's what I thought the control data interface was defined as:
alternating structs cmsghdr and data areas, packed into a buffer of
bytes.  Recently, this buffer acquired possible padding that no
(exported) interface but the CMSG_* macros knew how to compute.

I would much rather simply do away with the padding - basically, the
same way most other byte-stream interfaces work - but that makes the
resulting ABI incompatible with code built around the CMSG_* "just
overlay onto the buffer" paradigm on strict-alignment machines.

> That's the kind of knowledge that the CMSG_FIRSTHDR() CMSG_NXTHDR()
> CMSG_DATA() interface avoids.

I think I see what you mean; you are basically saying that the
application should know _nothing_ about how the various pieces are
packed into the byte buffer, ideally not even knowing what *is* there
besides the various data blobs.

That's certainly a defensible interface, but it's not one msg_control
has ever been documented as having (and it's hard to do right with a
byte-buffer underlying it).  Even the CMSG_* interface wires some
knowledge into the application; the alignment issue aside, there's the
assumption that the right amount of space can be computed by adding up
CMSG_SPACE() on the various data portion lengths (this constrains the
kinds of overhead data that can be present) and there's the assumption
that a struct cmsghdr * is sufficient to capture the state of a
traversal of the list of data blobs.
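
To make the first of those assumptions concrete, here is a minimal
sketch of the space computation an application is expected to do.  The
function name control_space() is hypothetical; only CMSG_SPACE itself
comes from <sys/socket.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: total msg_control space for n data blobs whose
 * lengths are given in datalen[].  This bakes in the assumption that
 * all per-message overhead is accounted for by CMSG_SPACE alone.
 */
static size_t
control_space(const size_t *datalen, size_t n)
{
    size_t total = 0;
    size_t i;

    assert(datalen != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += CMSG_SPACE(datalen[i]);
    return total;
}
```

If an implementation ever needed overhead that is not per-message (a
buffer-wide header, say), a sum like this could not express it.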

If that's the direction you want to go, there's a lot of redesign that
needs to be done.

> To me that makes the CMSG_* interface (the 2292 interface) cleaner
> than the one you're proposing.  Slightly harder to use portably
> perhaps, but cleaner.

Well...if it were the interface you are idealizing it into, I might
agree.  But the existing interface is in an uneasy position between an
opaque interface such as you sketched and a transparent interface such
as it was before CMSG_* moved in and set up shop.  Given how much
knowledge applications still have to have of the layout of the buffer,
and how it is documented (and how it has historically been documented),
I prefer to push it back in the transparent direction.  Pushing it in
the opaque direction feels to me more like the new design work
mentioned below.

>     for (cmsgptr = CMSG_FIRSTHDR(&msg); cmsgptr != NULL;
>          cmsgptr = CMSG_NXTHDR(&msg, cmsgptr)) {
>         if (CMSG_LVL(cmsgptr) == ... && CMSG_TYPE(cmsgptr) == ... ) {
>             whatever_t x;
> 
>             memcpy(&x, CMSG_DATA(cmsgptr), sizeof x);
>             /* process data in x */
>         }
>     }

> If you wanted you could add a CMSG_DATALEN() macro and check that
> CMSG_DATALEN(cmsgptr) == sizeof x before the memcpy - that is, if you
> don't trust the OS not to lie to you about what it is giving back.

Well, you need something like that anyway, because otherwise you can't
tell, for example, how many file descriptors arrived in an SCM_RIGHTS
blob.  (Unless you propose to redesign that too.)  I'd also prefer to
have an opaque type that encapsulates the traversal, rather than using
a struct cmsghdr * for that.
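
As an illustration of why the data length is needed, a sketch of how
the descriptor count can be recovered with the existing macros.  The
name scm_rights_count() is invented here, and the sketch assumes the
data length of a message is cmsg_len minus CMSG_LEN(0), which is how
CMSG_LEN is conventionally defined.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: number of file descriptors carried by an
 * SCM_RIGHTS control message, or 0 if cmsg is some other kind of
 * message or its length is too short to hold a header.
 */
static size_t
scm_rights_count(const struct cmsghdr *cmsg)
{
    size_t len;

    assert(cmsg != NULL);
    if (cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return 0;
    len = (size_t)cmsg->cmsg_len;
    if (len < (size_t)CMSG_LEN(0))
        return 0;
    /* Data length is whatever cmsg_len covers beyond the header. */
    return (len - (size_t)CMSG_LEN(0)) / sizeof(int);
}
```

A CMSG_DATALEN() macro would amount to the subtraction in the last
statement, without the application having to know about CMSG_LEN(0).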

Yes, this would be another reasonable direction to go.

> Personally I wouldn't bother, the application writer is supposed to
> know that the buffer is to hold cmsg hdrs, if the buffer provided is
> not suitable for that, just allow the code to fail.

> After all, if I'm reading a file, that contains a sequence of longs,
> and I do

> 	char buf[BUFSIZE];
> 	long *lp;

> 	n = read(fd, buf, BUFSIZE);
> 	lp = (long *)buf;
> 	if (*lp ...

> I don't bitch and moan about the read interface being inadequate when
> the code dies on the *lp reference, do I?  Do you?

No...but if the blob-of-bytes were generated with padding nothing but
some macros knew how to compute, with nothing documenting the alignment
necessary for those macros to work, I most certainly would complain
about that interface's design.

> That kind of enhancement (addition of CMSG_TYPE() and CMSG_LEVEL())
> is to me much cleaner than what you proposed, while being just as
> portable.

I'm not sure that I'd call either cleaner than the other.  They are two
different directions the current half-and-half interface can be taken
in; either direction - transparency or opacity - can be done cleanly.

>> True.  If you'd like to try to hash out a better interface without
>> even considering compatibility, I'd be interested.
> I would - but first we have to agree on what compatibility I'm (or
> we're) willing to forego.  I wouldn't change the kernel/user API in
> this area, doing that means either versioned interfaces in the
> kernel, or flag days, neither of which is very nice.

(I wouldn't call the kernel/user interface an API, but rather an ABI.)

> So, what I'd prefer to do is to simply allow applications to ignore
> the CMSG_* macros

Except there is no alternative available for the application to
determine how the kernel has padded things - unless you're proposing to
have applications calling __cmsg_alignbytes() directly or the like.

This is why I disliked the flag day when CMSG_* were introduced:
applications then *had* to use them (the old fully transparent
interface was no longer available) but they weren't sufficient for a
fully opaque interface.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B