NetBSD-Bugs archive
kern/59055: recvmsg(2) fails to return partial fds in MSG_CTRUNC case
>Number: 59055
>Category: kern
>Synopsis: recvmsg(2) fails to return partial fds in MSG_CTRUNC case
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Feb 07 19:10:00 +0000 2025
>Originator: Taylor R Campbell
>Release: current, 10, 9, ...
>Organization:
The NotQuiteCMSG Foundation
>Environment:
>Description:
If a sender sends more file descriptors over a socket through ancillary data with cmsg(3) SCM_RIGHTS than the receiver has wittingly allocated space for, _sometimes_ all of the file descriptors are discarded.
For example, on x86, if the sender sends 3 fds and the receiver has allocated space for 2 fds using CMSG_SPACE(2 * sizeof(int)), all three fds are discarded and MSG_CTRUNC is set.
But if the sender sends _4_ fds and the receiver has allocated space for _3_ fds using CMSG_SPACE(3 * sizeof(int)), then all four fds make it through and MSG_CTRUNC is not set (and the receiver has to handle the fourth fd!).
How does this happen?
Say you're on x86 where socket ancillary buffers are aligned to multiples of 8 bytes and struct cmsghdr itself is 16 bytes.
When sending n file descriptors, the ancillary buffer is laid out like so, in a buffer of size CMSG_SPACE(n * sizeof(int)):
[0..4) cmsg_len
[4..8) cmsg_level
[8..12) cmsg_type
[12..16) (anonymous padding)
[16..20) fd[0]
[20..24) fd[1]
...
If n is odd, four padding bytes are appended to the end.
So if the sender sends n=3 file descriptors, they must lay it out in a 32-byte buffer like so:
[0..4) cmsg_len
[4..8) cmsg_level
[8..12) cmsg_type
[12..16) (anonymous padding)
[16..20) fd[0]
[20..24) fd[1]
[24..28) fd[2]
[28..32) (anonymous padding)
If the receiver has prepared to receive at most n=2 file descriptors, though, they will have a 24-byte buffer that they expect to lay out like so:
[0..4) cmsg_len
[4..8) cmsg_level
[8..12) cmsg_type
[12..16) (anonymous padding)
[16..20) fd[0]
[20..24) fd[1]
The kernel could simply close fd[2] and let fd[0] and fd[1] pass. But it doesn't. When the kernel finds there isn't enough space in the recvmsg msg_control buffer, it chucks everything:
857 	for (m = control; m != NULL; ) {
858 		cmsg = mtod(m, struct cmsghdr *);
859 		i = m->m_len;
860 		if (len < i) {
861 			mp->msg_flags |= MSG_CTRUNC;
862 			if (cmsg->cmsg_level == SOL_SOCKET
863 			    && cmsg->cmsg_type == SCM_RIGHTS)
864 				/* Do not truncate me ... */
865 				break;
866 			i = len;
867 		}
868 		error = copyout(mtod(m, void *), q, i);
https://nxr.netbsd.org/xref/src/sys/kern/uipc_syscalls.c?r=1.214#864
This behaviour was introduced in rev. 1.113 of uipc_syscalls.c back in 2007 by dsl@ with a commit message that doesn't give any reason for discarding _all_ descriptors when the control buffer is truncated.
https://mail-index.netbsd.org/source-changes/2007/06/24/msg187028.html
This confuses some applications like devel/capnproto which exercise this path and expect some of the descriptors to make it through (though I haven't reviewed to see if this logic is robust to different cmsg alignment constraints on different architectures):
kj/async-io-test.c++:751: failed: expected result.capCount == 2 [0 == 2]
[ FAIL ] kj/async-io-test.c++:717: legacy test: AsyncIo/ScmRightsTruncatedEven (31363 μs)
>How-To-Repeat:
Tweak N in the following program -- that's the number of descriptors the sender sends, and one more than the number of descriptors the receiver allocates space to receive. For N = 3, no descriptors are printed on the receiving end; for N = 4, four descriptors are printed on the receiving end.
#include <sys/socket.h>
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

/* Number of fds sent; the receiver allocates space for N - 1. */
enum { N = 3 };

int
main(void)
{
	int s[2], p[2*((N + 1)/2)], *q;
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(N * sizeof(p[0]))];
	} cmsgbuf;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	unsigned i, j;

	if (socketpair(AF_LOCAL, SOCK_STREAM|SOCK_NONBLOCK, 0, s) == -1)
		err(1, "socketpair");

	/* Each pipe2 fills two slots of p, so (N + 1)/2 pipes suffice. */
	for (i = 0; i < (N + 1)/2; i++) {
		if (pipe2(p + 2*i, O_NONBLOCK) == -1)
			err(1, "pipe2[%u]", i);
	}

	/* Send the first N descriptors in one SCM_RIGHTS message. */
	msg = (struct msghdr) {
		.msg_name = NULL,
		.msg_namelen = 0,
		.msg_iov = NULL,
		.msg_iovlen = 0,
		.msg_control = cmsgbuf.buf,
		.msg_controllen = CMSG_SPACE(N * sizeof(p[0])),
		.msg_flags = 0,
	};
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_len = CMSG_LEN(N * sizeof(p[0]));
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cmsg), p, N * sizeof(p[0]));
	printf("* sendmsg\n");
	printf("msg_flags=0x%x\n", msg.msg_flags);
	printf("msg_controllen=%u\n", (unsigned)msg.msg_controllen);
	if (sendmsg(s[0], &msg, 0) == -1)
		err(1, "sendmsg");
	printf("\n");

	/* Receive with room for only N - 1 descriptors. */
	msg = (struct msghdr) {
		.msg_name = NULL,
		.msg_namelen = 0,
		.msg_iov = NULL,
		.msg_iovlen = 0,
		.msg_control = cmsgbuf.buf,
		.msg_controllen = CMSG_SPACE((N - 1) * sizeof(p[0])),
		.msg_flags = 0,
	};
	printf("* recvmsg\n");
	printf("msg_flags=0x%x\n", msg.msg_flags);
	printf("msg_controllen=%u\n", (unsigned)msg.msg_controllen);
	if (recvmsg(s[1], &msg, 0) == -1)
		err(1, "recvmsg");
	printf("->\n");
	printf("msg_flags=0x%x\n", msg.msg_flags);
	printf("msg_controllen=%u\n", (unsigned)msg.msg_controllen);

	/* Print every control message and every descriptor it carries. */
	for (i = 0, cmsg = CMSG_FIRSTHDR(&msg);
	     cmsg != NULL;
	     i++, cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		printf("[%u] cmsg_len=%u\n", i, (unsigned)cmsg->cmsg_len);
		printf("[%u] cmsg_level=%d\n", i, cmsg->cmsg_level);
		printf("[%u] cmsg_type=%d\n", i, cmsg->cmsg_type);
		q = (int *)CMSG_DATA(cmsg);
		for (j = cmsg->cmsg_len - CMSG_LEN(0);
		     j >= sizeof(*q);
		     j -= sizeof(*q), q++)
			printf("[%u] fd %d\n", i, *q);
		printf("\n");
	}

	return 0;
}
>Fix:
Close as many descriptors as we need to fit in recvmsg msg_controllen bytes, not _all_ of the descriptors.
(Also, let's add some tests for this! cmsg is notoriously difficult and needs extensive automatic testing.)