Subject: stuck with compat_svr4
To: None <port-sparc@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: port-sparc
Date: 11/14/2001 16:00:15
Hi,
I'm trying to get a Solaris vendor daemon to run on a NetBSD/sparc machine.
I'm now stuck at a strange problem.
The daemon forks a second process, and the 2 processes communicate through a
TCP socket, I think (it's the fd returned by an open("/dev/tcp")).
Basically the child exits because it can't receive data from the parent,
but it doesn't even try to read data.
A truss on Solaris shows:
25811: write(11, 0xEFFFF614, 147) = 147
25811: sigaction(SIGALRM, 0xEFFFF300, 0xEFFFF380) = 0
25811: setitimer(ITIMER_REAL, 0x000F8E90, 0x000F8E90) = 0
25811: poll(0xEFFFD348, 1, 10000) = 1
25811: sigaction(SIGALRM, 0xEFFFF300, 0xEFFFF380) = 0
25811: setitimer(ITIMER_REAL, 0x000F8E90, 0x00000000) = 0
25811: poll(0xEFFFD2B8, 1, 10000) = 1
25811: read(11, 0xEFFFF580, 147) = 147
25811: sigaction(SIGALRM, 0xEFFFF300, 0xEFFFF380) = 0
25811: setitimer(ITIMER_REAL, 0x000F8E80, 0x00000000) = 0
25811: sigprocmask(SIG_BLOCK, 0xEFFFF3C0, 0xEFFFF450) = 0
The write() is sending data to the parent; the parent reads it and sends the
answer on the socket. The read() is the child reading the answer.
A ktrace on NetBSD shows (I prefer not to expose the process names here :):
2801 food CALL write(0xa,0xeffff464,0x93)
2801 food GIO fd 10 wrote 147 bytes
2801 food RET write 147/0x93
(The child sent data to the parent; ktrace of the parent shows that it received
it properly and wrote the answer to the socket.)
2801 food CALL sigaction(0xe,0xeffff150,0xeffff1d0)
2801 food RET sigaction 0
2801 food CALL setitimer(0,0xf8ea0,0xf8e90)
2801 food RET setitimer 0
2801 food CALL sigaction(0xe,0xeffff150,0xeffff1d0)
2801 food RET sigaction 0
2801 food CALL setitimer(0,0xf8e90,0)
2801 food RET setitimer 0
2801 food CALL sigprocmask(0x1,0xeffff188,0xeffff218)
2801 food RET sigprocmask 0
It doesn't even try to poll()/read()! But the surrounding system calls look the
same. A few syscalls later the process writes the error message to stderr
("can't communicate with parent") and exits.
Any idea how to track this down? I suspect the program is calling
a library function, and that the poll()/read() is being skipped not in the
program itself but in a library. This program doesn't use any custom shared
lib.
I tried Solaris 2.5.1 and 2.7 libraries, as well as hacking svr4_stat.c so that
uname() returns the exact same values as the Solaris host.
--
Manuel Bouyer, LIP6, Universite Paris VI. Manuel.Bouyer@lip6.fr