NetBSD-Users archive
Re: Corosync on NetBSD
On Fri, Sep 17, 2010 at 11:29 AM, haad <haaaad%gmail.com@localhost> wrote:
> Hi,
>
> On Fri, Sep 17, 2010 at 11:51 AM, Adam Hoka <adam.hoka%gmail.com@localhost> wrote:
> > On Thu, 16 Sep 2010 09:14:01 +0200
> > Stephan Wiebusch <stephanwib%googlemail.com@localhost> wrote:
> >
> >> Hi,
> >>
> >> I tried to make the Corosync 1.2.8 cluster software work on NetBSD 5.0.2.
> >> There were some issues to fix in the source and make files. One can compile
> >> and run it then. Unfortunately it doesn't do anything except eating 100%
> >> CPU. I sent a list with the build issues to the Corosync mailing list, as
> >> there were:
> >>
> >>
> >>
> >> => lib/coroipcc.c
> >> -There are some "#ifdef COROSYNC_BSD" statements which include some
> >> madvise() calls. There is a MADV_NOSYNC flag being used which is only
> >> available on FreeBSD. I commented these lines out.
> >> -There is a "semun" union required which is not defined in "sys/sem.h" on
> >> NetBSD, in contrast to FreeBSD. I had to add it to this file:
> >>
> >> union semun {
> >>         int val;                /* value for SETVAL */
> >>         struct semid_ds *buf;   /* buffer for IPC_STAT & IPC_SET */
> >>         u_short *array;         /* array for GETALL & SETALL */
> >> };
>
> To me it seems that we should add this to sem.h. See [1]
>
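Until that is settled, a guarded definition in one shared header would avoid
patching each file. POSIX leaves the declaration of union semun to the
application, so something like the following should be portable. This is only
a sketch: HAVE_UNION_SEMUN is a hypothetical configure-style macro (neither
Corosync nor NetBSD defines it), and semun_roundtrip() is a made-up helper
that just shows the union being passed to semctl():

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * HAVE_UNION_SEMUN is a hypothetical macro; define it on platforms whose
 * <sys/sem.h> already declares the union, to avoid a redefinition.
 */
#ifndef HAVE_UNION_SEMUN
union semun {
        int val;                /* value for SETVAL */
        struct semid_ds *buf;   /* buffer for IPC_STAT & IPC_SET */
        unsigned short *array;  /* array for GETALL & SETALL */
};
#endif

/*
 * Create a private set with one semaphore, set it to `initial`, read the
 * value back and remove the set again.
 * Returns the value read back, or -1 if any step failed.
 */
int semun_roundtrip(int initial)
{
        union semun arg;
        int semid, value;

        semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
        if (semid == -1)
                return -1;

        arg.val = initial;
        if (semctl(semid, 0, SETVAL, arg) == -1) {
                semctl(semid, 0, IPC_RMID);
                return -1;
        }

        value = semctl(semid, 0, GETVAL);
        semctl(semid, 0, IPC_RMID);     /* always clean up the set */
        return value;
}
```

Calling semun_roundtrip(3) from a small main() should give back 3 if SysV
semaphores are enabled in the kernel, and -1 otherwise.
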
> >> =>exec/totemip.c
> >> -There is an #include for <net/if_var.h>. This file is not present on
> >> NetBSD, so I just dropped that.
> >>
> >> =>exec/logsys.c
> >> -The same problem with madvise() and MADV_NOSYNC.
> >
> > Are you sure that the code will do the same after that?
>
> These calls were added by FreeBSD guys in this commit [2]. To me it
> seems that we can ignore those calls as we do not specify anything
> like that in our mmap/madvise manual pages. Have you tried to run it
> with gdb?
>
Yes. The corosync process has 4 LWPs:
# ps axs | grep corosync
0 3837 1 3957 4 4 191 0 23708 2708 parked I- ? 19:28.84 ./corosync
0 3837 1 3957 3 4 191 0 23708 2708 select I- ? 19:28.84 ./corosync
0 3837 1 3957 2 4 191 0 23708 2708 parked I- ? 19:28.84 ./corosync
0 3837 1 3957 1 4 191 0 23708 2708 - R ? 19:28.84 ./corosync
In "live" mode, it seems that gdb cannot handle LWPs (is this a NetBSD issue?):
0xbbadb5a7 in _lwp_park () from /usr/lib/libc.so.12
(gdb)
(gdb) info thr
(gdb) thr 2
Thread ID 2 not known.
(gdb) bt
#0 0xbbadb5a7 in _lwp_park () from /usr/lib/libc.so.12
#1 0xbbbafb14 in pthread_cond_wait () from /usr/lib/libpthread.so.0
#2 0xbbbac8b1 in sem_wait () from /usr/lib/libpthread.so.0
#3 0x0804dbcf in corosync_exit_thread_handler (arg=0x0) at main.c:198
#4 0xbbbb19df in pthread_create () from /usr/lib/libpthread.so.0
#5 0xbbafd670 in swapcontext () from /usr/lib/libc.so.12
What I can do is kill the process with SIGABRT and then analyse the core file:
(gdb) info thr
4 process 69373 0xbbbae202 in pthread_rwlock_unlock () from /usr/lib/libpthread.so.0
3 process 134909 0xbbb14697 in _lwp_exit () from /usr/lib/libc.so.12
2 process 200445 0xbbadab67 in poll () from /usr/lib/libc.so.12
* 1 process 265981 0xbbadb5a7 in _lwp_park () from /usr/lib/libc.so.12
(gdb) bt
#0 0xbbadb5a7 in _lwp_park () from /usr/lib/libc.so.12
#1 0xbbbafb14 in pthread_cond_wait () from /usr/lib/libpthread.so.0
#2 0xbbbac8b1 in sem_wait () from /usr/lib/libpthread.so.0
#3 0x0804dbcf in corosync_exit_thread_handler (arg=0x0) at main.c:198
#4 0xbbbb19df in pthread_create () from /usr/lib/libpthread.so.0
#5 0xbbafd670 in swapcontext () from /usr/lib/libc.so.12
(gdb) thr 2
[Switching to thread 2 (process 200445)]#0 0xbbadab67 in poll () from /usr/lib/libc.so.12
(gdb) bt
#0 0xbbadab67 in poll () from /usr/lib/libc.so.12
#1 0xbbbad0f9 in poll () from /usr/lib/libpthread.so.0
#2 0x08050269 in prioritized_timer_thread (data=0x0) at timer.c:127
#3 0xbbbb19df in pthread_create () from /usr/lib/libpthread.so.0
#4 0xbbafd670 in swapcontext () from /usr/lib/libc.so.12
(gdb) thr 3
[Switching to thread 3 (process 134909)]#0 0xbbb14697 in _lwp_exit () from /usr/lib/libc.so.12
(gdb) bt
#0 0xbbb14697 in _lwp_exit () from /usr/lib/libc.so.12
#1 0xbbbb11f6 in pthread_exit () from /usr/lib/libpthread.so.0
#2 0xbbbc0f75 in logsys_worker_thread (data=0x0) at logsys.c:733
#3 0xbbbb19df in pthread_create () from /usr/lib/libpthread.so.0
#4 0xbbafd670 in swapcontext () from /usr/lib/libc.so.12
(gdb) thr 4
[Switching to thread 4 (process 69373)]#0 0xbbbae202 in pthread_rwlock_unlock () from /usr/lib/libpthread.so.0
(gdb) bt
#0 0xbbbae202 in pthread_rwlock_unlock () from /usr/lib/libpthread.so.0
#1 0x00000000 in ?? ()
>
> >> =>exec/coroipcs.c
> >> -Once more the madvise() issue.
> >> -And once more a missing semun union.
> >>
>
>
>
>
> [1] http://www.opengroup.org/onlinepubs/000095399/functions/semctl.html
> [2] http://www.mail-archive.com/openais%lists.linux-foundation.org@localhost/msg02767.html
>
> --
>
>
> Regards.
>
> Adam