pkgsrc-Bugs archive
pkg/32621: ucarp pkg doesn't work
>Number: 32621
>Category: pkg
>Synopsis: ucarp pkg doesn't work
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: pkg-manager
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jan 24 23:05:01 +0000 2006
>Originator: Gregory McGarry
>Release: NetBSD 2.0, NetBSD-current
>Organization:
>Environment:
NetBSD 2.0 w/ ex0
NetBSD-current w/ tlp0
>Description:
Using ucarp 1.1, the version to which pkgsrc was recently updated.
The UCARP master machine sends multicast "heartbeat" packets onto the network,
which are received by the backup machines. Any backup machine will assume the
role of the master if the master machine goes offline. When the master machine
resumes, the backup machine detects its multicast heartbeat packets and
relinquishes the master role.
However, if the master machine is running NetBSD, it receives its own
multicast heartbeat packets, interprets them as coming from another master
machine, and immediately falls back to the backup role. Once the multicast
heartbeat signal is missing, it switches back to the master role, detects its
own multicast heartbeat packet again, and immediately resumes the backup role.
This ping-pong effect continues indefinitely.
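The ping-pong behaviour described above can be illustrated with a small simulation. This is only a sketch of the described symptom, not UCARP's actual state machine: the names next_role and count_role_changes and the one-heartbeat-per-tick model are assumptions made for illustration.

```c
#include <stdbool.h>

enum role { BACKUP, MASTER };

/* Hypothetical, simplified election rule: a node that hears a heartbeat
 * it attributes to another master becomes (or stays) backup; a node that
 * hears nothing takes over as master. */
static enum role next_role(bool heard_master)
{
    return heard_master ? BACKUP : MASTER;
}

/* Simulate a single node, alone on the network, for `ticks` steps and
 * return how many times it changes role.  While it is master, the node
 * receives its own heartbeat back, as described in this report. */
int count_role_changes(int ticks, bool ignore_own_heartbeats)
{
    enum role role = BACKUP;
    int changes = 0;

    for (int i = 0; i < ticks; i++) {
        /* Only a master sends heartbeats, so only a master can see its own. */
        bool own_heartbeat_seen = (role == MASTER);
        /* Without filtering, its own heartbeat is mistaken for a rival's. */
        bool heard_master = own_heartbeat_seen && !ignore_own_heartbeats;
        enum role next = next_role(heard_master);

        if (next != role)
            changes++;
        role = next;
    }
    return changes;
}
```

In this model, count_role_changes(6, false) flips role on every tick, while count_role_changes(6, true) changes role once (backup to master) and then stays there.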
>How-To-Repeat:
Try running ucarp 1.1 on NetBSD.
>Fix:
Does this happen on all machines? I have seen it on tlp and ex hardware. I'm
not sure whether it is expected behaviour or an issue with the multicast
filters on these NICs.
Anyway, the following patch simply checks if the multicast heartbeat packet was
sent by us:
--- carp.c.orig 2006-01-24 14:44:07.000000000 -0800
+++ carp.c 2006-01-24 14:45:00.000000000 -0800
@@ -428,6 +428,16 @@
     dest = ntohl(iphead.ip_dst.s_addr);
     proto = iphead.ip_p;
+#ifdef DEBUG
+    printf("source=%ld (%ld), srcip=%ld(%ld)\n", source, (iphead.ip_src.s_addr), ntohl(srcip.s_addr), srcip.s_addr);
+#endif
+
+    /*
+     * Don't process our own multicasts.
+     */
+    if (iphead.ip_src.s_addr == srcip.s_addr)
+        return;
+
     switch (proto) {
     case IPPROTO_CARP: {
         struct carp_header ch;
With this change UCARP works well. I have used it on some very large public
networks. Having thought about it, I wonder whether this failure is the source
of people's interest in integrating CARP into the kernel. IMHO, this is not a
protocol that belongs in the kernel, and getting UCARP working correctly is the
right solution.
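For readers who want to see the check from the patch in isolation, here is a minimal standalone sketch. The helper name is_own_packet is hypothetical and does not exist in ucarp; it assumes, as the patch does, that both addresses are held in network byte order, so they can be compared directly without ntohl().

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper mirroring the comparison in the patch: returns true
 * when the packet's IP source address equals the address we send from.
 * Both arguments are expected in network byte order. */
bool is_own_packet(struct in_addr pkt_src, struct in_addr local_addr)
{
    return pkt_src.s_addr == local_addr.s_addr;
}
```

A receive path could call this before any further parsing and drop the packet when it returns true, which is what the early return in the patch does.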