tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: 45 second autoconfig delay



> Date: Tue, 3 Dec 2024 00:10:23 +0100
> From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
> 
> On Mon, Dec 02, 2024 at 10:57:40PM -0000, Michael van Elst wrote:
> > ef%math.uni-bonn.de@localhost (Edgar =?iso-8859-1?B?RnXf?=) writes:
> > 
> > >With a -10 kernel, I experience a >45s delay during boot at
> > >[   4.7185738] wsmouse0 at ums0 mux 0
> > >[  51.1886454] ipmi0: version 32.0 interface KCS iobase 0xca8/0x8 spacing 4
> > >How can I track down what causes this delay?
> > 
> > The delay comes from the slow "KCS" (keyboard controller) interface.
> > 
> > This is done asynchronously, but netbsd-9 will still wait for all
> > autoconfig operations before continuing the boot. In netbsd-10
> > it no longer waits for ipmi to finish attachment (but may do so
> > at shutdown if that occurs very early).
> 
> I think it still does, at last I'm seeing it on my servers.
> 
> If I remember properly this was reintroduced so that the watchdog is
> available early.

This is PR kern/56568: ipmi.c 1.7 causes large boot delays
<https://gnats.netbsd.org/56568>.  The code is the same in netbsd-9,
netbsd-10, and HEAD; it changed between 9.2 and 9.3.

There is currently a conflict on machines with slow ipmi(4)
initialization where:

(a) If the driver waits for the device to initialize, there's an
    annoying delay before the system reaches userland.

(b) If the driver doesn't wait for device to initialize, it boots too
    fast for the watchdog to work -- PR kern/56539: wdogctl starts
    early, but ipmi takes its time <https://gnats.netbsd.org/56539>.

I was the one who put in the delay, but it was for a different reason.
On PR 56568 I suggested an alternative, which is that:

- ipmi_attach or ipmi_thread can publish the watchdog immediately
  before the device has initialized so it doesn't hold up boot, but

- ipmi_watchdog_setmode must wait with cv_wait_sig until the device
  has initialized before it actually configures the watchdog.

That way, you reach userland quickly but wdogctl(8) still works --
with the caveat that (by necessity) it waits until the device has
initialized.

However, nobody has acted on this suggestion yet.


Here's what happened:

1. The ipmi(4) driver shipped in 9.0 had an ipmi_detach function that
   wouldn't wait for the thread to complete -- and wouldn't even wait
   for the thread to finish initializing, e.g., sc->sc_wdog before it
   called sysmon_wdog_unregister to tear it down.

   ipmi(4) _did_ use a short-lived flag DVF_ATTACH_INPROGRESS _as if_
   that would prevent ipmi_detach from running before the thread has
   finished initializing, except that's not what DVF_ATTACH_INPROGRESS
   ever did (it only changed the timing of the `WARNING: power
   management not supported' message, and it's gone now).

2. I fixed these races by using config_pending_incr in ipmi_attach and
   config_pending_decr in ipmi_thread where it used to set and clear
   DVF_ATTACH_INPROGRESS.

   This caused boot to hang indefinitely because the place where
   ipmi_thread cleared DVF_ATTACH_INPROGRESS was only after its
   near-infinite loop to process commands, interrupted only when
   ipmi_detach asks it to stop.  So I fixed the indefinite hang at
   boot by moving config_pending_decr a little earlier in ipmi_thread
   so that it happens between initialization and the near-infinite
   loop to process commands.

3. hauke@ noticed that on some machines, ipmi(4) booted _too fast_ in
   9.0 for wdogctl(8) to run at boot time:

	PR kern/56539: wdogctl starts early, but ipmi takes its time
	https://gnats.netbsd.org/56539

		On an amd64 server (Supermicro H8DCL-iF board, AMD 4226 cpu),
		I enabled the IPMI watchdog.

		Unfortunately, while the wdogctl rc.d script runs very early
		during the boot (BEFORE: fsck_root), IPMI takes its time:

		[   1.0470319] ipmi0 at mainbus0
		[...]
		[   7.6848901] init: copying out path `/sbin/init' 11
		Tue Dec  7 17:11:51 CET 2021
		Starting watchdog timer.
		wdogctl: WDOGIOC_SMODE: No such process
		/etc/rc.d/wdogctl exited with code 1
		[...]
		[  13.0963472] ipmi0: ID 32.1 IPMI 2.0 Available

   He also noticed that my change had the side effect of holding boot
   up until ipmi_thread had initialized enough for wdogctl(8) to work,
   so he pulled that up to netbsd-9:

	https://releng.netbsd.org/cgi-bin/req-9.cgi?show=1392

4. mlelstv@ noticed that the config_pending_incr/decr changes caused a
   long delay in boot on some machines:

	PR kern/56568: ipmi.c 1.7 causes large boot delays
	https://gnats.netbsd.org/56568

		ipmi(4) talks to a separate system (the BMC) when
		attaching. The communication may be done very slowly,
		often it is passed through the keyboard controller and
		can take several ten seconds.

                The driver used to set the DV_ATTACH_INPROGRESS flag
		to signal that the attachment hasn't completed when
		ipmi_attach returns.

		The flag isn't used anymore by autoconf.

		Changing this to config_pending_incr/descr has a
		different effect, it makes the boot process wait for
		the attachment to finish. This is neither wanted nor
		necessary, you only have to wait for it before you can
		detach the driver again.

		On this system (HP ML110 G4) the attachment takes 40
		seconds.


The logic is currently the same in netbsd-9, netbsd-10, and HEAD.
Here's how it works:

1. ipmi_attach starts ipmi_thread and arranges, with
   config_pending_incr, to have boot held up until the thread has
   acknowledged its startup:

   2135 static void
   2136 ipmi_attach(device_t parent, device_t self, void *aux)
   2137 {
...
   2154 	if (kthread_create(PRI_NONE, KTHREAD_MUSTJOIN, NULL, ipmi_thread, self,
   2155 	    &sc->sc_kthread, "%s", device_xname(self)) != 0) {
   2156 		aprint_error_dev(self, "unable to create thread, disabled\n");
   2157 	} else
   2158 		config_pending_incr(self);
   2159 }

https://nxr.netbsd.org/xref/src/sys/dev/ipmi.c?r=1.10#2135

2. ipmi_thread maps the registers and talks to the device with
   ipmi_get_device_id and get_sdr -- this requires the device to read
   and respond to commands from the OS, which may involve delays (not
   sure how long):

   1982 static void
   1983 ipmi_thread(void *cookie)
   1984 {
...
   1998 	/* Map registers */
   1999 	ipmi_map_regs(sc, ia);
   2000 
   2001 	memset(&id, 0, sizeof(id));
   2002 	if (ipmi_get_device_id(sc, &id))
   2003 		aprint_error_dev(self, "Failed to re-query device ID\n");
   2004 
   2005 	/* Scan SDRs, add sensors to list */
   2006 	for (rec = 0; rec != 0xFFFF;)
   2007 		if (get_sdr(sc, rec, &rec))
   2008 			break;

https://nxr.netbsd.org/xref/src/sys/dev/ipmi.c?r=1.10#1982

3. ipmi_thread prints the message Edgar quoted:

   2063 	aprint_verbose_dev(self, "version %d.%d interface %s %sbase "
   2064 	    "0x%" PRIx64 "/%#x spacing %d\n",
   2065 	    ia->iaa_if_rev >> 4, ia->iaa_if_rev & 0xF, sc->sc_if->name,
   2066 	    ia->iaa_if_iotype == 'i' ? "io" : "mem",
   2067 	    (uint64_t)ia->iaa_if_iobase,
   2068 	    ia->iaa_if_iospacing * sc->sc_if->nregs, ia->iaa_if_iospacing);

https://nxr.netbsd.org/xref/src/sys/dev/ipmi.c?r=1.10#2063

4. ipmi_thread calls config_pending_decr to allow boot to proceed:

   2112 	config_pending_decr(self);

https://nxr.netbsd.org/xref/src/sys/dev/ipmi.c?r=1.10#2112

5. ipmi_thread goes into a loop until impi_detach tells it to stop:

   2114 	mutex_enter(&sc->sc_poll_mtx);
   2115 	while (sc->sc_thread_running) {
...
   2131 	mutex_exit(&sc->sc_poll_mtx);
   2132 	kthread_exit(0);
   2133 }

https://nxr.netbsd.org/xref/src/sys/dev/ipmi.c?r=1.10#2114

   2161 static int
   2162 ipmi_detach(device_t self, int flags)
   2163 {
   2164 	struct ipmi_sensor *i;
   2165 	int rc;
   2166 	struct ipmi_softc *sc = device_private(self);
   2167 
   2168 	mutex_enter(&sc->sc_poll_mtx);
   2169 	sc->sc_thread_running = false;
   2170 	cv_signal(&sc->sc_poll_cv);
   2171 	mutex_exit(&sc->sc_poll_mtx);
   2172 	if (sc->sc_kthread)
   2173 		(void)kthread_join(sc->sc_kthread);

https://nxr.netbsd.org/xref/src/sys/dev/ipmi.c?r=1.10#2161


Home | Main Index | Thread Index | Old Index