kern/58684: mpii driver wedges during install of 10.0

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/58684: mpii driver wedges during install of 10.0
From: he%NetBSD.org@localhost
Date: Fri, 20 Sep 2024 21:10:00 +0000 (UTC)

>Number:         58684
>Category:       kern
>Synopsis:       mpii driver wedges during install of 10.0
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Sep 20 21:10:00 +0000 2024
>Originator:     Havard Eidnes
>Release:        NetBSD 10.0
>Organization:
I Try...
>Environment:
System: NetBSD 10.0 USB UEFI amd64 install image
Architecture: x86_64
Machine: amd64
>Description:
	I'm trying to install NetBSD/amd64 10.0 on a Fujitsu RX2540 M4
	equipped with a 600GB SAS disk, connected via a controller
	identifying as

mpii0 at pci9 dev 0 function 0: Symbios Logic SAS3008 (rev. 0x02)
mpii0: interrupting at msix7 vec 0
mpii0: PSAS CP400i, firmware 16.0.0.0IR, MPI 2.5
mpii0: phsyical device inserted in slot 4

	Before I started the installation, I exited to the shell
	and successfully did

# dd if=/dev/urandom bs=32k of=/dev/rsd0d

	and I let it run to completion (took a few hours).

	However, when sysinst extracts the install sets from the
	USB drive the system was booted from, the target file
	system is mounted async, so a large amount of data is
	buffered, and when the buffering "runs out" there is going
	to be a thundering herd of I/O requests.  It appears that
	the driver is unable to cope with this situation, because
	the drive light extinguishes and the extraction stalls
	within sysinst.  After more than a minute I get a long
	slew of kernel messages saying

mpii0: mpii_scsi_cmd_tmo

	There were on the order of 50 such messages in quick
	succession, I think.  After this happens, any attempt to
	do I/O to sd0 stalls.

	Sadly, the host does not have a serial port, and I've yet
	to figure out how to configure and use the basic system
	management on the host (they are "all different", grr!),
	so the above is transcribed from pictures from my phone.

	On one attempt, I suspended sysinst shortly after starting
	the set extraction, and did

# mount -u -o noasync /targetroot

	in an attempt to even out the I/O requests, but apparently
	it was already too late: the extraction had progressed
	too far by then.

	Is there any information I should collect when this happens?

	I broke into DDB and ran "show uvmexp", but there were
	more than 7M pages free (the host has 32GB memory).

	I also ran "vmstat -m" and didn't see anything untoward,
	but then again I don't really know what would look abnormal
	in that output.  Sadly, I didn't take a picture of that one...

>How-To-Repeat:
	Try to install with something similar to the above hardware.
>Fix:
	That would be nice.
