NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/43986: ataraid(4) doesn't seem to handle weird array configs very gracefully
Hi Greg,
Do you still the hardware available and would be able to test the
patch https://netbsd.org/~andvar/ata_raid_fix2.diff?
It should not crash if drives are missing or RAID is invalid.
On Tue, Mar 26, 2024 at 4:15 AM Taylor R Campbell <riastradh%netbsd.org@localhost> wrote:
>
> The following reply was made to PR kern/43986; it has been noted by GNATS.
>
> From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
> To: "Greg A. Woods" <woods%planix.com@localhost>
> Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost
> Subject: Re: kern/43986: ataraid(4) doesn't seem to handle weird array configs very gracefully
> Date: Tue, 26 Mar 2024 02:12:00 +0000
>
> > uvm_fault(0xc0ab2c40, 0, 1) -> 0xe
> > kernel: supervisor trap page fault, code=3D0
> > Stopped in pid 0.1 (swapper) at netbsd:strncmp+0x23: movzbl 0(%ebx),%=
> eax
> > db{0}> trace
> > strncmp(c095397a,1c,6,282,c0a9564c) at netbsd:strncmp+0x23
>
> This is an attempt to access the page zero. Not exactly a null
> pointer dererence, but a near-null pointer dereference -- the second
> argument is 0x1c.
>
> > devsw_name2blk(1c,0,0,0,0) at netbsd:devsw_name2blk+0x7e
>
> It probably happened here:
>
> 535 if (strncmp(conv->d_name, name, len) !=3D 0)
>
> https://nxr.netbsd.org/xref/src/sys/kern/subr_devsw.c?r=3D1.28#535
>
> > ld_ataraid_attach(c3d9c940,c3dede00,c3da0600,c37d4000,c37f4000) at netbsd=
> :ld_ataraid_attach+0x1a8
>
> The devsw_name2blk call is probaby this one:
>
> 78 bmajor =3D devsw_name2blk(device_xname(adi->adi_dev), NULL, 0);
>
> https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid_subr.c?r=3D1.2#78
>
> 0x1c is offsetof(struct device, dv_xname) on i386 as of around 2010:
>
> https://nxr.netbsd.org/xref/src/sys/sys/device.h?r=3D1.137#141
>
> So adi->adi_dev is probably null here.
>
> ata_raid_disk_vnode_find assumes that adi->adi_dev is nonnull. This
> is probably a bug on its own: the caller handles a null return, so
> either if the input is null, the output should be null, or the caller
> should avoid calling ata_raid_disk_vnode_find with a null input.
>
> 225 adi =3D &aai->aai_disks[i];
> 226 vp =3D ata_raid_disk_vnode_find(adi);
> 227 if (vp =3D=3D NULL) {
> 228 /*
> 229 * XXX This is bogus. We should just mark the
> 230 * XXX component as FAILED, and write-back new
> 231 * XXX config blocks.
> 232 */
> 233 break;
> 234 }
>
> https://nxr.netbsd.org/xref/src/sys/dev/ata/ld_ataraid.c?r=3D1.37#225
>
> How did adi->adi_dev come to be null?
>
> Well, the struct ataraid_disk_info *adi structure is one of the
> aai->aai_ndisks entries in a struct ataraid_array_info *aai structure
> which is created by ataraid_get_array_info:
>
> 270 aai =3D malloc(sizeof(*aai), M_DEVBUF, M_WAITOK | M_ZERO);
> ...
> 289 TAILQ_INSERT_TAIL(&ataraid_array_info_list, aai, aai_list);
>
> https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid.c?r=3D1.34#258
>
> This is called by the various ata_raid_*.c drivers. From the dmesg
> you shared, it looks like this was an Adaptec adapter, so aai_ndisks
> is filled in from the first component here:
>
> 156 aai->aai_ndisks =3D be16toh(info->configs[0].total_disks);
>
> https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid_adaptec.c?r=3D1.9#156
>
> And aai->aai_disks[drive].adi_dev is filled in for each component
> found here:
>
> 182 adi =3D &aai->aai_disks[drive];
> 183 adi->adi_dev =3D sc->sc_dev;
>
> https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid_adaptec.c?r=3D1.9#182
>
> If any components are missing, there will be some number `drive' for
> which aai->aai_disks[drive].adi_dev remains null, and the logic will
> crash where you saw this. That seems to be consistent with your
> explanation of the state of the disks.
>
> None of this code has changed much since this PR was filed, so I
> suspect the bug is still there. I'm inclined to say that
> ld_ataraid_attach should just check adi->adi_dev =3D=3D NULL and handle it
> like when ata_raid_disk_vnode_find returns null -- in both cases, it
> needs to deal with a missing component.
>
> --- ld_ataraid.c
> +++ ld_ataraid.c
> @@ -226,8 +226,8 @@
> */
> for (i =3D 0; i < aai->aai_ndisks; i++) {
> adi =3D &aai->aai_disks[i];
> - vp =3D ata_raid_disk_vnode_find(adi);
> - if (vp =3D=3D NULL) {
> + if (adi =3D=3D NULL ||
> + (vp =3D ata_raid_disk_vnode_find(adi)) =3D=3D NULL) {
> /*
> * XXX This is bogus. We should just mark the
> * XXX component as FAILED, and write-back new
>
> Now, the failure branch logic here may be wrong -- it just gives up
> instead of trying to deal with a missing component -- but that's a
> separate issue which, with any luck, should lead to a more graceful
> failure than crash, even if the more graceful failure isn't ideal.
>
Home |
Main Index |
Thread Index |
Old Index