NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/43986: ataraid(4) doesn't seem to handle weird array configs very gracefully
The following reply was made to PR kern/43986; it has been noted by GNATS.
From: Andrius V <vezhlys%gmail.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost,
"Greg A. Woods" <woods%planix.com@localhost>
Subject: Re: kern/43986: ataraid(4) doesn't seem to handle weird array configs
very gracefully
Date: Tue, 11 Mar 2025 23:58:51 +0200
Hi Greg,
Do you still the hardware available and would be able to test the
patch https://netbsd.org/~andvar/ata_raid_fix2.diff?
It should not crash if drives are missing or RAID is invalid.
On Tue, Mar 26, 2024 at 4:15=E2=80=AFAM Taylor R Campbell <riastradh@netbsd=
.org> wrote:
>
> The following reply was made to PR kern/43986; it has been noted by GNATS=
.
>
> From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
> To: "Greg A. Woods" <woods%planix.com@localhost>
> Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost
> Subject: Re: kern/43986: ataraid(4) doesn't seem to handle weird array co=
nfigs very gracefully
> Date: Tue, 26 Mar 2024 02:12:00 +0000
>
> > uvm_fault(0xc0ab2c40, 0, 1) -> 0xe
> > kernel: supervisor trap page fault, code=3D3D0
> > Stopped in pid 0.1 (swapper) at netbsd:strncmp+0x23: movzbl 0(%ebx=
),%=3D
> eax
> > db{0}> trace
> > strncmp(c095397a,1c,6,282,c0a9564c) at netbsd:strncmp+0x23
>
> This is an attempt to access the page zero. Not exactly a null
> pointer dererence, but a near-null pointer dereference -- the second
> argument is 0x1c.
>
> > devsw_name2blk(1c,0,0,0,0) at netbsd:devsw_name2blk+0x7e
>
> It probably happened here:
>
> 535 if (strncmp(conv->d_name, name, len) !=3D3D 0)
>
> https://nxr.netbsd.org/xref/src/sys/kern/subr_devsw.c?r=3D3D1.28#535
>
> > ld_ataraid_attach(c3d9c940,c3dede00,c3da0600,c37d4000,c37f4000) at net=
bsd=3D
> :ld_ataraid_attach+0x1a8
>
> The devsw_name2blk call is probaby this one:
>
> 78 bmajor =3D3D devsw_name2blk(device_xname(adi->adi_dev), N=
ULL, 0);
>
> https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid_subr.c?r=3D3D1.2#78
>
> 0x1c is offsetof(struct device, dv_xname) on i386 as of around 2010:
>
> https://nxr.netbsd.org/xref/src/sys/sys/device.h?r=3D3D1.137#141
>
> So adi->adi_dev is probably null here.
>
> ata_raid_disk_vnode_find assumes that adi->adi_dev is nonnull. This
> is probably a bug on its own: the caller handles a null return, so
> either if the input is null, the output should be null, or the caller
> should avoid calling ata_raid_disk_vnode_find with a null input.
>
> 225 adi =3D3D &aai->aai_disks[i];
> 226 vp =3D3D ata_raid_disk_vnode_find(adi);
> 227 if (vp =3D3D=3D3D NULL) {
> 228 /*
> 229 * XXX This is bogus. We should just mar=
k the
> 230 * XXX component as FAILED, and write-bac=
k new
> 231 * XXX config blocks.
> 232 */
> 233 break;
> 234 }
>
> https://nxr.netbsd.org/xref/src/sys/dev/ata/ld_ataraid.c?r=3D3D1.37#225
>
> How did adi->adi_dev come to be null?
>
> Well, the struct ataraid_disk_info *adi structure is one of the
> aai->aai_ndisks entries in a struct ataraid_array_info *aai structure
> which is created by ataraid_get_array_info:
>
> 270 aai =3D3D malloc(sizeof(*aai), M_DEVBUF, M_WAITOK | M_ZER=
O);
> ...
> 289 TAILQ_INSERT_TAIL(&ataraid_array_info_list, aai, aai_list=
);
>
> https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid.c?r=3D3D1.34#258
>
> This is called by the various ata_raid_*.c drivers. From the dmesg
> you shared, it looks like this was an Adaptec adapter, so aai_ndisks
> is filled in from the first component here:
>
> 156 aai->aai_ndisks =3D3D be16toh(info->configs[0].to=
tal_disks);
>
> https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid_adaptec.c?r=3D3D1.9=
#156
>
> And aai->aai_disks[drive].adi_dev is filled in for each component
> found here:
>
> 182 adi =3D3D &aai->aai_disks[drive];
> 183 adi->adi_dev =3D3D sc->sc_dev;
>
> https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid_adaptec.c?r=3D3D1.9=
#182
>
> If any components are missing, there will be some number `drive' for
> which aai->aai_disks[drive].adi_dev remains null, and the logic will
> crash where you saw this. That seems to be consistent with your
> explanation of the state of the disks.
>
> None of this code has changed much since this PR was filed, so I
> suspect the bug is still there. I'm inclined to say that
> ld_ataraid_attach should just check adi->adi_dev =3D3D=3D3D NULL and han=
dle it
> like when ata_raid_disk_vnode_find returns null -- in both cases, it
> needs to deal with a missing component.
>
> --- ld_ataraid.c
> +++ ld_ataraid.c
> @@ -226,8 +226,8 @@
> */
> for (i =3D3D 0; i < aai->aai_ndisks; i++) {
> adi =3D3D &aai->aai_disks[i];
> - vp =3D3D ata_raid_disk_vnode_find(adi);
> - if (vp =3D3D=3D3D NULL) {
> + if (adi =3D3D=3D3D NULL ||
> + (vp =3D3D ata_raid_disk_vnode_find(adi)) =3D3D=3D3D N=
ULL) {
> /*
> * XXX This is bogus. We should just mark the
> * XXX component as FAILED, and write-back new
>
> Now, the failure branch logic here may be wrong -- it just gives up
> instead of trying to deal with a missing component -- but that's a
> separate issue which, with any luck, should lead to a more graceful
> failure than crash, even if the more graceful failure isn't ideal.
>
Home |
Main Index |
Thread Index |
Old Index