NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/43986: ataraid(4) doesn't seem to handle weird array configs very gracefully



Hi Greg,

Do you still the hardware available and would be able to test the
patch https://netbsd.org/~andvar/ata_raid_fix2.diff?

It should not crash if drives are missing or RAID is invalid.



On Tue, Mar 26, 2024 at 4:15 AM Taylor R Campbell <riastradh%netbsd.org@localhost> wrote:
>
> The following reply was made to PR kern/43986; it has been noted by GNATS.
>
> From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
> To: "Greg A. Woods" <woods%planix.com@localhost>
> Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost
> Subject: Re: kern/43986: ataraid(4) doesn't seem to handle weird array configs very gracefully
> Date: Tue, 26 Mar 2024 02:12:00 +0000
>
>  > uvm_fault(0xc0ab2c40, 0, 1) -> 0xe
>  > kernel: supervisor trap page fault, code=3D0
>  > Stopped in pid 0.1 (swapper) at netbsd:strncmp+0x23:    movzbl  0(%ebx),%=
>  eax
>  > db{0}> trace
>  > strncmp(c095397a,1c,6,282,c0a9564c) at netbsd:strncmp+0x23
>
>  This is an attempt to access the page zero.  Not exactly a null
>  pointer dererence, but a near-null pointer dereference -- the second
>  argument is 0x1c.
>
>  > devsw_name2blk(1c,0,0,0,0) at netbsd:devsw_name2blk+0x7e
>
>  It probably happened here:
>
>      535                if (strncmp(conv->d_name, name, len) !=3D 0)
>
>  https://nxr.netbsd.org/xref/src/sys/kern/subr_devsw.c?r=3D1.28#535
>
>  > ld_ataraid_attach(c3d9c940,c3dede00,c3da0600,c37d4000,c37f4000) at netbsd=
>  :ld_ataraid_attach+0x1a8
>
>  The devsw_name2blk call is probaby this one:
>
>       78        bmajor =3D devsw_name2blk(device_xname(adi->adi_dev), NULL, 0);
>
>  https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid_subr.c?r=3D1.2#78
>
>  0x1c is offsetof(struct device, dv_xname) on i386 as of around 2010:
>
>  https://nxr.netbsd.org/xref/src/sys/sys/device.h?r=3D1.137#141
>
>  So adi->adi_dev is probably null here.
>
>  ata_raid_disk_vnode_find assumes that adi->adi_dev is nonnull.  This
>  is probably a bug on its own: the caller handles a null return, so
>  either if the input is null, the output should be null, or the caller
>  should avoid calling ata_raid_disk_vnode_find with a null input.
>
>      225                adi =3D &aai->aai_disks[i];
>      226                vp =3D ata_raid_disk_vnode_find(adi);
>      227                if (vp =3D=3D NULL) {
>      228                        /*
>      229                         * XXX This is bogus.  We should just mark the
>      230                         * XXX component as FAILED, and write-back new
>      231                         * XXX config blocks.
>      232                         */
>      233                        break;
>      234                }
>
>  https://nxr.netbsd.org/xref/src/sys/dev/ata/ld_ataraid.c?r=3D1.37#225
>
>  How did adi->adi_dev come to be null?
>
>  Well, the struct ataraid_disk_info *adi structure is one of the
>  aai->aai_ndisks entries in a struct ataraid_array_info *aai structure
>  which is created by ataraid_get_array_info:
>
>      270        aai =3D malloc(sizeof(*aai), M_DEVBUF, M_WAITOK | M_ZERO);
>  ...
>      289        TAILQ_INSERT_TAIL(&ataraid_array_info_list, aai, aai_list);
>
>  https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid.c?r=3D1.34#258
>
>  This is called by the various ata_raid_*.c drivers.  From the dmesg
>  you shared, it looks like this was an Adaptec adapter, so aai_ndisks
>  is filled in from the first component here:
>
>      156                aai->aai_ndisks =3D be16toh(info->configs[0].total_disks);
>
>  https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid_adaptec.c?r=3D1.9#156
>
>  And aai->aai_disks[drive].adi_dev is filled in for each component
>  found here:
>
>      182        adi =3D &aai->aai_disks[drive];
>      183        adi->adi_dev =3D sc->sc_dev;
>
>  https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid_adaptec.c?r=3D1.9#182
>
>  If any components are missing, there will be some number `drive' for
>  which aai->aai_disks[drive].adi_dev remains null, and the logic will
>  crash where you saw this.  That seems to be consistent with your
>  explanation of the state of the disks.
>
>  None of this code has changed much since this PR was filed, so I
>  suspect the bug is still there.  I'm inclined to say that
>  ld_ataraid_attach should just check adi->adi_dev =3D=3D NULL and handle it
>  like when ata_raid_disk_vnode_find returns null -- in both cases, it
>  needs to deal with a missing component.
>
>  --- ld_ataraid.c
>  +++ ld_ataraid.c
>  @@ -226,8 +226,8 @@
>          */
>         for (i =3D 0; i < aai->aai_ndisks; i++) {
>                 adi =3D &aai->aai_disks[i];
>  -              vp =3D ata_raid_disk_vnode_find(adi);
>  -              if (vp =3D=3D NULL) {
>  +              if (adi =3D=3D NULL ||
>  +                  (vp =3D ata_raid_disk_vnode_find(adi)) =3D=3D NULL) {
>                         /*
>                          * XXX This is bogus.  We should just mark the
>                          * XXX component as FAILED, and write-back new
>
>  Now, the failure branch logic here may be wrong -- it just gives up
>  instead of trying to deal with a missing component -- but that's a
>  separate issue which, with any luck, should lead to a more graceful
>  failure than crash, even if the more graceful failure isn't ideal.
>


Home | Main Index | Thread Index | Old Index