NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/43986: ataraid(4) doesn't seem to handle weird array configs very gracefully



The following reply was made to PR kern/43986; it has been noted by GNATS.

From: Andrius V <vezhlys%gmail.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, 
	"Greg A. Woods" <woods%planix.com@localhost>
Subject: Re: kern/43986: ataraid(4) doesn't seem to handle weird array configs
 very gracefully
Date: Tue, 11 Mar 2025 23:58:51 +0200

 Hi Greg,
 
 Do you still the hardware available and would be able to test the
 patch https://netbsd.org/~andvar/ata_raid_fix2.diff?
 
 It should not crash if drives are missing or RAID is invalid.
 
 
 
 On Tue, Mar 26, 2024 at 4:15=E2=80=AFAM Taylor R Campbell <riastradh@netbsd=
 .org> wrote:
 >
 > The following reply was made to PR kern/43986; it has been noted by GNATS=
 .
 >
 > From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
 > To: "Greg A. Woods" <woods%planix.com@localhost>
 > Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost
 > Subject: Re: kern/43986: ataraid(4) doesn't seem to handle weird array co=
 nfigs very gracefully
 > Date: Tue, 26 Mar 2024 02:12:00 +0000
 >
 >  > uvm_fault(0xc0ab2c40, 0, 1) -> 0xe
 >  > kernel: supervisor trap page fault, code=3D3D0
 >  > Stopped in pid 0.1 (swapper) at netbsd:strncmp+0x23:    movzbl  0(%ebx=
 ),%=3D
 >  eax
 >  > db{0}> trace
 >  > strncmp(c095397a,1c,6,282,c0a9564c) at netbsd:strncmp+0x23
 >
 >  This is an attempt to access the page zero.  Not exactly a null
 >  pointer dererence, but a near-null pointer dereference -- the second
 >  argument is 0x1c.
 >
 >  > devsw_name2blk(1c,0,0,0,0) at netbsd:devsw_name2blk+0x7e
 >
 >  It probably happened here:
 >
 >      535                if (strncmp(conv->d_name, name, len) !=3D3D 0)
 >
 >  https://nxr.netbsd.org/xref/src/sys/kern/subr_devsw.c?r=3D3D1.28#535
 >
 >  > ld_ataraid_attach(c3d9c940,c3dede00,c3da0600,c37d4000,c37f4000) at net=
 bsd=3D
 >  :ld_ataraid_attach+0x1a8
 >
 >  The devsw_name2blk call is probaby this one:
 >
 >       78        bmajor =3D3D devsw_name2blk(device_xname(adi->adi_dev), N=
 ULL, 0);
 >
 >  https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid_subr.c?r=3D3D1.2#78
 >
 >  0x1c is offsetof(struct device, dv_xname) on i386 as of around 2010:
 >
 >  https://nxr.netbsd.org/xref/src/sys/sys/device.h?r=3D3D1.137#141
 >
 >  So adi->adi_dev is probably null here.
 >
 >  ata_raid_disk_vnode_find assumes that adi->adi_dev is nonnull.  This
 >  is probably a bug on its own: the caller handles a null return, so
 >  either if the input is null, the output should be null, or the caller
 >  should avoid calling ata_raid_disk_vnode_find with a null input.
 >
 >      225                adi =3D3D &aai->aai_disks[i];
 >      226                vp =3D3D ata_raid_disk_vnode_find(adi);
 >      227                if (vp =3D3D=3D3D NULL) {
 >      228                        /*
 >      229                         * XXX This is bogus.  We should just mar=
 k the
 >      230                         * XXX component as FAILED, and write-bac=
 k new
 >      231                         * XXX config blocks.
 >      232                         */
 >      233                        break;
 >      234                }
 >
 >  https://nxr.netbsd.org/xref/src/sys/dev/ata/ld_ataraid.c?r=3D3D1.37#225
 >
 >  How did adi->adi_dev come to be null?
 >
 >  Well, the struct ataraid_disk_info *adi structure is one of the
 >  aai->aai_ndisks entries in a struct ataraid_array_info *aai structure
 >  which is created by ataraid_get_array_info:
 >
 >      270        aai =3D3D malloc(sizeof(*aai), M_DEVBUF, M_WAITOK | M_ZER=
 O);
 >  ...
 >      289        TAILQ_INSERT_TAIL(&ataraid_array_info_list, aai, aai_list=
 );
 >
 >  https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid.c?r=3D3D1.34#258
 >
 >  This is called by the various ata_raid_*.c drivers.  From the dmesg
 >  you shared, it looks like this was an Adaptec adapter, so aai_ndisks
 >  is filled in from the first component here:
 >
 >      156                aai->aai_ndisks =3D3D be16toh(info->configs[0].to=
 tal_disks);
 >
 >  https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid_adaptec.c?r=3D3D1.9=
 #156
 >
 >  And aai->aai_disks[drive].adi_dev is filled in for each component
 >  found here:
 >
 >      182        adi =3D3D &aai->aai_disks[drive];
 >      183        adi->adi_dev =3D3D sc->sc_dev;
 >
 >  https://nxr.netbsd.org/xref/src/sys/dev/ata/ata_raid_adaptec.c?r=3D3D1.9=
 #182
 >
 >  If any components are missing, there will be some number `drive' for
 >  which aai->aai_disks[drive].adi_dev remains null, and the logic will
 >  crash where you saw this.  That seems to be consistent with your
 >  explanation of the state of the disks.
 >
 >  None of this code has changed much since this PR was filed, so I
 >  suspect the bug is still there.  I'm inclined to say that
 >  ld_ataraid_attach should just check adi->adi_dev =3D3D=3D3D NULL and han=
 dle it
 >  like when ata_raid_disk_vnode_find returns null -- in both cases, it
 >  needs to deal with a missing component.
 >
 >  --- ld_ataraid.c
 >  +++ ld_ataraid.c
 >  @@ -226,8 +226,8 @@
 >          */
 >         for (i =3D3D 0; i < aai->aai_ndisks; i++) {
 >                 adi =3D3D &aai->aai_disks[i];
 >  -              vp =3D3D ata_raid_disk_vnode_find(adi);
 >  -              if (vp =3D3D=3D3D NULL) {
 >  +              if (adi =3D3D=3D3D NULL ||
 >  +                  (vp =3D3D ata_raid_disk_vnode_find(adi)) =3D3D=3D3D N=
 ULL) {
 >                         /*
 >                          * XXX This is bogus.  We should just mark the
 >                          * XXX component as FAILED, and write-back new
 >
 >  Now, the failure branch logic here may be wrong -- it just gives up
 >  instead of trying to deal with a missing component -- but that's a
 >  separate issue which, with any luck, should lead to a more graceful
 >  failure than crash, even if the more graceful failure isn't ideal.
 >
 


Home | Main Index | Thread Index | Old Index