Re: port-macppc/59014: Shutdown -r now often freezes on macppc b&w G3

To: port-macppc-maintainer%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,capa150%gmail.com@localhost
Subject: Re: port-macppc/59014: Shutdown -r now often freezes on macppc b&w G3
From: "Chris Tucker via gnats" <gnats-admin%NetBSD.org@localhost>
Date: Wed, 22 Jan 2025 06:00:02 +0000 (UTC)

The following reply was made to PR port-macppc/59014; it has been noted by GNATS.

From: Chris Tucker <capa150%gmail.com@localhost>
To: Taylor R Campbell <riastradh%netbsd.org@localhost>, gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: port-macppc/59014: Shutdown -r now often freezes on macppc b&w G3
Date: Tue, 21 Jan 2025 21:58:08 -0800

 --0000000000002e18b8062c452e69
 Content-Type: text/plain; charset="UTF-8"
 Content-Transfer-Encoding: quoted-printable
 
 On Tue, Jan 21, 2025 at 11:44 AM Taylor R Campbell <riastradh@netbsd.org> wrote:
 
 > > Date: Tue, 21 Jan 2025 11:25:31 -0800
 > > From: Chris Tucker <capa150%gmail.com@localhost>
 > >
 > > netbsd# rndctl -S seed2   (works)
 > > netbsd# rndctl -S seed2   (works)
 > > netbsd# rndctl -S seed2   (works)
 > > netbsd# rndctl -S seed2   (freezes, and I press ctrl-t)
 > > [ 139.8510292] load: 0.42  cmd: rndctl 633 [biowait] 0.01u 0.00s 0% 1424k
 >
 > Interesting.  What kind of disk do you have?
 
 
 It's an old 12.0GB IBM Travelstar DJSA-220 that I grabbed out of my box o'
 parts to test NetBSD 10.1 with. I have been using NetBSD 9.1 on my
 "official" system (dual-boot with MacOS 8.6), running off a new-old-stock
 30GB Maxtor Diamondmax I bought off ebay a year ago. Both IDE.
 
 I wanted to run through the 10.1 install process without disturbing my
 functional 9.1 system, hence the use of the old drive.
 
 
 
 
 > In crash(8) or ddb, can you get the kernel stack trace of the rndctl
 > process?  In this case, the pid is 633, so it would be:
 >
 > crash> bt/t 0t633
 >
 > (Note the `0t' prefix for decimal.)
 >
 > I bet it's going to come up in sys_fsync_range, which is going to
 > reveal a problem with fsync_range(FDISKSYNC) on your disk controller
 > or hard disk.
 >
 
 I ran the rndctl command in the background to get it to fail.
 
 While in crash on console:
 PID    LID    S   CPU   FLAGS   STRUCT LWP *   NAME     WAIT
 4941   4941   3   0     0       fb3a680        rndctl   biowait
 
 From an ssh session on another machine at the same time:
 netbsd# crash
 Crash version 10.1, image version 10.1.
 Output from a running system is unreliable.
 crash> bt/t 0t4941
 trace: pid 4941 lid 4941 at 0x172cb00
 0x0172cb60: at cpu_switchto+0x28
 0x0172cb70: at mi_switch+0x140
 0x0172cbb0: at sleepq_block+0xe0
 0x0172cbd0: at cv_wait+0xc0
 0x0172cc00: at bwrite+0x150
 0x0172cc20: at ffs_update.part.0+0x3cc
 0x0172cc60: at ufs_gro_rename+0xec
 0x0172ccd0: at genfs_sane_rename+0x220
 0x0172cd40: at ufs_sane_rename+0x3c
 0x0172cd80: at genfs_insane_rename+0x9c
 0x0172cdb0: at VOP_RENAME+0x80
 0x0172cdf0: at do_sys_renameat.isra.0+0x558
 0x0172cec0: at syscall+0x294
 0x0172cf20: user SC trap #128 by 0xfdcac858: srr1=0xd032
             r1=0xffffe0c0 cr=0x44000482 xer=0x20000000 ctr=0xfdcac850
 
 I installed NetBSD 10.1 on a third IDE drive, and the error persists.
 
 I tried it on my 9.1 machine: I ran rndctl -L seed 16 times in a row, and
 it froze on the 16th run; I hit ctrl-t and saw "biowait" again. But I
 suspect rndctl wasn't meant to be run 16 times in rapid succession, if
 that matters.
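
The repeated runs above could be scripted along these lines. This is only a rough sketch: the helper name `first_hang`, the 30-second timeout, and the default of 16 runs are assumptions for illustration, not part of rndctl or of this PR. It starts the command, waits a bounded time, and reports the pid of the first run that does not finish, so the pid can be handed to crash(8).

```python
import subprocess


def first_hang(cmd, runs=16, timeout=30.0):
    """Run cmd up to `runs` times in sequence.

    Returns the 1-based number of the first run that did not finish
    within `timeout` seconds, or None if every run completed.
    """
    for i in range(1, runs + 1):
        proc = subprocess.Popen(cmd)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # A process sleeping in biowait may not respond to signals,
            # so do not kill it or wait on it here; just report the pid
            # so it can be inspected (e.g. "bt/t 0t<pid>" in crash).
            print(f"run {i}: pid {proc.pid} still running after {timeout}s")
            return i
    return None


# Hypothetical use, mirroring the command quoted earlier in this thread:
#   first_hang(["rndctl", "-S", "seed2"])
```

The child is deliberately left alone on timeout, since killing it would destroy the state that the stack trace is meant to capture.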
 
 Guess it could be a hardware problem, IDE controller mebbe. I'll try all
 the usual hw fixes (cuda, pram, etc) and mebbe look into a PCI IDE
 controller card, or see if I can dig up a suitable SCSI drive.
 
 --0000000000002e18b8062c452e69--
