Port-sparc64 archive


Re: SCSI issues after 10.0 update



On Tue, 11 Jun 2024 at 15:34, Riccardo Mottola
<riccardo.mottola%libero.it@localhost> wrote:
>
> > When the issue triggered before which clients were busy - was it the
> > sparcs? I'm wondering if it may also be related to the specific use
> > patterns of those clients - also might be interesting to know if they
> > fail on rmdir the same way...
>
> I don't know why bonnie on the FreeBSD NFS client failed to delete
> directories, since when I tried manually, as the same user, it worked
> perfectly. It has a 1000Mbit link

The directories that were not deleted may come down to a difference in
FreeBSD NFS client behaviour, particularly as you mention that running
from a fast NetBSD client doesn't cause it...

> When the issue was triggered I had:
>
> - Netra T1 - NetBSD 10 serving NFS and also doing some work itself. It
> has a 100Mbit link
>   - Client 1 - SPARCstation 10 - NetBSD 9.4 compiling pkg with sources
> over NFS
>   - Client 2 - SPARCstation 20 (or 4?) - NetBSD 10.0 compiling pkg with
> sources over NFS
>
> So it was all a NetBSD business :)
>
> I tried running bonnie++ on the SS10 / 9.4 over NFS (same volume as
> FreeBSD) and it hung.
>
> bash-5.2$  bonnie++ -d /disk2/pkg-bin/ -s 256 -r 128
> Writing a byte at a time...done
> Writing intelligently...done
> Rewriting...done
> Reading a byte at a time...done
> Reading intelligently...done
> start 'em...done...done...done...done...done...
>
> <...> waited here for 10 hours
>
> top says the process is parked:
> 19531 multix    39    0    59M 2528K parked/0  12:15  0.00%  0.00% bonnie++
>
> Still, before this it had already done quite a lot of reading and
> writing over NFS, and I see no errors on the serial console of the
> Netra server

So likely an issue here, but not immediately clear if it was client or
server based.

> I even retried and it completed!!!
> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> rochben.wester 256M   12k  97  797k  13  269k  11   13k  99 1021k  11 210.3 345
> Latency               996ms    1030ms    1572ms     748ms     303ms     936ms
>
> While it was running, I re-ran bonnie on the server on the other disk -
> no issues!
> 800k/s write and 1000k/s read isn't much, but it has a 10Mbit link, so
> one can't expect much more. I wonder why writing is so slow.
> 1.5 seconds latency is bad :)
>
> Anyway, I couldn't reproduce it yet. I'm building bonnie++ on the S4, so
> as to have a Sparc NetBSD 10 client. Perhaps I was using an SS20 at the
> moment of the bug, but that one is so unstable it is out of order for me.
>
> To test further, I used a NetBSD 10.0 laptop as a client, instead of
> FreeBSD, with wired 1000Mbit ethernet. Performance wasn't bad, though
> with even higher latency.
> It succeeds without issues. So I repeated it while running bonnie++
> locally on the other disk - should be quite a bit of stress.
> NO errors.

OK, so hammering across NFS from various boxes does not trigger anything.
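
As a rough sanity check on the throughput figures quoted above, the
bonnie++ block rates can be compared with what a 10Mbit link can carry at
best. This is only a sketch: it assumes bonnie++'s "k" means 1024 bytes
and it ignores all Ethernet, IP and NFS/RPC overhead, so the real ceiling
is somewhat lower.

```python
# Rough link-utilisation check for the bonnie++ figures quoted above.
# Assumptions (not from the original mail): "k" = 1024 bytes, and the
# raw link rate is used with no allowance for protocol overhead.

LINK_MBIT = 10  # SS10 link speed as stated in the mail


def link_utilisation(kib_per_sec, link_mbit=LINK_MBIT):
    """Return the fraction of raw link capacity used by a given rate."""
    link_bytes_per_sec = link_mbit * 1_000_000 / 8
    return kib_per_sec * 1024 / link_bytes_per_sec


if __name__ == "__main__":
    for label, rate in (("block write", 797), ("block read", 1021)):
        share = link_utilisation(rate)
        print(f"{label}: {rate}k/s is about {share:.0%} of {LINK_MBIT}Mbit")
```

On those assumptions the block read comes out at roughly 84% of the raw
link rate and the block write at roughly 65%, so the read is already near
what the wire allows and only the write leaves much headroom.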

Other thoughts:
- Could it be triggered by multiple NFS clients at once? (possible to test)
- The original case had included building on the NFS server as well,
presumably on the other disk? It could be interactions between the two
disks, though you mention running bonnie on the server on another disk
at the same time so probably already tested that
- Reaching now - maybe heavy compute (compiling) plus IO on both
disks? hard to prove if so
- I assume nothing is overheating? Or could an older power supply be
marginal under sustained load?
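
For the multiple-client case, one way to line it up is to start bonnie++
on several clients at roughly the same moment. A minimal sketch, assuming
passwordless ssh to each client. The host names, the per-host
subdirectory (which would have to exist already) and the timeout are made
up for illustration; the only bonnie++ options used are the ones from the
command quoted earlier (-d, -s, -r).

```python
# Start bonnie++ on several NFS clients at about the same time via ssh.
# Host names, per-host subdirectories and the timeout are hypothetical;
# the bonnie++ options are the ones from the command quoted earlier.
import subprocess
from concurrent.futures import ThreadPoolExecutor

CLIENTS = ["ss10", "s4", "laptop"]   # hypothetical host names
NFS_DIR = "/disk2/pkg-bin"           # NFS-mounted directory from the mail


def bonnie_command(host, nfs_dir=NFS_DIR, size_mb=256, ram_mb=128):
    """Build the ssh command line that runs bonnie++ on one client."""
    remote = f"bonnie++ -d {nfs_dir}/{host} -s {size_mb} -r {ram_mb}"
    return ["ssh", host, remote]


def run_on(host, timeout_s=3600):
    """Run bonnie++ on one client; report a missing result as a timeout."""
    try:
        done = subprocess.run(bonnie_command(host), capture_output=True,
                              text=True, timeout=timeout_s)
        return host, f"exit {done.returncode}"
    except subprocess.TimeoutExpired:
        # Only the local ssh process is killed here; the remote
        # bonnie++ may still be sitting there and is worth inspecting.
        return host, f"no result after {timeout_s}s (possible hang)"


def run_all(hosts=CLIENTS):
    """Run all clients concurrently and collect one status per host."""
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        return dict(pool.map(run_on, hosts))
```

Calling run_all() returns one status string per host, so a client that
parks the way the SS10 did shows up as a timeout rather than needing a
10 hour wait to notice.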

David

