NetBSD-Bugs archive


Re: kern/58775 (apei(4) spamming console)



The following reply was made to PR kern/58775; it has been noted by GNATS.

From: Hauke Fath <hf%spg.tu-darmstadt.de@localhost>
To: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Cc: gnats-bugs%netbsd.org@localhost, gnats-admin%netbsd.org@localhost
Subject: Re: kern/58775 (apei(4) spamming console)
Date: Sun, 27 Oct 2024 00:13:15 +0200

 On Sat, 26 Oct 2024 15:49:32 +0000, Taylor R Campbell wrote:
 > So, the new apei(4) code and pcictl(8) both confirm that your
 > PCI device is unhappy with lots of hardware errors -- corrected
 > errors, but still alarming.  This is almost certainly an actual
 > hardware problem that you might want to address (once we're done
 > doing science!).
 
 Disturbing - this is basically a new machine, from a batch we bought
 last year (always quicker bought than deployed). We did have another
 machine from the same vendor, though, whose ECC RAM faults vanished
 once the offending module was found and - reseated. We've worked with
 this vendor for almost twenty years, but they got so big they
 apparently don't have to sweat the details any more.
 
 I guess I'll hook up the machine's IPMI console on Monday, and see what
 that has to say.
 
 > Can you revert the previous patch and try the attached patch instead,
 > which applies a rate limit to the console output?
 
 Done, resulted in a much more reasonable message rate. Thanks!
 
 In the general case, how would I map the "error source" on hardware?
 
 Cheerio,
 Hauke
 
 -- 
      The ASCII Ribbon Campaign                    Hauke Fath
 ()     No HTML/RTF in email             Institut für Nachrichtentechnik
 /\     No Word docs in email                     TU Darmstadt
      Respect for open standards              Ruf +49-6151-16-21344
 

