tech-kern archive


i915drmkms heartbeat failure, reset failure and hard lockup



I've hit this a few times lately on amd64 running netbsd-10 from around
2024-10-22. It seems to be triggered by high system load, most likely
memory pressure and heavy paging (pkgsrc builds; I mostly blame rust).

First I get an i915drmkms heartbeat failure, followed by weird screen
artifacts (horizontal tearing, and redraws taking a second or two). Some
time later (minutes, hours, even days), the machine hard locks: no response
on the console or to ping, and the watchdogs on swwdog0 and tco0 don't fire
either.

Has anyone seen anything similar? Any hints? I'm planning to upgrade this
old machine soon, but figured I can't be the only one running into this.

fwiw, i915 is specifically:
i915drmkms0 at pci0 dev 2 function 0: Intel Sandy Bridge (desktop) GT1 Integrated Graphics Device (rev. 0x09)

X reports:
[    87.659] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20200114
[    87.661] (WW) Falling back to old probe method for modesetting
[    87.661] (WW) Falling back to old probe method for wsfb
[    87.661] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[    87.662] (--) intel(0): Integrated Graphics Chipset: Intel(R) HD Graphics 2000
[    87.662] (--) intel(0): CPU: x86-64, sse2, sse3, ssse3, sse4.1, sse4.2, avx; using a maximum of 4 threads

Heartbeat error:
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973] bcs0 heartbeat {prio:-2147483645} not ticking
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]        Awake? 4
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]        Barriers?: no
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]        Latency: 95us
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]        Heartbeat: 3000 ms ago
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]        Reset count: 0 (global 0)
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]        Requests:
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]                active  4:2d9f7b*  @ 5571ms: X[1511]
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]                ring->start:  0x7fff3000
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]                ring->head:   0x00002dd8
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]                ring->tail:   0x000034c8
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]                ring->emit:   0x000034c8
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]                ring->space:  0x00002858
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]                ring->hwsp:   0x7fff7100
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973] [head 2de8, postfix 2e60, tail 2e70, batch 0x00000000_07bc6000]:
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973] warning: /home/netbsd/netbsd-10/src/sys/external/bsd/drm2/dist/drm/i915/gt/intel_engine_cs.c:1234: WARN_ON_ONCE(hex_dump_to_buffer(buf + pos, len - pos, rowsize, sizeof(u32), line, sizeof(line), 0) >= sizeof(line))
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973] [0000] 01402413 04020000 00000000 00000000 01402013 04020000 00000000 00000000
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973] 01000011 20220200 ffffffff 01000011 28220200 0000df7f 0
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973] [0020] 01000011 20220200 ffffffff 01000011 28220200 0000df7f 01004012 28220200
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973] 00f0ff7f 01000011 c0200200 00020002 01402013 04020000 0
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973] bcs0 heartbeat {prio:-2147483645} not ticking[ 120338.1753973]         IPEHR: 0x16060720
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973] b      E  4:2d9f7b*  @ 5571ms: X[1511]
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]                E  4:2d9f7c  @ 5570ms: X[1511]
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]                E  4:2d9f7d  @ 5570ms: X[1511]
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]                E  4:2d9f7e  @ 5570ms: X[1511]
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]                E  4:2d9f7f  @ 5569ms: X[1511]
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]                E  4:2d9f80  @ 5569ms: X[1511]
Nov 14 09:18:29 slave /netbsd: [ 120338.1753973]                E  4:2d9f81  @ 5569ms: X[1511]
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973]                ...skipping 5 executing requests...
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973]                E  4:2d9f87  @ 3001ms: [i915]
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973] HWSP:
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973] HWSP:[ 120338.1763973] 00000000 00000000 00000000 00000000 00000000 00000000 0
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973] [0020] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973] 00000000 00000000 00000000 00000000 00000000 00000000 0
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973] *
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973] [0100] 7a9f2d00 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973] 00000000 00000000 00000000 00000000 00000000 00000000 0
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973] [0120] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973] 00000000 00000000 00000000 00000000 00000000 00000000 0
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973] *
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973] Idle? no
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973] Signals:
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973]        [4:2d9f86] @ 5568ms
Nov 14 09:18:29 slave /netbsd: [ 120338.1763973] i915drmkms0: notice: Resetting chip for stopped heartbeat on bcs0
Nov 14 09:18:29 slave /netbsd: [ 120338.2423960] i915drmkms0: autoconfiguration error: error: Failed to reset chip
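
A note on reading the dumps above: the WARN_ON_ONCE from hex_dump_to_buffer
suggests the compat hex dump is not formatting u32 groups the way the
upstream code expects. Assuming it is actually printing each dword as raw
bytes in memory order (little-endian on amd64), the dword at HWSP offset
0x100, which I believe is where i915 keeps the breadcrumb seqno
(I915_GEM_HWS_SEQNO), decodes as below. Both the byte-order reading and the
offset are assumptions, and the helper name is just for illustration:

```python
def dump_word_to_u32(word: str) -> int:
    """Hypothetical helper: treat an 8-hex-digit group from the dump as
    4 raw bytes in memory order and reinterpret them as a little-endian u32."""
    return int.from_bytes(bytes.fromhex(word), "little")


# Dword at HWSP offset 0x100, as printed in the "[0100]" row of the dump.
seqno = dump_word_to_u32("7a9f2d00")
print(hex(seqno))

# Oldest active request on bcs0, from the "Requests:" section (4:2d9f7b).
active = 0x2d9f7b
print(active - seqno)
```

If that reading is right, the last completed seqno is 0x2d9f7a, exactly one
behind the active request 4:2d9f7b, i.e. bcs0 stalled on that request.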

-- 
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.

