But that is exactly what is happening. I have a dump file in which I think
I was able to locate the jobs variable on the stack, and it points to a
pmap_tlb_shootdown_job that links to itself:
*** stack frame for pmap_do_tlb_shootdown:
(gdb) x/10gx 0xfffffe000e501e58
0xfffffe000e501e58: 0xfffffc0000840468 0xfffffe0000084c70
^ RA from call to pmap_do_tlb_shootdown
0xfffffe000e501e68: 0xfffffc0000b31ea8 0x0000000000000003
0xfffffe000e501e78: 0xfffffc0000b3b2f8 0xfffffc0000b6e3c8
0xfffffe000e501e88: 0xfffffc006f92e2c0 0xfffffc006f92e2c0
^ jobs TAILQ_HEAD
0xfffffe000e501e98: 0xfffffc000083fd84 0xfffffc0000b31ea8
*** pmap_tlb_shootdown_job pointed to by jobs:
(gdb) x/x 0xfffffc006f92e2c0
0xfffffc006f92e2c0: 0xfffffc006f92e2c0
(gdb) print (struct pmap_tlb_shootdown_job)* 0xfffffc006f92e2c0
$4 = {pj_list = {tqe_next = 0xfffffc006f92e2c0, tqe_prev =
0xfffffe000e501e88},
^^^^^^^^^^^^^^^^^^
EEEK!!!!!
pj_va = 18446741874823061504, pj_pmap = 0xfffffc0000ba68a8, pj_pte = 16}
Another oddity - the pmap_tlb_shootdown_q entry for CPU 0 shows a different
count:
(gdb) print pmap_tlb_shootdown_q[0]
$5 = {pq_head = {tqh_first = 0x0, tqh_last = 0xfffffc0000b77480}, pq_lock =
{
mtx_pad1 = 1025, mtx_pad2 = 1}, pq_pte = 16, pq_count = 2, pq_tbia = 0,
pq_pad = '\0' <repeats 23 times>}
The pq_count indicates there should be 2 entries in the job queue, but
tqh_first is NULL, so the list itself looks empty.
Something is corrupting the job queue somewhere, but I haven't been able to
spot it. All the accesses look like they are properly protected by the
pq_lock mutex. I guess the next step will be to add checks that verify the
queue linkage and see if I can find where it's getting corrupted.
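
A check along those lines might look roughly like the sketch below. It is
only an illustration, not the kernel code: struct job, job_head and
job_queue_check are hypothetical stand-ins for pmap_tlb_shootdown_job, the
queue head and whatever the real check ends up being called, and it assumes
the usual <sys/queue.h> TAILQ layout (tqh_first/tqh_last, tqe_next/tqe_prev).
It walks the list, verifies each back pointer, rejects a self-linked entry,
and compares the number of entries found against the expected count.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the job structure; only the linkage matters. */
struct job {
	TAILQ_ENTRY(job) j_list;
	unsigned long j_va;
};
TAILQ_HEAD(job_head, job);

/*
 * Verify the linkage of a tail queue.  "expected" is the entry count the
 * queue claims to hold (the role pq_count plays in the dump above).
 * Returns 0 if the queue is consistent, -1 otherwise.
 */
static int
job_queue_check(struct job_head *head, int expected)
{
	struct job *j;
	struct job **prevp = &head->tqh_first;
	int n = 0;

	for (j = head->tqh_first; j != NULL; j = j->j_list.tqe_next) {
		if (j->j_list.tqe_prev != prevp)
			return -1;	/* back pointer does not match */
		if (j->j_list.tqe_next == j)
			return -1;	/* entry links to itself */
		if (++n > expected)
			return -1;	/* too many entries, or a cycle */
		prevp = &j->j_list.tqe_next;
	}
	if (head->tqh_last != prevp)
		return -1;		/* tail pointer is stale */
	return (n == expected) ? 0 : -1;
}
```

In a kernel the call would presumably sit right after each insert or remove,
with the queue lock still held, and panic on a nonzero return so the dump is
taken as close to the corruption as possible. The bound on n is what keeps
the walk from spinning forever on a looped list.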