Re: Some strange messages from the kernel
On 06/29/11 13:58, Hugo Silva wrote:
> Hello,
>
> For my latest project I'm simulating many networks with virtual routers,
> subnets etc.
>
> The dom0 is crashing occasionally, and I think it may be related to vm
> start/stop activity.
>
> Here are some strange messages that I've been getting lately:
>
> xvif17i2: can't map TX grant ref: err 0 status -1
> xvif17i0 GNTTABOP_transfer[0] -1
> xvif17i0: req_prod 256 req_cons 28 rsp_prod 27 rsp_prod_pvt 27 i 1
>
>
> At this moment I can't seem to start a NetBSD PV domU; it gets stuck
> during boot, and the last message is xennet2 being detected. It may be
> related to the above, especially considering that it was working before
> and no changes were made to the dom0/domU config.
>
> Maybe someone with more knowledge of the code could comment on the above
> messages?
>
> Regards,
>
> Hugo
The dom0 just crashed. I have some more information.
I now suspect the NetBSD domU crashing is related to storage, not the
network. I tried to start the VM without any vifs and it still hung. Then I
realized from the logs that it was only attaching one of its two virtual
disks (the one it managed to attach is swap space), so I was about to test
running the netbsd/xen install image with and without the 1st virtual
disk in the config (the one that wasn't being attached).
Am I correct to deduce that this explains why it "crashed" after xennet2?
That should be when the kernel starts going for the disks.
The machine panicked right after I changed the config file and started
the domU with xm create -c.
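For context, the disk stanza of the domU config file looks something like
the following (the volume names here are illustrative, not the literal
config; the backing devices are dm(4)/LVM logical volumes, as the
vg0-sslaccel1_s and dm0n messages further down show):

    disk = [ 'phy:/dev/mapper/vg0-domu_root,0x1,w',
             'phy:/dev/mapper/vg0-domu_swap,0x2,w' ]

Dropping the first entry from that list is the "without the 1st virtual
disk" test mentioned above.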
There seems to be a pattern in the dmesg output; have a look. The first
entry was logged many hours before the actual panic, and rsp_prod etc.
keep increasing. The messages appear to have started with the "can't map
TX grant ref" error, but I'm not 100% sure.
xvif17i2: can't map TX grant ref: err 0 status -1
xvif17i0 GNTTABOP_transfer[0] -1
xvif17i0: req_prod 256 req_cons 28 rsp_prod 27 rsp_prod_pvt 27 i 1
xvif17i0 GNTTABOP_transfer[0] -1
xvif17i0: req_prod 256 req_cons 29 rsp_prod 28 rsp_prod_pvt 28 i 1
xvif17i0 GNTTABOP_transfer[0] -1
xvif17i0: req_prod 256 req_cons 30 rsp_prod 29 rsp_prod_pvt 29 i 1
<---- many, many other messages; the dom0 remained operational for a few
hours while messages like the following kept being logged ---->
xvif17i0: req_prod 256 req_cons 60 rsp_prod 59 rsp_prod_pvt 59 i 1
xvif17i0 GNTTABOP_transfer[0] -1
xvif17i0: req_prod 256 req_cons 61 rsp_prod 60 rsp_prod_pvt 60 i 1
...
xvif17i0: req_prod 256 req_cons 124 rsp_prod 123 rsp_prod_pvt 123 i 1
xvif17i0 GNTTABOP_transfer[0] -1
xvif17i0: req_prod 256 req_cons 125 rsp_prod 124 rsp_prod_pvt 124 i 1
xvif17i0 GNTTABOP_transfer[0] -1
xvif17i0: req_prod 256 req_cons 126 rsp_prod 125 rsp_prod_pvt 125 i 1
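A note on the messages above: the "err 0 status -1" wording suggests the
grant-table hypercall itself succeeded (err 0) while the per-operation
status came back -1, which is GNTST_general_error in Xen's public
grant_table.h. A minimal sketch of the calling pattern, assuming the
interface from the public Xen headers (the actual NetBSD xvif backend
code may differ; backend_va, tx_grant_ref and frontend_domid are
hypothetical names):

    /* Sketch only: map one frontend TX grant into backend VA space. */
    struct gnttab_map_grant_ref op;
    int err;

    op.host_addr = backend_va;        /* hypothetical mapping address  */
    op.flags     = GNTMAP_host_map;
    op.ref       = tx_grant_ref;      /* grant ref from the TX request */
    op.dom       = frontend_domid;

    err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1);
    if (err == 0 && op.status != GNTST_okay) {
            /* The "err 0 status -1" case: the hypervisor rejected this
             * particular grant (bad/unused ref, wrong domain, or no
             * maptrack entries left), not the hypercall as a whole. */
    }

The req_prod/req_cons/rsp_prod/rsp_prod_pvt numbers are the shared I/O
ring indices from Xen's io/ring.h: req_prod is how far the frontend has
produced requests (stuck at 256, i.e. a full ring), req_cons is how far
the backend has consumed them, and rsp_prod/rsp_prod_pvt are the
published and private response counts. req_cons creeping up by one per
GNTTABOP_transfer failure looks like the backend consuming one request at
a time and answering each with an error response.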
The log eventually goes all the way down to the point where it crashes. I
had destroyed the NetBSD PV domU because it appeared to be stuck at
xennet2, as reported in the original email; then I ran xm create -c
domu.conf, and that's where it panicked:
xbd backend: detach device dm0n for domain 19
xvif19i0: disconnecting
xvif19i1: disconnecting
xvif19i2: disconnecting
xbdback backend/vbd/20/1: can't VOP_OPEN device 0xa907: 16
xbdback backend/vbd/20/1: can't VOP_OPEN device 0xa907: 16
xbd backend: attach device vg0-sslaccel1_s (size 131072) for domain 20
xbd backend 0x1 for domain 20 using event channel 40, protocol x86_64-abi
xbdback backend/vbd/20/2: can't map grant ref: 0/-1
xbdback backend/vbd/20/2: can't map grant ref: 0/-1
xbd backend: detach device dm0n for domain 20
xvif21i0: Ethernet address 00:16:3e:ab:03:03
xbdback backend/vbd/21/1: can't VOP_OPEN device 0xa907: 16
xvif21i1: Ethernet address 00:16:3e:f2:01:03
xvif21i2: Ethernet address 00:16:3e:db:02:03
xbdback backend/vbd/21/1: can't VOP_OPEN device 0xa907: 16
uvm_fault(0xffffffff805fca60, 0xffffa00046e99000, 2) -> e
fatal page fault in supervisor mode
trap type 6 code 3 rip ffffffff8020ecd1 cs e030 rflags 10286 cr2 ffffa00046e99000 cpl 7 rsp ffffa0003dc10890
Backtrace from ddb:
mi_switch() at netbsd:mi_switch+0x1b4
sleepq_block() at netbsd:sleepq_block+0xdc
cv_timedwait_sig() at netbsd:cv_timedwait_sig+0x98
sbwait() at netbsd:sbwait+0x66
soreceive() at netbsd:soreceive+0xac1
do_sys_recvmsg() at netbsd:do_sys_recvmsg+0x185
sys_recvfrom() at netbsd:sys_recvfrom+0x66
syscall() at netbsd:syscall+0x9b
The process table:
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
18418 1 3 0 0 ffffa00046a29060 xenstore-read tstile
19556 1 3 0 80 ffffa00046ae7100 sh pipe_rd
5750 > 1 7 0 40000 ffffa00046b595a0 python2.6
18191 2 3 0 80 ffffa00046c28aa0 xenconsole netio
18191 1 3 0 40080 ffffa000467479c0 xenconsole select
8162 1 3 0 80 ffffa00046b4e160 ksh ttyraw
17557 3 3 0 1000080 ffffa0003d74eb80 qemu-dm aiowork
17557 2 3 0 1000080 ffffa00046b599c0 qemu-dm netio
17557 1 3 0 80 ffffa00046be3620 qemu-dm select
16320 1 3 0 80 ffffa00046a93900 pickup kqueue
22609 3 3 0 1000080 ffffa000467595e0 qemu-dm aiowork
22609 2 3 0 1000080 ffffa00046d3db60 qemu-dm netio
22609 1 3 0 80 ffffa00046998760 qemu-dm select
17413 1 3 0 80 ffffa00046a9c920 tail kqueue
16495 1 3 0 80 ffffa00046cb1300 ksh pause
24732 1 3 0 80 ffffa000468846a0 tmux kqueue
23201 1 3 0 80 ffffa00046a354a0 ksh pause
24773 1 3 0 80 ffffa0003dc35140 sshd select
24242 1 3 0 80 ffffa0003dc20100 sshd netio
14674 1 3 0 80 ffffa00046a3e0a0 ntpd pause
13065 1 3 0 80 ffffa0003d4562e0 ksh ttyraw
13024 1 3 0 80 ffffa00046c2fac0 su wait
5909 1 3 0 80 ffffa0003ccac5e0 getty ttyraw
3487 1 3 0 80 ffffa0003dc20520 ksh ttyraw
697 1 3 0 80 ffffa0003d91a060 ksh pause
72 1 3 0 80 ffffa0003d88e840 ksh pause
71 1 3 0 80 ffffa0003d89c860 su wait
926 1 3 0 80 ffffa0003d74e760 ksh pause
539 1 3 0 80 ffffa0003d75fba0 tmux kqueue
1291 1 3 0 80 ffffa0003cbb41a0 qmgr kqueue
230 1 3 0 80 ffffa0003d75f780 cron nanoslp
1245 1 3 0 80 ffffa0003d4e4740 master kqueue
1038 1 3 0 80 ffffa0003d456700 sshd select
534 3 3 0 1000080 ffffa0003d4b1720 qemu-dm aiowork
534 2 3 0 1000080 ffffa0003d456b20 qemu-dm netio
534 1 3 0 80 ffffa0003d42e6e0 qemu-dm select
402 2 3 0 80 ffffa0003ccb9a20 xenbackendd netio
402 1 3 0 80 ffffa0003d1c8640 xenbackendd wait
373 2 3 0 80 ffffa0003d3d1260 xenconsoled netio
373 1 3 0 80 ffffa0003d3dd6a0 xenconsoled select
377 101 5 0 0 ffffa00046cb1b40 python2.6
377 100 3 0 80 ffffa00044ffb220 python2.6 netio
377 86 3 0 80 ffffa0003d4b1b40 python2.6 netio
377 77 3 0 80 ffffa0003dc2c540 python2.6 netio
377 9 3 0 80 ffffa0003d42e2c0 python2.6 netio
377 7 3 0 80 ffffa0003d42eb00 python2.6 select
377 6 3 0 80 ffffa0003d4202a0 python2.6 select
377 5 3 0 80 ffffa0003d4206c0 python2.6 netio
377 4 3 0 80 ffffa0003d420ae0 python2.6 parked
377 3 3 0 80 ffffa0003d3dd280 python2.6 select
377 2 3 0 80 ffffa0003d3ddac0 python2.6 socket
377 1 3 0 80 ffffa0003d3d1680 python2.6 select
379 1 3 0 80 ffffa0003d3b7240 python2.6 wait
372 1 2 0 0 ffffa0003d1c8220 xenstored
362 1 3 0 80 ffffa0003cd24200 pflogd bpf
346 1 3 0 80 ffffa0003ccb9600 pflogd netio
240 1 2 0 0 ffffa0003cd24620 syslogd
1 1 3 0 80 ffffa0003437e920 init wait
0 59 5 0 200 ffffa0003437b4a0 (zombie)
0 54 3 0 200 ffffa0003d3d1aa0 xen_balloon xen_balloon
0 53 3 0 200 ffffa0003ccb91e0 ipmi0 ipmi0
0 52 3 0 200 ffffa0003cd24a40 physiod physiod
0 51 3 0 200 ffffa0003cbb49e0 aiodoned aiodoned
0 50 3 0 200 ffffa0003cbaf5a0 ioflush syncer
0 49 3 0 200 ffffa0003cbaf180 pgdaemon pgdaemon
0 46 3 0 200 ffffa0003cbaf9c0 raidio1 raidiow
0 45 3 0 200 ffffa0003cbae160 raid1 rfnodeq
0 44 3 0 200 ffffa0003cbae580 raidio0 raidiow
0 43 3 0 200 ffffa0003cbae9a0 raid0 rfnodeq
0 42 3 0 200 ffffa0003cbad140 xbdbackd xbdbackd
0 41 3 0 200 ffffa0003cbad560 pfpurge pftm
0 40 3 0 200 ffffa0003cbad980 cryptoret crypto_w
0 39 3 0 200 ffffa0003cba9120 scsibus2 sccomp
0 38 3 0 200 ffffa0003437d4e0 scsibus1 sccomp
0 37 3 0 200 ffffa0003437d900 usb3 usbevt
0 36 3 0 200 ffffa0003437c0a0 usb2 usbevt
0 35 3 0 200 ffffa0003437c4c0 usb1 usbevt
0 34 3 0 200 ffffa0003cba9540 usbtask-dr usbtsk
0 33 3 0 200 ffffa0003cba9960 usbtask-hc usbtsk
0 32 3 0 200 ffffa0003437d0c0 usb0 usbevt
0 31 3 0 200 ffffa0003437f100 atapibus0 sccomp
0 29 3 0 200 ffffa0003437e500 unpgc unpgc
0 28 3 0 200 ffffa0003437e0e0 vmem_rehash vmem_rehash
0 27 3 0 200 ffffa0003437f520 xenbus rdst
0 26 3 0 200 ffffa0003437f940 xenwatch rplq
0 17 3 0 200 ffffa0003437c8e0 atabus0 atath
0 16 3 0 200 ffffa0003437b080 scsibus0 sccomp
0 14 3 0 200 ffffa0003437b8c0 ipmi ipmipoll
0 13 3 0 200 ffffa00034377060 sysmon smtaskq
0 12 3 0 200 ffffa00034377480 pmfsuspend pmfsuspend
0 11 3 0 200 ffffa000343778a0 pmfevent pmfevent
0 10 3 0 200 ffffa00034371040 cachegc cachegc
0 9 3 0 200 ffffa00034371460 vrele vrele
0 8 3 0 200 ffffa00034371880 modunload mod_unld
0 7 3 0 200 ffffa0003436f020 xcall/0 xcall
0 6 1 0 200 ffffa0003436f440 softser/0
0 5 1 0 200 ffffa0003436f860 softclk/0
0 4 1 0 200 ffffa0003436d000 softbio/0
0 3 1 0 200 ffffa0003436d420 softnet/0
0 2 1 0 201 ffffa0003436d840 idle/0
0 1 3 0 200 ffffffff805db720 swapper uvm
Callouts:
ticks wheel arg func
2 -1/-256 ffffa00046998760 sleepq_timeout
1 0/45 ffffa00046be3620 sleepq_timeout
1 0/45 ffffa0003d42e6e0 sleepq_timeout
4 0/48 ffffa0003d75fba0 sleepq_timeout
14 0/58 0 pffasttimo
15 0/59 ffffa00001809c00 uhci_poll_hub
15 0/59 ffffa00001809800 uhci_poll_hub
15 0/59 ffffa00001809000 uhci_poll_hub
19 0/63 ffffa0003cd24200 sleepq_timeout
28 0/72 0 if_slowtimo
28 0/72 0 nd6_timer
28 0/72 0 rt_timer_timer
40 0/84 ffffffff805db720 sleepq_timeout
40 0/84 ffffa00034371460 sleepq_timeout
40 0/84 ffffa00034371040 sleepq_timeout
46 0/90 ffffa0003d1c5738 realtimerexpire
49 0/93 0 pfslowtimo
53 0/97 ffffa0003cbad560 sleepq_timeout
86 0/130 ffffa000380a0000 bnx_tick
97 0/141 ffffa0003cbaf5a0 sleepq_timeout
178 0/222 ffffa0003d3dd280 sleepq_timeout
13297 1/266 ffffa00001fcb000 bridge_timer
13299 1/266 ffffa00001fcb800 bridge_timer
13300 1/266 ffffa00002134800 bridge_timer
13300 1/266 ffffa00002134000 bridge_timer
13301 1/266 ffffa0000213c800 bridge_timer
13302 1/266 ffffa00002140800 bridge_timer
13304 1/266 ffffa00002140000 bridge_timer
13394 1/266 0 arptimer
15293 1/273 ffffa0003cbb41a0 sleepq_timeout
18693 1/287 ffffa0003d1c5b58 realtimerexpire
31396 1/336 ffffa0003d1c5c60 realtimerexpire
224 1/471 ffffa0003437b8c0 sleepq_timeout
613 1/472 ffffffff805e8640 xen_timepush
594 1/472 0 vmem_rehash_all_kick
834 1/473 ffffa0003cd24620 sleepq_timeout
1012 1/474 ffffa0003cc26408 rf_paritymap_tick
1319 1/475 ffffa00034916258 rf_paritymap_tick
1500 1/476 ffffa0003cc38400 sme_events_check
2097 1/478 ffffa0003d75f780 sleepq_timeout
2520 1/480 ffffa0003d1c5e70 realtimerexpire
4096 1/486 ffffa0003d4e4740 sleepq_timeout
4670 1/488 ffffa0003437d0c0 sleepq_timeout
4670 1/488 ffffa0003437c4c0 sleepq_timeout
4670 1/488 ffffa0003437c0a0 sleepq_timeout
5276 1/490 ffffa0003437d900 sleepq_timeout
5948 1/493 ffffa0003dc35140 sleepq_timeout
8096 1/501 ffffa00046a93900 sleepq_timeout
100592 2/616 0 nd6_slowtimo
719968 2/622 ffffa00002448950 tcp_timer_keep
598096 2/623 ffffa0003d1c5a50 realtimerexpire
1884992 2/643 0 in6_tmpaddrtimer
2140747400 3/896 ffffa0003d3d1680 sleepq_timeout
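One more observation on the "can't VOP_OPEN device 0xa907: 16" lines:
error 16 is EBUSY, and the dev_t 0xa907 decodes to major 169, minor 7.
Major 169 is dm(4) on NetBSD, which would match the "detach device dm0n"
messages. A quick sketch of the decoding, using the major()/minor()
macros from <sys/types.h>:

    #include <sys/types.h>
    #include <stdio.h>

    int
    main(void)
    {
            dev_t dev = 0xa907;       /* device number from the log */

            /* NetBSD packs the major number into bits 8-19 of a dev_t. */
            printf("major %d minor %d\n", major(dev), minor(dev));
            /* Prints "major 169 minor 7"; 169 is dm(4), and errno 16 is
             * EBUSY, so xbdback is failing to open a device-mapper
             * volume because something still holds it open. */
            return 0;
    }

If that reading is right, the xbdback instance for the destroyed domU may
not have released its dm device before the new xm create tried to open it.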
When it crashed, 3 HVM machines (all OpenBSD) were running with no issues.
I had to reboot the machine and have no access to ddb for the time being.
I hope the info provided is enough to track this down; should more
debugger info be required, I'm likely to run into this one again anyway.
Best regards,
Hugo