Subject: Re: bouyer-xenamd64 merge (xen roadmap)
To: Adam Hamsik <haaaad@gmail.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: port-xen
Date: 11/20/2007 22:40:48
On Tue, Nov 20, 2007 at 10:03:45PM +0100, Adam Hamsik wrote:
> >>
> >Of course I updated my source tree since I built this kernel, so the
> >stack trace isn't so useful.
> >Can you try again with the kernel I've put at
> >ftp://asim.lip6.fr/outgoing/bouyer/amd64/netbsd-INSTALL_XEN3_DOMU.gz
> >
> >thanks
> >
> >BTW, what is your hardware, Xen version and domU config ?
> >
>
> This machine had some hardware issues with memory. That could be the
> source of my problems. I thought they were resolved, but nobody knows.
Well, I had some issues with my devel box too, which seem to be showing up
again. But if 2 systems have hardware issues, maybe it's not hardware?
I'll run memtest tomorrow; it was successful at pointing out the issue
last time. In my case, I also get panics in the hypervisor itself, at
various places, so I suspect it's really hardware in my case.
>
>
> hardware:
>
> Processor is:
>
> cat /proc/cpuinfo
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 15
> model : 75
> model name : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
> stepping : 2
> cpu MHz : 2000.000
> cache size : 512 KB
> physical id : 0
> siblings : 1
> core id : 0
> cpu cores : 1
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36
> clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext
> 3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy
> bogomips : 4038.90
> TLB size : 1024 4K pages
> clflush size : 64
> cache_alignment : 64
> address sizes : 40 bits physical, 48 bits virtual
> power management: ts fid vid ttp tm stc
>
> Dmesg attached.
>
> XEN :
> # xm info
> host : xena2
> release : 2.6.20.4
> version : #3 SMP Tue Apr 10 18:27:16 Local time zone
> must be set--see zic
> machine : x86_64
> nr_cpus : 2
> nr_nodes : 1
> sockets_per_node : 1
> cores_per_socket : 2
> threads_per_core : 1
> cpu_mhz : 2009
> hw_caps : 178bfbff:ebd3fbff:
> 00000000:00000010:00002001:00000000:0000001f
> total_memory : 4031
I'll have to try on a system with that much memory. Hopefully I can get
to this tomorrow.
> free_memory : 969
> xen_major : 3
> xen_minor : 1
> xen_extra : .0
Same as mine, it seems.
> xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32
> hvm-3.0-x86_32p hvm-3.0-x86_64
> xen_scheduler : credit
> xen_pagesize : 4096
> platform_params : virt_start=0xffff800000000000
> xen_changeset : unavailable
> cc_compiler : gcc version 4.1.1 (Gentoo 4.1.1-r3)
> cc_compile_by : root
> cc_compile_domain : at.fiit.stuba.sk
> cc_compile_date : Wed Jul 4 09:56:13 CEST 2007
> xend_config_format : 4
>
> DomU config attached.
>
> I got this panic with your new kernel.
>
> Kernelized RAIDframe activated
> Status: Finishedpanic: HYPERVISOR_mmu_update failed
> Command: /sbin/dhclient -q -pf /tmp/dhclnt.pid -lf /tmp/
> dhclient.leases xenn
> Stopped in pid 34.1 (dhclient) at 0xffffffff8026d9b9: ret
So it did boot and start some programs.
> db> bt
> ?() at 0xffffffff8026d9b9
breakpoint()
> ?() at 0xffffffff8027b1f9
xpq_flush_queue()
> ?() at 0xffffffff80274954
pmap_map_ptes() (pmap_pte_flush)
> ?() at 0xffffffff802767c5
pmap_do_remove()
> ?() at 0xffffffff801c6a09
uvm_unmap_remove()
> ?() at 0xffffffff801ca692
uvmspace_free()
> ?() at 0xffffffff801f1408
exit1()
> ?() at 0xffffffff801ff0ba
sigexit()
> ?() at 0xffffffff80200526
postsig()
> ?() at 0xffffffff801f607c
lwp_userret()
> ?() at 0xffffffff8027835d
child_return()
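
For anyone wanting to do the same mapping from raw addresses to function
names: one possible approach (a minimal sketch, not necessarily how the
names above were obtained) is to take the sorted symbol list from
`nm -n` run on the matching kernel binary and pick the nearest symbol at or
below each address. The symbol start addresses in the sample below are
hypothetical and only chosen for illustration; only the backtrace addresses
come from the trace above.

```python
import bisect

# Hypothetical excerpt of `nm -n netbsd` output. The start addresses are
# made up for illustration; they are NOT taken from the real kernel.
NM_OUTPUT = """\
ffffffff80274900 T pmap_map_ptes
ffffffff80276700 t pmap_do_remove
ffffffff8027b1c0 T xpq_flush_queue
"""


def parse_nm(text):
    """Return a sorted list of (start_address, name) pairs."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # address, type letter, symbol name
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return 'name+0xoffset' for the nearest symbol at or below addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    start, name = symbols[i]
    return f"{name}+{addr - start:#x}"


symbols = parse_nm(NM_OUTPUT)
# Addresses copied from the ddb backtrace above.
for addr in (0xffffffff8027b1f9, 0xffffffff80274954, 0xffffffff802767c5):
    print(f"{addr:#x} -> {resolve(symbols, addr)}")
```

This only works if the symbol list comes from exactly the kernel that
panicked; with a different build the addresses will not line up.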
>
> I got this panic only when I run dhclient from sysinst. If I configure
> network from /bin/sh and then run sysinst everything works fine. I
I didn't try dhcp in my domU, only fixed network configs.
I just tested and got the same panic as you did, with the same backtrace.
I'll probably be able to debug this tomorrow then ...
> have installed the system with this little hack and tested yesterday's
> domU kernel. I got this panic with it.
>
> Starting file system checks:
> uvm_fault(0xffffa00009931758, 0x0, 1) -> e
> kernel: page fault trap, code=0
> Stopped in pid 17.1 (mount_ffs) at 0xffffffff80245a23:
This is a null pointer dereference. I didn't see this one.
I'll see if I can get something from the trace.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--