Subject: port-i386/34186: mapping of msgbuf during startup may map invalid physical adresses
To: None <port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,>
From: Wolfgang Stukenbrock <Wolfgang.Stukenbrock@nagler-company.com>
List: netbsd-bugs
Date: 08/11/2006 14:20:01
>Number: 34186
>Category: port-i386
>Synopsis: mapping of msgbuf during startup may map invalid physical adresses
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: port-i386-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Aug 11 14:20:00 +0000 2006
>Originator: Wolfgang Stukenbrock
>Release: NetBSD 3.0.1, current
>Organization:
Dr. Nagler & Company GmbH
>Environment:
System: NetBSD s012 2.1 NetBSD 2.1 (NSW-S012) #10: Mon Dec 12 12:03:54 CET 2005 wgstuken@s011:/export/netbsd-2.1/usr/src/sys/arch/i386/compile/NSW-S012 i386
Architecture: i386
Machine: i386
>Description:
If the last available physical memeory segment on a system is less 16k, than the
startup code that will map the kernel message buffer, will fail and map physical
pages behind the last segment.
This may either only lead to a message buffer without physical memory behind it,
or to an overlapping message buffer with something else.
Analyses of the problem on our Intel D945 board:
The system will get the following five available memory segments from the BIOS:
(remark: the avail coloumn means the corresponding component of the structure)
start end avail
1000 3f07c 3707c
919 1000 1000
4 9f 9f
3f6e9 3f6ed 3f6ed
3f6ff 3f700 3f700
The startup code in machdep.c will try to get the 16k from the last segment,
determines that there are only 4k available, prints out a warning and setup the
variable msgbuf_paddr the theese 4k.
Later in cpu_startup() this address is used to map 16k to it. BUG!
Now there are two ways to correct this problem.
1. force cpu_startup() to map only the detected size
2. allow multiple segments to map the message buffer of the requested size.
Both sollution are not perfect, because you either get a realy small message
buffer on some motherboards, the second one lead to an unknown number of segments
at compile time.
The best sollution is in the middle, I think.
The fix added below will add a segment list with a static, but I hope via kernel
configuration selectable, size that will be filled up in a loop (implemented by a
goto back to the start of the block) that will fill up the segment list until no
space is available. If this is still no enougth the warning is printer as before.
The function cpu_startup() will determine the size of the message buffer from the
segment list and will map the allocated pages to the allocated virtual adressspace
with the size determined from the segment list.
The default segment list size is set to 2. That should be good for most cases.
In the file pmap.c there seems to be some dead code for a previous message buffer
allocation method. The allocated virtual address for the message buffer in pmap.c
is never used and will always be overwritten by the allocation in cpu_startup().
My patch will remove theese old fragments.
The patch has been tested and it works fine.
Remark: I've done the test only on a sigle processor machine, because I've no
multiprocessor available at the moment.
Remark: The code in the port-amd looks nearly the same as for i386.
The only difference I've located ist the fact, that something is done with
PTE's in pmap.c and I'm not shure what it will exactly do.
Please forward this bugreport to the port-amd maintainer. Thanks
>How-To-Repeat:
just get a motherboard that will supply a last available memory segment less 16k.
>Fix:
Here comes the ouput of "rcsdiff -c *.c" for the files /usr/src/sys/arch/i386/i386/pmap.c
and /usr/src/sys/arch/i386/i386/machdep.c.
*** machdep.c 2006/08/11 09:40:35 1.1
--- machdep.c 2006/08/11 12:05:17
***************
*** 238,245 ****
int tmx86_has_longrun;
vaddr_t msgbuf_vaddr;
! paddr_t msgbuf_paddr;
vaddr_t idt_vaddr;
paddr_t idt_paddr;
--- 238,252 ----
int tmx86_has_longrun;
+ #ifndef MSGBUF_MAX_SEG
+ #define MSGBUF_MAX_SEG 2
+ #endif
vaddr_t msgbuf_vaddr;
! struct {
! paddr_t paddr;
! psize_t sz;
! } msgbuf_p_seg[MSGBUF_MAX_SEG];
! unsigned int msgbuf_p_cnt = 0;
vaddr_t idt_vaddr;
paddr_t idt_paddr;
***************
*** 283,303 ****
void
cpu_startup()
{
! int x;
vaddr_t minaddr, maxaddr;
char pbuf[9];
/*
* Initialize error message buffer (et end of core).
*/
! msgbuf_vaddr = uvm_km_valloc(kernel_map, x86_round_page(MSGBUFSIZE));
if (msgbuf_vaddr == 0)
panic("failed to valloc msgbuf_vaddr");
/* msgbuf_paddr was init'd in pmap */
! for (x = 0; x < btoc(MSGBUFSIZE); x++)
! pmap_kenter_pa((vaddr_t)msgbuf_vaddr + x * PAGE_SIZE,
! msgbuf_paddr + x * PAGE_SIZE, VM_PROT_READ|VM_PROT_WRITE);
pmap_update(pmap_kernel());
initmsgbuf((caddr_t)msgbuf_vaddr, round_page(MSGBUFSIZE));
--- 290,316 ----
void
cpu_startup()
{
! int x, y;
vaddr_t minaddr, maxaddr;
+ psize_t sz;
char pbuf[9];
/*
* Initialize error message buffer (et end of core).
*/
! if (msgbuf_p_cnt == 0)
! panic("msgbuf paddr map has not been set up");
! for (x = 0, sz = 0; x < msgbuf_p_cnt; sz += msgbuf_p_seg[x++].sz);
! msgbuf_vaddr = uvm_km_valloc(kernel_map, sz);
if (msgbuf_vaddr == 0)
panic("failed to valloc msgbuf_vaddr");
/* msgbuf_paddr was init'd in pmap */
! for (y = 0, sz = 0; y < msgbuf_p_cnt; y++) {
! for (x = 0; x < btoc(msgbuf_p_seg[y].sz); x++, sz += PAGE_SIZE)
! pmap_kenter_pa((vaddr_t)msgbuf_vaddr + sz,
! msgbuf_p_seg[y].paddr + x * PAGE_SIZE, VM_PROT_READ|VM_PROT_WRITE);
! }
pmap_update(pmap_kernel());
initmsgbuf((caddr_t)msgbuf_vaddr, round_page(MSGBUFSIZE));
***************
*** 1695,1700 ****
--- 1708,1714 ----
psize_t sz = round_page(MSGBUFSIZE);
psize_t reqsz = sz;
+ search_again:
for (x = 0; x < vm_nphysseg; x++) {
vps = &vm_physmem[x];
if (ptoa(vps->avail_end) == avail_end)
***************
*** 1709,1715 ****
vps->avail_end -= atop(sz);
vps->end -= atop(sz);
! msgbuf_paddr = ptoa(vps->avail_end);
/* Remove the last segment if it now has no pages. */
if (vps->start == vps->end) {
--- 1723,1730 ----
vps->avail_end -= atop(sz);
vps->end -= atop(sz);
! msgbuf_p_seg[msgbuf_p_cnt].sz = sz;
! msgbuf_p_seg[msgbuf_p_cnt++].paddr = ptoa(vps->avail_end);
/* Remove the last segment if it now has no pages. */
if (vps->start == vps->end) {
***************
*** 1723,1732 ****
avail_end = vm_physmem[x].avail_end;
avail_end = ptoa(avail_end);
/* Warn if the message buffer had to be shrunk. */
- if (sz != reqsz)
printf("WARNING: %ld bytes not available for msgbuf "
! "in last cluster (%ld used)\n", reqsz, sz);
}
/*
--- 1738,1754 ----
avail_end = vm_physmem[x].avail_end;
avail_end = ptoa(avail_end);
+ if (sz != reqsz) {
+ reqsz -= sz;
+ if (msgbuf_p_cnt != MSGBUF_MAX_SEG) {
+ /* if still segments available, get memory from next one ... */
+ sz = reqsz;
+ goto search_again;
+ }
/* Warn if the message buffer had to be shrunk. */
printf("WARNING: %ld bytes not available for msgbuf "
! "in last cluster (%ld used)\n", (long)MSGBUFSIZE, MSGBUFSIZE - reqsz);
! }
}
/*
*** pmap.c 2006/08/11 09:40:35 1.1
--- pmap.c 2006/08/11 11:27:01
***************
*** 448,456 ****
caddr_t vmmap; /* XXX: used by mem.c... it should really uvm_map_reserve it */
- extern vaddr_t msgbuf_vaddr;
- extern paddr_t msgbuf_paddr;
-
extern vaddr_t idt_vaddr; /* we allocate IDT early */
extern paddr_t idt_paddr;
--- 448,453 ----
***************
*** 1072,1080 ****
vmmap = (char *)virtual_avail; /* don't need pte */
virtual_avail += PAGE_SIZE;
- msgbuf_vaddr = virtual_avail; /* don't need pte */
- virtual_avail += round_page(MSGBUFSIZE);
-
idt_vaddr = virtual_avail; /* don't need pte */
virtual_avail += PAGE_SIZE;
idt_paddr = avail_start; /* steal a page */
--- 1069,1074 ----
>Unformatted: