Bug in loadfile_elf32.c?

To: NetBSD port-sparc64 mailing list <port-sparc64%netbsd.org@localhost>
Subject: Bug in loadfile_elf32.c?
From: Mark Cave-Ayland <mark.cave-ayland%ilande.co.uk@localhost>
Date: Wed, 31 Aug 2016 08:47:45 +0100

Hi all,

I've recently been working on a patchset that changes the way in which
the OpenBIOS client contexts are constructed, and was quite surprised to
see that a very minor change to the stack address caused my NetBSD 6
test image to regress under qemu-system-sparc64.

Digging in further: what I could see with this code change was that the
text segment of the kernel was no longer being mapped at boot, and so
when jump_to_kernel() in arch/sparc/stand/ofwboot/boot.c tried to pass
control over to the loaded kernel, it would fault straight away.

After several hours with the debugger I eventually found that, with the
change applied to OpenBIOS, marks[MARK_DATA] was being set to 0x8 rather
than 0x1800000. This caused sparc64_finalize_tlb_sun4u() to skip the
kernel mapping: (dtlb_store[i].te_va >= data_va) was always true, so
every iteration hit the continue and the text segment of the kernel was
never mapped.
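
As a rough illustration of that skip (a simplified sketch, not the
actual ofwboot code: the struct, the function name and the example
addresses other than 0x8 and 0x1800000 are made up here), the loop
behaves roughly like this:

```c
#include <stddef.h>

/* Hypothetical, pared-down stand-in for a saved DTLB entry. */
struct tlb_entry_sketch {
	unsigned long te_va;	/* virtual address of the mapping */
};

/*
 * Count the entries that would be treated as kernel text, i.e. those
 * lying below data_va.  Entries at or above data_va are skipped with
 * continue, mirroring the comparison quoted above.
 */
size_t
count_text_mappings(const struct tlb_entry_sketch *store, size_t n,
    unsigned long data_va)
{
	size_t mapped = 0;

	for (size_t i = 0; i < n; i++) {
		if (store[i].te_va >= data_va)
			continue;	/* not below data_va: skipped */
		mapped++;
	}
	return mapped;
}
```

With entries at 0x1000000, 0x1400000 and 0x1800000, a data_va of
0x1800000 leaves two entries to map as text, whereas a data_va of 0x8
leaves none.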

Eventually I traced the source back to arch/sparc/stand/ofwboot/boot.c
and figured out what was happening. In start_kernel() the marks array is
declared as a local variable, and so lives on the stack, like this:

u_long marks[MARK_MAX];

When the patch to OpenBIOS was applied, the stack address was changed to
point to an area of memory that had already been used to build a
previous client context, and already contained junk data. It so happened
that marks[MARK_DATA] was not set to 0 by default which meant we were
never triggering the logic below in loadfile_elf32.c to update it,
leaving it set to a random value:

loadseg:
	if (marks[MARK_DATA] == 0 && IS_DATA(phdr[i]))
		marks[MARK_DATA] = LOADADDR(phdr[i].p_vaddr);
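
To make the failure mode concrete, here is a minimal sketch of that
guard in isolation (the function name and its parameters are
hypothetical, and the LOADADDR() translation is left out):

```c
/*
 * Sketch of the guard quoted above: record the address of the first
 * data segment, but only if the slot currently reads as zero.
 */
void
note_data_segment(unsigned long *data_mark, int is_data,
    unsigned long vaddr)
{
	if (*data_mark == 0 && is_data)
		*data_mark = vaddr;
}
```

If the slot starts out as 0 it picks up the segment address, but if it
starts out holding stale stack contents such as 0x8, the condition is
never satisfied and the junk value survives the load.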

I believe the bug here is that loadfile_elf32.c should set
marks[MARK_DATA] = 0 before the main segment loading loop. Fortunately
I'm fairly sure that this isn't an issue on real SPARC hardware since
OBP sets all physical RAM to zero on boot (except retained segments),
however I could see that this could catch out other archs after a reboot
where RAM contents may not necessarily be zero.
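
A minimal sketch of the change I have in mind, assuming a pared-down
segment description (the struct, enum and function names below are
illustrative only and are not taken from loadfile_elf32.c):

```c
#include <stddef.h>

/* Illustrative slot indices; the real MARK_* constants differ. */
enum { SKETCH_MARK_DATA = 0, SKETCH_MARK_MAX = 1 };

/* Hypothetical, pared-down program header. */
struct seg_sketch {
	unsigned long vaddr;	/* load address of the segment */
	int is_data;		/* non-zero for a data segment */
};

void
load_segments_sketch(unsigned long *marks, const struct seg_sketch *segs,
    size_t nsegs)
{
	/* Proposed fix: clear the slot before the segment loop. */
	marks[SKETCH_MARK_DATA] = 0;

	for (size_t i = 0; i < nsegs; i++) {
		/* Same guard as before: the first data segment wins. */
		if (marks[SKETCH_MARK_DATA] == 0 && segs[i].is_data)
			marks[SKETCH_MARK_DATA] = segs[i].vaddr;
	}
}
```

With the slot cleared first, a caller passing in an uninitialised marks
array still ends up with the address of the first data segment.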

In the meantime I'll see if I can figure out a workaround in my OpenBIOS
patches to make sure that the stack is zeroed before executing the
client image...


ATB,

Mark.

Follow-Ups:
- Re: Bug in loadfile_elf32.c?
  - From: Erik Fair
- Re: Bug in loadfile_elf32.c?
  - From: Martin Husemann
