On Sat, Apr 25, 2020 at 09:50:26AM +0200, Maxime Villard wrote:
Small tactical mitigation I want to (re)introduce.
The copyin and copyout functions (plus their ufetch and ustore variants)
register a specific handler, which causes page faults in kernel mode to
be dismissed as user faults. This way, when userland passes an unmapped
page to copyin for example, the kernel will page-fault but will consider
it as a non-fatal user fault.
The problem I see, is that when discarding such faults, the kernel doesn't
check the VA that faulted, and this hides certain faults that should be
considered as fatal: those occuring on kernel pages, betraying a bug in the
caller of the copy functions.
Suppose there is a crazy syscall that passes a user-controllable length to
copyout (something I've encountered already in the past). An attacker can
do a gigantic copyout and retrieve kernel memory until a page fault occurs
on an unmapped kernel page; the page fault will actually be dismissed, and
the syscall will just error out, meaning that the system remains alive and
the attacker remains alive too. With simple forms of heap re-modelling, and
by repeating this operation, the attacker could manage to dump strictly all
of the kernel memory.
I want to apply this change [1], that causes the handler to panic if the
fault came from a kernel page. This isn't a very excellent mitigation, but
at least, it will expose bogus memory accesses and make it less easy for
severe buffer overflows to go unpunished.
The change is based on KASSERTs I initially added in 2016:
https://mail-index.netbsd.org/source-changes/2016/09/16/msg077746.html
But which I quickly reverted without really understanding what was wrong
with them. In 2018 I realized that CR2 was getting accessed in non-page
faults, which was a bug:
https://mail-index.netbsd.org/source-changes/2018/02/25/msg092488.html
In retrospect I see that my KASSERTs were firing because of this bug. Now
I want to re-introduce these KASSERTs as panics.
Maxime
[1] https://m00nbsd.net/garbage/x86/copy-trap.diff
Improving the detection of buggy copyin/out()s would be great, however
any copyin/out where the both the kernel and user side are pageable
can legitimately fail on either side. The examples of this that I can
think of right now are ubc_uiomove() and copyinargs(). Please make sure
that we recover from such legitimate fault failures than treat a failure
in such a situation as a bug.