Re: Improvements in amd64

To: Martin Husemann <martin%duskware.de@localhost>
Subject: Re: Improvements in amd64
From: Maxime Villard <max%m00nbsd.net@localhost>
Date: Fri, 13 May 2016 17:42:55 +0200

On 13/05/2016 16:48, Martin Husemann wrote:

On Fri, May 13, 2016 at 12:53:54PM +0200, Maxime Villard wrote:

  - I took rodata out of the text+rodata chunk, and put it in the data+bss+
    PRELOADED_MODULES+BOOTSTRAP_TABLES chunk [3].


Why?

You are probably assuming something obvious to you, but for folks
not too deep into x86 MMU handling (like me), this sounds like a very
strange thing to do.

Martin


What I wanted to achieve, from beginning to end, was to map text as RX,
rodata as R, and data+bss as RW, and to optimize all three with large
pages.

Initially, two chunks were mapped contiguously in both amd64 and i386:
 - text+rodata with RX permissions
 - data+bss+PRELOADED_MODULES+BOOTSTRAP_TABLES with RWX permissions
Among those two chunks, only the first could benefit from the large
page optimization.

There are actually four levels of page tables on amd64, and the CPU
walks each level in order to find the physical address of a given
virtual address. To reduce the access time, the CPU uses several caches
to remember the regularly-accessed addresses, so that it does not have
to walk the 4-layer tree every time. Each page is 4096 bytes long.
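
To make the four-level walk a bit more concrete, here is a small sketch in
Python (not kernel code) of how a 48-bit amd64 virtual address splits into
four 9-bit table indexes plus a 12-bit offset into the 4096-byte page. The
function name and the L4..L1 labels are made up for illustration.

```python
PAGE_SHIFT = 12   # 4096-byte pages -> 12 offset bits
INDEX_BITS = 9    # 512 entries per table on amd64
LEVELS = 4

def split_vaddr(va):
    """Return ([L4, L3, L2, L1] indexes, page offset) for a virtual address."""
    va &= (1 << 48) - 1                      # keep the 48 translated bits
    offset = va & ((1 << PAGE_SHIFT) - 1)
    indexes = []
    for level in range(LEVELS, 0, -1):       # top level first
        shift = PAGE_SHIFT + INDEX_BITS * (level - 1)
        indexes.append((va >> shift) & ((1 << INDEX_BITS) - 1))
    return indexes, offset

# Example address only; any value works.
print(split_vaddr(0xffffffff80200000))
```

Each of the four indexes selects one entry at one level, which is why a
translation that misses the caches costs four lookups.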

By using large pages, we are actually short-circuiting the iteration,
so that it looks like a 3-layer tree. Therefore, accessing an area that
is in a large page is a little faster; but more importantly, large
pages are 2MB long, so the CPU needs only one cache entry to cover a
much larger area, which makes the cache less likely to be "saturated".
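
As a rough illustration of the cache-pressure argument, the sketch below
counts how many translations are needed to cover a region with 4096-byte
pages versus 2MB pages. The 8MB region size is a made-up example, not the
size of any real kernel segment, and the region is assumed to start on a
page boundary.

```python
SMALL_PAGE = 4096
LARGE_PAGE = 2 * 1024 * 1024

def entries_needed(region_size, page_size):
    """Pages (hence cached translations) needed to cover an aligned region."""
    return -(-region_size // page_size)      # ceiling division

region = 8 * 1024 * 1024                     # hypothetical 8MB segment
print(entries_needed(region, SMALL_PAGE))    # 2048
print(entries_needed(region, LARGE_PAGE))    # 4
```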

The drawback of large pages is that we lose permission granularity. If
text and rodata are put in the same large page, they'll share the same
entries in the tree, and therefore will have the same permission.
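
The loss of granularity can be sketched as follows: given a few segments
with the permissions they want, find the 2MB pages that would have to carry
more than one permission set. The segment addresses below are invented for
the example, and the helper name is hypothetical.

```python
LARGE_PAGE = 2 * 1024 * 1024
MB = 1024 * 1024

def mixed_large_pages(segments):
    """segments: list of (name, start, end, perms), end exclusive.
    Return {large_page_index: perms} for large pages touched by
    segments that want different permissions."""
    perms_by_page = {}
    for _name, start, end, perms in segments:
        first = start // LARGE_PAGE
        last = (end - 1) // LARGE_PAGE
        for page in range(first, last + 1):
            perms_by_page.setdefault(page, set()).add(perms)
    return {p: s for p, s in perms_by_page.items() if len(s) > 1}

# rodata packed right after text: they share the large page at index 1
packed = [("text", 0, 3 * MB, "RX"), ("rodata", 3 * MB, 4 * MB, "R")]
# rodata moved to the next 2MB boundary: no large page is shared
aligned = [("text", 0, 3 * MB, "RX"), ("rodata", 4 * MB, 5 * MB, "R")]
print(mixed_large_pages(packed))
print(mixed_large_pages(aligned))
```

In the packed layout one large page would need both RX and R, so either the
segments share a permission or that area falls back to normal pages.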

By moving rodata into the second chunk, I just intended to dedicate
large pages exclusively to the text segment, and keep rodata out of them
in order to later grant it its own large pages with its own permissions.

What I could have done is simply take rodata out of the first chunk,
and map it independently as a chunk of its own. However, large pages
are handled at the x86 level, which is shared between amd64 and i386.
I wanted to split the segments exclusively on amd64 first. So I did
two splits:
 - the split you're referring to, at the x86 level, that temporarily
   mapped rodata with normal pages and with RWX (which is bad); this
   split affects both amd64 and i386
 - a second right after the first, this time at the amd64 level, to map
   rodata independently as a chunk with only R; this split affects only
   amd64

i386 is still affected by the first split: right now, on i386, the
rodata segment is in the second chunk, and is RWX.

In my last point, I enabled large page mappings at the x86 level for
rodata and data+bss; both amd64 and i386 benefit from it. amd64 is
close to as optimized as you can get. i386 is not (yet), because of
alignment issues.

For i386, it's a bit more complex; the large page size depends on
whether PAE is enabled, and therefore the segments need to be aligned
differently in each case.
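
A minimal sketch of what that means for alignment, assuming the usual x86
large page sizes (2MB with PAE, 4MB without). The helper names and the
example address are hypothetical, not taken from the kernel sources.

```python
MB = 1024 * 1024

def large_page_size(pae):
    """Large page size on i386: 2MB with PAE, 4MB without."""
    return (2 if pae else 4) * MB

def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)

segment_start = 0x00500000                   # made-up segment start, 5MB
print(hex(align_up(segment_start, large_page_size(pae=True))))
print(hex(align_up(segment_start, large_page_size(pae=False))))
```

The same segment start lands on a different boundary in each case, so the
layout has to account for both configurations.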

There are still three huge problems with the mappings, which were
already there when I came in.
