tech-userlevel archive
Re: Small ld.elf_so speed up
On Thu, 1 Apr 2010 19:01:26 +0200
Joerg Sonnenberger <joerg%britannica.bec.de@localhost> wrote:
> On Thu, Apr 01, 2010 at 05:45:51PM +0100, Sad Clouds wrote:
> > So, if you have an application that is linked to a total of 10
> > shared libraries. Each of those libraries exports 50 symbols. The
> > application references all of those symbols, that is 10 * 50 = 500
> > symbols. This then increases load time.
>
> No. It doesn't matter how many symbols a library *exports*. The
> question is, how many relocations have to be resolved. That is, how
> many undefined symbols are present. There is one exception here in
> that global symbols in the same DSO may take some short cuts if they
> are not exported, but that is not relevant for runtime linker
> overhead.
>
> [snip]
> > I don't know how dynamic linker is implemented, but I've been
> > developing some of my packages/libraries as described above. I also
> > added calls to pthread_mutex_lock() to make package init() and
> > destroy() functions thread-safe.
>
> You have essentially reimplemented what the dynamic linker does. Just
> in a more expensive way. It is more expensive in terms of per-call
> overhead as indirect calls can on most CPUs be considered as
> mispredicted branch. It has larger startup overhead, because the
> relocations can't be done lazy.
>
> Joerg
Joerg, I ran a few tests, and they seem to indicate that declaring
functions 'static' and then exporting them via function pointers is not
more expensive; quite the opposite.
I built two versions of a shared library and a main program.
The first version is the normal way of letting the linker resolve all
undefined symbols:
./libtest.so.0
This shared library has 10000 simple functions of the form:
int fn_0(int n) { return n++; }
...
int fn_9999(int n) { return n++; }
./test_main
This main program is linked against the above library and makes 10000
function calls of the form:
fn_0(1);
...
fn_9999(1);
The second version declares all functions 'static' and exports them via
function pointers, the way I described in my previous email:
./libtest2.so.0
This shared library has 10000 simple functions of the form:
static int priv_fn_0(int n) { return n++; }
...
static int priv_fn_9999(int n) { return n++; }
These are then exported through the following assignments at run time:
(*pkg)->fn_0 = &priv_fn_0;
...
(*pkg)->fn_9999 = &priv_fn_9999;
./test2_main
This main program is linked against the above library and makes 10000
function calls of the form:
test2_init(&test2);
test2->fn_0(1);
...
test2->fn_9999(1);
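For readers who have not seen the previous email, the second version can
be sketched roughly as follows. This is only a minimal illustration cut
down to two functions: the struct name test2_pkg, the static table
pkg_table, the int return value of test2_init() and the function bodies
are assumptions made for the example, not the actual test code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table type: one function pointer per exported function. */
struct test2_pkg {
    int (*fn_0)(int);
    int (*fn_1)(int);
};

/* Internal linkage: these names never reach the dynamic symbol table. */
static int priv_fn_0(int n) { return n; }
static int priv_fn_1(int n) { return n + 1; }

/* Storage for the table, owned by the library (an assumption). */
static struct test2_pkg pkg_table;

/*
 * The only function the library has to export. It hands the caller a
 * pointer to the table and fills in the function pointers at run time.
 */
int test2_init(struct test2_pkg **pkg)
{
    assert(pkg != NULL);
    *pkg = &pkg_table;
    (*pkg)->fn_0 = &priv_fn_0;
    (*pkg)->fn_1 = &priv_fn_1;
    return 0;
}
```

The main program would then declare a struct test2_pkg *test2, call
test2_init(&test2) once, and make every later call through the table,
for example test2->fn_0(1). Only test2_init itself is left for the
dynamic linker to resolve in the main program.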
Below are some statistics for both programs:
p3smp$ ls -lh libtest.so.0 test_main
-rwxr-xr-x 1 rom wheel 617K Apr 1 21:13 libtest.so.0
-rwxr-xr-x 1 rom wheel 936K Apr 1 21:17 test_main
p3smp$ size ./test_main
   text    data     bss     dec     hex filename
 673380   40280      36  713696   ae3e0 ./test_main
p3smp$ size ./libtest.so.0
   text    data     bss     dec     hex filename
 391952     116       0  392068   5fb84 ./libtest.so.0
p3smp$ nm test_main | grep U | wc -l
10003
p3smp$ time ./test_main
0.02 real 0.02 user 0.00 sys
----------------------------------------------------------
p3smp$ ls -lh libtest2.so.0 test2_main
-rwxr-xr-x 1 rom wheel 480K Apr 1 21:10 libtest2.so.0
-rwxr-xr-x 1 rom wheel 181K Apr 1 21:15 test2_main
p3smp$ size ./test2_main
   text    data     bss     dec     hex filename
 181676     284      40  182000   2c6f0 ./test2_main
p3smp$ size ./libtest2.so.0
   text    data     bss     dec     hex filename
 200452     160       0  200612   30fa4 ./libtest2.so.0
p3smp$ nm test2_main | grep U | wc -l
4
p3smp$ time ./test2_main
0.00 real 0.00 user 0.00 sys
As you can see above:
1. libtest2.so.0 is about 137K smaller on disk (480K vs 617K)
2. test2_main is 755K smaller (181K vs 936K, about 5 times smaller)
3. test2_main has only 4 undefined symbols, compared to 10003 for
test_main
4. test2_main load/run time is shorter (0.00 vs 0.02 seconds real)