Subject: Shrinking NetBSD - deep final distribution re-linker
To: None <tech-userlevel@netbsd.org>
From: Ian Zagorskih <ianzag@megasignal.com>
List: tech-userlevel
Date: 10/19/2004 16:04:16
Sorry for the rather confusing subject; I just don't know how to express my
ideas exactly :) I'm also not sure this is the right place to post this message.
Pre-history. We're using NetBSD in various embedded devices, and sometimes
data storage is limited (usually it's some kind of flash). Today, on the base
of the standard NetBSD distribution, I can freely build target installations
of relatively small size, about 8 MB. This includes a non-stripped,
non-compressed kernel (about 1 MB) and some subsets of /bin, /sbin, /usr/bin
and so on. The cut is made on the basis of "what do I need for a simple
terminal server, to administer it and to write/run simple scripts". Right now
it all works just fine :)
Well, I can't say I really need to shrink the existing NetBSD installations
for our projects. Flash is relatively cheap today, and if you use USB/Compact
Flash/Disk On Module cards, the smallest card you can get is most likely at
least 8 MB. On the other hand, I feel that theoretically I can make a much
smaller but fully operational NetBSD installation. Just for the sake of pure
art :)
Among the largest files in an installation are libc and some other shared
libraries. I have a feeling that a huge part of libc is not actually used by
any application, so it just wastes space. If I removed that part, nothing
would break and I would save space.
Sure, I can build a custom libc from source, switching off this or that
source subtree. But from my point of view this way brings some minor and
major technical problems, starting with "what to remove?" and ending with how
to update a custom source tree from NetBSD's CVS. I would prefer to leave the
source tree alone, unmodified, as I fetched it from the vendor.
So why not go the opposite way and shrink the binary libc itself? I have
a set of dynamically linked executables in ELF format. I can read the table of
imported symbols from each of them and build a common table of used symbols.
Next I can search for these symbols in some predefined set of shared
libraries (resolve them). Say, for example, 100 symbols are found in libc
while libc itself contains 500 exported symbols. From my point of view,
nothing stops me from manually "re-linking" libc and dropping the 400 unused
symbols, so that libc contains only the 100 symbols actually required. In the
same way I can "re-link" the other shared libraries used by my executables.
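
To make the idea above a bit more concrete, here is a rough Python sketch of
the "common table of used symbols" step. It is only an illustration under
some assumptions: it works on text in the style printed by `nm -D` (how that
text is obtained is left out), the names parse_nm and plan_relink are made up
for this example, and the sample symbol lists are invented, not taken from a
real NetBSD libc.

```python
import re

# One line of `nm -D` style output: optional hex address, a one-letter
# symbol type, then the symbol name (any @VERSION suffix is dropped).
NM_LINE = re.compile(r"^\s*(?:[0-9a-fA-F]+\s+)?([A-Za-z])\s+([^\s@]+)")


def parse_nm(text):
    """Split nm-style output into (undefined, defined) sets of names."""
    undefined, defined = set(), set()
    for line in text.splitlines():
        m = NM_LINE.match(line)
        if not m:
            continue
        kind, name = m.groups()
        if kind == "U":            # imported by this file
            undefined.add(name)
        elif kind in "TDBRW":      # text, data, bss, read-only data, weak
            defined.add(name)
    return undefined, defined


def plan_relink(exe_outputs, lib_outputs):
    """Both arguments map a file name to its nm-style output.

    Returns ({library: symbols to keep}, symbols no library provides).
    """
    needed = set()
    for text in exe_outputs.values():
        needed |= parse_nm(text)[0]

    keep, unresolved = {}, set(needed)
    for lib, text in lib_outputs.items():
        exported = parse_nm(text)[1]
        keep[lib] = needed & exported
        unresolved -= exported
    return keep, unresolved


# Invented sample data, for illustration only.
exes = {
    "sh": "                 U printf\n                 U malloc\n",
    "ls": "                 U printf\n                 U opendir\n"
          "                 U kvm_open\n",
}
libs = {
    "libc.so": "0000000000001000 T printf\n0000000000002000 T malloc\n"
               "0000000000003000 T opendir\n0000000000004000 T qsort\n",
}
keep, unresolved = plan_relink(exes, libs)
print(sorted(keep["libc.so"]))  # ['malloc', 'opendir', 'printf']
print(sorted(unresolved))       # ['kvm_open']
```

Note that this counts only symbols imported directly by the executables. A
function that is kept may itself call other functions inside the same
library, so a real tool would presumably have to follow those references too
before dropping anything.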
Well, I hope you get the idea I'm talking about :) At the end of this
operation I should have a set of custom shared libraries which contain only
the code that is used.
Sure, there are at least two problems I see ATM:
1) The custom libs quite likely won't work with new apps, which will probably
require missing symbols. So I need to re-link the final set every time I
change it.
2) I don't see a way to determine which symbols are accessed via
dlopen()/dlsym(). Let's forget about them for the moment.
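
The two problems above could be handled roughly like this, again only as a
sketch: the function names and symbol lists are hypothetical, and the list of
dlsym() names is assumed to be maintained by hand, since it cannot be read
from the import tables.

```python
def build_keep_set(static_imports, dlsym_names=()):
    """Symbols to keep: imports found in the executables plus a
    hand-maintained list of names looked up through dlsym()."""
    return set(static_imports) | set(dlsym_names)


def missing_for(app_imports, keep_set):
    """Imports of a new application that the trimmed libraries lack.

    A non-empty result means the libraries must be re-linked."""
    return sorted(set(app_imports) - keep_set)


# Invented sample data, for illustration only.
keep_set = build_keep_set({"printf", "malloc"}, dlsym_names=["getpwnam"])
print(missing_for({"printf", "qsort"}, keep_set))     # ['qsort']
print(missing_for({"malloc", "getpwnam"}, keep_set))  # []
```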
My first question is: has anybody seen anything like what I'm talking about?
Some kind of "ELF cleanup toolkit", I'd say. The idea itself is quite obvious.
Please note that this is not like making a single executable from a set, or
making gzipped executables with custom startup code.
My second question is: where did I go wrong in my ideas? :) What kind of
problems will I eventually face when cutting down shared libraries as I
described above?
Well, if anybody is interested in using NetBSD as a base platform for
embedded designs in environments with limited resources, I would be glad to
discuss various technical ideas/anything.
Thanks all.
// wbr