On 09.12.2017 19:15, Robert Elz wrote:
> Date: Sat, 9 Dec 2017 15:46:42 +0100
> From: Kamil Rytarowski <n54%gmx.com@localhost>
> Message-ID: <d69bba40-7068-096a-4333-863800c10fe6%gmx.com@localhost>
>
> | However there exist programs in the basesystem that shadow libc
> | symbol routines as well,
>
> There is nothing wrong with that, in fact it is almost unavoidable,
> as programs need names to use, and libraries need names for functions
> they add later, and it is inevitable that they will clash from time to
> time.
>
> | for example ps(1):
> |
> | bin/ps/extern.h:void uname(struct pinfo *, VARENT *, enum mode);
>
> I suspect that the BSD ps command has had a uname() function since long
> before the Sys V (or Sys III or wherever it originated) was added to the
> BSD libc - this is a perfect example.
>
> To handle this kind of issue, the libc functions only get to be defined
> when the relevant header file is included, in this case <sys/utsname.h>
> which ps does not do, hence, it is perfectly entitled to have a function
> called uname if it wants, or a "struct utsname" if it really wanted to
> be perverse.
>
> | I'm going to rename the symbol routine names when I will hit them.
>
> There is nothing inherently wrong with that - they are just names after
> all, but it is the wrong solution, and one that would have no end.
>
> There could easily be a "usrname()" function added to libc next week,
> and the sanitizers could learn about it the week after, and then you're
> back with the exact same problem.
>
> The right way is for the sanitizers to learn which headers define the
> symbols that they want to take over, and only do that when the appropriate
> header is included (one way to do that would be to define shadow headers,
> so LLVM could define a sys/utsname.h and arrange for that one to be found
> ahead of /usr/include/sys/utsname.h when compiling. Then that header does
> the magic needed to get the LLVM version of uname() - otherwise it simply
> does nothing with a function called uname() if the program happens to have one.
>
> And the same for all the other symbols that it feels the need to take over
> from libc (or other libraries.)
>
> Whether that's done with actual new header files, or simply by recognising
> the system headers being included and then adding the appropriate magic
> only in those cases when it observes the system header being included is
> just an implementation detail.
>
> kre
>

The problem is not at the header-file (preprocessor) level, but at the
linker level. We are linking prebuilt .a / .so files with a target
application.

$ nm /usr/local/lib/clang/6.0.0/lib/netbsd/libclang_rt.msan-x86_64.a|grep uname
0000000000000000 B _ZN14__interception10real_unameE
0000000000000000 T __interceptor_uname
0000000000000000 T uname

We are intercepting uname(3) because behind the scenes it's a syscall
and we need to hardcode sanitizing rules (the length of the field that
is being initialized).

INTERCEPTOR(int, uname, struct utsname *utsname) {
  ENSURE_MSAN_INITED();
  int res = REAL(uname)(utsname);
  if (!res)
    __msan_unpoison(utsname, __sanitizer::struct_utsname_sz);
  return res;
}

In the MSan case we mark the buffer that the utsname pointer refers to
as initialized.

The impact on base system utilities is rather low so far (in sh(1) there
are 0 symbol clashes, in ksh(1) there is 1 clash), and renaming the
clashing symbols appears to be the least intrusive workaround.

I agree that this is not perfect, but I'm not aware of a better solution
that does not require a redesign and rewrite of the sanitizers.
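To make the two pieces concrete, here is a minimal sketch in plain C, not the actual sanitizer runtime. It assumes a hypothetical counter `unpoisoned_bytes` and a helper `fake_unpoison()` standing in for MSan's shadow-memory bookkeeping, a hypothetical `wrapped_uname()` playing the role of the interceptor, and a hypothetical `ps_uname()` showing what a renamed program-local routine could look like (its signature is simplified and is not the one from bin/ps/extern.h).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical stand-in for shadow memory: total bytes marked initialized. */
size_t unpoisoned_bytes;

/* Hypothetical replacement for __msan_unpoison(); it only counts bytes. */
void fake_unpoison(const void *p, size_t n)
{
    (void)p;
    unpoisoned_bytes += n;
}

/*
 * Plays the role of the interceptor: call the real uname(3) and, on
 * success, record that the whole struct was filled in by the kernel.
 */
int wrapped_uname(struct utsname *u)
{
    assert(u != NULL);
    int res = uname(u);
    if (res == 0)
        fake_unpoison(u, sizeof(*u));
    return res;
}

/*
 * Program-local routine under a renamed symbol. Because it is no longer
 * called "uname", it cannot collide at link time with a "T uname" symbol
 * exported by a prebuilt runtime library.
 */
int ps_uname(char *buf, size_t len, const char *user)
{
    return snprintf(buf, len, "%s", user);
}
```

A caller would pass a `struct utsname` to `wrapped_uname()` exactly as it would to `uname()`. The point of the sketch is only that the wrapper needs the exported name `uname` to take effect at link time, which is why the program's own routine is the one that gets renamed.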