On 08.05.2020 00:49, maya%NetBSD.org@localhost wrote:
> I am under the impression that (at least GCC) compilers will not emit
> intrinsic calls if they are guaranteed to be available on the target.
>
> This means libatomic needs to:
>
> - Optimize: we can runtime detect, which emitted code cannot do.
>
> Note that this means providing this libatomic will cause us to stop
> noticing 64-bit atomics used when compiling for -march=i486, our default
> for i386. We will stop upgrading those to -march=i586 and users will see
> a performance penalty.
>

Runtime detection could be done through ifunc (is it ready for NetBSD?).

The standard C/C++ way to detect whether atomic operations are real
(lock-free) is atomic_is_lock_free(). This is a feature, not a bug (as
claimed by some people). atomic_is_lock_free() can be overloaded in
libatomic to detect the CPU type at runtime and redirect either to the
real CPU intrinsic or to the fallback code.

My code is a proof of concept, showing that this is not terribly
complicated.

Possibly the best approach (unless someone is interested in inventing a
native libatomic) is to use the LLVM runtime (with prior upstreaming of
local patches) for MKLLVM=yes and the GCC runtime for MKGCC=yes.

> - Provide the fallback code
>
> And that it isn't necessary for libatomic to:
>
> - Attempt to cause the compiler to emit the intrinsic
>
> Which should make this code a lot simpler.
>
> ---
>
> +#define LOCK_FREE_ACTION(type) \
> +  return atomic_compare_exchange_strong_explicit( \
> +      (_Atomic(type) *)ptr, (type *)expected, *(type *)desired, success, \
> +      failure)
> +  LOCK_FREE_CASES();
> +#undef LOCK_FREE_ACTION
>
> This feels a bit offensive...
> Mentally I am reading this as "I don't believe the compiler will
> optimize out some scenarios in cleaner code with static inline, so
> forcing the optimization to happen via C preprocessor".
> I wonder if it's really true.
>
> The macros seem overly complicated to avoid generics, I don't think
> pre-C11 is a concern for us. I wonder if it can be simplified.
>

The libatomic macros generate functions such as:

__atomic_fetch_add_1
__atomic_fetch_add_2
__atomic_fetch_add_4
__atomic_fetch_add_8
__atomic_fetch_sub_1
__atomic_fetch_sub_2
__atomic_fetch_sub_4
__atomic_fetch_sub_8
__atomic_fetch_and_1
__atomic_fetch_and_2
__atomic_fetch_and_4
__atomic_fetch_and_8

and so on, which cuts down on repetition. I don't know how to write this
code differently.

If we compare the length of stdatomic.h with the length of atomic.c,
they are comparable, so it's not that bad.
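To illustrate the point about repetition, here is a minimal sketch of how
one macro can stamp out a family of size-suffixed functions like the ones
listed above. This is not the code from atomic.c: the sketch_* names and
the SKETCH_* macros are hypothetical, chosen so they cannot clash with the
real __atomic_* symbols, and the bodies simply forward to the C11
<stdatomic.h> generics instead of choosing between an intrinsic and a
fallback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: defines sketch_fetch_<name>_<n>() for one
 * operation and one operand size by forwarding to the C11 generic
 * atomic_fetch_<name>_explicit().
 */
#define SKETCH_FETCH_OP(name, n, type)					\
	static inline type						\
	sketch_fetch_##name##_##n(_Atomic(type) *ptr, type val,		\
	    memory_order order)						\
	{								\
		return atomic_fetch_##name##_explicit(ptr, val, order);	\
	}

/* Hypothetical helper: one operation, all four operand sizes. */
#define SKETCH_ALL_SIZES(name)						\
	SKETCH_FETCH_OP(name, 1, uint8_t)				\
	SKETCH_FETCH_OP(name, 2, uint16_t)				\
	SKETCH_FETCH_OP(name, 4, uint32_t)				\
	SKETCH_FETCH_OP(name, 8, uint64_t)

/* Two lines produce eight functions: sketch_fetch_add_1 ... sub_8. */
SKETCH_ALL_SIZES(add)
SKETCH_ALL_SIZES(sub)

/* Small self-check: returns the final counter value. */
static inline uint32_t
sketch_demo(void)
{
	_Atomic(uint32_t) counter = 40;
	uint32_t old;

	/* fetch_add returns the value held before the addition. */
	old = sketch_fetch_add_4(&counter, 2, memory_order_seq_cst);
	assert(old == 40);

	return atomic_load(&counter);
}
```

Writing the eight functions by hand would repeat the same body eight
times, and every further operation would add four more copies. That is
the repetition the macros are there to remove.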