tech-toolchain archive


alpha stack alignment: 8- or 16-byte



Hi,

I've found that some compiler-related issues on alpha are due to an
inconsistency in stack alignment between NetBSD and GCC: whereas we
assume that sp is aligned to 8-byte boundaries, GCC assumes 16-byte.
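
To make the mismatch concrete, here is a minimal sketch. The helper names
and the sp values are made up for illustration and are not taken from the
kernel or from GCC. Note that GCC's STACK_BOUNDARY is given in bits, so
128 means 16 bytes and 64 means 8 bytes:

```c
#include <assert.h>
#include <stdint.h>

/* STACK_BOUNDARY is expressed in bits; convert it to bytes. */
uintptr_t
boundary_bytes(unsigned stack_boundary_bits)
{
	return stack_boundary_bits / 8;
}

/*
 * Hypothetical helper: nonzero if sp is a multiple of align bytes.
 * align must be a power of two.
 */
int
sp_is_aligned(uintptr_t sp, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (sp & (align - 1)) == 0;
}
```

An sp such as 0x1ffff8 passes the 8-byte check but fails the 16-byte one,
which is the case where code compiled with a 16-byte assumption can go
wrong.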

By changing STACK_BOUNDARY from 128 to 64 for GCC:

----
Index: external/gpl3/gcc/dist/gcc/config/alpha/netbsd.h
===================================================================
RCS file: /home/netbsd/src/external/gpl3/gcc/dist/gcc/config/alpha/netbsd.h,v
retrieving revision 1.10
diff -p -u -r1.10 netbsd.h
--- external/gpl3/gcc/dist/gcc/config/alpha/netbsd.h	11 Apr 2021 00:02:13 -0000	1.10
+++ external/gpl3/gcc/dist/gcc/config/alpha/netbsd.h	5 Jul 2021 07:26:20 -0000
@@ -23,6 +23,9 @@ along with GCC; see the file COPYING3.
 	NETBSD_OS_CPP_BUILTINS_ELF();		\
     } while (0)
+/* NetBSD aligns stack to 8-byte boundaries. */
+#undef STACK_BOUNDARY
+#define STACK_BOUNDARY 64
 /* NetBSD doesn't use the LANGUAGE* built-ins. */
 #undef SUBTARGET_LANGUAGE_CPP_BUILTINS
----

userland works just fine without the hacks for jemalloc and GDB:

http://gnats.netbsd.org/54307
http://gnats.netbsd.org/56153

as well as without the one for pkgsrc/devel/gettext-tools:

http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/devel/gettext-tools/hacks.mk

(I've confirmed that there are no regressions in the ATF tests.)

Unlike other platforms, there is no public System V ABI documentation for
alpha. So, there are two options:

(1) Align sp to 8-byte boundary: keep our kernel and libraries as well,
    and fix GCC, or

(2) Align sp to 16-byte boundary: fix our kernel and libraries.
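
For (2), the places that set up sp would have to round it to a 16-byte
boundary instead of an 8-byte one. A rough sketch of that rounding follows.
The function name is hypothetical and this is not the actual kernel code;
it assumes a stack that grows downward and a power-of-two alignment:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round sp down to the given alignment in bytes.
 * Rounding down keeps sp inside the already-reserved stack area when
 * the stack grows toward lower addresses.
 */
uintptr_t
align_sp_down(uintptr_t sp, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return sp & ~(align - 1);
}
```

With align = 8, an sp of 0x1ffff8 is left as-is; with align = 16 it
becomes 0x1ffff0.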

While (1) is easier and carries no risk of breaking anything, (2) should be
better for performance. The "Alpha Architecture Handbook", Sec. A.3.1, says:

| Data PSECTs should be at least octaword aligned, so that aggregates
| (arrays, some records, subroutine stack frames) can be allocated on
| aligned octaword boundaries to take advantage of any implementations
| with aligned octaword data paths, and to decrease the number of cache
| fills in almost all implementations.
|
| Aggregates (arrays, records, common blocks, and so forth) should be
| allocated on at least aligned octaword boundaries whenever language
| rules allow. In some implementations, a series of writes that completely
| fill a cache block may be a factor of 10 faster than a series of writes
| that partially fill a cache block, when that cache block would give a
| read miss. This is true of write-back caches that read a partially
| filled cache block from memory, but optimize away the read for
| completely filled blocks.

Thoughts?

Thanks,
rin

