Port-vax archive
VAX floating point formats & tuning GCC for CPU models (was Re: Some more patches for GCC...)
> On Apr 1, 2016, at 14:42, Greg Stark <stark%mit.edu@localhost> wrote:
>
> On Fri, Apr 1, 2016 at 9:20 PM, Jake Hamby <jehamby420%me.com@localhost> wrote:
>> The reason I'd like to see better support for compiling as G_float (you can do it now with "-mg", but all of the libraries that you link with will crash or return bad results for any "double" values that you pass or receive, because the bits are interpreted differently) is that VAX floating point also does not support denormalized numbers, which means that there's an abrupt gap between +/-2.9e-39 and 0 (there's a hidden 1 bit, even when the mantissa is all 0). With IEEE math, there's special support for this case (see https://en.wikipedia.org/wiki/Denormal_number), which VAX doesn't support at all in any mode.
>
>
> It would also be really handy to have -msoft-float support with IEEE
> floating point implemented in software. This might also necessitate
> some libc function variants compiled with -msoft-float being
> available.
>
> A big part of the reason to have a VAX port at all is to have a
> non-IEEE fp platform to test on. But the problem is that while we want
> to be sure our internal code doesn't depend on IEEE fp we can't really
> run our regression tests on VAX at all because there are so many
> user-visible floating point operations in the tests that all fail. I
> would love to compile just the user visible float data type using
> soft-math and use D_FLOAT or G_FLOAT for all the internal floating
> point arithmetic.
Hi all,
Here's what I've been thinking about the VAX floating point formats and tuning GCC for different CPU models. It's been a very interesting journey reading about the different systems in the old Digital Technical Journals. BTW, here are links to a site with scans of the useful ones:
Vol. 1 Issue 1 (Aug. 1985) - all about the VAX 8600 system (with a beautiful cover photo of the metal heat sinks on the ECL arrays)
http://bitsavers.trailing-edge.com/pdf/dec/dtj/dtj_v01-01_aug1985.pdf
Vol. 1 Issue 2 (Mar. 1986) - the MicroVAX II system (VAX 78032 CPU + 78132 FPU, precursor to CVAX, first to emulate some VAX instructions)
http://bitsavers.trailing-edge.com/pdf/dec/dtj/dtj_v01-02_mar1986.pdf
Vol. 1 Issue 7 (Aug. 1988) - the VAX 6200 series and CVAX (CPU, FPU, and support chips)
http://bitsavers.trailing-edge.com/pdf/dec/dtj/dtj_v01-07_aug1988.pdf
Vol. 2 Issue 2 (Spring 1990) - VAX 6000 Model 400 (REX520 CPU, supports the rare vector processor module)
http://bitsavers.trailing-edge.com/pdf/dec/dtj/dtj_v02-02_1990.pdf
Vol. 4 Issue 3 (Summer 1992) - NVAX systems (VAX 6000/600, VAX 4000/{100,400,500,600}, MicroVAX 3100/90, VAXstation 4000/90)
http://bitsavers.trailing-edge.com/pdf/dec/dtj/dtj_v04-03_1992.pdf
Vol. 4 Issue 4 (Special Issue 1992) - Alpha AXP architecture & systems (& VAX compatibility)
http://bitsavers.trailing-edge.com/pdf/dec/dtj/dtj_v04-04_1992.pdf
One reason I like the NVAX is that it's the final model in the series and the one that most resembles a desktop x86 CPU of that era: most instructions execute in 3 or 4 cycles, plus memory access time, with a few operations, like division or multiplication into a 64-bit product, taking 30 or 40 cycles. It's interesting that the very next issue of DTJ after the NVAX issue was all about the Alpha, the chip that would replace it. Their prediction that the Alpha had a potential life of 25 years would probably have been accurate if DEC/Compaq hadn't ended the project (in favor of Itanium, a chip that fared much worse for everyone concerned and also isn't any fun to program).
The issues I linked to represent a good cross-section of useful models to consider supporting today. As far as FP formats go, only the VAX-11/780 and perhaps a few older models had accelerated D_float support but not G_float. All of the others either had both D & G accelerated in hardware, or emulated both in software. So there are three different targets, in descending order of interest:
1) models with D and G float in hardware - includes VAX 8600 series, CVAX, NVAX, MicroVAX II w/ FPU.
2) models with no FPA - includes MicroVAX II or later CVAX models without FP chip: does anyone on the list care about NetBSD on these?
3) models with D but not G float in hardware - VAX-11/780, others? - same question: does anyone care about NetBSD on these?
My attitude is that if people are interested in NetBSD on VAXen without FPA chips, then it would make sense to have a build with -msoft-float. I know this is the case for m68k, PPC, and a few other archs, where supporting the soft-FP case is very common because so many of the systems people actually own lack FPUs, but how common is that for NetBSD/vax?
I would *really* love to switch NetBSD over from D to G format, because a lot of NetBSD packages are failing purely due to the limited exponent range. IOW, every time something fails to build because (1.0e99) overflows the range, or (1.0e-100) rounds down to zero (both real examples that I saw yesterday when compiling some benchmarks), those are failures due to limitations of the D_FLOAT format, and not of VAX FP in general. And I don't think they're portability bugs in the apps, either. Even on VMS, G_float is the default format for the C/C++ compiler, and the Alpha supports G_ and F_float in hardware, but not D_float. Those are good indications that DEC and their customers considered D_float something of a legacy format.
So the advantages of switching formats include a much easier time supporting packages, and the knowledge that the accuracy of calculations will be much closer to that of IEEE FP systems because of the additional 3 bits of exponent. Right now, VAX doubles are hardly any better than the 32-bit format in this respect: they can't handle the extended exponent range that programs expect, especially when those programs use constants that are known to sit close to infinity, or close to 0, for IEEE doubles and are expected not to overflow or round down to zero.
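To make the exponent-range problem concrete, here's a tiny test program. The D_float and G_float limits in the comments are the usual published ranges; DBL_MAX/DBL_MIN come from <float.h> and reflect whichever double format the compiler is targeting (-mg selects G_float), so treat the printed values as illustrative rather than something I've verified against every libc:

/* Minimal sketch of why D_float's 8-bit exponent breaks code that
 * assumes an IEEE-style double range.  All limits are approximate. */
#include <stdio.h>
#include <float.h>

int main(void)
{
    /* D_float: 8-bit exponent, range roughly 2.9e-39 .. 1.7e38.
     * G_float: 11-bit exponent, range roughly 5.6e-309 .. 9.0e307. */
    printf("DBL_MAX = %g\n", DBL_MAX);
    printf("DBL_MIN = %g\n", DBL_MIN);

    /* Under D_float a constant like 1.0e99 isn't representable at all
     * (GCC complains that it exceeds the range of double), and 1.0e-100
     * collapses to 0.0; both are comfortably in range for G_float or
     * for IEEE binary64. */
    printf("1.0e-100 as a double = %g\n", 1.0e-100);
    return 0;
}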
It's also worth noting that all of the VAX CPUs that support D and G float seem to be equally optimized for both (although the CVAX and a few other CPUs apparently used interesting algorithms that led to faster or slower speeds for multiplication, division, and even bit shifting depending on the particular *values* of the inputs, while others didn't). So the real downside is figuring out how to safely transition NetBSD users from one format to the other without all the apps breaking because they're linking against libraries built for the other format. Also, nobody wants their build times to increase by 25% because all of the libraries now have to be built with two different sets of compiler flags (e.g. 32-bit compat libs on 64-bit systems, or systems with soft-float and hard-float libs).
On the subject of benchmarking and GCC optimizations for different platforms, I think I've made some good progress on what items are interesting to test and what the final result should look like. The i386 backend is really useful for comparison: it has a "struct processor_costs" that I'll probably use as a template. The values most useful for tuning would include:
// cost for different math ops
// cost for bitfield, bitshift, sign extension, int/FP conversion
// cost for register move
// cost for memory move
// cost for branches
// prefer 2 or 3-operand forms? (VAX 8600 prefers 2 w/ register in second operand, NVAX prefers 3 w/ dest register != others, others?)
// cost of complex specifiers (NVAX can decode one complex specifier per cycle, which may be a bottleneck compared to others)
// MOVE_RATIO & CLEAR_RATIO (max length of inline quad-at-a-time memcpy before switching to memcpy builtin or lib call)
// is cmpc3/cmpc5 available? (useful for inline strcmp(), emulated on MicroVAX II, apparently added back to CVAX, hopefully also on NVAX)
// speed of movc3/movc5 vs. other memcpy methods
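To make that concrete, here's roughly the shape I have in mind, loosely modeled on the i386 backend's struct processor_costs. The struct name, the fields, and especially the NVAX numbers below are placeholders of mine (guesses based on the DTJ articles), not actual GCC code or measured values:

/* Hypothetical per-model cost table for the VAX backend. */
struct vax_processor_costs {
    int add;            /* cost of ADDL2/ADDL3 */
    int mult;           /* cost of MULL2/MULL3 */
    int divide;         /* cost of DIVL2/DIVL3 */
    int shift;          /* cost of ASHL */
    int fp_add;         /* ADDF/ADDD/ADDG */
    int fp_mult;        /* MULF/MULD/MULG */
    int fp_div;         /* DIVF/DIVD/DIVG */
    int reg_move;       /* register-to-register MOVL */
    int mem_load;       /* load from a memory operand */
    int mem_store;      /* store to a memory operand */
    int branch;         /* taken conditional branch */
    int complex_spec;   /* extra cost of a complex/indexed operand specifier */
    int move_ratio;     /* max inline block-move length before calling movc3/memcpy */
};

/* Guessed NVAX values: simple ops 3-4 cycles, multiply/divide much
 * slower, one complex specifier decoded per cycle. */
static const struct vax_processor_costs nvax_costs = {
    1, 8, 30, 1,        /* add, mult, divide, shift */
    4, 5, 30,           /* fp_add, fp_mult, fp_div */
    1, 3, 3, 2,         /* reg_move, mem_load, mem_store, branch */
    1, 8,               /* complex_spec, move_ratio */
};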
I'm more interested in correctness first, and then performance, but the good news is that it's definitely not a moving target, and there are only a limited number of optimizations worth pursuing. GCC has builtin code to do basically the right thing if only we give it the correct values for the relative costs of different kinds of operations. Another piece of good news is that even the most complex VAX isn't superscalar, so there's no need or benefit in writing an instruction scheduler that tries to keep the pipelines full by reordering instructions. Most VAX CPUs are pipelined, but I suspect the biggest benefits would come from preferring certain forms of instructions over others, which is where the relative cost tables come in.
So I'll try to come up with a suitable benchmark program that runs a representative set of microbenchmarks to calculate the basic ratios that could be inserted into GCC in place of the hardcoded values that are there now. I'll also try to figure out the rest of the bugs and issues with the ELF symbol loading, which all seems both confusing and suboptimal right now. But I think I'm asking the right questions and focusing on the right things to speed up. There really isn't anything horribly wrong with the current VAX backend except that various bugs have crept in over the years as different parts of it have been hacked on by different people who didn't quite understand the implications for other parts of it.
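For the microbenchmarks, I'm picturing something along these lines (purely a sketch: the iteration count, the use of clock(), and the fact that loop overhead is folded into both timings would all need refinement per machine, and the volatile accumulator is only there to keep GCC from optimizing the loops away):

#include <stdio.h>
#include <time.h>

#define ITERS 5000000L

int main(void)
{
    volatile unsigned long acc = 1;
    clock_t t0, t1, t2;
    long i;

    t0 = clock();                   /* baseline: integer add */
    for (i = 0; i < ITERS; i++)
        acc = acc + (unsigned long)i;
    t1 = clock();                   /* operation under test: integer multiply */
    for (i = 0; i < ITERS; i++)
        acc = (acc * (unsigned long)i) | 1;   /* |1 keeps the product from sticking at 0 */
    t2 = clock();

    printf("mult/add cost ratio: %.2f\n",
        (double)(t2 - t1) / (double)(t1 - t0));
    return 0;
}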
-Jake