v5: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02793.html

Changes since v5:
- Rebase on rth/tcg-next-for-4.0
- Use QEMU_FLATTEN instead of __attribute__((flatten))
- Merge rth's cleanups (thanks!). With this, we now use a union to
  hold {float|float32} or {double|float64} types, which gets
  rid of most macros. I added a few optimizations (i.e. likely
  hints in some branches, and not using temp variables to hold
  the result of fpclassify) to roughly match (and sometimes
  surpass) v5's performance.
- float64_sqrt: use fpclassify, which gives a 1.5x speedup.
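
For readers who have not looked at the patches, here is a minimal sketch
of the union idea described above. It is not the code from the series:
every name prefixed with sketch_ is made up for illustration, the
soft-float type is stood in for by a plain uint64_t, and it assumes the
host double is an IEEE 754 binary64. It shows one object being viewed
either as soft-float bits or as a host double, fpclassify() used
directly in the condition, and an unlikely hint on the bail-out branch.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Branch hint; a no-op on compilers without __builtin_expect. */
#if defined(__GNUC__) || defined(__clang__)
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define sketch_unlikely(x) (x)
#endif

/* Hypothetical stand-in for a soft-float float64: the raw bit pattern. */
typedef uint64_t sketch_float64;

/* One object viewed as soft-float bits (s) or as a host double (h). */
typedef union {
    sketch_float64 s;
    double h;
} sketch_union_float64;

/*
 * True when x is zero or normal. fpclassify() is called directly in
 * the condition instead of storing its result in a temporary.
 */
static bool sketch_is_zero_or_normal(double x)
{
    return fpclassify(x) == FP_NORMAL || fpclassify(x) == FP_ZERO;
}

/*
 * Try to add two float64 bit patterns on the host FPU.
 * Returns true and stores the result bits in *out on the fast path.
 * Returns false when the caller should take the soft-float path.
 */
static bool sketch_add_hard(sketch_float64 a, sketch_float64 b,
                            sketch_float64 *out)
{
    sketch_union_float64 ua, ub, ur;

    assert(out != NULL);
    ua.s = a;
    ub.s = b;
    if (sketch_unlikely(!sketch_is_zero_or_normal(ua.h) ||
                        !sketch_is_zero_or_normal(ub.h))) {
        return false;   /* NaN, infinity or subnormal input */
    }
    ur.h = ua.h + ub.h;
    if (sketch_unlikely(!sketch_is_zero_or_normal(ur.h))) {
        return false;   /* overflowed or subnormal result */
    }
    *out = ur.s;
    return true;
}
```

The bool return value is only a convenience for this sketch; it marks
the point where a real implementation would jump to its soft-float
fallback.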

This series introduces no regressions to fp-test. You can test
hardfloat by passing "-f x" to fp-test (so that the inexact flag
is set before each operation) and using round-to-nearest-even
rounding (fp-test's default). Note that hardfloat does not affect
operations with other rounding modes.
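
The two conditions above (inexact flag already set, nearest-even
rounding) can be pictured as a single gate in front of the host-FPU
path. The following is a rough sketch only, not the code from the
series: the struct, enum and function names are hypothetical, and the
flag value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exception flag bit; the value is arbitrary. */
enum { SKETCH_FLAG_INEXACT = 1u << 0 };

/* Hypothetical rounding modes; only nearest-even takes the fast path. */
enum sketch_round {
    SKETCH_ROUND_NEAREST_EVEN,
    SKETCH_ROUND_TO_ZERO
};

/* Hypothetical stand-in for the guest's floating-point status. */
struct sketch_status {
    uint8_t exception_flags;
    enum sketch_round rounding_mode;
};

/*
 * The host FPU is eligible only when the guest's inexact flag is
 * already set and the rounding mode is round-to-nearest-even.
 * Anything else goes through soft-float.
 */
static bool sketch_can_use_host_fpu(const struct sketch_status *s)
{
    assert(s != NULL);
    return (s->exception_flags & SKETCH_FLAG_INEXACT) &&
           s->rounding_mode == SKETCH_ROUND_NEAREST_EVEN;
}
```

With a gate of this shape, an operation under any other rounding mode,
or with the inexact flag clear, never reaches the host FPU, which is
why "-f x" is needed to exercise hardfloat in fp-test.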

Perf numbers for fp-bench running on several host machines are in
each commit log; numbers for several benchmarks (NBench, SPEC06fp)
are in the last patch's commit log. These numbers are a bit
outdated (they're from v2 or so), but I've decided to keep them
because they give a good idea of the speedups to expect, and I don't
have time to re-run them =)

I did re-run the numbers for sqrt and cmp, though, since the
implementation has changed quite a bit since v5. I didn't
re-run these on AArch64 and PPC hosts due to lack of time,
but I doubt they'd change significantly.

You can fetch this series from:
  https://github.com/cota/qemu/tree/hardfloat-v6

Thanks,
Emilio