"Maciej W. Rozycki" <[email protected]> writes:
> This issue was originally raised here:
>
> http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00863.html
>
> We have a shortcoming in GCC in that we only allow the use of half of the FP
> MADD instruction subset (MADD.fmt and MSUB.fmt) in the 64-bit/32-register
> mode (CP0.Status.FR == 1) on MIPS32r2 processors. Furthermore we never
> enable the other half (NMADD.fmt and NMSUB.fmt) on those processors.
> However this whole instruction subset is always available on MIPS32r2 FPUs
> regardless of the mode selected, just as it always has been on FPUs of the
> 64-bit ISA line from MIPS IV up.

Hmm, this was discussed here:

    http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00488.html
    http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00492.html

The footnote for COP1X instructions on page 12 of volume 1 of the MIPS32 ISA
(v2.50) says:

    1. In Release 1 of the Architecture, these instructions are legal only
       with a MIPS64 processor with 64-bit operations enabled (they are, in
       effect, actually MIPS64 instructions). In Release 2 of the Architecture,
       these instructions are legal with either a MIPS32 or MIPS64 processor
       _which includes a 64-bit floating point unit_.

(my emphasis). "which" rather than "that" makes this a bit ambiguous,
but various comments in the rest of the manual imply that MIPS32r2 allows
an implementation choice between 32-bit and 64-bit FPUs. E.g. page 8 says:

    Support for 64-bit coprocessors with 32-bit CPUs: These changes allow
    a 64-bit coprocessor (including an FPU) to be attached to a 32-bit
    CPU. This enhancement is optional in a Release 2 implementation.

and page 45 says:

    In addition to an Instruction Set Architecture, the MIPS architecture
    definition includes processing resources such as the set of
    coprocessor general registers. In Release 1 of the Architecture, the
    32-bit registers in MIPS32 were enlarged to 64-bits in MIPS64;
    however, these 64-bit FPU registers are not backwards
    compatible. Instead, processors implementing the MIPS64 Architecture
    provide a mode bit to select either the 32-bit or 64-bit register
    model. In Release 2 of the Architecture, a 32-bit CPU _may_ include a
    full 64-bit coprocessor, including a floating point unit which
    implements the same mode bit to select 32-bit or 64-bit FPU register
    model.

On page 322 of volume 2, the footnote for "Table A-20 MIPS64 COP1X
Encoding of Function Field" uses slightly different wording:

    COP1X instructions are legal only if 64-bit floating point operations
    are enabled.

So was this all a big misunderstanding on my part? The TARGET_FLOAT64
condition came from MIPS themselves, and when challenged they seemed
pretty adamant that it was correct. If I was wrong to be convinced
by the explanation, I hope you can at least see why I was convinced. :-)

If it wasn't a misunderstanding, then the point is that we can't tell
from ISA_MIPS32R2 alone whether the target has a 32-bit or 64-bit FPU,
but we know that it must have a 64-bit FPU if using TARGET_FLOAT64.
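
To make the two readings concrete, here is a minimal sketch in C. It is
only an illustration: the struct, its fields and the function names are
hypothetical stand-ins for the ISA and TARGET_FLOAT64 tests, not GCC's
real macros, and the "permissive" variant simply encodes the claim from
the quoted message that the instructions are available regardless of FPU
mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a target; these fields are NOT GCC data
   structures, only stand-ins for the conditions discussed above.  */
struct fp_target {
  bool mips4_or_later_64bit_isa;  /* MIPS IV and up, 64-bit ISA line */
  bool isa_mips32r2;              /* MIPS32 Release 2 */
  bool target_float64;            /* 64-bit FPU registers in use */
};

/* Conservative reading: on MIPS32r2 a 64-bit FPU is only known to be
   present when 64-bit FP registers are actually in use.  */
bool madd4_conservative(const struct fp_target *t)
{
  return t->mips4_or_later_64bit_isa
         || (t->isa_mips32r2 && t->target_float64);
}

/* Permissive reading (assumption taken from the quoted message): any
   MIPS32r2 FPU provides the instructions, whatever the mode.  */
bool madd4_permissive(const struct fp_target *t)
{
  return t->mips4_or_later_64bit_isa || t->isa_mips32r2;
}

/* The two readings differ only for MIPS32r2 with 32-bit FP registers.  */
void check_madd4_readings(void)
{
  struct fp_target r2_fr0 = { false, true, false };
  struct fp_target r2_fr1 = { false, true, true };

  assert(!madd4_conservative(&r2_fr0));
  assert(madd4_permissive(&r2_fr0));
  assert(madd4_conservative(&r2_fr1) && madd4_permissive(&r2_fr1));
}
```

The only case where the two functions disagree is MIPS32r2 without
TARGET_FLOAT64, which is exactly the case the manual wording leaves open.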
> Also, according to MIPS IV ISA documentation these operations are only
> fused (i.e. don't match original IEEE 754-1985 accuracy requirements) on
> the original MIPS IV R8000 CPU, and MIPS architecture specs don't mention
> any limitations of these instructions either, so I have updated the GCC
> manual to document that on non-R8000 CPUs (which are ones we really care
> about) they are numerically equivalent to computations made with
> corresponding individual operations.

This part is OK, thanks, and is probably the only bit that's suitable for
4.8 at this stage. Would you mind applying it separately?

> Finally, while at it, I found it interesting that we have separate
> conditions to cover MADD.fmt/MSUB.fmt (ISA_HAS_FP_MADD4_MSUB4) and
> NMADD.fmt/NMSUB.fmt (ISA_HAS_NMADD4_NMSUB4) while all the four
> instructions need to be implemented as a whole group per data format
> supported and cannot be separated (the MIPS architecture specification
> explicitly forbids subsetting). The difference between the two conditions
> is that the former expands to ISA_HAS_FP4, that is, it enables the subset for
> any MIPS IV and up FPU while the latter has an extra "&& (!TARGET_MIPS5400
> || TARGET_MAD)" qualifier.
>
> I went ahead and checked available NEC VR54xx documentation and here's
> what I came up with:
>
> 1. "VR5400 MIPS RISC Microprocessor Family" datasheet (NEC doc #13362)
> says:
>
> "The VR5400 processor family complies with the MIPS IV instruction set
> and IEEE-754 floating-point and IEEE-1149.1/1149.1a JTAG specification,
> [...]"
>
> 2. "VR5432 MIPS RISC Microprocessor User's Manual, Volume 2" (NEC doc
> #13751) lists all the individual MADD.fmt, MSUB.fmt, NMADD.fmt and
> NMSUB.fmt instructions in Chapter 18 "Floating-Point Unit Instruction
> Set" with no restrictions as to their availability (the only other
> member of the VR54xx family I know of is the VR5464 that is a
> high-performance version of the VR5432 and is fully software
> compatible).
>
> Further to that TARGET_MAD controls whether to "Use PMC-style 'mad'
> instructions" that are all CPU rather than FPU instructions. The VR5432
> indeed supports extra integer multiply-accumulate instructions, as
> documented in #2 above; these are the MACC/MACCHI/MACCHIU/MACCU and
> MSAC/MSACHI/MSACHIU/MSACU instructions as roughly covered by our
> ISA_HAS_MACC, ISA_HAS_MSAC and ISA_HAS_MACCHI knobs (the latter is not
> implied for TARGET_MIPS5400, perhaps because the family does not support
> the doubleword variants).
>
> All in all it looks to me like a misplaced hunk. It was introduced in
> rev. 56471 (you were named as one of the contributors on that commit, so
> you may be able to remember and/or correct me if I am wrong here anywhere)
> and it looks to me like it should have been applied to the ISA_HAS_MADD_MSUB
> macro instead that's still just a few lines above ISA_HAS_NMADD4_NMSUB4
> (and was even closer to ISA_HAS_NMADD_NMSUB as the latter was then called;
> the bodies were close enough back then for a hunk to apply cleanly to
> either).
I was named in that commit but the VR54xx stuff wasn't mine. I do remember
that Mike put a lot of effort into tuning the VR54xx madd stuff though,
because of the difficulty of having multiply-accumulate instructions
that force the use of HI/LO on an architecture that also has efficient
three-operand multiplies. So I'm pretty sure that this worked correctly
in the Cygnus devo tree, and your explanation of a misplaced hunk seems
very convincing.
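
For reference, the effect of the suspect qualifier can be sketched in C
as follows. This is a hedged model, not the GCC source: the struct and
function names are made up for illustration, and the two conditions are
transcribed from the description in the quoted message (ISA_HAS_FP4 for
MADD.fmt/MSUB.fmt, and the same with the extra
"&& (!TARGET_MIPS5400 || TARGET_MAD)" for NMADD.fmt/NMSUB.fmt).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC conditions named in the thread.  */
struct vr_target {
  bool isa_has_fp4;      /* models ISA_HAS_FP4 */
  bool target_mips5400;  /* models TARGET_MIPS5400 */
  bool target_mad;       /* models TARGET_MAD (integer 'mad' insns) */
};

/* MADD.fmt/MSUB.fmt: as described, just ISA_HAS_FP4.  */
bool has_fp_madd4_msub4(const struct vr_target *t)
{
  return t->isa_has_fp4;
}

/* NMADD.fmt/NMSUB.fmt as described: ISA_HAS_FP4 plus the qualifier
   that looks like a misplaced hunk.  */
bool has_nmadd4_nmsub4_qualified(const struct vr_target *t)
{
  return t->isa_has_fp4 && (!t->target_mips5400 || t->target_mad);
}

/* The same condition with the qualifier dropped, so that the four
   instructions are enabled or disabled as one group.  */
bool has_nmadd4_nmsub4_unqualified(const struct vr_target *t)
{
  return t->isa_has_fp4;
}

/* On a VR54xx without the 'mad' option, the qualified form splits the
   group: MADD.fmt/MSUB.fmt are on, NMADD.fmt/NMSUB.fmt are off.  */
void check_vr54xx_split(void)
{
  struct vr_target vr5400_no_mad = { true, true, false };

  assert(has_fp_madd4_msub4(&vr5400_no_mad));
  assert(!has_nmadd4_nmsub4_qualified(&vr5400_no_mad));
  assert(has_nmadd4_nmsub4_unqualified(&vr5400_no_mad));
}
```

The split on VR54xx is the inconsistency being pointed out: an integer
multiply-accumulate option ends up gating floating-point instructions.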

Richard