On Thu, Sep 13, 2012 at 11:25:42AM -0700, Richard Henderson wrote: > On 09/13/2012 08:52 AM, Jakub Jelinek wrote: > > The following patch fixes it, by tweaking the header so that the first > > argument is not negated (we negate the second one instead), as we don't want > > to negate the high elements if e.g. for whatever reason combiner doesn't > > match it. It fixes the expander to use a dup of the X operand as the high > > element provider for the pattern, removes the 231 alternatives (because > > those provide different destination high elements) and removes commutative > > marker (again, that would mean different high elements). > > I don't think this is the best way to fix this up. > > (1) Negating the second argument is arguably non-canonical rtl.
That is why I've put in the *fmai_fnm{add,sub}_<mode> patterns operands 2 with the neg as first operand of the FMA rtl. That way it is canonical (otherwise it didn't match in combine). The FMA rtl operand order doesn't need to imply the order of instruction operands. The reason why I don't want to negate in fmaintrin.h the first operand is that the higher elements of it shouldn't be negated. So, when the second argument to the builtin is negated, combiner substitutes ... (fma (reg1) (neg (reg2) (reg3))) (reg1) ..., then canonicalizes to ... (fma (neg (reg2) (reg1) (reg3))) (reg1) ... and it matches that way. > (2) It's not the best match if we were to extend these builtins to FMA4. > There we really do have 4 inputs. Thus The first thing I've tried was for the *fmai_fmadd_<mode> patterns to use match_operand:VF_128 4 instead of the match_dup I'm using now. But that way I've ended up with: (define_insn "*fmai_fmadd_<mode>" [(set (match_operand:VF_128 0 "register_operand" "=x,x") (vec_merge:VF_128 (fma:VF_128 (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0") (match_operand:VF_128 2 "nonimmediate_operand" "xm, x") (match_operand:VF_128 3 "nonimmediate_operand" " x,xm")) (match_operand:VF_128 4 "nonimmediate_operand" " 0, 0") (const_int 1)))] "TARGET_FMA" "@ vfmadd132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %3, %2} vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}" [(set_attr "type" "ssemuladd") (set_attr "mode" "<MODE>")]) which was apparently too much for reload (supposedly the two "0" constraint operands, even when the expander used (match_dup 1)). > It would be nice if Intel cleaned up their documentation for the > builtin, explicitly saying which high bits to copy. I agree that > the testcase is probably as normative as we'll get though. Usually the docs just document instructions and list intrinsics. When there is non-obvious match between the intrinsics and instructions, we have a problem of insufficiently documented semantics. Jakub