On Mon, Nov 18, 2013 at 12:27 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Mon, Nov 18, 2013 at 9:15 PM, Cong Hou <co...@google.com> wrote:
>
>>>> This patch adds the support to two non-isomorphic operations addsub
>>>> and subadd for SLP vectorizer. More non-isomorphic operations can be
>>>> added later, but the limitation is that operations on even/odd
>>>> elements should still be isomorphic. Once such an operation is
>>>> detected, the code of the operation used in vectorized code is stored
>>>> and later will be used during statement transformation. Two new GIMPLE
>>>> operations VEC_ADDSUB_EXPR and VEC_SUBADD_EXPR are defined. And also
>>>> new optabs for them. They are also documented.
>>>>
>>>> The target supports for SSE/SSE2/SSE3/AVX are added for those two new
>>>> operations on floating points. SSE3/AVX provides ADDSUBPD and ADDSUBPS
>>>> instructions. For SSE/SSE2, those two operations are emulated using
>>>> two instructions (selectively negate then add).
>>>
>>> +(define_expand "vec_subadd_v4sf3"
>>> +  [(set (match_operand:V4SF 0 "register_operand")
>>> +        (unspec:V4SF
>>> +          [(match_operand:V4SF 1 "register_operand")
>>> +           (match_operand:V4SF 2 "nonimmediate_operand")] UNSPEC_SUBADD))]
>>> +  "TARGET_SSE"
>>> +{
>>> +  if (TARGET_SSE3)
>>> +    emit_insn (gen_sse3_addsubv4sf3 (operands[0], operands[1], operands[2]));
>>> +  else
>>> +    ix86_sse_expand_fp_addsub_operator (true, V4SFmode, operands);
>>> +  DONE;
>>> +})
>>>
>>> Make the expander pattern look like corresponding sse3 insn and:
>>> ...
>>> {
>>>   if (!TARGET_SSE3)
>>>     {
>>>       ix86_sse_expand_fp_...();
>>>       DONE;
>>>     }
>>> }
>>>
>>
>> You mean I should write two expanders for SSE and SSE3 respectively?
>
> No, please use the same approach as you did for abs<mode>2 expander.
> For !TARGET_SSE3, call the helper function (ix86_sse_expand...),
> otherwise expand through pattern. Also, it looks to me that you should
> partially expand in the pattern before calling helper function, mainly
> to avoid a bunch of "if (...)" at the beginning of the helper
> function.
>

I know what you mean. Then I have to change the pattern being detected
for sse3_addsubv4sf3, so that it can handle ADDSUB_EXPR for SSE3.

Currently I am considering using Richard's method, which is based on
pattern matching and avoids creating new tree nodes and optabs. I will
handle SSE2 and SSE3 separately by define_expand and define_insn. The
current problem is that the pattern may contain more than four
instructions, which the combine pass cannot process. I am considering
how to reduce the number of instructions in the pattern to four.

Thank you very much!

Cong

> Uros.
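
For readers following the thread, here is a rough scalar sketch in C of the "selectively negate then add" emulation that the patch description mentions for SSE/SSE2. It is not the patch's code. It assumes the lane behavior of the hardware ADDSUBPS instruction (subtract in even lanes, add in odd lanes), and the function names addsub_ref, flip_sign_if and addsub_emulated are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference lane semantics assumed for ADDSUBPS on four floats:
   even lanes compute a - b, odd lanes compute a + b.
   (Hypothetical helper, not part of GCC or the patch.) */
static void addsub_ref(const float a[4], const float b[4], float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
}

/* Flip the IEEE-754 sign bit of x when flip is nonzero.  This models
   what an XOR with a per-lane sign mask would do in vector code. */
static float flip_sign_if(float x, int flip)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if (flip)
        bits ^= UINT32_C(0x80000000);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Two-step emulation: negate b in the even lanes only, then do a
   plain lane-wise add.  (Hypothetical helper for illustration.) */
static void addsub_emulated(const float a[4], const float b[4], float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + flip_sign_if(b[i], i % 2 == 0);
}
```

In real vector code the two steps would presumably be one XOR against a constant sign mask followed by one vector add, which matches the "two instructions" count in the patch description. The loops above only model the per-lane arithmetic.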