On Sun, Aug 17, 2014 at 8:39 PM, Bill Schmidt
<wschm...@linux.vnet.ibm.com> wrote:
> On Sun, 2014-08-17 at 14:52 -0400, David Edelsohn wrote:
>> On Wed, Aug 13, 2014 at 7:14 PM, Bill Schmidt
>> <wschm...@linux.vnet.ibm.com> wrote:
>> > Hi,
>> >
>> > This patch adds a PowerPC-specific pass just prior to the first cse RTL
>> > pass.  The pass runs only when generating little-endian code for Power8
>> > with VSX enabled, and for -O1 and up.  For this particular subtarget,
>> > the use of the big-endian-biased vector load and store instructions
>> > requires permutations to order vector elements for little endian.  To
>> > reduce the overhead of these permutations, this pass looks for
>> > computations for which the exact lanes in which computations are
>> > performed do not matter, so long as the results are returned to
>> > storage in the proper order.  For such computations we can remove the
>> > xxpermdi's associated with the vector loads and stores.
>> >
>> > This patch relies on another patch posted today that converts a struct
>> > used by the web pass into a base class that this patch can subclass.  If
>> > it's determined that the other patch isn't appropriate, then this patch
>> > will need modifications to duplicate the union-find logic.
>> >
>> > A complete description of the new pass appears in rs6000.c (search for
>> > "Analyze vector computations").  That description also identifies some
>> > remaining opportunities we can follow up with later.
>> >
>> > A number of new tests are added to verify that the pass works as
>> > expected for some vectorized code samples.
>> >
>> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> > regressions.  Is this ok for trunk?
>> >
>> > Thanks,
>> > Bill
>> >
>> >
>> > [gcc]
>> >
>> > 2014-08-13  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>> >
>> >         * config/rs6000/rs6000.c (context.h): New include.
>> >         (tree-pass.h): Likewise.
>> >         (make_pass_analyze_swaps): New decl.
>> >         (rs6000_option_override): Register pass_analyze_swaps.
>> >         (swap_web_entry): New subclass of web_entry_base (df.h).
>> >         (special_handling_values): New enum.
>> >         (union_defs): New function.
>> >         (union_uses): Likewise.
>> >         (insn_is_load_p): Likewise.
>> >         (insn_is_store_p): Likewise.
>> >         (insn_is_swap_p): Likewise.
>> >         (rtx_is_swappable_p): Likewise.
>> >         (insn_is_swappable_p): Likewise.
>> >         (chain_purpose): New enum.
>> >         (chain_contains_only_swaps): New function.
>> >         (mark_swaps_for_removal): Likewise.
>> >         (swap_const_vector_halves): Likewise.
>> >         (adjust_subreg_index): Likewise.
>> >         (permute_load): Likewise.
>> >         (permute_store): Likewise.
>> >         (handle_special_swappables): Likewise.
>> >         (replace_swap_with_copy): Likewise.
>> >         (dump_swap_insn_table): Likewise.
>> >         (rs6000_analyze_swaps): Likewise.
>> >         (pass_data_analyze_swaps): New pass_data.
>> >         (pass_analyze_swaps): New rtl_opt_pass.
>> >         (make_pass_analyze_swaps): New function.
>> >         * config/rs6000/rs6000.opt (moptimize-swaps): New option.
>> >
>> > [gcc/testsuite]
>> >
>> > 2014-08-13  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>> >
>> >         * gcc.target/powerpc/swaps-p8-1.c: New test.
>> >         * gcc.target/powerpc/swaps-p8-2.c: New test.
>> >         * gcc.target/powerpc/swaps-p8-3.c: New test.
>> >         * gcc.target/powerpc/swaps-p8-4.c: New test.
>> >         * gcc.target/powerpc/swaps-p8-5.c: New test.
>> >         * gcc.target/powerpc/swaps-p8-6.c: New test.
>> >         * gcc.target/powerpc/swaps-p8-7.c: New test.
>> >         * gcc.target/powerpc/swaps-p8-8.c: New test.
>> >         * gcc.target/powerpc/swaps-p8-9.c: New test.
>> >         * gcc.target/powerpc/swaps-p8-10.c: New test.
>> >         * gcc.target/powerpc/swaps-p8-11.c: New test.
>> >         * gcc.target/powerpc/swaps-p8-12.c: New test.
>>
>> This looks okay, although I was hoping that other developers with more
>> DF and web experience would double check.
>
> Thanks for the review!
>
>> Why are you specifically gating the pass on POWER8?
>
> The problem is introduced in POWER8 (since LE isn't supported earlier),

It seems better for the initial implementation not to be overly
restrictive.  IBM's support for PPC64LE Linux begins with POWER8, but
others are not limited to that.

Freescale was applying some little endian cleanups. I'm not sure about
the status of VMX support in their processors.

Thanks, David
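
For readers following the thread, the idea in the quoted patch description
(lane order does not matter for some computations, so the swaps around
them can be dropped) can be illustrated with a small C model.  This is
only a sketch under stated assumptions and is not code from the patch:
the vec2 type and the helpers raw_load, raw_store, swap_dw and add_lanes
are hypothetical names that model a doubleword-swapping vector load and
store, an xxpermdi-style swap, and a lane-wise add.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 128-bit vector as two 64-bit doublewords. */
typedef struct { uint64_t dw[2]; } vec2;

/* Models a big-endian-biased vector load on little endian:
   the two doublewords arrive in swapped order. */
static vec2 raw_load(const uint64_t *mem)
{
  vec2 v = { { mem[1], mem[0] } };
  return v;
}

/* Models the matching store, which swaps the doublewords again. */
static void raw_store(uint64_t *mem, vec2 v)
{
  mem[0] = v.dw[1];
  mem[1] = v.dw[0];
}

/* Models the doubleword swap (the role played by xxpermdi). */
static vec2 swap_dw(vec2 v)
{
  vec2 r = { { v.dw[1], v.dw[0] } };
  return r;
}

/* A lane-wise operation: each result lane depends only on the
   same lane of the inputs. */
static vec2 add_lanes(vec2 a, vec2 b)
{
  vec2 r = { { a.dw[0] + b.dw[0], a.dw[1] + b.dw[1] } };
  return r;
}

/* Each load is followed by a swap and each store is preceded by one. */
static void add_with_swaps(uint64_t *out, const uint64_t *a, const uint64_t *b)
{
  vec2 va = swap_dw(raw_load(a));
  vec2 vb = swap_dw(raw_load(b));
  raw_store(out, swap_dw(add_lanes(va, vb)));
}

/* The same computation with every swap removed. */
static void add_without_swaps(uint64_t *out, const uint64_t *a, const uint64_t *b)
{
  raw_store(out, add_lanes(raw_load(a), raw_load(b)));
}

/* Returns 1 when both versions leave identical results in memory. */
static int results_match(void)
{
  const uint64_t a[2] = { 1, 2 };
  const uint64_t b[2] = { 10, 20 };
  uint64_t with[2], without[2];

  add_with_swaps(with, a, b);
  add_without_swaps(without, a, b);
  return with[0] == without[0] && with[1] == without[1];
}

/* Self-check helper for the model above. */
static void check_swaps_redundant(void)
{
  assert(results_match());
}
```

In this model the swaps cancel because the add treats every lane the
same way.  An operation that extracts a specific element, or otherwise
depends on lane position, would not have this property, which is
presumably why the pass needs to classify which insns are swappable.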
