On Fri, May 16, 2014 at 12:56 PM, <pins...@gmail.com> wrote: > > >> On May 16, 2014, at 3:48 AM, Richard Biener <richard.guent...@gmail.com> >> wrote: >> >> On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme >> <thomas.preudho...@arm.com> wrote: >>> Ping? >> >> Sorry ... >> >>> Best regards, >>> >>> Thomas Preud'homme >>> >>>> -----Original Message----- >>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >>>> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme >>>> Sent: Friday, May 09, 2014 6:26 PM >>>> To: GCC Patches >>>> Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store >>>> >>>> Sorry, took longer than expected as I got distracted by some other patch. >>>> I merged the whole patchset in a single patch as I was told the current >>>> setup >>>> is actually more difficult to read. >>>> >>>> Here are the updated ChangeLogs: >>>> >>>> *** gcc/ChangeLog *** >>>> >>>> 2014-05-09 Thomas Preud'homme <thomas.preudho...@arm.com> >>>> >>>> PR tree-optimization/54733 >>>> * expr.c (get_inner_reference): Add a parameter to control whether >>>> a >>>> MEM_REF should be split into base + offset. >>>> * tree.h (get_inner_reference): Default new parameter to false. >>>> * tree-ssa-math-opts.c (nop_stats): New "bswap_stats" structure. >>>> (CMPNOP): Define. >>>> (find_bswap_or_nop_load): New. >>>> (find_bswap_1): Renamed to ... >>>> (find_bswap_or_nop_1): This. Also add support for memory source. >>>> (find_bswap): Renamed to ... >>>> (find_bswap_or_nop): This. Also add support for memory source and >>>> detection of bitwise operations equivalent to load in host endianness. >>>> (execute_optimize_bswap): Likewise. Also move its leading >>>> comment back >>>> in place and split statement transformation into ... >>>> (bswap_replace): This. Add assert when updating bswap_stats. >>>> >>>> *** gcc/testsuite/ChangeLog *** >>>> >>>> 2014-05-09 Thomas Preud'homme <thomas.preudho...@arm.com> >>>> >>>> PR tree-optimization/54733 >>>> * gcc.dg/optimize-bswapdi-3.c: New test to check extension of >>>> bswap >>>> optimization to support memory sources and bitwise operations >>>> equivalent to load in host endianness. >>>> * gcc.dg/optimize-bswaphi-1.c: Likewise. >>>> * gcc.dg/optimize-bswapsi-2.c: Likewise. >>>> * gcc.c-torture/execute/bswap-2.c: Likewise. >>>> >>>> Ok for trunk? >> >> Ok, I now decided otherwise and dislike the new parameter to >> get_inner_reference. Can you please revert that part and just >> deal with a MEM_REF result in your only caller? >> >> And (of course) I found another possible issue. The way you >> compute load_type and use it here: >> >> + /* Perform the load. */ >> + load_offset_ptr = build_int_cst (n->alias_set, 0); >> + val_expr = fold_build2 (MEM_REF, load_type, addr_tmp, >> + load_offset_ptr); >> >> makes the load always appear aligned according to the mode of >> load_type. On strict-alignment targets this may cause faults. >> >> So what you have to do is either (simpler) >> >> unsigned int align = get_pointer_alignment (addr_tmp); >> tree al_load_type = load_type; >> if (align < TYPE_ALIGN (load_type)) >> al_load_type = build_aligned_type (load_type, align); >> ... >> val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp, >> load_offset_ptr); >> >> or keep track of the "first" actual load and use >> >> unsigned int align = get_object_alignment (that_first_load); >> >> "first" in the one that corresponds to addr_tmp. From that there >> is a much better chance to derive good alignment values. >> >> Of course on STRICT_ALIGNMENT targets a not aligned load >> will be decomposed again, so eventually doing the transformation >> may no longer be profitable(?). > > Not always decomposed. On MIPS, it should using the load/store left/right > instructions for unaligned load/stores which is normally better than > decomposed load/stores. So having a cost model would be nice.
Agreed, but I am happy with doing that as a followup. Btw, a very simple one would be to reject unaligned SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align). [of course that may be true on MIPS even for the cases where a "reasonable" fast unalgined variant exists - nearly no target defines that macro in a too fancy way] Richard. > Thanks, > Andrew > >> >> Thanks and sorry again for the delay. >> >> Otherwise the patch looks good to me. >> >> Richard. >> >>>> Best regards, >>>> >>>> Thomas >>>> >>>>> -----Original Message----- >>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >>>>> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme >>>>> Sent: Monday, May 05, 2014 7:30 PM >>>>> To: GCC Patches >>>>> Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent >>>>> load/store >>>>> >>>>> I found a way to improve the function find_bswap/find_bswap_or_nop >>>>> and reduce its size. Please hold for the review, I will post an updated >>>>> version as soon as I finish testing. >>>>> >>>>> Best regards, >>>>> >>>>> Thomas Preud'homme >>> >>>