On Fri, 27 Oct 2017, Jakub Jelinek wrote:

> Hi!
>
> The following patch attempts to improve store merging; for the time being it still only optimizes constant stores to adjacent memory.
>
> The biggest improvement is handling bitfields: it uses the get_bit_range helper to find the bounds of what may be modified when modifying a bitfield, and instead of requiring all the stores to be adjacent it now only requires that their bitregion_* regions are adjacent.  If get_bit_range fails (e.g. for non-C/C++ languages), it still rounds the boundaries down and up to whole bytes, as any kind of change within a byte affects the rest of it.  At the end, if there are any gaps in between the stored values, the old value is loaded from memory (I had to set TREE_NO_WARNING on it, so that uninit doesn't complain), masked with mask, ORed with the constant masked with the negation of mask, and stored; pretty much what the expansion emits.  As an incremental improvement, in cases where all the stored bitfields in one load/store set are adjacent, perhaps we could emit a BIT_INSERT_EXPR instead of doing the and/or.
>
> Another improvement is related to alignment handling.  Previously the code used get_object_alignment, which e.g. for the store_merging_11.c testcase has to return 8-bit alignment: the whole struct is 64-bit aligned, but the first store is 1 byte after that.  On targets that don't allow unaligned stores, or where they are slow, the old code would then emit just byte stores (many of them).  The patch uses get_object_alignment_1, so that we get both the maximum known alignment and the misalignment, and computes the alignment for every bitpos we try, such that for stores starting 1 byte after a 64-bit aligned boundary we get a 1 byte store, then a 2 byte, then a 4 byte and then an 8 byte store.
>
> Another improvement is for targets that allow unaligned stores: the new code performs a dry run of split_group and, if it determines that aligned stores need as many stores as unaligned ones or fewer, it prefers the aligned stores.  E.g. for the case in store_merging_11.c, where ptr is 64-bit aligned and we store 15 bytes, unaligned stores (and unpatched gcc) would choose to do an 8 byte, then 4 byte, then 2 byte and then 1 byte store.  Aligned stores (1, 2, 4 and 8 bytes in that order) are also 4 stores, so it is better to do those.
>
> The patch also attempts to reuse the original stores (well, just their lhs/rhs1) if we choose a split store that has a single original insn in it.  That way we don't lose ARRAY_REFs/COMPONENT_REFs etc. unnecessarily.  Furthermore, if there is an original store larger than the maximum we try (wordsize), e.g. an 8 byte long long store followed by a 1 byte store followed by another 1 byte store, on 32-bit targets we'd previously try to split it into a 4 byte, a 4 byte and a 2 byte store, figure out that this is 3 stores just like before, and give up.  With the patch, if we see a single original larger store at the bitpos we want, we just reuse that store, so in that case we get an 8 byte original store (lhs/rhs1) followed by a 2 byte store.
>
> find_constituent_stmts now avoids unnecessarily walking group->stores entries that are already known to end before the bitpos we ask for.  It also fixes comparisons that were off by one, because of which it often used to include more original stores than were really in the split store.
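To make the masked read-modify-write described above concrete, here is a minimal C sketch.  It is not the GIMPLE the pass emits; the val/mask names merely mirror the byte arrays that apply_stores builds, and the constants assume the bit layout of struct T from the new store_merging_10.c testcase (below) on a little-endian target, matching the andl/orl immediates in the x86_64 assembly quoted later.

  /* bar () in store_merging_10.c writes b1, b2 and b4 of struct T but not
     the 14-bit b3, so the merged group has a gap and a RMW is emitted.  */
  void
  bar_merged (unsigned int *word)
  {
    unsigned int mask = 0x7ffe0000u; /* bits preserved (b3, not written)         */
    unsigned int val = 0x00000001u;  /* constant being stored (b1=1, b2=0, b4=0) */

    unsigned int old = *word;              /* load; TREE_NO_WARNING is set on it */
    *word = (old & mask) | (val & ~mask);  /* keep old b3, insert the new bits   */
  }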
>
> Another change: output_merged_store used to emit the new stores into a sequence and, if it found out there were too many, released all the SSA names and failed.  That seems unnecessary to me, because we know how many split stores there are before entering the loop, so we can just fail at that point and only start emitting something once we have decided to do the replacement.
>
> I had to disable store merging in the g++.dg/pr71694.C testcase, but that is just because the testcase doesn't test what it should.  In my understanding, it wants to verify that the c.c store doesn't use a 32-bit RMW, because that would create a data race on c.d.  But it stores both c.c and c.d next to each other, so even though c.c's bitregion is the first 3 bytes and c.d's bitregion is the following byte, we touch bytes in both of the regions and thus a RMW cycle on the whole 32-bit word is fine: as c.d is written too, the new value is stored and the old value of the c.d byte is ignored.  What would be wrong (but store merging doesn't emit it) is what we emitted before, i.e. a 32-bit RMW that stored just c.c, followed by a c.d store.  Another thread could have stored value1 into c.d; we read the 32-bit word and modify it, while another thread stores value2 into c.d; then we write the 32-bit word back and thus reintroduce value1 into c.d, and another thread reads it and finds value1 instead of the expected value2; finally we store value3 into c.d.  So, an alternative to -fno-store-merging in the testcase would probably be separate functions, one storing to c.c and the other to c.d; then we can make sure neither store uses movl.  Though it probably still should only look at movl stores or loads; other moves are fine.
>
> The included store_merging_10.c improves on x86_64 from:
>   movzbl  (%rdi), %eax
> - andl    $-19, %eax
> + andl    $-32, %eax
>   orl     $13, %eax
>   movb    %al, (%rdi)
> in foo and
> - orb     $1, (%rdi)
>   movl    (%rdi), %eax
> - andl    $-131071, %eax
> + andl    $2147352576, %eax
> + orl     $1, %eax
>   movl    %eax, (%rdi)
> - shrl    $24, %eax
> - andl    $127, %eax
> - movb    %al, 3(%rdi)
> in bar.  foo is something combine.c managed to optimize too, but it couldn't handle bar.
> In store_merging_11.c on x86_64, bar is the same and foo changed:
> - movabsq $578437695752115969, %rax
> - movl    $0, 9(%rdi)
> - movb    $0, 15(%rdi)
> - movq    %rax, 1(%rdi)
> - xorl    %eax, %eax
> - movw    %ax, 13(%rdi)
> + movl    $23, %eax
> + movb    $1, 1(%rdi)
> + movl    $117835012, 4(%rdi)
> + movw    %ax, 2(%rdi)
> + movq    $8, 8(%rdi)
> which is not only shorter, but all the stores are aligned.
> On ppc64le in store_merging_10.c the difference is:
> - lwz 9,0(3)
> + lbz 9,0(3)
>   rlwinm 9,9,0,0,26
>   ori 9,9,0xd
> - stw 9,0(3)
> + stb 9,0(3)
> in foo and
>   lwz 9,0(3)
> + rlwinm 9,9,0,1,14
>   ori 9,9,0x1
> - rlwinm 9,9,0,31,14
> - rlwinm 9,9,0,1,31
>   stw 9,0(3)
> in bar, and in store_merging_11.c the difference is:
> - lis 8,0x807
> - li 9,0
> - ori 8,8,0x605
> - li 10,0
> - sldi 8,8,32
> - stw 9,9(3)
> - sth 9,13(3)
> - oris 8,8,0x400
> - stb 10,15(3)
> - ori 8,8,0x1701
> - mtvsrd 0,8
> - stfd 0,1(3)
> + lis 9,0x706
> + li 7,1
> + li 8,23
> + ori 9,9,0x504
> + li 10,8
> + stb 7,1(3)
> + sth 8,2(3)
> + stw 9,4(3)
> + std 10,8(3)
> in foo and no changes in bar.
>
> What the patch doesn't implement yet, but would also be possible in the allow_unaligned case, is that in store_merging_11.c, when we are storing 15 bytes, we could store 8 bytes at offset 1 and 8 bytes at offset 8 (i.e. create two overlapping stores, in this case one aligned and one unaligned).
>
> Bootstrapped/regtested on x86_64-linux, i686-linux and powerpc64le-linux, ok for trunk?
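To spell out the data-race argument around pr71694.C, here is a small C sketch.  The field layout below is only assumed from the description above (a 3-byte bit region for c followed by a separate byte for d); it is not copied from the actual testcase.

  struct S
  {
    unsigned int c : 24;  /* bit region: first 3 bytes */
    unsigned char d;      /* own 1-byte bit region     */
  };

  /* Storing both fields: a single 32-bit RMW covering c and d is fine,
     because d is overwritten anyway, so no stale d byte can be written
     back.  This is what the testcase actually does.  */
  void
  store_both (struct S *p)
  {
    p->c = 1;
    p->d = 2;
  }

  /* Storing only c: here a 32-bit RMW would be wrong.  Between our load
     and our store another thread may legitimately update p->d, and the
     wide store would overwrite it with the stale byte we loaded.  */
  void
  store_c_only (struct S *p)
  {
    p->c = 1;
  }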
Ok.

Thanks,
Richard.

> 2017-10-27  Jakub Jelinek  <ja...@redhat.com>
>
>     PR middle-end/22141
>     * gimple-ssa-store-merging.c: Include rtl.h and expr.h.
>     (struct store_immediate_info): Add bitregion_start and bitregion_end fields.
>     (store_immediate_info::store_immediate_info): Add brs and bre arguments and initialize bitregion_{start,end} from those.
>     (struct merged_store_group): Add bitregion_start, bitregion_end, align_base and mask fields.  Drop unnecessary struct keyword from struct store_immediate_info.  Add do_merge method.
>     (clear_bit_region_be): Use memset instead of loop storing zeros.
>     (merged_store_group::do_merge): New method.
>     (merged_store_group::merge_into): Use do_merge.  Allow gaps in between stores as long as the surrounding bitregions have no gaps.
>     (merged_store_group::merge_overlapping): Use do_merge.
>     (merged_store_group::apply_stores): Test that bitregion_{start,end} is byte aligned, rather than requiring that start and width are byte aligned.  Drop unnecessary struct keyword from struct store_immediate_info.  Allocate and populate also mask array.  Make start of the arrays relative to bitregion_start rather than start and size them according to bitregion_{end,start} difference.
>     (struct imm_store_chain_info): Drop unnecessary struct keyword from struct store_immediate_info.
>     (pass_store_merging::gate): Punt if BITS_PER_UNIT or CHAR_BIT is not 8.
>     (pass_store_merging::terminate_all_aliasing_chains): Drop unnecessary struct keyword from struct store_immediate_info.
>     (imm_store_chain_info::coalesce_immediate_stores): Allow gaps in between stores as long as the surrounding bitregions have no gaps.  Formatting fixes.
>     (struct split_store): Add orig non-static data member.
>     (split_store::split_store): Initialize orig to false.
>     (find_constituent_stmts): Return store_immediate_info *, non-NULL if there is exactly a single original stmt.  Change stmts argument to pointer from reference, if NULL, don't push anything to it.  Add first argument, use it to optimize skipping over orig stmts that are known to be before bitpos already.  Simplify.
>     (split_group): Return unsigned int count how many stores are or would be needed rather than a bool.  Add allow_unaligned argument.  Change split_stores argument from reference to pointer, if NULL, only do a dry run computing how many stores would be produced.  Rewritten algorithm to use both alignment and misalign if !allow_unaligned and handle bitfield stores with gaps.
>     (imm_store_chain_info::output_merged_store): Set start_byte_pos from bitregion_start instead of start.  Compute allow_unaligned here, if true, do 2 split_group dry runs to compute which one produces fewer stores and prefer aligned if equal.  Punt if new count is bigger or equal than original before emitting any statements, rather than during that.  Remove no longer needed new_ssa_names tracking.  Replace num_stmts with split_stores.length ().  Use 32-bit stack allocated entries in split_stores auto_vec.  Try to reuse original store lhs/rhs1 if possible.  Handle bitfields with gaps.
>     (pass_store_merging::execute): Ignore bitsize == 0 stores.  Compute bitregion_{start,end} for the stores and construct store_immediate_info with that.  Formatting fixes.
>
>     * gcc.dg/store_merging_10.c: New test.
>     * gcc.dg/store_merging_11.c: New test.
>     * gcc.dg/store_merging_12.c: New test.
>     * g++.dg/pr71694.C: Add -fno-store-merging to dg-options.
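As a worked illustration of the aligned splitting strategy and the dry-run comparison described earlier, here is a small standalone C sketch.  count_aligned_splits is a made-up helper, not GCC code; it only models the "widest naturally aligned power-of-two store that still fits" walk, reproducing the 1+2+4+8 byte split for the 15-bytes-at-offset-1 case from store_merging_11.c.

  #include <stdio.h>

  /* Simplified model: starting at byte offset POS from a sufficiently
     aligned base, pick at each step the widest power-of-two store (at
     most MAX_SIZE bytes) that fits in the remaining LEN bytes and is
     naturally aligned at POS.  */
  static unsigned int
  count_aligned_splits (unsigned long pos, unsigned long len,
                        unsigned long max_size)
  {
    unsigned int n = 0;
    while (len > 0)
      {
        unsigned long sz = max_size;
        /* Halve until the store fits and POS is a multiple of its size.  */
        while (sz > len || (pos & (sz - 1)) != 0)
          sz /= 2;
        pos += sz;
        len -= sz;
        n++;
      }
    return n;
  }

  int
  main (void)
  {
    /* store_merging_11.c-like case: 15 bytes starting 1 byte past an
       8-byte aligned base.  Aligned splitting gives 1+2+4+8, i.e. 4
       stores, the same count as the unaligned 8+4+2+1 split, so the
       patch prefers the aligned variant.  */
    printf ("%u\n", count_aligned_splits (1, 15, 8));
    return 0;
  }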
> > --- gcc/gimple-ssa-store-merging.c.jj 2017-10-27 14:16:26.074249585 +0200 > +++ gcc/gimple-ssa-store-merging.c 2017-10-27 18:10:18.212762773 +0200 > @@ -126,6 +126,8 @@ > #include "tree-eh.h" > #include "target.h" > #include "gimplify-me.h" > +#include "rtl.h" > +#include "expr.h" /* For get_bit_range. */ > #include "selftest.h" > > /* The maximum size (in bits) of the stores this pass should generate. */ > @@ -142,17 +144,24 @@ struct store_immediate_info > { > unsigned HOST_WIDE_INT bitsize; > unsigned HOST_WIDE_INT bitpos; > + unsigned HOST_WIDE_INT bitregion_start; > + /* This is one past the last bit of the bit region. */ > + unsigned HOST_WIDE_INT bitregion_end; > gimple *stmt; > unsigned int order; > store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, > + unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, > gimple *, unsigned int); > }; > > store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs, > unsigned HOST_WIDE_INT bp, > + unsigned HOST_WIDE_INT brs, > + unsigned HOST_WIDE_INT bre, > gimple *st, > unsigned int ord) > - : bitsize (bs), bitpos (bp), stmt (st), order (ord) > + : bitsize (bs), bitpos (bp), bitregion_start (brs), bitregion_end (bre), > + stmt (st), order (ord) > { > } > > @@ -164,26 +173,32 @@ struct merged_store_group > { > unsigned HOST_WIDE_INT start; > unsigned HOST_WIDE_INT width; > - /* The size of the allocated memory for val. */ > + unsigned HOST_WIDE_INT bitregion_start; > + unsigned HOST_WIDE_INT bitregion_end; > + /* The size of the allocated memory for val and mask. */ > unsigned HOST_WIDE_INT buf_size; > + unsigned HOST_WIDE_INT align_base; > > unsigned int align; > unsigned int first_order; > unsigned int last_order; > > - auto_vec<struct store_immediate_info *> stores; > + auto_vec<store_immediate_info *> stores; > /* We record the first and last original statements in the sequence because > we'll need their vuse/vdef and replacement position. It's easier to > keep > track of them separately as 'stores' is reordered by apply_stores. */ > gimple *last_stmt; > gimple *first_stmt; > unsigned char *val; > + unsigned char *mask; > > merged_store_group (store_immediate_info *); > ~merged_store_group (); > void merge_into (store_immediate_info *); > void merge_overlapping (store_immediate_info *); > bool apply_stores (); > +private: > + void do_merge (store_immediate_info *); > }; > > /* Debug helper. Dump LEN elements of byte array PTR to FD in hex. */ > @@ -287,8 +302,7 @@ clear_bit_region_be (unsigned char *ptr, > && len > BITS_PER_UNIT) > { > unsigned int nbytes = len / BITS_PER_UNIT; > - for (unsigned int i = 0; i < nbytes; i++) > - ptr[i] = 0U; > + memset (ptr, 0, nbytes); > if (len % BITS_PER_UNIT != 0) > clear_bit_region_be (ptr + nbytes, BITS_PER_UNIT - 1, > len % BITS_PER_UNIT); > @@ -549,10 +563,16 @@ merged_store_group::merged_store_group ( > { > start = info->bitpos; > width = info->bitsize; > + bitregion_start = info->bitregion_start; > + bitregion_end = info->bitregion_end; > /* VAL has memory allocated for it in apply_stores once the group > width has been finalized. 
*/ > val = NULL; > - align = get_object_alignment (gimple_assign_lhs (info->stmt)); > + mask = NULL; > + unsigned HOST_WIDE_INT align_bitpos = 0; > + get_object_alignment_1 (gimple_assign_lhs (info->stmt), > + &align, &align_bitpos); > + align_base = start - align_bitpos; > stores.create (1); > stores.safe_push (info); > last_stmt = info->stmt; > @@ -568,18 +588,24 @@ merged_store_group::~merged_store_group > XDELETEVEC (val); > } > > -/* Merge a store recorded by INFO into this merged store. > - The store is not overlapping with the existing recorded > - stores. */ > - > +/* Helper method for merge_into and merge_overlapping to do > + the common part. */ > void > -merged_store_group::merge_into (store_immediate_info *info) > +merged_store_group::do_merge (store_immediate_info *info) > { > - unsigned HOST_WIDE_INT wid = info->bitsize; > - /* Make sure we're inserting in the position we think we're inserting. */ > - gcc_assert (info->bitpos == start + width); > + bitregion_start = MIN (bitregion_start, info->bitregion_start); > + bitregion_end = MAX (bitregion_end, info->bitregion_end); > + > + unsigned int this_align; > + unsigned HOST_WIDE_INT align_bitpos = 0; > + get_object_alignment_1 (gimple_assign_lhs (info->stmt), > + &this_align, &align_bitpos); > + if (this_align > align) > + { > + align = this_align; > + align_base = info->bitpos - align_bitpos; > + } > > - width += wid; > gimple *stmt = info->stmt; > stores.safe_push (info); > if (info->order > last_order) > @@ -594,6 +620,22 @@ merged_store_group::merge_into (store_im > } > } > > +/* Merge a store recorded by INFO into this merged store. > + The store is not overlapping with the existing recorded > + stores. */ > + > +void > +merged_store_group::merge_into (store_immediate_info *info) > +{ > + unsigned HOST_WIDE_INT wid = info->bitsize; > + /* Make sure we're inserting in the position we think we're inserting. */ > + gcc_assert (info->bitpos >= start + width > + && info->bitregion_start <= bitregion_end); > + > + width += wid; > + do_merge (info); > +} > + > /* Merge a store described by INFO into this merged store. > INFO overlaps in some way with the current store (i.e. it's not contiguous > which is handled by merged_store_group::merge_into). */ > @@ -601,23 +643,11 @@ merged_store_group::merge_into (store_im > void > merged_store_group::merge_overlapping (store_immediate_info *info) > { > - gimple *stmt = info->stmt; > - stores.safe_push (info); > - > /* If the store extends the size of the group, extend the width. */ > - if ((info->bitpos + info->bitsize) > (start + width)) > + if (info->bitpos + info->bitsize > start + width) > width += info->bitpos + info->bitsize - (start + width); > > - if (info->order > last_order) > - { > - last_order = info->order; > - last_stmt = stmt; > - } > - else if (info->order < first_order) > - { > - first_order = info->order; > - first_stmt = stmt; > - } > + do_merge (info); > } > > /* Go through all the recorded stores in this group in program order and > @@ -627,27 +657,28 @@ merged_store_group::merge_overlapping (s > bool > merged_store_group::apply_stores () > { > - /* The total width of the stores must add up to a whole number of bytes > - and start at a byte boundary. We don't support emitting bitfield > - references for now. Also, make sure we have more than one store > - in the group, otherwise we cannot merge anything. 
*/ > - if (width % BITS_PER_UNIT != 0 > - || start % BITS_PER_UNIT != 0 > + /* Make sure we have more than one store in the group, otherwise we cannot > + merge anything. */ > + if (bitregion_start % BITS_PER_UNIT != 0 > + || bitregion_end % BITS_PER_UNIT != 0 > || stores.length () == 1) > return false; > > stores.qsort (sort_by_order); > - struct store_immediate_info *info; > + store_immediate_info *info; > unsigned int i; > /* Create a buffer of a size that is 2 times the number of bytes we're > storing. That way native_encode_expr can write power-of-2-sized > chunks without overrunning. */ > - buf_size = 2 * (ROUND_UP (width, BITS_PER_UNIT) / BITS_PER_UNIT); > - val = XCNEWVEC (unsigned char, buf_size); > + buf_size = 2 * ((bitregion_end - bitregion_start) / BITS_PER_UNIT); > + val = XNEWVEC (unsigned char, 2 * buf_size); > + mask = val + buf_size; > + memset (val, 0, buf_size); > + memset (mask, ~0U, buf_size); > > FOR_EACH_VEC_ELT (stores, i, info) > { > - unsigned int pos_in_buffer = info->bitpos - start; > + unsigned int pos_in_buffer = info->bitpos - bitregion_start; > bool ret = encode_tree_to_bitpos (gimple_assign_rhs1 (info->stmt), > val, info->bitsize, > pos_in_buffer, buf_size); > @@ -668,6 +699,11 @@ merged_store_group::apply_stores () > } > if (!ret) > return false; > + unsigned char *m = mask + (pos_in_buffer / BITS_PER_UNIT); > + if (BYTES_BIG_ENDIAN) > + clear_bit_region_be (m, pos_in_buffer % BITS_PER_UNIT, info->bitsize); > + else > + clear_bit_region (m, pos_in_buffer % BITS_PER_UNIT, info->bitsize); > } > return true; > } > @@ -682,7 +718,7 @@ struct imm_store_chain_info > See pass_store_merging::m_stores_head for more rationale. */ > imm_store_chain_info *next, **pnxp; > tree base_addr; > - auto_vec<struct store_immediate_info *> m_store_info; > + auto_vec<store_immediate_info *> m_store_info; > auto_vec<merged_store_group *> m_merged_store_groups; > > imm_store_chain_info (imm_store_chain_info *&inspt, tree b_a) > @@ -730,11 +766,16 @@ public: > { > } > > - /* Pass not supported for PDP-endianness. */ > + /* Pass not supported for PDP-endianness, nor for insane hosts > + or target character sizes where native_{encode,interpret}_expr > + doesn't work properly. */ > virtual bool > gate (function *) > { > - return flag_store_merging && (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN); > + return flag_store_merging > + && WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN > + && CHAR_BIT == 8 > + && BITS_PER_UNIT == 8; > } > > virtual unsigned int execute (function *); > @@ -811,7 +852,7 @@ pass_store_merging::terminate_all_aliasi > aliases with any of them. */ > else > { > - struct store_immediate_info *info; > + store_immediate_info *info; > unsigned int i; > FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info) > { > @@ -926,8 +967,9 @@ imm_store_chain_info::coalesce_immediate > } > > /* |---store 1---| <gap> |---store 2---|. > - Gap between stores. Start a new group. */ > - if (start != merged_store->start + merged_store->width) > + Gap between stores. Start a new group if there are any gaps > + between bitregions. */ > + if (info->bitregion_start > merged_store->bitregion_end) > { > /* Try to apply all the stores recorded for the group to determine > the bitpattern they write and discard it if that fails. > @@ -948,11 +990,11 @@ imm_store_chain_info::coalesce_immediate > merged_store->merge_into (info); > } > > - /* Record or discard the last store group. 
*/ > - if (!merged_store->apply_stores ()) > - delete merged_store; > - else > - m_merged_store_groups.safe_push (merged_store); > + /* Record or discard the last store group. */ > + if (!merged_store->apply_stores ()) > + delete merged_store; > + else > + m_merged_store_groups.safe_push (merged_store); > > gcc_assert (m_merged_store_groups.length () <= m_store_info.length ()); > bool success > @@ -961,8 +1003,8 @@ imm_store_chain_info::coalesce_immediate > > if (success && dump_file) > fprintf (dump_file, "Coalescing successful!\n" > - "Merged into %u stores\n", > - m_merged_store_groups.length ()); > + "Merged into %u stores\n", > + m_merged_store_groups.length ()); > > return success; > } > @@ -1016,6 +1058,8 @@ struct split_store > unsigned HOST_WIDE_INT size; > unsigned HOST_WIDE_INT align; > auto_vec<gimple *> orig_stmts; > + /* True if there is a single orig stmt covering the whole split store. */ > + bool orig; > split_store (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, > unsigned HOST_WIDE_INT); > }; > @@ -1025,100 +1069,198 @@ struct split_store > split_store::split_store (unsigned HOST_WIDE_INT bp, > unsigned HOST_WIDE_INT sz, > unsigned HOST_WIDE_INT al) > - : bytepos (bp), size (sz), align (al) > + : bytepos (bp), size (sz), align (al), orig (false) > { > orig_stmts.create (0); > } > > /* Record all statements corresponding to stores in GROUP that write to > the region starting at BITPOS and is of size BITSIZE. Record such > - statements in STMTS. The stores in GROUP must be sorted by > - bitposition. */ > + statements in STMTS if non-NULL. The stores in GROUP must be sorted by > + bitposition. Return INFO if there is exactly one original store > + in the range. */ > > -static void > +static store_immediate_info * > find_constituent_stmts (struct merged_store_group *group, > - auto_vec<gimple *> &stmts, > - unsigned HOST_WIDE_INT bitpos, > - unsigned HOST_WIDE_INT bitsize) > + vec<gimple *> *stmts, > + unsigned int *first, > + unsigned HOST_WIDE_INT bitpos, > + unsigned HOST_WIDE_INT bitsize) > { > - struct store_immediate_info *info; > + store_immediate_info *info, *ret = NULL; > unsigned int i; > + bool second = false; > + bool update_first = true; > unsigned HOST_WIDE_INT end = bitpos + bitsize; > - FOR_EACH_VEC_ELT (group->stores, i, info) > + for (i = *first; group->stores.iterate (i, &info); ++i) > { > unsigned HOST_WIDE_INT stmt_start = info->bitpos; > unsigned HOST_WIDE_INT stmt_end = stmt_start + info->bitsize; > - if (stmt_end < bitpos) > - continue; > + if (stmt_end <= bitpos) > + { > + /* BITPOS passed to this function never decreases from within the > + same split_group call, so optimize and don't scan info records > + which are known to end before or at BITPOS next time. > + Only do it if all stores before this one also pass this. */ > + if (update_first) > + *first = i + 1; > + continue; > + } > + else > + update_first = false; > + > /* The stores in GROUP are ordered by bitposition so if we're past > - the region for this group return early. */ > - if (stmt_start > end) > - return; > - > - if (IN_RANGE (stmt_start, bitpos, bitpos + bitsize) > - || IN_RANGE (stmt_end, bitpos, end) > - /* The statement writes a region that completely encloses the region > - that this group writes. Unlikely to occur but let's > - handle it. */ > - || IN_RANGE (bitpos, stmt_start, stmt_end)) > - stmts.safe_push (info->stmt); > + the region for this group return early. 
*/ > + if (stmt_start >= end) > + return ret; > + > + if (stmts) > + { > + stmts->safe_push (info->stmt); > + if (ret) > + { > + ret = NULL; > + second = true; > + } > + } > + else if (ret) > + return NULL; > + if (!second) > + ret = info; > } > + return ret; > } > > /* Split a merged store described by GROUP by populating the SPLIT_STORES > - vector with split_store structs describing the byte offset (from the > base), > - the bit size and alignment of each store as well as the original > statements > - involved in each such split group. > + vector (if non-NULL) with split_store structs describing the byte offset > + (from the base), the bit size and alignment of each store as well as the > + original statements involved in each such split group. > This is to separate the splitting strategy from the statement > building/emission/linking done in output_merged_store. > - At the moment just start with the widest possible size and keep emitting > - the widest we can until we have emitted all the bytes, halving the size > - when appropriate. */ > - > -static bool > -split_group (merged_store_group *group, > - auto_vec<struct split_store *> &split_stores) > + Return number of new stores. > + If SPLIT_STORES is NULL, it is just a dry run to count number of > + new stores. */ > + > +static unsigned int > +split_group (merged_store_group *group, bool allow_unaligned, > + vec<struct split_store *> *split_stores) > { > - unsigned HOST_WIDE_INT pos = group->start; > - unsigned HOST_WIDE_INT size = group->width; > + unsigned HOST_WIDE_INT pos = group->bitregion_start; > + unsigned HOST_WIDE_INT size = group->bitregion_end - pos; > unsigned HOST_WIDE_INT bytepos = pos / BITS_PER_UNIT; > - unsigned HOST_WIDE_INT align = group->align; > + unsigned HOST_WIDE_INT group_align = group->align; > + unsigned HOST_WIDE_INT align_base = group->align_base; > > - /* We don't handle partial bitfields for now. We shouldn't have > - reached this far. */ > gcc_assert ((size % BITS_PER_UNIT == 0) && (pos % BITS_PER_UNIT == 0)); > > - bool allow_unaligned > - = !STRICT_ALIGNMENT && PARAM_VALUE (PARAM_STORE_MERGING_ALLOW_UNALIGNED); > - > - unsigned int try_size = MAX_STORE_BITSIZE; > - while (try_size > size > - || (!allow_unaligned > - && try_size > align)) > - { > - try_size /= 2; > - if (try_size < BITS_PER_UNIT) > - return false; > - } > - > + unsigned int ret = 0, first = 0; > unsigned HOST_WIDE_INT try_pos = bytepos; > group->stores.qsort (sort_by_bitpos); > > while (size > 0) > { > - struct split_store *store = new split_store (try_pos, try_size, align); > + if ((allow_unaligned || group_align <= BITS_PER_UNIT) > + && group->mask[try_pos - bytepos] == (unsigned char) ~0U) > + { > + /* Skip padding bytes. 
*/ > + ++try_pos; > + size -= BITS_PER_UNIT; > + continue; > + } > + > unsigned HOST_WIDE_INT try_bitpos = try_pos * BITS_PER_UNIT; > - find_constituent_stmts (group, store->orig_stmts, try_bitpos, > try_size); > - split_stores.safe_push (store); > + unsigned int try_size = MAX_STORE_BITSIZE, nonmasked; > + unsigned HOST_WIDE_INT align_bitpos > + = (try_bitpos - align_base) & (group_align - 1); > + unsigned HOST_WIDE_INT align = group_align; > + if (align_bitpos) > + align = least_bit_hwi (align_bitpos); > + if (!allow_unaligned) > + try_size = MIN (try_size, align); > + store_immediate_info *info > + = find_constituent_stmts (group, NULL, &first, try_bitpos, try_size); > + if (info) > + { > + /* If there is just one original statement for the range, see if > + we can just reuse the original store which could be even larger > + than try_size. */ > + unsigned HOST_WIDE_INT stmt_end > + = ROUND_UP (info->bitpos + info->bitsize, BITS_PER_UNIT); > + info = find_constituent_stmts (group, NULL, &first, try_bitpos, > + stmt_end - try_bitpos); > + if (info && info->bitpos >= try_bitpos) > + { > + try_size = stmt_end - try_bitpos; > + goto found; > + } > + } > > - try_pos += try_size / BITS_PER_UNIT; > + /* Approximate store bitsize for the case when there are no padding > + bits. */ > + while (try_size > size) > + try_size /= 2; > + /* Now look for whole padding bytes at the end of that bitsize. */ > + for (nonmasked = try_size / BITS_PER_UNIT; nonmasked > 0; --nonmasked) > + if (group->mask[try_pos - bytepos + nonmasked - 1] > + != (unsigned char) ~0U) > + break; > + if (nonmasked == 0) > + { > + /* If entire try_size range is padding, skip it. */ > + try_pos += try_size / BITS_PER_UNIT; > + size -= try_size; > + continue; > + } > + /* Otherwise try to decrease try_size if second half, last 3 quarters > + etc. are padding. */ > + nonmasked *= BITS_PER_UNIT; > + while (nonmasked <= try_size / 2) > + try_size /= 2; > + if (!allow_unaligned && group_align > BITS_PER_UNIT) > + { > + /* Now look for whole padding bytes at the start of that bitsize. */ > + unsigned int try_bytesize = try_size / BITS_PER_UNIT, masked; > + for (masked = 0; masked < try_bytesize; ++masked) > + if (group->mask[try_pos - bytepos + masked] != (unsigned char) ~0U) > + break; > + masked *= BITS_PER_UNIT; > + gcc_assert (masked < try_size); > + if (masked >= try_size / 2) > + { > + while (masked >= try_size / 2) > + { > + try_size /= 2; > + try_pos += try_size / BITS_PER_UNIT; > + size -= try_size; > + masked -= try_size; > + } > + /* Need to recompute the alignment, so just retry at the new > + position. */ > + continue; > + } > + } > > + found: > + ++ret; > + > + if (split_stores) > + { > + struct split_store *store > + = new split_store (try_pos, try_size, align); > + info = find_constituent_stmts (group, &store->orig_stmts, > + &first, try_bitpos, try_size); > + if (info > + && info->bitpos >= try_bitpos > + && info->bitpos + info->bitsize <= try_bitpos + try_size) > + store->orig = true; > + split_stores->safe_push (store); > + } > + > + try_pos += try_size / BITS_PER_UNIT; > size -= try_size; > - align = try_size; > - while (size < try_size) > - try_size /= 2; > } > - return true; > + > + return ret; > } > > /* Given a merged store group GROUP output the widened version of it. 
> @@ -1132,31 +1274,50 @@ split_group (merged_store_group *group, > bool > imm_store_chain_info::output_merged_store (merged_store_group *group) > { > - unsigned HOST_WIDE_INT start_byte_pos = group->start / BITS_PER_UNIT; > + unsigned HOST_WIDE_INT start_byte_pos > + = group->bitregion_start / BITS_PER_UNIT; > > unsigned int orig_num_stmts = group->stores.length (); > if (orig_num_stmts < 2) > return false; > > - auto_vec<struct split_store *> split_stores; > + auto_vec<struct split_store *, 32> split_stores; > split_stores.create (0); > - if (!split_group (group, split_stores)) > - return false; > + bool allow_unaligned > + = !STRICT_ALIGNMENT && PARAM_VALUE (PARAM_STORE_MERGING_ALLOW_UNALIGNED); > + if (allow_unaligned) > + { > + /* If unaligned stores are allowed, see how many stores we'd emit > + for unaligned and how many stores we'd emit for aligned stores. > + Only use unaligned stores if it allows fewer stores than aligned. */ > + unsigned aligned_cnt = split_group (group, false, NULL); > + unsigned unaligned_cnt = split_group (group, true, NULL); > + if (aligned_cnt <= unaligned_cnt) > + allow_unaligned = false; > + } > + split_group (group, allow_unaligned, &split_stores); > + > + if (split_stores.length () >= orig_num_stmts) > + { > + /* We didn't manage to reduce the number of statements. Bail out. */ > + if (dump_file && (dump_flags & TDF_DETAILS)) > + { > + fprintf (dump_file, "Exceeded original number of stmts (%u)." > + " Not profitable to emit new sequence.\n", > + orig_num_stmts); > + } > + return false; > + } > > gimple_stmt_iterator last_gsi = gsi_for_stmt (group->last_stmt); > gimple_seq seq = NULL; > - unsigned int num_stmts = 0; > tree last_vdef, new_vuse; > last_vdef = gimple_vdef (group->last_stmt); > new_vuse = gimple_vuse (group->last_stmt); > > gimple *stmt = NULL; > - /* The new SSA names created. Keep track of them so that we can free them > - if we decide to not use the new sequence. */ > - auto_vec<tree> new_ssa_names; > split_store *split_store; > unsigned int i; > - bool fail = false; > > tree addr = force_gimple_operand_1 (unshare_expr (base_addr), &seq, > is_gimple_mem_ref_addr, NULL_TREE); > @@ -1165,48 +1326,76 @@ imm_store_chain_info::output_merged_stor > unsigned HOST_WIDE_INT try_size = split_store->size; > unsigned HOST_WIDE_INT try_pos = split_store->bytepos; > unsigned HOST_WIDE_INT align = split_store->align; > - tree offset_type = get_alias_type_for_stmts (split_store->orig_stmts); > - location_t loc = get_location_for_stmts (split_store->orig_stmts); > - > - tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED); > - int_type = build_aligned_type (int_type, align); > - tree dest = fold_build2 (MEM_REF, int_type, addr, > - build_int_cst (offset_type, try_pos)); > - > - tree src = native_interpret_expr (int_type, > - group->val + try_pos - start_byte_pos, > - group->buf_size); > + tree dest, src; > + location_t loc; > + if (split_store->orig) > + { > + /* If there is just a single constituent store which covers > + the whole area, just reuse the lhs and rhs. 
*/ > + dest = gimple_assign_lhs (split_store->orig_stmts[0]); > + src = gimple_assign_rhs1 (split_store->orig_stmts[0]); > + loc = gimple_location (split_store->orig_stmts[0]); > + } > + else > + { > + tree offset_type > + = get_alias_type_for_stmts (split_store->orig_stmts); > + loc = get_location_for_stmts (split_store->orig_stmts); > + > + tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED); > + int_type = build_aligned_type (int_type, align); > + dest = fold_build2 (MEM_REF, int_type, addr, > + build_int_cst (offset_type, try_pos)); > + src = native_interpret_expr (int_type, > + group->val + try_pos - start_byte_pos, > + group->buf_size); > + tree mask > + = native_interpret_expr (int_type, > + group->mask + try_pos - start_byte_pos, > + group->buf_size); > + if (!integer_zerop (mask)) > + { > + tree tem = make_ssa_name (int_type); > + tree load_src = unshare_expr (dest); > + /* The load might load some or all bits uninitialized, > + avoid -W*uninitialized warnings in that case. > + As optimization, it would be nice if all the bits are > + provably uninitialized (no stores at all yet or previous > + store a CLOBBER) we'd optimize away the load and replace > + it e.g. with 0. */ > + TREE_NO_WARNING (load_src) = 1; > + stmt = gimple_build_assign (tem, load_src); > + gimple_set_location (stmt, loc); > + gimple_set_vuse (stmt, new_vuse); > + gimple_seq_add_stmt_without_update (&seq, stmt); > + > + /* FIXME: If there is a single chunk of zero bits in mask, > + perhaps use BIT_INSERT_EXPR instead? */ > + stmt = gimple_build_assign (make_ssa_name (int_type), > + BIT_AND_EXPR, tem, mask); > + gimple_set_location (stmt, loc); > + gimple_seq_add_stmt_without_update (&seq, stmt); > + tem = gimple_assign_lhs (stmt); > + > + src = wide_int_to_tree (int_type, > + wi::bit_and_not (wi::to_wide (src), > + wi::to_wide (mask))); > + stmt = gimple_build_assign (make_ssa_name (int_type), > + BIT_IOR_EXPR, tem, src); > + gimple_set_location (stmt, loc); > + gimple_seq_add_stmt_without_update (&seq, stmt); > + src = gimple_assign_lhs (stmt); > + } > + } > > stmt = gimple_build_assign (dest, src); > gimple_set_location (stmt, loc); > gimple_set_vuse (stmt, new_vuse); > gimple_seq_add_stmt_without_update (&seq, stmt); > > - /* We didn't manage to reduce the number of statements. Bail out. */ > - if (++num_stmts == orig_num_stmts) > - { > - if (dump_file && (dump_flags & TDF_DETAILS)) > - { > - fprintf (dump_file, "Exceeded original number of stmts (%u)." > - " Not profitable to emit new sequence.\n", > - orig_num_stmts); > - } > - unsigned int ssa_count; > - tree ssa_name; > - /* Don't forget to cleanup the temporary SSA names. 
*/ > - FOR_EACH_VEC_ELT (new_ssa_names, ssa_count, ssa_name) > - release_ssa_name (ssa_name); > - > - fail = true; > - break; > - } > - > tree new_vdef; > if (i < split_stores.length () - 1) > - { > - new_vdef = make_ssa_name (gimple_vop (cfun), stmt); > - new_ssa_names.safe_push (new_vdef); > - } > + new_vdef = make_ssa_name (gimple_vop (cfun), stmt); > else > new_vdef = last_vdef; > > @@ -1218,15 +1407,12 @@ imm_store_chain_info::output_merged_stor > FOR_EACH_VEC_ELT (split_stores, i, split_store) > delete split_store; > > - if (fail) > - return false; > - > gcc_assert (seq); > if (dump_file) > { > fprintf (dump_file, > "New sequence of %u stmts to replace old one of %u stmts\n", > - num_stmts, orig_num_stmts); > + split_stores.length (), orig_num_stmts); > if (dump_flags & TDF_DETAILS) > print_gimple_seq (dump_file, seq, 0, TDF_VOPS | TDF_MEMSYMS); > } > @@ -1387,12 +1573,25 @@ pass_store_merging::execute (function *f > tree rhs = gimple_assign_rhs1 (stmt); > > HOST_WIDE_INT bitsize, bitpos; > + unsigned HOST_WIDE_INT bitregion_start = 0; > + unsigned HOST_WIDE_INT bitregion_end = 0; > machine_mode mode; > int unsignedp = 0, reversep = 0, volatilep = 0; > tree offset, base_addr; > base_addr > = get_inner_reference (lhs, &bitsize, &bitpos, &offset, &mode, > &unsignedp, &reversep, &volatilep); > + if (TREE_CODE (lhs) == COMPONENT_REF > + && DECL_BIT_FIELD_TYPE (TREE_OPERAND (lhs, 1))) > + { > + get_bit_range (&bitregion_start, &bitregion_end, lhs, > + &bitpos, &offset); > + if (bitregion_end) > + ++bitregion_end; > + } > + if (bitsize == 0) > + continue; > + > /* As a future enhancement we could handle stores with the same > base and offset. */ > bool invalid = reversep > @@ -1414,7 +1613,26 @@ pass_store_merging::execute (function *f > bit_off = byte_off << LOG2_BITS_PER_UNIT; > bit_off += bitpos; > if (!wi::neg_p (bit_off) && wi::fits_shwi_p (bit_off)) > - bitpos = bit_off.to_shwi (); > + { > + bitpos = bit_off.to_shwi (); > + if (bitregion_end) > + { > + bit_off = byte_off << LOG2_BITS_PER_UNIT; > + bit_off += bitregion_start; > + if (wi::fits_uhwi_p (bit_off)) > + { > + bitregion_start = bit_off.to_uhwi (); > + bit_off = byte_off << LOG2_BITS_PER_UNIT; > + bit_off += bitregion_end; > + if (wi::fits_uhwi_p (bit_off)) > + bitregion_end = bit_off.to_uhwi (); > + else > + bitregion_end = 0; > + } > + else > + bitregion_end = 0; > + } > + } > else > invalid = true; > base_addr = TREE_OPERAND (base_addr, 0); > @@ -1428,6 +1646,12 @@ pass_store_merging::execute (function *f > base_addr = build_fold_addr_expr (base_addr); > } > > + if (!bitregion_end) > + { > + bitregion_start = ROUND_DOWN (bitpos, BITS_PER_UNIT); > + bitregion_end = ROUND_UP (bitpos + bitsize, BITS_PER_UNIT); > + } > + > if (! 
invalid > && offset != NULL_TREE) > { > @@ -1457,9 +1681,11 @@ pass_store_merging::execute (function *f > store_immediate_info *info; > if (chain_info) > { > - info = new store_immediate_info ( > - bitsize, bitpos, stmt, > - (*chain_info)->m_store_info.length ()); > + unsigned int ord = (*chain_info)->m_store_info.length (); > + info = new store_immediate_info (bitsize, bitpos, > + bitregion_start, > + bitregion_end, > + stmt, ord); > if (dump_file && (dump_flags & TDF_DETAILS)) > { > fprintf (dump_file, > @@ -1488,6 +1714,8 @@ pass_store_merging::execute (function *f > struct imm_store_chain_info *new_chain > = new imm_store_chain_info (m_stores_head, base_addr); > info = new store_immediate_info (bitsize, bitpos, > + bitregion_start, > + bitregion_end, > stmt, 0); > new_chain->m_store_info.safe_push (info); > m_stores.put (base_addr, new_chain); > --- gcc/testsuite/gcc.dg/store_merging_10.c.jj 2017-10-27 > 14:52:29.724755656 +0200 > +++ gcc/testsuite/gcc.dg/store_merging_10.c 2017-10-27 14:52:29.724755656 > +0200 > @@ -0,0 +1,56 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target store_merge } */ > +/* { dg-options "-O2 -fdump-tree-store-merging" } */ > + > +struct S { > + unsigned int b1:1; > + unsigned int b2:1; > + unsigned int b3:1; > + unsigned int b4:1; > + unsigned int b5:1; > + unsigned int b6:27; > +}; > + > +struct T { > + unsigned int b1:1; > + unsigned int b2:16; > + unsigned int b3:14; > + unsigned int b4:1; > +}; > + > +__attribute__((noipa)) void > +foo (struct S *x) > +{ > + x->b1 = 1; > + x->b2 = 0; > + x->b3 = 1; > + x->b4 = 1; > + x->b5 = 0; > +} > + > +__attribute__((noipa)) void > +bar (struct T *x) > +{ > + x->b1 = 1; > + x->b2 = 0; > + x->b4 = 0; > +} > + > +struct S s = { 0, 1, 0, 0, 1, 0x3a5f05a }; > +struct T t = { 0, 0xf5af, 0x3a5a, 1 }; > + > +int > +main () > +{ > + asm volatile ("" : : : "memory"); > + foo (&s); > + bar (&t); > + asm volatile ("" : : : "memory"); > + if (s.b1 != 1 || s.b2 != 0 || s.b3 != 1 || s.b4 != 1 || s.b5 != 0 || s.b6 > != 0x3a5f05a) > + __builtin_abort (); > + if (t.b1 != 1 || t.b2 != 0 || t.b3 != 0x3a5a || t.b4 != 0) > + __builtin_abort (); > + return 0; > +} > + > +/* { dg-final { scan-tree-dump-times "Merging successful" 2 "store-merging" > } } */ > --- gcc/testsuite/gcc.dg/store_merging_11.c.jj 2017-10-27 > 14:52:29.725755644 +0200 > +++ gcc/testsuite/gcc.dg/store_merging_11.c 2017-10-27 14:52:29.725755644 > +0200 > @@ -0,0 +1,47 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target store_merge } */ > +/* { dg-options "-O2 -fdump-tree-store-merging" } */ > + > +struct S { unsigned char b[2]; unsigned short c; unsigned char d[4]; > unsigned long e; }; > + > +__attribute__((noipa)) void > +foo (struct S *p) > +{ > + p->b[1] = 1; > + p->c = 23; > + p->d[0] = 4; > + p->d[1] = 5; > + p->d[2] = 6; > + p->d[3] = 7; > + p->e = 8; > +} > + > +__attribute__((noipa)) void > +bar (struct S *p) > +{ > + p->b[1] = 9; > + p->c = 112; > + p->d[0] = 10; > + p->d[1] = 11; > +} > + > +struct S s = { { 30, 31 }, 32, { 33, 34, 35, 36 }, 37 }; > + > +int > +main () > +{ > + asm volatile ("" : : : "memory"); > + foo (&s); > + asm volatile ("" : : : "memory"); > + if (s.b[0] != 30 || s.b[1] != 1 || s.c != 23 || s.d[0] != 4 || s.d[1] != 5 > + || s.d[2] != 6 || s.d[3] != 7 || s.e != 8) > + __builtin_abort (); > + bar (&s); > + asm volatile ("" : : : "memory"); > + if (s.b[0] != 30 || s.b[1] != 9 || s.c != 112 || s.d[0] != 10 || s.d[1] != > 11 > + || s.d[2] != 6 || s.d[3] != 7 || s.e != 8) > + __builtin_abort (); > + return 0; > +} 
> + > +/* { dg-final { scan-tree-dump-times "Merging successful" 2 "store-merging" > } } */ > --- gcc/testsuite/gcc.dg/store_merging_12.c.jj 2017-10-27 > 15:00:20.046976487 +0200 > +++ gcc/testsuite/gcc.dg/store_merging_12.c 2017-10-27 14:59:56.000000000 > +0200 > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -Wall" } */ > + > +struct S { unsigned int b1:1, b2:1, b3:1, b4:1, b5:1, b6:27; }; > +void bar (struct S *); > +void foo (int x) > +{ > + struct S s; > + s.b2 = 1; s.b3 = 0; s.b4 = 1; s.b5 = 0; s.b1 = x; s.b6 = x; /* { > dg-bogus "is used uninitialized in this function" } */ > + bar (&s); > +} > --- gcc/testsuite/g++.dg/pr71694.C.jj 2016-12-16 11:24:32.000000000 +0100 > +++ gcc/testsuite/g++.dg/pr71694.C 2017-10-27 16:53:09.278596219 +0200 > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2" } */ > +/* { dg-options "-O2 -fno-store-merging" } */ > > struct B { > B() {} > > > Jakub > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)