Hi!

This is something the bswap pass has already been doing, but the new load
handling code in store-merging doesn't do it yet. Even if all the loads have
the same vuse, they aren't necessarily in the same basic block. If they are
spread over multiple bbs, previously we've picked one arbitrarily (well, the
load corresponding to the first store in the group); now we at least pick the
load from the last basic block, i.e. the one dominated by all the others.
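To make the hazard concrete, here is an annotated version of the example used
in the new comment in the patch below (a hand-written illustration; p, q and N
are arbitrary placeholders):

  int x = q[0];   /* load 1, in the first bb */
  if (x == N)
    return;       /* conditional exit: q[1] may never be accessed */
  int y = q[1];   /* load 2, in a bb dominated by the first one */
  p[0] = x;       /* adjacent stores that store-merging wants to merge */
  p[1] = y;

Emitting the single wider load at the q[0] statement would read q[1] even on
the early-return path and can fault if that memory isn't mapped; emitting it
at the q[1] statement (the last bb) is safe.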
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

It might not be good enough though: if some program has
  int x = q[0];
  foo (x);
  int y = q[1];
  p[0] = x;
  p[1] = y;
and foo conditionally exits, aborts, throws externally or loops forever in
case q[1] might not be mapped, then even when both loads are in the same bb
we might emit the wider load at the first load rather than at the second one.
I think we'd need to compute uids (perhaps lazily) and compare which stmt
comes last. Thoughts on that?

2017-11-20  Jakub Jelinek  <ja...@redhat.com>

	PR tree-optimization/83047
	* gimple-ssa-store-merging.c
	(imm_store_chain_info::output_merged_store): If the loads with the
	same vuse are in different basic blocks, for load_gsi pick a load
	location that is dominated by the other loads.

	* gcc.dg/pr83047.c: New test.

--- gcc/gimple-ssa-store-merging.c.jj	2017-11-17 08:40:25.000000000 +0100
+++ gcc/gimple-ssa-store-merging.c	2017-11-20 10:37:40.429859947 +0100
@@ -1857,7 +1857,30 @@ imm_store_chain_info::output_merged_stor
 	      store_immediate_info *infol = group->stores.last ();
 	      if (gimple_vuse (op.stmt) == gimple_vuse (infol->ops[j].stmt))
 		{
-		  load_gsi[j] = gsi_for_stmt (op.stmt);
+		  /* We can't pick the location randomly; while we've verified
+		     all the loads have the same vuse, they can be still in different
+		     basic blocks and we need to pick the one from the last bb:
+		     int x = q[0];
+		     if (x == N) return;
+		     int y = q[1];
+		     p[0] = x;
+		     p[1] = y;
+		     otherwise if we put the wider load at the q[0] load, we might
+		     segfault if q[1] is not mapped.  */
+		  basic_block bb = gimple_bb (op.stmt);
+		  gimple *ostmt = op.stmt;
+		  store_immediate_info *info;
+		  FOR_EACH_VEC_ELT (group->stores, i, info)
+		    {
+		      gimple *tstmt = info->ops[j].stmt;
+		      basic_block tbb = gimple_bb (tstmt);
+		      if (dominated_by_p (CDI_DOMINATORS, tbb, bb))
+			{
+			  ostmt = tstmt;
+			  bb = tbb;
+			}
+		    }
+		  load_gsi[j] = gsi_for_stmt (ostmt);
 		  load_addr[j]
 		    = force_gimple_operand_1 (unshare_expr (op.base_addr),
 					      &load_seq[j], is_gimple_mem_ref_addr,
--- gcc/testsuite/gcc.dg/pr83047.c.jj	2017-11-20 10:19:26.612657065 +0100
+++ gcc/testsuite/gcc.dg/pr83047.c	2017-11-20 10:24:15.000000000 +0100
@@ -0,0 +1,58 @@
+/* PR tree-optimization/83047 */
+/* { dg-do run { target mmap } } */
+/* { dg-options "-O2" } */
+
+#include <stddef.h>
+#include <stdio.h>
+#include <sys/mman.h>
+#ifndef MAP_ANONYMOUS
+#define MAP_ANONYMOUS MAP_ANON
+#endif
+#ifndef MAP_ANON
+#define MAP_ANON 0
+#endif
+#ifndef MAP_FAILED
+#define MAP_FAILED ((void *)-1)
+#endif
+#include <stdlib.h>
+
+__attribute__((noipa)) void
+foo (char *p, char *q, int r)
+{
+  char a = q[0];
+  if (r || a == '\0')
+    return;
+  char b = q[1];
+  p[0] = a;
+  p[1] = b;
+}
+
+int
+main ()
+{
+  char *p = mmap (NULL, 131072, PROT_READ | PROT_WRITE,
+		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (p == MAP_FAILED)
+    return 0;
+  if (munmap (p + 65536, 65536) < 0)
+    return 0;
+  p[0] = 'c';
+  p[1] = 'd';
+  p[65536 - 2] = 'a';
+  p[65536 - 1] = 'b';
+  volatile int r = 1;
+  foo (p, p + 65536 - 2, r);
+  if (p[0] != 'c' || p[1] != 'd')
+    abort ();
+  r = 0;
+  foo (p, p + 65536 - 2, r);
+  if (p[0] != 'a' || p[1] != 'b')
+    abort ();
+  p[0] = 'e';
+  p[1] = 'f';
+  r = 1;
+  foo (p, p + 65536 - 1, r);
+  if (p[0] != 'e' || p[1] != 'f')
+    abort ();
+  return 0;
+}

	Jakub
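P.S. The lazily computed uids mentioned above could look roughly like this
(an untested sketch, not part of the patch; stmt_comes_after_p is just an
illustrative name, and it treats a zero uid as "not numbered by this pass
yet"):

static bool
stmt_comes_after_p (gimple *a, gimple *b)
{
  basic_block bb = gimple_bb (a);
  gcc_assert (bb == gimple_bb (b));
  /* Number the stmts of BB on first use; uids start at 1 so that
     a zero uid means the stmt hasn't been numbered yet.  */
  if (gimple_uid (a) == 0 || gimple_uid (b) == 0)
    {
      unsigned uid = 0;
      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
	   !gsi_end_p (gsi); gsi_next (&gsi))
	gimple_set_uid (gsi_stmt (gsi), ++uid);
    }
  return gimple_uid (a) > gimple_uid (b);
}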