Hello! As explained in the PR [1] Comment #4, this is a target problem with invalid RTL sharing.
Invalid sharing is created by the misaligned expansion code in i386.c when subregs are involved. The vec_extract_hi_v32qi pattern is generated in the loop2_invariant pass when a misaligned V8SI move is generated, and the later cprop3 pass propagates a register inside a subreg. The pass updates both instances of (reg:V8SI 181) to (reg:V8SI 175) in (insn 197) and (insn 198). However, since the just-renamed (reg 175) doesn't trigger a rescan of (insn 198) in the substitution loop, we miss a rescan of (insn 198).

The solution is to avoid invalid sharing by copying RTXes when subregs are created.

2016-06-06  Uros Bizjak  <ubiz...@gmail.com>

	PR target/71389
	* config/i386/i386.c (ix86_avx256_split_vector_move_misalign):
	Copy op1 RTX to avoid invalid sharing.
	(ix86_expand_vector_move_misalign): Ditto.

testsuite/ChangeLog:

2016-06-06  Uros Bizjak  <ubiz...@gmail.com>

	PR target/71389
	* g++.dg/pr71389.C: New test.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN, will be backported to release branches.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71389

Uros.
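For readers unfamiliar with why a shared RTX can hide an insn from a rescan, here is a rough toy model of the effect. It is not GCC code: Operand, Insn, rename_reg and make_insns are hypothetical names made up for illustration, and the "rescan" flag only loosely mimics what the substitution loop tracks. The sketch assumes a pass that renames a register in place and flags an insn only when the walk over that insn actually changed something.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in for an RTX operand such as (reg:V8SI N). Hypothetical type.
struct Operand {
  int regno;
};

// Toy stand-in for an insn that refers to a single operand. Hypothetical type.
struct Insn {
  int uid;
  std::shared_ptr<Operand> op;
  bool needs_rescan;
};

// Rename register `from` to `to`, modifying operands in place.
// An insn is flagged for rescan only if the walk over that insn
// itself performed a change. Returns the number of flagged insns.
int rename_reg(std::vector<Insn>& insns, int from, int to) {
  int flagged = 0;
  for (Insn& insn : insns) {
    assert(insn.op != nullptr);
    if (insn.op->regno == from) {
      insn.op->regno = to;
      insn.needs_rescan = true;
      ++flagged;
    }
  }
  return flagged;
}

// Build two insns that use register 181. With copy_operand == false
// both insns point at the very same Operand object (invalid sharing);
// with copy_operand == true the second insn gets its own copy, which
// is the role copy_rtx plays in the patch.
std::vector<Insn> make_insns(bool copy_operand) {
  std::shared_ptr<Operand> op1 = std::make_shared<Operand>(Operand{181});
  std::shared_ptr<Operand> second =
      copy_operand ? std::make_shared<Operand>(*op1) : op1;
  return {Insn{197, op1, false}, Insn{198, second, false}};
}
```

With the shared operand, renaming through the first insn silently changes the second insn as well, so the walk over the second insn finds nothing left to rename and never flags it. With the copied operand, each insn is changed and flagged on its own.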
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 237110)
+++ config/i386/i386.c	(working copy)
@@ -19552,7 +19552,7 @@ ix86_avx256_split_vector_move_misalign (rtx op0, r
       m = adjust_address (op0, mode, 0);
       emit_insn (extract (m, op1, const0_rtx));
       m = adjust_address (op0, mode, 16);
-      emit_insn (extract (m, op1, const1_rtx));
+      emit_insn (extract (m, copy_rtx (op1), const1_rtx));
     }
   else
     gcc_unreachable ();
@@ -19724,7 +19724,7 @@ ix86_expand_vector_move_misalign (machine_mode mod
 	  m = adjust_address (op0, V2SFmode, 0);
 	  emit_insn (gen_sse_storelps (m, op1));
 	  m = adjust_address (op0, V2SFmode, 8);
-	  emit_insn (gen_sse_storehps (m, op1));
+	  emit_insn (gen_sse_storehps (m, copy_rtx (op1)));
 	}
     }
   else
Index: testsuite/g++.dg/pr71389.C
===================================================================
--- testsuite/g++.dg/pr71389.C	(nonexistent)
+++ testsuite/g++.dg/pr71389.C	(working copy)
@@ -0,0 +1,23 @@
+// { dg-do compile { target i?86-*-* x86_64-*-* } }
+// { dg-options "-std=c++11 -O3 -march=ivybridge" }
+
+#include <functional>
+
+extern int le_s6, le_s9, le_s11;
+long foo_v14[16][16];
+
+void fn1() {
+  std::array<std::array<int, 16>, 16> v13;
+  for (; le_s6;)
+    for (int k1 = 2; k1 < 4; k1 = k1 + 1) {
+      for (int n1 = 0; n1 < le_s9; n1 = 8) {
+        *foo_v14[6] = 20923310;
+        for (int i2 = n1; i2 < n1 + 8; i2 = i2 + 1)
+          v13.at(5).at(i2 + 6 - n1) = 306146921;
+      }
+
+      for (int l2 = 0; l2 < le_s11; l2 = l2 + 1)
+        *(l2 + v13.at(5).begin()) = 306146921;
+    }
+  v13.at(le_s6 - 4);
+}