Ping.
On Mon, Jun 2, 2014 at 11:12 PM, Uros Bizjak <ubiz...@gmail.com> wrote: > Hello! > > A problem was uncovered by -march=corei7 -mtune=intel -m32 with > i386/memcpy-[23] testcase in decide_alg subroutine [1]. Although the > max size of the transfer was known, the memcpy was not inlined, as > expected by the testcase. > > The core of the problem can be seen in the definition of 32bit > intel_memcpy stringop alg: > > {libcall, {{11, loop, false}, {-1, rep_prefix_4_byte, false}}}, > > Please note that the last algorithm sets its maximum size to -1, > "unlimited". However, in decide_alg, the same number also signals that > no algorithm sets its size, so expected_size is never calculated. In > the loop that sets maximal size for user defined algorithm, it is > assumed that "-1" belongs exclusively to libcall, which is not the > case in the above intel_memcpy definition: > > if (candidate != libcall && candidate && usable) > max = algs->size[i].max; > > When the last non-libcall algorithm sets its maximum to "-1" (aka > "unlimited"), this value fails following test: > > if (max > 1 && (unsigned HOST_WIDE_INT) max >= max_size > > and expected_size is never calculated. > > Attached patch fixes this oversight, so "-1" means unlimited size and > "0" means that size was never set. The patch also considers these two > special values when choosing a maximum size for dynamic check. > > 2014-06-02 Uros Bizjak <ubiz...@gmail.com> > > * config/i386/i386.c (decide_alg): Correctly handle maximum size of > stringop algorithm. > > Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu > {,-m32}, also with > RUNTESTFLAGS="--target_board=unix/-march=corei7/-mtune=intel\{,-m32\}", > where it fixes both memcpy failures from [1]. > > [1] https://gcc.gnu.org/ml/gcc-testresults/2014-06/msg00127.html > > Jan, can you please review the patch, to check if the logic is OK? > > Uros.
Index: ChangeLog =================================================================== --- ChangeLog (revision 211140) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-06-02 Uros Bizjak <ubiz...@gmail.com> + + * config/i386/i386.c (decide_alg): Correctly handle maximum size of + stringop algorithm. + 2014-06-02 Marcus Shawcroft <marcus.shawcr...@arm.com> * config/aarch64/aarch64.md (set_fpcr): Drop ISB after FPCR write. Index: config/i386/i386.c =================================================================== --- config/i386/i386.c (revision 211140) +++ config/i386/i386.c (working copy) @@ -23828,7 +23828,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp { const struct stringop_algs * algs; bool optimize_for_speed; - int max = -1; + int max = 0; const struct processor_costs *cost; int i; bool any_alg_usable_p = false; @@ -23866,7 +23866,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp /* If expected size is not known but max size is small enough so inline version is a win, set expected size into the range. */ - if (max > 1 && (unsigned HOST_WIDE_INT) max >= max_size + if (((max > 1 && (unsigned HOST_WIDE_INT) max >= max_size) || max == -1) && expected_size == -1) expected_size = min_size / 2 + max_size / 2; @@ -23955,7 +23955,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp *dynamic_check = 128; return loop_1_byte; } - if (max == -1) + if (max <= 0) max = 4096; alg = decide_alg (count, max / 2, min_size, max_size, memset, zero_memset, dynamic_check, noalign);