On Mon, Jun 2, 2014 at 11:12 PM, Uros Bizjak <[email protected]> wrote:
> A problem was uncovered by -march=corei7 -mtune=intel -m32 with
> i386/memcpy-[23] testcase in decide_alg subroutine [1]. Although the
> max size of the transfer was known, the memcpy was not inlined, as
> expected by the testcase.
>
> The core of the problem can be seen in the definition of 32bit
> intel_memcpy stringop alg:
>
> {libcall, {{11, loop, false}, {-1, rep_prefix_4_byte, false}}},
>
> Please note that the last algorithm sets its maximum size to -1,
> "unlimited". However, in decide_alg, the same number also signals that
> no algorithm sets its size, so expected_size is never calculated. In
> the loop that sets maximal size for user defined algorithm, it is
> assumed that "-1" belongs exclusively to libcall, which is not the
> case in the above intel_memcpy definition:
>
> if (candidate != libcall && candidate && usable)
> max = algs->size[i].max;
>
> When the last non-libcall algorithm sets its maximum to "-1" (aka
> "unlimited"), this value fails following test:
>
> if (max > 1 && (unsigned HOST_WIDE_INT) max >= max_size
>
> and expected_size is never calculated.
>
> Attached patch fixes this oversight, so "-1" means unlimited size and
> "0" means that size was never set. The patch also considers these two
> special values when choosing a maximum size for dynamic check.
>
> 2014-06-02 Uros Bizjak <[email protected]>
>
> * config/i386/i386.c (decide_alg): Correctly handle maximum size of
> stringop algorithm.
>
> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
> {,-m32}, also with
> RUNTESTFLAGS="--target_board=unix/-march=corei7/-mtune=intel\{,-m32\}",
> where it fixes both memcpy failures from [1].
>
> [1] https://gcc.gnu.org/ml/gcc-testresults/2014-06/msg00127.html
>
> Jan, can you please review the patch, to check if the logic is OK?
Whoops, wrong patch was attached. Now with the correct attachment.
Uros.
Index: ChangeLog
===================================================================
--- ChangeLog (revision 211140)
+++ ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2014-06-02 Uros Bizjak <[email protected]>
+
+ * config/i386/i386.c (decide_alg): Correctly handle maximum size of
+ stringop algorithm.
+
2014-06-02 Marcus Shawcroft <[email protected]>
* config/aarch64/aarch64.md (set_fpcr): Drop ISB after FPCR write.
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 211140)
+++ config/i386/i386.c (working copy)
@@ -23828,7 +23828,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
{
const struct stringop_algs * algs;
bool optimize_for_speed;
- int max = -1;
+ int max = 0;
const struct processor_costs *cost;
int i;
bool any_alg_usable_p = false;
@@ -23866,7 +23866,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
/* If expected size is not known but max size is small enough
so inline version is a win, set expected size into
the range. */
- if (max > 1 && (unsigned HOST_WIDE_INT) max >= max_size
+ if (((max > 1 && (unsigned HOST_WIDE_INT) max >= max_size) || max == -1)
&& expected_size == -1)
expected_size = min_size / 2 + max_size / 2;
@@ -23955,7 +23955,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
*dynamic_check = 128;
return loop_1_byte;
}
- if (max == -1)
+ if (max <= 0)
max = 4096;
alg = decide_alg (count, max / 2, min_size, max_size, memset,
zero_memset, dynamic_check, noalign);