setmem use mode wider than Pmode for size

H.J. Lu Mon, 04 Nov 2013 05:08:41 -0800

On Mon, Nov 4, 2013 at 3:34 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> Y
> On Mon, Nov 4, 2013 at 3:11 AM, Richard Sandiford
> <rsand...@linux.vnet.ibm.com> wrote:
>> "H.J. Lu" <hongjiu...@intel.com> writes:
>>> emit_block_move_via_movmem and set_storage_via_setmem have
>>>
>>>   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
>>>        mode = GET_MODE_WIDER_MODE (mode))
>>>     {
>>>       enum insn_code code = direct_optab_handler (movmem_optab, mode);
>>>
>>>       if (code != CODE_FOR_nothing
>>>           /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
>>>              here because if SIZE is less than the mode mask, as it is
>>>              returned by the macro, it will definitely be less than the
>>>              actual mode mask.  */
>>>           && ((CONST_INT_P (size)
>>>                && ((unsigned HOST_WIDE_INT) INTVAL (size)
>>>                    <= (GET_MODE_MASK (mode) >> 1)))
>>>               || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
>>>         {
>>>
>>> Backend may assume mode of size in movmem and setmem expanders is no
>>> widder than Pmode since size is within the Pmode address space.  X86
>>> backend expand_set_or_movmem_prologue_epilogue_by_misaligned has
>>>
>>>       rtx saveddest = *destptr;
>>>
>>>       gcc_assert (desired_align <= size);
>>>       /* Align destptr up, place it to new register.  */
>>>       *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr,
>>>                                       GEN_INT (prolog_size),
>>>                                       NULL_RTX, 1, OPTAB_DIRECT);
>>>       *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr,
>>>                                       GEN_INT (-desired_align),
>>>                                       *destptr, 1, OPTAB_DIRECT);
>>>       /* See how many bytes we skipped.  */
>>>       saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, 
>>> saveddest,
>>>                                        *destptr,
>>>                                        saveddest, 1, OPTAB_DIRECT);
>>>       /* Adjust srcptr and count.  */
>>>       if (!issetmem)
>>>         *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, 
>>> saveddest,
>>>                                         *srcptr, 1, OPTAB_DIRECT);
>>>       *count = expand_simple_binop (GET_MODE (*count), PLUS, *count,
>>>                                     saveddest, *count, 1, OPTAB_DIRECT);
>>>
>>> saveddest is a negative number in Pmode and *count is in word_mode.  For
>>> x32, when Pmode is SImode and word_mode is DImode, saveddest + *count
>>> leads to overflow.  We could fix it by using mode of saveddest to compute
>>> saveddest + *count.  But it leads to extra conversions and other backends
>>> may run into the same problem.  A better fix is to limit mode of size in
>>> movmem and setmem expanders to Pmode.  It generates better and correct
>>> memcpy and memset for x32.
>>>
>>> There is also a typo in comments.  It should be BITS_PER_WORD, not
>>> BITS_PER_HOST_WIDE_INT.
>>
>> I don't think it's a typo.  It's explaining why we don't have to worry about:
>>
>>     GET_MODE_BITSIZE (mode) > BITS_PER_HOST_WIDE_INT
>>
>> in the CONST_INT_P test (because in that case the GET_MODE_MASK macro
>> will be an all-1 HOST_WIDE_INT, even though that's narrower than the
>> real mask).
>
> Thanks for explanation.
>
>> I don't think the current comment covers the BITS_PER_WORD test at all.
>> AIUI it's there because the pattern is defined as taking a length of
>> at most word_mode, so we should stop once we reach it.
>
> I see.
>
>> FWIW, I agree Pmode makes more sense on face value.  But shouldn't
>> we replace the BITS_PER_WORD test instead of adding to it?  Having both
>> would only make a difference for Pmode > word_mode targets, which might
>> be able to handle full Pmode lengths.
>
> Do we ever have a target with Pmode is wider than
> word_mode? If not, we can check Pmode instead.
>
>> Either way, the md.texi documentation should be updated too.
>>


This is the updated patch with md.texi change.   The testcase is the same.
Tested on x32.  OK to install?

Thanks.

-- 
H.J.
---
gcc/

2013-11-04  H.J. Lu  <hongjiu...@intel.com>

    PR middle-end/58981
    * doc/md.texi (@code{movmem@var{m}}): Specify Pmode as mode of
    pattern, instead of word_mode.

    * expr.c (emit_block_move_via_movmem): Don't use mode wider than
    Pmode for size.
    (set_storage_via_setmem): Likewise.

gcc/testsuite/

2013-11-04  H.J. Lu  <hongjiu...@intel.com>

    PR middle-end/58981
    * gcc.dg/pr58981.c: New test.

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index ac10a0a..1e22b88 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5291,12 +5291,13 @@ are the first two operands, and both are
@code{mem:BLK}s with an
 address in mode @code{Pmode}.

 The number of bytes to move is the third operand, in mode @var{m}.
-Usually, you specify @code{word_mode} for @var{m}.  However, if you can
+Usually, you specify @code{Pmode} for @var{m}.  However, if you can
 generate better code knowing the range of valid lengths is smaller than
-those representable in a full word, you should provide a pattern with a
+those representable in a full Pmode pointer, you should provide
+a pattern with a
 mode corresponding to the range of values you can handle efficiently
 (e.g., @code{QImode} for values in the range 0--127; note we avoid numbers
-that appear negative) and also a pattern with @code{word_mode}.
+that appear negative) and also a pattern with @code{Pmode}.

 The fourth operand is the known shared alignment of the source and
 destination, in the form of a @code{const_int} rtx.  Thus, if the
diff --git a/gcc/expr.c b/gcc/expr.c
index 551a660..8ef2870 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1297,11 +1297,12 @@ emit_block_move_via_movmem (rtx x, rtx y, rtx
size, unsigned int align,
       /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
          here because if SIZE is less than the mode mask, as it is
          returned by the macro, it will definitely be less than the
-         actual mode mask.  */
+         actual mode mask.  Since SIZE is within the Pmode address
+         space, we limit MODE to Pmode.  */
       && ((CONST_INT_P (size)
            && ((unsigned HOST_WIDE_INT) INTVAL (size)
            <= (GET_MODE_MASK (mode) >> 1)))
-          || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
+          || GET_MODE_BITSIZE (mode) >= GET_MODE_BITSIZE (Pmode)))
     {
       struct expand_operand ops[6];
       unsigned int nops;
@@ -2879,14 +2880,15 @@ set_storage_via_setmem (rtx object, rtx size,
rtx val, unsigned int align,
       enum insn_code code = direct_optab_handler (setmem_optab, mode);

       if (code != CODE_FOR_nothing
-      /* We don't need MODE to be narrower than
-         BITS_PER_HOST_WIDE_INT here because if SIZE is less than
-         the mode mask, as it is returned by the macro, it will
-         definitely be less than the actual mode mask.  */
+      /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
+         here because if SIZE is less than the mode mask, as it is
+         returned by the macro, it will definitely be less than the
+         actual mode mask.  Since SIZE is within the Pmode address
+         space, we limit MODE to Pmode.  */
       && ((CONST_INT_P (size)
            && ((unsigned HOST_WIDE_INT) INTVAL (size)
            <= (GET_MODE_MASK (mode) >> 1)))
-          || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
+          || GET_MODE_BITSIZE (mode) >= GET_MODE_BITSIZE (Pmode)))
     {
       struct expand_operand ops[6];
       unsigned int nops;

Re: PATCH: middle-end/58981: movmem/setmem use mode wider than Pmode for size

Reply via email to