On Mon, Nov 4, 2013 at 3:34 AM, H.J. Lu <hjl.to...@gmail.com> wrote: > Y > On Mon, Nov 4, 2013 at 3:11 AM, Richard Sandiford > <rsand...@linux.vnet.ibm.com> wrote: >> "H.J. Lu" <hongjiu...@intel.com> writes: >>> emit_block_move_via_movmem and set_storage_via_setmem have >>> >>> for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode; >>> mode = GET_MODE_WIDER_MODE (mode)) >>> { >>> enum insn_code code = direct_optab_handler (movmem_optab, mode); >>> >>> if (code != CODE_FOR_nothing >>> /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT >>> here because if SIZE is less than the mode mask, as it is >>> returned by the macro, it will definitely be less than the >>> actual mode mask. */ >>> && ((CONST_INT_P (size) >>> && ((unsigned HOST_WIDE_INT) INTVAL (size) >>> <= (GET_MODE_MASK (mode) >> 1))) >>> || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD)) >>> { >>> >>> Backend may assume mode of size in movmem and setmem expanders is no >>> widder than Pmode since size is within the Pmode address space. X86 >>> backend expand_set_or_movmem_prologue_epilogue_by_misaligned has >>> >>> rtx saveddest = *destptr; >>> >>> gcc_assert (desired_align <= size); >>> /* Align destptr up, place it to new register. */ >>> *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr, >>> GEN_INT (prolog_size), >>> NULL_RTX, 1, OPTAB_DIRECT); >>> *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr, >>> GEN_INT (-desired_align), >>> *destptr, 1, OPTAB_DIRECT); >>> /* See how many bytes we skipped. */ >>> saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, >>> saveddest, >>> *destptr, >>> saveddest, 1, OPTAB_DIRECT); >>> /* Adjust srcptr and count. */ >>> if (!issetmem) >>> *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, >>> saveddest, >>> *srcptr, 1, OPTAB_DIRECT); >>> *count = expand_simple_binop (GET_MODE (*count), PLUS, *count, >>> saveddest, *count, 1, OPTAB_DIRECT); >>> >>> saveddest is a negative number in Pmode and *count is in word_mode. For >>> x32, when Pmode is SImode and word_mode is DImode, saveddest + *count >>> leads to overflow. We could fix it by using mode of saveddest to compute >>> saveddest + *count. But it leads to extra conversions and other backends >>> may run into the same problem. A better fix is to limit mode of size in >>> movmem and setmem expanders to Pmode. It generates better and correct >>> memcpy and memset for x32. >>> >>> There is also a typo in comments. It should be BITS_PER_WORD, not >>> BITS_PER_HOST_WIDE_INT. >> >> I don't think it's a typo. It's explaining why we don't have to worry about: >> >> GET_MODE_BITSIZE (mode) > BITS_PER_HOST_WIDE_INT >> >> in the CONST_INT_P test (because in that case the GET_MODE_MASK macro >> will be an all-1 HOST_WIDE_INT, even though that's narrower than the >> real mask). > > Thanks for explanation. > >> I don't think the current comment covers the BITS_PER_WORD test at all. >> AIUI it's there because the pattern is defined as taking a length of >> at most word_mode, so we should stop once we reach it. > > I see. > >> FWIW, I agree Pmode makes more sense on face value. But shouldn't >> we replace the BITS_PER_WORD test instead of adding to it? Having both >> would only make a difference for Pmode > word_mode targets, which might >> be able to handle full Pmode lengths. > > Do we ever have a target with Pmode is wider than > word_mode? If not, we can check Pmode instead. > >> Either way, the md.texi documentation should be updated too. >>
This is the updated patch with md.texi change. The testcase is the same. Tested on x32. OK to install? Thanks. -- H.J. --- gcc/ 2013-11-04 H.J. Lu <hongjiu...@intel.com> PR middle-end/58981 * doc/md.texi (@code{movmem@var{m}}): Specify Pmode as mode of pattern, instead of word_mode. * expr.c (emit_block_move_via_movmem): Don't use mode wider than Pmode for size. (set_storage_via_setmem): Likewise. gcc/testsuite/ 2013-11-04 H.J. Lu <hongjiu...@intel.com> PR middle-end/58981 * gcc.dg/pr58981.c: New test. diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index ac10a0a..1e22b88 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5291,12 +5291,13 @@ are the first two operands, and both are @code{mem:BLK}s with an address in mode @code{Pmode}. The number of bytes to move is the third operand, in mode @var{m}. -Usually, you specify @code{word_mode} for @var{m}. However, if you can +Usually, you specify @code{Pmode} for @var{m}. However, if you can generate better code knowing the range of valid lengths is smaller than -those representable in a full word, you should provide a pattern with a +those representable in a full Pmode pointer, you should provide +a pattern with a mode corresponding to the range of values you can handle efficiently (e.g., @code{QImode} for values in the range 0--127; note we avoid numbers -that appear negative) and also a pattern with @code{word_mode}. +that appear negative) and also a pattern with @code{Pmode}. The fourth operand is the known shared alignment of the source and destination, in the form of a @code{const_int} rtx. Thus, if the diff --git a/gcc/expr.c b/gcc/expr.c index 551a660..8ef2870 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -1297,11 +1297,12 @@ emit_block_move_via_movmem (rtx x, rtx y, rtx size, unsigned int align, /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT here because if SIZE is less than the mode mask, as it is returned by the macro, it will definitely be less than the - actual mode mask. */ + actual mode mask. Since SIZE is within the Pmode address + space, we limit MODE to Pmode. */ && ((CONST_INT_P (size) && ((unsigned HOST_WIDE_INT) INTVAL (size) <= (GET_MODE_MASK (mode) >> 1))) - || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD)) + || GET_MODE_BITSIZE (mode) >= GET_MODE_BITSIZE (Pmode))) { struct expand_operand ops[6]; unsigned int nops; @@ -2879,14 +2880,15 @@ set_storage_via_setmem (rtx object, rtx size, rtx val, unsigned int align, enum insn_code code = direct_optab_handler (setmem_optab, mode); if (code != CODE_FOR_nothing - /* We don't need MODE to be narrower than - BITS_PER_HOST_WIDE_INT here because if SIZE is less than - the mode mask, as it is returned by the macro, it will - definitely be less than the actual mode mask. */ + /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT + here because if SIZE is less than the mode mask, as it is + returned by the macro, it will definitely be less than the + actual mode mask. Since SIZE is within the Pmode address + space, we limit MODE to Pmode. */ && ((CONST_INT_P (size) && ((unsigned HOST_WIDE_INT) INTVAL (size) <= (GET_MODE_MASK (mode) >> 1))) - || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD)) + || GET_MODE_BITSIZE (mode) >= GET_MODE_BITSIZE (Pmode))) { struct expand_operand ops[6]; unsigned int nops;