Hi All,

This patch fixes the code generation in copy_blkmode_to_reg by recalculating
the bitsize on each iteration, performing the largest copy allowed that does
not read more bits than remain to be copied.

This fixes the bad code generation that was reported and still produces better
code in most cases. For targets that don't support fast unaligned access it
falls back to the old behaviour of copying MIN (TYPE_ALIGN, BITS_PER_WORD)
bits at a time (1 byte for byte-aligned structures).

For the copy of a 3-byte structure this now produces:

fun3:
        adrp    x1, .LANCHOR0
        add     x1, x1, :lo12:.LANCHOR0
        mov     x0, 0
        sub     sp, sp, #16
        ldrh    w2, [x1, 16]
        ldrb    w1, [x1, 18]
        add     sp, sp, 16
        bfi     x0, x2, 0, 16
        bfi     x0, x1, 16, 8
        ret

whereas before it produced

fun3:
        adrp    x0, .LANCHOR0
        add     x2, x0, :lo12:.LANCHOR0
        sub     sp, sp, #16
        ldrh    w1, [x0, #:lo12:.LANCHOR0]
        ldrb    w0, [x2, 2]
        strh    w1, [sp, 8]
        strb    w0, [sp, 10]
        ldr     w0, [sp, 8]
        add     sp, sp, 16
        ret

Cross-compiled for aarch64-none-elf with no issues.
Bootstrapped powerpc64-unknown-linux-gnu, x86_64-pc-linux-gnu,
arm-none-linux-gnueabihf, aarch64-none-linux-gnu with no issues.

Regtested aarch64-none-elf, x86_64-pc-linux-gnu, powerpc64-unknown-linux-gnu
and arm-none-linux-gnueabihf and found no issues.

The earlier powerpc regression (pr63594-2.c) is now fixed.

OK for trunk?

Thanks,
Tamar

gcc/
2018-04-05  Tamar Christina  <tamar.christ...@arm.com>

        PR middle-end/85123
        * expr.c (copy_blkmode_to_reg): Fix wrong code generation.

-- 
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72e5441a6421a280b04c57fca2922b8..7daeb8c91d758edf0b3dc37f6927380b6f3df877 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -2749,7 +2749,7 @@ copy_blkmode_to_reg (machine_mode mode_in, tree src)
 {
   int i, n_regs;
   unsigned HOST_WIDE_INT bitpos, xbitpos, padding_correction = 0, bytes;
-  unsigned int bitsize;
+  unsigned int bitsize = 0;
   rtx *dst_words, dst, x, src_word = NULL_RTX, dst_word = NULL_RTX;
   /* No current ABI uses variable-sized modes to pass a BLKmnode type.  */
   fixed_size_mode mode = as_a <fixed_size_mode> (mode_in);
@@ -2782,7 +2782,7 @@ copy_blkmode_to_reg (machine_mode mode_in, tree src)
 
   n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
   dst_words = XALLOCAVEC (rtx, n_regs);
-  bitsize = BITS_PER_WORD;
+
   if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
     bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
 
@@ -2791,6 +2791,17 @@ copy_blkmode_to_reg (machine_mode mode_in, tree src)
        bitpos < bytes * BITS_PER_UNIT;
        bitpos += bitsize, xbitpos += bitsize)
     {
+      /* Find the largest integer mode that can be used to copy all or as
+	 many bits as possible of the structure.  */
+      opt_scalar_int_mode mode_iter;
+      FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
+      if (GET_MODE_BITSIZE (mode_iter.require ())
+			    <= ((bytes * BITS_PER_UNIT) - bitpos)
+	  && GET_MODE_BITSIZE (mode_iter.require ()) <= BITS_PER_WORD)
+	bitsize = GET_MODE_BITSIZE (mode_iter.require ());
+      else
+	break;
+
       /* We need a new destination pseudo each time xbitpos is
 	 on a word boundary and when xbitpos == padding_correction
 	 (the first time through).  */
