On 2014-09-25 5:46 AM, Ilya Enkovich wrote:
2014-09-25 1:51 GMT+04:00 Ilya Enkovich <enkovich....@gmail.com>:
2014-09-24 23:09 GMT+04:00 Jeff Law <l...@redhat.com>:
On 09/24/14 07:13, Ilya Enkovich wrote:
I tried to generate PARALLEL with all regs set by call. Here is a
memset call I got:
(call_insn 23 22 24 2 (set (parallel [
(expr_list:REG_DEP_TRUE (reg:DI 0 ax)
(const_int 0 [0]))
(expr_list:REG_DEP_TRUE (reg:BND64 77 bnd0)
(const_int 0 [0]))
(expr_list:REG_DEP_TRUE (reg:BND64 78 bnd1)
(const_int 0 [0]))
])
(call/j (mem:QI (symbol_ref:DI ("memset") [flags 0x41]
[ snip ]
Looks good. This is the approved way to handle multiple results of a call.
During register allocation LRA generated a weird move instruction:
(insn 63 0 0 (set (reg/f:DI 100)
(parallel [
(expr_list:REG_DEP_TRUE (reg:DI 0 ax)
(const_int 0 [0]))
(expr_list:REG_DEP_TRUE (reg:BND64 77 bnd0)
(const_int 0 [0]))
(expr_list:REG_DEP_TRUE (reg:BND64 78 bnd1)
(const_int 0 [0]))
])) -1
(nil))
Which caused ICE later in LRA. This move happens because of
REG_RETURNED (reg/f:DI 100) (see condition in inherit_in_ebb at
lra-constraints.c:5312). Thus this code in LRA doesn't accept
PARALLEL dest for calls.
This is a bug in LRA then. Multiple return values aren't heavily used, so
I'm not surprised that its handling was missed in LRA.
The question now is how to bundle things together in such a way as to make
it easy for Vlad to reproduce and fix this in LRA.
Jeff
I suppose it should be easy to reproduce using the same test case I
use and some speudo patch which adds fake return values (e.g. xmm6 and
xmm7) to calls. Will try to make some minimal patch and test Vlad
could work with.
Ilya
I couldn't reproduce the problem on a small test but chrome build
shows a lot of errors. Due to the nature of the problem test's size
shouldn't matter, so I attach patch which emulates situation with
bounds regs (but uses xmm5 and xmm6 instead of bnd0 and bnd1) with a
preprocessed chrome file.
The problem is in code introduced by Bernd in IRA and caller-saves.c in
2012. It is basically an optimization for functions returning always
the same result as one argument (e.g. memcpy returning 1st argument).
There are two possible solutions. First one is to prohibit the
optimizations when there is a parallel in SET. Second one is to go
deeper if the call result is guaranteed in the first element which is
true for the patch.
For the first solution, the patch would be
Index: lra-constraints.c
===================================================================
--- lra-constraints.c (revision 215748)
+++ lra-constraints.c (working copy)
@@ -5348,16 +5348,19 @@
if (GET_CODE (pat) == PARALLEL)
pat = XVECEXP (pat, 0, 0);
dest = SET_DEST (pat);
- start_sequence ();
- emit_move_insn (cheap, copy_rtx (dest));
- restore = get_insns ();
- end_sequence ();
- lra_process_new_insns (curr_insn, NULL, restore,
- "Inserting call parameter
restore");
- /* We don't need to save/restore of the pseudo from
- this call. */
- usage_insns[regno].calls_num = calls_num;
- bitmap_set_bit (&check_only_regs, regno);
+ if (REG_P (dest))
+ {
+ start_sequence ();
+ emit_move_insn (cheap, copy_rtx (dest));
+ restore = get_insns ();
+ end_sequence ();
+ lra_process_new_insns (curr_insn, NULL, restore,
+ "Inserting call parameter
restore");
+ /* We don't need to save/restore of the pseudo
+ from this call. */
+ usage_insns[regno].calls_num = calls_num;
+ bitmap_set_bit (&check_only_regs, regno);
+ }
}
}
to_inherit_num = 0;
For the second solution, the patch is
Index: lra-constraints.c
===================================================================
--- lra-constraints.c (revision 215748)
+++ lra-constraints.c (working copy)
@@ -5348,16 +5348,25 @@
if (GET_CODE (pat) == PARALLEL)
pat = XVECEXP (pat, 0, 0);
dest = SET_DEST (pat);
- start_sequence ();
- emit_move_insn (cheap, copy_rtx (dest));
- restore = get_insns ();
- end_sequence ();
- lra_process_new_insns (curr_insn, NULL, restore,
- "Inserting call parameter
restore");
- /* We don't need to save/restore of the pseudo from
- this call. */
- usage_insns[regno].calls_num = calls_num;
- bitmap_set_bit (&check_only_regs, regno);
+ if (GET_CODE (dest) == PARALLEL)
+ {
+ dest = XVECEXP (dest, 0, 0);
+ if (GET_CODE (dest) == EXPR_LIST)
+ dest = XEXP (dest, 0);
+ }
+ if (REG_P (dest))
+ {
+ start_sequence ();
+ emit_move_insn (cheap, copy_rtx (dest));
+ restore = get_insns ();
+ end_sequence ();
+ lra_process_new_insns (curr_insn, NULL, restore,
+ "Inserting call parameter
restore");
+ /* We don't need to save/restore of the pseudo from
+ this call. */
+ usage_insns[regno].calls_num = calls_num;
+ bitmap_set_bit (&check_only_regs, regno);
+ }
}
}
The first patch is safer but the second one is ok too. I have no
particular preferences. Whatever we choose, analogous code in
caller-saves.c should be changed too.