Hello,
This patch backports to the OG9 branch the following patches that have
already been committed to the GCC mainline:
Support using multiple registers to hold the frame pointer
[LRA] Do not use eliminable registers for spilling
[amdgcn] Use first lane of v1 for zero offset
[amdgcn] Reinitialize registers for every function
[amdgcn] Restrict registers available to non-kernel functions
[amdgcn] Update lower bounds for the number of registers in non-leaf kernels
[amdgcn] Unfix registers for frame pointer
[amdgcn] Fix handling of VCC_CONDITIONAL_REG
Check suitability of spill register for mode
I will commit this later.
Thanks,
Kwok
2019-11-07 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* ira.c (setup_alloc_regs): Setup no_unit_alloc_regs for
frame pointer in multiple registers.
(ira_setup_eliminable_regset): Setup eliminable_regset,
ira_no_alloc_regs and regs_ever_live for frame pointer in
multiple registers.
2019-11-10 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* lra-spills.c (assign_spill_hard_regs): Do not spill into
registers in eliminable_regset.
2019-11-14 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* lra-spills.c (assign_spill_hard_regs): Check that the spill
register is suitable for the mode.
2019-11-15 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* config/gcn/gcn.c (gcn_regno_reg_class): Return VCC_CONDITIONAL_REG
register class for VCC_LO and VCC_HI.
(gcn_spill_class): Use SGPR_REGS to spill registers in
VCC_CONDITIONAL_REG.
2019-11-15 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* config/gcn/gcn.c (gcn_expand_prologue): Remove initialization and
prologue use of v0.
(print_operand_address): Use v1 for zero vector offset.
2019-11-15 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* config/gcn/gcn.c (gcn_init_cumulative_args): Call reinit_regs.
2019-11-15 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* config/gcn/gcn.c (default_requested_args): New.
(gcn_parse_amdgpu_hsa_kernel_attribute): Initialize requested args
set with default_requested_args.
(gcn_conditional_register_usage): Limit register usage of non-kernel
functions. Reassign fixed registers if a non-standard set of args is
requested.
* config/gcn/gcn.h (FIXED_REGISTERS): Fix registers according to ABI.
2019-11-15 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* config/gcn/gcn.c (MAX_NORMAL_SGPR_COUNT, MAX_NORMAL_VGPR_COUNT): New.
(gcn_conditional_register_usage): Use constants in place of hard-coded
values.
(gcn_hsa_declare_function_name): Set lower bound for number of
SGPRs/VGPRs in non-leaf kernels to MAX_NORMAL_SGPR_COUNT and
MAX_NORMAL_VGPR_COUNT.
2019-11-15 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* config/gcn/gcn.h (FIXED_REGISTERS): Unfix frame pointer.
(CALL_USED_REGISTERS): Make frame pointer callee-saved.
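For reviewers skimming the ira.c and lra-spills.c hunks below, the common
thread is that the frame pointer may now occupy more than one hard register,
so code that used to touch HARD_FRAME_POINTER_REGNUM alone now walks all of
its component registers.  A minimal sketch of the idiom, not part of the
patch itself (the real code is in ira_setup_eliminable_regset and
assign_spill_hard_regs below):

  /* Sketch only: apply the bookkeeping that was previously done for the
     single HARD_FRAME_POINTER_REGNUM to every hard register backing a
     multi-register frame pointer.  */
  int fp_reg_count = hard_regno_nregs (HARD_FRAME_POINTER_REGNUM, Pmode);
  for (int i = 0; i < fp_reg_count; i++)
    {
      SET_HARD_REG_BIT (eliminable_regset, HARD_FRAME_POINTER_REGNUM + i);
      if (frame_pointer_needed)
        SET_HARD_REG_BIT (ira_no_alloc_regs, HARD_FRAME_POINTER_REGNUM + i);
    }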
---
gcc/config/gcn/gcn.c | 104 ++++++++++++++++++++++++++++-----------------------
gcc/config/gcn/gcn.h | 8 ++--
gcc/ira.c | 33 +++++++++-------
gcc/lra-spills.c | 3 ++
4 files changed, 85 insertions(+), 63 deletions(-)
From 20245bf2e0b8ac258f86e6495b3be3e09edd0181 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <k...@codesourcery.com>
Date: Mon, 18 Nov 2019 13:26:50 -0800
Subject: [PATCH] [og9] Backport AMD GCN backend improvements from mainline
2019-11-07 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* ira.c (setup_alloc_regs): Setup no_unit_alloc_regs for
frame pointer in multiple registers.
(ira_setup_eliminable_regset): Setup eliminable_regset,
ira_no_alloc_regs and regs_ever_live for frame pointer in
multiple registers.
2019-11-10 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* lra-spills.c (assign_spill_hard_regs): Do not spill into
registers in eliminable_regset.
2019-11-14 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* lra-spills.c (assign_spill_hard_regs): Check that the spill
register is suitable for the mode.
2019-11-15 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* config/gcn/gcn.c (gcn_regno_reg_class): Return VCC_CONDITIONAL_REG
register class for VCC_LO and VCC_HI.
(gcn_spill_class): Use SGPR_REGS to spill registers in
VCC_CONDITIONAL_REG.
2019-11-15 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* config/gcn/gcn.c (gcn_expand_prologue): Remove initialization and
prologue use of v0.
(print_operand_address): Use v1 for zero vector offset.
2019-11-15 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* config/gcn/gcn.c (gcn_init_cumulative_args): Call reinit_regs.
2019-11-15 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* config/gcn/gcn.c (default_requested_args): New.
(gcn_parse_amdgpu_hsa_kernel_attribute): Initialize requested args
set with default_requested_args.
(gcn_conditional_register_usage): Limit register usage of non-kernel
functions. Reassign fixed registers if a non-standard set of args is
requested.
* config/gcn/gcn.h (FIXED_REGISTERS): Fix registers according to ABI.
2019-11-15 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* config/gcn/gcn.c (MAX_NORMAL_SGPR_COUNT, MAX_NORMAL_VGPR_COUNT): New.
(gcn_conditional_register_usage): Use constants in place of hard-coded
values.
(gcn_hsa_declare_function_name): Set lower bound for number of
SGPRs/VGPRs in non-leaf kernels to MAX_NORMAL_SGPR_COUNT and
MAX_NORMAL_VGPR_COUNT.
2019-11-15 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* config/gcn/gcn.h (FIXED_REGISTERS): Unfix frame pointer.
(CALL_USED_REGISTERS): Make frame pointer callee-saved.
---
gcc/config/gcn/gcn.c | 104 ++++++++++++++++++++++++++++-----------------------
gcc/config/gcn/gcn.h | 8 ++--
gcc/ira.c | 33 +++++++++-------
gcc/lra-spills.c | 3 ++
4 files changed, 85 insertions(+), 63 deletions(-)
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 2835a3d..f556ffe 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -75,6 +75,12 @@ int gcn_isa = 3; /* Default to GCN3. */
#define LDS_SIZE 65536
+/* The number of registers usable by normal non-kernel functions.
+ The SGPR count includes any special extra registers such as VCC. */
+
+#define MAX_NORMAL_SGPR_COUNT 64
+#define MAX_NORMAL_VGPR_COUNT 24
+
/* }}} */
/* {{{ Initialization and options. */
@@ -191,6 +197,17 @@ static const struct gcn_kernel_arg_type
{"work_item_id_Z", NULL, V64SImode, FIRST_VGPR_REG + 2}
};
+static const long default_requested_args
+ = (1 << PRIVATE_SEGMENT_BUFFER_ARG)
+ | (1 << DISPATCH_PTR_ARG)
+ | (1 << QUEUE_PTR_ARG)
+ | (1 << KERNARG_SEGMENT_PTR_ARG)
+ | (1 << PRIVATE_SEGMENT_WAVE_OFFSET_ARG)
+ | (1 << WORKGROUP_ID_X_ARG)
+ | (1 << WORK_ITEM_ID_X_ARG)
+ | (1 << WORK_ITEM_ID_Y_ARG)
+ | (1 << WORK_ITEM_ID_Z_ARG);
+
/* Extract parameter settings from __attribute__((amdgpu_hsa_kernel ())).
This function also sets the default values for some arguments.
@@ -201,10 +218,7 @@ gcn_parse_amdgpu_hsa_kernel_attribute (struct gcn_kernel_args *args,
tree list)
{
bool err = false;
- args->requested = ((1 << PRIVATE_SEGMENT_BUFFER_ARG)
- | (1 << QUEUE_PTR_ARG)
- | (1 << KERNARG_SEGMENT_PTR_ARG)
- | (1 << PRIVATE_SEGMENT_WAVE_OFFSET_ARG));
+ args->requested = default_requested_args;
args->nargs = 0;
for (int a = 0; a < GCN_KERNEL_ARG_TYPES; a++)
@@ -242,8 +256,6 @@ gcn_parse_amdgpu_hsa_kernel_attribute (struct gcn_kernel_args *args,
args->requested |= (1 << a);
args->order[args->nargs++] = a;
}
- args->requested |= (1 << WORKGROUP_ID_X_ARG);
- args->requested |= (1 << WORK_ITEM_ID_Z_ARG);
/* Requesting WORK_ITEM_ID_Z_ARG implies requesting WORK_ITEM_ID_X_ARG and
WORK_ITEM_ID_Y_ARG. Similarly, requesting WORK_ITEM_ID_Y_ARG implies
@@ -253,10 +265,6 @@ gcn_parse_amdgpu_hsa_kernel_attribute (struct gcn_kernel_args *args,
if (args->requested & (1 << WORK_ITEM_ID_Y_ARG))
args->requested |= (1 << WORK_ITEM_ID_X_ARG);
- /* Always enable this so that kernargs is in a predictable place for
- gomp_print, etc. */
- args->requested |= (1 << DISPATCH_PTR_ARG);
-
int sgpr_regno = FIRST_SGPR_REG;
args->nsgprs = 0;
for (int a = 0; a < GCN_KERNEL_ARG_TYPES; a++)
@@ -462,6 +470,9 @@ gcn_regno_reg_class (int regno)
{
case SCC_REG:
return SCC_CONDITIONAL_REG;
+ case VCC_LO_REG:
+ case VCC_HI_REG:
+ return VCC_CONDITIONAL_REG;
case VCCZ_REG:
return VCCZ_CONDITIONAL_REG;
case EXECZ_REG:
@@ -629,7 +640,8 @@ gcn_can_split_p (machine_mode, rtx op)
static reg_class_t
gcn_spill_class (reg_class_t c, machine_mode /*mode */ )
{
- if (reg_classes_intersect_p (ALL_CONDITIONAL_REGS, c))
+ if (reg_classes_intersect_p (ALL_CONDITIONAL_REGS, c)
+ || c == VCC_CONDITIONAL_REG)
return SGPR_REGS;
else
return NO_REGS;
@@ -2040,27 +2052,36 @@ gcn_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
static void
gcn_conditional_register_usage (void)
{
- int i;
+ if (!cfun || !cfun->machine)
+ return;
- /* FIXME: Do we need to reset fixed_regs? */
+ if (cfun->machine->normal_function)
+ {
+ /* Restrict the set of SGPRs and VGPRs used by non-kernel functions. */
+ for (int i = SGPR_REGNO (MAX_NORMAL_SGPR_COUNT - 2);
+ i <= LAST_SGPR_REG; i++)
+ fixed_regs[i] = 1, call_used_regs[i] = 1;
-/* Limit ourselves to 1/16 the register file for maximimum sized workgroups.
- There are enough SGPRs not to limit those.
- TODO: Adjust this more dynamically. */
- for (i = FIRST_VGPR_REG + 64; i <= LAST_VGPR_REG; i++)
- fixed_regs[i] = 1, call_used_regs[i] = 1;
+ for (int i = VGPR_REGNO (MAX_NORMAL_VGPR_COUNT);
+ i <= LAST_VGPR_REG; i++)
+ fixed_regs[i] = 1, call_used_regs[i] = 1;
- if (!cfun || !cfun->machine || cfun->machine->normal_function)
- {
- /* Normal functions can't know what kernel argument registers are
- live, so just fix the bottom 16 SGPRs, and bottom 3 VGPRs. */
- for (i = 0; i < 16; i++)
- fixed_regs[FIRST_SGPR_REG + i] = 1;
- for (i = 0; i < 3; i++)
- fixed_regs[FIRST_VGPR_REG + i] = 1;
return;
}
+ /* If the set of requested args is the default set, nothing more needs to
+ be done. */
+ if (cfun->machine->args.requested == default_requested_args)
+ return;
+
+ /* Requesting a set of args different from the default violates the ABI. */
+ if (!leaf_function_p ())
+ warning (0, "A non-default set of initial values has been requested, "
+ "which violates the ABI!");
+
+ for (int i = SGPR_REGNO (0); i < SGPR_REGNO (14); i++)
+ fixed_regs[i] = 0;
+
/* Fix the runtime argument register containing values that may be
needed later. DISPATCH_PTR_ARG and FLAT_SCRATCH_* should not be
needed after the prologue so there's no need to fix them. */
@@ -2068,10 +2089,10 @@ gcn_conditional_register_usage (void)
fixed_regs[cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG]] = 1;
if (cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
{
+ /* The upper 32-bits of the 64-bit descriptor are not used, so allow
+ the containing registers to be used for other purposes. */
fixed_regs[cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]] = 1;
fixed_regs[cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] + 1] = 1;
- fixed_regs[cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] + 2] = 1;
- fixed_regs[cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] + 3] = 1;
}
if (cfun->machine->args.reg[KERNARG_SEGMENT_PTR_ARG] >= 0)
{
@@ -2441,6 +2462,8 @@ gcn_init_cumulative_args (CUMULATIVE_ARGS *cum /* Argument info to init */ ,
cfun->machine->args = cum->args;
if (!caller && cfun->machine->normal_function)
gcn_detect_incoming_pointer_arg (fndecl);
+
+ reinit_regs ();
}
static bool
@@ -2776,15 +2799,6 @@ gcn_expand_prologue ()
cfun->machine->args.
reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG]);
- if (TARGET_GCN5_PLUS)
- {
- /* v0 is reserved for constant zero so that "global"
- memory instructions can have a nul-offset without
- causing reloads. */
- emit_insn (gen_vec_duplicatev64si
- (gen_rtx_REG (V64SImode, VGPR_REGNO (0)), const0_rtx));
- }
-
if (cfun->machine->args.requested & (1 << FLAT_SCRATCH_INIT_ARG))
{
rtx fs_init_lo =
@@ -2843,8 +2857,6 @@ gcn_expand_prologue ()
gen_int_mode (LDS_SIZE, SImode));
emit_insn (gen_prologue_use (gen_rtx_REG (SImode, M0_REG)));
- if (TARGET_GCN5_PLUS)
- emit_insn (gen_prologue_use (gen_rtx_REG (SImode, VGPR_REGNO (0))));
if (cfun && cfun->machine && !cfun->machine->normal_function && flag_openmp)
{
@@ -4876,10 +4888,10 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree)
if (!leaf_function_p ())
{
/* We can't know how many registers function calls might use. */
- if (vgpr < 64)
- vgpr = 64;
- if (sgpr + extra_regs < 102)
- sgpr = 102 - extra_regs;
+ if (vgpr < MAX_NORMAL_VGPR_COUNT)
+ vgpr = MAX_NORMAL_VGPR_COUNT;
+ if (sgpr + extra_regs < MAX_NORMAL_SGPR_COUNT)
+ sgpr = MAX_NORMAL_SGPR_COUNT - extra_regs;
}
/* GFX8 allocates SGPRs in blocks of 8.
@@ -5303,9 +5315,9 @@ print_operand_address (FILE *file, rtx mem)
/* The assembler requires a 64-bit VGPR pair here, even though
the offset should be only 32-bit. */
if (vgpr_offset == NULL_RTX)
- /* In this case, the vector offset is zero, so we use v0,
- which is initialized by the kernel prologue to zero. */
- fprintf (file, "v[0:1]");
+ /* In this case, the vector offset is zero, so we use the first
+ lane of v1, which is initialized to zero. */
+ fprintf (file, "v[1:2]");
else if (REG_P (vgpr_offset)
&& VGPR_REGNO_P (REGNO (vgpr_offset)))
{
diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index b3b2d1a..e60b431 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -160,9 +160,9 @@
#define FIXED_REGISTERS { \
/* Scalars. */ \
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
+ 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, \
/* fp sp lr. */ \
- 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, \
+ 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, \
/* exec_save, cc_save */ \
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, \
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
@@ -180,7 +180,7 @@
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
/* VGRPs */ \
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
+ 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
@@ -203,7 +203,7 @@
#define CALL_USED_REGISTERS { \
/* Scalars. */ \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
- 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
+ 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, \
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
diff --git a/gcc/ira.c b/gcc/ira.c
index fd481d6..60e0b9b 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -516,7 +516,8 @@ setup_alloc_regs (bool use_hard_frame_p)
#endif
COPY_HARD_REG_SET (no_unit_alloc_regs, fixed_nonglobal_reg_set);
if (! use_hard_frame_p)
- SET_HARD_REG_BIT (no_unit_alloc_regs, HARD_FRAME_POINTER_REGNUM);
+ add_to_hard_reg_set (&no_unit_alloc_regs, Pmode,
+ HARD_FRAME_POINTER_REGNUM);
setup_class_hard_regs ();
}
@@ -2275,6 +2276,7 @@ ira_setup_eliminable_regset (void)
{
int i;
static const struct {const int from, to; } eliminables[] = ELIMINABLE_REGS;
+ int fp_reg_count = hard_regno_nregs (HARD_FRAME_POINTER_REGNUM, Pmode);
/* Setup is_leaf as frame_pointer_required may use it. This function
is called by sched_init before ira if scheduling is enabled. */
@@ -2303,7 +2305,8 @@ ira_setup_eliminable_regset (void)
frame pointer in LRA. */
if (frame_pointer_needed)
- df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM, true);
+ for (i = 0; i < fp_reg_count; i++)
+ df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);
COPY_HARD_REG_SET (ira_no_alloc_regs, no_unit_alloc_regs);
CLEAR_HARD_REG_SET (eliminable_regset);
@@ -2333,17 +2336,21 @@ ira_setup_eliminable_regset (void)
}
if (!HARD_FRAME_POINTER_IS_FRAME_POINTER)
{
- if (!TEST_HARD_REG_BIT (crtl->asm_clobbers, HARD_FRAME_POINTER_REGNUM))
- {
- SET_HARD_REG_BIT (eliminable_regset, HARD_FRAME_POINTER_REGNUM);
- if (frame_pointer_needed)
- SET_HARD_REG_BIT (ira_no_alloc_regs, HARD_FRAME_POINTER_REGNUM);
- }
- else if (frame_pointer_needed)
- error ("%s cannot be used in asm here",
- reg_names[HARD_FRAME_POINTER_REGNUM]);
- else
- df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM, true);
+ for (i = 0; i < fp_reg_count; i++)
+ if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
+ HARD_FRAME_POINTER_REGNUM + i))
+ {
+ SET_HARD_REG_BIT (eliminable_regset,
+ HARD_FRAME_POINTER_REGNUM + i);
+ if (frame_pointer_needed)
+ SET_HARD_REG_BIT (ira_no_alloc_regs,
+ HARD_FRAME_POINTER_REGNUM + i);
+ }
+ else if (frame_pointer_needed)
+ error ("%s cannot be used in asm here",
+ reg_names[HARD_FRAME_POINTER_REGNUM + i]);
+ else
+ df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);
}
}
diff --git a/gcc/lra-spills.c b/gcc/lra-spills.c
index c19b76a..417d68c 100644
--- a/gcc/lra-spills.c
+++ b/gcc/lra-spills.c
@@ -283,6 +283,9 @@ assign_spill_hard_regs (int *pseudo_regnos, int n)
for (k = 0; k < spill_class_size; k++)
{
hard_regno = ira_class_hard_regs[spill_class][k];
+ if (TEST_HARD_REG_BIT (eliminable_regset, hard_regno)
+ || !targetm.hard_regno_mode_ok (hard_regno, mode))
+ continue;
if (! overlaps_hard_reg_set_p (conflict_hard_regs, mode, hard_regno))
break;
}
--
2.8.1