On Thu, 20 Jan 2011 17:27:14 -0800 Richard Henderson <r...@redhat.com> wrote:
> Depending on how Haskell programs are built, it may be better
> to avoid the GOT entirely.  E.g.
>
>   -mcmodel=large
>
> a-la the x86_64 port.  This generates full 64-bit absolute
> relocations.  For ia64 code this would look like
>
>   movl r32 = foo#
>
>  Offset          Info           Type           Sym. Value    Sym. Name + Addend
> 000000000002  000400000023 R_IA64_IMM64      0000000000000000 foo + 0
>
> Of course, you wouldn't put this code into a shared library.
> For that you really would want a 64-bit GPREL offset.  E.g.
>
>   movl r32 = @gprel(foo)
>   add r32 = r32, r1
>
>  Offset          Info           Type           Sym. Value    Sym. Name + Addend
> 000000000002  00040000002b R_IA64_GPREL64I   0000000000000000 foo + 0
>
> Since both of these assemble now, really doubt there's any
> binutils work that needs to be done.
>
> What you'd have to do is add some command-line switches (and perhaps
> clean up the ones that are there wrt code models), and adjust the
> code in ia64_expand_load_address to handle your new options.  It really
> shouldn't be very difficult.

Aha. Sorry, I'm not familiar with the ia64 instruction set or the
Linux ABI yet, so forgive me my silly questions.

In ia64_expand_load_address I found the -mauto-pic option,
which looks magical. It seems to do _almost_ the thing I need.

I've added command-line switch handling (actually, stolen from sparc's case)
and tried to reuse the existing TARGET_AUTO_PIC code.

I've attached a dirty patch. Its comments, tabs and spaces are not very
nice yet.

As I understand it, TARGET_AUTO_PIC, TARGET_CONST_GP and TARGET_NO_PIC
should somehow fall into different memory models. I don't get the exact
difference between them.

i386 distinguishes code models in a more fine-grained manner:
CMODEL_* and CMODEL_*_PIC. Maybe ia64 should have a similar distinction?
Something like:

    MEDIUM
    LARGE_CODE
    LARGE_DATA
    LARGE (CODE+DATA)
    and *_NO_PIC variants

The patch is tested on the following code sample:

    extern void foo_fun();
    extern int foo_var;
    void bar() {
        foo_fun();
        foo_var = 4;
    }

Result:

a.o:
 Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000022  000900000049 R_IA64_PCREL21B   0000000000000000 foo_fun + 0
000000000031  000a00000086 R_IA64_LTOFF22X   0000000000000000 foo_var + 0
000000000040  000a00000087 R_IA64_LDXMOV     0000000000000000 foo_var + 0

a.o.large:
 Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000022  000900000049 R_IA64_PCREL21B   0000000000000000 foo_fun + 0
000000000042  000a0000002b R_IA64_GPREL64I   0000000000000000 foo_var + 0

There is one more theoretical problem. Or not so theoretical, as
GHC's binary is about 16MBs (and still growing).

To actually follow the meaning of the word 'cmodel' (code model) we should
implement function calls to arbitrarily far offsets.

AFAIU R_IA64_PCREL21B carries a signed 21-bit bundle offset, so it won't
let us make calls more than 2^20 bundles away (~16 megabytes up and down).

What kind of calls should be emitted in this case? call_gp/call_value_gp?

Thanks!

-- 
Sergei
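P.S. For reference, a minimal sketch of the reach arithmetic behind the R_IA64_PCREL21B question above. It assumes 16-byte ia64 bundles and a signed 21-bit bundle displacement. The helper names are hypothetical and are not part of GCC or binutils, and both addresses are assumed to be bundle-aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: an ia64 bundle is 16 bytes, and the branch displacement
   is a signed 21-bit count of bundles.  */
#define BUNDLE_BYTES   16
#define PCREL21B_BITS  21

/* Hypothetical helper: largest forward byte displacement.  */
static int64_t
pcrel21b_max_forward (void)
{
  return (((int64_t) 1 << (PCREL21B_BITS - 1)) - 1) * BUNDLE_BYTES;
}

/* Hypothetical helper: largest backward byte displacement (negative).  */
static int64_t
pcrel21b_max_backward (void)
{
  return -((int64_t) 1 << (PCREL21B_BITS - 1)) * BUNDLE_BYTES;
}

/* Nonzero if a direct call in the bundle at FROM can reach the bundle
   at TO.  Both addresses must be 16-byte aligned.  */
static int
pcrel21b_reachable (uint64_t from, uint64_t to)
{
  assert (from % BUNDLE_BYTES == 0 && to % BUNDLE_BYTES == 0);
  int64_t delta = (int64_t) (to - from);
  return delta >= pcrel21b_max_backward ()
         && delta <= pcrel21b_max_forward ();
}
```

Under these assumptions a direct call reaches just under 16 MiB forward and exactly 16 MiB backward, so a text section approaching GHC's size can already contain call sites that do not reach their targets.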
diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index 1842555..41bd287 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -45,40 +45,43 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "sched-int.h"
 #include "timevar.h"
 #include "target.h"
 #include "target-def.h"
 #include "tm_p.h"
 #include "hashtab.h"
 #include "langhooks.h"
 #include "cfglayout.h"
 #include "gimple.h"
 #include "intl.h"
 #include "df.h"
 #include "debug.h"
 #include "params.h"
 #include "dbgcnt.h"
 #include "tm-constrs.h"
 #include "sel-sched.h"
 #include "reload.h"
 #include "dwarf2out.h"

+/* Used code model  */
+enum cmodel ia64_cmodel;
+
 /* This is used for communication between ASM_OUTPUT_LABEL and
    ASM_OUTPUT_LABELREF.  */
 int ia64_asm_output_label = 0;

 /* Register names for ia64_expand_prologue.  */
 static const char * const ia64_reg_numbers[96] =
 { "r32", "r33", "r34", "r35", "r36", "r37", "r38", "r39",
   "r40", "r41", "r42", "r43", "r44", "r45", "r46", "r47",
   "r48", "r49", "r50", "r51", "r52", "r53", "r54", "r55",
   "r56", "r57", "r58", "r59", "r60", "r61", "r62", "r63",
   "r64", "r65", "r66", "r67", "r68", "r69", "r70", "r71",
   "r72", "r73", "r74", "r75", "r76", "r77", "r78", "r79",
   "r80", "r81", "r82", "r83", "r84", "r85", "r86", "r87",
   "r88", "r89", "r90", "r91", "r92", "r93", "r94", "r95",
   "r96", "r97", "r98", "r99", "r100","r101","r102","r103",
   "r104","r105","r106","r107","r108","r109","r110","r111",
   "r112","r113","r114","r115","r116","r117","r118","r119",
   "r120","r121","r122","r123","r124","r125","r126","r127"};

 /* ??? These strings could be shared with REGISTER_NAMES.  */
@@ -1022,41 +1025,45 @@ ia64_cannot_force_const_mem (rtx x)

 /* Expand a symbolic constant load.  */

 bool
 ia64_expand_load_address (rtx dest, rtx src)
 {
   gcc_assert (GET_CODE (dest) == REG);

   /* ILP32 mode still loads 64-bits of data from the GOT.  This avoids
      having to pointer-extend the value afterward.  Other forms of address
      computation below are also more natural to compute as 64-bit quantities.
      If we've been given an SImode destination register, change it.  */
   if (GET_MODE (dest) != Pmode)
     dest = gen_rtx_REG_offset (dest, Pmode, REGNO (dest),
                                byte_lowpart_offset (Pmode, GET_MODE (dest)));

   if (TARGET_NO_PIC)
     return false;
   if (small_addr_symbolic_operand (src, VOIDmode))
     return false;

-  if (TARGET_AUTO_PIC)
+  /* TODO:
+   * CMODEL_LARGE && TARGET_NO_PIC should use 'movl r32 = foo#'
+   * (R_IA64_IMM64)
+   */
+  if (TARGET_AUTO_PIC || ia64_cmodel == CMODEL_LARGE)
     emit_insn (gen_load_gprel64 (dest, src));
   else if (GET_CODE (src) == SYMBOL_REF && SYMBOL_REF_FUNCTION_P (src))
     emit_insn (gen_load_fptr (dest, src));
   else if (sdata_symbolic_operand (src, VOIDmode))
     emit_insn (gen_load_gprel (dest, src));
   else
     {
       HOST_WIDE_INT addend = 0;
       rtx tmp;

       /* We did split constant offsets in ia64_expand_move, and we did try
          to keep them split in move_operand, but we also allowed reload to
          rematerialize arbitrary constants rather than spill the value to
          the stack and reload it.  So we have to be prepared here to split
          them apart again.  */
       if (GET_CODE (src) == CONST)
         {
           HOST_WIDE_INT hi, lo;

           hi = INTVAL (XEXP (XEXP (src, 0), 1));
@@ -5758,40 +5765,64 @@ ia64_handle_option (size_t code, const char *arg, int value)
           if (!strcmp (arg, processor_alias_table[i].name))
             {
               ia64_tune = processor_alias_table[i].processor;
               break;
             }
         if (i == pta_size)
           error ("bad value %<%s%> for -mtune= switch", arg);
         return true;
       }

     default:
       return true;
     }
 }

 /* Implement TARGET_OPTION_OVERRIDE.  */

 static void
 ia64_option_override (void)
 {
+  static struct code_model {
+    const char *const name;
+    const enum cmodel value;
+  } const cmodels[] = {
+    { "medium", CMODEL_MEDIUM },
+    { "large", CMODEL_LARGE },
+    { NULL, (enum cmodel) 0 }
+  };
+  const struct code_model *cmodel;
+
+  /* Code model selection.  */
+  ia64_cmodel = CMODEL_MEDIUM;
+
+  if (ia64_cmodel_string != NULL)
+    {
+      for (cmodel = &cmodels[0]; cmodel->name; cmodel++)
+        if (strcmp (ia64_cmodel_string, cmodel->name) == 0)
+          break;
+      if (cmodel->name == NULL)
+        error ("bad value (%s) for -mcmodel= switch", ia64_cmodel_string);
+      else
+        ia64_cmodel = cmodel->value;
+    }
+
   if (TARGET_AUTO_PIC)
     target_flags |= MASK_CONST_GP;

   /* Numerous experiment shows that IRA based loop pressure
      calculation works better for RTL loop invariant motion on targets
      with enough (>= 32) registers.  It is an expensive optimization.
      So it is on only for peak performance.  */
   if (optimize >= 3)
     flag_ira_loop_pressure = 1;

   ia64_section_threshold = (global_options_set.x_g_switch_value
                             ? g_switch_value
                             : IA64_DEFAULT_GVALUE);

   init_machine_status = ia64_init_machine_status;

   if (align_functions <= 0)
     align_functions = 64;
   if (align_loops <= 0)
@@ -10855,46 +10886,50 @@ ia64_output_function_profiler (FILE *file, int labelno)
     {
       gcc_assert (STATIC_CHAIN_REGNUM == 15);
       indirect_call = true;
     }
   else
     indirect_call = false;

   if (TARGET_GNU_AS)
     fputs ("\t.prologue 4, r40\n", file);
   else
     fputs ("\t.prologue\n\t.save ar.pfs, r40\n", file);
   fputs ("\talloc out0 = ar.pfs, 8, 0, 4, 0\n", file);

   if (NO_PROFILE_COUNTERS)
     fputs ("\tmov out3 = r0\n", file);
   else
     {
       char buf[20];
       ASM_GENERATE_INTERNAL_LABEL (buf, "LP", labelno);

-      if (TARGET_AUTO_PIC)
-        fputs ("\tmovl out3 = @gprel(", file);
+      /* TODO:
+       * CMODEL_LARGE && TARGET_NO_PIC should use 'movl r32 = foo#'
+       * (R_IA64_IMM64)
+       */
+      if (TARGET_AUTO_PIC || ia64_cmodel == CMODEL_LARGE)
+        fputs ("\tmovl out3 = @gprel(", file);
       else
         fputs ("\taddl out3 = @ltoff(", file);
       assemble_name (file, buf);
-      if (TARGET_AUTO_PIC)
+      if (TARGET_AUTO_PIC || ia64_cmodel == CMODEL_LARGE)
         fputs (")\n", file);
       else
         fputs ("), r1\n", file);
     }

   if (indirect_call)
     fputs ("\taddl r14 = @ltoff(@fptr(_mcount)), r1\n", file);
   fputs ("\t;;\n", file);

   fputs ("\t.save rp, r42\n", file);
   fputs ("\tmov out2 = b0\n", file);
   if (indirect_call)
     fputs ("\tld8 r14 = [r14]\n\t;;\n", file);
   fputs ("\t.body\n", file);
   fputs ("\tmov out1 = r1\n", file);
   if (indirect_call)
     {
       fputs ("\tld8 r16 = [r14], 8\n\t;;\n", file);
       fputs ("\tmov b6 = r16\n", file);
       fputs ("\tld8 r1 = [r14]\n", file);
diff --git a/gcc/config/ia64/ia64.h b/gcc/config/ia64/ia64.h
index 8e6d298..23869fb 100644
--- a/gcc/config/ia64/ia64.h
+++ b/gcc/config/ia64/ia64.h
@@ -66,40 +66,48 @@ extern unsigned int ia64_section_threshold;
 #define TARGET_HAVE_TLS true
 #endif

 #define TARGET_TLS14 (ia64_tls_size == 14)
 #define TARGET_TLS22 (ia64_tls_size == 22)
 #define TARGET_TLS64 (ia64_tls_size == 64)

 #define TARGET_HPUX 0
 #define TARGET_HPUX_LD 0

 #define TARGET_ABI_OPEN_VMS 0

 #ifndef TARGET_ILP32
 #define TARGET_ILP32 0
 #endif

 #ifndef HAVE_AS_LTOFFX_LDXMOV_RELOCS
 #define HAVE_AS_LTOFFX_LDXMOV_RELOCS 0
 #endif

+enum cmodel {
+  CMODEL_MEDIUM, /* GOT entries are limited by 21 bits (4MB) */
+  CMODEL_LARGE   /* no assumptions on address and data space sizes */
+};
+
+/* used memory model  */
+extern enum cmodel ia64_cmodel;
+
 /* Values for TARGET_INLINE_FLOAT_DIV, TARGET_INLINE_INT_DIV, and
    TARGET_INLINE_SQRT.  */

 enum ia64_inline_type
 {
   INL_NO = 0,
   INL_MIN_LAT = 1,
   INL_MAX_THR = 2
 };

 /* Default target_flags if no switches are specified  */

 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT (MASK_DWARF2_ASM)
 #endif

 #ifndef TARGET_CPU_DEFAULT
 #define TARGET_CPU_DEFAULT 0
 #endif
diff --git a/gcc/config/ia64/ia64.opt b/gcc/config/ia64/ia64.opt
index 49d099a..1535556 100644
--- a/gcc/config/ia64/ia64.opt
+++ b/gcc/config/ia64/ia64.opt
@@ -42,40 +42,44 @@ Use in/loc/out register names

 mno-sdata
 Target Report RejectNegative Mask(NO_SDATA)

 msdata
 Target Report RejectNegative InverseMask(NO_SDATA)
 Enable use of sdata/scommon/sbss

 mno-pic
 Target Report RejectNegative Mask(NO_PIC)
 Generate code without GP reg

 mconstant-gp
 Target Report RejectNegative Mask(CONST_GP)
 gp is constant (but save/restore gp on indirect calls)

 mauto-pic
 Target Report RejectNegative Mask(AUTO_PIC)
 Generate self-relocatable code

+mcmodel=
+Target RejectNegative Joined Var(ia64_cmodel_string)
+Use given ia64 code model
+
 minline-float-divide-min-latency
 Target Report RejectNegative Var(TARGET_INLINE_FLOAT_DIV, 1)
 Generate inline floating point division, optimize for latency

 minline-float-divide-max-throughput
 Target Report RejectNegative Var(TARGET_INLINE_FLOAT_DIV, 2) Init(2)
 Generate inline floating point division, optimize for throughput

 mno-inline-float-divide
 Target Report RejectNegative Var(TARGET_INLINE_FLOAT_DIV, 0)

 minline-int-divide-min-latency
 Target Report RejectNegative Var(TARGET_INLINE_INT_DIV, 1)
 Generate inline integer division, optimize for latency

 minline-int-divide-max-throughput
 Target Report RejectNegative Var(TARGET_INLINE_INT_DIV, 2)
 Generate inline integer division, optimize for throughput

 mno-inline-int-divide
diff --git a/gcc/config/ia64/predicates.md b/gcc/config/ia64/predicates.md
index e06c521..bf227b7 100644
--- a/gcc/config/ia64/predicates.md
+++ b/gcc/config/ia64/predicates.md
@@ -102,41 +102,41 @@
       op = XEXP (op, 0);
       if (GET_CODE (op) != PLUS
           || GET_CODE (XEXP (op, 0)) != SYMBOL_REF
           || GET_CODE (XEXP (op, 1)) != CONST_INT)
         return false;
       op = XEXP (op, 0);
       /* FALLTHRU */

     case SYMBOL_REF:
       return SYMBOL_REF_SMALL_ADDR_P (op);

     default:
       gcc_unreachable ();
     }
 })

 ;; True if OP refers to a symbol with which we may use any offset.
 (define_predicate "any_offset_symbol_operand"
   (match_code "symbol_ref")
 {
-  if (TARGET_NO_PIC || TARGET_AUTO_PIC)
+  if (TARGET_NO_PIC || TARGET_AUTO_PIC || ia64_cmodel == CMODEL_LARGE)
     return true;
   if (SYMBOL_REF_SMALL_ADDR_P (op))
     return true;
   if (SYMBOL_REF_FUNCTION_P (op))
     return false;
   if (sdata_symbolic_operand (op, mode))
     return true;
   return false;
 })

 ;; True if OP refers to a symbol with which we may use 14-bit aligned offsets.
 ;; False if OP refers to a symbol with which we may not use any offset at any
 ;; time.
 (define_predicate "aligned_offset_symbol_operand"
   (and (match_code "symbol_ref")
        (match_test "! SYMBOL_REF_FUNCTION_P (op)")))

 ;; True if OP refers to a symbol, and is appropriate for a GOT load.
 (define_predicate "got_symbolic_operand"
   (match_operand 0 "symbolic_operand" "")