This patch adds basic support for the PCREL_OPT optimization for loads. It is on the long side because I needed to create the infrastructure for the support. It creates a new pass, run just before final, that looks for a load of an external symbol's address and a single load/store using that address.
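As a rough illustration of what the new pass looks for, here is a toy C++ model of the scan. It is only a sketch and not the code in the patch: ToyInsn, Kind, find_pcrel_opt_load and the register numbers are made up for the example, and it leaves out the mode, insn length and address form checks that the real pass does through the RTL interfaces. The limit of 10 mirrors MAX_PCREL_OPT_INSNS.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-ins for RTL insns; none of these names exist in GCC.
enum class Kind { GotAddr, Load, Store, Other, Barrier };

struct ToyInsn {
  Kind kind;
  int def;                // register written, or -1 if none
  std::vector<int> uses;  // registers read
  int dies;               // register whose last use is here, or -1
};

const int kMaxInsns = 10;  // mirrors MAX_PCREL_OPT_INSNS in the patch

static bool reads(const ToyInsn &insn, int reg) {
  return std::find(insn.uses.begin(), insn.uses.end(), reg) != insn.uses.end();
}

// Return the index of the single load that can be tied to the address load
// at GOT_INDEX, or -1 if the sequence is not safe to optimize.
int find_pcrel_opt_load(const std::vector<ToyInsn> &insns, int got_index) {
  if (got_index < 0 || got_index >= (int) insns.size()
      || insns[got_index].kind != Kind::GotAddr)
    return -1;

  const int addr = insns[got_index].def;
  bool had_store = false;
  int scanned = 0;
  int candidate = -1;

  for (int i = got_index + 1; i < (int) insns.size(); ++i) {
    // Do not look too far past the address load.
    if (++scanned >= kMaxInsns)
      return -1;

    const ToyInsn &insn = insns[i];
    // A call, jump or label ends the region we are willing to scan.
    if (insn.kind == Kind::Barrier)
      return -1;

    const bool dies_here = insn.dies == addr || insn.def == addr;
    if (reads(insn, addr)) {
      // The first use of the address must also be its last use.
      if (!dies_here)
        return -1;
      candidate = i;
      break;
    }

    // The address register was overwritten or died without being used.
    if (dies_here)
      return -1;

    had_store = had_store || insn.kind == Kind::Store;
  }

  // Only loads are handled, and no store may sit between the two insns.
  if (candidate < 0 || insns[candidate].kind != Kind::Load || had_store)
    return -1;

  // The loaded register must not be read or written in between.
  const int result = insns[candidate].def;
  for (int i = got_index + 1; i < candidate; ++i)
    if (insns[i].def == result || reads(insns[i], result))
      return -1;

  return candidate;
}
```

For a sequence made of an address load into register 9, an unrelated insn, and a load through register 9 where register 9 dies, find_pcrel_opt_load (insns, 0) returns 2. Putting a store or a call between the two insns makes it return -1.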
I have bootstrapped a compiler with this patch on a little endian power8 system, and there were no regressions in the test suite. Can I check this into the FSF trunk once the patches it depends on from the V7 series have been checked in?

2019-11-15  Michael Meissner  <meiss...@linux.ibm.com>

	* config/rs6000/pcrel-opt.c: New file to implement the PCREL_OPT
	optimization as a new pass.
	* config/rs6000/rs6000-passes.def: Add comment for the analyze
	swaps pass.  Add new pass to do the PCREL_OPT optimization.
	* config/rs6000/rs6000-protos.h (enum non_prefixed_form): Add a
	new case to recognize memory that meets PCREL_OPT requirements.
	(reg_to_non_prefixed): New declaration.
	(make_pass_pcrel_opt): New declaration.
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
	support for -mpcrel-opt.
	(rs6000_delegitimize_address): Convert PCREL_OPT unspec for GOT
	load back into a normal SYMBOL_REF.
	(print_operand): Add %r<n> to print the .reloc for PCREL_OPT.
	(rs6000_opt_masks): Add -mpcrel-opt.
	(address_to_insn_form): For addresses used with PCREL_OPT, only
	recognize addresses that can be used in a non-prefixed
	instruction.
	(reg_to_non_prefixed): Make global.
	* config/rs6000/rs6000.md (UNSPEC_PCREL_OPT_LD_GOT): New unspec.
	(UNSPEC_PCREL_OPT_LD_RELOC): New unspec.
	(pcrel_extern_addr): Make it a global insn.
	(PO mode iterator): New mode iterator for the PCREL_OPT
	optimization.
	(POV mode iterator): New mode iterator for the PCREL_OPT
	optimization.
	(pcrel_opt_ld_got<mode>, PO iterator): New insns for the PCREL_OPT
	optimization to load the address of an external symbol.
	(pcrel_opt_ld<mode>, QHSI iterator): New insns for the PCREL_OPT
	optimization to load the value of an external variable.
	(pcrel_opt_lddi): New insn for the PCREL_OPT optimization to load
	a DImode external variable.
	(pcrel_opt_ldsf): New insn for the PCREL_OPT optimization to load
	a SFmode external variable.
	(pcrel_opt_lddf): New insn for the PCREL_OPT optimization to load
	a DFmode external variable.
	(pcrel_opt_ld<mode>): New insns for the PCREL_OPT optimization to
	load external vector variables.
	* config/rs6000/rs6000.opt (-mpcrel-opt): New undocumented switch.
	* config/rs6000/t-rs6000 (pcrel-opt.o): Add build rules.
	* config.gcc (powerpc*-*-*): Add pcrel-opt.o.
	(rs6000*-*-*): Add pcrel-opt.o.

Index: gcc/config/rs6000/pcrel-opt.c
===================================================================
--- gcc/config/rs6000/pcrel-opt.c	(revision 278311)
+++ gcc/config/rs6000/pcrel-opt.c	(working copy)
@@ -0,0 +1,623 @@
+/* Subroutines used to support the pc-relative linker optimization.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3. If not see
+   <http://www.gnu.org/licenses/>. */
+
+/* This file implements an RTL pass that looks for pc-relative loads of the
+   address of an external variable using the PCREL_GOT relocation and a single
+   load that uses that GOT pointer. If that is found we create the PCREL_OPT
+   relocation to possibly convert:
+
+	pld b,var@pcrel@got(0),1
+
+	# possibly other instructions that do not use the base register 'b' or
+	# the result register 'r'.
+
+	lwz r,0(b)
+
+   into:
+
+	plwz r,var@pcrel(0),1
+
+	# possibly other instructions that do not use the base register 'b' or
+	# the result register 'r'.
+ + nop + + If the variable is not defined in the main program or the code using it is + not in the main program, the linker put the address in the .got section and + do: + + .section .got + .Lvar_got: .dword var + + .section .text + pld b,.Lvar_got@pcrel(0),1 + + # possibly other instructions that do not use the base register 'b' or + # the result register 'r'. + + lwz r,0(b) + + We only look for a single usage in the basic block where the GOT pointer is + loaded. Multiple uses or references in another basic block will force us to + not use the PCREL_OPT relocation. */ + +#define IN_TARGET_CODE 1 + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "rtl.h" +#include "tree.h" +#include "memmodel.h" +#include "expmed.h" +#include "optabs.h" +#include "recog.h" +#include "df.h" +#include "tm_p.h" +#include "ira.h" +#include "print-tree.h" +#include "varasm.h" +#include "explow.h" +#include "expr.h" +#include "output.h" +#include "tree-pass.h" +#include "rtx-vector-builder.h" +#include "print-rtl.h" +#include "insn-attr.h" +#include "insn-codes.h" + + +// Optimize pc-relative references +const pass_data pass_data_pcrel_opt = +{ + RTL_PASS, // type + "pcrel_opt", // name + OPTGROUP_NONE, // optinfo_flags + TV_NONE, // tv_id + 0, // properties_required + 0, // properties_provided + 0, // properties_destroyed + 0, // todo_flags_start + TODO_df_finish, // todo_flags_finish +}; + +// Maximum number of insns to scan between the load address and the load that +// uses that address. +const int MAX_PCREL_OPT_INSNS = 10; + +/* Next PCREL_OPT label number. */ +static unsigned int pcrel_opt_next_num; + +// Pass data structures +class pcrel_opt : public rtl_opt_pass +{ +private: + // Pass to look for insns loading the PC-relative GOT address and then + // possibly optimizing them. + unsigned int do_pcrel_opt_pass (function *); + + // Given the load of a PC-relative GOT address, optimize it. 
+ void do_pcrel_opt_got_addr (rtx_insn *); + + // Optimize a particular PC-relative load + bool do_pcrel_opt_load (rtx_insn *, rtx_insn *); + + // Various counters + struct { + unsigned long gots; + unsigned long loads; + unsigned long load_separation[MAX_PCREL_OPT_INSNS+1]; + } counters; + +public: + pcrel_opt (gcc::context *ctxt) + : rtl_opt_pass (pass_data_pcrel_opt, ctxt) + {} + + ~pcrel_opt (void) + {} + + // opt_pass methods: + virtual bool gate (function *) + { + return TARGET_PCREL && TARGET_PCREL_OPT && optimize; + } + + virtual unsigned int execute (function *fun) + { + return do_pcrel_opt_pass (fun); + } + + opt_pass *clone () + { + return new pcrel_opt (m_ctxt); + } +}; + + +// Optimize a PC-relative load address to be used in a load. + +// If the sequence of insns is safe to use the PCREL_OPT optimization (i.e. no +// additional references to the address register, the address register dies at +// the load, and no references to the load), convert insns of the form: +// +// (set (reg:DI addr) +// (symbol_ref:DI "ext_symbol")) +// +// ... +// +// (set (reg:<MODE> value) +// (mem:<MODE> (reg:DI addr))) +// +// into: +// +// (parallel [(set (reg:DI addr) +// (unspec:DI [(symbol_ref:DI "ext_symbol") +// (const_int label_num)] +// UNSPEC_PCREL_OPT_LD_GOT)) +// (clobber (reg:<MODE> value))]) +// +// ... +// +// (parallel [(set (reg:<MODE>) +// (unspec:<MODE> [(mem:<MODE> (reg:DI addr)) +// (const_int label_num)] +// UNSPEC_PCREL_OPT_LD_RELOC)) +// (clobber (reg:DI addr))]) +// +// +// The UNSPEC_PCREL_OPT_LD_GOT insn will generate the load address plus a +// definition of a label (.Lpcrel<n>), while the UNSPEC_PCREL_OPT_LD_RELOC insn +// will generate the .reloc to tell the linker to tie the load address and load +// using that address together. +// +// pld b,ext_symbol@got@pcrel(0),1 +// .Lpcrel1: +// +// ... 
+// +// .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8) +// lwz r,0(b) +// +// If ext_symbol is defined in another object file in the main program and we +// are linking the main program, the linker will convert the above instructions +// to: +// +// plwz r,ext_symbol@got@pcrel(0),1 +// +// ... +// +// nop +// +// Return true if the PCREL_OPT load optimization succeeded. + +bool +pcrel_opt::do_pcrel_opt_load (rtx_insn *got_insn, // insn loading GOT + rtx_insn *load_insn) // insn using GOT +{ + rtx got_set = PATTERN (got_insn); + rtx got = SET_DEST (got_set); + rtx got_addr = SET_SRC (got_set); + rtx load_set = single_set (load_insn); + rtx reg = SET_DEST (load_set); + rtx mem = SET_SRC (load_set); + machine_mode reg_mode = GET_MODE (reg); + machine_mode mem_mode = GET_MODE (mem); + rtx mem_inner = mem; + unsigned int reg_regno = reg_or_subregno (reg); + + if (!MEM_P (mem_inner)) + return false; + + // If this is LFIWAX or similar instructions that are indexed only, we can't + // do the optimization. + enum non_prefixed_form non_prefixed = reg_to_non_prefixed (reg, mem_mode); + if (non_prefixed == NON_PREFIXED_X) + return false; + + // The optimization will only work on non-prefixed offsettable loads. + rtx addr = XEXP (mem_inner, 0); + enum insn_form iform = address_to_insn_form (addr, mem_mode, non_prefixed); + if (iform != INSN_FORM_BASE_REG + && iform != INSN_FORM_D + && iform != INSN_FORM_DS + && iform != INSN_FORM_DQ) + return false; + + // Allocate a new PC-relative label, and update the GOT insn. If the GOT + // register is not the same register being loaded, add a clobber just in case + // something runs after this pass. 
+ // + // (parallel [(set (got) + // (unspec [(symbol_ref got_addr) + // (const_int label_num)] + // UNSPEC_PCREL_OPT_LD_GOT)) + // (clobber (reg))]) + + ++pcrel_opt_next_num; + unsigned int got_regno = reg_or_subregno (got); + rtx label_num = GEN_INT (pcrel_opt_next_num); + rtvec v_got = gen_rtvec (2, got_addr, label_num); + rtx got_unspec = gen_rtx_UNSPEC (Pmode, v_got, UNSPEC_PCREL_OPT_LD_GOT); + rtx got_new_set = gen_rtx_SET (got, got_unspec); + rtx got_clobber = gen_rtx_CLOBBER (VOIDmode, + (got_regno == reg_regno + ? gen_rtx_SCRATCH (reg_mode) + : reg)); + + PATTERN (got_insn) + = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, got_new_set, got_clobber)); + + // Revalidate the insn, backing out of the optimization if the insn is not + // supported. + INSN_CODE (got_insn) = recog (PATTERN (got_insn), got_insn, 0); + if (INSN_CODE (got_insn) < 0) + { + PATTERN (got_insn) = got_set; + INSN_CODE (got_insn) = recog (PATTERN (got_insn), got_insn, 0); + return false; + } + + // Update the load insn. Add an explicit clobber of the GOT register just in + // case something runs after this pass. + // + // (parallel [(set (reg) + // (unspec:<MODE> [(mem (got) + // (const_int label_num)] + // UNSPEC_PCREL_OPT_LD_RELOC)) + // (clobber (reg:DI got))]) + + rtvec v_load = gen_rtvec (2, mem_inner, label_num); + rtx new_load = gen_rtx_UNSPEC (GET_MODE (mem_inner), v_load, + UNSPEC_PCREL_OPT_LD_RELOC); + + rtx old_load_set = PATTERN (load_insn); + rtx new_load_set = gen_rtx_SET (reg, new_load); + rtx load_clobber = gen_rtx_CLOBBER (VOIDmode, + (got_regno == reg_regno + ? gen_rtx_SCRATCH (Pmode) + : got)); + PATTERN (load_insn) + = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, new_load_set, load_clobber)); + + // Revalidate the insn, backing out of the optimization if the insn is not + // supported. 
+ + INSN_CODE (load_insn) = recog (PATTERN (load_insn), load_insn, 0); + if (INSN_CODE (load_insn) < 0) + { + PATTERN (got_insn) = got_set; + INSN_CODE (got_insn) = recog (PATTERN (got_insn), got_insn, 0); + + PATTERN (load_insn) = old_load_set; + INSN_CODE (load_insn) = recog (PATTERN (load_insn), load_insn, 0); + return false; + } + + return true; +} + + +/* Given an insn, find the next insn in the basic block. Stop if we find a the + end of a basic block, such as a label, call or jump, and return NULL. */ + +static rtx_insn * +next_active_insn_in_basic_block (rtx_insn *insn) +{ + insn = NEXT_INSN (insn); + + while (insn != NULL_RTX) + { + /* If the basic block ends or there is a jump of some kind, exit the + loop. */ + if (CALL_P (insn) + || JUMP_P (insn) + || JUMP_TABLE_DATA_P (insn) + || LABEL_P (insn) + || BARRIER_P (insn)) + return NULL; + + /* If this is a real insn, return it. */ + if (!insn->deleted () + && NONJUMP_INSN_P (insn) + && GET_CODE (PATTERN (insn)) != USE + && GET_CODE (PATTERN (insn)) != CLOBBER) + return insn; + + /* Loop for USE, CLOBBER, DEBUG_INSN, NOTEs. */ + insn = NEXT_INSN (insn); + } + + return NULL; +} + + +// Given an insn with that loads up a base register with the address of an +// external symbol (GOT address), see if we can optimize it with the PCREL_OPT +// optimization. + +void +pcrel_opt::do_pcrel_opt_got_addr (rtx_insn *got_insn) +{ + int num_insns = 0; + + // Do some basic validation. 
+ rtx got_set = PATTERN (got_insn); + if (GET_CODE (got_set) != SET) + return; + + rtx got = SET_DEST (got_set); + rtx got_addr = SET_SRC (got_set); + + if (!base_reg_operand (got, Pmode) + || !pcrel_external_address (got_addr, Pmode)) + return; + + rtx_insn *insn = got_insn; + bool looping = true; + bool had_load = false; // whether intermediate insns had a load + bool had_store = false; // whether intermediate insns had a store + bool is_load = false; // whether the current insn is a load + bool is_store = false; // whether the current insn is a store + + // Check the following insns and see if it is a load or store that uses the + // GOT address. If we can't do the optimization, just return. + while (looping) + { + // Don't allow too many insns between the load of the GOT address and the + // eventual load or store. + if (++num_insns >= MAX_PCREL_OPT_INSNS) + return; + + insn = next_active_insn_in_basic_block (insn); + if (!insn) + return; + + // See if the current insn is a load or store + switch (get_attr_type (insn)) + { + // While load of the GOT register is a 'load' for scheduling + // purposes, it should be safe to allow other load GOTs between the + // load of the GOT address and the store using that address. + case TYPE_LOAD: + if (INSN_CODE (insn) == CODE_FOR_pcrel_extern_addr) + { + is_load = is_store = false; + break; + } + else + { + rtx set = single_set (insn); + if (set) + { + rtx src = SET_SRC (set); + if (GET_CODE (src) == UNSPEC + && XINT (src, 1) == UNSPEC_PCREL_OPT_LD_GOT) + { + is_load = is_store = false; + break; + } + } + } + /* fall through */ + + case TYPE_FPLOAD: + case TYPE_VECLOAD: + is_load = true; + is_store = false; + break; + + case TYPE_STORE: + case TYPE_FPSTORE: + case TYPE_VECSTORE: + is_load = false; + is_store = true; + break; + + // For a first pass, don't do the optimization through atomic + // operations. 
+ case TYPE_LOAD_L: + case TYPE_STORE_C: + case TYPE_HTM: + case TYPE_HTMSIMPLE: + return; + + default: + is_load = is_store = false; + break; + } + + // If the GOT register was referenced, it must also die in the same insn. + if (reg_referenced_p (got, PATTERN (insn))) + { + if (!dead_or_set_p (insn, got)) + return; + + looping = false; + } + + // If it dies by being set without being referenced, exit. + else if (dead_or_set_p (insn, got)) + return; + + // If it isn't the insn we want, remember if there were loads or stores. + else + { + had_load |= is_load; + had_store |= is_store; + } + } + + // If the insn does not use the GOT pointer, or the GOT pointer does not die + // at this insn, we can't do the optimization. + if (!reg_referenced_p (got, PATTERN (insn)) || !dead_or_set_p (insn, got)) + return; + + // If the last insn is not a load, we can't do the optimization. If it is a + // load, get the register and memory. + rtx load_set = single_set (insn); + if (!load_set) + return; + + rtx reg = NULL_RTX; + rtx mem = NULL_RTX; + + // Get register and memory, and validate it. + if (is_load) + { + reg = SET_DEST (load_set); + mem = SET_SRC (load_set); + if (!MEM_P (mem)) + return; + + if (!REG_P (reg) && !SUBREG_P (reg)) + return; + + // If there were any stores in the insns between loading the GOT address + // and doing the load, turn off the optimization. + if (had_store) + return; + } + + else + return; + + machine_mode mode = GET_MODE (reg); + unsigned int regno = reg_or_subregno (reg); + unsigned int size = GET_MODE_SIZE (mode); + + // Eliminate various possiblies involving multiple instructions. + if (get_attr_length (insn) != 4) + return; + + if (size == 16 && !VSX_REGNO_P (regno)) + return; + + if (size > 16) + return; + + if (mode == TFmode && !TARGET_IEEEQUAD) + return; + + // If the register being loaded was used or set between the load of the GOT + // address and the load using the GOT address, we can't do the optimization. 
+ if (reg_used_between_p (reg, got_insn, insn) + || reg_set_between_p (reg, got_insn, insn)) + return; + + // Process the load in detail + if (is_load) + { + if (do_pcrel_opt_load (got_insn, insn)) + { + counters.loads++; + counters.load_separation[num_insns-1]++; + } + } + + return; +} + + +// Optimize pcrel external variable references + +unsigned int +pcrel_opt::do_pcrel_opt_pass (function *fun) +{ + basic_block bb; + rtx_insn *insn, *curr_insn = 0; + + memset ((char *) &counters, '\0', sizeof (counters)); + + // Dataflow analysis for use-def chains. + df_set_flags (DF_RD_PRUNE_DEAD_DEFS); + df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN); + df_note_add_problem (); + df_analyze (); + df_set_flags (DF_DEFER_INSN_RESCAN | DF_LR_RUN_DCE); + + // Look at each basic block to see if there is a load of an external + // variable's GOT address, and a single load using that GOT address. + FOR_ALL_BB_FN (bb, fun) + { + FOR_BB_INSNS_SAFE (bb, insn, curr_insn) + { + if (NONJUMP_INSN_P (insn) + && INSN_CODE (insn) == CODE_FOR_pcrel_extern_addr) + { + counters.gots++; + do_pcrel_opt_got_addr (insn); + } + } + } + + df_remove_problem (df_chain); + df_process_deferred_rescans (); + df_set_flags (DF_RD_PRUNE_DEAD_DEFS | DF_LR_RUN_DCE); + df_chain_add_problem (DF_UD_CHAIN); + df_note_add_problem (); + df_analyze (); + + if (dump_file) + { + if (!counters.gots) + fprintf (dump_file, "\nNo external symbols were referenced\n"); + + else + { + fprintf (dump_file, + "\n# of loads of an address of an external symbol = %lu\n", + counters.gots); + + if (!counters.loads) + fprintf (dump_file, + "\nNo PCREL_OPT load optimizations were done\n"); + + else + { + fprintf (dump_file, "# of PCREL_OPT loads = %lu\n", + counters.loads); + + fprintf (dump_file, "# of adjacent PCREL_OPT loads = %lu\n", + counters.load_separation[0]); + + for (int i = 1; i < MAX_PCREL_OPT_INSNS; i++) + { + if (counters.load_separation[i]) + fprintf (dump_file, + "# of PCREL_OPT loads separated by %d insn%s = %lu\n", 
+ i, (i == 1) ? "" : "s", + counters.load_separation[i]); + } + } + } + + fprintf (dump_file, "\n"); + } + + return 0; +} + + +rtl_opt_pass * +make_pass_pcrel_opt (gcc::context *ctxt) +{ + return new pcrel_opt (ctxt); +} Index: gcc/config/rs6000/rs6000-passes.def =================================================================== --- gcc/config/rs6000/rs6000-passes.def (revision 278287) +++ gcc/config/rs6000/rs6000-passes.def (working copy) @@ -24,4 +24,15 @@ along with GCC; see the file COPYING3. REPLACE_PASS (PASS, INSTANCE, TGT_PASS) */ + /* Pass to add the appropriate vector swaps on power8 little endian systems. + The power8 does not have instructions that automaticaly do the byte swaps + for loads and stores. */ INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps); + + /* Pass to do the PCREL_OPT optimization that combines the load of an + external symbol's address along with a single load or store using that + address as a base register. This pass should be the last pass before + final, so that it can make sure the address being loaded up dies in a + single reference, and it doesn't have to worry about something else using + the address. */ + INSERT_PASS_BEFORE (pass_final, 1, pass_pcrel_opt); Index: gcc/config/rs6000/rs6000-protos.h =================================================================== --- gcc/config/rs6000/rs6000-protos.h (revision 278287) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -183,11 +183,13 @@ enum non_prefixed_form { NON_PREFIXED_D, /* All 16-bits are valid. */ NON_PREFIXED_DS, /* Bottom 2 bits must be 0. */ NON_PREFIXED_DQ, /* Bottom 4 bits must be 0. */ - NON_PREFIXED_X /* No offset memory form exists. */ + NON_PREFIXED_X, /* No offset memory form exists. */ + NON_PREFIXED_PCREL_OPT /* Offset for PCREL_OPT optimizations. 
*/ }; extern enum insn_form address_to_insn_form (rtx, machine_mode, enum non_prefixed_form); +extern enum non_prefixed_form reg_to_non_prefixed (rtx, machine_mode); extern bool prefixed_load_p (rtx_insn *); extern bool prefixed_store_p (rtx_insn *); extern bool prefixed_paddi_p (rtx_insn *); @@ -303,6 +305,7 @@ namespace gcc { class context; } class rtl_opt_pass; extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *); +extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *); extern bool rs6000_sum_of_two_registers_p (const_rtx expr); extern bool rs6000_quadword_masked_address_p (const_rtx exp); extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx); Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 278287) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -4213,7 +4213,7 @@ rs6000_option_override_internal (bool gl if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0) error ("%qs requires %qs", "-mpcrel", "-mcmodel=medium"); - rs6000_isa_flags &= ~OPTION_MASK_PCREL; + rs6000_isa_flags &= ~(OPTION_MASK_PCREL | OPTION_MASK_PCREL_OPT); } /* Enable defaults if desired. */ @@ -4227,7 +4227,8 @@ rs6000_option_override_internal (bool gl if (!explicit_pcrel && TARGET_PCREL_DEFAULT && TARGET_CMODEL == CMODEL_MEDIUM) - rs6000_isa_flags |= OPTION_MASK_PCREL; + rs6000_isa_flags |= (OPTION_MASK_PCREL + | OPTION_MASK_PCREL_OPT); } } @@ -4248,7 +4249,17 @@ rs6000_option_override_internal (bool gl if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0) error ("%qs requires %qs", "-mpcrel", "-mprefixed-addr"); - rs6000_isa_flags &= ~OPTION_MASK_PCREL; + rs6000_isa_flags &= ~(OPTION_MASK_PCREL + | OPTION_MASK_PCREL_OPT); + } + + /* Check -mfuture debug switches. 
*/ + if (!TARGET_PCREL && TARGET_PCREL_OPT) + { + if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL_OPT) != 0) + error ("%qs requires %qs", "-mpcrel-opt", "-mpcrel"); + + rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT; } if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET) @@ -8379,7 +8390,9 @@ rs6000_delegitimize_address (rtx orig_x) { rtx x, y, offset; - if (GET_CODE (orig_x) == UNSPEC && XINT (orig_x, 1) == UNSPEC_FUSION_GPR) + if (GET_CODE (orig_x) == UNSPEC + && (XINT (orig_x, 1) == UNSPEC_FUSION_GPR + || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_GOT)) orig_x = XVECEXP (orig_x, 0, 0); orig_x = delegitimize_mem_from_attrs (orig_x); @@ -13016,6 +13029,19 @@ print_operand (FILE *file, rtx x, int co fprintf (file, "%d", 128 >> (REGNO (x) - CR0_REGNO)); return; + case 'r': + /* X is a label number for the PCREL_OPT optimization. Emit the .reloc + to enable this optimization, unless the value is 0. */ + gcc_assert (CONST_INT_P (x)); + if (UINTVAL (x) != 0) + { + unsigned int label_num = UINTVAL (x); + fprintf (file, + ".reloc .Lpcrel%u-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%u-8)\n\t", + label_num, label_num); + } + return; + case 's': /* Low 5 bits of 32 - value */ if (! INT_P (x)) @@ -22950,6 +22976,7 @@ static struct rs6000_opt_mask const rs60 { "mulhw", OPTION_MASK_MULHW, false, true }, { "multiple", OPTION_MASK_MULTIPLE, false, true }, { "pcrel", OPTION_MASK_PCREL, false, true }, + { "pcrel-opt", OPTION_MASK_PCREL_OPT, false, true }, { "popcntb", OPTION_MASK_POPCNTB, false, true }, { "popcntd", OPTION_MASK_POPCNTD, false, true }, { "power8-fusion", OPTION_MASK_P8_FUSION, false, true }, @@ -24911,8 +24938,32 @@ address_to_insn_form (rtx addr, local. */ if (TARGET_PCREL) { + /* Special case the PCREL_OPT optimization to only allow offsets that can + fit in the worst case instruction format. 
This test is done before + register allocation, so we might miss an occasionally offset that + could have been used in some cases (for example, a SImode that isn't + sign extended or a floating point scalar that is loaded into the + traditional FPR registers instead of the traditional Altivec + registers). */ if (SYMBOL_REF_P (op0) && !SYMBOL_REF_LOCAL_P (op0)) - return INSN_FORM_PCREL_EXTERNAL; + { + if (non_prefixed_format == NON_PREFIXED_PCREL_OPT) + { + if (!SIGNED_16BIT_OFFSET_P (offset)) + return INSN_FORM_BAD; + + unsigned int size = GET_MODE_SIZE (mode); + if (size >= 16 && (offset & 0xf) != 0) + return INSN_FORM_BAD; + + /* SImode might be sign extended (DS format). SFmode and DFmode + might be loaded into Altivec registers (DS format). */ + if (size >= 4 && (offset & 0x2) != 0) + return INSN_FORM_BAD; + } + + return INSN_FORM_PCREL_EXTERNAL; + } if (SYMBOL_REF_P (op0) || LABEL_REF_P (op0)) return INSN_FORM_PCREL_LOCAL; @@ -24953,6 +25004,24 @@ address_to_insn_form (rtx addr, non_prefixed_format = NON_PREFIXED_D; } + /* If we are validating the load or store off of the external pointer, be + stricter in terms of the offset allowed. */ + else if (non_prefixed_format == NON_PREFIXED_PCREL_OPT) + { + unsigned int size = GET_MODE_SIZE (mode); + + if (size >= 16 && (offset & 0xf) != 0) + non_prefixed_format = NON_PREFIXED_DQ; + + /* SImode might be sign extended (DS format). SFmode and DFmode might be + loaded into Altivec registers (DS format). */ + else if (size >= 4 && (offset & 0x2) != 0) + non_prefixed_format = NON_PREFIXED_DS; + + else + non_prefixed_format = NON_PREFIXED_D; + } + /* Classify the D/DS/DQ-form addresses. */ switch (non_prefixed_format) { @@ -24992,7 +25061,7 @@ address_to_insn_form (rtx addr, /* Helper function to take a REG and a MODE and turn it into the non-prefixed instruction format (D/DS/DQ) used for offset memory. 
*/ -static enum non_prefixed_form +enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode) { /* If it isn't a register, use the defaults. */ @@ -25199,7 +25268,14 @@ void rs6000_asm_output_opcode (FILE *stream) { if (next_insn_prefixed_p) - fprintf (stream, "p"); + { + fprintf (stream, "p"); + + /* Reset flag in case there are separate insn lines in the sequence, so + the 'p' is only emited for the first line. This shows up in + pcrel_opt_ld_got. */ + next_insn_prefixed_p = false; + } return; } Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 278287) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -150,6 +150,8 @@ (define_c_enum "unspec" UNSPEC_PLT16_HA UNSPEC_PLT16_LO UNSPEC_PLT_PCREL + UNSPEC_PCREL_OPT_LD_GOT + UNSPEC_PCREL_OPT_LD_RELOC ]) ;; @@ -10022,7 +10024,7 @@ (define_insn "*pcrel_local_addr" ;; to a PADDI. Otherwise, it will create a GOT address that is relocated by ;; the dynamic linker and loaded up. Print_operand_address will append a ;; @got@pcrel to the symbol. -(define_insn "*pcrel_extern_addr" +(define_insn "pcrel_extern_addr" [(set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_operand:DI 1 "pcrel_external_address"))] "TARGET_PCREL" @@ -14774,6 +14776,94 @@ (define_insn "*cmpeqb_internal" "cmpeqb %0,%1,%2" [(set_attr "type" "logical")]) +;; Modes that are supported for PCREL_OPT +(define_mode_iterator PO [QI HI SI DI TI SF DF KF + V1TI V2DI V4SI V8HI V16QI V2DF V4SF + (TF "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD")]) + +;; Vector modes for PCREL_OPT +(define_mode_iterator POV [TI KF V1TI V2DI V4SI V8HI V16QI V2DF V4SF + (TF "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD")]) + +;; Alternate form of pcrel_extern_addr used for the PCREL_OPT optimization for +;; loads. We need to put the label after the PLD instruction, because the +;; assembler might insert a NOP before the PLD for alignment. 
+(define_insn "pcrel_opt_ld_got<mode>" + [(set (match_operand:DI 0 "gpc_reg_operand" "=b") + (unspec:DI [(match_operand:DI 1 "pcrel_external_address") + (match_operand 2 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_GOT)) + (clobber (match_scratch:PO 3 "=rwaX"))] + "TARGET_PCREL_OPT" +{ + return (INTVAL (operands[2])) ? "ld %0,%a1\n.Lpcrel%2:" : "ld %0,%a1"; +} + [(set_attr "prefixed" "yes") + (set_attr "type" "load")]) + +;; Alternate form of the loads that include a marker to identify whether we can +;; do the PCREL_OPT optimization. +(define_insn "*pcrel_opt_ld<mode>" + [(set (match_operand:QHSI 0 "gpc_reg_operand" "=r") + (unspec:QHSI [(match_operand:QHSI 1 "non_prefixed_memory" "o") + (match_operand 2 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_RELOC)) + (clobber (match_scratch:DI 3 "=bX"))] + "TARGET_PCREL_OPT" + "%r2l<wd>z %0,%1" + [(set_attr "type" "load")]) + +(define_insn "*pcrel_opt_lddi" + [(set (match_operand:DI 0 "gpc_reg_operand" "=r,d,v") + (unspec:DI [(match_operand:DI 1 "non_prefixed_memory" "o,o,o") + (match_operand 2 "const_int_operand" "n,n,n")] + UNSPEC_PCREL_OPT_LD_RELOC)) + (clobber (match_scratch:DI 3 "=bX,bX,bX"))] + "TARGET_PCREL_OPT && TARGET_POWERPC64" + "@ + %r2ld %0,%1 + %r2lfd %0,%1 + %r2lxsd %0,%1" + [(set_attr "type" "load,fpload,fpload")]) + +(define_insn "*pcrel_opt_ldsf" + [(set (match_operand:SF 0 "gpc_reg_operand" "=d,v,r") + (unspec:SF [(match_operand:SF 1 "non_prefixed_memory" "o,o,o") + (match_operand 2 "const_int_operand" "n,n,n")] + UNSPEC_PCREL_OPT_LD_RELOC)) + (clobber (match_scratch:DI 3 "=bX,bX,bX"))] + "TARGET_PCREL_OPT" + "@ + %r2lfs %0,%1 + %r2lxssp %0,%1 + %r2lwz %0,%1" + [(set_attr "type" "fpload,fpload,load")]) + +(define_insn "*pcrel_opt_lddf" + [(set (match_operand:DF 0 "gpc_reg_operand" "=d,v,r") + (unspec:DF [(match_operand:DF 1 "non_prefixed_memory" "o,o,o") + (match_operand 2 "const_int_operand" "n,n,n")] + UNSPEC_PCREL_OPT_LD_RELOC)) + (clobber (match_scratch:DI 3 "=bX,bX,bX"))] + "TARGET_PCREL_OPT 
+ && (TARGET_POWERPC64 || vsx_register_operand (operands[0], DFmode))" + "@ + %r2lfd %0,%1 + %r2lxsd %0,%1 + %r2ld %0,%1" + [(set_attr "type" "fpload,fpload,load")]) + +(define_insn "*pcrel_opt_ld<mode>" + [(set (match_operand:POV 0 "gpc_reg_operand" "=wa") + (unspec:POV [(match_operand:POV 1 "non_prefixed_memory" "o") + (match_operand 2 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_RELOC)) + (clobber (match_scratch:DI 3 "=bX"))] + "TARGET_PCREL_OPT" + "%r2lxv %x0,%1" + [(set_attr "type" "vecload")]) + + (include "sync.md") (include "vector.md") Index: gcc/config/rs6000/rs6000.opt =================================================================== --- gcc/config/rs6000/rs6000.opt (revision 278287) +++ gcc/config/rs6000/rs6000.opt (working copy) @@ -577,3 +577,7 @@ Generate (do not generate) prefixed memo mpcrel Target Report Mask(PCREL) Var(rs6000_isa_flags) Generate (do not generate) pc-relative memory addressing. + +mpcrel-opt +Target Undocumented Mask(PCREL_OPT) Var(rs6000_isa_flags) +Generate (do not generate) pc-relative memory optimizations for externals. 
Index: gcc/config/rs6000/t-rs6000 =================================================================== --- gcc/config/rs6000/t-rs6000 (revision 278287) +++ gcc/config/rs6000/t-rs6000 (working copy) @@ -47,6 +47,10 @@ rs6000-call.o: $(srcdir)/config/rs6000/r $(COMPILE) $< $(POSTCOMPILE) +pcrel-opt.o: $(srcdir)/config/rs6000/pcrel-opt.c + $(COMPILE) $< + $(POSTCOMPILE) + $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \ $(srcdir)/config/rs6000/rs6000-cpus.def $(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \ Index: gcc/config.gcc =================================================================== --- gcc/config.gcc (revision 278287) +++ gcc/config.gcc (working copy) @@ -502,7 +502,7 @@ or1k*-*-*) ;; powerpc*-*-*) cpu_type=rs6000 - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o" + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o" extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h" extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h" extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h" @@ -517,6 +517,7 @@ powerpc*-*-*) esac extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt" target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c" + target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c" ;; pru-*-*) cpu_type=pru @@ -528,8 +529,9 @@ riscv*) ;; rs6000*-*-*) extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt" - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o" + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o" target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c" + target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c" ;; sparc*-*-*) cpu_type=sparc -- Michael Meissner, IBM IBM, 
M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797