This patch adds the basic support for the PCREL_OPT optimization for loads.  It
is on the long side, because I needed to create the infrastructure for the
support.  It creates a new pass that is run just before final to look for a
load of an external symbol's address followed by a single load/store using
that address.
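
To make the scan concrete, here is a small standalone model of what the pass
looks for.  It is only a sketch under simplifying assumptions: Insn, Kind,
last_use, kMaxScan and find_paired_load are made-up names for illustration and
are not part of GCC or of this patch.  The real pass works on RTL and uses the
dataflow notes instead of a hand-set flag.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction record (not GCC's rtx_insn).
enum class Kind { LoadGotAddr, Load, Store, Other, BlockEnd };

struct Insn {
  Kind kind;
  int def;        // register written by the insn, or -1
  int use;        // register read by the insn, or -1
  bool last_use;  // true if the register in 'use' dies at this insn
};

// Plays the role of MAX_PCREL_OPT_INSNS in the patch.
const int kMaxScan = 10;

// Return the index of the single load that can be paired with the GOT
// address load at index 'got', or -1 if the pairing is not safe.
int find_paired_load(const std::vector<Insn>& insns, std::size_t got)
{
  if (got >= insns.size() || insns[got].kind != Kind::LoadGotAddr)
    return -1;

  const int base = insns[got].def;
  bool had_store = false;
  int scanned = 0;

  for (std::size_t i = got + 1; i < insns.size(); ++i) {
    // Give up if the two insns are too far apart or the block ends.
    if (++scanned >= kMaxScan)
      return -1;
    const Insn& in = insns[i];
    if (in.kind == Kind::BlockEnd)
      return -1;

    if (in.use == base) {
      // The first use must be a load, the base register must die
      // there, and no store may sit between the two insns.
      if (in.kind != Kind::Load || !in.last_use || had_store)
        return -1;
      // The loaded register must be untouched in between.
      for (std::size_t j = got + 1; j < i; ++j)
        if (insns[j].use == in.def || insns[j].def == in.def)
          return -1;
      return static_cast<int>(i);
    }

    // The base register was overwritten without ever being used.
    if (in.def == base)
      return -1;

    had_store |= (in.kind == Kind::Store);
  }
  return -1;
}
```

With a GOT address load into r9, one unrelated insn, and then a load through
r9 where r9 dies, the model returns the index of that load.  Putting a store
between the two, or keeping r9 live after the load, makes it return -1.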

I have bootstrapped a compiler with this patch on a little endian power8
system, and there were no regressions in the test suite.  Can I check this
into the FSF trunk once the patches it depends on from the V7 series have been
checked in?

2019-11-15  Michael Meissner  <meiss...@linux.ibm.com>

        * config/rs6000/pcrel-opt.c: New file to implement the PCREL_OPT
        optimization as a new pass.
        * config/rs6000/rs6000-passes.def: Add comment for the analyze
        swaps pass.  Add new pass to do the PCREL_OPT optimization.
        * config/rs6000/rs6000-protos.h (enum non_prefixed_form): Add a
        new case to recognize memory that meets PCREL_OPT requirements.
        (reg_to_non_prefixed): New declaration.
        (make_pass_pcrel_opt): New declaration.
        * config/rs6000/rs6000.c (rs6000_option_override_internal): Add
        support for -mpcrel-opt.
        (rs6000_delegitimize_address): Convert PCREL_OPT unspec for GOT
        load back into a normal SYMBOL_REF.
        (print_operand): Add %r<n> to print the .reloc for PCREL_OPT.
        (rs6000_opt_masks): Add -mpcrel-opt.
        (address_to_insn_form): For addresses used with PCREL_OPT, only
        recognize addresses that can be used in a non-prefixed
        instruction.
        (reg_to_non_prefixed): Make global.
        * config/rs6000/rs6000.md (UNSPEC_PCREL_OPT_LD_GOT): New unspec.
        (UNSPEC_PCREL_OPT_LD_RELOC): New unspec.
        (pcrel_extern_addr): Make it a global insn.
        (PO mode iterator): New mode iterator for the PCREL_OPT
        optimization.
        (POV mode iterator): New mode iterator for the PCREL_OPT
        optimization.
        (pcrel_opt_ld_got<mode>, PO iterator): New insns for the PCREL_OPT
        optimization to load the address of an external symbol.
        (pcrel_opt_ld<mode>, QHSI iterator): New insns for the PCREL_OPT
        optimization to load the value of an external variable.
        (pcrel_opt_lddi): New insn for the PCREL_OPT optimization to load
        a DImode external variable.
        (pcrel_opt_ldsf): New insn for the PCREL_OPT optimization to load
        a SFmode external variable.
        (pcrel_opt_lddf): New insn for the PCREL_OPT optimization to load
        a DFmode external variable.
        (pcrel_opt_ld<mode>): New insns for the PCREL_OPT optimization to
        load external vector variables.
        * config/rs6000/rs6000.opt (-mpcrel-opt): New undocumented
        switch.
        * config/rs6000/t-rs6000 (pcrel-opt.o): Add build rules.
        * config.gcc (powerpc*-*-*): Add pcrel-opt.o.
        (rs6000*-*-*): Add pcrel-opt.o.

Index: gcc/config/rs6000/pcrel-opt.c
===================================================================
--- gcc/config/rs6000/pcrel-opt.c       (revision 278311)
+++ gcc/config/rs6000/pcrel-opt.c       (working copy)
@@ -0,0 +1,623 @@
+/* Subroutines used to support the pc-relative linker optimization.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file implements a RTL pass that looks for pc-relative loads of the
+   address of an external variable using the PCREL_GOT relocation and a single
+   load that uses that GOT pointer.  If that is found we create the PCREL_OPT
+   relocation to possibly convert:
+
+       pld b,var@got@pcrel(0),1
+
+       # possibly other instructions that do not use the base register 'b' or
+        # the result register 'r'.
+
+       lwz r,0(b)
+
+   into:
+
+       plwz r,var@pcrel(0),1
+
+       # possibly other instructions that do not use the base register 'b' or
+        # the result register 'r'.
+
+       nop
+
+   If the variable is not defined in the main program or the code using it is
+   not in the main program, the linker puts the address in the .got section and
+   does:
+
+       .section .got
+       .Lvar_got:      .dword var
+
+       .section .text
+       pld b,.Lvar_got@pcrel(0),1
+
+       # possibly other instructions that do not use the base register 'b' or
+        # the result register 'r'.
+
+       lwz r,0(b)
+       
+   We only look for a single usage in the basic block where the GOT pointer is
+   loaded.  Multiple uses or references in another basic block will force us to
+   not use the PCREL_OPT relocation.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "expmed.h"
+#include "optabs.h"
+#include "recog.h"
+#include "df.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "tree-pass.h"
+#include "rtx-vector-builder.h"
+#include "print-rtl.h"
+#include "insn-attr.h"
+#include "insn-codes.h"
+
+
+// Optimize pc-relative references
+const pass_data pass_data_pcrel_opt =
+{
+  RTL_PASS,                    // type
+  "pcrel_opt",                 // name
+  OPTGROUP_NONE,               // optinfo_flags
+  TV_NONE,                     // tv_id
+  0,                           // properties_required
+  0,                           // properties_provided
+  0,                           // properties_destroyed
+  0,                           // todo_flags_start
+  TODO_df_finish,              // todo_flags_finish
+};
+
+// Maximum number of insns to scan between the load address and the load that
+// uses that address.
+const int MAX_PCREL_OPT_INSNS  = 10;
+
+/* Next PCREL_OPT label number.  */
+static unsigned int pcrel_opt_next_num;
+
+// Pass data structures
+class pcrel_opt : public rtl_opt_pass
+{
+private:
+  // Pass to look for insns loading the PC-relative GOT address and then
+  // possibly optimizing them.
+  unsigned int do_pcrel_opt_pass (function *);
+
+  // Given the load of a PC-relative GOT address, optimize it.
+  void do_pcrel_opt_got_addr (rtx_insn *);
+
+  // Optimize a particular PC-relative load
+  bool do_pcrel_opt_load (rtx_insn *, rtx_insn *);
+
+  // Various counters
+  struct {
+    unsigned long gots;
+    unsigned long loads;
+    unsigned long load_separation[MAX_PCREL_OPT_INSNS+1];
+  } counters;
+
+public:
+  pcrel_opt (gcc::context *ctxt)
+  : rtl_opt_pass (pass_data_pcrel_opt, ctxt)
+  {}
+
+  ~pcrel_opt (void)
+  {}
+
+  // opt_pass methods:
+  virtual bool gate (function *)
+  {
+    return TARGET_PCREL && TARGET_PCREL_OPT && optimize;
+  }
+
+  virtual unsigned int execute (function *fun)
+  {
+    return do_pcrel_opt_pass (fun);
+  }
+
+  opt_pass *clone ()
+  {
+    return new pcrel_opt (m_ctxt);
+  }
+};
+
+
+// Optimize a PC-relative load address to be used in a load.
+
+// If the sequence of insns is safe to use the PCREL_OPT optimization (i.e. no
+// additional references to the address register, the address register dies at
+// the load, and no references to the load), convert insns of the form:
+//
+//     (set (reg:DI addr)
+//          (symbol_ref:DI "ext_symbol"))
+//
+//     ...
+//
+//     (set (reg:<MODE> value)
+//          (mem:<MODE> (reg:DI addr)))
+//
+// into:
+//
+//     (parallel [(set (reg:DI addr)
+//                      (unspec:DI [(symbol_ref:DI "ext_symbol")
+//                                  (const_int label_num)]
+//                                 UNSPEC_PCREL_OPT_LD_GOT))
+//                 (clobber (reg:<MODE> value))])
+//
+//     ...
+//
+//     (parallel [(set (reg:<MODE> value)
+//                      (unspec:<MODE> [(mem:<MODE> (reg:DI addr))
+//                                      (const_int label_num)]
+//                                     UNSPEC_PCREL_OPT_LD_RELOC))
+//                 (clobber (reg:DI addr))])
+//
+//
+// The UNSPEC_PCREL_OPT_LD_GOT insn will generate the load address plus a
+// definition of a label (.Lpcrel<n>), while the UNSPEC_PCREL_OPT_LD_RELOC insn
+// will generate the .reloc to tell the linker to tie the load address and load
+// using that address together.
+//
+//     pld b,ext_symbol@got@pcrel(0),1
+// .Lpcrel1:
+//
+//     ...
+//
+//     .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
+//     lwz r,0(b)
+//
+// If ext_symbol is defined in another object file in the main program and we
+// are linking the main program, the linker will convert the above instructions
+// to:
+//
+//     plwz r,ext_symbol@pcrel(0),1
+//
+//     ...
+//
+//     nop
+//
+// Return true if the PCREL_OPT load optimization succeeded.
+
+bool
+pcrel_opt::do_pcrel_opt_load (rtx_insn *got_insn,      // insn loading GOT
+                             rtx_insn *load_insn)      // insn using GOT
+{
+  rtx got_set = PATTERN (got_insn);
+  rtx got = SET_DEST (got_set);
+  rtx got_addr = SET_SRC (got_set);
+  rtx load_set = single_set (load_insn);
+  rtx reg = SET_DEST (load_set);
+  rtx mem = SET_SRC (load_set);
+  machine_mode reg_mode = GET_MODE (reg);
+  machine_mode mem_mode = GET_MODE (mem);
+  rtx mem_inner = mem;
+  unsigned int reg_regno = reg_or_subregno (reg);
+
+  if (!MEM_P (mem_inner))
+    return false;
+
+  // If this is LFIWAX or similar instructions that are indexed only, we can't
+  // do the optimization.
+  enum non_prefixed_form non_prefixed = reg_to_non_prefixed (reg, mem_mode);
+  if (non_prefixed == NON_PREFIXED_X)
+    return false;
+
+  // The optimization will only work on non-prefixed offsettable loads.
+  rtx addr = XEXP (mem_inner, 0);
+  enum insn_form iform = address_to_insn_form (addr, mem_mode, non_prefixed);
+  if (iform != INSN_FORM_BASE_REG
+      && iform != INSN_FORM_D
+      && iform != INSN_FORM_DS
+      && iform != INSN_FORM_DQ)
+    return false;
+
+  // Allocate a new PC-relative label, and update the GOT insn.  If the GOT
+  // register is not the same register being loaded, add a clobber just in case
+  // something runs after this pass.
+  //
+  // (parallel [(set (got)
+  //                 (unspec [(symbol_ref got_addr)
+  //                          (const_int label_num)]
+  //                         UNSPEC_PCREL_OPT_LD_GOT))
+  //            (clobber (reg))])
+
+  ++pcrel_opt_next_num;
+  unsigned int got_regno = reg_or_subregno (got);
+  rtx label_num = GEN_INT (pcrel_opt_next_num);
+  rtvec v_got = gen_rtvec (2, got_addr, label_num);
+  rtx got_unspec = gen_rtx_UNSPEC (Pmode, v_got, UNSPEC_PCREL_OPT_LD_GOT);
+  rtx got_new_set = gen_rtx_SET (got, got_unspec);
+  rtx got_clobber = gen_rtx_CLOBBER (VOIDmode,
+                                    (got_regno == reg_regno
+                                     ? gen_rtx_SCRATCH (reg_mode)
+                                     : reg));
+
+  PATTERN (got_insn)
+    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, got_new_set, got_clobber));
+
+  // Revalidate the insn, backing out of the optimization if the insn is not
+  // supported.
+  INSN_CODE (got_insn) = recog (PATTERN (got_insn), got_insn, 0);
+  if (INSN_CODE (got_insn) < 0)
+    {
+      PATTERN (got_insn) = got_set;
+      INSN_CODE (got_insn) = recog (PATTERN (got_insn), got_insn, 0);
+      return false;
+    }
+
+  // Update the load insn.  Add an explicit clobber of the GOT register just in
+  // case something runs after this pass.
+  //
+  // (parallel [(set (reg)
+  //                 (unspec:<MODE> [(mem (got))
+  //                                 (const_int label_num)]
+  //                                UNSPEC_PCREL_OPT_LD_RELOC))
+  //            (clobber (reg:DI got))])
+
+  rtvec v_load = gen_rtvec (2, mem_inner, label_num);
+  rtx new_load = gen_rtx_UNSPEC (GET_MODE (mem_inner), v_load,
+                                UNSPEC_PCREL_OPT_LD_RELOC);
+
+  rtx old_load_set = PATTERN (load_insn);
+  rtx new_load_set = gen_rtx_SET (reg, new_load);
+  rtx load_clobber = gen_rtx_CLOBBER (VOIDmode,
+                                     (got_regno == reg_regno
+                                      ? gen_rtx_SCRATCH (Pmode)
+                                      : got));
+  PATTERN (load_insn)
+    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, new_load_set, load_clobber));
+
+  // Revalidate the insn, backing out of the optimization if the insn is not
+  // supported.
+
+  INSN_CODE (load_insn) = recog (PATTERN (load_insn), load_insn, 0);
+  if (INSN_CODE (load_insn) < 0)
+    {
+      PATTERN (got_insn) = got_set;
+      INSN_CODE (got_insn) = recog (PATTERN (got_insn), got_insn, 0);
+
+      PATTERN (load_insn) = old_load_set;
+      INSN_CODE (load_insn) = recog (PATTERN (load_insn), load_insn, 0);
+      return false;
+    }
+
+  return true;
+}
+
+
+/* Given an insn, find the next insn in the basic block.  Stop if we find the
+   end of a basic block, such as a label, call or jump, and return NULL.  */
+
+static rtx_insn *
+next_active_insn_in_basic_block (rtx_insn *insn)
+{
+  insn = NEXT_INSN (insn);
+
+  while (insn != NULL_RTX)
+    {
+      /* If the basic block ends or there is a jump of some kind, exit the
+        loop.  */
+      if (CALL_P (insn)
+         || JUMP_P (insn)
+         || JUMP_TABLE_DATA_P (insn)
+         || LABEL_P (insn)
+         || BARRIER_P (insn))
+       return NULL;
+
+      /* If this is a real insn, return it.  */
+      if (!insn->deleted ()
+         && NONJUMP_INSN_P (insn)
+         && GET_CODE (PATTERN (insn)) != USE
+         && GET_CODE (PATTERN (insn)) != CLOBBER)
+       return insn;
+
+      /* Loop for USE, CLOBBER, DEBUG_INSN, NOTEs.  */
+      insn = NEXT_INSN (insn);
+    }
+
+  return NULL;
+}
+
+
+// Given an insn that loads up a base register with the address of an
+// external symbol (GOT address), see if we can optimize it with the PCREL_OPT
+// optimization.
+
+void
+pcrel_opt::do_pcrel_opt_got_addr (rtx_insn *got_insn)
+{
+  int num_insns = 0;
+
+  // Do some basic validation.
+  rtx got_set = PATTERN (got_insn);
+  if (GET_CODE (got_set) != SET)
+    return;
+
+  rtx got = SET_DEST (got_set);
+  rtx got_addr = SET_SRC (got_set);
+
+  if (!base_reg_operand (got, Pmode)
+      || !pcrel_external_address (got_addr, Pmode))
+    return;
+
+  rtx_insn *insn = got_insn;
+  bool looping = true;
+  bool had_load = false;       // whether intermediate insns had a load
+  bool had_store = false;      // whether intermediate insns had a store
+  bool is_load = false;                // whether the current insn is a load
+  bool is_store = false;       // whether the current insn is a store
+
+  // Check the following insns to see if one is a load or store that uses the
+  // GOT address.  If we can't do the optimization, just return.
+  while (looping)
+    {
+      // Don't allow too many insns between the load of the GOT address and the
+      // eventual load or store.
+      if (++num_insns >= MAX_PCREL_OPT_INSNS)
+       return;
+
+      insn = next_active_insn_in_basic_block (insn);
+      if (!insn)
+       return;
+
+      // See if the current insn is a load or store
+      switch (get_attr_type (insn))
+       {
+         // While load of the GOT register is a 'load' for scheduling
+         // purposes, it should be safe to allow other load GOTs between the
+         // load of the GOT address and the store using that address.
+       case TYPE_LOAD:
+         if (INSN_CODE (insn) == CODE_FOR_pcrel_extern_addr)
+           {
+             is_load = is_store = false;
+             break;
+           }
+         else
+           {
+             rtx set = single_set (insn);
+             if (set)
+               {
+                 rtx src = SET_SRC (set);
+                 if (GET_CODE (src) == UNSPEC
+                     && XINT (src, 1) == UNSPEC_PCREL_OPT_LD_GOT)
+                   {
+                     is_load = is_store = false;
+                     break;
+                   }
+               }
+           }
+         /* fall through */
+
+       case TYPE_FPLOAD:
+       case TYPE_VECLOAD:
+         is_load = true;
+         is_store = false;
+         break;
+
+       case TYPE_STORE:
+       case TYPE_FPSTORE:
+       case TYPE_VECSTORE:
+         is_load = false;
+         is_store = true;
+         break;
+
+         // For a first pass, don't do the optimization through atomic
+         // operations.
+       case TYPE_LOAD_L:
+       case TYPE_STORE_C:
+       case TYPE_HTM:
+       case TYPE_HTMSIMPLE:
+         return;
+
+       default:
+         is_load = is_store = false;
+         break;
+       }
+
+      // If the GOT register was referenced, it must also die in the same insn.
+      if (reg_referenced_p (got, PATTERN (insn)))
+       {
+         if (!dead_or_set_p (insn, got))
+           return;
+
+         looping = false;
+       }
+
+      // If it dies by being set without being referenced, exit.
+      else if (dead_or_set_p (insn, got))
+       return;
+
+      // If it isn't the insn we want, remember if there were loads or stores.
+      else
+       {
+         had_load |= is_load;
+         had_store |= is_store;
+       }
+    }
+
+  // If the insn does not use the GOT pointer, or the GOT pointer does not die
+  // at this insn, we can't do the optimization.
+  if (!reg_referenced_p (got, PATTERN (insn)) || !dead_or_set_p (insn, got))
+    return;
+
+  // If the last insn is not a load, we can't do the optimization.  If it is a
+  // load, get the register and memory.
+  rtx load_set = single_set (insn);
+  if (!load_set)
+    return;
+
+  rtx reg = NULL_RTX;
+  rtx mem = NULL_RTX;
+
+  // Get register and memory, and validate it.
+  if (is_load)
+    {
+      reg = SET_DEST (load_set);
+      mem = SET_SRC (load_set);
+      if (!MEM_P (mem))
+       return;
+
+      if (!REG_P (reg) && !SUBREG_P (reg))
+       return;
+
+      // If there were any stores in the insns between loading the GOT address
+      // and doing the load, turn off the optimization.
+      if (had_store)
+       return;
+    }
+
+  else
+    return;
+
+  machine_mode mode = GET_MODE (reg);
+  unsigned int regno = reg_or_subregno (reg);
+  unsigned int size = GET_MODE_SIZE (mode);
+
+  // Eliminate various possibilities involving multiple instructions.
+  if (get_attr_length (insn) != 4)
+    return;
+
+  if (size == 16 && !VSX_REGNO_P (regno))
+    return;
+
+  if (size > 16)
+    return;
+
+  if (mode == TFmode && !TARGET_IEEEQUAD)
+    return;
+
+  // If the register being loaded was used or set between the load of the GOT
+  // address and the load using the GOT address, we can't do the optimization.
+  if (reg_used_between_p (reg, got_insn, insn)
+      || reg_set_between_p (reg, got_insn, insn))
+    return;
+
+  // Process the load in detail
+  if (is_load)
+    {
+      if (do_pcrel_opt_load (got_insn, insn))
+       {
+         counters.loads++;
+         counters.load_separation[num_insns-1]++;
+       }
+    }
+
+  return;
+}
+
+
+// Optimize pcrel external variable references
+
+unsigned int
+pcrel_opt::do_pcrel_opt_pass (function *fun)
+{
+  basic_block bb;
+  rtx_insn *insn, *curr_insn = 0;
+
+  memset ((char *) &counters, '\0', sizeof (counters));
+
+  // Dataflow analysis for use-def chains.
+  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
+  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
+  df_note_add_problem ();
+  df_analyze ();
+  df_set_flags (DF_DEFER_INSN_RESCAN | DF_LR_RUN_DCE);
+
+  // Look at each basic block to see if there is a load of an external
+  // variable's GOT address, and a single load using that GOT address.
+  FOR_ALL_BB_FN (bb, fun)
+    {
+      FOR_BB_INSNS_SAFE (bb, insn, curr_insn)
+       {
+         if (NONJUMP_INSN_P (insn)
+             && INSN_CODE (insn) == CODE_FOR_pcrel_extern_addr)
+           {
+             counters.gots++;
+             do_pcrel_opt_got_addr (insn);
+           }
+       }
+    }
+
+  df_remove_problem (df_chain);
+  df_process_deferred_rescans ();
+  df_set_flags (DF_RD_PRUNE_DEAD_DEFS | DF_LR_RUN_DCE);
+  df_chain_add_problem (DF_UD_CHAIN);
+  df_note_add_problem ();
+  df_analyze ();
+
+  if (dump_file)
+    {
+      if (!counters.gots)
+       fprintf (dump_file, "\nNo external symbols were referenced\n");
+
+      else
+       {
+         fprintf (dump_file,
+                  "\n# of loads of an address of an external symbol = %lu\n",
+                  counters.gots);
+
+         if (!counters.loads)
+           fprintf (dump_file,
+                    "\nNo PCREL_OPT load optimizations were done\n");
+
+         else
+           {
+             fprintf (dump_file, "# of PCREL_OPT loads = %lu\n",
+                      counters.loads);
+
+             fprintf (dump_file, "# of adjacent PCREL_OPT loads = %lu\n",
+                      counters.load_separation[0]);
+
+             for (int i = 1; i < MAX_PCREL_OPT_INSNS; i++)
+               {
+                 if (counters.load_separation[i])
+                   fprintf (dump_file,
+                            "# of PCREL_OPT loads separated by %d insn%s = %lu\n",
+                            i, (i == 1) ? "" : "s",
+                            counters.load_separation[i]);
+               }
+           }
+       }
+
+      fprintf (dump_file, "\n");
+    }
+
+  return 0;
+}
+
+
+rtl_opt_pass *
+make_pass_pcrel_opt (gcc::context *ctxt)
+{
+  return new pcrel_opt (ctxt);
+}
Index: gcc/config/rs6000/rs6000-passes.def
===================================================================
--- gcc/config/rs6000/rs6000-passes.def (revision 278287)
+++ gcc/config/rs6000/rs6000-passes.def (working copy)
@@ -24,4 +24,15 @@ along with GCC; see the file COPYING3.
    REPLACE_PASS (PASS, INSTANCE, TGT_PASS)
  */
 
+  /* Pass to add the appropriate vector swaps on power8 little endian systems.
+     The power8 does not have instructions that automatically do the byte swaps
+     for loads and stores.  */
   INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
+
+  /* Pass to do the PCREL_OPT optimization that combines the load of an
+     external symbol's address along with a single load or store using that
+     address as a base register.  This pass should be the last pass before
+     final, so that it can make sure the address being loaded up dies in a
+     single reference, and it doesn't have to worry about something else using
+     the address.  */
+  INSERT_PASS_BEFORE (pass_final, 1, pass_pcrel_opt);
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h   (revision 278287)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -183,11 +183,13 @@ enum non_prefixed_form {
   NON_PREFIXED_D,              /* All 16-bits are valid.  */
   NON_PREFIXED_DS,             /* Bottom 2 bits must be 0.  */
   NON_PREFIXED_DQ,             /* Bottom 4 bits must be 0.  */
-  NON_PREFIXED_X               /* No offset memory form exists.  */
+  NON_PREFIXED_X,              /* No offset memory form exists.  */
+  NON_PREFIXED_PCREL_OPT       /* Offset for PCREL_OPT optimizations.  */
 };
 
 extern enum insn_form address_to_insn_form (rtx, machine_mode,
                                            enum non_prefixed_form);
+extern enum non_prefixed_form reg_to_non_prefixed (rtx, machine_mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
@@ -303,6 +305,7 @@ namespace gcc { class context; }
 class rtl_opt_pass;
 
 extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
+extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *);
 extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
 extern bool rs6000_quadword_masked_address_p (const_rtx exp);
 extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 278287)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -4213,7 +4213,7 @@ rs6000_option_override_internal (bool gl
          if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
            error ("%qs requires %qs", "-mpcrel", "-mcmodel=medium");
 
-         rs6000_isa_flags &= ~OPTION_MASK_PCREL;
+         rs6000_isa_flags &= ~(OPTION_MASK_PCREL | OPTION_MASK_PCREL_OPT);
        }
 
       /* Enable defaults if desired.  */
@@ -4227,7 +4227,8 @@ rs6000_option_override_internal (bool gl
 
          if (!explicit_pcrel && TARGET_PCREL_DEFAULT
              && TARGET_CMODEL == CMODEL_MEDIUM)
-           rs6000_isa_flags |= OPTION_MASK_PCREL;
+           rs6000_isa_flags |= (OPTION_MASK_PCREL
+                                | OPTION_MASK_PCREL_OPT);
        }
     }
 
@@ -4248,7 +4249,17 @@ rs6000_option_override_internal (bool gl
       if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
        error ("%qs requires %qs", "-mpcrel", "-mprefixed-addr");
 
-      rs6000_isa_flags &= ~OPTION_MASK_PCREL;
+      rs6000_isa_flags &= ~(OPTION_MASK_PCREL
+                           | OPTION_MASK_PCREL_OPT);
+    }
+
+  /* Check -mfuture debug switches.  */
+  if (!TARGET_PCREL && TARGET_PCREL_OPT)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL_OPT) != 0)
+       error ("%qs requires %qs", "-mpcrel-opt", "-mpcrel");
+
+      rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
     }
 
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
@@ -8379,7 +8390,9 @@ rs6000_delegitimize_address (rtx orig_x)
 {
   rtx x, y, offset;
 
-  if (GET_CODE (orig_x) == UNSPEC && XINT (orig_x, 1) == UNSPEC_FUSION_GPR)
+  if (GET_CODE (orig_x) == UNSPEC
+      && (XINT (orig_x, 1) == UNSPEC_FUSION_GPR
+         || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_GOT))
     orig_x = XVECEXP (orig_x, 0, 0);
 
   orig_x = delegitimize_mem_from_attrs (orig_x);
@@ -13016,6 +13029,19 @@ print_operand (FILE *file, rtx x, int co
        fprintf (file, "%d", 128 >> (REGNO (x) - CR0_REGNO));
       return;
 
+    case 'r':
+      /* X is a label number for the PCREL_OPT optimization.  Emit the .reloc
+        to enable this optimization, unless the value is 0.  */
+      gcc_assert (CONST_INT_P (x));
+      if (UINTVAL (x) != 0)
+       {
+         unsigned int label_num = UINTVAL (x);
+         fprintf (file,
+                  ".reloc .Lpcrel%u-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%u-8)\n\t",
+                  label_num, label_num);
+       }
+      return;
+
     case 's':
       /* Low 5 bits of 32 - value */
       if (! INT_P (x))
@@ -22950,6 +22976,7 @@ static struct rs6000_opt_mask const rs60
   { "mulhw",                   OPTION_MASK_MULHW,              false, true  },
  { "multiple",                        OPTION_MASK_MULTIPLE,           false, true  },
   { "pcrel",                   OPTION_MASK_PCREL,              false, true  },
+  { "pcrel-opt",               OPTION_MASK_PCREL_OPT,          false, true  },
   { "popcntb",                 OPTION_MASK_POPCNTB,            false, true  },
   { "popcntd",                 OPTION_MASK_POPCNTD,            false, true  },
   { "power8-fusion",           OPTION_MASK_P8_FUSION,          false, true  },
@@ -24911,8 +24938,32 @@ address_to_insn_form (rtx addr,
      local.  */
   if (TARGET_PCREL)
     {
+      /* Special case the PCREL_OPT optimization to only allow offsets that can
+        fit in the worst case instruction format.  This test is done before
+        register allocation, so we might miss an occasional offset that
+        could have been used in some cases (for example, a SImode that isn't
+        sign extended or a floating point scalar that is loaded into the
+        traditional FPR registers instead of the traditional Altivec
+        registers).  */
       if (SYMBOL_REF_P (op0) && !SYMBOL_REF_LOCAL_P (op0))
-       return INSN_FORM_PCREL_EXTERNAL;
+       {
+         if (non_prefixed_format == NON_PREFIXED_PCREL_OPT)
+           {
+             if (!SIGNED_16BIT_OFFSET_P (offset))
+               return INSN_FORM_BAD;
+
+             unsigned int size = GET_MODE_SIZE (mode);
+             if (size >= 16 && (offset & 0xf) != 0)
+               return INSN_FORM_BAD;
+
+             /* SImode might be sign extended (DS format).  SFmode and DFmode
+                might be loaded into Altivec registers (DS format).  */
+             if (size >= 4 && (offset & 0x2) != 0)
+               return INSN_FORM_BAD;
+           }
+
+         return INSN_FORM_PCREL_EXTERNAL;
+       }
 
       if (SYMBOL_REF_P (op0) || LABEL_REF_P (op0))
        return INSN_FORM_PCREL_LOCAL;
@@ -24953,6 +25004,24 @@ address_to_insn_form (rtx addr,
        non_prefixed_format = NON_PREFIXED_D;
     }
 
+  /* If we are validating the load or store off of the external pointer, be
+     stricter in terms of the offset allowed.  */
+  else if (non_prefixed_format == NON_PREFIXED_PCREL_OPT)
+    {
+      unsigned int size = GET_MODE_SIZE (mode);
+
+      if (size >= 16 && (offset & 0xf) != 0)
+       non_prefixed_format = NON_PREFIXED_DQ;
+
+      /* SImode might be sign extended (DS format).  SFmode and DFmode might be
+        loaded into Altivec registers (DS format).  */
+      else if (size >= 4 && (offset & 0x2) != 0)
+       non_prefixed_format = NON_PREFIXED_DS;
+
+      else
+       non_prefixed_format = NON_PREFIXED_D;
+    }
+
   /* Classify the D/DS/DQ-form addresses.  */
   switch (non_prefixed_format)
     {
@@ -24992,7 +25061,7 @@ address_to_insn_form (rtx addr,
 /* Helper function to take a REG and a MODE and turn it into the non-prefixed
    instruction format (D/DS/DQ) used for offset memory.  */
 
-static enum non_prefixed_form
+enum non_prefixed_form
 reg_to_non_prefixed (rtx reg, machine_mode mode)
 {
   /* If it isn't a register, use the defaults.  */
@@ -25199,7 +25268,14 @@ void
 rs6000_asm_output_opcode (FILE *stream)
 {
   if (next_insn_prefixed_p)
-    fprintf (stream, "p");
+    {
+      fprintf (stream, "p");
+
+      /* Reset flag in case there are separate insn lines in the sequence, so
+        the 'p' is only emitted for the first line.  This shows up in
+        pcrel_opt_ld_got.  */
+      next_insn_prefixed_p = false;
+    }
 
   return;
 }
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md (revision 278287)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -150,6 +150,8 @@ (define_c_enum "unspec"
    UNSPEC_PLT16_HA
    UNSPEC_PLT16_LO
    UNSPEC_PLT_PCREL
+   UNSPEC_PCREL_OPT_LD_GOT
+   UNSPEC_PCREL_OPT_LD_RELOC
   ])
 
 ;;
@@ -10022,7 +10024,7 @@ (define_insn "*pcrel_local_addr"
 ;; to a PADDI.  Otherwise, it will create a GOT address that is relocated by
 ;; the dynamic linker and loaded up.  Print_operand_address will append a
 ;; @got@pcrel to the symbol.
-(define_insn "*pcrel_extern_addr"
+(define_insn "pcrel_extern_addr"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
        (match_operand:DI 1 "pcrel_external_address"))]
   "TARGET_PCREL"
@@ -14774,6 +14776,94 @@ (define_insn "*cmpeqb_internal"
   "cmpeqb %0,%1,%2"
   [(set_attr "type" "logical")])
 
+;; Modes that are supported for PCREL_OPT
+(define_mode_iterator PO [QI HI SI DI TI SF DF KF
+                         V1TI V2DI V4SI V8HI V16QI V2DF V4SF
+                         (TF   "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD")])
+
+;; Vector modes for PCREL_OPT
+(define_mode_iterator POV [TI KF V1TI V2DI V4SI V8HI V16QI V2DF V4SF
+                          (TF   "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD")])
+
+;; Alternate form of pcrel_extern_addr used for the PCREL_OPT optimization for
+;; loads.  We need to put the label after the PLD instruction, because the
+;; assembler might insert a NOP before the PLD for alignment.
+(define_insn "pcrel_opt_ld_got<mode>"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=b")
+       (unspec:DI [(match_operand:DI 1 "pcrel_external_address")
+                   (match_operand 2 "const_int_operand" "n")]
+               UNSPEC_PCREL_OPT_LD_GOT))
+   (clobber (match_scratch:PO 3 "=rwaX"))]
+  "TARGET_PCREL_OPT"
+{
+  return (INTVAL (operands[2])) ? "ld %0,%a1\n.Lpcrel%2:" : "ld %0,%a1";
+}
+  [(set_attr "prefixed" "yes")
+   (set_attr "type" "load")])
+
+;; Alternate form of the loads that include a marker to identify whether we can
+;; do the PCREL_OPT optimization.
+(define_insn "*pcrel_opt_ld<mode>"
+  [(set (match_operand:QHSI 0 "gpc_reg_operand" "=r")
+       (unspec:QHSI [(match_operand:QHSI 1 "non_prefixed_memory" "o")
+                     (match_operand 2 "const_int_operand" "n")]
+                    UNSPEC_PCREL_OPT_LD_RELOC))
+   (clobber (match_scratch:DI 3 "=bX"))]
+  "TARGET_PCREL_OPT"
+  "%r2l<wd>z %0,%1"
+  [(set_attr "type" "load")])
+
+(define_insn "*pcrel_opt_lddi"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,d,v")
+       (unspec:DI [(match_operand:DI 1 "non_prefixed_memory" "o,o,o")
+                   (match_operand 2 "const_int_operand" "n,n,n")]
+                  UNSPEC_PCREL_OPT_LD_RELOC))
+   (clobber (match_scratch:DI 3 "=bX,bX,bX"))]
+  "TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   %r2ld %0,%1
+   %r2lfd %0,%1
+   %r2lxsd %0,%1"
+  [(set_attr "type" "load,fpload,fpload")])
+
+(define_insn "*pcrel_opt_ldsf"
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=d,v,r")
+       (unspec:SF [(match_operand:SF 1 "non_prefixed_memory" "o,o,o")
+                   (match_operand 2 "const_int_operand" "n,n,n")]
+                  UNSPEC_PCREL_OPT_LD_RELOC))
+   (clobber (match_scratch:DI 3 "=bX,bX,bX"))]
+  "TARGET_PCREL_OPT"
+  "@
+   %r2lfs %0,%1
+   %r2lxssp %0,%1
+   %r2lwz %0,%1"
+  [(set_attr "type" "fpload,fpload,load")])
+
+(define_insn "*pcrel_opt_lddf"
+  [(set (match_operand:DF 0 "gpc_reg_operand" "=d,v,r")
+       (unspec:DF [(match_operand:DF 1 "non_prefixed_memory" "o,o,o")
+                   (match_operand 2 "const_int_operand" "n,n,n")]
+                  UNSPEC_PCREL_OPT_LD_RELOC))
+   (clobber (match_scratch:DI 3 "=bX,bX,bX"))]
+  "TARGET_PCREL_OPT
+   && (TARGET_POWERPC64 || vsx_register_operand (operands[0], DFmode))"
+  "@
+   %r2lfd %0,%1
+   %r2lxsd %0,%1
+   %r2ld %0,%1"
+  [(set_attr "type" "fpload,fpload,load")])
+
+(define_insn "*pcrel_opt_ld<mode>"
+  [(set (match_operand:POV 0 "gpc_reg_operand" "=wa")
+       (unspec:POV [(match_operand:POV 1 "non_prefixed_memory" "o")
+                    (match_operand 2 "const_int_operand" "n")]
+                   UNSPEC_PCREL_OPT_LD_RELOC))
+   (clobber (match_scratch:DI 3 "=bX"))]
+  "TARGET_PCREL_OPT"
+  "%r2lxv %x0,%1"
+  [(set_attr "type" "vecload")])
+
+
 
 (include "sync.md")
 (include "vector.md")
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt        (revision 278287)
+++ gcc/config/rs6000/rs6000.opt        (working copy)
@@ -577,3 +577,7 @@ Generate (do not generate) prefixed memo
 mpcrel
 Target Report Mask(PCREL) Var(rs6000_isa_flags)
 Generate (do not generate) pc-relative memory addressing.
+
+mpcrel-opt
+Target Undocumented Mask(PCREL_OPT) Var(rs6000_isa_flags)
+Generate (do not generate) pc-relative memory optimizations for externals.
Index: gcc/config/rs6000/t-rs6000
===================================================================
--- gcc/config/rs6000/t-rs6000  (revision 278287)
+++ gcc/config/rs6000/t-rs6000  (working copy)
@@ -47,6 +47,10 @@ rs6000-call.o: $(srcdir)/config/rs6000/r
        $(COMPILE) $<
        $(POSTCOMPILE)
 
+pcrel-opt.o: $(srcdir)/config/rs6000/pcrel-opt.c
+       $(COMPILE) $<
+       $(POSTCOMPILE)
+
 $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
   $(srcdir)/config/rs6000/rs6000-cpus.def
        $(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc      (revision 278287)
+++ gcc/config.gcc      (working copy)
@@ -502,7 +502,7 @@ or1k*-*-*)
        ;;
 powerpc*-*-*)
        cpu_type=rs6000
-       extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o"
+       extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o"
        extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
        extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
        extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
@@ -517,6 +517,7 @@ powerpc*-*-*)
        esac
        extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
        target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c"
+       target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c"
        ;;
 pru-*-*)
        cpu_type=pru
@@ -528,8 +529,9 @@ riscv*)
        ;;
 rs6000*-*-*)
        extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
-       extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o"
+       extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o"
        target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c"
+       target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c"
        ;;
 sparc*-*-*)
        cpu_type=sparc
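
A note on the rs6000_asm_output_opcode change in the patch above: the
following is a minimal standalone model of why the flag is reset, not code
from the patch.  It assumes the opcode hook is called at the start of every
line of a multi-line output template, which is how the comment in the patch
describes the problem.  The names model_output_opcode and model_emit, the
reset_flag parameter, and the register and symbol in the sample template are
all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the per-insn flag of the same name in rs6000.c.  */
static bool next_insn_prefixed_p;

/* Model of the opcode hook: append "p" to OUT when the flag is set.
   When RESET_FLAG is true the flag is cleared afterwards, which is the
   behavior the patch adds.  */
static void
model_output_opcode (char *out, size_t size, bool reset_flag)
{
  if (next_insn_prefixed_p)
    {
      strncat (out, "p", size - strlen (out) - 1);
      if (reset_flag)
        next_insn_prefixed_p = false;
    }
}

/* Copy TMPL into OUT, calling the hook at the start of every line.
   This is a simplified model, not the real final-pass printer.  */
static void
model_emit (const char *tmpl, bool prefixed, bool reset_flag,
            char *out, size_t size)
{
  assert (size > 0);
  out[0] = '\0';
  next_insn_prefixed_p = prefixed;
  model_output_opcode (out, size, reset_flag);

  for (const char *p = tmpl; *p != '\0'; p++)
    {
      size_t len = strlen (out);
      if (len + 1 >= size)
        break;                  /* Out of room; stop copying.  */
      out[len] = *p;
      out[len + 1] = '\0';
      if (*p == '\n')
        model_output_opcode (out, size, reset_flag);
    }
}
```

With the reset, a two-line template such as the load plus the .Lpcrel label
gets a 'p' only on the first line.  Without it, the model also prepends 'p'
to the label line, which is the case the reset is meant to avoid.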

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
