Re: [PATCH] Modula-2 into the GCC tree on master

2021-06-20 Thread Gaius Mulley via Gcc-patches
Segher Boessenkool  writes:

> Hi!
>
> On Fri, Jun 18, 2021 at 10:00:40PM +0100, Gaius Mulley wrote:
>> Segher Boessenkool  writes:
>> > On Thu, Jun 17, 2021 at 11:26:41PM +0100, Gaius Mulley via Gcc-patches 
>> > wrote:
>> >> Debian Stretch using make -j 4, x86_64 GNU/Linux Debian Stretch built
>> >> using make -j 24 and also under x86_64 GNU/Linux Debian Buster using
>> >> make -j 4.
>> >
>> > I am building it on powerpc64-linux (-m32,-m64) and powerpc64le-linux
>> > currently.  (All CentOS 7 fwiw).
>>
>> excellent the more varieties the better - I'm eagerly awaiting a risc-v
>> motherboard which might also be interesting
>
> I needed a few fixes to get it to build, they are in my branch
> (https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=users/segher/heads/gm2)
>
> The files gm2-libs/getopt.def and gm2-libs/GetOpt.def have filenames
> that differ in case only; this is censored by the scripts that we run on
> the Git server.  I renamed the former to cgetopt.def for now, but of
> course more changes are needed for this to work at all.

Hi Segher,

ah yes, thanks for spotting this - I recall I had a similar issue with
SYSTEM.def.  I will change getopt.def to cgetopt.def.

>> > It does not want to build gm2tools, haven't investigated that yet
>> > either.
>
> Not yet :-)
>
>> > Will report results later.
>
> powerpc64-linux now is building, and is running the testsuite.  My
> powerpc64le-linux build used --enable-languages=all, but Ada fails to
> build, so I'll redo that without Ada.
>
> Gaius, could you look through the two patches I did to get the build to
> work, see if those are correct or if something better needs to be done?
>
> 
> $(subdir) is an absolute path for me, so ../$(subdir) cannot work.

this looks sensible - I'll also test and apply this on a few machines.

> 
> Maybe your texinfo is less picky than mine, I use an older one (5.1)?

(Debian buster texinfo is on 6.5.0 and Debian stretch is on 6.3.0).  But
the up node was inconsistent :-), again thanks for these up node fixes.

I will rebuild on aarch64 (Debian stretch), x86_64 Debian stretch and
x86_64 Debian buster and make source changes for cgetopt.def etc.


regards,
Gaius


[x86_64 PATCH] PR target/11877: Use xor to write zero to memory with -Os

2021-06-20 Thread Roger Sayle

The following patch attempts to resolve PR target/11877 (without
triggering PR/23102).  On x86_64, writing an SImode or DImode zero
to memory uses an instruction encoding that is larger than first
clearing a register (using xor) then writing that to memory.  Hence,
after reload, the peephole2 pass can determine if there's a suitable
free register, and if so, use that to shrink the code size with -Os.

To improve code size, and avoid inserting a large number of xor
instructions (PR target/23102), this patch makes use of peephole2's
efficient pattern matching to use a single temporary for a run of
consecutive writes.  In theory, one could do better still with a
new target-specific pass, gated on -Os, to shrink these instructions
(like stv), but that's probably overkill for the little remaining
space savings.
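
As a rough illustration of the size argument above, the sketch below tallies
encoded bytes for a run of zero stores through a plain base register such as
(%rdi) with no displacement.  The byte counts are for that simplest
addressing form only; other addressing modes add the same extra bytes to
both variants.  The function names are made up for this example and are not
part of the patch.

```cpp
#include <cassert>

// Bytes for one store of an immediate zero through a plain base register:
//   movl $0, (%rdi)  -> opcode + ModRM + imm32          = 6 bytes
//   movq $0, (%rdi)  -> REX.W + opcode + ModRM + imm32  = 7 bytes
inline int immediate_store_bytes(bool is_64bit) { return is_64bit ? 7 : 6; }

// Bytes for one store of an already-cleared register:
//   movl %eax, (%rdi) -> opcode + ModRM          = 2 bytes
//   movq %rax, (%rdi) -> REX.W + opcode + ModRM  = 3 bytes
inline int register_store_bytes(bool is_64bit) { return is_64bit ? 3 : 2; }

// xorl %eax, %eax clears the full 64-bit register in 2 bytes.
const int kXorBytes = 2;

// Total bytes when every store carries its own immediate zero.
inline int run_bytes_immediate(int stores, bool is_64bit) {
  assert(stores >= 0);
  return stores * immediate_store_bytes(is_64bit);
}

// Total bytes when one xor feeds the whole run of stores.
inline int run_bytes_via_register(int stores, bool is_64bit) {
  assert(stores >= 0);
  return kXorBytes + stores * register_store_bytes(is_64bit);
}
```

Under these assumptions a single SImode store costs 6 bytes against 4, and a
run of four costs 24 against 10, which is why sharing one cleared register
across consecutive stores pays off.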

Evaluating this patch on the CSiBE benchmark (v2.1.1) results in a
0.26% code size improvement (3715273 bytes down to 3705477) on x86_64
with -Os [saving 1 byte every 400].  549 of 894 tests improve, two
tests grow larger.  Analysis of these 2 pathological cases reveals
that although peephole2's match_scratch prefers to use a call-clobbered
register (to avoid requiring a new stack frame), very rarely this
interacts with GCC's shrink wrapping optimization, which may previously
have avoided saving/restoring a call clobbered register, such as %eax,
in the calling function.
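
The headline CSiBE figure quoted above can be re-derived from the two byte
totals.  This is only a small arithmetic helper written for this note (the
struct and function names are illustrative), not something taken from the
benchmark tooling.

```cpp
#include <cassert>

// Summary of a code-size change between two builds.
struct SizeDelta {
  long saved_bytes;            // before - after
  double percent;              // saving as a percentage of the original size
  double bytes_per_saved_byte; // original bytes per byte saved
};

inline SizeDelta size_delta(long before, long after) {
  assert(before > 0 && after > 0 && after < before);
  SizeDelta d;
  d.saved_bytes = before - after;
  d.percent = 100.0 * static_cast<double>(d.saved_bytes) / before;
  d.bytes_per_saved_byte = static_cast<double>(before) / d.saved_bytes;
  return d;
}
```

Calling size_delta (3715273, 3705477) gives 9796 bytes saved, about 0.26%,
or roughly one byte in every 379, in line with the rounded figure above.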

This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap
and make -k check with no new failures.

Ok for mainline?


2021-06-20  Roger Sayle  

gcc/ChangeLog
PR target/11877
* config/i386/i386.md: New define_peephole2s to shrink writing
1, 2 or 4 consecutive zeros to memory when optimizing for size.

gcc/testsuite/ChangeLog
PR target/11877
* gcc.target/i386/pr11877.c: New test case.

--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 48532eb..2333261 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -19357,6 +19357,42 @@
   ix86_expand_clear (operands[1]);
 })
 
+;; When optimizing for size, zeroing memory should use a register.
+(define_peephole2
+  [(match_scratch:SWI48 0 "r")
+   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))
+   (set (match_operand:SWI48 3 "memory_operand" "") (const_int 0))
+   (set (match_operand:SWI48 4 "memory_operand" "") (const_int 0))]
+  "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(set (match_dup 1) (match_dup 0))
+   (set (match_dup 2) (match_dup 0))
+   (set (match_dup 3) (match_dup 0))
+   (set (match_dup 4) (match_dup 0))]
+{
+  ix86_expand_clear (operands[0]);
+})
+
+(define_peephole2
+  [(match_scratch:SWI48 0 "r")
+   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))]
+  "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(set (match_dup 1) (match_dup 0))
+   (set (match_dup 2) (match_dup 0))]
+{
+  ix86_expand_clear (operands[0]);
+})
+
+(define_peephole2
+  [(match_scratch:SWI48 0 "r")
+   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))]
+  "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(set (match_dup 1) (match_dup 0))]
+{
+  ix86_expand_clear (operands[0]);
+})
+
 ;; Reload dislikes loading constants directly into class_likely_spilled
 ;; hard registers.  Try to tidy things up here.
 (define_peephole2
/* PR target/11877 */
/* { dg-do compile } */
/* { dg-options "-Os" } */

void foo (long long *p)
{
  *p = 0;
}

void bar (int *p)
{
  *p = 0;
}

/* { dg-final { scan-assembler-times "xorl\[ \t\]" 2 } } */
/* { dg-final { scan-assembler-not "\\\$0," } } */


[PATCH] doc/lto.texi: List slim object format as the default

2021-06-20 Thread Dimitar Dimitrov
Slim LTO object files have been the default for quite a while, since:
  commit e9f67e625c2a4225a7169d7220dcb85b6fdd7ca9
  Author: Jan Hubicka 
  common.opt (ffat-lto-objects): Disable by default.

That commit did not update lto.texi, so do it now.

gcc/ChangeLog:

* doc/lto.texi (Design Overview): Update that slim objects are
the default.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/doc/lto.texi | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/gcc/doc/lto.texi b/gcc/doc/lto.texi
index 1f55216328a..755258ccb2b 100644
--- a/gcc/doc/lto.texi
+++ b/gcc/doc/lto.texi
@@ -36,11 +36,16 @@ bytecode representation of GIMPLE that is emitted in special sections
 of @code{.o} files.  Currently, LTO support is enabled in most
 ELF-based systems, as well as darwin, cygwin and mingw systems.
 
-Since GIMPLE bytecode is saved alongside final object code, object
-files generated with LTO support are larger than regular object files.
-This ``fat'' object format makes it easy to integrate LTO into
-existing build systems, as one can, for instance, produce archives of
-the files.  Additionally, one might be able to ship one set of fat
+Object files generated with LTO support contain only GIMPLE bytecode.
+Such objects are called ``slim'', and they require that tools like
+@code{ar} and @code{nm} understand symbol tables of LTO sections.  These tools
+have been extended to use the plugin infrastructure, so GCC can support
+``slim'' objects consisting of the intermediate code alone.
+
+GIMPLE bytecode could also be saved alongside final object code if the
+@option{-ffat-lto-objects} option is passed.  But this would make the
+object files generated with LTO support larger than regular object
+files.  This ``fat'' object format allows to ship one set of fat
 objects which could be used both for development and the production of
 optimized builds.  A, perhaps surprising, side effect of this feature
 is that any mistake in the toolchain leads to LTO information not
@@ -49,14 +54,6 @@ This is both an advantage, as the system is more robust, and a
 disadvantage, as the user is not informed that the optimization has
 been disabled.
 
-The current implementation only produces ``fat'' objects, effectively
-doubling compilation time and increasing file sizes up to 5x the
-original size.  This hides the problem that some tools, such as
-@code{ar} and @code{nm}, need to understand symbol tables of LTO
-sections.  These tools were extended to use the plugin infrastructure,
-and with these problems solved, GCC will also support ``slim'' objects
-consisting of the intermediate code alone.
-
 At the highest level, LTO splits the compiler in two.  The first half
 (the ``writer'') produces a streaming representation of all the
 internal data structures needed to optimize and generate code.  This
-- 
2.31.1



[PATCH 1/2] Add -fsingle-global-definition

2021-06-20 Thread H.J. Lu via Gcc-patches
1. Generate a single global definition marker in relocatable objects.
   a. Always use GOT to access undefined data and function symbols,
  including in PIE and non-PIE.  These will avoid copy relocations
  in executables.
   b. This is compatible with existing executables and shared libraries.
2. In executable and shared library, bind symbols with the STV_PROTECTED
   visibility locally:
   a. The address of data symbol is the address of data body.
   b. For systems without function descriptor, the function pointer is
  the address of function body.
   c. The resulting shared libraries may not be compatible with
  executables which have copy relocations on protected symbols.
3. Update asm_preferred_eh_data_format to properly select EH encoding
format with -fsingle-global-definition.
4. Add ix86_reloc_rw_mask for TARGET_ASM_RELOC_RW_MASK to avoid copy
relocation with -fsingle-global-definition.

gcc/

PR target/35513
PR target/100593
* common.opt: Add -fsingle-global-definition.
* config/i386/i386-protos.h (ix86_force_load_from_GOT_p): Add a
bool argument.
* config/i386/i386.c (ix86_force_load_from_GOT_p): Add a bool
argument to indicate call operand.  Force non-call load
from GOT for -fsingle-global-definition.
(legitimate_pic_address_disp_p): Avoid copy relocation in PIE
for -fsingle-global-definition.
(ix86_print_operand): Pass true to ix86_force_load_from_GOT_p
for call operand.
(asm_preferred_eh_data_format): Use PC-relative format for
-fsingle-global-definition to avoid copy relocation.  Check
ptr_mode instead of TARGET_64BIT when selecting DW_EH_PE_sdata4.
(ix86_binds_local_p): Don't treat protected data as extern and
avoid copy relocation on common symbol.
(ix86_reloc_rw_mask): New to avoid copy relocation for
-fsingle-global-definition.
(TARGET_ASM_RELOC_RW_MASK): New.
* doc/invoke.texi: Document -fsingle-global-definition.

gcc/testsuite/

PR target/35513
PR target/100593
* g++.dg/pr35513-1.C: New file.
* g++.dg/pr35513-2.C: Likewise.
* gcc.target/i386/pr35513-1.c: Likewise.
* gcc.target/i386/pr35513-2.c: Likewise.
* gcc.target/i386/pr35513-3.c: Likewise.
* gcc.target/i386/pr35513-4.c: Likewise.
* gcc.target/i386/pr35513-5.c: Likewise.
* gcc.target/i386/pr35513-6.c: Likewise.
* gcc.target/i386/pr35513-7.c: Likewise.
* gcc.target/i386/pr35513-8.c: Likewise.
---
 gcc/common.opt                            |  4 ++
 gcc/config/i386/i386-protos.h             |  2 +-
 gcc/config/i386/i386.c                    | 50 +++--
 gcc/doc/invoke.texi                       |  8 +++-
 gcc/testsuite/g++.dg/pr35513-1.C          | 25 +++
 gcc/testsuite/g++.dg/pr35513-2.C          | 53 +++
 gcc/testsuite/gcc.target/i386/pr35513-1.c | 16 +++
 gcc/testsuite/gcc.target/i386/pr35513-2.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-3.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-4.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-5.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-6.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr35513-7.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-8.c | 41 ++
 14 files changed, 272 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr35513-1.C
 create mode 100644 gcc/testsuite/g++.dg/pr35513-2.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-8.c

diff --git a/gcc/common.opt b/gcc/common.opt
index a1353e06bdc..b1cb53bb780 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2579,6 +2579,10 @@ fsigned-zeros
 Common Var(flag_signed_zeros) Init(1) Optimization SetByCombined
 Disable floating point optimizations that ignore the IEEE signedness of zero.
 
+fsingle-global-definition
+Common Var(flag_single_global_definition) Optimization
+Use GOT to access external symbols and make access to protected symbols local.
+
 fsingle-precision-constant
 Common Var(flag_single_precision_constant) Optimization
 Convert floating point constants to single precision constants.
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index e6ac9390777..30f75b9900b 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -77,7 +77,7 @@ extern bool ix86_expand_cmpstrn_or_cmpmem (rtx, rtx, rtx, rtx, rtx, bool);
 extern bool constant_address_p

[PATCH 0/2] Implement single global definition

2021-06-20 Thread H.J. Lu via Gcc-patches
On systems with copy relocation:
* A copy in executable is created for the definition in a shared library
at run-time by ld.so.
* The copy is referenced by executable and shared libraries.
* Executable can access the copy directly.

Issues are:
* Overhead of a copy, time and space, may be visible at run-time.
* Read-only data in the shared library becomes read-write copy in
executable at run-time.
* Local access to data with the STV_PROTECTED visibility in the shared
library must use GOT.

On systems without function descriptor, function pointers vary depending
on where and how the functions are defined.
* If the function is defined in executable, it can be the address of
function body.
* If the function, including the function with STV_PROTECTED visibility,
is defined in the shared library, it can be the address of the PLT entry
in executable or shared library.

Issues are:
* The address of function body may not be used as its function pointer.
* ld.so needs to search loaded shared libraries for the function pointer
of the function with STV_PROTECTED visibility.

Here is a proposal to remove copy relocation and use canonical function
pointer:

1. Accesses, including in PIE and non-PIE, to undefined symbols must
use GOT.
  a. Linker may optimize out GOT access if the data is defined in PIE or
  non-PIE.
2. Read-only data in the shared library remain read-only at run-time
3. Address of global data with the STV_PROTECTED visibility in the shared
library is the address of data body.
  a. Can use IP-relative access.
  b. May need GOT without IP-relative access.
4. For systems without function descriptor,
  a. All global function pointers of undefined functions in PIE and
  non-PIE must use GOT.  Linker may optimize out GOT access if the
  function is defined in PIE or non-PIE.
  b. Function pointer of functions with the STV_PROTECTED visibility in
  executable and shared library is the address of function body.
   i. Can use IP-relative access.
   ii. May need GOT without IP-relative access.
   iii. Branches to undefined functions may use PLT.
5. Single global definition marker:

Add GNU_PROPERTY_1_NEEDED:

#define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO

to indicate the needed properties by the object file.

Add GNU_PROPERTY_1_NEEDED_SINGLE_GLOBAL_DEFINITION:

#define GNU_PROPERTY_1_NEEDED_SINGLE_GLOBAL_DEFINITION (1U << 0)

to indicate that the object file requires canonical function pointers and
cannot be used with copy relocation.

  a. Copy relocation should be disallowed at link-time and run-time.
  b. Canonical function pointers are required at link-time and run-time.

Add a compiler option, -fsingle-global-definition:

1. Always use GOT to access undefined symbols, including in PIE and
non-PIE.  This is safe to do and does not break the ABI.
2. In executable and shared library, for symbols with the STV_PROTECTED
visibility:
  a. The address of data symbol is the address of data body.
  b. For systems without function descriptor, the function pointer is
  the address of function body.
These break the ABI and resulting shared libraries may not be compatible
with executables which are not compiled with -fsingle-global-definition.
3. Generate a single global definition marker in relocatable objects.
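
To make the terminology above concrete, here is a minimal sketch of symbols
given STV_PROTECTED visibility with the GCC-style visibility attribute,
together with helpers that take their addresses.  Everything sits in one
translation unit so that it builds standalone; in practice the definitions
would live in a shared library and the interesting cases are accesses from
other modules.  The symbol names are invented for this example.

```cpp
#include <cassert>

// A data symbol with STV_PROTECTED visibility.  Under the proposal, its
// address is the address of the data body in the defining module.
__attribute__((visibility("protected"))) int protected_counter = 41;

// A function symbol with STV_PROTECTED visibility.  Under the proposal, on
// systems without function descriptors, its function pointer is the address
// of the function body.
__attribute__((visibility("protected"))) int bump() {
  return ++protected_counter;
}

typedef int (*bump_fn)();

// Address-taking helpers: the operations whose result the proposal pins down.
bump_fn address_of_bump() { return &bump; }
int* address_of_counter() { return &protected_counter; }
```

Within a single module the two helpers trivially agree with &bump and
&protected_counter; the point of the proposal is that the same holds across
module boundaries without copy relocations or PLT-entry addresses.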

H.J. Lu (2):
  Add -fsingle-global-definition
  Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE

 gcc/common.opt                            |  4 ++
 gcc/config.in                             |  6 +++
 gcc/config/i386/gnu-property.c            | 31 -
 gcc/config/i386/i386-protos.h             |  2 +-
 gcc/config/i386/i386.c                    | 52 --
 gcc/configure                             | 42 --
 gcc/configure.ac                          | 20 +
 gcc/doc/invoke.texi                       |  8 +++-
 gcc/doc/tm.texi                           |  5 +++
 gcc/doc/tm.texi.in                        |  2 +
 gcc/output.h                              |  2 +
 gcc/target.def                            |  8
 gcc/testsuite/g++.dg/pr35513-1.C          | 25 +++
 gcc/testsuite/g++.dg/pr35513-2.C          | 53 +++
 gcc/testsuite/gcc.target/i386/pr35513-1.c | 16 +++
 gcc/testsuite/gcc.target/i386/pr35513-2.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-3.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-4.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-5.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-6.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr35513-7.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-8.c | 41 ++
 gcc/toplev.c                              |  3 ++
 gcc/varasm.c                              | 47
 24 files changed, 406 insertions(+), 50 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr35513-1.C
 create mode 100644 gcc/testsuite/g++.dg/pr35513-2.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-1.c
 create mode 100644 gcc/testsuite/gcc.ta

[PATCH 2/2] Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE

2021-06-20 Thread H.J. Lu via Gcc-patches
Generate the GNU_PROPERTY_1_NEEDED_SINGLE_GLOBAL_DEFINITION marker for
-fsingle-global-definition to indicate that the object file requires
canonical function pointers and cannot be used with copy relocation.
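
For readers unfamiliar with the note format, the sketch below builds in
memory the same byte layout that the emit_gnu_property function quoted in
the diff below writes out as assembler directives: a note header, the
vendor name "GNU", then one property (type, data size, 32-bit data) padded
to the pointer alignment.  It is an illustration written for this summary,
not code from the patch; the helper names are invented, and values are
stored in host byte order for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Append a 32-bit value in host byte order.
static void put32(std::vector<unsigned char>& out, std::uint32_t v) {
  unsigned char b[4];
  std::memcpy(b, &v, sizeof b);
  out.insert(out.end(), b, b + 4);
}

// Pad with zero bytes up to a multiple of `align`.
static void pad_to(std::vector<unsigned char>& out, std::size_t align) {
  assert(align == 4 || align == 8);
  while (out.size() % align != 0) out.push_back(0);
}

// Build one .note.gnu.property note holding a single 32-bit property.
// `align` is 4 when pointers are 32-bit and 8 when they are 64-bit.
std::vector<unsigned char> build_gnu_property_note(std::uint32_t pr_type,
                                                   std::uint32_t pr_data,
                                                   std::size_t align) {
  std::vector<unsigned char> desc;
  put32(desc, pr_type);  // pr_type
  put32(desc, 4);        // pr_datasz: one 32-bit word
  put32(desc, pr_data);  // property data
  pad_to(desc, align);   // trailing padding counts towards the data length

  std::vector<unsigned char> note;
  put32(note, 4);                                       // name length: "GNU\0"
  put32(note, static_cast<std::uint32_t>(desc.size())); // data length
  put32(note, 5);                                       // NT_GNU_PROPERTY_TYPE_0
  const unsigned char name[4] = {'G', 'N', 'U', '\0'};
  note.insert(note.end(), name, name + 4);
  pad_to(note, align);                                  // 16 bytes so far
  note.insert(note.end(), desc.begin(), desc.end());
  return note;
}
```

For the marker discussed here, pr_type would be the numeric value of
GNU_PROPERTY_1_NEEDED and pr_data the bit
GNU_PROPERTY_1_NEEDED_SINGLE_GLOBAL_DEFINITION (1U << 0).  With 8-byte
alignment the note comes to 32 bytes, with 4-byte alignment to 28.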

* configure.ac (HAVE_LD_SINGLE_GLOBAL_DEFINITION_SUPPORT): New.
Define to 1 if linker supports -z single-global-definition.
* output.h (emit_gnu_property): New.
(emit_gnu_property_note): Likewise.
* target.def (emit_gnu_property_note): Add a targetm.asm_out hook.
* toplev.c (compile_file): Call emit_gnu_property_note before
file_end.
* varasm.c (emit_gnu_property): New.
(emit_gnu_property_note): Likewise.
* config.in: Regenerated.
* configure: Likewise.
* doc/tm.texi: Likewise.
* config/i386/gnu-property.c (emit_gnu_property): Removed.
(TARGET_ASM_EMIT_GNU_PROPERTY_NOTE): New.
* doc/tm.texi.in: Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE.
---
 gcc/config.in                  |  6 +
 gcc/config/i386/gnu-property.c | 31 --
 gcc/config/i386/i386.c         |  2 ++
 gcc/configure                  | 42 +++---
 gcc/configure.ac               | 20 +++
 gcc/doc/tm.texi                |  5
 gcc/doc/tm.texi.in             |  2 ++
 gcc/output.h                   |  2 ++
 gcc/target.def                 |  8 ++
 gcc/toplev.c                   |  3 +++
 gcc/varasm.c                   | 47 ++
 11 files changed, 134 insertions(+), 34 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index 18e627141cc..ee2a94f3847 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1690,6 +1690,12 @@
 #endif
 
 
+/* Define to 1 if your linker supports -z single-global-definition */
+#ifndef USED_FOR_TARGET
+#undef HAVE_LD_SINGLE_GLOBAL_DEFINITION_SUPPORT
+#endif
+
+
 /* Define if your linker supports the *_sol2 emulations. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_LD_SOL2_EMULATION
diff --git a/gcc/config/i386/gnu-property.c b/gcc/config/i386/gnu-property.c
index 4ba04403002..9fe8d00132e 100644
--- a/gcc/config/i386/gnu-property.c
+++ b/gcc/config/i386/gnu-property.c
@@ -24,37 +24,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "output.h"
 #include "linux-common.h"
 
-static void
-emit_gnu_property (unsigned int type, unsigned int data)
-{
-  int p2align = ptr_mode == SImode ? 2 : 3;
-
-  switch_to_section (get_section (".note.gnu.property",
- SECTION_NOTYPE, NULL));
-
-  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
-  /* name length.  */
-  fprintf (asm_out_file, ASM_LONG "1f - 0f\n");
-  /* data length.  */
-  fprintf (asm_out_file, ASM_LONG "4f - 1f\n");
-  /* note type: NT_GNU_PROPERTY_TYPE_0.  */
-  fprintf (asm_out_file, ASM_LONG "5\n");
-  fprintf (asm_out_file, "0:\n");
-  /* vendor name: "GNU".  */
-  fprintf (asm_out_file, STRING_ASM_OP "\"GNU\"\n");
-  fprintf (asm_out_file, "1:\n");
-  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
-  /* pr_type.  */
-  fprintf (asm_out_file, ASM_LONG "0x%x\n", type);
-  /* pr_datasz.  */
-  fprintf (asm_out_file, ASM_LONG "3f - 2f\n");
-  fprintf (asm_out_file, "2:\n");
-  fprintf (asm_out_file, ASM_LONG "0x%x\n", data);
-  fprintf (asm_out_file, "3:\n");
-  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
-  fprintf (asm_out_file, "4:\n");
-}
-
 void
 file_end_indicate_exec_stack_and_gnu_property (void)
 {
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9878c3126d0..b1268756322 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -24036,6 +24036,8 @@ ix86_run_selftests (void)
 #if !TARGET_MACHO && !TARGET_DLLIMPORT_DECL_ATTRIBUTES
 # undef TARGET_ASM_RELOC_RW_MASK
 # define TARGET_ASM_RELOC_RW_MASK ix86_reloc_rw_mask
+# undef TARGET_ASM_EMIT_GNU_PROPERTY_NOTE
+# define TARGET_ASM_EMIT_GNU_PROPERTY_NOTE emit_gnu_property_note
 #endif
 
 static bool ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
diff --git a/gcc/configure b/gcc/configure
index dd0194a57f4..3d53ce8cc9a 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -911,6 +911,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -1085,6 +1086,7 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE}'
@@ -1337,6 +1339,15 @@ do
   | -silent | --silent | --silen | --sile | --sil)
 silent=yes ;;
 
+  -runstatedir | --runstatedir | --runstatedi | --runstated \
+  | --runstate | --runstat | --runsta | --runst | --runs \
+  | --run | --ru | --r)
+ac_prev=runstatedir ;;
+  -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
+  | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
+  | --run=* | --ru=* | --r=*)
+runstatedir=$ac_optarg ;;
+
   -sbindir | --sbindir | 

Re: [PATCH] c++: conversion to base of vbase in NSDMI [PR80431]

2021-06-20 Thread Jason Merrill via Gcc-patches

On 6/18/21 4:39 PM, Patrick Palka wrote:

The delayed processing of conversions to a virtual base inside an NSDMI
assumes the target base type is a (possibly indirect) virtual base of
the current class, but the target base type could also be an indirect
non-virtual base inherited from a virtual base, as in the testcase below.
Since such a base isn't a part of CLASSTYPE_VBASECLASSES, we end up
miscompiling the testcase due to build_base_path (called with
binfo=NULL_TREE) silently returning error_mark_node.  Fix this by
using convert_to_base to build the conversion.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/80431

gcc/cp/ChangeLog:

* tree.c (bot_replace): Use convert_to_base instead of
only looking through CLASSTYPE_VBASECLASSES of the current class
type.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-virtual1a.C: New test.
---
  gcc/cp/tree.c                                | 10 ++-
  gcc/testsuite/g++.dg/cpp0x/nsdmi-virtual1a.C | 29
  2 files changed, 32 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-virtual1a.C

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index fec5afaa2be..3537f395960 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -3244,13 +3244,9 @@ bot_replace (tree* t, int* /*walk_subtrees*/, void* data_)
  {
/* In an NSDMI build_base_path defers building conversions to virtual


So this should be "morally virtual" 
(https://itanium-cxx-abi.github.io/cxx-abi/abi.html#definitions)


OK with that change.


 bases, and we handle it here.  */
-  tree basetype = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (*t)));
-  vec<tree, va_gc> *vbases = CLASSTYPE_VBASECLASSES (current_class_type);
-  int i; tree binfo;
-  FOR_EACH_VEC_SAFE_ELT (vbases, i, binfo)
-   if (BINFO_TYPE (binfo) == basetype)
- break;
-  *t = build_base_path (PLUS_EXPR, TREE_OPERAND (*t, 0), binfo, true,
+  tree basetype = TREE_TYPE (*t);
+  *t = convert_to_base (TREE_OPERAND (*t, 0), basetype,
+   /*check_access=*/false, /*nonnull=*/true,
tf_warning_or_error);
  }
  
diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-virtual1a.C b/gcc/testsuite/g++.dg/cpp0x/nsdmi-virtual1a.C

new file mode 100644
index 000..fe647fe3cf7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-virtual1a.C
@@ -0,0 +1,29 @@
+// PR c++/80431
+// { dg-do run { target c++11 } }
+
+// A variant of nsdmi-virtual1.C that turns A from a virtual base of B to a base
+// of a virtual base of B, using the intermediate class D.
+
+struct A
+{
+  A(): i(42) { }
+  int i;
+  int f() { return i; }
+};
+
+struct D : A { int pad; };
+
+struct B : virtual D
+{
+  int j = i + f();
+  int k = A::i + A::f();
+};
+
+struct C: B { int pad; };
+
+int main()
+{
+  C c;
+  if (c.j != 84 || c.k != 84)
+__builtin_abort();
+}





Re: [PATCH] c++: REF_PARENTHESIZED_P wrapper inhibiting NRVO [PR67302]

2021-06-20 Thread Jason Merrill via Gcc-patches

On 6/19/21 3:45 PM, Patrick Palka wrote:

Here, in C++14 or later, we remember the parentheses around 'a' in the
return statement by using a REF_PARENTHESIZED_P wrapper, which ends up
inhibiting NRVO because we don't look through this wrapper before
checking the conditions for NRVO.  This patch fixes this by calling
maybe_undo_parenthesized_ref sooner in check_return_expr.
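
One way to observe the effect described above at run time is to count copy
and move constructions for a plain and a parenthesized return.  The sketch
below is an illustration written for this summary (the type and function
names are invented); it is separate from the nrv21.C test in the patch,
which checks the GIMPLE dump instead.

```cpp
#include <cassert>

// A type that records how often it is copied or moved.
struct Tracked {
  static int copies;
  static int moves;
  int value;
  explicit Tracked(int v) : value(v) {}
  Tracked(const Tracked& o) : value(o.value) { ++copies; }
  Tracked(Tracked&& o) noexcept : value(o.value) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Plain return of a local: the usual NRVO candidate.
Tracked make_plain() {
  Tracked a(42);
  return a;
}

// Parenthesized return of a local: the case the patch is about.
Tracked make_parenthesized() {
  Tracked a(42);
  return (a);
}
```

A compiler that applies NRVO to both functions leaves both counters at zero
after calling them; a nonzero move count after make_parenthesized () alone
would point at the parenthesized form inhibiting the optimization.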

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/67302

gcc/cp/ChangeLog:

* typeck.c (check_return_expr): Call maybe_undo_parenthesized_ref
sooner, before the NRVO handling.

gcc/testsuite/ChangeLog:

* g++.dg/opt/nrv21.C: New test.
---
  gcc/cp/typeck.c                  |  9 -
  gcc/testsuite/g++.dg/opt/nrv21.C | 14 ++
  2 files changed, 18 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/opt/nrv21.C

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index dbb2370510c..aa014c3812a 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -10306,7 +10306,10 @@ check_return_expr (tree retval, bool *no_warning)
  
   See finish_function and finalize_nrv for the rest of this optimization.  */

if (retval)
-STRIP_ANY_LOCATION_WRAPPER (retval);
+{
+  retval = maybe_undo_parenthesized_ref (retval);
+  STRIP_ANY_LOCATION_WRAPPER (retval);
+}
  
bool named_return_value_okay_p = can_do_nrvo_p (retval, functype);

if (fn_returns_value_p && flag_elide_constructors)
@@ -10340,10 +10343,6 @@ check_return_expr (tree retval, bool *no_warning)
if (VOID_TYPE_P (functype))
return error_mark_node;
  
-  /* If we had an id-expression obfuscated by force_paren_expr, we need

-to undo it so we can try to treat it as an rvalue below.  */
-  retval = maybe_undo_parenthesized_ref (retval);
-
if (processing_template_decl)
retval = build_non_dependent_expr (retval);
  
diff --git a/gcc/testsuite/g++.dg/opt/nrv21.C b/gcc/testsuite/g++.dg/opt/nrv21.C

new file mode 100644
index 000..31bff79afc1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/nrv21.C
@@ -0,0 +1,14 @@
+// PR c++/67302
+// { dg-additional-options -fdump-tree-gimple }
+// { dg-final { scan-tree-dump-not " = a" "gimple" } }
+
+struct A
+{
+  int ar[42];
+  A();
+};
+
+A f() {
+  A a;
+  return (a);
+}





[PATCH] Disparage slightly the mask register alternative for bitwise operations. [PR target/101142]

2021-06-20 Thread liuhongt via Gcc-patches
AVX512 supports bitwise operations with mask registers, but the
throughput of those instructions is much lower than that of the
corresponding gpr version, so we additionally disparage slightly the
mask register alternative for bitwise operations in the LRA.

Also, when the allocno cost of GENERAL_REGS is the same as that of
MASK_REGS, allocate MASK_REGS first since it has already been disparaged.

gcc/ChangeLog:

PR target/101142
* config/i386/i386.md: (*anddi_1): Disparage slightly the mask
register alternative.
(*and_1): Ditto.
(*andqi_1): Ditto.
(*andn_1): Ditto.
(*_1): Ditto.
(*qi_1): Ditto.
(*one_cmpl2_1): Ditto.
(*one_cmplsi2_1_zext): Ditto.
(*one_cmplqi2_1): Ditto.
* config/i386/i386.c (x86_order_regs_for_local_alloc): Change
the order of mask registers to be before general registers.

gcc/testsuite/ChangeLog:

PR target/101142
* gcc.target/i386/spill_to_mask-1.c: Adjust testcase.
* gcc.target/i386/spill_to_mask-2.c: Adjust testcase.
* gcc.target/i386/spill_to_mask-3.c: Adjust testcase.
* gcc.target/i386/spill_to_mask-4.c: Adjust testcase.
---
 gcc/config/i386/i386.c                | 8 +-
 gcc/config/i386/i386.md               | 20 ++---
 .../gcc.target/i386/spill_to_mask-1.c | 89 +--
 .../gcc.target/i386/spill_to_mask-2.c | 11 ++-
 .../gcc.target/i386/spill_to_mask-3.c | 11 ++-
 .../gcc.target/i386/spill_to_mask-4.c | 11 ++-
 6 files changed, 91 insertions(+), 59 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a61255857ff..a651853ca3b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20463,6 +20463,10 @@ x86_order_regs_for_local_alloc (void)
int pos = 0;
int i;
 
+   /* Mask register.  */
+   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+ reg_alloc_order [pos++] = i;
+
/* First allocate the local general purpose registers.  */
for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
  if (GENERAL_REGNO_P (i) && call_used_or_fixed_reg_p (i))
@@ -20489,10 +20493,6 @@ x86_order_regs_for_local_alloc (void)
for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
  reg_alloc_order [pos++] = i;
 
-   /* Mask register.  */
-   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
- reg_alloc_order [pos++] = i;
-
/* x87 registers.  */
if (TARGET_SSE_MATH)
  for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6e4abf32e7c..3eef56b27d7 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9138,7 +9138,7 @@ (define_insn_and_split "*anddi3_doubleword"
 })
 
 (define_insn "*anddi_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r,k")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r,?k")
(and:DI
 (match_operand:DI 1 "nonimmediate_operand" "%0,0,0,qm,k")
 (match_operand:DI 2 "x86_64_szext_general_operand" "Z,re,m,L,k")))
@@ -9226,7 +9226,7 @@ (define_insn "*andsi_1_zext"
(set_attr "mode" "SI")])
 
 (define_insn "*and_1"
-  [(set (match_operand:SWI24 0 "nonimmediate_operand" "=rm,r,Ya,k")
+  [(set (match_operand:SWI24 0 "nonimmediate_operand" "=rm,r,Ya,?k")
(and:SWI24 (match_operand:SWI24 1 "nonimmediate_operand" "%0,0,qm,k")
   (match_operand:SWI24 2 "" "r,m,L,k")))
(clobber (reg:CC FLAGS_REG))]
@@ -9255,7 +9255,7 @@ (define_insn "*and_1"
(set_attr "mode" ",,SI,")])
 
 (define_insn "*andqi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,k")
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,?k")
(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,k")
(match_operand:QI 2 "general_operand" "qn,m,rn,k")))
(clobber (reg:CC FLAGS_REG))]
@@ -9651,7 +9651,7 @@ (define_split
 })
 
 (define_insn "*andn_1"
-  [(set (match_operand:SWI48 0 "register_operand" "=r,r,k")
+  [(set (match_operand:SWI48 0 "register_operand" "=r,r,?k")
(and:SWI48
  (not:SWI48 (match_operand:SWI48 1 "register_operand" "r,r,k"))
  (match_operand:SWI48 2 "nonimmediate_operand" "r,m,k")))
@@ -9667,7 +9667,7 @@ (define_insn "*andn_1"
(set_attr "mode" "")])
 
 (define_insn "*andn_1"
-  [(set (match_operand:SWI12 0 "register_operand" "=r,k")
+  [(set (match_operand:SWI12 0 "register_operand" "=r,?k")
(and:SWI12
  (not:SWI12 (match_operand:SWI12 1 "register_operand" "r,k"))
  (match_operand:SWI12 2 "register_operand" "r,k")))
@@ -9757,7 +9757,7 @@ (define_insn_and_split "*di3_doubleword"
 })
 
 (define_insn "*_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,k")
+  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
(any_or:SWI248
 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k")
 (match_operand:SWI248 2 "" "r,m,k")))
@@

[PATCH] MAINTAINERS: Add myself as maintainer of the i386 vector extensions.

2021-06-20 Thread liuhongt via Gcc-patches
ChangeLog:

* MAINTAINERS: Add myself as maintainer of the i386 vector
extensions.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 32a414ba8af..4ac4fc5f3bd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -74,6 +74,7 @@ hppa port John David Anglin   

 i386 port  Jan Hubicka 
 i386 port  Uros Bizjak 
 i386 vector ISA extns  Kirill Yukhin   
+i386 vector ISA extns  Hongtao Liu 
 iq2000 portNick Clifton
 lm32 port  Sebastien Bourdeauducq  
 m32r port  Nick Clifton
-- 
2.18.1



[patch] Fortran: fix sm computation in CFI_allocate [PR93524]

2021-06-20 Thread Sandra Loosemore
I ran into this bug in CFI_allocate while testing something else and 
then realized there was already a PR open for it.  It seems like an easy 
fix, and I've used Tobias's test case from the issue more or less 
verbatim.


There were some other bugs added on to this issue but I think they have 
all been fixed already except for this one.


OK to check in?

-Sandra
commit de9920753469e36c968b273a0e8b4d66a1d57946
Author: Sandra Loosemore 
Date:   Sun Jun 20 22:37:55 2021 -0700

Fortran: fix sm computation in CFI_allocate [PR93524]

This patch fixes a bug in setting the step multiplier field in the
C descriptor for array dimensions > 2.

2021-06-20  Sandra Loosemore  
Tobias Burnus  

libgfortran/
	PR fortran/93524
	* runtime/ISO_Fortran_binding.c (CFI_allocate): Fix
	sm computation.

gcc/testsuite/
	PR fortran/93524
	* gfortran.dg/pr93524.c, gfortran.dg/pr93524.f90: New.

diff --git a/gcc/testsuite/gfortran.dg/pr93524.c b/gcc/testsuite/gfortran.dg/pr93524.c
new file mode 100644
index 000..8a6c066
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr93524.c
@@ -0,0 +1,33 @@
+/* Test the fix for PR93524, in which CFI_allocate was computing
+   sm incorrectly for dimensions > 2.  */
+
+#include   // For size_t
+#include 
+
+void my_fortran_sub_1 (CFI_cdesc_t *dv); 
+void my_fortran_sub_2 (CFI_cdesc_t *dv); 
+
+int main ()
+{
+  CFI_CDESC_T (3) a;
+  CFI_cdesc_t *dv = (CFI_cdesc_t *) &a;
+  // dv, base_addr, attribute,type, elem_len, rank, extents
+  CFI_establish (dv, NULL, CFI_attribute_allocatable, CFI_type_float, 0, 3, NULL); 
+
+  if (dv->base_addr != NULL)
+return 1;  // shall not be allocated
+
+  CFI_index_t lower_bounds[] = {-10, 0, 3}; 
+  CFI_index_t upper_bounds[] = {10, 5, 10}; 
+  size_t elem_len = 0;  // only needed for strings
+  if (CFI_SUCCESS != CFI_allocate (dv, lower_bounds, upper_bounds, elem_len))
+return 2;
+
+  if (!CFI_is_contiguous (dv))
+return 2;  // allocatables shall be contiguous,unless a strided section is used
+
+  my_fortran_sub_1 (dv);
+  my_fortran_sub_2 (dv);
+  CFI_deallocate (dv);
+  return 0;
+}
diff --git a/gcc/testsuite/gfortran.dg/pr93524.f90 b/gcc/testsuite/gfortran.dg/pr93524.f90
new file mode 100644
index 000..b21030b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr93524.f90
@@ -0,0 +1,15 @@
+! { dg-additional-sources pr93524.c }
+! { dg-do run }
+!
+! Test the fix for PR93524.  The main program is in pr93524.c.
+
+subroutine my_fortran_sub_1 (A) bind(C)
+  real :: A(:, :, :)
+  print *, 'Lower bounds: ', lbound(A) ! Lower bounds:111
+  print *, 'Upper bounds: ', ubound(A) ! Upper bounds:   2168
+end
+subroutine my_fortran_sub_2 (A) bind(C)
+  real, ALLOCATABLE :: A(:, :, :)
+  print *, 'Lower bounds: ', lbound(A)
+  print *, 'Upper bounds: ', ubound(A)
+end subroutine my_fortran_sub_2
diff --git a/libgfortran/runtime/ISO_Fortran_binding.c b/libgfortran/runtime/ISO_Fortran_binding.c
index 20833ad..0978832 100644
--- a/libgfortran/runtime/ISO_Fortran_binding.c
+++ b/libgfortran/runtime/ISO_Fortran_binding.c
@@ -254,10 +254,7 @@ CFI_allocate (CFI_cdesc_t *dv, const CFI_index_t lower_bounds[],
 	{
 	  dv->dim[i].lower_bound = lower_bounds[i];
 	  dv->dim[i].extent = upper_bounds[i] - dv->dim[i].lower_bound + 1;
-	  if (i == 0)
-	dv->dim[i].sm = dv->elem_len;
-	  else
-	dv->dim[i].sm = dv->elem_len * dv->dim[i - 1].extent;
+	  dv->dim[i].sm = dv->elem_len * arr_len;
 	  arr_len *= dv->dim[i].extent;
 }
 }
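
For readers following the fix, here is a minimal sketch of the two sm computations outside libgfortran. It assumes a simplified stand-in for the descriptor's dim[] entries; `struct dim_info`, `fill_dims_old` and `fill_dims_new` are hypothetical names used only for illustration, not part of ISO_Fortran_binding.h.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry of the descriptor's dim[] array.  */
struct dim_info
{
  ptrdiff_t lower_bound;
  ptrdiff_t extent;
  ptrdiff_t sm;
};

/* Old logic: sm only looks at the extent of the previous dimension,
   which is wrong from the third dimension on.  */
void
fill_dims_old (struct dim_info *dim, int rank, const ptrdiff_t *lb,
               const ptrdiff_t *ub, size_t elem_len)
{
  for (int i = 0; i < rank; i++)
    {
      dim[i].lower_bound = lb[i];
      dim[i].extent = ub[i] - lb[i] + 1;
      if (i == 0)
        dim[i].sm = (ptrdiff_t) elem_len;
      else
        dim[i].sm = (ptrdiff_t) elem_len * dim[i - 1].extent;
    }
}

/* New logic: sm is elem_len times the number of elements spanned by
   all lower dimensions, accumulated in arr_len.  */
void
fill_dims_new (struct dim_info *dim, int rank, const ptrdiff_t *lb,
               const ptrdiff_t *ub, size_t elem_len)
{
  ptrdiff_t arr_len = 1;
  for (int i = 0; i < rank; i++)
    {
      dim[i].lower_bound = lb[i];
      dim[i].extent = ub[i] - lb[i] + 1;
      dim[i].sm = (ptrdiff_t) elem_len * arr_len;
      arr_len *= dim[i].extent;
    }
}
```

With the bounds from the test case (extents 21, 6 and 8) and a 4-byte element, both versions agree on the first two dimensions (4 and 84) and differ only in the third.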


Re: [PATCH] doc/lto.texi: List slim object format as the default

2021-06-20 Thread Richard Biener
On Sun, 20 Jun 2021, Dimitar Dimitrov wrote:

> Slim LTO object files have been the default for quite a while, since:
>   commit e9f67e625c2a4225a7169d7220dcb85b6fdd7ca9
>   Author: Jan Hubicka 
>   common.opt (ffat-lto-objects): Disable by default.
> 
> That commit did not update lto.texi, so do it now.

LGTM.  Btw, on targets where linker plugin support is not detected
by configury fat objects are still the default.

> gcc/ChangeLog:
> 
>   * doc/lto.texi (Design Overview): Update that slim objects are
>   the default.
> 
> Signed-off-by: Dimitar Dimitrov 
> ---
>  gcc/doc/lto.texi | 23 ++-
>  1 file changed, 10 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/doc/lto.texi b/gcc/doc/lto.texi
> index 1f55216328a..755258ccb2b 100644
> --- a/gcc/doc/lto.texi
> +++ b/gcc/doc/lto.texi
> @@ -36,11 +36,16 @@ bytecode representation of GIMPLE that is emitted in special sections
>  of @code{.o} files.  Currently, LTO support is enabled in most
>  ELF-based systems, as well as darwin, cygwin and mingw systems.
>  
> -Since GIMPLE bytecode is saved alongside final object code, object
> -files generated with LTO support are larger than regular object files.
> -This ``fat'' object format makes it easy to integrate LTO into
> -existing build systems, as one can, for instance, produce archives of
> -the files.  Additionally, one might be able to ship one set of fat
> +Object files generated with LTO support contain only GIMPLE bytecode.
> +Such objects are called ``slim'', and they require that tools like
> +@code{ar} and @code{nm} understand symbol tables of LTO sections.  These tools
> +have been extended to use the plugin infrastructure, so GCC can support
> +``slim'' objects consisting of the intermediate code alone.
> +
> +GIMPLE bytecode could also be saved alongside final object code if the
> +@option{-ffat-lto-objects} option is passed.  But this would make the
> +object files generated with LTO support larger than regular object
> +files.  This ``fat'' object format allows to ship one set of fat
>  objects which could be used both for development and the production of
>  optimized builds.  A, perhaps surprising, side effect of this feature
>  is that any mistake in the toolchain leads to LTO information not
> @@ -49,14 +54,6 @@ This is both an advantage, as the system is more robust, and a
>  disadvantage, as the user is not informed that the optimization has
>  been disabled.
>  
> -The current implementation only produces ``fat'' objects, effectively
> -doubling compilation time and increasing file sizes up to 5x the
> -original size.  This hides the problem that some tools, such as
> -@code{ar} and @code{nm}, need to understand symbol tables of LTO
> -sections.  These tools were extended to use the plugin infrastructure,
> -and with these problems solved, GCC will also support ``slim'' objects
> -consisting of the intermediate code alone.
> -
>  At the highest level, LTO splits the compiler in two.  The first half
>  (the ``writer'') produces a streaming representation of all the
>  internal data structures needed to optimize and generate code.  This
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH V3] Split loop for NE condition.

2021-06-20 Thread guojiufu via Gcc-patches

On 2021-06-09 19:18, guojiufu wrote:

On 2021-06-09 17:42, guojiufu via Gcc-patches wrote:

On 2021-06-08 18:13, Richard Biener wrote:

On Fri, 4 Jun 2021, Jiufu Guo wrote:


cut...

cut...


Here is the updated patch, thanks for your time!


Updates:
. Enhance the code to support a negative step.
. Check that the step is +-1 to make sure it hits the loop condition !=.
. Enhance the runtime cases to check more boundary cases and run-order cases.
. Refine for compile time: check the loop's number of insns and can_copy_bbs_p later.
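
As a rough illustration of why a `!=` exit condition on a counter that may wrap can be handled as separate non-wrapping pieces, here is a hand-written sketch. It is not the output of the pass and makes no claim about how the patch performs the split; `count_ne` and `count_split` are hypothetical names.

```c
/* Original form: the exit test is !=, so the 8-bit counter may wrap
   around before it reaches n.  */
unsigned
count_ne (unsigned char l, unsigned char n)
{
  unsigned iters = 0;
  while (++l != n)
    iters++;
  return iters;
}

/* Split form: every loop uses < on a counter that cannot wrap.  */
unsigned
count_split (unsigned char l, unsigned char n)
{
  unsigned iters = 0;
  unsigned i = l;   /* wider than unsigned char, so no wrap */

  if (l < n)
    {
      /* No wrap: the counter visits l+1 .. n-1.  */
      while (++i < n)
        iters++;
      return iters;
    }

  /* Wrap: the counter first visits l+1 .. 255, then 0 .. n-1.  */
  while (++i < 256)
    iters++;
  for (i = 0; i < n; i++)
    iters++;
  return iters;
}
```

Both functions return the same iteration count for every pair of 8-bit inputs, including the boundary cases l == n and n == 0 that the runtime test exercises.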




diff --git a/gcc/testsuite/gcc.dg/loop-split1.c b/gcc/testsuite/gcc.dg/loop-split1.c
new file mode 100644
index 000..dd2d03a7b96
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-split1.c
@@ -0,0 +1,101 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
+
+void
+foo (int *a, int *b, unsigned l, unsigned n)
+{
+  while (++l != n)
+a[l] = b[l] + 1;
+}
+void
+foo_1 (int *a, int *b, unsigned n)
+{
+  unsigned l = 0;
+  while (++l != n)
+a[l] = b[l] + 1;
+}
+
+void
+foo1 (int *a, int *b, unsigned l, unsigned n)
+{
+  while (l++ != n)
+a[l] = b[l] + 1;
+}
+
+/* No wrap.  */
+void
+foo1_1 (int *a, int *b, unsigned n)
+{
+  unsigned l = 0;
+  while (l++ != n)
+a[l] = b[l] + 1;
+}
+
+unsigned
+foo2 (char *a, char *b, unsigned l, unsigned n)
+{
+  while (++l != n)
+if (a[l] != b[l])
+  break;
+
+  return l;
+}
+
+unsigned
+foo2_1 (char *a, char *b, unsigned l, unsigned n)
+{
+  l = 0;
+  while (++l != n)
+if (a[l] != b[l])
+  break;
+
+  return l;
+}
+
+unsigned
+foo3 (char *a, char *b, unsigned l, unsigned n)
+{
+  while (l++ != n)
+if (a[l] != b[l])
+  break;
+
+  return l;
+}
+
+/* No wrap.  */
+unsigned
+foo3_1 (char *a, char *b, unsigned l, unsigned n)
+{
+  l = 0;
+  while (l++ != n)
+if (a[l] != b[l])
+  break;
+
+  return l;
+}
+
+void
+bar ();
+void
+foo4 (unsigned n, unsigned i)
+{
+  do
+{
+  if (i == n)
+   return;
+  bar ();
+  ++i;
+}
+  while (1);
+}
+
+unsigned
+find_skip_diff (char *p, char *q, unsigned n, unsigned i)
+{
+  while (p[i] == q[i] && ++i != n)
+p++, q++;
+
+  return i;
+}
+
+/* { dg-final { scan-tree-dump-times "Loop split" 8 "lsplit" } } */
diff --git a/gcc/testsuite/gcc.dg/loop-split2.c b/gcc/testsuite/gcc.dg/loop-split2.c
new file mode 100644
index 000..56377e2f2f5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-split2.c
@@ -0,0 +1,155 @@
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+
+extern void
+abort (void);
+extern void
+exit (int);
+void
+push (int);
+
+#define NI __attribute__ ((noinline))
+
+void NI
+foo (int *a, int *b, unsigned char l, unsigned char n)
+{
+  while (++l != n)
+a[l] = b[l] + 1;
+}
+
+unsigned NI
+bar (int *a, int *b, unsigned char l, unsigned char n)
+{
+  while (l++ != n)
+{
+  push (l);
+  if (a[l] != b[l])
+   break;
+  push (l + 1);
+}
+  return l;
+}
+
+void NI
+foo_1 (int *a, int *b, unsigned char l, unsigned char n)
+{
+  while (--l != n)
+a[l] = b[l] + 1;
+}
+
+unsigned NI
+bar_1 (int *a, int *b, unsigned char l, unsigned char n)
+{
+  while (l-- != n)
+{
+  push (l);
+  if (a[l] != b[l])
+   break;
+  push (l + 1);
+}
+
+  return l;
+}
+
+int a[258];
+int b[258];
+int c[1024];
+static int top = 0;
+void
+push (int e)
+{
+  c[top++] = e;
+}
+
+void
+reset ()
+{
+  top = 0;
+  __builtin_memset (c, 0, sizeof (c));
+}
+
+#define check(a, b) (a == b)
+
+int
+check_c (int *c, int a0, int a1, int a2, int a3, int a4, int a5)
+{
+  return check (c[0], a0) && check (c[1], a1) && check (c[2], a2)
+&& check (c[3], a3) && check (c[4], a4) && check (c[5], a5);
+}
+
+int
+main ()
+{
+  __builtin_memcpy (b, a, sizeof (a));
+  reset ();
+  if (bar (a, b, 6, 8) != 9 || !check_c (c, 7, 8, 8, 9, 0, 0))
+abort ();
+
+  reset ();
+  if (bar (a, b, 5, 3) != 4 || !check_c (c, 6, 7, 7, 8, 8, 9)
+  || !check_c (c + 496, 254, 255, 255, 256, 0, 1))
+abort ();
+
+  reset ();
+  if (bar (a, b, 6, 6) != 7 || !check_c (c, 0, 0, 0, 0, 0, 0))
+abort ();
+
+  reset ();
+  if (bar (a, b, 253, 255) != 0 || !check_c (c, 254, 255, 255, 256, 0, 0))
+abort ();
+
+  reset ();
+  if (bar (a, b, 253, 0) != 1 || !check_c (c, 254, 255, 255, 256, 0, 1))
+abort ();
+
+  reset ();
+  if (bar_1 (a, b, 6, 8) != 7 || !check_c (c, 5, 6, 4, 5, 3, 4))
+abort ();
+
+  reset ();
+  if (bar_1 (a, b, 5, 3) != 2 || !check_c (c, 4, 5, 3, 4, 0, 0))
+abort ();
+
+  reset ();
+  if (bar_1 (a, b, 6, 6) != 5)
+abort ();
+
+  reset ();
+  if (bar_1 (a, b, 2, 255) != 254 || !check_c (c, 1, 2, 0, 1, 255, 256))
+abort ();
+
+  reset ();
+  if (bar_1 (a, b, 2, 0) != 255 || !check_c (c, 1, 2, 0, 1, 0, 0))
+abort ();
+
+  b[100] += 1;
+  reset ();
+  if (bar (a, b, 90, 110) != 100)
+abort ();
+
+  reset ();
+  if (bar (a, b, 110, 105) != 100)
+abort ();
+
+  reset ();
+  if (bar_1 (a, b, 90, 110) != 109)
+abort ();
+
+  reset ();
+  if (bar_1 (a, b, 2, 90) != 100)
+ 

Re: [PATCH 1/7] Reset the range info on the moved instruction in PHIOPT

2021-06-20 Thread Richard Biener via Gcc-patches
On Sat, Jun 19, 2021 at 9:48 PM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> I had missed this when I wrote the patch which allowed the
> gimple to be moved from inside the conditional as is.  It
> was also missed in the review.  Anyways the range information
> needs to be reset for the moved gimple as it was under a
> conditional and the flow has changed to be unconditional.
> I have not seen any testcase in the wild that produces wrong code
> yet which is why there is no testcase but this is similar to what
> the other code in phiopt does so after moving those to match, there
> might be some.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Richard.

> gcc/ChangeLog:
>
> * tree-ssa-phiopt.c (match_simplify_replacement): Reset
> flow sensitive info on the moved ssa set.
> ---
>  gcc/tree-ssa-phiopt.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index 02e26f974a5..24cbce9955a 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -836,7 +836,7 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb,
>if (!is_gimple_assign (stmt_to_move))
> return false;
>
> -  tree lhs = gimple_assign_lhs  (stmt_to_move);
> +  tree lhs = gimple_assign_lhs (stmt_to_move);
>gimple *use_stmt;
>use_operand_p use_p;
>
> @@ -892,6 +892,7 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb,
> }
>gimple_stmt_iterator gsi1 = gsi_for_stmt (stmt_to_move);
>gsi_move_before (&gsi1, &gsi);
> +  reset_flow_sensitive_info (gimple_assign_lhs (stmt_to_move));
>  }
>if (seq)
>  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
> --
> 2.27.0
>


Re: [PATCH 2/7] Duplicate the range information of the phi onto the new ssa_name

2021-06-20 Thread Richard Biener via Gcc-patches
On Sat, Jun 19, 2021 at 9:49 PM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> Since match_simplify_replacement uses gimple_simplify, there is a new
> ssa name created sometimes and then we go and replace the phi edge with
> this new ssa name, the range information on the phi is lost.
> I don't have a testcase right now where we lose the range information
> though but it does show up when enhancing match.pd to handle
> some min/max patterns and g++.dg/warn/Wstringop-overflow-1.C starts
> to fail.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
> * tree-ssa-phiopt.c (match_simplify_replacement): Duplicate range
> info if we're the only things setting the target PHI.
> ---
>  gcc/tree-ssa-phiopt.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index 24cbce9955a..feb8ca8d0d1 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -894,6 +894,14 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb,
>gsi_move_before (&gsi1, &gsi);
>reset_flow_sensitive_info (gimple_assign_lhs (stmt_to_move));
>  }
> +  /* Duplicate range info if we're the only things setting the target PHI.  */
> +  tree phi_result = PHI_RESULT (phi);
> +  if (!gimple_seq_empty_p (seq)
> +  && EDGE_COUNT (gimple_bb (phi)->preds) == 2
> +  && !POINTER_TYPE_P (TREE_TYPE (phi_result))

Please use INTEGRAL_TYPE_P (...)

> +  && SSA_NAME_RANGE_INFO (phi_result)

&& !SSA_NAME_RANGE_INFO (result)

?  Why conditional on !gimple_seq_empty_p (seq)?

It looks like we could do this trick (actually in both directions,
wherever the range
info is missing?) in replace_phi_edge_with_variable instead?

Thanks,
Richard.

)
> +duplicate_ssa_name_range_info (result, SSA_NAME_RANGE_TYPE (phi_result),
> +  SSA_NAME_RANGE_INFO (phi_result));
>if (seq)
>  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
>
> --
> 2.27.0
>


Re: [PATCH 3/7] Try inverted comparison for match_simplify in phiopt

2021-06-20 Thread Richard Biener via Gcc-patches
On Sat, Jun 19, 2021 at 10:51 PM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> Since match and simplify does not have all of the inverted
> comparison patterns, it makes sense to just have
> phi-opt try to do the inversion and try match and simplify again.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.

Richard.

> Thanks,
> Andrew Pinski
>
> gcc/ChangeLog:
>
> * tree-ssa-phiopt.c (match_simplify_replacement):
> If "A ? B : C" fails to simplify, try "(!A) ? C : B".
> ---
>  gcc/tree-ssa-phiopt.c | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index feb8ca8d0d1..3b3762a668b 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -879,7 +879,26 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb,
> arg0, arg1,
> &seq, NULL);
>if (!result)
> -return false;
> +{
> +  /* Try !A ? arg1 : arg0 instead.
> +Not all match patterns support inverted comparisons.  */
> +  enum tree_code comp_code = gimple_cond_code (stmt);
> +  tree cmp0 = gimple_cond_lhs (stmt);
> +  tree cmp1 = gimple_cond_rhs (stmt);
> +  comp_code = invert_tree_comparison (comp_code, HONOR_NANS (cmp0));
> +  if (comp_code != ERROR_MARK)
> +   {
> + cond = build2_loc (gimple_location (stmt),
> +comp_code, boolean_type_node,
> +cmp0, cmp1);
> + result = gimple_simplify (COND_EXPR, type,
> +   cond,
> +   arg1, arg0,
> +   &seq, NULL);
> +   }
> +  if (!result)
> +   return false;
> +}
>
>gsi = gsi_last_bb (cond_bb);
>if (stmt_to_move)
> --
> 2.27.0
>
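
The rewrite from "A ? B : C" to "(!A) ? C : B" can be checked on a small integer example. This is only a sketch of the identity the patch relies on, written by hand; `select_orig` and `select_inverted` are hypothetical names and do not appear in tree-ssa-phiopt.c.

```c
/* A ? B : C with A being "a < b".  */
int
select_orig (int a, int b, int x, int y)
{
  return a < b ? x : y;
}

/* (!A) ? C : B: the comparison is inverted (< becomes >=) and the two
   arms are swapped, so the selected value is unchanged.  */
int
select_inverted (int a, int b, int x, int y)
{
  return a >= b ? y : x;
}
```

For integer operands the two functions agree on every input, which is why retrying the simplification with the inverted comparison and swapped arms does not change the meaning of the PHI.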


Re: [PATCH 4/7] Expand the comparison argument of fold_cond_expr_with_comparison

2021-06-20 Thread Richard Biener via Gcc-patches
On Sat, Jun 19, 2021 at 11:30 PM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> To make things slightly easier to convert fold_cond_expr_with_comparison
> over to match.pd, expanding the arg0 argument into 3 different arguments
> is done. Also this was simple because we don't use arg0 after grabbing
> the code and the two operands.
> Also since we do this, we don't need to fold the comparison to
> get the inverse but just use invert_tree_comparison directly.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Probably fold_invert_truthvalue is not 100% equivalent to
just invert_tree_comparison but definitely simple inverting was the intent.

Thus OK.

Richard.
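
One place where inverting a comparison needs care is floating point with NaNs, which is presumably why the patch passes HONOR_NANS to invert_tree_comparison. The following is a plain C sketch of the underlying arithmetic fact only, not of the GCC internals; `not_less` and `greater_equal` are hypothetical names.

```c
#include <math.h>
#include <stdbool.h>

/* The logical negation of a < b.  */
bool
not_less (double a, double b)
{
  return !(a < b);
}

/* The "obvious" inverse comparison, which is only equivalent when
   neither operand is a NaN.  */
bool
greater_equal (double a, double b)
{
  return a >= b;
}
```

When either operand is a NaN, both a < b and a >= b are false, so `not_less` returns true while `greater_equal` returns false. For ordinary values the two agree.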

> gcc/ChangeLog:
>
> * fold-const.c (fold_cond_expr_with_comparison):
> Expand arg0 into comp_code, arg00, and arg01.
> (fold_ternary_loc): Use invert_tree_comparison
> instead of fold_invert_truthvalue for the case
> where we have A CMP B ? C : A.
> ---
>  gcc/fold-const.c | 39 ++-
>  1 file changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 95673d2..85e90f4 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -126,7 +126,8 @@ static tree range_binop (enum tree_code, tree, tree, int, tree, int);
>  static tree range_predecessor (tree);
>  static tree range_successor (tree);
>  static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
> -static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree);
> +static tree fold_cond_expr_with_comparison (location_t, tree, enum tree_code,
> +   tree, tree, tree, tree);
>  static tree unextend (tree, int, int, tree);
>  static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
>  static tree extract_muldiv_1 (tree, tree, enum tree_code, tree, bool *);
> @@ -5735,20 +5736,19 @@ merge_ranges (int *pin_p, tree *plow, tree *phigh, int in0_p, tree low0,
>
>
>  /* Subroutine of fold, looking inside expressions of the form
> -   A op B ? A : C, where ARG0, ARG1 and ARG2 are the three operands
> -   of the COND_EXPR.  This function is being used also to optimize
> -   A op B ? C : A, by reversing the comparison first.
> +   A op B ? A : C, where (ARG00, COMP_CODE, ARG01), ARG1 and ARG2
> +   are the three operands of the COND_EXPR.  This function is
> +   being used also to optimize A op B ? C : A, by reversing the
> +   comparison first.
>
> Return a folded expression whose code is not a COND_EXPR
> anymore, or NULL_TREE if no folding opportunity is found.  */
>
>  static tree
>  fold_cond_expr_with_comparison (location_t loc, tree type,
> -   tree arg0, tree arg1, tree arg2)
> +   enum tree_code comp_code,
> +   tree arg00, tree arg01, tree arg1, tree arg2)
>  {
> -  enum tree_code comp_code = TREE_CODE (arg0);
> -  tree arg00 = TREE_OPERAND (arg0, 0);
> -  tree arg01 = TREE_OPERAND (arg0, 1);
>tree arg1_type = TREE_TYPE (arg1);
>tree tem;
>
> @@ -12822,7 +12822,10 @@ fold_ternary_loc (location_t loc, enum tree_code code, tree type,
>   && operand_equal_for_comparison_p (TREE_OPERAND (arg0, 0), op1)
>   && !HONOR_SIGNED_ZEROS (element_mode (op1)))
> {
> - tem = fold_cond_expr_with_comparison (loc, type, arg0, op1, op2);
> + tem = fold_cond_expr_with_comparison (loc, type, TREE_CODE (arg0),
> +   TREE_OPERAND (arg0, 0),
> +   TREE_OPERAND (arg0, 1),
> +   op1, op2);
>   if (tem)
> return tem;
> }
> @@ -12831,14 +12834,16 @@ fold_ternary_loc (location_t loc, enum tree_code code, tree type,
>   && operand_equal_for_comparison_p (TREE_OPERAND (arg0, 0), op2)
>   && !HONOR_SIGNED_ZEROS (element_mode (op2)))
> {
> - location_t loc0 = expr_location_or (arg0, loc);
> - tem = fold_invert_truthvalue (loc0, arg0);
> - if (tem && COMPARISON_CLASS_P (tem))
> -   {
> - tem = fold_cond_expr_with_comparison (loc, type, tem, op2, op1);
> - if (tem)
> -   return tem;
> -   }
> + enum tree_code comp_code = TREE_CODE (arg0);
> + tree arg00 = TREE_OPERAND (arg0, 0);
> + tree arg01 = TREE_OPERAND (arg0, 1);
> + comp_code = invert_tree_comparison (comp_code, HONOR_NANS (arg00));
> + tem = fold_cond_expr_with_comparison (loc, type, comp_code,
> +   arg00,
> +   arg01,
> +   op2, op1);
> + if (tem)
> +   return tem;
> }
>
>/* If the second operand is simpler than the third,