[PATCH 2/7 v2] lto: Remove random_seed from section name.

2024-01-09 Thread Michal Jires
This patch removes suffixes from section names during LTO linking.

These suffixes were originally added for ld -r to work (PR lto/44992).
They were added to all LTO object files, but are only useful before WPA.
After that they waste space, and if kept random, make LTO caching impossible.
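
For illustration only, the naming rule can be sketched roughly as below. This
is a simplified model, not the GCC code: make_section_name, the phase enum, and
the hard-coded ".gnu.lto_" prefix are hypothetical stand-ins for
lto_get_section_name, flag_wpa/flag_ltrans, and section_name_prefix, and the
special case for the options section is left out.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the flag_wpa / flag_ltrans globals.
enum class phase { compile, wpa, ltrans };

// Builds "<prefix><name>[.<hex id>]".  FILE_ID models f->id and may be
// null; RANDOM_SEED models get_random_seed (false).
std::string
make_section_name (const std::string &name, phase p,
                   const std::uint64_t *file_id, std::uint64_t random_seed)
{
  std::string res = ".gnu.lto_" + name;   // Assumed prefix.
  char post[32] = "";

  if (p == phase::ltrans)
    ;   // LTRANS files never reach external tools: no suffix.
  else if (file_id != nullptr)
    std::snprintf (post, sizeof post, ".%llx",
                   (unsigned long long) *file_id);
  else if (p == phase::wpa)
    ;   // A random suffix here would defeat caching.
  else
    std::snprintf (post, sizeof post, ".%llx",
                   (unsigned long long) random_seed);

  return res + post;
}
```

Only the last branch depends on the random seed, which is why names produced
during WPA and LTRANS become reproducible between runs.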

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* lto-streamer.cc (lto_get_section_name): Remove suffixes after WPA.

gcc/lto/ChangeLog:

* lto-common.cc (lto_section_with_id): Don't load suffix during LTRANS.
---
 gcc/lto-streamer.cc   | 11 +--
 gcc/lto/lto-common.cc |  7 +++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/gcc/lto-streamer.cc b/gcc/lto-streamer.cc
index 8032bbf7108..61b5f8ed4dc 100644
--- a/gcc/lto-streamer.cc
+++ b/gcc/lto-streamer.cc
@@ -132,11 +132,18 @@ lto_get_section_name (int section_type, const char *name,
  doesn't confuse the reader with merged sections.
 
  For options don't add a ID, the option reader cannot deal with them
- and merging should be ok here. */
-  if (section_type == LTO_section_opts)
+ and merging should be ok here.
+
+ LTRANS files (output of wpa, input and output of ltrans) are handled
+ directly inside of linker/lto-wrapper, so name uniqueness for external
+ tools is not needed.
+ Randomness would inhibit incremental LTO.  */
+  if (section_type == LTO_section_opts || flag_ltrans)
 strcpy (post, "");
   else if (f != NULL) 
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
+  else if (flag_wpa)
+strcpy (post, "");
   else
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
   char *res = concat (section_name_prefix, sep, add, post, NULL);
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 11e7d63f1be..44aeeddf46f 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -2174,6 +2174,13 @@ lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
 
   if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
 return 0;
+
+  if (flag_ltrans)
+{
+  *id = 0;
+  return 1;
+}
+
   s = strrchr (name, '.');
   if (!s)
 return 0;
-- 
2.43.0



Re: [PATCH 3/7] Lockfile.

2024-01-09 Thread Michal Jires
Hi,
> You do not implement GCOV_LINKED_WITH_LOCKING patch, does locking work
> with mingw? Or we only build gcc with cygwin emulation layer these days?

I tried to test a _locking implementation with both mingw and msys2; in both
cases fcntl was present and _locking was not. Admittedly, I was unable to
finish a bootstrap without errors, so I might have been doing something wrong.

So I didn't include a _locking implementation, because I was unable to test it
and I am unsure whether we even have a supported host that would require it.

Michal


[COMMITTED] Add myself to write after approval

2023-11-16 Thread Michal Jires
ChangeLog:

* MAINTAINERS: Add myself.
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c43167d9a75..f0112f5d029 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -486,6 +486,7 @@ Fariborz Jahanian   

 Surya Kumari Jangala   
 Haochen Jiang  
 Qian Jianhua   
+Michal Jires   
 Janis Johnson  
 Teresa Johnson 
 Kean Johnston  
@@ -753,6 +754,7 @@ information.
 
 Robin Dapp 
 Robin Dapp 
+Michal Jires   
 Matthias Kretz 
 Tim Lange  
 Jeff Law   
-- 
2.42.1



[PATCH 0/7] lto: Incremental LTO.

2023-11-17 Thread Michal Jires
Hi,
these patches implement Incremental LTO, specifically by caching results of
the ltrans phase. Secondarily, these patches contain changes that reduce
divergence of ltrans partitions so that they can be cached.

The aim is to reduce compile times for quick edit-compile cycles while using
LTO. Even with these minimal changes to the rest of GCC it works surprisingly
well. In current testing, self-compiling cc1 with individual commits used as
incremental changes, on average only ~1/3 of partitions need to be recompiled
with `-O2 -g0` and ~1/2 with `-O2 -g`, which directly reduces the time spent
in the ltrans phase of LTO.

Unfortunately, larger gains are a bit fragile. You may remember that during my
Cauldron talk I claimed a reduction to ~1/6 and ~1/3 recompilations. That was
achieved with a branch from March. Since then there have been at least two
commits that introduced new divergence of partitions, though they seem fixable
in the future.


[PATCH 1/7] lto: Skip flag OPT_fltrans_output_list_.

2023-11-17 Thread Michal Jires
Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* lto-opts.cc (lto_write_options): Skip OPT_fltrans_output_list_.
---
 gcc/lto-opts.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/lto-opts.cc b/gcc/lto-opts.cc
index c9bee9d4197..0451e290c75 100644
--- a/gcc/lto-opts.cc
+++ b/gcc/lto-opts.cc
@@ -152,6 +152,7 @@ lto_write_options (void)
case OPT_fprofile_prefix_map_:
case OPT_fcanon_prefix_map:
case OPT_fwhole_program:
+   case OPT_fltrans_output_list_:
  continue;
 
default:
-- 
2.42.1



[PATCH 2/7] lto: Remove random_seed from section name.

2023-11-17 Thread Michal Jires
Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* lto-streamer.cc (lto_get_section_name): Remove random_seed in WPA.
---
 gcc/lto-streamer.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-streamer.cc b/gcc/lto-streamer.cc
index 4968fd13413..53275e32618 100644
--- a/gcc/lto-streamer.cc
+++ b/gcc/lto-streamer.cc
@@ -132,11 +132,17 @@ lto_get_section_name (int section_type, const char *name,
  doesn't confuse the reader with merged sections.
 
  For options don't add a ID, the option reader cannot deal with them
- and merging should be ok here. */
+ and merging should be ok here.
+
+ WPA output is sent to LTRANS directly inside of lto-wrapper, so name
+ uniqueness for external tools is not needed.
+ Randomness would inhibit incremental LTO.  */
   if (section_type == LTO_section_opts)
 strcpy (post, "");
   else if (f != NULL) 
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
+  else if (flag_wpa)
+strcpy (post, ".0");
   else
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
   char *res = concat (section_name_prefix, sep, add, post, NULL);
-- 
2.42.1



[PATCH 4/7] lto: Implement ltrans cache

2023-11-17 Thread Michal Jires
This patch implements Incremental LTO as ltrans cache.

The cache is active when directory $GCC_LTRANS_CACHE is specified and exists.
Stored are pairs of ltrans input/output files and input file hash.
File locking is used to allow multiple GCC instances to use the same cache.
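
As a rough mental model of the lookup (a toy sketch, not the code in this
patch): strings stand in for file contents, std::hash stands in for the MD5
checksum, and toy_ltrans_cache is a hypothetical name. A checksum match is
confirmed with a full comparison, in the same spirit as files_identical.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy in-memory model of an ltrans cache keyed by input checksum.
class toy_ltrans_cache
{
public:
  // Returns the cached output for INPUT, or nullptr on a miss.
  const std::string *
  lookup (const std::string &input) const
  {
    auto range = items.equal_range (std::hash<std::string> () (input));
    for (auto it = range.first; it != range.second; ++it)
      // Checksums can collide, so confirm with a full comparison.
      if (it->second.first == input)
        return &it->second.second;
    return nullptr;
  }

  // Records that compiling INPUT produced OUTPUT.
  void
  store (const std::string &input, const std::string &output)
  {
    items.emplace (std::hash<std::string> () (input),
                   std::make_pair (input, output));
  }

private:
  std::multimap<std::size_t, std::pair<std::string, std::string>> items;
};
```

On a hit the ltrans compilation of that partition can be skipped and the
stored output reused; on a miss the partition is compiled and then stored.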

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* Makefile.in: Add lto-ltrans-cache.o.
* lto-wrapper.cc: Use ltrans cache.
* lto-ltrans-cache.cc: New file.
* lto-ltrans-cache.h: New file.
---
 gcc/Makefile.in |   5 +-
 gcc/lto-ltrans-cache.cc | 407 
 gcc/lto-ltrans-cache.h  | 164 
 gcc/lto-wrapper.cc  | 150 +--
 4 files changed, 711 insertions(+), 15 deletions(-)
 create mode 100644 gcc/lto-ltrans-cache.cc
 create mode 100644 gcc/lto-ltrans-cache.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2c527245c81..495e5f3d069 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1831,7 +1831,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) $(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
   $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
-  lto-wrapper.o collect-utils.o lockfile.o
+  lto-wrapper.o collect-utils.o lockfile.o lto-ltrans-cache.o
 
 # for anything that is shared use the cc1plus profile data, as that
 # is likely the most exercised during the build
@@ -2359,7 +2359,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
@TARGET_SYSTEM_ROOT_DEFINE@
 
-LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
+LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o \
+  lto-ltrans-cache.o
 
 lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
diff --git a/gcc/lto-ltrans-cache.cc b/gcc/lto-ltrans-cache.cc
new file mode 100644
index 000..0d43e548fb3
--- /dev/null
+++ b/gcc/lto-ltrans-cache.cc
@@ -0,0 +1,407 @@
+/* File caching.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "md5.h"
+#include "lto-ltrans-cache.h"
+
+#include 
+#include 
+#include 
+
+const md5_checksum_t INVALID_CHECKSUM = {
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+};
+
+/* Computes checksum for given file, returns INVALID_CHECKSUM if not possible.
+ */
+static md5_checksum_t
+file_checksum (char const *filename)
+{
+  FILE *file = fopen (filename, "rb");
+
+  if (!file)
+return INVALID_CHECKSUM;
+
+  md5_checksum_t result;
+
+  int ret = md5_stream (file, &result);
+
+  if (ret)
+result = INVALID_CHECKSUM;
+
+  fclose (file);
+
+  return result;
+}
+
+/* Checks identity of two files byte by byte.  */
+static bool
+files_identical (char const *first_filename, char const *second_filename)
+{
+  FILE *f_first = fopen (first_filename, "rb");
+  if (!f_first)
+return false;
+
+  FILE *f_second = fopen (second_filename, "rb");
+  if (!f_second)
+{
+  fclose (f_first);
+  return false;
+}
+
+  bool ret = true;
+
+  for (;;)
+{
+  int c1, c2;
+  c1 = fgetc (f_first);
+  c2 = fgetc (f_second);
+
+  if (c1 != c2)
+   {
+ ret = false;
+ break;
+   }
+
+  if (c1 == EOF)
+   break;
+}
+
+  fclose (f_first);
+  fclose (f_second);
+  return ret;
+}
+
+/* Constructor of cache item.  */
+ltrans_file_cache::item::item (std::string input, std::string output,
+  md5_checksum_t input_checksum, uint32_t last_used):
+  input (std::move (input)), output (std::move (output)),
+  input_checksum (input_checksum), last_used (last_used)
+{
+  lock = lockfile (this->input + ".lock");
+}
+/* Destructor of cache item.  */
+ltrans_file_cache::item::~item ()
+{
+  lock.unlock ();
+}
+
+/* Reads next cache item from cachedata file.
+   Adds `dir/` prefix to filenames.  */
+static ltrans_file_cache::item*
+read_cache_item (FILE* f, const char* dir)
+{
+  md5_checksum_t checksum;
+  uint32_t last_used;
+
+  if (fread (&checksum, 1, checksum.size (), f) != checksum.size ())
+return NULL;
+  if (fread (&last_used, sizeof (last_used), 1, f) != 1)
+return NULL;
+
+  std::vector input

[PATCH 3/7] Lockfile.

2023-11-17 Thread Michal Jires
This patch implements lockfile used for incremental LTO.
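
For readers unfamiliar with fcntl record locks, here is a minimal POSIX-only
sketch of a blocking whole-file write lock. It is not the lockfile class from
this patch: scoped_write_lock is a hypothetical name, and the RAII shape is an
assumption made to keep the example short.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Holds an exclusive fcntl lock on PATH for the lifetime of the object.
class scoped_write_lock
{
public:
  explicit scoped_write_lock (const std::string &path)
    : fd (open (path.c_str (), O_RDWR | O_CREAT, 0666))
  {
    if (fd < 0)
      return;

    struct flock fl = {};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;   // Zero length means "to end of file".

    // Block until the lock is granted; retry if a signal interrupts us.
    int ret;
    do
      ret = fcntl (fd, F_SETLKW, &fl);
    while (ret == -1 && errno == EINTR);

    if (ret == -1)
      {
        close (fd);
        fd = -1;
      }
  }

  ~scoped_write_lock ()
  {
    // Closing the descriptor releases the lock held through it.
    if (fd >= 0)
      close (fd);
  }

  scoped_write_lock (const scoped_write_lock &) = delete;
  scoped_write_lock &operator= (const scoped_write_lock &) = delete;

  bool locked () const { return fd >= 0; }

private:
  int fd;
};
```

Note that fcntl locks are owned by the process, so they arbitrate between
separate GCC instances rather than between threads of one process.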

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* Makefile.in: Add lockfile.o.
* lockfile.cc: New file.
* lockfile.h: New file.
---
 gcc/Makefile.in |   5 +-
 gcc/lockfile.cc | 136 
 gcc/lockfile.h  |  85 ++
 3 files changed, 224 insertions(+), 2 deletions(-)
 create mode 100644 gcc/lockfile.cc
 create mode 100644 gcc/lockfile.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7b7a4ff789a..2c527245c81 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1831,7 +1831,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) $(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
   $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
-  lto-wrapper.o collect-utils.o
+  lto-wrapper.o collect-utils.o lockfile.o
 
 # for anything that is shared use the cc1plus profile data, as that
 # is likely the most exercised during the build
@@ -2359,7 +2359,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
@TARGET_SYSTEM_ROOT_DEFINE@
 
-LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o
+LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
+
 lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
   $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBS)
diff --git a/gcc/lockfile.cc b/gcc/lockfile.cc
new file mode 100644
index 000..9440e8938f3
--- /dev/null
+++ b/gcc/lockfile.cc
@@ -0,0 +1,136 @@
+/* File locking.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+
+#include "lockfile.h"
+
+
+/* Unique write lock.  No other lock can be held on this lockfile.
+   Blocking call.  */
+int
+lockfile::lock_write ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_WRLCK;
+
+  while (fcntl (fd, F_SETLKW, &s_flock) && errno == EINTR)
+continue;
+#endif
+  return 0;
+}
+
+/* Unique write lock.  No other lock can be held on this lockfile.
+   Only locks if this filelock is not locked by any other process.
+   Return whether locking was successful.  */
+int
+lockfile::try_lock_write ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_WRLCK;
+
+  if (fcntl (fd, F_SETLK, &s_flock) == -1)
+{
+  close (fd);
+  fd = -1;
+  return 1;
+}
+#endif
+  return 0;
+}
+
+/* Shared read lock.  Only read lock can be held concurrently.
+   If write lock is already held by this process, it will be
+   changed to read lock.
+   Blocking call.  */
+int
+lockfile::lock_read ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_RDLCK;
+
+  while (fcntl (fd, F_SETLKW, &s_flock) && errno == EINTR)
+continue;
+#endif
+  return 0;
+}
+
+/* Unlock all previously placed locks.  */
+void
+lockfile::unlock ()
+{
+  if (fd >= 0)
+{
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_UNLCK;
+
+  fcntl (fd, F_SETLK, &s_flock);
+#endif
+  close (fd);
+  fd = -1;
+}
+}
+
+/* Are lockfiles supported?  */
+bool
+lockfile::lockfile_supported ()
+{
+#if HAVE_FCNTL_H
+  return true;
+#else
+  return false;
+#endif
+}
diff --git a/gcc/lockfile.h b/gcc/lockfile.h
new file mode 100644
index 000..afcbaf599c1
--- /dev/null
+++ b/gcc/lockfile.h
@@ -0,0 +1,85 @@
+/* File locking.
+   Copyright (C) 2

[PATCH 5/7] lto: Implement cache partitioning

2023-11-17 Thread Michal Jires
This patch implements a new cache partitioning. It tries to keep symbols
from a single source file together to minimize propagation of divergence.

It starts with symbols already grouped by source files. Where reasonably
possible, it either combines several files into one final partition or,
if a file is large, splits that file into several final partitions.

The intermediate representation is partition_set, which contains a set of
groups of symbols (each group corresponding to an original source file) and
the number of final partitions this partition_set should be split into.

First, partition_fixed_split splits the partition_set into a constant number
of partition_sets with nearly equal numbers of symbol groups. If for example
there are 39 source files, the resulting partition_sets will contain 10, 10,
10, and 9 source files. This splitting intentionally ignores estimated
instruction counts to minimize propagation of divergence.
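
The arithmetic of the 39-file example can be reproduced with a few lines. This
is only a sketch of the size computation; fixed_split_sizes is a hypothetical
helper, not a function from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits N_GROUPS source-file groups into N_SETS sets whose sizes differ
// by at most one; the larger sets come first.
std::vector<std::size_t>
fixed_split_sizes (std::size_t n_groups, std::size_t n_sets)
{
  assert (n_sets > 0);
  std::vector<std::size_t> sizes (n_sets, n_groups / n_sets);
  // Hand out the remainder one group at a time.
  for (std::size_t i = 0; i < n_groups % n_sets; ++i)
    ++sizes[i];
  return sizes;
}
```

Calling fixed_split_sizes (39, 4) yields 10, 10, 10, 9. Because the sizes
depend only on the number of files, editing a function body does not move
other files to a different set.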

Second, partition_over_target_split separates files that are too large and
splits them into individual symbols, to be combined back into several smaller
files in the next step.

Third, partition_binary_split repeatedly splits a partition_set into two
halves until it should be split into only one final partition, at which point
the remaining symbols are joined into one final partition.
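
The recursive halving can be pictured with the sketch below. It is an
assumption-laden simplification: groups are plain integers, the halves are
chosen by group count only (the patch may weigh them differently), and
binary_split and group_list are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

using group_list = std::vector<int>;   // One id per source-file group.

// Recursively halves GROUPS[FIRST, LAST) until a piece is meant to become
// a single final partition, then joins whatever is left in that piece.
void
binary_split (const group_list &groups, std::size_t first, std::size_t last,
              std::size_t n_final, std::vector<group_list> &out)
{
  if (n_final <= 1)
    {
      out.emplace_back (groups.begin () + first, groups.begin () + last);
      return;
    }

  std::size_t left_final = n_final / 2;
  // Give the left half a share of groups proportional to its partitions.
  std::size_t mid = first + (last - first) * left_final / n_final;

  binary_split (groups, first, mid, left_final, out);
  binary_split (groups, mid, last, n_final - left_final, out);
}
```

With ten groups and four final partitions this produces pieces of 2, 3, 2,
and 3 groups, in the original order.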

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* common.opt: Add cache partitioning.
* flag-types.h (enum lto_partition_model): Likewise.

gcc/lto/ChangeLog:

* lto-partition.cc (new_partition): Use new_partition_no_push.
(new_partition_no_push): New.
(free_ltrans_partition): New.
(free_ltrans_partitions): Use free_ltrans_partition.
(join_partitions): New.
(split_partition_into_nodes): New.
(is_partition_reorder): New.
(class partition_set): New.
(distribute_n_partitions): New.
(partition_over_target_split): New.
(partition_binary_split): New.
(partition_fixed_split): New.
(class partitioner_base): New.
(class partitioner_default): New.
(lto_cache_map): New.
* lto-partition.h (lto_cache_map): New.
* lto.cc (do_whole_program_analysis): Use lto_cache_map.

gcc/testsuite/ChangeLog:

* gcc.dg/completion-2.c: Add -flto-partition=cache.
---
 gcc/common.opt  |   3 +
 gcc/flag-types.h|   3 +-
 gcc/lto/lto-partition.cc| 605 +++-
 gcc/lto/lto-partition.h |   1 +
 gcc/lto/lto.cc  |   2 +
 gcc/testsuite/gcc.dg/completion-2.c |   1 +
 6 files changed, 605 insertions(+), 10 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 1cf3bdd3b51..fe5cf3c0a05 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2174,6 +2174,9 @@ Enum(lto_partition_model) String(1to1) Value(LTO_PARTITION_1TO1)
 EnumValue
 Enum(lto_partition_model) String(max) Value(LTO_PARTITION_MAX)
 
+EnumValue
+Enum(lto_partition_model) String(cache) Value(LTO_PARTITION_CACHE)
+
 flto-partition=
 Common Joined RejectNegative Enum(lto_partition_model) Var(flag_lto_partition) Init(LTO_PARTITION_BALANCED)
 Specify the algorithm to partition symbols and vars at linktime.
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index c1852cd810c..59b3c23081b 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -393,7 +393,8 @@ enum lto_partition_model {
   LTO_PARTITION_ONE = 1,
   LTO_PARTITION_BALANCED = 2,
   LTO_PARTITION_1TO1 = 3,
-  LTO_PARTITION_MAX = 4
+  LTO_PARTITION_MAX = 4,
+  LTO_PARTITION_CACHE = 5
 };
 
 /* flag_lto_linker_output initialization values.  */
diff --git a/gcc/lto/lto-partition.cc b/gcc/lto/lto-partition.cc
index e4c91213f4b..eb31ecba0d3 100644
--- a/gcc/lto/lto-partition.cc
+++ b/gcc/lto/lto-partition.cc
@@ -36,6 +36,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "lto-partition.h"
 #include "sreal.h"
 
+#include 
+#include 
+
 vec ltrans_partitions;
 
 static void add_symbol_to_partition (ltrans_partition part, symtab_node *node);
@@ -59,20 +62,41 @@ cmp_partitions_order (const void *a, const void *b)
   return orderb - ordera;
 }
 
-/* Create new partition with name NAME.  */
-
+/* Create new partition with name NAME.
+   Does not push into ltrans_partitions.  */
 static ltrans_partition
-new_partition (const char *name)
+new_partition_no_push (const char *name)
 {
   ltrans_partition part = XCNEW (struct ltrans_partition_def);
   part->encoder = lto_symtab_encoder_new (false);
   part->name = name;
   part->insns = 0;
   part->symbols = 0;
+  return part;
+}
+
+/* Create new partition with name NAME.  */
+
+static ltrans_partition
+new_partition (const char *name)
+{
+  ltrans_partition part = new_partition_no_push (name);
   ltrans_partitions.safe_push (part);
   return part;
 }
 
+/* Free memory used by ltrans partition.
+   Encoder can be kept to be freed after streaming.  */
+static void
+free_ltrans_partition (ltrans_partition part, bool delete_encoder)
+  {
+if (part->init

[PATCH 6/7] lto: squash order of symbols in partitions

2023-11-17 Thread Michal Jires
This patch squashes the order of symbols in individual partitions, so that
their relative order is preserved but is not influenced by symbols in
other partitions.
The order of cloned symbols is set to 0. This should be fine because order
specifies the order of symbols in input files, which cloned symbols are not
part of.

This is important for incremental LTO because a new symbol would otherwise
shift the order of all symbols with a higher order, which would make all of
them diverge.
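
A small sketch of the idea, under the assumption that squashing means mapping
the orders present in one partition to dense ranks. squash_orders is a
hypothetical helper; the real order_remap may start at a different base and
treats clones specially, as described above.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Maps the original symbol orders found in one partition to ranks
// 0, 1, 2, ... so that relative order is kept but gaps caused by symbols
// in other partitions disappear.
std::map<int, int>
squash_orders (std::vector<int> orders)
{
  std::sort (orders.begin (), orders.end ());
  orders.erase (std::unique (orders.begin (), orders.end ()), orders.end ());

  std::map<int, int> remap;
  for (std::size_t i = 0; i < orders.size (); ++i)
    remap[orders[i]] = static_cast<int> (i);
  return remap;
}
```

A partition holding orders 4, 17, and 230 is streamed with 0, 1, and 2. If a
new symbol elsewhere shifts them to 4, 18, and 231, the streamed values stay
the same, so the partition's ltrans input does not change.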

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* lto-cgraph.cc (lto_output_node): Add and use order_remap.
(lto_output_varpool_node): Likewise.
(output_symtab): Likewise.
* lto-streamer-out.cc (produce_asm): Likewise.
(output_function): Likewise.
(output_constructor): Likewise.
(copy_function_or_variable): Likewise.
(cmp_int): New.
(lto_output): Generate order_remap.
* lto-streamer.h (produce_asm): Add order_remap.
(output_symtab): Likewise.
---
 gcc/lto-cgraph.cc   | 20 
 gcc/lto-streamer-out.cc | 71 +
 gcc/lto-streamer.h  |  5 +--
 3 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 32c0f5ac6db..a7530290fba 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -381,7 +381,8 @@ reachable_from_this_partition_p (struct cgraph_node *node, lto_symtab_encoder_t
 
 static void
 lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
-lto_symtab_encoder_t encoder)
+lto_symtab_encoder_t encoder,
+hash_map, int>* order_remap)
 {
   unsigned int tag;
   struct bitpack_d bp;
@@ -405,7 +406,9 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
 
   streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
   tag);
-  streamer_write_hwi_stream (ob->main_stream, node->order);
+
+  int order = flag_wpa ? *order_remap->get (node->order) : node->order;
+  streamer_write_hwi_stream (ob->main_stream, order);
 
   /* In WPA mode, we only output part of the call-graph.  Also, we
  fake cgraph node attributes.  There are two cases that we care.
@@ -585,7 +588,8 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
 
 static void
 lto_output_varpool_node (struct lto_simple_output_block *ob, varpool_node *node,
-lto_symtab_encoder_t encoder)
+lto_symtab_encoder_t encoder,
+hash_map, int>* order_remap)
 {
   bool boundary_p = !lto_symtab_encoder_in_partition_p (encoder, node);
   bool encode_initializer_p
@@ -602,7 +606,8 @@ lto_output_varpool_node (struct lto_simple_output_block *ob, varpool_node *node,
 
   streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
   LTO_symtab_variable);
-  streamer_write_hwi_stream (ob->main_stream, node->order);
+  int order = flag_wpa ? *order_remap->get (node->order) : node->order;
+  streamer_write_hwi_stream (ob->main_stream, order);
   lto_output_var_decl_ref (ob->decl_state, ob->main_stream, node->decl);
   bp = bitpack_create (ob->main_stream);
   bp_pack_value (&bp, node->externally_visible, 1);
@@ -967,7 +972,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
 /* Output the part of the symtab in SET and VSET.  */
 
 void
-output_symtab (void)
+output_symtab (hash_map, int>* order_remap)
 {
   struct cgraph_node *node;
   struct lto_simple_output_block *ob;
@@ -994,9 +999,10 @@ output_symtab (void)
 {
   symtab_node *node = lto_symtab_encoder_deref (encoder, i);
   if (cgraph_node *cnode = dyn_cast  (node))
-lto_output_node (ob, cnode, encoder);
+   lto_output_node (ob, cnode, encoder, order_remap);
   else
-   lto_output_varpool_node (ob, dyn_cast (node), encoder);
+   lto_output_varpool_node (ob, dyn_cast (node), encoder,
+order_remap);
 }
 
   /* Go over the nodes in SET again to write edges.  */
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index a1bbea8fc68..9448ab195d5 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -2212,7 +2212,8 @@ output_cfg (struct output_block *ob, struct function *fn)
a function, set FN to the decl for that function.  */
 
 void
-produce_asm (struct output_block *ob, tree fn)
+produce_asm (struct output_block *ob, tree fn,
+hash_map, int>* order_remap)
 {
   enum lto_section_type section_type = ob->section_type;
   struct lto_function_header header;
@@ -2221,9 +,11 @@ produce_asm (struct output_block *ob, tree fn)
   if (section_type == LTO_section_function_body)
 {
   const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (fn));
-  section_name = lto_get_section_name (section_type, name,
- 

[PATCH 7/7] lto: partition specific lto_clone_numbers

2023-11-17 Thread Michal Jires
Replaces "lto_priv.$clone_number" with
"lto_priv.$partition_hash.$partition_specific_clone_number"
to reduce divergence for incremental LTO.
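
The effect of the new naming scheme can be sketched as follows. clone_namer is
a hypothetical helper that only models the counter bookkeeping; it does not
use GCC's hash_map or clone_function_name.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hands out "<name>.lto_priv.<partition hash>.<n>" with a separate
// counter N for every (name, partition hash) pair.
class clone_namer
{
public:
  std::string
  next (const std::string &name, std::uint64_t partition_hash)
  {
    // operator[] value-initializes a new counter to 0.
    unsigned &n = counters[std::make_pair (name, partition_hash)];
    return name + ".lto_priv." + std::to_string (partition_hash) + "."
           + std::to_string (n++);
  }

private:
  std::map<std::pair<std::string, std::uint64_t>, unsigned> counters;
};
```

Since each partition counts on its own, privatizing one more symbol named foo
in partition 7 no longer renumbers the foo clones in partition 9.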

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/lto/ChangeLog:

* lto-partition.cc (set_clone_partition_name_checksum): New.
(CHECKSUM_STRING): New.
(privatize_symbol_name_1): Use partition hash for lto_priv.
(lto_promote_cross_file_statics): Use set_clone_partition_name_checksum.
(lto_promote_statics_nonwpa): Changed clone_map type.
---
 gcc/lto/lto-partition.cc | 49 +++-
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/gcc/lto/lto-partition.cc b/gcc/lto/lto-partition.cc
index eb31ecba0d3..a2ce24eea23 100644
--- a/gcc/lto/lto-partition.cc
+++ b/gcc/lto/lto-partition.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-fnsummary.h"
 #include "lto-partition.h"
 #include "sreal.h"
+#include "md5.h"
 
 #include 
 #include 
@@ -1516,8 +1517,36 @@ validize_symbol_for_target (symtab_node *node)
 }
 }
 
-/* Maps symbol names to unique lto clone counters.  */
-static hash_map *lto_clone_numbers;
+/* Maps symbol names with partition checksum to unique lto clone counters.  */
+using clone_map = hash_map>, unsigned>;
+static clone_map *lto_clone_numbers;
+uint64_t current_partition_checksum = 0;
+
+/* Computes a quick checksum to distinguish partitions of clone numbers.  */
+void
+set_clone_partition_name_checksum (ltrans_partition part)
+{
+#define CHECKSUM_STRING(FOO) md5_process_bytes ((FOO), strlen (FOO), &ctx)
+  struct md5_ctx ctx;
+  md5_init_ctx (&ctx);
+
+  CHECKSUM_STRING (part->name);
+
+  lto_symtab_encoder_iterator lsei;
+  lto_symtab_encoder_t encoder = part->encoder;
+
+  for (lsei = lsei_start (encoder); !lsei_end_p (lsei); lsei_next (&lsei))
+{
+  symtab_node *node = lsei_node (lsei);
+  CHECKSUM_STRING (node->name ());
+}
+
+  uint64_t checksum[2];
+  md5_finish_ctx (&ctx, checksum);
+  current_partition_checksum = checksum[0];
+#undef CHECKSUM_STRING
+}
 
 /* Helper for privatize_symbol_name.  Mangle NODE symbol name
represented by DECL.  */
@@ -1531,10 +1560,16 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
 return false;
 
   const char *name = maybe_rewrite_identifier (name0);
-  unsigned &clone_number = lto_clone_numbers->get_or_insert (name);
+
+  unsigned &clone_number = lto_clone_numbers->get_or_insert (
+std::pair {name, current_partition_checksum});
+
+  char lto_priv[32];
+  sprintf (lto_priv, "lto_priv.%lu", current_partition_checksum);
+
   symtab->change_decl_assembler_name (decl,
  clone_function_name (
- name, "lto_priv", clone_number));
+ name, lto_priv, clone_number));
   clone_number++;
 
   if (node->lto_file_data)
@@ -1735,11 +1770,13 @@ lto_promote_cross_file_statics (void)
   part->encoder = compute_ltrans_boundary (part->encoder);
 }
 
-  lto_clone_numbers = new hash_map;
+  lto_clone_numbers = new clone_map;
 
   /* Look at boundaries and promote symbols as needed.  */
   for (i = 0; i < n_sets; i++)
 {
+  set_clone_partition_name_checksum (ltrans_partitions[i]);
+
   lto_symtab_encoder_iterator lsei;
   lto_symtab_encoder_t encoder = ltrans_partitions[i]->encoder;
 
@@ -1778,7 +1815,7 @@ lto_promote_statics_nonwpa (void)
 {
   symtab_node *node;
 
-  lto_clone_numbers = new hash_map;
+  lto_clone_numbers = new clone_map;
   FOR_EACH_SYMBOL (node)
 {
   rename_statics (NULL, node);
-- 
2.42.1



[PATCH 3/7 v2] Lockfile.

2024-06-20 Thread Michal Jires
This version differs by using INCLUDE_STRING instead of #include <string>.
(+whitespace and year)

___

This patch implements lockfile used for incremental LTO.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* Makefile.in: Add lockfile.o.
* lockfile.cc: New file.
* lockfile.h: New file.
---
 gcc/Makefile.in |   5 +-
 gcc/lockfile.cc | 136 
 gcc/lockfile.h  |  78 +++
 3 files changed, 217 insertions(+), 2 deletions(-)
 create mode 100644 gcc/lockfile.cc
 create mode 100644 gcc/lockfile.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f5adb647d3f..90ec59dca75 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1855,7 +1855,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) $(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
   $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
-  lto-wrapper.o collect-utils.o
+  lto-wrapper.o collect-utils.o lockfile.o
 
 # for anything that is shared use the cc1plus profile data, as that
 # is likely the most exercised during the build
@@ -2384,7 +2384,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
@TARGET_SYSTEM_ROOT_DEFINE@
 
-LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o
+LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
+
 lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
   $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBS)
diff --git a/gcc/lockfile.cc b/gcc/lockfile.cc
new file mode 100644
index 000..8ecb4dc2848
--- /dev/null
+++ b/gcc/lockfile.cc
@@ -0,0 +1,136 @@
+/* File locking.
+   Copyright (C) 2023-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#define INCLUDE_STRING
+#include "config.h"
+#include "system.h"
+#include "lockfile.h"
+
+
+/* Unique write lock.  No other lock can be held on this lockfile.
+   Blocking call.  */
+int
+lockfile::lock_write ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_WRLCK;
+
+  while (fcntl (fd, F_SETLKW, &s_flock) && errno == EINTR)
+continue;
+#endif
+  return 0;
+}
+
+/* Unique write lock.  No other lock can be held on this lockfile.
+   Only locks if this filelock is not locked by any other process.
+   Return whether locking was successful.  */
+int
+lockfile::try_lock_write ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_WRLCK;
+
+  if (fcntl (fd, F_SETLK, &s_flock) == -1)
+{
+  close (fd);
+  fd = -1;
+  return 1;
+}
+#endif
+  return 0;
+}
+
+/* Shared read lock.  Only read lock can be held concurrently.
+   If write lock is already held by this process, it will be
+   changed to read lock.
+   Blocking call.  */
+int
+lockfile::lock_read ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_RDLCK;
+
+  while (fcntl (fd, F_SETLKW, &s_flock) && errno == EINTR)
+continue;
+#endif
+  return 0;
+}
+
+/* Unlock all previously placed locks.  */
+void
+lockfile::unlock ()
+{
+  if (fd >= 0)
+{
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_UNLCK;
+
+  fcntl (fd, F_SETLK, &s_flock);
+#endif
+  close (fd);
+  fd = -1;
+}
+}
+
+/* Are lockfiles supported?  */
+bool
+lockfile::lockfile_supported ()
+{
+#if HAVE_FCNTL_H
+  return true;
+#else
+  return false;
+#endif
+}
diff --git a/gcc/lockfile.h b/gcc/lockfile.h
new file mode 100644
index 

[PATCH 4/7 v2] lto: Implement ltrans cache

2024-06-20 Thread Michal Jires
Outside of suggested changes, this version:
- uses #define INCLUDE_*
- is rebased onto current trunk - 'fprintf (mstream,' lines
- --verbose 'recompiling++' count is moved into correct if branch

___

This patch implements Incremental LTO as an ltrans cache.

The cache is active when directory $GCC_LTRANS_CACHE is specified and exists.
Pairs of ltrans input/output files are stored along with the input file hash.
File locking is used to allow multiple GCC instances to use the same cache.
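
As a rough illustration of the idea (not the code from this patch), the
sketch below models a cache that stores input/output pairs keyed by an
input hash. The names `cache_entry`, `cache_lookup` and `cache_add` are
hypothetical; `std::hash` and in-memory strings stand in for the MD5
checksum and the on-disk files, and file locking is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one cache item.  The real cache keeps
// files on disk and an MD5 checksum; strings and std::hash are assumptions
// made only to keep this sketch self-contained.
struct cache_entry
{
  std::size_t input_hash;
  std::string input;
  std::string output;
};

// Return the cached output for INPUT, or nullptr on a miss.  The hash is a
// cheap filter; the full comparison guards against hash collisions.
const std::string *
cache_lookup (const std::vector<cache_entry> &cache, const std::string &input)
{
  std::size_t h = std::hash<std::string>{} (input);
  for (const cache_entry &e : cache)
    if (e.input_hash == h && e.input == input)
      return &e.output;
  return nullptr;
}

// Record an input/output pair together with the hash of the input.
void
cache_add (std::vector<cache_entry> &cache, const std::string &input,
           const std::string &output)
{
  cache.push_back ({ std::hash<std::string>{} (input), input, output });
  // Postcondition: the entry just added can be found again.
  assert (cache_lookup (cache, input) != nullptr);
}
```

An unchanged ltrans input yields a hit and its output can be reused; any
change to the input is a miss, so that partition is recompiled.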

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* Makefile.in: Add lto-ltrans-cache.o.
* lto-wrapper.cc: Use ltrans cache.
* lto-ltrans-cache.cc: New file.
* lto-ltrans-cache.h: New file.
---
 gcc/Makefile.in |   5 +-
 gcc/lto-ltrans-cache.cc | 409 
 gcc/lto-ltrans-cache.h  | 147 +++
 gcc/lto-wrapper.cc  | 153 +--
 4 files changed, 699 insertions(+), 15 deletions(-)
 create mode 100644 gcc/lto-ltrans-cache.cc
 create mode 100644 gcc/lto-ltrans-cache.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 90ec59dca75..ab7335ed882 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1855,7 +1855,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) 
$(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
   $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
-  lto-wrapper.o collect-utils.o lockfile.o
+  lto-wrapper.o collect-utils.o lockfile.o lto-ltrans-cache.o
 
 # for anything that is shared use the cc1plus profile data, as that
 # is likely the most exercised during the build
@@ -2384,7 +2384,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
@TARGET_SYSTEM_ROOT_DEFINE@
 
-LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
+LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o \
+  lto-ltrans-cache.o
 
 lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
diff --git a/gcc/lto-ltrans-cache.cc b/gcc/lto-ltrans-cache.cc
new file mode 100644
index 000..a6ff02afb58
--- /dev/null
+++ b/gcc/lto-ltrans-cache.cc
@@ -0,0 +1,409 @@
+/* File caching.
+   Copyright (C) 2023-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define INCLUDE_ALGORITHM
+#define INCLUDE_STRING
+#define INCLUDE_ARRAY
+#define INCLUDE_MAP
+#define INCLUDE_VECTOR
+#include "config.h"
+#include "system.h"
+#include "md5.h"
+#include "lto-ltrans-cache.h"
+
+static const md5_checksum_t INVALID_CHECKSUM = {
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+};
+
+/* Computes checksum for given file, returns INVALID_CHECKSUM if not
+   possible.
+ */
+static md5_checksum_t
+file_checksum (char const *filename)
+{
+  FILE *file = fopen (filename, "rb");
+
+  if (!file)
+return INVALID_CHECKSUM;
+
+  md5_checksum_t result;
+
+  int ret = md5_stream (file, &result);
+
+  if (ret)
+result = INVALID_CHECKSUM;
+
+  fclose (file);
+
+  return result;
+}
+
+/* Checks identity of two files byte by byte.  */
+static bool
+files_identical (char const *first_filename, char const *second_filename)
+{
+  FILE *f_first = fopen (first_filename, "rb");
+  if (!f_first)
+return false;
+
+  FILE *f_second = fopen (second_filename, "rb");
+  if (!f_second)
+{
+  fclose (f_first);
+  return false;
+}
+
+  bool ret = true;
+
+  for (;;)
+{
+  int c1, c2;
+  c1 = fgetc (f_first);
+  c2 = fgetc (f_second);
+
+  if (c1 != c2)
+   {
+ ret = false;
+ break;
+   }
+
+  if (c1 == EOF)
+   break;
+}
+
+  fclose (f_first);
+  fclose (f_second);
+  return ret;
+}
+
+/* Constructor of cache item.  */
+ltrans_file_cache::item::item (std::string input, std::string output,
+  md5_checksum_t input_checksum,
+  uint32_t last_used):
+  input (std::move (input)), output (std::move (output)),
+  input_checksum (input_checksum), last_used (last_used)
+{
+  lock = lockfile (this->input + ".lock");
+}
+/* Destructor of cache item.  */
+ltrans_file_cache::item::~item ()
+{
+  lock.unlock ();
+}
+
+/* Reads next cache item from cachedata file.
+   Ad

[PATCH] [lto] ipcp don't propagate where not needed

2024-10-25 Thread Michal Jires
This patch disables propagation of ipcp information into lto partitions
where all instances of the node are marked to be inlined.

Motivation:
Incremental LTO needs stable values between compilations to be
effective. This requirement fails with the following example:

void heavily_used_debug_function(int);
...
heavily_used_debug_function(__LINE__);

Ipcp creates a long list of all __LINE__ arguments, and then
propagates it with every function clone, even though this
information is not useful for inlined functions.

gcc/ChangeLog:

* ipa-prop.cc (write_ipcp_transformation_info): Disable
  unneeded value propagation.
* lto-cgraph.cc (lto_symtab_encoder_encode): Default values.
(lto_symtab_encoder_always_inlined_p): New.
(lto_set_symtab_encoder_not_always_inlined): New.
(add_node_to): Set always inlined.
* lto-streamer.h (struct lto_encoder_entry): New field.
(lto_symtab_encoder_always_inlined_p): New.
---
 gcc/ipa-prop.cc| 12 +---
 gcc/lto-cgraph.cc  | 41 +
 gcc/lto-streamer.h |  4 
 3 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 032358fde22..ef83ce3edb6 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -5404,9 +5404,15 @@ write_ipcp_transformation_info (output_block *ob, 
cgraph_node *node,
   streamer_write_bitpack (&bp);
 }
 
-  streamer_write_uhwi (ob, vec_safe_length (ts->m_vr));
-  for (const ipa_vr &parm_vr : ts->m_vr)
-parm_vr.streamer_write (ob);
+  /* If all instances of this node are inlined, ipcp info is not useful.  */
+  if (!lto_symtab_encoder_always_inlined_p (encoder, node))
+{
+  streamer_write_uhwi (ob, vec_safe_length (ts->m_vr));
+  for (const ipa_vr &parm_vr : ts->m_vr)
+   parm_vr.streamer_write (ob);
+}
+  else
+streamer_write_uhwi (ob, 0);
 }
 
 /* Stream in the aggregate value replacement chain for NODE from IB.  */
diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 53cc965bdfd..5c3e3076c8d 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -114,7 +114,7 @@ lto_symtab_encoder_encode (lto_symtab_encoder_t encoder,
 
   if (!encoder->map)
 {
-  lto_encoder_entry entry = {node, false, false, false};
+  lto_encoder_entry entry = {node, false, false, true, false};
 
   ref = encoder->nodes.length ();
   encoder->nodes.safe_push (entry);
@@ -124,7 +124,7 @@ lto_symtab_encoder_encode (lto_symtab_encoder_t encoder,
   size_t *slot = encoder->map->get (node);
   if (!slot || !*slot)
 {
-  lto_encoder_entry entry = {node, false, false, false};
+  lto_encoder_entry entry = {node, false, false, true, false};
   ref = encoder->nodes.length ();
   if (!slot)
 encoder->map->put (node, ref + 1);
@@ -190,6 +190,27 @@ lto_set_symtab_encoder_encode_body (lto_symtab_encoder_t 
encoder,
   encoder->nodes[index].body = true;
 }
 
+/* Return TRUE if the NODE and its clones are always inlined.  */
+
+bool
+lto_symtab_encoder_always_inlined_p (lto_symtab_encoder_t encoder,
+struct cgraph_node *node)
+{
+  int index = lto_symtab_encoder_lookup (encoder, node);
+  return encoder->nodes[index].always_inlined;
+}
+
+/* Specify that the NODE and its clones are not always inlined.  */
+
+static void
+lto_set_symtab_encoder_not_always_inlined (lto_symtab_encoder_t encoder,
+  struct cgraph_node *node)
+{
+  int index = lto_symtab_encoder_encode (encoder, node);
+  gcc_checking_assert (encoder->nodes[index].node == node);
+  encoder->nodes[index].always_inlined = false;
+}
+
 /* Return TRUE if we should encode initializer of NODE (if any).  */
 
 bool
@@ -799,15 +820,27 @@ output_refs (lto_symtab_encoder_t encoder)
 
 static void
 add_node_to (lto_symtab_encoder_t encoder, struct cgraph_node *node,
-bool include_body)
+bool include_body, bool not_inlined)
 {
   if (node->clone_of)
-add_node_to (encoder, node->clone_of, include_body);
+add_node_to (encoder, node->clone_of, include_body, not_inlined);
   if (include_body)
 lto_set_symtab_encoder_encode_body (encoder, node);
+  if (not_inlined)
+lto_set_symtab_encoder_not_always_inlined (encoder, node);
   lto_symtab_encoder_encode (encoder, node);
 }
 
+/* Add NODE into encoder as well as nodes it is cloned from.
+   Do it in a way so clones appear first.  */
+
+static void
+add_node_to (lto_symtab_encoder_t encoder, struct cgraph_node *node,
+bool include_body)
+{
+  add_node_to (encoder, node, include_body, include_body && !node->inlined_to);
+}
+
 /* Add all references in NODE to encoders.  */
 
 static void
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 397f5fc8d68..1a6bc9fa3a1 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -449,6 +449,8 @@ struct lto_encoder_entry
   unsigned int in_partition:1;
   /* Do we encode body in this partition?  */
   u

[COMMITTED] [lto] ipcp don't propagate where not needed

2024-11-06 Thread Michal Jires
Committed with suggested changes.

-

This patch disables propagation of ipcp information into partitions
where all instances of the node are marked to be inlined.

Motivation:
Incremental LTO needs stable values between compilations to be
effective. This requirement fails with the following example:

void heavily_used_function(int);
...
heavily_used_function(__LINE__);

Ipcp creates a long list of all __LINE__ arguments, and then
propagates it with every function clone, even though this
information is not useful for inlined functions.

gcc/ChangeLog:

* ipa-prop.cc (write_ipcp_transformation_info): Disable
  unneeded value propagation.
* lto-cgraph.cc (lto_symtab_encoder_encode): Default values.
(lto_symtab_encoder_always_inlined_p): New.
(lto_set_symtab_encoder_not_always_inlined): New.
(add_node_to): Set always inlined.
* lto-streamer.h (struct lto_encoder_entry): New field.
(lto_symtab_encoder_always_inlined_p): New.
---
 gcc/ipa-prop.cc| 12 +---
 gcc/lto-cgraph.cc  | 47 +-
 gcc/lto-streamer.h | 11 +++
 3 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 032358fde22..c5633796721 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -5404,9 +5404,15 @@ write_ipcp_transformation_info (output_block *ob, 
cgraph_node *node,
   streamer_write_bitpack (&bp);
 }
 
-  streamer_write_uhwi (ob, vec_safe_length (ts->m_vr));
-  for (const ipa_vr &parm_vr : ts->m_vr)
-parm_vr.streamer_write (ob);
+  /* If all instances of this node are inlined, ipcp info is not useful.  */
+  if (!lto_symtab_encoder_only_for_inlining_p (encoder, node))
+{
+  streamer_write_uhwi (ob, vec_safe_length (ts->m_vr));
+  for (const ipa_vr &parm_vr : ts->m_vr)
+   parm_vr.streamer_write (ob);
+}
+  else
+streamer_write_uhwi (ob, 0);
 }
 
 /* Stream in the aggregate value replacement chain for NODE from IB.  */
diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 53cc965bdfd..c66f32852a7 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -114,7 +114,7 @@ lto_symtab_encoder_encode (lto_symtab_encoder_t encoder,
 
   if (!encoder->map)
 {
-  lto_encoder_entry entry = {node, false, false, false};
+  lto_encoder_entry entry (node);
 
   ref = encoder->nodes.length ();
   encoder->nodes.safe_push (entry);
@@ -124,7 +124,7 @@ lto_symtab_encoder_encode (lto_symtab_encoder_t encoder,
   size_t *slot = encoder->map->get (node);
   if (!slot || !*slot)
 {
-  lto_encoder_entry entry = {node, false, false, false};
+  lto_encoder_entry entry (node);
   ref = encoder->nodes.length ();
   if (!slot)
 encoder->map->put (node, ref + 1);
@@ -168,6 +168,15 @@ lto_symtab_encoder_delete_node (lto_symtab_encoder_t 
encoder,
   return true;
 }
 
+/* Return TRUE if the NODE and its clones are always inlined.  */
+
+bool
+lto_symtab_encoder_only_for_inlining_p (lto_symtab_encoder_t encoder,
+   struct cgraph_node *node)
+{
+  int index = lto_symtab_encoder_lookup (encoder, node);
+  return encoder->nodes[index].only_for_inlining;
+}
 
 /* Return TRUE if we should encode the body of NODE (if any).  */
 
@@ -179,17 +188,6 @@ lto_symtab_encoder_encode_body_p (lto_symtab_encoder_t 
encoder,
   return encoder->nodes[index].body;
 }
 
-/* Specify that we encode the body of NODE in this partition.  */
-
-static void
-lto_set_symtab_encoder_encode_body (lto_symtab_encoder_t encoder,
-   struct cgraph_node *node)
-{
-  int index = lto_symtab_encoder_encode (encoder, node);
-  gcc_checking_assert (encoder->nodes[index].node == node);
-  encoder->nodes[index].body = true;
-}
-
 /* Return TRUE if we should encode initializer of NODE (if any).  */
 
 bool
@@ -799,13 +797,28 @@ output_refs (lto_symtab_encoder_t encoder)
 
 static void
 add_node_to (lto_symtab_encoder_t encoder, struct cgraph_node *node,
-bool include_body)
+bool include_body, bool not_inlined)
 {
   if (node->clone_of)
-add_node_to (encoder, node->clone_of, include_body);
+add_node_to (encoder, node->clone_of, include_body, not_inlined);
+
+  int index = lto_symtab_encoder_encode (encoder, node);
+  gcc_checking_assert (encoder->nodes[index].node == node);
+
   if (include_body)
-lto_set_symtab_encoder_encode_body (encoder, node);
-  lto_symtab_encoder_encode (encoder, node);
+encoder->nodes[index].body = true;
+  if (not_inlined)
+encoder->nodes[index].only_for_inlining = false;
+}
+
+/* Add NODE into encoder as well as nodes it is cloned from.
+   Do it in a way so clones appear first.  */
+
+static void
+add_node_to (lto_symtab_encoder_t encoder, struct cgraph_node *node,
+bool include_body)
+{
+  add_node_to (encoder, node, include_body, include_body && !no

[PATCH 3/3] dwarf: lto: Stabilize external die references.

2024-11-06 Thread Michal Jires
During Incremental LTO, contents of LTO partitions diverge because of
external DIE references (DW_AT_abstract_origin).

External references are in form 'die_symbol+offset'.
Originally there is only a single die_symbol for each compilation unit,
so its offsets run into the 100'000s and easily diverge.

Die symbols have to be unique across compilation units. Originally, for
this purpose, the die symbol name is computed from a hash of the entire
file. To avoid this I added flag_lto_debuginfo_assume_unique_filepaths,
which computes the die_symbol only from the filepath; this seems a
reasonable assumption for any project using incremental LTO.
The compilation unit's die symbol name is then prepended to each die
symbol for uniqueness.
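
The naming scheme can be sketched roughly as follows. This is only an
illustration under stated assumptions: `unit_symbol_from_path` and
`child_symbol` are hypothetical names, `std::hash` stands in for the MD5
checksum GCC uses, and the ".r." separator mirrors the one visible in the
patch below.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical: derive the compilation unit symbol from the file path only,
// so editing the file's contents does not change the symbol.  std::hash is
// an assumption replacing the MD5 checksum used by GCC.
std::string
unit_symbol_from_path (const std::string &filepath)
{
  assert (!filepath.empty ());
  return "unit_" + std::to_string (std::hash<std::string>{} (filepath));
}

// Hypothetical: a nested die symbol is its parent's symbol plus its own
// name, which keeps it unique as long as the parent symbol is unique.
std::string
child_symbol (const std::string &parent_symbol, const std::string &name)
{
  return parent_symbol + ".r." + name;
}
```

Because each symbol is built from its parent's symbol, uniqueness of the
compilation unit symbol carries over to everything nested inside it.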

To remove divergence of offsets in case of C++, we have to add die
symbols to DW_TAG_subprogram (functions), DW_TAG_variable and
DW_TAG_namespace.

Benefits:
Before this patch Incremental LTO diverges/recompiles ~twice as much
with '-g'. With this, additional divergence with '-g' is under 10 %.

Negatives:
When the flag is set, the added die symbols survive into the final
executable. For the `cc1` executable, the added symbols alone represent
an almost 10 % size increase.
You can strip them out, but I have not found a simple way to remove them
automatically in GCC.
However for the purposes of Incremental LTO it should suffice. There was
no measured compilation time increase because of streaming these
additional symbols/strings.

gcc/ChangeLog:

* common.opt: New flag.
* dwarf2out.cc (compute_comp_unit_symbol):
  With flag, don't checksum contents but filepath.
(compute_die_symbols_from_die): New.
(compute_die_symbols): New.
(dwarf2out_early_finish): Call compute_die_symbols.

gcc/testsuite/ChangeLog:

* g++.dg/lto/die_symbol_conflicts_0.C: New test.
---
 gcc/common.opt|   4 +
 gcc/dwarf2out.cc  | 120 +-
 .../g++.dg/lto/die_symbol_conflicts_0.C   |  12 ++
 3 files changed, 132 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C

diff --git a/gcc/common.opt b/gcc/common.opt
index 12b25ff486d..4aa80f0df8f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2253,6 +2253,10 @@ flto-partition=
 Common Joined RejectNegative Enum(lto_partition_model) Var(flag_lto_partition) 
Init(LTO_PARTITION_BALANCED)
 Specify the algorithm to partition symbols and vars at linktime.
 
+flto-debuginfo-assume-unique-filepaths
+Common Var(flag_lto_debuginfo_assume_unique_filepaths) Init(0)
+Assume all linked source files have unique filepaths.
+
 ; The initial value of -1 comes from Z_DEFAULT_COMPRESSION in zlib.h.
 flto-compression-level=
 Common Joined RejectNegative UInteger Var(flag_lto_compression_level) Init(-1) 
IntegerRange(0, 19)
diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index bf1ac45ed73..af272a3a824 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -8015,9 +8015,17 @@ compute_comp_unit_symbol (dw_die_ref unit_die)
  the name filename of the unit.  */
 
   md5_init_ctx (&ctx);
-  mark = 0;
-  die_checksum (unit_die, &ctx, &mark);
-  unmark_all_dies (unit_die);
+  if (flag_lto_debuginfo_assume_unique_filepaths)
+{
+  gcc_assert (die_name);
+  md5_process_bytes (die_name, strlen (die_name), &ctx);
+}
+  else
+{
+  mark = 0;
+  die_checksum (unit_die, &ctx, &mark);
+  unmark_all_dies (unit_die);
+}
   md5_finish_ctx (&ctx, checksum);
 
   /* When we this for comp_unit_die () we have a DW_AT_name that might
@@ -33119,6 +33127,110 @@ ctf_debug_do_cu (dw_die_ref die)
   FOR_EACH_CHILD (die, c, ctf_do_die (c));
 }
 
+/* Recursively compute die symbols from DIE's attributes.
+   Not all symbols can be computed this way.  */
+static void
+compute_die_symbols_from_die (dw_die_ref die)
+{
+  dw_attr_node *a;
+  int i;
+  const char* name = NULL;
+
+  if (!die->die_attr)
+return;
+
+  switch (die->die_tag)
+{
+  /* Assumed that each die parent has at most single children namespace
+with the same name.  */
+  case DW_TAG_namespace:
+  case DW_TAG_module:
+
+   FOR_EACH_VEC_ELT (*die->die_attr, i, a)
+ {
+   if (a->dw_attr == DW_AT_name)
+ name = AT_string (a);
+   /* Ignored DW_AT_abstract_origin, leads to duplicates.  */
+ }
+   break;
+
+  default: break;
+}
+
+  if (name)
+{
+  gcc_assert (!die->die_id.die_symbol);
+  gcc_assert (die->die_parent);
+
+  const char* parent_symbol = die->die_parent->die_id.die_symbol;
+  /* Prefix with parent symbol to guarantee uniqueness.  Important for
+namespaces.  Toplevel functions and variables can and do use just
+comp_unit's symbol as prefix.  Die symbols of these toplevel symbols
+may overlap.  Use the 'r' to differentiate.  */
+  die->die_id.die_symbol = concat (parent_symbol, ".r.", name, NULL);
+}
+
+  /* Splitting functio

[PATCH 2/3] dwarf: lto: Allow die_symbol outside of comp_unit.

2024-11-06 Thread Michal Jires
Die symbols are used for external references.
Typically during LTO, early debug emits 'die_symbol+offset' for each
DIE that might be referenced later. Partitions in the LTRANS phase then
use these references.

Originally die symbols are handled only in the root comp_unit and
in attributes.

This patch allows die symbols to be attached to any DIE.
References then choose the closest parent with a die symbol.
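
A minimal sketch of that lookup, assuming a simplified DIE with only a
parent link, an offset and an optional symbol. The `die_node` and `die_ref`
types and `make_external_ref` are hypothetical, and the comdat type unit
handling from the real code is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified DIE.
struct die_node
{
  const die_node *parent;  // Null for the compilation unit DIE.
  std::uint64_t offset;    // Offset of this DIE within its unit.
  const char *symbol;      // Null when no die symbol is attached.
};

// An external reference in the form 'die_symbol+offset'.
struct die_ref
{
  const char *symbol;
  std::uint64_t offset;
};

// Walk up to the closest DIE (possibly D itself) that carries a symbol and
// express D's position relative to that DIE.
die_ref
make_external_ref (const die_node *d)
{
  assert (d != nullptr);
  std::uint64_t target = d->offset;
  while (d->parent && !d->symbol)
    d = d->parent;
  return { d->symbol, target - d->offset };
}
```

The closer the symbol is to the referenced DIE, the smaller the offset, and
the less it is affected by unrelated changes elsewhere in the unit.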

gcc/ChangeLog:

* dwarf2out.cc (dwarf2out_die_ref_for_decl):
  Choose closest parent with die_symbol.
(output_die): Output asm label.
(output_unit_die_symbol_list): New.
(output_comp_unit): Output die_symbol list.
(reset_dies): Reset all die_symbols.
(dwarf2out_finish): Don't reset comp_unit die_symbol.
---
 gcc/dwarf2out.cc | 80 +++-
 1 file changed, 45 insertions(+), 35 deletions(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index e10a5c78fe9..bf1ac45ed73 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -6039,14 +6039,14 @@ dwarf2out_die_ref_for_decl (tree decl, const char **sym,
 
   /* Similar to get_ref_die_offset_label, but using the "correct"
  label.  */
-  *off = die->die_offset;
-  while (die->die_parent)
+  unsigned HOST_WIDE_INT unit_offset = die->die_offset;
+  while (die->die_parent && (die->comdat_type_p || !die->die_id.die_symbol))
 die = die->die_parent;
-  /* For the containing CU DIE we compute a die_symbol in
+  /* Root CU DIE always contains die_symbol computed in
  compute_comp_unit_symbol.  */
-  if (die->die_tag == DW_TAG_compile_unit)
+  if (!die->comdat_type_p && die->die_id.die_symbol)
 {
-  gcc_assert (die->die_id.die_symbol != NULL);
+  *off = unit_offset - die->die_offset;
   *sym = die->die_id.die_symbol;
   return true;
 }
@@ -10798,6 +10798,10 @@ output_die (dw_die_ref die)
   unsigned long size;
   unsigned ix;
 
+  if ((flag_generate_lto || flag_generate_offload)
+  && !die->comdat_type_p && die->die_id.die_symbol)
+ASM_OUTPUT_LABEL (asm_out_file, die->die_id.die_symbol);
+
   dw2_asm_output_data_uleb128 (die->die_abbrev, "(DIE (%#lx) %s)",
   (unsigned long)die->die_offset,
   dwarf_tag_name (die->die_tag));
@@ -11228,14 +11232,41 @@ output_compilation_unit_header (enum dwarf_unit_type 
ut)
 dw2_asm_output_data (1, DWARF2_ADDR_SIZE, "Pointer Size (in bytes)");
 }
 
+/* Output list of all die symbols in the DIE.  */
+static void
+output_unit_die_symbol_list (dw_die_ref die)
+{
+  if (!die->comdat_type_p && die->die_id.die_symbol)
+{
+  const char* sym = die->die_id.die_symbol;
+  /* ???  No way to get visibility assembled without a decl.  */
+  tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+ get_identifier (sym), char_type_node);
+  TREE_PUBLIC (decl) = true;
+  TREE_STATIC (decl) = true;
+  DECL_ARTIFICIAL (decl) = true;
+  DECL_VISIBILITY (decl) = VISIBILITY_HIDDEN;
+  DECL_VISIBILITY_SPECIFIED (decl) = true;
+  targetm.asm_out.assemble_visibility (decl, VISIBILITY_HIDDEN);
+#ifdef ASM_WEAKEN_LABEL
+  /* We prefer a .weak because that handles duplicates from duplicate
+archive members in a graceful way.  */
+  ASM_WEAKEN_LABEL (asm_out_file, sym);
+#else
+  targetm.asm_out.globalize_label (asm_out_file, sym);
+#endif
+}
+
+  dw_die_ref c;
+  FOR_EACH_CHILD (die, c, output_unit_die_symbol_list (c));
+}
+
 /* Output the compilation unit DIE and its children.  */
 
 static void
 output_comp_unit (dw_die_ref die, int output_if_empty,
  const unsigned char *dwo_id)
 {
-  const char *oldsym;
-
   /* Unless we are outputting main CU, we may throw away empty ones.  */
   if (!output_if_empty && die->die_child == NULL)
 return;
@@ -11267,34 +11298,12 @@ output_comp_unit (dw_die_ref die, int output_if_empty,
 : DWARF_COMPILE_UNIT_HEADER_SIZE);
   calc_die_sizes (die);
 
-  oldsym = die->die_id.die_symbol;
-
   switch_to_section (debug_info_section);
   ASM_OUTPUT_LABEL (asm_out_file, debug_info_section_label);
   info_section_emitted = true;
 
-  /* For LTO cross unit DIE refs we want a symbol on the start of the
- debuginfo section, not on the CU DIE.  */
-  if ((flag_generate_lto || flag_generate_offload) && oldsym)
-{
-  /* ???  No way to get visibility assembled without a decl.  */
-  tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
- get_identifier (oldsym), char_type_node);
-  TREE_PUBLIC (decl) = true;
-  TREE_STATIC (decl) = true;
-  DECL_ARTIFICIAL (decl) = true;
-  DECL_VISIBILITY (decl) = VISIBILITY_HIDDEN;
-  DECL_VISIBILITY_SPECIFIED (decl) = true;
-  targetm.asm_out.assemble_visibility (decl, VISIBILITY_HIDDEN);
-#ifdef ASM_WEAKEN_LABEL
-  /* We prefer a .weak because that handles duplicates from duplicate
- archive members in a graceful way.  */

[PATCH 1/3] ipa-strub: Replace cgraph_node order with uid.

2024-11-06 Thread Michal Jires
ipa_strub_set_mode_for_new_functions uses node order as a unique,
ever-increasing identifier. This is better satisfied with uid.
Order loses uniqueness with the following patches.

gcc/ChangeLog:
* ipa-strub.cc (ipa_strub_set_mode_for_new_functions): Replace
  order with uid.
(pass_ipa_strub_mode::execute): Likewise.
---
 gcc/ipa-strub.cc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/ipa-strub.cc b/gcc/ipa-strub.cc
index 8fa7bdf5300..9c0b15c88b1 100644
--- a/gcc/ipa-strub.cc
+++ b/gcc/ipa-strub.cc
@@ -2254,16 +2254,16 @@ remove_named_attribute_unsharing (const char *name, 
tree *attrs)
 }
 }
 
-/* Record the order of the last cgraph entry whose mode we've already set, so
+/* Record the uid of the last cgraph entry whose mode we've already set, so
that we can perform mode setting incrementally without duplication.  */
-static int last_cgraph_order;
+static int last_cgraph_uid;
 
 /* Set strub modes for functions introduced since the last call.  */
 
 static void
 ipa_strub_set_mode_for_new_functions ()
 {
-  if (symtab->order == last_cgraph_order)
+  if (symtab->cgraph_max_uid == last_cgraph_uid)
 return;
 
   cgraph_node *node;
@@ -2278,13 +2278,13 @@ ipa_strub_set_mode_for_new_functions ()
continue;
 
   /*  Already done.  */
-  if (node->order < last_cgraph_order)
+  if (node->get_uid () < last_cgraph_uid)
continue;
 
   set_strub_mode (node);
 }
 
-  last_cgraph_order = symtab->order;
+  last_cgraph_uid = symtab->cgraph_max_uid;
 }
 
 /* Return FALSE if NODE is a strub context, and TRUE otherwise.  */
@@ -2660,7 +2660,7 @@ pass_ipa_strub::adjust_at_calls_calls (cgraph_node *node)
 unsigned int
 pass_ipa_strub_mode::execute (function *)
 {
-  last_cgraph_order = 0;
+  last_cgraph_uid = 0;
   ipa_strub_set_mode_for_new_functions ();
 
   /* Verify before any inlining or other transformations.  */
-- 
2.47.0



[PATCH 3/3] incremental lto: Remap node order for stability.

2024-11-06 Thread Michal Jires
This patch adds remapping of node order for each lto partition.
The resulting order preserves relative order inside the partition, but
is independent of outside symbols. So if an lto partition contains
an identical set of symbols, their remapped order will be stable
between compilations.
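
One plausible way to build such a remap is to replace each order by its
rank within the partition. The sketch below is an assumption about the
general technique, not the patch's create_order_remap; the function name
`create_order_remap_sketch` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical: map every order value used in one partition to its rank
// among the distinct orders of that partition.  Relative order is kept,
// while the absolute values no longer depend on symbols outside the
// partition.  Equal orders (e.g. shared by clones) get the same rank.
std::map<int, int>
create_order_remap_sketch (std::vector<int> orders)
{
  std::sort (orders.begin (), orders.end ());
  orders.erase (std::unique (orders.begin (), orders.end ()), orders.end ());

  std::map<int, int> remap;
  for (std::size_t i = 0; i < orders.size (); i++)
    remap[orders[i]] = static_cast<int> (i);
  assert (remap.size () == orders.size ());
  return remap;
}
```

Two partitions holding the same symbols at orders {17, 42, 99} and
{1017, 1042, 1099} both remap to {0, 1, 2}, so a shift caused by unrelated
symbols earlier in the program does not change the streamed values.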

gcc/ChangeLog:

* ipa-devirt.cc (ipa_odr_summary_write):
Add unused argument.
* ipa-fnsummary.cc (ipa_fn_summary_write): Likewise.
* ipa-icf.cc (sem_item_optimizer::write_summary): Likewise.
* ipa-modref.cc (modref_write): Likewise.
* ipa-prop.cc (ipa_prop_write_jump_functions): Likewise.
(ipcp_write_transformation_summaries): Likewise.
* ipa-sra.cc (ipa_sra_write_summary): Likewise.
* lto-cgraph.cc (lto_symtab_encoder_delete): Delete remap.
(lto_output_node): Remap order.
(lto_output_varpool_node): Likewise.
(output_cgraph_opt_summary): Add unused argument.
* lto-streamer-out.cc (produce_asm): Use remapped order.
(output_function): Propagate remapped order.
(output_constructor): Likewise.
(copy_function_or_variable): Likewise.
(cmp_int): New.
(create_order_remap): New.
(lto_output): Create remap. Remap order.
* lto-streamer.h (struct lto_symtab_encoder_d): Remap hash_map.
(produce_asm): Add order argument.
---
 gcc/ipa-devirt.cc   |  2 +-
 gcc/ipa-fnsummary.cc|  2 +-
 gcc/ipa-icf.cc  |  2 +-
 gcc/ipa-modref.cc   |  4 +-
 gcc/ipa-prop.cc |  4 +-
 gcc/ipa-sra.cc  |  2 +-
 gcc/lto-cgraph.cc   | 10 +++--
 gcc/lto-streamer-out.cc | 84 +++--
 gcc/lto-streamer.h  |  5 ++-
 9 files changed, 91 insertions(+), 24 deletions(-)

diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index c406e5138db..098798281b7 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -4131,7 +4131,7 @@ ipa_odr_summary_write (void)
   odr_enum_map = NULL;
 }
 
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, -1);
   destroy_output_block (ob);
 }
 
diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index b3824783406..badc5e703b2 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -4911,7 +4911,7 @@ ipa_fn_summary_write (void)
}
 }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, -1);
   destroy_output_block (ob);
 
   ipa_prop_write_jump_functions ();
diff --git a/gcc/ipa-icf.cc b/gcc/ipa-icf.cc
index b10a6baf109..d9cd7d0c1c0 100644
--- a/gcc/ipa-icf.cc
+++ b/gcc/ipa-icf.cc
@@ -2216,7 +2216,7 @@ sem_item_optimizer::write_summary (void)
 }
 
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, -1);
   destroy_output_block (ob);
 }
 
diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
index 19359662f8f..7f36fab3db2 100644
--- a/gcc/ipa-modref.cc
+++ b/gcc/ipa-modref.cc
@@ -3739,7 +3739,7 @@ modref_write ()
 {
   streamer_write_uhwi (ob, 0);
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, -1);
   destroy_output_block (ob);
   return;
 }
@@ -3814,7 +3814,7 @@ modref_write ()
}
 }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, -1);
   destroy_output_block (ob);
 }
 
diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 78d1fb7086d..032358fde22 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -5297,7 +5297,7 @@ ipa_prop_write_jump_functions (void)
 ipa_write_node_info (ob, node);
 }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, -1);
   destroy_output_block (ob);
 }
 
@@ -5489,7 +5489,7 @@ ipcp_write_transformation_summaries (void)
write_ipcp_transformation_info (ob, cnode, ts);
 }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, -1);
   destroy_output_block (ob);
 }
 
diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index 04920f2aa8e..630f4d6c14f 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -2898,7 +2898,7 @@ ipa_sra_write_summary (void)
 isra_write_node_summary (ob, node);
 }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, -1);
   destroy_output_block (ob);
 }
 
diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 1d4311a8832..53cc965bdfd 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -96,6 +96,8 @@ lto_symtab_encoder_delete (lto_symtab_encoder_t encoder)
encoder->nodes.release ();
if (encoder->map)
  delete encoder->map;
+   if (encoder->order_remap)
+ delete encoder->order_remap;
free (encoder);
 }
 
@@ -406,7 +408,8 @@ lto_output_node (struct lto_simple_output_block *ob, struct 
cgraph_node *node,
 
   streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_t

[PATCH 0/3] dwarf: incremental lto: Stabilize external references.

2024-11-06 Thread Michal Jires
These patches allow adding additional DIE symbols, so that
external references represented as 'die_symbol+offset' do not make
the contents of LTO partitions diverge.

Bootstrapped/regtested on x86_64-linux


[PATCH 0/3] incremental lto: Stabilizing node order.

2024-11-06 Thread Michal Jires
This set of patches replaces the original Incremental LTO patch for
stabilizing node order in lto partitions.

The main difference is earlier handling of node clone order, as suggested by
Honza (https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651593.html).
Previously I simply set a clone's order to zero during reorder; now clones
share their order with their originating node from the moment of creation.

Bootstrapped/regtested on x86_64-linux


[PATCH 2/3] Node clones share order.

2024-11-06 Thread Michal Jires
Symbol order corresponds to the order in source code.
For clones, the order is currently chosen arbitrarily as max order++.
It would be more consistent with the original purpose for a clone
to share the order of its original node.
This stabilizes clone order for Incremental LTO.

Order is thus no longer unique, but uniqueness is not used outside
of the previous patch, where we can use uid.
If a total order is needed, sorting by order and then uid suffices.
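
As a rough illustration of that last remark, here is a minimal sketch of
sorting by order and then uid. The node_key struct is a hypothetical
stand-in carrying only the two fields involved; it is not the real
symtab_node or cgraph_node.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a symbol table node; only the two fields
// relevant to ordering are kept.
struct node_key
{
  int order;  // Source order; a clone shares it with its original node.
  int uid;    // Unique per node.
};

// Strict weak ordering: compare by order first, break ties by uid.
static bool
order_then_uid_less (const node_key &a, const node_key &b)
{
  if (a.order != b.order)
    return a.order < b.order;
  return a.uid < b.uid;
}

// Sorts NODES into a deterministic total order even when several nodes
// share the same order value.
static void
sort_nodes (std::vector<node_key> &nodes)
{
  std::sort (nodes.begin (), nodes.end (), order_then_uid_less);
  assert (std::is_sorted (nodes.begin (), nodes.end (), order_then_uid_less));
}
```

Because uid is unique, two distinct nodes never compare equal, so the
result does not depend on the stability of the sorting algorithm.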

gcc/ChangeLog:

* cgraph.h (symbol_table::register_symbol):
  Order can be already set.
* cgraphclones.cc (cgraph_node::create_clone):
  Reuse order for clones.
---
 gcc/cgraph.h| 5 +++--
 gcc/cgraphclones.cc | 1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index a8c3224802c..508788b062b 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -121,7 +121,7 @@ public:
   used_from_other_partition (false), in_other_partition (false),
   address_taken (false), in_init_priority_hash (false),
   need_lto_streaming (false), offloadable (false), ifunc_resolver (false),
-  order (false), next_sharing_asm_name (NULL),
+  order (-1), next_sharing_asm_name (NULL),
   previous_sharing_asm_name (NULL), same_comdat_group (NULL), ref_list (),
   alias_target (NULL), lto_file_data (NULL), aux (NULL),
   x_comdat_group (NULL_TREE), x_section (NULL)
@@ -2815,7 +2815,8 @@ symbol_table::register_symbol (symtab_node *node)
 nodes->previous = node;
   nodes = node;
 
-  node->order = order++;
+  if (node->order == -1)
+node->order = order++;
 }
 
 /* Register a top-level asm statement ASM_STR.  */
diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
index 4fff6873a36..fc0f9046d99 100644
--- a/gcc/cgraphclones.cc
+++ b/gcc/cgraphclones.cc
@@ -401,6 +401,7 @@ cgraph_node::create_clone (tree new_decl, profile_count 
prof_count,
 count = count.combine_with_ipa_count (count.ipa () - prof_count.ipa ());
 }
   new_node->decl = new_decl;
+  new_node->order = order;
   new_node->register_symbol ();
   new_node->lto_file_data = lto_file_data;
   new_node->analyzed = analyzed;
-- 
2.47.0



[PATCH 1/3] dwarf: Delete dead code.

2024-11-06 Thread Michal Jires
This if branch checks for comdat_type_p (GTY union tag) and then uses
the wrong union variant, die_id.die_symbol. There is no way to create
this combination of valid values even if we ignore the GTY.

Running the testsuite with abort() in the branch confirms that it is
never taken.

gcc/ChangeLog:

* dwarf2out.cc (output_comp_unit): Delete dead code.
---
 gcc/dwarf2out.cc | 25 +
 1 file changed, 5 insertions(+), 20 deletions(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 38aedb64470..e10a5c78fe9 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -11234,8 +11234,7 @@ static void
 output_comp_unit (dw_die_ref die, int output_if_empty,
  const unsigned char *dwo_id)
 {
-  const char *secname, *oldsym;
-  char *tmp;
+  const char *oldsym;
 
   /* Unless we are outputting main CU, we may throw away empty ones.  */
   if (!output_if_empty && die->die_child == NULL)
@@ -11269,21 +11268,10 @@ output_comp_unit (dw_die_ref die, int output_if_empty,
   calc_die_sizes (die);
 
   oldsym = die->die_id.die_symbol;
-  if (oldsym && die->comdat_type_p)
-{
-  tmp = XALLOCAVEC (char, strlen (oldsym) + 24);
 
-  sprintf (tmp, ".gnu.linkonce.wi.%s", oldsym);
-  secname = tmp;
-  die->die_id.die_symbol = NULL;
-  switch_to_section (get_section (secname, SECTION_DEBUG, NULL));
-}
-  else
-{
-  switch_to_section (debug_info_section);
-  ASM_OUTPUT_LABEL (asm_out_file, debug_info_section_label);
-  info_section_emitted = true;
-}
+  switch_to_section (debug_info_section);
+  ASM_OUTPUT_LABEL (asm_out_file, debug_info_section_label);
+  info_section_emitted = true;
 
   /* For LTO cross unit DIE refs we want a symbol on the start of the
  debuginfo section, not on the CU DIE.  */
@@ -11322,10 +11310,7 @@ output_comp_unit (dw_die_ref die, int output_if_empty,
   /* Leave the marks on the main CU, so we can check them in
  output_pubnames.  */
   if (oldsym)
-{
-  unmark_dies (die);
-  die->die_id.die_symbol = oldsym;
-}
+unmark_dies (die);
 }
 
 /* Whether to generate the DWARF accelerator tables in .debug_pubnames
-- 
2.47.0



Re: [COMMITED] [lto] ipcp don't propagate where not needed

2024-11-06 Thread Michal Jires
On Wed, 2024-11-06 at 17:33:50 +, Jonathan Wakely wrote:
> 
> If there's going to be a constructor then it should initialize the members.
> 
> Otherwise, your original patch was better, because you could write
> this to get an all-zeros object:
> 
>   lto_encoder_entry e{};
> 
> Now you can't safely initialize it, because the default constructor
> leaves everything indeterminate. That's just a bug waiting to happen.
> 

Using all-zeros would probably be a bug anyway, and explicitly initializing
might encourage thinking that such default values are supposed to be
used.

Anyway, I misread the code for which this was needed, and we can
trivially get rid of it.
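
For readers following the constructor discussion, a small sketch of the
two shapes being compared. Both structs are hypothetical reductions, not
the real lto_encoder_entry: an aggregate can be zeroed with `{}`, while a
type whose only constructor takes a node cannot be default constructed at
all, so there is no indeterminate state to worry about.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical aggregate: no user-provided constructor, so
// `plain_entry e{}` value-initializes every member to zero.
struct plain_entry
{
  void *node;
  bool in_partition;
  bool body;
};

// Hypothetical reduction of the final shape: the only constructor takes
// a node and initializes every member.
struct checked_entry
{
  explicit checked_entry (void *n)
    : node (n), in_partition (false), body (false)
  {}

  void *node;
  bool in_partition;
  bool body;
};

// Default construction is rejected at compile time.
static_assert (!std::is_default_constructible<checked_entry>::value,
               "checked_entry must be built from a node");

// Exercises both shapes.
static void
check_entries ()
{
  plain_entry zeroed{};
  assert (zeroed.node == nullptr && !zeroed.in_partition && !zeroed.body);

  int dummy = 0;
  checked_entry e (&dummy);
  assert (e.node == &dummy && !e.in_partition && !e.body);
}
```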

Is this now OK?

---
 gcc/lto-cgraph.cc  | 3 +--
 gcc/lto-streamer.h | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index b18d2b34e46..c9b846a04d6 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -142,7 +142,6 @@ lto_symtab_encoder_delete_node (lto_symtab_encoder_t 
encoder,
symtab_node *node)
 {
   int index;
-  lto_encoder_entry last_node;
 
   size_t *slot = encoder->map->get (node);
   if (slot == NULL || !*slot)
@@ -153,7 +152,7 @@ lto_symtab_encoder_delete_node (lto_symtab_encoder_t 
encoder,
 
   /* Remove from vector. We do this by swapping node with the last element
  of the vector.  */
-  last_node = encoder->nodes.pop ();
+  lto_encoder_entry last_node = encoder->nodes.pop ();
   if (last_node.node != node)
 {
   bool existed = encoder->map->put (last_node.node, index + 1);
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 1c416a7a1b9..294e7b3e328 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -443,8 +443,7 @@ struct lto_stats_d
 /* Entry of LTO symtab encoder.  */
 struct lto_encoder_entry
 {
-  /* Constructors.  */
-  lto_encoder_entry () {}
+  /* Constructor.  */
   lto_encoder_entry (symtab_node* n)
 : node (n), in_partition (false), body (false), only_for_inlining (true),
   initializer (false)
-- 
2.47.0



[PATCH 4/7 v3] lto: Implement ltrans cache

2024-11-08 Thread Michal Jires
Changes from previous version:

1) As suggested, I replaced md5 with sha1.
Though I have not been able to measure a difference.

Checksum computation will be later moved to WPA before partitions are
streamed to disk.

2) File comparison with mmap.

Ltrans cache overhead when compiling cc1:
whole LTO (WPA + LTRANS(16 threads) + cache): 280 s - with no cache hit
WPA: 45 s
cache:
  Checksum computation: 0.5 s
  File comparison: 0.1 s - with full cache hits (2 s in previous patch)

3) Replaced the envar usage with flags.
To enable efficient Incremental LTO, other flags are needed anyway.

--

This patch implements Incremental LTO as ltrans cache.

The cache stores pairs of ltrans input/output files together with the
input file hash.
File locking is used to allow multiple GCC instances to use the same cache.
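
To make the description concrete, here is a minimal in-memory sketch of
a cache keyed by input checksum. Everything in it is an assumption made
for illustration: the toy_* names, the 32-byte array used for the
checksum, and the std::map storage. It leaves out file locking, pruning
and the on-disk format, and it is not the code in lto-ltrans-cache.cc.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Assumed checksum representation for this sketch.
using toy_checksum = std::array<std::uint8_t, 32>;

// One cache entry: an ltrans input file and the output compiled from it.
struct toy_item
{
  std::string input;
  std::string output;
};

class toy_ltrans_cache
{
public:
  // Returns the entry whose input had CHECKSUM, or nullptr on a miss.
  const toy_item *lookup (const toy_checksum &checksum) const
  {
    auto it = items.find (checksum);
    return it == items.end () ? nullptr : &it->second;
  }

  // Records ITEM under CHECKSUM, replacing any previous entry.
  void add (const toy_checksum &checksum, toy_item item)
  {
    assert (!item.input.empty () && !item.output.empty ());
    items[checksum] = std::move (item);
  }

private:
  std::map<toy_checksum, toy_item> items;
};
```

On a hit the stored output can be reused instead of running LTRANS; a
real cache would presumably also compare the input files themselves,
since equal checksums alone do not prove equal contents.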

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* Makefile.in: Add lto-ltrans-cache.o.
* lto-wrapper.cc: Use ltrans cache.
* lto-ltrans-cache.cc: New file.
* lto-ltrans-cache.h: New file.
---
 gcc/Makefile.in |   5 +-
 gcc/common.opt  |   8 +
 gcc/lto-ltrans-cache.cc | 434 
 gcc/lto-ltrans-cache.h  | 144 +
 gcc/lto-opts.cc |   2 +
 gcc/lto-wrapper.cc  | 164 +--
 6 files changed, 742 insertions(+), 15 deletions(-)
 create mode 100644 gcc/lto-ltrans-cache.cc
 create mode 100644 gcc/lto-ltrans-cache.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index eb9e52dce83..c114f5b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1866,7 +1866,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) 
$(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
   $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
-  lto-wrapper.o collect-utils.o lockfile.o
+  lto-wrapper.o collect-utils.o lockfile.o lto-ltrans-cache.o
 
 # for anything that is shared use the cc1plus profile data, as that
 # is likely the most exercised during the build
@@ -2395,7 +2395,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
@TARGET_SYSTEM_ROOT_DEFINE@
 
-LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
+LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o \
+  lto-ltrans-cache.o
 
 lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
diff --git a/gcc/common.opt b/gcc/common.opt
index 164cec7dc32..d364ab8d791 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2232,6 +2232,14 @@ flto=
 Common RejectNegative Joined Var(flag_lto)
 Link-time optimization with number of parallel jobs or jobserver.
 
+flto-incremental=
+Common Joined Var(flag_lto_incremental)
+Enable incremental LTO, with its cache in given directory.
+
+flto-incremental-cache-size=
+Common Joined RejectNegative UInteger Var(flag_lto_incremental_cache_size) Init(2048)
+Number of cache entries in incremental LTO after which to prune old entries.
+
 Enum
 Name(lto_partition_model) Type(enum lto_partition_model) UnknownError(unknown LTO partitioning model %qs)
 
diff --git a/gcc/lto-ltrans-cache.cc b/gcc/lto-ltrans-cache.cc
new file mode 100644
index 000..dfc1de1dfe3
--- /dev/null
+++ b/gcc/lto-ltrans-cache.cc
@@ -0,0 +1,434 @@
+/* File caching.
+   Copyright (C) 2023-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define INCLUDE_ALGORITHM
+#define INCLUDE_STRING
+#define INCLUDE_ARRAY
+#define INCLUDE_MAP
+#define INCLUDE_VECTOR
+#include "config.h"
+#include "system.h"
+#include "sha1.h"
+#include "lto-ltrans-cache.h"
+
+static const checksum_t NULL_CHECKSUM = {
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+};
+
+/* Computes checksum for given file, returns NULL_CHECKSUM if not
+   possible.
+ */
+static checksum_t
+file_checksum (char const *filename)
+{
+  FILE *file = fopen (filename, "rb");
+
+  if (!file)
+return NULL_CHECKSUM;
+
+  checksum_t result = NULL_CHECKSUM;
+
+  int ret = sha1_stream (file, &result);
+
+  if (ret)
+result = NULL_CHECKSUM;
+
+  fclose (file);
+
+  return result;
+}
+
+/* Checks identity of two files.  */
+static bool
+files_identical

[PATCH 2/3 v2] dwarf: lto: Allow die_symbol outside of comp_unit.

2024-12-12 Thread Michal Jires
On Wed, 2024-11-27 at 15:18:39 +, Richard Biener wrote:
> I'm not sure it will work this way together with the output_die hunk,
> instead
> assemblers likely expect all this to happen close to the actual label
> emission, so I suggest to only split out the visibility/globalizing fancy
> and emit it from output_die instead.

Thanks, apparently I somehow got the idea that the
globalization/weakening of symbols had to be kept together.
That turns out not to be needed, so I moved everything to output_die,
next to label emission.
Michal

---

Die symbols are used for external references.
Typically during LTO, early debug emits 'die_symbol+offset' for each
DIE that may be referenced later. Partitions in the LTRANS phase then
use these references.

Originally, die symbols were handled only in the root comp_unit and
in attributes.

This patch allows die symbols to be attached to any DIE.
References then choose the closest parent with a die symbol.
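
The lookup can be sketched roughly as follows. The toy_die struct and
ref_for_die are hypothetical reductions written for this note; in
particular they ignore the comdat_type_p union tag that the real
dwarf2out_die_ref_for_decl has to check before reading die_symbol.

```cpp
#include <cassert>

// Hypothetical reduction of a DIE: parent link, optional symbol and
// offset from the start of the unit.
struct toy_die
{
  const toy_die *parent;
  const char *symbol;    // nullptr when the DIE has no symbol.
  unsigned long offset;
};

// Finds the closest DIE, starting at DIE itself and walking up, that has
// a symbol.  On success stores the symbol in *SYM and the distance from
// that symbol to the original DIE in *OFF.
static bool
ref_for_die (const toy_die *die, const char **sym, unsigned long *off)
{
  assert (die && sym && off);
  unsigned long target_offset = die->offset;

  while (die->parent && !die->symbol)
    die = die->parent;

  if (!die->symbol)
    return false;

  *sym = die->symbol;
  *off = target_offset - die->offset;
  return true;
}
```

With a symbol on an intermediate DIE, the emitted offset is relative to
that DIE rather than to the unit, so changes elsewhere in the unit no
longer shift it.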

gcc/ChangeLog:

* dwarf2out.cc (dwarf2out_die_ref_for_decl):
  Choose closest parent with die_symbol.
(output_die): Output asm label.
(output_comp_unit): Output die_symbol list.
(reset_dies): Reset all die_symbols.
(dwarf2out_finish): Don't reset comp_unit die_symbol.
---
 gcc/dwarf2out.cc | 69 +++-
 1 file changed, 33 insertions(+), 36 deletions(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 6bb73c6e5c6..1e55b900712 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -6039,14 +6039,14 @@ dwarf2out_die_ref_for_decl (tree decl, const char **sym,
 
   /* Similar to get_ref_die_offset_label, but using the "correct"
  label.  */
-  *off = die->die_offset;
-  while (die->die_parent)
+  unsigned HOST_WIDE_INT unit_offset = die->die_offset;
+  while (die->die_parent && (die->comdat_type_p || !die->die_id.die_symbol))
 die = die->die_parent;
-  /* For the containing CU DIE we compute a die_symbol in
+  /* Root CU DIE always contains die_symbol computed in
  compute_comp_unit_symbol.  */
-  if (die->die_tag == DW_TAG_compile_unit)
+  if (!die->comdat_type_p && die->die_id.die_symbol)
 {
-  gcc_assert (die->die_id.die_symbol != NULL);
+  *off = unit_offset - die->die_offset;
   *sym = die->die_id.die_symbol;
   return true;
 }
@@ -10798,6 +10798,29 @@ output_die (dw_die_ref die)
   unsigned long size;
   unsigned ix;
 
+  /* Output die_symbol.  */
+  if ((flag_generate_lto || flag_generate_offload)
+  && !die->comdat_type_p && die->die_id.die_symbol)
+{
+  const char* sym = die->die_id.die_symbol;
+  /*tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+ get_identifier (sym), char_type_node);
+  TREE_PUBLIC (decl) = true;
+  TREE_STATIC (decl) = true;
+  DECL_ARTIFICIAL (decl) = true;
+  DECL_VISIBILITY (decl) = VISIBILITY_HIDDEN;
+  DECL_VISIBILITY_SPECIFIED (decl) = true;
+  targetm.asm_out.assemble_visibility (decl, VISIBILITY_HIDDEN);*/
+#ifdef ASM_WEAKEN_LABEL
+  /* We prefer a .weak because that handles duplicates from duplicate
+archive members in a graceful way.  */
+  ASM_WEAKEN_LABEL (asm_out_file, sym);
+#else
+  targetm.asm_out.globalize_label (asm_out_file, sym);
+#endif
+  ASM_OUTPUT_LABEL (asm_out_file, sym);
+}
+
   dw2_asm_output_data_uleb128 (die->die_abbrev, "(DIE (%#lx) %s)",
   (unsigned long)die->die_offset,
   dwarf_tag_name (die->die_tag));
@@ -11234,8 +11257,6 @@ static void
 output_comp_unit (dw_die_ref die, int output_if_empty,
  const unsigned char *dwo_id)
 {
-  const char *oldsym;
-
   /* Unless we are outputting main CU, we may throw away empty ones.  */
   if (!output_if_empty && die->die_child == NULL)
 return;
@@ -11267,35 +11288,10 @@ output_comp_unit (dw_die_ref die, int output_if_empty,
 : DWARF_COMPILE_UNIT_HEADER_SIZE);
   calc_die_sizes (die);
 
-  oldsym = die->die_id.die_symbol;
-
   switch_to_section (debug_info_section);
   ASM_OUTPUT_LABEL (asm_out_file, debug_info_section_label);
   info_section_emitted = true;
 
-  /* For LTO cross unit DIE refs we want a symbol on the start of the
- debuginfo section, not on the CU DIE.  */
-  if ((flag_generate_lto || flag_generate_offload) && oldsym)
-{
-  /* ???  No way to get visibility assembled without a decl.  */
-  tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
- get_identifier (oldsym), char_type_node);
-  TREE_PUBLIC (decl) = true;
-  TREE_STATIC (decl) = true;
-  DECL_ARTIFICIAL (decl) = true;
-  DECL_VISIBILITY (decl) = VISIBILITY_HIDDEN;
-  DECL_VISIBILITY_SPECIFIED (decl) = true;
-  targetm.asm_out.assemble_visibility (decl, VISIBILITY_HIDDEN);
-#ifdef ASM_WEAKEN_LABEL
-  /* We prefer a .weak because that handles duplicates from duplicate
- archive

[PATCH 3/3 v2] lto: Remap node order for stability.

2024-12-12 Thread Michal Jires
On Sun, 2024-11-17 at 19:15:04 +, Jan Hubicka wrote:
> 
> I would suggest renaming produce_asm to produce_symbol_asm 
> and making produce_asm wrapper which passes fn=NULL and output_order=-1,
> so we do not have odd parameters everywhere in streaming code.
> 
> OK with this change.
> Honza

Applied suggested change.
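
The shape of the suggested change, as a sketch only. The
output_block_stub type is a hypothetical placeholder that merely records
the call, and the parameter types are simplified; the point is that
callers without a symbol go through the one-argument wrapper.

```cpp
#include <cassert>

// Hypothetical stand-in for struct output_block; it only records what
// the last call was given.
struct output_block_stub
{
  const void *last_fn;
  int last_order;
  bool emitted;
};

// Worker: emits the section for FN using OUTPUT_ORDER.  Here it just
// records its arguments.
static void
produce_symbol_asm (output_block_stub *ob, const void *fn, int output_order)
{
  assert (ob);
  ob->last_fn = fn;
  ob->last_order = output_order;
  ob->emitted = true;
}

// Wrapper for sections not tied to a symbol: fn is NULL and the order
// is -1, so summary writers do not have to spell out those arguments.
static void
produce_asm (output_block_stub *ob)
{
  produce_symbol_asm (ob, nullptr, -1);
}
```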



This patch adds remapping of node order for each LTO partition.
The resulting order preserves relative order inside the partition, but
is independent of outside symbols. So if an LTO partition contains an
identical set of symbols, their remapped order will be stable
between compilations.

This stability is needed for Incremental LTO.
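
A minimal sketch of the remapping idea, under assumptions: the name
make_order_remap and the use of std::map are mine, and collapsing equal
orders (clones sharing an order) into one slot is one possible choice,
not necessarily what create_order_remap does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Maps each distinct order found in a partition to its rank among the
// partition's orders: 0, 1, 2, ...  Relative order inside the partition
// is kept, while the absolute values no longer matter.
static std::map<int, int>
make_order_remap (std::vector<int> orders)
{
  std::sort (orders.begin (), orders.end ());
  orders.erase (std::unique (orders.begin (), orders.end ()), orders.end ());

  std::map<int, int> remap;
  for (std::size_t i = 0; i < orders.size (); i++)
    remap[orders[i]] = static_cast<int> (i);

  assert (remap.size () == orders.size ());
  return remap;
}
```

For example, a partition whose symbols have orders 7, 40 and 1000 gets
0, 1 and 2. If new symbols elsewhere shift those orders to 12, 45 and
1005, the remapped values are still 0, 1 and 2, which is the stability
the cache depends on.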

gcc/ChangeLog:

* ipa-devirt.cc (ipa_odr_summary_write):
  Add unused argument.
* ipa-fnsummary.cc (ipa_fn_summary_write): Likewise.
* ipa-icf.cc (sem_item_optimizer::write_summary): Likewise.
* ipa-modref.cc (modref_write): Likewise.
* ipa-prop.cc (ipa_prop_write_jump_functions): Likewise.
(ipcp_write_transformation_summaries): Likewise.
* ipa-sra.cc (ipa_sra_write_summary): Likewise.
* lto-cgraph.cc (lto_symtab_encoder_delete): Delete remap.
(lto_output_node): Remap order.
(lto_output_varpool_node): Likewise.
(output_cgraph_opt_summary): Add unused argument.
* lto-streamer-out.cc (produce_symbol_asm): Renamed. Use remapped order.
(produce_asm): Rename. New wrapper.
(output_function): Propagate remapped order.
(output_constructor): Likewise.
(copy_function_or_variable): Likewise.
(cmp_int): New.
(create_order_remap): New.
(lto_output): Create remap. Remap order.
* lto-streamer.h (struct lto_symtab_encoder_d): Remap hash_map.
(produce_asm): Add order argument.
---
 gcc/ipa-devirt.cc   |  2 +-
 gcc/ipa-fnsummary.cc|  2 +-
 gcc/ipa-icf.cc  |  2 +-
 gcc/ipa-modref.cc   |  4 +-
 gcc/ipa-prop.cc |  4 +-
 gcc/ipa-sra.cc  |  2 +-
 gcc/lto-cgraph.cc   | 10 +++--
 gcc/lto-streamer-out.cc | 93 +++--
 gcc/lto-streamer.h  |  5 ++-
 9 files changed, 99 insertions(+), 25 deletions(-)

diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index e88e9db781e..cdd520ba76b 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -4131,7 +4131,7 @@ ipa_odr_summary_write (void)
   odr_enum_map = NULL;
 }
 
-  produce_asm (ob, NULL);
+  produce_asm (ob);
   destroy_output_block (ob);
 }
 
diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index 3f5e09960ef..c057536f551 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -5091,7 +5091,7 @@ ipa_fn_summary_write (void)
}
 }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob);
   destroy_output_block (ob);
 
   ipa_prop_write_jump_functions ();
diff --git a/gcc/ipa-icf.cc b/gcc/ipa-icf.cc
index 60152e60bc5..e9c5ae764f0 100644
--- a/gcc/ipa-icf.cc
+++ b/gcc/ipa-icf.cc
@@ -2216,7 +2216,7 @@ sem_item_optimizer::write_summary (void)
 }
 
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob);
   destroy_output_block (ob);
 }
 
diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
index 7449041c102..e68f434aa10 100644
--- a/gcc/ipa-modref.cc
+++ b/gcc/ipa-modref.cc
@@ -3746,7 +3746,7 @@ modref_write ()
 {
   streamer_write_uhwi (ob, 0);
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob);
   destroy_output_block (ob);
   return;
 }
@@ -3821,7 +3821,7 @@ modref_write ()
}
 }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob);
   destroy_output_block (ob);
 }
 
diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 9070a45f683..86044e392aa 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -5338,7 +5338,7 @@ ipa_prop_write_jump_functions (void)
 ipa_write_node_info (ob, node);
 }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob);
   destroy_output_block (ob);
 }
 
@@ -5536,7 +5536,7 @@ ipcp_write_transformation_summaries (void)
write_ipcp_transformation_info (ob, cnode, ts);
 }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob);
   destroy_output_block (ob);
 }
 
diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index 04920f2aa8e..e6a75139eb0 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -2898,7 +2898,7 @@ ipa_sra_write_summary (void)
 isra_write_node_summary (ob, node);
 }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob);
   destroy_output_block (ob);
 }
 
diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index d1d63fd90ea..14275ed7c42 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -96,6 +96,8 @@ lto_symtab_encoder_delete

[PATCH 4/7 v4] lto: Implement ltrans cache

2024-12-13 Thread Michal Jires
On Thu, 2024-12-12 at 15:48:19 +, Jan Hubicka wrote:
> fgetc has kind of non-trivial overhead.  For non-MMAP systems (is
> Windows such?), I think allocating some buffer, say 64K
> and doing fread/memcmp is probably better.
Ok, changed to fread/memcmp fallback.

> Isn't std::string always 0 terminated?
It appears so, removed the push_back.

> Patch is OK, but please update the fgetc based file compare.
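
For reference, the fread/memcmp approach discussed above can be sketched
like this. It is written for this note, not taken from the patch; the
function name and the fixed 64K buffer are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Returns true if both files can be read and have identical contents.
// Reads both files in 64K chunks and compares each pair of chunks.
static bool
files_identical_fread (const char *first, const char *second)
{
  assert (first && second);

  FILE *a = fopen (first, "rb");
  if (!a)
    return false;
  FILE *b = fopen (second, "rb");
  if (!b)
    {
      fclose (a);
      return false;
    }

  const std::size_t buf_size = 64 * 1024;
  std::vector<char> buf_a (buf_size), buf_b (buf_size);
  bool same = true;

  for (;;)
    {
      std::size_t na = fread (buf_a.data (), 1, buf_size, a);
      std::size_t nb = fread (buf_b.data (), 1, buf_size, b);

      // Different chunk lengths or different bytes: not identical.
      if (na != nb || memcmp (buf_a.data (), buf_b.data (), na) != 0)
        {
          same = false;
          break;
        }

      // A short read means end of file or a read error on both streams.
      if (na < buf_size)
        {
          if (ferror (a) || ferror (b))
            same = false;
          break;
        }
    }

  fclose (a);
  fclose (b);
  return same;
}
```

Compared with a per-byte fgetc loop, this makes one library call per
64K chunk on each file, which is the overhead the review comment was
about.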

---

This patch implements Incremental LTO as ltrans cache.

The cache stores pairs of ltrans input/output files together with the
input file hash.
File locking is used to allow multiple GCC instances to use the same cache.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* Makefile.in: Add lto-ltrans-cache.o.
* lto-wrapper.cc: Use ltrans cache.
* lto-ltrans-cache.cc: New file.
* lto-ltrans-cache.h: New file.
---
 gcc/Makefile.in |   5 +-
 gcc/common.opt  |   8 +
 gcc/lto-ltrans-cache.cc | 437 
 gcc/lto-ltrans-cache.h  | 144 +
 gcc/lto-opts.cc |   2 +
 gcc/lto-wrapper.cc  | 164 +--
 6 files changed, 745 insertions(+), 15 deletions(-)
 create mode 100644 gcc/lto-ltrans-cache.cc
 create mode 100644 gcc/lto-ltrans-cache.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index bb82d402ed0..bca3e94aec8 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1879,7 +1879,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) 
$(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
   $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
-  lto-wrapper.o collect-utils.o lockfile.o
+  lto-wrapper.o collect-utils.o lockfile.o lto-ltrans-cache.o
 
 # for anything that is shared use the cc1plus profile data, as that
 # is likely the most exercised during the build
@@ -2541,7 +2541,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
@TARGET_SYSTEM_ROOT_DEFINE@
 
-LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
+LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o \
+  lto-ltrans-cache.o
 
 lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
diff --git a/gcc/common.opt b/gcc/common.opt
index a42537c5f1e..0afcac0fa1c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2236,6 +2236,14 @@ flto=
 Common RejectNegative Joined Var(flag_lto)
 Link-time optimization with number of parallel jobs or jobserver.
 
+flto-incremental=
+Common Joined Var(flag_lto_incremental)
+Enable incremental LTO, with its cache in given directory.
+
+flto-incremental-cache-size=
+Common Joined RejectNegative UInteger Var(flag_lto_incremental_cache_size) Init(2048)
+Number of cache entries in incremental LTO after which to prune old entries.
+
 Enum
 Name(lto_partition_model) Type(enum lto_partition_model) UnknownError(unknown LTO partitioning model %qs)
 
diff --git a/gcc/lto-ltrans-cache.cc b/gcc/lto-ltrans-cache.cc
new file mode 100644
index 000..c3e26f84072
--- /dev/null
+++ b/gcc/lto-ltrans-cache.cc
@@ -0,0 +1,437 @@
+/* File caching.
+   Copyright (C) 2023-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define INCLUDE_ALGORITHM
+#define INCLUDE_STRING
+#define INCLUDE_ARRAY
+#define INCLUDE_MAP
+#define INCLUDE_VECTOR
+#include "config.h"
+#include "system.h"
+#include "sha1.h"
+#include "lto-ltrans-cache.h"
+
+static const checksum_t NULL_CHECKSUM = {
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+};
+
+/* Computes checksum for given file, returns NULL_CHECKSUM if not
+   possible.  */
+static checksum_t
+file_checksum (char const *filename)
+{
+  FILE *file = fopen (filename, "rb");
+
+  if (!file)
+return NULL_CHECKSUM;
+
+  checksum_t result = NULL_CHECKSUM;
+
+  int ret = sha1_stream (file, &result);
+
+  if (ret)
+result = NULL_CHECKSUM;
+
+  fclose (file);
+
+  return result;
+}
+
+/* Checks identity of two files.  */
+static bool
+files_identical (char const *first_filename, char const *second_filename)
+{
+  bool ret = true;
+
+#if HAVE_MMAP_FILE
+  struct stat st;
+  if (stat (first_filename, &st) < 0 || !

[PATCH] lto: Remove link() to fix build with MinGW [PR118238]

2025-01-13 Thread Michal Jires
I used link() to create cheap copies of Incremental LTO cache contents
to prevent their deletion once linking is finished.
This is unnecessary, since output_files are deleted in our lto-plugin
and not in the linker itself.

Bootstrapped/regtested on x86_64-linux.
lto-wrapper now builds on MinGW again, though so far I have not set up
MinGW to be able to do a full bootstrap.
Ok for trunk?

PR lto/118238

gcc/ChangeLog:

* lto-wrapper.cc (run_gcc): Remove link() copying.

lto-plugin/ChangeLog:

* lto-plugin.c (cleanup_handler):
Keep output_files when using Incremental LTO.
(onload): Detect Incremental LTO.
---
 gcc/lto-wrapper.cc  | 34 +-
 lto-plugin/lto-plugin.c |  9 +++--
 2 files changed, 12 insertions(+), 31 deletions(-)

diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index f9b2511c38e..a980b208783 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -1571,6 +1571,8 @@ run_gcc (unsigned argc, char *argv[])
  /* Exists.  */
  if (access (option->arg, W_OK) == 0)
ltrans_cache_dir = option->arg;
+ else
+   fatal_error (input_location, "missing directory: %s", option->arg);
  break;
 
case OPT_flto_incremental_cache_size_:
@@ -2218,39 +2220,13 @@ cont:
{
  for (i = 0; i < nr; ++i)
{
- char *input_name = input_names[i];
- char const *output_name = output_names[i];
-
  ltrans_file_cache::item* item;
- item = ltrans_cache.get_item (input_name);
+ item = ltrans_cache.get_item (input_names[i]);
 
- if (item && !save_temps)
+ if (item)
{
+ /* Ensure LTRANS for this item finished.  */
  item->lock.lock_read ();
- /* Ensure that cached compiled file is not deleted.
-Create copy.  */
-
- obstack_grow (&env_obstack, output_name,
-   strlen (output_name) - 2);
- obstack_grow (&env_obstack, ".cache_copy.XXX.o",
-   sizeof (".cache_copy.XXX.o"));
-
- char* output_name_link = XOBFINISH (&env_obstack, char *);
- char* name_idx = output_name_link + strlen (output_name_link)
-  - strlen ("XXX.o");
-
- /* lto-wrapper can run in parallel and access
-the same partition.  */
- for (int j = 0; ; j++)
-   {
- gcc_assert (j < 1000);
- sprintf (name_idx, "%03d.o", j);
-
- if (link (output_name, output_name_link) != EEXIST)
-   break;
-   }
-
- output_names[i] = output_name_link;
  item->lock.unlock ();
}
}
diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index 6bccb56291c..6c78d019cf1 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -214,6 +214,7 @@ static char *ltrans_objects = NULL;
 
 static bool debug;
 static bool save_temps;
+static bool flto_incremental;
 static bool verbose;
 static char nop;
 static char *resolution_file = NULL;
@@ -941,8 +942,9 @@ cleanup_handler (void)
   if (arguments_file_name)
 maybe_unlink (arguments_file_name);
 
-  for (i = 0; i < num_output_files; i++)
-maybe_unlink (output_files[i]);
+  if (!flto_incremental)
+for (i = 0; i < num_output_files; i++)
+  maybe_unlink (output_files[i]);
 
   free_2 ();
   return LDPS_OK;
@@ -1615,6 +1617,9 @@ onload (struct ld_plugin_tv *tv)
   if (strstr (collect_gcc_options, "'-save-temps'"))
save_temps = true;
 
+  if (strstr (collect_gcc_options, "'-flto-incremental="))
+   flto_incremental = true;
+
   if (strstr (collect_gcc_options, "'-v'")
   || strstr (collect_gcc_options, "'--verbose'"))
verbose = true;
-- 
2.47.1



[committed] lto: Fix empty fcntl.h build error with MinGW.

2025-01-12 Thread Michal Jires
MSYS2+MinGW contains headers without defining expected contents.
This fix checks that the fcntl function is actually defined.
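
A sketch of the pattern for a POSIX-like host. The TOY_USE_FCNTL macro
and toy_* functions are hypothetical, and testing F_SETLKW directly
stands in for the configure-provided HOST_HAS_F_SETLKW used by the
patch: locking code is compiled only when the needed fcntl command is
really available, not merely when the header exists.

```cpp
#include <cassert>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// The header may exist without the locking commands, so test for the
// command itself.  (Stand-in for a configure check.)
#if defined (F_SETLKW)
#define TOY_USE_FCNTL 1
#endif

// Takes a blocking write lock on the whole file behind FD.
// Returns 0 on success, -1 on failure or when locking is unsupported.
static int
toy_lock_write (int fd)
{
  if (fd < 0)
    return -1;

#ifdef TOY_USE_FCNTL
  struct flock s_flock = {};
  s_flock.l_whence = SEEK_SET;
  s_flock.l_start = 0;
  s_flock.l_len = 0;        // Zero length means "to end of file".
  s_flock.l_type = F_WRLCK;
  return fcntl (fd, F_SETLKW, &s_flock) == 0 ? 0 : -1;
#else
  return -1;
#endif
}

// Reports whether toy_lock_write can do real locking on this host.
static bool
toy_lock_supported ()
{
#ifdef TOY_USE_FCNTL
  return true;
#else
  return false;
#endif
}
```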

Bootstrapped/regtested on x86_64-linux. Committed as obvious.

gcc/ChangeLog:

* lockfile.cc (LOCKFILE_USE_FCNTL): New.
(lockfile::lock_write): Use LOCKFILE_USE_FCNTL.
(lockfile::try_lock_write): Use LOCKFILE_USE_FCNTL.
(lockfile::lock_read): Use LOCKFILE_USE_FCNTL.
(lockfile::unlock): Use LOCKFILE_USE_FCNTL.
(lockfile::lockfile_supported): Use LOCKFILE_USE_FCNTL.
---
 gcc/lockfile.cc | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/lockfile.cc b/gcc/lockfile.cc
index b385c295851..cecbb86491d 100644
--- a/gcc/lockfile.cc
+++ b/gcc/lockfile.cc
@@ -22,6 +22,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "lockfile.h"
 
+/* fcntl.h may exist without expected contents.  */
+#if HAVE_FCNTL_H && HOST_HAS_F_SETLKW
+#define LOCKFILE_USE_FCNTL 1
+#endif
 
 /* Unique write lock.  No other lock can be held on this lockfile.
Blocking call.  */
@@ -32,7 +36,7 @@ lockfile::lock_write ()
   if (fd < 0)
 return -1;
 
-#if HAVE_FCNTL_H
+#ifdef LOCKFILE_USE_FCNTL
   struct flock s_flock;
 
   s_flock.l_whence = SEEK_SET;
@@ -57,7 +61,7 @@ lockfile::try_lock_write ()
   if (fd < 0)
 return -1;
 
-#if HAVE_FCNTL_H
+#ifdef LOCKFILE_USE_FCNTL
   struct flock s_flock;
 
   s_flock.l_whence = SEEK_SET;
@@ -87,7 +91,7 @@ lockfile::lock_read ()
   if (fd < 0)
 return -1;
 
-#if HAVE_FCNTL_H
+#ifdef LOCKFILE_USE_FCNTL
   struct flock s_flock;
 
   s_flock.l_whence = SEEK_SET;
@@ -108,7 +112,7 @@ lockfile::unlock ()
 {
   if (fd < 0)
 {
-#if HAVE_FCNTL_H
+#ifdef LOCKFILE_USE_FCNTL
   struct flock s_flock;
 
   s_flock.l_whence = SEEK_SET;
@@ -128,7 +132,7 @@ lockfile::unlock ()
 bool
 lockfile::lockfile_supported ()
 {
-#if HAVE_FCNTL_H
+#ifdef LOCKFILE_USE_FCNTL
   return true;
 #else
   return false;
-- 
2.47.1



[committed] lto: Pass cache checksum by reference [PR118181]

2025-01-12 Thread Michal Jires
Bootstrapped/regtested on x86_64-linux. Committed as obvious.

PR lto/118181

gcc/ChangeLog:

* lto-ltrans-cache.cc (ltrans_file_cache::create_item):
Pass checksum by reference.
* lto-ltrans-cache.h: Likewise.
---
 gcc/lto-ltrans-cache.cc | 2 +-
 gcc/lto-ltrans-cache.h  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/lto-ltrans-cache.cc b/gcc/lto-ltrans-cache.cc
index 22c0bffaed5..c57775fae85 100644
--- a/gcc/lto-ltrans-cache.cc
+++ b/gcc/lto-ltrans-cache.cc
@@ -309,7 +309,7 @@ ltrans_file_cache::save_cache ()
 
Must be called with creation_lock held to prevent data race.  */
 ltrans_file_cache::item*
-ltrans_file_cache::create_item (checksum_t checksum)
+ltrans_file_cache::create_item (const checksum_t& checksum)
 {
   size_t prefix_len = cache_prefix.size ();
 
diff --git a/gcc/lto-ltrans-cache.h b/gcc/lto-ltrans-cache.h
index b95f63c3335..5fef44bae53 100644
--- a/gcc/lto-ltrans-cache.h
+++ b/gcc/lto-ltrans-cache.h
@@ -108,7 +108,7 @@ private:
  New input/output files are chosen to not collide with other items.
 
  Must be called with creation_lock held to prevent data race.  */
-  item* create_item (checksum_t checksum);
+  item* create_item (const checksum_t& checksum);
 
   /* Prunes oldest unused cache items over limit.
  Must be called with deletion_lock held to prevent data race.  */
-- 
2.47.1
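
A small sketch of what the signature change buys, assuming checksum_t is a fixed-size byte array (the actual typedef is not shown in this patch). sketch_checksum and both functions are hypothetical; the copy counter exists only to make the difference observable.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for checksum_t that counts its own copies.
struct sketch_checksum
{
  std::array<std::uint8_t, 32> bytes {};
  static int copies;

  sketch_checksum () = default;
  sketch_checksum (const sketch_checksum &other) : bytes (other.bytes)
  { ++copies; }
};
int sketch_checksum::copies = 0;

// Old shape: the argument is copied on every call.
std::uint8_t
first_byte_by_value (sketch_checksum checksum)
{
  return checksum.bytes[0];
}

// New shape: the caller's object is read in place.
std::uint8_t
first_byte_by_ref (const sketch_checksum &checksum)
{
  return checksum.bytes[0];
}

// Usage: only the by-value call bumps the copy counter.
void
sketch_usage ()
{
  sketch_checksum c;
  c.bytes[0] = 7;
  sketch_checksum::copies = 0;

  assert (first_byte_by_value (c) == 7);
  assert (sketch_checksum::copies == 1);

  assert (first_byte_by_ref (c) == 7);
  assert (sketch_checksum::copies == 1);
}
```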



[PATCH] Fix uniqueness of symtab_node::get_dump_name.

2025-01-16 Thread Michal Jires
symtab_node::get_dump_name uses node order to identify nodes.
Order is no longer unique because of the Incremental LTO patches.
This patch moves uid from cgraph_node to symtab_node,
so get_dump_name can use uid instead and dump names are unique again.

In inlining passes, uid is replaced with the summary id, which is more
appropriate there (more compact for indexing).

Bootstrapped/regtested on x86_64-linux.
Ok for trunk?

gcc/ChangeLog:

* cgraph.cc (symbol_table::create_empty):
Move uid to symtab_node.
(test_symbol_table_test): Change expected dump id.
* cgraph.h (struct cgraph_node):
Move uid to symtab_node.
(symbol_table::register_symbol): Likewise.
* dumpfile.cc (test_capture_of_dump_calls):
Change expected dump id.
* ipa-inline.cc (update_caller_keys):
Use summary id instead of uid.
(update_callee_keys): Likewise.
* symtab.cc (symtab_node::get_dump_name):
Use uid instead of order.

gcc/testsuite/ChangeLog:

* gcc.dg/live-patching-1.c: Change expected dump id.
* gcc.dg/live-patching-4.c: Likewise.
---
 gcc/cgraph.cc  |  4 ++--
 gcc/cgraph.h   | 25 ++---
 gcc/dumpfile.cc|  8 
 gcc/ipa-inline.cc  |  6 +++---
 gcc/symtab.cc  |  2 +-
 gcc/testsuite/gcc.dg/live-patching-1.c |  2 +-
 gcc/testsuite/gcc.dg/live-patching-4.c |  2 +-
 7 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index 83a9b59ef30..d0b19ad850e 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -290,7 +290,7 @@ cgraph_node *
 symbol_table::create_empty (void)
 {
   cgraph_count++;
-  return new (ggc_alloc<cgraph_node> ()) cgraph_node (cgraph_max_uid++);
+  return new (ggc_alloc<cgraph_node> ()) cgraph_node ();
 }
 
 /* Register HOOK to be called with DATA on each removed edge.  */
@@ -4338,7 +4338,7 @@ test_symbol_table_test ()
   /* Verify that the node has order 0 on both iterations,
 and thus that nodes have predictable dump names in selftests.  */
   ASSERT_EQ (node->order, 0);
-  ASSERT_STREQ (node->dump_name (), "test_decl/0");
+  ASSERT_STREQ (node->dump_name (), "test_decl/1");
 }
 }
 
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 7856d53c9e9..065fcc742e8 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -124,7 +124,7 @@ public:
   order (-1), next_sharing_asm_name (NULL),
   previous_sharing_asm_name (NULL), same_comdat_group (NULL), ref_list (),
   alias_target (NULL), lto_file_data (NULL), aux (NULL),
-  x_comdat_group (NULL_TREE), x_section (NULL)
+  x_comdat_group (NULL_TREE), x_section (NULL), m_uid (-1)
   {}
 
   /* Return name.  */
@@ -492,6 +492,12 @@ public:
   /* Perform internal consistency checks, if they are enabled.  */
   static inline void checking_verify_symtab_nodes (void);
 
+  /* Get unique identifier of the node.  */
+  inline int get_uid ()
+  {
+return m_uid;
+  }
+
   /* Type of the symbol.  */
   ENUM_BITFIELD (symtab_type) type : 8;
 
@@ -668,6 +674,9 @@ protected:
  void *data,
  bool include_overwrite);
 private:
+  /* Unique id of the node.  */
+  int m_uid;
+
   /* Workers for set_section.  */
   static bool set_section_from_string (symtab_node *n, void *s);
   static bool set_section_from_node (symtab_node *n, void *o);
@@ -882,7 +891,7 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   friend class symbol_table;
 
   /* Constructor.  */
-  explicit cgraph_node (int uid)
+  explicit cgraph_node ()
 : symtab_node (SYMTAB_FUNCTION), callees (NULL), callers (NULL),
   indirect_calls (NULL),
   next_sibling_clone (NULL), prev_sibling_clone (NULL), clones (NULL),
@@ -903,7 +912,7 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   redefined_extern_inline (false), tm_may_enter_irr (false),
   ipcp_clone (false), gc_candidate (false),
   called_by_ifunc_resolver (false), has_omp_variant_constructs (false),
-  m_uid (uid), m_summary_id (-1)
+  m_summary_id (-1)
   {}
 
   /* Remove the node from cgraph and all inline clones inlined into it.
@@ -1304,12 +1313,6 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
 dump_cgraph (stderr);
   }
 
-  /* Get unique identifier of the node.  */
-  inline int get_uid ()
-  {
-return m_uid;
-  }
-
   /* Get summary id of the node.  */
   inline int get_summary_id ()
   {
@@ -1503,8 +1506,6 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   unsigned has_omp_variant_constructs : 1;
 
 private:
-  /* Unique id of the node.  */
-  int m_uid;
 
   /* Summary id that is recycled.  */
   int m_summary_id;
@@ -2815,6 +2816,8 @@ symbol_table::register_symbol (symtab_node *node)
 nodes->previous = node;
   nodes = node;
 
+  nodes->m_uid = cgraph_max_uid++;
+
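
To illustrate why the dump name switches from order to uid, here is a minimal sketch with hypothetical stand-ins for symtab_node and symbol_table. Two nodes may share an order value, but a uid handed out from a single counter at registration cannot repeat. The counter starting at 1 is an assumption inferred from the selftest expectation changing from "test_decl/0" to "test_decl/1".

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for symtab_node; only the two ids matter here.
struct sketch_node
{
  std::string name;
  int order = -1;	// may repeat across nodes
  int uid = -1;		// assigned once, at registration

  std::string dump_name_by_order () const
  { return name + "/" + std::to_string (order); }

  std::string dump_name_by_uid () const
  { return name + "/" + std::to_string (uid); }
};

// Hypothetical stand-in for symbol_table.
struct sketch_table
{
  int max_uid = 1;	// assumed start value, see the note above

  // Every registered node gets the next value of one shared counter.
  void register_symbol (sketch_node &node)
  {
    node.uid = max_uid++;
    assert (node.uid > 0);
  }
};
```

With two nodes named "foo" that both have order 0, dump_name_by_order gives "foo/0" for each, while dump_name_by_uid gives "foo/1" and "foo/2".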

[PATCH] lto: Fix missing cleanup with incremental LTO.

2025-03-15 Thread Michal Jires
Incremental LTO disabled cleanup of output_files since they have to
persist in the ltrans cache.
This unintentionally also kept the temporary early debug "*.debug.temp.o"
files.  With this patch, only files ending in ".ltrans.o" are kept.

Bootstrapped/regtested on x86_64-linux.
Ok for trunk?

lto-plugin/ChangeLog:

* lto-plugin.c (cleanup_handler): Keep only files in ltrans
cache.
---
 lto-plugin/lto-plugin.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index 3d272551fed..09d5441ecc7 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -945,6 +945,17 @@ cleanup_handler (void)
   if (!flto_incremental)
 for (i = 0; i < num_output_files; i++)
   maybe_unlink (output_files[i]);
+  else
+{
+  /* Keep files in ltrans cache.  */
+  const char* suffix = ".ltrans.o";
+  for (i = 0; i < num_output_files; i++)
+   {
+ int offset = strlen (output_files[i]) - strlen (suffix);
+ if (offset < 0 || strcmp (output_files[i] + offset, suffix))
+   maybe_unlink (output_files[i]);
+   }
+}
 
   free_2 ();
   return LDPS_OK;
-- 
2.48.1
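
The suffix test in the hunk above can be sketched as a standalone helper. This is an illustration only: sketch_has_suffix and sketch_should_unlink are hypothetical names, the file names in the usage function are made up, and the sketch is written in C++ although lto-plugin.c is C. It compares lengths as unsigned values instead of computing a signed offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// True if NAME ends with SUFFIX.  The lengths are compared first, so the
// pointer arithmetic below never goes before the start of NAME.
static bool
sketch_has_suffix (const char *name, const char *suffix)
{
  std::size_t name_len = std::strlen (name);
  std::size_t suffix_len = std::strlen (suffix);
  if (name_len < suffix_len)
    return false;
  return std::strcmp (name + (name_len - suffix_len), suffix) == 0;
}

// Decision made per output file when incremental LTO is enabled:
// everything except ".ltrans.o" files is removed.
static bool
sketch_should_unlink (const char *output_file)
{
  return !sketch_has_suffix (output_file, ".ltrans.o");
}

// Usage with made-up file names.
void
sketch_usage ()
{
  assert (sketch_should_unlink ("a.debug.temp.o"));
  assert (!sketch_should_unlink ("cache/x.ltrans.o"));
}
```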



[COMMITTED] doc: Regenerate common.opt.urls

2025-03-17 Thread Michal Jires
Regenerated common.opt.urls, which I missed until the autobuilder noticed.

Committed as obvious.

gcc/ChangeLog:

* common.opt.urls: Regenerate.
---
 gcc/common.opt.urls | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index 79c322bed2b..ac602631179 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -935,6 +935,12 @@ UrlSuffix(gcc/Optimize-Options.html#index-flto)
 flto=
 UrlSuffix(gcc/Optimize-Options.html#index-flto)
 
+flto-incremental=
+UrlSuffix(gcc/Optimize-Options.html#index-flto-incremental)
+
+flto-incremental-cache-size=
+UrlSuffix(gcc/Optimize-Options.html#index-flto-incremental-cache-size)
+
 flto-partition=
 UrlSuffix(gcc/Optimize-Options.html#index-flto-partition)
 
-- 
2.48.1



Re: [COMMITTED,wwwdocs] Mention Incremental LTO in GCC15

2025-03-27 Thread Michal Jires
(already Ok-ed off-list, since I forgot to Cc: )

On Thu, 2025-03-27 at 14:40:57 +0100, Gerald Pfeifer wrote:
> On Thu, 27 Mar 2025, Michal Jires wrote:
> > +  Introduced incremental Link-Time Optimizations to significantly reduce
> > +average recompilation time with small code changes while using LTO.
> 
> How about rephrasing this to "Incremental Link-Time Optimizations 
> significantly reduce average recompilation time with only small changes to 
> generated code..."?
> 
> Go ahead with this or variation if you agree with the general idea; if I 
> misunderstood the intent, please push back and let me know. :-)

Thanks a lot, you probably misunderstood. The generated code should be
identical for identical source code with or without Incremental LTO,
which your phrasing seems to heavily imply is not the case (and mine
could be interpreted that way). I will have to make it clearer.


Maybe:
```
Incremental Link-Time Optimizations significantly reduce average
recompilation time of LTO when doing small code edits
(e.g. editing a single function).
```
explains it better?

Michal

> 
> Gerald

---
 htdocs/gcc-15/changes.html | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index dbc82be2..5c802a6b 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -59,6 +59,13 @@ a work-in-progress.
 system and user time. This reduces the overhead of the option significantly,
 making it possible to use in standard build systems.
   
+  Incremental Link-Time Optimizations significantly reduce average
+recompilation time of LTO when doing small code edits
+(e.g. editing a single function).
+Enable with https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flto-incremental";
+>-flto-incremental=.
+  
 
 
 
-- 
2.48.1



Re: [PATCH] doc: document incremental LTO flags

2025-03-28 Thread Michal Jires
On Thu, 2025-03-27 at 15:33:44 +, Sam James wrote:
> 
> One thing I wasn't quite sure on yet: is -flto-partition=cache automatic
> with -flto-incremental? Or is it just an optional flag I can pass for
> more effective incremental LTO?
> 
> If it's the latter, should we mention that in the -flto-incremental
> documentation?
> 

It is not automatic, because different partitioning will result in a
different executable. Most of the time this should not matter, but, for
example, a performance bug that depends on instruction alignment would
not be reproduced.

Cache partitioning is most useful when there are many divergences
per diverging partition. That made it very useful at the start, but it
happens less with each divergence I remove.
Last time I measured it, the improvement was no longer noticeable
without debug symbols and was only a few percent with debug
symbols, with one outlier case being ~50 % worse.

The benefits are minor, a bit unclear, and caveats are hard to properly
explain. So I do not want to actively recommend the option for now.

> > [...]
> 
> Thanks for working on incremental LTO. I had the opportunity to use it
> for a bug for the first time last weekend and enjoyed it.

Thanks, glad it is already useful.

Michal


Re: [PATCH 3/5] ipa: Dump cgraph_node UID instead of order into ipa-clones dump file

2025-04-30 Thread Michal Jires
On Mon, 2025-04-28 at 16:10:58 +0200, Martin Jambor wrote:
> Hi,
> 
> starting with GCC 15 the order is not unique for any symtab_nodes but
> m_uid is, I believe we ought to dump the latter in the ipa-clones dump,
> if only so that people can reliably match entries about new clones to
> those about removed nodes (if any).
> 
> Bootstrapped and tested on x86_64-linux. OK for master and gcc 15?
> 
> Thanks,
> 
> Martin
> 

We probably want the following changes as well.
These should cover all dumps affected by the order/uid change.

I am not sure whether they belong in this patch or in a separate one.

Michal

---
 gcc/ipa-cp.cc  | 2 +-
 gcc/ipa-sra.cc | 2 +-
 gcc/symtab.cc  | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index f7e5aa9bfd5..16ab608e82b 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -288,7 +288,7 @@ ipcp_lattice::print (FILE * f, bool dump_sources, bool dump_benefits)
  else
fprintf (f, " [scc: %i, from:", val->scc_no);
  for (s = val->sources; s; s = s->next)
-   fprintf (f, " %i(%f)", s->cs->caller->order,
+   fprintf (f, " %i(%f)", s->cs->caller->get_uid (),
 s->cs->sreal_frequency ().to_double ());
  fprintf (f, "]");
}
diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index 1331ba49b50..88bfae9502c 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -4644,7 +4644,7 @@ ipa_sra_summarize_function (cgraph_node *node)
 {
   if (dump_file)
 fprintf (dump_file, "Creating summary for %s/%i:\n", node->name (),
-node->order);
+node->get_uid ());
   gcc_obstack_init (&gensum_obstack);
   loaded_decls = new hash_set<tree>;
 
diff --git a/gcc/symtab.cc b/gcc/symtab.cc
index fe9c031247f..fc1155f4696 100644
--- a/gcc/symtab.cc
+++ b/gcc/symtab.cc
@@ -989,10 +989,10 @@ symtab_node::dump_base (FILE *f)
 same_comdat_group->dump_asm_name ());
   if (next_sharing_asm_name)
 fprintf (f, "  next sharing asm name: %i\n",
-next_sharing_asm_name->order);
+next_sharing_asm_name->get_uid ());
   if (previous_sharing_asm_name)
 fprintf (f, "  previous sharing asm name: %i\n",
-previous_sharing_asm_name->order);
+previous_sharing_asm_name->get_uid ());
 
   if (address_taken)
 fprintf (f, "  Address is taken.\n");
-- 
2.49.0



[PATCH] doc: document incremental LTO flags

2025-03-13 Thread Michal Jires
This adds the missing documentation for the incremental LTO flags.

Ok?

gcc/ChangeLog:

* doc/invoke.texi (Optimize Options):
Add incremental LTO flags.
---
 gcc/doc/invoke.texi | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4fbb4cda101..3efc6602898 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -601,7 +601,8 @@ Objective-C and Objective-C++ Dialects}.
 -floop-block  -floop-interchange  -floop-strip-mine
 -floop-unroll-and-jam  -floop-nest-optimize
 -floop-parallelize-all  -flra-remat  -flto  -flto-compression-level
--flto-partition=@var{alg}  -fmalloc-dce -fmerge-all-constants
+-flto-partition=@var{alg} -flto-incremental=@var{path}
+-flto-incremental-cache-size=@var{n} -fmalloc-dce -fmerge-all-constants
 -fmerge-constants  -fmodulo-sched  -fmodulo-sched-allow-regmoves
 -fmove-loop-invariants  -fmove-loop-stores  -fno-branch-count-reg
 -fno-defer-pop  -fno-fp-int-builtin-inexact  -fno-function-cse
@@ -15086,8 +15087,10 @@ Specify the partitioning algorithm used by the link-time optimizer.
 The value is either @samp{1to1} to specify a partitioning mirroring
 the original source files or @samp{balanced} to specify partitioning
 into equally sized chunks (whenever possible) or @samp{max} to create
-new partition for every symbol where possible.  Specifying @samp{none}
-as an algorithm disables partitioning and streaming completely.
+new partition for every symbol where possible or @samp{cache} to
+balance chunk sizes while keeping related symbols together for better
+caching in incremental LTO.  Specifying @samp{none} as an algorithm
+disables partitioning and streaming completely.
 The default value is @samp{balanced}. While @samp{1to1} can be used
 as an workaround for various code ordering issues, the @samp{max}
 partitioning is intended for internal testing only.
@@ -15095,6 +15098,23 @@ The value @samp{one} specifies that exactly one partition should be
 used while the value @samp{none} bypasses partitioning and executes
 the link-time optimization step directly from the WPA phase.
 
+@opindex flto-incremental
+@item -flto-incremental=@var{path}
+Enable incremental LTO, with its cache in the given existing directory.
+Can significantly shorten edit-compile cycles with LTO.
+
+When used with LTO (@option{-flto}), the output of translation units
+inside LTO is cached. Cached translation units are likely to be
+encountered again when recompiling with small code changes, leading to
+recompile time reduction.
+
+Multiple GCC instances can use the same cache in parallel.
+
+@opindex flto-incremental-cache-size
+@item -flto-incremental-cache-size=@var{n}
+Specifies the number of cache entries in incremental LTO after which to
+prune old entries.  This is a soft limit; there may temporarily be more.
+
 @opindex flto-compression-level
 @item -flto-compression-level=@var{n}
 This option specifies the level of compression used for intermediate
-- 
2.48.1
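
As a rough illustration of what a soft entry limit with pruning of old entries could look like, here is a sketch under stated assumptions. sketch_item, its last_used field and sketch_prune are hypothetical and are not the ltrans cache implementation; in particular the sketch ignores locking and whether an item is currently in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cache entry: an id plus a "last used" stamp.
struct sketch_item
{
  int id;
  long last_used;
};

// If more than LIMIT items exist, keep the LIMIT most recently used ones
// and drop the rest.  Between two calls the cache may exceed LIMIT, which
// is what makes the limit soft.
void
sketch_prune (std::vector<sketch_item> &items, std::size_t limit)
{
  if (items.size () <= limit)
    return;

  // Most recently used first.
  std::sort (items.begin (), items.end (),
	     [] (const sketch_item &a, const sketch_item &b)
	     { return a.last_used > b.last_used; });
  items.erase (items.begin () + limit, items.end ());
}

// Usage: three entries, limit two, the oldest one (id 1) is dropped.
void
sketch_usage ()
{
  std::vector<sketch_item> items = { {1, 10}, {2, 30}, {3, 20} };
  sketch_prune (items, 2);
  assert (items.size () == 2);
  assert (items[0].id == 2 && items[1].id == 3);
}
```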



[PATCH,wwwdocs] Mention Incremental LTO in GCC15

2025-03-27 Thread Michal Jires
This patch adds a mention of the new Incremental LTO to the
General Improvements section of gcc-15/changes.html.

Ok?

---
 htdocs/gcc-15/changes.html | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index dbc82be2..8a050aed 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -59,6 +59,12 @@ a work-in-progress.
 system and user time. This reduces the overhead of the option significantly,
 making it possible to use in standard build systems.
   
+  Introduced incremental Link-Time Optimizations to significantly reduce
+average recompilation time with small code changes while using LTO.
+Enable with https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flto-incremental";
+>-flto-incremental=.
+  
 
 
 
-- 
2.48.1