Hi Richard, Am 02.09.24 um 13:58 schrieb Richard Biener:
Hmm, I can't really follow how and where it's currently decided whether to output offload tables for the LTRANS units
Before the patch, output_offload_tables is called unconditionally, but guarded by the check whether there is anything to output at all. Call trees:
When outputting the .o files, the call is done via ipa_passes → ipa_write_summaries → ipa_write_summaries_1.
This calls ipa_write_summaries twice: once for the offload/for-device LTO section and once for the host LTO section – and both calls are needed.
For the LTO (lto1, ltrans) step, the call tree starts with: do_whole_program_analysis → lto_wpa_write_files → stream_out_partitions → stream_out_partitions_1 → stream_out → ipa_write_optimization_summaries.
Here, stream_out_partitions potentially forks the 'stream_out_partitions_1' calls. And each stream_out_partitions_1 calls for each (of its share) of the partitions 'stream_out' in a loop.
With either code path, the ipa_write... function then calls: write_lto → lto_output → output_offload_tables.
but instead of an odd global variable would it be possible to pass that down as a flag or, alternatively encode that flag in the representation for the LTRANS partition? I suppose that's the out_decl_state?
Actually, I tried follow your initial suggestion of the PR, but now moved to the somewhat clearer out_decl_state.
Tobias
LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535] When ltrans was written concurrently, e.g. via -flto=N (N > 1, assuming sufficient partiations, e.g., via -flto-partition=max), output_offload_tables wrote the output tables once per fork. PR lto/116535 gcc/ChangeLog: * lto-cgraph.cc (output_offload_tables): Remove offload_ frees. * lto-streamer-out.cc (lto_output): Make call to it depend on lto_get_out_decl_state ()->output_offload_tables_p. * lto-streamer.h (struct lto_out_decl_state): Add output_offload_tables_p field. * tree-pass.h (ipa_write_optimization_summaries): Add bool argument. * passes.cc (ipa_write_summaries_1): Add bool output_offload_tables_p arg. (ipa_write_summaries): Update call. (ipa_write_optimization_summaries): Accept output_offload_tables_p. gcc/lto/ChangeLog: * lto.cc (stream_out): Update call to ipa_write_optimization_summaries to pass true for first partition. gcc/lto-cgraph.cc | 10 ---------- gcc/lto-streamer-out.cc | 3 ++- gcc/lto-streamer.h | 3 +++ gcc/lto/lto.cc | 2 +- gcc/passes.cc | 11 ++++++++--- gcc/tree-pass.h | 3 ++- 6 files changed, 16 insertions(+), 16 deletions(-) diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc index 6395033ab9d..1492409427c 100644 --- a/gcc/lto-cgraph.cc +++ b/gcc/lto-cgraph.cc @@ -1139,16 +1139,6 @@ output_offload_tables (void) streamer_write_uhwi_stream (ob->main_stream, 0); lto_destroy_simple_output_block (ob); - - /* In WHOPR mode during the WPA stage the joint offload tables need to be - streamed to one partition only. That's why we free offload_funcs and - offload_vars after the first call of output_offload_tables. */ - if (flag_wpa) - { - vec_free (offload_funcs); - vec_free (offload_vars); - vec_free (offload_ind_funcs); - } } /* Verify the partitioning of NODE. */ diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc index 523d6dad221..a4b171358d4 100644 --- a/gcc/lto-streamer-out.cc +++ b/gcc/lto-streamer-out.cc @@ -2829,7 +2829,8 @@ lto_output (void) statements using the statement UIDs. */ output_symtab (); - output_offload_tables (); + if (lto_get_out_decl_state ()->output_offload_tables_p) + output_offload_tables (); if (flag_checking) { diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h index 79c44d2cae7..4da1a3efe03 100644 --- a/gcc/lto-streamer.h +++ b/gcc/lto-streamer.h @@ -531,6 +531,9 @@ struct lto_out_decl_state /* True if decl state is compressed. */ bool compressed; + + /* True if offload tables should be output. */ + bool output_offload_tables_p; }; typedef struct lto_out_decl_state *lto_out_decl_state_ptr; diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc index 52dd436fd9a..1ee215d8f1d 100644 --- a/gcc/lto/lto.cc +++ b/gcc/lto/lto.cc @@ -178,7 +178,7 @@ stream_out (char *temp_filename, lto_symtab_encoder_t encoder, int part) gcc_assert (!dump_file); streamer_dump_file = dump_begin (TDI_lto_stream_out, NULL, part); - ipa_write_optimization_summaries (encoder); + ipa_write_optimization_summaries (encoder, part == 0); free (CONST_CAST (char *, file->filename)); diff --git a/gcc/passes.cc b/gcc/passes.cc index d73f8ba97b6..057850f4dec 100644 --- a/gcc/passes.cc +++ b/gcc/passes.cc @@ -2829,11 +2829,13 @@ ipa_write_summaries_2 (opt_pass *pass, struct lto_out_decl_state *state) summaries. SET is the set of nodes to be written. */ static void -ipa_write_summaries_1 (lto_symtab_encoder_t encoder) +ipa_write_summaries_1 (lto_symtab_encoder_t encoder, + bool output_offload_tables_p) { pass_manager *passes = g->get_passes (); struct lto_out_decl_state *state = lto_new_out_decl_state (); state->symtab_node_encoder = encoder; + state->output_offload_tables_p = output_offload_tables_p; lto_output_init_mode_table (); lto_push_out_decl_state (state); @@ -2897,7 +2899,8 @@ ipa_write_summaries (void) if (vnode->need_lto_streaming) lto_set_symtab_encoder_in_partition (encoder, vnode); - ipa_write_summaries_1 (compute_ltrans_boundary (encoder)); + ipa_write_summaries_1 (compute_ltrans_boundary (encoder), + flag_generate_offload); free (order); if (streamer_dump_file) @@ -2952,10 +2955,12 @@ ipa_write_optimization_summaries_1 (opt_pass *pass, NULL, write out all summaries of all nodes. */ void -ipa_write_optimization_summaries (lto_symtab_encoder_t encoder) +ipa_write_optimization_summaries (lto_symtab_encoder_t encoder, + bool output_offload_tables_p) { struct lto_out_decl_state *state = lto_new_out_decl_state (); state->symtab_node_encoder = encoder; + state->output_offload_tables_p = output_offload_tables_p; lto_output_init_mode_table (); lto_push_out_decl_state (state); diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 3a0cf13089e..a928cbe4557 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -684,7 +684,8 @@ extern void emergency_dump_function (void); extern void print_current_pass (FILE *); extern void debug_pass (void); extern void ipa_write_summaries (void); -extern void ipa_write_optimization_summaries (struct lto_symtab_encoder_d *); +extern void ipa_write_optimization_summaries (struct lto_symtab_encoder_d *, + bool); extern void ipa_read_summaries (void); extern void ipa_read_optimization_summaries (void); extern void register_one_dump_file (opt_pass *);