https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109816
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Tobias Burnus from comment #4)
> (In reply to Jakub Jelinek from comment #3)
> > And we emit all toplevel asms into the offloading target code?
> > Or how does it make into PTX?
>
> It seems as if this is always written (once). Thus, the minimal change would
> be the following. We could save some bits by not writing
> lto_output_toplevel_asms without '-flto', but I think that is not really
> needed, given that top-level asm are rather rare and small.
>
> --- a/gcc/lto-cgraph.cc
> +++ b/gcc/lto-cgraph.cc
> @@ -1587,3 +1587,5 @@ input_cgraph_1 (struct lto_file_decl_data *file_data,
>
> +#ifndef ACCEL_COMPILER
>    lto_input_toplevel_asms (file_data, file_data->order_base);
> +#endif

The above can work only if toplevel asms are in a separate section, so that
inputting them or not doesn't affect the input of other data.
I think it would be better to also not stream them if lto_stream_offload_p.
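
To illustrate the two points above, here is a toy model, not GCC code. All
toy_* names, the section names and the payload strings are made up for this
sketch, and the real LTO streamer is organized differently. It assumes each
kind of data lives in its own named section, which is the condition under
which skipping the read is harmless, and it shows the writer-side guard as
well, so that nothing is streamed for the offload case in the first place.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: these types and functions are hypothetical and are not
   GCC's LTO streamer.  */
enum { TOY_MAX_SECTIONS = 4, TOY_MAX_BYTES = 32 };

struct toy_section
{
  const char *name;
  char data[TOY_MAX_BYTES];
};

struct toy_file
{
  struct toy_section sec[TOY_MAX_SECTIONS];
  size_t count;
};

/* Append a named section holding a short NUL-terminated payload.  */
void
toy_add_section (struct toy_file *f, const char *name, const char *payload)
{
  if (f->count == TOY_MAX_SECTIONS)
    return;
  struct toy_section *s = &f->sec[f->count++];
  s->name = name;
  strncpy (s->data, payload, TOY_MAX_BYTES - 1);
  s->data[TOY_MAX_BYTES - 1] = '\0';
}

/* Look a section up by name.  Because sections are independent, a reader
   that never asks for "asm" is unaffected by whether it exists.  */
const char *
toy_find_section (const struct toy_file *f, const char *name)
{
  for (size_t i = 0; i < f->count; i++)
    if (strcmp (f->sec[i].name, name) == 0)
      return f->sec[i].data;
  return NULL;
}

/* Writer side: when streaming for offload, do not emit the asm section
   at all, so the accelerator compiler has nothing to skip.  */
void
toy_write (struct toy_file *f, int stream_offload_p)
{
  memset (f, 0, sizeof *f);
  toy_add_section (f, "symtab", "main foo bar");
  if (!stream_offload_p)
    toy_add_section (f, "asm", ".globl foo");
}

/* Reader side: the accelerator compiler ignores top-level asm even if
   the section happens to be present.  */
const char *
toy_read_asm (const struct toy_file *f, int accel_compiler)
{
  return accel_compiler ? NULL : toy_find_section (f, "asm");
}
```

With toy_write (&f, 1) the "asm" section is simply absent and "symtab" is
still found. If the asm bytes were instead interleaved into the same stream
as other data, a reader that skipped them would lose its position, which is
why the reader-only change depends on the separate section.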