On Fri, 2026-01-30 at 16:27 +0000, Qing Zhao wrote:
> Hi, David,
> 
> Thanks a lot for the information. It is very interesting and
> promising.

Thanks.

> 
> I do have two questions:
> 
> 1. When you wrote the prototype that embeds SARIF as an ELF section,
> did you collect
> any data on the code size increase of the final object files? 

I didn't collect realistic data.

FWIW I've uploaded the patch I had to
https://dmalcolm.fedorapeople.org/gcc/2026-01-30/0001-Initial-proof-of-concept-of-writing-sarif-to-asm-plu.patch
but it's heavily bit-rotted against trunk.

By way of example, in the same directory are a test.s and a test.o
generated using the patch on a trivial C file:

$ cat test.c
int i;
static int j;

$ ./cc1 -quiet test.c \
    -fdiagnostics-add-output=sarif:section=.sarif.json,serialization=json \
    -fdiagnostics-add-output=sarif:section=.sarif.json5,serialization=json5 \
    -fdiagnostics-add-output=sarif:section=.sarif.cbor,serialization=cbor \
    -o test.s \
    -Wall

test.c:2:12: warning: ‘j’ defined but not used [-Wunused-variable]
    2 | static int j;
      |            ^

$ as test.s -o test.o

$ for s in json json5 cbor ; do eu-readelf test.o -x .sarif.$s | head ; done
Hex dump of section [7] '.sarif.json', 2987 bytes at offset 0x1078:
  0x00000000 7b222473 6368656d 61223a20 22687474 {"$schema": "htt
  0x00000010 70733a2f 2f646f63 732e6f61 7369732d ps://docs.oasis-
  0x00000020 6f70656e 2e6f7267 2f736172 69662f73 open.org/sarif/s
  0x00000030 61726966 2f76322e 312e302f 65727261 arif/v2.1.0/erra
  0x00000040 74613031 2f6f732f 73636865 6d61732f ta01/os/schemas/
  0x00000050 73617269 662d7363 68656d61 2d322e31 sarif-schema-2.1
  0x00000060 2e302e6a 736f6e22 2c0a2022 76657273 .0.json",. "vers
  0x00000070 696f6e22 3a202232 2e312e30 222c0a20 ion": "2.1.0",. 
Hex dump of section [6] '.sarif.json5', 2699 bytes at offset 0x5ed:
  0x00000000 7b222473 6368656d 61223a20 22687474 {"$schema": "htt
  0x00000010 70733a2f 2f646f63 732e6f61 7369732d ps://docs.oasis-
  0x00000020 6f70656e 2e6f7267 2f736172 69662f73 open.org/sarif/s
  0x00000030 61726966 2f76322e 312e302f 65727261 arif/v2.1.0/erra
  0x00000040 74613031 2f6f732f 73636865 6d61732f ta01/os/schemas/
  0x00000050 73617269 662d7363 68656d61 2d322e31 sarif-schema-2.1
  0x00000060 2e302e6a 736f6e22 2c0a2076 65727369 .0.json",. versi
  0x00000070 6f6e3a20 22322e31 2e30222c 0a207275 on: "2.1.0",. ru
Hex dump of section [5] '.sarif.cbor', 1410 bytes at offset 0x6b:
  0x00000000 a3672473 6368656d 61785a68 74747073 .g$schemaxZhttps
  0x00000010 3a2f2f64 6f63732e 6f617369 732d6f70 ://docs.oasis-op
  0x00000020 656e2e6f 72672f73 61726966 2f736172 en.org/sarif/sar
  0x00000030 69662f76 322e312e 302f6572 72617461 if/v2.1.0/errata
  0x00000040 30312f6f 732f7363 68656d61 732f7361 01/os/schemas/sa
  0x00000050 7269662d 73636865 6d612d32 2e312e30 rif-schema-2.1.0
  0x00000060 2e6a736f 6e677665 7273696f 6e65322e .jsongversione2.
  0x00000070 312e3064 72756e73 81a56474 6f6f6ca1 1.0druns..dtool.

Dumping the sections:

$ objcopy test.o /dev/null --dump-section .sarif.json=/dev/stdout | head
{"$schema": "https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/schemas/sarif-schema-2.1.0.json",
 "version": "2.1.0",
 "runs": [{"tool": {"driver": {"name": "GNU C23",
                               "fullName": "GNU C23 (GCC) version 16.0.0 20250505 (experimental) (x86_64-pc-linux-gnu)",
                               "version": "16.0.0 20250505 (experimental)",
                               "informationUri": "https://gcc.gnu.org/gcc-16/",
                               "rules": [{"id": "-Wunused-variable",
                                          "helpUri": "https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wno-unused-variable"}]}},
           "invocations": [{"arguments": ["./cc1",
                                          "-quiet",


$ objcopy test.o /dev/null --dump-section .sarif.cbor=/dev/stdout | cbor2pretty.rb | head
a3                                      # map(3)
   67                                   # text(7)
      24736368656d61                    # "$schema"
   78 5a                                # text(90)
      68747470733a2f2f646f63732e6f617369732d6f70656e2e6f72672f73617269662f73617269662f76322e312e302f65727261746130312f6f732f736368656d61732f73617269662d736368656d612d322e312e302e6a736f6e # "https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/schemas/sarif-schema-2.1.0.json"
   67                                   # text(7)
      76657273696f6e                    # "version"
   65                                   # text(5)
      322e312e30                        # "2.1.0"
   64                                   # text(4)

but I think gzipping the json would be simpler and likely more space-
efficient than using CBOR.
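
As a rough way to sanity-check the gzip idea, here is a minimal Python
sketch that compares the size of a JSON serialization before and after
gzip.  The SARIF-like log it builds is made up for illustration (the
helper names build_sample_sarif and compare_sizes are hypothetical, and
this is not data from the patch); it says nothing about CBOR, since
that would need a third-party encoder.

```python
import gzip
import json


def build_sample_sarif(num_results: int) -> dict:
    """Build a small, made-up SARIF-like log (illustrative only)."""
    results = [
        {
            "ruleId": "-Wunused-variable",
            "level": "warning",
            "message": {"text": f"'v{i}' defined but not used"},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": "test.c"},
                "region": {"startLine": i + 1, "startColumn": 12}}}],
        }
        for i in range(num_results)
    ]
    return {"version": "2.1.0",
            "runs": [{"tool": {"driver": {"name": "GNU C23"}},
                      "results": results}]}


def compare_sizes(log: dict) -> tuple[int, int]:
    """Return (raw JSON size, gzipped size) in bytes for the given log."""
    raw = json.dumps(log).encode("utf-8")
    packed = gzip.compress(raw, compresslevel=9)
    # Check that the compressed form round-trips to the same document.
    assert json.loads(gzip.decompress(packed)) == log
    return len(raw), len(packed)


if __name__ == "__main__":
    raw_size, gz_size = compare_sizes(build_sample_sarif(20))
    print(f"raw JSON: {raw_size} bytes, gzipped: {gz_size} bytes")
```

The same comparison could be run on real data by dumping the
.sarif.json section with objcopy as above and feeding the resulting
file to gzip.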

Replaying the diagnostics in test.o using sarif-replay:

$ objcopy test.o /dev/null --dump-section .sarif.json=tmp.json \
    && LD_LIBRARY_PATH=. ./sarif-replay tmp.json
test.c:2:12: warning: ‘j’ defined but not used [-Wunused-variable]
    2 | static int j;
      |            ^

So presumably we could do something similar with optimization records.
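
For anyone without a sarif-replay build to hand, the replay step can be
roughly approximated with a few lines of Python.  This is only a sketch
under assumptions: it is not how sarif-replay works, it prints just the
first location of each result, and it omits the source-line quoting and
caret.  The function names and the SAMPLE log are hypothetical; the
property names (runs, results, ruleId, level, message.text,
physicalLocation, artifactLocation.uri, region.startLine/startColumn)
are those of SARIF 2.1.0.

```python
import json

# Hand-written stand-in for the log embedded in test.o (an assumption,
# not the actual section contents).
SAMPLE = {
    "version": "2.1.0",
    "runs": [{"results": [{
        "ruleId": "-Wunused-variable",
        "level": "warning",
        "message": {"text": "‘j’ defined but not used"},
        "locations": [{"physicalLocation": {
            "artifactLocation": {"uri": "test.c"},
            "region": {"startLine": 2, "startColumn": 12}}}],
    }]}],
}


def format_results(sarif_log: dict) -> list[str]:
    """Return one GCC-style text line per SARIF result."""
    lines = []
    for run in sarif_log.get("runs", []):
        for result in run.get("results", []):
            text = result.get("message", {}).get("text", "")
            level = result.get("level", "warning")  # SARIF's default level
            rule = result.get("ruleId")
            prefix = ""
            locations = result.get("locations", [])
            if locations:
                phys = locations[0].get("physicalLocation", {})
                prefix = phys.get("artifactLocation", {}).get("uri", "<unknown>")
                region = phys.get("region", {})
                if "startLine" in region:
                    prefix += f":{region['startLine']}"
                    if "startColumn" in region:
                        prefix += f":{region['startColumn']}"
                prefix += ": "
            suffix = f" [{rule}]" if rule else ""
            lines.append(f"{prefix}{level}: {text}{suffix}")
    return lines


def format_file(path: str) -> list[str]:
    """Load a SARIF file (e.g. the tmp.json dumped above) and format it."""
    with open(path, encoding="utf-8") as f:
        return format_results(json.load(f))


if __name__ == "__main__":
    print("\n".join(format_results(SAMPLE)))
```

Run on SAMPLE, this prints
"test.c:2:12: warning: ‘j’ defined but not used [-Wunused-variable]".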


> 2. What are the major concerns when we decide whether to dump the
> optimization info to a 
> separate file, or embed the optimization info into the object file?

For my use-case, I was thinking of diagnostics and build metadata, as a
kind of "annobin on steroids".   I don't know what the pros/cons of
embedding vs separate file would be for optimization info.

Dave


> 
> Thanks a lot.
> 
> Qing
> 
> > On Jan 29, 2026, at 11:58, David Malcolm <[email protected]>
> > wrote:
> > 
> > On Wed, 2026-01-28 at 17:53 -0500, Siddhesh Poyarekar wrote:
> > > On 2026-01-28 10:41, Qing Zhao via Gcc wrote:
> > > > Does GCC provide any option to record optimization information,
> > > > such as inlining, loop transformation,
> > > >   profiling consistency, etc into specific sections of binary
> > > > code?
> > > 
> > > I may be misremembering this, but I think David had some ideas
> > > about 
> > > doing something like this in SARIF.
> > > 
> > 
> > Several thoughts here:
> > 
> > (a) I've written a prototype that embeds SARIF as an ELF section in
> > the
> > generated object file, rather like debuginfo (my idea at the time
> > being
> > that a binary could contain within it its build flags and other
> > metadata, and its diagnostics, etc).  I don't think I posted it to
> > the
> > mailing list though.
> > 
> > (b) A long time ago I prototyped a gcc implementation of llvm's
> > idea of
> > optimization remarks, to send optimization info through the
> > diagnostics
> > subsystem, but IIRC that work ended up as the revamp of optinfo (in
> > GCC
> > 9?; see my Cauldron 2018 talk on optimization records), which
> > generalized some of the internals of how we track optimization
> > info. 
> > The machine-readable output is a custom json-based format.
> > 
> > (c) SARIF would probably be a good fit for optimization records;
> > it's
> > machine-readable, and has a rich vocabulary for source locations,
> > code
> > constructs, machine locations, etc; IDEs and other tooling
> > understand
> > it, so they'd get a source-level view of optimization info "for
> > free".
> > Note that currently our SARIF output captures the contents of every
> > source file referred to by any diagnostics, but we could e.g.
> > capture
> > every source file/header used during the compile, and could capture
> > e.g. SHA1 sums rather than file content.
> > 
> > (d) I've added the ability to add custom info to diagnostic sinks;
> > see
> > e.g. capturing CFG information in 
> > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e20eee3897ae8cd0f2212dad0710d64df8f1a956
> > 
> > (e) I've added a new publish/subscribe framework to GCC for loosely
> > coupled notifications that would probably help with the
> > implementation
> > (to avoid needing to have the diagnostics subsystem "know" too much
> > about the optimizer).
> > 
> > So possible GCC 17 material might be:
> > 
> > (d) add a new sink to the optinfo subsystem that adds a new pub/sub
> > channel about optimization info, and sends notifications about the
> > optimization records there
> > 
> > (e) add a new option to -fdiagnostics-add-output to capture
> > optinfo,
> > which when enabled subscribes the diagnostic sink to the optinfo
> > notifications channel.  Or we just skip (d) and work more directly
> > with
> > optinfo, but (d) allows some extra flexibility e.g. for plugins
> > that
> > listen for optimization decisions.
> > 
> > (f) potentially add a new option to the SARIF sink to support
> > embedding
> > the data in an ELF section, rather than writing to a file (as per
> > (a)
> > above).
> > 
> > Brainstorming, the user might be able to do something like:
> > 
> > -fdiagnostics-add-output=sarif:elf-section=optimizations,optinfo=inline
> > 
> > or whatnot, and have an ELF section capturing the decisions made by
> > the
> > inliner.
> > 
> > Or we could have an option to send optinfo as diagnostics, like
> > LLVM's
> > optimization records (and (b) above), and have the diagnostics
> > sinks
> > handle them that way (text, SARIF, HTML).
> > 
> > Dave
> > 
> 
