On 11/13/19 10:37 AM, Martin Liška wrote:
On 11/7/19 3:50 PM, Egeyar Bagcioglu wrote:
On 11/7/19 10:24 AM, Martin Liška wrote:
On 11/6/19 6:21 PM, Egeyar Bagcioglu wrote:
Hello,
Hello.
Thanks for your detailed reply Martin. You'll find my reply inline.
Since you added Nick Clifton to your following reply, I am adding him
to this email too. He is not only the author of annobin, he also
submitted the -frecord-gcc-switches to GCC. I agree that this
discussion can benefit from his input.
I would like to propose the following patches which introduce a
compile option --record-gcc-command-line. When passed to gcc, it
saves the command line option into the produced object file. The
option makes it trivial to trace back how a file was compiled and
by which version of the gcc. It helps with debugging, reproducing
bugs and repeating the build process.
I like your motivation, we as SUSE would like to have a similar
functionality. But the current approach has some limitations that
make it not usable (will explain later).
I am glad you agree with the motivation. Let me answer below the
other concerns that you have.
This option is similar to -frecord-gcc-switches. However, they have
three fundamental differences: Firstly, -frecord-gcc-switches saves
the internal state after the argv is processed and passed by the
driver. As opposed to that, --record-gcc-command-line saves the
I would not name it as a fundamental changes, it's doing very
similar to what -frecord-gcc-switches does.
It is very similar; however, I still insist that what I outlined are
fundamental differences. As I mentioned in my previous email, I built
binutils as my test-case-project. I attach to this email the output
of "readelf -p .GCC.command.line ld/ld-new", so that you can see how
well the output is merged in general. Please take a look. It saves
the command line *as is* and as one entry per invocation.
Hello.
Ok, works for me and I'm glad you also wrote a counterpart for
bintuils which can easily present the information to a user.
I am glad you liked the output. It is output by readelf without any
additional patches.
For the record, this is just to test and showcase the functionality.
This patch in fact has nothing to do with binutils.
Moreover, we also have one another option -grecord-gcc-switches
that saves command line into DWARF.
As Nick also mentioned many times, -grecord-gcc-switches is in DWARF
and this causes a great disadvantage: it gets stripped out.
Well, that's still something I disagree. I bet RedHat is similarly to
openSUSE also building all packages with a debug info, which
is later stripped and put into a foo-devel package. That's why one can
easily read the compile options from these sub-packages.
My motivation is to write a rpm linter check that will verify that all
object files really used flags that we expect.
I understand your use case. However, some of the use cases we have for
this patch are not for the distros but for the development. Having the
compile options in the object files allows developers to pass around
objects that are compiled differently without needing to tag them
separately. This eases for example the performance analysis. A similar
argument can also be made for reporting bugs.
I believe the -grecord-gcc-switches is moot for the sake of this
discussion. Because I think the discussion surrounding the submission
of -frecord-gcc-switches makes it clear that the necessity to keep
the compile options in the object file is something that is already
agreed on.
Plus there's a Red Hat plugin called Annobin:
https://www.youtube.com/watch?v=uzffr1M-w5M
https://developers.redhat.com/blog/2018/02/20/annobin-storing-information-binaries/
I am aware of annobin, which is already released as a part of RHEL8.
I think it is much superior to what I am aiming here. The sole
purpose of this patch is to keep the command line options in the
object file. I believe this is a basic functionality that should be
supported by the GCC itself, without requiring a plugin.
I fully aggree with you.
In other words, I think pushing a different build of a GCC plugin for
each GCC version we use on each distro (i.e. versions-times-distros
many plugin builds) is an overkill for such a basic need.
Yep.
Those who already use annobin for any of its many use cases, might of
course prefer it over this functionality. For the rest of the distros
and the GCC versions, I believe this patch is quite useful and
extendable for its quite basic purpose.
Main limitation of current approach (and probably the suggested
patch) are:
a) it does not print per function options, which can be modified
with __attribute__ (or pragma):
$ cat demo.c
int foo()
{
return 1;
}
#pragma GCC optimize ("-O3")
int bar()
{
return 0;
}
int main()
{
return 0;
}
I understand the need here. However, the purpose of this patch is
only to save the command line options. Your example is a change in
the source file. Of course, the source file can change. Even the
implementation of the functions themselves might change. But I
believe this is out of the scope of this patch, which is the command
line.
I can easily live with that.
b) we as SUSE are switching to LTO (-flto); doing that each LTO
LTRANS will become one compilation unit and
one will see a misleading command line invocation:
$ gcc -flto -O2 demo2.c -c
$ gcc -flto -O3 demo.c -c
$ gcc demo.o demo2.o -o a.out -frecord-gcc-switches
...
.file "<artificial>"
.section .GCC.command.line,"MS",@progbits,1
.ascii "-mtune=generic"
.zero 1
.ascii "-march=x86-64"
.zero 1
.ascii "-auxbase-strip a.out.ltrans0.ltrans.o"
.zero 1
.ascii "-O3"
.zero 1
.ascii "-fno-openmp"
.zero 1
.ascii "-fno-openacc"
.zero 1
.ascii "-fno-pie"
.zero 1
.ascii "-frecord-gcc-switches"
.zero 1
.ascii "-fltrans"
.zero 1
.ascii "a.out.ltrans0.o"
.zero 1
.text
.type foo, @function
This is a very interesting case indeed. Thanks for bringing it to my
attention. LTO seems to be discarding .GCC.command.line sections.
[egeyar@localhost lto]$ gcc -flto -O3 demo.c -c
--record-gcc-command-line
[egeyar@localhost lto]$ readelf -p .GCC.command.line demo.o
String dump of section '.GCC.command.line':
[ 0] 10.0.0 20191025 (experimental) : gcc -flto -O3 demo.c -c
--record-gcc-command-line
[egeyar@localhost lto]$ gcc -flto -O2 demo2.c -c
--record-gcc-command-line
[egeyar@localhost lto]$ readelf -p .GCC.command.line demo2.o
String dump of section '.GCC.command.line':
[ 0] 10.0.0 20191025 (experimental) : gcc -flto -O2 demo2.c
-c --record-gcc-command-line
[egeyar@localhost lto]$ gcc demo.o demo2.o -o a.out
[egeyar@localhost lto]$ readelf -p .GCC.command.line a.out
readelf: a.out: Warning: Section '.GCC.command.line' was not dumped
because it does not exist!
[egeyar@localhost lto]$ gcc demo.o demo2.o -o a.out
--record-gcc-command-line
[egeyar@localhost lto]$ readelf -p .GCC.command.line a.out
String dump of section '.GCC.command.line':
[ 0] 10.0.0 20191025 (experimental) : gcc @/tmp/ccSlWkxd
Running this last command with "-v" reveals that gcc is called twice
with temporary files. Only the .GCC.command.line of the second call
survives.> I believe the beauty of this implementation is that driver
saves the options *as it receives* in an intermediate file. It then
passes this -or not- to the child processes. I am not familiar with
the internals of LTO. However, thinking that those string sections
can actually be merged and output, this implementation can also be
extended for lto1 to handle the option correctly, and to suppress it
for its calls to gcc.
As already mentioned, the following feature is crucial for me. LTO
works in following steps:
1) you first generate LTO bytecode with -c -flto (so you have N input
files for which you want
to report command lines)
2) Then you do so called WPA phase, you load all N files and stream
bytecode to M files (usually to 128)
3) Then, these M files emit proper .s file and you eventually end up
with an executable (or a shared library).
That said, you have to stream the parsed command line through LTO and
eventually emit multiple records based
for each incoming N files.
Before receiving this email, I checked the linker side of it, only to
realize that the linker is replacing those N files with dummy bfds and
ignores everything that's put in there when the files are claimed by the
plugin. Therefore, it turns out my guess that they needed to be merged
on the linker side might not be the least invasive.
Independently from this patch, I believe it would be good to have LTO
output such sections.
c) Current option recording is missing macros, which can influence
compilation significantly:
-D_FORTIFY_SOURCE=2
This is not the case for this patch:
[egeyar@localhost save-commandline]$ g++ main.c
--record-gcc-command-line -D_FORTIFY_SOURCE=2 -O3
[egeyar@localhost save-commandline]$ readelf -p .GCC.command.line a.out
String dump of section '.GCC.command.line':
[ 0] 10.0.0 20191025 (experimental) : g++ main.c
--record-gcc-command-line -D_FORTIFY_SOURCE=2 -O3
That's fine, thanks.
May I please ask you to take a look at LTO deeper? The patch will
really help us significantly.
I will look into that within a few weeks. Thanks again for bringing this
case to my attention.
Best regards
Egeyar
Thanks,
Martin
The attached file showcases this as well. Please take a look, I think
you might find it useful.
Regards
Egeyar
Martin
command-line as received by the driver. Secondly,
-frecord-gcc-switches saves the switches as separate entries into a
mergeable string section. Therefore, the entries belonging to
different object files get mixed up after being linked. The new
--record-gcc-command-line, on the other hand, creates one entry per
invocation. By doing so, it makes it clear which options were used
together in a single gcc invocation. Lastly,
--record-gcc-command-line also adds the version of the gcc into this
single entry to make it clear which version of gcc was called with
any given command line. This is useful in cases where .comment
section reports multiple versions.
While there are also similarities between the implementations of
these two options, they are completely independent. These commands
can be used separately or together without issues. I used the same
section that -frecord-gcc-switches uses on purpose. I could not use
the name -frecord-gcc-command-line for this option; because of a
{f*} in the specs, which forwards all options starting with -f to
cc1/cc1plus as is. This is not we want for this option. We would
like to append it a filename as well to pass the argv of the driver
to child processes.
This functionality operates as the following: It saves gcc's argv
into a temporary file, and passes --record-gcc-command-line
<tempfilename> to cc1 or cc1plus. The functionality of the backend
is implemented via a hook. This patch includes an example
implementation of the hook for elf targets:
elf_record_gcc_command_line function. This function reads the given
file and writes gcc's version and the command line into a mergeable
string section, .GCC.command.line.
Here is an *example usage* of the option:
[egeyar@localhost save-commandline]$ gcc main.c
--record-gcc-command-line
[egeyar@localhost save-commandline]$ readelf -p .GCC.command.line
a.out
String dump of section '.GCC.command.line':
[ 0] 10.0.0 20191025 (experimental) : gcc main.c
--record-gcc-command-line
The following is a *second example* calling g++ with -save-temps,
-frecord-gcc-switches, and repetition of options, where
--save-temps saves the intermediate file, main.cmdline in this
case. You can see that the options are recorded unprocessed:
[egeyar@localhost save-commandline]$ g++ main.c -save-temps
--record-gcc-command-line -O0 -O2 -O3 --record-gcc-command-line
[egeyar@localhost save-commandline]$ readelf -p .GCC.command.line
a.out
String dump of section '.GCC.command.line':
[ 0] 10.0.0 20191025 (experimental) : g++ main.c
-save-temps --record-gcc-command-line -O0 -O2 -O3
--record-gcc-command-line
Here is a *third example* calling g++ with both
-frecord-gcc-switches and --record-gcc-command-line for comparison:
[egeyar@localhost save-commandline]$ g++ main.c
--record-gcc-command-line -frecord-gcc-switches
[egeyar@localhost save-commandline]$ readelf -p .GCC.command.line
a.out
String dump of section '.GCC.command.line':
[ 0] 10.0.0 20191025 (experimental) : g++ main.c
--record-gcc-command-line -frecord-gcc-switches
[ 5c] -D_GNU_SOURCE
[ 6a] main.c
[ 71] -mtune=generic
[ 80] -march=x86-64
[ 8e] --record-gcc-command-line /tmp/ccgC4ZtS.cmdline
The first patch of this two-patch-series only extends the testsuite
machinery, while the second patch implements this functionality and
adds a test case for it. In addition to that new test case, I built
binutils as my test case after passing this option to CFLAGS. The
added .GCC.command.line section of ld listed many compile commands
as expected. Tested on x86_64-pc-linux-gnu.
Please review the patches, let me know what you think and apply if
appropriate.
Regards
Egeyar
Egeyar Bagcioglu (2):
Introduce dg-require-target-object-format
Introduce the gcc option --record-gcc-command-line
gcc/common.opt | 4 +++
gcc/config/elfos.h | 5 +++
gcc/doc/tm.texi | 22 ++++++++++++
gcc/doc/tm.texi.in | 4 +++
gcc/gcc.c | 41
++++++++++++++++++++++
gcc/gcc.h | 1 +
gcc/target.def | 30
++++++++++++++++
gcc/target.h | 3 ++
.../c-c++-common/record-gcc-command-line.c | 8 +++++
gcc/testsuite/lib/target-supports-dg.exp | 11 ++++++
gcc/toplev.c | 13 +++++++
gcc/varasm.c | 36
+++++++++++++++++++
12 files changed, 178 insertions(+)
create mode 100644
gcc/testsuite/c-c++-common/record-gcc-command-line.c