On Mon, 2010-06-28 at 23:30 -0700, Ian Lance Taylor wrote:
> Basile Starynkevitch <bas...@starynkevitch.net> writes:
>
> > However, I see a slightly more general use of executable_checksum (or
> > something similar) in plugins. Imagine a plugin that store some
> > information somewhere (e.g. in a database) and which might reload that
> > information later. It could be very useful (for that or such plugin[s])
> > to store a [nearly] unique identifier of the GCC compiler using it with
> > the data (to avoid reusing the same data with a slightly different GCC
> > compiler, eg 4.5.1 vs 4.5.0). Then that plugin would be happy to use
> > the executable_checksum to avoid nightmares when incorrectly reusing
> > some data with a slightly different compiler. And version information is
> > not exactly adequate (the same gcc 4.5.0 could be built & configured
> > differently).
>
> The executable_checksum is very precise, and almost any change to the
> compiler will change it. That is appropriate for a precompiled header,
> but I don't think it is appropriate for testing whether a plugin works.
> For a plugin, I think it should normally be sufficient to record the
> version and configuration information, both of which should be available
> (e.g., gcc -v can print them out).

Do we have a programmatic way to access configuration information (and
not only version information) from inside plugins?

In particular, I believe that binary data stored by potential plugins is
very sensitive to the ENABLE_CHECKING configuration. Some GCC data
structures (accessible by plugins) may even depend on such configuration
options.

The scenario I am considering needs to detect any change to the
compiler, even a small one. I suppose a plugin stores some data about
the compiled file (e.g. in a database) and retrieves the same data
later, so it should be "certain" that the compiler is exactly the same.

I still think that some plugins (MELT in particular) would be happy with
a unique signature of the compiler using them. Actually, I would prefer
it to be a textual signature (because it is simpler to print & to
compare).

In MELT I am just implementing the following precise trick: every MELT
generated C file includes only one single file, "melt-run.h" [which a
few days ago was named run-melt.h]. This file includes all the rest:

/* MELT file melt-run.h included in every generated file.
   all include files for generated code */
#include "gcc-plugin.h"

/* usual GCC middle-end includes, copied from melt-runtime.c */
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "obstack.h"
#include "tm.h"
#include "tree.h"
#include "gimple.h"
#include "filenames.h"
#include "tree-pass.h"
#include "tree-dump.h"
#include "tree-flow.h"
#include "tree-iterator.h"
#include "tree-inline.h"
#include "basic-block.h"
#include "cfgloop.h"
#include "timevar.h"
#include "ggc.h"
#include "cgraph.h"
#include "diagnostic.h"
#include "flags.h"
#include "toplev.h"
#include "options.h"
#include "params.h"
#include "real.h"
#include "prefix.h"
#include "md5.h"
#include "cppdefault.h"

/* MELT specific includes */
#include "ppl_c.h"
#include "melt-runtime.h"

The MELT build computes the md5 hash of the preprocessed output of
melt-run.h and writes it into the generated file melt-run-md5.h:

## file melt-run-md5.h contains only the md5 string of preprocessing
## of melt-run.h and is used to ensure that the melt-run.h is the one
## expected. It is included in melt-runtime.c
melt-run-md5.h: Makefile $(srcdir)/melt-run.h $(CONFIG_H) $(SYSTEM_H) \
	$(TIMEVAR_H) $(TM_H) \
	$(TREE_H) $(GGC_H) $(BASIC_BLOCK_H) $(GIMPLE_H) $(CFGLOOP_H) \
	tree-pass.h $(MELT_H) gt-melt-runtime.h $(PLUGIN_H) $(TOPLEV_H) $(VERSION_H)
	melt_run_md5=`$(CC) -C -E $(ALL_CFLAGS) $(ALL_CPPFLAGS) \
	  $(srcdir)/melt-run.h | md5sum | cut -c 1-32`; \
	echo "const char melt_run_preprocessed_md5[]=\"$$melt_run_md5\";" > $...@-tmp
	$(SHELL) $(srcdir)/../move-if-change $...@-tmp $@

A typical generated melt-run-md5.h file contains only one line like

const char melt_run_preprocessed_md5[]="d5e72c7dd8f4d47ec5b4e996df432d1a";

This is the md5sum of the preprocessor output for melt-run.h, so it
depends on most of the GCC headers.

The MELT infrastructure outputs that hash in generated C files, e.g.

/* hash of preprocessed melt-run.h generating this file: */
const char md5prepromeltrun_melt[]="a67ba20ce4f7a5536152f377645219af";
#include "melt-run.h"

MELT looks this string up with dlsym and is able to issue a warning
when the md5 differs (as it does here, where the two hashes shown do not
match).

I am pretty sure that a plugin which stores some data about the Gimple
of the compiled C file in a MySQL database would also want to store a
precise checksum or hash of the GCC binaries running that plugin. Then,
when that MySQL data is reused days later by this (or a sibling) plugin,
it can warn the user about a mismatch. I would not be surprised if
MILEPOST used such tricks (but I don't know the implementation details
of MILEPOST, so I may be wrong).

So I do believe that some plugins need to retrieve a hash identifying
the precise GCC compiler executables running them. I also believe it
would be simpler if that hash were a printable string (as I do in MELT).

Any plugin storing data outside of the GCC output files (i.e. outside of
*.o files) should have a means to precisely identify the GCC compiler
executables which produced that data (and version information is not
enough).

And I can assure you that mixing slightly incompatible data (or, in the
case of MELT, generated C files) coming from slightly different
compilers (e.g. svn revisions of a MELT branch from one week to the
next) is a nightmare. I did lose some hours (and so did some MELT users)
on such "bugs", and they are hard to find (& easy to check for, so I am
adding the check explained above).

Cheers.
-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
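
For illustration, the hash comparison described in the message can be
sketched as below. This is not the actual MELT code: the function name
check_module_md5, the variable runtime_md5 and the warning texts are
hypothetical, and the hash value is simply the example string quoted
above. The sketch assumes the module's hash string has already been
obtained (in MELT, by dlsym on the loaded module) and only shows the
compare-and-warn step.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the string that a generated melt-run-md5.h
   would define; the value is the example hash quoted in the message.  */
static const char runtime_md5[] = "d5e72c7dd8f4d47ec5b4e996df432d1a";

/* Compare the hash embedded in a generated module against the hash
   built into the runtime.  Returns 1 on a match and 0 otherwise, and
   prints a warning to stderr when the hashes differ or the module
   carries no hash at all.  */
int
check_module_md5 (const char *module_name, const char *module_md5)
{
  if (module_md5 == NULL)
    {
      fprintf (stderr, "warning: module %s has no melt-run.h hash\n",
               module_name);
      return 0;
    }
  if (strcmp (module_md5, runtime_md5) != 0)
    {
      fprintf (stderr,
               "warning: module %s was generated for melt-run.h hash %s,"
               " but this runtime expects %s\n",
               module_name, module_md5, runtime_md5);
      return 0;
    }
  return 1;
}
```

With the two hashes quoted in the message, a module carrying
"a67ba20ce4f7a5536152f377645219af" would trigger the mismatch warning,
while one carrying "d5e72c7dd8f4d47ec5b4e996df432d1a" would pass. A
printable string keeps the check down to a single strcmp and makes the
warning directly readable.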