On Mon, 2010-06-28 at 23:30 -0700, Ian Lance Taylor wrote:
> Basile Starynkevitch <bas...@starynkevitch.net> writes:
> 
> > However, I see a slightly more general use of executable_checksum (or
> > something similar) in plugins.  Imagine a plugin that store some
> > information somewhere (e.g. in a database) and which might reload that
> > information later.  It could be very useful (for that or such plugin[s])
> > to store a [nearly] unique identifier of the GCC compiler using it with
> > the data (to avoid reusing the same data with a slightly different GCC
> > compiler, eg 4.5.1 vs 4.5.0).  Then that plugin would be happy to use
> > the executable_checksum to avoid nightmares when incorrectly reusing
> > some data with a slightly different compiler. And version information is
> > not exactly adequate (the same gcc 4.5.0 could be built & configured
> > differently).
> 
> The executable_checksum is very precise, and almost any change to the
> compiler will change it.  That is appropriate for a precompiled header,
> but I don't think it is appropriate for testing whether a plugin works.
> For a plugin, I think it should normally be sufficient to record the
> version and configuration information, both of which should be available
> (e.g., gcc -v can print them out).


Do we have a programmatic way to access configuration information from
inside plugins (and not only version information)?

In particular, I believe that binary data stored by plugins could be
very sensitive to the ENABLE_CHECKING configuration. Some GCC data
structures (accessible by plugins) may even depend on such
configuration options.

The scenario I am considering needs to detect any change to the
compiler, even a small one. I suppose a plugin stores some data about
the compiled file (e.g. in a database) and retrieves the same data
later, so it should be "certain" that the compiler is exactly the same.

I still think that some plugins (MELT in particular) would be happy with
a unique signature of the compiler using them. Actually, I would prefer
it to be a textual signature (because it is simpler to print & to
compare).
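
To illustrate what I mean by a textual signature, here is a minimal
sketch that turns a raw checksum into a printable string which can then
be compared with strcmp. It assumes the checksum is 16 raw bytes (the
size of an md5 digest); the names SIG_BYTES, signature_to_text and
same_signature are hypothetical and are not part of GCC or MELT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the compiler checksum is 16 raw bytes (md5-sized).  */
#define SIG_BYTES 16

/* Write the lowercase hex form of SUM into OUT, which must have room
   for 2 * SIG_BYTES characters plus the terminating NUL.  */
static void
signature_to_text (const unsigned char sum[SIG_BYTES],
                   char out[2 * SIG_BYTES + 1])
{
  static const char hexdigits[] = "0123456789abcdef";
  size_t i;

  assert (sum != NULL && out != NULL);
  for (i = 0; i < SIG_BYTES; i++)
    {
      out[2 * i] = hexdigits[sum[i] >> 4];
      out[2 * i + 1] = hexdigits[sum[i] & 0x0f];
    }
  out[2 * SIG_BYTES] = '\0';
}

/* Return 1 when two textual signatures are identical, 0 otherwise.  */
static int
same_signature (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  return strcmp (a, b) == 0;
}
```

A printable string like this is easy to put in a log message, a
generated file or a database column.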

In MELT I am just implementing the following precise trick:

  every MELT generated C file includes only one single file
"melt-run.h" [which a few days ago was named run-melt.h]. This file
includes all the rest:

    /* MELT file melt-run.h included in every generated file.
       all include files for generated code */
    #include "gcc-plugin.h"
    /* usual GCC middle-end includes, copied from melt-runtime.c */
    #include "config.h"
    #include "system.h"
    #include "coretypes.h"
    #include "obstack.h"
    #include "tm.h"
    #include "tree.h"
    #include "gimple.h"
    #include "filenames.h"
    #include "tree-pass.h"
    #include "tree-dump.h"
    #include "tree-flow.h"
    #include "tree-iterator.h"
    #include "tree-inline.h"
    #include "basic-block.h"
    #include "cfgloop.h"
    #include "timevar.h"
    #include "ggc.h"
    #include "cgraph.h"
    #include "diagnostic.h"
    #include "flags.h"
    #include "toplev.h"
    #include "options.h"
    #include "params.h"
    #include "real.h"
    #include "prefix.h"
    #include "md5.h"
    #include "cppdefault.h"
    /* MELT specific includes */
    #include "ppl_c.h"
    #include "melt-runtime.h"


The MELT build computes the md5 hash of the preprocessed output of
melt-run.h and writes it into the generated file melt-run-md5.h

   ## file melt-run-md5.h contains only the md5 string of preprocessing
   ## of melt-run.h and is used to ensure that the melt-run.h is the one
   ## expected. It is included in melt-runtime.c
   melt-run-md5.h: Makefile  $(srcdir)/melt-run.h $(CONFIG_H) $(SYSTEM_H) \
      $(TIMEVAR_H) $(TM_H) \
      $(TREE_H)  $(GGC_H) $(BASIC_BLOCK_H) $(GIMPLE_H) $(CFGLOOP_H)  \
      tree-pass.h $(MELT_H) gt-melt-runtime.h $(PLUGIN_H) $(TOPLEV_H) $(VERSION_H)
           melt_run_md5=`$(CC) -C -E $(ALL_CFLAGS) $(ALL_CPPFLAGS) \
   $(srcdir)/melt-run.h | md5sum | cut -c 1-32`; \
   echo  "const char melt_run_preprocessed_md5[]=\"$$melt_run_md5\";" > $...@-tmp
           $(SHELL) $(srcdir)/../move-if-change  $...@-tmp $@


A typical generated melt-run-md5.h file contains only one line like

const char melt_run_preprocessed_md5[]="d5e72c7dd8f4d47ec5b4e996df432d1a";

This is the md5sum of the output of the preprocessor on melt-run.h so it
depends on most of GCC headers.


The MELT infrastructure outputs that hash in generated C files, e.g. 
  /* hash of preprocessed melt-run.h generating this file: */
  const char md5prepromeltrun_melt[]="a67ba20ce4f7a5536152f377645219af";
  #include "melt-run.h"

MELT looks this symbol up with dlsym and issues a warning when its md5
differs from the one in melt-run-md5.h (as it does in the example here).
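
The comparison itself can be sketched as follows. This is only an
illustration under stated assumptions, not the actual MELT code: the
names is_md5_text and check_module_md5 are hypothetical, and the caller
is assumed to have already obtained the module's string (for instance
from dlsym, which may return NULL).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MD5_HEX_LEN 32

/* Return 1 if S looks like an md5 string: exactly 32 hex digits.  */
static int
is_md5_text (const char *s)
{
  size_t i;

  if (s == NULL || strlen (s) != MD5_HEX_LEN)
    return 0;
  for (i = 0; i < MD5_HEX_LEN; i++)
    if (!isxdigit ((unsigned char) s[i]))
      return 0;
  return 1;
}

/* MODULE_MD5 is the string found in the loaded module (possibly NULL
   when the symbol is missing); RUNTIME_MD5 is the string compiled into
   the runtime.  Return 1 on a match; otherwise print a warning on
   stderr and return 0.  Hypothetical helper, not part of MELT.  */
static int
check_module_md5 (const char *module_name, const char *module_md5,
                  const char *runtime_md5)
{
  assert (module_name != NULL && is_md5_text (runtime_md5));
  if (!is_md5_text (module_md5))
    {
      fprintf (stderr, "warning: module %s has no valid md5 string\n",
               module_name);
      return 0;
    }
  if (strcmp (module_md5, runtime_md5) != 0)
    {
      fprintf (stderr,
               "warning: module %s was generated with melt-run.h md5 %s"
               " but the runtime expects %s\n",
               module_name, module_md5, runtime_md5);
      return 0;
    }
  return 1;
}
```

A mismatch only produces a warning here, so the caller still decides
whether to go on loading the module.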


I am pretty sure that a plugin which stores data about the Gimple of
the compiled C file in a MySQL database would also want to store a
precise checksum or hash of the GCC binaries running it. Then, when
this plugin (or a sibling plugin) reuses that MySQL data days later, it
can warn the user about a mismatch. I would not be surprised if
MILEPOST used such tricks (but I don't know the implementation details
of MILEPOST, so I may be wrong).

So I do believe that some plugins need to retrieve a hash identifying
the precise GCC compiler executables running them. I also believe it
would be simpler if that hash were a printable string (as I do in MELT).

Any plugin storing data outside of the GCC output files (i.e. outside of
*.o files) should have a means to precisely identify the GCC compiler
executables which produced that data (and version information is not
enough).
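
As a rough sketch of what such a plugin could do, the stored data can
carry the compiler signature as its first line, and the reader can
refuse the data when the signature differs. The storage format and the
names tag_with_compiler and payload_for_compiler are hypothetical, made
up for this illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Store "SIGNATURE\nPAYLOAD" in BUF.  Return 0 on success, or -1 when
   BUF (of SIZE bytes) is too small.  */
static int
tag_with_compiler (char *buf, size_t size, const char *signature,
                   const char *payload)
{
  int n;

  assert (buf != NULL && signature != NULL && payload != NULL);
  assert (strchr (signature, '\n') == NULL);
  n = snprintf (buf, size, "%s\n%s", signature, payload);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}

/* Return the payload stored in BUF when its first line equals
   SIGNATURE, or NULL when the data came from another compiler.  */
static const char *
payload_for_compiler (const char *buf, const char *signature)
{
  size_t len;

  assert (buf != NULL && signature != NULL);
  len = strlen (signature);
  if (strncmp (buf, signature, len) != 0 || buf[len] != '\n')
    return NULL;
  return buf + len + 1;
}
```

With a database, the same idea would be an extra column holding the
signature, checked before any row is reused.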

And I can assure you that mixing slightly incompatible data (or, in the
case of MELT, generated C files) coming from slightly different
compilers (e.g. svn revisions of a MELT branch from one week to the
next) is a nightmare. I lost some hours (and so did some MELT users)
on such "bugs". They are hard to find but easy to check for, so I am
adding the check explained above.

Cheers.
-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***

