At this point, I see two ways to implement a v2:

1. Invoke a callback in fold_offsetof and add a test case that ensures
both __builtin_offsetof and "((size_t)&((struct S*)0)->m)" end up
going through fold_offsetof. However, this still hooks in at an oddly
low level and could break under future refactoring.

2. Invoke the callback from the parser itself. To catch both cases, it
would have to be called from c_parser_postfix_expression for the
RID_OFFSETOF token and from c_parser_postfix_expression_after_primary
for CPP_DOT and CPP_DEREF. Technically, even in the CPP_DOT and
CPP_DEREF cases, a check could determine whether the parsed construct
is offsetof-like. However, parsing proceeds from the inside out, so
the check could trigger many times for a single offsetof
(e.g., "((size_t)&((struct S*)0)->m0.m1.m2.m3....)"), invoking the
callback repeatedly. So perhaps the callback should not be restricted
to offsetof, but fire for every parsed component reference? For
example, I could add a PLUGIN_PARSE_COMPONENT_REF event and callback,
or an equivalent solution outside the public plugin API.

Which of these sounds best? Do you see any alternatives?

Kind Regards
Jasper

On Mon, Oct 20, 2025 at 3:38 PM Jasper Niebuhr <[email protected]> wrote:
>
> Invoking a hook in c_parser_postfix_expression catches
> __builtin_offsetof, but expressions such as "((size_t)&((struct
> S*)0)->m)" are still lost irreversibly. For the plugin I'm writing, I
> need to get them all.
>
> On Mon, Oct 20, 2025 at 3:16 PM Richard Biener
> <[email protected]> wrote:
> >
> > On Mon, Oct 20, 2025 at 1:23 PM Jasper Niebuhr <[email protected]> wrote:
> > >
> > > That makes total sense. The COMPONENT_REFs that I know are being
> > > folded are either in __builtin_offsetof or offsetof-like constructs,
> > > e.g. "((size_t)&((struct S*)0)->m)". The former case calls
> > > fold_offsetof immediately, while parsing. Expressions using the latter
> > > are not folded during parsing, but right after. At the moment, the
> > > folding logic for this still ends up calling fold_offsetof to do the
> > > job.
> > >
> > > Technically, I could invoke a callback from inside fold_offsetof.
> > > However, I find it somewhat fragile to assume that future refactorings
> > > of the folder will always continue to route offsetof-like folding
> > > through fold_offsetof(). The logic could easily be inlined or
> > > delegated elsewhere. I suppose an accompanying test case with a
> > > minimal plugin could ensure that the callback is actually invoked in
> > > all relevant cases and would catch any future change that bypasses
> > > this path.
> >
> > So is it good enough to invoke an existing hook before this early
> > folding (I suppose from c_parser_postfix_expression)?
> >
> > > Does that sound good?
> > >
> > > On Mon, Oct 20, 2025 at 11:19 AM Richard Biener
> > > <[email protected]> wrote:
> > > >
> > > > On Mon, Oct 20, 2025 at 10:36 AM Jasper Niebuhr
> > > > <[email protected]> wrote:
> > > > >
> > > > > I'm not aware of any mechanisms, or a foundation for introducing a new
> > > > > mechanism that allows associating original trees with folded constant
> > > > > expressions, without the risk of those being lost during further
> > > > > folding.
> > > > >
> > > > > That said, I see your point about a new global plugin event being
> > > > > heavy-weight in terms of API surface and maintenance. I could make
> > > > > this a front-end-local callback. For example, the C front-end could
> > > > > expose an internal registration function, implemented in c-common.cc
> > > > > and declared only in c-common.h.
> > > > >
> > > > > Since c-common.h is not included in the public plugin headers, this
> > > > > wouldn't become part of the documented plugin API. Plugins that really
> > > > > need this would have to declare the function prototype themselves and
> > > > > link against the C front end. In other words, this would be a private,
> > > > > opt-in mechanism rather than a public API commitment, so it shouldn’t
> > > > > constrain future refactoring.
> > > > >
> > > > > If that seems reasonable, I can prepare a v2 implementing it this way.
> > > >
> > > > I guess my thinking is more in the line of this being very much a too low-level
> > > > point to do any interception.  As for catching expressions pre-folding and a
> > > > frontend specific hook I would suggest to research into a direction to have
> > > > a hook at the point the parser finishes parsing certain constructs - I'm not
> > > > sure what exactly we are looking at, possibly {r,l}values?  A point before
> > > > semantic analysis which is what breaks your case as far as I understand.
> > > >
> > > > Richard.
> > > >
> > > > > On Mon, Oct 20, 2025 at 9:22 AM Richard Biener
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Sat, Oct 18, 2025 at 3:49 PM York Jasper Niebuhr
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > This patch adds the PLUGIN_BUILD_COMPONENT_REF callback, which is invoked
> > > > > > > by the C front end when a COMPONENT_REF node is built. The callback
> > > > > > > receives a pointer to the COMPONENT_REF tree (of type 'tree *'). Plugins
> > > > > > > may replace the node by assigning through the pointer, but any
> > > > > > > replacement must be type-compatible with the original node.
> > > > > > >
> > > > > > > The callback allows plugins to observe or instrument struct member
> > > > > > > accesses that would otherwise be lost due to folding before the earliest
> > > > > > > possible plugin pass or hook. In particular, the fold_offsetof
> > > > > > > functionality removes all traces of type and member information in
> > > > > > > offsetof-like trees, leaving only an integer constant for plugins to
> > > > > > > inspect.
> > > > > > >
> > > > > > > A considered alternative was to disable fold_offsetof altogether.
> > > > > > > However, that prevents offsetof expressions from qualifying as
> > > > > > > constant-expressions; for example, static assertions can no longer be
> > > > > > > evaluated if they contain non-folded offsetof expressions. The callback
> > > > > > > provides fine-grained control over individual COMPONENT_REFs instead of
> > > > > > > universally changing folding behavior.
> > > > > >
> > > > > > I think a hook on COMPONENT_REF building is quite heavy-weight.  IMO
> > > > > > folding required for constant-expression-ness diagnosing might be better
> > > > > > off exposing both the folding result and the original tree somehow?
> > > > > >
> > > > > > > A typical use case would be to replace a select set of COMPONENT_REF
> > > > > > > nodes with type-compatible expressions calling a placeholder function,
> > > > > > > e.g. __deferred_offsetof(type, member). These calls cannot be folded
> > > > > > > away and thus remain available for plugin analysis in later passes.
> > > > > > > Offsets not of interest can be left untouched, preserving their const
> > > > > > > qualification and use in static assertions.
> > > > > > >
> > > > > > > Allowing PLUGIN_BUILD_COMPONENT_REF to alter COMPONENT_REF nodes required
> > > > > > > minor adjustments to fold_offsetof, which assumes a specific input
> > > > > > > format. Code paths that cannot guarantee that format should now use
> > > > > > > fold_offsetof_maybe(), which attempts to fold normally but, on failure,
> > > > > > > casts the unfolded expression to the desired output type.
> > > > > > >
> > > > > > > If the callback is not used to alter COMPONENT_REF trees, there is **no
> > > > > > > change** in GCC’s behavior.
> > > > > > >
> > > > > > > Signed-off-by: York Jasper Niebuhr <[email protected]>
> > > > > > >
> > > > > > > ---
> > > > > > >  gcc/c-family/c-common.cc | 48 +++++++++++++++++++++++++++++++---------
> > > > > > >  gcc/c-family/c-common.h  |  3 ++-
> > > > > > >  gcc/c/c-parser.cc        |  2 +-
> > > > > > >  gcc/c/c-typeck.cc        | 12 ++++++++++
> > > > > > >  gcc/doc/plugins.texi     |  6 +++++
> > > > > > >  gcc/plugin.cc            |  2 ++
> > > > > > >  gcc/plugin.def           |  6 +++++
> > > > > > >  7 files changed, 66 insertions(+), 13 deletions(-)
> > > > > > >
> > > > > > > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> > > > > > > index 587d76461e9..d34edfaa688 100644
> > > > > > > --- a/gcc/c-family/c-common.cc
> > > > > > > +++ b/gcc/c-family/c-common.cc
> > > > > > > @@ -7076,43 +7076,48 @@ c_common_to_target_charset (HOST_WIDE_INT c)
> > > > > > >     the whole expression.  Return the folded result.  */
> > > > > > >
> > > > > > >  tree
> > > > > > > -fold_offsetof (tree expr, tree type, enum tree_code ctx)
> > > > > > > +fold_offsetof (tree expr, tree type, enum tree_code ctx, bool may_fail)
> > > > > > >  {
> > > > > > >    tree base, off, t;
> > > > > > >    tree_code code = TREE_CODE (expr);
> > > > > > > +
> > > > > > >    switch (code)
> > > > > > >      {
> > > > > > >      case ERROR_MARK:
> > > > > > >        return expr;
> > > > > > >
> > > > > > >      case VAR_DECL:
> > > > > > > -      error ("cannot apply %<offsetof%> to static data member %qD", expr);
> > > > > > > +      if (!may_fail)
> > > > > > > +       error ("cannot apply %<offsetof%> to static data member %qD", expr);
> > > > > > >        return error_mark_node;
> > > > > > >
> > > > > > >      case CALL_EXPR:
> > > > > > >      case TARGET_EXPR:
> > > > > > > -      error ("cannot apply %<offsetof%> when %<operator[]%> is overloaded");
> > > > > > > +      if (!may_fail)
> > > > > > > +       error ("cannot apply %<offsetof%> when %<operator[]%> is overloaded");
> > > > > > >        return error_mark_node;
> > > > > > >
> > > > > > >      case NOP_EXPR:
> > > > > > >      case INDIRECT_REF:
> > > > > > >        if (!TREE_CONSTANT (TREE_OPERAND (expr, 0)))
> > > > > > >         {
> > > > > > > -         error ("cannot apply %<offsetof%> to a non constant address");
> > > > > > > +         if (!may_fail)
> > > > > > > +           error ("cannot apply %<offsetof%> to a non constant address");
> > > > > > >           return error_mark_node;
> > > > > > >         }
> > > > > > >        return convert (type, TREE_OPERAND (expr, 0));
> > > > > > >
> > > > > > >      case COMPONENT_REF:
> > > > > > > -      base = fold_offsetof (TREE_OPERAND (expr, 0), type, code);
> > > > > > > +      base = fold_offsetof (TREE_OPERAND (expr, 0), type, code, may_fail);
> > > > > > >        if (base == error_mark_node)
> > > > > > >         return base;
> > > > > > >
> > > > > > >        t = TREE_OPERAND (expr, 1);
> > > > > > >        if (DECL_C_BIT_FIELD (t))
> > > > > > >         {
> > > > > > > -         error ("attempt to take address of bit-field structure "
> > > > > > > -                "member %qD", t);
> > > > > > > +         if (!may_fail)
> > > > > > > +           error ("attempt to take address of bit-field structure "
> > > > > > > +                  "member %qD", t);
> > > > > > >           return error_mark_node;
> > > > > > >         }
> > > > > > >        off = size_binop_loc (input_location, PLUS_EXPR, DECL_FIELD_OFFSET (t),
> > > > > > > @@ -7121,7 +7126,7 @@ fold_offsetof (tree expr, tree type, enum tree_code ctx)
> > > > > > >        break;
> > > > > > >
> > > > > > >      case ARRAY_REF:
> > > > > > > -      base = fold_offsetof (TREE_OPERAND (expr, 0), type, code);
> > > > > > > +      base = fold_offsetof (TREE_OPERAND (expr, 0), type, code, may_fail);
> > > > > > >        if (base == error_mark_node)
> > > > > > >         return base;
> > > > > > >
> > > > > > > @@ -7178,17 +7183,38 @@ fold_offsetof (tree expr, tree type, enum tree_code ctx)
> > > > > > >      case COMPOUND_EXPR:
> > > > > > >        /* Handle static members of volatile structs.  */
> > > > > > >        t = TREE_OPERAND (expr, 1);
> > > > > > > -      gcc_checking_assert (VAR_P (get_base_address (t)));
> > > > > > > -      return fold_offsetof (t, type);
> > > > > > > +      if (!VAR_P (get_base_address (t)))
> > > > > > > +       return error_mark_node;
> > > > > > > +      return fold_offsetof (t, type, ERROR_MARK, may_fail);
> > > > > > >
> > > > > > >      default:
> > > > > > > -      gcc_unreachable ();
> > > > > > > +      return error_mark_node;
> > > > > > >      }
> > > > > > >
> > > > > > >    if (!POINTER_TYPE_P (type))
> > > > > > >      return size_binop (PLUS_EXPR, base, convert (type, off));
> > > > > > >    return fold_build_pointer_plus (base, off);
> > > > > > >  }
> > > > > > > +
> > > > > > > +/* Tries folding expr using fold_offsetof.  On success, the folded offsetof
> > > > > > > +   is returned.  On failure, the original expr is wrapped in an ADDR_EXPR
> > > > > > > +   and converted to the desired expression type.  The resulting expression
> > > > > > > +   may or may not be constant!  */
> > > > > > > +
> > > > > > > +tree
> > > > > > > +fold_offsetof_maybe (tree expr, tree type)
> > > > > > > +{
> > > > > > > +  /* expr might not have the correct structure, thus folding may fail.  */
> > > > > > > +  tree maybe_folded = fold_offsetof (expr, type, ERROR_MARK, true);
> > > > > > > +  if (maybe_folded != error_mark_node)
> > > > > > > +    return maybe_folded;
> > > > > > > +
> > > > > > > +  tree ptr_type = build_pointer_type (TREE_TYPE (expr));
> > > > > > > +  tree ptr = build1 (ADDR_EXPR, ptr_type, expr);
> > > > > > > +
> > > > > > > +  return fold_convert (type, ptr);
> > > > > > > +}
> > > > > > > +
> > > > > > >
> > > > > > >  /* *PTYPE is an incomplete array.  Complete it with a domain based on
> > > > > > >     INITIAL_VALUE.  If INITIAL_VALUE is not present, use 1 if DO_DEFAULT
> > > > > > > diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> > > > > > > index ea6c2975056..70fcfeb6661 100644
> > > > > > > --- a/gcc/c-family/c-common.h
> > > > > > > +++ b/gcc/c-family/c-common.h
> > > > > > > @@ -1174,7 +1174,8 @@ extern bool c_dump_tree (void *, tree);
> > > > > > >  extern void verify_sequence_points (tree);
> > > > > > >
> > > > > > >  extern tree fold_offsetof (tree, tree = size_type_node,
> > > > > > > -                          tree_code ctx = ERROR_MARK);
> > > > > > > +                          tree_code ctx = ERROR_MARK, bool may_fail = false);
> > > > > > > +extern tree fold_offsetof_maybe (tree, tree = size_type_node);
> > > > > > >
> > > > > > >  extern int complete_array_type (tree *, tree, bool);
> > > > > > >  extern void complete_flexible_array_elts (tree);
> > > > > > > diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> > > > > > > index 22ec0f849b7..6a8a5d58e6d 100644
> > > > > > > --- a/gcc/c/c-parser.cc
> > > > > > > +++ b/gcc/c/c-parser.cc
> > > > > > > @@ -11823,7 +11823,7 @@ c_parser_postfix_expression (c_parser *parser)
> > > > > > >             location_t end_loc = c_parser_peek_token (parser)->get_finish ();
> > > > > > >             c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
> > > > > > >                                        "expected %<)%>");
> > > > > > > -           expr.value = fold_offsetof (offsetof_ref);
> > > > > > > +           expr.value = fold_offsetof_maybe (offsetof_ref);
> > > > > > >             set_c_expr_source_range (&expr, loc, end_loc);
> > > > > > >           }
> > > > > > >           break;
> > > > > > > diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> > > > > > > index 55d896e02df..aff6dce36fb 100644
> > > > > > > --- a/gcc/c/c-typeck.cc
> > > > > > > +++ b/gcc/c/c-typeck.cc
> > > > > > > @@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
> > > > > > >  #include "realmpfr.h"
> > > > > > >  #include "tree-pretty-print-markup.h"
> > > > > > >  #include "gcc-urlifier.h"
> > > > > > > +#include "plugin.h"
> > > > > > >
> > > > > > >  /* Possible cases of implicit conversions.  Used to select diagnostic messages
> > > > > > >     and control folding initializers in convert_for_assignment.  */
> > > > > > > @@ -133,6 +134,7 @@ static int lvalue_or_else (location_t, const_tree, enum lvalue_use);
> > > > > > >  static void record_maybe_used_decl (tree);
> > > > > > >  static bool comptypes_internal (const_tree, const_tree,
> > > > > > >                                 struct comptypes_data *data);
> > > > > > > +
> > > > > > >
> > > > > > >  /* Return true if EXP is a null pointer constant, false otherwise.  */
> > > > > > >
> > > > > > > @@ -3174,6 +3176,16 @@ build_component_ref (location_t loc, tree datum, tree component,
> > > > > > >           else if (TREE_DEPRECATED (subdatum))
> > > > > > >             warn_deprecated_use (subdatum, NULL_TREE);
> > > > > > >
> > > > > > > +      tree pre_cb_type = TREE_TYPE (ref);
> > > > > > > +      if (invoke_plugin_callbacks (PLUGIN_BUILD_COMPONENT_REF, &ref)
> > > > > > > +             == PLUGEVT_SUCCESS
> > > > > > > +             && !comptypes (TREE_TYPE (ref), pre_cb_type))
> > > > > > > +       {
> > > > > > > +         error_at (EXPR_LOCATION (ref),
> > > > > > > +                   "PLUGIN_BUILD_COMPONENT_REF callback returned"
> > > > > > > +                   " expression of incompatible type");
> > > > > > > +       }
> > > > > > > +
> > > > > > >           datum = ref;
> > > > > > >
> > > > > > >           field = TREE_CHAIN (field);
> > > > > > > diff --git a/gcc/doc/plugins.texi b/gcc/doc/plugins.texi
> > > > > > > index c11167a34ef..312f178fab4 100644
> > > > > > > --- a/gcc/doc/plugins.texi
> > > > > > > +++ b/gcc/doc/plugins.texi
> > > > > > > @@ -222,6 +222,12 @@ enum plugin_event
> > > > > > >       ana::plugin_analyzer_init_iface *.  */
> > > > > > >    PLUGIN_ANALYZER_INIT,
> > > > > > >
> > > > > > > +  /* Called by the C front end when a COMPONENT_REF node is built.  The
> > > > > > > +     callback receives a pointer to the COMPONENT_REF tree (of type 'tree *').
> > > > > > > +     Plugins may replace the node by assigning through the pointer, but any
> > > > > > > +     replacement must be type-compatible with the original node.  */
> > > > > > > +  PLUGIN_BUILD_COMPONENT_REF,
> > > > > > > +
> > > > > > >    PLUGIN_EVENT_FIRST_DYNAMIC    /* Dummy event used for indexing callback
> > > > > > >                                     array.  */
> > > > > > >  @};
> > > > > > > diff --git a/gcc/plugin.cc b/gcc/plugin.cc
> > > > > > > index 0de2cc2dd2c..975e8c4e291 100644
> > > > > > > --- a/gcc/plugin.cc
> > > > > > > +++ b/gcc/plugin.cc
> > > > > > > @@ -500,6 +500,7 @@ register_callback (const char *plugin_name,
> > > > > > >        case PLUGIN_NEW_PASS:
> > > > > > >        case PLUGIN_INCLUDE_FILE:
> > > > > > >        case PLUGIN_ANALYZER_INIT:
> > > > > > > +      case PLUGIN_BUILD_COMPONENT_REF:
> > > > > > >          {
> > > > > > >            struct callback_info *new_callback;
> > > > > > >            if (!callback)
> > > > > > > @@ -581,6 +582,7 @@ invoke_plugin_callbacks_full (int event, void *gcc_data)
> > > > > > >        case PLUGIN_NEW_PASS:
> > > > > > >        case PLUGIN_INCLUDE_FILE:
> > > > > > >        case PLUGIN_ANALYZER_INIT:
> > > > > > > +      case PLUGIN_BUILD_COMPONENT_REF:
> > > > > > >          {
> > > > > > >            /* Iterate over every callback registered with this event and
> > > > > > >               call it.  */
> > > > > > > diff --git a/gcc/plugin.def b/gcc/plugin.def
> > > > > > > index 94e012a1e00..b0335178762 100644
> > > > > > > --- a/gcc/plugin.def
> > > > > > > +++ b/gcc/plugin.def
> > > > > > > @@ -103,6 +103,12 @@ DEFEVENT (PLUGIN_INCLUDE_FILE)
> > > > > > >     ana::plugin_analyzer_init_iface *.  */
> > > > > > >  DEFEVENT (PLUGIN_ANALYZER_INIT)
> > > > > > >
> > > > > > > +/* Called by the C front end when a COMPONENT_REF node is built.
> > > > > > > +   The callback receives a pointer to the COMPONENT_REF tree (of type 'tree *').
> > > > > > > +   Plugins may replace the node by assigning through the pointer, but any
> > > > > > > +   replacement must be type-compatible with the original node.  */
> > > > > > > +DEFEVENT (PLUGIN_BUILD_COMPONENT_REF)
> > > > > > > +
> > > > > > >  /* When adding a new hard-coded plugin event, don't forget to edit in
> > > > > > >     file plugin.cc the functions register_callback and
> > > > > > >     invoke_plugin_callbacks_full accordingly!  */
> > > > > > > --
> > > > > > > 2.43.0
> > > > > > >
