Re: [RFC PATCH 2/7] mangle: Introduce C typeinfo mangling API

Qing Zhao Fri, 22 Aug 2025 08:21:35 -0700


> On Aug 21, 2025, at 17:29, Kees Cook <k...@kernel.org> wrote:
> 
> On Thu, Aug 21, 2025 at 07:14:31PM +0000, Qing Zhao wrote:
>> 
>> 
>>> On Aug 21, 2025, at 12:16, Kees Cook <k...@kernel.org> wrote:
>>> 
>>> 
>>>>> +  else if (TREE_CODE (fntype_or_fndecl) == FUNCTION_DECL)
>>>>> +    {
>>>>> +      tree fndecl = fntype_or_fndecl;
>>>>> +      tree base_fntype = TREE_TYPE (fndecl);
>>>>> +
>>>>> +      /* For FUNCTION_DECL, build a synthetic function type using
>>>>> DECL_ARGUMENTS
>>>>> +        if available to preserve typedef information.  */
>>>>> 
>>>> 
>>>> Why do the building? Seems like you could just do that work here. Also
>>>> doesn't FUNCTION_DECL's type have exactly what you need?
>>> 
>>> I need the function prototype in three places:
>>> 
>>> - address-taken extern functions
>>> - function preambles
>>> - indirect call sites
>>> 
>> 
>> A little confused with the above:
>> 
>> From my understanding, 
>> 
>> 1. At each indirect call sites, we should generate the checking code to 
>>     A. load the hashed precomputed typeid from the callee’s preamble 
>>     B. compare it with the precomputed typeid for this call site
>> 
>>    So, we need the function prototype of  the indirect call site to compute 
>> the typeid for this call site.
> 
> Correct.
> 
>> 2. For every “address-taken” function, we should generate the function
>>    preamble, in which the precomputed typeid for this function is stored. 
>> 
>>    So, we need the function prototype of  this function to compute the 
>> typeid for this function. 
>> 
>> The above 2 should cover all the KCFI ABIs.
> 
> For non-static functions, we cannot know if other compilation units may
> make indirect calls to a given function, so those functions must always
> have their kcfi preamble added. For static functions, if they are
> address-taken by the current compilation unit, then they must get a kcfi
> preamble added.


Oh, yeah, I see. Without LTO or whole-program mode, we cannot determine 
whether an extern function is address-taken or not. Therefore, we have to 
conservatively treat ALL extern functions as address-taken. 
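
To make sure I have the rule right, here is a minimal sketch of it in plain
C. This is only a model, not GCC code: struct fn_info and
needs_kcfi_preamble() are hypothetical names I made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a function as seen by one compilation unit.  */
struct fn_info {
  bool is_static;      /* internal linkage (static) */
  bool address_taken;  /* address taken inside this compilation unit */
};

/* Return true when a kcfi preamble must be emitted for FN.
   Non-static functions always get one, because another compilation
   unit may call them indirectly.  Static functions get one only when
   their address is taken in this compilation unit.  */
static bool
needs_kcfi_preamble (const struct fn_info *fn)
{
  assert (fn != NULL);
  if (!fn->is_static)
    return true;
  return fn->address_taken;
}
```

With this model, an extern function always yields true, and a static
function yields true only when address_taken is set.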

So, from my understanding, the complete list of places that need to compute 
the typeid from the function prototype is:

- At indirect call sites
        - all indirect call sites (at the call site)
- At function preambles
        - all address-taken static functions (at the function definition)
        - all extern functions (at the function declaration or the function 
          definition? Please see my question below)
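
The two sides of that list can be modeled in a few lines of C. This is a
toy sketch under stated assumptions, not what the patch does: toy_typeid()
uses FNV-1a purely as a stand-in hash, the "int(int)"-style strings are
hypothetical mangled prototypes, and struct toy_entry only imitates a
preamble stored in front of the function code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash (32-bit FNV-1a) over a mangled prototype string.
   This is an assumption for illustration, NOT the hash kcfi uses.  */
static uint32_t
toy_typeid (const char *mangled)
{
  uint32_t h = 2166136261u;
  assert (mangled != NULL);
  for (; *mangled; mangled++)
    {
      h ^= (unsigned char) *mangled;
      h *= 16777619u;
    }
  return h;
}

/* Imitates a function entry: the typeid sits in the preamble,
   in front of the code.  */
struct toy_entry {
  uint32_t preamble_typeid;
  int (*code) (int);
};

static int
add_one (int x)
{
  return x + 1;
}

/* Call-site check: compare the callee's preamble typeid with the
   typeid computed for this call site.  Returns 0 and stores the
   result on a match, -1 on a mismatch (where I would expect a real
   implementation to trap instead).  */
static int
checked_call (const struct toy_entry *callee, uint32_t site_typeid,
              int arg, int *result)
{
  if (callee->preamble_typeid != site_typeid)
    return -1;
  *result = callee->code (arg);
  return 0;
}
```

The point of the model is that both typeids come from the same prototype
mangling, so the preamble side and the call-site side have to see the same
type information (including typedef names) to agree.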


> 
>> What I was confused is, why “address-taken external function” and “function 
>> preambles” are separated items? 
>> For the function preambles, shall we generate for all the functions? Or only 
>> for address-taken functions in
>> the compilation?
> 
> The other case is emitting the __kcfi_typeid_FUNC weak symbols, which is
> used for link-time resolution with non-C code (i.e. raw .S assembly)
> which doesn't have access to the C type system to calculate the hashes
> on its own, and needs to have a way to build its own kcfi preambles.

So, for such functions, there should be an extern function declaration in 
the C code, but the definition of such a function is not available in the 
C code we are compiling. 
Is the weak __kcfi_typeid_FUNC symbol therefore emitted at the function 
declaration point for such a function when we compile the C code? 

And the typeid (the hash value) for such a routine is computed at the 
function declaration point too. 

Is the above understanding correct? 

Then for the other extern functions, whose definitions are in the C code of 
other modules that might be compiled later, should the typeid be computed at 
the declaration or the definition? 

> This
> is how Linux constructs its assembly function entry points:
> 
> #ifndef __CFI_TYPE
> #define __CFI_TYPE(name)                                \
>        .4byte __kcfi_typeid_##name
> #endif
> 
> #define SYM_TYPED_ENTRY(name, linkage, align...)        \
>        linkage(name) ASM_NL                            \
>        align ASM_NL                                    \
>        __CFI_TYPE(name) ASM_NL                         \
>        name:
> 
> That way all the asm functions can be indirect call targets without
> knowing the hash value (which will be filled in at link time).

Okay, I see. This is the case for an extern function whose definition is in 
an assembly file (not available in the C code).

> 
>>> At indirect call sites (during the early GIMPLE pass), I had a
>>> FUNCTION_TYPE available that still had the full typedef information,
>>> and I could use it fine.
>> 
>>> For the other two, it's later on and the
>>> TREE_TYPE(fndecl)'s FUNCTION_TYPE had lost the typedef information (which
>>> I need to be able to examine in cases where the typedef name was needed
>>> for the mangling vs looking at the underlying types).
>> 
>> Then why not also compute the typeid for the function preamble during early 
>> GIMPLE phase 
>> the same as the indirect call sites when all the typedef information is 
>> available?
> 
> I assume I just didn't see how yet. :) I wasn't able to identify nor
> store the typeid for function definitions that ultimately end up getting
> .s file output.

So, the problem only exists for the external functions whose definitions are 
NOT in the C code? 

Qing


> For example, down in ix86_asm_output_function_label(),
> I have the decl (but it's way late):
> 
> ix86_asm_output_function_label (FILE *out_file, const char *fname,
>                                tree decl)
> 
> I couldn't figure out how to find these during the GIMPLE pass. Oh,
> perhaps I can do this with an IPA pass? That should let me walk all
> functions including externs. I'll give it a try...
> 
> -- 
> Kees Cook
