Am Montag, dem 08.09.2025 um 20:05 -0700 schrieb Kees Cook:
> On Mon, Sep 08, 2025 at 04:36:57PM -0700, Andrew Pinski wrote:
> > On Mon, Sep 8, 2025 at 4:24 PM Kees Cook <k...@kernel.org> wrote:
> > > 
> > > On Tue, Sep 09, 2025 at 12:13:19AM +0200, Martin Uecker wrote:
> > > > Sorry, example should have been this:
> > > > 
> > > > typedef int arr_t[];
> > > > typedef int arr3_t[3];
> > > > 
> > > > void f(arr3_t*);
> > > > 
> > > > void g(void (*fp)(arr_t*)) {
> > > > 
> > > >       int a[3];
> > > >       (*fp)(&a);      // this call would fail?
> > > > }
> > > > 
> > > > int h()
> > > > {
> > > >   g(&f);
> > > > }
> > > 
> > > Yes, that would fail. It's a pointer to an array of 3 ints, and that's
> > > fine for distinguishing mismatched callers:
> > > 
> > >   void(arr3_t*) -> void(int(*)[3]) -> _ZTSFvPA3_iE
> > >   void(arr_t*)  -> void(int(*)[])  -> _ZTSFvPA_iE
> > > 
> > > How should I adjust this patch's description to say the mangling may
> > > follow a stricter type system?
> > 
> > Maybe it is time to step back for a second and figure out other things
> > are not going to work with this mangling scheme? At the bare minimum
> > write down limitations to the user documentation (and write up a bug
> > report to LLVM for them to do the same).
> 
> Since the mangler is (now) designed to be only used by KCFI, where
> should I write this up? Putting type examples in invoke.texi seems out
> of place. Should it be comments in mangle.cc?
> 
> > Or even better step back and fix the hashing scheme to fit better with
> > C rather than reusing IA64 C++ ABI mangling (which is not designed for
> > C). If we need a hard ABI break due to limitations, it is good to do
> > it now rather than having 2 different implementations that need to
> > change upstream; right now you only need to break LLVM's ABI.


Other things this approach may break are using an enum on the one
side and the integer type which it should be compatible to on the
other.

Maybe also  check whether calling functions via pointers to incomplete
structure types vs function where the struct is complete or vice versa.

> 
> One thing for sure we need to settle on is a common hash and (AIUI)
> LLVM would like to drop the hash KCFI is currently using (KCFI is the
> last user of it). As I mentioned elsewhere, the hash doesn't need to be
> cryptographically secure, so FNV-1a could very well be sufficient.
> 
> As for https://github.com/llvm/llvm-project/issues/106593, my GCC
> implementation actually doesn't exhibit this behavior since it uses the
> typedef name instead already (I had other places where the behavior was
> mismatched and the logic I ended up with turned out to be slightly more
> generalized than LLVM's). (I will need to fix this in LLVM.) I test for
> it in the type mangling tests already:

Note that is also not entirely correct for C.  It is allowed
to call this via different typedef names across TU,


It would be possible to normalize the type before (or during) mangling
in a way that it works correctly for all C ta cost of precision. 

I think this would be desirable because otherwise this could cause
surprises for users, and with it  conforming to C, it could be used
more easily also by other projects.

The cost is that is a bit less precise. 

One issue is also that future C changes could break this again,
but this is an even bigger risk with the current scheme.

> 
> /* Anonymous struct via typedef - should get typedef name as struct name.  */
> typedef struct { int anon_member_1; } anon_typedef_1;
> typedef struct { int anon_member_2; } anon_typedef_2;
> extern void func_anon_typedef_1(anon_typedef_1 *param); /* Should use typedef 
> name.  */
> extern void func_anon_typedef_2(anon_typedef_2 *param); /* Should be 
> different from anon_typedef_1 */
> ...
> /* { dg-final { scan-tree-dump {KCFI type ID: 
> mangled='_ZTSFvP14anon_typedef_1E' typeid=0x55475a23} kcfi0 } } */
> /* { dg-final { scan-tree-dump {KCFI type ID: 
> mangled='_ZTSFvP14anon_typedef_2E' typeid=0x454f8fb8} kcfi0 } } */
> 
> And wow would I like a better way to test this stuff. It's really
> painfully doing it via forcing KCFI address-taking and then finding the
> typeid in the dump file.

Why not have a builtin that returns the hash for a type?

Martin


Reply via email to