On Wed, Oct 16, 2024 at 3:53 PM Frediano Ziglio
<[email protected]> wrote:
>
> On Wed, Oct 16, 2024 at 2:30 PM Andrew Cooper <[email protected]>
> wrote:
> >
> > Hello,
> >
> > Preempting some future work I'm expecting to arrive, I had a go at using
> > __builtin_*() in obj32.
> >
> > This is formed of 2 patches on top of this series:
> > https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-lib32
> >
>
> You are confident we'll have a lot of shared code to need an
> additional "lib32" in the Makefile!
> I would personally stick with obj32.
> Note that file should be strlen.c, not strlen.32.c, otherwise you are
> possibly going to pick up the general rule and not the one in the
> Makefile (but maybe is what you wanted).
>
> > Patch 1 introduces lib32 beside obj32, with strlen() being the first
> > broken-out function, and patch 2 swaps to __builtin_strlen().
> >
> > Both compile, but the difference that patch 2 introduces was unexpected.
> >
> > With just lib32, and taking strsubcmp() as an example, we get:
> >
> > 00000000 <strsubcmp>:
> > 0: 83 ec 0c sub $0xc,%esp
> > 3: 89 5c 24 04 mov %ebx,0x4(%esp)
> > 7: 89 74 24 08 mov %esi,0x8(%esp)
> > b: 89 c6 mov %eax,%esi
> > d: 89 d3 mov %edx,%ebx
> > f: 89 d0 mov %edx,%eax
> > 11: /-- e8 fc ff ff ff call 12 <strsubcmp+0x12>
> > 12: R_386_PC32 strlen
> > 16: 89 c1 mov %eax,%ecx
> > 18: 89 da mov %ebx,%edx
> > 1a: 89 f0 mov %esi,%eax
> > 1c: /-- e8 fc ff ff ff call 1d <strsubcmp+0x1d>
> > 1d: R_386_PC32 .text.strncmp
> > 21: 8b 5c 24 04 mov 0x4(%esp),%ebx
> > 25: 8b 74 24 08 mov 0x8(%esp),%esi
> > 29: 83 c4 0c add $0xc,%esp
> > 2c: c3 ret
> >
> > which all seems fine. We get a plain PC32 relocation to strlen (which
> > is now in the separate library).
> >
> > However, with patch 2 in place (simply swapping the plain extern for
> > __builtin_strlen(), we now get:
> >
> > 00000000 <strsubcmp>:
> > 0: 83 ec 0c sub $0xc,%esp
> > 3: 89 1c 24 mov %ebx,(%esp)
> > 6: 89 74 24 04 mov %esi,0x4(%esp)
> > a: 89 7c 24 08 mov %edi,0x8(%esp)
> > e: /-- e8 fc ff ff ff call f <strsubcmp+0xf>
> > f: R_386_PC32 __x86.get_pc_thunk.bx
> > 13: 81 c3 02 00 00 00 add $0x2,%ebx
> > 15: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
> > 19: 89 c7 mov %eax,%edi
> > 1b: 89 d6 mov %edx,%esi
> > 1d: 89 d0 mov %edx,%eax
> > 1f: /-- e8 fc ff ff ff call 20 <strsubcmp+0x20>
> > 20: R_386_PLT32 strlen
>
> PLT means it not declared hidden, otherwise it would have used the
> relative relocation.
> Maybe
>
> size_t strlen(const char *s);
> #define strlen(s) __builtin_strlen(s)
>
> xen/compiler.h is included, so all declaration should get the hidden
> by default ? Or add __attribute__((visibility("hidden"))) explicitly.
>
> > 24: 89 c1 mov %eax,%ecx
> > 26: 89 f2 mov %esi,%edx
> > 28: 89 f8 mov %edi,%eax
> > 2a: /-- e8 fc ff ff ff call 2b <strsubcmp+0x2b>
> > 2b: R_386_PC32 .text.strncmp
> > 2f: 8b 1c 24 mov (%esp),%ebx
> > 32: 8b 74 24 04 mov 0x4(%esp),%esi
> > 36: 8b 7c 24 08 mov 0x8(%esp),%edi
> > 3a: 83 c4 0c add $0xc,%esp
> > 3d: c3 ret
> >
> >
> > The builtin hasn't managed to optimise away the call to strlen (that's
> > fine). But, we've ended up spilling %ebx to the stack, calculating the
> > location of the GOT and not using it.
> >
>
> Maybe the ABI for PLT is to have %ebx set to the GOT ? Not sure about it.
>
> > So, as it stands, trying to use __builtin_strlen() results in worse code
> > generation. One thing I noticed was that we're not passing
> > -fvisibility=hidden into CFLAGS_x86_32, but fixing that doesn't help
> > either. We do have the pragma from compiler.h, so I'm out of visibility
> > ideas.
> >
>
> The -fvisibility=hidden should be overridden by the xen/compiler.h;
> but should be overridden with hidden!
> Maybe strlen is defined by default with another visibility?
> If you generate the assembly, you should see if the strlen symbol gets
> the .hidden bless or not.
>
I did some more experiments, but I didn't manage to fix the issue. It
looks like when __builtin_strlen fallback to calling strlen it uses a
private declaration of strlen. -mregparam argument is taken into
account, but not visibility. I tried to declare strlen as hidden
(checking also preprocessor output to see if other declaration were
present, none found), still the @PLT. I tried to add a "static inline
strlen"... and it was not used.
I found this workaround:
diff --git a/xen/arch/x86/boot/cmdline.c b/xen/arch/x86/boot/cmdline.c
index 80ffd0885e..ac2b0b0a4d 100644
--- a/xen/arch/x86/boot/cmdline.c
+++ b/xen/arch/x86/boot/cmdline.c
@@ -51,7 +51,12 @@ static const char delim_chars_comma[] = ", \n\r\t";
#define delim_chars (delim_chars_comma + 1)
-#define strlen(s) __builtin_strlen(s)
+size_t strlen(const char *s);
+#define strlen(str) \
+ (__extension__ (__builtin_constant_p(str) \
+ ? __builtin_strlen(str) \
+ : strlen(str)))
+
static int strncmp(const char *cs, const char *ct, size_t count)
{
diff --git a/xen/arch/x86/boot/strlen.32.c b/xen/arch/x86/boot/strlen.c
similarity index 100%
rename from xen/arch/x86/boot/strlen.32.c
rename to xen/arch/x86/boot/strlen.c
Frediano