Hi folks!

Using gcc (tested with version 4.9.2-10 from the official Debian repository, on
x86-64), the following function, which takes a hypothetical structure and
returns it after modifying a member, is correctly inlined, but the stack usage
somehow grows when it is used:

static __inline__ struct foo add_flag_0(struct foo foo, int flag) {
        foo.flags |= flag;
        return foo;
}

If you call the function twice or more (i.e. add_flag_0(add_flag_0(...))), then
the stack usage continues to grow linearly.

Here's a complete demonstration of the issue:

#define INLINE __inline__
// Note: using __attribute__((const)) does not help

struct foo {
        int flags;
        /* Let it be enough NOT to be packed in registers */
        void *opaque[2];
};

static INLINE struct foo new_foo() {
        struct foo foo = { 0 };
        return foo;
}

static INLINE struct foo add_flag_0(struct foo foo, int flag) {
        foo.flags |= flag;
        return foo;
}

extern void some_unknown_function(struct foo foo);

void demo_1(void) {
        some_unknown_function(new_foo());
}

void demo_2(void) {
        some_unknown_function(add_flag_0(new_foo(), 1));
}

void demo_3(void) {
        some_unknown_function(add_flag_0(add_flag_0(new_foo(), 1), 2));
}

void demo_4(void) {
        some_unknown_function(add_flag_0(add_flag_0(add_flag_0(new_foo(), 1), 2), 3));
}

$ gcc -S -W -Wall -O3 demo.c -o demo.S

You can see the difference in stack usage across the four functions:

$ grep -E "addq.*rsp" demo.S
        addq    $104, %rsp
        addq    $136, %rsp
        addq    $200, %rsp
        addq    $264, %rsp
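
To put numbers on the growth, here is a minimal sketch that only does the
arithmetic on the four values reported above. The helper name frame_growth is
hypothetical and is not part of demo.c.

```c
#include <assert.h>
#include <stddef.h>

/* Stack adjustments reported by the grep output, for demo_1 .. demo_4. */
static const int frame_bytes[4] = { 104, 136, 200, 264 };

/* Hypothetical helper: extra bytes used by demo_(i+2) compared to demo_(i+1),
 * for i in 0..2. */
int frame_growth(size_t i) {
        assert(i < 3);
        return frame_bytes[i + 1] - frame_bytes[i];
}
```

The first add_flag_0 call costs 32 extra bytes, and each further nested call
costs 64 more, whereas struct foo itself should only be 24 bytes on x86-64
(assuming the usual 4-byte int and 8-byte pointers).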

For example, here is the difference between the demo_1 and demo_2 functions:

demo_1:
        subq    $80, %rsp
        pushq   $0
        pushq   $0
        pushq   $0
        call    some_unknown_function
        addq    $104, %rsp
        ret

demo_2:
        subq    $112, %rsp
        movq    $0, 72(%rsp)
        movl    $1, 72(%rsp)
        pushq   $0
        pushq   $0
        pushq   88(%rsp)
        call    some_unknown_function
        addq    $136, %rsp
        ret
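
As a sanity check on the listings above, the following sketch redoes the frame
accounting: the final addq should release what subq reserved plus 8 bytes per
pushq. It also encodes the x86-64 System V rule of thumb that aggregates larger
than 16 bytes are passed in memory, which is why the struct is pushed on the
stack at all. Both helper names are hypothetical, and the 24-byte size for
struct foo is an assumption (4-byte int, padding, two 8-byte pointers).

```c
#include <assert.h>
#include <stddef.h>

/* Rule of thumb for the x86-64 System V ABI: an aggregate larger than
 * 16 bytes is passed in memory rather than in registers. */
int passed_in_memory(size_t size) {
        return size > 16;
}

/* Bytes released by the final addq: the subq reservation plus 8 bytes
 * for each pushq executed before the call. */
int released_bytes(int subq_bytes, int pushq_count) {
        assert(subq_bytes >= 0 && pushq_count >= 0);
        return subq_bytes + 8 * pushq_count;
}
```

With the values from the listings, released_bytes(80, 3) gives 104 for demo_1
and released_bytes(112, 3) gives 136 for demo_2, so the three pushq
instructions account for the 24-byte argument copy and the remaining
difference is entirely in the subq reservation.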

My understanding is that the four functions should ideally have the same stack
usage. Is this expectation unreasonable because of some conformance, calling
convention, or temporary struct issue? (Please forgive my total lack of
expertise in the optimization internals.)

Thanks in advance for any clues!

Regards,

Xavier
