------- Comment #51 from rogerio at rilhas dot com  2010-08-12 02:08 -------
Given all that we have established in our conversation I think I can now
demonstrate the bug easily.

The entry to the "format_direct" call (in the main function, just before
entering the "format_direct" function) disassembles to this (using
Code::Blocks, I've added comments):

0x80484de       mov    DWORD PTR [esp+0x10],0x80485f0 // __TIME__
0x80484e6       mov    DWORD PTR [esp+0xc],0x80485f9 // __DATE__
0x80484ee       mov    DWORD PTR [esp+0x8],0x8048605 // format string
0x80484f6       mov    DWORD PTR [esp+0x4],0x3e8 // sizeof(buffer)
0x80484fe       lea    eax,[ebp-0x3f0]
0x8048504       mov    DWORD PTR [esp],eax // buffer
0x8048507       call   0x8048460 <format_direct(char*, int, char const*, ...)>

At this point the $esp is 0xbfaeef00. So, the correct value for &format (as
defined in C99) is:

esp+8 = 0xbfaeef08

Reading that memory address after the "mov" instructions, I find 0x8048605 (format string).
The 0xbfaeef08 is the value I've been calling X and the value I will expect to
be passed to "format_indirect".

Reading the following address (+4) I see 0x80485f9 (date), and reading the next
(+8) I see there 0x80485f0 (time). They are all packed together as expected by
the cdecl ABI. Snapshot-2 (which I will send you after this message) shows this
(after the "mov"'s).

After entering "format_direct", I inserted the line:

char buffer[1000]; buffer[0]=0;

Without this line the compiler generates correct code, but with it the bug
manifests. Just before calling "format_indirect", the disassembly is this (also
with comments):

0x804848d       lea    eax,[ebp-0x3f8]
0x8048493       mov    DWORD PTR [esp+0x8],eax  // &format
0x8048497       mov    eax,DWORD PTR [ebp+0xc]
0x804849a       mov    DWORD PTR [esp+0x4],eax  // dst_buffer_size_bytes
0x804849e       mov    eax,DWORD PTR [ebp-0x3f4]
0x80484a4       mov    DWORD PTR [esp],eax      // dst_buffer
0x80484a7       call   0x8048434 <format_indirect(char*, int, char const**)>

The $ebp contains 0xbfaeeef8, and so ebp+0x10 is 0xbfaeef08. That is the value
pushed onto the stack, and it is the correct &format which I called X
(Snapshot-3).

Next, entering "format_indirect" (Snapshot-4), the disassembly is this:

0x804843a       mov    eax,DWORD PTR [ebp+0x10]
0x804843d       mov    DWORD PTR [ebp-0x4],eax
0x8048440       mov    eax,DWORD PTR [ebp-0x4]
0x8048443       mov    eax,DWORD PTR [eax]
0x8048445       mov    DWORD PTR [ebp-0x8],eax
0x8048448       mov    eax,DWORD PTR [ebp-0x4]
0x804844b       add    eax,0x4
0x804844e       mov    eax,DWORD PTR [eax]
0x8048450       mov    DWORD PTR [ebp-0xc],eax
0x8048453       mov    eax,DWORD PTR [ebp-0x4]
0x8048456       add    eax,0x8
0x8048459       mov    eax,DWORD PTR [eax]
0x804845b       mov    DWORD PTR [ebp-0x10],eax

Unfortunately I have a really hard time debugging in Linux, and it took me
almost an hour of trial and error (breakpoints not working, dumps not working,
the debugger hanging, re-reading all the addresses and retyping this message
because on each run things end up at different places in memory, etc.) to get
all this information in one run. I could not get the last memory dump to work,
and I will not repeat the process again.

Instead, maybe you can take my word for it that the "format_address" is
wrong, as the watch window shows. PTR4 will contain a "random value Y" of
0xbfaeeb00, which has no relation to the correct address X of 0xbfaeef08.

In fact, the watch window shows what is "around" PTR4, which looks to me like a
"rom" string table for the executable. It also shows that PTR4[0] returns the
correct string, but that PTR4[1] does not. If PTR4 were 0xbfaeef08 (as it
should be by the definition of & in C99), then PTR4[1] would return the correct
string __DATE__ (nothing undefined in GCC's code behavior if the address PTR4
is correctly returned as X, as the disassembly shows the machine will just
access memory addresses after X).

With my compilation script I could not reproduce the problem (I don't know all
the options Code::Blocks uses, and so I did not change my compilation script to
use the same options), but that should not be necessary as my original
attachments and compilation script manifest the problem.

Maybe this bug doesn't affect many people, but it is a bug, and it affects me
(and my team). Probably even worse than that, it shows GCC is not C99 compliant
in the & operator.

Also, I hope this demonstration shows how futile it is for all of you to try to
argue that the problem is in my code's portability, or that "format_address" is
like an array of 1 entry, or even that accessing PTR4[1] is undefined (the
disassembly shows it is not: the pointer arithmetic and the absence of bounds
checking defined in C99 are both applied in the generated code). I think it also
shows that maybe you should have believed me in the first place instead of just
dismissing my claims as forms of "non-conformity". I still don't think this
should have required such a demonstration effort on my part.

I think this settles it, right? I hope so, I'm anxious to read your replies
hoping that maybe you recognize this as a bug and maybe decide to fix it (hope
is a very powerful thing!).

Thanks!


-- 

rogerio at rilhas dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45249
