------- Comment #51 from rogerio at rilhas dot com  2010-08-12 02:08 -------

Given all that we have established in our conversation, I think I can now demonstrate the bug easily.

The entry to the "format_direct" call (in the main function, just before entering the "format_direct" function) disassembles to this (using Code::Blocks, I've added comments):

    0x80484de mov DWORD PTR [esp+0x10],0x80485f0  // __TIME__
    0x80484e6 mov DWORD PTR [esp+0xc],0x80485f9   // __DATE__
    0x80484ee mov DWORD PTR [esp+0x8],0x8048605   // format string
    0x80484f6 mov DWORD PTR [esp+0x4],0x3e8       // sizeof(buffer)
    0x80484fe lea eax,[ebp-0x3f0]
    0x8048504 mov DWORD PTR [esp],eax             // buffer
    0x8048507 call 0x8048460 <format_direct(char*, int, char const*, ...)>

At this point $esp is 0xbfaeef00. So, the correct value for &format (as defined in C99) is:

    esp+8 = 0xbfaeef08

Reading that memory address after the "mov"s I find 0x8048605 (format string). The 0xbfaeef08 is the value I've been calling X, and it is the value I expect to be passed to "format_indirect". Reading the following address (+4) I see 0x80485f9 (date), and reading the next (+8) I see 0x80485f0 (time). They are all packed together as expected by the cdecl ABI. Snapshot-2 (which I will send you after this message) shows this (after the "mov"s).

After entering "format_direct", I inserted the line:

    char buffer[1000]; buffer[0]=0;

Without this line the compiler generates correct code, but with it the bug manifests. Just before calling "format_indirect", the disassembly is this (also with comments):

    0x804848d lea eax,[ebp-0x3f8]
    0x8048493 mov DWORD PTR [esp+0x8],eax         // &format
    0x8048497 mov eax,DWORD PTR [ebp+0xc]
    0x804849a mov DWORD PTR [esp+0x4],eax         // dst_buffer_size_bytes
    0x804849e mov eax,DWORD PTR [ebp-0x3f4]
    0x80484a4 mov DWORD PTR [esp],eax             // dst_buffer
    0x80484a7 call 0x8048434 <format_indirect(char*, int, char const**)>

The $ebp contains 0xbfaeeef8, and so ebp+0x10 is 0xbfaeef08. That is the value pushed onto the stack, and it is the correct &format which I called X (Snapshot-3).

Next, entering "format_indirect" (Snapshot-4), the disassembly is this:

    0x804843a mov eax,DWORD PTR [ebp+0x10]
    0x804843d mov DWORD PTR [ebp-0x4],eax
    0x8048440 mov eax,DWORD PTR [ebp-0x4]
    0x8048443 mov eax,DWORD PTR [eax]
    0x8048445 mov DWORD PTR [ebp-0x8],eax
    0x8048448 mov eax,DWORD PTR [ebp-0x4]
    0x804844b add eax,0x4
    0x804844e mov eax,DWORD PTR [eax]
    0x8048450 mov DWORD PTR [ebp-0xc],eax
    0x8048453 mov eax,DWORD PTR [ebp-0x4]
    0x8048456 add eax,0x8
    0x8048459 mov eax,DWORD PTR [eax]
    0x804845b mov DWORD PTR [ebp-0x10],eax

Unfortunately I have a really hard time debugging in Linux, and it took me almost an hour of trial and error to get all this information in one run (breakpoints not working, dumps not working, the debugger hanging, and having to re-read all addresses and retype this message because on each run things end up at different places in memory). I could not get the last memory dump to work, and I will not repeat the process again. Instead, maybe you can go all out and believe me that the "format_address" is wrong, as the watch window shows.

The PTR4 will contain a "random value Y" of 0xbfaeeb00, which has no relation to the correct address X of 0xbfaeef08. In fact, the watch window shows what is "around" PTR4, which looks to me like a "rom" string table for the executable. It also shows that PTR4[0] returns the correct string, but that PTR4[1] does not. If PTR4 were 0xbfaeef08 (as it should be by the definition of & in C99), then PTR4[1] would return the correct string __DATE__ (nothing is undefined in the behavior of GCC's code if the address PTR4 is correctly returned as X, as the disassembly shows the machine will just access the memory addresses after X).

With my compilation script I could not reproduce the problem (I don't know all the options Code::Blocks uses, and so I did not change my compilation script to use the same options), but that should not be necessary as my original attachments and compilation script manifest the problem.

Maybe this bug doesn't affect many people, but it is a bug, and it affects me (and my team). Probably even worse than that, it shows GCC is not C99 compliant in the & operator.

Also, I hope this demonstration shows how futile it is for all of you to try to argue that the problem is my code's portability, or that "format_address" is like an array of 1 entry, or even that accessing PTR4[1] is undefined (the disassembly shows it is not; the pointer arithmetic and the absence of boundary checking defined in C99 are all good and applied in the generated code). I think it also shows that maybe you should have believed me in the first place instead of just dismissing my claims as forms of "non-conformity". I still don't think this should have required such a demonstration effort on my part.

I think this settles it, right? I hope so. I'm anxious to read your replies, hoping that maybe you recognize this as a bug and maybe decide to fix it (hope is a very powerful thing!).

Thanks!

-- 

rogerio at rilhas dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45249