Re: optimization question
Thank you for taking a look Martin. I will attempt to pare this down, provide a sample with typedefs/macros expanded, etc. and repost to gcc-help. To address a couple of your points: This code does work when compiled -O2 with 4.4.6 (failing is 4.8.3). It also works on 4.8.3 when dgHandle is volatile or when a lower -O level is used. dgHandle_t is a uint16_t (unsigned short int) and BS_CFG_DRIVE_GROUPS is the value 480. thanks again. Mark On 5/18/2015 7:05 PM, Martin Sebor wrote: On 05/18/2015 02:01 PM, mark maule wrote: I have a loop which hangs when compiled with -O2, but runs fine when compiled with -O1. Not sure what information is required to get an answer, so starting with the full src code. I have not attempted to reduce to a simpler test case yet. Typically a standalone test case is required. Note that a difference in behavior between -O1 and -O2 doesn't necessarily imply there's a compiler bug. One or more of the -O2 optimizations could be exposing a bug in the program. This is true even if the program runs successfully when compiled with a prior version of the compiler since new optimizations are added existing ones improve between releases. Nevertheless, compiling the program with different versions of the compiler (or different compilers altogether) to see if the problem can be consistently reproduced can provide a useful data point. Attachments: bs_destage.c - full source code bs_destage.dis.O2 - gdb disassembly of bs_destageLoop() bs_destage.dis+m.O2 - src annotated version of the above I suspect this is too big for most people to analyze (it is for me). You will need to reduce it to something more manageable that can be compiled independently of the rest of your system (all the "..." include files). The function in question is bs_destageSearch(). When I compile bs_destage.c with -O2, it seems that the dgHandle condition at line 741 is being ignored, leading to an infinite loop. I can see in the disassembly that dgHandle is still in the code as a 16-bit value stored at 0x32(%rsp), and a running 32-bit copy stored at 0x1c(%rsp). I can also see that the 16 bit version at 0x32(%rsp) is being incremented at the end of the loop, but I don't see anywhere in the code where either version of dgHandle is being used when determining if the while() at 741 should be continued. I'm not very familiar with the optimizations that are done in O2 vs O1, or even what happens in these optimizations. So, I'm wondering if this is a bug, or a subtle valid optimization that I don't understand. Any help would be appreciated. It's hard to tell where the problem is or offer suggestions without having a more complete understanding of the objects and types used in the program. For example, the underlying type of dgHandle_t or the value of the BS_CFG_DRIVE_GROUPS constant the variable is being compared to(*). The way to expose these details is by creating a translation unit for the source file (i.e., preprocessing it with the -E option). From these details it should be possible to determine whether the code in the file is correct (and the likely cause of the problem is a compiler bug) or whether the problem is due to a bug in the code. The translation unit will be much bigger than the source file so you will need to do the initial analysis yourself before asking for more help (on gcc-help). Note: changing the declaration of dgHandle to be volitile appears to modify the code sufficiently that it looks like the dgHandle check is honored (have not tested). [*] For instance, if dgHandle_t was an alias for unsigned short and the value of BS_CFG_DRIVE_GROUPS was USHRT_MAX -1 then GCC could remove the corresponding test in the while loop (since it could never exceed the maximum value for its type) unless dgHandle was declared volatile. Thanks in advance for any help/advice. The gcc list is meant for discussions related to GCC development. The gcc-help list is the right forum for requests for help. Martin
Re: optimization question
On 5/19/2015 10:09 AM, Martin Sebor wrote: I'm not very familiar with the optimizations that are done in O2 vs O1, or even what happens in these optimizations. So, I'm wondering if this is a bug, or a subtle valid optimization that I don't understand. Any help would be appreciated. Another approach to debugging a suspected optimization bug is to look at the optimization dumps produced in response to one or more of the -fdump-tree-xxx and -fdump-rtl-xxx options. There are many of them and they tend to be verbose but not as much as the raw assembly and are usually easier to follow. The dumps make it possible to identify the buggy optimization pass and avoid the bug by disabling just the problematic optimization while leaving all the others enabled. With attribute optimize and/or #pragma GCC optimize, you can then control specify which optimization to disable for subsets of functions in a file. See the following sections in the manual for more: http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Debugging-Options.html#index-fdump-tree-673 http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Function-Specific-Option-Pragmas.html Martin Thanks again Martin. I started going down that road yesterday, and got lost in the forest of options. What I was looking for was some option that would tell me what was being done with dgHandle specifically. I played around with -fopt-info-all but didn't get very far ...
Re: optimization question
On 5/19/2015 10:28 AM, Andrew Haley wrote: On 05/19/2015 04:14 PM, mark maule wrote: Thanks again Martin. I started going down that road yesterday, and got lost in the forest of options. What I was looking for was some option that would tell me what was being done with dgHandle specifically. I played around with -fopt-info-all but didn't get very far ... I think you're going to find it very difficult to debug such a large chunk of code, and it's quite possible that your problem is in the RTL optimization anyway. I don't think you'll find anything better than trying to reduce the test case. Also, all those macros rather obfuscate what's going on. It may help later to preprocess the source and run it through indent. Andrew. Understood. That's my next step, and will repost to gcc-help per earlier suggestions. Thanks for the pointers.
Re: optimization question
On 5/20/2015 3:27 AM, Richard Biener wrote: On Mon, May 18, 2015 at 10:01 PM, mark maule wrote: I have a loop which hangs when compiled with -O2, but runs fine when compiled with -O1. Not sure what information is required to get an answer, so starting with the full src code. I have not attempted to reduce to a simpler test case yet. Attachments: bs_destage.c - full source code bs_destage.dis.O2 - gdb disassembly of bs_destageLoop() bs_destage.dis+m.O2 - src annotated version of the above The function in question is bs_destageSearch(). When I compile bs_destage.c with -O2, it seems that the dgHandle condition at line 741 is being ignored, leading to an infinite loop. I can see in the disassembly that dgHandle is still in the code as a 16-bit value stored at 0x32(%rsp), and a running 32-bit copy stored at 0x1c(%rsp). I can also see that the 16 bit version at 0x32(%rsp) is being incremented at the end of the loop, but I don't see anywhere in the code where either version of dgHandle is being used when determining if the while() at 741 should be continued. I'm not very familiar with the optimizations that are done in O2 vs O1, or even what happens in these optimizations. So, I'm wondering if this is a bug, or a subtle valid optimization that I don't understand. Any help would be appreciated. Note: changing the declaration of dgHandle to be volitile appears to modify the code sufficiently that it looks like the dgHandle check is honored (have not tested). Thanks in advance for any help/advice. The usual issue with this kind of behavior is out-of-bound accesses of arrays in a loop or invoking undefined behavior when signed integer operations wrap. uint32_toutLun[ BS_CFG_DRIVE_GROUPS ]; and while ( ( dgHandle < ( BS_CFG_DRIVE_GROUPS + 1 ) ) && ... dgDestageOut = bs_destageData.outLun[ dgHandle ]; looks like this might access outLun[BS_CFG_DRIVE_GROUPS] which is out-of-bounds. Richard. You are correct, and when I change outLun[] to be size BS_CFG_DRIVE_GROUPS+1, the generated asm looks like it will account for dgHandle in the while() loop. I will pass this back to our development team to get a proper fix. Now, the followon: Something in the compiler/optimizer recognized this out of bounds situation - should a warning have been emitted? Or are there ambiguities which make a warning generation here inappropriate? And an additional question: It still seems wrong to omit the dgHandle check from the while() condition vs. leaving it in and letting the code access beyond the end of the array. Is this one of those areas where if there's a bug in the code all bets are off and your mileage may vary? Thanks to everyone who helped me out here. Mark
Re: optimization question
On 5/20/2015 2:13 PM, Martin Uecker wrote: mark maule : On 5/20/2015 3:27 AM, Richard Biener wrote: On Mon, May 18, 2015 at 10:01 PM, mark maule wrote: The usual issue with this kind of behavior is out-of-bound accesses of arrays in a loop or invoking undefined behavior when signed integer operations wrap. uint32_toutLun[ BS_CFG_DRIVE_GROUPS ]; and while ( ( dgHandle < ( BS_CFG_DRIVE_GROUPS + 1 ) ) && ... dgDestageOut = bs_destageData.outLun[ dgHandle ]; looks like this might access outLun[BS_CFG_DRIVE_GROUPS] which is out-of-bounds. Richard. You are correct, and when I change outLun[] to be size BS_CFG_DRIVE_GROUPS+1, the generated asm looks like it will account for dgHandle in the while() loop. I will pass this back to our development team to get a proper fix. Now, the followon: Something in the compiler/optimizer recognized this out of bounds situation - should a warning have been emitted? Or are there ambiguities which make a warning generation here inappropriate? Yes, ideally a compiler should emit a warning. C compilers traditionally were not very good at this, but it turns out very recent versions of GCC can do this: test.c:14:23: warning: iteration 10u invokes undefined behavior [-Waggressive-loop-optimizations] dgDestageOut = outLun[ dgHandle ]; ^ test.c:11:13: note: containing loop while ( ( dgHandle < ( BS_CFG_DRIVE_GROUPS + 1 ) ) ) For this simplified test case: #include #define BS_CFG_DRIVE_GROUPS 10 uint32_t dgDestageLimit = 0; uint32_t outLun[ BS_CFG_DRIVE_GROUPS ]; void test(void) { int dgHandle = 0; while ( ( dgHandle < ( BS_CFG_DRIVE_GROUPS + 1 ) ) ) { uint32_t dgDestageOut; dgDestageOut = outLun[ dgHandle ]; dgHandle++; } } Martin Thanks Martin. My gcc (4.8.3) must not be recent enough. Any idea when this sort of checking went in? #include #define BS_CFG_DRIVE_GROUPS 0x1e0 uint32_t dgDestageLimit = 0; uint32_t outLun[ 10 ]; int test(void) { int dgHandle = 0; int total; while ( ( dgHandle < ( BS_CFG_DRIVE_GROUPS + 1 ) ) ) { uint32_t dgDestageOut; dgDestageOut = outLun[ dgHandle ]; total += dgDestageOut; dgHandle++; } return dgHandle; } int main(int argc, char **argv) { test(); return 0; } ca-bld-ol71-02$ cc -o foo foo.c -Wall ca-bld-ol71-02$