Re: optimization question

2015-05-18 Thread mark maule
Thank you for taking a look Martin.  I will attempt to pare this down, 
provide a sample with typedefs/macros expanded, etc.  and repost to 
gcc-help.  To address a couple of your points:


This code does work when compiled -O2 with 4.4.6 (failing is 4.8.3). It 
also works on 4.8.3 when dgHandle is volatile or when a lower -O level 
is used.


dgHandle_t is a uint16_t (unsigned short int) and BS_CFG_DRIVE_GROUPS is 
the value 480.


thanks again.
Mark



On 5/18/2015 7:05 PM, Martin Sebor wrote:

On 05/18/2015 02:01 PM, mark maule wrote:

I have a loop which hangs when compiled with -O2, but runs fine when
compiled with -O1.  Not sure what information is required to get an
answer, so starting with the full src code.  I have not attempted to
reduce to a simpler test case yet.


Typically a standalone test case is required.

Note that a difference in behavior between -O1 and -O2 doesn't
necessarily imply there's a compiler bug. One or more of the
-O2 optimizations could be exposing a bug in the program. This
is true even if the program runs successfully when compiled with
a prior version of the compiler since new optimizations are added
existing ones improve between releases. Nevertheless, compiling
the program with different versions of the compiler (or different
compilers altogether) to see if the problem can be consistently
reproduced can provide a useful data point.



Attachments:

bs_destage.c - full source code
bs_destage.dis.O2 - gdb disassembly of bs_destageLoop()
bs_destage.dis+m.O2 - src annotated version of the above


I suspect this is too big for most people to analyze (it is for
me). You will need to reduce it to something more manageable
that can be compiled independently of the rest of your system
(all the "..." include files).



The function in question is bs_destageSearch().  When I compile
bs_destage.c with -O2, it seems that the dgHandle condition at line 741
is being ignored, leading to an infinite loop.  I can see in the
disassembly that dgHandle is still in the code as a 16-bit value stored
at 0x32(%rsp), and a running 32-bit copy stored at 0x1c(%rsp). I can
also see that the 16 bit version at 0x32(%rsp) is being incremented at
the end of the loop, but I don't see anywhere in the code where either
version of dgHandle is being used when determining if the while() at 741
should be continued.

I'm not very familiar with the optimizations that are done in O2 vs O1,
or even what happens in these optimizations.

So, I'm wondering if this is a bug, or a subtle valid optimization that
I don't understand.  Any help would be appreciated.


It's hard to tell where the problem is or offer suggestions
without having a more complete understanding of the objects
and types used in the program. For example, the underlying
type of dgHandle_t or the value of the BS_CFG_DRIVE_GROUPS
constant the variable is being compared to(*). The way to
expose these details is by creating a translation unit for
the source file (i.e., preprocessing it with the -E option).
From these details it should be possible to determine whether
the code in the file is correct (and the likely cause of the
problem is a compiler bug) or whether the problem is due to
a bug in the code. The translation unit will be much bigger
than the source file so you will need to do the initial
analysis yourself before asking for more help (on gcc-help).



Note:  changing the declaration of dgHandle to be volitile appears to
modify the code sufficiently that it looks like the dgHandle check is
honored (have not tested).


[*] For instance, if dgHandle_t was an alias for unsigned
short and the value of BS_CFG_DRIVE_GROUPS was USHRT_MAX
-1 then GCC could remove the corresponding test in the
while loop (since it could never exceed the maximum value
for its type) unless dgHandle was declared volatile.



Thanks in advance for any help/advice.


The gcc list is meant for discussions related to GCC
development. The gcc-help list is the right forum for
requests for help.

Martin





Re: optimization question

2015-05-19 Thread mark maule



On 5/19/2015 10:09 AM, Martin Sebor wrote:

I'm not very familiar with the optimizations that are done in O2 vs O1,
or even what happens in these optimizations.

So, I'm wondering if this is a bug, or a subtle valid optimization that
I don't understand.  Any help would be appreciated.


Another approach to debugging a suspected optimization bug is
to look at the optimization dumps produced in response to one
or more of the -fdump-tree-xxx and -fdump-rtl-xxx options.
There are many of them and they tend to be verbose but not
as much as the raw assembly and are usually easier to follow.
The dumps make it possible to identify the buggy optimization
pass and avoid the bug by disabling just the problematic
optimization while leaving all the others enabled. With
attribute optimize and/or #pragma GCC optimize, you can then
control specify which optimization to disable for subsets of
functions in a file.

See the following sections in the manual for more:

http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Debugging-Options.html#index-fdump-tree-673 



http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Function-Specific-Option-Pragmas.html 



Martin



Thanks again Martin.  I started going down that road yesterday, and got 
lost in the forest of options.   What I was looking for was some option 
that would tell me what was being done with dgHandle specifically.  I 
played around with -fopt-info-all but didn't get very far ...





Re: optimization question

2015-05-19 Thread mark maule



On 5/19/2015 10:28 AM, Andrew Haley wrote:

On 05/19/2015 04:14 PM, mark maule wrote:

Thanks again Martin.  I started going down that road yesterday, and got
lost in the forest of options.   What I was looking for was some option
that would tell me what was being done with dgHandle specifically.  I
played around with -fopt-info-all but didn't get very far ...

I think you're going to find it very difficult to debug such a large
chunk of code, and it's quite possible that your problem is in the RTL
optimization anyway.

I don't think you'll find anything better than trying to reduce the test
case.  Also, all those macros rather obfuscate what's going on.  It may
help later to preprocess the source and run it through indent.

Andrew.


Understood.  That's my next step, and will repost to gcc-help per 
earlier suggestions.


Thanks for the pointers.



Re: optimization question

2015-05-20 Thread mark maule



On 5/20/2015 3:27 AM, Richard Biener wrote:

On Mon, May 18, 2015 at 10:01 PM, mark maule  wrote:

I have a loop which hangs when compiled with -O2, but runs fine when
compiled with -O1.  Not sure what information is required to get an answer,
so starting with the full src code.  I have not attempted to reduce to a
simpler test case yet.

Attachments:

bs_destage.c - full source code
bs_destage.dis.O2 - gdb disassembly of bs_destageLoop()
bs_destage.dis+m.O2 - src annotated version of the above

The function in question is bs_destageSearch().  When I compile bs_destage.c
with -O2, it seems that the dgHandle condition at line 741 is being ignored,
leading to an infinite loop.  I can see in the disassembly that dgHandle is
still in the code as a 16-bit value stored at 0x32(%rsp), and a running
32-bit copy stored at 0x1c(%rsp).  I can also see that the 16 bit version at
0x32(%rsp) is being incremented at the end of the loop, but I don't see
anywhere in the code where either version of dgHandle is being used when
determining if the while() at 741 should be continued.

I'm not very familiar with the optimizations that are done in O2 vs O1, or
even what happens in these optimizations.

So, I'm wondering if this is a bug, or a subtle valid optimization that I
don't understand.  Any help would be appreciated.

Note:  changing the declaration of dgHandle to be volitile appears to modify
the code sufficiently that it looks like the dgHandle check is honored (have
not tested).

Thanks in advance for any help/advice.

The usual issue with this kind of behavior is out-of-bound accesses of
arrays in a loop
or invoking undefined behavior when signed integer operations wrap.


uint32_toutLun[ BS_CFG_DRIVE_GROUPS ];

and

   while ( ( dgHandle < ( BS_CFG_DRIVE_GROUPS + 1 ) ) &&
...
  dgDestageOut = bs_destageData.outLun[ dgHandle ];

looks like this might access outLun[BS_CFG_DRIVE_GROUPS] which is
out-of-bounds.

Richard.


You are correct, and when I change outLun[] to be size 
BS_CFG_DRIVE_GROUPS+1, the generated asm looks like it will account for 
dgHandle in the while() loop.  I will pass this back to our development 
team to get a proper fix.


Now, the followon:  Something in the compiler/optimizer recognized this 
out of bounds situation - should a warning have been emitted? Or are 
there ambiguities which make a warning generation here inappropriate?


And an additional question:  It still seems wrong to omit the dgHandle 
check from the while() condition vs. leaving it in and letting the code 
access beyond the end of the array.  Is this one of those areas where if 
there's a bug in the code all bets are off and your mileage may vary?


Thanks to everyone who helped me out here.

Mark



Re: optimization question

2015-05-21 Thread mark maule



On 5/20/2015 2:13 PM, Martin Uecker wrote:

  mark maule :

On 5/20/2015 3:27 AM, Richard Biener wrote:

On Mon, May 18, 2015 at 10:01 PM, mark maule  wrote:
The usual issue with this kind of behavior is out-of-bound accesses of
arrays in a loop
or invoking undefined behavior when signed integer operations wrap.


 uint32_toutLun[ BS_CFG_DRIVE_GROUPS ];

and

while ( ( dgHandle < ( BS_CFG_DRIVE_GROUPS + 1 ) ) &&
...
   dgDestageOut = bs_destageData.outLun[ dgHandle ];

looks like this might access outLun[BS_CFG_DRIVE_GROUPS] which is
out-of-bounds.

Richard.

You are correct, and when I change outLun[] to be size
BS_CFG_DRIVE_GROUPS+1, the generated asm looks like it will account for
dgHandle in the while() loop.  I will pass this back to our development
team to get a proper fix.

Now, the followon:  Something in the compiler/optimizer recognized this
out of bounds situation - should a warning have been emitted? Or are
there ambiguities which make a warning generation here inappropriate?

Yes, ideally a compiler should emit a warning. C compilers traditionally
were not very good at this, but it turns out very recent versions of GCC
can do this:

test.c:14:23: warning: iteration 10u invokes undefined behavior 
[-Waggressive-loop-optimizations]
   dgDestageOut = outLun[ dgHandle ];
^
test.c:11:13: note: containing loop
while ( ( dgHandle < ( BS_CFG_DRIVE_GROUPS + 1 ) ) )


For this simplified test case:

#include 

#define BS_CFG_DRIVE_GROUPS 10
uint32_t dgDestageLimit = 0;
uint32_t outLun[ BS_CFG_DRIVE_GROUPS ];

void test(void)
{
int dgHandle = 0;

   while ( ( dgHandle < ( BS_CFG_DRIVE_GROUPS + 1 ) ) )
   {
  uint32_t dgDestageOut;
  dgDestageOut = outLun[ dgHandle ];
  dgHandle++;
   }
}


Martin


Thanks Martin.

My gcc (4.8.3) must not be recent enough.  Any idea when this sort of 
checking went in?


#include 

#define BS_CFG_DRIVE_GROUPS 0x1e0

uint32_t dgDestageLimit = 0;
uint32_t outLun[ 10 ];

int
test(void)
{
int dgHandle = 0;
int total;

while ( ( dgHandle < ( BS_CFG_DRIVE_GROUPS + 1 ) ) ) {
uint32_t dgDestageOut;
dgDestageOut = outLun[ dgHandle ];
total += dgDestageOut;
dgHandle++;
}

return dgHandle;
}

int
main(int argc, char **argv)
{
test();
return 0;
}


ca-bld-ol71-02$ cc -o foo foo.c -Wall
ca-bld-ol71-02$