Sure. I've just attached it to the bug.

On 2011/8/24 14:56, Xinliang David Li wrote:
> Thanks.
>
> Can you make the test case a standalone preprocessed file (using -E)?
>
> David
On 2011/8/23 11:38, Xinliang David Li wrote:
> Partial register stall happens when there is a 32-bit register read
> followed by a partial register write. In your case, the stall probably
> happens in the next iteration when 'add eax, 0Ah' executes, so your
> manual patch does not work. Try changing 'add al, [dx]' into two
> instructions (assuming esi is …
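The message is cut off, but the usual rewrite for this kind of stall looks something like the following sketch (Intel syntax; register choices are assumed, since the original suggestion is truncated):

```asm
; Instead of a byte-sized read-modify-write on al, which leaves a
; partial-register write behind the full 32-bit read of eax:
;     add   al, [esi]
; load the byte zero-extended into a scratch register and do a full
; 32-bit add, so no partial write precedes the next 32-bit read:
      movzx ecx, byte ptr [esi]
      add   eax, ecx
```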
Hey Andrew,

On 2011/8/22 18:37, Andrew Pinski wrote:
> On Mon, Aug 22, 2011 at 6:34 PM, Oleg Smolsky wrote:
>> On 2011/8/22 18:09, Oleg Smolsky wrote:
>>> Both compilers fully inline the templated function and the emitted
>>> code looks very similar. I am puzzled as to why one of these loops is
>>> significantly slower than the other. I've attached disassembled
>>> listings - perhaps someone could have a look, please? (…
Hey David, these two --param options made no difference to the test.
I've cut the suite down to a single test (attached), which yields the
following results:

./simple_types_constant_folding_os (gcc 4.1)
  test  description              time      operations/s
  0     "int8_t constant add"    1.34 sec  …
Scanning through the profile data you provided -- test functions such
as test_constant<...> completely disappeared in 4.1's profile, which
means they are inlined by gcc 4.1. They exist in 4.6's profile. For the
unsigned short case, where neither version inlines the call, the 4.6
version is much faster.

David
On Mon, Aug 1, 2011 at 8:43 PM, Oleg Smolsky wrote:
> On 2011/7/29 14:07, Xinliang David Li wrote:
>> Profiling tools are your best friend here. If you don't have access to
>> any, the least you can do is to build the program with the -pg option
>> and use the gprof tool to find out differences.
>
> Th…
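The -pg/gprof workflow mentioned above is roughly the following (file and binary names are placeholders; assumes g++ and gprof are installed):

```shell
g++ -O3 -pg bench.cpp -o bench    # instrument for gprof
./bench                           # run; writes gmon.out in the cwd
gprof bench gmon.out > profile.txt  # flat profile plus call graph
```

Comparing the flat profiles of the 4.1 and 4.6 builds is what surfaced the inlining difference later in the thread.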
Try isolating the int8_t constant folding test from the rest to see
if the slowdown can be reproduced with the isolated case. If the
problem disappears, it is likely due to the following inline
parameters:
large-function-insns, large-function-growth, large-unit-insns,
inline-unit-growth. For inst…
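Raising those parameters might look like the sketch below. The numbers are illustrative guesses for experimentation, not recommended values, and the source file name is assumed:

```shell
# Illustrative only: bump the inline-related params named above and re-run.
g++ -O3 \
    --param large-function-insns=5400 \
    --param large-function-growth=200 \
    --param large-unit-insns=20000 \
    --param inline-unit-growth=60 \
    simple_types_constant_folding.cpp -o simple_types_constant_folding_os
```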
On Mon, 1 Aug 2011, Oleg Smolsky wrote:
> BTW, some of these tweaks increase the binary size to 99K, yet there is
> no performance increase.

I don't see this in the thread: did you use -march=native?

--
Marc Glisse
Hi Benjamin,

On 2011/7/30 06:22, Benjamin Redelings I wrote:
> I had some performance degradation with 4.6 as well.
> However, I was able to cure it by using -finline-limit=800 or 1000, I
> think. However, this led to a code size increase. Were the old
> higher-performance binaries larger?

Yes, th…
Hi Oleg,

I had some performance degradation with 4.6 as well.
However, I was able to cure it by using -finline-limit=800 or 1000, I
think. However, this led to a code size increase. Were the old
higher-performance binaries larger?
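As a concrete command line, the workaround described above might be tried like this (file names assumed), with `size` checking the code-size cost mentioned:

```shell
g++ -O3 -finline-limit=800 bench.cpp -o bench   # raise the inline limit
size bench                                      # inspect the size growth
```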
IIRC, setting -finline-limit=n actually sets two params to n…
On Fri, Jul 29, 2011 at 7:56 PM, Oleg Smolsky wrote:
> Hi there, I have compiled and run a set of C++ benchmarks on a CentOS4/64
> box using the following compilers:
> a) g++ 4.1 that is available for this distro (GCC version 4.1.2 20071124
>    (Red Hat 4.1.2-42))
> b) g++ 4.6 that I built (stock…
On Fri, Jul 29, 2011 at 11:57 AM, Oleg Smolsky wrote:
> Hey David, here are a couple of answers and notes:
> - I built the test suite with -O3 and cannot see anything else related to
>   inlining that isn't already ON (except for -finline-limit=n, which I do
>   not know how to use)

… size estimation, inl…
Hey David, here are a couple of answers and notes:
- I built the test suite with -O3 and cannot see anything else
  related to inlining that isn't already ON (except for -finline-limit=n,
  which I do not know how to use)
  http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
- FDO looks lik…
My guess is inlining differences. Try more aggressive inline
parameters to see if that helps. Also try FDO to see if there is any
performance difference between the two versions. You will probably need
to do first-level triage and file bug reports.

David

On Fri, Jul 29, 2011 at 10:56 AM, Oleg Smolsky wrote:…
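The FDO experiment suggested above follows GCC's standard two-pass cycle; a sketch, with file names assumed:

```shell
g++ -O3 -fprofile-generate bench.cpp -o bench   # instrumented build
./bench                                         # training run; writes *.gcda
g++ -O3 -fprofile-use bench.cpp -o bench        # rebuild using the profile
```

If the 4.6 regression disappears under -fprofile-use, that points at the static inlining heuristics rather than the code generator.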