[Bug c++/57489] New: [4.8 regression]: invalid code generated for conditional in template function

2013-05-31 Thread jhand at austin dot rr.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489

Bug ID: 57489
   Summary: [4.8 regression]: invalid code generated for
conditional in template function
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jhand at austin dot rr.com

Created attachment 30235
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30235&action=edit
test_parse.cpp

In gcc-4.8.0 and gcc-4.8.1-RC-20130527, g++ is generating some invalid code
that is causing my test regressions to fail. gcc-4.7.3 and gcc-4.6.3 both
generate valid code. Comping with -O2 or higher fails, while -O1 or lower
works. Adding -fno-inline allows the test to succeed. Changing test_parser() to
a non-template function also allows the test to succeed.

This has been tested only on a 64-bit environment on Fedora 16.

Here is a more complete description of the test. First, the code to parse the
number seems to be working properly. However, a conditional in the test, "if
(expected_v != actual_v || expected_i != actual_i)" seems to fail even though
expected_v == actual_v and expected_i == actual_i. When the test fails, it
generates the following message:

bad parse: "9223372036854775807": expected_v=9223372036854775807
actual_v=9223372036854775807 expected_i=19 actual_i=19

The number that is being parsed is equal to 0x7fff, however
slightly smaller numbers cause the test to still fail, and some number that is
smaller allows the test to succeed.

I'm attaching the simple test case that demonstrates the problem.

Compile with the following:

g++ -O2 -o test_parse test_parse.cpp


[Bug c++/57489] [4.8 regression]: invalid code generated for conditional in template function

2013-05-31 Thread jhand at austin dot rr.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489

Jim Hand  changed:

   What|Removed |Added

Version|unknown |4.8.1
   Severity|normal  |major


[Bug c++/57489] [4.8 regression]: invalid code generated for conditional in template function

2013-05-31 Thread jhand at austin dot rr.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489

--- Comment #1 from Jim Hand  ---
One of my coworkers looked at the assembly and said that it appears g++ isn't
even generating code for the conditional "if (expected_v != actual_v ||
expected_i != actual_i)" when this occurs. It appears the compiler is analyzing
the inline code and determining that the conditional isn't required. Testing
with some smaller numbers seems to cause the conditional to be generated.

With a number of 9223372036854775759 or less, the conditional seems to work as
expected.

Good luck with this one!


[Bug c++/57489] [4.8 regression]: invalid code generated for conditional in template function

2013-05-31 Thread jhand at austin dot rr.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489

--- Comment #3 from Jim Hand  ---
Thanks Marc, that seems to allow the code to work properly. One thing that is
somewhat troubling, even though the code seems to work properly with the
parenthesis, is that changing the following without the parenthesis:

   if (expected_v != actual_v || expected_i != actual_i) {

to just the following:

   if (expected_v != actual_v) {

prevents the printf() from being called. Also having the following prevents the
printf() from being called:

   if (expected_t != actual_i) {

I would agree that the the missing parenthesis results in some undefined
behavior, but it also looks like the compiler isn't behaving in a predictable
manner. Perhaps a compiler warning could be useful in this situation, but
nothing seems to be generated with the -Wall flag.


[Bug c++/57489] [4.8 regression]: invalid code generated for conditional in template function

2013-05-31 Thread jhand at austin dot rr.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489

--- Comment #4 from Jim Hand  ---
One of my coworkers, a former Intel employee, made the point that signed
integer overflow is precisely defined for X86, in that overflowing and then
underflowing will produce the correct value 100% of the time.

Another point that was made is that when the test fails, there is no other way
to see why it should fail: silent failure. Moreover, adding instrumentation
would change the results.

The few of us that have discussed this believe that the compiler is producing
the incorrect results in this case.


[Bug c++/57489] [4.8 regression]: invalid code generated for conditional in template function

2013-05-31 Thread jhand at austin dot rr.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489

--- Comment #6 from Jim Hand  ---
You are correct in that it is undefined in C.


[Bug middle-end/57489] [4.8 regression]: invalid code generated for conditional in template function

2013-06-01 Thread jhand at austin dot rr.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489

--- Comment #8 from Jim Hand  ---
Do you not agree that the compiler is not handling this situation in a very
helpful manner? A warning at minimal would be helpful, but when there is code
that equivalently says the following and prints identical values for the
expected and actual, it makes it pretty hard for a sane developer to debug:

  if (expected_v != actual_v)
  printf("%lld %lld", expected_v, actual_v);

additionally, there are other very slight changes that can be made to the
program that change the behavior wildly, such as make the function a
non-templated class or changing the conditional to only have what is printed
above.

You can close this case as invalid, but I think you may be doing gcc a
disservice because this is an opportunity for quality improvement.


[Bug libstdc++/57885] New: unordered_map find slower in 4.8.1 than 4.7.3 with integer key

2013-07-11 Thread jhand at austin dot rr.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57885

Bug ID: 57885
   Summary: unordered_map find slower in 4.8.1 than 4.7.3 with
integer key
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jhand at austin dot rr.com

Created attachment 30499
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30499&action=edit
unordered_map.cpp

I have been looking at performance of std::unordered_map versus
tr1::unordered_map using gcc-4.6.3, gcc-4.7.3, and gcc-4.8.1. I am mostly
interested in the performance of find(), although insertion performance is
important too.

I am attaching a simple program that creates a vector of 100,000 integers. It
calls a function that measures the time to insert the vector into an
unordered_map of integers to integers. It does this in a loop five times,
recording the amount spent each time, and clearing the unordered_map at the end
of the loop except the final time through the loop. The time to clear the map
is not measured.

Then within a loop of 5 times, it performs find() on the unordered map using
the original vector of integers. It measures the time spent looking up all of
the integers in the vector 5 times.

On gcc-4.6.3, I get the following output (the numbers are microseconds):
 Container:std::unordered_map  Key:int
 Insertion: 10458 4885 4874 4892 4871  min=4871 max=10458
 Lookup:11609 11613 11597 11611 11594  min=11594 max=11613

 Container:std::tr1::unordered_map  Key:int
 Insertion: 12399 4963 4948 4949 4961  min=4948 max=12399
 Lookup:11580 11573 11562 11559 11570  min=11559 max=11580

On gcc-4.7.3, I get the following output:
 Container:std::unordered_map  Key:int
 Insertion: 14454 16288 16499 15830 15753  min=14454 max=16499
 Lookup:12229 12208 12204 12201 12200  min=12200 max=12229

 Container:std::tr1::unordered_map  Key:int
 Insertion: 9926 4978 5000 4972 4971  min=4971 max=9926
 Lookup:11667 11636 11646 11636 11652  min=11636 max=11667

On gcc-4.8.1, I get the following output:
 Container:std::unordered_map  Key:int
 Insertion: 13234 7441 7519 7546 7569  min=7441 max=13234
 Lookup:14812 14823 14846 14810 14805  min=14805 max=14846

 Container:std::tr1::unordered_map  Key:int
 Insertion: 12450 5228 5197 5186 5249  min=5186 max=12450
 Lookup:11579 11530 11592 11535 11529  min=11529 max=11592

It looks like the tr1::unordered_map implementation performs approximately the
same across compilers.

The std::unordered_map::find() implementation slows down by approximately 5.2%
in gcc-4.7.3 compared to gcc-4.6.3. It slows down by about 27.7% in gcc-4.8.1
compared to gcc-4.6.1. The insertion performance is about 52.7% worse in
gcc-4.8.1 compared to gcc-4.6.1.

The code was compiled with the following:

g++ -std=c++0x -DNDEBUG -O3 -o unordered_map unordered_map.cpp

This is on Linux x86_64 with Intel Xeon X5570 @ 2.93GHz.

I have seen this performance degradation with some of my user defined classes,
but I thought that an integer key best shows the issue.


[Bug libstdc++/57885] unordered_map find slower in 4.8.1 than 4.7.3 with integer key

2013-07-15 Thread jhand at austin dot rr.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57885

--- Comment #3 from Jim Hand  ---
As a simplification, I tried compiling the following code with gcc-4.6.3 into
assembly with gcc-4.6.3 and 4.8.1:

#include 

bool contains(std::unordered_map a) {
   return a.find(5) != a.end();
}

gcc-4.6.3 generates the following assembly:

.LFB574:
.cfi_startproc
movq8(%rdi), %rcx
movq16(%rdi), %rdi
xorl%edx, %edx
movl$5, %eax
divq%rdi
xorl%eax, %eax
movq(%rcx,%rdi,8), %rsi
movq(%rcx,%rdx,8), %rdx
testq%rdx, %rdx
jne.L6
jmp.L2
.p2align 4,,10
.p2align 3
.L11:
movq8(%rdx), %rdx
testq%rdx, %rdx
je.L10
.L6:
cmpl$5, (%rdx)
jne.L11
cmpq%rdx, %rsi
setne%al
.L2:
rep
ret
.p2align 4,,10
.p2align 3
.L10:
xorl%eax, %eax
ret
.cfi_endproc

gcc-4.8.1 generates the following assembly:

.LFB1323:
.cfi_startproc
movq8(%rdi), %r8
xorl%edx, %edx
movl$5, %eax
divq%r8
movq(%rdi), %rax
movq(%rax,%rdx,8), %rax
movq%rdx, %r9
testq%rax, %rax
je.L7
movq(%rax), %rcx
movl8(%rcx), %esi
.p2align 4,,10
.p2align 3
.L3:
cmpl$5, %esi
je.L5
movq(%rcx), %rcx
testq%rcx, %rcx
je.L7
movl8(%rcx), %esi
xorl%edx, %edx
movslq%esi, %rax
divq%r8
cmpq%rdx, %r9
je.L3
.L7:
xorl%eax, %eax
ret
.p2align 4,,10
.p2align 3
.L5:
movl$1, %eax
ret
.cfi_endproc

In the gcc-4.8.1 code, I see two divq instructions that I think are coming from
line 345 of _Mod_range_hashing in bits/hashtable_policy.h:

 343 result_type
 344 operator()(first_argument_type __num, second_argument_type __den)
const
 345 { return __num % __den; }

I would think that the hash function only needs to be called once for this
function, but I could be misinterpreting the assembly output. It does look to
me like the 4.8 output is doing a lot more than the 4.6.3 output, but I'll
leave it for you to explore.


[Bug libstdc++/57885] unordered_map find slower in 4.8.1 than 4.7.3 with integer key

2013-07-15 Thread jhand at austin dot rr.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57885

--- Comment #4 from Jim Hand  ---
I compiled 4.9.0-20130714 today and ran the same tests. Here are the results
for the lookup loops with the times for each compiler:

4.6.3: 11536 microseconds
4.7.3: 12150 microseconds  (5.3% slower than 4.6.3)
4.8.2: 14810 microseconds  (28.3% slower than 4.6.3)
4.9.0-20130714 15365 microseconds (33.1% slower than 4.6.3)

I took a look at the assembly for the "return a.find(5) != a.end();" code for
4.9 and it looks very similar to the 4.8 assembly, but contains two extra
instructions at the beginning of the function:

testq%r8, %r8
 je.L8

I guess those two instructions explain the the slightly worse behavior for
4.9.0 than 4.8.2.


[Bug libstdc++/57885] unordered_map find slower in 4.8.1 than 4.7.3 with integer key

2013-07-19 Thread jhand at austin dot rr.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57885

--- Comment #6 from Jim Hand  ---
Thanks for the information Francois. That makes a lot of sense for the erase()
case.

For a little more information, I also timed this with three boost
implementations and saw the following times for boost 1.47, 1.49, and 1.53:

$ ./um.b147   
Container:boost::unordered_map  Key:int
Insertion: 13589 4581 4581 4593 4577  min=4577 max=13589
Lookup:10344 10353 10339 10332 10338  min=10332 max=10353

$ ./um.b149   
Container:boost::unordered_map  Key:int
Insertion: 16593 7901 7944 7950 7970  min=7901 max=16593
Lookup:15295 15340 15297 15303 15310  min=15295 max=15340

$ ./um.b153  
Container:boost::unordered_map  Key:int
Insertion: 10662 5648 5726 5747 5758  min=5648 max=10662 
Lookup:14880 14842 14844 14840 14928  min=14840 max=14928

The times are pretty comparable to gcc. Inserts may take about 10 or 11
nanoseconds on our hardware and finds look like they are taking about 5 or 6
nanoseconds. For most applications, including ours, the time spent inserting
and executing find() is miniscule compared to the other time spent by the
application.

If you feel you can improve performance while maintaining conformance with the
standard, please do so. If you feel that the current behavior is fine, we won't
complain (too much).

I also want you to know that the gcc developers all are doing a great job in
supporting the C++ community. I've been using gcc for C/C++ programming for
about 20 years now and have found the gcc compiler suite to be a great product.
My colleagues at work are really excited about moving from gcc-4.6 to 4.8,
mostly for the enhanced C++11 support.