[Bug c++/57489] New: [4.8 regression]: invalid code generated for conditional in template function
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489 Bug ID: 57489 Summary: [4.8 regression]: invalid code generated for conditional in template function Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: jhand at austin dot rr.com Created attachment 30235 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30235&action=edit test_parse.cpp In gcc-4.8.0 and gcc-4.8.1-RC-20130527, g++ is generating some invalid code that is causing my test regressions to fail. gcc-4.7.3 and gcc-4.6.3 both generate valid code. Comping with -O2 or higher fails, while -O1 or lower works. Adding -fno-inline allows the test to succeed. Changing test_parser() to a non-template function also allows the test to succeed. This has been tested only on a 64-bit environment on Fedora 16. Here is a more complete description of the test. First, the code to parse the number seems to be working properly. However, a conditional in the test, "if (expected_v != actual_v || expected_i != actual_i)" seems to fail even though expected_v == actual_v and expected_i == actual_i. When the test fails, it generates the following message: bad parse: "9223372036854775807": expected_v=9223372036854775807 actual_v=9223372036854775807 expected_i=19 actual_i=19 The number that is being parsed is equal to 0x7fff, however slightly smaller numbers cause the test to still fail, and some number that is smaller allows the test to succeed. I'm attaching the simple test case that demonstrates the problem. Compile with the following: g++ -O2 -o test_parse test_parse.cpp
[Bug c++/57489] [4.8 regression]: invalid code generated for conditional in template function
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489 Jim Hand changed: What|Removed |Added Version|unknown |4.8.1 Severity|normal |major
[Bug c++/57489] [4.8 regression]: invalid code generated for conditional in template function
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489 --- Comment #1 from Jim Hand --- One of my coworkers looked at the assembly and said that it appears g++ isn't even generating code for the conditional "if (expected_v != actual_v || expected_i != actual_i)" when this occurs. It appears the compiler is analyzing the inline code and determining that the conditional isn't required. Testing with some smaller numbers seems to cause the conditional to be generated. With a number of 9223372036854775759 or less, the conditional seems to work as expected. Good luck with this one!
[Bug c++/57489] [4.8 regression]: invalid code generated for conditional in template function
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489 --- Comment #3 from Jim Hand --- Thanks Marc, that seems to allow the code to work properly. One thing that is somewhat troubling, even though the code seems to work properly with the parenthesis, is that changing the following without the parenthesis: if (expected_v != actual_v || expected_i != actual_i) { to just the following: if (expected_v != actual_v) { prevents the printf() from being called. Also having the following prevents the printf() from being called: if (expected_t != actual_i) { I would agree that the the missing parenthesis results in some undefined behavior, but it also looks like the compiler isn't behaving in a predictable manner. Perhaps a compiler warning could be useful in this situation, but nothing seems to be generated with the -Wall flag.
[Bug c++/57489] [4.8 regression]: invalid code generated for conditional in template function
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489 --- Comment #4 from Jim Hand --- One of my coworkers, a former Intel employee, made the point that signed integer overflow is precisely defined for X86, in that overflowing and then underflowing will produce the correct value 100% of the time. Another point that was made is that when the test fails, there is no other way to see why it should fail: silent failure. Moreover, adding instrumentation would change the results. The few of us that have discussed this believe that the compiler is producing the incorrect results in this case.
[Bug c++/57489] [4.8 regression]: invalid code generated for conditional in template function
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489 --- Comment #6 from Jim Hand --- You are correct in that it is undefined in C.
[Bug middle-end/57489] [4.8 regression]: invalid code generated for conditional in template function
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57489 --- Comment #8 from Jim Hand --- Do you not agree that the compiler is not handling this situation in a very helpful manner? A warning at minimal would be helpful, but when there is code that equivalently says the following and prints identical values for the expected and actual, it makes it pretty hard for a sane developer to debug: if (expected_v != actual_v) printf("%lld %lld", expected_v, actual_v); additionally, there are other very slight changes that can be made to the program that change the behavior wildly, such as make the function a non-templated class or changing the conditional to only have what is printed above. You can close this case as invalid, but I think you may be doing gcc a disservice because this is an opportunity for quality improvement.
[Bug libstdc++/57885] New: unordered_map find slower in 4.8.1 than 4.7.3 with integer key
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57885 Bug ID: 57885 Summary: unordered_map find slower in 4.8.1 than 4.7.3 with integer key Product: gcc Version: 4.8.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: jhand at austin dot rr.com Created attachment 30499 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30499&action=edit unordered_map.cpp I have been looking at performance of std::unordered_map versus tr1::unordered_map using gcc-4.6.3, gcc-4.7.3, and gcc-4.8.1. I am mostly interested in the performance of find(), although insertion performance is important too. I am attaching a simple program that creates a vector of 100,000 integers. It calls a function that measures the time to insert the vector into an unordered_map of integers to integers. It does this in a loop five times, recording the amount spent each time, and clearing the unordered_map at the end of the loop except the final time through the loop. The time to clear the map is not measured. Then within a loop of 5 times, it performs find() on the unordered map using the original vector of integers. It measures the time spent looking up all of the integers in the vector 5 times. On gcc-4.6.3, I get the following output (the numbers are microseconds): Container:std::unordered_map Key:int Insertion: 10458 4885 4874 4892 4871 min=4871 max=10458 Lookup:11609 11613 11597 11611 11594 min=11594 max=11613 Container:std::tr1::unordered_map Key:int Insertion: 12399 4963 4948 4949 4961 min=4948 max=12399 Lookup:11580 11573 11562 11559 11570 min=11559 max=11580 On gcc-4.7.3, I get the following output: Container:std::unordered_map Key:int Insertion: 14454 16288 16499 15830 15753 min=14454 max=16499 Lookup:12229 12208 12204 12201 12200 min=12200 max=12229 Container:std::tr1::unordered_map Key:int Insertion: 9926 4978 5000 4972 4971 min=4971 max=9926 Lookup:11667 11636 11646 11636 11652 min=11636 max=11667 On gcc-4.8.1, I get the following output: Container:std::unordered_map Key:int Insertion: 13234 7441 7519 7546 7569 min=7441 max=13234 Lookup:14812 14823 14846 14810 14805 min=14805 max=14846 Container:std::tr1::unordered_map Key:int Insertion: 12450 5228 5197 5186 5249 min=5186 max=12450 Lookup:11579 11530 11592 11535 11529 min=11529 max=11592 It looks like the tr1::unordered_map implementation performs approximately the same across compilers. The std::unordered_map::find() implementation slows down by approximately 5.2% in gcc-4.7.3 compared to gcc-4.6.3. It slows down by about 27.7% in gcc-4.8.1 compared to gcc-4.6.1. The insertion performance is about 52.7% worse in gcc-4.8.1 compared to gcc-4.6.1. The code was compiled with the following: g++ -std=c++0x -DNDEBUG -O3 -o unordered_map unordered_map.cpp This is on Linux x86_64 with Intel Xeon X5570 @ 2.93GHz. I have seen this performance degradation with some of my user defined classes, but I thought that an integer key best shows the issue.
[Bug libstdc++/57885] unordered_map find slower in 4.8.1 than 4.7.3 with integer key
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57885 --- Comment #3 from Jim Hand --- As a simplification, I tried compiling the following code with gcc-4.6.3 into assembly with gcc-4.6.3 and 4.8.1: #include bool contains(std::unordered_map a) { return a.find(5) != a.end(); } gcc-4.6.3 generates the following assembly: .LFB574: .cfi_startproc movq8(%rdi), %rcx movq16(%rdi), %rdi xorl%edx, %edx movl$5, %eax divq%rdi xorl%eax, %eax movq(%rcx,%rdi,8), %rsi movq(%rcx,%rdx,8), %rdx testq%rdx, %rdx jne.L6 jmp.L2 .p2align 4,,10 .p2align 3 .L11: movq8(%rdx), %rdx testq%rdx, %rdx je.L10 .L6: cmpl$5, (%rdx) jne.L11 cmpq%rdx, %rsi setne%al .L2: rep ret .p2align 4,,10 .p2align 3 .L10: xorl%eax, %eax ret .cfi_endproc gcc-4.8.1 generates the following assembly: .LFB1323: .cfi_startproc movq8(%rdi), %r8 xorl%edx, %edx movl$5, %eax divq%r8 movq(%rdi), %rax movq(%rax,%rdx,8), %rax movq%rdx, %r9 testq%rax, %rax je.L7 movq(%rax), %rcx movl8(%rcx), %esi .p2align 4,,10 .p2align 3 .L3: cmpl$5, %esi je.L5 movq(%rcx), %rcx testq%rcx, %rcx je.L7 movl8(%rcx), %esi xorl%edx, %edx movslq%esi, %rax divq%r8 cmpq%rdx, %r9 je.L3 .L7: xorl%eax, %eax ret .p2align 4,,10 .p2align 3 .L5: movl$1, %eax ret .cfi_endproc In the gcc-4.8.1 code, I see two divq instructions that I think are coming from line 345 of _Mod_range_hashing in bits/hashtable_policy.h: 343 result_type 344 operator()(first_argument_type __num, second_argument_type __den) const 345 { return __num % __den; } I would think that the hash function only needs to be called once for this function, but I could be misinterpreting the assembly output. It does look to me like the 4.8 output is doing a lot more than the 4.6.3 output, but I'll leave it for you to explore.
[Bug libstdc++/57885] unordered_map find slower in 4.8.1 than 4.7.3 with integer key
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57885 --- Comment #4 from Jim Hand --- I compiled 4.9.0-20130714 today and ran the same tests. Here are the results for the lookup loops with the times for each compiler: 4.6.3: 11536 microseconds 4.7.3: 12150 microseconds (5.3% slower than 4.6.3) 4.8.2: 14810 microseconds (28.3% slower than 4.6.3) 4.9.0-20130714 15365 microseconds (33.1% slower than 4.6.3) I took a look at the assembly for the "return a.find(5) != a.end();" code for 4.9 and it looks very similar to the 4.8 assembly, but contains two extra instructions at the beginning of the function: testq%r8, %r8 je.L8 I guess those two instructions explain the the slightly worse behavior for 4.9.0 than 4.8.2.
[Bug libstdc++/57885] unordered_map find slower in 4.8.1 than 4.7.3 with integer key
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57885 --- Comment #6 from Jim Hand --- Thanks for the information Francois. That makes a lot of sense for the erase() case. For a little more information, I also timed this with three boost implementations and saw the following times for boost 1.47, 1.49, and 1.53: $ ./um.b147 Container:boost::unordered_map Key:int Insertion: 13589 4581 4581 4593 4577 min=4577 max=13589 Lookup:10344 10353 10339 10332 10338 min=10332 max=10353 $ ./um.b149 Container:boost::unordered_map Key:int Insertion: 16593 7901 7944 7950 7970 min=7901 max=16593 Lookup:15295 15340 15297 15303 15310 min=15295 max=15340 $ ./um.b153 Container:boost::unordered_map Key:int Insertion: 10662 5648 5726 5747 5758 min=5648 max=10662 Lookup:14880 14842 14844 14840 14928 min=14840 max=14928 The times are pretty comparable to gcc. Inserts may take about 10 or 11 nanoseconds on our hardware and finds look like they are taking about 5 or 6 nanoseconds. For most applications, including ours, the time spent inserting and executing find() is miniscule compared to the other time spent by the application. If you feel you can improve performance while maintaining conformance with the standard, please do so. If you feel that the current behavior is fine, we won't complain (too much). I also want you to know that the gcc developers all are doing a great job in supporting the C++ community. I've been using gcc for C/C++ programming for about 20 years now and have found the gcc compiler suite to be a great product. My colleagues at work are really excited about moving from gcc-4.6 to 4.8, mostly for the enhanced C++11 support.