https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110070
Bug ID: 110070
Summary: Code quality regression with for (int i: {1,2,4,6})
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: roger at nextmovesoftware dot com
Target Milestone: ---
The fix for PR c++/70167 (in GCC 11.3) inadvertently introduced a code quality
regression for simple range-for using initializer lists. The motivating
example is an idiom from the stockfish benchmark [update_continuation_histories
in src/search.cpp]:
#include <initializer_list>
extern void ext(int);
void foo()
{
for (int i: {1,2,4,6})
ext(i);
}
which currently generates inefficient code by copying the array (to the stack)
before use:
foo():
pushq %rbp
pushq %rbx
subq $24, %rsp
movdqa .LC0(%rip), %xmm0
movq %rsp, %rbx
leaq 16(%rsp), %rbp
movaps %xmm0, (%rsp)
.L2:
movl (%rbx), %edi
addq $4, %rbx
call ext(int)
cmpq %rbp, %rbx
jne .L2
addq $24, %rsp
popq %rbx
popq %rbp
ret
.LC0:
.long 1
.long 2
.long 4
.long 6
In GCC 11.2 and earlier, the initializing array is efficiently used without
copying:
foo():
pushq %rbx
movl $C.0.0, %ebx
.L2:
movl (%rbx), %edi
addq $4, %rbx
call ext(int)
cmpq $C.0.0+16, %rbx
jne .L2
popq %rbx
ret
C.0.0:
.long 1
.long 2
.long 4
.long 6
The underlying cause of the code difference stems from whether the initializer
is marked "static" in the middle-end, as shown by the differences between:
const int init[4] = {1,2,4,6};
for (int i: init) ... // generates a copy
and
static const int init[4] = {1,2,4,6};
for (int i: init) ... // doesn't generate a copy
Fortunately, there's already code in the depth of the C++ front=end for marking
such initializer lists/constructors as static, so I initially tried fixing this
myself, at first trying:
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 05df628..a91693d 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -3314,7 +3314,6 @@ finish_compound_literal (tree type, tree
compound_literal,
/* FIXME all C99 compound literals should be variables rather than C++
temporaries, unless they are used as an aggregate initializer. */
if ((!at_function_scope_p () || CP_TYPE_CONST_P (type))
- && fcl_context == fcl_c99
&& TREE_CODE (type) == ARRAY_TYPE
&& !TYPE_HAS_NONTRIVIAL_DESTRUCTOR (type)
&& initializer_constant_valid_p (compound_literal, type))
and then trying:
diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 2736f55..53220da 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -8557,7 +8557,10 @@ convert_like_internal (conversion *convs, tree expr,
tree
fn, int argnum,
elttype = cp_build_qualified_type
(elttype, cp_type_quals (elttype) | TYPE_QUAL_CONST);
array = build_array_of_n_type (elttype, len);
- array = finish_compound_literal (array, new_ctor, complain);
+ /* Indicate that a non-lvalue static const array is acceptable
+ by specifying fcl_c99. */
+ array = finish_compound_literal (array, new_ctor, complain,
+ fcl_c99);
/* Take the address explicitly rather than via decay_conversion
to avoid the error about taking the address of a temporary. */
array = cp_build_addr_expr (array, complain);
Both of which fix/improve code generation for this case, but break the
initializer tests in the g++.dg testsuite in interesting ways. At this point I
thought I'd give up and leave the fix to the experts. The range_expr passed to
cp_convert_range_for is:
<constructor 0x7ffff6dc9348
type <lang_type 0x7ffff6dadc78 init list VOID
align:1 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7ffff6dadc78>
constant length:4
val <non_lvalue_expr 0x7ffff6dcd540
type <integer_type 0x7ffff6c415e8 int public type_6 SI
size <integer_cst 0x7ffff6c43228 constant 32>
unit-size <integer_cst 0x7ffff6c43240 constant 4>
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7ffff6c415e8 precision:32 min <integer_cst 0x7ffff6c431e0 -2147483648> max
<integer_cst 0x7ffff6c431f8 2147483647>
pointer_to_this <pointer_type 0x7ffff6c49b28>>
constant public
arg:0 <integer_cst 0x7ffff6c43390 constant 1>
iter.cc:7:16 start: iter.cc:7:16 finish: iter.cc:7:16>
val <non_lvalue_expr 0x7ffff6dcd560 type <integer_type 0x7ffff6c415e8 int>
constant public
arg:0 <integer_cst 0x7ffff6c43768 constant 2>
iter.cc:7:18 start: iter.cc:7:18 finish: iter.cc:7:18>
val <non_lvalue_expr 0x7ffff6dcd580 type <integer_type 0x7ffff6c415e8 int>
constant public
arg:0 <integer_cst 0x7ffff6c43780 constant 4>
iter.cc:7:20 start: iter.cc:7:20 finish: iter.cc:7:20>
val <non_lvalue_expr 0x7ffff6dcd5a0 type <integer_type 0x7ffff6c415e8 int>
constant public
arg:0 <integer_cst 0x7ffff6c437b0 constant 6>
iter.cc:7:22 start: iter.cc:7:22 finish: iter.cc:7:22>>
which contains a lot of non_lvalue_expr, so it's surprising (to me) that we try
to turn this into an lvalue, when it should/could be read-only.
Thanks in advance. My apologies if this is a duplicate/known issue.
Ideally, GCC should be able to unroll this loop, but that's a different issue.