----- Am 18. Aug 2025 um 9:33 schrieb Jørgen Kvalsvik [email protected]:
> On 8/18/25 02:58, Sebastian Huber wrote:
>> Hello,
>>
>> I have question to the counters used for the condition coverage
>> implementation
>> in tree-profile.cc
>>
>> /* Stores the incoming edge and previous counters (in SSA form) on that edge
>> for the node e->deston that edge for the node e->dest. The counters
>> record
>> the seen-true (0), seen-false (1), and current-mask (2). They are
>> stored in
>> an array rather than proper members for access-by-index as the code paths
>> tend to be identical for the different counters. */
>> struct counters
>> {
>> edge e;
>> tree counter[3];
>> tree& operator [] (size_t i) { return counter[i]; }
>> };
>>
>> While working on the -fprofile-update=atomic support for 32-bit targets which
>> lack support for 64-bit atomic operations, I noticed that some atomic
>> no-operations are generated for the instrumented code
>> (https://gcc.gnu.org/pipermail/gcc-patches/2025-August/692555.html). For
>> example:
>>
>> int a(int i);
>> int b(int i);
>>
>> int f(int i)
>> {
>> if (i) {
>> return a(i);
>> } else {
>> return b(i);
>> }
>> }
>>
>> gcc -O2 -fprofile-update=atomic -fcondition-coverage -S -o - test.c
>> -fdump-tree-all
>>
>> ;; Function f (f, funcdef_no=0, decl_uid=4621, cgraph_uid=1, symbol_order=0)
>>
>> int f (int i)
>> {
>> int _1;
>> int _6;
>> int _8;
>>
>> <bb 2> [local count: 1073741824]:
>> if (i_3(D) != 0)
>> goto <bb 3>; [50.00%]
>> else
>> goto <bb 4>; [50.00%]
>>
>> <bb 3> [local count: 536870912]:
>> __atomic_fetch_or_8 (&__gcov8.f[0], 1, 0);
>> __atomic_fetch_or_8 (&__gcov8.f[1], 0, 0);
>> _8 = a (i_3(D)); [tail call]
>> goto <bb 5>; [100.00%]
>>
>> <bb 4> [local count: 536870912]:
>> __atomic_fetch_or_8 (&__gcov8.f[0], 0, 0);
>> __atomic_fetch_or_8 (&__gcov8.f[1], 1, 0);
>> _6 = b (0); [tail call]
>>
>> <bb 5> [local count: 1073741824]:
>> # _1 = PHI <_8(3), _6(4)>
>> return _1;
>>
>> }
>>
>> The __atomic_fetch_or_8 (&__gcov8.f[1], 0, 0) and __atomic_fetch_or_8
>> (&__gcov8.f[0], 0, 0) could be optimized away. Since GCC is able to figure
>> out
>> that the masks are compile-time constants wouldn't it be possible to use a
>> simple uint64_t for the current-mask (2) in struct counters?
>
> Is this something you're seeing consistently, even when the number of
> conditions go up?
I only looked at simple test cases while working on the support for splitting
up the atomic operations. In a slightly more complicated test function
int a(int i);
int b(int i);
int c(int i);
int f(int i, int j, int k, int l, int m)
{
if (i && j && (k || !l || m)) {
if (a(i) || b(i) || l) {
return a(i);
} {
return c(i);
}
} else if (k && m) {
return b(i);
} else {
return 123;
}
}
all the masks are still compile-time constants:
;; Function f (f, funcdef_no=0, decl_uid=4627, cgraph_uid=1, symbol_order=0)
Removing basic block 20
int f (int i, int j, int k, int l, int m)
{
_Bool _1;
_Bool _2;
_Bool _3;
_Bool _5;
_Bool _6;
int _7;
int _8;
_Bool _10;
_Bool _11;
int _12;
int _24;
int _26;
int _28;
_Bool _76;
<bb 2> [local count: 1073741822]:
_1 = i_15(D) != 0;
_2 = j_16(D) != 0;
_3 = _1 & _2;
_76 = k_17(D) != 0;
if (_3 != 0)
goto <bb 3>; [50.00%]
else
goto <bb 16>; [50.00%]
<bb 3> [local count: 536870911]:
_5 = l_18(D) == 0;
_6 = _5 | _76;
if (_6 != 0)
goto <bb 4>; [33.00%]
else
goto <bb 5>; [67.00%]
<bb 4> [local count: 177167400]:
__atomic_fetch_or_8 (&__gcov8.f[0], 3, 0);
__atomic_fetch_or_8 (&__gcov8.f[1], 0, 0);
goto <bb 8>; [100.00%]
<bb 5> [local count: 359703512]:
if (m_19(D) != 0)
goto <bb 7>; [50.00%]
else
goto <bb 6>; [50.00%]
<bb 6> [local count: 179851756]:
__atomic_fetch_or_8 (&__gcov8.f[0], 0, 0);
__atomic_fetch_or_8 (&__gcov8.f[1], 6, 0);
goto <bb 17>; [100.00%]
<bb 7> [local count: 179851756]:
__atomic_fetch_or_8 (&__gcov8.f[0], 5, 0);
__atomic_fetch_or_8 (&__gcov8.f[1], 0, 0);
<bb 8> [local count: 357019156]:
_7 = a (i_15(D));
if (_7 != 0)
goto <bb 9>; [34.00%]
else
goto <bb 10>; [66.00%]
<bb 9> [local count: 121386514]:
__atomic_fetch_or_8 (&__gcov8.f[2], 1, 0);
__atomic_fetch_or_8 (&__gcov8.f[3], 0, 0);
goto <bb 14>; [100.00%]
<bb 10> [local count: 235632641]:
_8 = b (i_15(D));
if (_8 != 0)
goto <bb 11>; [34.00%]
else
goto <bb 12>; [66.00%]
<bb 11> [local count: 80115099]:
__atomic_fetch_or_8 (&__gcov8.f[2], 2, 0);
__atomic_fetch_or_8 (&__gcov8.f[3], 0, 0);
goto <bb 14>; [100.00%]
<bb 12> [local count: 155517543]:
if (l_18(D) != 0)
goto <bb 13>; [67.00%]
else
goto <bb 15>; [33.00%]
<bb 13> [local count: 104196754]:
__atomic_fetch_or_8 (&__gcov8.f[2], 4, 0);
__atomic_fetch_or_8 (&__gcov8.f[3], 0, 0);
<bb 14> [local count: 305698367]:
_26 = a (i_15(D)); [tail call]
goto <bb 19>; [100.00%]
<bb 15> [local count: 51320789]:
__atomic_fetch_or_8 (&__gcov8.f[2], 0, 0);
__atomic_fetch_or_8 (&__gcov8.f[3], 7, 0);
_24 = c (i_15(D)); [tail call]
goto <bb 19>; [100.00%]
<bb 16> [local count: 536870911]:
__atomic_fetch_or_8 (&__gcov8.f[0], 0, 0);
__atomic_fetch_or_8 (&__gcov8.f[1], 1, 0);
_10 = m_19(D) != 0;
_11 = _10 & _76;
if (_11 != 0)
goto <bb 18>; [63.77%]
else
goto <bb 17>; [36.23%]
<bb 17> [local count: 374344247]:
__atomic_fetch_or_8 (&__gcov8.f[4], 0, 0);
__atomic_fetch_or_8 (&__gcov8.f[5], 1, 0);
goto <bb 19>; [100.00%]
<bb 18> [local count: 342378420]:
__atomic_fetch_or_8 (&__gcov8.f[4], 1, 0);
__atomic_fetch_or_8 (&__gcov8.f[5], 0, 0);
_28 = b (i_15(D)); [tail call]
<bb 19> [local count: 1073741824]:
# _12 = PHI <_26(14), _24(15), _28(18), 123(17)>
return _12;
}
>
> I'm sure it's possible to optimize out this case by checking if the
> current-mask still is the initial zero constant. The inputs that
> determine the current-mask are constant and tied to the node, but the
> final state of current-mask once the counters are flushed depends on the
> path taken. I'll try to think a bit more about it, but I don't think it
> can be replaced by a uint64_t without a redesign of the
> instrument_decisions function.
Ok, thanks for having a look at it.
--
embedded brains GmbH & Co. KG
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: [email protected]
phone: +49-89-18 94 741 - 16
fax: +49-89-18 94 741 - 08
Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/