On 8/18/25 02:58, Sebastian Huber wrote:
Hello,

I have a question about the counters used for the condition coverage implementation 
in tree-profile.cc:

/* Stores the incoming edge and previous counters (in SSA form) on that edge
    for the node e->dest.  The counters record
    the seen-true (0), seen-false (1), and current-mask (2).  They are stored in
    an array rather than proper members for access-by-index as the code paths
    tend to be identical for the different counters.  */
struct counters
{
     edge e;
     tree counter[3];
     tree& operator [] (size_t i) { return counter[i]; }
};

While working on the -fprofile-update=atomic support for 32-bit targets which 
lack support for 64-bit atomic operations, I noticed that some atomic 
no-operations are generated for the instrumented code 
(https://gcc.gnu.org/pipermail/gcc-patches/2025-August/692555.html). For 
example:

int a(int i);
int b(int i);

int f(int i)
{
     if (i) {
       return a(i);
     } else {
       return b(i);
     }
}

gcc -O2 -fprofile-update=atomic -fcondition-coverage -S -o - test.c 
-fdump-tree-all

;; Function f (f, funcdef_no=0, decl_uid=4621, cgraph_uid=1, symbol_order=0)

int f (int i)
{
   int _1;
   int _6;
   int _8;

   <bb 2> [local count: 1073741824]:
   if (i_3(D) != 0)
     goto <bb 3>; [50.00%]
   else
     goto <bb 4>; [50.00%]

   <bb 3> [local count: 536870912]:
   __atomic_fetch_or_8 (&__gcov8.f[0], 1, 0);
   __atomic_fetch_or_8 (&__gcov8.f[1], 0, 0);
   _8 = a (i_3(D)); [tail call]
   goto <bb 5>; [100.00%]

   <bb 4> [local count: 536870912]:
   __atomic_fetch_or_8 (&__gcov8.f[0], 0, 0);
   __atomic_fetch_or_8 (&__gcov8.f[1], 1, 0);
   _6 = b (0); [tail call]

   <bb 5> [local count: 1073741824]:
   # _1 = PHI <_8(3), _6(4)>
   return _1;

}

The __atomic_fetch_or_8 (&__gcov8.f[1], 0, 0) and __atomic_fetch_or_8 
(&__gcov8.f[0], 0, 0) calls could be optimized away. Since GCC is able to figure out that 
the masks are compile-time constants, wouldn't it be possible to use a simple uint64_t 
for the current-mask (2) in struct counters?

Is this something you're seeing consistently, even when the number of conditions goes up?

I'm sure it's possible to optimize out this case by checking whether the current-mask is still the initial zero constant. The inputs that determine the current-mask are constant and tied to the node, but the final state of the current-mask once the counters are flushed depends on the path taken. I'll try to think a bit more about it, but I don't think it can be replaced by a uint64_t without a redesign of the instrument_decisions function.
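
As a rough illustration of the zero check (not the actual tree-profile.cc change, which would have to happen at instrumentation time on the SSA values), here is a run-time analogue in plain C. The names demo_counters and flush_counter are hypothetical and exist only for this sketch; it assumes GCC's __atomic builtins, where __ATOMIC_RELAXED corresponds to the trailing 0 in the dump above.

```c
#include <stdint.h>

/* Hypothetical stand-in for one function's condition counters:
   index 0 is seen-true, index 1 is seen-false.  */
static uint64_t demo_counters[2];

/* OR BITS into *COUNTER unless BITS is zero, in which case the atomic
   operation would be a no-op and is skipped.  Returns 1 if an atomic
   operation was issued, 0 otherwise.  */
static int
flush_counter (uint64_t *counter, uint64_t bits)
{
  if (bits == 0)
    return 0;
  __atomic_fetch_or (counter, bits, __ATOMIC_RELAXED);
  return 1;
}
```

For the i != 0 path in the example, flush_counter (&demo_counters[0], 1) would issue the update while flush_counter (&demo_counters[1], 0) would be skipped. In the compiler the same test would only apply when the value is known to be the constant zero on every path reaching the flush.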

Thanks,
Jørgen


I am not sure how the phi node stuff works in resolve_counter() since I am not 
a compiler expert.

Kind regards, Sebastian

