On 12/7/2025 5:11 AM, Sebastian Huber wrote:
There are targets, which only offer 32-bit atomic operations (for
example 32-bit RISC-V).  For these targets, split the 64-bit atomic
bitwise-or operation into two parts.

For this test case

int a(int i);
int b(int i);

int f(int i)
{
   if (i) {
     return a(i);
   } else {
     return b(i);
   }
}

with options

-O2 -fprofile-update=atomic -fcondition-coverage

the code generation to 64-bit vs. 32-bit RISC-V looks like:

         addi    a5,a5,%lo(.LANCHOR0)
         beq     a0,zero,.L2
         li      a4,1
-       amoor.d zero,a4,0(a5)
-       addi    a5,a5,8
-       amoor.d zero,zero,0(a5)
+       amoor.w zero,a4,0(a5)
+       addi    a4,a5,4
+       amoor.w zero,zero,0(a4)
+       addi    a4,a5,8
+       amoor.w zero,zero,0(a4)
+       addi    a5,a5,12
+       amoor.w zero,zero,0(a5)
         tail    a
  .L2:
-       amoor.d zero,zero,0(a5)
+       amoor.w zero,zero,0(a5)
+       addi    a4,a5,4
+       amoor.w zero,zero,0(a4)
         li      a4,1
-       addi    a5,a5,8
-       amoor.d zero,a4,0(a5)
+       addi    a3,a5,8
+       amoor.w zero,a4,0(a3)
+       addi    a5,a5,12
+       amoor.w zero,zero,0(a5)
         tail    b

Not related to this patch, even with -O2 the compiler generates
no-operations like

amoor.d zero,zero,0(a5)

and

amoor.w zero,zero,0(a5)

Would this be possible to filter out in instrument_decisions()?

I'd bet this might be reasonably optimized in either gimple or RTL without major work.  Though someone would have to read up on semantics -- are we allowed to drop atomics like that?



gcc/ChangeLog:

        * tree-profile.cc (split_update_decision_counter): New.
        (instrument_decisions): Use counter_update to determine which
        atomic operations are available.  Use
        split_update_decision_counter() if 64-bit atomic operations can
        be split up into two 32-bit atomic operations.

I was originally thinking that spitting these down in the optimizers or target files would make more sense.  But this looks like a fairly practical solution pending doing some real optimization around atomics (which I think we do need, I've seen unused relaxed loads showing up in profiles from jemalloc).  But until then....


+
+    /* Get the high 32-bit of the counter */
+    tree shift_32 = build_int_cst (integer_type_node, 32);
+    tree counter_high_64 = make_temp_ssa_name (gcov_type_node, NULL,
+                                              "PROF_decision");
+    gassign *assign3 = gimple_build_assign (counter_high_64, LSHIFT_EXPR,
+                                           counter, shift_32);

Doesn't the type of shift_32 need to match the type of the object being shifted?  Or do we have loose requirements around type checking operands for this case (where the shift count is often in a smaller precision than the object being shifted).

Do we need to worry about logical vs arithmetic shifts here? COUNTER's type is going to drive that decision, so we just need to make sure it's sensible.




@@ -1157,6 +1213,11 @@ instrument_decisions (array_slice<basic_block> expr, 
size_t condno,
                                                      next[k], relaxed);
                    gsi_insert_on_edge (e, flush);
                }
+               else if (use_atomic_split)
+               {
+                   split_update_decision_counter (e, ref, next[k],
+                                                  atomic_ior_32, relaxed);
+               }

Consider dropping the extraneous curlys.  That function seems to be formatted without regard to our formatting conventions, so I'm not going to ask that you adjust indention on this little hunk since it mirrors nearby code.

Jeff

Reply via email to