On 27/06/2025 13:36, Alice Ryhl wrote:
On Fri, Jun 27, 2025 at 11:41 AM Jocelyn Falempe <[email protected]> wrote:

On 32-bit ARM, the compiler does not optimize a u64 division by a
constant into a multiply by inverse [1].
So do the multiply by inverse explicitly for this architecture.

Link: https://github.com/llvm/llvm-project/issues/37280 [1]
Reported-by: Andrei Lalaev <[email protected]>
Closes: https://lore.kernel.org/dri-devel/[email protected]/
Fixes: 675008f196ca ("drm/panic: Use a decimal fifo to avoid u64 by u64 divide")
Signed-off-by: Jocelyn Falempe <[email protected]>

Not to block this change, but I think this really ought to be fixed in
the compiler. We should not have to do this kind of thing to divide by
10.

I agree, I didn't expect this to be a problem. But I'm not a compiler
expert, and updating the compiler will probably take time, so we have
to do this at least temporarily.

  drivers/gpu/drm/drm_panic_qr.rs | 24 +++++++++++++++++++++++-
  1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/drm_panic_qr.rs
index dd55b1cb764d..82acecd505d3 100644
--- a/drivers/gpu/drm/drm_panic_qr.rs
+++ b/drivers/gpu/drm/drm_panic_qr.rs
@@ -381,6 +381,24 @@ struct DecFifo {
      len: usize,
  }

+/// On arm32 architecture, dividing an u64 by a constant will generate a call
+/// to __aeabi_uldivmod which is not present in the kernel.
+/// So use the multiply by inverse method for this architecture.
+#[cfg(target_arch = "arm")]
+fn div10(val: u64) -> u64
+{

Please run rustfmt on your patch.

sorry, I will fix that.

+    let val_h = val >> 32;
+    let val_l = val & 0xFFFFFFFF;
+    let b_h: u64 = 0x66666666;
+    let b_l: u64 = 0x66666667;
+
+    let tmp1 = val_h * b_l + ((val_l * b_l) >> 32);
+    let tmp2 = val_l * b_h + (tmp1 & 0xffffffff);
+    let tmp3 = val_h * b_h + (tmp1 >> 32) + (tmp2 >> 32);
+
+    tmp3 >> 2
+}
+
  impl DecFifo {
      fn push(&mut self, data: u64, len: usize) {
          let mut chunk = data;
@@ -389,7 +407,11 @@ fn push(&mut self, data: u64, len: usize) {
          }
          for i in 0..len {
              self.decimals[i] = (chunk % 10) as u8;
-            chunk /= 10;
+            if cfg!(target_arch = "arm") {
+                chunk = div10(chunk);
+            } else {
+                chunk /= 10;
+            }

I would get rid of this conditional and declare another div10 function
that just does input/10 on other arches.

ok, I will send a v2 shortly with that changed.
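
For reference, a minimal sketch of the shape suggested above: two `div10`
definitions selected with `#[cfg]`, so the caller has no conditional. This
is not the actual v2. The helper name `div10_mul_inverse` is hypothetical
and exists only so the arm32 arithmetic can be checked on any host. The
arithmetic is copied unchanged from the patch, and only values well below
the top of the u64 range are considered here.

```rust
/// Multiply-by-inverse division by 10, same arithmetic as in the patch.
/// Hypothetical helper name; always compiled so it can be tested on
/// non-arm hosts too.
#[allow(dead_code)]
fn div10_mul_inverse(val: u64) -> u64 {
    let val_h = val >> 32;
    let val_l = val & 0xFFFF_FFFF;
    let b_h: u64 = 0x6666_6666;
    let b_l: u64 = 0x6666_6667;

    // High 64 bits of the 128-bit product val * 0x6666666666666667,
    // built from 32x32 -> 64 bit partial products.
    let tmp1 = val_h * b_l + ((val_l * b_l) >> 32);
    let tmp2 = val_l * b_h + (tmp1 & 0xFFFF_FFFF);
    let tmp3 = val_h * b_h + (tmp1 >> 32) + (tmp2 >> 32);

    tmp3 >> 2
}

/// arm32: avoid the call to __aeabi_uldivmod.
#[cfg(target_arch = "arm")]
fn div10(val: u64) -> u64 {
    div10_mul_inverse(val)
}

/// Other architectures: plain division, which the compiler handles.
#[cfg(not(target_arch = "arm"))]
fn div10(val: u64) -> u64 {
    val / 10
}
```

With this, the loop body in `push` would simply be `chunk = div10(chunk);`
on every architecture.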

Alice

