Switch both rmb()/mb() barriers to the more lightweight smp_rmb()/smp_mb()
ones. When walking the perf ring buffer, they pair as follows, quoting
kernel/events/ring_buffer.c:

  Since the mmap() consumer (userspace) can run on a different CPU:

    kernel                               user

    if (LOAD ->data_tail) {              LOAD ->data_head
                          (A)            smp_rmb()       (C)
      STORE $data                        LOAD $data
      smp_wmb()           (B)            smp_mb()        (D)
      STORE ->data_head                  STORE ->data_tail
    }

  Where A pairs with D, and B pairs with C.

  In our case (A) is a control dependency that separates the load
  of the ->data_tail and the stores of $data. In case ->data_tail
  indicates there is no room in the buffer to store $data, we do
  not store it.

  D needs to be a full barrier since it separates the data READ from
  the tail WRITE.

  For B a WMB is sufficient since it separates two WRITEs, and for C
  an RMB is sufficient since it separates two READs.
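
Below is a minimal consumer-side sketch of how (C) and (D) look from
userspace with the lightweight barriers. handle_event() and the macro
definitions are illustrative assumptions for x86-64, not part of this
patch, and wrap handling for events straddling the buffer end is
omitted:

  #include <linux/perf_event.h>
  #include <stdint.h>

  #define READ_ONCE(x)  (*(volatile typeof(x) *)&(x))
  #define smp_rmb()     asm volatile("" ::: "memory")  /* compiler barrier on x86-64 */
  #define smp_mb()      asm volatile("lock; addl $0,-132(%%rsp)" ::: "memory", "cc")

  void handle_event(struct perf_event_header *ev);     /* user-supplied, hypothetical */

  /* data points at the ring buffer's data pages, mask == data size - 1 */
  static void consume(struct perf_event_mmap_page *pc, void *data, uint64_t mask)
  {
          uint64_t head = READ_ONCE(pc->data_head);    /* LOAD ->data_head */
          uint64_t tail = pc->data_tail;               /* we are the only tail writer */

          smp_rmb();                                   /* (C): head LOAD before $data LOADs */

          while (tail != head) {
                  struct perf_event_header *ev = data + (tail & mask);

                  handle_event(ev);                    /* LOAD $data */
                  tail += ev->size;
          }

          smp_mb();                                    /* (D): $data LOADs before tail STORE */
          pc->data_tail = tail;                        /* STORE ->data_tail */
  }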

Currently, on x86-64, perf uses LFENCE and MFENCE here, which is
overkill: we can get away with more lightweight barriers, in particular
given that this is a fast path.
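
For reference, roughly what the x86-64 mappings boil down to, as a
sketch in the spirit of tools/arch/x86/include/asm/barrier.h (the -132
offset keeps the dummy LOCK'ed add clear of userspace's 128-byte red
zone):

  /* before: full fences, regardless of what needs ordering */
  #define rmb()         asm volatile("lfence" ::: "memory")
  #define mb()          asm volatile("mfence" ::: "memory")

  /* after: SMP orderings only; on x86-64's TSO memory model a
   * compiler barrier suffices for READ/READ ordering, and a
   * LOCK'ed op is cheaper than MFENCE for the full barrier */
  #define smp_rmb()     asm volatile("" ::: "memory")
  #define smp_mb()      asm volatile("lock; addl $0,-132(%%rsp)" ::: "memory", "cc")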

According to Peter, rmb()/mb() were added back then via commit
a94d342b9cb0 ("tools/perf: Add required memory barriers") at a time
when the kernel still supported chips that needed them; support for
those has since been dropped completely, so we can fix the barriers
up as well.

Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: "Paul E. McKenney" <paul...@linux.vnet.ibm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/perf/util/mmap.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 05a6d47..de6dc2e 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -73,7 +73,8 @@ static inline u64 perf_mmap__read_head(struct perf_mmap *mm)
 {
        struct perf_event_mmap_page *pc = mm->base;
        u64 head = READ_ONCE(pc->data_head);
-       rmb();
+
+       smp_rmb();
        return head;
 }
 
@@ -84,7 +85,7 @@ static inline void perf_mmap__write_tail(struct perf_mmap *md, u64 tail)
        /*
         * ensure all reads are done before we write the tail out.
         */
-       mb();
+       smp_mb();
        pc->data_tail = tail;
 }
 
-- 
2.9.5
