@sil2100,

thanks for the trust. My TL;DR version for you is:

From one of the liburcu maintainers (https://github.com/compudj):

"""
Posted Nov 24, 2013 23:55 UTC (Sun) by compudj (subscriber, #43335) [Link]

Tracking threads running in multiple processes using a common shared memory is 
not possible with the currently implemented URCU flavors, but we look forward 
to adding a new URCU flavor to support this kind of use case.
"""

So that satisfies the corner case I had thought of.

With that, I'm +1 on the SRU.

----

continuing the longer version...

> I agree with @ddstreet, I don't think liburcu gives that sort of
> guarantee when it comes to cross-process synchronisation. It was my
> belief that liburcu targets synchronisation across a set of threads
> within the current process only.

It does not (as stated above), but I had to check, especially because I
had only been looking at it from the membarrier() syscall point of view
(not digging much into the liburcu implementation).

NOW with all that I got curious =)...

From the liburcu documentation:

"""
There are multiple flavors of liburcu available:

memb, qsbr, mb, signal, bp.

The API members start with the prefix "urcu_<flavor>_", where <flavor>
is the chosen flavor name.

Usage of liburcu-memb

#include <urcu/urcu-memb.h>

Link the application with -lurcu-memb

This is the preferred version of the library, in terms of grace-period
detection speed, read-side speed and flexibility. Dynamically detects
kernel support for sys_membarrier().

Falls back on urcu-mb scheme if support is not present, which has slower
read-side. Use the --disable-sys-membarrier-fallback configure option to
disable the fall back, thus requiring sys_membarrier() to be available.
This gives a small speedup when sys_membarrier() is supported by the
kernel, and aborts in the library constructor if not supported.

Usage of liburcu-qsbr

#include <urcu/urcu-qsbr.h>

Link with -lurcu-qsbr

The QSBR flavor of RCU needs to have each reader thread executing
rcu_quiescent_state() periodically to progress. rcu_thread_online() and
rcu_thread_offline() can be used to mark long periods for which the
threads are not active. It provides the fastest read-side at the expense
of more intrusiveness in the application code.

Usage of liburcu-mb

#include <urcu/urcu-mb.h>

Link with -lurcu-mb

This version of the urcu library uses memory barriers on the writer and
reader sides. This results in faster grace-period detection, but results
in slower reads.

Usage of liburcu-signal

#include <urcu/urcu-signal.h>

Link the application with -lurcu-signal

Version of the library that requires a signal, typically SIGUSR1. Can be
overridden with -DSIGRCU by modifying Makefile.build.inc.

Usage of liburcu-bp

#include <urcu/urcu-bp.h>

Link with -lurcu-bp

The BP library flavor stands for "bulletproof". It is specifically
designed to help tracing libraries hook onto applications without
requiring those applications to be modified.

urcu_bp_init(), and urcu_bp_unregister_thread() all become nops, whereas 
calling urcu_bp_register_thread() becomes optional. The state is dealt with by 
the library internally at the expense of read-side and write-side performance.
"""

> If the program links against liburcu 0.9 or lower, the sys_membarrier
> syscall did not exist yet, and liburcu will use the default compiler
> based membarrier, which is only good within the current process.
> Synchronisation across shared memory pages fails. This is the case on
> Xenial, Trusty and the like.

> If the program links against liburcu 0.11 or newer, the sys_membarrier
> syscall does exist, but MEMBARRIER_CMD_SHARED is only used if the
> current running kernel does not support
> MEMBARRIER_CMD_PRIVATE_EXPEDITED.

Yep. Shown here => https://tinyurl.com/y96692o8

> There is no toggle option in the API at all, so for users with a
> kernel 4.14 or higher, MEMBARRIER_CMD_PRIVATE_EXPEDITED will be used,
> and synchronisation across shared memory pages will fail. This is the
> case on Eoan, Focal, Groovy.

Understood and agreed. The SRU line of thinking is always "not
introducing regressions" so I was more interested in the "change of
behavior" (even if "it is all broken").

> If the program links against liburcu 0.10, and uses the -qsbr, -mb and
> -signal variants, sys_membarrier is not used at all, and it falls back
> to the compiler based membarrier, which is only good within the current
> process. Synchronisation across shared memory pages will fail.

Agreed per documentation.

> If the program links against liburcu 0.10, and is used within a
> container, with a kernel version less than 4.3 that does not support
> sys_membarrier, such as a Bionic container on a Trusty 3.13 host, or on
> a 3.10 RHEL host, the sys_membarrier syscall fails, and it falls back
> to the compiler based membarrier. Synchronisation across shared memory
> pages will fail.

Agreed, per "urcu_bp_sys_membarrier_status()".

> Now, the upstream developers added MEMBARRIER_CMD_PRIVATE_EXPEDITED as
> the default in liburcu 0.11. They did not change the API to accommodate
> both MEMBARRIER_CMD_SHARED and MEMBARRIER_CMD_PRIVATE_EXPEDITED, and
> instead, if the kernel is greater than 4.14,
> MEMBARRIER_CMD_PRIVATE_EXPEDITED will be used. Upstream are well aware
> of their consumers, and they would not break everyone's usages out of
> the blue, without adding some sort of API provision for legacy users.

I see your point, but that is usually not an assumption we can make;
hence the review.

> Thus, our initial assumption that liburcu can be used to synchronise
> access to shared memory pages for IPC between a sister process is
> wrong, since no one will create a program that potentially only works
> in one specific environment, which is bionic on bare metal and liburcu
> 0.10 only. I'm not even sure how you would co-ordinate liburcu over
> multiple processes either.

I was checking from sys_membarrier() POV only, so I agree with you.

> So, because of the above, I don't think any librcu consumers are depending on 
> a full membarrier, driven by the kernel, for shared pages among different 
> processes.
>
> I still think this is safe to SRU.

Just like the TL;DR version, but now backed by solid arguments: +1.

Thanks for all this information!

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to liburcu in Ubuntu.
https://bugs.launchpad.net/bugs/1876230

Title:
  liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address
  performance problems with MEMBARRIER_CMD_SHARED

Status in liburcu package in Ubuntu:
  Fix Released
Status in liburcu source package in Bionic:
  In Progress

Bug description:
  [Impact]

  In Linux 4.3, a new syscall was defined, called "membarrier". This
  system call was defined specifically for use in userspace-rcu
  (liburcu) to speed up the fast path / reader side of the library. The
  original implementation in Linux 4.3 only supported the
  MEMBARRIER_CMD_SHARED subcommand of the membarrier syscall.

  MEMBARRIER_CMD_SHARED executes a memory barrier on all threads of all
  processes running on the system. When it returns, the userspace
  thread which called it is guaranteed that all running threads share
  the same world view with regard to the userspace addresses which are
  consumed by readers and writers.

  The problem with MEMBARRIER_CMD_SHARED is that system calls made in
  this fashion can block, since it deploys a barrier across all threads
  in the system, and some of those threads may be waiting on blocking
  operations and take time to reach the barrier.

  In Linux 4.14, this was addressed by adding the
  MEMBARRIER_CMD_PRIVATE_EXPEDITED command to the membarrier syscall. It
  only targets threads which share the same mm as the thread calling the
  membarrier syscall, aka, threads in the current process, and not all
  threads / processes in the system.

  Calls to membarrier with the MEMBARRIER_CMD_PRIVATE_EXPEDITED command
  are guaranteed non-blocking, due to using inter-processor interrupts
  to implement memory barriers.

  Because of this, membarrier calls that use
  MEMBARRIER_CMD_PRIVATE_EXPEDITED are much faster than those that use
  MEMBARRIER_CMD_SHARED.

  Since Bionic uses a 4.15 kernel, all kernel requirements are met, and
  this SRU is to enable support for MEMBARRIER_CMD_PRIVATE_EXPEDITED in
  the liburcu package.

  This brings the performance of the liburcu library back in line with
  where it was in Trusty, as the affected user encountered performance
  problems upon upgrading from Trusty to Bionic.

  [Test]

  Testing performance is heavily dependent on the application which
  links against liburcu, and the workload which it executes.

  A test package is available in the following ppa:
  https://launchpad.net/~mruffell/+archive/ubuntu/sf276198-test

  For the sake of testing, we can use the benchmarks provided in the
  liburcu source code. Download a copy of the source code for liburcu
  either from the repos or from github:

  $ pull-lp-source liburcu bionic
  # OR
  $ git clone https://github.com/urcu/userspace-rcu.git
  $ git checkout v0.10.1 # version in bionic

  Build the code:

  $ ./bootstrap
  $ ./configure
  $ make

  Go into the tests/benchmark directory

  $ cd tests/benchmark

  From there, you can run benchmarks for the four main usages of
  liburcu: urcu, urcu-bp, urcu-signal and urcu-mb.

  On an 8-core machine, with 6 reader threads, 2 writer threads and a
  10 second runtime, execute:

  $ ./test_urcu 6 2 10
  $ ./test_urcu_bp 6 2 10
  $ ./test_urcu_signal 6 2 10
  $ ./test_urcu_mb 6 2 10

  Results:

  $ ./test_urcu 6 2 10
  0.10.1-1: 17612527667 reads, 268 writes, 17612527935 ops
  0.10.1-1ubuntu1: 14988437247 reads, 810069 writes, 14989247316 ops

  $ ./test_urcu_bp 6 2 10
  0.10.1-1: 1177891079 reads, 1699523 writes, 1179590602 ops
  0.10.1-1ubuntu1: 13230354737 reads, 575314 writes, 13230930051 ops

  $ ./test_urcu_signal 6 2 10
  0.10.1-1: 20128392417 reads, 6859 writes, 20128399276 ops
  0.10.1-1ubuntu1: 20501430707 reads, 6890 writes, 20501437597 ops

  $ ./test_urcu_mb 6 2 10
  0.10.1-1: 627996563 reads, 5409563 writes, 633406126 ops
  0.10.1-1ubuntu1: 653194752 reads, 4590020 writes, 657784772 ops

  The SRU only changes behaviour for urcu and urcu-bp, since they are
  the only "flavours" of liburcu which the patches change. From a pure
  ops standpoint:

  $ ./test_urcu 6 2 10
  17612527935 ops
  14989247316 ops

  $ ./test_urcu_bp 6 2 10
  1179590602 ops
  13230930051 ops

  We see that in this particular benchmark workload, test_urcu incurs
  some extra overhead with MEMBARRIER_CMD_PRIVATE_EXPEDITED, which is
  explained by the extra impact it has on the slow path, and by the
  larger number of writes performed during my benchmark.

  The real winner in this benchmark workload is test_urcu_bp, which sees
  a 10x performance increase with MEMBARRIER_CMD_PRIVATE_EXPEDITED. Some
  of this may be down to the 3x fewer writes it did during my benchmark.

  Again, these benchmarks are indicative only and quite noisy.
  Performance is really dependent on the application which links against
  liburcu and its workload.

  [Regression Potential]

  This SRU changes the behaviour of the following libraries which
  applications link against: -lurcu and -lurcu-bp. Behaviour is not
  changed in the rest: -lurcu-qsbr, -lurcu-signal and -lurcu-mb.

  On Bionic, liburcu will call the membarrier syscall in urcu and urcu-
  bp. This does not change. What is changing is the semantics of that
  syscall, from MEMBARRIER_CMD_SHARED to
  MEMBARRIER_CMD_PRIVATE_EXPEDITED. The barrier code itself runs
  entirely in kernel space; these commits simply change the parameters
  which liburcu supplies to the membarrier syscall.

  I have run the testsuite that comes with the Bionic source code, and
  "make regtest", "make short_bench" and "make long_bench" pass. You
  will want to run these on a cloud instance somewhere, since they take
  multiple hours.

  If a regression were to occur, applications linked against -lurcu and
  -lurcu-bp would be affected. The homepage: https://liburcu.org/ offers
  a list of the major applications that use liburcu: Knot DNS, Netsniff-
  ng, Sheepdog, GlusterFS, gdnsd and LTTng.

  [Scope]

  The two commits which are being SRU'd are:

  commit c0bb9f693f926595a7cb8b4ce712cef08d9f5d49
  Author: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
  Date: Thu Dec 21 13:42:23 2017 -0500
  Subject: liburcu: Use membarrier private expedited when available
  Link: 
https://github.com/urcu/userspace-rcu/commit/c0bb9f693f926595a7cb8b4ce712cef08d9f5d49

  commit 3745305bf09e7825e75ee5b5490347ee67c6efdd
  Author: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
  Date: Fri Dec 22 10:57:59 2017 -0500
  Subject: liburcu-bp: Use membarrier private expedited when available
  Link: 
https://github.com/urcu/userspace-rcu/commit/3745305bf09e7825e75ee5b5490347ee67c6efdd

  Both cherry pick directly onto 0.10.1 in Bionic, and are originally
  from 0.11.0, meaning that Eoan, Focal and Groovy already have the
  patch.

  [Other]

  If you are interested in how the membarrier syscall works, you can
  read their commits in the Linux kernel:

  commit 5b25b13ab08f616efd566347d809b4ece54570d1
  Author: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
  Date:   Fri Sep 11 13:07:39 2015 -0700
  Subject: sys_membarrier(): system-wide memory barrier (generic, x86)
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b25b13ab08f616efd566347d809b4ece54570d1

  commit 22e4ebb975822833b083533035233d128b30e98f
  Author: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
  Date:   Fri Jul 28 16:40:40 2017 -0400
  Subject: membarrier: Provide expedited private command
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22e4ebb975822833b083533035233d128b30e98f

  Additionally, blog posts from LTTng:
  
https://lttng.org/blog/2018/01/15/membarrier-system-call-performance-and-userspace-rcu/

  And Phoronix:
  
https://www.phoronix.com/scan.php?page=news_item&px=URCU-Membarrier-Performance

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/liburcu/+bug/1876230/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
