RISC-V V C Intrinsic API v1.0 release meeting reminder (July 10th, 2023)

2023-07-10 Thread Eop Chen via Gcc


Hi all,

A reminder that the next open meeting to discuss the RISC-V V C Intrinsic
API v1.0 will be held on 2023/07/10, 7 AM (PST) / 10 PM (GMT+8).

The agenda can be found on the second page of the meeting slides (link).
Please subscribe to the calendar to be notified of every meeting - Google calendar link, ICal link.

We also have a mailing list now, hosted by RISC-V International (link).

Regards,

eop Chen



Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

2023-07-10 Thread Jonas Oberhauser



On 7/7/2023 7:25 PM, Olivier Dion wrote:

On Fri, 07 Jul 2023, Jonas Oberhauser wrote:
[...]

This is a request for comments on extending the atomic builtins API to
help avoid redundant memory barriers.  Indeed, there are discrepancies
between the Linux kernel memory consistency model (LKMM) and the
C11/C++11 memory consistency model [0].  For example, fully ordered
atomic operations such as xchg and a successful cmpxchg have implicit
memory barriers before/after the operation in LKMM [1-2], while atomic
operations using the __ATOMIC_SEQ_CST memory order in C11/C++11 do not
have the ordering guarantees of an atomic thread fence with
__ATOMIC_SEQ_CST with respect to other non-SEQ_CST operations [3].
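
Concretely, the kind of mapping this forces looks like the following
sketch (the helper name is made up):

   /* LKMM-style fully ordered exchange expressed with C11 builtins.
      The SEQ_CST RMW alone does not order surrounding non-SEQ_CST
      accesses the way a fence would, hence the explicit fence, which
      is redundant on many targets. */
   static inline int lkmm_xchg_sketch(int *x, int y)
   {
       int r = __atomic_exchange_n(x, y, __ATOMIC_SEQ_CST);
       __atomic_thread_fence(__ATOMIC_SEQ_CST);
       return r;
   }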


The issues run quite a bit deeper than this. The two models take two
completely different perspectives that are largely incompatible.

Agreed.  Our intent is not to close the gap completely, but to reduce
it, by supporting the "full barrier before/after" semantics of LKMM in
the C11/C++11 memory model.



I think what you're trying to achieve has nothing to do with the gap at 
all. (But do check out the IMM paper https://plv.mpi-sws.org/imm/ for 
what is involved in bridging the gap between LKMM-like and C11-like models).


What you're trying to achieve is to certify some urcu algorithms, 
without making the code unnecessarily slower.
Your problem is that the algorithm is implemented using the LKMM API, 
and you want to check it with a tool (TSAN) meant to (dynamically) 
analyze C11 code that relies on a subset of C11's memory model.


What I still don't understand is whether using TSAN as-is is a formal 
requirement from the certification you are trying to achieve, or whether 
you could either slightly modify the TSAN toolchain to give answers 
consistent with the behavior on LKMM, or use a completely different tool.


For example, you could eliminate the worry about the unnecessary 
barriers by including the extra barriers only in the TSAN' (modified 
TSAN) analysis.
In that case TSAN' adds additional, redundant barriers in some cases 
during the analysis process, but those barriers would be gone the moment 
you stop using TSAN'.
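
For instance, the extra fence could be compiled in only under the
sanitizer (a sketch; the macro name is made up):

   #ifndef __has_feature
   #define __has_feature(x) 0   /* fallback for compilers without it */
   #endif

   /* Emit the redundant fence only in TSAN builds, so production
      builds keep the leaner LKMM-style mapping. */
   #if defined(__SANITIZE_THREAD__) || __has_feature(thread_sanitizer)
   #define EXTRA_TSAN_FENCE() __atomic_thread_fence(__ATOMIC_SEQ_CST)
   #else
   #define EXTRA_TSAN_FENCE() do { } while (0)
   #endif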


You would need to argue that this additional instrumentation doesn't 
hide any data races, but I suspect that wouldn't be too hard.


Another possibility is to use a tool like Dat3M that supports LKMM to
try and verify your code, but I'm not sure whether these tools are
feature-complete enough to verify the specific algorithms you have in
mind (e.g., mixed-size accesses are an issue, and Paul told me there are
a few of those in (U)RCU. But the cost of changing the code to
full-sized accesses might be lower than that of relying on extra
barriers.)



FWIW, your current solution of adding a whole class of fences to
essentially C11 and the toolchain, and modifying the code to use these
fences, isn't a path I would want to take.




I think all you can really do is bridge the gap at the level of the
generated assembly.  I.e., don't bridge the gap between LKMM and the
C11 MCM. Bridge the gap between the assembly code generated by C11
atomics and the one generated by LKMM. But I'm not sure that's really
the task here.

[...]
However, nothing prevents a toolchain from changing the emitted
assembler in the future, which would make things fragile.  The only
thing that is guaranteed to not change is the definitions in the
standard (C11/C++11).  Anything else is fair game for optimizations.


Not quite. If you rely on the LKMM API to generate the final code, you 
can definitely prove that the LKMM API implementation has a C11-like 
memory model abstraction.
For example, you might be able to prove that the LKMM implementation of 
a strong xchg guarantees at least the same ordering as a seq_cst fence ; 
seq_cst xchg ; seq_cst fence sequence in C11.
I don't think it's that fragile since 1) it's a manually written 
volatile assembler mapping, so there's not really a lot the toolchains 
can do about it and 2) the LKMM implementation of atomics rarely 
changes, and should still have similar guarantees after the change.
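
Spelled out with the C11 builtins, the reference sequence to compare
against would be something like this sketch:

   __atomic_thread_fence(__ATOMIC_SEQ_CST);
   r = __atomic_exchange_n(x, y, __ATOMIC_SEQ_CST);
   __atomic_thread_fence(__ATOMIC_SEQ_CST);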


The main issue will be, as we discussed before and below, that TSAN
will still trigger false positives.




[...] For example, to make Read-Modify-Write (RMW) operations match
the Linux kernel "full barrier before/after" semantics, liburcu's
uatomic API has to emit both a SEQ_CST RMW operation and a subsequent
SEQ_CST thread fence, which leads to duplicated barriers in some cases.

Does it have to, though? Can't you just do e.g. a release RMW
operation followed by an after_atomic fence? And for loads, a
SEQ_CST fence followed by an acquire load? Analogously (but mirrored)
for stores.

That would not improve anything for RMW.  Consider the following example
and its resulting assembler on x86-64 gcc 13.1 -O2:

int exchange(int *x, int y)
{
    int r = __atomic_exchange_n(x, y, __ATOMIC_RELEASE);
    __atomic_thread_fence(__ATOMIC_SEQ_CST);

    return r;
}
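
(For context: on x86-64 the exchange itself is typically compiled to an
implicitly locked xchg instruction, which is already a full barrier,
and the trailing SEQ_CST fence still emits a separate mfence, so
weakening the RMW's memory order does not remove the redundant
barrier.)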

_Nullable and _Nonnull in GCC's analyzer (was: [PATCH v5] libio: Add nonnull attribute for most FILE * arguments in stdio.h)

2023-07-10 Thread Alejandro Colomar via Gcc

[CC += Andrew]

Hi Xi, Andrew,

On 7/10/23 20:41, Xi Ruoyao wrote:

Maybe we should have a weaker version of nonnull which only performs the
diagnostic, not the optimization.  But it looks like they hate the idea:
https://gcc.gnu.org/PR110617.
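
(To make the trade-off concrete, a sketch of the problem with the
existing attribute; the function is hypothetical:)

   __attribute__((nonnull))
   int deref(int *p)
   {
       if (p == NULL)   /* the attribute lets the optimizer assume */
           return -1;   /* p != NULL, so this check may be deleted */
       return *p;
   }

A diagnostic-only variant would still warn on a literal deref(NULL)
call, but would not license removing the defensive check.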


This is the one thing that makes me use both Clang and GCC to compile:
while either of them would be enough to build, I want as much static
analysis as I can get, so I want -fanalyzer (which needs GCC), but I
also use _Nullable (which needs Clang).

If GCC had support for _Nullable, I would have in GCC the superset of
features that I need from both, in a single vendor.  Moreover, Clang's
static analyzer is brain-damaged (sorry, but it doesn't have a simple
command line to run it, unlike GCC's easy -fanalyzer), so having GCC's
analyzer understand those _Nullable qualifiers would be great.
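
For reference, a sketch of the kind of annotation involved; the shim
macros and the prototype are made up:

   /* Only Clang understands these qualifiers today. */
   #if defined(__clang__)
   #define NULLABLE _Nullable
   #define NONNULL  _Nonnull
   #else
   #define NULLABLE
   #define NONNULL
   #endif

   /* The analyzer can then flag dereferences of the return value
      that are not guarded by a NULL check. */
   char *NULLABLE my_strchr(const char *NONNULL s, int c);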

Clang's _Nullable (and _Nonnull) are not very useful outside of
analyzer mode, as there are many cases where the compiler doesn't have
enough information, while the analyzer can get rid of both false
negatives and false positives.  See: (link)

I'll back the ask for the qualifiers in GCC, for compatibility with
Clang.

Thanks,
Alex

--

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5





Re: _Nullable and _Nonnull in GCC's analyzer (was: [PATCH v5] libio: Add nonnull attribute for most FILE * arguments in stdio.h)

2023-07-10 Thread Alejandro Colomar via Gcc

On 7/10/23 22:14, Alejandro Colomar wrote:

[...]

I'll back the ask for the qualifiers in GCC, for compatibility with
Clang.


BTW, Bionic libc is adding those qualifiers (links).

Thanks,
Alex



--

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5





Re: semantics of uninitialized values in GIMPLE

2023-07-10 Thread Krister Walfridsson via Gcc

On Fri, 7 Jul 2023, Richard Biener wrote:


I have implemented support for uninitialized memory in my translation
validator. But I am not sure how well this corresponds to the GIMPLE
semantics, so I have some questions...

My implementation tracks uninitialized bits. Use of uninitialized bits is
in general treated as UB (for example, `x + y` is UB if `x` or `y` has any
uninitialized bits), but there are a few exceptions:

  * load/store: It is always OK to load/store values having uninitialized
bits.
  * Phi nodes: Phi nodes propagate uninitialized bits.


definitely, I think plain copies would be OK as well


With "plain copies", do you mean treating it as it is always defined? That
would prevent optimizations such as transforming

  int t;
  if (1)
t = *p;
  else
t = 0;
  return t + 1;

to the equivalent of

  return *p + 1;

because the latter is UB if *p is undefined, but the original is OK if phi
nodes always give us a fully defined value.



  * selection: Instructions that are selecting an element (COND_EXPR,
VEC_PERM_EXPR, etc.) may select between values having uninitialized
bits, and the resulting value may have uninitialized bits. But the
condition/mask must not have uninitialized bits.
  * Extraction: Instructions that are extracting bits (BIT_FIELD_REF etc.)
may have uninitialized bits in both the input/output.
  * Insertion: Instructions that are constructing/inserting values
(COMPLEX_EXPR, etc.) may have uninitialized bits in both the
input/output.


Generally I think "moving" uninitialized bits in full or in part is OK.
Operating on them, including comparing them (with themselves),
is eventually asking for trouble.


Makes sense.

But I think we must restrict the definition of "move". For example, one
can see x * 2 as moving the uninitialized bits one step. And x + x and
x << 1 are the same as x * 2, so would they too be defined? I would
prefer it if all of them were UB when x contains uninitialized bits...



All other uses of values having uninitialized bits are considered UB.

Does this behavior make sense?

The above seems to work fine so far, with one exception that can be seen
in gcc.c-torture/execute/pr108498-1.c. The test has an uninitialized bit
field

   unsigned char c6:1, c7:1, c8:1, c9:3, c10:1;

which is written to as

   x.c6 = 0;
   x.c7 = 0;
   x.c8 = 0;
   x.c9 = 7;

The store merging pass changes this to

   _71 = MEM  [(struct C *)&x + 8B];
   _42 = _71 & 128;
   _45 = _42 | 56;

and the translation validator is now complaining that the pass has
introduced UB that was not in the original IR (because the most
significant bit in _71 is uninitialized when passed to BIT_AND_EXPR).

I could solve this by allowing uninitialized bits in BIT_AND_EXPR and
BIT_OR_EXPR, and propagating each bit according to

   * `0 & uninit` is an initialized `0`
   * `1 & uninit` is uninitialized
   * `0 | uninit` is uninitialized
   * `1 | uninit` is an initialized `1`

Is that the correct GIMPLE semantics?
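
These per-bit rules can be implemented as a classic "known bits"
propagation; a minimal sketch (the representation and names are
illustrative, not taken from the actual validator):

   #include <stdint.h>

   /* A tracked value: the bits themselves plus a mask saying which
      bits are initialized (1 = initialized, 0 = uninitialized). */
   struct bits {
       uint64_t val;
       uint64_t init;
   };

   /* `0 & uninit` is a defined 0; `1 & uninit` stays uninitialized. */
   static struct bits track_and(struct bits a, struct bits b)
   {
       struct bits r;
       r.val  = a.val & b.val;
       r.init = (a.init & b.init)      /* both bits defined  */
              | (a.init & ~a.val)      /* a is a defined 0   */
              | (b.init & ~b.val);     /* b is a defined 0   */
       return r;
   }

   /* `1 | uninit` is a defined 1; `0 | uninit` stays uninitialized. */
   static struct bits track_or(struct bits a, struct bits b)
   {
       struct bits r;
       r.val  = a.val | b.val;
       r.init = (a.init & b.init)      /* both bits defined  */
              | (a.init & a.val)       /* a is a defined 1   */
              | (b.init & b.val);      /* b is a defined 1   */
       return r;
   }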


I think the above "moves" the uninitialized MSB from memory to _45 so
that's OK.

Some "operations" like & 0 or & 1 give either defined values or
take the uninitialized bit unmodified (thus "move").  I think both
kinds are OK.  Likewise + 0 or * 0/1 would be OK.  What's not
OK is operations that use a (fully?) uninitialized value twice,
like x - x when x is uninitialized.


+ 0 and * 0/1 make sense to me. One annoyance is that it will make my
tool slower as it must track undefined bits in more cases. But that is
not a valid reason for restricting the IR semantics...



I think we want that: as soon as the uninitialized value becomes
"part" of a then partly initialized value, its value is "fixed".
With things like _Complex or vector the situation is a bit
of a grey area since we have ways to operate on elements.

Note that when we, for example, use a BIT_FIELD_REF to extract
the MSB from _42 above, the value would again be fully undefined.


Do you mean that all operations on the values having "fixed" bits are
valid? I do not think that will work. The frozen value may be any value,
which mean that the program is nondeterministic. For example, if x has one
"fixed" uninitialized bit with the other bits 0, then code such as
  if (x)
could "randomly" be either true or false, so the program will not really
have any defined semantics.

And if use of an extracted "fixed" uninitialized bit is UB, then the
following may be UB (if x or y has a "fixed" uninitialized least
significant bit)
  bool b = (x & 1) != (y & 1)
while the equivalent
  bool b = (x ^ y) & 1
is defined.

And things may be even more fun when arithmetic "smears" the fixed
uninitialized values over the variable -- it will then be impossible to
determine whether the extracted bits are OK or not.




I hope this makes some sense; it's a tricky area.

Richard.