Re: 33 unknowns left

2015-08-31 Thread Krister Walfridsson

On Wed, 26 Aug 2015, Joseph Myers wrote:


kristerw = kristerw 


Krister Walfridsson 


Yes, this is my current address (the "cato@" address mentioned in some
other mail is obsolete).

[I have been away from GCC development for a long time, but I plan to
start contributing in a few weeks -- as soon as I have managed to catch
up on what has happened the last couple of years...]

   /Krister


Re: Krister Walfridsson appointed NetBSD OS port maintainer

2007-08-30 Thread Krister Walfridsson


On Thu, 30 Aug 2007, David Edelsohn wrote:


I am pleased to announce that the GCC Steering Committee has
appointed Krister Walfridsson as NetBSD OS port maintainer.

Please join me in congratulating Krister on his new role.
Krister, please update your listing in the MAINTAINERS file.


Thanks!

Updated as follows.

   /Krister

2007-08-30  Krister Walfridsson  <[EMAIL PROTECTED]>

* MAINTAINERS (OS Port Maintainers): Add myself as NetBSD maintainer.
(Write After Approval): Remove myself.


Index: MAINTAINERS
===================================================================
--- MAINTAINERS (revision 127935)
+++ MAINTAINERS (working copy)
@@ -105,6 +105,7 @@ hpux               Dave Anglin          [EMAIL PROTECTED]
 hpux               Steve Ellcey         [EMAIL PROTECTED]
 irix, osf          Rainer Orth          [EMAIL PROTECTED]
 netbsd             Jason Thorpe         [EMAIL PROTECTED]
+netbsd             Krister Walfridsson  [EMAIL PROTECTED]
 sco5, unixware, sco udk  Kean Johnston   [EMAIL PROTECTED]
 sh-linux-gnu       Kaz Kojima           [EMAIL PROTECTED]
 RTEMS Ports        Joel Sherrill        [EMAIL PROTECTED]
@@ -413,7 +414,6 @@ Caroline Tice      [EMAIL PROTECTED]
 Michael Tiemann    [EMAIL PROTECTED]
 David Ung          [EMAIL PROTECTED]
 Jonathan Wakely    [EMAIL PROTECTED]
-Krister Walfridsson [EMAIL PROTECTED]
 Feng Wang          [EMAIL PROTECTED]
 Stephen M. Webb    [EMAIL PROTECTED]
 John Wehle         [EMAIL PROTECTED]


Re: gnat1 huge time

2007-11-29 Thread Krister Walfridsson



On Wed, 28 Nov 2007, Joel Sherrill wrote:


I am trying to get the SVN head built locally again
and back at work on the GNAT/RTEMS work I was doing.
Unfortunately, I have tripped across something that
is quite bad. Compiling on Linux x86 targeting the
PowerPC or SPARC leads to a huge compilation time
on a single file.


FWIW.  I see the same behavior for i386-unknown-netbsdelf3.1 too (when 
compiling natively, i.e. not using cross compiler).


The problem was introduced by revision 125624 (the dataflow merge).

   /Krister


Re: gnat1 huge time

2007-11-30 Thread Krister Walfridsson


On Fri, 30 Nov 2007, Joel Sherrill wrote:


Krister Walfridsson wrote:

On Wed, 28 Nov 2007, Joel Sherrill wrote:


I am trying to get the SVN head built locally again
and back at work on the GNAT/RTEMS work I was doing.
Unfortunately, I have tripped across something that
is quite bad. Compiling on Linux x86 targeting the
PowerPC or SPARC leads to a huge compilation time
on a single file.


FWIW.  I see the same behavior for i386-unknown-netbsdelf3.1 too (when 
compiling natively, i.e. not using cross compiler).



Is there a PR for this or do I need to file one?


I have not filed any PR for this, so please go ahead and file a PR.

   /Krister


GIMPLE undefined behavior

2022-08-31 Thread Krister Walfridsson via Gcc
I'm implementing a tool for translation validation (similar to Alive2 for 
LLVM). The tool uses an SMT solver to verify for each GIMPLE pass that the 
output IR is a refinement of the input IR:

 * That each compiled function returns an identical result before/after
   the pass (for input that does not invoke UB)
 * That each function does not have additional UB after the pass
 * That values in global memory are identical after the two versions of
   the function are run
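
As a rough illustration of what such a refinement query can look like,
here is a minimal sketch using the z3 Python API (not the actual tool).
It checks the first criterion for a toy pair of functions, where src
computes x + x and tgt computes x * 2:

  # Minimal sketch (z3 Python API, not the actual tool): check that tgt
  # returns the same value as src for every input where src has no UB.
  from z3 import BitVec, SignExt, Not, Solver, unsat

  x = BitVec('x', 32)

  # src:  _1 = x + x; return _1;   (signed overflow in the addition is UB)
  src_result = x + x
  src_ub = SignExt(32, x) + SignExt(32, x) != SignExt(32, src_result)

  # tgt:  _1 = x * 2; return _1;
  tgt_result = x * 2

  s = Solver()
  s.add(Not(src_ub), src_result != tgt_result)
  print('refinement holds' if s.check() == unsat else 'counterexample')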

I have reported three bugs (106523, 106613, 106744) where the tool has 
found differences in the result, but it is a bit unclear to me what is 
considered undefined behavior in GIMPLE, so I have not reported any such 
cases yet...


For example, ifcombine optimizes

int foo (int x, int a, int b)
{
  int c;
  int _1;
  int _2;
  int _3;
  int _4;

  :
  c_6 = 1 << a_5(D);
  _1 = c_6 & x_7(D);
  if (_1 != 0)
goto ;
  else
goto ;

  :
  _2 = x_7(D) >> b_8(D);
  _3 = _2 & 1;
  if (_3 != 0)
goto ;
  else
goto ;

  :

  :
  # _4 = PHI <2(4), 0(2), 0(3)>
  return _4;
}

to

int foo (int x, int a, int b)
{
  int c;
  int _4;
  int _10;
  int _11;
  int _12;
  int _13;

  :
  _10 = 1 << b_8(D);
  _11 = 1 << a_5(D);
  _12 = _10 | _11;
  _13 = x_7(D) & _12;
  if (_12 == _13)
goto ;
  else
goto ;

  :

  :
  # _4 = PHI <2(3), 0(2)>
  return _4;
}

Both return the same value for foo(8, 1, 34), but the optimized version 
shifts more than 31 bits when calculating _10. tree.def says for 
LSHIFT_EXPR that "the result is undefined" if the number of bits to shift 
by is larger than the type size, which I interpret as meaning that it 
should just be considered to return an arbitrary value. I.e., this does not invoke 
undefined behavior, so the optimization is allowed. Is my understanding 
correct?
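
For what it's worth, that reading does make this particular transformation
safe for the given input: modeling the oversized shift as an arbitrary
value, a quick z3 sketch (not the tool itself) shows that the optimized
foo(8, 1, 34) always returns 0, which is what the original returns:

  # Sketch: treat 1 << 34 as an arbitrary 32-bit value and check that the
  # optimized foo(8, 1, 34) still returns 0 for every such value.
  from z3 import BitVec, BitVecVal, If, Solver, unsat

  undef = BitVec('undef_shift', 32)        # stands in for 1 << b with b == 34
  x = BitVecVal(8, 32)

  _10 = undef
  _11 = BitVecVal(1, 32) << 1              # 1 << a_5(D)
  _12 = _10 | _11
  _13 = x & _12
  result = If(_12 == _13, BitVecVal(2, 32), BitVecVal(0, 32))

  s = Solver()
  s.add(result != 0)
  print(s.check())                         # unsat: always 0, as in the original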


What about signed integer wrapping for PLUS_EXPR? This happens for

int f (int c, int s)
{
  int y2;
  int y1;
  int x2;
  int x1;
  int _7;

  :
  y1_2 = c_1(D) + 2;
  x1_4 = y1_2 * s_3(D);
  y2_5 = c_1(D) + 4;
  x2_6 = s_3(D) * y2_5;
  _7 = x1_4 + x2_6;
  return _7;
}

where slsr optimizes this to

int f (int c, int s)
{
  int y1;
  int x2;
  int x1;
  int _7;
  int slsr_9;

  :
  y1_2 = c_1(D) + 2;
  x1_4 = y1_2 * s_3(D);
  slsr_9 = s_3(D) * 2;
  x2_6 = x1_4 + slsr_9;
  _7 = x1_4 + x2_6;
  return _7;
}

Calling f(-3, 0x75181005) makes slsr_9 overflow in the optimized code, 
even though the original did not overflow. My understanding is that signed 
overflow invokes undefined behavior in GIMPLE, so this is a bug in 
slsr. Is my understanding correct?
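
To make the numbers concrete, here is a small replay with plain Python
integers (illustrative only, assuming a 32-bit int):

  INT_MIN, INT_MAX = -2**31, 2**31 - 1
  fits = lambda v: INT_MIN <= v <= INT_MAX
  c, s = -3, 0x75181005

  # original IR: every intermediate value stays in range
  y1 = c + 2        # -1
  x1 = y1 * s       # -s, fits
  y2 = c + 4        #  1
  x2 = s * y2       #  s, fits
  r  = x1 + x2      #  0
  assert all(map(fits, (y1, x1, y2, x2, r)))

  # after slsr: the strength-reduced multiply does not
  slsr_9 = s * 2    # 2*s > INT_MAX, signed overflow introduced
  assert not fits(slsr_9)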


I would appreciate some comments on which non-memory-related operations I 
should treat as invoking undefined behavior (memory operations are more 
complicated, and I'll be back with more concrete questions later...).


   /Krister


Re: GIMPLE undefined behavior

2022-09-01 Thread Krister Walfridsson via Gcc

On Thu, 1 Sep 2022, Richard Biener wrote:


It's generally poorly documented what is considered 'undefined behavior'.
We desperately need a section in the internals manual for this.
For the {L,R}SHIFT_EXPR case we assume the shift operand is
in range of [0, precision - 1], so in theory value-range propagation could
infer that b_8(D) < 32 after it "executed".  But it seems that
range-on-exit doesn't do that yet.

[...]

The problem with shifts is that there's not a "do it anyway, but without
undefined behavior" operation to substitute.


I read this as meaning that I should not report these as bugs for now. But I'll 
probably keep this as UB in my tool to get an idea of how often this 
happens...




Calling f(-3, 0x75181005) makes slsr_9 overflow in the optimized code,
even though the original did not overflow. My understanding is that signed
overflow invokes undefined behavior in GIMPLE, so this is a bug in
slsr. Is my understanding correct?


Yes, the above would be a bug - again value-range propagation might be
leveraged to produce a wrong-code testcase.


OK. I'll open bugs for the signed overflow issues the tool finds.



I would appreciate some comments on which non-memory-related operations I
should treat as invoking undefined behavior (memory operations are more
complicated, and I'll be back with more concrete questions later...).


The more "interesting" cases are uninitialized values (registers or memory).


Yes, this is the next thing I was planning to implement. :)



In general what we should worry about most is introducing undefined
behavior that, when a later pass can assume it doesn't happen, causes
wrong code to be generated.  Likewise when we have late instrumentation
that would flag such undefined behavior as a user error.


Agreed. But that comes back to the issue of lacking documentation... :(

   /Krister


Translation validation

2022-09-13 Thread Krister Walfridsson via Gcc
I have implemented a tool for translation validation (similar to Alive2 
for LLVM). The tool takes GIMPLE IR for two functions and checks that the 
second is a refinement of the first. That is,

 * The returned value is the same for both functions for all input that
   does not invoke undefined behavior.
 * The second does not have any undefined behavior that the first does not
   have.
 * The global memory is the same after invoking both functions with input
   that does not invoke undefined behavior.

The implementation is rather limited, but I have already found a few bugs 
in GCC (PR 106513, 106523, 106744, 106883, and 106884) by running the tool 
on the c-c++-common, gcc.c-torture, gcc.dg, and gcc.misc-tests test 
suites.


The tool is available at https://github.com/kristerw/pysmtgcc and there 
is some additional information in the blog post 
https://kristerw.github.io/2022/09/13/translation-validation/


   /Krister


Re: Translation validation

2022-09-20 Thread Krister Walfridsson via Gcc

On Wed, 14 Sep 2022, Richard Biener wrote:


Note that the folding/peephole optimizations done early can be avoided
when you separate opportunities in the source into multiple statements,
like change

 int a, b;
 a = b + 1 - b;

to

a = b + 1;
a = a - b;

note that parts of the folding optimizations are shared between GENERIC
and GIMPLE (those generated from match.pd), so those will be exercised.
Fully exercising and verifying the fold-const.cc only optimizations which
happen on GENERIC only is indeed going to be difficult.

Another way to avoid some of the pre-SSA optimizations is to feed the
plugin with input from the GIMPLE frontend.  Look at
gcc/testsuite/gcc.dg/gimplefe-*.c for examples, IL can be dumped
roughly in a format suitable as GIMPLE frontend input by dumping
with -fdump-tree--gimple (but note all type, global variable and function
declarations are missing ...)


This does not help when trying to find wrong-code bugs by compiling random 
source code with plugin1.py.  :(


But it helps when writing tests for plugin2.py, so I wrote a simple script 
that extracts the first operand from match.pd simplify expressions and 
uses that to generate test cases. For example,


  (simplify
   (mult (abs@1 @0) @1)
   (mult @0 @0))

is generated as

  int __GIMPLE () f (int a0)
  {
int _0;
int a1;
int _2;
int _3;

_2 = a0;
a1 = __ABS _2;
_3 = a1;
_0 = a1 * _3;
return _0;
  }
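
For reference, a rough sketch of a script along these lines (not the
actual one; it starts from an already-extracted pattern string, only
handles a few int operators and @-captures, and all helper names are
mine) could look like:

  # Sketch: turn the first operand of a match.pd simplify expression into a
  # GIMPLE-frontend test function.  Unsupported operators raise an error.
  BINOP = {'mult': '*', 'plus': '+', 'minus': '-', 'bit_and': '&',
           'bit_ior': '|', 'bit_xor': '^'}
  UNOP = {'abs': '__ABS', 'negate': '-', 'bit_not': '~'}

  def parse(tokens):
      """Parse one s-expression from a token list (destructive)."""
      tok = tokens.pop(0)
      if tok != '(':
          return tok
      node = []
      while tokens[0] != ')':
          node.append(parse(tokens))
      tokens.pop(0)
      return node

  def gen_test(pattern):
      tokens = pattern.replace('(', ' ( ').replace(')', ' ) ').split()
      tree = parse(tokens)
      stmts, decls, captures, counter = [], [], {}, [0]

      def emit(node):
          if isinstance(node, str):             # a plain capture such as "@0"
              return captures.setdefault(node, 'a' + node[1:])
          op, _, cap = node[0].partition('@')   # e.g. "abs@1"
          args = [emit(a) for a in node[1:]]
          counter[0] += 1
          name = '_%d' % counter[0]
          decls.append('  int %s;' % name)
          if op in BINOP and len(args) == 2:
              stmts.append('  %s = %s %s %s;' % (name, args[0], BINOP[op], args[1]))
          elif op in UNOP and len(args) == 1:
              stmts.append('  %s = %s %s;' % (name, UNOP[op], args[0]))
          else:
              raise NotImplementedError(op)
          if cap:
              captures['@' + cap] = name        # later uses of @N reuse this value
          return name

      res = emit(tree)
      params = sorted(v for v in captures.values() if v.startswith('a'))
      return 'int __GIMPLE () f (%s)\n{\n%s\n%s\n  return %s;\n}' % (
          ', '.join('int ' + p for p in params),
          '\n'.join(decls), '\n'.join(stmts), res)

  print(gen_test('(mult (abs@1 @0) @1)'))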

This is not an optimal way to find bugs as many of the simplify 
expressions have additional restrictions in the replacement part, so many 
of my generated functions fail to trigger optimizations...


This did not find any wrong-code bugs, but it found two cases where 
match.pd is missing TYPE_OVERFLOW_SANITIZED checks: PR 106990.


   /Krister


CLOBBER(eol)

2023-06-27 Thread Krister Walfridsson via Gcc
I'm working on an updated version of my translation validator [*], and I 
have some problems with CLOBBER(eol).


I currently treat CLOBBER(eol) as making the memory invalid (i.e. all 
subsequent accesses are undefined behavior), but the tool reports 
miscompilation for a few cases where the tree-nrv pass makes the IR 
clobber <retval> by changing code such as [**]


  union bu o;
  ...
  o = i;
  MEM[(union  *)&o].b18 = _11;
  MEM[(union  *)&o].b20 = _11;
  <retval> = o;
  o ={v} {CLOBBER(eol)};
  return <retval>;

to use <retval> instead of o

  union bu o [value-expr: <retval>];
  ...
  <retval> = i;
  MEM[(union  *)&<retval>].b18 = _11;
  MEM[(union  *)&<retval>].b20 = _11;
  <retval> ={v} {CLOBBER(eol)};
  return <retval>;

As a result, the tool thinks the returned value is unavailable.

Is my understanding of CLOBBER(eol) correct? (I.e., is this a bug in 
tree-nrv?) Or is <retval> special, so that I should just ignore 
CLOBBER(eol) for this case?


   /Krister


[*] I'm planning to release the updated version in a few weeks. Meanwhile, 
you can find the old version at https://github.com/kristerw/pysmtgcc


[**] This example originates from gcc.c-torture/execute/921204-1.c 
compiled for x86 using the flags "-O -m32".


Re: CLOBBER(eol)

2023-06-27 Thread Krister Walfridsson via Gcc

On Tue, 27 Jun 2023, Richard Biener wrote:


I think this is a bug in NRV, yes. <retval> is special, but the above would
allow DSE of the three stores.

Can you open a bugreport?


Done! https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110434


types in GIMPLE IR

2023-06-28 Thread Krister Walfridsson via Gcc
I have some random questions concerning types in the GIMPLE IR that I 
would appreciate some clarification on.



Type safety
---
Some transformations treat 1-bit types as a synonym of _Bool and mix the 
types in expressions, such as:


   _2;
  _Bool _3;
  _Bool _4;
  ...
  _4 = _2 ^ _3;

and similarly mixing _Bool and enum

  enum E:bool { E0, E1 };

in one operation.

I had expected this to be invalid... What are the type safety rules in the 
GIMPLE IR?



Somewhat related, gcc.c-torture/compile/pr96796.c performs a 
VIEW_CONVERT_EXPR from


  struct S1 {
long f3;
char f4;
  } g_3_4;

to an int

  p_51_9 = VIEW_CONVERT_EXPR<int>(g_3_4);

That must be wrong?


Semantics of <signed-boolean:32>
---
"Wide" Booleans, such as <signed-boolean:32>, seem to allow more values 
than 0 and 1. For example, I've seen some IR operations like:

  _66 = _16 ? _Literal (<signed-boolean:32>) -1 : 0;

But I guess there must be some semantic difference between 
<signed-boolean:32> and a 32-bit int, otherwise the wide Boolean type 
would not be needed... So what are the differences?


   /Krister


Re: types in GIMPLE IR

2023-06-29 Thread Krister Walfridsson via Gcc

On Thu, 29 Jun 2023, Richard Biener wrote:


The thing with signed bools is that the two relevant values are -1 (true)
and 0 (false), those are used for vector bool components where we also
need them to be of wider type (32bits in this case).


My main confusion comes from seeing IR doing arithmetic such as

   _127;
   _169;
  ...
  _169 = _127 + -1;

or

   _127;
   _169;
  ...
  _169 = -_127;

and it was unclear to me what kind of arithmetic is allowed.

I have now verified that all cases seem to be just one operation of this 
form (where _127 has the value 0 or 1), so it cannot construct values 
such as 42. But the wide signed Boolean can have the three different 
values 1, 0, and -1, which I still think is at least one too many. :)


I'll update my tool to complain if the value is outside the range [-1, 1].

  /Krister


Re: types in GIMPLE IR

2023-06-29 Thread Krister Walfridsson via Gcc

On Thu, 29 Jun 2023, Richard Biener wrote:


IIRC we have some simplification rules that turn bit operations into
arithmetics.  Arithmetic is allowed if it keeps the values inside
[-1,0] for signed bools or [0, 1] for unsigned bools.


I have now verified that all cases seems to be just one operation of this
form (where _127 has the value 0 or 1), so it cannot construct values
such as 42. But the wide signed Boolean can have the three different
values 1, 0, and -1, which I still think is at least one too many. :)


Yeah, I'd be interested in a testcase that shows this behavior.


I created PR 110487 with one example.



I'll update my tool to complain if the value is outside the range [-1, 1].


It is likely that all issues I have seen so far are due to PR 110487, so 
I'll keep the current behavior that complains if the value is outside the 
range [-1, 0].


   /Krister


semantics of uninitialized values in GIMPLE

2023-07-06 Thread Krister Walfridsson via Gcc
I have implemented support for uninitialized memory in my translation 
validator. But I am not sure how well this corresponds to the GIMPLE 
semantics, so I have some questions...


My implementation tracks uninitialized bits. Use of uninitialized bits is 
in general treated as UB (for example, `x + y` is UB if `x` or `y` has any 
uninitialized bits), but there are a few exceptions:


 * load/store: It is always OK to load/store values having uninitialized
   bits.
 * Phi nodes: Phi nodes propagate uninitialized bits.
 * selection: Instructions that are selecting an element (COND_EXPR,
   VEC_PERM_EXPR, etc.) may select between values having uninitialized
   bits, and the resulting value may have uninitialized bits. But the
   condition/mask must not have uninitialized bits.
 * Extraction: Instructions that are extracting bits (BIT_FIELD_REF etc.)
   may have uninitialized bits in both the input/output.
 * Insertion: Instructions that are constructing/inserting values
   (COMPLEX_EXPR, etc.) may have uninitialized bits in both the
   input/output.

All other uses of values having uninitialized bits are considered UB.

Does this behavior make sense?

The above seems to work fine so far, with one exception that can be seen 
in gcc.c-torture/execute/pr108498-1.c. The test has an uninitialized bit 
field


  unsigned char c6:1, c7:1, c8:1, c9:3, c10:1;

which is written to as

  x.c6 = 0;
  x.c7 = 0;
  x.c8 = 0;
  x.c9 = 7;

The store merging pass changes this to

  _71 = MEM  [(struct C *)&x + 8B];
  _42 = _71 & 128;
  _45 = _42 | 56;

and the translation validator is now complaining that the pass has 
introduced UB that was not in the original IR (because the most 
significant bit in _71 is uninitialized when passed to BIT_AND_EXPR).


I could solve this by allowing uninitialized bits in BIT_AND_EXPR and 
BIT_OR_EXPR, and propagating each bit according to


  * `0 & uninit` is an initialized `0`
  * `1 & uninit` is uninitialized
  * `0 | uninit` is uninitialized
  * `1 | uninit` is an initialized `1`

Is that the correct GIMPLE semantics?
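
A minimal sketch of that bit-level propagation, with each value tracked
as a (bits, uninit-mask) pair (this models my tool's view of the rules
above, not necessarily GCC's definition):

  # Sketch: track a value as (bits, uninit), where a set bit in `uninit`
  # means that bit position is uninitialized.
  MASK = 0xff                       # 8-bit example, like the bit-field byte

  def bit_and(a, b):
      (av, au), (bv, bu) = a, b
      # a result bit is uninitialized unless one side is an initialized 0
      zero_a = ~av & ~au
      zero_b = ~bv & ~bu
      uninit = (au | bu) & ~zero_a & ~zero_b & MASK
      return (av & bv) & ~uninit & MASK, uninit

  def bit_or(a, b):
      (av, au), (bv, bu) = a, b
      # a result bit is uninitialized unless one side is an initialized 1
      one_a = av & ~au
      one_b = bv & ~bu
      uninit = (au | bu) & ~one_a & ~one_b & MASK
      return (av | bv) & ~uninit & MASK, uninit

  # the store-merging example: only the MSB of the loaded byte is shown as
  # uninitialized here, for brevity
  _71 = (0x00, 0x80)
  _42 = bit_and(_71, (128, 0))      # 1 & uninit: bit 7 stays uninitialized
  _45 = bit_or(_42, (56, 0))        # 0 | uninit: bit 7 stays uninitialized
  print(hex(_42[1]), hex(_45[1]))   # 0x80 0x80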

   /Krister


Re: semantics of uninitialized values in GIMPLE

2023-07-10 Thread Krister Walfridsson via Gcc

On Fri, 7 Jul 2023, Richard Biener wrote:


I have implemented support for uninitialized memory in my translation
validator. But I am not sure how well this corresponds to the GIMPLE
semantics, so I have some questions...

My implementation tracks uninitialized bits. Use of uninitialized bits is
in general treated as UB (for example, `x + y` is UB if `x` or `y` has any
uninitialized bits), but there are a few exceptions:

  * load/store: It is always OK to load/store values having uninitialized
bits.
  * Phi nodes: Phi nodes propagates uninitialized bits.


definitely, I think plain copies would be OK as well


With "plain copies", do you mean treating it as it is always defined? That
would prevent optimizations such as transforming

  int t;
  if (1)
t = *p;
  else
t = 0;
  return t + 1;

to the equivalent of

  return *p + 1;

because the latter is UB if *p is undefined, but the original is OK if phi
nodes always give us a fully defined value.



  * selection: Instructions that are selecting an element (COND_EXPR,
VEC_PERM_EXPR, etc.) may select between values having uninitialized
bits, and the resulting value may have uninitialized bits. But the
condition/mask must not have uninitialized bits.
  * Extraction: Instructions that are extracting bits (BIT_FIELD_REF etc.)
may have uninitialized bits in both the input/output.
  * Insertion: Instructions that are constructing/inserting values
(COMPLEX_EXPR, etc.) may have uninitialized bits in both the
input/output.


Generally I think "moving" uninitialized bits in full or in part is OK.
Operating on them, including comparing them (with themselves)
is eventually asking for troubles.


Makes sense.

But I think we must restrict the definition of "move". For example, one
can see x * 2 as moving the uninitialized bits one step. And x + x and
x << 1 are the same as x * 2, so then they too would be defined? I would
prefer if all of them were UB if x contains uninitialized bits...



All other use of values having uninitialized bits are considered UB.

Does this behavior make sense?

The above seems to work fine so far, with one exception that can be seen
in gcc.c-torture/execute/pr108498-1.c. The test has an uninitialized bit
field

   unsigned char c6:1, c7:1, c8:1, c9:3, c10:1;

which is written to as

   x.c6 = 0;
   x.c7 = 0;
   x.c8 = 0;
   x.c9 = 7;

The store merging pass changes this to

   _71 = MEM  [(struct C *)&x + 8B];
   _42 = _71 & 128;
   _45 = _42 | 56;

and the translation validator is now complaining that the pass has
introduced UB that was not in the original IR (because the most
significant bit in _71 is uninitialized when passed to BIT_AND_EXPR).

I could solve this by allowing uninitialized bits in BIT_AND_EXPR and
BIT_OR_EXP, and propagating each bit according to

   * `0 & uninit` is an initialized `0`
   * `1 & uninit` is uninitialized
   * `0 | uninit` is uninitialized
   * `1 | uninit` is an initialized `1`

Is that the correct GIMPLE semantics?


I think the above "moves" the uninitialized MSB from memory to _45 so
that's OK.

Some "operations" like & 0 or & 1 give either defined values or
take the uninitialized bit unmodified (thus "move").  I think both
kinds are OK.  Likewise + 0 or * 0/1 would be OK.  What's not
OK is operations that use an (fully?) uninitialized value twice,
like x - x when x is uninitialized.


+ 0 and * 0/1 make sense to me. One annoyance is that it will make my
tool slower as it must track undefined bits in more cases. But that is
not a valid reason for restricting the IR semantics...



I think we want that, as soon as the uninitialized value becomes
"part" of a then partly initialized value, it's value is "fixed".
With things like _Complex or vector the situation is a bit
of a grey area since we have ways to operate on elements.

Note that when we for example use a BIT_FIELD_REF to extract
the MSB from _42 above the value would be again fully undefined.


Do you mean that all operations on the values having "fixed" bits are
valid? I do not think that will work. The frozen value may be any value,
which means that the program is nondeterministic. For example, if x has one
"fixed" uninitialized bit with the other bits 0, then code such as
  if (x)
could "randomly" be either true or false, so the program will not really
have any defined semantics.

And if use of an extracted "fixed" uninitialized bit is UB, then the
following may be UB (if x or y has a "fixed" uninitialized least
significant bit)
  bool b = (x & 1) != (y & 1)
while the equivalent
  bool b = (x ^ y) & 1
is defined.
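
(For reference, the two forms really are equivalent for fully initialized
inputs; a quick z3 sketch, separate from the tool:)

  from z3 import BitVec, Solver, unsat

  x, y = BitVec('x', 32), BitVec('y', 32)
  lhs = (x & 1) != (y & 1)
  rhs = ((x ^ y) & 1) != 0
  s = Solver()
  s.add(lhs != rhs)
  print(s.check())   # unsat: they agree for every initialized x and y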

And things may be even more fun when arithmetic is "smearing" the fixed
uninitialized values over the variable -- it will then be impossible to
determine if the extracted bits are OK or not when extracted...




I hope this makes some sense, it's a tricky area.

Richard.



Re: semantics of uninitialized values in GIMPLE

2023-07-11 Thread Krister Walfridsson via Gcc

On Tue, 11 Jul 2023, Richard Biener wrote:


With "plain copies", do you mean treating it as it is always defined? That
would prevent optimizations such as transforming

   int t;
   if (1)
 t = *p;
   else
 t = 0;
   return t + 1;

to the equivalent of

   return *p + 1;

because the latter is UB if *p is undefined, but the original is OK if phi
nodes always give us a fully defined value.


I meant the copy will simply transfer the state (uninitialized or initialized)
from RHS to LHS but in itself does not invoke undefined behavior.  And
"plain" copy would be

  int t, u;
  t = u;

where u is uninitialized and t will be as well but this copy in itself
isn't problematic.


So I misunderstood what you meant. This is how I have implemented it! :)



Maybe we should treat these as undefined as well - the problem with PHIs is
that the implied copies are necessary because of SSA construction even when
they do not appear in the original program.  But since transforms like
jump threading
can cause those copies to become degenerate it's odd to only treat those as OK.
So you have to think about

_1 = PHI 

vs.

_1 = p_2(D);




  * selection: Instructions that are selecting an element (COND_EXPR,
VEC_PERM_EXPR, etc.) may select between values having uninitialized
bits, and the resulting value may have uninitialized bits. But the
condition/mask must not have uninitialized bits.
  * Extraction: Instructions that are extracting bits (BIT_FIELD_REF etc.)
may have uninitialized bits in both the input/output.
  * Insertion: Instructions that are constructing/inserting values
(COMPLEX_EXPR, etc.) may have uninitialized bits in both the
input/output.


Generally I think "moving" uninitialized bits in full or in part is OK.
Operating on them, including comparing them (with themselves)
is eventually asking for troubles.


Makes sense.

But I think we must restrict the definition of "move". For example, one
can see x * 2 as moving the uninitialized bits one step. And x + x and
x << 1 are the same as x * 2, so then they too would be defined? I would
prefer if all of them were UB if x contains uninitialized bits...


I guess you have to try ...




All other use of values having uninitialized bits are considered UB.

Does this behavior make sense?

The above seems to work fine so far, with one exception that can be seen
in gcc.c-torture/execute/pr108498-1.c. The test has an uninitialized bit
field

   unsigned char c6:1, c7:1, c8:1, c9:3, c10:1;

which is written to as

   x.c6 = 0;
   x.c7 = 0;
   x.c8 = 0;
   x.c9 = 7;

The store merging pass changes this to

   _71 = MEM  [(struct C *)&x + 8B];
   _42 = _71 & 128;
   _45 = _42 | 56;

and the translation validator is now complaining that the pass has
introduced UB that was not in the original IR (because the most
significant bit in _71 is uninitialized when passed to BIT_AND_EXPR).

I could solve this by allowing uninitialized bits in BIT_AND_EXPR and
BIT_OR_EXP, and propagating each bit according to

   * `0 & uninit` is an initialized `0`
   * `1 & uninit` is uninitialized
   * `0 | uninit` is uninitialized
   * `1 | uninit` is an initialized `1`

Is that the correct GIMPLE semantics?


I think the above "moves" the uninitialized MSB from memory to _45 so
that's OK.

Some "operations" like & 0 or & 1 give either defined values or
take the uninitialized bit unmodified (thus "move").  I think both
kinds are OK.  Likewise + 0 or * 0/1 would be OK.  What's not
OK is operations that use an (fully?) uninitialized value twice,
like x - x when x is uninitialized.


+ 0 and * 0/1 makes sense to me. One annoyance is that it will make my
tool slower as it must track undefined bits in more cases. But that is
not a valid reason for restricting the IR semantics...


* 0 produces a defined value.  The difference with x * 1 is that we can
elide the operation so the result should better behave the same with
x itself.  So I stand corrected and * 1 should not be "OK", that is
the result is still uninitialized.  With '&' a (partly) undefined value
might become (fully) defined.

A good history of us getting more "correct" here is visible in
tree-ssa-ccp.cc:likely_value which tells CCP when a lattice
value is UNDEFINED (CCP doesn't track undefinedness on
a bit level so it has to treat partly undefined values as defined).


Thanks. I'll take a look at this.



I think we want that, as soon as the uninitialized value becomes
"part" of a then partly initialized value, it's value is "fixed".
With things like _Complex or vector the situation is a bit
of a grey area since we have ways to operate on elements.

Note that when we for example use a BIT_FIELD_REF to extract
the MSB from _42 above the value would be again fully undefined.


Do you mean that all operations on the values having "fixed" bits are
valid? I do not think that will work. The frozen value may be any value,
which means that the program is nondeterministic. For example, if x has one
"fixed" uninitialized bit [...]

Re: semantics of uninitialized values in GIMPLE

2023-07-11 Thread Krister Walfridsson via Gcc

On Tue, 11 Jul 2023, Richard Biener wrote:


I'll update my implementation, and will come back with a more detailed
proposal in a few weeks when I have tried some more things.


Thanks!  I've also taken the opportunity given by your work at the recent
bugs to propose a talk at this years GNU Cauldron about undefined
behavior on GCC and hope to at least start on documenting the state of
art^WGCC in the internals manual for this.  If you have any pointers to
your work / research I'd be happy to point to it, learn from it (and maybe
steal a word or two ;)).


Nice!

No, I have not published anything since the original release of 'pysmtgcc'
last year -- I was planning to document it in detail, but found out that
nothing worked, and I did not really understand what I was doing... And
the Python prototype started to be annoying, so I threw it all away and wrote 
a new tool in C++.

The new implementation now handles most of GIMPLE (but it still does not
handle loops or function calls). I also have support for checking that the
generated assembly has the same semantics as the optimized GIMPLE (only
partial support for RISC-V for now). I plan to publish a write-up of the
memory model soon in a series of blog posts -- I'll send you the link when
it is available.

And FWIW, I'm also planning to submit a talk to GNU Cauldron. :)
My proposal is about "Analyzing IR and Assembly Using SMT Solvers". The
translation validation stuff is actually just a test case for my analysis
engine -- I have more ideas for how it can be used...

   /Krister


Re: semantics of uninitialized values in GIMPLE

2023-07-20 Thread Krister Walfridsson via Gcc

On Tue, 11 Jul 2023, Krister Walfridsson wrote:

On Tue, 11 Jul 2023, Richard Biener wrote:

I'll update my implementation, and will come back with a more detailed
proposal in a few weeks when I have tried some more things.


Thanks!  I've also taken the opportunity given by your work at the recent
bugs to propose a talk at this years GNU Cauldron about undefined
behavior on GCC and hope to at least start on documenting the state of
art^WGCC in the internals manual for this.  If you have any pointers to
your work / research I'd be happy to point to it, learn from it (and maybe
steal a word or two ;)).


Nice!

No, I have not published anything since the original release of 'pysmtgcc'
last year -- I was planning to document it in detail, but found out that
nothing worked, and I did not really understand what I was doing... And
the Python prototype started to be annoying, so I threw all away and wrote
a new tool in C++.

The new implementation now handles most of GIMPLE (but it still does not
handle loops or function calls). I also have support for checking that the
generated assembly has the same semantics as the optimized GIMPLE (only
partial support for RISC-V for now). I plan to publish a write-up of the
memory model soon in a series of blog posts -- I'll send you the link when
it is available.


I have now published a few blog posts about my work. You can find them at:
  https://github.com/kristerw/pysmtgcc
I'll publish the remaining blog posts next week.

   /Krister


CLZ when CLZ_DEFINED_VALUE_AT_ZERO is false

2023-08-31 Thread Krister Walfridsson via Gcc
My translation validation tool reports some miscompilations related to the 
internal call CLZ(0) when CLZ_DEFINED_VALUE_AT_ZERO is false, but I am not 
sure I use the correct semantics...


I started by modeling CLZ(0) as undefined behavior, but that made the tool 
report a miscompilation for gcc.c-torture/execute/920501-6.c where sccp 
changes a loop to


  _36 = t_10(D) != 0;
  _35 = .CLZ (t_10(D));
  _34 = 63 - _35;
  _33 = (unsigned int) _34;
  _32 = (long long unsigned int) _33;
  _31 = _32 + 1;
  b_38 = _36 ? _31 : 1;

This may call CLZ with 0, but the value is not used, so it seems 
reasonable to allow this. I therefore changed my tool to treat CLZ(0) as 
returning an undefined value instead (by generating it as a symbolic 
value, which then makes the tool test that the IR is valid for all 
values).


This still makes this optimization fail because the calculation of _34 may 
overflow... :(  This could be a bug in the sccp pass (which could have 
done the calculation as unsigned), but fixing this would still make the 
tool report a miscompilation as the ranges calculated during the dom3 pass 
claim that _35 has the range:

  _35  : [irange] int [0, 63]

So what is the correct semantics for CLZ when CLZ_DEFINED_VALUE_AT_ZERO is 
false?
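
To illustrate the overflow concern, here is a sketch of the check
(assuming .CLZ(0) is modeled as an arbitrary 32-bit value, which is how
my tool treats it):

  # Sketch: with _35 = .CLZ(0) arbitrary, 63 - _35 can overflow as a
  # 32-bit signed subtraction.
  from z3 import BitVec, SignExt, Solver

  _35 = BitVec('clz_result', 32)
  narrow = 63 - _35                    # the subtraction as done in the IR
  exact = 63 - SignExt(32, _35)        # the same subtraction, without wrapping
  s = Solver()
  s.add(SignExt(32, narrow) != exact)  # overflow <=> the results disagree
  print(s.check())                     # sat, e.g. for _35 == INT_MIN
  # asserting the dom3 range (0 <= _35 <= 63) instead makes this unsat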


   /Krister


Re: CLZ when CLZ_DEFINED_VALUE_AT_ZERO is false

2023-09-03 Thread Krister Walfridsson via Gcc

On Fri, 1 Sep 2023, Richard Biener wrote:


The value of .CLZ (0) is undefined then.  I believe your analysis is correct in
that both 63 - _35 might overflow and that dom3 (thus ranger) mis-computes
the range for _35.  I wonder why we don't elide _36 ? _31 : 1 with that info
(possibly no range-op for .CLZ?), of course it would be wrong to do so.

Can you open a bugreport please?


PR 111280.

   /Krister


smtgcc translation validation

2023-09-30 Thread Krister Walfridsson via Gcc
I have now released the new implementation for translation validation I 
talked about at the GNU Tools Cauldron last week:

  https://github.com/kristerw/smtgcc

   /Krister


Re: smtgcc translation validation

2024-08-19 Thread Krister Walfridsson via Gcc

On Sun, 18 Aug 2024, Gerald Pfeifer wrote:

On Sat, 30 Sep 2023, Krister Walfridsson via Gcc wrote:

I have now released the new implementation for translation validation I talked
about at the GNU Tools Cauldron last week:
  https://github.com/kristerw/smtgcc


Wouldn't it be appropriate/nice to promote this a bit more strongly from
gcc.gnu.org?

One example I can think of is https://gcc.gnu.org/extensions.html .

Another might be https://gcc.gnu.org/bugs/ to help understand whether
something is a user bug or GCC bug?

Happy to accept patches; or, if you want to suggest some text, I can massage that
into HTML and add it.


I suggest waiting a few weeks -- the tool still has a few issues with 
false positives, and I need to document the limitations. But my plan is to 
have it fixed before the GNU Tools Cauldron; the main message in my 
proposed talk is "all of you should start using smtgcc!" :)


I'll send you a patch when this is done!

  /Krister


Re: smtgcc end-of-year update

2025-01-03 Thread Krister Walfridsson via Gcc

On Thu, 2 Jan 2025, Andi Kleen wrote:

Krister Walfridsson via Gcc  writes:


But running smtgcc on the test suite is not the best use case for the
tool -- it only detects bugs where the test triggers an unrelated bug
compared to what the test is checking, which should be uncommon. I
therefore plan to start testing by compiling real-world code and doing
some fuzzing. One problem is that this often finds the open bugs
mentioned in the previous section, making it time-consuming to check
if each new issue is a repeat. For example, running yarpgen gives me a
few failure reports per day of PR106883.


I would increase the priority of PR106883 if it blocks so much testing.
But does disabling the slsr pass not help?


Yes, I can disable the slsr pass, or just ignore all the problems detected 
in it. But most of the bugs mentioned occur frequently enough that I have 
the same problem with them, and there wouldn't be much left of the 
compiler to check if I disable/ignore all of the affected passes...


I don't want to overstate the problem -- I'll manage. But it would be 
great if a few of these PRs were fixed! :)


   /Krister


smtgcc end-of-year update

2024-12-31 Thread Krister Walfridsson via Gcc
I have continued working on my translation validator, smtgcc [1], over the 
last few months. Here's an update on what has happened since my talk at 
the GNU Tools Cauldron [2].



Improvements

The main focus has been improving the smtgcc-tv-backend tool which checks 
that the generated assembly is a refinement of the final GIMPLE IR.


Highlights:
* Fixed the RISC-V 64-bit pointer problem, so both RV32 and RV64 now work.
* Fixed lots of corner cases where we failed to parse the assembly.
* Added support for the RISC-V B extensions.
* Added partial support for the RISC-V V extensions.
* Added support for the AArch64 architecture.

AArch64 support is still a bit limited: the ABI is not fully implemented, 
and checking is slow and often times out because the current 
implementation generates overly complicated SMT formulas for some common 
instructions.



Reported GCC bugs
-
Most GCC bugs I have reported have been fixed, but there are currently 
three open bugs where GCC generates incorrect code: PR113703, PR117186, 
PR118174. I also claim that PR117688 still happens even though it is 
marked as resolved -- I will investigate it a bit more before reopening 
the PR.


I have also reported several bugs where optimizations perform invalid 
transformations that introduce new undefined behavior, although these do 
not, in practice, miscompile code with the current pass ordering. The 
following are still open: PR106883, PR106884, PR109626, PR110760, 
PR111257, PR111280, PR111494, PR113590, PR114056, PR117927.



GCC testing
---
I have been running smtgcc over the GCC test suite weekly, and it seems to 
detect a newly introduced bug roughly every month. I plan to continue 
doing this in the coming months as well.


But running smtgcc on the test suite is not the best use case for the tool 
-- it only detects bugs where the test triggers an unrelated bug compared 
to what the test is checking, which should be uncommon. I therefore plan 
to start testing by compiling real-world code and doing some fuzzing. One 
problem is that this often finds the open bugs mentioned in the previous 
section, making it time-consuming to check if each new issue is a repeat. 
For example, running yarpgen gives me a few failure reports per day of 
PR106883.


   /Krister


[1] https://github.com/kristerw/smtgcc/
[2] https://www.youtube.com/watch?v=vrkR7jKcFpw


Re: Pointer semantics in GIMPLE

2025-03-21 Thread Krister Walfridsson via Gcc

On Fri, 21 Mar 2025, Richard Biener wrote:


Issues
--
The semantics I've described above result in many reports of
miscompilations that I haven't reported yet.

As mentioned earlier, the vectorizer can use POINTER_PLUS_EXPR to generate
pointers that extend up to a vector length past the object (and similarly,
up to one vector length before the object in a backwards loop). This is
invalid if the object is too close to address 0 or to the top of the address space.

And this out of bounds distance can be made arbitrarily large, as can be
seen by compiling the following function below for x86_64 with -O3:

void foo(int *p, uint64_t s, int64_t n)
{
   for (int64_t i = 0; i < n; i++)
 {
   int *q = p + i * s;
   for (int j = 0; j < 4; j++)
*q++ = j;
 }
}

Here, the vectorized code adds s * 4 to the pointer ivtmp_8 at the end of
each iteration:

:
   bnd.8_38 = (unsigned long) n_10(D);
   _21 = s_11(D) * 4;

:
   # ivtmp_8 = PHI 
   # ivtmp_26 = PHI 
   MEM  [(int *)ivtmp_8] = { 0, 1, 2, 3 };
   ivtmp_7 = ivtmp_8 + _21;
   ivtmp_20 = ivtmp_26 + 1;
   if (ivtmp_20 < bnd.8_38)
 goto ;
   else
 goto ;

:
   goto ;

This means calling foo with a sufficiently large s can make the pointer
arithmetic wrap or evaluate to 0, even though the original IR before
optimization did not wrap or evaluate to 0. For example, when calling foo as:

   foo(p, -((uintptr_t)p / 4), 1);

I guess this is essentially the same issue as PR113590, and could be fixed
by moving all induction variable updates to the latch.


Yes.

But if that

happens, the motivating examples for needing to handle invalid pointer
values would no longer occur. Would that mean smtgcc should adopt a more
restrictive semantics for pointer values?


In principle yes, but PR113590 blocks this I guess.  Could smtgcc consider
a SSA def to be happening only at the latest possible point, that is,
"virtually"
sink it to the latch?  (I realize when you have two dependent defs that can
be moved together only such "virtual" sinking could be somewhat complicated)


I think it feels a bit strange (and brittle) to define the semantics by
virtually sinking everything as much as possible. But isn't this sinking
just another way of saying that the operation is UB only if the value is
used? That is, we can solve this essentially the same way LLVM does with
its deferred UB.


Yeah, that's another way of thinking.  Though technically GCC doesn't implement
deferred UB - for the case above we're simply lucky nothing exploits the UB
that is in principle there.


The idea is that POINTER_DIFF_EXPR, PLUS_EXPR, etc. are defined for all
inputs, but the result is a poison value when it doesn't fit in the return
type. Any use of a poison value is UB.

This is trivial to implement in smtgcc and would solve PR113590.


I _think_ it's a reasonable thing for smtgcc to do.  For GCC itself it would
complicate what is UB and what not quite a bit - what's the ultimate "use"
UB triggers?  I suppose POISON + 1 is simply POISON again.  Can
you store POISON to memory or is that then UB at the point of the store?
Can you pass it to a call?  What if the call is inlined and just does
return arg + 1?  IIRC I read some paper about this deferred UB in LLVM
and wasn't convinced they solve any real problem.


I agree deferred UB causes more problems than it solves... And I think 
it's important that smtgcc uses the same semantics as GCC, so I'll keep 
the current behavior.


   /Krister


Pointer semantics in GIMPLE

2025-04-05 Thread Krister Walfridsson via Gcc
I'm working on ensuring that the GIMPLE semantics used by smtgcc are 
correct, and I have a lot of questions about the details. I'll be sending 
a series of emails with these questions. This first one is about pointers 
in general.


Each section starts with a description of the semantics I've implemented 
(or plan to implement), followed by concrete questions if relevant. Let me 
know if the described semantics are incorrect or incomplete.



Pointer values
---
C imposes restrictions on pointer values (alignment, must point to valid 
memory, etc.), but GIMPLE seems more permissive. For example, when 
vectorizing a loop:


  void foo(int *p, int n)
  {
for (int i = 0; i < n; i++)
  p[i] = 1;
  }

the vectorized loop updates the pointer vectp_p.8 for the next iteration 
at the end of the loop, causing it to point up to a vector-length out of 
range after the last iteration:


   :
  # vectp_p.8_34 = PHI 
  # ivtmp_37 = PHI 
  MEM  [(int *)vectp_p.8_34] = { 1, 1, 1, 1 };
  vectp_p.8_35 = vectp_p.8_34 + 16;
  ivtmp_38 = ivtmp_37 + 1;
  if (ivtmp_38 < bnd.5_31)
goto ;
  else
goto ;

   :
  goto ;

Because of this, I allow all values -- essentially treating pointers as 
unsigned integers. Things like ensuring pointers point to valid memory or 
have correct alignment are only checked when dereferencing the pointer 
(I'll discuss pointer dereferencing semantics in the next email).



Pointer arithmetic -- POINTER_PLUS_EXPR
---
POINTER_PLUS_EXPR calculates p + x:
 * It is UB if the calculation overflows when
   TYPE_OVERFLOW_WRAPS(ptr_type) is false.
 * It is UB if p + x == 0, unless p == 0 and x == 0, or unless
   flag_delete_null_pointer_checks is false.

Since the pointer type is unsigned, it's not entirely clear what overflow 
means. For example, p - 1 is represented as p + 0xffffffffffffffff (for 64-bit 
pointers), which 
overflows when treated as an unsigned addition. I check for overflow in 
p + x as follows:

  is_overflow = ((intptr_t)x < 0) ? (p + x) > p : (p + x) < p;
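
For completeness, the same check phrased over bitvectors (a sketch using
the z3 Python API, assuming 64-bit pointers), together with a query
confirming that it matches the exact, non-wrapping computation:

  from z3 import (BitVec, BitVecVal, If, UGT, ULT, ZeroExt, SignExt, Or,
                  Solver, unsat)

  p, x = BitVec('p', 64), BitVec('x', 64)
  is_overflow = If(x < 0,              # (intptr_t)x < 0
                   UGT(p + x, p),      # negative offset: wrapped iff p + x > p
                   ULT(p + x, p))      # positive offset: wrapped iff p + x < p

  # reference: do the addition with enough extra bits that it cannot wrap
  exact = ZeroExt(2, p) + SignExt(2, x)
  out_of_range = Or(exact < 0, exact >= BitVecVal(2**64, 66))

  s = Solver()
  s.add(is_overflow != out_of_range)
  print(s.check())                     # unsat: the shortcut test is exact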


Pointer arithmetic -- MEM_REF and TARGET_MEM_REF

A calculation p + x can also be done using MEM_REF as &MEM[p + x].

Question: smtgcc does not check for overflow in address calculations for 
MEM_REF and TARGET_MEM_REF. Should it treat overflow as UB in the same way 
as POINTER_PLUS_EXPR?



Pointer arithmetic -- COMPONENT_REF
---
COMPONENT_REF adds the offset of a structure element to the pointer to the 
structure.


Question: smtgcc does not check for overflow in address calculation for 
COMPONENT_REF. Should it treat overflow as UB in the same way as 
POINTER_PLUS_EXPR?



Pointer arithmetic -- ARRAY_REF
---
ARRAY_REF adds the offset of the indexed element to the pointer to the 
array.


 * It is UB if the index is negative.
 * If the TYPE_DOMAIN for the array type has a TYPE_MAX_VALUE and the
   array is not a flexible array, it is UB if the index exceeds "one
   after" this maximum value.
 * If it is a flexible array or does not have a maximum value, it is
   considered an unsized array, so all non-negative indices are valid.
   But it is UB if the calculation of
  offset = index * element_size
   overflows.

Question: smtgcc does not check for overflow in the calculation of p + 
offset. Should it treat overflow as UB in the same way as 
POINTER_PLUS_EXPR?


Question: What is the correct way to check for flexible arrays? I am 
currently using array_ref_flexible_size_p and flag_strict_flex_arrays, but 
I noticed that middle-end passes do not seem to use 
flag_strict_flex_arrays. Is there a more canonical way to determine how to 
interpret flexible arrays?



Pointer arithmetic -- POINTER_DIFF_EXPR
---
Subtracting a pointer q from a pointer p is done using POINTER_DIFF_EXPR.
 * It is UB if the difference does not fit in a signed integer with the
   same precision as the pointers.

This implies that an object's size must be less than half the address 
space; otherwise, POINTER_DIFF_EXPR cannot be used to compute sizes in C. 
But there may be additional restrictions. For example, compiling the 
function:


  void foo(int *p, int *q, int n)
  {
for (int i = 0; i < n; i++)
  p[i] = q[i] + 1;
  }

causes the vectorized code to perform overlap checks like:

  _7 = q_11(D) + 4;
  _25 = p_12(D) - _7;
  _26 = (sizetype) _25;
  _27 = _26 > 8;
  _28 = _27;
  if (_28 != 0)
goto ;
  else
goto ;

which takes the difference between two pointers that may point to 
different objects. This suggests that all objects must fit within half the 
address space.


Question: What are the restrictions on valid address ranges and object 
sizes?



Issues
--
The semantics I've described above result in many reports of 
miscompilations that I haven't reported yet.


As mentioned earlier, the vectorizer can use POINTER_PLUS_EXPR to generate 
pointers that extend up to a vector length past the object (and similarly, 
up to one vector length before the object in a backwards loop). This is 
invalid if the object is too close to address 0 or to the top of the 
address space. [...]

Memory access in GIMPLE

2025-04-02 Thread Krister Walfridsson via Gcc

I have more questions about GIMPLE memory semantics for smtgcc.

As before, each section starts with a description of the semantics I've 
implemented (or plan to implement), followed by concrete questions if 
relevant. Let me know if the described semantics are incorrect or 
incomplete.



Accessing memory

Memory access in GIMPLE is done using GIMPLE_ASSIGN statements where the 
lhs and/or rhs is a memory reference expression (such as MEM_REF). When 
both lhs and rhs access memory, one of the following must hold -- 
otherwise the access is UB:

 1. There is no overlap between lhs and rhs
 2. lhs and rhs represent the same address

A memory access is also UB in the following cases:
 * Any accessed byte is outside valid memory
 * The pointer violates the alignment requirements
 * The pointer provenance doesn't match the object
 * The type is incorrect from a TBAA perspective
 * It's a store to constant memory
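
(The overlap rule, restated as a hypothetical helper with addresses and
sizes in bytes:)

  def assign_overlap_ok(lhs_addr, lhs_size, rhs_addr, rhs_size):
      # case 1: the two accesses do not overlap at all
      no_overlap = (lhs_addr + lhs_size <= rhs_addr
                    or rhs_addr + rhs_size <= lhs_addr)
      # case 2: lhs and rhs represent the same address
      return no_overlap or lhs_addr == rhs_addr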

smtgcc requires -fno-strict-aliasing for now, so I'll ignore TBAA in this 
mail. Provenance has its own issues, which I'll come back to in a separate 
mail.



Checking memory access is within bounds
---
A memory access may be represented by a chain of memory reference 
expressions such as MEM_REF, ARRAY_REF, COMPONENT_REF, etc. For example, 
accessing a structure:


  struct s {
int x, y;
  };

as:

  int foo (struct s * p)
  {
int _3;

 :
_3 = p_1(D)->x;
return _3;
  }

involves a MEM_REF for the whole object and a COMPONENT_REF to select the 
field. Conceptually, we load the entire structure and then pick out the 
element -- so all bytes of the structure must be in valid memory.


We could also do the access as:

  int foo (struct s * p)
  {
int * q;
int _3;

 :
q_2 = &p_1(D)->x;
_3 = *q_2;
return _3;
  }

This calculates the address of the element, and then reads it as an 
integer, so only the four bytes of x must be in valid memory.


In other words, the compiler is not allowed to optimize:
  q_2 = &p_1(D)->x;
  _3 = *q_2;
to
  _3 = p_1(D)->x;

Question: Describing the first case as conceptually reading the whole 
structure makes sense for loads. But I assume the same requirement -- that 
the entire object must be in valid memory -- also applies for stores. Is 
that correct?



Allowed out-of-bounds read?
---
Compile the function below for x86_64 with "-O3 -march=x86-64-v2":

  int f(int *a)
  {
for (int i = 0; i < 100; i++)
  if (a[i])
return 1;
return 0;
  }

The vectorizer transforms this into code that processes one scalar element 
at a time until the pointer is 16-byte aligned, then switches to vector 
loads.


The original code is well-defined when called like this:

  int a[2] __attribute__((aligned(16))) = {1, 0};
  f(a);

But the vectorized version ends up reading 8 bytes out of bounds.

This out-of-bounds read is harmless in practice -- it stays within the 
same memory page, so the extra bytes are accessible. But it's invalid 
under the smtgcc memory model.


Question: Is this considered a valid access in GIMPLE? If so, what are the 
rules for allowed out-of-bounds memory reads?



Alignment check
---
Question: smtgcc currently gets the alignment requirements by calling 
get_object_alignment on the tree expression returned from 
gimple_assign_lhs (for stores) or gimple_assign_rhs1 (for loads). Is that 
the correct way to get the required alignment?



   /Krister


Re: Pointer semantics in GIMPLE

2025-03-22 Thread Krister Walfridsson via Gcc

On Thu, 20 Mar 2025, Richard Biener wrote:


Pointer arithmetic -- POINTER_DIFF_EXPR
---
Subtracting a pointer q from a pointer p is done using POINTER_DIFF_EXPR.
  * It is UB if the difference does not fit in a signed integer with the
same precision as the pointers.


Yep.


This implies that an object's size must be less than half the address
space; otherwise, POINTER_DIFF_EXPR cannot be used to compute sizes in C.
But there may be additional restrictions. For example, compiling the
function:

   void foo(int *p, int *q, int n)
   {
 for (int i = 0; i < n; i++)
   p[i] = q[i] + 1;
   }

causes the vectorized code to perform overlap checks like:

   _7 = q_11(D) + 4;
   _25 = p_12(D) - _7;
   _26 = (sizetype) _25;
   _27 = _26 > 8;
   _28 = _27;
   if (_28 != 0)
 goto ;
   else
 goto ;

which takes the difference between two pointers that may point to
different objects. This suggests that all objects must fit within half the
address space.


Interesting detail.  But yes, the above is either incorrect code (I didn't
double-check) or this condition must hold.  For 64bit virtual address-space
it likely holds in practice.  For 32bit virtual address-space it might not.


Question: What are the restrictions on valid address ranges and object
sizes?


The restriction on object size is documented, as well as that objects
may not cross/include 'NULL'.  Can you open a bugreport for the
vectorizer overlap test issue above?  I think we want to track this and
at least document the restriction (we could possibly add a target hook
that can tell whether such a simplified check is OK).


PR119399



Issues
--
The semantics I've described above result in many reports of
miscompilations that I haven't reported yet.

As mentioned earlier, the vectorizer can use POINTER_PLUS_EXPR to generate
pointers that extend up to a vector length past the object (and similarly,
up to one vector length before the object in a backwards loop). This is
invalid if the object is too close to address 0 or to the top of the address space.

And this out of bounds distance can be made arbitrarily large, as can be
seen by compiling the following function below for x86_64 with -O3:

void foo(int *p, uint64_t s, int64_t n)
{
   for (int64_t i = 0; i < n; i++)
 {
   int *q = p + i * s;
   for (int j = 0; j < 4; j++)
*q++ = j;
 }
}

Here, the vectorized code adds s * 4 to the pointer ivtmp_8 at the end of
each iteration:

:
   bnd.8_38 = (unsigned long) n_10(D);
   _21 = s_11(D) * 4;

:
   # ivtmp_8 = PHI 
   # ivtmp_26 = PHI 
   MEM  [(int *)ivtmp_8] = { 0, 1, 2, 3 };
   ivtmp_7 = ivtmp_8 + _21;
   ivtmp_20 = ivtmp_26 + 1;
   if (ivtmp_20 < bnd.8_38)
 goto ;
   else
 goto ;

:
   goto ;

This means calling foo with a sufficiently large s can make the pointer
arithmetic wrap or evaluate to 0, even though the original IR before
optimization did not wrap or evaluate to 0. For example, when calling foo as:

   foo(p, -((uintptr_t)p / 4), 1);

I guess this is essentially the same issue as PR113590, and could be fixed
by moving all induction variable updates to the latch.


Yes.

But if that

happens, the motivating examples for needing to handle invalid pointer
values would no longer occur. Would that mean smtgcc should adopt a more
restrictive semantics for pointer values?


In principle yes, but PR113590 blocks this I guess.  Could smtgcc consider
a SSA def to be happening only at the latest possible point, that is,
"virtually"
sink it to the latch?  (I realize when you have two dependent defs that can
be moved together only such "virtual" sinking could be somewhat complicated)


I think it feels a bit strange (and brittle) to define the semantics by 
virtually sinking everything as much as possible. But isn't this sinking 
just another way of saying that the operation is UB only if the value is 
used? That is, we can solve this essentially the same way LLVM does with 
its deferred UB.


The idea is that POINTER_DIFF_EXPR, PLUS_EXPR, etc. are defined for all 
inputs, but the result is a poison value when it doesn't fit in the return 
type. Any use of a poison value is UB.


This is trivial to implement in smtgcc and would solve PR113590.

   /Krister


Re: Memory access in GIMPLE

2025-04-03 Thread Krister Walfridsson via Gcc

On Thu, 3 Apr 2025, Richard Biener wrote:


On Thu, Apr 3, 2025 at 2:23 AM Krister Walfridsson via Gcc
 wrote:


I have more questions about GIMPLE memory semantics for smtgcc.

As before, each section starts with a description of the semantics I've
implemented (or plan to implement), followed by concrete questions if
relevant. Let me know if the described semantics are incorrect or
incomplete.


Accessing memory

Memory access in GIMPLE is done using GIMPLE_ASSIGN statements where the
lhs and/or rhs is a memory reference expression (such as MEM_REF). When
both lhs and rhs access memory, one of the following must hold --
otherwise the access is UB:
  1. There is no overlap between lhs and rhs
  2. lhs and rhs represent the same address

A memory access is also UB in the following cases:
  * Any accessed byte is outside valid memory
  * The pointer violates the alignment requirements
  * The pointer provenance doesn't match the object
  * The type is incorrect from a TBAA perspective
  * It's a store to constant memory


correct.

Note that GIMPLE_CALL and GIMPLE_ASM can also access memory.


smtgcc requires -fno-strict-aliasing for now, so I'll ignore TBAA in this
mail. Provenance has its own issues, which I'll come back to in a separate
mail.


Checking memory access is within bounds
---
A memory access may be represented by a chain of memory reference
expressions such as MEM_REF, ARRAY_REF, COMPONENT_REF, etc. For example,
accessing a structure:

   struct s {
 int x, y;
   };

as:

   int foo (struct s * p)
   {
 int _3;

  :
 _3 = p_1(D)->x;
 return _3;
   }

involves a MEM_REF for the whole object and a COMPONENT_REF to select the
field. Conceptually, we load the entire structure and then pick out the
element -- so all bytes of the structure must be in valid memory.

We could also do the access as:

   int foo (struct s * p)
   {
 int * q;
 int _3;

  :
 q_2 = &p_1(D)->x;
 _3 = *q_2;
 return _3;
   }

This calculates the address of the element, and then reads it as an
integer, so only the four bytes of x must be in valid memory.

In other words, the compiler is not allowed to optimize:
   q_2 = &p_1(D)->x;
   _3 = *q_2;
to
   _3 = p_1(D)->x;


Correct.  The reason that p_1(D)->x is considered accessing the whole
object is because of TBAA, so with -fno-strict-aliasing there is no UB
when the whole object isn't accessible (but the subsetted piece is).


I'm still a bit confused... Assume we're using -fno-strict-aliasing and p 
points to a 4-byte buffer in the example above. Then:

  _3 = p_1(D)->x;
is valid. So that means my paragraph starting with "In other words..." 
must be incorrect? That is, the compiler is allowed to optimize:

  q_2 = &p_1(D)->x;
  _3 = *q_2;
to
  _3 = p_1(D)->x;
since it's equally valid. But this optimization would be invalid for 
-fstrict-aliasing?


And a memory access like:
  _4 = p_1(D)->y;
would be invalid (for both -fno-strict-aliasing and -fstrict-aliasing)? 
That is, the subsetted piece must be accessible?


   /Krister