ojhunt wrote:

This comment became quite long, so I just want to add a preamble. First off, 
I'm sorry that I missed the RFC for this or I would have raised these 
objections then - If I did see it I would have blindly assumed that it was 
solely removing the UB nature of overflow, with no other semantic changes, and 
so probably not paid it any heed. This is a feature I have wanted for a long 
time, but the narrowing behavior is just not ok - I would have to essentially 
make this a banned feature in any code base I have influence over, for the 
reasons I'll be discussing below.

I’m also not sure I understand the semantics of narrowing when `__nowrap` is 
used, I’ve tried to  - given

```cpp
uint64_t a = ...
uint32_t __no_wrap b = ...
auto result = a + b
```

The documentation makes it clear that `smaller_type __no_wrap a = larger_type` 
is an error, but what happens in the case of `(smaller_type __no_wrap) + 
larger_type` - is that an error?

I could dump these issues into the RFC thread but I don’t think that’s 
necessarily appropriate - @aaronballman or 


Anyway, on to the wall of text comment (as people who are on WG21 can attest I 
have an unerring ability to do this, despite my best efforts):

The description of this PR states that this feature is fine grained control of 
fwrapv, etc that make wrapping have defined semantics rather than undefined. 
But the change in semantics here is not that, it is a significantly different 
behaviour, and has the effect of making previously correct code to incorrect 
code. I.e code that is correct, and has no UB (e.g. no overflow), is changed to 
be incorrect - introducing both overflow and truncation into code that 
previously had none.

A common refrain in the replies to my comments has been this is a new feature 
that is opt in, but the problem is there are many cases developers would want 
to opt in to these in ways that would be relatively global. It's super easy to 
imagine a project doing:

```cpp
typedef uint8_t __nowrap uint8_safe_t;
typedef uint16_t __nowrap uint16_safe_t;
typedef uint32_t __nowrap uint32_safe_t;
typedef uint63_t __nowrap uint63_safe_t;
typedef uint8_t __wrap uint8_wrap_t;
typedef uint16_t __wrap uint16_wrap_t;
typedef uint32_t __wrap uint32_wrap_t;
typedef uint63_t __wrap uint63_wrap_t;
```

Then adding a linter rule that requires all new code to use one these instead 
of the existing types. In codebases I have working in, this is _exactly_ what 
they would want to do. But the narrowing change makes such a change impossible 
- interaction between new and existing code, or system APIs, means the 
narrowing can happen anywhere, and similarly literals like `1ull + somevalue` 
can now break as well.

These changes in semantics can introduce security vulnerabilities where 
previously none existed. Take the format string example in the RFC, which was 
called out as a result of this change in behavior, which is the reason I 
assumed it had be removed:

```cpp
unsigned long long a;
size_t __wrap b;

printf("%llx", a + b);
```

on systems where `sizeof(int) != sizeof(unsigned long long)` this previously 
_correct_ code is now a security vulnerability. In principle this could happen 
on a 64bit system if varargs aren’t required to be at least 8 byte aligned, 
though I’m not sure if such systems exist.

You also get silent truncation in previously correct code:

```cpp
struct S32 {
  uint32_t __wrap a; // I want defined semantics for arithmetic on this field
                     // e.g. no UB -> compiler created security vulnerability
};
struct S64 {
  uint64_t __wrap a;
};

uint64_t doSomething();
uint64_t foo(S32 s) {
   return doSomething() + s.a; // silently truncate to uint32_t, and silently 
expand out to uint64_t;
}
```

You can imagine code this also impacting code like:

```cpp
auto /*uint32_t*/ base = getBase();
auto /*uint<less than 32>_t __wrap*/ offset = ...; // so many formats and use 
cases have bytes here
if (offset >= size)
  return;
buffer[base + offset] = ...; // you're now writing out of expected bounds
```

I also suspect these semantics mean that (assuming the above `uint64_t 
doSomething();` exists)

```cpp
uint64_t foo(uint32_t __wrap a, uint64_t wrap b) {
   return doSomething() + a + b;
}
```

could now produce a different result (it's already different from what existing 
correct code does) from


```cpp
uint64_t foo(uint32_t __wrap a, uint64_t wrap b) {
   return doSomething() + b + a;
}
```

Due to the narrowing of `doSomething() + a` where previously the developer had 
(presumably) ensured a correct result. Note that this is distinct from `int32 + 
int32 + int64` vs `int64 + int32 + int32` which is a real and existing problem, 
because we’re starting with correct code.

Similar to this, previously correct C++ becomes wrong, and changes 
substantially:

```cpp

// just some nonsense to demonstrate the viral nature of this change in 
narrowing
template <class T> struct S {
   S(T a) : a(a) {}
   T a;
}

template <class A, class B> auto doSomething(S<A> s1, S<B> s2) -> 
S<decltype(s1.a + s2.b)> {
   return S(s1.a + s2.a);
}
 
```

`doSomething` now loses data if one of A or B is a smaller `__wrap` type (e.g. 
`doSomething(S<uint64_t>(...), S<uint32_t __wrap>(...))`), and while 
historically this might have been caught by something like

```
S<uint64_t> s = doSomething(...)
```

Modern C++ has `auto` and `S s = doSomething(...)` which will silently accept 
the change from `S<uint64_t>` to `S<uint32_t __wrap>`, and the viral nature 
will continue to silently propagate the `uint32_t __wrap` through everything.

I think `_Auto` or whatever arbitrarily different spelling C choose for `auto` 
has similar hazards.

The reality is that given the change to narrowing behavior this attribute does 
not actually solve the real world problems I’ve seen, or that I have had 
developers express a desire for support with, and it makes existing correct 
code incorrect if adopted, to the extent that it can create _new_ 
vulnerabilities rather than removing them. Given the above issues, not only 
would this not be a solution to what developers have asked for, it would likely 
be necessary to _forbid_ the adoption of the `__wrap` portion of the feature 
entirely, and `__no_wrap` would at least be challenging due to the potential 
introduction of crashes in code that is error (which includes UB) free.

That said, these are just the issues that I see from my point of view - the RFC 
says that this is intended to support linux kernel use cases, so I’d like to 
know whether these issues and hazards seem like a problem to them.


https://github.com/llvm/llvm-project/pull/148914
_______________________________________________
lldb-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Reply via email to