ojhunt wrote:
This comment became quite long, so I just want to add a preamble. First off,
I'm sorry that I missed the RFC for this or I would have raised these
objections then - If I did see it I would have blindly assumed that it was
solely removing the UB nature of overflow, with no other semantic changes, and
so probably not paid it any heed. This is a feature I have wanted for a long
time, but the narrowing behavior is just not ok - I would have to essentially
make this a banned feature in any code base I have influence over, for the
reasons I'll be discussing below.
I’m also not sure I understand the semantics of narrowing when `__nowrap` is
used, I’ve tried to - given
```cpp
uint64_t a = ...
uint32_t __no_wrap b = ...
auto result = a + b
```
The documentation makes it clear that `smaller_type __no_wrap a = larger_type`
is an error, but what happens in the case of `(smaller_type __no_wrap) +
larger_type` - is that an error?
I could dump these issues into the RFC thread but I don’t think that’s
necessarily appropriate - @aaronballman or
Anyway, on to the wall of text comment (as people who are on WG21 can attest I
have an unerring ability to do this, despite my best efforts):
The description of this PR states that this feature is fine grained control of
fwrapv, etc that make wrapping have defined semantics rather than undefined.
But the change in semantics here is not that, it is a significantly different
behaviour, and has the effect of making previously correct code to incorrect
code. I.e code that is correct, and has no UB (e.g. no overflow), is changed to
be incorrect - introducing both overflow and truncation into code that
previously had none.
A common refrain in the replies to my comments has been this is a new feature
that is opt in, but the problem is there are many cases developers would want
to opt in to these in ways that would be relatively global. It's super easy to
imagine a project doing:
```cpp
typedef uint8_t __nowrap uint8_safe_t;
typedef uint16_t __nowrap uint16_safe_t;
typedef uint32_t __nowrap uint32_safe_t;
typedef uint63_t __nowrap uint63_safe_t;
typedef uint8_t __wrap uint8_wrap_t;
typedef uint16_t __wrap uint16_wrap_t;
typedef uint32_t __wrap uint32_wrap_t;
typedef uint63_t __wrap uint63_wrap_t;
```
Then adding a linter rule that requires all new code to use one these instead
of the existing types. In codebases I have working in, this is _exactly_ what
they would want to do. But the narrowing change makes such a change impossible
- interaction between new and existing code, or system APIs, means the
narrowing can happen anywhere, and similarly literals like `1ull + somevalue`
can now break as well.
These changes in semantics can introduce security vulnerabilities where
previously none existed. Take the format string example in the RFC, which was
called out as a result of this change in behavior, which is the reason I
assumed it had be removed:
```cpp
unsigned long long a;
size_t __wrap b;
printf("%llx", a + b);
```
on systems where `sizeof(int) != sizeof(unsigned long long)` this previously
_correct_ code is now a security vulnerability. In principle this could happen
on a 64bit system if varargs aren’t required to be at least 8 byte aligned,
though I’m not sure if such systems exist.
You also get silent truncation in previously correct code:
```cpp
struct S32 {
uint32_t __wrap a; // I want defined semantics for arithmetic on this field
// e.g. no UB -> compiler created security vulnerability
};
struct S64 {
uint64_t __wrap a;
};
uint64_t doSomething();
uint64_t foo(S32 s) {
return doSomething() + s.a; // silently truncate to uint32_t, and silently
expand out to uint64_t;
}
```
You can imagine code this also impacting code like:
```cpp
auto /*uint32_t*/ base = getBase();
auto /*uint<less than 32>_t __wrap*/ offset = ...; // so many formats and use
cases have bytes here
if (offset >= size)
return;
buffer[base + offset] = ...; // you're now writing out of expected bounds
```
I also suspect these semantics mean that (assuming the above `uint64_t
doSomething();` exists)
```cpp
uint64_t foo(uint32_t __wrap a, uint64_t wrap b) {
return doSomething() + a + b;
}
```
could now produce a different result (it's already different from what existing
correct code does) from
```cpp
uint64_t foo(uint32_t __wrap a, uint64_t wrap b) {
return doSomething() + b + a;
}
```
Due to the narrowing of `doSomething() + a` where previously the developer had
(presumably) ensured a correct result. Note that this is distinct from `int32 +
int32 + int64` vs `int64 + int32 + int32` which is a real and existing problem,
because we’re starting with correct code.
Similar to this, previously correct C++ becomes wrong, and changes
substantially:
```cpp
// just some nonsense to demonstrate the viral nature of this change in
narrowing
template <class T> struct S {
S(T a) : a(a) {}
T a;
}
template <class A, class B> auto doSomething(S<A> s1, S<B> s2) ->
S<decltype(s1.a + s2.b)> {
return S(s1.a + s2.a);
}
```
`doSomething` now loses data if one of A or B is a smaller `__wrap` type (e.g.
`doSomething(S<uint64_t>(...), S<uint32_t __wrap>(...))`), and while
historically this might have been caught by something like
```
S<uint64_t> s = doSomething(...)
```
Modern C++ has `auto` and `S s = doSomething(...)` which will silently accept
the change from `S<uint64_t>` to `S<uint32_t __wrap>`, and the viral nature
will continue to silently propagate the `uint32_t __wrap` through everything.
I think `_Auto` or whatever arbitrarily different spelling C choose for `auto`
has similar hazards.
The reality is that given the change to narrowing behavior this attribute does
not actually solve the real world problems I’ve seen, or that I have had
developers express a desire for support with, and it makes existing correct
code incorrect if adopted, to the extent that it can create _new_
vulnerabilities rather than removing them. Given the above issues, not only
would this not be a solution to what developers have asked for, it would likely
be necessary to _forbid_ the adoption of the `__wrap` portion of the feature
entirely, and `__no_wrap` would at least be challenging due to the potential
introduction of crashes in code that is error (which includes UB) free.
That said, these are just the issues that I see from my point of view - the RFC
says that this is intended to support linux kernel use cases, so I’d like to
know whether these issues and hazards seem like a problem to them.
https://github.com/llvm/llvm-project/pull/148914
_______________________________________________
lldb-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits