[PATCH] D135551: [clang] replace `assert(0)` with `llvm_unreachable` NFC

David Blaikie via Phabricator via cfe-commits Mon, 10 Oct 2022 14:46:14 -0700

dblaikie added a comment.

In D135551#3847607 <https://reviews.llvm.org/D135551#3847607>, @aaron.ballman 
wrote:

> In D135551#3847444 <https://reviews.llvm.org/D135551#3847444>, @dblaikie 
> wrote:
>
>> I thought this was settled quite a while ago and enshrined in the style 
>> guide: https://llvm.org/docs/CodingStandards.html#assert-liberally
>>
>> `assert(0)` should not be used if something is reachable. We shouldn't have 
>> a "this violates an invariant, but if you don't have asserts enabled you do 
>> get some maybe-acceptable behavior".
>>
>> I feel fairly strongly that any cases of "reachable asserts" should be 
>> changed to valid error handling or `llvm_report_error` and remaining 
>> `assert(0)` should be transformed to `llvm_unreachable`. (also, ideally, 
>> don't use branch-to-`unreachable` where possible, instead assert the 
>> condition - in cases where the `if` has side effects, sometimes that's the 
>> nicest way to write it, but might be clearer, if more verbose to use a local 
>> variable for the condition, then assert that the variable is true (and have 
>> the requisite "variable might be unused" cast))
>
> I would be okay with that, but that's not what this patch was doing -- it was 
> changing `assert(0)` into an `llvm_unreachable` more mechanically, and I 
> don't think that's a valid transformation. The key, to me, is not losing the 
> distinction between "reaching here is a programming mistake that you'd make 
> during active development" vs "we never expect to reach this patch and want 
> to optimize accordingly."

I don't really think those are different things, though. Violating invariants 
is UB and there's no discussion to be had about how the program (in a 
non-asserts build) behaves when those invariants are violated - all bets are 
off, whether it's assert or unreachable.

> `__builtin_unreachable` changes the debugging landscape far too much for me 
> to want to see folks using it for "reaching here is a programming mistake" 
> situations, *especially* in RelWithDebInfo builds where optimizations are 
> enabled and may result in surprising call stacks and time travel debugging.

Generally LLVM's pretty hard to fathom in a non-asserts build anyway, right? 
(that's the first thing any of us do is reproduce with an assertions build that 
may fail miles away from where a crash occurred because an invariant was 
violated much earlier) - that `cast` won't crash/will continue on happily in a 
non-asserts build seems like a much larger hole to debuggability of a 
non-asserts build than any unreachable?

>>> Historically, we've usually used llvm_unreachable for situations where 
>>> we're saying "this code cannot be reached; if it can, something else has 
>>> gone seriously wrong." For example, in code like: int foo(SomeEnum E) { 
>>> switch (E) { case One: return 1; default: return 2; } 
>>> llvm_unreachable("getting here would be mysterious"); } and we've used 
>>> assert(0) for situations where we're saying "this code is possible to reach 
>>> only when there were mistakes elsewhere which broke invariants we were 
>>> relying on."
>>
>> I don't think those are different things though - violating invariants is ~= 
>> something going seriously wrong.
>>
>> (sorry, I guess I should debate this on the thread instead of here - but I 
>> think most of the folks on that thread did agree with the LLVM style 
>> guide/the direction here)
>>
>> This, I think was also discussed about a decade ago in the LLVM community 
>> and resulted in r166821/2962d9599e463265edae599285bbc6351f1cc0ef which 
>> specifically "Suggests llvm_unreachable over assert(0)" and is the policy of 
>> LLVM - this change is consistent with that policy.
>
> I don't have the context for those changes in my email either, but regardless 
> of what we thought ten years ago, we have code in the code base today that 
> assumes a difference in severity between kinds of unreachable statements so 
> we need to be careful when correcting mistakes. I think we're still in 
> agreement that `llvm_unreachable` should be preferred over `assert(0)` in 
> situations where the code is expected to be impossible to reach. I think 
> we're also still in agreement that "correct error reporting" is preferred 
> over `assert(0)`. Where we might still have daylight are the occasional 
> situations where `assert(0)` gives a better experience -- when the code is 
> possible to reach but reaching it signifies a developer (not user) mistake 
> that is plausible to make when doing new development, such as when misusing 
> the C interface (where there isn't always an error that should be reported 
> via the API). I think these uses should generally be rare, but I don't think 
> it's a "never do this" kind of situation.

That still seems like something that should be caught by an asserts build and 
is UB otherwise. If you're developing against LLVM's APIs in a non-assertions 
build, there's a lot of other invariants that won't be checked (`cast` being a 
pretty core example) and will make debugging really difficult.

> `llvm_unreachable` asserts in debug mode, so it has the nice property we need 
> when doing new development. But it doesn't convey the distinction in 
> semantics. Maybe another approach is: `#define developer_bugcheck(msg) 
> llvm_unreachable(msg)` (or something along those lines). We still get the 
> assertion in debug mode, we then get the better optimization in release mode, 
> but we don't lose the semantic information in the code base. It doesn't 
> really help the RelWithDebInfo case though (where call stacks may be 
> unrecognizable as a result of time travel optimizations) but maybe that's a 
> tradeoff worth making?

I don't really see the distinction, though - it's a violated invariant, which 
is a developer bug, whether it's from an assert on unreachable. They're both 
expressing invariants - not of different strength. "Valid code should never 
reach this branch" and "valid code should never produce a false result here".

I'd really like to avoid what I see as cleanup - replacing one construct 
(assert(false)) with another (llvm_unreachable) of identical (to me) 
contractual strength - being slowed because of this distinction. Replacing 
reachable-unreachables, or reachable-false-assertions with llvm_report_error, 
to me, is orthogonal to this cleanup.

(& for the C API - I think assertions/unreachable are the right thing, not 
llvm_report_error - I'm sure there's many more ways to violate the C API 
contract that would result in inexplicable/confusing behavior and that will 
only ever be protected by assertions, than we might cover by a few 
llvm_report_errors on the interface & that should be the direction we encourage 
people to go down - develop with assertions enabled)

Basically llvm_report_error ends up/should be a "we couldn't find a better way 
to do error handling here, but it is really a reachable error we have to do 
something with" - each one is a bug in the library-ness of LLVM. (or it's used 
up in a tool implementation (above the library layer) and then it's fine/just a 
convenient way to error/exit, though probably still isn't so good for 
diagnostic quality from such tools). If it's not reachable by a correct usage 
(be it from a command line tool or API) then it should be an assertion, not an 
llvm_report_error.

But I know this issue comes up from time to time and I've yet to figure out a 
way to come to shared understandings on it :/ so I don't expect my rather 
narrow and firm definition to necessarily carry the day on this.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135551/new/

https://reviews.llvm.org/D135551

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D135551: [clang] replace `assert(0)` with `llvm_unreachable` NFC

Reply via email to