Hi Everyone,

The C and C++ Compatibility Study Group, while working on the new standard `#embed` preprocessor parameter that mirrors the `clang::offset(...)` and `gnu::offset(...)` parameters, had someone raise a concern that the order of the parameters may be confusing. The concern came from the June 4th, 2025 meeting:

https://github.com/sg22-c-cpp-standard-compatibility/sg-compatibility/blob/main/README.md#june-4th-2025

Background
==========

Throughout the rest of this text, `clang::offset`, `gnu::offset`, and the almost-standard `offset` parameter will be used interchangeably in prose. They represent the same preprocessor embed parameter, with the same semantics. Also, a resource named `<data.bin>` is a resource with exactly 10 bytes and is considered as such when put in an `#embed` directive.

While the following 2 invocations of `#embed` are identical and produce exactly the same data:

-----
#embed <data.bin> clang::offset(1) limit(3) /* ONE */
#embed <data.bin> limit(3) clang::offset(1) /* TWO */
-----

some people questioned whether the difference in written order might lead users to believe that the two do not produce identical effects (e.g., that `offset` is always calculated first based on the raw file size, and then `limit` is applied after, or vice-versa).

The Core Proposal
=================

Following from the background, some people advocated for a warning/error when the parameters are written in the "wrong order". That is, since `limit` always applies after `offset`, they wanted the standard to mandate that such parameters always be written in a specific order: `/* ONE */` would be fine but `/* TWO */` should trigger an error.

It was then pointed out that, based on the standard wording, this can also apply to other parameters. For example, `limit(0)` or `offset(SIZE_MAX)` can make a resource that has data be considered "empty". In particular, using `<data.bin>` again:

-----
#embed <data.bin> limit(0) if_empty("meow") /* THREE */
#embed <data.bin> if_empty("meow") limit(0) /* FOUR */
-----

`/* FOUR */`, under the previous ideals, should issue a diagnostic since `if_empty` is written before the `limit` that turns the resource empty, while `/* THREE */` would issue no diagnostics. This led to the formulation of the following guidance:

- `offset` must appear before `limit`.
- `limit` and/or `offset` must appear before any of `prefix`, `suffix`, or `if_empty`.

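To make the two rules concrete, here is a minimal C sketch of how they could be modelled. This is illustration only and not taken from GCC, Clang, or any proposed wording: the names `embed_effective_range`, `embed_order_ok`, and `EMBED_NO_LIMIT` are hypothetical, parameters are represented as bare name strings without their arguments, and only the plain spellings used in this email are recognized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model for discussion; not taken from GCC or Clang. */

/* Stand-in for "no limit(...) was written". */
#define EMBED_NO_LIMIT SIZE_MAX

/* Half-open byte range [begin, end) of the resource that is embedded. */
struct embed_range {
    size_t begin;
    size_t end;
};

/* offset is applied first and limit second, regardless of written order. */
struct embed_range embed_effective_range(size_t resource_size, size_t offset,
                                         size_t limit)
{
    struct embed_range r;
    size_t remaining;

    r.begin = offset < resource_size ? offset : resource_size;
    remaining = resource_size - r.begin;
    r.end = r.begin + (limit < remaining ? limit : remaining);
    return r;
}

static bool name_is_one_of(const char *name, const char *const *set, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (strcmp(name, set[i]) == 0)
            return true;
    return false;
}

/* Returns true when the written order satisfies both proposed rules:
 *   1. offset appears before limit;
 *   2. limit and offset appear before prefix, suffix, and if_empty.
 * Unrecognized parameter names are ignored. */
bool embed_order_ok(const char *const *names, size_t count)
{
    static const char *const offsets[] = { "offset", "clang::offset", "gnu::offset" };
    static const char *const decorators[] = { "prefix", "suffix", "if_empty" };
    bool seen_limit = false;
    bool seen_decorator = false;

    assert(names != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (name_is_one_of(names[i], offsets, 3)) {
            if (seen_limit || seen_decorator)
                return false; /* offset written after limit or a decorator */
        } else if (strcmp(names[i], "limit") == 0) {
            if (seen_decorator)
                return false; /* limit written after a decorator */
            seen_limit = true;
        } else if (name_is_one_of(names[i], decorators, 3)) {
            seen_decorator = true;
        }
    }
    return true;
}
```

Under this sketch, `/* ONE */` and `/* TWO */` both select bytes [1, 4) of the 10-byte `<data.bin>`, since `embed_effective_range(10, 1, 3)` does not depend on the written order. The order check accepts `/* ONE */` and `/* THREE */` and rejects `/* TWO */` and `/* FOUR */`. Whether a rejection would become a warning or an error is exactly the open question.
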
We are asking implementations how they feel about the above 2 rules and about implementing them.

To be extremely clear: `offset`, `clang::offset`, and `gnu::offset` always apply before the standard `limit(...)` parameter, both in Wording and in All Real Implementations, but neither imposes an order on how the parameters are written. To be more clear: a mandated written order is not how C23 specified it, and not how C++ has standardized it so far. As `#embed`'s principal author and its carrier through the last 7 years, I have not really had anyone come forward to say this was confusing or harmful, but this may simply be selection bias or simply that nobody has spoken up.

We note that some of this is weird. Again, consider the case of `/* FOUR */` from before:

-----
#embed <data.bin> if_empty("meow") limit(0) /* FOUR */
-----

If `<data.bin>` is an empty resource, would that mean the preceding `if_empty` is fine because `limit(0)` would not have any effect anyways? In an obvious sense, the diagnostic would apply anyways. But this is one of those things where I personally did not believe anyone would advocate for ordering requirements either way, so now I feel like I have to ask whether that's a quality-of-implementation thing anyone would care about in the first place. This is, again, in the face of the fact that the order of the parameters does not matter on any of the implementations, and that nobody has asked me, either in the run-up to standardization or after, whether this should be a thing.

The Questions
=============

Therefore, we'd like to poll the GCC developer community:

1. Does anyone think a diagnostic on the order will help prevent confusion with users, even if the semantics never change between invocations regardless of parameter order?

2. If the answer to (1) is yes, do we believe it should be a warning (recommended practice in Standard Speak) or an error (a Constraint Violation/Ill-Formed in Standard Speak)?

Sub-questions such as "an error, but only in pedantic mode" and similar can be golfed and bikeshedded after answering the first two questions.

A formalization of these semantics is going to be presented to WG21 and WG14 at some point. I'm gathering implementer feedback, and gauging implementers' willingness to change their existing implementations, in order to formulate a new paper:

https://isocpp.org/files/papers/P3731R0.html

Thank you for reading,

Björkus

CC:
Jakub Jelinek (who wrote about implementing this on RedHat; apologies if that's an inappropriate CC)
Joseph Myers (a WG14 regular who may have been tangentially interested in this question)