ankurrj7 wrote:
Reposting a tighter note after rechecking the benchmark scope.
This is only about this PR branch:
- PR branch head used for the Clang numbers:
`2be52f1b4c7dd7907b08f554c0160550727195cb`
- The separate SLPVectorizer experiment is not in this branch and is not
included in the numbers below.
- No optimizer-specific flags or SLP changes were used for the benchmark.
What the patch is trying to do is fairly small: when
`-fcoverage-call-continuations` is enabled, Clang emits an extra counter at the
continuation point after a call. If the call returns normally, that counter is
incremented. If the call exits, longjmps, throws out, or otherwise does not
return to that point, the continuation counter is not hit. Coverage mapping can
then stop treating "the call was reached" as equivalent to "source after the
call was reached".
Default source coverage is unchanged unless the new flag is passed.
### Small behavior checks
#### 1. The block-after-call case from the issue
```c
#include <stdio.h>
int main(int argc, char **argv) {
{
if (argc > 2)
return 2;
printf("one\n");
}
printf("two\n");
return 0;
}
```
Run with no extra arguments. Runtime prints both lines:
```text
one
two
```
Default Clang source coverage still shows the known bad line:
```text
7| 1| printf("one\n");
8| 1| }
9| 0| printf("two\n");
10| 1| return 0;
```
With this PR's new mode:
```text
7| 1| printf("one\n");
8| 1| }
9| 1| printf("two\n");
10| 1| return 0;
```
#### 2. A call chain that exits
```c
#include <stdio.h>
#include <stdlib.h>
static void fun2(void) {
puts("in fun2");
exit(0);
puts("after exit in fun2");
}
static void fun1(void) {
fun2();
puts("after fun2 in fun1");
}
int main(void) {
fun1();
puts("after fun1 in main");
return 1;
}
```
Runtime only prints:
```text
in fun2
```
With `-fcoverage-call-continuations`, the lines after calls that never resume
are not marked covered:
```text
6| 1| exit(0);
7| 0| puts("after exit in fun2");
...
11| 1| fun2();
12| 0| puts("after fun2 in fun1");
...
16| 1| fun1();
17| 0| puts("after fun1 in main");
18| 0| return 1;
```
#### 3. `setjmp` / `longjmp`
```c
#include <setjmp.h>
#include <stdio.h>
static jmp_buf env;
static void jump_out(void) {
longjmp(env, 1);
puts("after longjmp in jump_out");
}
int main(void) {
if (setjmp(env) == 0) {
puts("first");
jump_out();
puts("after jump_out in first path");
return 2;
}
puts("second");
return 0;
}
```
Runtime:
```text
first
second
```
Default Clang source coverage misses the second return from `setjmp` and
incorrectly carries coverage through the `longjmp` path:
```text
12| 1| if (setjmp(env) == 0) {
13| 1| puts("first");
14| 1| jump_out();
15| 1| puts("after jump_out in first path");
16| 1| return 2;
...
19| 0| puts("second");
```
With `-fcoverage-call-continuations`:
```text
12| 2| if (setjmp(env) == 0) {
13| 1| puts("first");
14| 1| jump_out();
15| 0| puts("after jump_out in first path");
16| 0| return 2;
...
19| 1| puts("second");
20| 1| return 0;
```
### Benchmark
I reran this with a fresh source file specifically for this PR. This is not
using the separate SLP experiment. The Clang binary reports:
```text
clang version 23.0.0git (https://github.com/llvm/llvm-project.git
2be52f1b4c7dd7907b08f554c0160550727195cb)
```
Benchmark setup:
- Build mode: `-O2 -g`
- Clang default: `-fprofile-instr-generate -fcoverage-mapping`
- Clang continuation mode: `-fprofile-instr-generate -fcoverage-mapping
-fcoverage-call-continuations`
- GCC: `--coverage`
- ICC classic: `-prof-gen=srcpos`
- Runtime input: `12000000` loop iterations, 3 samples
| Case | Runtime samples (s) | Binary size | Counter slots |
|---|---:|---:|---:|
| Clang default source coverage | `0.82,0.82,0.82` | 124880 | 41 |
| Clang with `-fcoverage-call-continuations` | `0.82,0.82,0.82` | 127128 | 113 |
| GCC/gcov | `0.58,0.58,0.60` | 60752 | n/a |
| ICC `-prof-gen=srcpos` | `0.96,0.97,0.97` | 147520 | n/a |
The main measurable cost here is the counter count: this PR's mode adds
continuation counters after calls, so the counter slots increased from 41 to
113 on this call-heavy test. Runtime was the same as default Clang source
coverage in this local run. I would not over-read the absolute GCC/ICC timings;
I included them only as context for the same benchmark shape.
<details>
<summary>Benchmark source shape</summary>
```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif
static volatile uint64_t sink;
static uint64_t rotl(uint64_t x, unsigned r) {
return (x << r) | (x >> (64 - r));
}
#define LEAF(N) \
static NOINLINE uint64_t leaf_##N(uint64_t x) { \
x ^= UINT64_C(0x9e3779b97f4a7c15) + (uint64_t)(N); \
x = rotl(x, ((N) % 17) + 5);
\
return x + (x >> (((N) % 11) + 3));
\
}
LEAF(0) LEAF(1) LEAF(2) LEAF(3)
LEAF(4) LEAF(5) LEAF(6) LEAF(7)
LEAF(8) LEAF(9) LEAF(10) LEAF(11)
LEAF(12) LEAF(13) LEAF(14) LEAF(15)
LEAF(16) LEAF(17) LEAF(18) LEAF(19)
LEAF(20) LEAF(21) LEAF(22) LEAF(23)
LEAF(24) LEAF(25) LEAF(26) LEAF(27)
LEAF(28) LEAF(29) LEAF(30) LEAF(31)
static NOINLINE uint64_t call_chain(uint64_t x) {
x ^= leaf_0(x);
x ^= leaf_1(x);
x ^= leaf_2(x);
x ^= leaf_3(x);
x ^= leaf_4(x);
x ^= leaf_5(x);
x ^= leaf_6(x);
x ^= leaf_7(x);
x ^= leaf_8(x);
x ^= leaf_9(x);
x ^= leaf_10(x);
x ^= leaf_11(x);
x ^= leaf_12(x);
x ^= leaf_13(x);
x ^= leaf_14(x);
x ^= leaf_15(x);
x ^= leaf_16(x);
x ^= leaf_17(x);
x ^= leaf_18(x);
x ^= leaf_19(x);
x ^= leaf_20(x);
x ^= leaf_21(x);
x ^= leaf_22(x);
x ^= leaf_23(x);
x ^= leaf_24(x);
x ^= leaf_25(x);
x ^= leaf_26(x);
x ^= leaf_27(x);
x ^= leaf_28(x);
x ^= leaf_29(x);
x ^= leaf_30(x);
x ^= leaf_31(x);
return x;
}
static NOINLINE uint64_t branchy(uint64_t x, uint64_t i) {
if ((x ^ i) & 1)
x += leaf_3(x);
else
x += leaf_19(x);
if ((x + i) & 8)
x ^= leaf_7(x);
else
x ^= leaf_29(x);
return x;
}
int main(int argc, char **argv) {
uint64_t iterations = 1200000;
if (argc > 1)
iterations = strtoull(argv[1], 0, 0);
uint64_t x = UINT64_C(0x123456789abcdef0);
for (uint64_t i = 0; i < iterations; ++i) {
x = call_chain(x + i);
if ((i & 255) == 0)
x ^= branchy(x, i);
}
sink = x;
printf("%" PRIu64 "\n", x);
return 0;
}
```
</details>
### Validation
- `git diff --check origin/main...HEAD`: clean
- `git-clang-format` with the local LLVM formatter: clean
- Focused tests: `clang/test/CoverageMapping/call-continuations.c`,
`clang/test/CoverageMapping/call-continuations-tight.c`, and
`clang/test/Driver/coverage-call-continuations.c`: all passed
- Broader `clang/test/CoverageMapping`: 77/77 passed
My current read: this is good enough for design review as an opt-in coverage
mode. I would still expect the main review discussion to be about the
counter-growth tradeoff and whether this should be the supported knob for users
who need precise post-call coverage.
https://github.com/llvm/llvm-project/pull/201079
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits