ankurrj7 wrote:

Reposting a tighter note after rechecking the benchmark scope.

This is only about this PR branch:

- PR branch head used for the Clang numbers: 
`2be52f1b4c7dd7907b08f554c0160550727195cb`
- The separate SLPVectorizer experiment is not in this branch and is not 
included in the numbers below.
- No optimizer-specific flags or SLP changes were used for the benchmark.

What the patch is trying to do is fairly small: when 
`-fcoverage-call-continuations` is enabled, Clang emits an extra counter at the 
continuation point after a call. If the call returns normally, that counter is 
incremented. If the call exits, longjmps, throws out, or otherwise does not 
return to that point, the continuation counter is not hit. Coverage mapping can 
then stop treating "the call was reached" as equivalent to "source after the 
call was reached".

Default source coverage is unchanged unless the new flag is passed.

### Small behavior checks

#### 1. The block-after-call case from the issue

```c
#include <stdio.h>

int main(int argc, char **argv) {
  {
    if (argc > 2)
      return 2;
    printf("one\n");
  }
  printf("two\n");
  return 0;
}
```

Run with no extra arguments. Runtime prints both lines:

```text
one
two
```

Default Clang source coverage still shows the known bad line:

```text
    7|      1|    printf("one\n");
    8|      1|  }
    9|      0|  printf("two\n");
   10|      1|  return 0;
```

With this PR's new mode:

```text
    7|      1|    printf("one\n");
    8|      1|  }
    9|      1|  printf("two\n");
   10|      1|  return 0;
```

#### 2. A call chain that exits

```c
#include <stdio.h>
#include <stdlib.h>

static void fun2(void) {
  puts("in fun2");
  exit(0);
  puts("after exit in fun2");
}

static void fun1(void) {
  fun2();
  puts("after fun2 in fun1");
}

int main(void) {
  fun1();
  puts("after fun1 in main");
  return 1;
}
```

Runtime only prints:

```text
in fun2
```

With `-fcoverage-call-continuations`, the lines after calls that never resume 
are not marked covered:

```text
    6|      1|  exit(0);
    7|      0|  puts("after exit in fun2");
...
   11|      1|  fun2();
   12|      0|  puts("after fun2 in fun1");
...
   16|      1|  fun1();
   17|      0|  puts("after fun1 in main");
   18|      0|  return 1;
```

#### 3. `setjmp` / `longjmp`

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

static void jump_out(void) {
  longjmp(env, 1);
  puts("after longjmp in jump_out");
}

int main(void) {
  if (setjmp(env) == 0) {
    puts("first");
    jump_out();
    puts("after jump_out in first path");
    return 2;
  }

  puts("second");
  return 0;
}
```

Runtime:

```text
first
second
```

Default Clang source coverage misses the second return from `setjmp` and 
incorrectly carries coverage through the `longjmp` path:

```text
   12|      1|  if (setjmp(env) == 0) {
   13|      1|    puts("first");
   14|      1|    jump_out();
   15|      1|    puts("after jump_out in first path");
   16|      1|    return 2;
...
   19|      0|  puts("second");
```

With `-fcoverage-call-continuations`:

```text
   12|      2|  if (setjmp(env) == 0) {
   13|      1|    puts("first");
   14|      1|    jump_out();
   15|      0|    puts("after jump_out in first path");
   16|      0|    return 2;
...
   19|      1|  puts("second");
   20|      1|  return 0;
```

### Benchmark

I reran this with a fresh source file specifically for this PR. This is not 
using the separate SLP experiment. The Clang binary reports:

```text
clang version 23.0.0git (https://github.com/llvm/llvm-project.git 
2be52f1b4c7dd7907b08f554c0160550727195cb)
```

Benchmark setup:

- Build mode: `-O2 -g`
- Clang default: `-fprofile-instr-generate -fcoverage-mapping`
- Clang continuation mode: `-fprofile-instr-generate -fcoverage-mapping 
-fcoverage-call-continuations`
- GCC: `--coverage`
- ICC classic: `-prof-gen=srcpos`
- Runtime input: `12000000` loop iterations, 3 samples

| Case | Runtime samples (s) | Binary size | Counter slots |
|---|---:|---:|---:|
| Clang default source coverage | `0.82,0.82,0.82` | 124880 | 41 |
| Clang with `-fcoverage-call-continuations` | `0.82,0.82,0.82` | 127128 | 113 |
| GCC/gcov | `0.58,0.58,0.60` | 60752 | n/a |
| ICC `-prof-gen=srcpos` | `0.96,0.97,0.97` | 147520 | n/a |

The main measurable cost here is the counter count: this PR's mode adds 
continuation counters after calls, so the counter slots increased from 41 to 
113 on this call-heavy test. Runtime was the same as default Clang source 
coverage in this local run. I would not over-read the absolute GCC/ICC timings; 
I included them only as context for the same benchmark shape.

<details>
<summary>Benchmark source shape</summary>

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

static volatile uint64_t sink;

static uint64_t rotl(uint64_t x, unsigned r) {
  return (x << r) | (x >> (64 - r));
}

#define LEAF(N)                                                                \
  static NOINLINE uint64_t leaf_##N(uint64_t x) {                              \
    x ^= UINT64_C(0x9e3779b97f4a7c15) + (uint64_t)(N);                         \
    x = rotl(x, ((N) % 17) + 5);                                                
\
    return x + (x >> (((N) % 11) + 3));                                         
\
  }

LEAF(0)  LEAF(1)  LEAF(2)  LEAF(3)
LEAF(4)  LEAF(5)  LEAF(6)  LEAF(7)
LEAF(8)  LEAF(9)  LEAF(10) LEAF(11)
LEAF(12) LEAF(13) LEAF(14) LEAF(15)
LEAF(16) LEAF(17) LEAF(18) LEAF(19)
LEAF(20) LEAF(21) LEAF(22) LEAF(23)
LEAF(24) LEAF(25) LEAF(26) LEAF(27)
LEAF(28) LEAF(29) LEAF(30) LEAF(31)

static NOINLINE uint64_t call_chain(uint64_t x) {
  x ^= leaf_0(x);
  x ^= leaf_1(x);
  x ^= leaf_2(x);
  x ^= leaf_3(x);
  x ^= leaf_4(x);
  x ^= leaf_5(x);
  x ^= leaf_6(x);
  x ^= leaf_7(x);
  x ^= leaf_8(x);
  x ^= leaf_9(x);
  x ^= leaf_10(x);
  x ^= leaf_11(x);
  x ^= leaf_12(x);
  x ^= leaf_13(x);
  x ^= leaf_14(x);
  x ^= leaf_15(x);
  x ^= leaf_16(x);
  x ^= leaf_17(x);
  x ^= leaf_18(x);
  x ^= leaf_19(x);
  x ^= leaf_20(x);
  x ^= leaf_21(x);
  x ^= leaf_22(x);
  x ^= leaf_23(x);
  x ^= leaf_24(x);
  x ^= leaf_25(x);
  x ^= leaf_26(x);
  x ^= leaf_27(x);
  x ^= leaf_28(x);
  x ^= leaf_29(x);
  x ^= leaf_30(x);
  x ^= leaf_31(x);
  return x;
}

static NOINLINE uint64_t branchy(uint64_t x, uint64_t i) {
  if ((x ^ i) & 1)
    x += leaf_3(x);
  else
    x += leaf_19(x);

  if ((x + i) & 8)
    x ^= leaf_7(x);
  else
    x ^= leaf_29(x);

  return x;
}

int main(int argc, char **argv) {
  uint64_t iterations = 1200000;
  if (argc > 1)
    iterations = strtoull(argv[1], 0, 0);

  uint64_t x = UINT64_C(0x123456789abcdef0);
  for (uint64_t i = 0; i < iterations; ++i) {
    x = call_chain(x + i);
    if ((i & 255) == 0)
      x ^= branchy(x, i);
  }

  sink = x;
  printf("%" PRIu64 "\n", x);
  return 0;
}
```
</details>

### Validation

- `git diff --check origin/main...HEAD`: clean
- `git-clang-format` with the local LLVM formatter: clean
- Focused tests: `clang/test/CoverageMapping/call-continuations.c`, 
`clang/test/CoverageMapping/call-continuations-tight.c`, and 
`clang/test/Driver/coverage-call-continuations.c`: all passed
- Broader `clang/test/CoverageMapping`: 77/77 passed

My current read: this is good enough for design review as an opt-in coverage 
mode. I would still expect the main review discussion to be about the 
counter-growth tradeoff and whether this should be the supported knob for users 
who need precise post-call coverage.


https://github.com/llvm/llvm-project/pull/201079
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to