Missed compiler optimization issue in function select_rtable_names_for_explain

2024-05-22 Thread XChy

Hi everyone,

I'm a compiler developer working on detecting missed optimizations in
real-world applications. Recently, we found that LLVM misses a dead
store elimination opportunity in the PostgreSQL code on the master
branch:
https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/ruleutils.c#L3794


For the example below:

```
int dst[128];
memset(dst, 0, 128);
*unrelated = some_value;
dst[1] = 0;
dst[2] = 0;
```

LLVM cannot eliminate the redundant stores after the memset because the
intervening store to "unrelated" is treated as a clobber. If we move the
stores to "dst" ahead of the store to "unrelated", the compiler is no
longer confused and can remove them. See also the Compiler Explorer
link: https://godbolt.org/z/P9jnKod3v and the LLVM issue:
https://github.com/llvm/llvm-project/issues/88632
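
The two orderings can be sketched side by side as below. This is only an illustrative reduction: the function names, the `sum_ints` helper, and the `DST_LEN` macro are hypothetical and do not come from PostgreSQL or from the linked issue, and the sketch zeroes the whole array with `sizeof(dst)`. Both functions leave identical contents in memory; whether a given compiler version actually drops the redundant stores in either form would have to be checked in the generated code.

```c
#include <stddef.h>
#include <string.h>

#define DST_LEN 128   /* hypothetical array length for this sketch */

/* Hypothetical consumer, so that dst is actually read after initialization. */
static long sum_ints(const int *buf, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}

/* Interleaved order: the store through `unrelated` sits between the
 * memset and the zero stores to dst. */
long init_interleaved(int *unrelated, int some_value)
{
    int dst[DST_LEN];

    memset(dst, 0, sizeof(dst));
    *unrelated = some_value;   /* intervening store to other memory */
    dst[1] = 0;                /* already zero after the memset */
    dst[2] = 0;                /* already zero after the memset */
    return sum_ints(dst, DST_LEN);
}

/* Reordered: the zero stores directly follow the memset, and the
 * unrelated store comes last. */
long init_reordered(int *unrelated, int some_value)
{
    int dst[DST_LEN];

    memset(dst, 0, sizeof(dst));
    dst[1] = 0;
    dst[2] = 0;
    *unrelated = some_value;
    return sum_ints(dst, DST_LEN);
}
```

Calling either function with a pointer to an `int` sets that `int` to `some_value` and returns 0, so the reordering does not change observable behavior.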


To improve codegen quality, I think the source code could also be
changed, by reordering the initialization of the members, so that the
compiler can optimize it better. I'm not sure whether this counts as a
bug, so I'm posting the issue here.


If anyone could confirm this problem or post a patch for it, please let
me know. Thanks!



Best regards, Hongyu.


Re: Missed compiler optimization issue in function select_rtable_names_for_explain

2024-05-22 Thread XChy
> How is the memset in select_rtable_names_for_explain a dead-store? Even if
> memset calls could be optimized away from the EXPLAIN codepath I have a
> feeling it would have to be many in a tight loop for it to be measurable
> even?
>
> --
> Daniel Gustafsson


For the first question, I don't mean that the memset is the dead store. 
I mean that the stores with value "0" after the memset are dead:


```
    dpns.subplans = NIL;
    dpns.ctes = NIL;
    dpns.appendrels = NULL;
```
since the memset has written zeroes to the object "dpns", and these 
members are known to be zero.
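
A minimal sketch of why those stores are redundant, using a hypothetical `DemoNamespace` struct as a stand-in (it is not PostgreSQL's actual struct, and the function names are made up for illustration). It assumes, as is the case on mainstream platforms, that a null pointer is represented by all-zero bytes:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with three pointer members; NOT the real
 * struct used in ruleutils.c. */
typedef struct DemoNamespace
{
    void *subplans;
    void *ctes;
    void *appendrels;
} DemoNamespace;

/* Zero the struct with memset only. */
void init_zero_only(DemoNamespace *ns)
{
    memset(ns, 0, sizeof(*ns));
}

/* Zero the struct, then store null pointers into members again. */
void init_zero_then_assign(DemoNamespace *ns)
{
    memset(ns, 0, sizeof(*ns));
    ns->subplans = NULL;     /* redundant under the all-zero-null assumption */
    ns->ctes = NULL;         /* redundant under the all-zero-null assumption */
    ns->appendrels = NULL;   /* redundant under the all-zero-null assumption */
}

/* Returns 1 if both structs have identical bytes, 0 otherwise. */
int same_bytes(const DemoNamespace *a, const DemoNamespace *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

Initializing one struct with each function and comparing them with `same_bytes` shows that the extra assignments leave the bytes unchanged.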


For the second question, you are right: I haven't profiled it or
measured its performance impact. I just think it's worthwhile to
improve codegen quality when it doesn't affect readability, much like
adopting performance hints from a static analyzer.


Best regards, Hongyu.


Re: Missed compiler optimization issue in function select_rtable_names_for_explain

2024-05-22 Thread XChy


On 2024/5/22 18:55, Daniel Gustafsson wrote:

>> I mean that the stores with value "0" after the memset are dead:
>> ```
>>  dpns.subplans = NIL;
>>  dpns.ctes = NIL;
>>  dpns.appendrels = NULL;
>> ```
>> since the memset has written zeroes to the object "dpns", and these members
>> are known to be zero.
>
> They are known to be zero, but that's not entirely equivalent though is it?
> NIL is defined as ((List *) NULL) and NULL is typically defined as ((void *)
> 0), so sizeof(0) would be the size of an int and sizeof(NULL) would be the size
> of a void pointer.


The type or size doesn't matter here. At IR or assembly level, they are 
all zeroes.
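
As a rough illustration of that point, the sketch below inspects the object representation byte by byte. The helper names are hypothetical. Note that the C standard does not strictly require a null pointer to be stored as all-zero bytes; the sketch assumes a mainstream platform where it is:

```c
#include <stddef.h>

/* Returns 1 if every byte of the object is zero, 0 otherwise. */
int is_all_zero_bytes(const void *obj, size_t size)
{
    const unsigned char *p = obj;

    for (size_t i = 0; i < size; i++)
    {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}

/* A null pointer: pointer-sized, assumed to be all-zero bytes here. */
int null_pointer_is_zero_bytes(void)
{
    void *ptr = NULL;

    return is_all_zero_bytes(&ptr, sizeof(ptr));
}

/* An int zero: int-sized, all-zero bytes. */
int zero_int_is_zero_bytes(void)
{
    int zero = 0;

    return is_all_zero_bytes(&zero, sizeof(zero));
}
```

The two objects differ in type and possibly in size, but under the stated assumption each consists only of zero bytes, which is all that a preceding memset to zero needs to have written.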


My main point is that the "dpns.xxx" members are first filled with
zeroes by the memset, so overwriting them with zeroes in the following
stores is redundant. LLVM cannot remove the redundant overwrites because
of the initialization order. If we adjust the order in which "dpns.xxx"
are initialized, the compiler can remove those stores.


Does my explanation make sense to you?

Best regards, Hongyu.