On Tue, 10 Mar 2026 at 02:16, Ranier Vilela <[email protected]> wrote: > In the functions *systable_beginscan* and *systable_beginscan_ordered*, > is possible a small optimization. > The array *idxkey* can be constructed in one go with a single call to mempcy. > The excess might not make much of a difference, but I think it's worth the > effort.
This whole thread reeks of you running a static analyser to find optimisation opportunities. If you want to find these, then please be aware that static analysis of code is pretty much useless for this purpose. It has no idea how hot the code in question is. If you want to find places that actually matter, try using perf. Even perf top --pid=N will allow you to see which functions matter enough to warrant further analysis. By all means, if you find something interesting and see a way to make a meaningful improvement, then post about it and show a benchmark script and results which demonstrate the performance gain. For the patch and the analysis that's been done on it so far, please be aware that the struct size that you use will be critical to both the code emitted by the compiler and to the performance. On my machine, ScanKeyData is 72 bytes. None of the godbolt.org links I looked at got this right. One had 8 bytes, and another had 24 bytes. With the default -march flags, you're probably only going to get SSE instructions which will move 16 bytes at a time. Moving 72 bytes is going to need 4 of those SSE moves and a MOV for the 8 byte remainder. That's 5 instructions. 8 bytes is only 1 instruction, and 24 is 2 instructions (16+8). I've forgotten the exact thresholds, but both gcc and clang will inline fixed-sized memcpys below something like 256-512 bytes. I assume the compiler authors did a reasonable amount of analysis to discover those thresholds. 4 ScanKeyData is 288 bytes, which is pretty close to that threshold. Flicking through our code, I see most systable_beginscan() calls are only using 1-2 scan keys. I did see some using 3. I didn't look at all of them, but the ones I saw are not even near the threshold for not inlining the memcpy. But, by all means, prove me wrong and do something like add logging in systable_beginscan and run the regression tests and use the logs to build a histogram of the calls by nkeys. For me, I think the unmodified code is better as the memcpy is a fixed size and can be inlined by the compiler. I might feel different about it if the outer loop could disappear, but that's not possible since sk_attno needs to be set. I also doubt that it matters very much as most of the time scans to catalogue tables will be cached in sys/rel/cat caches. A few tables don't, like pg_inherits, possibly others. But by all means, prove me wrong with perf results and the query that shows systable_beginscan using a meaningful amount of CPU. On a final note, please please stop thinking your static analyser is going to help you improve the performance of PostgreSQL. Go and learn a new tool that's actually useful and fit for the purpose. There's likely years of work doing small things like this in places that actually matter. You seem to have lots of energy for PostgreSQL, so I highly recommend that you take the time and go and learn to use a tool that finds those for you. You can then focus on the places it highlights for you. Otherwise, you're just wasting your time and all of ours. David
