> On Mar 18, 2026, at 05:33, Tom Lane <[email protected]> wrote:
>
> I got an off-list report that a query like this consumes
> an unreasonable amount of memory:
>
> SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i),
> '$[*] ? (@ < $)');
>
> For me, that eats about 6GB by the time it's done executing.
> If that doesn't seem like a lot to you, just add another zero to the
> generate_series call, and then it'll be more like 600GB, because the
> leakage is O(N^2).
>
> Admittedly, this isn't an especially useful query: its runtime is
> also O(N^2), because that path expression basically requires us to
> compare every element of the input JSON array to every other element.
> But it's not cool that it leaks so much memory while at it.
>
> I poked into this and found that the leakage is entirely composed of
> "JsonValueList"s that are built during path evaluation and then just
> left to rot until the end of jsonb_path_query(). We can fix it by
> being careful to free those lists on the way out of each jsonpath
> evaluation function that creates one. However, just doing that would
> mean adding pfree overhead on top of palloc overhead, so I went a bit
> further and reimplemented JsonValueList to be more compact and cheaper
> to allocate/free. The attached seems to be a bit faster than the
> existing code as well as not leaking so much memory. See the draft
> commit message for more details.
>
> regards, tom lane
This patch looks like a big win. It not only saves memory, but also makes the
query much faster.
I tested the query on my MacBook M4, increasing the iteration count from 10000
to 50000.
Current master (3b4c2b9db25):
```
evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM
generate_series(1,50000) i), '$[*] ? (@ < $)');
Time: 208581.771 ms (03:28.582)
evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM
generate_series(1,50000) i), '$[*] ? (@ < $)');
Time: 217269.595 ms (03:37.270)
```
With the patch:
```
evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM
generate_series(1,50000) i), '$[*] ? (@ < $)');
Time: 18674.580 ms (00:18.675)
evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM
generate_series(1,50000) i), '$[*] ? (@ < $)');
Time: 18889.329 ms (00:18.889)
```
My observations were:
* Before the patch, the backend process memory usage fluctuated between roughly
50GB and 145GB, while CPU usage stayed around 30%.
* With the patch, the backend process memory usage stayed stable at around
30MB, while CPU usage stayed around 100%.
After reviewing the patch, I thought JsonValueListLength() might be worth
optimizing, since it is O(n). I tried adding an ntotal_items field to
JsonValueList to track the total number of items, similar to the last pointer
that is only meaningful in the base chunk. But that did not help in my test,
and I realized JsonValueListLength() is not on the hottest path, so I dropped
that idea.
From the MacOS Instruments tool, the most expensive parts seem to be
fillJsonbValue, JsonbIteratorNext, cmp_var_common, and cmp_numerics. But those
look like separate topics.
Overall, this looks like a solid patch.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/