Re: Indexes on expressions with multiple columns and operators

2025-09-27 Thread Frédéric Yhuel
On 9/22/25 23:15, Andrei Lepikhov wrote: I'm not sure I fully understand your case, but SQL Server demonstrates an interesting approach: they have a WHERE clause attached to statistics. So, having implemented this, you may separate the whole range of values inside the table into 'partitions'

Re: Indexes on expressions with multiple columns and operators

2025-09-25 Thread Frédéric Yhuel
On 9/23/25 12:20, Frédéric Yhuel wrote: On 9/22/25 23:15, Andrei Lepikhov wrote: I'm not sure I fully understand your case, but SQL Server demonstrates an interesting approach: they have a WHERE clause attached to statistics. So, having implemented this, you may separate the whole ran

Re: Indexes on expressions with multiple columns and operators

2025-09-25 Thread Frédéric Yhuel
On 9/23/25 15:31, Frédéric Yhuel wrote: To get back to the topic of partitioned statistics, do you know if SQL Server is smart enough to handle this case [1] that we discussed last year? (with filtered statistics) [1] https://www.postgresql.org/message-id/flat/b860c71a-7cab-4d88- ad87

Re: Indexes on expressions with multiple columns and operators

2025-09-22 Thread Frédéric Yhuel
On 9/22/25 15:57, Andrei Lepikhov wrote: On 22/9/2025 15:37, Frédéric Yhuel wrote: I wonder if this is an argument in favour of decoupling the sample size and the precision of the statistics. Here, we basically want the sample size to be as big as the table in order to include the few

Re: Indexes on expressions with multiple columns and operators

2025-09-22 Thread Frédéric Yhuel
On 9/20/25 18:51, Tom Lane wrote: I concluded that maybe I was overthinking this part. We only really need to check the rowcount estimate, since the indexscan cost estimate is already okay. And stats_ext.sql seems to have gotten away with assuming that rowcount estimates are reliably reprodu

Re: Indexes on expressions with multiple columns and operators

2025-09-19 Thread Frédéric Yhuel
On 9/18/25 18:40, Tom Lane wrote: The attached fixes things so it works like it did pre-a391ff3c3. Indeed, it works well! I spent some time trying to devise a test case, and was reminded of why I didn't have one before: it's hard to make a case that will be robust enough to not show diffs

Re: Indexes on expressions with multiple columns and operators

2025-09-17 Thread Frédéric Yhuel
Thank you Laurenz and Tom! I'm going to quote Tom's email here: On 9/17/25 16:41, Tom Lane wrote: =?UTF-8?Q?Fr=C3=A9d=C3=A9ric_Yhuel?= writes: Hello, in the following, I don't understand why: 1) the expression index isn't used in the first EXPLAIN The planner doesn't look for multi-clause ma

Indexes on expressions with multiple columns and operators

2025-09-17 Thread Frédéric Yhuel
Hello, in the following, I don't understand why: 1) the expression index isn't used in the first EXPLAIN 2) the number of estimated rows is completely off in the second EXPLAIN, whereas the planner could easily use the statistics of foo_f_idx. (SQL script attached, tested with master and v17)

Re: Indexes on expressions with multiple columns and operators

2025-09-17 Thread Frédéric Yhuel
On 9/17/25 16:57, Frédéric Yhuel wrote: Yes, Laurenz made a similar suggestion, but the problem is that I'm mostly interested in the estimated number of output rows... because in the real query, there's a very bad Hash Join above (the Nested Loop is *much* faster). BTW, I'

Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet

2025-08-19 Thread Frédéric Yhuel
On 8/19/25 17:37, Frits Hoogland wrote: The specific issue I see in certain cases leading to unreasonable swap usage is Linux workingset detection kicking in Do you have a way to highlight that precisely? I mean, can you prove that it is Linux workingset detection that is causing swapping?

Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet

2025-08-18 Thread Frédéric Yhuel
On 8/8/25 10:21, Frits Hoogland wrote: If swappiness is set to 0, but swap is available, some documentation suggests it will never use anonymous memory, however I found this not to be true, linux might still choose anonymous memory to reclaim. A bug in RHEL8 meant that swappiness was not t

Re: Poor row estimates from planner, stat `most_common_elems` sometimes missing for a text[] column

2025-06-06 Thread Frédéric Yhuel
On 6/5/25 23:52, Tom Lane wrote: The idea of treating lack of MCELEM differently from complete lack of stats still seems to have merit, though. Couldn't we count / estimate the number of distinct two-by-two elements, and use that instead of the default selectivity estimate?

Re: Poor row estimates from planner, stat `most_common_elems` sometimes missing for a text[] column

2025-06-05 Thread Frédéric Yhuel
On 6/5/25 17:42, Mark Frost wrote: Is there any good explanation for this behaviour? Preferably we’d like some way for proper `most_common_elems` statistics to be collected in our production database, in the hope that influences a good query plan to always be selected. most_common_elems h

Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized

2025-06-05 Thread Frédéric Yhuel
On 6/5/25 16:13, Frédéric Yhuel wrote: On 6/4/25 16:12, Dimitrios Apostolou wrote: In general I have noticed most operations are slower after a succesful pg_restore until VACUUM is complete, which is unfortunate as the database is huge and it takes days to run. Something I have on my list

Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized

2025-06-05 Thread Frédéric Yhuel
On 6/4/25 16:12, Dimitrios Apostolou wrote: In general I have noticed most operations are slower after a succesful pg_restore until VACUUM is complete, which is unfortunate as the database is huge and it takes days to run. Something I have on my list to try, is whether a COPY FREEZE would al

Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized

2025-06-04 Thread Frédéric Yhuel
On 6/3/25 17:34, Dimitrios Apostolou wrote: The backend process for each of the above ALTER TABLE commands, does not   parallelize the foreign key checks for the different partitions. I   know, because in the logs I see gigabytes of temporary files being   written, with the CONTEXT showing

Re: Why a bitmap scan in this case?

2024-12-20 Thread Frédéric Yhuel
On 12/20/24 09:16, Frédéric Yhuel wrote: On 12/19/24 20:09, Jon Zeppieri wrote: The table is freshly vacuumed. If I disable bitmap scans, it will do an index only scan, which performs better. For the bitmap heap scan, it says "Heap Blocks: exact=27393," whereas for the index

Re: Why a bitmap scan in this case?

2024-12-20 Thread Frédéric Yhuel
On 12/19/24 20:09, Jon Zeppieri wrote: The table is freshly vacuumed. If I disable bitmap scans, it will do an index only scan, which performs better. For the bitmap heap scan, it says "Heap Blocks: exact=27393," whereas for the index only scan, it's "Heap Fetches: 27701." So you have 100% h

Re: Cardinality estimate of the inner relation

2024-11-25 Thread Frédéric Yhuel
On 11/23/24 03:07, Andrei Lepikhov wrote: Thanks for the case provided! Thanks for your answer! I wonder if data science has invented a statistic or selectivity estimation technique that could tackle your case in general. As I see, we should touch the table of products to realise which s

Cardinality estimate of the inner relation

2024-11-22 Thread Frédéric Yhuel
My colleague Christophe Courtois and I have been trying to fix a bad plan for one of Dalibo's clients. It is a (probably well-known) problem with skewed data and a parameterized Nested Loop with an underestimation of the cardinality of the inner relation. Here is a test case (the script to cre

Re: Has gen_random_uuid() gotten much slower in v17?

2024-11-20 Thread Frédéric Yhuel
On 9/11/24 12:47, David Mullineux wrote: Good idea. Thanks.    I did check. It's not enabled by default but just in case I did another build. This time explicitly defining --disable- debug and --disable-cassert. And I tested. Still slower than old versions. 4.5 seconds is very surprising.

Re: Which side of a Merge Join gets executed first? Do both sides always get executed?

2023-12-20 Thread Frédéric Yhuel
Le 20/12/2023 à 20:04, Jerry Brenner a écrit : Thanks.  Does this make sense? * There are 3 nodes under the Merge Join * The first node is an InitPlan, due to the ANY(ARRAY()) - that gets executed and finds 0 matching rows * The second node is the outer node in the Merge Join and th

Re: Which side of a Merge Join gets executed first? Do both sides always get executed?

2023-12-20 Thread Frédéric Yhuel
Le 20/12/2023 à 15:40, Jerry Brenner a écrit : Whichever side gets executed first, is the execution of the side that would be second get short circuited if 0 rows are returned by the first side? Indeed, if 0 rows are returned from the outer relation, the scan of the inner relation is never

Re: Which side of a Merge Join gets executed first? Do both sides always get executed?

2023-12-20 Thread Frédéric Yhuel
Le 20/12/2023 à 15:40, Jerry Brenner a écrit : The attached query plan is from 11. We are getting Merge Joins on both sides of the UNION.  In both cases, the first node under the Merge Join returns 0 rows but the other side of the Merge Join (the one being sorted) is executed and that's wher