Re: Indexes on expressions with multiple columns and operators

2025-09-22 Thread Andrei Lepikhov

On 22/9/2025 15:37, Frédéric Yhuel wrote:
I wonder if this is an argument in favour of decoupling the sample size 
and the precision of the statistics. Here, we basically want the sample 
size to be as big as the table in order to include the few (NULL, 
WARNING) values.
I also have seen how repeating ANALYZE on the same database drastically 
changes query plans ;(.
It seems to me that with massive samples, many of the ANALYZE algorithms 
should be rewritten. In principle, statistical hooks exist. So, it is 
possible to invent an independent table analyser which will scan the 
whole table to get precise statistics.


--
regards, Andrei Lepikhov




Re: Indexes on expressions with multiple columns and operators

2025-09-22 Thread Frédéric Yhuel




On 9/20/25 18:51, Tom Lane wrote:

I concluded that maybe I was overthinking this part.  We only really
need to check the rowcount estimate, since the indexscan cost estimate
is already okay.  And stats_ext.sql seems to have gotten away with
assuming that rowcount estimates are reliably reproducible on
not-too-large tables.  This bug affects use of extended statistics
too, so a test using those is good enough to show it's fixed; we don't
really need to use an expression index for the purpose.  So, I added a
test case in stats_ext.sql and pushed it.


OK, great! Thanks for the detailed explanation.

Regarding extended statistics, it's unfortunate that they cannot be used 
in this case. Multivariate MCV statistics work as long as the number of 
rows in the table is reasonably small (100K) and the STATISTICS value 
for a column is high enough (it doesn't matter which column, because 
only the sample size matters).


I wonder if this is an argument in favour of decoupling the sample size 
and the precision of the statistics. Here, we basically want the sample 
size to be as big as the table in order to include the few (NULL, 
WARNING) values.


Or maybe we need a different kind of extended statistics?




Re: Indexes on expressions with multiple columns and operators

2025-09-22 Thread Frédéric Yhuel




On 9/22/25 15:57, Andrei Lepikhov wrote:

On 22/9/2025 15:37, Frédéric Yhuel wrote:
I wonder if this is an argument in favour of decoupling the sample 
size and the precision of the statistics. Here, we basically want the 
sample size to be as big as the table in order to include the few 
(NULL, WARNING) values.
I also have seen how repeating ANALYZE on the same database drastically 
changes query plans ;(.
It seems to me that with massive samples, many of the ANALYZE algorithms 
should be rewritten. In principle, statistical hooks exist. So, it is 
possible to invent an independent table analyser which will scan the 
whole table to get precise statistics.




Interesting! I wonder how difficult it would be.

However, in this specific case, I realised that it wouldn't solve the 
issue of ANALYZE being triggered when there are zero rows with (ackid, 
crit) = (NULL, WARNING).


Partitioning would still work in this case, though, because ackid's 
null_frac would be zero for the partition containing the 'WARNING' value.


I wonder if we could devise another kind of extended statistic that 
would provide these "partitioned statistics" without actually partitioning.