Re: Unfortunate Nested Loop + Missing Autovacuum

2025-02-24 Thread Lincoln Swaine-Moore
Thanks, Jeff, for the response--I did end up just analyzing the tables
manually as a stopgap. Resource consumption was a non-issue, as you
predicted (and the plan was corrected, though the estimates were still
slightly off).

With respect to the blocking of the autovacuum/analyze: no, those
operations shouldn't be running frequently in our case. Ditto re: database
restarts--this is out of my control because it would be DigitalOcean's
doing, but I don't see any evidence of it. Nor anything amiss in \dt+,
unfortunately.

I'll try to figure out if I can get access to the logs to search for
cancellations. Do you happen to know what that would look like if I'm
grep-ing for it? And do you have any other guesses about possible
explanations?
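For reference, autovacuum cancellations show up in the server log as
"canceling autovacuum task" ERROR lines, followed by a CONTEXT line naming
the table. A sketch of the search, using a sample file as a stand-in for
the real log (whose location depends on your log_directory setting):

```shell
# Write a sample of the lines to look for; the real file lives wherever
# log_directory points (e.g. /var/log/postgresql/ -- an assumption here).
printf '%s\n' \
  'ERROR:  canceling autovacuum task' \
  'CONTEXT:  automatic vacuum of table "mydb.public.mytable"' \
  > /tmp/pg_sample.log

# The search itself; -A1 keeps the CONTEXT line that names the table.
grep -A1 'canceling autovacuum task' /tmp/pg_sample.log
```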

Thanks again for your help.

Lincoln



On Sun, Feb 23, 2025 at 6:09 PM Jeff Janes wrote:

> On Sun, Feb 23, 2025 at 5:49 PM Lincoln Swaine-Moore <
> [email protected]> wrote:
>
>> Thanks for the reply! I tried the analysis on our much shorter staging
>> table and it did change the plan. I haven’t tried it on the production ones
>> because my understanding is that the autovacuum process is gentler with
>> resource consumption and I didn’t want to gum things up in the meantime.
>> But that may be false or avoidable.
>>
>
> The default setting of vacuum_cost_delay is zero, while the default
> setting of autovacuum_vacuum_cost_delay is 2ms (20ms in versions before
> 12).  So you can give a manual run the same resource consumption as an
> autorun just by changing vacuum_cost_delay in that session so that it has
> the same value as autovacuum_vacuum_cost_delay.
> Also, if you just do ANALYZE (rather than VACUUM ANALYZE) the resource
> usage should be rather modest anyway.
>
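A minimal sketch of that session tweak (the table name is a placeholder):

```sql
-- Throttle a manual run the same way autovacuum would be throttled
-- (2ms matches the modern default of autovacuum_vacuum_cost_delay).
SET vacuum_cost_delay = '2ms';

-- ANALYZE alone is usually cheap enough anyway.
ANALYZE my_table;
```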
> Another difference is that autovac will cancel itself if it detects it is
> blocking something else, while a manual vac/analyze operation will not
> do that.  Normal operations (DML) don't block against vacuum anyway, only
> things like index creation or partition maintenance do that.  But if those
> types of operation are frequent, then doing a manual VACUUM or ANALYZE
> could indeed gum things up.  Also, if those operations are frequent, it
> could explain the missing autovac.  If every auto attempt gets cancelled
> before it completes, then it will never complete.  pg_stat_user_tables
> doesn't reflect cancelled vacuum or analyze so those will go missing.  (You
> should see mentions of cancelled autovac operations in the log file though.)
>
> Database restarts will also interrupt vacuums.  So if your database is
> shutdown and restarted regularly (for cold back-ups, or just out of some
> misguided belief that restarting occasionally is a good practice) and the
> period between restarts is shorter than how long it would take autovac to
> run, this could also explain the lack of completed autovacs.  Also, if a
> table qualifies for both auto vacuum and auto analyze, the vacuum is done
> first.  So even if auto analyze would be fast by itself, it still won't
> complete if auto vacuum is slow and never gets to finish.
>
> It is possible to override your vacuum settings on a per-table basis.  So
> another possible explanation for the missing autovacs is that those two
> tables have been specifically configured to disable autovacuum on them and
> only them.  If you use psql, \dt+ (but not regular \dt) will show such
> customizations.  I'm sure other tools also have ways to detect this, but I
> don't know what those ways are off the top of my head.
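The customizations that \dt+ displays are stored in pg_class.reloptions,
so a tool-agnostic check (table names are placeholders) could be:

```sql
-- A non-NULL reloptions value reveals per-table overrides,
-- e.g. {autovacuum_enabled=false}
SELECT relname, reloptions
FROM pg_class
WHERE relname IN ('table_one', 'table_two');
```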
>
> Cheers,
>
> Jeff
>


-- 
Lincoln Swaine-Moore


Re: Unfortunate Nested Loop + Missing Autovacuum

2025-02-24 Thread Andrei Lepikhov

On 23/2/2025 23:49, Lincoln Swaine-Moore wrote:
> Thanks for the reply! I tried the analysis on our much shorter staging
> table and it did change the plan. I haven’t tried it on the production
> ones because my understanding is that the autovacuum process is gentler
> with resource consumption and I didn’t want to gum things up in the
> meantime. But that may be false or avoidable.
>
> Unfortunately the insertions are not batched--they arrive in a fairly
> steady stream. Given that, and the fact that we’re talking about the
> behavior of the daemon, I’m not sure how to provide a reproduction
> (though I would be game if you had pointers). I could in theory make a
> job that analyzes these tables on a regular cadence, though I guess that
> feels like stepping on autovacuum’s toes.
>
> Would be happy to read over similar reports if you could steer me toward
> them.
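A regular-cadence analyze job like the one described above can be as small
as a crontab entry; the schedule, database, and table names here are all
placeholders:

```cron
# Analyze the hot tables every 30 minutes.
*/30 * * * *  psql -d mydb -c 'ANALYZE my_hot_table;'
```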

I know of at least two problematic use cases:
1. Batch insertions of tuples with a steadily growing column value (a 
timestamp, for example). When selecting "just arrived" tuples, 
cardinalities can't be estimated correctly because those values fall 
outside the histogram and MCV statistics until the next analyze.
2. The table is under such heavy query load that autovacuum never gets a 
chance to process it.
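Case No. 1 is easy to observe directly: compare the planner's row estimate
for a just-inserted time range before and after an ANALYZE (the table and
column names are placeholders):

```sql
-- Rows newer than the last ANALYZE fall outside the histogram,
-- so the estimate is typically far too low.
EXPLAIN SELECT * FROM events
WHERE created_at > now() - interval '5 minutes';

ANALYZE events;  -- refresh statistics

-- Afterwards the estimate should track reality much more closely.
EXPLAIN SELECT * FROM events
WHERE created_at > now() - interval '5 minutes';
```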


For case No. 1, I attempted to extend the 'probe indexes' technique [1]. 
However, it is applicable only under specific conditions (an analytic 
load, for example) and would need to be designed as an extension. 
Unfortunately, Postgres does not yet expose the selectivity hooks that 
would make that possible.


To resolve case No. 2 (and not only that), I have built a 'plan freezing' 
extension [2,3]. Using it, you can find the optimal plan once and freeze 
it in the instance's plan cache, choosing which constants should be 
treated as parameters. Any changes to the table's content, statistics, or 
even GUCs will not affect the query plan until it is "unfrozen".


Also, you may use the pg_hint_plan extension [4] to provide the Postgres 
optimiser with hints that help it generate the plan you need.
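With pg_hint_plan, hints go in a special comment at the head of the query.
A sketch, assuming the extension is installed and using placeholder
table/index names:

```sql
LOAD 'pg_hint_plan';  -- or list it in shared_preload_libraries

/*+ NestLoop(a b) IndexScan(b b_created_idx) */
SELECT *
FROM a
JOIN b ON b.a_id = a.id
WHERE b.created_at > now() - interval '1 day';
```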


[1] https://danolivo.substack.com/p/probing-indexes-to-survive-data-skew
[2] https://danolivo.substack.com/p/designing-a-prototype-postgres-plan
[3] https://postgrespro.com/docs/enterprise/16/sr-plan
[4] https://github.com/ossc-db/pg_hint_plan

--
regards, Andrei Lepikhov