Re: Unfortunate Nested Loop + Missing Autovacuum

2025-02-23 Thread Lincoln Swaine-Moore
Thanks for the reply! I tried the analysis on our much shorter staging
table and it did change the plan. I haven’t tried it on the production ones
because my understanding is that the autovacuum process is gentler with
resource consumption and I didn’t want to gum things up in the meantime.
But that may be false or avoidable.

Unfortunately the insertions are not batched—they arrive in a fairly steady
stream. Given that, and the fact that we’re talking about the behavior of
the daemon, I’m not sure how to provide a reproduction (though would be
game if you had pointers). I could in theory make a job analyze these
tables on a regular cadence, though I guess that feels like I’d be stepping
on autovacuum’s toes.

Would be happy to read over similar reports if you could steer me toward
them.

Thanks again for your help.

Lincoln Swaine-Moore


On Sat, Feb 22, 2025 at 00:37 Andrei Lepikhov  wrote:

> On 22/2/2025 00:46, Lincoln Swaine-Moore wrote:
> > So, obviously there's a statistics problem, which led me to realize that
> > actually these tables have *never* been autovacuumed/analyzed according
> > to pg_stat_user_tables.
> > I'm using a managed database which makes it a little tricky to debug,
> > but all my settings
> > (autovacuum/autovacuum_vacuum_threshold/autovacuum_analyze_threshold/
> > autovacuum_vacuum_insert_threshold) are default,
> > and I can see that other tables have been vacuumed recently.
> I know a couple of reports related to this kind of behaviour and
> different solutions to resolve it. But first, if you execute the ANALYZE
> command on these problematic tables, does it fix your issue? May you
> live with manual vacuum analysis each time after batch insertion?
> If not, may you provide a synthetic reproduction of the case?
>
> --
> regards, Andrei Lepikhov
>


Re: Unfortunate Nested Loop + Missing Autovacuum

2025-02-23 Thread Jeff Janes
On Sun, Feb 23, 2025 at 5:49 PM Lincoln Swaine-Moore 
wrote:

> Thanks for the reply! I tried the analysis on our much shorter staging
> table and it did change the plan. I haven’t tried it on the production ones
> because my understanding is that the autovacuum process is gentler with
> resource consumption and I didn’t want to gum things up in the meantime.
> But that may be false or avoidable.
>

The default setting of vacuum_cost_delay is zero, while the default setting
of autovacuum_vacuum_cost_delay is 2ms (or 20ms, depending on the software
version).  So you can give a manual run the same resource consumption as an
autorun just by changing vacuum_cost_delay in that session so that it has
the same value as autovacuum_vacuum_cost_delay.  Also, if you just do
ANALYZE (rather than VACUUM ANALYZE) the resource usage should be
rather modest anyway.

Another difference is that autovac will cancel itself if it detects it is
blocking something else, while a manual vac/analyze operation will not
do that.  Normal operations (DML) don't block against vacuum anyway, only
things like index creation or partition maintenance do that.  But if those
types of operation are frequent, then doing a manual VACUUM or ANALYZE
could indeed gum things up.  Also, if those operations are frequent, it
could explain the missing autovac.  If every auto attempt gets cancelled
before it completes, then it will never complete.  pg_stat_user_tables
doesn't reflect cancelled vacuum or analyze so those will go missing.  (You
should see mentions of cancelled autovac operations in the log file though.)

Database restarts will also interrupt vacuums.  So if your database is
shutdown and restarted regularly (for cold back-ups, or just out of some
misguided belief that restarting occasionally is a good practise) and the
period between restarts is shorter than how long it would take autovac to
run, this could also explain the lack of completed autovacs.  Also, if a
table qualifies for both auto vacuum and auto analyze, the vacuum is done
first.  So even if auto analyze would be fast by itself, it still won't
complete if auto vacuum is slow and never gets to finish.

It is possible to override your vacuum settings on a per-table basis.  So
another possible explanation for the missing autovacs is that those two
tables have been specifically configured to disable autovacuum on them and
only them.  If you use psql, \dt+ (but not regular \dt) will show such
customizations.  I'm sure other tools also have ways to detect this, but I
don't know what those ways are off the top of my head.

Cheers,

Jeff