Re: Occasional performance issue after changing table partitions

2022-07-10 Thread Justin Pryzby
On Sun, Jul 10, 2022 at 04:55:34PM +1200, Nathan Ward wrote:
> I am running Postgres 13 on CentOS 7, installed from the yum.postgresql.org 
>  repo.

It doesn't sound relevant, but what kind of storage systems is postgres using ?
Filesystem, raid, device.

Is the high CPU use related to to autovacuum/autoanalyze ?

> The issue I am having, is that when the daily data usage aggregation runs, 
> sometimes we have a big performance impact, with the following 
> characteristics which happen *after* the aggregation job runs in it usual 
> fast time of 12s or so:
> - The aggregation runs fast as per normal
> - Load on the server goes to 30-40 - recall we have quite high “max 
> connections” to keep throughput high when the client is far (16ms) from the 
> server

I suggest to install and enable autoexplain to see what's running slowly here,
and what its query plans are.  It seems possible that when the daily
aggregation script drops the old partitions, the plan changes for the worse.
I'm not sure what the fix is - maybe you just need to run vacuum or analyze on
the new partitions soon after populating them.

For good measure, also set log_autovacuum_min_duration=0 (or something other
than -1) (and while you're at it, log_checkpoints=on, and log_lock_waits=on if
you haven't already).

Note that postgres doesn't automatically analyze parent tables, so you should
maybe do that whenever the data changes enough for it to matter.

-- 
Justin




Re: Occasional performance issue after changing table partitions

2022-07-10 Thread Nathan Ward


> On 11/07/2022, at 2:05 AM, Justin Pryzby  wrote:
> 
> On Sun, Jul 10, 2022 at 04:55:34PM +1200, Nathan Ward wrote:
>> I am running Postgres 13 on CentOS 7, installed from the yum.postgresql.org 
>>  repo.
> 
> It doesn't sound relevant, but what kind of storage systems is postgres using 
> ?
> Filesystem, raid, device.

It’s an NVME SSD backed SAN over 2x16G fibre channel. The postgres server is in 
a VM (vmware). I can pretty comfortably do 10Gbit/s or more to the disk (I’ve 
only personally tested that, because initial replication from other sites runs 
at around that sort of speed limited by ethernet interfaces). The normal IO is 
between 1MB/s and maybe 15MB/s - writes only. Reads are pretty minimal.
FS is XFS.

> Is the high CPU use related to to autovacuum/autoanalyze ?

Good question - I don’t know. I’ve set the server to debug1 log level so I can 
see that - I see you have some notes below about autovacuum logs so I’ll see 
what that shows me.
Since setting that log level I haven’t yet had the issue occur - I watched it 
tick over midnight last night and it was normal (as it is, most days).

>> The issue I am having, is that when the daily data usage aggregation runs, 
>> sometimes we have a big performance impact, with the following 
>> characteristics which happen *after* the aggregation job runs in it usual 
>> fast time of 12s or so:
>> - The aggregation runs fast as per normal
>> - Load on the server goes to 30-40 - recall we have quite high “max 
>> connections” to keep throughput high when the client is far (16ms) from the 
>> server
> 
> I suggest to install and enable autoexplain to see what's running slowly here,
> and what its query plans are.  It seems possible that when the daily
> aggregation script drops the old partitions, the plan changes for the worse.
> I'm not sure what the fix is - maybe you just need to run vacuum or analyze on
> the new partitions soon after populating them.

Hmm, I’ll check it out. I hadn’t thought that the query planner could be doing 
something different, that’s a good point.

Note that the normal data ingest queries don’t hit the new partition - the new 
partition is for data 3 days ago, and the ingest queries only hit the 
partitions covering the last ~2 hours.

> For good measure, also set log_autovacuum_min_duration=0 (or something other
> than -1) (and while you're at it, log_checkpoints=on, and log_lock_waits=on if
> you haven't already).

Wilco.

> Note that postgres doesn't automatically analyze parent tables, so you should
> maybe do that whenever the data changes enough for it to matter.

Hmm. This raises some stuff I’m not familiar with - does analysing a parent 
table do anything? I got the impression that analysing the parent was just 
shorthand for analysing all of the attached partitions.

Perhaps because I attach a table with data, the parent sometimes decides it 
needs to run analyse on a bunch of things?
Or, maybe it uses the most recently attached partition, with bad statistics, to 
plan queries that only touch other partitions?

--
Nathan Ward





Re: Occasional performance issue after changing table partitions

2022-07-10 Thread Justin Pryzby
On Mon, Jul 11, 2022 at 03:21:38PM +1200, Nathan Ward wrote:
> > Note that postgres doesn't automatically analyze parent tables, so you 
> > should
> > maybe do that whenever the data changes enough for it to matter.
> 
> Hmm. This raises some stuff I’m not familiar with - does analysing a parent 
> table do anything?

Yes

You could check if you have stats now (maybe due to a global ANALYZE or
analyzedb) and how the query plans change if you analyze.
The transaction may be overly conservative.

SELECT COUNT(1) FROM pg_stats WHERE tablename=PARENT;
SELECT last_analyze, last_autoanalyze, relname FROM pg_stat_all_tables WHERE 
relname=PARENT;
begin;
SET default_statistics_target=10;
ANALYZE;
explain SELECT [...];
rollback;

> I got the impression that analysing the parent was just shorthand for 
> analysing all of the attached partitions.

Could you let us know if the documentation left that impression ?

See here (this was updated recently).

https://www.postgresql.org/docs/13/sql-analyze.html#id-1.9.3.46.8

For partitioned tables, ANALYZE gathers statistics by sampling rows from all 
partitions; in addition, it will recurse into each partition and update its 
statistics. Each leaf partition is analyzed only once, even with multi-level 
partitioning. No statistics are collected for only the parent table (without 
data from its partitions), because with partitioning it's guaranteed to be 
empty.

By contrast, if the table being analyzed has inheritance children, ANALYZE 
gathers two sets of statistics: one on the rows of the parent table only, and a 
second including rows of both the parent table and all of its children. This 
second set of statistics is needed when planning queries that process the 
inheritance tree as a whole. The child tables themselves are not individually 
analyzed in this case.

The autovacuum daemon does not process partitioned tables, nor does it process 
inheritance parents if only the children are ever modified. It is usually 
necessary to periodically run a manual ANALYZE to keep the statistics of the 
table hierarchy up to date.

> Perhaps because I attach a table with data, the parent sometimes decides it 
> needs to run analyse on a bunch of things?

No, that doesn't happen.

> Or, maybe it uses the most recently attached partition, with bad statistics, 
> to plan queries that only touch other partitions?

This is closer to what I was talking about.

To be clear, you are using relkind=p partitions (added in v10), and not
inheritance parents, right ?

-- 
Justin




Re: Occasional performance issue after changing table partitions

2022-07-10 Thread Nathan Ward


> On 11/07/2022, at 4:05 PM, Justin Pryzby  wrote:
> 
> On Mon, Jul 11, 2022 at 03:21:38PM +1200, Nathan Ward wrote:
>>> Note that postgres doesn't automatically analyze parent tables, so you 
>>> should
>>> maybe do that whenever the data changes enough for it to matter.
>> 
>> Hmm. This raises some stuff I’m not familiar with - does analysing a parent 
>> table do anything?
> 
> Yes
> 
> You could check if you have stats now (maybe due to a global ANALYZE or
> analyzedb) and how the query plans change if you analyze.
> The transaction may be overly conservative.
> 
> SELECT COUNT(1) FROM pg_stats WHERE tablename=PARENT;
> SELECT last_analyze, last_autoanalyze, relname FROM pg_stat_all_tables WHERE 
> relname=PARENT;
> begin;
> SET default_statistics_target=10;
> ANALYZE;
> explain SELECT [...];
> rollback;

I have a development database which gets a mirror of about 50% of the data 
coming in, and ran a global ANALYZE earlier on - and note that the disk IO is 
actually a lot higher since which is interesting and not desirable obviously, 
so I have some more fiddling to do..
The behaviour during the ANALYZE was very similar to what happens on my 
production database when things go funny though, so, this feels like it’s 
getting me close.

The above is going to be a bit tricky to do I think - the ingest process runs a 
stored procedure, and behaviour varies quite a bit if I stick in synthetic 
values.

I think probably my approach for now will be to turn on auto explain with some 
sampling, and see what happens.


Side note, in the auto_explain docs, there is a note in a callout box saying 
that log_analyze has a high impact even if the query isn’t logged - if I use 
sampling, is this still the case - i.e. all queries are impacted - or is it 
only the sampled queries?

>> I got the impression that analysing the parent was just shorthand for 
>> analysing all of the attached partitions.
> 
> Could you let us know if the documentation left that impression ?
> 
> See here (this was updated recently).
> 
> https://www.postgresql.org/docs/13/sql-analyze.html#id-1.9.3.46.8
> 
> For partitioned tables, ANALYZE gathers statistics by sampling rows from all 
> partitions; in addition, it will recurse into each partition and update its 
> statistics. Each leaf partition is analyzed only once, even with multi-level 
> partitioning. No statistics are collected for only the parent table (without 
> data from its partitions), because with partitioning it's guaranteed to be 
> empty.
> 
> By contrast, if the table being analyzed has inheritance children, ANALYZE 
> gathers two sets of statistics: one on the rows of the parent table only, and 
> a second including rows of both the parent table and all of its children. 
> This second set of statistics is needed when planning queries that process 
> the inheritance tree as a whole. The child tables themselves are not 
> individually analyzed in this case.
> 
> The autovacuum daemon does not process partitioned tables, nor does it 
> process inheritance parents if only the children are ever modified. It is 
> usually necessary to periodically run a manual ANALYZE to keep the statistics 
> of the table hierarchy up to date.


It was this part:
“””
No statistics are collected for *only* the parent table (without data from its 
partitions), because with partitioning it's guaranteed to be empty.
“””

Emphasis around “only” is mine - I think my brain skipped that word, but, it’s 
obviously critical.

I also note this:
“””
It is usually necessary to periodically run a manual ANALYZE to keep the 
statistics of the table hierarchy up to date.
“””
This seems really important and is something I was entirely unaware of - maybe 
this should be in one of those callout boxes.


>> Perhaps because I attach a table with data, the parent sometimes decides it 
>> needs to run analyse on a bunch of things?
> 
> No, that doesn't happen.

Ack.

>> Or, maybe it uses the most recently attached partition, with bad statistics, 
>> to plan queries that only touch other partitions?
> 
> This is closer to what I was talking about.
> 
> To be clear, you are using relkind=p partitions (added in v10), and not
> inheritance parents, right ?

Yes, relkind=p.

--
Nathan Ward