Partitioned table statistics vs autoanalyze
Hi, we faced a performance issue when joining two partitioned tables (declarative partitioning). The planner chooses a nested loop while we expect a hash join. The query and the plan are available here: https://explain.depesz.com/s/23r9

table_1 and table_2 are hash partitioned on the volume_id column. We usually run ANALYZE on the partitions; we do not run ANALYZE on the partitioned (parent) table. However, if we run ANALYZE on the partitioned table, the planner starts choosing a hash join. For comparison, the execution using a nested loop takes about 15 minutes, while the same query using a hash join takes about 1 minute.

When running ANALYZE on the partitioned table, Postgres inserts statistics for the partitioned table into pg_stats (pg_statistic). Before that, there are only statistics for the partitions. We suspect that this is the reason for the bad query plan. The query is executed with a cursor; thus, to avoid parallel query, I set max_parallel_workers_per_gather to 0 during tests.

We found that a similar issue was discussed in the context of inheritance:
https://www.postgresql.org/message-id/Pine.BSO.4.64.0904161836540.11937%40leary.csoft.net
and the conclusion was to add the following paragraph to the ANALYZE documentation:

> If the table being analyzed has one or more children, ANALYZE will gather statistics twice: once on the rows of the parent table only, and a second time on the rows of the parent table with all of its children. This second set of statistics is needed when planning queries that traverse the entire inheritance tree. The autovacuum daemon, however, will only consider inserts or updates on the parent table itself when deciding whether to trigger an automatic analyze for that table. If that table is rarely inserted into or updated, the inheritance statistics will not be up to date unless you run ANALYZE manually.

(https://www.postgresql.org/docs/13/sql-analyze.html)

I would appreciate it if anyone could shed some light on the following questions:

1) Is the above paragraph from the docs still valid in PG 13, and does it apply to declarative partitioning as well? Is running ANALYZE manually on a partitioned table needed to get proper plans for queries on partitioned tables? A partitioned table (in the declarative sense) is "virtual" and does not keep any data, so it seems that there are no statistics that can be gathered from the table itself, and statistics from the partitions should be sufficient.

2) Why does the planner need these statistics, since they seem to be unused in the query plan? The query plan uses only partitions, not the partitioned table.

PostgreSQL version number:
 PostgreSQL 13.3 (Ubuntu 13.3-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 64-bit
(1 row)

How you installed PostgreSQL: from the Ubuntu 16 repositories.
Changes made to the settings in the postgresql.conf file:

 name                               | current_setting        | source
------------------------------------+------------------------+--------------------
 application_name                   | psql                   | client
 auto_explain.log_analyze           | on                     | configuration file
 auto_explain.log_min_duration      | 30s                    | configuration file
 auto_explain.log_nested_statements | on                     | configuration file
 auto_explain.log_timing            | off                    | configuration file
 autovacuum_freeze_max_age          | 10                     | configuration file
 autovacuum_max_workers             | 6                      | configuration file
 autovacuum_vacuum_cost_delay       | 20ms                   | configuration file
 autovacuum_vacuum_cost_limit       | 2000                   | configuration file
 checkpoint_completion_target       | 0.9                    | configuration file
 checkpoint_timeout                 | 15min                  | configuration file
 cluster_name                       | 13/main                | configuration file
 cpu_index_tuple_cost               | 0.001                  | configuration file
 cpu_operator_cost                  | 0.0005                 | configuration file
 cursor_tuple_fraction              | 1                      | configuration file
 DateStyle                          | ISO, MDY               | configuration file
 default_statistics_target          | 200                    | configuration file
 default_text_search_config         | pg_catalog.english     | configuration file
 dynamic_shared_memory_type         | posix                  | configuration file
 effective_cache_size               | 193385MB               | configuration file
 effective_io_concurrency           | 1000                   | configuration file
 external_pid_file                  | /var/run/postgresql/13
Re: Partitioned table statistics vs autoanalyze
On Thu, Jul 22, 2021 at 01:32:51PM +0200, Kamil Frydel wrote:
> table_1 and table_2 are hash partitioned using volume_id column. Usually we
> make analyze on partitions. We do not make analyze on the partitioned table
> (parent).
> However, if we run 'analyze' on the partitioned table then planner starts
> choosing hash join. As a comparison, the execution using nested loop takes
> about 15 minutes and if it is done using hash join then the query lasts for
> about 1 minute. When running 'analyze' for the partitioned table, postgres
> inserts statistics for the partitioned table into pg_stats (pg_statistics).
> Before that, there are only statistics for partitions. We suspect that this
> is the reason for selecting bad query plan.
> updated, the inheritance statistics will not be up to date unless you run
> ANALYZE manually.
> (https://www.postgresql.org/docs/13/sql-analyze.html)
>
> I would appreciate if anyone could shed some light on the following
> questions:
> 1) Is this above paragraph from docs still valid in PG 13 and does it apply
> to declarative partitioning as well? Is running analyze manually on a
> partitioned table needed to get proper plans for queries on partitioned
> tables? Partitioned table (in the declarative way) is "virtual" and does not
> keep any data so it seems that there are no statistics that can be gathered
> from the table itself and statistics from partitions should be sufficient.

Up through v13, autoanalyze doesn't collect stats on parent tables (neither declarative nor inheritance). I agree that this doesn't seem to be well documented. I think it should also be mentioned here:
https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-STATISTICS

In v14 (which is currently in beta), autoanalyze will process the partitioned table automatically:
https://www.postgresql.org/docs/14/release-14.html
| Autovacuum now analyzes partitioned tables (Yuzuko Hosoya, Álvaro Herrera)
| Insert, update, and delete tuple counts from partitions are now propagated to their parent tables so autovacuum knows when to process them.

> 2) Why does the planner need these statistics since they seem to be unused
> in the query plan. The query plan uses only partitions, not the partitioned
> table.

The "inherited" stats are used when you SELECT FROM table. The stats for the individual table would be needed when you SELECT FROM ONLY table (which makes no sense for a partitioned table).

--
Justin
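For illustration, a minimal sketch of the manual step Justin describes, assuming the parent table names and the volume_id join column from the original report:

-- Through v13 the parent of a partitioned table is not analyzed by
-- autovacuum, so its inheritance-level statistics must be gathered manually:
ANALYZE table_1;
ANALYZE table_2;

-- Afterwards pg_stats should contain rows for the parents with
-- inherited = true; these are the statistics used when planning joins
-- on the partitioned tables as a whole:
SELECT tablename, attname, inherited, n_distinct
FROM pg_stats
WHERE tablename IN ('table_1', 'table_2')
  AND attname = 'volume_id';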
RE: Big performance slowdown from 11.2 to 13.3
OK. Will do another round of testing.

-----Original Message-----
From: David Rowley
Sent: Thursday, July 22, 2021 00:44
To: [email protected]
Cc: Tom Lane ; Peter Geoghegan ; Justin Pryzby ; [email protected]
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, 22 Jul 2021 at 16:37, [email protected] wrote:
> Seems like no cigar ☹ See plan pasted below. I changed the conf as follows:
> - hash_mem_multiplier = '2'
> - work_mem = '1GB'

> Batches: 5 Memory Usage: 2400305kB Disk Usage: 126560kB

You might want to keep going higher with hash_mem_multiplier until you see no "Disk Usage" there.

As mentioned, v11 didn't spill to disk and just used all the memory it pleased. That was a bit dangerous as it could result in OOM, so it was fixed.

David
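A sketch of the session-level form of that advice (the specific values are placeholders to experiment with, not recommendations):

-- Give hash-based nodes more memory for this session only, then re-run
-- EXPLAIN (ANALYZE, BUFFERS) and check whether "Disk Usage" disappears
-- from the HashAggregate nodes:
SET work_mem = '1GB';
SET hash_mem_multiplier = 4;  -- raise further (8, 16, ...) if it still spills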
Re: Partitioned table statistics vs autoanalyze
In v14 (which is currently in beta), autoanalyze will process the partitioned table automatically:
https://www.postgresql.org/docs/14/release-14.html
| Autovacuum now analyzes partitioned tables (Yuzuko Hosoya, Álvaro Herrera)
| Insert, update, and delete tuple counts from partitions are now propagated to their parent tables so autovacuum knows when to process them.

Thank you for the prompt reply! Changes in v14 sound promising.

--
Best regards
Kamil Frydel
RE: Big performance slowdown from 11.2 to 13.3
-Original Message- From: [email protected] Sent: Thursday, July 22, 2021 09:37 To: David Rowley Cc: Tom Lane ; Peter Geoghegan ; Justin Pryzby ; [email protected] Subject: RE: Big performance slowdown from 11.2 to 13.3 OK. Will do another round of testing. -Original Message- From: David Rowley Sent: Thursday, July 22, 2021 00:44 To: [email protected] Cc: Tom Lane ; Peter Geoghegan ; Justin Pryzby ; [email protected] Subject: Re: Big performance slowdown from 11.2 to 13.3 On Thu, 22 Jul 2021 at 16:37, [email protected] wrote: > Seems like no cigar ☹ See plan pasted below. I changed the conf as follows: > - hash_mem_multiplier = '2' > - work_mem = '1GB' > Batches: 5 Memory Usage: 2400305kB Disk Usage: 126560kB You might want to keep going higher with hash_mem_multiplier until you see no "Disk Usage" there. As mentioned, v11 didn't spill to disk and just used all the memory it pleased. That was a bit dangerous as it could result in OOM, so it was fixed. David -- Hello all, So, I went possibly nuclear, and still no cigar. Something's not right. - hash_mem_multiplier = '10' - work_mem = '1GB' The results are Batches: 5 Memory Usage: 2,449,457kB Disk Usage: 105,936kB Execution Time: 1,837,126.766 ms It's still spilling to disk and seems to cap at 2.5GB of memory usage in spite of configuration. More importantly - I am not understanding how spilling to disk 100MB (which seems low to me and should be fast on our SSD), causes the query to slow down by a factor of 10. - It seems at the very least that memory consumption on 11 was more moderate? This process of ours was running several of these types of queries concurrently and I don't think I ever saw the machine go over 40GB in memory usage. HashAggregate (cost=1774568.21..1774579.21 rows=200 width=1260) (actual time=84860.629..1836583.909 rows=722853 loops=1) Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk Batches: 1 Memory Usage: 1277985kB Buffers: shared hit=46 read=169822, temp read=13144 written=23035 -> HashAggregate (cost=1360804.75..1374830.63 rows=1402588 width=56) (actual time=27890.422..39975.074 rows=13852618 loops=1) Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code Batches: 5 Memory Usage: 2449457kB Disk Usage: 105936kB Buffers: shared hit=32 read=169819, temp read=13144 written=23035 -> Seq Scan on assessmenticcqa_raw (cost=0.00..1256856.62 rows=13859750 width=38) (actual time=0.053..13623.310 rows=13852618 loops=1) Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE 
ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[])) Rows Removed by Filter: 171680 Buffers: shared hit=32 read=169819 Settings: effective_cache_size = '52GB', from_collapse_limit = '24', hash_mem_multiplier = '10', jit = 'off', jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit = '24', max_parallel_workers = '20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers = '4G
Re: Big performance slowdown from 11.2 to 13.3
"[email protected]" writes: > So, I went possibly nuclear, and still no cigar. Something's not right. > - hash_mem_multiplier = '10' > - work_mem = '1GB' > The results are > Batches: 5 Memory Usage: 2,449,457kB Disk Usage: 105,936kB > Execution Time: 1,837,126.766 ms > It's still spilling to disk and seems to cap at 2.5GB of memory usage in > spite of configuration. That is ... weird. Maybe you have found a bug in the spill-to-disk logic; it's quite new after all. Can you extract a self-contained test case that behaves this way? regards, tom lane
Re: Big performance slowdown from 11.2 to 13.3
I wrote:
> "[email protected]" writes:
>> It's still spilling to disk and seems to cap at 2.5GB of memory usage in
>> spite of configuration.

> That is ... weird.

Oh: see get_hash_mem:

	hash_mem = (double) work_mem * hash_mem_multiplier;

	/*
	 * guc.c enforces a MAX_KILOBYTES limitation on work_mem in order to
	 * support the assumption that raw derived byte values can be stored in
	 * 'long' variables.  The returned hash_mem value must also meet this
	 * assumption.
	 *
	 * We clamp the final value rather than throw an error because it should
	 * be possible to set work_mem and hash_mem_multiplier independently.
	 */
	if (hash_mem < MAX_KILOBYTES)
		return (int) hash_mem;

	return MAX_KILOBYTES;

So basically, we now have a hard restriction that hashaggs can't use more than INT_MAX kilobytes, or approximately 2.5GB, and this use case is getting eaten alive by that restriction. Seems like we need to do something about that.

			regards, tom lane
Re: Big performance slowdown from 11.2 to 13.3
On Fri, 23 Jul 2021 at 03:56, Tom Lane wrote:
> So basically, we now have a hard restriction that hashaggs can't use
> more than INT_MAX kilobytes, or approximately 2.5GB, and this use case
> is getting eaten alive by that restriction. Seems like we need to
> do something about that.

Hmm, math check?

postgres=# select pg_size_pretty(power(2,31)::numeric*1024);
 pg_size_pretty
----------------
 2048 GB
(1 row)

David
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 8:45 AM Tom Lane wrote: > That is ... weird. Maybe you have found a bug in the spill-to-disk logic; > it's quite new after all. Can you extract a self-contained test case that > behaves this way? I wonder if this has something to do with the way that the input data is clustered. I recall noticing that that could significantly alter the behavior of HashAggs as of Postgres 13. -- Peter Geoghegan
RE: Big performance slowdown from 11.2 to 13.3
-----Original Message-----
From: Tom Lane
Sent: Thursday, July 22, 2021 11:57
To: [email protected]
Cc: David Rowley ; Peter Geoghegan ; Justin Pryzby ; [email protected]
Subject: Re: Big performance slowdown from 11.2 to 13.3

I wrote:
> "[email protected]" writes:
>> It's still spilling to disk and seems to cap at 2.5GB of memory usage in
>> spite of configuration.

> That is ... weird.

Oh: see get_hash_mem:

	hash_mem = (double) work_mem * hash_mem_multiplier;

	/*
	 * guc.c enforces a MAX_KILOBYTES limitation on work_mem in order to
	 * support the assumption that raw derived byte values can be stored in
	 * 'long' variables.  The returned hash_mem value must also meet this
	 * assumption.
	 *
	 * We clamp the final value rather than throw an error because it should
	 * be possible to set work_mem and hash_mem_multiplier independently.
	 */
	if (hash_mem < MAX_KILOBYTES)
		return (int) hash_mem;

	return MAX_KILOBYTES;

So basically, we now have a hard restriction that hashaggs can't use more than INT_MAX kilobytes, or approximately 2.5GB, and this use case is getting eaten alive by that restriction. Seems like we need to do something about that.

			regards, tom lane

--

Hello!

Ah... int vs long then? Tried even more (multiplier=16) and this seems to be definitely the case.

Is it fair then to deduce that the total memory usage would be 2,400,305kB + 126,560kB? Is this what under the covers V11 is consuming more or less?

Is it also expected that a spill over of just 100MB (on top of 2.4GB memory consumption) would cause the query to collapse like this? I am still not visualizing in my head how that would happen. 100MB just seems so small, and our SSD is fast.

Generating a dataset would take me a lot of time. This is a clinical database so I cannot reuse the current table. I would have to entirely mock the use case and create a dummy dataset from scratch.
HashAggregate (cost=1774568.21..1774579.21 rows=200 width=1260) (actual time=94618.303..1795311.542 rows=722853 loops=1) Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk Batches: 1 Memory Usage: 1277985kB Buffers: shared hit=14 read=169854, temp read=15777 written=27588 -> HashAggregate (cost=1360804.75..1374830.63 rows=1402588 width=56) (actual time=30753.022..45384.558 rows=13852618 loops=1) Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code Batches: 5 Memory Usage: 2400305kB Disk Usage: 126560kB Buffers: shared read=169851, temp read=15777 written=27588 -> Seq Scan on assessmenticcqa_raw (cost=0.00..1256856.62 rows=13859750 width=38) (actual time=0.110..14342.258 rows=13852618 loops=1) Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[])) Rows Removed by Filter: 171680 Buffers: shared read=169851 Settings: effective_cache_size = '52GB', from_collapse_limit = '24', hash_mem_multiplier = '16', jit = 'off', jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limi
Re: Big performance slowdown from 11.2 to 13.3
On Fri, 23 Jul 2021 at 04:14, Peter Geoghegan wrote:
>
> On Thu, Jul 22, 2021 at 8:45 AM Tom Lane wrote:
> > That is ... weird. Maybe you have found a bug in the spill-to-disk logic;
> > it's quite new after all. Can you extract a self-contained test case that
> > behaves this way?
>
> I wonder if this has something to do with the way that the input data
> is clustered. I recall noticing that that could significantly alter
> the behavior of HashAggs as of Postgres 13.
Isn't it more likely to be reaching the group limit rather than the
memory limit?
	if (input_groups * hashentrysize < hash_mem * 1024L)
	{
		if (num_partitions != NULL)
			*num_partitions = 0;
		*mem_limit = hash_mem * 1024L;
		*ngroups_limit = *mem_limit / hashentrysize;
		return;
	}
There are 55 aggregates on a varchar(255). I think hashentrysize is
pretty big. If it was 255*55 then only 765591 groups fit in the 10GB
of memory.
David
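As a rough check of that estimate (the 255 bytes times 55 columns per entry is David's assumption, not a measured hash entry size):

-- 10GB of hash memory divided by an assumed 255 * 55 bytes per entry:
SELECT (10 * 1024.0^3 / (255 * 55))::bigint AS groups_that_fit;  -- roughly 765,000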
RE: Big performance slowdown from 11.2 to 13.3
-----Original Message-----
From: Peter Geoghegan
Sent: Thursday, July 22, 2021 12:14
To: Tom Lane
Cc: Jeff Davis ; [email protected]; David Rowley ; Justin Pryzby ; [email protected]
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, Jul 22, 2021 at 8:45 AM Tom Lane wrote:
> That is ... weird. Maybe you have found a bug in the spill-to-disk
> logic; it's quite new after all. Can you extract a self-contained
> test case that behaves this way?

I wonder if this has something to do with the way that the input data is clustered. I recall noticing that that could significantly alter the behavior of HashAggs as of Postgres 13.

--
Peter Geoghegan

I could execute that test and re-cluster against the index. But I believe that's already done? Let me check.

Thank you, Laurent.
Re: Big performance slowdown from 11.2 to 13.3
David Rowley writes:
> On Fri, 23 Jul 2021 at 03:56, Tom Lane wrote:
>> So basically, we now have a hard restriction that hashaggs can't use
>> more than INT_MAX kilobytes, or approximately 2.5GB, and this use case
>> is getting eaten alive by that restriction. Seems like we need to
>> do something about that.

> Hmm, math check?

Yeah, I should have said "2GB plus palloc slop". It doesn't surprise me a bit that we seem to be eating another 20% on top of the nominal limit.

I think the right fix here is to remove the cap, which will require changing get_hash_mem to return double, and then maybe some cascading changes --- I've not looked at its callers.

			regards, tom lane
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 04:16:34PM +, [email protected] wrote:
> Is it fair then to deduce that the total memory usage would be 2,400,305kB +
> 126,560kB? Is this what under the covers V11 is consuming more or less?

It might be helpful to know how much RAM v11 is using.

Could you run the query with log_executor_stats=on; client_min_messages=debug;

The interesting part is this:
!	7808 kB max resident size

--
Justin
RE: Big performance slowdown from 11.2 to 13.3
-----Original Message-----
From: David Rowley
Sent: Thursday, July 22, 2021 12:18
To: Peter Geoghegan
Cc: Tom Lane ; Jeff Davis ; [email protected]; Justin Pryzby ; [email protected]
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Fri, 23 Jul 2021 at 04:14, Peter Geoghegan wrote:
>
> On Thu, Jul 22, 2021 at 8:45 AM Tom Lane wrote:
> > That is ... weird. Maybe you have found a bug in the spill-to-disk
> > logic; it's quite new after all. Can you extract a self-contained
> > test case that behaves this way?
>
> I wonder if this has something to do with the way that the input data
> is clustered. I recall noticing that that could significantly alter
> the behavior of HashAggs as of Postgres 13.

Isn't it more likely to be reaching the group limit rather than the memory limit?

	if (input_groups * hashentrysize < hash_mem * 1024L)
	{
		if (num_partitions != NULL)
			*num_partitions = 0;
		*mem_limit = hash_mem * 1024L;
		*ngroups_limit = *mem_limit / hashentrysize;
		return;
	}

There are 55 aggregates on a varchar(255). I think hashentrysize is pretty big. If it was 255*55 then only 765591 groups fit in the 10GB of memory.

David

-

Hello,

So, FYI, the query I shared is actually a simpler use case of ours 😊 We do have a similar pivot query over 600 columns to create a large flat table for analysis on an even larger table. It takes about 15 min to run on V11, with strong CPU usage and no particular memory usage spike that I can detect via Task Manager. We have been pushing PG hard to simplify the workflows of our analysts and data scientists downstream.

Thank you, Laurent.
Re: Big performance slowdown from 11.2 to 13.3
I wrote: > I think the right fix here is to remove the cap, which will require > changing get_hash_mem to return double, and then maybe some cascading > changes --- I've not looked at its callers. Or, actually, returning size_t would likely make the most sense. We'd fold the 1024L multiplier in here too instead of doing that at the callers. regards, tom lane
RE: Big performance slowdown from 11.2 to 13.3
-----Original Message-----
From: Justin Pryzby
Sent: Thursday, July 22, 2021 12:23
To: [email protected]
Cc: Tom Lane ; David Rowley ; Peter Geoghegan ; [email protected]
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, Jul 22, 2021 at 04:16:34PM +, [email protected] wrote:
> Is it fair then to deduce that the total memory usage would be 2,400,305kB +
> 126,560kB? Is this what under the covers V11 is consuming more or less?

It might be helpful to know how much RAM v11 is using.

Could you run the query with log_executor_stats=on; client_min_messages=debug;

The interesting part is this:
!	7808 kB max resident size

--
Justin

---

Hello Justin,

> log_executor_stats=on; client_min_messages=debug;

Would the results then come in EXPLAIN or would I need to pick something up from the logs?

Thank you, Laurent.
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 04:30:00PM +, [email protected] wrote:
> Hello Justin,
>
> > log_executor_stats=on; client_min_messages=debug;
>
> Would the results then come in EXPLAIN or would I need to pick something up
> from the logs?

If you're running with psql, and client_min_messages=debug, then it'll show up as a debug message to the client:

ts=# SET log_executor_stats=on; SET client_min_messages=debug; explain analyze SELECT 1;
SET
SET
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!	0.11 s user, 0.000209 s system, 0.000219 s elapsed
!	[0.040608 s user, 0.020304 s system total]
!	7808 kB max resident size
...

It can but doesn't have to use "explain" - that just avoids showing the query output, since it's not what's interesting here.

--
Justin
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 9:21 AM Tom Lane wrote: > Yeah, I should have said "2GB plus palloc slop". It doesn't surprise > me a bit that we seem to be eating another 20% on top of the nominal > limit. MAX_KILOBYTES is the max_val for the work_mem GUC itself, and has been for many years. The function get_hash_mem() returns a work_mem-style int that callers refer to as hash_mem -- the convention is that callers pretend that there is a work_mem style GUC (called hash_mem) that they must access by calling get_hash_mem(). I don't see how it's possible for get_hash_mem() to be unable to return a hash_mem value that could be represented by work_mem directly. MAX_KILOBYTES is an annoyingly low limit on Windows, where sizeof(long) is 4. But that's nothing new. -- Peter Geoghegan
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 09:36:02AM -0700, Peter Geoghegan wrote:
> I don't see how it's possible for get_hash_mem() to be unable to
> return a hash_mem value that could be represented by work_mem
> directly. MAX_KILOBYTES is an annoyingly low limit on Windows, where
> sizeof(long) is 4. But that's nothing new.

Oh. So the problem seems to be that:
1) In v13, HashAgg now obeys work_mem*hash_mem_multiplier;
2) Under Windows, work_mem is limited to 2GB.

--
Justin
Re: Big performance slowdown from 11.2 to 13.3
Peter Geoghegan writes: > On Thu, Jul 22, 2021 at 9:21 AM Tom Lane wrote: >> Yeah, I should have said "2GB plus palloc slop". It doesn't surprise >> me a bit that we seem to be eating another 20% on top of the nominal >> limit. > MAX_KILOBYTES is the max_val for the work_mem GUC itself, and has been > for many years. Right. The point here is that before v13, hash aggregation was not subject to the work_mem limit, nor any related limit. If you did an aggregation requiring more than 2GB-plus-slop, it would work just fine as long as your machine had enough RAM. Now, the performance sucks and there is no knob you can turn to fix it. That's unacceptable in my book. regards, tom lane
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 9:42 AM Tom Lane wrote: > Right. The point here is that before v13, hash aggregation was not > subject to the work_mem limit, nor any related limit. If you did an > aggregation requiring more than 2GB-plus-slop, it would work just fine > as long as your machine had enough RAM. Now, the performance sucks and > there is no knob you can turn to fix it. That's unacceptable in my book. Oh! That makes way more sense. I suspect David's theory about hash_agg_set_limits()'s ngroup limit is correct. It certainly seems like a good starting point. -- Peter Geoghegan
Re: Big performance slowdown from 11.2 to 13.3
Justin Pryzby writes:
> Oh. So the problem seems to be that:
> 1) In v13, HashAgg now obeys work_mem*hash_mem_multiplier;
> 2) Under Windows, work_mem is limited to 2GB.

And more to the point, work_mem*hash_mem_multiplier is *also* limited to 2GB. We didn't think that through very carefully. The point of the hash_mem_multiplier feature was to allow hash aggregation to still consume more than the work_mem limit, but we failed to free it from this 2GB limit.

You're right though that this is Windows-only; on machines with 64-bit "long" there's less of a problem.

			regards, tom lane
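To put rough numbers on that (illustrative arithmetic only; it assumes MAX_KILOBYTES is defined as INT_MAX/1024 when sizeof(long) is 4, which is the 64-bit Windows case):

-- The largest hash_mem get_hash_mem can return on such a platform is about
-- 2GB, no matter how large work_mem * hash_mem_multiplier is:
SELECT pg_size_pretty((2147483647 / 1024)::bigint * 1024) AS clamped_hash_mem,  -- ~2GB
       pg_size_pretty(1048576::bigint * 16 * 1024)        AS requested_1gb_x16; -- 16GB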
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 9:53 AM Peter Geoghegan wrote: > I suspect David's theory about hash_agg_set_limits()'s ngroup limit is > correct. It certainly seems like a good starting point. I also suspect that if Laurent set work_mem and/or hash_mem_multiplier *extremely* aggressively, then eventually the hash agg would be in-memory. And without actually using all that much memory. I'm not suggesting that that is a sensible resolution to Laurent's complaint. I'm just pointing out that it's probably not fundamentally impossible to make the hash agg avoid spilling through tuning these GUCs. At least I see no evidence of that right now. -- Peter Geoghegan
Re: Big performance slowdown from 11.2 to 13.3
Peter Geoghegan writes: > I also suspect that if Laurent set work_mem and/or hash_mem_multiplier > *extremely* aggressively, then eventually the hash agg would be > in-memory. And without actually using all that much memory. No, he already tried, upthread. The trouble is that he's on a Windows machine, so get_hash_mem is quasi-artificially constraining the product to 2GB. And he needs it to be a bit more than that. Whether the constraint is hitting at the ngroups stage or it's related to actual memory consumption isn't that relevant. What I'm wondering about is whether it's worth putting in a solution for this issue in isolation, or whether we ought to embark on the long-ignored project of getting rid of use of "long" for any memory-size-related computations. There would be no chance of back-patching something like the latter into v13, though. regards, tom lane
RE: Big performance slowdown from 11.2 to 13.3
I did try 2000MB work_mem and 16 multiplier 😊 It seems to plateau at 2GB no
matter what. This is what the explain had:
HashAggregate (cost=1774568.21..1774579.21 rows=200 width=1260) (actual
time=94618.303..1795311.542 rows=722853 loops=1)
Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
Batches: 1 Memory Usage: 1277985kB
Buffers: shared hit=14 read=169854, temp read=15777 written=27588
-> HashAggregate (cost=1360804.75..1374830.63 rows=1402588 width=56)
(actual time=30753.022..45384.558 rows=13852618 loops=1)
Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk,
assessmenticcqa_raw.iccqar_ques_code
Batches: 5 Memory Usage: 2400305kB Disk Usage: 126560kB
Buffers: shared read=169851, temp read=15777 written=27588
-> Seq Scan on assessmenticcqa_raw (cost=0.00..1256856.62
rows=13859750 width=38) (actual time=0.110..14342.258 rows=13852618 loops=1)
Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT
DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH
DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN
PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE -
MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE
TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE
TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED
DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE
AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT
TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN
INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS
REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN
INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE
INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN
INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING
WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL
NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3
O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING
SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12
O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3
O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING
SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12
O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 =
\"WORST POSSIBLE PAIN\""}'::text[]))
Rows Removed by Filter: 171680
Buffers: shared read=169851
Settings: effective_cache_size = '52GB', from_collapse_limit = '24',
hash_mem_multiplier = '16', jit = 'off', jit_above_cost = '2e+08',
jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08',
join_collapse_limit = '24', max_parallel_workers = '20',
max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers =
'4GB', work_mem = ' 2000MB'
Planning:
Buffers: shared hit=186 read=37
Planning Time: 55.709 ms
Execution Time: 1795921.717 ms
-Original Message-
From: Peter Geoghegan
Sent: Thursday, July 22, 2021 13:05
To: Tom Lane
Cc: David Rowley ; [email protected]; Justin Pryzby
; [email protected]
Subject: Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 9:53 AM Peter Geoghegan wrote:
> I suspect David's theory about hash_agg_set_limits()'s ngroup limit is
> correct. It certainly seems like a good starting point.
I also suspect that if Laurent set work_mem and/or hash_mem_multiplier
*extremely* aggressively, then eventually the hash agg would be in-memory. And
without actually using all that much memory.
I'm not suggesting that that is a sensible resolution to Laurent's complaint.
I'm just pointing out that it's probably not fundamentally impossible to make
the hash agg avoid spilling through tuning these GUCs. At least I see no
evidence of that right now.
--
Peter Geoghegan
RE: Big performance slowdown from 11.2 to 13.3
-Original Message- From: Justin Pryzby Sent: Thursday, July 22, 2021 12:36 To: [email protected] Cc: Tom Lane ; David Rowley ; Peter Geoghegan ; [email protected] Subject: Re: Big performance slowdown from 11.2 to 13.3 On Thu, Jul 22, 2021 at 04:30:00PM +, [email protected] wrote: > Hello Justin, > > > log_executor_stats=on; client_min_messages=debug; > > Would the results then come in EXPLAIN or would I need to pick something up > from the logs? If you're running with psql, and client_min_messages=debug, then it'll show up as a debug message to the client: ts=# SET log_executor_stats=on; SET client_min_messages=debug; explain analyze SELECT 1; SET SET LOG: EXECUTOR STATISTICS DETAIL: ! system usage stats: ! 0.11 s user, 0.000209 s system, 0.000219 s elapsed ! [0.040608 s user, 0.020304 s system total] ! 7808 kB max resident size ... It can but doesn't have to use "explain" - that just avoids showing the query output, since it's not what's interesting here. -- Justin - Justin, I tried this but not seeing max resident size data output. Pepper=# SET log_executor_stats=on; SET client_min_messages=debug; explain analyze SELECT 1; SET SET LOG: EXECUTOR STATISTICS DETAIL: ! system usage stats: ! 0.00 s user, 0.00 s system, 0.000109 s elapsed ! [494.640625 s user, 19.171875 s system total] QUERY PLAN Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1) Planning Time: 0.041 ms Execution Time: 0.012 ms (3 rows) For my query: LOG: EXECUTOR STATISTICS DETAIL: ! system usage stats: ! 169.625000 s user, 5.843750 s system, 175.490088 s elapsed ! [494.640625 s user, 19.171875 s system total] HashAggregate (cost=1764285.18..1764296.18 rows=200 width=1260) (actual time=86323.813..174737.442 rows=723659 loops=1) Group Key: t.iccqa_iccassmt_fk Buffers: shared hit=364 read=170293, temp written=83229 CTE t -> HashAggregate (cost=1343178.39..1356985.17 rows=1380678 width=56) (actual time=22594.053..32519.573 rows=13865785 loops=1) Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code Buffers: shared hit=364 read=170293 -> Seq Scan on assessmenticcqa_raw (cost=0.00..1240682.76 rows=13666084 width=38) (actual time=0.170..10714.598 rows=13865785 loops=1) Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOE S PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT", "EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTE D DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHE R COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREM ENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","R ESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR ","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LO CATION - 6 - 9 O''CLOCK","TUNNELING 
SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOC ATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[])) Rows Removed by Filter: 172390 Buffers: shared hit=364 read=170293 -> CTE Scan on t (cost=0.00..27613.56 rows=1380678 width=552) (actual time=22594.062..40248.874 rows=13865785 loops=1) Buffers: shared hit=364 read=170293, temp written=83229 Planning Time: 0.728 ms Execution Time: 175482.904 ms (15 rows)
RE: Big performance slowdown from 11.2 to 13.3
-Original Message- From: Tom Lane Sent: Thursday, July 22, 2021 12:42 To: Peter Geoghegan Cc: David Rowley ; [email protected]; Justin Pryzby ; [email protected] Subject: Re: Big performance slowdown from 11.2 to 13.3 Peter Geoghegan writes: > On Thu, Jul 22, 2021 at 9:21 AM Tom Lane wrote: >> Yeah, I should have said "2GB plus palloc slop". It doesn't surprise >> me a bit that we seem to be eating another 20% on top of the nominal >> limit. > MAX_KILOBYTES is the max_val for the work_mem GUC itself, and has been > for many years. Right. The point here is that before v13, hash aggregation was not subject to the work_mem limit, nor any related limit. If you did an aggregation requiring more than 2GB-plus-slop, it would work just fine as long as your machine had enough RAM. Now, the performance sucks and there is no knob you can turn to fix it. That's unacceptable in my book. regards, tom lane --- Hello all, As a user of PG, we have taken pride in the last few years in tuning the heck out of the system and getting great performance compared to alternatives like SQLServer. The customers we work with typically have data centers and are overwhelmingly Windows shops: we won the battle to deploy a complex operational system on PG vs SQLServer, but Linux vs Windows was still a bridge too far for many. I am surprised that this limitation introduced after V11 hasn't caused issues elsewhere though. Are we doing things that are such out of the normal? Are we early in pushing V13 to full production? 😊 Doing analytics with pivoted tables with hundreds of columns is not uncommon in our world. As for the three other requests from the team: Clustering: == I re-clustered the table on the index that drives the pivot logic but I didn't see any change: cluster verbose assessmenticcqa_raw using assessmenticcqa_raw_idx_iccqar_assmt_ques; HashAggregate (cost=1774465.36..1774476.36 rows=200 width=1260) (actual time=80848.591..1763443.586 rows=722853 loops=1) Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk Batches: 1 Memory Usage: 1277985kB Buffers: shared hit=369 read=169577, temp read=15780 written=27584 -> HashAggregate (cost=1360748.50..1374772.80 rows=1402430 width=56) (actual time=25475.554..38256.923 rows=13852618 loops=1) Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code Batches: 5 Memory Usage: 2400305kB Disk Usage: 126552kB Buffers: shared hit=352 read=169577, temp read=15780 written=27584 -> Seq Scan on assessmenticcqa_raw (cost=0.00..1256812.09 rows=13858188 width=38) (actual time=0.085..11914.135 rows=13852618 loops=1) Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN 
INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[])) Rows Removed by Filter: 171680 Buffers: shared hit=352 read=169577 Settings: effective_cache_size = '52GB', from_collapse_limit = '24', hash_mem_multiplier = '4', jit = 'off', jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit = '24', max_parallel_workers = '20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers = '4GB', work_mem = '1GB' Planning: Buffers: shared hit=100 r
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 10:11 AM Tom Lane wrote: > No, he already tried, upthread. The trouble is that he's on a Windows > machine, so get_hash_mem is quasi-artificially constraining the product > to 2GB. And he needs it to be a bit more than that. Whether the > constraint is hitting at the ngroups stage or it's related to actual > memory consumption isn't that relevant. Somehow I missed that part. > What I'm wondering about is whether it's worth putting in a solution > for this issue in isolation, or whether we ought to embark on the > long-ignored project of getting rid of use of "long" for any > memory-size-related computations. There would be no chance of > back-patching something like the latter into v13, though. +1. Even if we assume that Windows is a low priority platform, in the long run it'll be easier to make it more like every other platform. The use of "long" is inherently suspect to me. It signals that the programmer wants something wider than "int", even if the standard doesn't actually require that "long" be wider. This seems to contradict what we know to be true for Postgres, which is that in general it's unsafe to assume that long is int64. It's not just work_mem related calculations. There is also code like logtape.c, which uses long for block numbers -- that also exposes us to risk on Windows. By requiring int64 be used instead of long, we don't actually increase risk for non-Windows platforms to any significant degree. I'm pretty sure that "long" means int64 on non-Windows 64-bit platforms anyway. -- Peter Geoghegan
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 05:26:26PM +, [email protected] wrote:
> I tried this but not seeing max resident size data output.

Oh. Apparently, that's not supported under Windows...

#if defined(HAVE_GETRUSAGE)
	appendStringInfo(&str, "!\t%ld kB max resident size\n",
RE: Big performance slowdown from 11.2 to 13.3
-Original Message- From: Justin Pryzby Sent: Thursday, July 22, 2021 13:29 To: [email protected] Cc: Tom Lane ; David Rowley ; Peter Geoghegan ; [email protected] Subject: Re: Big performance slowdown from 11.2 to 13.3 On Thu, Jul 22, 2021 at 05:26:26PM +, [email protected] wrote: > I tried this but not seeing max resident size data output. Oh. Apparently, that's not supported under windows.. #if defined(HAVE_GETRUSAGE) appendStringInfo(&str, "!\t%ld kB max resident size\n", Hello, Damn... I know Windows is a lower priority, and this is yet another issue, but in Healthcare, Windows is so prevalent everywhere... Thank you, Laurent.
Re: Big performance slowdown from 11.2 to 13.3
Peter Geoghegan writes: > On Thu, Jul 22, 2021 at 10:11 AM Tom Lane wrote: >> What I'm wondering about is whether it's worth putting in a solution >> for this issue in isolation, or whether we ought to embark on the >> long-ignored project of getting rid of use of "long" for any >> memory-size-related computations. There would be no chance of >> back-patching something like the latter into v13, though. > By requiring int64 be used instead of long, we don't actually increase > risk for non-Windows platforms to any significant degree. I'm pretty > sure that "long" means int64 on non-Windows 64-bit platforms anyway. Well, what we really ought to be using is size_t (a/k/a Size), at least for memory-space-related calculations. I don't have an opinion right now about what logtape.c ought to use. I do agree that avoiding "long" altogether would be a good ultimate goal. In the short term though, the question is whether we want to regard this hashagg issue as something we need a fix for in v13/v14. The fact that it's Windows-only makes it slightly less pressing in my mind, but it's still a regression that some people are going to hit. regards, tom lane
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 10:33 AM [email protected] wrote: > Damn... I know Windows is a lower priority, and this is yet another issue, > but in Healthcare, Windows is so prevalent everywhere... To be clear, I didn't actually say that. I said that it doesn't matter either way, as far as addressing this long standing "int64 vs long" issue goes. -- Peter Geoghegan
RE: Big performance slowdown from 11.2 to 13.3
-Original Message- From: Peter Geoghegan Sent: Thursday, July 22, 2021 13:36 To: [email protected] Cc: Justin Pryzby ; Tom Lane ; David Rowley ; [email protected] Subject: Re: Big performance slowdown from 11.2 to 13.3 On Thu, Jul 22, 2021 at 10:33 AM [email protected] wrote: > Damn... I know Windows is a lower priority, and this is yet another issue, > but in Healthcare, Windows is so prevalent everywhere... To be clear, I didn't actually say that. I said that it doesn't matter either way, as far as addressing this long standing "int64 vs long" issue goes. -- Peter Geoghegan Yes, agreed Peter... The "lower priority" issue was mentioned, but not in terms of the applicability of the fix overall. Personally, I would prefer going the size_t route vs int/long/int64 in C/C++/. Of course, as a user, I'd love a patch on V13 and something cleaner in V14. Thanks, Laurent.
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 10:35 AM Tom Lane wrote: > Well, what we really ought to be using is size_t (a/k/a Size), at least > for memory-space-related calculations. I don't have an opinion right > now about what logtape.c ought to use. I do agree that avoiding "long" > altogether would be a good ultimate goal. I assume that we often use "long" in contexts where a signed integer type is required. Maybe this is not true in the case of the work_mem style calculations. But I know that it works that way in logtape.c, where -1 is a sentinel value. We already use int64 (not size_t) in tuplesort.c for roughly the same reason: LACKMEM() needs to work with negative values, to handle certain edge cases. > In the short term though, the question is whether we want to regard this > hashagg issue as something we need a fix for in v13/v14. The fact that > it's Windows-only makes it slightly less pressing in my mind, but it's > still a regression that some people are going to hit. True. I worry about the potential for introducing new bugs on Windows by backpatching a fix for this. Technically this restriction existed in every other work_mem consumer on Windows. Of course this won't matter much to users like Laurent. -- Peter Geoghegan
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 2:28 PM, Peter Geoghegan wrote:
> On Thu, Jul 22, 2021 at 10:11 AM Tom Lane wrote:
> > No, he already tried, upthread. The trouble is that he's on a Windows
> > machine, so get_hash_mem is quasi-artificially constraining the product
> > to 2GB. And he needs it to be a bit more than that. Whether the
> > constraint is hitting at the ngroups stage or it's related to actual
> > memory consumption isn't that relevant.
>
> Somehow I missed that part.
>
> > What I'm wondering about is whether it's worth putting in a solution
> > for this issue in isolation, or whether we ought to embark on the
> > long-ignored project of getting rid of use of "long" for any
> > memory-size-related computations. There would be no chance of
> > back-patching something like the latter into v13, though.
>
> +1. Even if we assume that Windows is a low priority platform, in the
> long run it'll be easier to make it more like every other platform.
>
> The use of "long" is inherently suspect to me. It signals that the
> programmer wants something wider than "int", even if the standard
> doesn't actually require that "long" be wider. This seems to
> contradict what we know to be true for Postgres, which is that in
> general it's unsafe to assume that long is int64. It's not just
> work_mem related calculations. There is also code like logtape.c,
> which uses long for block numbers -- that also exposes us to risk on
> Windows.
>
> By requiring int64 be used instead of long, we don't actually increase
> risk for non-Windows platforms to any significant degree. I'm pretty
> sure that "long" means int64 on non-Windows 64-bit platforms anyway.

I wonder whether similar issues don't arise from this [1] (b/src/backend/optimizer/path/costsize.c): cost_tuplesort uses *long* to store sort_mem_bytes. I suggested switching to int64, but obviously to no avail.

+1 to switch long to int64.

regards,
Ranier Vilela

[1] https://www.postgresql.org/message-id/CAApHDvqhUYHYGmovoGWJQ1%2BZ%2B50Mz%3DPV6bW%3DQYEh3Z%2BwZTufPQ%40mail.gmail.com
Re: Big performance slowdown from 11.2 to 13.3
On 2021-Jul-22, [email protected] wrote: > Yes, agreed Peter... The "lower priority" issue was mentioned, but not > in terms of the applicability of the fix overall. Personally, I would > prefer going the size_t route vs int/long/int64 in C/C++/. Of course, > as a user, I'd love a patch on V13 and something cleaner in V14. Just to clarify our terminology here. "A patch" means any kind of change to the source code, regardless of its cleanliness or applicability to versions deemed stable. You can have one patch which is a ugly hack for a stable version that can't have invasive changes, and another patch which is a clean, more maintainable version of a totally different fix for the version in development. We use the same word, "patch", for both types of source code changes. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Big performance slowdown from 11.2 to 13.3
Just asking; I may be completely wrong. Is this query parallel safe? Can we force parallel workers, by setting a low parallel_setup_cost or otherwise, to make use of Gather and Partial HashAggregate nodes?

I am just assuming that more workers doing things in parallel would require less disk spill per hash aggregate (or partial hash aggregate?), with the Gather at the end.

I did some runs in my demo environment, not with the same query but with some GROUP BY aggregates over around 25M rows, and the results looked reasonable, not too far off. This was pg14 on Ubuntu.
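A sketch of the session settings that suggestion amounts to (illustrative values only; whether the aggregates in the query are parallel safe still has to be checked):

-- Make parallel plans look cheap for this session and allow more workers,
-- then re-run EXPLAIN (ANALYZE, BUFFERS) and look for Gather and
-- Partial HashAggregate nodes in the plan:
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET max_parallel_workers_per_gather = 8;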
RE: Big performance slowdown from 11.2 to 13.3
I am not sure I understand this parameter well enough, but it is currently at its default value of 1000. I have read Robert's post (http://rhaas.blogspot.com/2018/06/using-forceparallelmode-correctly.html) and could play with those parameters, but I am unsure whether what you are describing will unlock this 2GB limit.

From: Vijaykumar Jain
Sent: Thursday, July 22, 2021 16:32
To: [email protected]
Cc: Justin Pryzby ; [email protected]
Subject: Re: Big performance slowdown from 11.2 to 13.3

Just asking; I may be completely wrong. Is this query parallel safe? Can we force parallel workers, by setting a low parallel_setup_cost or otherwise, to make use of Gather and Partial HashAggregate nodes? I am just assuming that more workers doing things in parallel would require less disk spill per hash aggregate (or partial hash aggregate?), with the Gather at the end. I did some runs in my demo environment, not with the same query but with some GROUP BY aggregates over around 25M rows, and the results looked reasonable, not too far off. This was pg14 on Ubuntu.
