Too few rows expected by Planner on partitioned tables

2020-07-21 Thread Julian Wolf
Hello,

A description of what you are trying to achieve and what results you expect:
Our database is growing on a daily basis by about 2.5million rows per table (2 
at the moment). Because of that, we decided to partition the data, especially, 
as we are pre-aggregating the data for weeks, months, quarters and years. Every 
aggregation is stored in a separate partition:

Days: ..._ymMMd  (base data)
Weeks: ..._ymMMw (aggregated all weeks of the month)
month: ..._ymMM (aggregated month)
etc.


Our problem is, that the planner always predicts one row to be returned, 
although only a part of the primary key is queried. This problem exceeds 
feasibility of performance rapidly - a query only involving a few days already 
takes dozens of seconds. All tables are analyzed and pg_stats looks reasonable 
IMHO.


PostgreSQL version number you are running:
postgres=# SELECT version();
 version
--
 PostgreSQL 12.3 (Debian 12.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by 
gcc (Debian 8.3.0-6) 8.3.0, 64-bit
(1 Zeile)

postgres=# SELECT name, current_setting(name), source
postgres-#   FROM pg_settings
postgres-#   WHERE source NOT IN ('default', 'override');
postgres=# SELECT name, current_setting(name), source
  FROM pg_settings
  WHERE source NOT IN ('default', 'override');
  name  | current_setting | 
   source
+-+--
 application_name   | psql| 
client
 checkpoint_completion_target   | 0.9 | 
configuration file
 client_encoding| UTF8| 
client
 cluster_name   | 12/main | 
configuration file
 DateStyle  | ISO, DMY| 
configuration file
 default_text_search_config | pg_catalog.german   | 
configuration file
 dynamic_shared_memory_type | posix   | 
configuration file
 effective_cache_size   | 6GB | 
configuration file
 effective_io_concurrency   | 200 | 
configuration file
 enable_partitionwise_aggregate | on  | 
configuration file
 enable_partitionwise_join  | on  | 
configuration file
 external_pid_file  | /var/run/postgresql/12-main.pid | 
configuration file
 lc_messages| de_DE.UTF-8 | 
configuration file
 lc_monetary| de_DE.UTF-8 | 
configuration file
 lc_numeric | de_DE.UTF-8 | 
configuration file
 lc_time| de_DE.UTF-8 | 
configuration file
 listen_addresses   | *   | 
configuration file
 log_line_prefix| %m [%p] %q%u@%d | 
configuration file
 log_timezone   | Etc/UTC | 
configuration file
 maintenance_work_mem   | 512MB   | 
configuration file
 max_connections| 300 | 
configuration file
 max_parallel_workers   | 2   | 
configuration file
 max_stack_depth| 2MB | 
environment variable
 max_wal_size   | 2GB | 
configuration file
 max_worker_processes   | 2   | 
configuration file
 min_wal_size   | 256MB   | 
configuration file
 port   | 5432| 
configuration file
 random_page_cost   | 1.1 | 
configuration file
 shared_buffers | 2GB | 
configuration file
 ssl| on  | 
configuration file
 ssl_cert_file  | /etc/ssl/certs/ssl-cert-snakeoil.pem| 
configuration file
 ssl_key_file   | /etc/ssl/private/ssl-cert-snakeoil.key  | 
configuration file
 stats_temp_directory   | /var/run/postgresql/12-main.pg_stat_tmp | 
configuration file
 temp_buffers   | 256MB   | 
configuration file
 TimeZone   | Etc/UTC | 
configuration file
 unix_sock

Re: Too few rows expected by Planner on partitioned tables

2020-07-21 Thread Justin Pryzby
On Tue, Jul 21, 2020 at 01:09:22PM +, Julian Wolf wrote:
> Our problem is, that the planner always predicts one row to be returned, 
> although only a part of the primary key is queried. This problem exceeds 
> feasibility of performance rapidly - a query only involving a few days 
> already takes dozens of seconds. All tables are analyzed and pg_stats looks 
> reasonable IMHO.

> daterangedaterange NOT NULL,
> spatial_feature_id   INTEGER,

> Aggregate  (cost=2.79..2.80 rows=1 width=8) (actual time=143.073..143.073 
> rows=1 loops=1)
>   Buffers: shared hit=67334
>   ->  Index Scan using location_statistics_y2019m03w_pkey on 
> location_statistics_y2019m03w st  (cost=0.56..2.78 rows=1 width=8) (actual 
> time=0.026..117.284 rows=516277 loops=1)
> Index Cond: ((daterange = '[2019-03-04,2019-03-11)'::daterange) AND 
> (spatial_feature_id = 12675))
> Buffers: shared hit=67334
> 
> As can be seen, the planner predicts one row to be returned, although it 
> should be around 3% (11% of the entries are of the given ID, which are 
> distributed over 4 weeks = date ranges) of the table. Using the partition 
> table directly, does not change this fact.

Is there a correlation between daterange and spacial_feature_id ?

Are the estimates good if you query on *only* daterange?  spacial_feature_id ?

Maybe what you need is:
https://www.postgresql.org/docs/devel/sql-createstatistics.html
CREATE STATISTICS stats (dependencies) ON daterange, spacial_feature_id FROM 
location_statistics;
ANALYZE location_statistics;

-- 
Justin