Foreign table as partition - Non optimal aggregation plan

2021-09-07 Thread Stepan Yankevych
Hi All!

We are using such feature as Foreign table as partition in PG 13 under CentOS
Here is our table
CREATE TABLE dwh.l1_snapshot (
l1_snapshot_id int8 NOT NULL DEFAULT 
nextval('sq_l1_snapshot_id'::regclass),
start_date_id int4 NULL,
...
...
...
dataset_id int4 NULL, -- ETL needs
transaction_time timestamp NULL
)
PARTITION BY RANGE (start_date_id);


We have several partitions locally and one partition for storing historical 
data as foreign table which is stored on another PG13
When I run following query . Partition pruning redirect query to that foreign 
table
select count(1) from dwh.l1_snapshot ls where start_date_id  = 20201109;
I see remote SQL as following

SELECT NULL FROM dwh.l1_snapshot_tail2 WHERE ((start_date_id = 20201109)).
It transfers vie network hundred million records in our case

When I query directly partition (almost the same what partition pruning does) I 
see another remote sql

select count(1) from partitions.l1_snapshot_tail2 ls where start_date_id  = 
20201109;

And remote sql is
SELECT count(1) FROM dwh.l1_snapshot_tail2 WHERE ((start_date_id = 20201109));

So in case querying foreign table we see aggregation is propagated to remote 
host (Like driving_site in oracle)
But in the first case with partition pruning the aggregation is not propagated 
to remote host.
And of course different performance 22 sec vs 75sec


That would great to have the same behavior in both cases (pushing aggregation 
to remote side).
It should be possible at least for simple aggregation (without distinct etc)


Thanks!
Stepan Yankevych

Office: +380 322 424 642xx58840  Cell: 
+380 96 915 9551  Email: 
[email protected]<mailto:[email protected]>
Lviv,  Ukraine  epam.com<http://www.epam.com>


CONFIDENTIALITY CAUTION AND DISCLAIMER
This message is intended only for the use of the individual(s) or entity(ies) 
to which it is addressed and contains information that is legally privileged 
and confidential. If you are not the intended recipient, or the person 
responsible for delivering the message to the intended recipient, you are 
hereby notified that any dissemination, distribution or copying of this 
communication is strictly prohibited. All unintended recipients are obliged to 
delete this message and destroy any printed copies.



Re: Postgresql 14/15/16/17 partition pruning on dependent table during join

2024-11-04 Thread Stepan Yankevych
Let's classify it as possible improvement / new feature for further releases.
Optimizer definitely should be able to add that extra (redundant) condition  
and e.exec_date_id >= 20241021
or even transform e.exec_date_id >= co.create_date_id
to  e.exec_date_id >= 20241021



Stepan Yankevych



From: Andrei Lepikhov 
Sent: Sunday, November 3, 2024 4:42 AM
To: Vijaykumar Jain ; Stepan Yankevych 

Cc: [email protected] 

Subject: Re: Postgresql 14/15/16/17 partition pruning on dependent table during 
join

On 3/11/2024 03:21, Vijaykumar Jain wrote:
> On Fri, 1 Nov 2024 at 18:51, Stepan Yankevych  
> wrote:
>>
>> Partition pruning is not pushing predicate into dependent table during join 
>> in some cases.
>> See example. Predicate highlighted in red
>>
>
> i think your observation is correct.
> you may need to provide redundant predicates for join both tables to
> prune partition (as below).
>
> there is explanation on how dynamic pruning works for some cases, but
> idk which part satisfies this case.
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fdocs%2Fcurrent%2Fddl-partitioning.html%23DDL-PARTITION-PRUNING&data=05%7C02%7CStepan_Yankevych%40epam.com%7Cb0119e5e3c5e47a7dd5f08dcfbb13be0%7Cb41b72d04e9f4c268a69f949f367c91d%7C1%7C0%7C638661985836678039%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=xqVRyWW11KoN0qFb%2FZsTO%2FjijULLW84NSW8lURa5UzY%3D&reserved=0<https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITION-PRUNING>
>
> explain select *
> from public.orders co
> left join public.execution e on e.order_id = co.order_id and
> e.exec_date_id >= co.create_date_id
> where co.order_text in ('Order 5259 - F968FDC8')
> and co.create_date_id = 20241021
> and e.exec_date_id >= 20241021; -- this is redundant but without this
> pruning does not work.
>
> i can be corrected and would be great if someone explains with more
> detail which i cannot due to lack of understanding of dynamic pruning.
I guess you think that Postgres should create an additional clause on
the 'e.exec_date_id from' the chain of:

'co.create_date_id = 20241021 and e.exec_date_id >= co.create_date_id'

but Postgres doesn't have such a functionality yet. It can deduce
clauses from equivalence clauses only. For example, having 'x=1 AND
x=y', Postgres can build a new clause 'y=1'. But it doesn't work for
inequalities [1].
So, to perform partition pruning on the table 'e', you need to add this
redundant clause.

[1]
https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2Fflat%2FCAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%2540mail.gmail.com&data=05%7C02%7CStepan_Yankevych%40epam.com%7Cb0119e5e3c5e47a7dd5f08dcfbb13be0%7Cb41b72d04e9f4c268a69f949f367c91d%7C1%7C0%7C638661985836699390%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=P3TVf%2FVm2s48xqB00DBaO0LQAlq4%2BGdcXbtpEU0XNi4%3D&reserved=0<https://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com>

--
regards, Andrei Lepikhov



Postgresql 14/15/16/17 partition pruning on dependent table during join

2024-11-01 Thread Stepan Yankevych
lic.orders_20241029 PARTITION OF public.orders  FOR VALUES FROM 
(20241029) TO (20241030);
CREATE TABLE public.orders_20241030 PARTITION OF public.orders  FOR VALUES FROM 
(20241030) TO (20241031);
CREATE TABLE public.orders_20241031 PARTITION OF public.orders  FOR VALUES FROM 
(20241031) TO (20241101);


CREATE TABLE public.execution (
exec_id int8 NOT NULL,
order_id int8 NOT NULL,
exec_date_id int4 NOT NULL,
exec_time timestamp(6) NOT NULL,
   qty int8 NULL,
price numeric NULL
)
PARTITION BY RANGE (exec_date_id);
CREATE INDEX execution_exec_date_id_order_id_idx ON ONLY public.execution USING 
btree (exec_date_id, order_id);
CREATE unique INDEX execution_pk ON ONLY public.execution USING btree (exec_id, 
exec_date_id);

CREATE TABLE public.execution_20241001 PARTITION OF public.execution  FOR 
VALUES FROM (20241001) TO (20241002);
CREATE TABLE public.execution_20241002 PARTITION OF public.execution  FOR 
VALUES FROM (20241002) TO (20241003);
CREATE TABLE public.execution_20241003 PARTITION OF public.execution  FOR 
VALUES FROM (20241003) TO (20241004);
CREATE TABLE public.execution_20241004 PARTITION OF public.execution  FOR 
VALUES FROM (20241004) TO (20241007);
CREATE TABLE public.execution_20241007 PARTITION OF public.execution  FOR 
VALUES FROM (20241007) TO (20241008);
CREATE TABLE public.execution_20241008 PARTITION OF public.execution  FOR 
VALUES FROM (20241008) TO (20241009);
CREATE TABLE public.execution_20241009 PARTITION OF public.execution  FOR 
VALUES FROM (20241009) TO (20241010);
CREATE TABLE public.execution_20241010 PARTITION OF public.execution  FOR 
VALUES FROM (20241010) TO (20241011);
CREATE TABLE public.execution_20241011 PARTITION OF public.execution  FOR 
VALUES FROM (20241011) TO (20241014);
CREATE TABLE public.execution_20241014 PARTITION OF public.execution  FOR 
VALUES FROM (20241014) TO (20241015);
CREATE TABLE public.execution_20241015 PARTITION OF public.execution  FOR 
VALUES FROM (20241015) TO (20241016);
CREATE TABLE public.execution_20241016 PARTITION OF public.execution  FOR 
VALUES FROM (20241016) TO (20241017);
CREATE TABLE public.execution_20241017 PARTITION OF public.execution  FOR 
VALUES FROM (20241017) TO (20241018);
CREATE TABLE public.execution_20241018 PARTITION OF public.execution  FOR 
VALUES FROM (20241018) TO (20241021);
CREATE TABLE public.execution_20241021 PARTITION OF public.execution  FOR 
VALUES FROM (20241021) TO (20241022);
CREATE TABLE public.execution_20241022 PARTITION OF public.execution  FOR 
VALUES FROM (20241022) TO (20241023);
CREATE TABLE public.execution_20241023 PARTITION OF public.execution  FOR 
VALUES FROM (20241023) TO (20241024);
CREATE TABLE public.execution_20241024 PARTITION OF public.execution  FOR 
VALUES FROM (20241024) TO (20241025);
CREATE TABLE public.execution_20241025 PARTITION OF public.execution  FOR 
VALUES FROM (20241025) TO (20241028);
CREATE TABLE public.execution_20241028 PARTITION OF public.execution  FOR 
VALUES FROM (20241028) TO (20241029);
CREATE TABLE public.execution_20241029 PARTITION OF public.execution  FOR 
VALUES FROM (20241029) TO (20241030);
CREATE TABLE public.execution_20241030 PARTITION OF public.execution  FOR 
VALUES FROM (20241030) TO (20241031);
CREATE TABLE public.execution_20241031 PARTITION OF public.execution  FOR 
VALUES FROM (20241031) TO (20241101);

-- generate data
INSERT INTO public.orders (order_id, instrument_id, account_id, order_text, 
create_date_id)
SELECT
   s,  -- order_id as a sequential number starting from 1
   (RANDOM() * 100)::int,  -- Random instrument_id
   (RANDOM() * 1)::int,  -- Random account_id
   'Order ' || s || ' - ' || upper(left( md5(s::text),  floor(random() * 
10)::int)) AS order_text,
   20241001 + (s % 31)  -- create_date_id cycling between 20241001 and 20241031
FROM generate_series(1, 100) s
where (20241001 + (s % 31))::int not in (20241005, 20241006, 20241012, 
20241013, 20241019, 20241020, 20241026, 20241027 );


INSERT INTO public.execution (exec_id, order_id, exec_date_id, exec_time, qty, 
price)
WITH ordered_data AS (  SELECT order_id,
   create_date_id
   from  public.orders),
execution_data AS (  SELECT
   order_id,
   create_date_id,
   generate_series(1, (RANDOM() * 3 + 1)::int) AS exec_num  -- Generates 
between 1 and 3 executions per order
   FROM
   ordered_data),
exec_ids AS ( SELECT
   ROW_NUMBER() OVER (ORDER BY order_id, exec_num) AS exec_id,  -- 
Generates a unique exec_id
   order_id,
   create_date_id,
   exec_num
   FROM
   execution_data)
SELECT
   exec_id,
   order_id,
   CASE
   WHEN exec_num = 1 THEN create_date_id
   ELSE create_date_id + (RANDOM() * 10)::int  -- Randomly adds up to 10 
days to the original date for additional records
   END AS exec_date_id,
   clock_timestamp() - (RANDOM() * INTERVAL '10 days') AS exec_time,  -- Random 
timestamp within the last 10 days
   (RANDOM() * 1000)::int AS qty,  -- Random quantity
   (RANDOM() * 100)::numeric(10,2) AS price  -- Random price
FROM
   exec_ids;



Stepan Yankevych