date:20210517

Re: BUG #16968: Planner does not recognize optimization

2021-05-17 Thread David Rowley

On Sun, 16 May 2021 at 02:34, Eugen Konkov  wrote:
> I found a case when `not assigning a ressortgroupref to the whole-row var` 
> cause
> wrong window function calculations.
>
> I  use  same  query.  The  difference  come  when I wrap my query into
> function. (see full queries in attachment)
>
> 1.
> SELECT *
> FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) )
> WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
>
> 2.
> SELECT *
>   sum(  ) over wagreement
> FROM 
> WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
> WINDOW wagreement AS ( PARTITION BY agreement_id )
>
> For  first  query  window function calculates SUM over all agreements,
> then some are filtered out by (o).period_id condition.

This is unrelated to the optimisation that you were asking about before.

All that's going on here is that WHERE is evaluated before SELECT.
This means that your filtering is done before the window functions are
executed.  This is noted in the docs in [1]:

> The rows considered by a window function are those of the “virtual table” 
> produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and 
> HAVING clauses if any. For example, a row removed because it does not meet 
> the WHERE condition is not seen by any window function. A query can contain 
> multiple window functions that slice up the data in different ways using 
> different OVER clauses, but they all act on the same collection of rows 
> defined by this virtual table.

If you want to filter rows after the window functions are evaluated
then you'll likely want to use a subquery.

David

[1] https://www.postgresql.org/docs/13/tutorial-window.html

Re: BUG #16968: Planner does not recognize optimization

2021-05-17 Thread David Rowley

On Sat, 15 May 2021 at 00:39, KES  wrote:
>
> Thank you for detailed explanation. I glad to hear that I can use aliases and 
> this will be recognized and optimization is applied.
>
> >We'd need some sort of ability to assign ressortgroupref to a particular 
> >column within a
> whole-row var
> Could it be possible to create hidden alias in same way as I did that 
> manually?
>
> Algorithm seems not complex:
> 1. User refer column from composite type/whole-row: (o).agreement_id
> 2. Create hidden column at select: _o_agreement_id
> 3. Replace other references to (o).agreement_id by _o_agreement_id
> 4. Process query as usual after replacements

Internally Postgresql does use a hidden column for columns that are
required for calculations which are not in the SELECT list. e.g ones
that are in the GROUP BY / ORDER BY, or in your case a window
function's PARTITION BY. We call these "resjunk" columns.  The problem
is you can't reference those from the parent query. If you explicitly
had listed that column in the SELECT clause, it won't cost you
anything more since the planner will add it regardless and just hide
it from you.  When you add it yourself you'll be able to use it in the
subquery and you'll be able to filter out the partitions that you
don't want.

I really think you're driving yourself down a difficult path by
expecting queries with whole-row vars to be optimised just as well as
using select * or explicitly listing the columns.

David

Re: BUG #16968: Planner does not recognize optimization

2021-05-17 Thread Eugen Konkov

Hello David,

Saturday, May 15, 2021, 5:52:47 PM, you wrote:

> On Sun, 16 May 2021 at 02:34, Eugen Konkov  wrote:
>> I found a case when `not assigning a ressortgroupref to the whole-row var` 
>> cause
>> wrong window function calculations.
>>
>> I  use  same  query.  The  difference  come  when I wrap my query into
>> function. (see full queries in attachment)
>>
>> 1.
>> SELECT *
>> FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) )
>> WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
>>
>> 2.
>> SELECT *
>>   sum(  ) over wagreement
>> FROM 
>> WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
>> WINDOW wagreement AS ( PARTITION BY agreement_id )
>>
>> For  first  query  window function calculates SUM over all agreements,
>> then some are filtered out by (o).period_id condition.

> This is unrelated to the optimisation that you were asking about before.

> All that's going on here is that WHERE is evaluated before SELECT.
> This means that your filtering is done before the window functions are
> executed.  This is noted in the docs in [1]:

>> The rows considered by a window function are those of the “virtual table” 
>> produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and 
>> HAVING clauses if any. For example, a row removed because it does not meet 
>> the WHERE condition is not seen by any window function. A query can contain 
>> multiple window functions that slice up the data in different ways using 
>> different OVER clauses, but they all act on the same collection of rows 
>> defined by this virtual table.

> If you want to filter rows after the window functions are evaluated
> then you'll likely want to use a subquery.

> David

> [1] https://www.postgresql.org/docs/13/tutorial-window.html


Sorry,  I  miss  that WHERE works first and after it
window function.

>This is unrelated to the optimisation that you were asking about before.
So, yes,  unrelated.

Thank you for your answers.

-- 
Best regards,
Eugen Konkov

Re: BUG #16968: Planner does not recognize optimization

2021-05-17 Thread Eugen Konkov

Hello David,

> I really think you're driving yourself down a difficult path by
> expecting queries with whole-row vars to be optimised just as well as
> using select * or explicitly listing the columns.
Yes, I was expect that. I use whole-row because do not want repeat all
10+  columns  at select. I do not use (row1).*, (row2).*, because rows
could   have   same   columns.  eg:  row1.name, row2.name both will be
named as 'name' and then I can not distinguish them.
So I select whole-row and put myself into problems ((

It  would be nice if (row1).** will be expanded to: row1_id, row1_name
etc.  But this is other question which I already ask at different
thread.



Saturday, May 15, 2021, 5:59:41 PM, you wrote:

> On Sat, 15 May 2021 at 00:39, KES  wrote:
>>
>> Thank you for detailed explanation. I glad to hear that I can use aliases 
>> and this will be recognized and optimization is applied.
>>
>> >We'd need some sort of ability to assign ressortgroupref to a particular 
>> >column within a
>> whole-row var
>> Could it be possible to create hidden alias in same way as I did that 
>> manually?
>>
>> Algorithm seems not complex:
>> 1. User refer column from composite type/whole-row: (o).agreement_id
>> 2. Create hidden column at select: _o_agreement_id
>> 3. Replace other references to (o).agreement_id by _o_agreement_id
>> 4. Process query as usual after replacements

> Internally Postgresql does use a hidden column for columns that are
> required for calculations which are not in the SELECT list. e.g ones
> that are in the GROUP BY / ORDER BY, or in your case a window
> function's PARTITION BY. We call these "resjunk" columns.  The problem
> is you can't reference those from the parent query. If you explicitly
> had listed that column in the SELECT clause, it won't cost you
> anything more since the planner will add it regardless and just hide
> it from you.  When you add it yourself you'll be able to use it in the
> subquery and you'll be able to filter out the partitions that you
> don't want.

> I really think you're driving yourself down a difficult path by
> expecting queries with whole-row vars to be optimised just as well as
> using select * or explicitly listing the columns.

> David



-- 
Best regards,
Eugen Konkov

Index and statistics not used

2021-05-17 Thread rstander

Good day

 

I'm struggling with a Postgres 13 performance issue and nothing I do seem to
help.

 

I have two tables with one having a foreign key to the other. It happens to
be for this one client the foreign key is always null, so no violation would
be possible. When deleting from the one table the foreign key trigger for
constraint takes 20 seconds to run.

 

The tables look as follows

 

table1

idbigint pkey

value number

 

table2 (55 mil entries)

idbigint pkey

table1_id bigint (fkey to table1 id)

value number

 

Running delete from table1 where id = 48938 the trigger for constraint runs
for 20 seconds

 

Event when doing a simple select from table2 where table1_id = 48938 takes
about 8 seconds

 

I've tried the following, but nothing seems to change the outcome:

CREATE INDEX table2_idx ON table2(table1_id);

CREATE INDEX table2_idx2 ON table2(table1_id) WHERE table1_id IS NOT NULL;

CREATE INDEX table2_idx3 ON table2(table1_id) INCLUDE (id) WHERE table1_id
IS NOT NULL;

 

alter table table2 alter column table1_id set statistics 1;

 

None of these steps changes the planner and it would continue to do table
scans.

 

Regards

Riaan

Re: Index and statistics not used

2021-05-17 Thread David Rowley

On Tue, 18 May 2021 at 08:42,  wrote:
> Running delete from table1 where id = 48938 the trigger for constraint runs 
> for 20 seconds
>
> Event when doing a simple select from table2 where table1_id = 48938 takes 
> about 8 seconds

Does EXPLAIN show it uses a seq scan for this 8-second SELECT?

If so, does it use the index if you SET enable_seqscan TO off; ? If
so, how do the costs compare to the seqscan costs?

Is random_page_cost set to something sane? Are all the indexes valid?
(psql's \d table2 would show you INVALID if they're not.)

does: SHOW enable_indexscan; show that index scanning is switched on?

David

Re: BUG #16968: Planner does not recognize optimization

Re: BUG #16968: Planner does not recognize optimization

Re: BUG #16968: Planner does not recognize optimization

Re: BUG #16968: Planner does not recognize optimization

Index and statistics not used

Re: Index and statistics not used

6 matches

Site Navigation

Mail list logo

Footer information