Re: BUG #16968: Planner does not recognize optimization
On Sun, 16 May 2021 at 02:34, Eugen Konkov wrote: > I found a case when `not assigning a ressortgroupref to the whole-row var` > cause > wrong window function calculations. > > I use same query. The difference come when I wrap my query into > function. (see full queries in attachment) > > 1. > SELECT * > FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) ) > WHERE agreement_id = 161::int AND (o).period_id = 10::int > > 2. > SELECT * > sum( ) over wagreement > FROM > WHERE agreement_id = 161::int AND (o).period_id = 10::int > WINDOW wagreement AS ( PARTITION BY agreement_id ) > > For first query window function calculates SUM over all agreements, > then some are filtered out by (o).period_id condition. This is unrelated to the optimisation that you were asking about before. All that's going on here is that WHERE is evaluated before SELECT. This means that your filtering is done before the window functions are executed. This is noted in the docs in [1]: > The rows considered by a window function are those of the “virtual table” > produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and > HAVING clauses if any. For example, a row removed because it does not meet > the WHERE condition is not seen by any window function. A query can contain > multiple window functions that slice up the data in different ways using > different OVER clauses, but they all act on the same collection of rows > defined by this virtual table. If you want to filter rows after the window functions are evaluated then you'll likely want to use a subquery. David [1] https://www.postgresql.org/docs/13/tutorial-window.html
Re: BUG #16968: Planner does not recognize optimization
On Sat, 15 May 2021 at 00:39, KES wrote: > > Thank you for detailed explanation. I glad to hear that I can use aliases and > this will be recognized and optimization is applied. > > >We'd need some sort of ability to assign ressortgroupref to a particular > >column within a > whole-row var > Could it be possible to create hidden alias in same way as I did that > manually? > > Algorithm seems not complex: > 1. User refer column from composite type/whole-row: (o).agreement_id > 2. Create hidden column at select: _o_agreement_id > 3. Replace other references to (o).agreement_id by _o_agreement_id > 4. Process query as usual after replacements Internally Postgresql does use a hidden column for columns that are required for calculations which are not in the SELECT list. e.g ones that are in the GROUP BY / ORDER BY, or in your case a window function's PARTITION BY. We call these "resjunk" columns. The problem is you can't reference those from the parent query. If you explicitly had listed that column in the SELECT clause, it won't cost you anything more since the planner will add it regardless and just hide it from you. When you add it yourself you'll be able to use it in the subquery and you'll be able to filter out the partitions that you don't want. I really think you're driving yourself down a difficult path by expecting queries with whole-row vars to be optimised just as well as using select * or explicitly listing the columns. David
Re: BUG #16968: Planner does not recognize optimization
Hello David, Saturday, May 15, 2021, 5:52:47 PM, you wrote: > On Sun, 16 May 2021 at 02:34, Eugen Konkov wrote: >> I found a case when `not assigning a ressortgroupref to the whole-row var` >> cause >> wrong window function calculations. >> >> I use same query. The difference come when I wrap my query into >> function. (see full queries in attachment) >> >> 1. >> SELECT * >> FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) ) >> WHERE agreement_id = 161::int AND (o).period_id = 10::int >> >> 2. >> SELECT * >> sum( ) over wagreement >> FROM >> WHERE agreement_id = 161::int AND (o).period_id = 10::int >> WINDOW wagreement AS ( PARTITION BY agreement_id ) >> >> For first query window function calculates SUM over all agreements, >> then some are filtered out by (o).period_id condition. > This is unrelated to the optimisation that you were asking about before. > All that's going on here is that WHERE is evaluated before SELECT. > This means that your filtering is done before the window functions are > executed. This is noted in the docs in [1]: >> The rows considered by a window function are those of the “virtual table” >> produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and >> HAVING clauses if any. For example, a row removed because it does not meet >> the WHERE condition is not seen by any window function. A query can contain >> multiple window functions that slice up the data in different ways using >> different OVER clauses, but they all act on the same collection of rows >> defined by this virtual table. > If you want to filter rows after the window functions are evaluated > then you'll likely want to use a subquery. > David > [1] https://www.postgresql.org/docs/13/tutorial-window.html Sorry, I miss that WHERE works first and after it window function. >This is unrelated to the optimisation that you were asking about before. So, yes, unrelated. Thank you for your answers. -- Best regards, Eugen Konkov
Re: BUG #16968: Planner does not recognize optimization
Hello David, > I really think you're driving yourself down a difficult path by > expecting queries with whole-row vars to be optimised just as well as > using select * or explicitly listing the columns. Yes, I was expect that. I use whole-row because do not want repeat all 10+ columns at select. I do not use (row1).*, (row2).*, because rows could have same columns. eg: row1.name, row2.name both will be named as 'name' and then I can not distinguish them. So I select whole-row and put myself into problems (( It would be nice if (row1).** will be expanded to: row1_id, row1_name etc. But this is other question which I already ask at different thread. Saturday, May 15, 2021, 5:59:41 PM, you wrote: > On Sat, 15 May 2021 at 00:39, KES wrote: >> >> Thank you for detailed explanation. I glad to hear that I can use aliases >> and this will be recognized and optimization is applied. >> >> >We'd need some sort of ability to assign ressortgroupref to a particular >> >column within a >> whole-row var >> Could it be possible to create hidden alias in same way as I did that >> manually? >> >> Algorithm seems not complex: >> 1. User refer column from composite type/whole-row: (o).agreement_id >> 2. Create hidden column at select: _o_agreement_id >> 3. Replace other references to (o).agreement_id by _o_agreement_id >> 4. Process query as usual after replacements > Internally Postgresql does use a hidden column for columns that are > required for calculations which are not in the SELECT list. e.g ones > that are in the GROUP BY / ORDER BY, or in your case a window > function's PARTITION BY. We call these "resjunk" columns. The problem > is you can't reference those from the parent query. If you explicitly > had listed that column in the SELECT clause, it won't cost you > anything more since the planner will add it regardless and just hide > it from you. When you add it yourself you'll be able to use it in the > subquery and you'll be able to filter out the partitions that you > don't want. > I really think you're driving yourself down a difficult path by > expecting queries with whole-row vars to be optimised just as well as > using select * or explicitly listing the columns. > David -- Best regards, Eugen Konkov
Index and statistics not used
Good day I'm struggling with a Postgres 13 performance issue and nothing I do seem to help. I have two tables with one having a foreign key to the other. It happens to be for this one client the foreign key is always null, so no violation would be possible. When deleting from the one table the foreign key trigger for constraint takes 20 seconds to run. The tables look as follows table1 idbigint pkey value number table2 (55 mil entries) idbigint pkey table1_id bigint (fkey to table1 id) value number Running delete from table1 where id = 48938 the trigger for constraint runs for 20 seconds Event when doing a simple select from table2 where table1_id = 48938 takes about 8 seconds I've tried the following, but nothing seems to change the outcome: CREATE INDEX table2_idx ON table2(table1_id); CREATE INDEX table2_idx2 ON table2(table1_id) WHERE table1_id IS NOT NULL; CREATE INDEX table2_idx3 ON table2(table1_id) INCLUDE (id) WHERE table1_id IS NOT NULL; alter table table2 alter column table1_id set statistics 1; None of these steps changes the planner and it would continue to do table scans. Regards Riaan
Re: Index and statistics not used
On Tue, 18 May 2021 at 08:42, wrote: > Running delete from table1 where id = 48938 the trigger for constraint runs > for 20 seconds > > Event when doing a simple select from table2 where table1_id = 48938 takes > about 8 seconds Does EXPLAIN show it uses a seq scan for this 8-second SELECT? If so, does it use the index if you SET enable_seqscan TO off; ? If so, how do the costs compare to the seqscan costs? Is random_page_cost set to something sane? Are all the indexes valid? (psql's \d table2 would show you INVALID if they're not.) does: SHOW enable_indexscan; show that index scanning is switched on? David
