Copyright vs Licence

2021-05-10 Thread Vijaykumar Jain
Hi All,

I have been playing around with the pg_auto_failover extension by Citus and
have really enjoyed wreaking chaos with it:
citusdata/pg_auto_failover: Postgres extension and service for automated
failover and high-availability (github.com)


Now I see this at the bottom of this extension.
This may be a stupid question, but I ask because I have worked with OSS that
has been marked EOL or dead.
Some software has started asking for a fee (like Oracle for supported Java).
Some software that was completely open source with unlimited usage (like
Sensu) now has a new version with limited/capped free usage.
Or the Google vs Oracle case.

I know I can make a city of PostgreSQL clusters of various sharded
architectures, and it will still be free, and PostgreSQL is not responsible
for any damage etc., I understand. But can the extensions later start
charging on a usage model?



Copyright (c) Microsoft Corporation. All rights reserved.
This project is licensed under the PostgreSQL License, see LICENSE file for details.

I have a lame query (but I have a concern w.r.t. how the licenses of
Oracle-acquired products changed).

What is the role of copyright in a license? (I am not sure if I am even
framing the question correctly, but let me know if I am not.)
Can I be charged, for whatever reason, in the future for using this
extension?

-- 
Thanks,
Vijay
Mumbai, India


Re: Copyright vs Licence

2021-05-10 Thread Ron

On 5/10/21 4:34 AM, Vijaykumar Jain wrote:

Hi All,

I have been playing around with the pg_auto_failover extension by citus 
and have really enjoyed playing chaos with it.
citusdata/pg_auto_failover: Postgres extension and service for automated 
failover and high-availability (github.com) 



Now I see this at the bottom of this extension.
This may be a stupid question, but i ask coz i have worked with OSS that 
been marked EOL or dead.

Some software have started asking for fee (like oracle for supported java)
Some software which were completely open sourced for unlimited usage (like 
sensu) now have a new version which has limited/capped free usage.

Or the Google vs Oracle case.

I know I can make a city of postgresql clusters of various sharded 
architectures, and it will still be free and postgresql is not responsible 
for any damage etc i understand, but can the extensions later charge on 
usage model.


Copyright (c) Microsoft Corporation. All rights reserved.
This project is licensed under the PostgreSQL License, see LICENSE file
for details.
I have a lame query (but i have a concern wrt how oracle acquired products 
licenses changed)


What is the role of copyright in a license. (I am not sure if i am even 
framing the question correctly, but let me know if i am not).


Copyright establishes who wrote the software, and thus prevents others from 
copying it (substantively) verbatim and then claiming it as their own.  Only 
the copyright owner can license the software for someone else to use.



Can I be charged for whatever reasons in the future for using this extension.


If MSFT is the sole holder of the copyright, then they can relicense it as 
they see fit. *I think* that they can only change the license on *newer* 
versions, so you'd be able to keep using the last OSS-licensed version.


Of course, IANAL so take what I write with a spoonful of salt.

--
Angular momentum makes the world go 'round.


force partition pruning

2021-05-10 Thread Niels Jespersen
Hi all

I need a little advice on how to force partition pruning.

Postgres 13.2

A metadata query pulls partition keys:

select m.period_version from register.register_metadata m where 
m.current_version and m.period between '201712' and '201912' ;

A query using these in an in-list easily makes the planner do partition pruning.

select * from register.register d where d.period_version in ('201712_1', 
'201812_1', '201912_1');

However, combining the metadata query into the data query makes the planner
decide to scan all partitions.

select * from register.register d where d.period_version in (select 
m.period_version from register.register_metadata m where m.current_version and 
m.period between '201712' and '201912');

I am quite aware that the latter query requires partition pruning to take place
during execution, not during planning.

My question here is: how do I package the two-step process into an interface
that analysts can actually use?

One possibility is to have a prepare step that creates a temporary view with
the hard-coded values built in, and then query the data via the temp view. This
works OK, but there is an issue with possible naming conflicts on the temp view
(not that this could not be worked around).

Ideally I would like a function to figure out the query and then return the
data from that dynamically executed query. Complicating matters is the fact
that there is more than one set of data/metadata tables, and each data table
has a different set of columns. This rules out a table-returning function,
since that must list the columns present.
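
One workaround along those lines, sketched below, is a function that returns a
refcursor: a cursor does not fix the output columns, so one function can serve
every data/metadata table pair. This is only a sketch under assumed names (the
function open_current_data and its parameters are illustrative, not from the
thread):

CREATE OR REPLACE FUNCTION register.open_current_data(
    p_table regclass,   -- data table, e.g. 'register.register'
    p_meta  regclass,   -- its matching metadata table
    p_from  text,
    p_to    text)
RETURNS refcursor
LANGUAGE plpgsql AS
$$
DECLARE
    keys text;
    cur  refcursor := 'current_data';
BEGIN
    -- Step 1: resolve the partition keys into literal values ...
    EXECUTE format(
        $q$SELECT string_agg(quote_literal(m.period_version), ',')
             FROM %s m
            WHERE m.current_version AND m.period BETWEEN %L AND %L$q$,
        p_meta, p_from, p_to)
    INTO keys;

    -- Step 2: ... so the data query is planned with a literal IN-list,
    -- letting the planner prune partitions at plan time.
    OPEN cur FOR EXECUTE format(
        'SELECT * FROM %s d WHERE d.period_version IN (%s)',
        p_table, keys);
    RETURN cur;
END;
$$;

Analysts would then run, inside one transaction:

SELECT register.open_current_data('register.register',
    'register.register_metadata', '201712', '201912');
FETCH ALL FROM current_data;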


Best regards


Niels Jespersen
Chief Adviser
IT Center

Mobile phone:+45 42 42 93 73
Email: n...@dst.dk

Statistics Denmark, Sejrøgade 11, DK-2100 Copenhagen
www.dst.dk/en







Re: Copyright vs Licence

2021-05-10 Thread cen


If MSFT is the sole holder of the copyright, then they can relicense 
it as they see fit. *I think* that they can only change the license on 
*newer* versions, so you'd be able to keep using the latest OSS version.


That is correct. If I get version 1 of your program under license A, 
you can't come back a year later and tell me the same code is now 
licensed under B and take away the original freedoms of the license.


What can and does happen is that a new version is released under a 
different license while the old version is made obsolete. In the real 
world that means you are stuck with the old version, so eventually you 
either upgrade to a newer version under the different license or use 
something else. Real-world cases are the MongoDB and Redis modules 
license changes.




Re: force partition pruning

2021-05-10 Thread Vijaykumar Jain
I am not sure how to put this in words, but see below what happens when
the predicate is explicitly applied to the partitioned table.

postgres=# \d+ prt1
   Partitioned table "public.prt1"
 Column |   Type| Collation | Nullable | Default | Storage  |
Stats target | Description
+---+---+--+-+--+--+-
 a  | integer   |   | not null | | plain|
|
 b  | integer   |   |  | | plain|
|
 c  | character varying |   |  | | extended |
|
Partition key: RANGE (a)
Partitions: prt1_p1 FOR VALUES FROM (0) TO (250),
prt1_p2 FOR VALUES FROM (250) TO (500),
prt1_p3 FOR VALUES FROM (500) TO (600)

postgres=# \d+ b
 Table "public.b"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target
| Description
+-+---+--+-+-+--+-
 id | integer |   | not null | | plain   |
|
Indexes:
"b_id_idx" btree (id)
Access method: heap

postgres=# table b;
 id
-
 200
 400
(2 rows)

-- Basically, if the table is joined and the predicate can be applied to the
outer table whose constraints match, partition pruning takes place.

I do not know the theory, or even whether what I did is correct, but just FYI.

postgres=# explain analyze  select prt1.* from prt1 where a in ( select id
from b where id in (1, 100, 200) );
   QUERY PLAN
-
 Hash Semi Join  (cost=1.05..9.36 rows=2 width=13) (actual
time=0.034..0.074 rows=1 loops=1)
   Hash Cond: (prt1.a = b.id)
   ->  Append  (cost=0.00..7.50 rows=300 width=13) (actual
time=0.006..0.043 rows=300 loops=1)
 ->  Seq Scan on prt1_p1 prt1_1  (cost=0.00..2.25 rows=125
width=13) (actual time=0.005..0.013 rows=125 loops=1)
 ->  Seq Scan on prt1_p2 prt1_2  (cost=0.00..2.25 rows=125
width=13) (actual time=0.003..0.009 rows=125 loops=1)
 ->  Seq Scan on prt1_p3 prt1_3  (cost=0.00..1.50 rows=50 width=13)
(actual time=0.002..0.004 rows=50 loops=1)
   ->  Hash  (cost=1.03..1.03 rows=2 width=4) (actual time=0.005..0.005
rows=1 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on b  (cost=0.00..1.03 rows=2 width=4) (actual
time=0.003..0.003 rows=1 loops=1)
   Filter: (id = ANY ('{1,100,200}'::integer[]))
   Rows Removed by Filter: 1
 Planning Time: 0.181 ms
 Execution Time: 0.089 ms
(13 rows)

postgres=# explain analyze  select prt1.* from prt1 where a in ( select id
from b where b.id = prt1.a) and a in (1, 100, 200);
QUERY PLAN
---
 Index Scan using iprt1_p1_a on prt1_p1 prt1  (cost=0.14..14.03 rows=2
width=13) (actual time=0.024..0.025 rows=1 loops=1)
   Index Cond: (a = ANY ('{1,100,200}'::integer[]))
   Filter: (SubPlan 1)
   Rows Removed by Filter: 1
   SubPlan 1
 ->  Seq Scan on b  (cost=0.00..1.02 rows=1 width=4) (actual
time=0.003..0.003 rows=0 loops=2)
   Filter: (id = prt1.a)
   Rows Removed by Filter: 1
 Planning Time: 0.120 ms
 Execution Time: 0.041 ms
(10 rows)

postgres=# explain analyze  select prt1.* from prt1 where exists ( select 1
from b where b.id = prt1.a) and a in (1, 100, 200);
 QUERY PLAN
-
 Hash Semi Join  (cost=1.04..3.79 rows=2 width=13) (actual
time=0.024..0.028 rows=1 loops=1)
   Hash Cond: (prt1.a = b.id)
   ->  Seq Scan on prt1_p1 prt1  (cost=0.00..2.72 rows=3 width=13) (actual
time=0.011..0.017 rows=2 loops=1)
 Filter: (a = ANY ('{1,100,200}'::integer[]))
 Rows Removed by Filter: 123
   ->  Hash  (cost=1.02..1.02 rows=2 width=4) (actual time=0.004..0.004
rows=2 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on b  (cost=0.00..1.02 rows=2 width=4) (actual
time=0.001..0.002 rows=2 loops=1)
 Planning Time: 0.192 ms
 Execution Time: 0.043 ms
(10 rows)

postgres=# explain analyze  select prt1.* from prt1 inner join b on prt1.a
= b.id  where a in (1, 100, 200);
 QUERY PLAN
-
 Hash Join  (cost=1.04..3.79 rows=2 width=13) (actual time=0.024..0.028
rows=1 loops=1)
   Hash Cond: (prt1.a = b.id)
   ->  Seq Scan on prt1_p1 prt1  (cost=0.00..2.72 rows=3 wid

Re: Copyright vs Licence

2021-05-10 Thread Michael Nolan
 > What can and does happen is that a new version is released under a
different license while the old version is made obsolete.

It is often more than just 'made obsolete': updates to other parts of the
OS (that are almost impossible to avoid installing) can make it so those
older products either don't run or run VERY inefficiently.

The Wall Street Journal is paying close attention to the Apple/Epic
lawsuit, it may create some interesting (which far too frequently means
'bad for consumers') case law regarding what 'ownership' means these days.
Mike Nolan


Re: trigger impacting insertion of records

2021-05-10 Thread Michael Lewis
It seems like there is some flaw here. From my reading, on insert of any
row, you are updating ALL rows in the same table to just remove an
underscore if it matches the pattern of 'US_' at the beginning. That
doesn't seem likely to be what you want. I'd think you would want something
like the below.

CREATE OR REPLACE FUNCTION bonzipay.ussf_accountnumber_update()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
    IF left(NEW.accountnumber, 3) = 'US_' THEN
        NEW.accountnumber := replace(NEW.accountnumber, '_', ' ');
    END IF;
    RETURN NEW;
END;
$function$;
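
For completeness, the function only takes effect once attached as a row-level
trigger. A minimal sketch, with a hypothetical table name (the thread does not
show the actual DDL):

CREATE TRIGGER ussf_accountnumber_update
    BEFORE INSERT ON bonzipay.accounts  -- hypothetical table name
    FOR EACH ROW
    EXECUTE FUNCTION bonzipay.ussf_accountnumber_update();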


Postgresql 13 query engine regression

2021-05-10 Thread Jonathan Chen
Hi,

I am running Postgresql 13 as a backend for Odoo, and I believe I have
discovered a regression with the query engine.

This (simplified) query generated by the ORM takes 47.683s to complete
(the result set is empty):
SELECT "account_bank_statement_line".id
FROM "account_bank_statement_line"
LEFT JOIN "account_move" AS "account_bank_statement_line__move_id"
  ON ("account_bank_statement_line"."move_id" =
"account_bank_statement_line__move_id"."id")
WHERE
(
  ("account_bank_statement_line"."move_id" in
(
  SELECT "account_move".id
  FROM "account_move"
  WHERE ("account_move"."state" = 'posted')
  AND ("account_move"."company_id" IS NULL  OR
("account_move"."company_id" in (1)))
  ORDER BY  "account_move"."id"
)
  )
  AND ("account_bank_statement_line__move_id"."journal_id" = 29)
)
ORDER BY "account_bank_statement_line__move_id"."date" DESC,
"account_bank_statement_line"."id" DESC LIMIT 1

If I remove the "LIMIT 1" on the last line, the query completes in 0.036s.

If I remove the WHERE clause, the query completes in 0.032s.

If I run the original query on Postgresql 12.6 (on a lower spec'd
host), it completes in 0.067s.
-- 
Jonathan Chen 




Re: Postgresql 13 query engine regression

2021-05-10 Thread Adrian Klaver

On 5/10/21 2:13 PM, Jonathan Chen wrote:

Hi,

I am running Postgresql 13 as a backend for Odoo, and I believe I have
discovered a regression with the query engine.


See here:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

for the information needed to get an answer to this sort of question.



This (simplified) query generated by the ORM takes 47.683s to complete
(the result set is empty):
SELECT "account_bank_statement_line".id
FROM "account_bank_statement_line"
LEFT JOIN "account_move" AS "account_bank_statement_line__move_id"
   ON ("account_bank_statement_line"."move_id" =
"account_bank_statement_line__move_id"."id")
WHERE
(
   ("account_bank_statement_line"."move_id" in
 (
   SELECT "account_move".id
   FROM "account_move"
   WHERE ("account_move"."state" = 'posted')
   AND ("account_move"."company_id" IS NULL  OR
("account_move"."company_id" in (1)))
   ORDER BY  "account_move"."id"
 )
   )
   AND ("account_bank_statement_line__move_id"."journal_id" = 29)
)
ORDER BY "account_bank_statement_line__move_id"."date" DESC,
"account_bank_statement_line"."id" DESC LIMIT 1

If I remove the "LIMIT 1" on the last line, the query completes in 0.036s.

If I remove the WHERE clause, the query completes in 0.032s.

If I run the original query on Postgresql 12.6 (on a lower spec'd
host), it completes in 0.067s.




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Postgresql 13 query engine regression

2021-05-10 Thread Marc Millas
Hi,

sorry to jump in, but it looks like I have a similar problem on a 12.6
instance, on a quite simple request:
with LIMIT 1000 it takes 27 ms, and with LIMIT 10 it takes 145000 ms.
Looking at both EXPLAIN ANALYZE plans, there is a huge difference:
with LIMIT 1000, Postgres optimizes the plan, putting the joins in an
efficient order and using a hash left join;
with LIMIT 10, there is no such optimization, so an inefficient join order,
reading a huge amount of data, and using a nested loop left join.

Stats have been updated through VACUUM ANALYZE.

If interested I can post the plans (in another thread...)


Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com



On Mon, May 10, 2021 at 11:13 PM Jonathan Chen  wrote:

> Hi,
>
> I am running Postgresql 13 as a backend for Odoo, and I believe I have
> discovered a regression with the query engine.
>
> This (simplified) query generated by the ORM takes 47.683s to complete
> (the result set is empty):
> SELECT "account_bank_statement_line".id
> FROM "account_bank_statement_line"
> LEFT JOIN "account_move" AS "account_bank_statement_line__move_id"
>   ON ("account_bank_statement_line"."move_id" =
> "account_bank_statement_line__move_id"."id")
> WHERE
> (
>   ("account_bank_statement_line"."move_id" in
> (
>   SELECT "account_move".id
>   FROM "account_move"
>   WHERE ("account_move"."state" = 'posted')
>   AND ("account_move"."company_id" IS NULL  OR
> ("account_move"."company_id" in (1)))
>   ORDER BY  "account_move"."id"
> )
>   )
>   AND ("account_bank_statement_line__move_id"."journal_id" = 29)
> )
> ORDER BY "account_bank_statement_line__move_id"."date" DESC,
> "account_bank_statement_line"."id" DESC LIMIT 1
>
> If I remove the "LIMIT 1" on the last line, the query completes in 0.036s.
>
> If I remove the WHERE clause, the query completes in 0.032s.
>
> If I run the original query on Postgresql 12.6 (on a lower spec'd
> host), it completes in 0.067s.
> --
> Jonathan Chen 
>
>
>


Re: Postgresql 13 query engine regression

2021-05-10 Thread Tom Lane
Jonathan Chen  writes:
> I am running Postgresql 13 as a backend for Odoo, and I believe I have
> discovered a regression with the query engine.

As Adrian noted, you haven't provided enough detail to let anyone
offer more than generalities.  However, in the spirit of generalities ...

ORDER BY with a small LIMIT is often a hard problem for the planner.
It has to guess whether a "fast start" plan is likely to win out over
a more straightforward plan.  "Fast start" typically looks like "scan
from the end of an index on the ORDER BY column, immediately emitting
any row that satisfies the WHERE conditions, and stop when you have
enough".  The other way generally involves collecting all the rows
satisfying WHERE, sorting them, and emitting the first few sort
outputs.  For small LIMIT, the "fast start" way can win big, by
not having to visit most of the rows nominally satisfying WHERE.
It can also lose big, if most of the rows near the end of the index
turn out not to satisfy the WHERE, so that it has to keep on scanning.

With the limited facts you've offered, it appears that the v12 planner
chose the right way and v13 didn't, but we can't tell which way is the
one that was faster.  It's unlikely that this is an actual regression,
in the sense of being anything we could fix.  It's at least as likely
that v12 was just lucky, or was right for the wrong reasons.  With an
example like you have here, with hard-to-estimate WHERE conditions,
there's a lot of luck involved ... especially if the WHERE conditions
are at all correlated with the ORDER BY order.  In this case, with
an "IN sub-SELECT" condition that looks suspiciously correlated with
the outer join condition, it's quite likely that the planner is
arriving at a totally off-base guess about how many rows the IN
eliminates.

Rather than blaming the planner for not being lucky, you should be
thinking about what you can do to help it out.  Can you get rid of
the additional join to account_move?  (Note that the one I'm
suspecting is extra is the LEFT JOIN, not the IN.)  I'd also
counsel losing the ORDER BY in the sub-select, as that's either
going to be ignored altogether or force a probably-useless sort,
not to mention creating an optimization barrier.

regards, tom lane
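
As a concrete sketch of those two suggestions (hand-written, so the short
table aliases are illustrative, and it assumes the LEFT JOIN and the IN
sub-select really do target the same account_move rows), the generated query
could collapse to a single inner join with no sub-select ORDER BY:

SELECT sl.id
FROM account_bank_statement_line sl
JOIN account_move m ON m.id = sl.move_id  -- one join replaces LEFT JOIN + IN
WHERE m.state = 'posted'
  AND (m.company_id IS NULL OR m.company_id IN (1))
  AND m.journal_id = 29
ORDER BY m.date DESC, sl.id DESC
LIMIT 1;

The journal_id filter already forces inner-join semantics on the original
LEFT JOIN, which is what makes this folding possible.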




Re: Postgresql 13 query engine regression

2021-05-10 Thread Jonathan Chen
Hi,

On Tue, 11 May 2021 at 10:34, Tom Lane  wrote:
[...]
> As Adrian noted, you haven't provided enough detail to let anyone
> offer more than generalities.

I'll be providing the output of EXPLAIN (ANALYZE, BUFFERS) in a separate post.

> However, in the spirit of generalities ...
[.. helpful explanation snipped ..]

> Rather than blaming the planner for not being lucky, you should be
> thinking about what you can do to help it out.  Can you get rid of
> the additional join to account_move?  (Note that the one I'm
> suspecting is extra is the LEFT JOIN, not the IN.)  I'd also
> counsel losing the ORDER BY in the sub-select, as that's either
> going to be ignored altogether or force a probably-useless sort,
> not to mention creating an optimization barrier.

Unfortunately, the SQL is generated by Odoo's ORM, and as much as I'd
like to be able to 'help it', their code has too many feature-hooks
for me to easily hack. Odoo has used PostgreSQL as its database since
its inception back in 2008. It is just unfortunate that it has hit
this stumbling block with the latest version of the planner.

Cheers.
-- 
Jonathan Chen 




Re: Postgresql 13 query engine regression

2021-05-10 Thread Jonathan Chen
Hi,

Here's a bit more detail:
select version();

version
-
 PostgreSQL 13.2 on amd64-portbld-freebsd12.2, compiled by FreeBSD
clang version 10.0.1 (g...@github.com:llvm/llvm-project.git
llvmorg-10.0.1-0-gef32c611aa2), 64-bit

For reference, the query is (this is generated by the Odoo ORM, with
all its redundant bits):
SELECT "account_bank_statement_line".id
FROM "account_bank_statement_line"
LEFT JOIN "account_move" AS "account_bank_statement_line__move_id"
  ON ("account_bank_statement_line"."move_id" =
"account_bank_statement_line__move_id"."id")
WHERE
(
  ("account_bank_statement_line"."move_id" in
(
  SELECT "account_move".id
  FROM "account_move"
  WHERE ("account_move"."state" = 'posted')
  AND ("account_move"."company_id" IS NULL  OR
("account_move"."company_id" in (1)))
  ORDER BY  "account_move"."id"
)
  )
  AND ("account_bank_statement_line__move_id"."journal_id" = 29)
)
order by "account_bank_statement_line__move_id"."date" DESC,
"account_bank_statement_line"."id" DESC LIMIT 1;

Query plan:
 Limit  (cost=14226.75..15563.99 rows=1 width=8) (actual
time=129162.351..129162.409 rows=0 loops=1)
   Buffers: shared hit=97340
   ->  Nested Loop Semi Join  (cost=14226.75..6134771.07 rows=4577
width=8) (actual time=129162.350..129162.408 rows=0 loops=1)
 Join Filter: (account_bank_statement_line.move_id = account_move.id)
 Rows Removed by Join Filter: 1198285616
 Buffers: shared hit=97340
 ->  Gather Merge  (cost=1010.15..20911.67 rows=4577 width=16)
(actual time=3.596..24.065 rows=13453 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Buffers: shared hit=93061
   ->  Incremental Sort  (cost=10.14..19396.75 rows=2692
width=16) (actual time=0.669..34.507 rows=6726 loops=2)
 Sort Key:
account_bank_statement_line__move_id.date DESC,
account_bank_statement_line.id DESC
 Presorted Key: account_bank_statement_line__move_id.date
 Full-sort Groups: 29  Sort Method: quicksort
Average Memory: 27kB  Peak Memory: 27kB
 Pre-sorted Groups: 17  Sort Method: quicksort
Average Memory: 30kB  Peak Memory: 30kB
 Buffers: shared hit=93061
 Worker 0:  Full-sort Groups: 185  Sort Method:
quicksort  Average Memory: 28kB  Peak Memory: 28kB
   Pre-sorted Groups: 56  Sort Method: quicksort
Average Memory: 28kB  Peak Memory: 28kB
 ->  Nested Loop  (cost=0.58..19299.09 rows=2692
width=16) (actual time=0.284..31.917 rows=6726 loops=2)
   Buffers: shared hit=92996
   ->  Parallel Index Scan Backward using
account_move_date_index on account_move
account_bank_statement_line__move_id  (cost=0.29..13152.37 rows=16040
width=8) (actual time=0.271..14.465 rows=13556 loops=2)
 Filter: (journal_id = 29)
 Rows Removed by Filter: 40958
 Buffers: shared hit=25318
   ->  Index Scan using
account_bank_statement_line_move_id_idx on account_bank_statement_line
 (cost=0.29..0.37 rows=1 width=8) (actual time=0.001..0.001 rows=0
loops=27112)
 Index Cond: (move_id =
account_bank_statement_line__move_id.id)
 Buffers: shared hit=67678
 ->  Materialize  (cost=13216.60..14771.30 rows=88840 width=4)
(actual time=0.003..3.716 rows=89072 loops=13453)
   Buffers: shared hit=4279
   ->  Sort  (cost=13216.60..13438.70 rows=88840 width=4)
(actual time=35.560..39.472 rows=89072 loops=1)
 Sort Key: account_move.id
 Sort Method: quicksort  Memory: 6906kB
 Buffers: shared hit=4279
 ->  Seq Scan on account_move  (cost=0.00..5914.43
rows=88840 width=4) (actual time=0.007..21.195 rows=89072 loops=1)
   Filter: (((company_id IS NULL) OR
(company_id = 1)) AND ((state)::text = 'posted'::text))
   Rows Removed by Filter: 19957
   Buffers: shared hit=4279
 Planning:
   Buffers: shared hit=83
 Planning Time: 0.708 ms
 Execution Time: 129162.830 ms

If I remove the "LIMIT 1", the Query plan changes to:
 Sort  (cost=21589.50..21600.94 rows=4577 width=8) (actual
time=46.266..46.269 rows=0 loops=1)
   Sort Key: account_bank_statement_line__move_id.date DESC,
account_bank_statement_line.id DESC
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=4390
   ->  Hash Join  (cost=16543.33..21311.21 rows=4577 width=8) (actual
time=46.263..46.266 rows=0 loops=1)
 Hash Cond: (account_bank_statement_line__move_id.id =
account_

Re: Postgresql 13 query engine regression

2021-05-10 Thread Jonathan Chen
Hmm, there's a bit of text-mangling.

Let's use https://explain.depesz.com.

PG13 LIMIT 1 : https://explain.depesz.com/s/GFki
PG13 no LIMIT: https://explain.depesz.com/s/U4vR
PG12 LIMIT 1 : https://explain.depesz.com/s/JAp4
-- 
Jonathan Chen 




Re: Postgresql 13 query engine regression

2021-05-10 Thread David Rowley
On Tue, 11 May 2021 at 11:34, Jonathan Chen  wrote:
> PG13 LIMIT 1 : https://explain.depesz.com/s/GFki
> PG13 no LIMIT: https://explain.depesz.com/s/U4vR
> PG12 LIMIT 1 : https://explain.depesz.com/s/JAp4

The difference is coming from the fact that PostgreSQL 13 has
incremental sort and can use account_move_date_index to provide
partially sorted input for the ORDER BY clause. PG12 didn't have this
feature, so there was no index that could help provide pre-sorted
input to the ORDER BY.

You'd get along much better if you got rid of the
account_move_date_index index and replaced it with:

CREATE INDEX account_move_date_id_index ON account_move (date, id);

Or, instead of removing account_move_date_index, you could add an index such as:

CREATE INDEX account_move_journal_id_date_index ON account_move
(journal_id, date, id);

That should allow the query to run a bit more quickly. However, if the
first of the two is fast enough then it might be better not to add too
many extra indexes.

David




Re: Postgresql 13 query engine regression

2021-05-10 Thread Jonathan Chen
On Tue, 11 May 2021 at 12:49, David Rowley  wrote:
>
> On Tue, 11 May 2021 at 11:34, Jonathan Chen  wrote:
> > PG13 LIMIT 1 : https://explain.depesz.com/s/GFki
> > PG13 no LIMIT: https://explain.depesz.com/s/U4vR
> > PG12 LIMIT 1 : https://explain.depesz.com/s/JAp4
>
> The difference is coming from the fact that PostgreSQL 13 has
> incremental sort and can use the account_move_date_index to provide
> partially sorted input for the ORDER BY clause. PG12 didn't have this
> feature, so there was no index that could help getting pre-sorted
> input to the ORDER BY.
>
> You'd get along much better if you got rid of the
> account_move_date_index index and replaced it with:
>
> CREATE INDEX account_move_date_id_index ON account_move (date,id);
>
> Or, instead of removing account_move_date_index, you could add an index such 
> as:
>
> CREATE INDEX account_move_journal_id_date_index ON account_move
> (journal_id, date, id);
>
> That should allow the query to run a bit more quickly. However, if the
> first of the two is fast enough then it might be better to not add too
> many extra indexes.

If I drop the index account_move_date_index, the speed of the query
improves to 0.035s. Adding the replacement index on (date, id) or
(journal_id, date, id) increases the query to > 47s again.

This gives me a bit of something to play with.

Thanks for the insight.
-- 
Jonathan Chen