AW: Performance issue with order by clause on

2019-03-20 Thread Stephan Schmidt
Have you tried creating an sorted index like

CREATE INDEX table_modificationtime_idx ON “TABLE“ USING btree(modificationtime 
DESC) WHERE (enabled=1)?

Best Regards
Stephan


Von: Corey Huinker
Gesendet: Mittwoch, 20. März 2019 18:34
An: Maracska Ádám
Cc: [email protected]
Betreff: Re: Performance issue with order by clause on

On Wed, Mar 20, 2019 at 9:36 AM Maracska Ádám 
mailto:[email protected]>> wrote:
Hi,

I would like to overcome an issue which occurs only in case with order by 
clause.

Details:
I am trying to insert into a temporary table 50 rows from a joined table 
ordered by a modification time column which is inserted by the current time so 
it is ordered ascending.

Each table has index on the following columns: PRIMARY KEY(SystemID, ObjectID, 
ElementID,  ModificationTime)

Statement:

sqlString := 'INSERT INTO ResultTable (
SELECT * FROM "TABLE" a  LEFT OUTER JOIN "TABLE_Text" l1031  ON  
a.ModificationTime = l1031.ModificationTime AND a.SystemID = l1031.SystemID AND 
a.ObjectID = l1031.ObjectID AND a.ElementID = l1031.ElementID  AND l1031.LCID = 
1031 LEFT OUTER JOIN  ( SELECT * AS CommentNumber FROM "TABLE_Comment" v1  
GROUP BY v1.ModificationTime, v1.SystemID, v1.ObjectID, v1.ElementID ) c ON  
a.ModificationTime = c.ModificationTime AND a.SystemID = c.SystemID AND 
a.ObjectID = c.ObjectID AND a.ElementID = c.ElementID WHERE a.ModificationTime 
BETWEEN $1 AND $2  AND ( a.Enabled = 1 ) ORDER BY a.ModificationTime DESC LIMIT 
50));

EXECUTE sqlString USING StartTime,EndTime;


node type

count

sum of times

% of query

Hash

1

8.844 ms

10.0 %

Hash Left Join

1

33.715 ms

38.0 %

Insert

1

0.734 ms

0.8 %

Limit

1

0.003 ms

0.0 %

Seq Scan

2

22.735 ms

25.6 %

Sort

1

22.571 ms

25.5 %

Subquery Scan

1

0.046 ms

0.1 %




Execution Plan: https://explain.depesz.com/s/S96g (Obfuscated)


If I remove the order by clause I get the following results:
node type

count

sum of times

% of query

Index Scan

2

27.632 ms

94.9 %

Insert

1

0.848 ms

2.9 %

Limit

1

0.023 ms

0.1 %

Merge Left Join

1

0.423 ms

1.5 %

Result

1

0.000 ms

0.0 %

Subquery Scan

1

0.186 ms

0.6 %


Which is pointing me to a problem with the sorting. Is there any way that I 
could improve the performance with order by clause?

To make the problem more transparent I ran a long run test where you can see 
that with order by clause the performance is linearly getting worse:

[image.png]


Postgresql version: "PostgreSQL 11.1, compiled by Visual C++ build 1914, 64-bit"

Istalled by: With  EnterpriseDB One-click installer from EDB's offical site.

Postgresql.conf changes: Used pgtune suggestions:
# DB Version: 11
# OS Type: windows
# DB Type: desktop
# Total Memory (RAM): 8 GB
# CPUs num: 4
# Connections num: 25
# Data Storage: hdd
max_connections = 25
shared_buffers = 512MB
effective_cache_size = 2GB
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.5
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
work_mem = 8738kB
min_wal_size = 100MB
max_wal_size = 1GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4

Operating System: Windows 10 x64, Version: 1607

Thanks in advance,
Best Regards,
Tom Nay

The queries are not equivalent. One returns the first 50 rows it finds 
regardless of what qualities they possess, and the other one must fetch all 
rows and then decide which 50 are the most recent.

They're the difference between:
Find any 10 people in your city.
Find the TALLEST 10 people in your city.  This will scale poorly in large 
cities.

If you have an index on ModificationTime, then the query can seek to the 
highest row matching the between clause, and walk backwards looking for rows 
that match any other criteria, so that will help, because it will avoid the 
sort.



Poor man's partitioned index .... not being used?

2019-03-20 Thread Gunther

Hi all, look at this short story please:

foo=# CREATE TABLE Test(id int NOT NULL PRIMARY KEY);
CREATE TABLE
foo=# INSERT INTO test SELECT row_number() OVER() FROM pg_class a CROSS JOIN 
pg_class b;
INSERT 0 388129
foo=# EXPLAIN SELECT * FROM Test WHERE id = '8934';
QUERY PLAN
---
 Index Only Scan using test_pkey on test  (cost=0.42..8.44 rows=1 width=4)
   Index Cond: (id = 8934)
(2 rows)

foo=# ALTER TABLE Test DROP CONSTRAINT Test_pkey;
ALTER TABLE
foo=# EXPLAIN SELECT * FROM Test WHERE id = '8934';
   QUERY PLAN
---
 Seq Scan on test  (cost=0.00..6569.61 rows=1 width=4)
   Filter: (id = 8934)
(2 rows)

foo=# SELECT max(id)/2 FROM Test;
 ?column?
--
   194064
(1 row)

foo=# CREATE UNIQUE INDEX Test_pk0 ON Test(id) WHERE id < 194064;
CREATE INDEX
foo=# CREATE UNIQUE INDEX Test_pk1 ON Test(id) WHERE id >= 194064;
CREATE INDEX
foo=# ANALYZE Test;
ANALYZE
foo=# EXPLAIN SELECT * FROM Test WHERE id = 8934;
QUERY PLAN
--
 Index Only Scan using test_pk0 on test  (cost=0.42..8.44 rows=1 width=4)
   Index Cond: (id = 8934)
(2 rows)


foo=# DROP INDEX Test_pk0;
DROP INDEX
foo=# DROP INDEX Test_pk1;
DROP INDEX

foo=# CREATE UNIQUE INDEX Test_pk0 ON Test(id) WHERE mod(id,2) = 0;
CREATE INDEX
foo=# CREATE UNIQUE INDEX Test_pk1 ON Test(id) WHERE mod(id,2) = 1;
CREATE INDEX
foo=# ANALYZE Test;
ANALYZE
foo=# EXPLAIN SELECT * FROM Test WHERE id = '8934';
  QUERY PLAN
---
 Seq Scan on test  (cost=0.00..6569.61 rows=1 width=4)
   Filter: (id = 8934)
(2 rows)

Why is that index never used?

PS: there is a performance question behind this, big table, heavily used index,
the hope was that with this simple scheme of partitioning just the index one 
might
distribute the load better. I know, if the load really is so big, why not 
partition
the entire table. But just for hecks, why not this way?

regards,
-Gunther



Re: Poor man's partitioned index .... not being used?

2019-03-20 Thread David Rowley
On Thu, 21 Mar 2019 at 15:51, Gunther  wrote:
> foo=# CREATE UNIQUE INDEX Test_pk0 ON Test(id) WHERE mod(id,2) = 0;
> CREATE INDEX
> foo=# CREATE UNIQUE INDEX Test_pk1 ON Test(id) WHERE mod(id,2) = 1;
> CREATE INDEX
> foo=# ANALYZE Test;
> ANALYZE
> foo=# EXPLAIN SELECT * FROM Test WHERE id = '8934';
>   QUERY PLAN
> ---
>  Seq Scan on test  (cost=0.00..6569.61 rows=1 width=4)
>Filter: (id = 8934)
> (2 rows)
>
> Why is that index never used?

When the planner looks at partial indexes to see if they'll suit the
scan, the code that does the matching (predicate_implied_by()) simply
does not go to that much trouble to determine if it matches. If you
look at operator_predicate_proof() you'll see it requires the
expression on at least one side of the OpExpr to match your predicate.
Yours matches on neither side since "id" is wrapped up in a mod()
function call.

Certainly, predicate_implied_by() is by no means finished, new smarts
have been added to it over the years to allow it to prove more cases,
but each time something is added we still need to carefully weigh up
the additional overhead of the new code vs. possible benefits.

It may be possible to do something with immutable functions found in
the expr but someone doing so might have a hard time proving that it's
always safe to do so. For example, arg 2 of your mod() call is a
Const. If it had been another Var then it wouldn't be safe to use.
What other unsafe cases are there? Is there a way we can always
identify unsafe cases during planning? ... are the sorts of questions
someone implementing this would be faced with.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Poor man's partitioned index .... not being used?

2019-03-20 Thread Andrew Gierth
> "Gunther" == Gunther   writes:

 Gunther> foo=# CREATE UNIQUE INDEX Test_pk0 ON Test(id) WHERE mod(id,2) = 0;
 Gunther> CREATE INDEX

 Gunther> foo=# EXPLAIN SELECT * FROM Test WHERE id = '8934';
 Gunther>   QUERY PLAN
 Gunther> ---
 Gunther>  Seq Scan on test  (cost=0.00..6569.61 rows=1 width=4)
 Gunther>Filter: (id = 8934)
 Gunther> (2 rows)

 Gunther> Why is that index never used?

Because the expression mod(id,2) does not appear in the query, and there
is no logic in the implication prover to prove that (mod(id,2) = 0) is
implied by (id = 8934).

If you did  WHERE mod(id,2) = mod(8934,2) AND id = 8934

then the index would likely be used - because the prover can then treat
mod(id,2) as an atom (call it X), constant-fold mod(8934,2) to 0 because
mod() is immutable, and then observe that (X = 0) proves that (X = 0).

Pretty much the only simple implications that the prover can currently
deduce are:

  - identical immutable subexpressions are equivalent

  - strict operator expressions imply scalar IS NOT NULL

  - (A op1 B) implies (B op2 A) if op2 is op1's declared commutator

  - Btree semantics: if <, <=, =, >=, > are all members of a btree
opfamily, and <> is the declared negator of =, then implications
like (X < A) and (A <= B) implies (X < B) can be deduced.

-- 
Andrew (irc:RhodiumToad)