Hello,
We have two TPC-H queries that fetch the same tuples but differ significantly in
execution time (by a factor of about 22). The pair of queries is shown below:
First query:
SELECT "orders3"."o_comment",
"orders3"."o_orderstatus",
"orders3"."o_orderkey",
"t17"."ps_partkey",
"t17"."ps_supplycost",
"t17"."ps_comment",
"orders3"."o_clerk",
"orders3"."o_totalprice",
"t17"."ps_availqty",
"t17"."ps_suppkey"
FROM (
SELECT *
FROM "partsupp"
WHERE "ps_comment" LIKE ', even theodolites. regular, final
theodolites eat after the carefully pending foxes. furiously regular deposits
sleep slyly. carefully bold realms above the ironic dependencies haggle
careful') AS "t17"
LEFT JOIN "orders"
AS "orders3"
ON true
ORDER BY "t17"."ps_supplycost"FETCH next 14 rows only
Second query:
SELECT "orders3"."o_comment",
"orders3"."o_orderstatus",
"orders3"."o_orderkey",
"t17"."ps_partkey",
"t17"."ps_supplycost",
"t17"."ps_comment",
"orders3"."o_clerk",
"orders3"."o_totalprice",
"t17"."ps_availqty",
"t17"."ps_suppkey"
FROM (
SELECT *
FROM "partsupp"
WHERE "ps_comment" LIKE ', even theodolites. regular,
final theodolites eat after the carefully pending foxes. furiously regular
deposits sleep slyly. carefully bold realms above the ironic dependencies
haggle careful'
ORDER BY "ps_supplycost"FETCH next 14 rows only) AS "t17"
LEFT JOIN "orders" AS "orders3"
ON true
ORDER BY "t17"."ps_supplycost"FETCH next 14 rows only
* Actual Behavior
We executed both queries on the TPC-H benchmark at scale factor 5: the first
query takes over 8 seconds, while the second query takes only about 0.3 seconds.
We think the time difference results from the different plans selected.
Specifically, in the first (slow) query the left join is fed all of the filtered
partsupp rows and the sort and limit are applied only after the join, so every
joined row has to be produced and sorted; in the second (fast) query the left
join is fed at most 14 rows from partsupp, so only 14 joined rows have to be
produced.
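For reference, plans of the form shown below are produced by prefixing a
statement with EXPLAIN (ANALYZE). A minimal, simplified illustration (this probe
touches only partsupp and is not one of the two queries above):

EXPLAIN (ANALYZE)
SELECT *
FROM "partsupp"
ORDER BY "ps_supplycost"
FETCH NEXT 14 ROWS ONLY;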
* Query Execution Plan
* First query:
 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=464628.69..464630.32 rows=14 width=223) (actual time=8082.764..8082.767 rows=14 loops=1)
   ->  Gather Merge  (cost=464628.69..1193917.91 rows=6250614 width=223) (actual time=8082.762..8087.639 rows=14 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=463628.66..471441.93 rows=3125307 width=223) (actual time=2933.506..2933.506 rows=5 loops=3)
               Sort Key: partsupp.ps_supplycost
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 32kB
               Worker 1:  Sort Method: quicksort  Memory: 25kB
               ->  Nested Loop Left Join  (cost=0.00..388506.36 rows=3125307 width=223) (actual time=360.602..1643.471 rows=2500000 loops=3)
                     ->  Parallel Seq Scan on partsupp  (cost=0.00..108031.62 rows=1 width=144) (actual time=360.577..360.599 rows=0 loops=3)
                           Filter: ((ps_comment)::text ~~ ', even theodolites. regular, final theodolites eat after the carefully pending foxes. furiously regular deposits sleep slyly. carefully bold realms above the ironic dependencies haggle careful'::text)
                           Rows Removed by Filter: 1333333
                     ->  Seq Scan on orders orders3  (cost=0.00..205467.37 rows=7500737 width=79) (actual time=0.064..1544.990 rows=7500000 loops=1)
 Planning Time: 0.278 ms
 Execution Time: 8087.714 ms
(16 rows)
* Second query:
 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=109031.74..109032.26 rows=14 width=223) (actual time=363.883..363.890 rows=14 loops=1)
   ->  Nested Loop Left Join  (cost=109031.74..389506.49 rows=7500737 width=223) (actual time=363.882..363.887 rows=14 loops=1)
         ->  Limit  (cost=109031.74..109031.74 rows=1 width=144) (actual time=363.859..363.859 rows=1 loops=1)
               ->  Sort  (cost=109031.74..109031.74 rows=1 width=144) (actual time=363.858..363.858 rows=1 loops=1)
                     Sort Key: partsupp.ps_supplycost
                     Sort Method: quicksort  Memory: 25kB
                     ->  Gather  (cost=1000.00..109031.73 rows=1 width=144) (actual time=363.447..370.107 rows=1 loops=1)
                           Workers Planned: 2
                           Workers Launched: 2
                           ->  Parallel Seq Scan on partsupp  (cost=0.00..108031.62 rows=1 width=144) (actual time=360.033..360.101 rows=0 loops=3)
                                 Filter: ((ps_comment)::text ~~ ', even theodolites. regular, final theodolites eat after the carefully pending foxes. furiously regular deposits sleep slyly. carefully bold realms above the ironic dependencies haggle careful'::text)
                                 Rows Removed by Filter: 1333333
         ->  Seq Scan on orders orders3  (cost=0.00..205467.37 rows=7500737 width=79) (actual time=0.016..0.017 rows=14 loops=1)
 Planning Time: 0.228 ms
 Execution Time: 370.200 ms
(15 rows)
* Expected Behavior
Since these two queries are semantically equivalent, we were hoping that
PostgreSQL would evaluate them in roughly the same amount of time.
It looks to us like there is a missing optimization rule for pushing the sort
operator (i.e., the ORDER BY and LIMIT) down through the left join.
Given the significant difference in execution time, we were wondering whether it
would be worth adding such a rule so that the system can evaluate the first
query more efficiently.
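To make the hand-applied rewrite explicit, here is a sketch of the
transformation (with a shortened select list and a simplified LIKE pattern,
purely for illustration). Because the ORDER BY key comes only from the left
input of the LEFT JOIN and every left row yields at least one joined row, a copy
of the ORDER BY/FETCH can be pushed into the left input while the outer
ORDER BY/FETCH is kept:

-- Shape of the first query: sort and limit only above the join.
SELECT "t17"."ps_supplycost", "orders3"."o_orderkey"
FROM (SELECT * FROM "partsupp"
      WHERE "ps_comment" LIKE '%pending foxes%') AS "t17"
LEFT JOIN "orders" AS "orders3" ON true
ORDER BY "t17"."ps_supplycost"
FETCH NEXT 14 ROWS ONLY;

-- Shape of the second query: a copy of the sort and limit is pushed into the
-- left input; the outer sort and limit stay in place, so the overall result is
-- the same (up to ties in ps_supplycost).
SELECT "t17"."ps_supplycost", "orders3"."o_orderkey"
FROM (SELECT * FROM "partsupp"
      WHERE "ps_comment" LIKE '%pending foxes%'
      ORDER BY "ps_supplycost"
      FETCH NEXT 14 ROWS ONLY) AS "t17"
LEFT JOIN "orders" AS "orders3" ON true
ORDER BY "t17"."ps_supplycost"
FETCH NEXT 14 ROWS ONLY;

The second query at the top of this report is exactly this form of the first one.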
It would also be helpful if you could comment on whether there is a standard
practice for evaluating the tradeoffs of adding such a rule to PostgreSQL.
* Test Environment
Ubuntu 20.04 machine "Linux panda 5.4.0-40-generic #44-Ubuntu SMP Tue Jun 23
00:01:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux"
PostgreSQL v12.3
Database: TPC-H benchmark (with scale factor 5)
The descriptions of the partsupp and orders tables are as follows:
tpch5=# \d partsupp;
                      Table "public.partsupp"
    Column     |          Type          | Collation | Nullable | Default
---------------+------------------------+-----------+----------+---------
 ps_partkey    | integer                |           | not null |
 ps_suppkey    | integer                |           | not null |
 ps_availqty   | integer                |           | not null |
 ps_supplycost | numeric(15,2)          |           | not null |
 ps_comment    | character varying(199) |           | not null |
Indexes:
    "partsupp_pkey" PRIMARY KEY, btree (ps_partkey, ps_suppkey)
Foreign-key constraints:
    "partsupp_fk1" FOREIGN KEY (ps_suppkey) REFERENCES supplier(s_suppkey)
    "partsupp_fk2" FOREIGN KEY (ps_partkey) REFERENCES part(p_partkey)
Referenced by:
    TABLE "lineitem" CONSTRAINT "lineitem_fk2" FOREIGN KEY (l_partkey, l_suppkey) REFERENCES partsupp(ps_partkey, ps_suppkey)
tpch5=# \d orders;
                       Table "public.orders"
     Column      |         Type          | Collation | Nullable | Default
-----------------+-----------------------+-----------+----------+---------
 o_orderkey      | integer               |           | not null |
 o_custkey       | integer               |           | not null |
 o_orderstatus   | character(1)          |           | not null |
 o_totalprice    | numeric(15,2)         |           | not null |
 o_orderdate     | date                  |           | not null |
 o_orderpriority | character(15)         |           | not null |
 o_clerk         | character(15)         |           | not null |
 o_shippriority  | integer               |           | not null |
 o_comment       | character varying(79) |           | not null |
Indexes:
    "orders_pkey" PRIMARY KEY, btree (o_orderkey)
Foreign-key constraints:
    "orders_fk1" FOREIGN KEY (o_custkey) REFERENCES customer(c_custkey)
Referenced by:
    TABLE "lineitem" CONSTRAINT "lineitem_fk1" FOREIGN KEY (l_orderkey) REFERENCES orders(o_orderkey)
* Here are the steps for reproducing our observations:
1. Download the dataset from the link:
https://drive.google.com/file/d/13rFa1BNDi4e2RmXBn-yEQkcqt6lsBu1c/view?usp=sharing
2. Set up the TPC-H benchmark:
tar xzvf tpch5_postgresql.tar.gz
cd tpch5_postgresql
db=tpch5
createdb $db
psql -d $db < dss.ddl
for i in `ls *.tbl`
do
echo $i
name=`echo $i|cut -d'.' -f1`
psql -d $db -c "COPY $name FROM '`pwd`/$i' DELIMITER '|' ENCODING 'LATIN1';"
done
psql -d $db < dss_postgres.ri
3. Execute the queries
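For example, from an interactive psql session (a sketch; the two statements are
pasted from the listing at the top of this report):

psql -d $db
\timing on
-- paste the first query, then the second; prefixing either statement with
-- EXPLAIN (ANALYZE) reproduces the plans shown above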