Query planner not using index, instead using sequential scan
I am using PostgreSQL 12 on AWS RDS.

Could someone clarify why the LEFT JOIN order_offer_map oom USING
(order_id) in the query below uses a sequential scan instead of the
index on order_id that is defined on the order_offer_map table?
EXPLAIN ANALYZE
WITH business AS (SELECT * FROM get_businessday_utc_f() start_date)
SELECT ro.order_id,
ro.date_time,
round(ro.order_amount, 2) AS order_amount,
b.branch_id,
b.branch_name,
st_x(b.location) AS from_x,
st_y(b.location) AS from_y,
b.user_id AS branch_user_id,
b.contact_info,
r.restaurant_id,
c.city_id,
c.city_name,
c.city_name_ar,
st_linefromtext((((((((('LINESTRING('::text || st_x(b.location)) ||
' '::text) || st_y(b.location)) || ','::text) ||
st_x(ro.location_geometry)) || ' '::text) ||
st_y(ro.location_geometry)) || ')'::text), 28355) AS from_to,
to_char(ro.date_time, 'HH24:MI'::text) AS order_time,
ro.customer_comment,
'N'::text AS is_new_customer,
ro.picked_up_time,
ro.driver_assigned_date_time,
oom.offer_amount,
oom.offer_type_code AS offer_type,
ro.uk_vat
FROM business, restaurant_order ro
JOIN branch b ON b.branch_id = ro.branch_id
JOIN restaurant r ON r.restaurant_id = b.restaurant_id
JOIN city c ON c.city_id = b.city_id
LEFT JOIN order_offer_map oom using (order_id)
WHERE ro.date_time >= business.start_date
  AND ro.date_time <= f_now_immutable_with_tz();
 Hash Join  (cost=39897.37..161872.06 rows=259399 width=494) (actual time=605.767..2060.712 rows=156253 loops=1)
   Hash Cond: (b.city_id = c.city_id)
   ->  Hash Join  (cost=39895.11..78778.16 rows=259399 width=355) (actual time=605.583..789.863 rows=156253 loops=1)
         Hash Cond: (b.restaurant_id = r.restaurant_id)
         ->  Hash Join  (cost=39542.41..77744.20 rows=259399 width=307) (actual time=602.096..738.765 rows=156253 loops=1)
               Hash Cond: (ro.branch_id = b.branch_id)
               ->  Hash Right Join  (cost=38607.06..76127.79 rows=259399 width=225) (actual time=591.342..672.039 rows=156253 loops=1)
                     Hash Cond: (oom.order_id = ro.order_id)
                     ->  Seq Scan on order_offer_map oom  (cost=0.00..34179.09 rows=1273009 width=15) (actual time=0.007..91.121 rows=1273009 loops=1)
                     ->  Hash  (cost=35364.57..35364.57 rows=259399 width=218) (actual time=244.571..244.571 rows=156253 loops=1)
                           Buckets: 262144  Batches: 1  Memory Usage: 29098kB
                           ->  Index Scan using "idx$$_00010001" on restaurant_order ro  (cost=0.56..35364.57 rows=259399 width=218) (actual time=0.033..195.939 rows=156253 loops=1)
                                 Index Cond: ((date_time >= '2021-05-27 05:00:00'::timestamp without time zone) AND (date_time <= '2021-06-05 16:38:22.758875+00'::timestamp with time zone))
                                 Filter: (order_status_code = 'D'::bpchar)
                                 Rows Removed by Filter: 73969
               ->  Hash  (cost=673.49..673.49 rows=20949 width=90) (actual time=10.715..10.715 rows=20949 loops=1)
                     Buckets: 32768  Batches: 1  Memory Usage: 2758kB
                     ->  Seq Scan on branch b  (cost=0.00..673.49 rows=20949 width=90) (actual time=0.006..6.397 rows=20949 loops=1)
         ->  Hash  (cost=245.09..245.09 rows=8609 width=56) (actual time=3.466..3.467 rows=8609 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 903kB
               ->  Seq Scan on restaurant r  (cost=0.00..245.09 rows=8609 width=56) (actual time=0.003..2.096 rows=8609 loops=1)
   ->  Hash  (cost=1.56..1.56 rows=56 width=29) (actual time=0.026..0.026 rows=56 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 12kB
         ->  Seq Scan on city c  (cost=0.00..1.56 rows=56 width=29) (actual time=0.007..0.015 rows=56 loops=1)
 Planning Time: 1.377 ms
 Execution Time: 2071.965 ms
-Ayub
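
One option worth noting here (not raised in the thread; the column list is inferred from the SELECT list above, so treat it as an assumption) is a covering index, which would let an index probe on order_offer_map answer from the index alone:

```sql
-- Hypothetical covering index on order_offer_map: the query only reads
-- offer_amount and offer_type_code from this table, so an INCLUDE list
-- (PostgreSQL 11+) could allow index-only probes with no heap fetches.
CREATE INDEX CONCURRENTLY order_offer_map_order_id_covering
    ON order_offer_map (order_id)
    INCLUDE (offer_amount, offer_type_code);
```

Whether the planner would then prefer a nestloop over the hash join still depends on the cost estimates discussed below.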
Re: Query planner not using index, instead using sequential scan
Ayub Khan writes:
> could someone clarify why the LEFT JOIN order_offer_map oom using
> (order_id) in the below query is using sequential scan instead of
> using index on order_id which is defined in order_offer_map table.

Probably because it estimates the hash join to restaurant_order is
faster than a nestloop join would be. I think it's likely right.
You'd need very optimistic assumptions about the cost of an
individual index probe into order_offer_map to conclude that 156K
of them would be faster than the 476ms that are being spent here
to read order_offer_map and join it to the result of the
indexscan on restaurant_order.

If, indeed, that *is* faster on your hardware, you might want
to dial down random_page_cost to get more-relevant estimates.

			regards, tom lane
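
The random_page_cost suggestion can be tried per-session before changing anything globally; a minimal sketch (the 1.1 value is a common SSD-oriented assumption, not a measured recommendation):

```sql
-- Session-local experiment: lower random_page_cost so index probes
-- look cheaper to the planner, then compare the resulting plan.
SET random_page_cost = 1.1;   -- assumed SSD-class storage; default is 4.0

EXPLAIN ANALYZE
SELECT ...;                   -- re-run the original query here

RESET random_page_cost;       -- undo the experiment for this session
```

If the lower setting consistently produces faster plans on this hardware, it can be applied per-database or in the RDS parameter group instead of per-session.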
Re: Query planner not using index, instead using sequential scan
thanks Tom.

I was trying to simulate some scenarios to be able to explain how the plan
would change with/without *Rows Removed by Filter: 73969* -- by using a
different/correct index.

postgres=# \d t
                          Table "public.t"
   Column   |            Type             | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+---------
 id         | integer                     |           | not null |
 created_on | timestamp without time zone |           |          |
 col1       | text                        |           |          |
Indexes:
    "t_pkey" PRIMARY KEY, btree (id)
    "t_created_on_idx" btree (created_on) WHERE col1 = 'a'::text
        --- useless index as all rows have col1 = 'a', but to attempt lossy case
    "t_created_on_idx1" btree (created_on)
Referenced by:
    TABLE "t1" CONSTRAINT "t1_id_fkey" FOREIGN KEY (id) REFERENCES t(id)

postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 t1_id  | integer |           | not null |
 id     | integer |           |          |
 col2   | text    |           |          |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (t1_id)
Foreign-key constraints:
    "t1_id_fkey" FOREIGN KEY (id) REFERENCES t(id)

postgres=# update t set col1 = 'a';
UPDATE 1000

postgres=# explain analyze select 1 from t join t1 on (t.id = t1.id)
           where created_on = '2021-06-01 12:48:45.141123';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=37.01..39.28 rows=1 width=4) (actual time=0.124..0.125 rows=0 loops=1)
   Hash Cond: (t1.id = t.id)
   ->  Seq Scan on t1  (cost=0.00..2.00 rows=100 width=4) (actual time=0.004..0.008 rows=100 loops=1)
   ->  Hash  (cost=37.00..37.00 rows=1 width=4) (actual time=0.109..0.109 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on t  (cost=0.00..37.00 rows=1 width=4) (actual time=0.058..0.107 rows=1 loops=1)
               Filter: (created_on = '2021-06-01 12:48:45.141123'::timestamp without time zone)
               *Rows Removed by Filter: 999* --- as no useful index,
                   all pages are fetched and rows then removed from them, expensive
 Planning Time: 0.111 ms
 Execution Time: 0.162 ms
(10 rows)

postgres=# explain analyze select 1 from t join t1 on (t.id = t1.id)
           where created_on = '2021-06-01 12:48:45.141123';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=8.32..33.47 rows=1 width=4) (actual time=0.025..0.026 rows=0 loops=1)
   Hash Cond: (t1.id = t.id)
   ->  Seq Scan on t1  (cost=0.00..22.00 rows=1200 width=4) (actual time=0.009..0.009 rows=1 loops=1)
   ->  Hash  (cost=8.31..8.31 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 8kB
         ->  Index Scan using t_created_on_idx1 on t  (cost=0.29..8.31 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
               Index Cond: (created_on = '2021-06-01 12:48:45.141123'::timestamp without time zone)
                   -- *exact match using btree index*
 Planning Time: 0.255 ms
 Execution Time: 0.071 ms
(9 rows)

but from Ayub's plan, the number of rows fetched is large, and rows are also
being removed post index scan. if that can be improved with a btree index
that does not filter unwanted rows, the run may be faster? but i guess if
there are 156k rows, the planner would have found a win in seq scan anyway.

Ayub, just for the sake of understanding, can you run the query using:

postgres=# set enable_seqscan TO 0;
SET
postgres=# -- explain analyze <the query>
postgres=# set enable_seqscan TO 1;
SET

On Sun, 6 Jun 2021 at 00:59, Tom Lane wrote:

> Ayub Khan writes:
> > could someone clarify why the LEFT JOIN order_offer_map oom using
> > (order_id) in the below query is using sequential scan instead of
> > using index on order_id which is defined in order_offer_map table.
>
> Probably because it estimates the hash join to restaurant_order is
> faster than a nestloop join would be. I think it's likely right.
> You'd need very optimistic assumptions about the cost of an
> individual index probe into order_offer_map to conclude that 156K
> of them would be faster than the 476ms that are being spent here
> to read order_offer_map and join it to the result of the
> indexscan on restaurant_order.
>
> If, indeed, that *is* faster on your hardware, you might want
> to dial down random_page_cost to get more-relevant estimates.
>
> regards, tom lane

--
Thanks,
Vijay
Mumbai, India
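
Vijaykumar's closing idea -- an index whose predicate matches the query's filter, so no rows are removed after the scan -- could be sketched as follows (the predicate is taken from the plan's Filter line above; the index name is made up for illustration):

```sql
-- Hypothetical partial index matching the plan's
-- Filter: (order_status_code = 'D'::bpchar), so the
-- "Rows Removed by Filter: 73969" step would disappear:
-- only delivered ('D') orders are indexed by date_time.
CREATE INDEX CONCURRENTLY restaurant_order_delivered_date_time_idx
    ON restaurant_order (date_time)
    WHERE order_status_code = 'D';
```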
Re: Query planner not using index, instead using sequential scan
by setting enable_seqscan=OFF, the query execution time seems to have
slowed down even though the index is being used for the left join of
order_offer_map

 Hash Left Join  (cost=72639.74..8176118.25 rows=19276467 width=293) (actual time=858.853..3166.994 rows=230222 loops=1)
   Hash Cond: (ro.order_id = oom.order_id)
   ->  Hash Join  (cost=1947.33..2053190.95 rows=19276467 width=211) (actual time=20.550..462.303 rows=230222 loops=1)
         Hash Cond: (b.city_id = c.city_id)
         ->  Hash Join  (cost=1937.65..1998751.06 rows=19276467 width=190) (actual time=20.523..399.979 rows=230222 loops=1)
               Hash Cond: (b.restaurant_id = r.restaurant_id)
               ->  Hash Join  (cost=1596.61..1947784.40 rows=19276467 width=190) (actual time=19.047..339.984 rows=230222 loops=1)
                     Hash Cond: (ro.branch_id = b.branch_id)
                     ->  Nested Loop  (cost=0.56..1895577.38 rows=19276467 width=108) (actual time=0.032..240.278 rows=230222 loops=1)
                           ->  Function Scan on start_date  (cost=0.00..0.01 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
                           ->  Index Scan using "idx$$_00010001" on restaurant_order ro  (cost=0.56..1702812.70 rows=19276467 width=108) (actual time=0.025..117.525 rows=230222 loops=1)
                                 Index Cond: ((date_time >= start_date.start_date) AND (date_time <= '2021-06-05 21:09:50.161463+00'::timestamp with time zone))
                     ->  Hash  (cost=1334.19..1334.19 rows=20949 width=90) (actual time=18.969..18.969 rows=20949 loops=1)
                           Buckets: 32768  Batches: 1  Memory Usage: 2758kB
                           ->  Index Scan using "branch_idx$$_274b0038" on branch b  (cost=0.29..1334.19 rows=20949 width=90) (actual time=0.008..14.371 rows=20949 loops=1)
               ->  Hash  (cost=233.42..233.42 rows=8609 width=8) (actual time=1.450..1.451 rows=8609 loops=1)
                     Buckets: 16384  Batches: 1  Memory Usage: 465kB
                     ->  Index Only Scan using "restaurant_idx$$_274b003d" on restaurant r  (cost=0.29..233.42 rows=8609 width=8) (actual time=0.011..0.660 rows=8609 loops=1)
                           Heap Fetches: 0
         ->  Hash  (cost=8.98..8.98 rows=56 width=29) (actual time=0.021..0.021 rows=56 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 12kB
               ->  Index Only Scan using "city_idx$$_274b0022" on city c  (cost=0.14..8.98 rows=56 width=29) (actual time=0.004..0.010 rows=56 loops=1)
                     Heap Fetches: 0
   ->  Hash  (cost=54779.81..54779.81 rows=1273009 width=15) (actual time=836.132..836.133 rows=1273009 loops=1)
         Buckets: 2097152  Batches: 1  Memory Usage: 81629kB
         ->  Index Scan Backward using order_offer_map_order_id on order_offer_map oom  (cost=0.43..54779.81 rows=1273009 width=15) (actual time=0.010..578.226 rows=1273009 loops=1)
 Planning Time: 1.229 ms
 Execution Time: 3183.248 ms

On Sat, Jun 5, 2021 at 10:52 PM Vijaykumar Jain <[email protected]> wrote:

> thanks Tom.
>
> I was trying to simulate some scenarios to be able to explain how the plan
> would change with/without *Rows Removed by Filter: 73969* -- by using a
> different/correct index.
>
> [...]
Re: Query planner not using index, instead using sequential scan
Yes, slowdown was expected :) I was just interested in the cost estimates.
Also, did you try setting random_page_cost to 1 if your storage is not hdd?

On Sun, 6 Jun 2021 at 2:44 AM Ayub Khan wrote:

> by setting enable_seqscan=OFF, the query execution time seems to
> have slowed down even though the index is being used for the left join of
> order_offer_map
>
> Hash Left Join  (cost=72639.74..8176118.25 rows=19276467 width=293)
> (actual time=858.853..3166.994 rows=230222 loops=1)
>   Hash Cond: (ro.order_id = oom.order_id)
>
> [...]
>
> Planning Time: 1.229 ms
> Execution Time: 3183.248 ms

--
Thanks,
Vijay
Mumbai, India
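
For anyone following along, the cost parameters discussed in this thread can be inspected before experimenting; pg_settings is the standard catalog view for this:

```sql
-- Check the planner settings relevant to the seq-scan-vs-index-probe choice.
SELECT name, setting
FROM pg_settings
WHERE name IN ('random_page_cost', 'seq_page_cost',
               'effective_cache_size', 'enable_seqscan');
```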
