query planner not using index, instead using sequential scan

2021-06-05 Thread Ayub Khan
I am using Postgres 12 on AWS RDS.

Could someone clarify why the LEFT JOIN order_offer_map oom USING
(order_id) in the query below uses a sequential scan instead of the
index on order_id that is defined on the order_offer_map table?




EXPLAIN ANALYZE
WITH business AS (
    SELECT * FROM get_businessday_utc_f() start_date
)
SELECT ro.order_id,
       ro.date_time,
       round(ro.order_amount, 2) AS order_amount,
       b.branch_id,
       b.branch_name,
       st_x(b.location) AS from_x,
       st_y(b.location) AS from_y,
       b.user_id AS branch_user_id,
       b.contact_info,
       r.restaurant_id,
       c.city_id,
       c.city_name,
       c.city_name_ar,
       st_linefromtext((((((((('LINESTRING('::text || st_x(b.location)) ||
           ' '::text) || st_y(b.location)) || ','::text) ||
           st_x(ro.location_geometry)) || ' '::text) ||
           st_y(ro.location_geometry)) || ')'::text), 28355) AS from_to,
       to_char(ro.date_time, 'HH24:MI'::text) AS order_time,
       ro.customer_comment,
       'N'::text AS is_new_customer,
       ro.picked_up_time,
       ro.driver_assigned_date_time,
       oom.offer_amount,
       oom.offer_type_code AS offer_type,
       ro.uk_vat
  FROM business,
       restaurant_order ro
  JOIN branch b ON b.branch_id = ro.branch_id
  JOIN restaurant r ON r.restaurant_id = b.restaurant_id
  JOIN city c ON c.city_id = b.city_id
  LEFT JOIN order_offer_map oom USING (order_id)
 WHERE ro.date_time >= business.start_date
   AND ro.date_time <= f_now_immutable_with_tz();





Hash Join  (cost=39897.37..161872.06 rows=259399 width=494) (actual
time=605.767..2060.712 rows=156253 loops=1)
  Hash Cond: (b.city_id = c.city_id)
  ->  Hash Join  (cost=39895.11..78778.16 rows=259399 width=355) (actual
time=605.583..789.863 rows=156253 loops=1)
Hash Cond: (b.restaurant_id = r.restaurant_id)
->  Hash Join  (cost=39542.41..77744.20 rows=259399 width=307)
(actual time=602.096..738.765 rows=156253 loops=1)
  Hash Cond: (ro.branch_id = b.branch_id)
  ->  Hash Right Join  (cost=38607.06..76127.79 rows=259399
width=225) (actual time=591.342..672.039 rows=156253 loops=1)
Hash Cond: (oom.order_id = ro.order_id)
->  Seq Scan on order_offer_map oom
 (cost=0.00..34179.09 rows=1273009 width=15) (actual time=0.007..91.121
rows=1273009 loops=1)
->  Hash  (cost=35364.57..35364.57 rows=259399
width=218) (actual time=244.571..244.571 rows=156253 loops=1)
  Buckets: 262144  Batches: 1  Memory Usage: 29098kB
  ->  Index Scan using "idx$$_00010001" on
restaurant_order ro  (cost=0.56..35364.57 rows=259399 width=218) (actual
time=0.033..195.939 rows=156253 loops=1)
Index Cond: ((date_time >= '2021-05-27
05:00:00'::timestamp without time zone) AND (date_time <= '2021-06-05
16:38:22.758875+00'::timestamp with time zone))
Filter: (order_status_code = 'D'::bpchar)
Rows Removed by Filter: 73969
  ->  Hash  (cost=673.49..673.49 rows=20949 width=90) (actual
time=10.715..10.715 rows=20949 loops=1)
Buckets: 32768  Batches: 1  Memory Usage: 2758kB
->  Seq Scan on branch b  (cost=0.00..673.49 rows=20949
width=90) (actual time=0.006..6.397 rows=20949 loops=1)
->  Hash  (cost=245.09..245.09 rows=8609 width=56) (actual
time=3.466..3.467 rows=8609 loops=1)
  Buckets: 16384  Batches: 1  Memory Usage: 903kB
  ->  Seq Scan on restaurant r  (cost=0.00..245.09 rows=8609
width=56) (actual time=0.003..2.096 rows=8609 loops=1)
  ->  Hash  (cost=1.56..1.56 rows=56 width=29) (actual time=0.026..0.026
rows=56 loops=1)
Buckets: 1024  Batches: 1  Memory Usage: 12kB
->  Seq Scan on city c  (cost=0.00..1.56 rows=56 width=29) (actual
time=0.007..0.015 rows=56 loops=1)
Planning Time: 1.377 ms
Execution Time: 2071.965 ms

-Ayub


Re: query planner not using index, instead using sequential scan

2021-06-05 Thread Tom Lane
Ayub Khan  writes:
> Could someone clarify why the LEFT JOIN order_offer_map oom USING
> (order_id) in the query below uses a sequential scan instead of the
> index on order_id that is defined on the order_offer_map table?

Probably because it estimates the hash join to restaurant_order is
faster than a nestloop join would be.  I think it's likely right.
You'd need very optimistic assumptions about the cost of an
individual index probe into order_offer_map to conclude that 156K
of them would be faster than the 476ms that are being spent here
to read order_offer_map and join it to the result of the
indexscan on restaurant_order.

If, indeed, that *is* faster on your hardware, you might want
to dial down random_page_cost to get more-relevant estimates.
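Such an experiment can be scoped to a single session, so the global setting is untouched; a minimal sketch (the 1.1 value is only an illustrative starting point for SSD-backed storage, not a recommendation):

```sql
-- Session-local experiment: lower random_page_cost, re-run the query,
-- then restore the server default. Nothing here is persisted.
SET random_page_cost = 1.1;   -- default is 4.0, tuned for spinning disks

EXPLAIN ANALYZE
SELECT ...;                   -- the original query from the first post

RESET random_page_cost;       -- back to the configured default
```

If the planner then picks the nestloop-with-index plan and it really is faster, the global value can be reconsidered.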

regards, tom lane




Re: query planner not using index, instead using sequential scan

2021-06-05 Thread Vijaykumar Jain
Thanks, Tom.

I was trying to simulate some scenarios to show how the plan would change
with/without
*Rows Removed by Filter: 73969* -- i.e. by using a different/correct index.

postgres=# \d t
 Table "public.t"
   Column   |Type | Collation | Nullable | Default
+-+---+--+-
 id | integer |   | not null |
 created_on | timestamp without time zone |   |  |
 col1   | text|   |  |
Indexes:
"t_pkey" PRIMARY KEY, btree (id)
"t_created_on_idx" btree (created_on) WHERE col1 = 'a'::text   ---
useless index as all rows have col1 = 'a', but to attempt lossy case
"t_created_on_idx1" btree (created_on)
Referenced by:
TABLE "t1" CONSTRAINT "t1_id_fkey" FOREIGN KEY (id) REFERENCES t(id)

postgres=# \d t1
 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 t1_id  | integer |   | not null |
 id | integer |   |  |
 col2   | text|   |  |
Indexes:
"t1_pkey" PRIMARY KEY, btree (t1_id)
Foreign-key constraints:
"t1_id_fkey" FOREIGN KEY (id) REFERENCES t(id)



postgres=# update t set col1 = 'a';
UPDATE 1000

postgres=# explain analyze select 1 from t join t1 on (t.id = t1.id) where
created_on = '2021-06-01 12:48:45.141123';
   QUERY PLAN

 Hash Join  (cost=37.01..39.28 rows=1 width=4) (actual time=0.124..0.125
rows=0 loops=1)
   Hash Cond: (t1.id = t.id)
   ->  Seq Scan on t1  (cost=0.00..2.00 rows=100 width=4) (actual
time=0.004..0.008 rows=100 loops=1)
   ->  Hash  (cost=37.00..37.00 rows=1 width=4) (actual time=0.109..0.109
rows=1 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on t  (cost=0.00..37.00 rows=1 width=4) (actual
time=0.058..0.107 rows=1 loops=1)
   Filter: (created_on = '2021-06-01
12:48:45.141123'::timestamp without time zone)
   *Rows Removed by Filter: 999  --- as no useful
index, t_created_on_idx will fetch all pages and then remove rows from
them, expensive*
 Planning Time: 0.111 ms
 Execution Time: 0.162 ms
(10 rows)


postgres=# explain analyze select 1 from t join t1 on (t.id = t1.id) where
created_on = '2021-06-01 12:48:45.141123';
   QUERY PLAN
-
 Hash Join  (cost=8.32..33.47 rows=1 width=4) (actual time=0.025..0.026
rows=0 loops=1)
   Hash Cond: (t1.id = t.id)
   ->  Seq Scan on t1  (cost=0.00..22.00 rows=1200 width=4) (actual
time=0.009..0.009 rows=1 loops=1)
   ->  Hash  (cost=8.31..8.31 rows=1 width=4) (actual time=0.014..0.014
rows=0 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 8kB
 ->  Index Scan using t_created_on_idx1 on t  (cost=0.29..8.31
rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
   Index Cond: (created_on = '2021-06-01
12:48:45.141123'::timestamp without time zone) -- *exact match using btree
index, *
 Planning Time: 0.255 ms
 Execution Time: 0.071 ms
(9 rows)


But from Ayub's plan, the number of rows fetched is large, and rows are also
being removed after the index scan.
If that can be improved with a btree index that does not have to filter out
unwanted rows, the run may be faster?
But I guess that with 156k matching rows the planner would have found a win
in the seq scan anyway.
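One hedged option along these lines, assuming order_status_code = 'D' is a stable predicate of this query (the plan's "Filter: (order_status_code = 'D'::bpchar)" suggests it is), would be a partial index so those 73969 rows are never fetched at all. The index name below is made up for illustration:

```sql
-- Sketch only: a partial index matching both the date_time range
-- predicate and the status filter, so only qualifying rows are
-- visited. CONCURRENTLY avoids blocking writes while it builds.
CREATE INDEX CONCURRENTLY restaurant_order_delivered_date_time_idx
    ON restaurant_order (date_time)
    WHERE order_status_code = 'D';
```

Whether this helps depends on how selective order_status_code = 'D' actually is in restaurant_order.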

Ayub,
just for the sake of understanding,

can you run the query using

postgres=# set enable_seqscan TO 0;
SET
postgres=# -- explain analyze 

postgres=# set enable_seqscan TO 1;
SET
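The same experiment can also be scoped to one transaction with SET LOCAL, so the setting cannot accidentally leak into the rest of the session:

```sql
BEGIN;
SET LOCAL enable_seqscan = off;   -- planner treats seq scans as very expensive
EXPLAIN ANALYZE
SELECT ...;                       -- the original query
ROLLBACK;                         -- SET LOCAL is discarded with the transaction
```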


-- 
Thanks,
Vijay
Mumbai, India


Re: query planner not using index, instead using sequential scan

2021-06-05 Thread Ayub Khan
By setting enable_seqscan = off, the query execution time seems to have
slowed down, even though the index is now being used for the left join to
order_offer_map:

Hash Left Join  (cost=72639.74..8176118.25 rows=19276467 width=293) (actual
time=858.853..3166.994 rows=230222 loops=1)
  Hash Cond: (ro.order_id = oom.order_id)
  ->  Hash Join  (cost=1947.33..2053190.95 rows=19276467 width=211) (actual
time=20.550..462.303 rows=230222 loops=1)
Hash Cond: (b.city_id = c.city_id)
->  Hash Join  (cost=1937.65..1998751.06 rows=19276467 width=190)
(actual time=20.523..399.979 rows=230222 loops=1)
  Hash Cond: (b.restaurant_id = r.restaurant_id)
  ->  Hash Join  (cost=1596.61..1947784.40 rows=19276467
width=190) (actual time=19.047..339.984 rows=230222 loops=1)
Hash Cond: (ro.branch_id = b.branch_id)
->  Nested Loop  (cost=0.56..1895577.38 rows=19276467
width=108) (actual time=0.032..240.278 rows=230222 loops=1)
  ->  Function Scan on start_date  (cost=0.00..0.01
rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
  ->  Index Scan using "idx$$_00010001" on
restaurant_order ro  (cost=0.56..1702812.70 rows=19276467 width=108)
(actual time=0.025..117.525 rows=230222 loops=1)
Index Cond: ((date_time >=
start_date.start_date) AND (date_time <= '2021-06-05
21:09:50.161463+00'::timestamp with time zone))
->  Hash  (cost=1334.19..1334.19 rows=20949 width=90)
(actual time=18.969..18.969 rows=20949 loops=1)
  Buckets: 32768  Batches: 1  Memory Usage: 2758kB
  ->  Index Scan using "branch_idx$$_274b0038" on
branch b  (cost=0.29..1334.19 rows=20949 width=90) (actual
time=0.008..14.371 rows=20949 loops=1)
  ->  Hash  (cost=233.42..233.42 rows=8609 width=8) (actual
time=1.450..1.451 rows=8609 loops=1)
Buckets: 16384  Batches: 1  Memory Usage: 465kB
->  Index Only Scan using "restaurant_idx$$_274b003d"
on restaurant r  (cost=0.29..233.42 rows=8609 width=8) (actual
time=0.011..0.660 rows=8609 loops=1)
  Heap Fetches: 0
->  Hash  (cost=8.98..8.98 rows=56 width=29) (actual
time=0.021..0.021 rows=56 loops=1)
  Buckets: 1024  Batches: 1  Memory Usage: 12kB
  ->  Index Only Scan using "city_idx$$_274b0022" on city c
 (cost=0.14..8.98 rows=56 width=29) (actual time=0.004..0.010 rows=56
loops=1)
Heap Fetches: 0
  ->  Hash  (cost=54779.81..54779.81 rows=1273009 width=15) (actual
time=836.132..836.133 rows=1273009 loops=1)
Buckets: 2097152  Batches: 1  Memory Usage: 81629kB
->  Index Scan Backward using order_offer_map_order_id on
order_offer_map oom  (cost=0.43..54779.81 rows=1273009 width=15) (actual
time=0.010..578.226 rows=1273009 loops=1)
Planning Time: 1.229 ms
Execution Time: 3183.248 ms

Re: query planner not using index, instead using sequential scan

2021-06-05 Thread Vijaykumar Jain
Yes, the slowdown was expected :)

I was just interested in the cost estimates.
Also, did you try setting random_page_cost to 1, if your storage is not HDD?

