slow query with inline function on AWS RDS with RDS 24x large

2021-06-04 Thread Ayub Khan
The query below is slow even with no data:


explain ANALYZE

WITH business AS (SELECT * FROM get_businessday_utc_f() start_date)
 SELECT ro.order_id,
ro.date_time,
round(ro.order_amount, 2) AS order_amount,
b.branch_id,
b.branch_name,
st_x(b.location) AS from_x,
st_y(b.location) AS from_y,
b.user_id AS branch_user_id,
b.contact_info,
r.restaurant_id,
   c.city_id,
   c.city_name,
   c.city_name_ar,
   st_linefromtext(((((((('LINESTRING('::text || st_x(b.location)) ||
' '::text) || st_y(b.location)) || ','::text) ||
st_x(ro.location_geometry)) || ' '::text) ||
st_y(ro.location_geometry)) || ')'::text, 28355) AS from_to,
   to_char(ro.date_time, 'HH24:MI'::text) AS order_time,
   ro.customer_comment,
   'N'::text AS is_new_customer,
   ro.picked_up_time,
   ro.driver_assigned_date_time,
   oom.offer_amount,
oom.offer_type_code AS offer_type,
   ro.uk_vat
 FROM business, restaurant_order ro

 JOIN branch b ON b.branch_id = ro.branch_id
JOIN restaurant r ON r.restaurant_id = b.restaurant_id
 JOIN city c ON c.city_id = b.city_id
LEFT JOIN order_offer_map oom using (order_id)
WHERE ro.date_time >= business.start_date AND ro.date_time <=
f_now_immutable_with_tz();



Hash Left Join  (cost=55497.32..5417639.59 rows=5397276 width=291)
(actual time=1056.926..1056.934 rows=0 loops=1)
  Hash Cond: (ro.order_id = oom.order_id)
  ->  Hash Join  (cost=6584.61..3674143.44 rows=5397276 width=209)
(actual time=1056.926..1056.932 rows=0 loops=1)
Hash Cond: (ro.branch_id = b.branch_id)
->  Nested Loop  (cost=5427.94..3546726.47 rows=19275986
width=108) (actual time=1036.809..1036.810 rows=0 loops=1)
  ->  Function Scan on start_date  (cost=0.00..0.01 rows=1
width=8) (actual time=0.006..0.008 rows=1 loops=1)
  ->  Bitmap Heap Scan on restaurant_order ro
(cost=5427.94..3353966.60 rows=19275986 width=108) (actual
time=1036.793..1036.793 rows=0 loops=1)
Recheck Cond: ((date_time >=
start_date.start_date) AND (date_time <= '2021-06-04
08:05:32.784199+00'::timestamp with time zone))
Rows Removed by Index Recheck: 5039976
Heap Blocks: lossy=275230
->  Bitmap Index Scan on rest_ord_date_brin
(cost=0.00..608.94 rows=19359111 width=0) (actual time=14.037..14.038
rows=2917120 loops=1)
  Index Cond: ((date_time >=
start_date.start_date) AND (date_time <= '2021-06-04
08:05:32.784199+00'::timestamp with time zone))
->  Hash  (cost=1083.35..1083.35 rows=5866 width=109) (actual
time=20.106..20.109 rows=20949 loops=1)
  Buckets: 32768 (originally 8192)  Batches: 1 (originally
1)  Memory Usage: 3112kB
  ->  Hash Join  (cost=343.29..1083.35 rows=5866
width=109) (actual time=1.620..14.539 rows=20949 loops=1)
Hash Cond: (b.restaurant_id = r.restaurant_id)
->  Hash Join  (cost=2.26..726.91 rows=5866
width=109) (actual time=0.029..8.597 rows=20949 loops=1)
  Hash Cond: (b.city_id = c.city_id)
  ->  Seq Scan on branch b  (cost=0.00..668.49
rows=20949 width=88) (actual time=0.004..1.609 rows=20949 loops=1)
  ->  Hash  (cost=1.56..1.56 rows=56 width=29)
(actual time=0.020..0.021 rows=56 loops=1)
Buckets: 1024  Batches: 1  Memory Usage: 12kB
->  Seq Scan on city c
(cost=0.00..1.56 rows=56 width=29) (actual time=0.004..0.010 rows=56
loops=1)
->  Hash  (cost=233.42..233.42 rows=8609 width=8)
(actual time=1.575..1.575 rows=8609 loops=1)
  Buckets: 16384  Batches: 1  Memory Usage: 465kB
  ->  Index Only Scan using
"restaurant_idx$$_274b003d" on restaurant r  (cost=0.29..233.42
rows=8609 width=8) (actual time=0.006..0.684 rows=8609 loops=1)
Heap Fetches: 0
  ->  Hash  (cost=33000.09..33000.09 rows=1273009 width=13) (never executed)
->  Seq Scan on order_offer_map oom  (cost=0.00..33000.09
rows=1273009 width=13) (never executed)
Planning Time: 1.180 ms
Execution Time: 1057.535 ms

Could someone explain why it is slow? If I insert 50k records, the
execution time reaches 20 seconds.


Re: slow query with inline function on AWS RDS with RDS 24x large

2021-06-04 Thread Pavel Stehule
On Fri, 4 Jun 2021 at 10:07, Ayub Khan wrote:

> [original query and EXPLAIN ANALYZE plan snipped]
>
  ->  Bitmap Heap Scan on restaurant_order ro
 (cost=5427.94..3353966.60 rows=19275986 width=108) (actual
time=1036.793..1036.793 rows=0 loops=1)
Recheck Cond: ((date_time >= start_date.start_date) AND
(date_time <= '2021-06-04 08:05:32.784199+00'::timestamp with time zone))
Rows Removed by Index Recheck: 5039976
Heap Blocks: lossy=275230
-> 

Re: slow query with inline function on AWS RDS with RDS 24x large

2021-06-04 Thread Ayub Khan
The BRIN index is only on the date_time column; I even tried a btree
index, with no performance gains.
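For reference, the two index definitions under discussion would look roughly like this (hypothetical DDL; the actual definitions are not shown in the thread):

```sql
-- Assumed BRIN index on the range-filtered column
CREATE INDEX rest_ord_date_brin ON restaurant_order USING brin (date_time);

-- btree alternative that was also tried
CREATE INDEX rest_ord_date_btree ON restaurant_order USING btree (date_time);
```

A BRIN index is tiny, but it only pays off when the physical row order of the table correlates well with date_time.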

On Fri, Jun 4, 2021 at 11:23 AM Pavel Stehule wrote:

> [quoted message snipped]
>

Re: slow query with inline function on AWS RDS with RDS 24x large

2021-06-04 Thread Pavel Stehule
Hi


On Fri, 4 Jun 2021 at 10:32, Ayub Khan wrote:

> BRIN index is only on the date_time column, I even tried with btree index
> with no performance gains.
>

  ->  Bitmap Heap Scan on restaurant_order ro
(cost=5427.94..3353966.60 rows=19275986 width=108) (actual
time=1036.793..1036.793 rows=0 loops=1)
Recheck Cond: ((date_time >=
start_date.start_date) AND (date_time <= '2021-06-04
08:05:32.784199+00'::timestamp with time zone))
Rows Removed by Index Recheck: 5039976
Heap Blocks: lossy=275230

When most of the rows are removed in the recheck, the effectiveness of
the index is not good.
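Two things that often reduce lossy BRIN rechecks (a sketch, not from the thread; `rest_ord_date_btree` is a hypothetical name):

```sql
-- BRIN bitmaps are always page-level (lossy); a smaller pages_per_range
-- (default 128) narrows each summary entry so fewer irrelevant heap
-- pages are rechecked.
CREATE INDEX rest_ord_date_brin ON restaurant_order
    USING brin (date_time) WITH (pages_per_range = 16);

-- A BRIN index only works well when physical row order correlates with
-- the column; clustering on a date_time btree restores that correlation.
CLUSTER restaurant_order USING rest_ord_date_btree;
```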

Pavel




> [quoted message snipped]

Re: slow query with inline function on AWS RDS with RDS 24x large

2021-06-04 Thread Ayub Khan
You are right. I dropped the BRIN index and created a btree index, and
performance is good when no rows match the criteria; below is the plan
with the btree index. I will test after inserting a lot of data.


Hash Join  (cost=50186.91..3765911.10 rows=5397411 width=291) (actual
time=1.501..1.504 rows=0 loops=1)
  Hash Cond: (b.restaurant_id = r.restaurant_id)
  ->  Hash Left Join  (cost=49845.88..2078197.48 rows=5397411 width=216)
(actual time=0.079..0.081 rows=0 loops=1)
Hash Cond: (ro.order_id = oom.order_id)
->  Hash Join  (cost=933.18..2007856.35 rows=5397411 width=209)
(actual time=0.078..0.080 rows=0 loops=1)
  Hash Cond: (b.city_id = c.city_id)
  ->  Hash Join  (cost=930.92..1956181.11 rows=19276467
width=188) (actual time=0.048..0.050 rows=0 loops=1)
Hash Cond: (ro.branch_id = b.branch_id)
->  Nested Loop  (cost=0.56..1904639.80 rows=19276467
width=108) (actual time=0.048..0.048 rows=0 loops=1)
  ->  Function Scan on start_date  (cost=0.00..0.01
rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=1)
  ->  Index Scan using rest_ord_date_brin on
restaurant_order ro  (cost=0.56..1711875.12 rows=19276467 width=108)
(actual time=0.042..0.042 rows=0 loops=1)
Index Cond: ((date_time >=
start_date.start_date) AND (date_time <= '2021-06-04
08:48:45.377833+00'::timestamp with time zone))
->  Hash  (cost=668.49..668.49 rows=20949 width=88)
(never executed)
  ->  Seq Scan on branch b  (cost=0.00..668.49
rows=20949 width=88) (never executed)
  ->  Hash  (cost=1.56..1.56 rows=56 width=29) (actual
time=0.026..0.027 rows=56 loops=1)
Buckets: 1024  Batches: 1  Memory Usage: 12kB
->  Seq Scan on city c  (cost=0.00..1.56 rows=56
width=29) (actual time=0.009..0.016 rows=56 loops=1)
->  Hash  (cost=33000.09..33000.09 rows=1273009 width=13) (never
executed)
  ->  Seq Scan on order_offer_map oom  (cost=0.00..33000.09
rows=1273009 width=13) (never executed)
  ->  Hash  (cost=233.42..233.42 rows=8609 width=8) (actual
time=1.403..1.403 rows=8609 loops=1)
Buckets: 16384  Batches: 1  Memory Usage: 465kB
->  Index Only Scan using "restaurant_idx$$_274b003d" on restaurant
r  (cost=0.29..233.42 rows=8609 width=8) (actual time=0.007..0.634
rows=8609 loops=1)
  Heap Fetches: 0
Planning Time: 1.352 ms
Execution Time: 1.571 ms
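One way to see why the BRIN index performed so poorly here is to check the physical correlation of date_time (a sketch; table and column names as used in the thread):

```sql
-- correlation near +/-1.0 means rows are physically ordered by
-- date_time (BRIN-friendly); near 0 means each BRIN page range spans
-- nearly the whole value domain and almost every page gets rechecked.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'restaurant_order' AND attname = 'date_time';
```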

On Fri, Jun 4, 2021 at 11:41 AM Pavel Stehule wrote:

> Hi
>
>
> pá 4. 6. 2021 v 10:32 odesílatel Ayub Khan  napsal:
>
>> BRIN index is only on the date_time column, I even tried with btree index
>> with no performance gains.
>>
>
>   ->  Bitmap Heap Scan on restaurant_order ro  
> (cost=5427.94..3353966.60 rows=19275986 width=108) (actual 
> time=1036.793..1036.793 rows=0 loops=1)
> Recheck Cond: ((date_time >= start_date.start_date) AND 
> (date_time <= '2021-06-04 08:05:32.784199+00'::timestamp with time zone))
> Rows Removed by Index Recheck: 5039976
> Heap Blocks: lossy=275230
>
> When the most rows are removed in recheck, then the effectivity of the index 
> is not good
>
> Pavel
>
>
>
>
>>
>> On Fri, Jun 4, 2021 at 11:23 AM Pavel Stehule 
>> wrote:
>>
>>>
>>>
>>> pá 4. 6. 2021 v 10:07 odesílatel Ayub Khan  napsal:
>>>

 below query is slow even with no data


 explain ANALYZE

 WITH business AS( SELECT * FROM get_businessday_utc_f()  start_date)
  SELECT ro.order_id,
 ro.date_time,
 round(ro.order_amount, 2) AS order_amount,
 b.branch_id,
 b.branch_name,
 st_x(b.location) AS from_x,
 st_y(b.location) AS from_y,
 b.user_id AS branch_user_id,
 b.contact_info,
 r.restaurant_id,
c.city_id,
c.city_name,
c.city_name_ar,
st_linefromtext'LINESTRING('::text || st_x(b.location)) || ' 
 '::text) || st_y(b.location)) || ','::text) || st_x(ro.location_geometry)) 
 || ' '::text) || st_y(ro.location_geometry)) || ')'::text, 28355) AS 
 from_to,
to_char(ro.date_time, 'HH24:MI'::text) AS order_time,
ro.customer_comment,
'N'::text AS is_new_customer,
ro.picked_up_time,
ro.driver_assigned_date_time,
oom.offer_amount,
 oom.offer_type_code AS offer_type,
ro.uk_vat
  FROM business, restaurant_order ro

  JOIN branch b ON b.branch_id = ro.branch_id
 JOIN restaurant r ON r.restaurant_id = b.restaurant_id
  JOIN city c ON c.city_id = b.city_id
 LEFT JOIN order_offer_map oom using (order_id)
 WHERE ro.date_time >= business.start_date AND ro.date_time<= 
 f_now_immutable_with_tz();



 Hash Left Join  (cost=55497.32..5417639.59 rows=5397276 width=291) (actual 
 time=1056.926..1056.934 rows=0 loops=1)
   Hash Cond

PgSQL 12 on WinSrv ~3x faster than on Linux

2021-06-04 Thread Taras Savchuk
Dear All,

I have a demo stand on Hyper-V (2x Intel E5-2650v2, 256 GB RAM, Intel SSDs in RAID):
* one VM with the 1C application server
* two database VMs, each with the same database (~56 GB, "1C accounting 3.0" configuration) imported into PostgreSQL:
1. CentOS 7 with postgresql_12.6_6.1C_x86_64 (a distribution with special patches for running 1C); the VM is updated with LIS for Hyper-V
2. Windows Server 2019 with the same postgresql_12.6_6.1C_x86_64

My real-life test is to "register" (post) 10 identical documents in each of the 
1C/PostgreSQL DBs. Both PostgreSQL databases are identical and were imported 
into PostgreSQL via the application server (DT import) just before the test.
On Windows Server the test procedure takes 20-30 seconds; on Linux it takes 
1m-1m10s. The PostgreSQL VMs run on the same hypervisor with the same 
resources assigned to each of them.
Tuning the PostgreSQL config and/or CentOS doesn't make any difference. In 
contrast, on the Windows VM we get almost 3x better performance with the stock 
PostgreSQL config.

Any ideas what's wrong? Such a big difference on identical 
databases/queries looks strange to me.

--
Taras Savchuk




Re: PgSQL 12 on WinSrv ~3x faster than on Linux

2021-06-04 Thread David Rowley
On Fri, 4 Jun 2021 at 23:53, Taras Savchuk wrote:
> [quoted message snipped]

It's pretty difficult to say. You've not provided any useful details
about the workload you're running.

If this "register 10 _same_ documents" thing requires running some
query, then you might want to look at EXPLAIN (ANALYZE, BUFFERS) for
that query.  You might want to consider doing SET track_io_timing =
on;  Perhaps Linux is having to read more buffers from disk than
Windows.
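Concretely, the diagnostics suggested above look like this (a sketch; substitute the real query behind the document-posting step):

```sql
-- adds per-node read/write timings to the BUFFERS counters
SET track_io_timing = on;

-- shows, per plan node, how many buffers came from cache vs. disk
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;  -- the actual 1C query, not shown in the thread
```

Comparing the "shared read" counts and I/O timings between the Windows and Linux runs should show whether one side is doing more physical reads.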

David.




Re: PgSQL 12 on WinSrv ~3x faster than on Linux

2021-06-04 Thread Vijaykumar Jain
Also, if you can set up an external timer (\timing) along with EXPLAIN
ANALYZE to get the total time, that would help, assuming everything else
is the same.

I have seen some threads that mention added startup cost for parallel
workers on Windows but not on Linux. I don't want to mix those threads
in here, but just FYI.


On Fri, Jun 4, 2021, 6:12 PM David Rowley wrote:
> [quoted message snipped]


Re: PgSQL 12 on WinSrv ~3x faster than on Linux

2021-06-04 Thread Tom Lane
David Rowley writes:
> [quoted suggestion to use EXPLAIN (ANALYZE, BUFFERS) and track_io_timing snipped]

The first thing that comes to mind for me is fsync working correctly
(i.e. actually waiting for the disk write) in Linux but not in Windows.
On a weird VM stack like you've got, it's not hard for that sort of
thing to go wrong.  Needless to say, if that's the issue then the
apparent performance win is coming at the cost of crash safety.

pg_test_fsync might help detect such a problem.

regards, tom lane




RE: PgSQL 12 on WinSrv ~3x faster than on Linux

2021-06-04 Thread Taras Savchuk
> [Tom Lane's fsync hypothesis and suggestion to run pg_test_fsync snipped]

fsync performance on Windows is much better (results are below). Also, network 
performance between VMs on the same hypervisor is about 40% better 
Windows-to-Windows than Windows-to-Linux (5.97 Gbps vs 3.6 Gbps).
Regarding the weird VM stack: we run both Windows and Linux VMs, and Hyper-V 
works reasonably well, except for this issue.

Win:
C:\Program Files\PostgreSQL\12.6-6.1C\bin>pg_test_fsync.exe
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync  7423,645 ops/sec 135 usecs/op
fdatasync   n/a
fsync  1910,611 ops/sec 523 usecs/op
fsync_writethrough 1987,900 ops/sec 503 usecs/op
open_sync   n/a

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync  3827,254 ops/sec 261 usecs/op
fdatasync   n/a
fsync  1920,720 ops/sec 521 usecs/op
fsync_writethrough 1863,852 ops/sec 537 usecs/op
open_sync   n/a

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
 1 * 16kB open_sync write   n/a
 2 *  8kB open_sync writes  n/a
 4 *  4kB open_sync writes  n/a
 8 *  2kB open_sync writes  n/a
16 *  1kB open_sync writes  n/a

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close 144,065 ops/sec6941 usecs/op
write, close, fsync 148,751 ops/sec6723 usecs/op

Non-sync'ed 8kB writes:
write   165,484 ops/sec6043 usecs/op

Linux:
[root@pgsql12 ~]# /usr/pgsql-12/bin/pg_test_fsync

5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync  2947.296 ops/sec 339 usecs/op
fdatasync  2824.271 ops/sec 354 usecs/op
fsync  1885.924 ops/sec 530 usecs/op
fsync_writethrough  n/a
open_sync  1816.312 ops/sec 551 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync  1458.849 ops/sec 685 usecs/op
fdatasync  2712.756 ops/sec 369 usecs/op
fsync  1769.353 ops/sec 565 usecs/op
fsync_writethrough  n/a
open_sync   902.626 ops/sec1108 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
 1 * 16kB open_sync write  1798.811 ops/sec 556 usecs/op
 2 *  8kB open_sync writes  887.727 ops/sec1126 usecs/op
 4 *  4kB open_sync writes  494.843 ops/sec2021 usecs/op
 8 *  2kB open_sync writes  233.659 ops/sec4280 usecs/op
16 *  1kB open_sync writes  117.417 ops/sec8517 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close1673.781 ops/sec 597 usecs/op
write, close, fsync1727.787 ops/sec 579 usecs/op

Non-sync'ed 8kB writes:
write200638.271 ops/sec   5 usecs/op
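Given that the Windows open_datasync rate is roughly 2.5x the Linux one, it is worth confirming that both servers actually run with the same durability settings before comparing (a sketch; run on each server):

```sql
-- Differences here change commit latency directly. Note the platform
-- defaults differ: open_datasync on Windows, fdatasync on Linux.
SHOW wal_sync_method;
SHOW fsync;
SHOW synchronous_commit;
```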

--
Taras