Re: Very slow Query compared to Oracle / SQL - Server

2021-05-07 Thread Semen Yefimenko
Sorry if I'm cumulatively answering everyone in one e-mail; I'm not sure
how I'm supposed to do it (single e-mails vs. many).


> Can you try tuning by increasing the shared_buffers slowly in steps of
> 500MB, and running explain analyze against the query.


-- 2500 MB shared buffers - random_page_cost = 1;
Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual
time=2076.329..3737.050 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=295446
  ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual
time=2007.487..2202.707 rows=172172 loops=3)
Output: column1, .. , column54
Sort Key: logtable.timestampcol DESC
Sort Method: quicksort  Memory: 65154kB
Worker 0:  Sort Method: quicksort  Memory: 55707kB
Worker 1:  Sort Method: quicksort  Memory: 55304kB
Buffers: shared hit=295446
Worker 0: actual time=1963.969..2156.624 rows=161205 loops=1
  Buffers: shared hit=91028
Worker 1: actual time=1984.700..2179.697 rows=161935 loops=1
  Buffers: shared hit=92133
->  Parallel Bitmap Heap Scan on schema.logtable
 (cost=5546.39..323481.21 rows=210418 width=2542) (actual
time=322.125..1618.971 rows=172172 loops=3)
  Output: column1, .. , column54
  Recheck Cond: ((logtable.entrytype = 4000) OR
(logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
  Filter: (logtable.archivestatus <= 1)
  Heap Blocks: exact=110951
  Buffers: shared hit=295432
  Worker 0: actual time=282.201..1595.117 rows=161205 loops=1
Buffers: shared hit=91021
  Worker 1: actual time=303.671..1623.299 rows=161935 loops=1
Buffers: shared hit=92126
  ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0)
(actual time=199.119..199.119 rows=0 loops=1)
Buffers: shared hit=1334
->  Bitmap Index Scan on idx_entrytype
 (cost=0.00..682.13 rows=67293 width=0) (actual time=28.856..28.857
rows=65970 loops=1)
  Index Cond: (logtable.entrytype = 4000)
  Buffers: shared hit=172
->  Bitmap Index Scan on idx_entrytype
 (cost=0.00..2223.63 rows=219760 width=0) (actual time=108.871..108.872
rows=225283 loops=1)
  Index Cond: (logtable.entrytype = 4001)
  Buffers: shared hit=581
->  Bitmap Index Scan on idx_entrytype
 (cost=0.00..2261.87 rows=223525 width=0) (actual time=61.377..61.377
rows=225264 loops=1)
  Index Cond: (logtable.entrytype = 4002)
  Buffers: shared hit=581
Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.940 ms
Execution Time: 4188.083 ms
--
-- 3000 MB shared buffers - random_page_cost = 1;
Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual
time=2062.280..3763.408 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=295446
  ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual
time=1987.933..2180.422 rows=172172 loops=3)
Output: column1, .. , column54
Sort Key: logtable.timestampcol DESC
Sort Method: quicksort  Memory: 66602kB
Worker 0:  Sort Method: quicksort  Memory: 55149kB
Worker 1:  Sort Method: quicksort  Memory: 54415kB
Buffers: shared hit=295446
Worker 0: actual time=1963.059..2147.916 rows=159556 loops=1
  Buffers: shared hit=89981
Worker 1: actual time=1949.726..2136.200 rows=158554 loops=1
  Buffers: shared hit=90141
->  Parallel Bitmap Heap Scan on schema.logtable
 (cost=5546.39..323481.21 rows=210418 width=2542) (actual
time=340.705..1603.796 rows=172172 loops=3)
  Output: column1, .. , column54
  Recheck Cond: ((logtable.entrytype = 4000) OR
(logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
  Filter: (logtable.archivestatus <= 1)
  Heap Blocks: exact=113990
  Buffers: shared hit=295432
  Worker 0: actual time=317.918..1605.548 rows=159556 loops=1
Buffers: shared hit=89974
  Worker 1: actual time=304.744..1589.221 rows=158554 loops=1
Buffers: shared hit=90134
  ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0)
(actual time=218.972..218.973 rows=0 loops=1)
Buffers: shared hit=1334
->  Bitmap Index Scan on idx_entrytype
 (cost=0.00..682.13 rows=67293 width=0) (actual time=37.741..37.742
rows=65970 loops=1)

Re: Very slow Query compared to Oracle / SQL - Server

2021-05-07 Thread Vijaykumar Jain
ok one last thing, not to be a PITA, but just in case this helps.

postgres=# SELECT * FROM pg_stat_user_indexes where relname = 'logtable';
postgres=# SELECT * FROM pg_stat_user_tables where relname = 'logtable';
basically, I just want to verify that the table is not bloated;
looking at *n_live_tup* vs *n_dead_tup* would help understand it.
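
A more focused variant of that check (just a sketch; the dead_pct column is
an addition for readability, not something from the original suggestion):

SELECT relname, n_live_tup, n_dead_tup,
       round(100.0 * n_dead_tup / greatest(n_live_tup + n_dead_tup, 1), 2) AS dead_pct,
       last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'logtable';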

if you see too many dead tuples:
vacuum (ANALYZE, verbose) logtable; -- gets rid of dead tuples (if
no transactions are still using them)

and then run your query.

Thanks,
Vijay






On Fri, 7 May 2021 at 13:34, Semen Yefimenko 
wrote:


Re: Very slow Query compared to Oracle / SQL - Server

2021-05-07 Thread Semen Yefimenko
As mentioned in the Slack comments:

SELECT pg_size_pretty(pg_relation_size('logtable')) as table_size,
pg_size_pretty(pg_relation_size('idx_entrytype')) as index_size,
(pgstattuple('logtable')).dead_tuple_percent;

 table_size | index_size | dead_tuple_percent
------------+------------+--------------------
 3177 MB    | 289 MB     | 0

I have roughly 6 indexes, all of which are around 300 MB.
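
A quick way to list them (sketch, assuming they all belong to logtable):

SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'logtable'
ORDER BY pg_relation_size(indexrelid) DESC;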

SELECT pg_relation_size('logtable') as table_size,
pg_relation_size('idx_entrytype') as index_size,
100-(pgstatindex('idx_entrytype')).avg_leaf_density as bloat_ratio;

 table_size | index_size | bloat_ratio
------------+------------+-------------
 3331694592 | 302555136  | 5.219

Your queries:

n_live_tup n_dead_tup
14118380   0


For testing, I've also been running VACUUM and ANALYZE pretty much before
every test run.

On Fri, 7 May 2021 at 10:44, Vijaykumar Jain <[email protected]> wrote:


Re: Very slow Query compared to Oracle / SQL - Server

2021-05-07 Thread Vijaykumar Jain
Is this on Windows?

I see a thread that mentions a performance penalty due to parallel workers.


There is a mailing list thread with the subject line:
Huge performance penalty with parallel queries in Windows x64 v. Linux x64



On Fri, 7 May 2021 at 2:33 PM Semen Yefimenko 
wrote:


Re: Very slow Query compared to Oracle / SQL - Server

2021-05-07 Thread Rick Otten
On Thu, May 6, 2021 at 10:38 AM Semen Yefimenko 
wrote:

> Hi there,
>
> I've recently been involved in migrating our old system to SQL Server and
> then PostgreSQL. Everything has been working fine so far but now after
> executing our tests on Postgres, we saw a very slow running query on a
> large table in our database.
> I have tried asking on other platforms but no one has been able to give me
> a satisfying answer.
> ...
>
> SELECT column1,..., column54  where ((entrytype = 4000 or entrytype = 4001
> or entrytype = 4002) and (archivestatus <= 1)) order by timestampcol desc;
>
>
>
I know several people have suggested using `IN` to replace the OR
conditions; that would be my first go-to as well.  Another approach I have
found helpful is to keep in mind that whenever you have an `OR` in a WHERE
clause, it can be replaced with a `UNION ALL`.  Usually the `UNION ALL` is
faster.
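
For illustration, the two rewrites could look roughly like this (a sketch
only, with * standing in for the 54-column list from the original query and
logtable for the table name used elsewhere in this thread):

-- OR folded into IN
SELECT * FROM logtable
WHERE entrytype IN (4000, 4001, 4002) AND archivestatus <= 1
ORDER BY timestampcol DESC;

-- OR split into UNION ALL branches (safe here because the branches are disjoint)
SELECT * FROM logtable WHERE entrytype = 4000 AND archivestatus <= 1
UNION ALL
SELECT * FROM logtable WHERE entrytype = 4001 AND archivestatus <= 1
UNION ALL
SELECT * FROM logtable WHERE entrytype = 4002 AND archivestatus <= 1
ORDER BY timestampcol DESC;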

I recommend avoiding `OR` in WHERE clauses as much as possible.  Sometimes
it can't be helped, especially if you need an exclusive or, but most of the
time there is another way that is usually better.

Another thought: is "archivestatus" really a boolean, or does it have
multiple states?  If it is actually a boolean, can you change the column
data type?
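
If it really is two-valued, the change might look something like this (just
a sketch; the USING expression assumes archivestatus <= 1 means "not
archived", which would need to be confirmed against the application):

ALTER TABLE logtable
  ALTER COLUMN archivestatus TYPE boolean
  USING (archivestatus <= 1);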


Re: Very slow Query compared to Oracle / SQL - Server

2021-05-07 Thread Semen Yefimenko
For testing purposes I set up a separate Postgres 13.2 instance on Windows.
To my surprise, it works perfectly fine. Also, indexes have about 1/4 of
the size they had on 12.6.
I'll try setting up a new 12.6 instance and see if I can reproduce
anything.

This explain plan is from a local Postgres 13.2 instance on an SSD, with
default settings and without setting random_page_cost.

Gather Merge  (cost=19444.07..19874.60 rows=3690 width=2638) (actual
time=41.633..60.538 rows=7087 loops=1)
  Output: column1, .. ,column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=2123
  ->  Sort  (cost=18444.05..18448.66 rows=1845 width=2638) (actual
time=4.057..4.595 rows=2362 loops=3)
Output: column1, .. ,column54
Sort Key: logtable.timestampcol1 DESC
Sort Method: quicksort  Memory: 3555kB
Buffers: shared hit=2123
Worker 0:  actual time=0.076..0.077 rows=0 loops=1
  Sort Method: quicksort  Memory: 25kB
  Buffers: shared hit=7
Worker 1:  actual time=0.090..0.091 rows=0 loops=1
  Sort Method: quicksort  Memory: 25kB
  Buffers: shared hit=7
->  Parallel Bitmap Heap Scan on schema.logtable
 (cost=61.84..16243.96 rows=1845 width=2638) (actual time=0.350..2.419
rows=2362 loops=3)
  Output: column1, .. ,column54
  Recheck Cond: ((logtable.entrytype = 4000) OR
(logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
  Filter: (logtable.tfnlogent_archivestatus <= 1)
  Heap Blocks: exact=2095
  Buffers: shared hit=2109
  Worker 0:  actual time=0.030..0.030 rows=0 loops=1
  Worker 1:  actual time=0.035..0.036 rows=0 loops=1
  ->  BitmapOr  (cost=61.84..61.84 rows=4428 width=0) (actual
time=0.740..0.742 rows=0 loops=1)
Buffers: shared hit=14
->  Bitmap Index Scan on idx_entrytype
 (cost=0.00..19.50 rows=1476 width=0) (actual time=0.504..0.504 rows=5475
loops=1)
  Index Cond: (logtable.entrytype = 4000)
  Buffers: shared hit=7
->  Bitmap Index Scan on idx_entrytype
 (cost=0.00..19.50 rows=1476 width=0) (actual time=0.056..0.056 rows=830
loops=1)
  Index Cond: (logtable.entrytype = 4001)
  Buffers: shared hit=3
->  Bitmap Index Scan on idx_entrytype
 (cost=0.00..19.50 rows=1476 width=0) (actual time=0.178..0.179 rows=782
loops=1)
  Index Cond: (logtable.entrytype = 4002)
  Buffers: shared hit=4
Planning Time: 0.212 ms
Execution Time: 61.692 ms

I've also installed a locally running 12.6 on Windows.
Unfortunately I couldn't reproduce the issue. I loaded the data with a tool
that I wrote a few months ago, which inserts data and creates sequences and
indexes independently of that database.
The query also finishes in about 70 ms. Then I tried pg_dump into a
different database on the same dev server (where the slow query still
exists). The performance is just as bad in that database, and the indexes
are also all around 300 MB (whereas on my locally running instance they're
around 80 MB).
Now I'm inserting the data into the remote dev database with the same tool
I used for my local installations.
This will still take some time, so I will update once I have tested it.
So far it seems like something is skewed in the development database.


On Fri, 7 May 2021 at 11:56, Vijaykumar Jain <[email protected]> wrote:


Re: Very slow Query compared to Oracle / SQL - Server

2021-05-07 Thread Justin Pryzby
On Fri, May 07, 2021 at 05:57:19PM +0200, Semen Yefimenko wrote:
> For testing purposes I set up a separate postgres 13.2 instance on windows.
> To my surprise, it works perfectly fine. Also indexes, have about 1/4 of
> the size they had on 12.6.

In pg13, indexes are de-duplicated by default.

But I suspect the performance is better because the data was reloaded, and
the smaller indexes are a small, additional benefit.

> This explain plan is on a SSD local postgres 13.2 instance with default
> settings and not setting random_page_cost.

>->  Parallel Bitmap Heap Scan on schema.logtable (cost=61.84..16243.96 
> rows=1845 width=2638) (actual time=0.350..2.419 rows=2362 loops=3)
>  Output: column1, .. ,column54
>  Recheck Cond: ((logtable.entrytype = 4000) OR 
> (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
>  Filter: (logtable.tfnlogent_archivestatus <= 1)
>  Heap Blocks: exact=2095
>  Buffers: shared hit=2109

In the pg13 instance, the index *and heap* scans hit only 2109 buffers (16MB).

On your original instance, it took 300k buffers (2.4GB), mostly uncached and
read from disk.

> This will still take some time so I will update once I have this tested.
> Seems like there is something skewed going on with the development database
> so far.

I still think you should try to cluster, or at least reindex (which cluster
also does), and then analyze.  The bitmap scan is probably happening because
1) you're reading a large number of tuples, and 2) the index is
"uncorrelated", so a straight index scan would randomly access 300k disk
pages, which is much worse even than reading 2400 MB to get just 16 MB of
data.
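
Concretely that could be something like the following (a sketch only -- the
choice of idx_entrytype as the clustering index is an assumption here, and
both CLUSTER and REINDEX take heavy locks, so they need a maintenance
window):

-- rewrite the heap in index order (also rebuilds its indexes), then refresh stats
CLUSTER logtable USING idx_entrytype;
ANALYZE logtable;

-- or, less invasive: only rebuild the bloated indexes
REINDEX TABLE logtable;
ANALYZE logtable;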

-- 
Justin




Re: Very slow Query compared to Oracle / SQL - Server

2021-05-07 Thread Tom Lane
Justin Pryzby  writes:
> On Fri, May 07, 2021 at 05:57:19PM +0200, Semen Yefimenko wrote:
>> For testing purposes I set up a separate postgres 13.2 instance on windows.
>> To my surprise, it works perfectly fine. Also indexes, have about 1/4 of
>> the size they had on 12.6.

> In pg13, indexes are de-duplicated by default.
> But I suspect the performance is better because data was reload, and the
> smaller indexes are a small, additional benefit.

Index bloat is often a consequence of inadequate vacuuming.  You might
need to dial up autovacuum's aggressiveness to keep their sizes in check.
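
For example (illustrative values, not a recommendation from this mail),
autovacuum can be tuned per table through storage parameters:

-- vacuum after ~2% of rows change instead of the 20% default,
-- analyze after ~1% instead of 10%
ALTER TABLE logtable SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_analyze_scale_factor = 0.01
);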

regards, tom lane




Re: Very slow Query compared to Oracle / SQL - Server

2021-05-07 Thread Peter Geoghegan
On Fri, May 7, 2021 at 9:16 AM Justin Pryzby  wrote:
> In pg13, indexes are de-duplicated by default.
>
> But I suspect the performance is better because data was reload, and the
> smaller indexes are a small, additional benefit.

That's a very reasonable interpretation, since the bitmap index scans
themselves just aren't doing that much I/O -- we see that there is
much more I/O for the heap scan, which is likely to be what the
general picture looks like no matter how much bloat there is.

However, I'm not sure if that reasonable interpretation is actually
correct. The nbtinsert.c code that handles deleting LP_DEAD index
tuples no longer relies on having a page-level garbage item flag set
in Postgres 13 -- it just scans the line pointer array for LP_DEAD
items each time. VACUUM has a rather unhelpful tendency to unset the
flag when it shouldn't, which we're no longer affected by. So that's
one possible explanation.

Another possible explanation is that smaller indexes (due to
deduplication) are more likely to get index scans, which leads to
setting the LP_DEAD bit of known-dead index tuples in passing more
often (bitmap index scans won't do the kill_prior_tuple optimization).
There could even be a virtuous circle over time. (Note that the index
deletion stuff in Postgres 14 pretty much makes sure that this
happens, but it is probably at least much more likely in Postgres 13
compared to 12.)

I could easily be very wrong about all of this in this instance,
though, because the behavior I've described is highly non-linear and
therefore highly unpredictable in general (not to mention highly
sensitive to workload characteristics). I'm sure that I've thought
about this stuff way more than any other individual Postgres
contributor, but it's easy to be wrong in any given instance. The real
explanation might be something else entirely. Though it's hard not to
imagine that what really matters here is avoiding all of that bitmap
heap scan I/O.

-- 
Peter Geoghegan




Re: Very slow Query compared to Oracle / SQL - Server

2021-05-07 Thread Peter Geoghegan
On Fri, May 7, 2021 at 2:28 PM Peter Geoghegan  wrote:
> That's a very reasonable interpretation, since the bitmap index scans
> themselves just aren't doing that much I/O -- we see that there is
> much more I/O for the heap scan, which is likely to be what the
> general picture looks like no matter how much bloat there is.
>
> However, I'm not sure if that reasonable interpretation is actually
> correct. The nbtinsert.c code that handles deleting LP_DEAD index
> tuples no longer relies on having a page-level garbage item flag set
> in Postgres 13 -- it just scans the line pointer array for LP_DEAD
> items each time.

BTW, I am pointing all of this out because I've heard informal reports
of big improvements following an upgrade to Postgres 13 that seem
unlikely to be related to the simple fact that indexes are smaller
(most of the time you cannot save that much I/O by shrinking indexes
without affecting when and how TIDs/heap tuples are scanned).

It's necessary to simulate the production workload to have *any* idea
if LP_DEAD index tuple deletion might be a factor. If the OP is just
testing this one query on Postgres 13 in isolation, without anything
bloating up (or cleaning up) indexes, then that doesn't really tell us
anything about how Postgres 13 compares to Postgres 12. As you said,
simply shrinking the indexes is nice, but not enough -- we'd need some
second-order effect to get acceptable performance over time and under
real-world conditions.

-- 
Peter Geoghegan




Re: Very slow Query compared to Oracle / SQL - Server

2021-05-07 Thread didier
Are you sure you're using the same data set?

Unless I'm overlooking something obvious, one result has 500 000 rows and
the other 7 000.


