Re: Very slow Query compared to Oracle / SQL - Server
Sorry if I'm cumulatively answering everyone in one E-Mail, I'm not sure how I'm supposed to do it. (single E-Mails vs many)

> Can you try tuning by increasing the shared_buffers slowly in steps of
> 500MB, and running explain analyze against the query.

-- 2500 MB shared buffers - random_page_cost = 1;
Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=2076.329..3737.050 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=295446
  ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=2007.487..2202.707 rows=172172 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 65154kB
        Worker 0: Sort Method: quicksort  Memory: 55707kB
        Worker 1: Sort Method: quicksort  Memory: 55304kB
        Buffers: shared hit=295446
        Worker 0: actual time=1963.969..2156.624 rows=161205 loops=1
          Buffers: shared hit=91028
        Worker 1: actual time=1984.700..2179.697 rows=161935 loops=1
          Buffers: shared hit=92133
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=322.125..1618.971 rows=172172 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=110951
              Buffers: shared hit=295432
              Worker 0: actual time=282.201..1595.117 rows=161205 loops=1
                Buffers: shared hit=91021
              Worker 1: actual time=303.671..1623.299 rows=161935 loops=1
                Buffers: shared hit=92126
              ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=199.119..199.119 rows=0 loops=1)
                    Buffers: shared hit=1334
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=28.856..28.857 rows=65970 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared hit=172
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2223.63 rows=219760 width=0) (actual time=108.871..108.872 rows=225283 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=581
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2261.87 rows=223525 width=0) (actual time=61.377..61.377 rows=225264 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=581
Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.940 ms
Execution Time: 4188.083 ms

--
-- 3000 MB shared buffers - random_page_cost = 1;
Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=2062.280..3763.408 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=295446
  ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=1987.933..2180.422 rows=172172 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 66602kB
        Worker 0: Sort Method: quicksort  Memory: 55149kB
        Worker 1: Sort Method: quicksort  Memory: 54415kB
        Buffers: shared hit=295446
        Worker 0: actual time=1963.059..2147.916 rows=159556 loops=1
          Buffers: shared hit=89981
        Worker 1: actual time=1949.726..2136.200 rows=158554 loops=1
          Buffers: shared hit=90141
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=340.705..1603.796 rows=172172 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=113990
              Buffers: shared hit=295432
              Worker 0: actual time=317.918..1605.548 rows=159556 loops=1
                Buffers: shared hit=89974
              Worker 1: actual time=304.744..1589.221 rows=158554 loops=1
                Buffers: shared hit=90134
              ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=218.972..218.973 rows=0 loops=1)
                    Buffers: shared hit=1334
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=37.741..37.742 rows=65970 loops=1)
Re: Very slow Query compared to Oracle / SQL - Server
ok, one last thing, not to be a PITA, but just in case this helps:

postgres=# SELECT * FROM pg_stat_user_indexes where relname = 'logtable';
postgres=# SELECT * FROM pg_stat_user_tables where relname = 'logtable';

Basically, I just want to verify that the table is not bloated; looking at *n_live_tup* vs *n_dead_tup* would help understand it.

If you see too many dead tuples:

vacuum (ANALYZE, verbose) logtable;  -- gets rid of dead tuples (if there are no transactions still using them)

and then run your query.

Thanks,
Vijay
Re: Very slow Query compared to Oracle / SQL - Server
As mentioned in the Slack comments:
SELECT pg_size_pretty(pg_relation_size('logtable')) as table_size,
pg_size_pretty(pg_relation_size('idx_entrytype')) as index_size,
(pgstattuple('logtable')).dead_tuple_percent;
 table_size | index_size | dead_tuple_percent
------------+------------+--------------------
 3177 MB    | 289 MB     |                  0
I have roughly 6 indexes, which are all around 300 MB in size.
SELECT pg_relation_size('logtable') as table_size,
pg_relation_size('idx_entrytype') as index_size,
100-(pgstatindex('idx_entrytype')).avg_leaf_density as bloat_ratio;
 table_size | index_size | bloat_ratio
------------+------------+-------------
 3331694592 | 302555136  |       5.219
Your queries:
 n_live_tup | n_dead_tup
------------+------------
   14118380 |          0
For testing, I've also been running VACUUM and ANALYZE pretty much before
every test run.
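
For completeness, the same check can be extended to all of logtable's indexes in one query. This is a minimal sketch, assuming the pgstattuple extension is installed (as above) and that all six indexes are btrees, since pgstatindex() only handles btree indexes:

-- One row per index on logtable: size plus bloat estimate from pgstatindex()
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       100 - (pgstatindex(indexrelid::regclass)).avg_leaf_density AS bloat_ratio
FROM pg_stat_user_indexes
WHERE relname = 'logtable'
ORDER BY pg_relation_size(indexrelid) DESC;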
Re: Very slow Query compared to Oracle / SQL - Server
Is this on Windows?

I see a thread that mentions a performance penalty due to parallel workers on Windows. There is a mailing-list thread with the subject line:

Huge performance penalty with parallel queries in Windows x64 v. Linux x64
Re: Very slow Query compared to Oracle / SQL - Server
On Thu, May 6, 2021 at 10:38 AM Semen Yefimenko wrote:

> Hi there,
>
> I've recently been involved in migrating our old system to SQL Server and
> then PostgreSQL. Everything has been working fine so far, but now, after
> executing our tests on Postgres, we saw a very slow running query on a
> large table in our database. I have tried asking on other platforms, but
> no one has been able to give me a satisfying answer.
> ...
>
> SELECT column1,..., column54 where ((entrytype = 4000 or entrytype = 4001
> or entrytype = 4002) and (archivestatus <= 1)) order by timestampcol desc;

I know several people have suggested using `IN` to replace the OR conditions; that would be my first go-to as well. Another approach I have found helpful is to keep in mind that whenever you have an `OR` in a WHERE clause, it can be replaced with a `UNION ALL`, and usually the `UNION ALL` is faster. I recommend avoiding `OR` in WHERE clauses as much as possible. Sometimes it can't be helped, especially if you need an exclusive or, but most of the time there is another way that is usually better. A sketch of both rewrites follows below.

Another thought: is "archivestatus" really a boolean, or does it have multiple states? If it is actually a boolean, can you change the column data type?
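
To make that concrete, here is a minimal sketch of both rewrites against the query quoted above, with * standing in for the 54-column list and the table name logtable taken from the plans posted earlier in the thread. The UNION ALL form is only equivalent to the OR form here because the three entrytype branches are mutually exclusive:

-- Rewrite 1: replace the ORs with an IN list
SELECT *  -- stand-in for column1, ..., column54
FROM logtable
WHERE entrytype IN (4000, 4001, 4002)
  AND archivestatus <= 1
ORDER BY timestampcol DESC;

-- Rewrite 2: one branch per entrytype value, combined with UNION ALL
SELECT * FROM logtable WHERE entrytype = 4000 AND archivestatus <= 1
UNION ALL
SELECT * FROM logtable WHERE entrytype = 4001 AND archivestatus <= 1
UNION ALL
SELECT * FROM logtable WHERE entrytype = 4002 AND archivestatus <= 1
ORDER BY timestampcol DESC;  -- the ORDER BY applies to the whole UNION

Each branch of the second form can then be planned independently, typically as an index or bitmap scan on idx_entrytype.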
Re: Very slow Query compared to Oracle / SQL - Server
For testing purposes I set up a separate Postgres 13.2 instance on Windows. To my surprise, it works perfectly fine. Also, the indexes have about 1/4 of the size they had on 12.6. I'll try setting up a new 12.6 instance and see if I can reproduce anything.

This explain plan is from a local Postgres 13.2 instance on an SSD, with default settings and without setting random_page_cost.

Gather Merge  (cost=19444.07..19874.60 rows=3690 width=2638) (actual time=41.633..60.538 rows=7087 loops=1)
  Output: column1, .. ,column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=2123
  ->  Sort  (cost=18444.05..18448.66 rows=1845 width=2638) (actual time=4.057..4.595 rows=2362 loops=3)
        Output: column1, .. ,column54
        Sort Key: logtable.timestampcol1 DESC
        Sort Method: quicksort  Memory: 3555kB
        Buffers: shared hit=2123
        Worker 0: actual time=0.076..0.077 rows=0 loops=1
          Sort Method: quicksort  Memory: 25kB
          Buffers: shared hit=7
        Worker 1: actual time=0.090..0.091 rows=0 loops=1
          Sort Method: quicksort  Memory: 25kB
          Buffers: shared hit=7
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=61.84..16243.96 rows=1845 width=2638) (actual time=0.350..2.419 rows=2362 loops=3)
              Output: column1, .. ,column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.tfnlogent_archivestatus <= 1)
              Heap Blocks: exact=2095
              Buffers: shared hit=2109
              Worker 0: actual time=0.030..0.030 rows=0 loops=1
              Worker 1: actual time=0.035..0.036 rows=0 loops=1
              ->  BitmapOr  (cost=61.84..61.84 rows=4428 width=0) (actual time=0.740..0.742 rows=0 loops=1)
                    Buffers: shared hit=14
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..19.50 rows=1476 width=0) (actual time=0.504..0.504 rows=5475 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared hit=7
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..19.50 rows=1476 width=0) (actual time=0.056..0.056 rows=830 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=3
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..19.50 rows=1476 width=0) (actual time=0.178..0.179 rows=782 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=4
Planning Time: 0.212 ms
Execution Time: 61.692 ms

I've also installed a locally running 12.6 on Windows. Unfortunately, I couldn't reproduce the issue. I loaded the data with a tool I wrote a few months ago, which, independently of that database, inserts the data and creates the sequences and indexes. The query also finishes in about 70 ms there.

Then I tried pg_dump into a different database on the same dev system (where the slow query still exists). The performance is just as bad on this database, and the indexes are also all 300 MB big (whereas on my locally running instance they're at around 80 MB).

Now I'm trying to insert the data into the remote dev database with the same tool I used for my local installations. This will still take some time, so I will update once I have this tested. So far it seems like there is something skewed going on with the development database.
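
For comparing the footprint of the table between the dev database and the local instances, a quick sketch (not from the original report) that can be run on each side; the function names are standard and nothing here is specific to this setup:

SELECT pg_size_pretty(pg_table_size('logtable'))          AS heap_size,
       pg_size_pretty(pg_indexes_size('logtable'))        AS all_indexes_size,
       pg_size_pretty(pg_total_relation_size('logtable')) AS total_size;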
Re: Very slow Query compared to Oracle / SQL - Server
On Fri, May 07, 2021 at 05:57:19PM +0200, Semen Yefimenko wrote:
> For testing purposes I set up a separate Postgres 13.2 instance on Windows.
> To my surprise, it works perfectly fine. Also, the indexes have about 1/4 of
> the size they had on 12.6.

In pg13, indexes are de-duplicated by default.

But I suspect the performance is better because the data was reloaded, and the smaller indexes are a small, additional benefit.

> This explain plan is from a local Postgres 13.2 instance on an SSD, with
> default settings and without setting random_page_cost.
> ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=61.84..16243.96 rows=1845 width=2638) (actual time=0.350..2.419 rows=2362 loops=3)
>       Output: column1, .. ,column54
>       Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
>       Filter: (logtable.tfnlogent_archivestatus <= 1)
>       Heap Blocks: exact=2095
>       Buffers: shared hit=2109

In the pg13 instance, the index *and heap* scans hit only 2109 buffers (16MB). On your original instance, it took 300k buffers (2.4GB), mostly uncached and read from disk.

> This will still take some time, so I will update once I have this tested.
> So far it seems like there is something skewed going on with the development
> database.

I still think you should try to cluster, or at least reindex (which cluster also does), and then analyze. The bitmap scan is probably happening because 1) you're reading a large number of tuples; and 2) the index is "uncorrelated", so a straight index scan would randomly access 300k disk pages, which is much worse even than reading 2400MB to get just 16MB of data.

-- 
Justin
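
As a hedged sketch of what that could look like (clustering on idx_entrytype is an assumption, not something specified in the thread, and CLUSTER takes an ACCESS EXCLUSIVE lock while it rewrites the table):

-- Option A: rewrite the heap in index order (also rebuilds all of the table's indexes)
CLUSTER logtable USING idx_entrytype;
ANALYZE logtable;

-- Option B (lighter): only rebuild the index, then refresh statistics
REINDEX INDEX idx_entrytype;
ANALYZE logtable;

-- Check how well the relevant columns correlate with the physical row order
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'logtable'
  AND attname IN ('entrytype', 'timestampcol');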
Re: Very slow Query compared to Oracle / SQL - Server
Justin Pryzby writes:
> On Fri, May 07, 2021 at 05:57:19PM +0200, Semen Yefimenko wrote:
>> For testing purposes I set up a separate Postgres 13.2 instance on Windows.
>> To my surprise, it works perfectly fine. Also, the indexes have about 1/4 of
>> the size they had on 12.6.

> In pg13, indexes are de-duplicated by default.
> But I suspect the performance is better because the data was reloaded, and
> the smaller indexes are a small, additional benefit.

Index bloat is often a consequence of inadequate vacuuming. You might need to
dial up autovacuum's aggressiveness to keep their sizes in check.

			regards, tom lane
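
As a sketch of what dialing that up might look like for this one table (the thresholds are only illustrative, not values recommended in the thread):

-- Per-table override: autovacuum/autoanalyze logtable after roughly 2% / 1%
-- of its rows change, instead of the 20% / 10% defaults.
ALTER TABLE logtable SET (
    autovacuum_vacuum_scale_factor  = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);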
Re: Very slow Query compared to Oracle / SQL - Server
On Fri, May 7, 2021 at 9:16 AM Justin Pryzby wrote:
> In pg13, indexes are de-duplicated by default.
>
> But I suspect the performance is better because the data was reloaded, and
> the smaller indexes are a small, additional benefit.

That's a very reasonable interpretation, since the bitmap index scans themselves just aren't doing that much I/O -- we see that there is much more I/O for the heap scan, which is likely to be what the general picture looks like no matter how much bloat there is.

However, I'm not sure if that reasonable interpretation is actually correct. The nbtinsert.c code that handles deleting LP_DEAD index tuples no longer relies on having a page-level garbage item flag set in Postgres 13 -- it just scans the line pointer array for LP_DEAD items each time. VACUUM has a rather unhelpful tendency to unset the flag when it shouldn't, which we're no longer affected by. So that's one possible explanation.

Another possible explanation is that smaller indexes (due to deduplication) are more likely to get index scans, which leads to setting the LP_DEAD bit of known-dead index tuples in passing more often (bitmap index scans won't do the kill_prior_tuple optimization). There could even be a virtuous circle over time. (Note that the index deletion stuff in Postgres 14 pretty much makes sure that this happens, but it is probably at least much more likely in Postgres 13 compared to 12.)

I could easily be very wrong about all of this in this instance, though, because the behavior I've described is highly non-linear and therefore highly unpredictable in general (not to mention highly sensitive to workload characteristics). I'm sure that I've thought about this stuff way more than any other individual Postgres contributor, but it's easy to be wrong in any given instance. The real explanation might be something else entirely. Though it's hard not to imagine that what really matters here is avoiding all of that bitmap heap scan I/O.

-- 
Peter Geoghegan
Re: Very slow Query compared to Oracle / SQL - Server
On Fri, May 7, 2021 at 2:28 PM Peter Geoghegan wrote:
> That's a very reasonable interpretation, since the bitmap index scans
> themselves just aren't doing that much I/O -- we see that there is
> much more I/O for the heap scan, which is likely to be what the
> general picture looks like no matter how much bloat there is.
>
> However, I'm not sure if that reasonable interpretation is actually
> correct. The nbtinsert.c code that handles deleting LP_DEAD index
> tuples no longer relies on having a page-level garbage item flag set
> in Postgres 13 -- it just scans the line pointer array for LP_DEAD
> items each time.

BTW, I am pointing all of this out because I've heard informal reports of big improvements following an upgrade to Postgres 13 that seem unlikely to be related to the simple fact that indexes are smaller (most of the time you cannot save that much I/O by shrinking indexes without affecting when and how TIDs/heap tuples are scanned).

It's necessary to simulate the production workload to have *any* idea if LP_DEAD index tuple deletion might be a factor. If the OP is just testing this one query on Postgres 13 in isolation, without anything bloating up (or cleaning up) indexes, then that doesn't really tell us anything about how Postgres 13 compares to Postgres 12. As you said, simply shrinking the indexes is nice, but not enough -- we'd need some second-order effect to get acceptable performance over time and under real-world conditions.

-- 
Peter Geoghegan
Re: Very slow Query compared to Oracle / SQL - Server
Are you sure you're using the same data set? Unless I'm overlooking something obvious, one result has 500 000 rows and the other 7 000.
