Re: Bitmap scan is undercosted?
Sorry for the delay in responding; I had to switch to other tasks and didn't have time to run proper tests and write a meaningful reply. Recently a similar issue happened with another of our databases, so I decided to post an update. The planner preferred a bitmap scan over an index scan, but the bitmap scan ran worse in practice. Here are the relevant pieces of a much bigger query plan:

->  Bitmap Heap Scan on cmdb_program_daily_usage cmdb_program_daily_usage_6  (cost=6707.08..6879.35 rows=32 width=20) (actual time=39.994..40.019 rows=12 loops=336)
      Recheck Cond: ((used_from = cmdb_ci_computer_12.id) AND (usage_date >= '2018-02-02'::date) AND (usage_date <= '2018-02-12'::date))
      Filter: (((NOT thin_client) OR (thin_client IS NULL)) AND (program_instance IS NOT NULL) AND (minutes_in_use > 0))
      Rows Removed by Filter: 69
      Heap Blocks: exact=2995
      Buffers: shared hit=563448
      ->  BitmapAnd  (cost=6707.08..6707.08 rows=154 width=0) (actual time=39.978..39.978 rows=0 loops=336)
            Buffers: shared hit=560453
            ->  Bitmap Index Scan on idx_fk_5317241949468942  (cost=0.00..133.87 rows=12641 width=0) (actual time=0.373..0.373 rows=4780 loops=336)
                  Index Cond: (used_from = cmdb_ci_computer_12.id)
                  Buffers: shared hit=5765
            ->  Bitmap Index Scan on idx_263911642415136  (cost=0.00..6572.94 rows=504668 width=0) (actual time=40.873..40.873 rows=540327 loops=324)
                  Index Cond: ((usage_date >= '2018-02-02'::date) AND (usage_date <= '2018-02-12'::date))
                  Buffers: shared hit=554688

->  Index Scan using idx_fk_5317241949468942 on cmdb_program_daily_usage cmdb_program_daily_usage_6  (cost=0.56..24322.97 rows=35 width=20) (actual time=1.211..2.196 rows=14 loops=338)
      Index Cond: (used_from = cmdb_ci_computer_12.id)
      Filter: (((NOT thin_client) OR (thin_client IS NULL)) AND (program_instance IS NOT NULL) AND (minutes_in_use > 0) AND (usage_date >= '2018-02-02'::date) AND (usage_date <= '2018-02-12'::date))
      Rows Removed by Filter: 4786
      Buffers: shared hit=289812

The difference in run time may not look huge, but when this node is part of a loop, it can mean the difference between minutes and hours. After running some tests, here are the conclusions we've reached:

- When running with a cold cache, so that data is read from disk, the planner's estimates look adequate. The bitmap scan has the lower cost, and it does perform better in that case.
- When running with a hot cache, with most of the data already in RAM, the index scan starts to outperform the bitmap scan. Unfortunately the planner cannot account for the cache very well, so it can't switch plans accordingly. Even if the planner ever learned to account for the current contents of shared buffers, it still couldn't know much about the contents of the filesystem cache.
- Tests showed that the costs are dominated by random_page_cost, but there is still potential to change the total plan cost if the "cpu_*" costs were less distant from the "*_page_cost" settings.
- In our case the data is likely to be in cache, so we decided to change the cost settings: seq_page_cost 1.0 -> 0.5; random_page_cost 1.1 -> 0.6

Regards,
Vitaliy
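For anyone who wants to reproduce this kind of comparison, here is a minimal sketch. The GUCs (enable_bitmapscan, seq_page_cost, random_page_cost) and EXPLAIN options are standard PostgreSQL; the literal id value in the WHERE clause is illustrative only, since the real query joins against cmdb_ci_computer:

    -- Compare the two plans for the same query by disabling bitmap scans
    -- in the current session only:
    SET enable_bitmapscan = off;
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
      FROM cmdb_program_daily_usage
     WHERE used_from = 12345                      -- illustrative id value
       AND usage_date BETWEEN '2018-02-02' AND '2018-02-12';
    RESET enable_bitmapscan;

    -- Apply the cheaper page-cost settings cluster-wide (requires superuser
    -- and a configuration reload; values as chosen above):
    ALTER SYSTEM SET seq_page_cost = 0.5;
    ALTER SYSTEM SET random_page_cost = 0.6;
    SELECT pg_reload_conf();

The same settings can also be tried per-session with plain SET before committing to a cluster-wide change.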
RE: Updating large tables without dead tuples
> -----Original Message-----
> From: Stephen Frost [mailto:[email protected]]
> Sent: Friday, February 23, 2018 19:10
> To: [email protected]
> Cc: [email protected]
> Subject: Re: Updating large tables without dead tuples
>
> Greetings,
>
> * [email protected] ([email protected]) wrote:
> > This was done during a maintenance window, and that table is read-only
> > except when we ETL data to it on a weekly basis, and so I was just wondering
> > why I should pay the "bloat" penalty for this type of transaction. Is there
> > a trick that could be used here?
>
> Yes, create a new table and INSERT the data into that table, then swap the new
> table into place as the old table. Another option, if you don't mind the
> exclusive lock taken on the table, is to dump the data to another table, then
> TRUNCATE the current one and then INSERT into it.
>
> There's other options too, involving triggers and such to allow updates and
> other changes to be captured during this process, avoiding the need to lock
> the table, but that gets a bit complicated.
>
> > More generally, I suspect that the MVCC architecture is so deep that
> > something like LOCK TABLE, which would guarantee that there won't be
> > contentions, couldn't be used as a heuristic to not create dead tuples? That
> > would make quite a performance improvement for this type of work though.
>
> I'm afraid it wouldn't be quite that simple; in particular, you have to think
> about what happens when you issue a rollback...
>
> Thanks!
>
> Stephen

[Laurent Hasson]
This table has several other tables with foreign keys into it, so any physical
replacement of the table wouldn't work, I believe. I'd have to disable/remove
the foreign keys across the other tables, do this work, and then re-create the
foreign keys. The overall time in aggregate may not be much shorter than the
current implementation. This table represents hospital visits, off of which
hang a lot of other information. The updated column in that Visits table is
not part of the key.

As for the rollback, I didn't think about it because in our case, short of a
db/hardware failure, this operation wouldn't fail... But the risk is there,
and I understand the engine must be prepared for anything and fulfill the ACID
principles. With respect to that, I read in many places that an UPDATE is
effectively a DELETE + INSERT. Does that mean that in the rollback logs there
are 2 entries for each row updated as a result?

Thank you,
Laurent.
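As a sketch of the two approaches Stephen describes (the table name "visits" is taken from the thread, but the columns and values are hypothetical, not the real schema):

    -- Option 1: rebuild into a new table and swap it into place.
    BEGIN;
    CREATE TABLE visits_new (LIKE visits INCLUDING ALL);
    INSERT INTO visits_new
    SELECT id, patient_id, visit_date, '2018-02-23'::date AS last_refreshed
      FROM visits;                           -- new column values applied here
    ALTER TABLE visits RENAME TO visits_old;
    ALTER TABLE visits_new RENAME TO visits;
    COMMIT;
    DROP TABLE visits_old;
    -- Foreign keys in other tables still point at the old table, so they would
    -- have to be dropped and re-created against the new one; that is exactly
    -- the complication raised below.

    -- Option 2: dump, TRUNCATE, and re-insert, accepting the exclusive lock.
    -- Note that TRUNCATE refuses to run while other tables have foreign keys
    -- into visits (unless CASCADE is used), so the same FK caveat applies.
    BEGIN;
    CREATE TEMP TABLE visits_copy AS
    SELECT id, patient_id, visit_date, '2018-02-23'::date AS last_refreshed
      FROM visits;
    TRUNCATE visits;
    INSERT INTO visits SELECT * FROM visits_copy;
    COMMIT;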
Re: Updating large tables without dead tuples
Greetings,

* [email protected] ([email protected]) wrote:
> > * [email protected] ([email protected]) wrote:
> > > This was done during a maintenance window, and that table is read-only
> > > except when we ETL data to it on a weekly basis, and so I was just wondering
> > > why I should pay the "bloat" penalty for this type of transaction. Is there
> > > a trick that could be used here?
> >
> > Yes, create a new table and INSERT the data into that table, then swap the
> > new table into place as the old table. Another option, if you don't mind the
> > exclusive lock taken on the table, is to dump the data to another table, then
> > TRUNCATE the current one and then INSERT into it.
> >
> > There's other options too, involving triggers and such to allow updates and
> > other changes to be captured during this process, avoiding the need to lock
> > the table, but that gets a bit complicated.
> >
> > > More generally, I suspect that the MVCC architecture is so deep that
> > > something like LOCK TABLE, which would guarantee that there won't be
> > > contentions, couldn't be used as a heuristic to not create dead tuples? That
> > > would make quite a performance improvement for this type of work though.
> >
> > I'm afraid it wouldn't be quite that simple; in particular, you have to think
> > about what happens when you issue a rollback...
>
> [Laurent Hasson]
> This table has several other tables with foreign keys into it, so any physical
> replacement of the table wouldn't work, I believe. I'd have to disable/remove
> the foreign keys across the other tables, do this work, and then re-create the
> foreign keys. The overall time in aggregate may not be much shorter than the
> current implementation.

That would depend on the FKs, of course, but certainly having them does add to
the level of effort required.

> This table represents hospital visits, off of which hang a lot of other
> information. The updated column in that Visits table is not part of the key.
>
> As for the rollback, I didn't think about it because in our case, short of a
> db/hardware failure, this operation wouldn't fail... But the risk is there,
> and I understand the engine must be prepared for anything and fulfill the
> ACID principles.

Right, PG still needs to be able to provide the ability to perform a rollback.

> With respect to that, I read in many places that an UPDATE is effectively a
> DELETE + INSERT. Does that mean that in the rollback logs there are 2 entries
> for each row updated as a result?

The short answer is yes. The existing row is updated with a marker saying "not
valid as of this transaction" and a new row is added with a marker saying
"valid as of this transaction." Each of those changes also ends up in WAL
(possibly as a full-page image, if that was the first time the page was
changed during that checkpoint, or possibly as just a partial page change if
the page had already been modified during that checkpoint and a prior
full-page image written out). Indexes may also need to be updated, depending
on whether the new row ended up on the same page or not, and on which columns
were indexed and which were being changed.

There has been discussion around having an undo-log type of approach, where
the page is modified in place and a log of what existed previously is stored
off to the side to allow for rollback, but it doesn't seem likely that we'll
have that any time soon, and the space to store the undo log would have to be
accounted for as well.

Thanks!

Stephen
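To see the two row versions described above show up as "bloat", one way is to watch the dead-tuple counter in pg_stat_user_tables around a bulk UPDATE. This is a sketch on a throwaway table, not on anything from this thread:

    -- Throwaway example: every updated row leaves behind a dead tuple
    -- until VACUUM reclaims it.
    CREATE TABLE bloat_demo AS
    SELECT g AS id, 0 AS val FROM generate_series(1, 100000) AS g;

    UPDATE bloat_demo SET val = 1;   -- rewrites every row: old versions become dead

    -- Statistics are updated asynchronously, so the dead-tuple count
    -- may take a moment to appear.
    SELECT n_live_tup, n_dead_tup
      FROM pg_stat_user_tables
     WHERE relname = 'bloat_demo';

    -- The table stays roughly twice its minimal size until it is vacuumed:
    SELECT pg_size_pretty(pg_relation_size('bloat_demo'));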
Re: Please help
Have you looked at the Nagios XI & Core packages?

https://www.nagios.com/solutions/postgres-monitoring/

On 02/23/2018 12:31 PM, Daulat Ram wrote:
> Hello team,
>
> I need help with how and what we can monitor in our Postgres database via
> Nagios. I came to know about the check_postgres.pl script, but we are using
> the freeware option of Postgres. If that is OK with freeware, then please let
> me know the steps for how I can implement it in our environment.
>
> Regards,
> Daulat
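For what it's worth, check_postgres.pl works fine against the community (open-source) PostgreSQL; its checks are essentially ordinary SQL against the statistics views. As a sketch of the kind of queries such a check runs (the 5-minute threshold is illustrative, not taken from check_postgres itself):

    -- Connection usage: how many backends are in use versus max_connections.
    SELECT count(*)                                AS current_connections,
           current_setting('max_connections')::int AS max_connections
      FROM pg_stat_activity;

    -- Long-running queries, e.g. warn on anything running longer than 5 minutes.
    SELECT pid, now() - query_start AS runtime, state, left(query, 60) AS query
      FROM pg_stat_activity
     WHERE state <> 'idle'
       AND now() - query_start > interval '5 minutes';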
