Re: Rapid disk usage spikes when updating large tables with GIN indexes

2018-05-21 Thread Jonathan Marks
I believe that we have figured it out. It was indeed a WAL issue: the WAL wasn’t getting measured because it had been moved into an archived folder. We resolved this in two main ways: 1. By dramatically increasing max_wal_size to decrease the frequency of checkpoints 2. By turning on wal_compression
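
For readers who want to check the same two settings on their own server, a minimal sketch follows. It assumes PostgreSQL 10, where the compression parameter is spelled `wal_compression` and where `max_wal_size` mainly influences how often checkpoints are forced by WAL volume. On RDS these values would normally be changed through the DB parameter group rather than by editing postgresql.conf; the queries below only read the current state.

```sql
-- Current values of the WAL-related settings discussed in this message.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_wal_size', 'wal_compression', 'checkpoint_timeout');

-- Checkpoints started by the timer versus those requested early
-- (for example because WAL volume approached max_wal_size).
SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;
```

A high `checkpoints_req` count relative to `checkpoints_timed` would be one hint that `max_wal_size` is too small for the write load, though that interpretation is a general rule of thumb and not something stated in the thread.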

Re: Rapid disk usage spikes when updating large tables with GIN indexes

2018-05-16 Thread Nikolay Samokhvalov
Why not set up a spot EC2 instance with Postgres 10.1, load the database from a dump (yes, you’ll need to create one from RDS because they don’t provide direct access to dumps/backups; you probably only need specific tables) and repeat your actions, watching the filesystem closely. Wed, 16 May 2018

Re: Rapid disk usage spikes when updating large tables with GIN indexes

2018-05-16 Thread Tom Lane
Jonathan Marks writes: > We turned on log_temp_files and since the last stats reset (about a week ago) > we’re seeing 0 temp files altogether (grabbing that info from > pg_stat_database). Hm. > Another thread we found suggested pg_subtrans — this seems less likely > because we’ve been able to

Re: Rapid disk usage spikes when updating large tables with GIN indexes

2018-05-16 Thread Jonathan Marks
Hi Tom — We turned on log_temp_files and since the last stats reset (about a week ago) we’re seeing 0 temp files altogether (grabbing that info from pg_stat_database). So, as far as we know: 1) It’s not WAL 2) It’s not tempfiles 3) It’s not the size of the error logs 4) It’s not the size of the
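
The temp-file check described here can be reproduced with a query along the following lines. This is a sketch of one way to read the counters from `pg_stat_database`, not necessarily the exact query the poster ran.

```sql
-- Cumulative temp-file activity per database since the last statistics reset.
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_size,
       stats_reset
FROM pg_stat_database
ORDER BY temp_bytes DESC;
```

A `temp_files` value of 0 for every database since `stats_reset` matches the observation reported in the message.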

Re: Rapid disk usage spikes when updating large tables with GIN indexes

2018-05-14 Thread Jonathan Marks
We’ll turn on log_temp_files and get back to you to see if that’s the cause. Re: the exact queries — these are just normal INSERTs and UPDATEs. This occurs as part of normal database operations — i.e., we are processing 10% of a table and marking changes to a particular row, or happen to be inse

Re: Rapid disk usage spikes when updating large tables with GIN indexes

2018-05-14 Thread Tom Lane
[ please keep the list cc'd ] Jonathan Marks writes: > Thanks for your quick reply. Here’s a bit more information: > 1) to measure the “size of the database” we run something like `select > datname, pg_size_pretty(pg_database_size(datname)) from pg_database;` I’m not > sure if this includes WAL
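
On the question of whether that measurement includes WAL: `pg_database_size()` reports only the files belonging to each database, so the WAL directory is not counted. A minimal sketch for looking at both numbers side by side, assuming PostgreSQL 10 or later (where `pg_ls_waldir()` exists) and a role allowed to call it (by default superusers and members of `pg_monitor`; whether the role in use on RDS qualifies is an assumption to verify):

```sql
-- Per-database size, as in the query quoted above.
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS db_size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;

-- Number and total size of files currently in the WAL directory,
-- which the per-database figures do not include.
SELECT count(*) AS wal_files,
       pg_size_pretty(sum(size)) AS wal_size
FROM pg_ls_waldir();
```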

Re: Rapid disk usage spikes when updating large tables with GIN indexes

2018-05-14 Thread Tom Lane
Jonathan Marks writes: > One recurring, and predictable, issue that we have experienced regularly for > multiple years is that inserting or updating rows in any table with GIN > indexes results in extremely large drops in free disk space — i.e. inserting > 10k rows with a total size of 10GB can

Rapid disk usage spikes when updating large tables with GIN indexes

2018-05-14 Thread Jonathan Marks
Hello! We have a mid-sized database on RDS running 10.1 (32 cores, 240 GB RAM, 5TB total disk space, 20k PIOPS) with several large (100GB+, tens of millions of rows) tables that use GIN indexes for full-text search. We at times need to index very large (hundreds of pages) documents and as a res
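
To see how much space the GIN indexes themselves occupy before and after a large batch of writes, a catalog query such as the following can help. It is an illustrative sketch and not part of the original post.

```sql
-- All GIN indexes in the current database, largest first.
SELECT c.relname                               AS index_name,
       i.indrelid::regclass                    AS table_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_class c
JOIN pg_am am   ON am.oid = c.relam
JOIN pg_index i ON i.indexrelid = c.oid
WHERE am.amname = 'gin'
ORDER BY pg_relation_size(c.oid) DESC;
```

Comparing this output across a run of inserts or updates shows whether the drop in free space is accounted for by index growth or by something outside the database files.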