Re: pg12 - partition by column that might have null values

2019-10-03 Thread Mariel Cherkassky
Not sure how data storage is relevant here; I was only focusing on query
optimization. Let's say that most of the data isn't moving (history data).
However, objects can be changed, so new revisions are added and the
previous revisions are updated (their end_date is updated). If you very
commonly run queries that involve end_date (in order to get the most
recent revision of objects), it would be better to make this column a
partition key instead of just having an index on it. That way, getting all
the recent revisions of a specific object costs roughly log(m) [m is the
number of most recent revisions] instead of log(n) [n is the total number
of revisions], and n is far bigger than m. Correct me if I'm wrong, this
topic is quite interesting.
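
For illustration, a rough sketch of that layout with list partitioning
(names are hypothetical; a list partition can hold NULL, which a range
partition cannot):

CREATE TABLE revisions (
    object_id bigint NOT NULL,
    end_date  date              -- NULL marks the current revision here
) PARTITION BY LIST (end_date);

-- small partition that only holds the open/current revisions
CREATE TABLE revisions_open PARTITION OF revisions
    FOR VALUES IN (NULL);

-- everything else (closed revisions) falls into the default partition
CREATE TABLE revisions_closed PARTITION OF revisions DEFAULT;

-- A query like the following only has to scan the small revisions_open
-- partition:
--   SELECT * FROM revisions WHERE object_id = 42 AND end_date IS NULL;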


Re: pg12 - partition by column that might have null values

2019-10-03 Thread Michael Lewis
Just create a partial index on the id column where end_date = infinity (or
null if you really prefer that pattern) and the system can quickly find the
rows that are still the most current revision. How many rows do you have in
this table? Or foresee ever having? What took you down the road of
partitioning the table? Theory only, or solving a real-life optimization
problem?
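
For example, a minimal sketch (table and column names are hypothetical):

CREATE INDEX revisions_current_idx
    ON revisions (object_id)
    WHERE end_date = 'infinity';

-- or, with the NULL convention:
CREATE INDEX revisions_current_null_idx
    ON revisions (object_id)
    WHERE end_date IS NULL;

-- Either index stays tiny relative to the table, and the planner can use
-- it for lookups like:
--   SELECT * FROM revisions WHERE object_id = 42 AND end_date = 'infinity';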


Some observations on very slow pg_restore operations

2019-10-03 Thread Ogden Brash
I recently performed a pg_dump (data-only) of a relatively large database
where we store intermediate results of calculations. It is approximately 3
TB on disk and has about 20 billion rows.

We do the dump/restore about once a month and as the dataset has grown, the
restores have gotten very slow. So, this time I decided to do it a
different way and have some observations that puzzle me.

Background:

The data is extremely simple. The rows consist only of numbers and are all
fixed length. There are no foreign keys, constraints, null values, or
default values. There are no strings or arrays. There are 66 tables and the
number of rows in each table forms a Gaussian distribution; so there are 3
tables which have about 3 billion rows each and the rest of the tables have
significantly fewer rows.

I used the directory format when doing the pg_dump. The compressed data of
the dump is 550 GB.
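
For reference, a directory-format, data-only dump of that kind can be taken
with something like the following (the path, database name, and job count
are illustrative, not the exact invocation used here):

pg_dump --format=directory --data-only --jobs=8 \
        --file=/path/to/dumpdir mydatabase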

I am using: PostgreSQL 11.5 (Ubuntu 11.5-1.pgdg18.04+1)

The machine that I attempted to do a pg_restore to is a dedicated server
just for one instance of postgresql. It has 32 GB of memory and is running
Ubuntu 18.04 (headless). It is physical hardware, not virtualized. Nothing
else runs on the machine, and the postgresql.conf settings have been tuned
(to the best of my postgresql abilities, which are suspect). While the
operating system is installed on an SSD, there is one extra large, fast HDD
that is dedicated to the postgresql server. It has been in use for this
particular purpose for a while and has not had performance issues (just
with pg_restore).

Autovacuum is off and all indexes have been deleted before the restore is
started. There is nothing in the db except for the empty data tables.

Restoring from the local drive:

In the past we have always restored in a way where the dumped data is read
over a gigabit connection while being restored to the local drive. But the
last time we did it, it took 2 days and I was looking for something faster.
So, I decided to copy the dumped directory to the local drive and restore
from the dump locally. I knew that because the machine only had one drive
that would fit the data, there would be some I/O contention, but I hoped
that it might not be as bad as reading over the network.
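
The restore itself is then just pg_restore pointed at that directory, along
the lines of (again, names and job count are illustrative, not the exact
command used here):

pg_restore --dbname=mydatabase --jobs=8 /path/to/dumpdir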

The pg_restore went unbearably slowly... after many hours it had written
less than 20GB to the database, so I started tracking it with iostat to see
what was going on. The following is iostat output every 60 seconds. I
tracked it for several hours and this is representative of what was
happening consistently.


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.39    0.00    0.40   43.10    0.00   56.11

Device       tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
loop0       0.00         0.00         0.00          0          0
loop1       0.00         0.00         0.00          0          0
loop2       0.00         0.00         0.00          0          0
sda       263.33       132.87      2990.93       7972     179456
sdb         0.17         0.00         0.73          0         44

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.34    0.00    0.41   44.43    0.00   54.82

Device       tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
loop0       0.00         0.00         0.00          0          0
loop1       0.00         0.00         0.00          0          0
loop2       0.00         0.00         0.00          0          0
sda       262.95       140.47      2983.00       8428     178980
sdb         0.08         0.00         0.40          0         24

While I was tracking this I started experimenting with the IO scheduler to
see if it had a noticeable impact. I had been using cfq (ubuntu 18.04
default). Changing to deadline did not make a noticeable difference.
Changing to noop made things much slower. I went back to cfq. I also
experimented with turning fsync off; that did speed things up a bit but not
enough for me to leave it off.
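
(For reference, the scheduler can be checked and switched per device via
sysfs; sda is the data drive in the iostat output above:)

cat /sys/block/sda/queue/scheduler
echo deadline > /sys/block/sda/queue/scheduler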

What puzzled me is that the OS was spending such a large percentage of time
in iowait, yet there was so little IO going on.

So, I decided to go back to restoring over the net. While the slow
pg_restore was still going on, and while I was still tracking iostat, I
copied the 550 GB dump to an NFS drive. The copy happened pretty much at
full speed (the limit being the gigabit ethernet) and, interestingly, it
did not slow down the kB_wrtn and kB_wrtn/s numbers in iostat (which were
the postgresql server continuing with the restore). To me that seemed to
indicate that it was not really a disk I/O limitation.

Restoring over the net:

After copying the dump files to an NFS drive, I stopped the restore,
truncated the tables and started exactly the same command, but this time
taking its input from the NFS drive. I did not reboot the machine or
restart the postgresql server. I tracked iostat every 60 seconds and this
is what it looks like:

avg-cpu:  %user   %nice %system %iowait  %

Re: Some observations on very slow pg_restore operations

2019-10-03 Thread MichaelDBA

Hi Ogden,

You didn't mention any details about your postgresql.conf settings.  Why 
don't you set them optimally for your loads and try again and see if 
there is any difference.  Make sure you do a DB restart since some of 
these parameters require it.


=====================================================================
parameter              recommendation
---------------------  ---------------------------------------------
shared_buffers         Reduce this value to about 25% of total memory
temp_buffers           Decrease this value to 8MB since we are not using
                       temporary tables or doing intermediate sorts
work_mem               Reduce significantly (1MB) since we are not doing
                       memory sorts or hashes per SQL
maintenance_work_mem   Increase significantly for DDL bulk loading, restore
                       operations
fsync                  off (so that time is not spent waiting for stuff to
                       be written to disk). Note: you may not be able to
                       recover your database after a crash when set to off.
checkpoint_segments    Increase this significantly for DML bulk loading,
                       restore operations
max_wal_size           Increase significantly, like you would
                       checkpoint_segments
min_wal_size           Increase significantly, like you would
                       checkpoint_segments
checkpoint_timeout     Increase to at least 30min
archive_mode           off
autovacuum             off
synchronous_commit     off
wal_level              minimal
max_wal_senders        0
full_page_writes       off during DML bulk loading, restore operations
wal_buffers            16MB during DML bulk loading, restore operations
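
As a rough sketch (values are illustrative only, not tuned for this
particular system), those overrides might look something like this in
postgresql.conf for the duration of the restore. Note that
checkpoint_segments no longer exists in PostgreSQL 11 -- max_wal_size and
min_wal_size replace it there -- and with fsync = off the database may be
unrecoverable after a crash:

maintenance_work_mem = 2GB
checkpoint_timeout = 30min
max_wal_size = 16GB
min_wal_size = 4GB
wal_buffers = 16MB
wal_level = minimal          # requires restart
max_wal_senders = 0          # required for wal_level = minimal; restart
archive_mode = off
autovacuum = off
synchronous_commit = off
full_page_writes = off       # only during the restore
fsync = off                  # unsafe; only if the data can be reloaded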


Regards,
Michael Vitale


Re: Some observations on very slow pg_restore operations

2019-10-03 Thread Ogden Brash
Thanks Michael,

I am sure that there is some optimization to be found in the config (and
below are all the non-default values in the file). I suspect that they
won't explain the difference between restoring from NFS vs. the local
drive, but they could certainly speed up my best case. The dataset is so
huge and the schema so simple that I can't imagine the various buffer
sizes making too much difference - nothing is really going to fit into
memory anyway.

This instance of the DB is currently used in read-only mode by only two
client processes running on other machines. So it is tuned for a small
number of users and primarily simple queries.

root@tb-db:/etc/postgresql/11/main# grep '^[[:blank:]]*[^[:blank:]#;]'
postgresql.conf

data_directory = '/var/lib/postgresql/11/main'
hba_file = '/etc/postgresql/11/main/pg_hba.conf'
ident_file = '/etc/postgresql/11/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/11-main.pid'
listen_addresses = '*'
port = 5432
max_connections = 25
unix_socket_directories = '/var/run/postgresql' # comma-separated list of
directories
ssl = on
ssl_cert_file = '/etc/ssl/certs/ssl-cert-snakeoil.pem'
ssl_key_file = '/etc/ssl/private/ssl-cert-snakeoil.key'
shared_buffers = 8GB  # min 128kB
work_mem = 1GB  # min 64kB
maintenance_work_mem = 4GB
dynamic_shared_memory_type = posix
effective_io_concurrency = 2
max_worker_processes = 4
max_parallel_workers = 4
fsync = on
synchronous_commit = off
wal_buffers = 16MB
wal_writer_delay = 2000ms
wal_writer_flush_after = 100MB
max_wal_size = 8GB
min_wal_size = 4GB
random_page_cost = 4.0
effective_cache_size = 24GB
default_statistics_target = 500
log_line_prefix = '%m [%p] %q%u@%d '
log_timezone = 'localtime'
cluster_name = '11/main'
stats_temp_directory = '/var/run/postgresql/11-main.pg_stat_tmp'
autovacuum = off
datestyle = 'iso, mdy'
timezone = 'localtime'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
include_dir = 'conf.d'


Re: Some observations on very slow pg_restore operations

2019-10-03 Thread Justin Pryzby
What is the state of KSM/THP?  Did you try disabling them?  I've seen that
these can cause high iowait (although that was in a virtual environment).
It would be interesting to see vmstat output.

tail /sys/kernel/mm/ksm/run \
     /sys/kernel/mm/transparent_hugepage/khugepaged/defrag \
     /sys/kernel/mm/transparent_hugepage/enabled \
     /sys/kernel/mm/transparent_hugepage/defrag
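
For a quick test, both can be switched off at runtime (as root; not
persistent across reboots):

echo 0 > /sys/kernel/mm/ksm/run
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag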
 
https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com

I realize the postgres params don't seem related to the difference between
local/remote/NFS, but did you see all of these?
https://www.postgresql.org/docs/current/populate.html

Justin