very high replay_lag on 3-node cluster

2019-07-19 Thread Tiemen Ruiten
Hello,

In my previous post[1] on this list I brought up an issue with long-running
checkpoints. I reduced checkpoint_timeout to a more reasonable value (15m,
down from 60m) and forced/immediate checkpoints now mostly complete in under
a minute. That thread and another one[2] on the Clusterlabs mailing list also
helped me understand more about how PostgreSQL internals work; thanks,
everyone!

Now to my current issue: I took the advice to add more monitoring on replay
lag (using pg_last_xact_replay_timestamp) and things are not looking good.
Last night replication lagged by almost 6 hours on one of the nodes[3], but
eventually caught up. As you can see in that screenshot, ph-sql-03 is
consistently slower to replay than ph-sql-05 (ph-sql-04 is the current
master), and ph-sql-03 has different SSDs (Crucial MX300 vs Crucial MX500 in
the other two nodes), which makes me think this is IO-related.

When I check the replay_lag column of pg_stat_replication, the numbers are
consistent with the data from pg_last_xact_replay_timestamp:

postgres=# SELECT application_name, replay_lag FROM pg_stat_replication;
 application_name |   replay_lag
------------------+------------------
 ph-sql-03        | 00:15:16.179952
 ph-sql-05        | 00:10:01.078163
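
For reference, the check behind that graph is run on each standby and looks
roughly like this (a sketch of what the monitoring executes via psql; the
CASE is there so a fully caught-up standby doesn't report ever-growing lag
while the master is idle):

# Standby-side lag check (assumes local psql access as the postgres user);
# prints the replay lag in seconds.
psql -U postgres -Atc "
  SELECT CASE
           WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
           ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
         END AS replay_lag_seconds;"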

Currently this doesn't present an operational issue, as the slaves aren't
used by applications (still waiting for development to make the necessary
changes). So there are no queries running at all on the slaves apart from
the occasional monitoring.

Cluster specifications:
all nodes:
- filesystem: ZFS stripe of mirrors
- 2* CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores total)
- 128 GB RAM
ph-sql-03: 8* Crucial MX300 1050GB, underprovisioned to 1TB
ph-sql-0{4,5}: 8* Crucial MX500 1TB

postgresql.conf with GUCs changed from default:

shared_buffers = 8GB
work_mem = 64MB
maintenance_work_mem = 2GB
autovacuum_work_mem = 1GB
effective_io_concurrency = 200
max_worker_processes = 50
max_parallel_maintenance_workers = 8
max_parallel_workers_per_gather = 8
max_parallel_workers = 40
wal_level = replica
synchronous_commit = off
full_page_writes = off
wal_log_hints = on
wal_buffers = 128MB
checkpoint_timeout = 15min
max_wal_size = 8GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pgbackrest --stanza=pgdb2 archive-push %p'
max_wal_senders = 10
wal_keep_segments = 20
hot_standby = on
hot_standby_feedback = on
random_page_cost = 1.5
effective_cache_size = 48GB
default_statistics_target = 500
shared_preload_libraries = 'timescaledb, pg_cron, pg_prewarm'   # (change requires restart)
max_locks_per_transaction = 512

What are possible reasons for the high replay_lag? Is my storage just too
slow? Are there any tunables available?

[1]
https://www.postgresql.org/message-id/flat/CAEkBuzeno6ztiM1g4WdzKRJFgL8b2nfePNU%3Dq3sBiEZUm-D-sQ%40mail.gmail.com

[2] https://lists.clusterlabs.org/pipermail/users/2019-July/025967.html
[3] https://ibb.co/0sncjBZ

Thank you,

Tiemen Ruiten


Re: very high replay_lag on 3-node cluster

2019-07-22 Thread Tiemen Ruiten
Anyone have an idea? Thanks very much in advance for any reply.


Re: very high replay_lag on 3-node cluster

2019-07-22 Thread Tiemen Ruiten
On Mon, Jul 22, 2019 at 11:28 AM Jehan-Guillaume (ioguix) de Rorthais <
iog...@free.fr> wrote:

> Hi,
>
> On Mon, 22 Jul 2019 11:05:57 +0200
> Tiemen Ruiten  wrote:
> [...]
> > > Now to my current issue: I took the advice to add more monitoring on
> > > replay lag (using pg_last_xact_replay_timestamp) and things are not
> looking
> > > good. Last night replication lagged by almost 6 hours on one of the
> > > nodes[3], but eventually caught up. As you can see in that screenshot,
> > > ph-sql-03 is consistently slower to replay than ph-sql-05 (ph-sql-04
> is the
> > > current master) and there happen to be different SSD's in ph-sql-03
> > > (Crucial MX300 vs Crucial MX500 in the other two), which makes me think
> > > this is IO related.
>
> Such a difference is quite surprising. Moreover, I suppose you have some
> caching in front of the disks (either RAID or SAN?). Could you describe
> your disk stack in more detail?
>
> Do you have any detailed metrics about disks and network IO to share?
>
> The network is the same for both nodes?
>

Hi!

All nodes have only local storage. The filesystem is ZFS, which has its own
built-in caching (ARC/L2ARC and the ZIL); there are no hardware caching
devices present. The disks are connected through a Supermicro AOC-S3008L-L8E
HBA (shown by lspci as LSI Logic / Symbios Logic SAS3008 PCI-Express
Fusion-MPT SAS-3 (rev 02)). This is the layout of the pool:

[root@ph-sql-04 ter]# zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:07:51 with 0 errors on Tue Jul 16 03:17:52 2019
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0

I've attached a graph of network IO on all servers. The network config is
identical for all three nodes: 2x bonded gigabit connection to the same
stacked switch pair.

Currently I don't have many metrics on raw disk IO; I can add some monitoring
items and report those tomorrow if that would be useful. I do have a lot of
metrics available related to PostgreSQL (buffers written/read, checkpoints,
rows fetched/updated/inserted, etc.).
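
Something along these lines is what I had in mind to collect on each node (a
rough sketch; the sample intervals and counts are arbitrary):

# Per-vdev bandwidth and IOPS as seen by ZFS.
zpool iostat -v tank 5 3
# Per-device latency and utilisation from sysstat.
iostat -dxm 5 3

I'll hook the interesting columns into the monitoring system.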


Re: self-made certs not quite right

2021-03-03 Thread Tiemen Ruiten
Hello Rob,

With root.crt in ~/.postgresql, testing the jdbc connection from a
> tomcat server generates this failure (again either localhost or
> $(hostname))
> Blow out on db connection to jdbc:postgresql://localhost:5432/postgres;
> SSL error: PKIX path building failed:
> sun.security.provider.certpath.SunCertPathBuilderException: unable to
> find valid certification path to requested target
> org.postgresql.util.PSQLException: SSL error: PKIX path building failed:
> sun.security.provider.certpath.SunCertPathBuilderException: unable to
> find valid certification path to requested target
>

Java has its own certificate keystore; you would need to add your CA
certificate to it: https://tomcat.apache.org/tomcat-8.5-doc/ssl-howto.html
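
For example (a sketch; the alias is arbitrary, and the cacerts path and the
default 'changeit' password depend on your Java installation), importing the
CA certificate into the JVM truststore looks something like this:

# Import the PostgreSQL root CA into the JVM-wide truststore used by Tomcat.
keytool -importcert -trustcacerts -alias pg-root-ca \
  -file /path/to/root.crt \
  -keystore "$JAVA_HOME/lib/security/cacerts" \
  -storepass changeit

Alternatively, the JDBC driver can be pointed at the certificate directly via
the sslrootcert connection parameter.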

Hope this helps.

-- 
Tiemen Ruiten
Infrastructure Engineer


checkpoints taking much longer than expected

2019-06-14 Thread Tiemen Ruiten
Hello,

I set up a new 3-node cluster with the following specifications:

2x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores total)
128 GB RAM
8x Crucial MX500 1TB SSDs

The filesystem is ZFS; the dataset holding the PGDATA directory has the
following properties (only non-defaults listed):

NAME        PROPERTY       VALUE   SOURCE
tank/pgsql  used           234G    -
tank/pgsql  available      3.28T   -
tank/pgsql  referenced     234G    -
tank/pgsql  compressratio  2.68x   -
tank/pgsql  compression    lz4     inherited from tank
tank/pgsql  atime          off     inherited from tank
tank/pgsql  canmount       on      local
tank/pgsql  xattr          sa      inherited from tank
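
For reference, that listing is roughly what zfs get shows once default-valued
properties are filtered out (the grep is a crude filter):

zfs get -o name,property,value,source all tank/pgsql | grep -v default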

My postgresql.conf (only changed from default listed):

hba_file = '/var/lib/pgsql/pg_hba.conf'
listen_addresses = '*'
max_connections = 800
shared_buffers = 8GB
work_mem = 64MB
maintenance_work_mem = 2GB
autovacuum_work_mem = 1GB
dynamic_shared_memory_type = posix
effective_io_concurrency = 200
max_worker_processes = 50
max_parallel_maintenance_workers = 8
max_parallel_workers_per_gather = 8
max_parallel_workers = 40
wal_level = replica
synchronous_commit = off
full_page_writes = on
wal_log_hints = on
wal_buffers = 128MB
checkpoint_timeout = 60min
max_wal_size = 8GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pgbackrest --stanza=pgdb2 archive-push %p'
max_wal_senders = 10
wal_keep_segments = 20
hot_standby = on
hot_standby_feedback = on
random_page_cost = 1.5
effective_cache_size = 48GB
default_statistics_target = 500 # range 1-10000
idle_in_transaction_session_timeout = 30min # in milliseconds, 0 is disabled
shared_preload_libraries = 'timescaledb, pg_cron'
max_locks_per_transaction = 512
timescaledb.max_background_workers = 8

My problem is that checkpoints are taking a long time. Even when I run a
few manual checkpoints one after the other, they keep taking very long, up
to 10 minutes:

2019-06-14 15:21:10.351 CEST [23657] LOG:  checkpoint starting: immediate force wait
2019-06-14 15:25:57.655 CEST [23657] LOG:  checkpoint complete: wrote 139831 buffers (13.3%);
  0 WAL file(s) added, 148 removed, 40 recycled; write=284.751 s, sync=0.532 s, total=287.304 s;
  sync files=537, longest=0.010 s, average=0.000 s; distance=2745065 kB, estimate=2841407 kB
2019-06-14 15:26:01.988 CEST [23657] LOG:  checkpoint starting: immediate force wait
2019-06-14 15:30:30.430 CEST [23657] LOG:  checkpoint complete: wrote 238322 buffers (22.7%);
  0 WAL file(s) added, 0 removed, 172 recycled; write=264.794 s, sync=0.415 s, total=268.441 s;
  sync files=378, longest=0.011 s, average=0.001 s; distance=2834018 kB, estimate=2840668 kB
2019-06-14 15:30:44.097 CEST [23657] LOG:  checkpoint starting: immediate force wait
2019-06-14 15:37:01.438 CEST [23657] LOG:  checkpoint complete: wrote 132286 buffers (12.6%);
  0 WAL file(s) added, 54 removed, 96 recycled; write=366.614 s, sync=2.975 s, total=377.341 s;
  sync files=467, longest=0.095 s, average=0.006 s; distance=2444291 kB, estimate=2801030 kB

What is going on? This doesn't seem like normal behaviour.
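
For completeness, a quick way to watch the aggregate checkpoint timings
outside the log is something like this (a sketch; column names as of
PostgreSQL 11):

# Cumulative checkpoint counts and write/sync time (in ms) since stats reset.
psql -U postgres -c "
  SELECT checkpoints_timed, checkpoints_req,
         checkpoint_write_time, checkpoint_sync_time,
         buffers_checkpoint, buffers_backend_fsync
    FROM pg_stat_bgwriter;"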


Re: checkpoints taking much longer than expected

2019-06-15 Thread Tiemen Ruiten
On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost  wrote:

> Greetings,
>
> * Tiemen Ruiten (t.rui...@tech-lab.io) wrote:
> > checkpoint_timeout = 60min
>
> That seems like a pretty long timeout.
>

My reasoning was that a longer recovery time would be acceptable in exchange
for fewer checkpoint writes, because there are two more nodes in the cluster
to fall back on in case of emergency.


>
> > My problem is that checkpoints are taking a long time. Even when I run a
> > few manual checkpoints one after the other, they keep taking very long,
> up
> > to 10 minutes:
>
> You haven't said *why* this is an issue...  Why are you concerned with
> how long it takes to do a checkpoint?
>

During normal operation I don't mind that it takes a long time, but when
performing maintenance I want to be able to bring down the master gracefully,
without a long delay before promoting one of the standbys.
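
The kind of switchover I have in mind is roughly this (a sketch; the data
directory path is an example): force a manual CHECKPOINT first so the
shutdown checkpoint has little left to write, then stop the master and
promote a standby.

# On the master: pre-flush dirty buffers, then shut down cleanly.
psql -U postgres -c "CHECKPOINT;"
pg_ctl -D /var/lib/pgsql/11/data stop -m fast
# On the chosen standby:
pg_ctl -D /var/lib/pgsql/11/data promote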


>
> The time information is all there and it tells you what it's doing and
> how much had to be done... If you're unhappy with how long it takes to
> write out gigabytes of data and fsync hundreds of files, talk to your
> storage people...
>

I am the storage people too :)


>
> Thanks,
>
> Stephen
>


Re: checkpoints taking much longer than expected

2019-06-16 Thread Tiemen Ruiten
On Sun, Jun 16, 2019 at 8:57 PM Alvaro Herrera 
wrote:

>
> Note that Joyent ended up proposing patches to fix their performance
> problem (and got them committed).  Maybe it would be useful for Tiemen
> to try that code?  (That commit cherry-picks cleanly on REL_11_STABLE.)
>

Interesting! The performance improvements in that thread make it look worth
trying. Is this patch likely to make it into an 11.x release?


Re: checkpoints taking much longer than expected

2019-06-16 Thread Tiemen Ruiten
On Sun, Jun 16, 2019 at 7:30 PM Stephen Frost  wrote:

> Ok, so you want fewer checkpoints because you expect to failover to a
> replica rather than recover the primary on a failure.  If you're doing
> synchronous replication, then that certainly makes sense.  If you
> aren't, then you're deciding that you're alright with losing some number
> of writes by failing over rather than recovering the primary, which can
> also be acceptable but it's certainly much more questionable.
>

Yes, in our setup that's the case: a few lost transactions will have a
negligible impact on the business.


> I'm getting the feeling that your replicas are async, but it sounds like
> you'd be better off with having at least one sync replica, so that you
> can flip to it quickly.


They are indeed async; we traded durability for performance here, because we
can accept some lost transactions.
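
For context, if we ever do flip to the setup you describe, my understanding
is that it would look roughly like this (a sketch; the standby names are the
application_name values from pg_stat_replication):

psql -U postgres <<'SQL'
-- Require confirmation from any one of the two standbys before commit returns.
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 ("ph-sql-03", "ph-sql-05")';
ALTER SYSTEM SET synchronous_commit = 'on';
SELECT pg_reload_conf();
SQL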


> Alternatively, having a way to more easily make
> the primary to accepting new writes, flush everything to the replicas,
> report that it's completed doing so, to allow you to promote a replica
> without losing anything, and *then* go through the process on the
> primary of doing a checkpoint, would be kind of nice.
>

I suppose that would require being able to demote a master to a slave at
runtime. That would definitely be nice to have.


>
>
> Thanks,
>
> Stephen
>


Re: checkpoints taking much longer than expected

2019-06-17 Thread Tiemen Ruiten
On Sun, Jun 16, 2019 at 8:57 PM Alvaro Herrera 
wrote:

> On 2019-Jun-14, Peter J. Holzer wrote:
>
> > There was a discussion about ZFS' COW behaviour and PostgreSQL reusing
> > WAL files not being a good combination about a year ago:
> >
> https://www.postgresql.org/message-id/flat/CACukRjO7DJvub8e2AijOayj8BfKK3XXBTwu3KKARiTr67M3E3w%40mail.gmail.com
> >
> > Maybe you have the same problem?
>
> Note that Joyent ended up proposing patches to fix their performance
> problem (and got them committed).  Maybe it would be useful for Tiemen
> to try that code?  (That commit cherry-picks cleanly on REL_11_STABLE.)
>

Unfortunately it doesn't compile:

access/transam/xlog.o: In function `XLogFileInit':
/home/ter/rpmbuild/BUILD/postgresql-11.3/src/backend/access/transam/xlog.c:3299:
undefined reference to `pg_pwrite'
collect2: error: ld returned 1 exit status


>
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>


Re: Please help: pgAdmin 4 on Amazon Linux 2

2021-08-28 Thread Tiemen Ruiten
Hi,

Amazon Linux is not compatible with RHEL the way CentOS is, and it's not a
supported platform for the RPM installation:
https://www.pgadmin.org/download/pgadmin-4-rpm/

The repo configuration probably uses some variables to build the download
URLs, and because the versions don't match, it results in a 404. You may have
some luck downloading and installing the RPM directly, but you'll likely want
to look at the other installation methods on the page I linked if you want to
install pgAdmin on Amazon Linux.
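
For example, one route that sidesteps the repo mapping entirely is the
container image (a sketch; the host port, e-mail and password below are
placeholders):

# Run pgAdmin 4 from the official container image and expose it on port 8080.
docker run -d --name pgadmin4 -p 8080:80 \
  -e PGADMIN_DEFAULT_EMAIL=admin@example.com \
  -e PGADMIN_DEFAULT_PASSWORD=changeme \
  dpage/pgadmin4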

On Fri, Aug 27, 2021 at 5:34 PM Blake McBride  wrote:

> Greetings,
>
> I am trying to install pgAdmin 4 on Amazon Linux 2.  PostgreSQL is already
> installed and working fine.  I believe Amazon Linux 2 is based on RedHat.
>
> I am doing the following:
>
> [root@a-1lxumlkkw4mu4 ~]# rpm -i
> https://ftp.postgresql.org/pub/pgadmin/pgadmin4/yum/pgadmin4-redhat-repo-2-1.noarch.rpm
> warning: /var/tmp/rpm-tmp.ZEygli: Header V3 RSA/SHA256 Signature, key ID
> 210976f2: NOKEY
> [root@a-1lxumlkkw4mu4 ~]# yum install pgadmin4
> Loaded plugins: amzn_workspaces_filter_updates, halt_os_update_check,
> priorities, update-motd
> amzn2-core                                   | 3.7 kB  00:00:00
> amzn2extra-GraphicsMagick1.3                 | 3.0 kB  00:00:00
> amzn2extra-docker                            | 3.0 kB  00:00:00
> amzn2extra-epel                              | 3.0 kB  00:00:00
> amzn2extra-gimp                              | 1.3 kB  00:00:00
> amzn2extra-libreoffice                       | 3.0 kB  00:00:00
> amzn2extra-mate-desktop1.x                   | 3.0 kB  00:00:00
> epel/x86_64/metalink                         |  15 kB  00:00:00
> firefox                                      | 2.2 kB  00:00:00
> google-chrome                                | 1.3 kB  00:00:00
>
> https://ftp.postgresql.org/pub/pgadmin/pgadmin4/yum/redhat/rhel-2-x86_64/repodata/repomd.xml:
> [Errno 14] HTTPS Error 404 - Not Found
> Trying other mirror.
>
>
>  One of the configured repositories failed (pgadmin4),
>  and yum doesn't have enough cached data to continue. At this point the
> only
>  safe thing yum can do is fail. There are a few ways to work "fix" this:
>
>  1. Contact the upstream for the repository and get them to fix the
> problem.
>
>  2. Reconfigure the baseurl/etc. for the repository, to point to a
> working
> upstream. This is most often useful if you are using a newer
> distribution release than is supported by the repository (and the
> packages for the previous distribution release still work).
>
>  3. Run the command with the repository temporarily disabled
> yum --disablerepo=pgAdmin4 ...
>
>  4. Disable the repository permanently, so yum won't use it by
> default. Yum
> will then just ignore the repository until you permanently enable
> it
> again or use --enablerepo for temporary usage:
>
> yum-config-manager --disable pgAdmin4
> or
> subscription-manager repos --disable=pgAdmin4
>
>  5. Configure the failing repository to be skipped, if it is
> unavailable.
> Note that yum will try to contact the repo. when it runs most
> commands,
> so will have to try and fail each time (and thus. yum will be be
> much
> slower). If it is a very temporary problem though, this is often a
> nice
> compromise:
>
> yum-config-manager --save
> --setopt=pgAdmin4.skip_if_unavailable=true
>
> failure: repodata/repomd.xml from pgAdmin4: [Errno 256] No more mirrors to
> try.
>
> https://ftp.postgresql.org/pub/pgadmin/pgadmin4/yum/redhat/rhel-2-x86_64/repodata/repomd.xml:
> [Errno 14] HTTPS Error 404 - Not Found
> [root@a-1lxumlkkw4mu4 ~]#
>
>
> I have no idea how to fix this.  Any help would sure be appreciated.
>
> Blake McBride
>
>


-- 
Tiemen Ruiten
Infrastructure Engineer