Re: checkpoints taking much longer than expected

2019-06-17 Thread Andres Freund
On 2019-06-16 12:25:58 -0400, Jeff Janes wrote: > Right, but true only because they were "checkpoint starting: immediate". > Otherwise the reported write time includes intentional sleeps added to > honor the checkpoint_completion_target. A bit confusing to report it that > way, I think. +1 It's
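Jeff Janes's point about intentional sleeps can be put in rough numbers. The sketch below is only an approximation, not PostgreSQL's actual scheduling code: it assumes a checkpoint triggered by the timeout and estimates the window over which its writes are deliberately spread. The 60min timeout comes from the original post; the 0.5 completion target is an assumption (the PostgreSQL 11 default), since the thread excerpt does not show the configured value, and the function name is hypothetical.

```python
def paced_write_seconds(checkpoint_timeout_s: float, completion_target: float) -> float:
    """Approximate window over which a timed checkpoint spreads its writes."""
    if not 0.0 < completion_target <= 1.0:
        raise ValueError("completion_target must be in (0, 1]")
    # The checkpointer sleeps between writes so that it finishes at roughly
    # this fraction of the checkpoint interval.
    return checkpoint_timeout_s * completion_target


timeout_s = 60 * 60  # checkpoint_timeout = 60min, as quoted in the thread
target = 0.5         # ASSUMPTION: not shown in the thread excerpt

print(paced_write_seconds(timeout_s, target))
```

Under these assumptions a reported write time of about half an hour would mostly reflect pacing rather than slow storage, which is why the figure is only informative for "immediate" checkpoints.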

Re: checkpoints taking much longer than expected

2019-06-17 Thread Tiemen Ruiten
On Sun, Jun 16, 2019 at 8:57 PM Alvaro Herrera wrote: > On 2019-Jun-14, Peter J. Holzer wrote: > > > There was a discussion about ZFS' COW behaviour and PostgreSQL reusing > > WAL files not being a good combination about a year ago: > > > https://www.postgresql.org/message-id/flat/CACukRjO7DJvub8

Re: checkpoints taking much longer than expected

2019-06-16 Thread Stephen Frost
Greetings, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > On 2019-Jun-16, Stephen Frost wrote: > > > The issue being discussed here is writing out to the heap files during a > > checkpoint... > > We don't really know, as it was already established that the log line is > misattributing time

Re: checkpoints taking much longer than expected

2019-06-16 Thread Alvaro Herrera
On 2019-Jun-16, Stephen Frost wrote: > The issue being discussed here is writing out to the heap files during a > checkpoint... We don't really know, as it was already established that the log line is misattributing time spent ... -- Álvaro Herrera https://www.2ndQuadrant.com/ Po

Re: checkpoints taking much longer than expected

2019-06-16 Thread Stephen Frost
Greetings, * Tiemen Ruiten (t.rui...@tech-lab.io) wrote: > On Sun, Jun 16, 2019 at 7:30 PM Stephen Frost wrote: > > Ok, so you want fewer checkpoints because you expect to failover to a > > replica rather than recover the primary on a failure. If you're doing > > synchronous replication, then th

Re: checkpoints taking much longer than expected

2019-06-16 Thread Stephen Frost
Greetings, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > On 2019-Jun-16, Stephen Frost wrote: > > Not likely to help with what you're experiencing anyway though... > > My gut feeling is that you're wrong, since (as I understand) the > symptoms are the same. The issue in the linked-to thre

Re: checkpoints taking much longer than expected

2019-06-16 Thread Tiemen Ruiten
On Sun, Jun 16, 2019 at 7:30 PM Stephen Frost wrote: > Ok, so you want fewer checkpoints because you expect to failover to a > replica rather than recover the primary on a failure. If you're doing > synchronous replication, then that certainly makes sense. If you > aren't, then you're deciding

Re: checkpoints taking much longer than expected

2019-06-16 Thread Alvaro Herrera
On 2019-Jun-16, Stephen Frost wrote: > Not likely to help with what you're experiencing anyway though... My gut feeling is that you're wrong, since (as I understand) the symptoms are the same. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Re

Re: checkpoints taking much longer than expected

2019-06-16 Thread Stephen Frost
Greetings, * Tiemen Ruiten (t.rui...@tech-lab.io) wrote: > On Sun, Jun 16, 2019 at 8:57 PM Alvaro Herrera > wrote: > > Note that Joyent ended up proposing patches to fix their performance > > problem (and got them committed). Maybe it would be useful for Tiemen > > to try that code? (That commi

Re: checkpoints taking much longer than expected

2019-06-16 Thread Tiemen Ruiten
On Sun, Jun 16, 2019 at 8:57 PM Alvaro Herrera wrote: > > Note that Joyent ended up proposing patches to fix their performance > problem (and got them committed). Maybe it would be useful for Tiemen > to try that code? (That commit cherry-picks cleanly on REL_11_STABLE.) > Interesting! The per

Re: checkpoints taking much longer than expected

2019-06-16 Thread Alvaro Herrera
On 2019-Jun-14, Peter J. Holzer wrote: > There was a discussion about ZFS' COW behaviour and PostgreSQL reusing > WAL files not being a good combination about a year ago: > https://www.postgresql.org/message-id/flat/CACukRjO7DJvub8e2AijOayj8BfKK3XXBTwu3KKARiTr67M3E3w%40mail.gmail.com > > Maybe yo

Re: checkpoints taking much longer than expected

2019-06-16 Thread Stephen Frost
Greetings, * Jeff Janes (jeff.ja...@gmail.com) wrote: > On Sat, Jun 15, 2019 at 4:50 AM Tiemen Ruiten wrote: > > On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost wrote: > >> The time information is all there and it tells you what it's doing and > >> how much had to be done... If you're unhappy with

Re: checkpoints taking much longer than expected

2019-06-16 Thread Stephen Frost
Greetings, * Tiemen Ruiten (t.rui...@tech-lab.io) wrote: > On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost wrote: > > * Tiemen Ruiten (t.rui...@tech-lab.io) wrote: > > > checkpoint_timeout = 60min > > > > That seems like a pretty long timeout. > > My reasoning was that a longer recovery time to av

Re: checkpoints taking much longer than expected

2019-06-16 Thread Jeff Janes
On Sat, Jun 15, 2019 at 4:50 AM Tiemen Ruiten wrote: > > On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost wrote: > >> >> The time information is all there and it tells you what it's doing and >> how much had to be done... If you're unhappy with how long it takes to >> write out gigabytes of data an

Re: checkpoints taking much longer than expected

2019-06-16 Thread Michael Loftis
On Fri, Jun 14, 2019 at 08:02 Tiemen Ruiten wrote: > Hello, > > I setup a new 3-node cluster with the following specifications: > > 2x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2*20 cores) > 128 GB RAM > 8x Crucial MX500 1TB SSD's > > FS is ZFS, the dataset with the PGDATA directory on it has th

Re: checkpoints taking much longer than expected

2019-06-15 Thread Peter Geoghegan
On Sat, Jun 15, 2019 at 1:50 AM Tiemen Ruiten wrote: > During normal operation I don't mind that it takes a long time, but when > performing maintenance I want to be able to gracefully bring down the master > without long delays to promote one of the standby's. Maybe an "immediate" mode shutdow

Re: checkpoints taking much longer than expected

2019-06-15 Thread Tiemen Ruiten
On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost wrote: > Greetings, > > * Tiemen Ruiten (t.rui...@tech-lab.io) wrote: > > checkpoint_timeout = 60min > > That seems like a pretty long timeout. > My reasoning was that a longer recovery time to avoid writes would be acceptable because there are two m

Re: checkpoints taking much longer than expected

2019-06-14 Thread Peter J. Holzer
On 2019-06-14 16:01:40 +0200, Tiemen Ruiten wrote: > FS is ZFS, the dataset with the PGDATA directory on it has the following > properties (only non-default listed): [...] > My problem is that checkpoints are taking a long time. Even when I run a few > manual checkpoints one after the other, they k

Re: checkpoints taking much longer than expected

2019-06-14 Thread Stephen Frost
Greetings, * Tiemen Ruiten (t.rui...@tech-lab.io) wrote: > checkpoint_timeout = 60min That seems like a pretty long timeout. > My problem is that checkpoints are taking a long time. Even when I run a > few manual checkpoints one after the other, they keep taking very long, up > to 10 minutes: Y

Re: checkpoints taking much longer than expected

2019-06-14 Thread Stephen Frost
Greetings, * Ravi Krishna (ravikris...@mail.com) wrote: > On 6/14/19 10:01 AM, Tiemen Ruiten wrote: > >LOG:  checkpoint starting: immediate force wait > > Does it mean that the DB is blocked until the completion of checkpoint. > Years ago > Informix use to have this issue until they fixed around

Re: checkpoints taking much longer than expected

2019-06-14 Thread Ravi Krishna
On 6/14/19 10:01 AM, Tiemen Ruiten wrote: > LOG:  checkpoint starting: immediate force wait Does it mean that the DB is blocked until the completion of the checkpoint? Years ago Informix used to have this issue, until they fixed it around 2006.
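The quoted log line lists the flags the checkpoint was started with. A minimal sketch of pulling those flags out of such a line follows; the regular expression is modelled only on the single line quoted above, and the helper names `checkpoint_flags` and `is_paced` are hypothetical. The "paced" test rests on Jeff Janes's remark elsewhere in the thread that only non-immediate checkpoints include the intentional sleeps.

```python
import re
from typing import List

# Matches the "checkpoint starting" line as quoted in the thread.
CHECKPOINT_START = re.compile(r"LOG:\s+checkpoint starting:\s*(?P<flags>.*)$")


def checkpoint_flags(line: str) -> List[str]:
    """Return the flags from a 'checkpoint starting' log line, or [] if absent."""
    match = CHECKPOINT_START.search(line)
    if match is None:
        return []
    return match.group("flags").split()


def is_paced(flags: List[str]) -> bool:
    """True when the checkpoint was not requested as 'immediate'."""
    return "immediate" not in flags


flags = checkpoint_flags("LOG:  checkpoint starting: immediate force wait")
print(flags, is_paced(flags))
```

Filtering a server log this way separates the manually triggered immediate checkpoints from the timed ones, whose reported write time is not directly comparable.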

checkpoints taking much longer than expected

2019-06-14 Thread Tiemen Ruiten
Hello, I set up a new 3-node cluster with the following specifications: 2x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2*20 cores) 128 GB RAM 8x Crucial MX500 1TB SSDs FS is ZFS, the dataset with the PGDATA directory on it has the following properties (only non-default listed): NAMEPROPE