Postgres 10, slave not catching up with master

2018-10-21 Thread Boris Sagadin
Hello,

I have a database running on an i3.8xlarge AWS instance (256GB RAM, 32 CPU
cores, 4x 1.9TB NVMe drives) with about 5TB of disk space occupied, ext4,
Ubuntu 16.04.

Multi-tenant DB with about 40k tables, insert-heavy.

I started a new slave with identical HW specs, using streaming replication.
The initial sync from the master took about 4 hours, then the slave started
applying WAL. However, it seems it can't catch up: the delay is still around
3 hours (measured with now() - pg_last_xact_replay_timestamp()), even a day
later. It goes up and down by a few hundred seconds, but floats around the
3-hour mark.
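
For reference, the lag can also be sampled in bytes on the standby, which
helps separate "not receiving WAL" from "not replaying it fast enough"; a
minimal sketch using the Postgres 10 function names:

-- time since the last replayed commit (the 3-hour figure above)
SELECT now() - pg_last_xact_replay_timestamp() AS replay_time_lag;

-- WAL received by the walreceiver but not yet replayed; if this number
-- keeps growing, the single startup/recovery process is the bottleneck
SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                       pg_last_wal_replay_lsn()) AS replay_byte_lag;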

Disk IO is low, at about 10% utilization measured with iostat, there are no
connected clients, and the recovery process sits at around 90% of a single
CPU core.

I tried tuning various parameters, but to no avail. The only thing I found
suspicious is that stracing the recovery process constantly shows many
errors such as:

lseek(428, 0, SEEK_END) = 780124160
lseek(30, 0, SEEK_END)  = 212992
read(9, 0x7ffe4001f557, 1)  = -1 EAGAIN (Resource temporarily
unavailable)
lseek(680, 0, SEEK_END) = 493117440
read(9, 0x7ffe4001f557, 1)  = -1 EAGAIN (Resource temporarily
unavailable)
lseek(774, 0, SEEK_END) = 583368704

...[snip]...

read(9, 0x7ffe4001f557, 1)  = -1 EAGAIN (Resource temporarily
unavailable)
lseek(774, 0, SEEK_END) = 583368704
read(9, 0x7ffe4001f557, 1)  = -1 EAGAIN (Resource temporarily
unavailable)
lseek(277, 0, SEEK_END) = 502882304
lseek(6, 516096, SEEK_SET)  = 516096
read(6,
"\227\320\5\0\1\0\0\0\0\340\7\246\26\274\0\0\315\0\0\0\0\0\0\0}\0178\5&/\260\r"...,
8192) = 8192
read(9, 0x7ffe4001f557, 1)  = -1 EAGAIN (Resource temporarily
unavailable)
lseek(735, 0, SEEK_END) = 272809984
read(9, 0x7ffe4001f557, 1)  = -1 EAGAIN (Resource temporarily
unavailable)
lseek(277, 0, SEEK_END) = 502882304

ls -l fd/9
lr-x------ 1 postgres postgres 64 Oct 21 06:21 fd/9 -> pipe:[46358]


Perf top on recovery produces:

 27.76%  postgres            [.] pglz_decompress
  9.90%  [kernel]            [k] entry_SYSCALL_64_after_swapgs
  7.09%  postgres            [.] hash_search_with_hash_value
  4.26%  libpthread-2.23.so  [.] llseek
  3.64%  libpthread-2.23.so  [.] __read_nocancel
  2.80%  [kernel]            [k] __fget_light
  2.67%  postgres            [.] 0x0034d3ba
  1.85%  [kernel]            [k] ext4_llseek
  1.84%  postgres            [.] pg_comp_crc32c_sse42
  1.44%  postgres            [.] hash_any
  1.35%  postgres            [.] 0x0036afad
  1.29%  postgres            [.] MarkBufferDirty
  1.21%  postgres            [.] XLogReadRecord
[...]

I tried raising the process limits to unlimited with prlimit, but nothing changed.

I can turn off WAL compression, but I doubt it is the main culprit.
Any ideas appreciated.
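
(For completeness, wal_compression only needs a reload, not a restart; a
minimal sketch, run on the master with superuser rights:

SHOW wal_compression;                    -- 'on' would explain pglz_decompress in the profile
ALTER SYSTEM SET wal_compression = off;  -- compression happens when WAL is generated on the master
SELECT pg_reload_conf();

Note this only affects newly generated WAL, so the standby keeps
decompressing the existing backlog until replay passes the point of the
change.)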

Regards,
Boris


Re: Postgres 10, slave not catching up with master

2018-10-22 Thread Boris Sagadin
Yes, turning wal_compression off improves things. The slave mentioned earlier
unfortunately lagged too far behind before this setting was applied and was
shut down. However, the remaining slave lags less now, although still
occasionally up to a few minutes. I think the single-threadedness of recovery
is a big slowdown for write-heavy databases. Maybe the v11 option to increase
the WAL segment size beyond 16MB will help.
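
(For reference, in v11 the segment size can only be chosen when the cluster
is initialized; a sketch, with an illustrative size and data directory:

# PostgreSQL 11+ only: segment size in MB, must be a power of 2 (default 16)
initdb --wal-segsize=64 -D /var/lib/postgresql/11/main
)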

In the meantime we'll work around this by splitting the DB into 2 or 3
clusters, or maybe by trying out a sharding solution like Citus.


Boris

On Sun, Oct 21, 2018 at 9:06 AM, Boris Sagadin  wrote:

> ...[snip]...


Re: Postgres 10, slave not catching up with master

2018-10-23 Thread Boris Sagadin
Nothing special, just:

standby_mode = 'on'
primary_conninfo = 'host=...  user=repmgr application_name=nodex'
recovery_target_timeline = 'latest'


Boris

On Tue, Oct 23, 2018 at 3:10 PM, Hellmuth Vargas  wrote:

> Hi
>
> Can you share your recovery.conf file settings?
>
> On Tue, Oct 23, 2018 at 00:28, Boris Sagadin (bo...@infosplet.com) wrote:
>
>> ...[snip]...
>
> --
> Regards,
>
> Ing. Hellmuth I. Vargas S.
> Esp. Telemática y Negocios por Internet
> Oracle Database 10g Administrator Certified Associate
> EnterpriseDB Certified PostgreSQL 9.3 Associate
>
>


Re: Postgres 10, slave not catching up with master

2018-10-23 Thread Boris Sagadin
Yes, times are all identical, set to UTC, ntpd is used.

 log_delay
-----------
 15.788175

This is the delay at this moment, but we graph replication delay and it
fluctuates between 0 and 30s. Before I turned off wal_compression, the lag
was much bigger (from 0 up to 8 minutes). We have lots of tables (40k) and
many upserts.


Boris

On Tue, Oct 23, 2018 at 8:24 PM, Hellmuth Vargas  wrote:

> Hi
>
> Are both servers configured with the same date, time, and time zone
> settings?
>
> On Tue, Oct 23, 2018 at 13:16, Hellmuth Vargas (hiv...@gmail.com) wrote:
>
>> Hi
>>
>> What result do you get from the following query?
>>
>> SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
>> THEN 0
>> ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
>> END AS log_delay;
>>
>> source:
>>
>> https://severalnines.com/blog/postgresql-streaming-replication-deep-dive
>>
>> ...[snip]...

Re: Postgres 10, slave not catching up with master

2018-10-24 Thread Boris Sagadin
Yes, as stated, the lag went down significantly after disabling
wal_compression; it's at least manageable now. The network is 10Gbit.
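
For what it's worth, pg_stat_replication on the master already separates
network lag from replay lag; a quick check (Postgres 10 column names) would
be:

-- run on the master; small write_lag/flush_lag combined with a large
-- replay_lag points at replay on the standby rather than at the network
SELECT application_name, state, write_lag, flush_lag, replay_lag
  FROM pg_stat_replication;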

Best regards,

*Boris Sagadin*
InfoSplet, informacijske tehnologije, d.o.o.
*www.infosplet.com* <http://www.infosplet.com/> | Tel: 0590 / 45 800, GSM:
041 / 337 848

On Wed, Oct 24, 2018 at 3:34 PM, Hellmuth Vargas  wrote:

> Hi
> On Wed, Oct 24, 2018 at 00:39, Boris Sagadin (bo...@infosplet.com) wrote:
>
>> Yes, times are all identical, set to UTC, ntpd is used.
>>
>>  log_delay
>> ---
>>  15.788175
>>
>> This is delay at this moment, but we graph replication delay and it's
>> fluctuating between 0 and 30s.
>>
>
>
> But the fluctuation is between 0 and 30s, not 4 hours, fortunately.
> Apart from the wal_compression issue, I think you should check the network.
>
>
>
>> Before I turned off wal compression, lag was much bigger (0 to up to 8
>> minutes). We have lots of tables (40k) and many upserts.
>>
>>
>> Boris
>>
>> ...[snip]...

Logical replication very slow

2019-02-24 Thread Boris Sagadin
Doing an initial replica and trying to find the bottleneck: Ubuntu 16.04,
NVMe disks, PgSQL v10.7, AWS. With binary replication, the DB is replicated
at a good speed, around 500MB/s. Trying logical replication (LR) now for a
big table (about 1.4TB with 2 indexes), and the speed is only about 2MB/s.
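
The setup in question is roughly of this shape (the single-table publication
and all names are illustrative):

-- on the publisher (table and publication names are illustrative)
CREATE PUBLICATION big_table_pub FOR TABLE big_table;

-- on the subscriber; the slow initial COPY of the 1.4TB table starts here
CREATE SUBSCRIPTION big_table_sub
    CONNECTION 'host=master dbname=prod user=repl'
    PUBLICATION big_table_pub;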

I checked disk utilization with iostat: only about 20% on the master and 15%
on the target, and CPU load on the master is low. On the slave I can see the
"logical replication worker" process taking about 70% CPU on a single core;
the machine has 16 cores.

Is there a setting I am missing here? Any ideas appreciated.
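
For reference, the progress of the initial copy can be watched on the
subscriber; a sketch using the v10 catalogs:

-- per-table sync state: 'i' = initializing, 'd' = copying data,
-- 's' = synchronized, 'r' = ready (streaming)
SELECT srrelid::regclass AS table_name, srsubstate
  FROM pg_subscription_rel;

-- per-worker activity, including received and reported LSNs
SELECT * FROM pg_stat_subscription;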

Boris


Re: Logical replication very slow

2019-02-25 Thread Boris Sagadin
Doing an initial replica.

postgres 119454 93.5 25.9 34613692 32649656 ?   Rs   07:16  32:45  \_
postgres: 10/main: bgworker: logical replication worker for subscription
24783 sync 16500

I've cancelled the sync, set the tables to unlogged, and started it again. I
think it helped; still much slower than binary replication, but better, about
40MB/s. I'll set them back to logged after the initial replica is done.
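
(The unlogged trick, with an illustrative table name; note that switching
back rewrites and WAL-logs the whole table:

-- on the subscriber only (table name illustrative): an unlogged table skips
-- WAL during the COPY phase, but it is truncated after a crash and cannot
-- be used on a physical standby
ALTER TABLE big_table SET UNLOGGED;

-- after the initial sync has finished
ALTER TABLE big_table SET LOGGED;
)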

After the initial replica, there aren't that many updates, so it's OK from
then on. But if the need for a new slave arises, waiting a few days for the
initial replica to finish, when binary replication takes just hours, can be a
big problem for us.

Boris


On Mon, Feb 25, 2019 at 8:08 AM Achilleas Mantzios <
ach...@matrix.gatewaynet.com> wrote:

> On 25/2/19 8:52 a.m., Boris Sagadin wrote:
>
> Doing an initial replica and trying to find a bottleneck, Ubuntu 16.04,
> NVMe disks, PgSQL v10.7, AWS. With binary replication, DB is replicated at
> good speed, around 500MB/s. Trying LR now for a big table (about 1.4TB with
> 2 indexes) and the speed is only about 2MB/s.
>
> Is the logical replication subscriber in "streaming" state or still doing
> the initial snapshot? What's the behavior after the initial snapshot, when
> it gets into streaming state?
>
> Checked disk util with iostat and only about 20% utilized on master, 15%
> on target, CPU load on master is low. On slave I can see the "logical
> replication worker" process is taking about 70% CPU time on a single core,
> machine has 16 cores.
>
> Is there a setting I am missing here? Any ideas appreciated.
>
> Boris
>
>
>
>
>
> --
> Achilleas Mantzios
> IT DEV Lead
> IT DEPT
> Dynacom Tankers Mgmt
>
>


Re: Logical replication very slow

2019-02-25 Thread Boris Sagadin
I think it should. I set it to unlogged on the target/slave server only. One
other table, which is much smaller and already replicated, keeps receiving
changes from the master.

About setting copy_data to false: nice idea, I'll try that too and compare
the speed.
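
A sketch of that route, with illustrative names, assuming the table is first
loaded some other way (e.g. pg_dump/pg_restore):

-- subscription that skips the initial table copy entirely (names illustrative)
CREATE SUBSCRIPTION big_table_sub
    CONNECTION 'host=master dbname=prod user=repl'
    PUBLICATION big_table_pub
    WITH (copy_data = false);

-- or, for a table newly added to an existing subscription
ALTER SUBSCRIPTION big_table_sub REFRESH PUBLICATION WITH (copy_data = false);

The caveat is that rows changed between the dump snapshot and the moment the
subscription starts streaming are not replayed, so the cutover has to be
coordinated.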



On Mon, Feb 25, 2019 at 9:51 AM Achilleas Mantzios <
ach...@matrix.gatewaynet.com> wrote:

> On 25/2/19 9:59 a.m., Boris Sagadin wrote:
>
>
> Doing an initial replica.
>
> postgres 119454 93.5 25.9 34613692 32649656 ?   Rs   07:16  32:45  \_
> postgres: 10/main: bgworker: logical replication worker for subscription
> 24783 sync 16500
>
> I've cancelled the sync, set the tables to unlogged type and started it
> again. I think it helped, still much slower than binary, but better, about
> 40MB/s. Will set them back to logged type after initial replica is done.
>
> Will this work at all? Doesn't unlogged mean no changes are written to the
> WAL? What if you just copy the data with a dump and then add the table and
> refresh the subscription without copy_data?
>
>
> After the initial replica, there aren't that many updates, so it's OK
> then. But if a need for a new slave arises, waiting a few days for initial
> replica to finish, which in binary replication case is just hours, can be a
> big problem for us.
>
> Boris
>
>
> ...[snip]...
>
> --
> Achilleas Mantzios
> IT DEV Lead
> IT DEPT
> Dynacom Tankers Mgmt
>
>


Slave server sometimes locks up

2019-03-05 Thread Boris Sagadin
PgSQL 10.7, Ubuntu 16.04 LTS

Symptoms:

- server accepts new queries until connections exhausted (all queries are
SELECT)
- queries are active, never end, but no disk IO
- queries can't be killed with kill -TERM or pg_terminate_backend()
- system load is minimal (vmstat shows 100% idle)
- perf top shows nothing
- statement_timeout is ignored
- no locks with SELECT relation::regclass, * FROM pg_locks WHERE NOT
GRANTED;
- server exits only on kill -9
- strace on SELECT process indefinitely shows:

futex(0x7f00fe94c938, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, NULL,
^Cstrace: Process 121319 detached
 

GDB backtrace:

(gdb) bt
#0  0x7f05256f1827 in futex_abstimed_wait_cancelable (private=128,
abstime=0x0, expected=0, futex_word=0x7f00fe94ba38) at
../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x7f00fe94ba38, abstime=0x0) at
sem_waitcommon.c:111
#2  0x7f05256f18d4 in __new_sem_wait_slow (sem=0x7f00fe94ba38,
abstime=0x0) at sem_waitcommon.c:181
#3  0x7f05256f197a in __new_sem_wait (sem=sem@entry=0x7f00fe94ba38) at
sem_wait.c:29
#4  0x55c9b95eb792 in PGSemaphoreLock (sema=0x7f00fe94ba38) at
pg_sema.c:316
#5  0x55c9b965eaec in LWLockAcquire (lock=0x7f00fe96f880,
mode=mode@entry=LW_EXCLUSIVE) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/storage/lmgr/lwlock.c:1233
#6  0x55c9b96497f7 in dsm_create (size=size@entry=105544,
flags=flags@entry=1) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/storage/ipc/dsm.c:493
#7  0x55c9b94139ff in InitializeParallelDSM
(pcxt=pcxt@entry=0x55c9bb8d9d58)
at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/access/transam/parallel.c:268
#8  0x55c9b9538187 in ExecInitParallelPlan (planstate=0x55c9bb949178,
estate=0x55c9bb948568, nworkers=1) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/execParallel.c:470
#9  0x55c9b9547d20 in ExecGather (pstate=0x55c9bb948e58) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/nodeGather.c:158
#10 0x55c9b95543d5 in ExecProcNode (node=0x55c9bb948e58) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/include/executor/executor.h:250
#11 ExecNestLoop (pstate=0x55c9bb948c68) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/nodeNestloop.c:109
#12 0x55c9b9556b56 in ExecProcNode (node=0x55c9bb948c68) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/include/executor/executor.h:250
#13 ExecSort (pstate=0x55c9bb9489f8) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/nodeSort.c:106
#14 0x55c9b9556e57 in ExecProcNode (node=0x55c9bb9489f8) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/include/executor/executor.h:250
#15 ExecUnique (pstate=0x55c9bb948808) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/nodeUnique.c:73
#16 0x55c9b9533fcb in ExecProcNode (node=0x55c9bb948808) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/include/executor/executor.h:250
#17 ExecutePlan (execute_once=<optimized out>, dest=0x55c9bb8a9328,
direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>,
operation=CMD_SELECT, use_parallel_mode=<optimized out>,
planstate=0x55c9bb948808, estate=0x55c9bb948568)
at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/execMain.c:1723
#18 standard_ExecutorRun (queryDesc=0x55c9bb98ba78, direction=<optimized out>, count=0, execute_once=<optimized out>) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/execMain.c:364
#19 0x55c9b9670d6b in PortalRunSelect (portal=portal@entry=0x55c9bb8f1d68,
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807,
dest=dest@entry=0x55c9bb8a9328)
at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/tcop/pquery.c:932
#20 0x55c9b9672380 in PortalRun (portal=portal@entry=0x55c9bb8f1d68,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1
'\001', run_once=<optimized out>, dest=dest@entry=0x55c9bb8a9328,
altdest=altdest@entry=0x55c9bb8a9328,
completionTag=0x7ffdde0387c0 "") at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/tcop/pquery.c:773
#21 0x55c9b966eef9 in exec_execute_message
(max_rows=9223372036854775807, portal_name=0x55c9bb8a8f18 "") at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/tcop/postgres.c:2007
#22 PostgresMain (argc=<optimized out>, argv=argv@entry=0x55c9bb845328,
dbname=0x55c9bb7ec9e8 "prod", username=<optimized out>) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/tcop/postgres.c:4180
#23 0x55c9b93a46b9 in BackendRun (port=0x55c9bb840ce0) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/postmaster/postmaster.c:4405
#24 BackendStartup (port=0x55c9bb840ce0) at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/postmaster/postmaster.c:4077
#25 ServerLoop () at
/build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/ba

Re: Slave server sometimes locks up

2019-03-05 Thread Boris Sagadin
Hi Thomas,

thank you for your quick reply, much appreciated.

1. "dsa_area could not attach to segment": dsm.c, fixed in commit 6c0fb941.
> 2. "cannot unpin a segment that is not pinned": dsm.c, fixed in commit
> 0b55aaac.
>
>
Yes, I found both entries in our logs, each once per incident.


> The fix will be included in the next set of releases, but in the
> meantime you could consider turning off parallel query (set
> max_parallel_workers_per_gather = 0).  In practice I think you could
> also avoid this problem by loading a library that calls something like
> srandom(getpid()) in _PG_init() (so it runs in every parallel worker
> making ID collisions extremely unlikely), but that's not really a
> serious recommendation since it requires writing C code.
>
>
This is very reassuring, as I was slowly starting to doubt my DBA competence
here. :)
I'll try this workaround until the next release is out, and will report if
anything out of the ordinary re-occurs.
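
For the record, the workaround can be applied cluster-wide without a restart;
a minimal sketch:

ALTER SYSTEM SET max_parallel_workers_per_gather = 0;  -- disables parallel query plans
SELECT pg_reload_conf();                               -- no restart required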

Boris


pg_restore mostly idle on restoring a large number of tables

2023-07-13 Thread Boris Sagadin
Hi,

restoring a 1.5TB database with about 800k tables on an i3.4xlarge AWS
instance, PgSQL v12.15 on Ubuntu.

Running pg_restore with -j 16, I noticed that pg_restore is busy for an hour
or so with IO at 80%+, and then most of the processes start idling with only
a few doing any work: disk IO drops to 1-2%, pg_stat_activity is mostly idle,
the same goes for CPU, and this state continues for a further 6 hours while
disk space grows very slowly.

I thought that, because of the large number of small tables, the number of
workers should be increased to improve parallel efficiency, so I tried
-j 128. The situation was somewhat better, but most of the workers still end
up idling: disk IO drops to about 4% utilization and CPU goes to about 4%,
too.
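
For reference, a quick way to see what the mostly idle backends are waiting
on is the wait-event columns; a sketch (v12 column names), run from a
separate session while the restore is stalled:

SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
  FROM pg_stat_activity
 WHERE backend_type = 'client backend';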

Stracing the workers shows most pg_restore workers sitting in a perpetual
read call:

# strace -p 59567
strace: Process 59567 attached
read(3,

With only about 10 or so (out of 128) workers doing some actual work:

strace -p 59367 -e sendto
strace: Process 59367 attached
sendto(4, "Q\0\0\0\263ALTER TABLE ONLY raw.space0"..., 180, MSG_NOSIGNAL,
NULL, 0) = 180
sendto(4, "Q\0\0\0.TRUNCATE TABLE ONLY raw.spa"..., 47, MSG_NOSIGNAL, NULL,
0) = 47
sendto(4, "Q\0\0\0\205COPY raw.space000448117 (da"..., 134, MSG_NOSIGNAL,
NULL, 0) = sendto(4, "Q\0\0\0\vCOMMIT\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
sendto(4, "Q\0\0\0pCREATE INDEX idx_space00044"..., 113, MSG_NOSIGNAL,
NULL, 0) = 113
sendto(4, "Q\0\0\0\263ALTER TABLE ONLY raw.space0"..., 180, MSG_NOSIGNAL,
NULL, 0) = 180
...
.. some lines omitted

I would have thought that all workers would keep creating indexes and doing
useful work until the restore is completely finished?

Most of the tables are very small, with 2 indexes per table and no foreign
key references etc.; we have a multi-tenant environment.

Thanks,
Boris