How to prevent POSTGRES from killing the Linux system by accepting too many inserts?
Hello,

I encountered this kernel message, and I cannot log into the Linux system anymore:

    Dec 17 23:01:50 hq-pg kernel: sh (6563): drop_caches: 1
    Dec 17 23:02:30 hq-pg kernel: INFO: task sync:6573 blocked for more than 120 seconds.
    Dec 17 23:02:30 hq-pg kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Dec 17 23:02:30 hq-pg kernel: sync    D 965ebabd1040    0  6573   6572 0x0080
    Dec 17 23:02:30 hq-pg kernel: Call Trace:
    Dec 17 23:02:30 hq-pg kernel: [] ? generic_write_sync+0x70/0x70

After some googling I guess the problem is that IO speed is low while the insert requests are coming in too quickly, so PG puts these into the cache first and then the kernel calls sync.

I know I can queue the requests, so that POSTGRES will not accept requests that would increase the system cache.

But is there any way I can tell POSTGRES that it can only handle, say, 2 records per second or 4M per second, and not accept inserts faster than that?

For me, POSTGRES just waiting is much better than the current behavior.

Any help will be much appreciated.

Thanks,
James
Re: shared memory size during upgrade pgsql with partitions (max_locks_per_transaction)
> On Tue, Dec 17, 2019 at 08:03:41PM +, Piotr Włodarczyk wrote:
>> Currently we're working on PSQL 11.5 and we're trying to upgrade to 12.1.
>>
>> During that we have a problem:
>>
>> command: "/usr/pgsql-12/bin/pg_dump" --host /cluster/postgresql --port
50432
>> --username postgres --schema-only --quote-all-identifiers
--binary-upgrade
>> --format=custom --file="pg_upgrade_dump_281535902.custom"
'dbname=sprint'
>> >> "pg_upgrade_dump_281535902.log" 2> &1
>> pg_dump: error: query failed: ERROR: out of shared memory
>> HINT: You might need to increase max_locks_per_transaction.
>> pg_dump: error: query was: LOCK TABLE
>> "some_schemaa"."table_part_8000_2018q3" IN ACCESS SHARE MODE
>>
>> On the current instance we have about one thousand partitions, partitioned
>> on two levels: first by id_product, and second by quarter of the year, as
>> you can see in the log above.
>>
>> How should we calculate shared memory (and possibly
>> max_locks_per_transaction) to fit within the limits during the upgrade?
>
> Great question. Clearly, if you can run that (or similar) pg_dump
> command, then you can pg_upgrade. I think you could also do pg_upgrade --check,

pg_upgrade --check doesn't report any error or warning.
>
> The query looks like:
>
>     FROM pg_class c...
>     WHERE c.relkind in ('%c', '%c', '%c', '%c', '%c', '%c', '%c') "
>
> ..and then does:
>
>     if (tblinfo[i].dobj.dump &&
>         (tblinfo[i].relkind == RELKIND_RELATION ||
>          tblinfo[i].relkind == RELKIND_PARTITIONED_TABLE) &&
>         (tblinfo[i].dobj.dump & DUMP_COMPONENTS_REQUIRING_LOCK))
>     {
>         resetPQExpBuffer(query);
>         appendPQExpBuffer(query,
>                           "LOCK TABLE %s IN ACCESS SHARE MODE",
>                           fmtQualifiedDumpable(&tblinfo[i]));
>         ExecuteSqlStatement(fout, query->data);
>     }
>
> ..then filters by -N/-n/-t/-T (which doesn't apply to pg_upgrade):
>
>     selectDumpableTable(&tblinfo[i], fout);
>
> So it looks like COUNT(1) FROM pg_class WHERE relkind IN ('r','p') should
> do it.
>
> But during pg_upgrade, since nothing else is running, you actually have
> max_connections*max_locks_per_transaction total locks.
>
> Said differently, I think you could set max_locks_per_transaction to:
>
>     SELECT (SELECT COUNT(1) FROM pg_class WHERE relkind IN ('r','p'))
>            / current_setting('max_connections')::int;
>
> ..probably with a fudge factor of +10 for any system process (and due to
> integer truncation).
>
> Someone might say that pg_upgrade or pg_dump could check for that
> specifically..

Yes, and temporarily increase it, or HINT how to calculate the proper value.
>
> Justin
We realized that the problem is with what pg_dump does during pg_upgrade.
We're now past the upgrade, so we can't check your calculation. We simply
increased max_connections until the migration passed :)
I'll try to check it on an empty, fake database.
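For reference, Justin's calculation can be run as a single statement on the
old cluster before the upgrade (a sketch with the +10 fudge factor folded in;
treat the result as a starting point, not a guarantee):

    -- pg_dump takes one ACCESS SHARE lock per ordinary ('r') and
    -- partitioned ('p') table, and the lock table has room for
    -- max_connections * max_locks_per_transaction locks in total.
    SELECT (SELECT count(*) FROM pg_class WHERE relkind IN ('r', 'p'))
           / current_setting('max_connections')::int
           + 10 AS suggested_max_locks_per_transaction;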
Re: Consecutive Query Executions with Increasing Execution Time
On Tue, 2019-12-17 at 11:11 -0500, Jeff Janes wrote:
> On Tue, Dec 17, 2019 at 8:08 AM Laurenz Albe wrote:
> > On Mon, 2019-12-16 at 15:50 -0500, Tom Lane wrote:
> > > Peter Geoghegan writes:
> > > > Why do the first and the twentieth executions of the query have almost
> > > > identical "buffers shared/read" numbers? That seems odd.
> > >
> > > It's repeat execution of the same query, so that doesn't seem odd to me.
> >
> > Really? Shouldn't the blocks be in shared buffers after a couple
> > of executions?
>
> If it is doing a seq scan (I don't know if it is) they intentionally use a
> small ring buffer, so they evict their own recently used blocks, rather
> than evicting other people's blocks. So these blocks won't build up in
> shared_buffers very rapidly just on the basis of repeated seq scans.

Sure, but according to the execution plans it is doing a Parallel Index
Only Scan.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Re: Consecutive Query Executions with Increasing Execution Time
Laurenz Albe writes:
> On Tue, 2019-12-17 at 11:11 -0500, Jeff Janes wrote:
> > If it is doing a seq scan (I don't know if it is) they intentionally use a
> > small ring buffer, so they evict their own recently used blocks, rather
> > than evicting other people's blocks. So these blocks won't build up in
> > shared_buffers very rapidly just on the basis of repeated seq scans.
>
> Sure, but according to the execution plans it is doing a Parallel Index
> Only Scan.

Nonetheless, the presented test case consists of repeatedly doing the same
query, in a fresh session each time. If there's no other activity then this
should reach some sort of steady state. The table is apparently fairly
large, so I don't find it surprising that the steady state fails to be
100% cached.

regards, tom lane
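One way to check how much of the table actually stays cached between runs is
the pg_buffercache extension (a sketch; "mytable" is a placeholder for the
table from the test case, and the count is approximate since relfilenodes are
only unique per database):

    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    -- Count the table's blocks currently resident in shared_buffers.
    SELECT count(*) AS cached_blocks,
           pg_size_pretty(count(*) * current_setting('block_size')::bigint)
               AS cached_size
    FROM pg_buffercache
    WHERE relfilenode = pg_relation_filenode('mytable');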
Re: How to prevent POSTGRES from killing the Linux system by accepting too many inserts?
On Wed, Dec 18, 2019 at 3:53 AM James(王旭) wrote:
> But is there any way I can tell POSTGRES that it can only handle, say, 2
> records per second or 4M per second, and not accept inserts faster than
> that?
>
> For me, POSTGRES just waiting is much better than the current behavior.

This is more a problem with the o/s than with postgres itself.

synchronous_commit is one influential parameter that can possibly help
mitigate the issue, with some safety tradeoffs (read the docs). For Linux,
one possible place to look is tuning dirty_background_ratio and related
parameters. The idea is that you want the o/s to be more aggressive about
syncing, to reduce the impact of an i/o storm; basically you are trading off
some burst performance for consistency of performance. Another place to look
is checkpoint behavior. Do some searches; there is tons of information about
this on the net.

merlin
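For reference, a minimal sketch of the PostgreSQL-side knobs mentioned above
(the values are illustrative starting points, not recommendations; the
kernel-side vm.dirty_background_ratio and related parameters are set with
sysctl, not SQL):

    -- Loosen commit durability: commits return before their WAL is flushed.
    -- Safety tradeoff: an OS crash can lose recently reported commits,
    -- though it cannot corrupt the database.
    ALTER SYSTEM SET synchronous_commit = off;

    -- Spread checkpoint writes over a longer window so they arrive as a
    -- steady trickle instead of an I/O storm.
    ALTER SYSTEM SET checkpoint_timeout = '15min';
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;

    SELECT pg_reload_conf();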
Re: How to prevent POSTGRES from killing the Linux system by accepting too many inserts?
On Wed, Dec 18, 2019 at 4:53 AM James(王旭) wrote:
> After some googling I guess the problem is that IO speed is low while the
> insert requests are coming in too quickly, so PG puts these into the cache
> first and then the kernel calls sync.

Could you expand on what you found in the googling, with links? I've never
seen these in my kernel log, and I don't know what they mean other than the
obvious that it is something to do with IO. Also, what kernel and file
system are you using?

> But is there any way I can tell POSTGRES that it can only handle, say, 2
> records per second or 4M per second, and not accept inserts faster than
> that?
>
> For me, POSTGRES just waiting is much better than the current behavior.

I don't believe there is a setting from within PostgreSQL to do this. There
was a proposal for a throttle on WAL generation back in February, but with
no recent discussion or (visible) progress:

https://www.postgresql.org/message-id/flat/2B42AB02-03FC-406B-B92B-18DED2D8D491%40anarazel.de#b63131617e84d3a0ac29da956e6b8c5f

I think the real answer here is to get a better IO system, or maybe a better
kernel. Otherwise, once you find a painful workaround for one symptom you
will just smack into another one.

Cheers,

Jeff
