How to prevent POSTGRES killing linux system from accepting too much inserts?

2019-12-18 Thread James(王旭)
Hello, I encountered this kernel message, and I cannot log in to the Linux
system anymore:

Dec 17 23:01:50 hq-pg kernel: sh (6563): drop_caches: 1
Dec 17 23:02:30 hq-pg kernel: INFO: task sync:6573 blocked for more than 120 seconds.
Dec 17 23:02:30 hq-pg kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 17 23:02:30 hq-pg kernel: sync            D 965ebabd1040     0  6573   6572 0x0080
Dec 17 23:02:30 hq-pg kernel: Call Trace:
Dec 17 23:02:30 hq-pg kernel: [] ? generic_write_sync+0x70/0x70

Re: shared memory size during upgrade pgsql with partitions (max_locks_per_transaction)

2019-12-18 Thread Piotr Włodarczyk
>> On Tue, Dec 17, 2019 at 08:03:41PM +, Piotr Włodarczyk wrote:
>> Currently we're working on PSQL 11.5 and we're trying to upgrade to 12.1.
>> 
>> During that we have a problem:
>> 
>> command: "/usr/pgsql-12/bin/pg_dump" --host /cluster/postgresql --port 50432
>> --username postgres --schema-only --quote-all-identifiers --binary-upgrade
>> --format=custom --file="pg_upgrade_dump_281535902.custom" 'dbname=sprint'
>> >> "pg_upgrade_dump_281535902.log" 2>&1
>> pg_dump: error: query failed: ERROR:  out of shared memory
>> HINT:  You might need to increase max_locks_per_transaction.
>> pg_dump: error: query was: LOCK TABLE
>> "some_schemaa"."table_part_8000_2018q3" IN ACCESS SHARE MODE
>> 
>> On the current instance we have about one thousand partitions, partitioned on
>> two levels: first by id_product, and second by quarter of the year, as you
>> can see in the log above.
>> 
>> How should we calculate shared memory (and, if needed,
>> max_locks_per_transaction) so that we stay within the limits during the upgrade?
> 
> 
> 
> Great question.  Clearly, if you can run that (or similar) pg_dump
command,
> then you can pg_upgrade.  I think you could also do pg_upgrade --check,

pg_upgrade --check didn't report any error or warning.

> 
> 
> 
> The query looks like:
> 
>     FROM pg_class c...
>     WHERE c.relkind IN ('%c', '%c', '%c', '%c', '%c', '%c', '%c')
> 
> ..and then does:
> 
>     if (tblinfo[i].dobj.dump &&
>         (tblinfo[i].relkind == RELKIND_RELATION ||
>          tblinfo[i].relkind == RELKIND_PARTITIONED_TABLE) &&
>         (tblinfo[i].dobj.dump & DUMP_COMPONENTS_REQUIRING_LOCK))
>     {
>         resetPQExpBuffer(query);
>         appendPQExpBuffer(query,
>                           "LOCK TABLE %s IN ACCESS SHARE MODE",
>                           fmtQualifiedDumpable(&tblinfo[i]));
>         ExecuteSqlStatement(fout, query->data);
>     }
> 
> 
> 
> ..then filters by -N/-n/-t/-T (which doesn't apply to pg_upgrade):
>selectDumpableTable(&tblinfo[i], fout);
> 
> 
> 
> So it looks like SELECT COUNT(1) FROM pg_class WHERE relkind IN ('r','p')
> should do it.
> 
> 
> 
> But actually, during pg_upgrade, since nothing else is running, you actually
> have max_connections * max_locks_per_transaction total locks.
> 
> 
> 
> Said differently, I think you could set max_locks_per_transaction to:
> SELECT (SELECT COUNT(1) FROM pg_class WHERE relkind IN ('r','p'))
>/ current_setting('max_connections')::int;
> 
> 
> 
> ..probably with a fudge factor of +10 for any system process (and due to
> integer truncation).
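The arithmetic above can be sketched as a quick shell calculation. The relation count and max_connections below are hypothetical placeholders, not values reported in this thread; on a real cluster you would take them from pg_class and SHOW max_connections.

```shell
# Hypothetical inputs -- on a real cluster, use:
#   relation_count:  SELECT COUNT(1) FROM pg_class WHERE relkind IN ('r','p');
#   max_connections: SHOW max_connections;
relation_count=1000
max_connections=100

# Integer division truncates, hence the +10 fudge factor for system
# processes and truncation, per the suggestion above.
suggested=$(( relation_count / max_connections + 10 ))
echo "suggested max_locks_per_transaction: $suggested"
# prints: suggested max_locks_per_transaction: 20
```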
> 
> 
> 
> Someone might say that pg_upgrade or pg_dump could check for that specifically..

Yes, and temporarily increase it, or add a HINT about how to calculate a proper value.

> 
> 
> 
> Justin
> 
> 

We realized that the problem is with the pg_dump run during pg_upgrade.

Now we're past the upgrade and can't check your calculation. We simply
increased max_connections until the migration passed :)

I'll try to check it on an empty, fake database.




Re: Consecutive Query Executions with Increasing Execution Time

2019-12-18 Thread Laurenz Albe
On Tue, 2019-12-17 at 11:11 -0500, Jeff Janes wrote:
> On Tue, Dec 17, 2019 at 8:08 AM Laurenz Albe  wrote:
> > On Mon, 2019-12-16 at 15:50 -0500, Tom Lane wrote:
> > > Peter Geoghegan  writes:
> > > > Why do the first and the twentieth executions of the query have almost
> > > > identical "buffers shared/read" numbers? That seems odd.
> > > 
> > > It's repeat execution of the same query, so that doesn't seem odd to me.
> > 
> > Really?  Shouldn't the blocks be in shared buffers after a couple
> > of executions?
> 
> If it is doing a seq scan (I don't know if it is), it intentionally uses a
> small ring buffer, so it evicts its own recently used blocks rather
> than evicting other people's blocks.  So these blocks won't build up in
> shared_buffers very rapidly just on the basis of repeated seq scans.

Sure, but according to the execution plans it is doing a Parallel Index Only 
Scan.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Consecutive Query Executions with Increasing Execution Time

2019-12-18 Thread Tom Lane
Laurenz Albe  writes:
> On Tue, 2019-12-17 at 11:11 -0500, Jeff Janes wrote:
>> If it is doing a seq scan (I don't know if it is), it intentionally uses a
>> small ring buffer, so it evicts its own recently used blocks rather
>> than evicting other people's blocks.  So these blocks won't build up in
>> shared_buffers very rapidly just on the basis of repeated seq scans.

> Sure, but according to the execution plans it is doing a Parallel Index Only 
> Scan.

Nonetheless, the presented test case consists of repeatedly doing
the same query, in a fresh session each time.  If there's not other
activity then this should reach some sort of steady state.  The
table is apparently fairly large, so I don't find it surprising
that the steady state fails to be 100% cached.

regards, tom lane




Re: How to prevent POSTGRES killing linux system from accepting too much inserts?

2019-12-18 Thread Merlin Moncure
On Wed, Dec 18, 2019 at 3:53 AM James(王旭)  wrote:
>
> Hello,
>>
>> I encountered into this kernel message, and I cannot login into the Linux 
>> system anymore:
>>
>>
>>
>>> Dec 17 23:01:50 hq-pg kernel: sh (6563): drop_caches: 1
>>>
>>> Dec 17 23:02:30 hq-pg kernel: INFO: task sync:6573 blocked for more than 
>>> 120 seconds.
>>>
>>> Dec 17 23:02:30 hq-pg kernel: "echo 0 > 
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>
>>> Dec 17 23:02:30 hq-pg kernel: syncD 965ebabd1040 0  
>>> 6573   6572 0x0080
>>>
>>> Dec 17 23:02:30 hq-pg kernel: Call Trace:
>>>
>>> Dec 17 23:02:30 hq-pg kernel: [] ? 
>>> generic_write_sync+0x70/0x70
>>
>>
>> After some googling I guess the problem is that IO speed is low while the
>> insert requests are coming in too quickly. So PG put these into cache first,
>> then the kernel called sync.
>>
>> I know I can queue the requests, so that POSTGRES will not accept these 
>> requests which will result in an increase in system cache.
>>
>> But is there any way I can tell POSTGRES, that you can only handle 2 
>> records per second, or 4M per second, please don't accept inserts more than 
>> that speed.
>>
>> For me, POSTGRES just waiting is much better than current behavior.
>>
>>
>> Any help will be much appreciated.

This is more a problem with the o/s than with postgres itself.

synchronous_commit is one influential parameter that can possibly help
mitigate the issue, with some safety tradeoffs (read the docs).  For
Linux, one possible place to look is tuning dirty_background_ratio and
related parameters.  The idea is that you want the o/s to be more
aggressive about syncing, to reduce the impact of an i/o storm; basically
you are trading off some burst performance for consistency of
performance.  Another place to look is checkpoint behavior.  Do some
searching; there is tons of information about this on the net.
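As an illustration of the knobs mentioned above, a sketch of the relevant settings follows. The values are examples only, not recommendations from this thread; the right numbers depend on your RAM, storage, and durability requirements, and note that synchronous_commit = off means recently committed transactions can be lost on a crash.

```
# /etc/sysctl.conf (example values only): start background writeback
# earlier, so dirty pages are flushed continuously rather than in bursts
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

# postgresql.conf (example values only): smooth checkpoint i/o, and
# optionally relax commit durability (read the docs on the tradeoffs first)
synchronous_commit = off
checkpoint_completion_target = 0.9
checkpoint_timeout = 15min
```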

merlin




Re: How to prevent POSTGRES killing linux system from accepting too much inserts?

2019-12-18 Thread Jeff Janes
On Wed, Dec 18, 2019 at 4:53 AM James(王旭)  wrote:

> Hello,
>>
>> I encountered into this kernel message, and I cannot login into the Linux
>> system anymore:
>
>
>>
>> Dec 17 23:01:50 hq-pg kernel: sh (6563): drop_caches: 1
>>
>> Dec 17 23:02:30 hq-pg kernel: INFO: task sync:6573 blocked for more than
>>> 120 seconds.
>>
>> Dec 17 23:02:30 hq-pg kernel: "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>
>> Dec 17 23:02:30 hq-pg kernel: syncD 965ebabd1040 0
>>> 6573   6572 0x0080
>>
>> Dec 17 23:02:30 hq-pg kernel: Call Trace:
>>
>> Dec 17 23:02:30 hq-pg kernel: [] ?
>>> generic_write_sync+0x70/0x70
>>
>>
>> After some googling I guess the problem is that IO speed is low, while
>> the insert requests are coming in too quickly. So PG put these into cache
>> first, then the kernel called sync
>
>
Could you expand on what you found in the googling, with links?  I've never
seen these in my kernel log, and I don't know what they mean other than the
obvious that it is something to do with IO.  Also, what kernel and file
system are you using?


> .
>
> I know I can queue the requests, so that POSTGRES will not accept these
>> requests which will result in an increase in system cache.
>
> But is there any way I can tell POSTGRES, that you can only handle 2
>> records per second, or 4M per second, please don't accept inserts more than
>> that speed.
>
> For me, POSTGRES just waiting is much better than current behavior.
>
>
I don't believe there is a setting from within PostgreSQL to do this.

There was a proposal for a throttle on WAL generation back in February, but
with no recent discussion or (visible) progress:

https://www.postgresql.org/message-id/flat/2B42AB02-03FC-406B-B92B-18DED2D8D491%40anarazel.de#b63131617e84d3a0ac29da956e6b8c5f


I think the real answer here is to get a better IO system, or maybe a better
kernel.  Otherwise, once you find a painful workaround for one symptom you
will just smack into another one.

Cheers,

Jeff

>


Re: How to prevent POSTGRES killing linux system from accepting too much inserts?

2019-12-18 Thread Osahon Oduware