Re: Too many .history file in pg_xlog takes lots of space

2018-03-15 Thread 彭昱傑
Hi Michael, Alvaro, Tom:

I really appreciate your help. This is an invalid report, and I'm sorry for
that.

After examining our restart script, I found that we generate recovery.conf
every time, and this is what causes the timeline to change on each restart.

Thanks.

2018-03-14 23:29 GMT+08:00 Tom Lane :

> Alvaro Herrera  writes:
> > 彭昱傑 wrote:
> >> Every time I restart postgre server, it generates a new history file:
>
> > That's strange -- it shouldn't happen ... sounds like you're causing a
> > crash each time you restart.  Are you using immediate mode in shutdown
> > maybe?  If so, don't; use fast mode instead.
>
> I'm confused by this report too.  Plain crashes shouldn't result in
> forking a new timeline.  To check, I tried "-m immediate", as well as
> "kill -9 postmaster", and neither of those resulted in a new .history file
> on restart.  I wonder if the OP's restart process involves calling
> pg_resetxlog or something like that (which would be risky as heck).
>
> regards, tom lane
>


RE: PG 9.6 Slow inserts with long-lasting LWLocks

2018-03-15 Thread Pavel Suderevsky
Hi, 

Well, unfortunately I still need community help.

-- Environment
OS: CentOS Linux release 7.2.1511
Kernel:  3.10.0-327.36.3.el7.x86_64
PostgreSQL: 9.6.3
-- Hardware
Server: Dell PowerEdge R430
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Raid controller: PERC H730 Mini (1GB cache)
Disks: 8 x 10K RPM SAS 12Gb/s 2.5" (ST1200MM0088) in RAID 6
RAM: 192GB (M393A2G40DB0-CPB x 16)
For more detailed hardware info please see attached configuration.txt
-- postgresql.conf
max_connections = 2048
shared_buffers = 48GB
temp_buffers = 128MB
work_mem = 256MB
maintenance_work_mem = 512MB
dynamic_shared_memory_type = posix
wal_level = hot_standby
min_wal_size = 4GB
max_wal_size = 32GB
huge_pages = on
+
numactl interleave=all
-- sysctl.conf 
kernel.shmmax=64424509440
kernel.shmall=4294967296
kernel.sem = 1024 32767 128 16384
fs.aio-max-nr=3145728
fs.file-max = 6815744
net.core.rmem_default=262144
net.core.rmem_max=4194304
net.core.wmem_default=262144
net.core.wmem_max=1048586
vm.nr_hugepages=33000
vm.dirty_background_bytes=67108864
vm.dirty_bytes=536870912
vm.min_free_kbytes=1048576
zone_reclaim_mode=0

Again: the problem is occasional long inserts, which happen 1-5 times per day
on an OLTP system.
No autovacuum runs during the long inserts. The WAL rate is 1-2Gb per hour,
and no correlation with this issue was spotted.
The "buffer_mapping" wait event is seen for the affected transactions, but not
every time (perhaps it simply isn't caught every time).
I have two suspects for this behaviour: the I/O system and high concurrency.
There is a problem with one application that frequently recreates up to 90
sessions, but investigation shows no direct correlation between those sessions
and the long transactions; at least it is not the root cause of the issue (of
course this application behaviour will be fixed).

Investigation, and tracing with strace in particular, showed that:
1. The only long event traced in postgres backends was <... semop resumed>.
2. The whole host seems to hang during such events.

Example:
A Java application located on a separate host reports several long transactions:
123336.943 - [1239588mks]: event.insert-table
123336.943 - [1240827mks]: event.insert-table
123337.019 - [1292534mks]: event.insert-table
143353.542 - [5467657mks]: event.insert-table
143353.543 - [5468884mks]: event.insert-table
152338.763 - [1264588mks]: event.insert-table
152338.765 - [2054887mks]: event.insert-table
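
To line these application entries up against other traces, it helps to turn each one into a start time and an end time. The sketch below is only an illustration and rests on assumptions not stated above: that the first field is a wall-clock time in HHMMSS.mmm form, that it marks the end of the transaction, and that "mks" means microseconds. The function name parse_app_line is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed line format: "HHMMSS.mmm - [<duration>mks]: <event-name>"
LINE_RE = re.compile(r"^(\d{6}\.\d{3}) - \[(\d+)mks\]: (\S+)$")


def parse_app_line(line):
    """Return (start, end, seconds, event) for one log line, or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Assumption: the timestamp is the time the transaction finished.
    end = datetime.strptime(m.group(1), "%H%M%S.%f")
    # Assumption: "mks" is a duration in microseconds.
    duration = timedelta(microseconds=int(m.group(2)))
    return end - duration, end, duration.total_seconds(), m.group(3)


if __name__ == "__main__":
    sample = "143353.542 - [5467657mks]: event.insert-table"
    start, end, seconds, event = parse_app_line(sample)
    print(event,
          start.strftime("%H:%M:%S.%f"), "->",
          end.strftime("%H:%M:%S.%f"),
          f"{seconds:.3f}s")
```

Under those assumptions, the 5.47 s entry logged at 143353.542 would have started at about 14:33:48.074.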

Strace output for the event that happened at 14:33, for one known pid:
119971 14:33:48.075375 epoll_wait(3,  
119971 14:33:48.075696 <... epoll_wait resumed> {{EPOLLIN, {u32=27532016, u64=27532016}}}, 1, -1) = 1 <0.000313>
119971 14:33:48.075792 recvfrom(9,  
119971 14:33:48.075866 <... recvfrom resumed> "B\0\0\3\27\0S_21\0\0*\0\1\0\1\0\1\0\0\0\0\0\1\0\1\0\0\0\0\0"..., 8192, 0, NULL, NULL) = 807 <0.66>
119971 14:33:48.076243 semop(26706044, {{8, -1, 0}}, 1 
120019 14:33:48.119971 recvfrom(9,  
119971 14:33:53.491029 <... semop resumed> ) = 0 <5.414772> 
119971 14:33:53.500356 lseek(18, 0, SEEK_END 
119971 14:33:53.500436 <... lseek resumed> ) = 107790336 <0.72>
119971 14:33:53.500514 lseek(20, 0, SEEK_END 

Checking the strace output for long semop calls over the whole day:
root@host [20180314 17:47:36]:/home/user$ egrep " <[1-9]." /tmp/strace | grep semop
119991 12:33:36 <... semop resumed> )   = 0 <1.419394>
119942 12:33:36 <... semop resumed> )   = 0 <1.422554>
119930 12:33:36 <... semop resumed> )   = 0 <1.414916>
119988 12:33:36 <... semop resumed> )   = 0 <1.213309>
119966 12:33:36 <... semop resumed> )   = 0 <1.237492>
119958 14:33:53.489398 <... semop resumed> ) = 0 <5.455830>
120019 14:33:53.490613 <... semop resumed> ) = 0 <5.284505>
119997 14:33:53.490638 <... semop resumed> ) = 0 <5.111661>
12 14:33:53.490649 <... semop resumed> ) = 0 <3.521992>
119991 14:33:53.490660 <... semop resumed> ) = 0 <2.522460>
119988 14:33:53.490670 <... semop resumed> ) = 0 <5.252485>
120044 14:33:53.490834 <... semop resumed> ) = 0 <1.718129>
119976 14:33:53.490852 <... semop resumed> ) = 0 <2.489563>
119974 14:33:53.490862 <... semop resumed> ) = 0 <1.520801>
119984 14:33:53.491011 <... semop resumed> ) = 0 <1.213411>
119971 14:33:53.491029 <... semop resumed> ) = 0 <5.414772>
119969 14:33:53.491039 <... semop resumed> ) = 0 <2.275608>
119966 14:33:53.491048 <... semop resumed> ) = 0 <2.526024>
119942 14:33:53.491058 <... semop resumed> ) = 0 <5.448506>
119964 15:23:38.746394 <... semop resumed> ) = 0 <2.034851>
119960 15:23:38.746426 <... semop resumed> ) = 0 <2.038321>
119966 15:23:38.752646 <... semop resumed> ) = 0 <1.252342>

It was also spotted that the WALWriter Postgres process spends time in  during hangs.

I also have an application on the db host that takes pg_stat_activity
snapshots every 500 ms, and, for example, I can see that there were no
snapshots between 14:33:47 and 14:33:53.
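
Gaps like this can be found mechanically from the snapshot timestamps. The following is a minimal sketch, assuming a 500 ms snapshot interval and timestamps available as "HH:MM:SS.fff" strings. The function name find_gaps, the factor of 4, and the sample timestamps are made up for illustration and are not taken from the real snapshot data.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(milliseconds=500), factor=4):
    """Return (previous, next, gap_seconds) wherever two consecutive
    snapshots are more than `factor` expected intervals apart."""
    times = sorted(datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps)
    gaps = []
    for prev, cur in zip(times, times[1:]):
        if cur - prev > expected * factor:
            gaps.append((prev, cur, (cur - prev).total_seconds()))
    return gaps


if __name__ == "__main__":
    # Illustrative timestamps only, not real snapshot data.
    snapshots = ["14:33:46.500", "14:33:47.000", "14:33:53.000", "14:33:53.500"]
    for prev, cur, seconds in find_gaps(snapshots):
        print(f"no snapshots between {prev:%H:%M:%S.%f} and {cur:%H:%M:%S.%f} "
              f"({seconds:.1f}s)")
```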
Separate simple script on db host every ~100ms checks ps output for this 
application and writes it into the txt file. And we can see that while it 
usually perfor