Slave server sometimes locks up

2019-03-05 Thread Boris Sagadin
PgSQL 10.7, Ubuntu 16.04 LTS

Symptoms:

- server accepts new queries until connections exhausted (all queries are SELECT)
- queries are active, never end, but no disk IO
- queries can't be killed with kill -TERM or pg_terminate_backend()
- system load is minimal (vmstat shows 100% idle)
- perf top shows nothing
- statement_timeout is ignored
- no locks with SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED;
- server exits only on kill -9
- strace on SELECT process indefinitely shows:

futex(0x7f00fe94c938, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, NULL,
^Cstrace: Process 121319 detached
 

GDB backtrace:

(gdb) bt
#0  0x7f05256f1827 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f00fe94ba38) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x7f00fe94ba38, abstime=0x0) at sem_waitcommon.c:111
#2  0x7f05256f18d4 in __new_sem_wait_slow (sem=0x7f00fe94ba38, abstime=0x0) at sem_waitcommon.c:181
#3  0x7f05256f197a in __new_sem_wait (sem=sem@entry=0x7f00fe94ba38) at sem_wait.c:29
#4  0x55c9b95eb792 in PGSemaphoreLock (sema=0x7f00fe94ba38) at pg_sema.c:316
#5  0x55c9b965eaec in LWLockAcquire (lock=0x7f00fe96f880, mode=mode@entry=LW_EXCLUSIVE) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/storage/lmgr/lwlock.c:1233
#6  0x55c9b96497f7 in dsm_create (size=size@entry=105544, flags=flags@entry=1) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/storage/ipc/dsm.c:493
#7  0x55c9b94139ff in InitializeParallelDSM (pcxt=pcxt@entry=0x55c9bb8d9d58) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/access/transam/parallel.c:268
#8  0x55c9b9538187 in ExecInitParallelPlan (planstate=0x55c9bb949178, estate=0x55c9bb948568, nworkers=1) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/execParallel.c:470
#9  0x55c9b9547d20 in ExecGather (pstate=0x55c9bb948e58) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/nodeGather.c:158
#10 0x55c9b95543d5 in ExecProcNode (node=0x55c9bb948e58) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/include/executor/executor.h:250
#11 ExecNestLoop (pstate=0x55c9bb948c68) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/nodeNestloop.c:109
#12 0x55c9b9556b56 in ExecProcNode (node=0x55c9bb948c68) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/include/executor/executor.h:250
#13 ExecSort (pstate=0x55c9bb9489f8) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/nodeSort.c:106
#14 0x55c9b9556e57 in ExecProcNode (node=0x55c9bb9489f8) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/include/executor/executor.h:250
#15 ExecUnique (pstate=0x55c9bb948808) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/nodeUnique.c:73
#16 0x55c9b9533fcb in ExecProcNode (node=0x55c9bb948808) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/include/executor/executor.h:250
#17 ExecutePlan (execute_once=<optimized out>, dest=0x55c9bb8a9328, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55c9bb948808, estate=0x55c9bb948568) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/execMain.c:1723
#18 standard_ExecutorRun (queryDesc=0x55c9bb98ba78, direction=<optimized out>, count=0, execute_once=<optimized out>) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/executor/execMain.c:364
#19 0x55c9b9670d6b in PortalRunSelect (portal=portal@entry=0x55c9bb8f1d68, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x55c9bb8a9328) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/tcop/pquery.c:932
#20 0x55c9b9672380 in PortalRun (portal=portal@entry=0x55c9bb8f1d68, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=<optimized out>, dest=dest@entry=0x55c9bb8a9328, altdest=altdest@entry=0x55c9bb8a9328, completionTag=0x7ffdde0387c0 "") at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/tcop/pquery.c:773
#21 0x55c9b966eef9 in exec_execute_message (max_rows=9223372036854775807, portal_name=0x55c9bb8a8f18 "") at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/tcop/postgres.c:2007
#22 PostgresMain (argc=<optimized out>, argv=argv@entry=0x55c9bb845328, dbname=0x55c9bb7ec9e8 "prod", username=<optimized out>) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/tcop/postgres.c:4180
#23 0x55c9b93a46b9 in BackendRun (port=0x55c9bb840ce0) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/postmaster/postmaster.c:4405
#24 BackendStartup (port=0x55c9bb840ce0) at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/postmaster/postmaster.c:4077
#25 ServerLoop () at /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/ba

Re: Performance comparison between Pgsql 10.5 and Pgsql 11.2

2019-03-05 Thread Nicola Contu
Not sure what you are requesting exactly, but here is the strace for the
start of pg_ctl:



On Mon, Mar 4, 2019 at 21:55 Thomas Munro wrote:

> On Tue, Mar 5, 2019 at 5:01 AM Nicola Contu 
> wrote:
> > Attached a part of the strace running the pgbench command for pg11
> > Also attached strace_10 for pg10.6.
>
> That looks like strace output from pgbench, and I don't see any
> interesting differences between v10 and v11 (though I'm surprised to
> see it using poll() instead of ppoll(), and opening a new connection
> for every transaction).
>
> How about syscalls on the server side?  You could start it with
> something like "strace -f path/to/postgres -D path/to/pgdata" (-f for
> follow children), and perhaps also use -c so that it shows aggregated
> data (up until you ^C it) instead of every syscall?
>
> --
> Thomas Munro
> https://enterprisedb.com
>
[postgres@STAGING-CMD1 ~]$ strace /usr/local/pgsql11.2/bin/pg_ctl -D /db/pgsql11/data/ -c start
execve("/usr/local/pgsql11.2/bin/pg_ctl", ["/usr/local/pgsql11.2/bin/pg_ctl", "-D", "/db/pgsql11/data/", "-c", "start"], [/* 27 vars */]) = 0
brk(NULL)   = 0x204b000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6878854000
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
open("/usr/local/pgsql11.2/lib/tls/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/local/pgsql11.2/lib/tls/x86_64", 0x7fff444938d0) = -1 ENOENT (No such file or directory)
open("/usr/local/pgsql11.2/lib/tls/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/local/pgsql11.2/lib/tls", 0x7fff444938d0) = -1 ENOENT (No such file or directory)
open("/usr/local/pgsql11.2/lib/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/local/pgsql11.2/lib/x86_64", 0x7fff444938d0) = -1 ENOENT (No such file or directory)
open("/usr/local/pgsql11.2/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/local/pgsql11.2/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=48564, ...}) = 0
mmap(NULL, 48564, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6878848000
close(3)= 0
open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0m\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=144792, ...}) = 0
mmap(NULL, 2208904, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6878418000
mprotect(0x7f687842f000, 2093056, PROT_NONE) = 0
mmap(0x7f687862e000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x7f687862e000
mmap(0x7f687863, 13448, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f687863
close(3)= 0
open("/usr/local/pgsql11.2/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P%\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2173512, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6878847000
mmap(NULL, 3981792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f687804b000
mprotect(0x7f687820e000, 2093056, PROT_NONE) = 0
mmap(0x7f687840d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c2000) = 0x7f687840d000
mmap(0x7f6878413000, 16864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f6878413000
close(3)= 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6878845000
arch_prctl(ARCH_SET_FS, 0x7f6878845740) = 0
mprotect(0x7f687840d000, 16384, PROT_READ) = 0
mprotect(0x7f687862e000, 4096, PROT_READ) = 0
mprotect(0x60a000, 4096, PROT_READ) = 0
mprotect(0x7f6878855000, 4096, PROT_READ) = 0
munmap(0x7f6878848000, 48564)   = 0
set_tid_address(0x7f6878845a10) = 92441
set_robust_list(0x7f6878845a20, 24) = 0
rt_sigaction(SIGRTMIN, {0x7f687841e7e0, [], SA_RESTORER|SA_SIGINFO, 0x7f68784276d0}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f687841e870, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f68784276d0}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
brk(NULL)   = 0x204b000
brk(0x206c000)  = 0x206c000
brk(NULL)   = 0x206c000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=106070960, ...}) = 0
mmap(NULL, 106070960, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6871b22000
close(3)= 0

Re: Question about pg_upgrade from 9.2 to X.X

2019-03-05 Thread Perumal Raj
Thanks Sergei/Justin for the continued updates.

So the reorg schema might have been created by some scripts prior to version
9.2?
These are plain functions in the DB, not an extension. These functions cannot
run anyway, since the associated libraries no longer exist on the system
(9.2), so I expect no impact on the system.

AS '$libdir/pg_reorg', 'reorg_disable_autovacuum';
AS '$libdir/pg_reorg', 'reorg_get_index_keys';
AS '$libdir/pg_reorg', 'reorg_apply';
AS '$libdir/pg_reorg', 'reorg_drop';
AS '$libdir/pg_reorg', 'reorg_indexdef';
AS '$libdir/pg_reorg', 'reorg_swap';
AS '$libdir/pg_reorg', 'reorg_trigger';
AS '$libdir/pg_reorg', 'reorg_version';

Will continue the 9.6 upgrade after dropping the reorg schema.
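For reference, a hedged sketch of that cleanup (it assumes the leftover
pg_reorg objects all live in a schema named "reorg", as discussed in this
thread; take a backup first, since CASCADE drops every object in the schema):

```sql
-- Run in each affected database before pg_upgrade.
-- CAUTION: drops everything in the "reorg" schema; verify its contents first.
DROP SCHEMA reorg CASCADE;
```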

One question I need your help with:

Before the 9.2 to 9.6 upgrade, I tried a 9.2 to 10.7 upgrade, which failed
with similar errors (you can refer to the beginning of the post).

> could not load library "$libdir/hstore": ERROR:  could not access file
"$libdir/hstore": No such file or directory
> could not load library "$libdir/adminpack": ERROR:  could not access file
"$libdir/adminpack": No such file or directory
> could not load library "$libdir/uuid-ossp": ERROR:  could not access file
"$libdir/uuid-ossp": No such file or directory

These extensions seem to be standard. What are these functions used for, and
is there an alternative or enhanced equivalent in a higher version if I drop
them in 9.2 and continue the upgrade to 10.7?

Thanks and Regards,

On Mon, Mar 4, 2019 at 11:42 PM Sergei Kornilov  wrote:

> Hi
>
> seems this is an unpackaged extension, usually installed prior to the 9.1
> release. Maybe reorg does not even support the "create extension" syntax.
> That was long ago, and the project homepage is unavailable now. The
> pg_repack documentation mentions "support for PostgreSQL 9.2 and EXTENSION
> packaging" among its improvements.
>
> > Are these functions seeded ( default) one ?
>
> No, they're not default.
>
> regards, Sergei
>


LIMIT not showing all results

2019-03-05 Thread Casey Deccio
Okay, the subject is a little misleading because of course LIMIT isn't supposed
to show all results, but I've got an issue where LIMIT isn't showing the number
of results I would expect.  For example:

mydb=> select id,name,date from analysis where name = 'foo' order by date desc limit 3;
    id     | name |          date
-----------+------+------------------------
 195898786 | foo  | 2019-03-05 06:45:29+00
(1 row)

mydb=> select id,name,date from analysis where name = 'foo' order by date desc limit 20;
    id     | name |          date
-----------+------+------------------------
 195898786 | foo  | 2019-03-05 06:45:29+00


But when I query without limit (or even with "limit all"), there are more:

mydb=> select id,name,date from analysis where name = 'foo' order by date desc;
    id     | name |          date
-----------+------+------------------------
 195898786 | foo  | 2019-03-05 06:45:29+00
 195842671 | foo  | 2019-01-24 14:31:45+00
 195667475 | foo  | 2018-12-30 23:40:11+00
 195256709 | foo  | 2018-10-29 18:33:07+00
...
(Many more rows)

psql version 9.6.11.
Server version 9.4.20.

We just migrated the database from one server to another (as far as I know, 
there was no version change), and it worked previously.

Any ideas?

Many thanks in advance.
Casey


Re: [External] LIMIT not showing all results

2019-03-05 Thread Vijaykumar Jain
Can you run both the queries with
“explain analyze select ” and paste the output.


On Tue, 5 Mar 2019 at 9:41 PM Casey Deccio  wrote:

> Okay, the subject is a little misleading because of course LIMIT isn't
> supposed to show all results, but I've got an issue where LIMIT isn't showing
> the number of results I would expect.  For example:
>
> mydb=> select id,name,date from analysis where name = 'foo' order by date desc limit 3;
>     id     | name |          date
> -----------+------+------------------------
>  195898786 | foo  | 2019-03-05 06:45:29+00
> (1 row)
>
> mydb=> select id,name,date from analysis where name = 'foo' order by date desc limit 20;
>     id     | name |          date
> -----------+------+------------------------
>  195898786 | foo  | 2019-03-05 06:45:29+00
>
>
> But when I query without limit (or even with "limit all"), there are more:
>
> mydb=> select id,name,date from analysis where name = 'foo' order by date desc;
>     id     | name |          date
> -----------+------+------------------------
>  195898786 | foo  | 2019-03-05 06:45:29+00
>  195842671 | foo  | 2019-01-24 14:31:45+00
>  195667475 | foo  | 2018-12-30 23:40:11+00
>  195256709 | foo  | 2018-10-29 18:33:07+00
> ...
> (Many more rows)
>
> psql version 9.6.11.
> Server version 9.4.20.
>
> We just migrated the database from one server to another (as far as I
> know, there was no version change), and it worked previously.
>
> Any ideas?
>
> Many thanks in advance.
> Casey
>
-- 

Regards,
Vijay


Re: Slave server sometimes locks up

2019-03-05 Thread Thomas Munro
On Wed, Mar 6, 2019 at 1:39 AM Boris Sagadin  wrote:
> PgSQL 10.7, Ubuntu 16.04 LTS
>
> Symptoms:
>
> - server accepts new queries until connections exhausted (all queries are 
> SELECT)
> - queries are active, never end, but no disk IO
> - queries can't be killed with kill -TERM or pg_terminate_backend()
> - system load is minimal (vmstat shows 100% idle)
> - perf top shows nothing
> - statement_timeout is ignored
> - no locks with SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED;
> - server exits only on kill -9
> - strace on SELECT process indefinitely shows:
>
> futex(0x7f00fe94c938, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, NULL, 
> ^Cstrace: Process 121319 detached
>  
>
> GDB backtrace:
>
> (gdb) bt
> #0  0x7f05256f1827 in futex_abstimed_wait_cancelable (private=128, 
> abstime=0x0, expected=0, futex_word=0x7f00fe94ba38) at 
> ../sysdeps/unix/sysv/linux/futex-internal.h:205
> #1  do_futex_wait (sem=sem@entry=0x7f00fe94ba38, abstime=0x0) at 
> sem_waitcommon.c:111
> #2  0x7f05256f18d4 in __new_sem_wait_slow (sem=0x7f00fe94ba38, 
> abstime=0x0) at sem_waitcommon.c:181
> #3  0x7f05256f197a in __new_sem_wait (sem=sem@entry=0x7f00fe94ba38) at 
> sem_wait.c:29
> #4  0x55c9b95eb792 in PGSemaphoreLock (sema=0x7f00fe94ba38) at 
> pg_sema.c:316
> #5  0x55c9b965eaec in LWLockAcquire (lock=0x7f00fe96f880, 
> mode=mode@entry=LW_EXCLUSIVE) at 
> /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/storage/lmgr/lwlock.c:1233
> #6  0x55c9b96497f7 in dsm_create (size=size@entry=105544, 
> flags=flags@entry=1) at 
> /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/storage/ipc/dsm.c:493
> #7  0x55c9b94139ff in InitializeParallelDSM 
> (pcxt=pcxt@entry=0x55c9bb8d9d58) at 
> /build/postgresql-10-BKASGd/postgresql-10-10.7/build/../src/backend/access/transam/parallel.c:268

Hello Boris,

This looks like a known symptom of a pair of bugs we recently tracked
down and fixed:

1. "dsa_area could not attach to segment": dsm.c, fixed in commit 6c0fb941.
2. "cannot unpin a segment that is not pinned": dsm.c, fixed in commit 0b55aaac.

Do you see one of those messages earlier in your logs?  The bug was
caused by a failure to allow for a corner case where a new shared
memory segment has the same ID as a recently/concurrently destroyed
one, and is made more likely to occur by not-very-random random
numbers.  If one of these errors occurs while cleaning up from a
parallel query, a backend can self-deadlock while trying to clean up
the same thing again in the error-handling path, and then other
backends will later block on that lock if they try to run a parallel
query.

The fix will be included in the next set of releases, but in the
meantime you could consider turning off parallel query (set
max_parallel_workers_per_gather = 0).  In practice I think you could
also avoid this problem by loading a library that calls something like
srandom(getpid()) in _PG_init() (so it runs in every parallel worker
making ID collisions extremely unlikely), but that's not really a
serious recommendation since it requires writing C code.
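The first workaround above can be sketched as follows (a sketch, not from the
original message: it assumes superuser access; ALTER SYSTEM writes the setting
to postgresql.auto.conf and it takes effect on reload):

```sql
-- Temporary workaround until the fixed release: disable parallel query.
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
SELECT pg_reload_conf();
```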

-- 
Thomas Munro
https://enterprisedb.com



Re: [External] LIMIT not showing all results

2019-03-05 Thread Casey Deccio



> On Mar 5, 2019, at 9:15 AM, Vijaykumar Jain  wrote:
> 
> Can you run both the queries with
> “explain analyze select ” and paste the output.

dnsviz=> explain analyze select id,name,date from analysis where name = 'foo' order by date desc limit 20;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..81.74 rows=20 width=31) (actual time=0.540..0.541 rows=1 loops=1)
   ->  Index Scan Backward using analysis_name_date_key on analysis  (cost=0.57..7760.25 rows=1912 width=31) (actual time=0.539..0.540 rows=1 loops=1)
         Index Cond: ((name)::text = 'foo'::text)
 Planning time: 6.728 ms
 Execution time: 0.587 ms
(5 rows)

dnsviz=> explain analyze select id,name,date from analysis where name = 'foo' order by date desc;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=7720.96..7725.74 rows=1912 width=31) (actual time=687.953..688.460 rows=4965 loops=1)
   Sort Key: date
   Sort Method: quicksort  Memory: 580kB
   ->  Bitmap Heap Scan on analysis  (cost=75.39..7616.75 rows=1912 width=31) (actual time=8.600..677.916 rows=4965 loops=1)
         Recheck Cond: ((name)::text = 'foo'::text)
         Heap Blocks: exact=4943
         ->  Bitmap Index Scan on analysis_name_cache_group_id_key  (cost=0.00..74.91 rows=1912 width=0) (actual time=7.028..7.028 rows=4965 loops=1)
               Index Cond: ((name)::text = 'foo'::text)
 Planning time: 0.222 ms
 Execution time: 688.897 ms
(10 rows)

Thanks,
Casey


Re: Question about pg_upgrade from 9.2 to X.X

2019-03-05 Thread Justin Pryzby
On Tue, Mar 05, 2019 at 08:09:12AM -0800, Perumal Raj wrote:
> Thanks Sergei/Justin for the continued updates.
> 
> So the reorg schema might have been created by some scripts prior to
> version 9.2?

I'm guessing they were probably created in 9.2.

> These are plain functions in the DB, not an extension. These functions
> cannot run anyway, since the associated libraries no longer exist on the
> system (9.2), so I expect no impact on the system.

I guess someone installed pgreorg, ran its scripts to install its functions
into the DB, and then removed pgreorg without removing its scripts.

> One question I need your help with:
> 
> Before the 9.2 to 9.6 upgrade, I tried a 9.2 to 10.7 upgrade, which failed
> with similar errors (you can refer to the beginning of the post).
> 
> > could not load library "$libdir/hstore": ERROR:  could not access file 
> > "$libdir/hstore": No such file or directory
> > could not load library "$libdir/adminpack": ERROR:  could not access file 
> > "$libdir/adminpack": No such file or directory
> > could not load library "$libdir/uuid-ossp": ERROR:  could not access file 
> > "$libdir/uuid-ossp": No such file or directory
> 
> These extensions seem to be standard. What are these functions used for,
> and is there an alternative or enhanced equivalent in a higher version if I
> drop them in 9.2 and continue the upgrade to 10.7?

See Sergei's response:
https://www.postgresql.org/message-id/7164691551378448%40myt3-1179f584969c.qloud-c.yandex.net

You probably want to install this package for the new version (9.6 or 10 or
11).

[pryzbyj@TS-DB ~]$ rpm -ql postgresql11-contrib |grep -E '(uuid-ossp|adminpack|hstore)\.control'
/usr/pgsql-11/share/extension/adminpack.control
/usr/pgsql-11/share/extension/hstore.control
/usr/pgsql-11/share/extension/uuid-ossp.control
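If any of these extensions need to be (re)created in the upgraded cluster once
the contrib package is installed, a sketch (run per database; note that
uuid-ossp must be double-quoted because of the hyphen):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS adminpack;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
```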

Justin



Re: [External] LIMIT not showing all results

2019-03-05 Thread Tom Lane
Casey Deccio  writes:
>> On Mar 5, 2019, at 9:15 AM, Vijaykumar Jain  wrote:
>> Can you run both the queries with
>> “explain analyze select ” and paste the output.

> dnsviz=> explain analyze select id,name,date from analysis where name = 'foo' order by date desc limit 20;
>                                                                       QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.57..81.74 rows=20 width=31) (actual time=0.540..0.541 rows=1 loops=1)
>    ->  Index Scan Backward using analysis_name_date_key on analysis  (cost=0.57..7760.25 rows=1912 width=31) (actual time=0.539..0.540 rows=1 loops=1)
>          Index Cond: ((name)::text = 'foo'::text)
>  Planning time: 6.728 ms
>  Execution time: 0.587 ms
> (5 rows)

Hm, so possibly corruption in that index?  REINDEX might help.
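A minimal sketch of that suggestion (the index name is taken from the plan
above; note that on 9.4, REINDEX takes an exclusive lock on the table while it
runs):

```sql
REINDEX INDEX analysis_name_date_key;
```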

regards, tom lane



Re: Performance comparison between Pgsql 10.5 and Pgsql 11.2

2019-03-05 Thread Thomas Munro
On Wed, Mar 6, 2019 at 4:22 AM Nicola Contu  wrote:
>
> Not sure what you are requesting exactly but here is the strace for the start 
> of the pg_ctl

I meant that you could run the server itself in the foreground under
strace, like so:

$ strace -f -c /usr/local/pgsql11.2/bin/postgres -D /db/pgsql11/data/

Then perform your testing, and finally stop it with pg_ctl from
another window (or hit ^C in this window) and strace should spit out a
table of system calls with some counters.  We might be able to see why
v11 is spending so much more time executing system calls than v10 for
your workload, or at least which systems calls they are, assuming you
run the same transactions against both versions.

-- 
Thomas Munro
https://enterprisedb.com



Re: [External] LIMIT not showing all results

2019-03-05 Thread Vijaykumar Jain
Thanks Tom.


I mean, if the instance is a test instance, analysis_name_date_key could
probably be dropped and the query run again to check whether it still
returns the correct rows. Or create an index in parallel with the same
columns as analysis_name_date_key, check whether the optimizer chooses the
new index, and then conclude that the old index is bad.

Also, is there an option to force a particular index to be used?

I read somewhere that the query below may help with detecting bad (invalid)
indexes; is this correct?

SELECT n.nspname, c.relname
FROM   pg_catalog.pg_class c, pg_catalog.pg_namespace n,
       pg_catalog.pg_index i
WHERE  (i.indisvalid = false OR i.indisready = false) AND
       i.indexrelid = c.oid AND c.relnamespace = n.oid AND
       n.nspname != 'pg_catalog' AND
       n.nspname != 'information_schema' AND
       n.nspname != 'pg_toast';


Regards,
Vijay

On Tue, Mar 5, 2019 at 10:16 PM Tom Lane  wrote:
>
> Casey Deccio  writes:
> >> On Mar 5, 2019, at 9:15 AM, Vijaykumar Jain  wrote:
> >> Can you run both the queries with
> >> “explain analyze select ” and paste the output.
>
> > dnsviz=> explain analyze select id,name,date from analysis where name = 'foo' order by date desc limit 20;
> >                                                                       QUERY PLAN
> > ------------------------------------------------------------------------------------------------------------------------------------------------------
> >  Limit  (cost=0.57..81.74 rows=20 width=31) (actual time=0.540..0.541 rows=1 loops=1)
> >    ->  Index Scan Backward using analysis_name_date_key on analysis  (cost=0.57..7760.25 rows=1912 width=31) (actual time=0.539..0.540 rows=1 loops=1)
> >          Index Cond: ((name)::text = 'foo'::text)
> >  Planning time: 6.728 ms
> >  Execution time: 0.587 ms
> > (5 rows)
>
> Hm, so possibly corruption in that index?  REINDEX might help.
>
> regards, tom lane
>



Re: [External] LIMIT not showing all results

2019-03-05 Thread Andreas Kretschmer




On 05.03.19 at 17:51, Vijaykumar Jain wrote:

Thanks Tom.


I mean if the instance is a test instance,
probably analysis_name_date_key can be dropped and the query can be
run again so as to check if it still returns the correct rows.
or create an index in parallel with the same col as
analysis_name_date_key and check if the optimizer choses the right
index.
and then come to conclusion of bad index.

Also is there an option where we can force a particular index to be used ?



no, but you can set enable_indexscan to off and maybe also enable_bitmapscan to off to force the planner to choose a seq-scan.

I'm sure in this case you will get a correct result.
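A session-level sketch of that check (the SETs affect only the current session
and can be RESET afterwards; the SELECT is the one from earlier in the thread):

```sql
SET enable_indexscan = off;
SET enable_bitmapscan = off;
-- A seq-scan does not consult the suspect index, so it returns all matching rows.
SELECT id, name, date FROM analysis WHERE name = 'foo' ORDER BY date DESC LIMIT 20;
RESET enable_indexscan;
RESET enable_bitmapscan;
```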

the other thing is, it would be nice to know why the index is corrupt.


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: [External] LIMIT not showing all results

2019-03-05 Thread Tom Lane
Andreas Kretschmer  writes:
> the other thing is, it would be nice to know why the index is corrupt.

Given that (a) this was triggered by a server migration and (b)
the leading column of the index looks like it's probably varchar,
I'm suspicious that the new server has different collation behavior.

regards, tom lane



Re: [External] LIMIT not showing all results

2019-03-05 Thread Andreas Kretschmer
On 5 March 2019 18:54:33 CET, Tom Lane  wrote:
>Andreas Kretschmer  writes:
>> the other thing is, it would be nice to know why the index is corrupt.
>
>Given that (a) this was triggered by a server migration and (b)
>the leading column of the index looks like it's probably varchar,
>I'm suspicious that the new server has different collation behavior.
>
>   regards, tom lane


yes, sounds reasonable.



Regards, Andreas.



-- 
2ndQuadrant - The PostgreSQL Support Company



Re: [External] LIMIT not showing all results

2019-03-05 Thread Matthew Pounsett
On Tue, 5 Mar 2019 at 12:54, Tom Lane  wrote:

> Andreas Kretschmer  writes:
> > the other thing is, it would be nice to know why the index is corrupt.
>
> Given that (a) this was triggered by a server migration and (b)
> the leading column of the index looks like it's probably varchar,
> I'm suspicious that the new server has different collation behavior.


The migration in question was an rsync from a Debian 9 box
running 9.4.19-0+deb8u1 to a FreeBSD 11 box  running 9.4.20.


Re: LIMIT not showing all results

2019-03-05 Thread Casey Deccio

> On Mar 5, 2019, at 10:37 AM, Andreas Kretschmer  
> wrote:
> 
> no, but you can set enable_indexscan to off and maybe also enable_bitmapscan 
> to off to force the planner to choose a seq-scan.
> I'm sure in this case you will get a correct result.

So this (setting enable_indexscan to off) might be a short-term fix, while 
reindexing is the more robust fix?

Casey

Re: [External] LIMIT not showing all results

2019-03-05 Thread Andreas Kretschmer




On 05.03.19 at 19:09, Matthew Pounsett wrote:



On Tue, 5 Mar 2019 at 12:54, Tom Lane wrote:


Andreas Kretschmer writes:
> the other thing is, it would be nice to know why the index is corrupt.

Given that (a) this was triggered by a server migration and (b)
the leading column of the index looks like it's probably varchar,
I'm suspicious that the new server has different collation behavior.


The migration in question was an rsync from a Debian 9 box 
running 9.4.19-0+deb8u1 to a FreeBSD 11 box  running 9.4.20.


yeah, check and compare the glibc version on both systems (ldd --version).


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: LIMIT not showing all results

2019-03-05 Thread Andreas Kretschmer




On 05.03.19 at 19:41, Casey Deccio wrote:


On Mar 5, 2019, at 10:37 AM, Andreas Kretschmer wrote:


no, but you can set enable_indexscan to off and maybe also 
enable_bitmapscan to off to force the planner to choose a seq-scan.

I'm sure in this case you will get a correct result.


So this (setting enable_indexscan to off) might be a short-term fix, 
while reindexing is the more robust fix?




yes, sure. The index is corrupt.


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: [External] LIMIT not showing all results

2019-03-05 Thread Tom Lane
Matthew Pounsett  writes:
> On Tue, 5 Mar 2019 at 12:54, Tom Lane  wrote:
>> Given that (a) this was triggered by a server migration and (b)
>> the leading column of the index looks like it's probably varchar,
>> I'm suspicious that the new server has different collation behavior.

> The migration in question was an rsync from a Debian 9 box
> running 9.4.19-0+deb8u1 to a FreeBSD 11 box  running 9.4.20.

Yeah, that would fit the theory :-(.  Debian would be using glibc
and FreeBSD would not be.  If you were using C collation in the
database, you'd be all right because that's standardized, but I'll
bet you were using something else.  What does psql \l show for the
"collate" setting of this database?  (Or, if by chance you had an
explicit COLLATE setting on the column in question, what's that?)

In any case, you should be reindexing any indexes on textual columns
that were not using "C" collation; none of them can be trusted.
The system catalogs should be OK.
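A rough sketch of finding candidates to reindex (my own addition, not from the
thread: it over-approximates by listing every index that touches a
text/varchar column, without checking each column's actual collation):

```sql
-- Candidate indexes for REINDEX after a libc/collation change.
SELECT DISTINCT i.indexrelid::regclass AS index_name,
                i.indrelid::regclass  AS table_name
FROM pg_index i
JOIN pg_attribute a
  ON a.attrelid = i.indrelid
 AND a.attnum   = ANY (i.indkey)
WHERE a.atttypid IN ('text'::regtype, 'varchar'::regtype);
```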

regards, tom lane



Server upgrade advice

2019-03-05 Thread Rory Campbell-Lange
We've been happy running a database server and replica for some years
with the following details and specs:

postgres 9.5 (currently)
supermicro X9DRD-7LN4F
LSI Megaraid MR9261-8i with BBU
250gb raid 1 /
224gb raid 10 /db
126GB RAM (1066Mhz DDR3)
2 x Xeon E5-2609 v2 @ 2.5GHz

Services on the server are scaling up quite quickly, so we are running
out of disk space for the several hundred databases in the cluster.
While the disk space is fairly easy to solve, our main issue is CPU
hitting daily 5 minute peaks of 65% plus under load for complex plpgsql
queries, causing query backups. While we don't often spill queries to
disk, running out of RAM is an incipient problem too. 

While we could split the cluster there are some management issues to do
with that, together with our having a policy of local and remote
replicas. 

Consequently we're thinking of the following replacement servers:

postgres 11 (planned)
supermicro 113TQ-R700W 
LSI MegaRAID 9271-8i SAS/SATA RAID Controller, 1Gb DDR3 Cache (PCIE- Gen 3)
500gb raid 1 /
2tb raid 10 /db
with "zero maintenance flash cache protection"
256GB RAM (2666MHz DDR4)
2x E5-2680 v4 Intel Xeon, 14 Cores, 2.40GHz, 35M Cache,

This configuration gives us lots more storage, double the RAM (with 8
slots free) and just under 4x CPU (according to passmark) with lots more
cores.

We're hoping to get two to three years of service out of this upgrade,
but then will split the cluster between servers if demand grows more
than we anticipate.

Any comments on this upgrade, strategy or the "zero maintenance" thingy
(instead of a BBU) would be much appreciated.

Rory



Re: Slave server sometimes locks up

2019-03-05 Thread Boris Sagadin
Hi Thomas,

thank you for your quick reply, much appreciated.

> 1. "dsa_area could not attach to segment": dsm.c, fixed in commit 6c0fb941.
> 2. "cannot unpin a segment that is not pinned": dsm.c, fixed in commit
> 0b55aaac.
>
>
Yes, I found both entries in our logs, each once per incident.


> The fix will be included in the next set of releases, but in the
> meantime you could consider turning off parallel query (set
> max_parallel_workers_per_gather = 0).  In practice I think you could
> also avoid this problem by loading a library that calls something like
> srandom(getpid()) in _PG_init() (so it runs in every parallel worker
> making ID collisions extremely unlikely), but that's not really a
> serious recommendation since it requires writing C code.
>
>
This is very reassuring, as I was slowly starting to doubt my DBA
competence here. :)
I'll try this workaround until the next release is out, and will report if
anything out of the ordinary re-occurs.
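For the record, the workaround can be applied cluster-wide without a restart; a sketch of how I plan to set it (standard PostgreSQL 10 commands, run as superuser):

```sql
-- Per-session, for a quick test:
SET max_parallel_workers_per_gather = 0;

-- Cluster-wide, until the fixed release is installed:
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
SELECT pg_reload_conf();  -- pick up the change without a restart
```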

Boris


Re: Server upgrade advice

2019-03-05 Thread Kenneth Marshall
> 
> Consequently we're thinking of the following replacement servers:
> 
> postgres 11 (planned)
> supermicro 113TQ-R700W 
> LSI MegaRAID 9271-8i SAS/SATA RAID Controller, 1Gb DDR3 Cache (PCIE- Gen 
> 3)
> 500gb raid 1 /
> 2tb raid 10 /db
> with "zero maintenance flash cache protection"
> 256GB RAM (2666MHz DDR4)
> 2x E5-2680 v4 Intel Xeon, 14 Cores, 2.40GHz, 35M Cache,
> 
> This configuration gives us lots more storage, double the RAM (with 8
> slots free) and just under 4x CPU (according to passmark) with lots more
> cores.
> 
> We're hoping to get two to three years of service out of this upgrade,
> but then will split the cluster between servers if demand grows more
> than we anticipate.
> 
> Any comments on this upgrade, strategy or the "zero maintenance" thingy
> (instead of a BBU) would be much appreciated.
> 
> Rory
> 

Hi Rory,

Is there a reason not to consider an all flash solution? The AMD EPYC
processor series supports enough NVMe channels to support your sizing.
The 7401P single processor is a good value proposition.

Regards,
Ken



Re: Server upgrade advice

2019-03-05 Thread Rory Campbell-Lange
On 05/03/19, Kenneth Marshall (k...@rice.edu) wrote:
> > [proposed replacement server specs snipped; quoted in full above]
> > Any comments on this upgrade, strategy or the "zero maintenance" thingy
> > (instead of a BBU) would be much appreciated.

> Is there a reason not to consider an all flash solution? The AMD EPYC
> processor series supports enough NVMe channels to support your sizing.
> The 7401P single processor is a good value proposition.

Hi Ken

Thanks very much for your response.

I'm completely naive about the uses of NVMe. Does it support RAID, for
instance? Since we are not IO-bound at the moment, do you believe NVMe
would relieve our RAM/CPU problems?

I'd be grateful for some pointers to database-related NVMe info.

Many thanks
Rory



Re: Server upgrade advice

2019-03-05 Thread Kenneth Marshall
> 
> > Is there a reason not to consider an all flash solution? The AMD EPYC
> > processor series supports enough NVMe channels to support your sizing.
> > The 7401P single processor is a good value proposition.
> 
> Hi Ken
> 
> Thanks very much for your response.
> 
> I'm completely naive about the uses of NVMe. Does it support RAID, for
> instance? Since we are not IO-bound at the moment, do you believe NVMe
> would relieve our RAM/CPU problems?
> 
> I'd be grateful for some pointers to database-related NVMe info.
> 
> Many thanks
> Rory

Hi Rory,

NVMe is just a block device. It does have much, much higher I/O
bandwidth than standard disks, 4GB/s versus 100MB/s. I am just starting
with NVMe and SSD for databases. Hopefully, some other more experienced
users can chime in.

Regards,
Ken



Re: [External] LIMIT not showing all results

2019-03-05 Thread Casey Deccio



> On Mar 5, 2019, at 11:55 AM, Tom Lane  wrote:
> 
> Matthew Pounsett  writes:
>> On Tue, 5 Mar 2019 at 12:54, Tom Lane  wrote:
>>> Given that (a) this was triggered by a server migration and (b)
>>> the leading column of the index looks like it's probably varchar,
>>> I'm suspicious that the new server has different collation behavior.
> 
>> The migration in question was an rsync from a Debian 9 box
>> running 9.4.19-0+deb8u1 to a FreeBSD 11 box  running 9.4.20.
> 
> Yeah, that would fit the theory :-(.  Debian would be using glibc
> and FreeBSD would not be.  If you were using C collation in the
> database, you'd be all right because that's standardized, but I'll
> bet you were using something else.  What does psql \l show for the
> "collate" setting of this database?  (Or, if by chance you had an
> explicit COLLATE setting on the column in question, what's that?)
> 
> In any case, you should be reindexing any indexes on textual columns
> that were not using "C" collation; none of them can be trusted.
> The system catalogs should be OK.

Many thanks to you and everyone who helped with this issue, with informative 
and actionable advice.  Reindexing worked like a champ, and we seem to be back 
in business.

Casey
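For anyone else hitting this, the check and fix Tom describes can be sketched as follows (REINDEX takes exclusive locks, so schedule it with care; the database name is a placeholder):

```sql
-- Show the database default collation (what psql \l reports as "Collate");
-- anything other than "C" or "POSIX" is OS-dependent:
SELECT datname, datcollate
FROM pg_database
WHERE datname = current_database();

-- Rebuild every index in the current database; coarser than picking out
-- only the indexes on textual columns, but guaranteed to cover them:
REINDEX DATABASE your_database_name;  -- placeholder: use your own DB name
```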


Re: [External] LIMIT not showing all results

2019-03-05 Thread Matthew Pounsett
On Tue, 5 Mar 2019 at 13:55, Tom Lane  wrote:

>
> Yeah, that would fit the theory :-(.  Debian would be using glibc
> and FreeBSD would not be.  If you were using C collation in the
> database, you'd be all right because that's standardized, but I'll
> bet you were using something else.  What does psql \l show for the
> "collate" setting of this database?  (Or, if by chance you had an
> explicit COLLATE setting on the column in question, what's that?)
>

All of the databases are using en_US.UTF-8, which is (I think) the default
these days for most distributions, isn't it?

So yeah.. that would be it.  Thanks for your help.

The rsync migration was because we needed to do a cross-country copy before
putting the original DB server on a truck, but we couldn't get
pg_basebackup to complete the 22TB sync without dying.  rsync was
restartable, so we went that route instead.  Now that the two copies are
physically next to each other again, after we do a rebuild of the original
server I'll be syncing the data back (this time using pg_basebackup and
replication).  I *assume* we shouldn't expect similar collation problems
replicating data that way, but it seems prudent to check.
Should we?


Re: [External] LIMIT not showing all results

2019-03-05 Thread Tom Lane
Matthew Pounsett  writes:
> On Tue, 5 Mar 2019 at 13:55, Tom Lane  wrote:
>> Yeah, that would fit the theory :-(.  Debian would be using glibc
>> and FreeBSD would not be.

> The rsync migration was because we needed to do a cross-country copy before
> putting the original DB server on a truck, but we couldn't get
> pg_basebackup to complete the 22TB sync without dying.  rsync was
> restartable, so we went that route instead.  Now that the two copies are
> physically next to each other again, after we do a rebuild of the original
> server I'll be syncing the data back (this time using pg_basebackup and
> replication).  I *assume* we shouldn't expect similar collation problems
> replicating data that way, but it seems prudent to check.
> Should we?

Uh, yeah you should.  The point here is that Debian and FreeBSD
have different ideas of what en_US.UTF-8 sort order means, so that
an index that's correctly ordered by the lights of one system
is incorrect (corrupt) according to the other.

If you're planning to install (the same version of) FreeBSD on
the original server hardware, then rsync'ing back from the new
system should be fine.  But Debian<->FreeBSD is gonna be trouble
in either direction.

We generally don't recommend physical database transfers across
OS boundaries, and this is the main reason why.

regards, tom lane



Re: [External] LIMIT not showing all results

2019-03-05 Thread Matthew Pounsett
On Tue, 5 Mar 2019 at 18:09, Tom Lane  wrote:

>
> If you're planning to install (the same version of) FreeBSD on
> the original server hardware, then rsync'ing back from the new
> system should be fine.  But Debian<->FreeBSD is gonna be trouble
> in either direction.
>

But I'm specifically NOT talking about doing an rsync to get the data
back.. the plan is to use in-protocol replication.  Maybe that's a
distinction without a difference, but that's why I brought it up.

The replication documentation, and more specifically the pg_basebackup
documentation, makes no mention of cross-OS replication as being a problem
for any reason.  If that is expected to be a problem, then perhaps that
should be updated?


write on standby

2019-03-05 Thread Julie Nishimura
Hello there,
Is it possible for a test app to connect to the standby DB of an
active-standby DB pair, while both continue to be connected and
replicating? What if it needs to write temp tables that are later dropped?
Can it be done in a hot standby scenario on the standby? Version is 9.6.2.

Thanks


Re: [External] LIMIT not showing all results

2019-03-05 Thread Tom Lane
Matthew Pounsett  writes:
> On Tue, 5 Mar 2019 at 18:09, Tom Lane  wrote:
>> If you're planning to install (the same version of) FreeBSD on
>> the original server hardware, then rsync'ing back from the new
>> system should be fine.  But Debian<->FreeBSD is gonna be trouble
>> in either direction.

> But I'm specifically NOT talking about doing an rsync to get the data
> back.. the plan is to use in-protocol replication.  Maybe that's a
> distinction without a difference, but that's why I brought it up.

Won't help: all forms of physical replication that we support are just
going to try to copy the indexes bit-for-bit.  You could maybe get
away with it if you were using logical replication, but the time penalty
would be severe.

> The replication documentation, and more specifically the pg_basebackup
> documentation, makes no mention of cross-OS replication as being a problem
> for any reason.  If that is expected to be a problem, then perhaps that
> should be updated?

Hmm, I'm pretty sure it's documented somewhere, but maybe not in the
places you looked ...

regards, tom lane



Re: [External] LIMIT not showing all results

2019-03-05 Thread Adrian Klaver

On 3/5/19 3:18 PM, Matthew Pounsett wrote:
> On Tue, 5 Mar 2019 at 18:09, Tom Lane wrote:
>> If you're planning to install (the same version of) FreeBSD on
>> the original server hardware, then rsync'ing back from the new
>> system should be fine.  But Debian<->FreeBSD is gonna be trouble
>> in either direction.
>
> But I'm specifically NOT talking about doing an rsync to get the data
> back.. the plan is to use in-protocol replication.  Maybe that's a
> distinction without a difference, but that's why I brought it up.
>
> The replication documentation, and more specifically the pg_basebackup
> documentation, makes no mention of cross-OS replication as being a
> problem for any reason.  If that is expected to be a problem, then
> perhaps that should be updated?


Generally covered under:

https://www.postgresql.org/docs/10/warm-standby.html#STANDBY-PLANNING

"It is usually wise to create the primary and standby servers so that 
they are as similar as possible, at least from the perspective of the 
database server."


You are using binary replication so binary differences come into play.

That is why later versions(10+) grew logical replication:

https://www.postgresql.org/docs/10/logical-replication.html







--
Adrian Klaver
adrian.kla...@aklaver.com



Re: write on standby

2019-03-05 Thread Adrian Klaver

On 3/5/19 3:34 PM, Julie Nishimura wrote:
> Hello there,
> Is it possible for a test app to connect to the standby dB of an
> active-Standby dB pair?  While both continue to be connected and
> replicating? What if it’s needed to write tmp tables that are later
> dropped? Can it be done in hot standby scenario on standby? Version is 9.6.2


No:

https://www.postgresql.org/docs/9.6/hot-standby.html

26.5.1. User's Overview

In particular:

"Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT. This 
restriction applies even to temporary tables, because carrying out these 
operations would require updating the system catalog tables."
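The restriction is easy to confirm on the standby itself; any attempt at DDL there fails, e.g. something like:

```sql
-- On a 9.6 hot standby, all DDL is rejected, even for temporary tables:
CREATE TEMP TABLE scratch (id int);
-- expected to fail with an error along the lines of
-- "cannot execute CREATE TABLE in a read-only transaction"
```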




> Thanks



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: [External] LIMIT not showing all results

2019-03-05 Thread Matthew Pounsett
On Tue, 5 Mar 2019 at 18:39, Adrian Klaver 
wrote:

> >
> > The replication documentation, and more specifically the pg_basebackup
> > documentation, makes no mention of cross-OS replication as being a
> > problem for any reason.  If that is expected to be a problem, then
> > perhaps that should be updated?
>
> Generally covered under:
>
> https://www.postgresql.org/docs/10/warm-standby.html#STANDBY-PLANNING
>
> "It is usually wise to create the primary and standby servers so that
> they are as similar as possible, at least from the perspective of the
> database server."
>

Nothing in that paragraph says to me that I'm going to have problems as a
result of differences in postgres software dependencies.  It's a pretty
vague warning that seems to imply that as long as your hardware
architecture and filesystem layout are identical there shouldn't be any
issues.

Thanks for your clarification, though.  We'll have to take that into
account in our migration plan back to the original server.  I don't think
we can afford the downtime for a dump/restore between systems, so we'll
likely just end up regenerating indexes again, before cutting the
application over to the other DB server.

That is why later versions(10+) grew logical replication:
> https://www.postgresql.org/docs/10/logical-replication.html


That seems promising, but we'd have to deal with the upgrade to 10.x
first.  And presumably logical replication would allow more freedom in
replicating between mismatched versions as well... possibly as a
low-downtime migration path for getting very large databases moved to more
recent releases?


Non-pausing table scan on 9.6 replica?

2019-03-05 Thread Mark Fletcher
Hi All,

On a 9.6 streaming replica, we do table scans for stats and other things.
During these scans, the replication is paused (the 'recovering' postgres
process has 'waiting' appended to it). We're not using transactions with
these scans. Is there anything we can do to prevent the pausing?

Thanks,
Mark


Connection Drop from PostgreSQL Database Server

2019-03-05 Thread Mohit Kumar Sahni
I have done a setup over Azure Cloud for my PostgreSQL DB.
It's an Ubuntu 18.04 LTS (4 vCPU, 8GB RAM) machine with PostgreSQL 9.6.
The problem is that when the connection to the postgresql db stays idle for
some time, let's say 2 to 10 minutes, the connection does not respond: it
doesn't fulfil the request and keeps processing the query.
Same goes with my JAVA Springboot Application. The connection doesn't
respond and the query keeps processing.

This happens randomly such that the timing is not traceable; sometimes it
happens in 2 minutes, sometimes in 10 minutes & sometimes not at all.
I have tried with PostgreSQL configuration file parameters:
tcp_keepalive_idle, tcp_keepalive_interval, tcp_keepalive_count.
Also statement_timeout & session_timeout parameters, but it doesn't change
anything.

Thank you


Re: Connection Drop from PostgreSQL Database Server

2019-03-05 Thread Adrian Klaver

On 3/5/19 4:19 AM, Mohit Kumar Sahni wrote:
> I have done a setup over Azure Cloud for my PostgreSQL DB.
> It's Ubuntu 18.04 LTS (4vCPU, 8GB RAM) machine with PostgreSQL 9.6 version.
> The problem is that when the connection to the postgresql db stays idle
> for some time, let's say 2 to 10 minutes, the connection does not respond:
> it doesn't fulfil the request and keeps processing the query.
> Same goes with my JAVA Springboot Application. The connection doesn't
> respond and the query keeps processing.

Is there anything in the Postgres log that is relevant?

The same for the OS system log?

> This happens randomly such that the timing is not traceable; sometimes it
> happens in 2 minutes, sometimes in 10 minutes & sometimes not at all.
> I have tried with PostgreSQL configuration file parameters:
> tcp_keepalive_idle, tcp_keepalive_interval, tcp_keepalive_count.
> Also statement_timeout & session_timeout parameters, but it doesn't
> change anything.

What values for the above?

> Thank you




--
Adrian Klaver
adrian.kla...@aklaver.com



How to make PAF use psql to login with password

2019-03-05 Thread 范国腾
Hi,

We use PAF (https://dalibo.github.io/PAF/) to manage PostgreSQL.


According to the user's requirement, we could not use trust mode in the
pg_hba.conf file. So when running psql, it asks us to input the password,
and we have to input it manually.


So the pcs status show the following error:

* pgsqld_stop_0 on node1-pri 'unknown error' (1): call=34, status=complete, 
exitreason='Unexpected state for instance "pgsqld" (returned 1)',
last-rc-change='Wed Mar  6 09:09:46 2019', queued=0ms, exec=504ms



The cause of the error is that the PAF script
(/usr/lib/ocf/resource.d/heartbeat/pgsqlms) runs psql, which asks for the
password, and we could not pass the password to the psql command in the
PAF script:


exec $PGPSQL, '--set', 'ON_ERROR_STOP=1', '-qXAtf', $tmpfile,
'-R', $RS, '-F', $FS, '--port', $pgport, '--host', 
$pghost,'--username','sysdba',

Is there any way for us to pass the password to the psql command in the PAF 
script?

We have tried adding "export PGPASSWORD=123456" to /etc/profile, and it
does not work.

thanks








Re: How to make PAF use psql to login with password

2019-03-05 Thread Adrian Klaver

On 3/5/19 5:57 PM, 范国腾 wrote:
> Hi,
>
> We use PAF (https://dalibo.github.io/PAF/) to manage PostgreSQL.
>
> According to user's requirement, we could not use trust mode in the
> pg_hba.conf file. So when running psql, it will ask us to input the
> password and we have to input the password manually.
>
> So the pcs status show the following error:
>
> * pgsqld_stop_0 on node1-pri 'unknown error' (1): call=34,
> status=complete, exitreason='Unexpected state for instance "pgsqld"
> (returned 1)',
>     last-rc-change='Wed Mar  6 09:09:46 2019', queued=0ms, exec=504ms
>
> The cause of the error is that the PAF
> (/usr/lib/ocf/resource.d/heartbeat/pgsqlms) will ask to input the
> password and we could not pass the password to psql command in the PAF
> script.
>
>     exec $PGPSQL, '--set', 'ON_ERROR_STOP=1', '-qXAtf', $tmpfile,
>         '-R', $RS, '-F', $FS, '--port', $pgport, '--host',
>         $pghost,'--username','sysdba',
>
> Is there any way for us to pass the password to the psql command in the
> PAF script?

Use .pgpass?:

https://www.postgresql.org/docs/10/libpq-pgpass.html

> We have tried to add the "export PGPASSWORD=123456" in the /etc/profile
> and it does not work.
>
> thanks









--
Adrian Klaver
adrian.kla...@aklaver.com
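A concrete pgpass setup, using the username and password from the thread (a sketch, not PAF-specific advice; the file must be mode 0600 or libpq silently ignores it, and the PGPASSFILE environment variable can point psql at a non-default location):

```shell
# Each line is hostname:port:database:username:password;
# '*' matches any value in that field.
PGPASSFILE="$HOME/.pgpass"
printf '*:5432:*:sysdba:123456\n' > "$PGPASSFILE"
chmod 600 "$PGPASSFILE"   # libpq refuses to use a group/world-readable pgpass file
```

With this in place, psql run by the resource agent (as the same OS user) should pick up the password without prompting.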



Re: Connection Drop from PostgreSQL Database Server

2019-03-05 Thread Adrian Klaver

On 3/5/19 7:13 PM, Mohit Kumar Sahni wrote:
> Hi Adrian,
>
> I have checked my log files. Here are some of the lines that I think were
> logged at the time of the connection drop:
> 2019-03-03 15:57:33.444 UTC [7146] azpostgres@master_configuration_db LOG:  could not receive data from client: Connection timed out
> 2019-03-03 15:57:33.444 UTC [13161] azpostgres@master_configuration_db LOG:  could not receive data from client: Connection timed out
> 2019-03-03 15:57:33.444 UTC [7571] azpostgres@master_configuration_db LOG:  could not receive data from client: Connection timed out
> 2019-03-03 15:57:33.446 UTC [14459] azpostgres@master_configuration_db LOG:  could not receive data from client: Connection timed out
> 2019-03-03 15:57:33.448 UTC [18010] azpostgres@postgres LOG:  could not receive data from client: Connection timed out

Where is the client running relative to the server?

Does Azure Cloud have a timeout set for connections?

> And for the parameters, I have tried the following values:
> 1. tcp_keepalive_idle: 1 & 2700
> 2. tcp_keepalive_interval: 1 & 2600
> 3. tcp_keepalive_count: 0 & 1
> 4. session_timeout: 2
>
> None of these values solved the problem.
>
> Thanks
> Mohit Kumar Sahni







--
Adrian Klaver
adrian.kla...@aklaver.com



Re: write on standby

2019-03-05 Thread Andreas Kretschmer




On 06.03.19 at 00:34, Julie Nishimura wrote:
> Hello there,
> Is it possible for a test app to connect to the standby dB of an
> active-Standby dB pair?

that's possible, but ...

> While both continue to be connected and replicating? What if it’s
> needed to write tmp tables that are later dropped?

... that's not. Not with streaming replication, but ...

> Can it be done in hot standby scenario on standby? Version is 9.6.2

... it would be possible with logical replication (trigger-based
solutions like Slony or Londiste, or logical replication using pglogical
from us, 2ndQ).

https://www.2ndquadrant.com/en/resources/pglogical/

You can also use BDR2 with PG9.6, but this is only available for our
customers.



Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: Non-pausing table scan on 9.6 replica?

2019-03-05 Thread Andreas Kretschmer




On 06.03.19 at 01:26, Mark Fletcher wrote:
> Hi All,
>
> On a 9.6 streaming replica, we do table scans for stats and other
> things. During these scans, the replication is paused (the
> 'recovering' postgres process has 'waiting' appended to it). We're not
> using transactions with these scans. Is there anything we can do to
> prevent the pausing?

Have you set `max_standby_streaming_delay`? The default is 30
seconds, which means that this will be the maximum time allowed for a
replication lag caused by a conflicting query.
You can use `hot_standby_feedback = on`, but the downside will be more
bloat on the tables.
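Both settings go in postgresql.conf on the standby; an illustrative sketch (the values are examples, not recommendations, and a reload is enough to apply them):

```ini
# postgresql.conf on the 9.6 standby
max_standby_streaming_delay = 300s  # let conflicting queries run up to 5 min,
                                    # at the cost of up to 5 min of apply lag
hot_standby_feedback = on           # primary retains row versions that standby
                                    # queries still need; avoids the conflict,
                                    # but can cause extra bloat on the primary
```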



Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: write on standby

2019-03-05 Thread Julie Nishimura
Thank you. Are you going to have any presentations on Postgresql conference in 
NYC soon?


From: Andreas Kretschmer 
Sent: Tuesday, March 5, 2019 9:16 PM
To: pgsql-general@lists.postgresql.org
Subject: Re: write on standby



Am 06.03.19 um 00:34 schrieb Julie Nishimura:
> Hello there,
> Is it possible for a test app to connect to the standby dB of an
> active-Standby dB pair?

that's possible, but ...


> While both continue to be connected and replicating? What if it’s
> needed to write tmp tables that are later dropped?

... that's not. Not with streaming replication, but ...


> Can it be done in hot standby scenario on standby? Version is 9.6.2
>

... it would be possible with logical replication (trigger based
>solutions like slony or londiste or logical replication using pglogical
from us, 2ndQ).

https://www.2ndquadrant.com/en/resources/pglogical/


You can also use BDR2 with PG9.6, but this is only available for our
customers.


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: write on standby

2019-03-05 Thread Andreas Kretschmer
On 6 March 2019 06:26:45 CET, Julie Nishimura  wrote:
>Thank you. Are you going to have any presentations on Postgresql
>conference in NYC soon?
>
>
>From: Andreas Kretschmer 
>Sent: Tuesday, March 5, 2019 9:16 PM
>To: pgsql-general@lists.postgresql.org
>Subject: Re: write on standby
>
>
>
>Am 06.03.19 um 00:34 schrieb Julie Nishimura:
>> Hello there,
>> Is it possible for a test app to connect to the standby dB of an
>> active-Standby dB pair?
>
>that's possible, but ...
>
>
>> While both continue to be connected and replicating? What if it’s
>> needed to write tmp tables that are later dropped?
>
>... that's not. Not with streaming replication, but ...
>
>
>> Can it be done in hot standby scenario on standby? Version is 9.6.2
>>
>
>... it would be possible with logical replication (trigger based
>solutions like slony or londiste or logical replication using pglogical
>from us, 2ndQ).
>
>https://www.2ndquadrant.com/en/resources/pglogical/
>
>
>You can also use BDR2 with PG9.6, but this is only available for our
>customers.
>
>
>Regards, Andreas
>
>--
>2ndQuadrant - The PostgreSQL Support Company.
>www.2ndQuadrant.com

Hi,

I think so. If you want, I can organize a contact with someone from our company.


Andreas (from Germany)
-- 
2ndQuadrant - The PostgreSQL Support Company



Re: Non-pausing table scan on 9.6 replica?

2019-03-05 Thread Mark Fletcher
Thank you for responding to my email.

On Tue, Mar 5, 2019 at 9:20 PM Andreas Kretschmer 
wrote:

>
> have you set `max_standby_streaming_delay`? The default is 30
> seconds, which means that this will be the maximum time allowed for a
> replication lag caused by a conflicting query.
>

Yes, we've bumped that up a lot.


> You can use `hot_standby_feedback = on`, but the downside will be more
> bloat on the tables.

I'm not sure I understand. I'm not worried about the query being cancelled
on the replica. max_standby_streaming_delay fixes that for us. I'm worried
about the streaming replication being paused while this table scan is
running. If the table scan takes several minutes, then the replica becomes
several minutes out of sync with the master. I'd prefer that not to happen
and I'm wondering if there's a way to do that.

Thanks,
Mark


Re: Non-pausing table scan on 9.6 replica?

2019-03-05 Thread Andreas Kretschmer




On 06.03.19 at 06:41, Mark Fletcher wrote:
> Thank you for responding to my email.
>
> On Tue, Mar 5, 2019 at 9:20 PM Andreas Kretschmer wrote:
>> have you set `max_standby_streaming_delay`? The default is 30
>> seconds, which means that this will be the maximum time allowed for a
>> replication lag caused by a conflicting query.
>
> Yes, we've bumped that up a lot.

I thought so.

>> You can use `hot_standby_feedback = on`, but the downside will be more
>> bloat on the tables.
>
> I'm not sure I understand. I'm not worried about the query being
> cancelled on the replica. max_standby_streaming_delay fixes that for
> us. I'm worried about the streaming replication being paused while
> this table scan is running. If the table scan takes several minutes,
> then the replica becomes several minutes out of sync with the master.
> I'd prefer that not to happen and I'm wondering if there's a way to do
> that.

You can choose: max_standby_streaming_delay, with delay in the streaming
(hence the name), or hot_standby_feedback, with the downside of possibly
more bloat.



Regards, Andreas


--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: Non-pausing table scan on 9.6 replica?

2019-03-05 Thread Sameer Kumar
On Wed, Mar 6, 2019 at 1:41 PM Mark Fletcher  wrote:

> Thank you for responding to my email.
>
> On Tue, Mar 5, 2019 at 9:20 PM Andreas Kretschmer 
> wrote:
>
>>
>> have you set `max_standby_streaming_delay`? The default is 30
>> seconds, which means that this will be the maximum time allowed for a
>> replication lag caused by a conflicting query.
>>
>
> Yes, we've bumped that up a lot.
>

Did you encounter cases where the queries were canceled on standby because
of streaming replication? Did you face any kind of recovery conflict
because of which you bumped this up?



>> You can use `hot_standby_feedback = on`, but the downside will be more
>> bloat on the tables.
>
> I'm not sure I understand. I'm not worried about the query being
> cancelled on the replica. max_standby_streaming_delay fixes that for us.
> I'm worried about the streaming replication being paused while this table
> scan is running. If the table scan takes several minutes, then the replica
> becomes several minutes out of sync with the master. I'd prefer that not to
> happen and I'm wondering if there's a way to do that.
>

Typically all operations on the master are replicated as soon as possible,
and the standby applies them as it receives them. There could be situations
where an operation from the master cannot be applied because it would
conflict with a running query, e.g. you fired a SELECT (a long-running
report) and in the meantime someone updated a row. Because of MVCC, if your
SELECT query were running on the master it would still get a consistent
result; the same holds on the standby, which also maintains the older row
versions for a consistent result.
Then VACUUM gets fired, by virtue of auto-vacuum. If a SELECT is running on
the master, the vacuum will factor in the older row versions that currently
running SELECT queries require and will not remove them. But the master has
no idea what is happening on the standby, and will remove rows which are
used/required by a SELECT on the standby. When this VACUUM gets replicated,
the standby sees a conflict between the ongoing query and the replicated
VACUUM operation, so replay is stalled (by virtue of
max_standby_streaming_delay). Hence you end up with replication delay. A
much smarter thing would be if the master knew which row versions should
not be removed (because they are in use on the standby); that is what
"hot_standby_feedback" offers.

But conflict on standby could also happen because of other operations e.g.
a query is reading from a table on standby and you did a DDL operation on
the master that modifies the table. In that case an exclusive lock will be
acquired on master, but when that gets replicated it will result in
conflict and standby will stall replication.

If your priority is to keep the replica as close as possible to the master,
then disable max_standby_streaming_delay (set it to 0, so conflicting
standby queries are cancelled immediately).
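Whichever trade-off you pick, the resulting apply lag is easy to watch; a sketch for 9.6, run on the standby:

```sql
-- Approximate apply lag: time since the last replayed transaction.
-- (Can be NULL right after startup, or stale when no new WAL is arriving.)
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;
```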



> Thanks,
> Mark
>
>


query has no destination for result data

2019-03-05 Thread Rob Sargent
I’m using 10.7. Does an empty result set generate this error, by any chance?

One of my plpgsql functions is now throwing this error:
select * from genome_threshold_mono('11-O3C.pbs','1-O3C_chr',1.96, 100);
NOTICE:  group id is 5eed8d65-d39a-4f72-97a3-ca391b84880d
NOTICE:  New threshold: 66128154-d128-4e66-bb8e-e9c9ee5ae89d
NOTICE:  doing chrom 11
NOTICE:  2019-03-06 00:21:17.253375-07: markerset id is 
9a8f7487-bd64-4d43-9adf-5ae1c6744e60(1-O3C_chr11.Loci.ld), people 
(5eed8d65-d39a-4f72-97a3-ca391b84880d) id is 11-O3C.pbs
NOTICE:  table "collected" does not exist, skipping
NOTICE:  table "mrkidx" does not exist, skipping
NOTICE:  2019-03-06 00:21:17.295142-07: working with 28607 markers
NOTICE:  2019-03-06 00:21:17.383835-07: added 3514 segments to imputed_pvalue_t
ERROR:  query has no destination for result data
HINT:  If you want to discard the results of a SELECT, use PERFORM instead.
CONTEXT:  PL/pgSQL function optimal_pvalue_mono(text,text,integer,double 
precision,integer) line 65 at SQL statement
SQL statement "insert into goptset
select * from optimal_pvalue_mono(people_name, mvec.name, mvec.chrom, 
conf, maxi)"
PL/pgSQL function genome_pvalue_mono(text,text,double precision,integer) line 
19 at SQL statement
SQL statement "insert into threshold_segment(id,threshold_id, segment_id, 
smooth_pvalue)
 select uuid_generate_v4(), tid, f.segment_id, f.pval
 from genome_pvalue_mono(pbs_name, markers_rx, conf, maxi) as f"
PL/pgSQL function genome_threshold_mono(text,text,double precision,integer) 
line 30 at SQL statement

The code referenced at line 65 is the last line in a "for row in query"
construct, as follows:
54for segp in
55  select s.id, s.firstmarker, s.lastmarker, 
56 v.ipv,
57 array_length(p.probands,1) as pbs,
58 s.lastmarker - s.firstmarker as mks
59  from segment s 
60   join imputed_pvalue_t v on s.id = v.segment_id
61   join probandset p on s.probandset_id = p.id
62   join probandset_group_member m on p.id = m.member_id
63  where s.markerset_id = mkset
64and m.group_id = pbsgid
65  order by ipv, pbs, mks
66LOOP
Plugging in the appropriate values for an example run generates a proper
dataset (~1300 rows) as far as I can tell.
This construct had been working until recent changes but I cannot relate the 
message to any deformity in the current schema or code.
Any pointers appreciated.
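For reference, the usual shape of this error is a bare SELECT inside plpgsql with nowhere to put the row; a minimal sketch (hypothetical statements, reusing the `segment` table from the query above) of the failing pattern and its two fixes, per the HINT:

```sql
-- Inside a plpgsql function body:

-- Fails with "query has no destination for result data":
SELECT count(*) FROM segment;

-- Fix 1: discard the result deliberately:
PERFORM count(*) FROM segment;

-- Fix 2: capture it into a variable declared in the DECLARE section:
SELECT count(*) INTO n FROM segment;
```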