Replication Issues

2018-09-27 Thread bhargav kamineni
Hi Team,

I have configured replication using a slot, but it failed with the error
*pg_basebackup: could not get transaction log end position from
server: ERROR:  requested WAL segment 00012C9D0085 has already
been removed*, which is unexpected because I created the slot on the
master first and then issued the base backup command from the slave's end.
The command is:
/usr/lib/postgresql/9.5/bin/pg_basebackup -U user --max-rate=150M
--progress --verbose --write-recovery-conf --status-interval=10 -D
data_dir -h host_ip -p 5433
These are the settings on my master:
archive_mode=on
archive_command='/bin/true'
wal_keep_segments=512
max_wal_senders=4
Series of steps I have followed:
1) Enabled passwordless authentication between master and slave
2) Created the slot on the master (assuming it will retain WALs regardless
of other settings)
3) Started the base backup from the slave's end
4) Issued a checkpoint at the master's end

My concern here is that the slave should recover WAL through the replication
slot, so why did I get the above ERROR? Why was the requested WAL file
removed even though the slot existed? Could you please let me know why this
happened, or did I miss something?

Thanks,
Bhargav K


Re: Replication Issues

2018-09-28 Thread bhargav kamineni
We are using version 9.5, where the slot option -S of pg_basebackup is not
available. Can I go with the -X stream option?

On Fri 28 Sep, 2018, 12:01 PM Laurenz Albe, 
wrote:

>  bhargav kamineni wrote:
> > Hi Team,
> >
> > I have configured replication using slot ,But it failed by throwing the
> > ERROR  pg_basebackup: could not get transaction log end position from
> server:
> > ERROR:  requested WAL segment 00012C9D0085 has already been
> removed ,
> > which is  unexpected because i have created the slot on master first
> > and then issued the base backup command from slave's end
> > the command is
> > usr/lib/postgresql/9.5/bin/pg_basebackup -U  user --max-rate='150 M'
> --progress --verbose --write-recovery-conf --status-interval='10 s' -D
> data_dir  -h host_ip  -p 5433
> > These  are the settings on my master
> > archive_mode=on
> > archive_command='/bin/true'
> > wal_keep_segments=512
> > max_wal_senders=4
> > Series of steps i have followed :
> > 1) Enabled password less authentication between master and slave
> > 2)created slot on master (assuming it will store wal's regardless of
> other settings)
> > 3)started basebackup from slave's end
> > 4)Issued checkpoint at master's end
> >
> > Here my concern is , slave should recover WAL from replication slot but
> why i
> > got the above ERROR , Why slot removed  the requested wal file , Could
> you please
> > let me know the reason why it happened or did i miss something ?
>
> I guess your base backup took long enough for the required WAL segments to
> be
> removed by the time it was done.
>
> To prevent that, create a replication slot *before* you perform
> pg_basebackup
> and use the options "-S  -X stream" of pg_basebackup.
>
> You then use the same slot in "recovery.conf".
>
> That way you cannot lose any WAL.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
>
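For reference, a minimal sketch of Laurenz's suggestion (slot name, host,
and paths are placeholders; pg_basebackup's -S/--slot option arrived in
9.6, while on 9.5 -X stream still protects the WAL needed during the backup
itself):

-- on the master
SELECT pg_create_physical_replication_slot('standby1_slot');

# on the slave (9.6 or later)
pg_basebackup -h host_ip -p 5433 -U user -D data_dir \
    --write-recovery-conf --progress --verbose \
    -S standby1_slot -X stream

# recovery.conf on the standby then reuses the same slot
primary_slot_name = 'standby1_slot'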


Slot issues

2018-10-14 Thread bhargav kamineni
Hi Team,

I am getting this ERROR while starting my second slave server

PANIC:  too many replication slots active before shutdown
HINT:  Increase max_replication_slots and try again.

max_replication_slots on my master is 2 and one slot is already active for
another slave. Do I need to increase this parameter for the second slave to
work? If so, what is the reason?



Thanks
Bhargav K


Re: [External] Slot issues

2018-10-14 Thread bhargav kamineni
postgres=# select * from pg_replication_slots ;
    slot_name     | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin |  restart_lsn
------------------+--------+-----------+--------+----------+--------+------------+------+--------------+---------------
 replication_slot |        | physical  |        |          | t      |      23185 |      |              | 7642/B07AC830
(1 row)


On Mon, 15 Oct 2018 at 01:34, Vijaykumar Jain  wrote:

> from your master, can you give us the output of
>
> select * from pg_replication_slots;
>
> maybe some stray slot exists that you may not be aware of ?
>
> Regards,
> Vijay
>
>
> On Mon, Oct 15, 2018 at 1:08 AM Vijaykumar Jain 
> wrote:
>
>> ok my bad.
>>
>> i just set one of the 9.6.9 versions to have max_replication_slots = 2 and
>> still I had two slots active.
>>
>> postgres=# table pg_replication_slots;
>>  slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
>> -----------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
>>  a02       |        | physical  |        |          | t      |      13719 |      |              | 2/D3D0      |
>>  a03       |        | physical  |        |          | t      |      13720 |      |              | 2/D3D0      |
>> (2 rows)
>>
>> postgres=# show max_replication_slots;
>>  max_replication_slots
>> ---
>>  2
>> (1 row)
>>
>>
>> yep it should work with 2.
>> Regards,
>> Vijay
>>
>>
>> On Mon, Oct 15, 2018 at 1:02 AM Vijaykumar Jain 
>> wrote:
>>
>>> I guess max_replication_slots has to be greater than active replication
>>> slots.
>>> maybe you need to increase max_replication_slots to 3 to have 2 active
>>> slots.
>>>
>>> Regards,
>>> Vijay
>>>
>>>
>>> On Mon, Oct 15, 2018 at 12:54 AM bhargav kamineni <
>>> bhargavpostg...@gmail.com> wrote:
>>>
>>>>  For 9.x the default was 0. We have set max_replication_slots = 2; one
>>>> is already being used, so Postgres should use the second slot for the
>>>> other server, but I don't know why it's throwing the error.
>>>>
>>>> On Mon, 15 Oct 2018 at 00:32, Vijaykumar Jain 
>>>> wrote:
>>>>
>>>>> I guess max_replication_slots has to >  current total slots in use.
>>>>> (and not >= )
>>>>>
>>>>>
>>>>> https://www.postgresql.org/docs/10/static/runtime-config-replication.html
>>>>>
>>>>> https://github.com/postgres/postgres/blob/d6e98ebe375e115c29028f9bd090f0f7e07e2527/src/backend/replication/slot.c#L1506
>>>>>
>>>>> from the doc, it says defaults are 10, any reason why you have set it
>>>>> to 2. You need to set it at a value higher than number of slots active.
>>>>> also, i guess changing this parameter would require a server restart.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>> Vijay
>>>>>
>>>>>
>>>>> On Mon, Oct 15, 2018 at 12:16 AM bhargav kamineni <
>>>>> bhargavpostg...@gmail.com> wrote:
>>>>>
>>>>>> Hi Team,
>>>>>>
>>>>>> I am getting this ERROR while starting my second slave server
>>>>>>
>>>>>> PANIC:  too many replication slots active before shutdown
>>>>>> HINT:  Increase max_replication_slots and try again.
>>>>>>
>>>>>> max_replication_slots on my master is 2 and one of them is already
>>>>>> active for another slave,
>>>>>> do i need to increase this parameter for the need of working of
>>>>>> another slave ? if so whats the reason ?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Bhargav K
>>>>>>
>>>>>


Re: Slot issues

2018-10-14 Thread bhargav kamineni
I got this on the standby. Could you please explain in detail what you mean
by --*but that *on the standby* haven't set max_replication_slots high
enough*?

Thanks,
Bhargav

On Mon, 15 Oct 2018 at 01:50, Andres Freund  wrote:

> Hi,
>
> On 2018-10-15 00:15:53 +0530, bhargav kamineni wrote:
> > Hi Team,
> >
> > I am getting this ERROR while starting my second slave server
> >
> > PANIC:  too many replication slots active before shutdown
> > HINT:  Increase max_replication_slots and try again.
> >
> > max_replication_slots on my master is 2 and one of them is already active
> > for another slave,
> > do i need to increase this parameter for the need of working of another
> > slave ? if so whats the reason ?
>
> You're getting that error on the standby, not the primary, right?   In
> all likelihood the problem is that you copied over replication slots to
> your standby server, but that *on the standby* haven't set
> max_replication_slots high enough.
>
> Greetings,
>
> Andres Freund
>


Re: [External] Re: Slot issues

2018-10-14 Thread bhargav kamineni
pg_replslot # ls -lrth
total 4.0K
drwx-- 2 postgres postgres 4.0K Oct 14 14:55 replication_slot



On Mon, 15 Oct 2018 at 02:11, Vijaykumar Jain  wrote:

> ok wait, you data directory path may be different :)
> i mean wherever your pg data dir is, from there find pg_replslot folder
> and list the contents of it.
>
> Regards,
> Vijay
>
>
> On Mon, Oct 15, 2018 at 2:10 AM Vijaykumar Jain 
> wrote:
>
>> i guess he is trying to confirm if the value of max_replication_slot is
>> the same on all the master and the standby nodes.
>>
>> also,
>>
>> I am trying to shoot in the dark.
>>
>> can you give the output of (on master)
>> ls /var/lib/postgresql//main/pg_replslot/*
>>
>> also the value of max_replication_slot on the master and all the
>> standby's.
>> also, how are you creating the replication slot?
>>
>> can you share the query?
>>
>> also pls share the content of the recovery.conf on the standby nodes
>> (both currently running and currently not running).
>> you can scribble company secret stuff.
>>
>>
>> Regards,
>> Vijay
>>
>>
>> On Mon, Oct 15, 2018 at 1:58 AM bhargav kamineni <
>> bhargavpostg...@gmail.com> wrote:
>>
>>> I got his on standby,  could you please explain in detail about
>>>  --*but that *on the standby* haven't set max_replication_slots high
>>> enough*.
>>>
>>> Thanks,
>>> Bhargav
>>>
>>> On Mon, 15 Oct 2018 at 01:50, Andres Freund  wrote:
>>>
>>>> Hi,
>>>>
>>>> On 2018-10-15 00:15:53 +0530, bhargav kamineni wrote:
>>>> > Hi Team,
>>>> >
>>>> > I am getting this ERROR while starting my second slave server
>>>> >
>>>> > PANIC:  too many replication slots active before shutdown
>>>> > HINT:  Increase max_replication_slots and try again.
>>>> >
>>>> > max_replication_slots on my master is 2 and one of them is already
>>>> active
>>>> > for another slave,
>>>> > do i need to increase this parameter for the need of working of
>>>> another
>>>> > slave ? if so whats the reason ?
>>>>
>>>> You're getting that error on the standby, not the primary, right?   In
>>>> all likelihood the problem is that you copied over replication slots to
>>>> your standby server, but that *on the standby* haven't set
>>>> max_replication_slots high enough.
>>>>
>>>> Greetings,
>>>>
>>>> Andres Freund
>>>>
>>>


Re: Slot issues

2018-10-14 Thread bhargav kamineni
Yeah, I have used rsync. Got it now; I will increase max_replication_slots
high enough. Thank you Andres Freund :-)

On Mon, 15 Oct 2018 at 02:40, Andres Freund  wrote:

> Hi,
>
> Please try to quote properly.
>
> On 2018-10-15 01:57:53 +0530, bhargav kamineni wrote:
> > I got his on standby,  could you please explain in detail about
> >  --*but that *on the standby* haven't set max_replication_slots high
> enough*
> > .
>
> What is max_replication_slots set to on the new standby?
>
> If you created the new basebackup using rsync, and didn't exclude
> pg_replication_slot, it'll have copied the slots from the primary.  And
> thus needs a high enough max_replication_slots to work with them.
>
> - Andres
>


Re: Slot issues

2018-10-14 Thread bhargav kamineni
>  You probably afterwards want to drop those slots from the new standby.
> See the following section from the docs:

>
https://www.postgresql.org/docs/current/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP-DATA
"It is often a good idea to also omit from the backup the files within the
cluster's pg_replslot/ directory, so that replication slots that exist on
the master do not become part of the backup. Otherwise, the subsequent use
of the backup to create a standby may result in indefinite retention of WAL
files on the standby, and possibly bloat on the master if hot standby
feedback is enabled, because the clients that are using those replication
slots will still be connecting to and updating the slots on the master, not
the standby. Even if the backup is only intended for use in creating a new
master, copying the replication slots isn't expected to be particularly
useful, since the contents of those slots will likely be badly out of date
by the time the new master comes on line."

Since I already synced pg_replslot to the standby, is it okay if I remove
the contents of the pg_replslot directory before starting the PostgreSQL
service?
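A sketch of both options (paths are placeholders for a 9.5 Debian-style
layout, and this assumes the standby has never been started):

# when building the standby, skip the slot directory entirely
rsync -a --exclude='pg_replslot/*' \
    primary:/var/lib/postgresql/9.5/main/ /var/lib/postgresql/9.5/main/

# or, on an already-synced standby that has never been started,
# clear the copied slots before the first start
rm -rf /var/lib/postgresql/9.5/main/pg_replslot/*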


On Mon, 15 Oct 2018 at 02:54, Andres Freund  wrote:

> Hi,
>
> As I just wrote:
> > On Mon, 15 Oct 2018 at 02:40, Andres Freund  wrote:
> > > Please try to quote properly.
>
> On 2018-10-15 02:45:51 +0530, bhargav kamineni wrote:
> > Yeah i have used rsync , Got it now will increase the
> max_replication_slots
> > to high enough , Thank you Andres Freund :-)
>
> You probably afterwards want to drop those slots from the new standby.
> See the following section from the docs:
>
>
> https://www.postgresql.org/docs/current/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP-DATA
> "It is often a good idea to also omit from the backup the files within the
> cluster's pg_replslot/ directory, so that replication slots that exist on
> the master do not become part of the backup. Otherwise, the subsequent use
> of the backup to create a standby may result in indefinite retention of WAL
> files on the standby, and possibly bloat on the master if hot standby
> feedback is enabled, because the clients that are using those replication
> slots will still be connecting to and updating the slots on the master, not
> the standby. Even if the backup is only intended for use in creating a new
> master, copying the replication slots isn't expected to be particularly
> useful, since the contents of those slots will likely be badly out of date
> by the time the new master comes on line."
>
> Greetings,
>
> Andres Freund
>


Extensions

2019-07-11 Thread bhargav kamineni
Hi Team,

I have a doubt regarding extensions. My database has the following
extensions installed:
Name
---
 adminpack
 btree_gist
 dblink
 file_fdw
 fuzzystrmatch
 hstore
 insert_username
 intarray
 pageinspect
 pg_buffercache
 pg_freespacemap
 pg_repack
 pg_stat_statements
 pg_trgm
 pgcrypto
 plpgsql
 unaccent


How can I figure out which of these extensions are actually being used? Is
there any way?
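One hedged starting point (a sketch, not a complete usage audit): list the
objects each extension owns via pg_depend, then check whether any user
objects or tracked function statistics (pg_stat_user_functions, with
track_functions enabled) touch them:

-- objects belonging to each installed extension
SELECT e.extname,
       pg_describe_object(d.classid, d.objid, 0) AS member_object
FROM pg_depend d
JOIN pg_extension e ON e.oid = d.refobjid
WHERE d.refclassid = 'pg_extension'::regclass
  AND d.deptype = 'e'
ORDER BY e.extname;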

Thanks,
Bhargav


pg_terminate_backend not working

2019-09-24 Thread bhargav kamineni
Hi Team ,

We have been observing long-running processes hung in the active state on
one of our databases for the last two days with usename 'xyz'. We tried
killing the respective PIDs with pg_terminate_backend(pid); it returns true
but the queries are not actually terminated. Based on the client address
and port number from pg_stat_activity we found the connection origin and
killed the connection-initiating process on that host, but even after that
the connections from 'xyz' were not terminated.

These connections are not making external network calls; the query
establishes a loopback dblink connection to the same server.

As a workaround we disabled the connection-initiation script for the xyz
user and restarted the database. Right after the restart we observed new
connections from the xyz user, which is unexpected because we had already
disabled the script.

select  pg_postmaster_start_time();
   pg_postmaster_start_time
--
 2019-09-24 06:44:07.41594+00

select usename,backend_start from pg_stat_activity where now()-xact_start >
'3 hours' and usename ='xyz';
 usename |         backend_start
---------+--------------------------------
 xyz     | 2019-09-24 06:44:38.879047+00
 xyz     | 2019-09-24 06:44:38.880501+00
 xyz     | 2019-09-24 06:44:38.881326+00
 xyz     | 2019-09-24 06:44:38.877561+00
 xyz     | 2019-09-24 06:44:38.878336+00

What could be the possible reason for these connections, and why is
pg_terminate_backend(pid) not working?

Thanks,
Bhargav K


Re: pg_terminate_backend not working

2019-09-24 Thread bhargav kamineni
Any thoughts on this ?

On Tue, 24 Sep 2019 at 16:44, bhargav kamineni 
wrote:

> Hi Team ,
>
> We are observing long running process hung in active state on one of our
> db from last two days with usename 'xyz', We have tried killing the
> respective pid with pg_terminate_backned(pid), it's returning true but the
> quires are not actually being terminated. Based on the client address and
> port number from pg_stat_activity  we found the connection origin and
> killed that connection initiation process from that host , even after that
> the connections on db from   'xyz' are not getting terminated.
>
> and these connections are not making network calls a self dblink
> connection is being established in the query.
>
> as a workaround we have disabled the connections initiation script for xyz
> user and restarted the database  right after restart we observed the new
> connections from xyz user to database which is not  expected because as we
> have  already disables the connections initiation script.
>
> select  pg_postmaster_start_time();
>pg_postmaster_start_time
> --
>  2019-09-24 06:44:07.41594+00
>
> select usename,backend_start from pg_stat_activity where now()-xact_start
> > '3 hours' and usename ='xyz';
>  usename | backend_start
>
> -+
>  xyz | 2019-09-24 06:44:38.879047+00 |
>  xyz | 2019-09-24 06:44:38.880501+00 |
>  xyz | 2019-09-24 06:44:38.881326+00 |
>  xyz | 2019-09-24 06:44:38.877561+00 |
>  xyz | 2019-09-24 06:44:38.878336+00 |
>
> what could be the possible reason for these connections and
> pg_termiante_backend(pid) not working ?
>
> Thanks,
> Bhargav K
>
>
>


Re: pg_terminate_backend not working

2019-09-24 Thread bhargav kamineni
> Hi Team ,
>
> We are observing long running process hung in active state on one of
> our db from last two days with usename 'xyz', We have tried killing
> the respective pid with pg_terminate_backned(pid), it's returning
> true but the quires are not actually being terminated. Based on the
> client address and port number from pg_stat_activity  we found the
> connection origin and killed that connection initiation process from
> that host , even after that the connections on db from   'xyz' are
> not getting terminated.

>Your client backend is most likely sitting in a blocked sys call such as
network send, etc.

These queries are not making external network calls; they establish a
loopback dblink connection within the same server.
>Not sure though what is relaunching them after you say they were
disabled some{where,how}.

Is there any workaround to terminate those stuck processes apart from
restarting the database?
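For diagnosis, a sketch (the wait_event columns exist from 9.6 on; the
strace PID is an example). Note that if a backend really is stuck in an
uninterruptible kernel call, signals cannot reach it; at the OS level only
SIGKILL remains, and PostgreSQL treats that as a backend crash and restarts
the whole cluster, so it is no gentler than a restart:

-- see what the stuck backends are waiting on
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE usename = 'xyz';

# on the host, inspect the blocked system call (12345 is an example PID)
strace -p 12345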

On Wed, 25 Sep 2019 at 00:05, Jerry Sievers  wrote:

> bhargav kamineni  writes:
>
> > Hi Team ,
> >
> > We are observing long running process hung in active state on one of
> > our db from last two days with usename 'xyz', We have tried killing
> > the respective pid with pg_terminate_backned(pid), it's returning
> > true but the quires are not actually being terminated. Based on the
> > client address and port number from pg_stat_activity  we found the
> > connection origin and killed that connection initiation process from
> > that host , even after that the connections on db from   'xyz' are
> > not getting terminated.
>
> Your client backend is most likely sitting in a blocked sys call such as
> network send, etc.
>
> Not sure though what is relaunching them after you say they were
> disabled some{where,how}.
>
> HTH
>
> >
> > and these connections are not making network calls a self dblink
> > connection is being established in the query.
> >
> > as a workaround we have disabled the connections initiation script
> > for xyz user and restarted the database  right after restart we
> > observed the new connections from xyz user to database which is not
> > expected because as we have  already disables the connections
> > initiation script.
> >
> > select  pg_postmaster_start_time();
> >pg_postmaster_start_time
> > --
> >  2019-09-24 06:44:07.41594+00
> >
> > select usename,backend_start from pg_stat_activity where now()
> > -xact_start > '3 hours' and usename ='xyz';
> >  usename | backend_start
> >
> > -+
> >  xyz | 2019-09-24 06:44:38.879047+00 |
> >  xyz | 2019-09-24 06:44:38.880501+00 |
> >  xyz | 2019-09-24 06:44:38.881326+00 |
> >  xyz | 2019-09-24 06:44:38.877561+00 |
> >  xyz | 2019-09-24 06:44:38.878336+00 |
> >
> > what could be the possible reason for these connections and
> > pg_termiante_backend(pid) not working ?
> >
> > Thanks,
> > Bhargav K
> >
> >
> >
> >
>
> --
> Jerry Sievers
> Postgres DBA/Development Consulting
> e: postgres.consult...@comcast.net
>


PMChildFlags array

2019-10-03 Thread bhargav kamineni
Hi,

Observed the errors below in the logfile:

2019-09-20 02:00:24.504 UTC,,,99779,,5d73303a.185c3,73,,2019-09-07 04:21:14
UTC,,0,FATAL,XX000,"no free slots in PMChildFlags array",""
2019-09-20 02:00:24.505 UTC,,,109949,,5d8432b8.1ad7d,1,,2019-09-20 02:00:24
UTC,,0,ERROR,58P01,"could not open shared memory segment
""/PostgreSQL.2520932"": No such file or directory",""
2019-09-20 02:00:24.505 UTC,,,109950,,5d8432b8.1ad7e,1,,2019-09-20 02:00:24
UTC,,0,ERROR,58P01,"could not open shared memory segment
""/PostgreSQL.2520932"": No such file or directory",""


What could be the possible reasons for this to occur, and is there any
chance of database corruption after this event?


Regards,
Bhargav


Re: PMChildFlags array

2019-10-03 Thread bhargav kamineni
Any suggestions on this ?

On Thu, 3 Oct 2019 at 16:27, bhargav kamineni 
wrote:

> Hi,
>
> Observed below errors  in logfile
>
> 2019-09-20 02:00:24.504 UTC,,,99779,,5d73303a.185c3,73,,2019-09-07
> 04:21:14 UTC,,0,FATAL,XX000,"no free slots in PMChildFlags array",""
> 2019-09-20 02:00:24.505 UTC,,,109949,,5d8432b8.1ad7d,1,,2019-09-20
> 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment
> ""/PostgreSQL.2520932"": No such file or directory",""
> 2019-09-20 02:00:24.505 UTC,,,109950,,5d8432b8.1ad7e,1,,2019-09-20
> 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment
> ""/PostgreSQL.2520932"": No such file or directory",""
>
>
> what could be the possible reasons for this to occur and is there any
> chance of database corruption after this event ?
>
>
> Regards,
> Bhargav
>
>


Re: PMChildFlags array

2019-10-03 Thread bhargav kamineni
> Hi,
>
> Observed below errors  in logfile
>
> 2019-09-20 02:00:24.504 UTC,,,99779,,5d73303a.185c3,73,,2019-09-07
> 04:21:14 UTC,,0,FATAL,XX000,"no free slots in PMChildFlags
array",""
> 2019-09-20 02:00:24.505 UTC,,,109949,,5d8432b8.1ad7d,1,,2019-09-20
> 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment
> ""/PostgreSQL.2520932"": No such file or directory",""
> 2019-09-20 02:00:24.505 UTC,,,109950,,5d8432b8.1ad7e,1,,2019-09-20
> 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment
> ""/PostgreSQL.2520932"": No such file or directory",""
>

>Postgres version?

PostgreSQL 10.8

>OS and version?

NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"

>What was the database doing just before the FATAL line?

Postgres was rejecting a bunch of connections from a user who has a
connection limit set. That was the FATAL error that I could see in the log
file:
 FATAL,53300,"too many connections for role ""user_app"""

db=# \du user_app
                         List of roles
 Role name |          Attributes           |     Member of
-----------+-------------------------------+--------------------
 user_app  | No inheritance               +| {application_role}
           | 100 connections              +|
           | Password valid until infinity |

> what could be the possible reasons for this to occur and is there any
> chance of database corruption after this event ?

> The source(backend/storage/ipc/pmsignal.c ) says:
>
> "/* Out of slots ... should never happen, else postmaster.c messed up */
>      elog(FATAL, "no free slots in PMChildFlags array");
> "
>
> Someone else will need to comment on what 'messed up' could be

On Thu, 3 Oct 2019 at 18:56, Adrian Klaver 
wrote:

> On 10/3/19 3:57 AM, bhargav kamineni wrote:
> > Hi,
> >
> > Observed below errors  in logfile
> >
> > 2019-09-20 02:00:24.504 UTC,,,99779,,5d73303a.185c3,73,,2019-09-07
> > 04:21:14 UTC,,0,FATAL,XX000,"no free slots in PMChildFlags
> array",""
> > 2019-09-20 02:00:24.505 UTC,,,109949,,5d8432b8.1ad7d,1,,2019-09-20
> > 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment
> > ""/PostgreSQL.2520932"": No such file or directory",""
> > 2019-09-20 02:00:24.505 UTC,,,109950,,5d8432b8.1ad7e,1,,2019-09-20
> > 02:00:24 UTC,,0,ERROR,58P01,"could not open shared memory segment
> > ""/PostgreSQL.2520932"": No such file or directory",""
> >
>
> Postgres version?
>
> OS and version?
>
> What was the database doing just before the FATAL line?
>
> > what could be the possible reasons for this to occur and is there any
> > chance of database corruption after this event ?
>
> The source(backend/storage/ipc/pmsignal.c ) says:
>
> "/* Out of slots ... should never happen, else postmaster.c messed up */
>  elog(FATAL, "no free slots in PMChildFlags array");
> "
>
> Someone else will need to comment on what 'messed up' could be.
>
> >
> >
> > Regards,
> > Bhargav
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: PMChildFlags array

2019-10-03 Thread bhargav kamineni
bhargav kamineni  writes:
> Postgres was rejecting a bunch of connections from a user who is having a
> connection limit set. that was the the FATAL error that i could see in log
> file.
>  FATAL,53300,"too many connections for role ""user_app"""

> db=\du user_app
>List of roles
>   Role name   |  Attributes   | Member of
> --+---+
>  user_app | No inheritance   +| {application_role}
>   | 100 connections  +|
>   | Password valid until infinity |

>Hm, what's the overall max_connections limit?  (I'm wondering
in particular if it's more or less than 100.)

It's set to 500:

show max_connections ;
 max_connections
-----------------
 500
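For monitoring, a hedged sketch that shows how close each role gets to its
per-role limit (rolconnlimit is -1 when unlimited):

SELECT r.rolname, r.rolconnlimit, count(a.pid) AS current_connections
FROM pg_roles r
LEFT JOIN pg_stat_activity a ON a.usename = r.rolname
GROUP BY r.rolname, r.rolconnlimit
ORDER BY current_connections DESC;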


On Thu, 3 Oct 2019 at 22:52, Tom Lane  wrote:

> bhargav kamineni  writes:
> > Postgres was rejecting a bunch of connections from a user who is having a
> > connection limit set. that was the the FATAL error that i could see in
> log
> > file.
> >  FATAL,53300,"too many connections for role ""user_app"""
>
> > db=\du user_app
> >List of roles
> >   Role name   |  Attributes   | Member of
> > --+---+
> >  user_app | No inheritance   +| {application_role}
> >   | 100 connections  +|
> >   | Password valid until infinity |
>
> Hm, what's the overall max_connections limit?  (I'm wondering
> in particular if it's more or less than 100.)
>
> regards, tom lane
>


Re: PMChildFlags array

2019-10-05 Thread bhargav kamineni
Thanks Tom Lane for detailing the issue.
>So ... how many is "a bunch"?
More than 85.

>Looking at the code, it seems like it'd be possible for a sufficiently
>aggressive spawner of incoming connections to reach the
>MaxLivePostmasterChildren limit.  While the postmaster would correctly
>reject additional connection attempts after that, what it would not do
>is ensure that any child slots are left for new parallel worker processes.
>So we could hypothesize that the error you're seeing in the log is from
>failure to spawn a parallel worker process, due to being out of child
>slots.
We have enabled "max_parallel_workers_per_gather = 4". We ran into this
issue 20 days ago.


>However, given that max_connections = 500, MaxLivePostmasterChildren()
>would be 1000-plus.  This would mean that reaching this condition would
>require *at least* 500 concurrent connection-attempts-that-haven't-yet-
>been-rejected, maybe well more than that if you didn't have close to
>500 legitimately open sessions.  That seems like a lot, enough to suggest
>that you've got some pretty serious bug in your client-side logic.

The errors below were observed in the postgres logfile after the crash:

ERROR:  xlog flush request is not satisfied -- seen for a couple of tables;
we initiated VACUUM FULL on those tables and the error went away after that.
ERROR:  right sibling's left-link doesn't match: block 273660 links to
273500 instead of expected 273661 in index -- observed this error while
doing VACUUM FREEZE on the database; we dropped this index and created a
new one.

Observations:

The VACUUM FREEZE ANALYZE job initiated through cron is getting stuck at
the database end; pg_cancel_backend() and pg_terminate_backend() are not
able to terminate those stuck processes, and only restarting the database
clears them. I am thinking this is happening due to corruption (if this is
true, how can I detect it? pg_dump?). Is there any way to overcome this
problem?

Does migrating the database to a new instance (pg_basebackup and switching
over to the new instance) solve this issue?

> Anyway, I think it's clearly a bug that canAcceptConnections() thinks the
> number of acceptable connections is identical to the number of allowed
> child processes; it needs to be less, by the number of background
> processes we want to support.  But it seems like a darn hard-to-hit bug,
> so I'm not quite sure that that explains your observation.

On Fri, 4 Oct 2019 at 03:49, Tom Lane  wrote:

> bhargav kamineni  writes:
> >> What was the database doing just before the FATAL line?
>
> > Postgres was rejecting a bunch of connections from a user who is having a
> > connection limit set. that was the the FATAL error that i could see in
> log
> > file.
> >  FATAL,53300,"too many connections for role ""user_app"""
>
> So ... how many is "a bunch"?
>
> Looking at the code, it seems like it'd be possible for a sufficiently
> aggressive spawner of incoming connections to reach the
> MaxLivePostmasterChildren limit.  While the postmaster would correctly
> reject additional connection attempts after that, what it would not do
> is ensure that any child slots are left for new parallel worker processes.
> So we could hypothesize that the error you're seeing in the log is from
> failure to spawn a parallel worker process, due to being out of child
> slots.
>
> However, given that max_connections = 500, MaxLivePostmasterChildren()
> would be 1000-plus.  This would mean that reaching this condition would
> require *at least* 500 concurrent connection-attempts-that-haven't-yet-
> been-rejected, maybe well more than that if you didn't have close to
> 500 legitimately open sessions.  That seems like a lot, enough to suggest
> that you've got some pretty serious bug in your client-side logic.
>
> Anyway, I think it's clearly a bug that canAcceptConnections() thinks the
> number of acceptable connections is identical to the number of allowed
> child processes; it needs to be less, by the number of background
> processes we want to support.  But it seems like a darn hard-to-hit bug,
> so I'm not quite sure that that explains your observation.
>
> regards, tom lane
>


Re: OID out of range

2020-01-15 Thread bhargav kamineni
Is there any workaround to make it work?

On Thu, 16 Jan 2020 at 00:00, Adrian Klaver 
wrote:

> On 1/15/20 10:18 AM, Konireddy Rajashekar wrote:
> > I am getting ERROR: OID out of range while firing below SQL , what could
> > be the reason? I am joining pg_class and one user_created table to
> > compare tables size.
> >
> > select u.relid, c.relnamespace::regnamespace::text, c.relname,
> >        now() as current_time,
> >        pg_size_pretty(pg_relation_size(c.oid)) as current_size,
> >        pg_size_pretty(u.table_size) as previous_size,
> >        pg_size_pretty(pg_relation_size(c.oid) -
> >                       pg_relation_size(u.table_size)) as diff
> > from user_tables_sizes u
> > join pg_class c on u.relid::bigint = c.oid::bigint
> > where c.relkind = 'r'
> >   and c.relnamespace::regnamespace::text = 'rpx_reporting_stage'
> >   and u.captured_dt::date = current_date - interval '1 days'
> > ERROR:  OID out of range
> > test-# \d user_tables_sizes
>
> Given this:
>
> https://www.postgresql.org/docs/12/datatype-oid.html
> "The oid type is currently implemented as an unsigned four-byte integer. "
>
> I am pretty sure this:
>
> c.oid::bigint
>
> is the problem.
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>
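A possible rewrite, offered as a sketch (it assumes
user_tables_sizes.table_size stores a byte count, as the original query
implies): compare the stored size directly instead of passing it back into
pg_relation_size(), which expects a relation OID and raises "OID out of
range" for bigint values that do not fit in four bytes:

select u.relid, c.relnamespace::regnamespace::text, c.relname,
       pg_size_pretty(pg_relation_size(c.oid)) as current_size,
       pg_size_pretty(u.table_size) as previous_size,
       pg_size_pretty(pg_relation_size(c.oid) - u.table_size) as diff
from user_tables_sizes u
join pg_class c on u.relid::bigint = c.oid::bigint
where c.relkind = 'r'
  and c.relnamespace::regnamespace::text = 'rpx_reporting_stage'
  and u.captured_dt::date = current_date - interval '1 days';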


oldest xmin is far in the past :: BUT xmin is not available in system

2022-04-18 Thread bhargav kamineni
Hi Team,

It seems vacuum is behaving somewhat weirdly on our postgres database; we
are observing the HINTs below in the vacuum logs:

WARNING:  oldest xmin is far in the past

HINT:  Close open transactions soon to avoid wraparound problems.

You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.


Below is the auto-vacuum status on the bloated tables:

=> SELECT schemaname, relname, n_live_tup, n_dead_tup, last_autovacuum
   FROM pg_stat_all_tables
   ORDER BY n_dead_tup
            / (n_live_tup
               * current_setting('autovacuum_vacuum_scale_factor')::float8
               + current_setting('autovacuum_vacuum_threshold')::float8)
            DESC
   LIMIT 10;

 schemaname |    relname    | n_live_tup | n_dead_tup |        last_autovacuum
------------+---------------+------------+------------+-------------------------------
 pg_catalog | pg_statistic  |        136 |      37563 | 2022-04-18 04:00:21.045089+00
 public     | test1         |  209405206 |  126752908 | 2022-04-18 03:59:43.013758+00
 public     | test2         |  513770985 |   49258312 | 2022-04-18 04:00:23.24043+00
 public     | test3         |   90853150 |    4090146 | 2022-04-18 04:00:25.868147+00
 pg_catalog | pg_shdepend   |        153 |         29 | 2022-04-08 12:16:02.816631+00
 pg_catalog | pg_index      |         73 |         18 |
 pg_toast   | pg_toast_2619 |         16 |         12 | 2022-03-13 23:01:54.334003+00
 pg_catalog | pg_class      |        425 |         19 | 2022-03-01 13:15:57.534378+00
 pg_catalog | pg_proc       |       2457 |         48 |
 pg_toast   | pg_toast_2618 |        252 |         10 |


I tried to vacuum the first table, pg_statistic. Below is the log:

postgres=> VACUUM (VERBOSE) pg_statistic;
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.
INFO:  aggressively vacuuming "pg_catalog.pg_statistic"
INFO:  "pg_statistic": found 0 removable, 37699 nonremovable row versions
in 6331 out of 6351 pages
DETAIL:  37563 dead row versions cannot be removed yet, oldest xmin:
648320155
There were 3340 unused item identifiers.
Skipped 0 pages due to buffer pins, 20 frozen pages.
0 pages are entirely empty.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.
INFO:  aggressively vacuuming "pg_toast.pg_toast_2619"
INFO:  "pg_toast_2619": found 0 removable, 16 nonremovable row versions in
3 out of 11 pages
DETAIL:  12 dead row versions cannot be removed yet, oldest xmin: 648320155
There were 11 unused item identifiers.
Skipped 0 pages due to buffer pins, 8 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM



The table is getting vacuumed but the dead tuples cannot be removed because
of *oldest xmin: 648320155*, yet the mentioned xmin is not associated with
any long-running queries, stale replication slots, or prepared
transactions.

*Long running:*

postgres=> SELECT now()-query_start, pid, datname, usename, state, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
    ?column?     |  pid  | datname  | usename  | state  | backend_xmin
-----------------+-------+----------+----------+--------+--------------
 00:00:29.910155 |   539 | postgres |          | active |    832858371
 00:00:23.766305 |  1211 | postgres |          | active |    832858509
 00:00:00.756961 |  2151 | postgres |          | active |    832859484
 00:00:00.060784 | 30833 | postgres | root     | active |    832859508
 00:00:00.004473 | 29270 | postgres | root     | active |    832859508
 00:00:00.009809 | 29271 | postgres | root     | active |    832859508
 00:00:00.015169 | 27145 | postgres | root     | active |    832859508
 00:00:00        |  1450 | postgres | postgres | active |    832859508
 00:00:00.010672 |   544 | postgres | root     | active |    832859508
 00:00:00.034516 | 19940 | postgres | root     | active |    832859508
(10 rows)



*Stale replication slots:*

postgres=> SELECT slot_name, slot_type, database, xmin
FROM pg_replication_slots
ORDER BY age(xmin) DESC;
 slot_name | slot_type | database | xmin
-----------+-----------+----------+------
(0 rows)





*Prepared transactions:*

postgres=> SELECT gid, prepared, owner, database, transaction AS xmin
postgres-> FROM pg_prepared_xacts
postgres-> ORDER BY age(transaction) DESC;
 gid | prepared | owner | database | xmin
-----+----------+-------+----------+------
(0 rows)





Checked for long-running queries on the replica side, but haven't found
any.

postgres=> show hot_standby_feedback ;
 hot_stan

Re: oldest xmin is far in the past :: BUT xmin is not available in system

2022-04-18 Thread bhargav kamineni
Other details:

postgres=> select version();
   version
-
 PostgreSQL 13.5 on aarch64-unknown-linux-gnu, compiled by
aarch64-unknown-linux-gnu-gcc (GCC) 7.4.0, 64-bit
(1 row)

postgres=> select aurora_version();
 aurora_version

 13.5.1
(1 row)


postgres=> \l+
                                                        List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |   Size    | Tablespace |         Description
-----------+----------+----------+-------------+-------------+-----------------------+-----------+------------+------------------------------
 Postgres  | root     | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/root             +| 361 GB    | pg_default |
           |          |          |             |             | root=CTc/root        +|           |            |
           |          |          |             |             | pmm=CTc/root          |           |            |
 Test      | root     | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 8391 kB   | pg_default | default administrative connection database
 rdsadmin  | rdsadmin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | rdsadmin=CTc/rdsadmin | No Access | pg_default |
 template0 | rdsadmin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/rdsadmin          +| 16 MB     | pg_default | unmodifiable empty database
           |          |          |             |             | rdsadmin=CTc/rdsadmin |           |            |
 template1 | root     | UTF8     | en_US.UTF-8 | en_US.UTF-8 | root=CTc/root        +| 8215 kB   | pg_default | default template for new databases
           |          |          |             |             | =c/root               |           |            |
(5 rows)

Executing the vacuum on the entire cluster also gives the same HINTs and
WARNINGs:

WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.
WARNING:  oldest xmin is far in the past

Regards,
BK

On Tue, Apr 19, 2022 at 11:36 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

>
>
> On Monday, April 18, 2022, bhargav kamineni  wrote:
>
>> Hi Team,
>>
>> It seems vacuum is behaving somewhat weird on postgres database ,
>> observing below HINTS on the vacuum logs
>>
>> WARNING:  oldest xmin is far in the past
>>
>> HINT:  Close open transactions soon to avoid wraparound problems.
>>
>> You might also need to commit or roll back old prepared transactions, or
>> drop stale replication slots.
>>
>>
> What version?
>
> What other databases are present?
>
> Others can give better (more detailed/nuanced) guidance but if you can
> just start vacuuming every table in every database manually, you probably
> should just do that.  Vacuum freeze specifically.
>
> David J.
>
>
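A sketch of the manual freeze pass David suggests (host and user are
placeholders; vacuumdb ships with PostgreSQL, and the catalog query shows
which databases hold back the frozen-xid horizon):

-- per-database wraparound headroom
SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;

# freeze every table in every database
vacuumdb --all --freeze --verbose -h <host> -U <user>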


Tool for comparing parameters

2018-10-24 Thread bhargav kamineni
Hi Team,


Is there any open-source or paid tool that compares PostgreSQL database
parameters (NOT the schema)?
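Absent a dedicated tool, a minimal sketch that diffs the effective settings
of two servers (host names are placeholders):

psql -h host1 -At -c "SELECT name || '=' || setting FROM pg_settings ORDER BY name" > host1.params
psql -h host2 -At -c "SELECT name || '=' || setting FROM pg_settings ORDER BY name" > host2.params
diff host1.params host2.params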


Thanks
Bhargav K


order of reading the conf files

2018-12-05 Thread bhargav kamineni
Hi,

May I know the order in which Postgres reads the configuration files
(postgresql.conf, postgresql.auto.conf, pg_hba.conf)?
And how does the postmaster fork backend processes? Can we see that forking
in the logfile?



Thanks,
Banu.


Re: order of reading the conf files

2018-12-06 Thread bhargav kamineni
Thanks a lot Stephen.

On Thu, Dec 6, 2018 at 8:53 PM Stephen Frost  wrote:

> Greetings,
>
> * Thomas Kellerer (spam_ea...@gmx.net) wrote:
> > Stephen Frost schrieb am 06.12.2018 um 15:52:
> > > The regular postgresql.conf file is read first, then
> > > postgresql.auto.conf and then pg_hba.conf and pg_ident.conf.  We can't
> > > read pg_hba.conf/pg_ident.conf before reading postgresql.conf and
> > > postgresql.auto.conf because their location is specified in
> > > postgresql.conf/postgresql.auto.conf.
> >
> > When are the .conf files read that are included from within
> "postgresql.conf"?
> > The manual is not clear about that.
>
> At the time we hit the 'include' line.
>
> > Are they processed before "postgresql.auto.conf" or after?
>
> postgresql.auto.conf is always last.
>
> Thanks!
>
> Stephen
>
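To see where each effective setting ended up coming from once all files
have been read, a small sketch (sourcefile/sourceline are standard
pg_settings columns, though sourcefile may show as NULL without sufficient
privileges; new backend forks show up as connection lines in the log when
log_connections is on):

SELECT name, setting, sourcefile, sourceline
FROM pg_settings
WHERE sourcefile IS NOT NULL
ORDER BY sourcefile, sourceline;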


Temp tables

2018-12-09 Thread bhargav kamineni
Hi,

What happens when we create and insert/update data in TEMP tables? Does
that data really get written at the disk level or only at the buffer level,
and what happens to it after the transaction completes?
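For reference, a small sketch of the lifetime controls (temp-table pages go
through backend-local temp_buffers rather than shared buffers and are not
WAL-logged; the ON COMMIT clause decides what survives the transaction):

BEGIN;
CREATE TEMP TABLE scratch (id int) ON COMMIT DROP;
INSERT INTO scratch VALUES (1), (2);
SELECT count(*) FROM scratch;  -- visible inside the transaction
COMMIT;
-- scratch is gone here; with the default ON COMMIT PRESERVE ROWS it would
-- instead persist until the session ends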



Thanks
Banu


Capacity Planning

2019-01-19 Thread bhargav kamineni
Hi Team,


What is the checklist for doing capacity planning for a PostgreSQL
server?

Regards,
BK.


Re: Capacity Planning

2019-01-20 Thread bhargav kamineni
Thank-you Allan.

On Sun, Jan 20, 2019 at 1:59 PM Ron  wrote:

> IOW, the same as every other RDBMS...
>
> On 1/20/19 1:29 AM, Allan Kamau wrote:
>
> Some of the aspects you may want to consider are:
> 1) Per unit time (day or week), how much data will be persisted to the
> database, number of records and total size of the data.
> 2) How much of these data will be updated and how frequently in a given
> time unit (day or week).
> 3) Will the databases hosted on the server be for OLTP (OnLine
> Transactional Processing) or OLAP (OnLine Analytical Processing) or a
> combination of the two.
> 4) In case of hardware failure or electrical outage or outage due to other
> factors including human activity, how much loss of data (and time) can be
> tolerated, see "https://www.postgresql.org/docs/11/backup.html" and
> "https://www.postgresql.org/docs/11/high-availability.html".
> 5) The size of the budget for hardware and manpower.
>
> Allan.
>
> On Sat, Jan 19, 2019 at 1:38 PM bhargav kamineni 
> wrote:
>
>> Hi Team,
>>
>>
>> What are the check lists for doing capacity planning for a postgresql
>> server ?
>>
>> Regards,
>> BK.
>>
>
> --
> Angular momentum makes the world go 'round.
>


log_min_duration_statement

2019-01-24 Thread bhargav kamineni
Hi Team,,

I have set log_min_duration_statement = 1000, but when I observe the log
file I can see queries that ran below 1000 ms also getting logged, for
instance:
duration: 0.089 ms
duration: 0.175 ms
duration: 0.139 ms
duration: 0.451 ms
duration: 0.136 ms
duration: 0.340 ms
duration: 0.150 ms
duration: 0.440 ms
duration: 0.338 ms
duration: 0.156 ms
duration: 0.396 ms
duration: 0.221 ms
duration: 0.161 ms

These are my log settings:

  name   |  setting
-+---
 log_autovacuum_min_duration | 0
 log_checkpoints | on
 log_connections | on
 log_destination | syslog
 log_directory   | log
 log_disconnections  | on
 log_duration| off
 log_error_verbosity | default
 log_executor_stats  | off
 log_file_mode   | 0600
 log_filename| postgresql-%Y-%m-%d_%H%M%S.log
 log_hostname| off
 log_line_prefix |
appname=%a,user=%u,db=%d,pid=%p,txid=%x,vtxid=%v
 log_lock_waits  | on
 log_min_duration_statement  | 1000
 log_min_error_statement | error
 log_min_messages| error
 log_parser_stats| off
 log_planner_stats   | off
 log_replication_commands| off
 log_rotation_age| 1440
 log_rotation_size   | 10240
 log_statement   | all
 log_statement_stats | off
 log_temp_files  | 0
 log_timezone| Navajo
 log_truncate_on_rotation| off
 logging_collector   | on
 max_logical_replication_workers | 4
 syslog_facility | local4
 syslog_ident| postgres
 syslog_sequence_numbers | on
 syslog_split_messages   | on
 wal_log_hints   | off

Regards,
Bhargav K


Re: log_min_duration_statement

2019-01-24 Thread bhargav kamineni
> Hi Team,,
>
> I have set log_min_duration_statement=1000 as my configuration but when
> i observe log file i could see queries that ran below 1000ms are also
> getting logged , for instance
> duration: 0.089 ms
>   duration: 0.175 ms
>   duration: 0.139 ms
>   duration: 0.451 ms
> duration: 0.136 ms
>   duration: 0.340 ms
>   duration: 0.150 ms
> duration: 0.440 ms
> duration: 0.338 ms
>   duration: 0.156 ms
> tion: 0.396 ms
>   duration: 0.221 ms
> duration: 0.161 ms
>   *

>The above would not happen to be autovacuum statements?
>You have log_autovacuum_min_duration = 0 below.

Nope, those are not autovacuum-related logs; they are related to queries
only.
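One thing worth ruling out (an assumption on my part, not something
confirmed in this thread) is a per-database or per-role override of the
logging settings, e.g. an ALTER ROLE ... SET log_duration = on; a sketch to
check:

SELECT coalesce(d.datname, '<all databases>') AS database,
       coalesce(r.rolname, '<all roles>') AS role,
       s.setconfig
FROM pg_db_role_setting s
LEFT JOIN pg_database d ON d.oid = s.setdatabase
LEFT JOIN pg_roles r ON r.oid = s.setrole;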

On Thu, 24 Jan 2019 at 20:53, Adrian Klaver 
wrote:

> On 1/24/19 6:23 AM, bhargav kamineni wrote:
> > Hi Team,,
> >
> > I have set log_min_duration_statement=1000 as my configuration but when
> > i observe log file i could see queries that ran below 1000ms are also
> > getting logged , for instance
> > duration: 0.089 ms
> >   duration: 0.175 ms
> >   duration: 0.139 ms
> >   duration: 0.451 ms
> > duration: 0.136 ms
> >   duration: 0.340 ms
> >   duration: 0.150 ms
> > duration: 0.440 ms
> > duration: 0.338 ms
> >   duration: 0.156 ms
> > tion: 0.396 ms
> >   duration: 0.221 ms
> > duration: 0.161 ms
> >   *
>
> The above would not happen to be autovacuum statements?
>
> You have log_autovacuum_min_duration = 0 below.
>
> > these are my logsettings
> >
> >name   |  setting
> >
> -+---
> >   log_autovacuum_min_duration | 0
> >   log_checkpoints | on
> >   log_connections | on
> >   log_destination | syslog
> >   log_directory   | log
> >   log_disconnections  | on
> >   log_duration| off
> >   log_error_verbosity | default
> >   log_executor_stats  | off
> >   log_file_mode   | 0600
> >   log_filename| postgresql-%Y-%m-%d_%H%M%S.log
> >   log_hostname| off
> >   log_line_prefix |
> > appname=%a,user=%u,db=%d,pid=%p,txid=%x,vtxid=%v
> >   log_lock_waits  | on
> >   log_min_duration_statement  | 1000
> >   log_min_error_statement | error
> >   log_min_messages| error
> >   log_parser_stats| off
> >   log_planner_stats   | off
> >   log_replication_commands| off
> >   log_rotation_age| 1440
> >   log_rotation_size   | 10240
> >   log_statement   | all
> >   log_statement_stats | off
> >   log_temp_files  | 0
> >   log_timezone| Navajo
> >   log_truncate_on_rotation| off
> >   logging_collector   | on
> >   max_logical_replication_workers | 4
> >   syslog_facility | local4
> >   syslog_ident| postgres
> >   syslog_sequence_numbers | on
> >   syslog_split_messages   | on
> >   wal_log_hints   | off
> >
> > Regards,
> > Bhargav K
> >
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>