pgbouncer with ldap

2019-09-09 Thread Ayub M
Hello, I am trying to set up pgbouncer with LDAP to connect to RDS
Postgres. I downloaded the latest version of pgbouncer (1.11) and am using
the hba auth_type. I get the error below when starting the pgbouncer
daemon. Am I missing something?

[centos@ip-x.x.x.x etc]$ pgbouncer -V
PgBouncer version 1.11.0

[centos@ip-x.x.x.x etc]$ cat pg_hba.conf
host all user1 0.0.0.0/0 md5
host all all 0.0.0.0/0 ldap ldapserver=ldap-server.com ldapprefix="cn="
ldapsuffix=", dc=group, dc=com"

[centos@ip-x.x.x.x etc]$ cat pgbouncer.ini | grep -v '^$' | grep -v '^;'
[databases]
db1 = host=xxx.rds.amazonaws.com port=5439 dbname=db1
[users]
[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = *
listen_port = 5439
auth_type = hba
auth_file = /home/centos/pgbouncer-1.11.0/etc/userlist.txt
auth_hba_file = /home/centos/pgbouncer-1.11.0/etc/pg_hba.conf

[centos@ip-x.x.x.x etc]$  pgbouncer -d pgbouncer.ini
2019-09-09 06:57:54.447 UTC [9465] WARNING hba line 2: unsupported method:
buf=ldap
2019-09-09 06:57:54.448 UTC [9465] WARNING could not parse hba config line 2

Regards,
Ayub


Re: pgbouncer with ldap

2019-09-09 Thread Ayub M
It has hba, and via the hba file one can specify LDAP connections:

https://www.postgresql.org/docs/9.3/auth-pg-hba-conf.html

On Mon, Sep 9, 2019, 3:40 AM Christoph Moench-Tegeder 
wrote:

> ## Ayub M (hia...@gmail.com):
>
> > Hello, I am trying to setup pgbouncer with ldap to connect with rds
> > postgres. Downloaded latest version of pbbouncer (1.11) and using hba
> > auth_type. Getting below error when starting pgbouncer daemon. Am I
> > missing something?
>
> There's no "ldap" mentioned anywhere in the docs:
>  https://pgbouncer.github.io/config.html
> That could be a hint that ldap isn't supported...
>
> Regards,
> Christoph
>
> --
> Spare Space
>


Re: pgbouncer with ldap

2019-09-09 Thread Ayub M
Thank you Achilleas and Laurenz for the points, will try this out.

On Mon, Sep 9, 2019, 6:46 AM Achilleas Mantzios <
ach...@matrix.gatewaynet.com> wrote:

> On 9/9/19 12:41 μ.μ., Laurenz Albe wrote:
> > Christoph Moench-Tegeder wrote:
> >>> It has hba and via hba file one can specify ldap connections
> >>>
> >>> https://www.postgresql.org/docs/9.3/auth-pg-hba-conf.html
> >> https://pgbouncer.github.io/config.html#hba-file-format
> >> "Auth-method field: Only methods supported by PgBouncer’s auth_type
> >> are supported", and "ldap" is not supported.
> >> When there's no ldap support in pgbouncer, there's no ldap support
> >> in pgbouncer.
> > To throw in something less tautological:
> >
> > PgBouncer supports PAM authentication, so if you are on UNIX,
> > you could use PAM's LDAP module to do what you want.
> Right, I had written a blog about it :
>
> https://severalnines.com/database-blog/one-security-system-application-connection-pooling-and-postgresql-case-ldap
>
> However, I always wished (since my first endeavors with pgbouncer) it was
> less complicated.
> >
> > Yours,
> > Laurenz Albe
>
>
> --
> Achilleas Mantzios
> IT DEV Lead
> IT DEPT
> Dynacom Tankers Mgmt
>
>
>
>
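For anyone landing here later: the PAM route suggested above boils down to a
PAM service file for pgbouncer that delegates to an LDAP module. A minimal
sketch, assuming pam_ldap is installed and pgbouncer was built with
--with-pam; the service-file name and module paths vary by distro and are
assumptions here:

```
# /etc/pam.d/pgbouncer -- illustrative only; the PAM service name pgbouncer
# uses and the module choice (pam_ldap vs pam_sss) depend on build/distro
auth    required    pam_ldap.so
account required    pam_ldap.so
```

With this in place, auth_type = pam in pgbouncer.ini makes pgbouncer hand the
client's password to PAM, which then checks it against the LDAP directory.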


Re: pgbouncer with ldap

2019-09-11 Thread Ayub M
Achilleas, for this setup to work, are changes to postgresql.conf and
pg_hba.conf needed? I am trying to implement this for AWS RDS Aurora, where
these files are not accessible.

On Mon, Sep 9, 2019, 6:46 AM Achilleas Mantzios <
ach...@matrix.gatewaynet.com> wrote:

> Right, I had written a blog about it :
>
> https://severalnines.com/database-blog/one-security-system-application-connection-pooling-and-postgresql-case-ldap
>
> However, I always wished (since my first endeavors with pgbouncer) it was
> less complicated.


Re: pgbouncer with ldap

2019-09-12 Thread Ayub M
Okay, thanks for the response. Unfortunately Aurora does not expose these
files -- or I should say there is no concept of these files in the AWS
managed Aurora DB service. Anyway, I will give it a try and let you know.

On Thu, Sep 12, 2019 at 1:52 AM Achilleas Mantzios <
ach...@matrix.gatewaynet.com> wrote:

> On 11/9/19 2:47 μ.μ., Ayub M wrote:
>
> Achilleas, for this setup to work are changes to postgresql.conf and
> pg_hba.conf needed? I am trying to implement this for AWS rds Aurora where
> these files are not accessible.
>
> Those files are needed in any case if you work with PostgreSQL.
> Unfortunately I have no experience with Aurora. We have been building from
> source for ages.
>

-- 
Regards,
Ayub


Re: pgbouncer with ldap

2019-09-13 Thread Ayub M
Stumbled at the first step, PAM authentication via pgbouncer. After
compiling pgbouncer with the PAM plug-in, I am unable to log in to the db;
it throws a PAM error message. Please help.

The DB user was created with the same password as the Linux user --
localhost:~$ psql -h dbhost -p 3306 -U admin -W db1
db1=> create user testuser password 'hello123';
CREATE ROLE

[ec2-user@ip-1.1.1.1 pam.d]$ psql -h localhost -p 5432 testdb -U testuser
Password for user testuser:
psql: ERROR:  auth failed

Log entries - pgbouncer.log
2019-09-13 06:51:47.180 UTC [5752] LOG C-0x1243020: testdb/testuser@[::1]:52408
login attempt: db=testdb user=testuser tls=no
2019-09-13 06:51:47.180 UTC [5752] NOISE safe_send(12, 9) = 9
2019-09-13 06:51:47.180 UTC [5752] NOISE resync(12): done=86, parse=86,
recv=86
2019-09-13 06:51:47.180 UTC [5752] NOISE resync(12): done=0, parse=0, recv=0
2019-09-13 06:51:47.180 UTC [5752] NOISE safe_recv(12, 4096) = 14
2019-09-13 06:51:47.180 UTC [5752] NOISE C-0x1243020:
testdb/testuser@[::1]:52408
read pkt='p' len=14
2019-09-13 06:51:47.180 UTC [5752] DEBUG C-0x1243020:
testdb/testuser@[::1]:52408
pam_auth_begin(): pam_first_taken_slot=1, pam_first_free_slot=1
2019-09-13 06:51:47.180 UTC [5752] DEBUG pam_auth_worker(): processing slot
1
2019-09-13 06:51:47.180 UTC [5752] WARNING pam_authenticate() failed:
Authentication failure
2019-09-13 06:51:47.181 UTC [5752] DEBUG pam_auth_worker(): authorization
completed, status=3
2019-09-13 06:51:47.386 UTC [5752] LOG C-0x1243020: testdb/testuser@[::1]:52408
closing because: auth failed (age=0s)
2019-09-13 06:51:47.386 UTC [5752] WARNING C-0x1243020:
testdb/testuser@[::1]:52408
pooler error: auth failed

I am able to log in as testuser:
[ec2-user@ip-1.1.1.1 pam.d]$ su - testuser
Password:
Last login: Fri Sep 13 06:21:12 UTC 2019 on pts/1
[testuser@ip-1.1.1.1 ~]$ id
uid=1001(testuser) gid=1001(testuser) groups=1001(testuser)
context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

The user was created as follows
[root@ip-1.1.1.1 ~]# adduser -p hello123 testuser
[root@ip-1.1.1.1 ~]# id testuser
uid=1001(testuser) gid=1001(testuser) groups=1001(testuser)

Here is the pgbouncer.ini config
[ec2-user@ip-1.1.1.1 etc]$ less pgbouncer.ini | grep -v '^$' | grep -v '^;'
[databases]
testdb = host=dbhost port=3306 dbname=db1
[users]
[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = *
listen_port = 5432
auth_type = pam

Am I missing something? Any permissions?
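Two hedged guesses at what can go wrong in a setup like this: first, with
pam_unix the authenticating process must be able to read /etc/shadow, which
normally means running as root or a setuid-root pgbouncer binary; second,
`adduser -p` expects an already-encrypted password, so the Linux account
above may not actually have the literal password 'hello123'. The setuid bit
is easy to check; demonstrated below on a scratch file, since the pgbouncer
path is an assumption:

```shell
# Demonstrate setting and detecting the setuid bit on a scratch file;
# the same ls check applies to the pgbouncer binary itself.
f=$(mktemp)
chmod 4755 "$f"                      # leading 4 = setuid bit
perms=$(ls -l "$f" | cut -c1-10)
echo "$perms"                        # setuid shows as 's' in the owner-execute slot
rm -f "$f"
```

If the owner-execute column shows `x` instead of `s` on the pgbouncer binary,
PAM authentication against local accounts will fail even though `su` works.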

On Thu, Sep 12, 2019 at 4:54 AM Ayub M  wrote:

> Okay, thanks for the response. Unfortunately Aurora does not expose these
> files or I should say there is no concept of these files in AWS managed
> Aurora DB service. Anyway I will give a try and let you know.
>


-- 
Regards,
Ayub


Re: pgbouncer with ldap

2019-09-13 Thread Ayub M
Yes, I did set that; here is what the pgbouncer binary looks like ---

-rwsrwsr-x. 1 root root 2087504 Sep 13 00:45 pgbouncer

On Fri, Sep 13, 2019 at 6:50 AM Achilleas Mantzios <
ach...@matrix.gatewaynet.com> wrote:

> On 13/9/19 10:19 π.μ., Ayub M wrote:
>
> Stumbled in the first step - PAM authentication via pgbouncer. After
> compiling pgbouncer with the pam plug-in, I am unable to login into the db
> - throws PAM error message. Please help.
>
> User created with the same password as linux user --
> localhost:~$ psql -h dbhost -p 3306 -U admin -W db1
> db1=> create user testuser password 'hello123';
> CREATE ROLE
>
> [ec2-user@ip-1.1.1.1 pam.d]$ psql -h localhost -p 5432 testdb -U testuser
> Password for user testuser:
> psql: ERROR:  auth failed
>
>
> ok, pgbouncer should be able to read /etc/pam* files.
> Did you miss the
> # chown root:staff ~pgbouncer/pgbouncer-1.9.0/pgbouncer
> # chmod +s ~pgbouncer/pgbouncer-1.9.0/pgbouncer
> part?
>

fetch time included in pg_stat_statements?

2019-10-01 Thread Ayub M
Does pg_stat_statements.total_time include the time taken by all the
fetches of a cursor query? Or is it only the db time taken to execute the
query?

-- 
Regards,
Ayub


work_mem and shared_buffers question

2020-02-10 Thread Ayub M
   1. shared_buffers - In a regular PostgreSQL installation, say I allocate
   25% of my memory to shared_buffers; that leaves 75% for the rest, such as
   the OS, page cache, and work_mem allocations. Is my understanding
   correct? If so, since AWS Aurora for Postgres uses 75% of memory for
   shared_buffers, would that leave just 25% for other things?
   2. Does the memory specified by work_mem get fully allocated to every
   session, irrespective of whether it performs any sort or hash operations?
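On question 2, to my knowledge work_mem is not pre-allocated per session: it
is a ceiling applied per sort or hash operation, and a single query can use
several multiples of it (one per sort/hash node, times parallel workers). A
back-of-envelope worst case, with illustrative numbers:

```shell
# Worst-case memory ceiling implied by work_mem (numbers are illustrative):
# concurrent sessions x sort/hash nodes per query x work_mem in MB
sessions=100
nodes_per_query=3
work_mem_mb=4
echo $(( sessions * nodes_per_query * work_mem_mb )) MB   # 1200 MB worst case
```

This is why work_mem is usually sized against peak concurrency rather than
handed out as a fixed per-session budget.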


-- 
Regards,
Ayub


Re: work_mem and shared_buffers question

2020-02-10 Thread Ayub M
Thanks. For q2: if work_mem is a limit before spilling onto disk, is there
a minimum amount of memory that gets allocated to each session when it
starts?

On Mon, Feb 10, 2020 at 3:51 AM Naresh g  wrote:

> 1. Yes, by default it is advisable to set shared_buffers to 25% of RAM,
> because community PostgreSQL relies heavily on the OS cache, which means
> that when you fetch something from disk it is first copied to the OS cache
> (the 75%) and then to shared_buffers (the 25%).
> Two copies of the data will be in your system RAM.
> This double copying is called buffered IO.
>
> Amazon Aurora eliminates this buffered IO, hence you are not required to
> stick to the 25% of RAM guideline.
>
> 2. work_mem is just a setting; if your session sorts something it uses
> that much memory, otherwise nothing extra is allocated.
>
> On Mon, 10 Feb, 2020, 1:34 PM Ayub M,  wrote:
>
>>
>>1. shared_buffers - In a regular PostgreSQL installation, say I am
>>allocating 25% of my memory to shared_buffers that means it leaves 75% for
>>rest such as OS, page cache and work_mems etc. Is my understanding 
>> correct?
>>If so, AWS Aurora for Postgres uses 75% of memory for shared_buffers, then
>>it would leave just 25% for other things?
>>2. Does the memory specified for work_mem, fully gets allocated to
>>all sessions irrespective of whether they do any sorting or hashing
>>operations?
>>
>>
>> --
>> Regards,
>> Ayub
>>
>

-- 
Regards,
Ayub


aws sct/dms versus ora2pg

2020-02-21 Thread Ayub M
I would like to get suggestions and feedback on AWS SCT/DMS vs ora2pg for
an Oracle (on-prem) to PostgreSQL (AWS RDS) migration project.

One big difference I see between them is that DMS supports incremental
loads from Oracle to Postgres (I don't think ora2pg supports that, but I
could be wrong).

Could someone who has researched or used these tools list the pros and
cons of each approach and which one is recommended?


postgres stats on the new primary

2020-07-27 Thread Ayub M
I have an RDS PostgreSQL v11.6 setup with a primary and a standby. On the
primary, vacuum/analyze were running, and pg_stat_all_tables's
last_vacuum/analyze/autovacuum/autoanalyze columns were populated with the
times those runs happened.

1. Switchover to Standby happened, now when I see pg_stat_all_tables (on
the active primary which was standby earlier), the last vacuum/analyze
columns are all nulls. Would primary-standby replication not replicate
system tables? Is this how it behaves?

2. On the new primary, would the tables have statistics on them? (I do see
pg_stats entries for the tables on the new standby.) Are they reliable, or
do I need to gather stats and run vacuum after the failover/switchover?
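For what it's worth, my understanding is that the pg_stat_* views are
per-instance activity counters and are not carried over by replication,
whereas the planner statistics live in pg_statistic, which is an ordinary
catalog and is replicated. A hedged way to check and re-prime after a
switchover (the table name is a placeholder):

```sql
-- Planner statistics (pg_stats view over pg_statistic) should survive
-- the switchover, since catalogs are replicated:
SELECT count(*) FROM pg_stats WHERE tablename = 'mytable';

-- pg_stat_all_tables activity counters start empty on the new primary;
-- re-running ANALYZE repopulates the timestamps and gives autovacuum a
-- fresh baseline:
ANALYZE mytable;
```

So the null last_vacuum/last_analyze columns do not by themselves mean the
optimizer statistics are missing.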


hash joins are causing no space left error

2020-08-12 Thread Ayub M
This is PostgreSQL 11 on AWS. There is an mview query in this OLAP
database; the tables involved are huge - 50-100m records on average, with
hundreds of columns in most cases. The query runs for a while and then
errors out saying "No space left on device". I can see it generating around
500 GB of temp file data. At times it goes through and at times it fails,
probably due to other queries running at the same time and causing the
failure.

The tables are partitioned and indexed on the PKs and FKs. Using
parallelism (4) with increased work_mem (4gb).

The joins are happening on around 10 tables and all are joining on the PK
and FK columns. I see partition pruning happening but the hash joins are
killing the query.

Is there any way to avoid hash joins? If we create hash indexes on the
joining columns, would PostgreSQL avoid the hashing operation and instead
use the hash indexes on the tables to join them? That way, I feel, the
resource-intensive hashing would be avoided and there won't be any need for
temp files. I tried, but it does not seem to work: when I query a table
with specific values it uses the hash index, but when I join the tables it
seems to do its own hash join.

My question is how to optimize massive table joins in PostgreSQL so as to
avoid the out-of-space failures and make the query run fast - it takes a
couple of hours to complete now. Any best practices or suggestions?
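One point worth noting up front: a hash join builds its own in-memory (or
spilled) hash table and, as far as I know, never uses hash indexes on the
join columns, so those indexes cannot replace the hashing step. Before
restructuring the query, a couple of diagnostic knobs can be sketched
(values are illustrative, and on RDS some settings may only be changeable
via the parameter group):

```sql
-- See where the temp space goes: hash nodes report "Batches"/"Disk" and
-- sorts report "Sort Method: external merge" when they spill.
EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;   -- the mview's defining query

-- Log every query that writes temp files larger than ~1 GB:
SET log_temp_files = 1048576;             -- unit is kB

-- Compare plans without hash joins (usually slower, but it confirms
-- whether hashing is really the bottleneck):
SET enable_hashjoin = off;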


Re: hash joins are causing no space left error

2020-08-15 Thread Ayub M
On Wed, Aug 12, 2020 at 8:06 PM Tom Lane  wrote:

> Ayub M  writes:
> > This is PostgreSQL 11 on AWS, there is a mview query in this OLAP
> database,
> > the tables involved are huge - 50-100m records on average records
> hundreds
> > of columns in most cases. The query runs for a while and then errors out
> > saying "No space left on device". I could see it generating around 500gb
> of
> > temp file data. At times it goes thru and at times it fails - probably
> due
> > to other queries running at the same time and causing failure.
>
> Are you sure that these queries are actually producing the answers you
> want?  It sounds suspiciously like you are computing underconstrained
> joins.
> --> Yes, it is as per the business requirement.
>


> > The joins are happening on around 10 tables and all are joining on the PK
> > and FK columns. I see partition pruning happening but the hash joins are
> > killing the query.
> > Is there any way to avoid hash joins?
>
> TBH, you are asking the wrong question.  A merge join would take about as
> much temporary space, and a nestloop join over so much data would probably
> not finish in an amount of time you're willing to wait.  Indexes are NOT
> a magic solution here.  What you need to be thinking about is how to not
> need to process so much data.
>
> If you really need to have this proven to you, you can try "set
> enable_hashjoin = off", but I don't think you'll find that better.
> --> You are right, neither merge join nor nested loop are resolving the
> issue.
>


> regards, tom lane
>


-- 
Regards,
Ayub


postgres materialized view refresh in parallel

2020-08-18 Thread Ayub M
I created a materialized view in parallel by setting
max_parallel_workers_per_gather to 4. I could see parallel sessions kicking
off and creating the mview much faster than serial execution (10 mins in
parallel vs over an hour serially).

Now I want the refresh of the mview to take about the same time as the
create; I am okay with it taking a little longer, given the extra steps a
refresh performs. But what I am noticing is that the refresh does not run
in parallel at all, even with max_parallel_workers_per_gather set to 4.
Does PostgreSQL 11.6 not support refreshing mviews in parallel, as it does
when creating the mview?


any issue with enable_partitionwise_join at instance level?

2020-10-11 Thread Ayub M
Hello, while running aggregate queries on a partitioned table there is an
improvement when enable_partitionwise_join is turned on. By default this
parameter is disabled, and the documentation
(https://www.postgresql.org/docs/11/runtime-config-query.html) says:

```
Because partitionwise grouping or aggregation can use significantly more
CPU time and memory during planning, the default is off.
```

Why would it take "significantly" more CPU and memory for the planner? Are
there any foreseeable issues with enabling this at the instance level?
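The planning cost comes from the planner considering a join per pair of
matching partitions (and, for partitionwise aggregate, per-partition
grouping paths), which multiplies the number of paths it must build and
keep in memory. If only some workloads benefit, one hedged middle ground is
scoping the setting to a session or role rather than the whole instance
(the role name below is a placeholder):

```sql
-- Enable only for the session running the aggregates:
SET enable_partitionwise_join = on;
SET enable_partitionwise_aggregate = on;

-- Or pin it to the reporting role, so OLTP sessions keep the cheap default:
ALTER ROLE reporting_user SET enable_partitionwise_join = on;
```

That keeps the extra planning cost confined to the queries that actually
gain from partitionwise execution.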


postgres materialized view refresh performance

2020-10-22 Thread Ayub M
There is a table t which is used in a mview mv, this is the only table in
the mview definition.

create table t (c1 int, ..., c10 int);  -- there is a pk on, say, the c1 column
create materialized view mv as
  select c1, c2, ..., c10 from t;       -- there is a unique index on, say, c5
                                        -- and a bunch of other indexes on the mview

The reason an mview was created instead of using table t directly is that
the table gets truncated and reloaded every couple of hours, and we don't
want users to see an empty table at any point in time; that's why an mview
is being used.

Using "refresh materialized view concurrently", this mview is being used by
APIs and end users.

Couple of questions I have -

   1. Whenever a concurrent mview refresh happens, does pg create another
   set of tables and indexes and swap them with the originals? If not, does
   it update the existing data in place?
   2. If the usage of the mview is pretty heavy, does it impact the
   performance of the refresh process? Vice versa, while the refresh is
   going on, does the performance seen by mview users take a hit?
   3. The mview sometimes refreshes in a couple of minutes and sometimes
   takes hours. When it runs long, there are no locks and no resource
   shortage; the number of recs in the base table is 6m (7.5 GB), which is
   not huge, so why does the refresh take so long?
   4. Does the mview need vacuum/analyze/reindex?
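For reference, a minimal sketch of the concurrent-refresh mechanics, with
the placeholder names from above:

```sql
-- REFRESH ... CONCURRENTLY requires at least one unique index on the mview:
CREATE UNIQUE INDEX mv_c5_uidx ON mv (c5);

-- Non-blocking for readers: Postgres computes the new result, diffs it
-- against the current mview contents, and applies INSERT/UPDATE/DELETE
-- statements for the differences.
REFRESH MATERIALIZED VIEW CONCURRENTLY mv;
```

On question 4, as far as I know the answer is yes: the diff applied by a
concurrent refresh produces dead tuples, so the usual autovacuum/ANALYZE
treatment applies to an mview just like to an ordinary table.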


Re: postgres materialized view refresh performance

2020-10-25 Thread Ayub M
Thank you both.

As for the mview refresh taking long:
  • The mview sometimes refreshes in a couple of minutes and sometimes
takes hours. When it runs long, there are no locks and no resource
shortage; the number of recs in the base table is 6m (7.5 GB), which is not
huge, so why does it take so long to refresh the mview?

Does the run time correlate with the number of changes being made?

-- Almost the same number of records is present in the base table (6
million records). The base table gets truncated and reloaded every time
with almost the same number of records.

And the mview is a simple select from this one base table.

The mview has around 10 indexes, 1 unique and 9 non-unique indexes.

Populating the base table takes about 2 mins, using "insert into ... select
from ...", but when the mview is created for the first time it takes 16
minutes. Even when I remove all but one unique index, it takes about 7
minutes. Any clue as to why it takes so much longer than the load of the
base table (which is 2 mins)?

On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk <
phi...@americanefficient.com> wrote:

>
>
> > On Oct 23, 2020, at 9:52 AM, Ravi Krishna  wrote:
> >
> >> My understanding is that when CONCURRENTLY is specified, Postgres
> implements the refresh as a series of INSERT, UPDATE,
> >> and DELETE statements on the existing view. So the answer to your
> question is no, Postgres doesn’t create another table and
> >> then swap it.
> >
> > The INSERTS/UPDATE/DELETE happens only for the difference.  PG first
> creates a new temp table and then compares it with
> > the MV and detects the difference.  That is why for CONCURRENTLY, a
> unique index is required on the MV.
>
> Yes, thank you, that’s what I understand too but I expressed it very
> poorly.
>
>

-- 
Regards,
Ayub


Re: postgres materialized view refresh performance

2020-10-26 Thread Ayub M
It's a simple one-line sequential scan plan, just reading the base table
sequentially.

On Mon, Oct 26, 2020, 9:21 AM Philip Semanchuk 
wrote:

>
>
> > On Oct 25, 2020, at 10:52 PM, Ayub M  wrote:
> >
> > Thank you both.
> >
> > As for the mview refresh taking long --
> >   • The mview gets refreshed in a couple of mins sometimes and sometimes
> it takes hours. When it runs for longer, there are no locks and no resource
> shortage, the number of recs in the base table is 6m (7.5gb) which is not
> huge so why does it take so long to refresh the mview?
> >
> > Does the run time correlate with the number of changes being made?
> >
> > -- Almost the same number of records are present in the base table (6
> million records). The base table gets truncated and reloaded everytime with
> almost the same number of records.
> >
> > And the mview is a simple select from this one base table.
> >
> > The mview has around 10 indexes, 1 unique and 9 non-unique indexes.
> >
> > Population of the base tables takes about 2 mins, using "insert into
> select from table", but when the mview is created for the first time it
> takes 16 minutes. Even when I remove all but one unique index it takes
> about 7 minutes. Any clue as to why it is taking longer than the create of
> the base table (which is 2 mins).
>
> Do you know if it’s executing a different plan when it takes a long time?
> auto_explain can help with that.
>
>
>
> >
> > On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk <
> phi...@americanefficient.com> wrote:
> >
> >
> > > On Oct 23, 2020, at 9:52 AM, Ravi Krishna 
> wrote:
> > >
> > >> My understanding is that when CONCURRENTLY is specified, Postgres
> implements the refresh as a series of INSERT, UPDATE,
> > >> and DELETE statements on the existing view. So the answer to your
> question is no, Postgres doesn’t create another table and
> > >> then swap it.
> > >
> > > The INSERTS/UPDATE/DELETE happens only for the difference.  PG first
> creates a new temp table and then compares it with
> > > the MV and detects the difference.  That is why for CONCURRENTLY, a
> unique index is required on the MV.
> >
> > Yes, thank you, that’s what I understand too but I expressed it very
> poorly.
> >
> >
> >
> > --
> > Regards,
> > Ayub
>
>


postgres auditing truncates and deletes on tables

2020-12-15 Thread Ayub M
For an RDS Aurora PostgreSQL v11.7 database, I want to set up auditing of
DDL plus truncates and deletes.

I am setting log_statement to ddl to log all DDL (this excludes truncate
statements). I also want to set up auditing only for deletes and truncates
on tables. I don't want to set log_statement to all, as this would fill up
my logs.

Checking pgaudit, it supports the write class, which includes delete and
truncate, but that also includes insert/update and copy, which I don't need.

Is there any way to set up auditing covering DDL plus deletes/truncates
only, using pgaudit or native auditing?
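A hedged sketch of what pgaudit offers here (parameter names from recent
pgaudit versions; the RDS parameter-group equivalents may differ): session
logging has no class narrower than write, but object auditing can narrow
DML logging to specific privileges on specific tables:

```sql
-- Session auditing for DDL (set in the parameter group / postgresql.conf):
--   pgaudit.log = 'ddl'

-- Object auditing: log only deletes, only on chosen tables.
CREATE ROLE auditor;
SET pgaudit.role = 'auditor';            -- normally set in the parameter group
GRANT DELETE ON important_table TO auditor;

-- Caveat: to my knowledge object auditing only covers SELECT, INSERT,
-- UPDATE and DELETE; TRUNCATE is still only captured by the coarse
-- 'write' session class.
```

So DDL plus per-table deletes is achievable with pgaudit alone, but
truncates seem to drag the rest of the write class along with them.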


Partitioning an existing table - pg10.6

2019-07-03 Thread Ayub M
Hello, I am using Postgres 10.6 and have a huge table with ~30m recs which
needs to be partitioned. What's the best strategy for partitioning it with
minimal downtime?

1. Rename the existing table, create a new partitioned table with the
original name, and keep inserting data from the old renamed table into the
new one. This will incur downtime for the apps.
2. Create the partitioned table with a new name and move the data from the
current table (oldest to newest); when moving the current-month partition,
shut down the app and move the last set of data. This avoids downtime, but
any data changed after it was copied over might be lost.

Any other approaches/alternatives?
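One variant of option 1 that avoids copying the historical rows at all is
attaching the renamed table as a single big partition, so only new data
lands in proper partitions. A sketch under assumed names and ranges
(big_table, created_at), for Postgres 10 syntax specifically:

```sql
-- PostgreSQL 10.x uses UNBOUNDED for open range ends (MINVALUE/MAXVALUE
-- arrived in v11). Names, column, and dates below are placeholders.
BEGIN;
ALTER TABLE big_table RENAME TO big_table_legacy;
CREATE TABLE big_table (LIKE big_table_legacy)
    PARTITION BY RANGE (created_at);
-- A matching CHECK constraint added beforehand lets ATTACH skip the full
-- validation scan, keeping the lock window short:
ALTER TABLE big_table_legacy
    ADD CONSTRAINT legacy_range CHECK (created_at < DATE '2019-08-01');
ALTER TABLE big_table ATTACH PARTITION big_table_legacy
    FOR VALUES FROM (UNBOUNDED) TO ('2019-08-01');
CREATE TABLE big_table_2019_08 PARTITION OF big_table
    FOR VALUES FROM ('2019-08-01') TO ('2019-09-01');
COMMIT;
```

The 30m existing rows never move; only a metadata change (plus the
constraint check) happens, so app downtime is limited to the brief DDL
lock.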

-- 
Regards,
Ayub


postgres index usage count too high

2021-04-28 Thread Ayub M
There is a table in the db whose idx_scan count from pg_stat_all_tables
seems too high; there are not that many queries being executed against this
table. I am wondering how this count can be so high.

   1. The db is up since 80 days so I assume these are cumulative stats
   since last startup?
   2. Could a query be using this table in a join with other tables, with
   this table being probed multiple times in loops? Below is a googled part
   of a plan showing a parallel index-only scan that happened 5 times for
   the index. I am assuming something of this sort is happening, making the
   index scan count go so high. Please let me know if that might be the
   case.

 ->  Parallel Index Only Scan using us_geonames_type_idx on us_geonames
       (cost=0.43..24401.17 rows=559758 width=4)
       (actual time=0.036..90.309 rows=447806 loops=5)


   3. Is there any other possible explanation for this high count? I see
   that updates do increase this count, but there are not that many
   updates; inserts and deletes do not seem to touch this counter.

-[ RECORD 1 ]---+--
relid   | 3029143981
schemaname  | myschema
relname | mytable
seq_scan| 196
seq_tup_read| 2755962642
idx_scan| 4362625959
idx_tup_fetch   | 3579773932
n_tup_ins   | 93821564
n_tup_upd   | 645310
n_tup_del   | 0
n_tup_hot_upd   | 21288
n_live_tup  | 31153237
n_dead_tup  | 0
n_mod_since_analyze | 0
last_vacuum | 2021-04-24 05:06:56.481349+00
last_autovacuum | 2021-03-04 00:27:26.705849+00
last_analyze| 2021-04-24 05:07:37.589756+00
last_autoanalyze| 2021-03-04 08:55:32.673118+00
vacuum_count| 69
autovacuum_count| 1
analyze_count   | 69
autoanalyze_count   | 55

db=> select * from pg_stat_all_indexes where relname = 'mytable' and
indexrelname = 'mytable_pkey' order by idx_tup_fetch desc nulls last;
-[ RECORD 1 ]-+-----------
relid         | 3029143926
indexrelid    | 3029143974
schemaname    | myschema
relname       | mytable
indexrelname  | mytable_pkey
idx_scan      | 3806451145
idx_tup_read  | 97277555
idx_tup_fetch | 61522


Thanks.
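On point 2: yes, to my knowledge idx_scan counts every start of an index
scan, so a parameterized inner index scan under a nested loop counts once
per outer row, and loops=5 on a parallel scan means five processes each
performed (and counted) the scan. Spread over the uptime, the observed
number is actually a modest rate:

```shell
# The counter is cumulative since the last stats reset, so over 80 days
# the observed idx_scan works out to roughly:
idx_scan=4362625959
days_up=80
echo $(( idx_scan / days_up / 86400 ))   # ~631 index scans per second
```

A query whose nested loop probes this index a few thousand times per
execution only needs a fraction of a query per second to sustain that, so
loop-driven probing is a plausible explanation.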


postgres vacuum memory limits

2021-07-31 Thread Ayub M
Hello, when maintenance_work_mem and autovacuum_work_mem are set, my
understanding is that vacuum and autovacuum sessions should be limited to
the memory limits set by these parameters. But I am seeing autovacuum
sessions use more memory than these limits; why would this happen?

Please see the examples below, where maintenance_work_mem is set to 20 MB
and shared_buffers is 128 MB. When I look at the memory for this session in
top, it shows 162 MB. But when default_statistics_target is increased to
3000, the session usage is 463 MB, which is way more than the 20 MB
maintenance_work_mem and the 128 MB shared_buffers. Shouldn't the process
memory be capped at 20 + 128 MB?

postgres=# show maintenance_work_mem ;
 maintenance_work_mem
----------------------
 20MB
(1 row)
postgres=# vacuum analyze mdm_context;
VACUUM
postgres=# show shared_buffers;
 shared_buffers
----------------
 128MB
(1 row)

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 62246 postgres  20   0  422892 165996 139068 R  57.1 15.7  25:06.34 postgres: postgres postgres [local] VACUUM


postgres=# show default_statistics_target;
 default_statistics_target
---------------------------
 100
(1 row)
postgres=# set default_statistics_target=3000;
SET
postgres=# vacuum analyze mdm_context;
VACUUM

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 62246 postgres  20   0  876132 474384   2976 R  62.9 47.6  25:11.41 postgres: postgres postgres [local] VACUUM
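A plausible explanation, offered as a hedged sketch: the extra memory is
ANALYZE, not VACUUM. maintenance_work_mem caps VACUUM's dead-tuple array,
but ANALYZE's row sample is sized as 300 x default_statistics_target rows
per table and is held in backend-local memory outside that cap (and top's
RES also counts whatever shared_buffers pages the backend has touched):

```shell
# ANALYZE samples 300 * default_statistics_target rows per table (a
# PostgreSQL implementation constant), held outside maintenance_work_mem.
rows_per_target=300
echo $(( rows_per_target * 100 ))    # 30000 rows sampled at the default target
echo $(( rows_per_target * 3000 ))   # 900000 rows at target 3000
```

With wide rows, a 900,000-row sample can easily account for hundreds of MB,
which matches the jump from 162 MB to 463 MB seen above.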