Unable to run psql on 9.5 after broken 12 remove

2020-07-03 Thread Moreno Andreo

I have a production server running pg9.5 seamlessly.
Yesterday I decided to install libpq to have some crypto functions. 
Unexpectedly, it installed postgresql 12 and its libpq.

I don't need pg 12, so I decided to remove it.
It did not go well:

root@datastore-1:/home/user# apt-get --purge remove postgresql-client-12
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  libuuid-perl
Use 'apt-get autoremove' to remove it.
The following packages will be REMOVED:
  postgresql-12* postgresql-client-12* postgresql-contrib*
0 upgraded, 0 newly installed, 3 to remove and 89 not upgraded.
1 not fully installed or removed.
After this operation, 30.9 MB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 32065 files and directories currently installed.)
Removing postgresql-contrib (12+215.pgdg80+1) ...
Removing postgresql-12 (12.3-1.pgdg80+1) ...
update-alternatives: using /usr/share/postgresql/9.5/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode

Purging configuration files for postgresql-12 (12.3-1.pgdg80+1) ...
Removing postgresql-client-12 (12.3-1.pgdg80+1) ...
update-alternatives: using /usr/share/postgresql/9.5/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode

Processing triggers for postgresql-common (215.pgdg80+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
Setting up python3.4 (3.4.2-1+deb8u3) ...
  File "/usr/lib/python3.4/http/client.py", line 1014
    raise InvalidURL(f"URL can't contain control characters. {url!r} "
^
SyntaxError: invalid syntax
dpkg: error processing package python3.4 (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 python3.4
E: Sub-process /usr/bin/dpkg returned an error code (1)

After this, the package no longer shows up in the installed list and I can't 
run the psql command:


root@datastore-1:/home/user# sudo -u postgres psql
Error: PostgreSQL version 12 is not installed

How can I repair this?
Thanks in advance
Moreno.





Re: Unable to run psql on 9.5 after broken 12 remove

2020-07-03 Thread Moreno Andreo

After upgrading Python, the InvalidURL error is gone, but I still can't run psql.

Output of pg_lsclusters, if needed
root@datastore-1:/usr/share/postgresql-common# pg_lsclusters
Ver Cluster Port Status                Owner    Data directory               Log file
9.5 main    6543 online                postgres /var/lib/postgresql/9.5/main /var/log/postgresql/postgresql-9.5-main.log
12  main    5432 down,binaries_missing          /var/lib/postgresql/12/main  /var/log/postgresql/postgresql-12-main.log


I followed
https://askubuntu.com/questions/1223270/psql-command-error-postgresql-version-12-is-not-installed
with no luck; apt purge simply states that pg12 is not installed.







[HELP] Regarding how to install libraries

2020-07-03 Thread Praveen Kumar K S
Hello,

PostgreSQL libraries are required while compiling pgpool. I need to install
postgresql-libs and postgresql-devel on Ubuntu Server 16.04

How can I install only these packages without having to install the entire
postgres server ? TIA.

-- 
Regards,
K S Praveen Kumar


Re: [HELP] Regarding how to install libraries

2020-07-03 Thread Adrian Klaver

On 7/3/20 4:37 AM, Praveen Kumar K S wrote:

Hello,

PostgreSQL libraries are required while compiling pgpool. I need to 
install postgresql-libs and postgresql-devel on Ubuntu Server 16.04


Why not install pgpool from package?

Assuming using PGDG repo:

sudo apt install postgresql-12-pgpool2
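
(Side note, not part of the original reply: if compiling pgpool from source is
still the goal, the likely Debian/Ubuntu counterparts of postgresql-libs and
postgresql-devel are the packages below. This is only a sketch; exact package
availability on 16.04 with the PGDG repo is an assumption.)

# client library headers only, no server installed
sudo apt-get install libpq-dev

# server-side development headers, if the build also needs them
sudo apt-get install postgresql-server-dev-12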




How can I install only these packages without having to install the 
entire postgres server ? TIA.


--
Regards,
K S Praveen Kumar



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Unable to run psql on 9.5 after broken 12 remove

2020-07-03 Thread Adrian Klaver

On 7/3/20 1:54 AM, Moreno Andreo wrote:

After upgrading python the InvalidUrl is gone, but I still can't run psql

Output of pg_lsclusters, if needed
root@datastore-1:/usr/share/postgresql-common# pg_lsclusters
Ver Cluster Port Status                Owner    Data directory               Log file
9.5 main    6543 online                postgres /var/lib/postgresql/9.5/main /var/log/postgresql/postgresql-9.5-main.log
12  main    5432 down,binaries_missing          /var/lib/postgresql/12/main  /var/log/postgresql/postgresql-12-main.log


I followed
https://askubuntu.com/questions/1223270/psql-command-error-postgresql-version-12-is-not-installed
with no luck; apt purge simply states that pg12 is not installed.


Looks like the cluster was not removed from the pgcommon setup. This 
would explain why you can't run psql. By default pgcommon looks for the 
psql that goes with the most recent version of Postgres it knows about, 
in this case 12. Unfortunately that binary no longer exists. You still 
have psql. Do:


/usr/lib/postgresql/9.5/bin/psql --help


To avoid having to do that try:

sudo pg_dropcluster 12 main
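
(For reference only, not from the original reply: the Debian/Ubuntu wrappers can
also be pointed at a specific cluster while the broken 12 entry still exists.
A sketch, assuming the usual postgresql-common pg_wrapper behavior:)

# list the clusters the wrappers know about
pg_lsclusters

# ask the wrapper for the 9.5/main cluster explicitly
sudo -u postgres psql --cluster 9.5/main

# or bypass the wrapper and call the 9.5 binary directly (port from pg_lsclusters)
sudo -u postgres /usr/lib/postgresql/9.5/bin/psql -p 6543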




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [HELP] Regarding how to install libraries

2020-07-03 Thread Adrian Klaver

On 7/3/20 7:21 AM, Adrian Klaver wrote:

On 7/3/20 4:37 AM, Praveen Kumar K S wrote:

Hello,

PostgreSQL libraries are required while compiling pgpool. I need to 
install postgresql-libs and postgresql-devel on Ubuntu Server 16.04


Why not install pgpool from package?

Assuming using PGDG repo:

sudo apt install postgresql-12-pgpool2


This assumed you have Postgres 12 installed. If necessary, modify it to 
fit the version actually installed.







How can I install only these packages without having to install the 
entire postgres server ? TIA.


--
Regards,
K S Praveen Kumar






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Unable to run psql on 9.5 after broken 12 remove

2020-07-03 Thread Moreno Andreo

On 03/07/2020 16:51, Adrian Klaver wrote:

On 7/3/20 1:54 AM, Moreno Andreo wrote:

Looks like the cluster was not removed from the pgcommon setup. This 
would explain why you can't run psql. By default pgcommon looks for 
the version of psql connected with the most recent version of Postgres 
it knows about, in this case 12. Unfortunately that binary no longer 
exists. You still have psql. Do:


/usr/lib/postgresql/9.5/bin/psql --help

OK, it confirms that the 9.5 client is still OK


To avoid having to do that try:

sudo pg_dropcluster  12 main

This reported a warning:

root@datastore-1:/home/user# pg_dropcluster 12 main
Warning: corrupted cluster: data directory does not exist

... but it did its job

root@datastore-1:/home/user# sudo -u postgres psql
psql (9.5.6)
Type "help" for help.

postgres=# \q


Thanks, you saved my day again!

Moreno.







Re: Unable to run psql on 9.5 after broken 12 remove

2020-07-03 Thread Adrian Klaver

On 7/3/20 8:15 AM, Moreno Andreo wrote:

On 03/07/2020 16:51, Adrian Klaver wrote:

On 7/3/20 1:54 AM, Moreno Andreo wrote:

Looks like the cluster was not removed from the pgcommon setup. This 
would explain why you can't run psql. By default pgcommon looks for 
the version of psql connected with the most recent version of Postgres 
it knows about, in this case 12. Unfortunately that binary no longer 
exists. You still have psql. Do:


/usr/lib/postgresql/9.5/bin/psql --help

OK, it confirms that 9.5 client is still ok


To avoid having to do that try:

sudo pg_dropcluster  12 main

This reported a warning


Yeah, I'm guessing the original uninstall cleanup process went off the 
rails here:


"
Processing triggers for postgresql-common (215.pgdg80+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
Setting up python3.4 (3.4.2-1+deb8u3) ...
  File "/usr/lib/python3.4/http/client.py", line 1014
raise InvalidURL(f"URL can't contain control characters. {url!r} "
^
SyntaxError: invalid syntax
dpkg: error processing package python3.4 (--configure):
"


The below just finished what the above started.



root@datastore-1:/home/user# pg_dropcluster 12 main
Warning: corrupted cluster: data directory does not exist

... but it did its job

root@datastore-1:/home/user# sudo -u postgres psql
psql (9.5.6)
Type "help" for help.

postgres=# \q


Thanks, you saved my day again!

Moreno.









--
Adrian Klaver
adrian.kla...@aklaver.com




Re: survey: psql syntax errors abort my transactions

2020-07-03 Thread Ron

On 7/3/20 1:54 AM, Laurenz Albe wrote:

On Thu, 2020-07-02 at 08:54 -0700, Jeremy Schneider wrote:

Maybe it's just me, but I'm wondering if it's worth changing the default 
behavior
of psql so it doesn't abort transactions in interactive mode when I mistakenly
mis-spell "select" or something silly like that.
This is of course easily remedied in my psqlrc file by adding "\set 
ON_ERROR_ROLLBACK interactive".
[...]
But I do know that for all the new people coming to PostgreSQL right now
(including lots at my company), none of them are going to know about this 
setting
and personally I think the default is user-unfriendly.
[...]

So...

Survey for the user community here on the pgsql-general list: it would be great 
if lots
of people could chime in by answering two questions about your very own 
production environment:

question 1) are you worried about scripts in your production environment where damage
could be caused by a different default in a future new major version of postgresql?
(i.e. not aborting transactions in interactive mode when syntax errors occur)

I would dislike it if interactive mode behaved differently from 
non-interactive mode.

This is my favorite example why I like the way PostgreSQL does things:

/* poor man's VACUUM (FULL) */
BEGIN;
CREATTE TABLE t2 AS SELECT * FROM t1;
DROP TABLE t1;
ALTER TABLE t2 RENAME TO t1;
COMMIT;


How so, since it does not carry over indexes, foreign keys, triggers, 
partition references, etc?


--
Angular momentum makes the world go 'round.




Re: survey: psql syntax errors abort my transactions

2020-07-03 Thread Julien Rouhaud
On Fri, Jul 3, 2020 at 7:46 PM Ron  wrote:
>
> On 7/3/20 1:54 AM, Laurenz Albe wrote:
> > This is my favorite example why I like the way PostgreSQL does things:
> >
> > /* poor man's VACUUM (FULL) */
> > BEGIN;
> > CREATTE TABLE t2 AS SELECT * FROM t1;
> > DROP TABLE t1;
> > ALTER TABLE t2 RENAME TO t1;
> > COMMIT;
>
> How so, since it does not carry over indexes, foreign keys, triggers,
> partition references, etc?

The point of this example is that if you have a typo in the CREATE
TABLE like here, you *don't want* to continue executing the commands,
which would drop the original table even though you no longer have a
copy of the data.  That's what he meant by liking the way Postgres
does things; it wasn't about how to do a poor man's VACUUM FULL.
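
(For illustration only, not from the thread; prompts and message wording vary a
little by psql version. A minimal session showing the default behavior the
example relies on:)

postgres=# BEGIN;
BEGIN
postgres=# CREATTE TABLE t2 AS SELECT * FROM t1;
ERROR:  syntax error at or near "CREATTE"
postgres=# DROP TABLE t1;
ERROR:  current transaction is aborted, commands ignored until end of transaction block
postgres=# ROLLBACK;
ROLLBACK

After the ROLLBACK, t1 and its data are untouched. With
"\set ON_ERROR_ROLLBACK interactive" (for example in ~/.psqlrc), psql wraps each
interactive statement in a savepoint, so the typo alone would be rolled back and
the DROP TABLE would go ahead, which is exactly the hazard being discussed.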




Degraded performance during table rewrite

2020-07-03 Thread Mohamed Wael Khobalatte
Hi all,

I am attempting to do a bunch of table rewrites to shrink a table in the
absence of pg_repack and vacuum full (both not an option). The database is
growing fast and has had significant bloat in both heaps and indexes, so
index rebuilds alone won't cut it. We found that table inheritance can be
used to achieve this rather nicely. We are running PG v9.6.18.

We are setting up the inheritance as follows:

BEGIN;
ALTER TABLE #{table} RENAME TO #{table}_old;
CREATE TABLE #{table} (LIKE #{table}_old INCLUDING ALL);
ALTER TABLE #{table}_old INHERIT #{table};
ALTER SEQUENCE #{table}_id_seq OWNED BY #{table}.id;
COMMIT;

Then, the migration itself runs as follows (each in a transaction, looping
through records and sleeping for a bit)

WITH del AS (
  DELETE FROM #{old_table}
  WHERE id IN (
SELECT id
FROM #{old_table}
WHERE id > #{max_deleted_id} -- This is the max deleted from the
previous batch, we grab it programmatically.
ORDER BY id ASC
LIMIT #{batch_size}
  )
  RETURNING *
)
INSERT INTO #{table}
SELECT * FROM del
RETURNING id

For instance, the batch_size can be 10_000, and the code sleeps
programmatically for 200ms (this is done in a Ruby script).

"max_deleted_id" is passed to each run from the previous one. This improves
the inner SELECT query.

This works very well. However, I noticed two surprising things:

1. The performance of the delete and insert drops by several orders of
magnitude as the script runs. For instance, in one run, it goes from 150ms
average run to 700ms per batch.

2. The explain itself takes a while to run on a sample batch. In one table,
the explain alone took four seconds.

To try and reproduce this locally, I used the following dummy table:

create table towns (id serial primary key, code text, article text, name
text, department text);

insert into towns (
code, article, name, department
)
select
left(md5(i::text), 10),
md5(random()::text),
md5(random()::text),
left(md5(random()::text), 4)

from generate_series(1, 1) s(i);

This spends 150ms per batch, which climbs to 700ms per batch. A vacuum of
the old table lowers it back to 150ms, but I don't understand why, because
we structure the query to jump over all previously dead rows. There is an
old thread in which Tom Lane mentions that the planner might itself be
walking that primary index. Is this applicable here? And is there anything
we can do besides more aggressive and continued vacuuming of the old table
(or a change in autovacuum settings)? Ideally, we want to run this
overnight without much supervision.


Re: Degraded performance during table rewrite

2020-07-03 Thread Mohamed Wael Khobalatte
On Fri, Jul 3, 2020 at 4:24 PM Mohamed Wael Khobalatte <
mkhobala...@grubhub.com> wrote:


I just ran a comparison between our version and PG12. To make things easier
to pinpoint, I did the following:

- Increased WAL size (this was never an issue anyway) and spread out
checkpointing.
- Disabled autovacuum on both the old table and the new one.

Ran the migration on PG 9.6 and 12. The 9.6 still climbs from 150ms to
700ms per batch, while PG12 stays stable at 150ms per batch, so it seems to
me like *something* has improved between the versions, but not sure what.


Re: Degraded performance during table rewrite

2020-07-03 Thread Tom Lane
Mohamed Wael Khobalatte  writes:
> ... the migration itself runs as follows (each in a transaction, looping
> through records and sleeping for a bit)

> WITH del AS (
>   DELETE FROM #{old_table}
>   WHERE id IN (
> SELECT id
> FROM #{old_table}
> WHERE id > #{max_deleted_id} -- This is the max deleted from the
> previous batch, we grab it programmatically.
> ORDER BY id ASC
> LIMIT #{batch_size}
>   )
>   RETURNING *
> )
> INSERT INTO #{table}
> SELECT * FROM del
> RETURNING id

> This spends 150ms per batch, which climbs to 700ms per batch. A vacuum of
> the old table lowers is back to 150ms, but I don't understand why, because
> we structure the query to jump over all previously dead rows. There is an
> old thread in which Tom Lane mentions that the planner might itself be
> walking that primary index. Is this applicable here? And is there anything
> we can do besides more aggressive and continued vacuuming of the old table
> (or a change in autovacuum settings)? Ideally, we want to run this
> overnight without much supervision.

Yeah, given that the slowdown seems to be in the planner, and given your
further observation that v12 is better, I'd say that this is an issue
with get_actual_variable_range.  That's going to be invoked to try to
determine the selectivity of the "WHERE id > #{max_deleted_id}" clause,
if the constant is past the last value in the histogram for the id
column.

The improvement you see in v12 actually came in in v11, and I think
I'll just quote the commit log:

Author: Tom Lane 
Branch: master Release: REL_11_BR [3ca930fc3] 2017-09-07 19:41:51 -0400

Improve performance of get_actual_variable_range with recently-dead tuples.

In commit fccebe421, we hacked get_actual_variable_range() to scan the
index with SnapshotDirty, so that if there are many uncommitted tuples
at the end of the index range, it wouldn't laboriously scan through all
of them looking for a live value to return.  However, that didn't fix it
for the case of many recently-dead tuples at the end of the index;
SnapshotDirty recognizes those as committed dead and so we're back to
the same problem.

To improve the situation, invent a "SnapshotNonVacuumable" snapshot type
and use that instead.  The reason this helps is that, if the snapshot
rejects a given index entry, we know that the indexscan will mark that
index entry as killed.  This means the next get_actual_variable_range()
scan will proceed past that entry without visiting the heap, making the
scan a lot faster.  We may end up accepting a recently-dead tuple as
being the estimated extremal value, but that doesn't seem much worse than
the compromise we made before to accept not-yet-committed extremal values.

The cost of the scan is still proportional to the number of dead index
entries at the end of the range, so in the interval after a mass delete
but before VACUUM's cleaned up the mess, it's still possible for
get_actual_variable_range() to take a noticeable amount of time, if you've
got enough such dead entries.  But the constant factor is much much better
than before, since all we need to do with each index entry is test its
"killed" bit.

We chose to back-patch commit fccebe421 at the time, but I'm hesitant to
do so here, because this form of the problem seems to affect many fewer
people.  Also, even when it happens, it's less bad than the case fixed
by commit fccebe421 because we don't get the contention effects from
expensive TransactionIdIsInProgress tests.

Dmitriy Sarafannikov, reviewed by Andrey Borodin

Discussion: 
https://postgr.es/m/05c72cf7-b5f6-4db9-8a09-5ac897653...@yandex.ru


There are a number of possibilities for working around this in your
particular situation, short of an upgrade to v11+.  You could try doing a
manual VACUUM between deletion steps, but that could fail to fix it if
anything else is running concurrently (because the VACUUM might not think
it's safe to recycle the recently-dead tuples yet).  I think possibly
a better approach is to try to avoid the situation wherein estimating
"WHERE id > #{max_deleted_id}" requires determining the table's true
endpoint id value.  For that, the last id value seen in the pg_stats
histogram for the id column has to be greater than the max_deleted_id
value.  So you might find that increasing the deletion batch size
(thereby reducing max_deleted_id) does the trick; or you could increase
the statistics target for that column, making the histogram larger and
hence (probably) making its endpoint higher.

regards, tom lane
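
(A sketch of the two knobs mentioned above, not from the original mail; table
and column names reuse the thread's #{} placeholders and the statistics target
value is only illustrative:)

-- enlarge the histogram for the id column, then refresh the stats
ALTER TABLE #{old_table} ALTER COLUMN id SET STATISTICS 1000;
ANALYZE #{old_table};

-- check where the histogram currently ends; get_actual_variable_range() is
-- only avoided while max_deleted_id stays below this endpoint
SELECT histogram_bounds
FROM pg_stats
WHERE tablename = '#{old_table}' AND attname = 'id';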




Re: Degraded performance during table rewrite

2020-07-03 Thread Ron

On 7/3/20 3:24 PM, Mohamed Wael Khobalatte wrote:



Is the problem really about writing the new table, or reading from the old 
table?


--
Angular momentum makes the world go 'round.


Re: Degraded performance during table rewrite

2020-07-03 Thread Mohamed Wael Khobalatte
On Fri, Jul 3, 2020 at 5:26 PM Tom Lane  wrote:


Hi Tom, thanks for your response.

I did increase the target to 10_000 in


Re: Degraded performance during table rewrite

2020-07-03 Thread Tom Lane
Mohamed Wael Khobalatte  writes:
> Do you happen to know if there is an upper limit to how much time the
> planner is willing to spend on this?

There is not.  (I have considered that idea, but it's not implemented.
I'm not sure whether there's still much practical problem given the
v11+ behavior, so I'm not in a hurry to complicate things more.)

Another idea that might conceivably be useful to you on 9.6 is to
reorganize the bulk deletions so that most of them aren't at the endpoint
of the live id range.  If you made it so that the extremal values are
deleted last, you'd never hit this behavior.

regards, tom lane




Re: Degraded performance during table rewrite

2020-07-03 Thread Mohamed Wael Khobalatte
> Another idea that might conceivably be useful to you on 9.6 is to
> reorganize the bulk deletions so that most of them aren't at the endpoint
> of the live id range.  If you made it so that the extremal values are
> deleted last, you'd never hit this behavior.
>
> regards, tom lane

Hm, I am not sure I understood your suggestion. We'd still need to find a
range of ids to delete, and the ORDER BY id ASC has the nice property that
it moves old records first, which helps tables with a lot of activity on
recently created tuples (and I suppose ordering in the other direction
would suffer from the same problem).