restore and recovery using WAL: unknown messages in serverlog

2019-09-03 Thread Matthias Apitz


Hello,

I'm testing WAL-based backup and restore/recovery and encountered some
unclear messages in the 'serverlog' during recovery. The procedure used on a
test server is:

Force a switch so the current WAL segment is copied to the archive:

$ psql -Usisis -c "select pg_switch_wal();"
$ ls -l /data/postgresql11/wal_archive
insgesamt 49212
-rw--- 1 postgres postgres 16777216  3. Sep 07:53 00010001008B
-rw--- 1 postgres postgres 16777216  3. Sep 08:52 00010001008C
-rw--- 1 postgres postgres 16777216  3. Sep 08:54 00010001008D

Now we backup the cluster:

$ pg_basebackup -Usisis -Ft -D /data/postgresql11/backup-20190903-1

$ ls -l /data/postgresql11/backup-20190903-1
insgesamt 6379972
-rw--- 1 postgres postgres 6509913088  3. Sep 09:02 base.tar
-rw--- 1 postgres postgres   16779264  3. Sep 09:02 pg_wal.tar

$ ls -l /data/postgresql11/wal_archive/
insgesamt 65620
-rw--- 1 postgres postgres 16777216  3. Sep 07:53 00010001008B
-rw--- 1 postgres postgres 16777216  3. Sep 08:52 00010001008C
-rw--- 1 postgres postgres 16777216  3. Sep 08:54 00010001008D
-rw--- 1 postgres postgres 16777216  3. Sep 09:02 00010001008E
-rw--- 1 postgres postgres  342  3. Sep 09:02 00010001008E.0028.backup

We stop and "destroy" the cluster (i.e. we move it away):

# /etc/init.d/postgres stop
# mv /data/postgresql11/data /data/postgresql11/data.away
# mkdir /data/postgresql11/data
# chown postgres:postgres /data/postgresql11/data
# chmod 0700 /data/postgresql11/data

We restore the cluster as user 'postgres' and recover based on the WAL:

$ tar xf /data/postgresql11/backup-20190903-1/base.tar -C /data/postgresql11/data/
$ tar xf /data/postgresql11/backup-20190903-1/pg_wal.tar -C /data/postgresql11/data/pg_wal/

For recovery we create a file recovery.conf:

$ vim /data/postgresql11/data/recovery.conf

  restore_command = 'cp /data/postgresql11/wal_archive/%f %p'

Now start PostgreSQL again:

# /etc/init.d/postgres start

In the 'serverlog' there are some errors (marked with ***>); sorry that the
messages are in German, they mean 'cp: stat failed for ..., no such file or
directory':

 2019-09-03 09:18:44.355 CEST [25388] LOG:  starte Wiederherstellung aus 
Archiv
 2019-09-03 09:18:45.249 CEST [25388] LOG:  Logdatei 
»00010001008E« aus Archiv wiederhergestellt
 2019-09-03 09:18:45.306 CEST [25388] LOG:  Redo beginnt bei 1/8E28
 2019-09-03 09:18:45.308 CEST [25388] LOG:  konsistenter 
Wiederherstellungszustand erreicht bei 1/8E000130
 2019-09-03 09:18:45.308 CEST [25387] LOG:  Datenbanksystem ist bereit, um 
lesende Verbindungen anzunehmen
 2019-09-03 09:18:46.024 CEST [25388] LOG:  Logdatei 
»00010001008F« aus Archiv wiederhergestellt
***> cp: der Aufruf von stat für 
'/data/postgresql11/wal_archive/000100010090' ist nicht möglich: 
Datei oder Verzeichnis nicht gefunden
 2019-09-03 09:18:47.838 CEST [25388] LOG:  Redo fertig bei 1/8F000140
 2019-09-03 09:18:47.869 CEST [25388] LOG:  Logdatei 
»00010001008F« aus Archiv wiederhergestellt
***> cp: der Aufruf von stat für 
'/data/postgresql11/wal_archive/0002.history' ist nicht möglich: Datei oder 
Verzeichnis nicht gefunden
 2019-09-03 09:18:47.900 CEST [25388] LOG:  gewählte neue Zeitleisten-ID: 2
 2019-09-03 09:18:47.961 CEST [25388] LOG:  Wiederherstellung aus Archiv 
abgeschlossen
***> cp: der Aufruf von stat für 
'/data/postgresql11/wal_archive/0001.history' ist nicht möglich: Datei oder 
Verzeichnis nicht gefunden
 2019-09-03 09:18:48.072 CEST [25387] LOG:  Datenbanksystem ist bereit, um 
Verbindungen anzunehmen
 
What is causing the errors about the missing files in 
/data/postgresql11/wal_archive?

Btw: Is there a way to run the server with a German UTF-8 locale, but with
English messages in the log?

Thanks

matthias

-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub




Re: implicit transaction changes trigger behaviour

2019-09-03 Thread Willy-Bas Loos
Thank you so much, the "stable" thing was it.
I'm not sure it is under-documented; I clearly didn't adhere to the rule
that a stable function "is guaranteed to return the same results given the
same arguments for all rows within a single statement".
BTW in my example I made a mistake too, but that was beside the point
really.
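
For anyone hitting the same thing, the fix boils down to declaring the
trigger function without the STABLE marker (VOLATILE is the default), so
that it sees the changes made by the statement that fired it. A minimal
sketch with hypothetical table and column names:

CREATE OR REPLACE FUNCTION check_no_references() RETURNS trigger
LANGUAGE plpgsql  -- VOLATILE by default; do not mark it STABLE,
                  -- or it will run with the calling query's snapshot
AS $$
BEGIN
  -- hypothetical check: refuse the delete if child rows still point here
  IF EXISTS (SELECT 1 FROM child WHERE parent_id = OLD.id) THEN
    RAISE EXCEPTION 'row % is still referenced', OLD.id;
  END IF;
  RETURN NULL;  -- return value is ignored for AFTER/constraint triggers
END;
$$;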

Cheers,
Willy-Bas

On Thu, Aug 29, 2019 at 3:35 PM Tom Lane  wrote:

> Willy-Bas Loos  writes:
> > I currently have a fairly complex use case to solve and one thing i tried
> > was a deferred constraint trigger. I'm not sure if this solution is the
> way
> > to go, but anyway: As i was testing my code, i noticed that the trigger
> > behaves differently depending on whether or not i explicitly use BEGIN
> and
> > COMMIT, even though there is only 1 query in the transaction.
> > I am wondering if this is a bug in postgresql?
>
> I think the issue is that you marked the trigger as STABLE.  That causes
> it to use the calling query's snapshot so it doesn't see the updates,
> if it's fired during the delete query and not during the subsequent
> COMMIT.  If I remove the STABLE label then it works as you expect.
>
> This is probably under-documented but I'm not sure that it should be
> considered a bug.
>
> The trigger seems a bit broken besides that, in that the comments claim it
> has something to do with the OLD row's id field(s) but the query is not in
> fact taking that into account.
>
> regards, tom lane
>


-- 
Willy-Bas Loos


Re: restore and recovery using WAL: unknown messages in serverlog

2019-09-03 Thread Luca Ferrari
On Tue, Sep 3, 2019 at 9:57 AM Matthias Apitz  wrote:
>  2019-09-03 09:18:46.024 CEST [25388] LOG:  Logdatei 
> »00010001008F« aus Archiv wiederhergestellt
> ***> cp: der Aufruf von stat für 
> '/data/postgresql11/wal_archive/000100010090' ist nicht möglich: 
> Datei oder Verzeichnis nicht gefunden

It's not an error: it is searching for the next WAL segment (after 0x8F
that would be 0x90), but it is not in the archive, so the recovery has
finished. The cluster does not know where to stop because your
recovery.conf did not specify a target, so it replays as much WAL as it
can and stops when it finds no more.
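
If you wanted the recovery to stop at a defined point instead, you could
add a recovery target to recovery.conf; a minimal sketch (the timestamp is
only an example):

  restore_command = 'cp /data/postgresql11/wal_archive/%f %p'
  recovery_target_time = '2019-09-03 09:00:00'
  recovery_target_action = 'promote'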

>  2019-09-03 09:18:47.838 CEST [25388] LOG:  Redo fertig bei 1/8F000140

Redo is done.


> ***> cp: der Aufruf von stat für 
> '/data/postgresql11/wal_archive/0002.history' ist nicht möglich: Datei 
> oder Verzeichnis nicht gefunden

having finished the WALs, it is searching for a branch on timeline 2

>  2019-09-03 09:18:47.900 CEST [25388] LOG:  gewählte neue Zeitleisten-ID: 
> 2

and it switches to timeline 2 on its own

Pretty much that should be what you see in the logs.

> Btw: Is there away to run the server for German UTF-8, but with English
> messages in the log?

Set lc_messages in postgresql.conf to an English locale; I think
'en_US.UTF-8' could do the trick, but I haven't tried it.
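
A minimal sketch, assuming the en_US.UTF-8 locale is installed on the
server (a reload is enough for it to take effect):

  # in postgresql.conf
  lc_messages = 'en_US.UTF-8'

or, from psql as a superuser:

  ALTER SYSTEM SET lc_messages = 'en_US.UTF-8';
  SELECT pg_reload_conf();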

Luca




Re: pgmodeler ?

2019-09-03 Thread Tony Shelver
<<
On Sun, 1 Sep 2019 at 13:45, Thiemo Kellner 
wrote:

> Quoting Olivier Gautherot :
>
> This is the specific error message:
>
>>
>> Could not execute the SQL command.
>> Message returned: ERROR: column pr.proisagg does not exist
>> LINE 1: ...namespace AS ns ON pr.pronamespace = ns.oid WHERE pr.proisag...
>> ^
>> HINT: Perhaps you meant to reference the column
>>
>> --
>> "They that would give up essential liberty for temporary safety deserve
>> neither liberty nor safety."
>> -- Benjamin Franklin
>
>
> Can you confirm the versions of pgModeler and PG? Pgmodeler 0.91 does not
> fully support PG 11.x and there is an update in preparation in 0.92 which
> should. You may want to give it a try.
>
>
> Actually, this is a known problem:
> https://github.com/pgmodeler/pgmodeler/issues/1281
>
> Maybe you want to give the beta a shot: https://pgmodeler.io/download
> >>
> I built the 0.92 beta and have been using it quite successfully against pg11
> for a while
>


Slow statement using parallelism after 9.6>11 upgrade

2019-09-03 Thread Arnaud L.

Hi list

We have upgraded our database from 9.6 to 11 (and updated PostGIS from 
2.3 to 2.5 as well).
We are using it among other things to store an OpenStreetMap database, 
populated by the osm2pgsql program and updated on a daily basis.


The process used to take about 1h30 before the upgrade, but after the 
upgrade it looks like it will never run to completion in a 24h time frame.


The problem is apparently that the planner chooses to run some statements 
in parallel, but these take a lot longer than when run without parallelism.
Here are the two EXPLAIN ANALYZE outputs of the same statement, with 
parallelism on and off:


Statement :
PREPARE mark_ways_by_node(int8) AS select id from planet_osm_ways WHERE 
nodes && ARRAY[$1];

EXPLAIN EXECUTE mark_ways_by_node(1);


max_parallel_workers_per_gather = 2 :

Gather  (cost=12545.61..2357352.25 rows=1420982 width=8) (actual 
time=2.577..64.028 rows=1 loops=1)

  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Bitmap Heap Scan on planet_osm_ways 
(cost=11545.61..2214254.05 rows=592076 width=8) (actual 
time=0.213..0.213 rows=0 loops=3)

Recheck Cond: (nodes && '{1}'::bigint[])
Heap Blocks: exact=1
->  Bitmap Index Scan on planet_osm_ways_nodes_idx 
(cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.173..0.174 
rows=1 loops=1)

  Index Cond: (nodes && '{1}'::bigint[])
Planning Time: 8.596 ms
Execution Time: 64.135 ms


max_parallel_workers_per_gather = 0 :

Bitmap Heap Scan on planet_osm_ways  (cost=11545.61..3462154.46 
rows=1420982 width=8) (actual time=0.677..0.679 rows=1 loops=1)

  Recheck Cond: (nodes && '{1}'::bigint[])
  Heap Blocks: exact=1
  ->  Bitmap Index Scan on planet_osm_ways_nodes_idx 
(cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.268..0.268 
rows=1 loops=1)

Index Cond: (nodes && '{1}'::bigint[])
Planning Time: 0.193 ms
Execution Time: 0.773 ms


So this Parallel Bitmap Heap Scan seems to be quite problematic here.
What runtime setting should I change to fix this, without losing the 
benefit of parallelism?


Thanks a lot !

Cheers
--
Arnaud




Re: Slow statement using parallelism after 9.6>11 upgrade

2019-09-03 Thread Tom Lane
"Arnaud L."  writes:
> We have upgraded our database from 9.6 to 11 (and updated PostGIS from 
> 2.3 to 2.5 as well).
> ...

Have you re-ANALYZEd the database?  The problem with this query
seems to be the spectacularly awful rowcount estimate here:

>->  Bitmap Index Scan on planet_osm_ways_nodes_idx 
> (cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.268..0.268 
> rows=1 loops=1)
>  Index Cond: (nodes && '{1}'::bigint[])

The planner should be able to do better than that, given up-to-date
statistics on the "nodes" column.
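
For what it's worth, a minimal sketch of raising the per-column statistics
target on that column and re-analyzing (the value 1000 is just an example):

  ALTER TABLE planet_osm_ways ALTER COLUMN nodes SET STATISTICS 1000;
  ANALYZE planet_osm_ways;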

regards, tom lane




Re: Slow statement using parallelism after 9.6>11 upgrade

2019-09-03 Thread Arnaud L.

On 03/09/2019 15:43, Tom Lane wrote:

"Arnaud L."  writes:
We have upgraded our database from 9.6 to 11 (and updated PostGIS from 
2.3 to 2.5 as well).

...


Have you re-ANALYZEd the database?  The problem with this query
seems to be the spectacularly awful rowcount estimate here:


You mean after the upgrade process? Yes I have.
I've just re-run "ANALYZE table" to rule this out; the estimates are the same.

Maybe some statistics target problem? The estimated number of rows is 
284,196,352.

Also, this is a GIN index on a bigint[] column.

I've set parallel_tuple_cost to 1.0 and parallel_setup_cost to 5000.0 for 
the time being, which solves this specific problem. These values don't 
look very sensible though; they are very high compared to the defaults.


Cheers
--
Arnaud




Re: Slow statement using parallelism after 9.6>11 upgrade

2019-09-03 Thread Arnaud L.

On 03/09/2019 16:39, Paul Ramsey wrote:

On Sep 3, 2019, at 7:06 AM, Arnaud L.  wrote:
I've setup parallel_tuple_cost to 1.0 parallel_setup_cost to 5000.0 for the 
time being which solves this specific problem. These value don't look very 
sensible though, they are very high compared to the default ones.


You can also leave that setting unchanged and just change the behaviour on your 
one table:

ALTER TABLE nodes SET ( parallel_workers = 0);


Nice, this will fix this precise query and leave parallelism in place 
for the other ones. Thanks !


--
Arnaud




Re: Slow statement using parallelism after 9.6>11 upgrade

2019-09-03 Thread Arnaud L.

On 03/09/2019 17:02, Imre Samu wrote:

 > We have upgraded our database from 9.6 to 11

This is now the latest PG ?   PG11.5?


Yes, PostgreSQL 11.5, compiled by Visual C++ build 1914, 64-bit.


- the osm2pgsql has an own parallelizations ...  ( osm2pgsql  
--number-processes .. )
    so be careful to add more parallelisation to the PG side with the 
same osm2pgsql parameters !   ( check the memory usages! )
- check the benchmarks and the tunnings: 
https://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks
- you can ask help on the  : "osm dev mail list" (="dev OpenStreetMap 
developer discusssion" https://lists.openstreetmap.org/listinfo/dev )



Yes, I know. It has been fine-tuned over some time now to fit our 
server's capacity.
Disabling parallelism allowed the update to run in ~5000s, which is 
consistent with what we could observe before the upgrade.


I had started writing an issue on the osm2pgsql github before realizing 
it was probably a PG-only issue.



Cheers
--
Arnaud





Re: Postgres HA issue - Standby server does not start after Master compute host is shut down

2019-09-03 Thread Jehan-Guillaume (ioguix) de Rorthais
On Tue, 3 Sep 2019 20:27:37 +0530
Nagendra Bandi  wrote:
...
> *Problem Description :*
> 
> PostgreSQL database is set up for High Availability and *asynchronous
> streaming replication* with *hot standby* as described in
> https://www.postgresql.org/docs/9.4/warm-standby.html#STREAMING-REPLICATION.
> There is only 1 master and 1 standby server set up. The deployment is on
> Openstack cloud where master database is running on one compute host and
> standby server is running on another compute host. When master is brought
> down (*compute host is powered off*), stand by is *not* able to take over.
> The PostgreSQL database server on the standby fails to start with errors
> messages given below.
> 
> *Error logs from Standby Postgres DB:*
> 
> *Logs form postgres log :
> *< 2019-08-22 16:00:04.110 UTC >LOG: invalid record length at C/513BF150
> 
> < 2019-08-22 16:00:04.110 UTC >LOG: redo done at C/513BF118
> 
> < 2019-08-22 16:00:04.115 UTC >FATAL: WAL ends before end of online backup

It looks like the standby never received the end of online backup record from
the master.

Could you describe how you built your standby?

Regards,




Re: Slow statement using parallelism after 9.6>11 upgrade

2019-09-03 Thread Imre Samu
> We have upgraded our database from 9.6 to 11

This is now the latest PG ?   PG11.5?
( for example - in PG11.5 fixed: * "Fix failure to resolve deadlocks
involving multiple parallel worker processes"*
https://www.postgresql.org/docs/current/release-11-5.html )

> populated by the osm2pgsql program and updated on a daily basis.
> What runtime setting should I change to fix this, without losing the
benefit of parallelism ?

- osm2pgsql has its own parallelization (osm2pgsql --number-processes ...),
   so be careful about adding more parallelism on the PG side with the same
osm2pgsql parameters!   (check the memory usage!)
- check the benchmarks and the tunings:
https://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks
- you can ask for help on the "osm dev mail list" (= "dev OpenStreetMap
developer discussion" https://lists.openstreetmap.org/listinfo/dev )

Imre




Arnaud L. wrote (on Tue, 3 Sep 2019, 14:11):

> Hi list
>
> We have upgraded our database from 9.6 to 11 (and updated PostGIS from
> 2.3 to 2.5 as well).
> We are using it among other things to store an OpenStreetMap database,
> populated by the osm2pgsql program and updated on a daily basis.
>
> The process used to take ~1h30 minutes before the upgrade, but after the
> upgrade it looks like it will never run to completion in a 24h time frame.
>
> The problem is apparently that the planner choses to run some statements
> in parallel, but these take a lot longer than when run without it.
> Here are the to explain analyze of the same statement, with parallelism
> on and off :
>
> Statement :
> PREPARE mark_ways_by_node(int8) AS select id from planet_osm_ways WHERE
> nodes && ARRAY[$1];
> EXPLAIN EXECUTE mark_ways_by_node(1);
>
>
> max_parallel_workers_per_gather = 2 :
>
> Gather  (cost=12545.61..2357352.25 rows=1420982 width=8) (actual
> time=2.577..64.028 rows=1 loops=1)
>Workers Planned: 2
>Workers Launched: 2
>->  Parallel Bitmap Heap Scan on planet_osm_ways
> (cost=11545.61..2214254.05 rows=592076 width=8) (actual
> time=0.213..0.213 rows=0 loops=3)
>  Recheck Cond: (nodes && '{1}'::bigint[])
>  Heap Blocks: exact=1
>  ->  Bitmap Index Scan on planet_osm_ways_nodes_idx
> (cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.173..0.174
> rows=1 loops=1)
>Index Cond: (nodes && '{1}'::bigint[])
> Planning Time: 8.596 ms
> Execution Time: 64.135 ms
>
>
> max_parallel_workers_per_gather = 0 :
>
> Bitmap Heap Scan on planet_osm_ways  (cost=11545.61..3462154.46
> rows=1420982 width=8) (actual time=0.677..0.679 rows=1 loops=1)
>Recheck Cond: (nodes && '{1}'::bigint[])
>Heap Blocks: exact=1
>->  Bitmap Index Scan on planet_osm_ways_nodes_idx
> (cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.268..0.268
> rows=1 loops=1)
>  Index Cond: (nodes && '{1}'::bigint[])
> Planning Time: 0.193 ms
> Execution Time: 0.773 ms
>
>
> So this Parallel Bitmap Heap Scan seems to be quite problematic here.
> What runtime setting should I change to fix this, without losing the
> benefit of parallelism ?
>
> Thanks a lot !
>
> Cheers
> --
> Arnaud
>
>
>


Re: Postgres HA issue - Standby server does not start after Master compute host is shut down

2019-09-03 Thread Nagendra Bandi
Hi Jehan,
Thanks for your quick response.
The standby is built from the primary using pg_basebackup, i.e. an initial
copy of the primary database is taken with the pg_basebackup command and
then the server is restarted.
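
For reference, a minimal sketch of an invocation that also streams the WAL
generated during the backup and writes the recovery settings (host, user
and data directory are placeholders):

  pg_basebackup -h primary-host -U replicator -D /var/lib/pgsql/data \
      -X stream -R --checkpoint=fast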
Regards,
Nagendra

On Tue, Sep 3, 2019 at 8:43 PM Jehan-Guillaume (ioguix) de Rorthais <
iog...@free.fr> wrote:

> On Tue, 3 Sep 2019 20:27:37 +0530
> Nagendra Bandi  wrote:
> ...
> > *Problem Description :*
> >
> > PostgreSQL database is set up for High Availability and *asynchronous
> > streaming replication* with *hot standby* as described in
> >
> https://www.postgresql.org/docs/9.4/warm-standby.html#STREAMING-REPLICATION
> .
> > There is only 1 master and 1 standby server set up. The deployment is on
> > Openstack cloud where master database is running on one compute host and
> > standby server is running on another compute host. When master is brought
> > down (*compute host is powered off*), stand by is *not* able to take
> over.
> > The PostgreSQL database server on the standby fails to start with errors
> > messages given below.
> >
> > *Error logs from Standby Postgres DB:*
> >
> > *Logs form postgres log :
> > *< 2019-08-22 16:00:04.110 UTC >LOG: invalid record length at C/513BF150
> >
> > < 2019-08-22 16:00:04.110 UTC >LOG: redo done at C/513BF118
> >
> > < 2019-08-22 16:00:04.115 UTC >FATAL: WAL ends before end of online
> backup
>
> It looks like the standby never received the end of online backup record
> from
> the master.
>
> Could you describe how you built your standby?
>
> Regards,
>


Re: Slow statement using parallelism after 9.6>11 upgrade

2019-09-03 Thread Paul Ramsey



> On Sep 3, 2019, at 7:06 AM, Arnaud L.  wrote:
> 
> On 03/09/2019 15:43, Tom Lane wrote:
>> "Arnaud L."  writes:
>>> We have upgraded our database from 9.6 to 11 (and updated PostGIS from 2.3 
>>> to 2.5 as well).
>>> ...
>> Have you re-ANALYZEd the database?  The problem with this query
>> seems to be the spectacularly awful rowcount estimate here:
> 
> You mean after the upgrade process ? Yes I have.
> I've juste re-run "ANALYZE table" to rule this out, estimates are the same.
> 
> Maybe some statistic target problem ? Estimated number of rows is 284.196.352
> Also, this is a GIN index on a bigint[] column.
> 
> I've setup parallel_tuple_cost to 1.0 parallel_setup_cost to 5000.0 for the 
> time being which solves this specific problem. These value don't look very 
> sensible though, they are very high compared to the default ones.

You can also leave that setting unchanged and just change the behaviour on your 
one table:

ALTER TABLE nodes SET ( parallel_workers = 0);

P.

> 
> Cheers
> --
> Arnaud
> 
> 





Re: Bad Estimate for multi tenant database queries

2019-09-03 Thread Michael Lewis
>
> CREATE STATISTICS MT_ReservationDepartureUtc (ndistinct) on "AccountCode",
> "DepartureUtc" from "Reservation"."Reservation";
> CREATE STATISTICS MT_ReservationArrivalUtc (ndistinct) on "AccountCode",
> "ArrivalUtc" from "Reservation"."Reservation";
> CREATE STATISTICS MT_ReservationNoShowFeeId (ndistinct) on "AccountCode",
> "NoShowFeeId" from "Reservation"."Reservation";
> CREATE STATISTICS MT_ReservationTimeSliceDefinitionId (ndistinct) on
> "AccountCode", "TimeSliceDefinitionId" from "Reservation"."Reservation";
>
> but that didn't help either
>

Did you try with 'dependencies' for the statistics_kind or only ndistinct?
What was default_statistics_target set to at the time you created the
extended statistics? I am not sure if that value is used, but I would
assume so.
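
For example, a minimal sketch that covers both kinds on one of the column
pairs from above (the statistics object name is made up, and extended
statistics are only filled in by a subsequent ANALYZE):

CREATE STATISTICS mt_reservation_departure_ext (ndistinct, dependencies)
    ON "AccountCode", "DepartureUtc" FROM "Reservation"."Reservation";
ANALYZE "Reservation"."Reservation";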


Re: literal vs dynamic partition constraint in plan execution

2019-09-03 Thread Michael Lewis
I think I would expect this behavior given how you have defined the
constraints as function results rather than as plain comparisons on the
timestamp field.

Instead of this-
Check constraints:
> "y2018_mis_ora_check" CHECK (date_part('year'::text, mis_ora) =
> 2018::double precision)

I would expect this-
Check constraints:
> "y2018_mis_ora_check" CHECK (mis_ora >= make_timestamp(2018, 1, 1,
> 0, 0, 0::double precision) AND mis_ora < make_timestamp(2019, 1, 1,
> 0, 0, 0::double precision))


Re: Upgrade 96 -> 11

2019-09-03 Thread Adrian Klaver

On 9/2/19 5:52 PM, Adrian Klaver wrote:

It's still creating the schema elements when it fails, it hasn't 
started linking yet


Alright, at least you still have a working 9.6 cluster.

Not sure where to go from here. Like you I am not sure how it can CREATE 
EXTENSION and not actually follow through on that. Especially with no 
errors for that operation. I'm going to have to think on this. Hopefully 
someone else has an idea on this and can chime in.


Aah. I don't have postgis installed, still:

pg_dump --binary-upgrade -s  -d production -U postgres > 
production_binary.sql


-- For binary upgrade, create an empty extension and insert objects into it
DROP EXTENSION IF EXISTS tablefunc;
SELECT pg_catalog.binary_upgrade_create_empty_extension('tablefunc', 
'public', true, '1.0', NULL, NULL, ARRAY[]::pg_catalog.text[]);



Try the above on your schema and see what you get.






 >
 > I have set PGBINOLD, PGBINNEW, PGDATAOLD, PGDATANEW correctly.



--
Adrian Klaver
adrian.kla...@aklaver.com




Running a Simple Update Statement Fails, Second Time Suceeds.

2019-09-03 Thread Kumar, Virendra
Here is a simple query I am running via a function call. This statement runs fine 
as SQL, but when put in a function as plpgsql it fails with the error below; when I 
run it a second time in the same session it succeeds:
--
product_master_fdw=> \set VERBOSITY verbose
product_master_fdw=>
select sddf_update.tps_update_1(p_pres_id_in=>50241::integer,
  p_last_audit_update_dt_in=>'2019-09-03 
12:44:21.356638'::timestamp ,
  
p_audit_update_user_name_tx_in=>'a...@xyz.com'::character varying,
  p_major_class_name_tx_in=>'TEST0826222'::character 
varying,
  p_effective_dt_in=>CURRENT_TIMESTAMP::timestamp
 );
ERROR:  XX000: cache lookup failed for type 196609
CONTEXT:  SQL statement "UPDATE product_history.external_sys_class_code_pres
 SET class_code_id =
CASE WHEN p_class_code_id_in='0.1' THEN  class_code_id  
ELSE p_class_code_id_in END ,
 major_classification_cd =
CASE WHEN p_major_classification_cd_in='.1' THEN  
major_classification_cd  ELSE p_major_classification_cd_in END ,
 major_classification_name_tx =
CASE WHEN p_major_class_name_tx_in='0.1' THEN  
major_classification_name_tx  ELSE p_major_class_name_tx_in END ,
 coverage_short_name_tx =
CASE WHEN p_coverage_short_name_tx_in='0.1' THEN  
coverage_short_name_tx  ELSE p_coverage_short_name_tx_in END ,
 coverage_name_tx =
CASE WHEN p_coverage_name_tx_in='0.1' THEN  
coverage_name_tx  ELSE p_coverage_name_tx_in END ,
 cdt_source_system_cd =
CASE WHEN p_cdt_source_system_cd_in='0.1' THEN  
cdt_source_system_cd  ELSE p_cdt_source_system_cd_in END ,
 cdt_submitting_country_cd =
CASE WHEN p_cdt_submitting_country_cd_in='0.1' THEN  
cdt_submitting_country_cd  ELSE p_cdt_submitting_country_cd_in END ,
 cdt_status_cd =
CASE WHEN p_cdt_status_cd_in='0.1' THEN  cdt_status_cd  
ELSE p_cdt_status_cd_in END ,
 effective_dt =
CASE WHEN p_effective_dt_in=TO_DATE('01/01/1600', 'mm/dd/') 
THEN  effective_dt  ELSE p_effective_dt_in END ,
 expiration_dt =
CASE WHEN p_expiration_dt_in=TO_DATE('01/01/1600', 
'mm/dd/') THEN  expiration_dt  ELSE p_expiration_dt_in END ,
 audit_insert_user_name_tx =
CASE WHEN p_audit_insert_user_name_tx_in='0.1' THEN  
audit_insert_user_name_tx  ELSE p_audit_insert_user_name_tx_in END ,
 audit_update_dt = CURRENT_TIMESTAMP,
 audit_update_user_name_tx =
CASE WHEN p_audit_update_user_name_tx_in='0.1' THEN  
audit_update_user_name_tx  ELSE p_audit_update_user_name_tx_in END ,
 latest_version_in =
CASE WHEN p_latest_version_in_in='0' THEN  latest_version_in  
ELSE p_latest_version_in_in END ,
 delete_in =
CASE WHEN p_delete_in_in='0' THEN  delete_in  ELSE 
p_delete_in_in END
   WHERE pres_id = p_pres_id_in
 AND audit_update_dt = p_last_audit_update_dt_in"
PL/pgSQL function px_co_pr_pres_pg.spt_update_1(bigint,timestamp without time 
zone,timestamp without time zone,character varying,character varying,character 
varying,character varying,character varying,character varying,character 
varying,character varying,character varying,character varying,character 
varying,character varying,timestamp without time zone) line 7 at SQL statement
LOCATION:  getTypeOutputInfo, lsyscache.c:2681


Regards,
Virendra



This message is intended only for the use of the addressee and may contain
information that is PRIVILEGED AND CONFIDENTIAL.

If you are not the intended recipient, you are hereby notified that any
dissemination of this communication is strictly prohibited. If you have
received this communication in error, please erase all copies of the message
and its attachments and notify the sender immediately. Thank you.


RE: Running a Simple Update Statement Fails, Second Time Suceeds.

2019-09-03 Thread Kumar, Virendra
The type ID doesn't seem to exist:

product_master_fdw=> select * from pg_type where oid=196609;
typname | typnamespace | typowner | typlen | typbyval | typtype | typcategory | 
typispreferred | typisdefined | typdelim | typrelid | typelem | typarray | 
typinput | typoutput | typreceive | t
ypsend | typmodin | typmodout | typanalyze | typalign | typstorage | typnotnull 
| typbasetype | typtypmod | typndims | typcollation | typdefaultbin | 
typdefault | typacl
-+--+--++--+-+-++--+--+--+-+--+--+---++--
---+--+---++--+++-+---+--+--+---++
(0 rows)

Regards,
Virendra

From: Kumar, Virendra
Sent: Tuesday, September 03, 2019 4:56 PM
To: pgsql-general@lists.postgresql.org
Subject: Running a Simple Update Statement Fails, Second Time Suceeds.

Here is simple query, I am running via function call. This statement runs fine 
as SQL but when put in function as plpgsql it failes with error below, when I 
ran second times in same session it succeeds:
--
product_master_fdw=> \set VERBOSITY verbose
product_master_fdw=>
select sddf_update.tps_update_1(p_pres_id_in=>50241::integer,
  p_last_audit_update_dt_in=>'2019-09-03 
12:44:21.356638'::timestamp ,
  
p_audit_update_user_name_tx_in=>'a...@xyz.com'::character varying,
  p_major_class_name_tx_in=>'TEST0826222'::character 
varying,
  p_effective_dt_in=>CURRENT_TIMESTAMP::timestamp
 );
ERROR:  XX000: cache lookup failed for type 196609
CONTEXT:  SQL statement "UPDATE product_history.external_sys_class_code_pres
 SET class_code_id =
CASE WHEN p_class_code_id_in='0.1' THEN  class_code_id  
ELSE p_class_code_id_in END ,
 major_classification_cd =
CASE WHEN p_major_classification_cd_in='.1' THEN  
major_classification_cd  ELSE p_major_classification_cd_in END ,
 major_classification_name_tx =
CASE WHEN p_major_class_name_tx_in='0.1' THEN  
major_classification_name_tx  ELSE p_major_class_name_tx_in END ,
 coverage_short_name_tx =
CASE WHEN p_coverage_short_name_tx_in='0.1' THEN  
coverage_short_name_tx  ELSE p_coverage_short_name_tx_in END ,
 coverage_name_tx =
CASE WHEN p_coverage_name_tx_in='0.1' THEN  
coverage_name_tx  ELSE p_coverage_name_tx_in END ,
 cdt_source_system_cd =
CASE WHEN p_cdt_source_system_cd_in='0.1' THEN  
cdt_source_system_cd  ELSE p_cdt_source_system_cd_in END ,
 cdt_submitting_country_cd =
CASE WHEN p_cdt_submitting_country_cd_in='0.1' THEN  
cdt_submitting_country_cd  ELSE p_cdt_submitting_country_cd_in END ,
 cdt_status_cd =
CASE WHEN p_cdt_status_cd_in='0.1' THEN  cdt_status_cd  
ELSE p_cdt_status_cd_in END ,
 effective_dt =
CASE WHEN p_effective_dt_in=TO_DATE('01/01/1600', 'mm/dd/') 
THEN  effective_dt  ELSE p_effective_dt_in END ,
 expiration_dt =
CASE WHEN p_expiration_dt_in=TO_DATE('01/01/1600', 
'mm/dd/') THEN  expiration_dt  ELSE p_expiration_dt_in END ,
 audit_insert_user_name_tx =
CASE WHEN p_audit_insert_user_name_tx_in='0.1' THEN  
audit_insert_user_name_tx  ELSE p_audit_insert_user_name_tx_in END ,
 audit_update_dt = CURRENT_TIMESTAMP,
 audit_update_user_name_tx =
CASE WHEN p_audit_update_user_name_tx_in='0.1' THEN  
audit_update_user_name_tx  ELSE p_audit_update_user_name_tx_in END ,
 latest_version_in =
CASE WHEN p_latest_version_in_in='0' THEN  latest_version_in  
ELSE p_latest_version_in_in END ,
 delete_in =
CASE WHEN p_delete_in_in='0' THEN  delete_in  ELSE 
p_delete_in_in END
   WHERE pres_id = p_pres_id_in
 AND audit_update_dt = p_last_audit_update_dt_in"
PL/pgSQL function px_co_pr_pres_pg.spt_update_1(bigint,timestamp without time 
zone,timestamp without time zone,character varying,character varying,character 
varying,character varying,character varying,character varying,character 
varying,character varying,character varying,character varying,character 
varying,character varying,timestamp without time zone) line 7 at SQL statement
LOCATION:  getTypeOutputInfo, lsyscache.c:2681


Regards,
Virendra




Re: Running a Simple Update Statement Fails, Second Time Suceeds.

2019-09-03 Thread Adrian Klaver

On 9/3/19 1:56 PM, Kumar, Virendra wrote:
Here is simple query, I am running via function call. This statement 
runs fine as SQL but when put in function as plpgsql it failes with 
error below, when I ran second times in same session it succeeds:


We will need to see the function definition.



--

product_master_fdw=> \set VERBOSITY verbose

product_master_fdw=>

select sddf_update.tps_update_1(p_pres_id_in=>50241::integer,

   p_last_audit_update_dt_in=>'2019-09-03 
12:44:21.356638'::timestamp ,


   
p_audit_update_user_name_tx_in=>'a...@xyz.com'::character varying,


   
p_major_class_name_tx_in=>'TEST0826222'::character varying,


   p_effective_dt_in=>CURRENT_TIMESTAMP::timestamp

  );

ERROR:  XX000: cache lookup failed for type 196609

CONTEXT:  SQL statement "UPDATE product_history.external_sys_class_code_pres

  SET class_code_id =

     CASE WHEN p_class_code_id_in='0.1' THEN  
class_code_id  ELSE p_class_code_id_in END ,


  major_classification_cd =

     CASE WHEN p_major_classification_cd_in='.1' 
THEN  major_classification_cd  ELSE p_major_classification_cd_in END ,


  major_classification_name_tx =

     CASE WHEN p_major_class_name_tx_in='0.1' THEN  
major_classification_name_tx  ELSE p_major_class_name_tx_in END ,


  coverage_short_name_tx =

     CASE WHEN p_coverage_short_name_tx_in='0.1' 
THEN  coverage_short_name_tx  ELSE p_coverage_short_name_tx_in END ,


  coverage_name_tx =

     CASE WHEN p_coverage_name_tx_in='0.1' THEN  
coverage_name_tx  ELSE p_coverage_name_tx_in END ,


  cdt_source_system_cd =

     CASE WHEN p_cdt_source_system_cd_in='0.1' THEN  
cdt_source_system_cd  ELSE p_cdt_source_system_cd_in END ,


  cdt_submitting_country_cd =

     CASE WHEN p_cdt_submitting_country_cd_in='0.1' 
THEN  cdt_submitting_country_cd  ELSE p_cdt_submitting_country_cd_in END ,


  cdt_status_cd =

     CASE WHEN p_cdt_status_cd_in='0.1' THEN  
cdt_status_cd  ELSE p_cdt_status_cd_in END ,


  effective_dt =

     CASE WHEN p_effective_dt_in=TO_DATE('01/01/1600', 
'mm/dd/') THEN  effective_dt  ELSE p_effective_dt_in END ,


  expiration_dt =

     CASE WHEN p_expiration_dt_in=TO_DATE('01/01/1600', 
'mm/dd/') THEN  expiration_dt  ELSE p_expiration_dt_in END ,


  audit_insert_user_name_tx =

     CASE WHEN p_audit_insert_user_name_tx_in='0.1' 
THEN  audit_insert_user_name_tx  ELSE p_audit_insert_user_name_tx_in END ,


  audit_update_dt = CURRENT_TIMESTAMP,

      audit_update_user_name_tx =

     CASE WHEN p_audit_update_user_name_tx_in='0.1' 
THEN  audit_update_user_name_tx  ELSE p_audit_update_user_name_tx_in END ,


  latest_version_in =

     CASE WHEN p_latest_version_in_in='0' THEN  
latest_version_in  ELSE p_latest_version_in_in END ,


  delete_in =

     CASE WHEN p_delete_in_in='0' THEN  delete_in  ELSE 
p_delete_in_in END


    WHERE pres_id = p_pres_id_in

  AND audit_update_dt = p_last_audit_update_dt_in"

PL/pgSQL function px_co_pr_pres_pg.spt_update_1(bigint,timestamp without 
time zone,timestamp without time zone,character varying,character 
varying,character varying,character varying,character varying,character 
varying,character varying,character varying,character varying,character 
varying,character varying,character varying,timestamp without time zone) 
line 7 at SQL statement


LOCATION:  getTypeOutputInfo, lsyscache.c:2681

Regards,

Virendra




This message is intended only for the use of the addressee and may contain
information that is PRIVILEGED AND CONFIDENTIAL.

If you are not the intended recipient, you are hereby notified that any
dissemination of this communication is strictly prohibited. If you have
received this communication in error, please erase all copies of the message
and its attachments and notify the sender immediately. Thank you.



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Running a Simple Update Statement Fails, Second Time Suceeds.

2019-09-03 Thread Tom Lane
Adrian Klaver  writes:
> On 9/3/19 1:56 PM, Kumar, Virendra wrote:
>> Here is simple query, I am running via function call. This statement 
>> runs fine as SQL but when put in function as plpgsql it failes with 
>> error below, when I ran second times in same session it succeeds:

> We will need to see the function definition.

Also, what PG version is that?  This looks a bit like bug #15913,
but it might be something else.

>> This message is intended only for the use of the addressee and may contain
>> information that is PRIVILEGED AND CONFIDENTIAL.
>> 
>> If you are not the intended recipient, you are hereby notified that any
>> dissemination of this communication is strictly prohibited. If you have
>> received this communication in error, please erase all copies of the message
>> and its attachments and notify the sender immediately. Thank you.

I rather wonder whether I'm even allowed to read this, let alone
answer it.  You do realize that this sort of add-on is completely
silly on a public mailing list?

regards, tom lane




Re: Running a Simple Update Statement Fails, Second Time Suceeds.

2019-09-03 Thread Kumar, Virendra
Hi Tom,

Sincere apologies for that privacy notice in the email; it is company policy which I 
cannot skip.

Adrian,

The function is really simple. I’ll share the code as soon as I can.

Regards,
Virendra

> On Sep 3, 2019, at 5:41 PM, Tom Lane  wrote:
>
> Adrian Klaver  writes:
>>> On 9/3/19 1:56 PM, Kumar, Virendra wrote:
>>> Here is simple query, I am running via function call. This statement
>>> runs fine as SQL but when put in function as plpgsql it failes with
>>> error below, when I ran second times in same session it succeeds:
>
>> We will need to see the function definition.
>
> Also, what PG version is that?  This looks a bit like bug #15913,
> but it might be something else.
>
>>> This message is intended only for the use of the addressee and may contain
>>> information that is PRIVILEGED AND CONFIDENTIAL.
>>>
>>> If you are not the intended recipient, you are hereby notified that any
>>> dissemination of this communication is strictly prohibited. If you have
>>> received this communication in error, please erase all copies of the message
>>> and its attachments and notify the sender immediately. Thank you.
>
> I rather wonder whether I'm even allowed to read this, let alone
> answer it.  You do realize that this sort of add-on is completely
> silly on a public mailing list?
>
>regards, tom lane



This message is intended only for the use of the addressee and may contain
information that is PRIVILEGED AND CONFIDENTIAL.

If you are not the intended recipient, you are hereby notified that any
dissemination of this communication is strictly prohibited. If you have
received this communication in error, please erase all copies of the message
and its attachments and notify the sender immediately. Thank you.


killing vacuum analyze process

2019-09-03 Thread Julie Nishimura
Hello there,
PostgreSQL 9.6.2 on x86_64-pc-linux-gnu
It seems like we have one process that has been running since 8/19. Are there any 
repercussions if we kill it?

postgres=# select * from pg_stat_activity where pid = '18986';
-[ RECORD 1 
]+-
datid| 2342921
datname  | symphony_middleware
pid  | 18986
usesysid | 10
usename  | postgres
application_name |
client_addr  |
client_hostname  |
client_port  |
backend_start| 2019-08-19 02:00:59.683198+00
xact_start   | 2019-08-19 02:03:54.191741+00
query_start  | 2019-08-19 02:03:54.191741+00
state_change | 2019-08-19 02:03:54.191742+00
wait_event_type  | BufferPin
wait_event   | BufferPin
state| active
backend_xid  |
backend_xmin | 1075730757
query| autovacuum: VACUUM ANALYZE public.resource_build (to prevent 
wraparound)



Re: killing vacuum analyze process

2019-09-03 Thread Tom Lane
Julie Nishimura  writes:
> PostgreSQL 9.6.2 on x86_64-pc-linux-gnu
> It seems like we have one process running since 8/19. Is it any repercussion 
> if we kill it?

> postgres=# select * from pg_stat_activity where pid = '18986';
> -[ RECORD 1 
> ]+-
> datid| 2342921
> datname  | symphony_middleware
> pid  | 18986
> usesysid | 10
> usename  | postgres
> application_name |
> client_addr  |
> client_hostname  |
> client_port  |
> backend_start| 2019-08-19 02:00:59.683198+00
> xact_start   | 2019-08-19 02:03:54.191741+00
> query_start  | 2019-08-19 02:03:54.191741+00
> state_change | 2019-08-19 02:03:54.191742+00
> wait_event_type  | BufferPin
> wait_event   | BufferPin
> state| active
> backend_xid  |
> backend_xmin | 1075730757
> query| autovacuum: VACUUM ANALYZE public.resource_build (to 
> prevent wraparound)

Since this is an anti-wraparound vacuum, autovacuum is just going to
launch another one pretty soon if you kill this one.  Assuming that
the buffer pin blockage is real and not some kind of broken shared
memory state, the new one will hang up at the same spot.  You'd be
better advised to find out what's pinning that buffer and kill that.

Admittedly this is easier said than done, since there's not much
infrastructure for seeing what's happening at that level.  But you
could look for transactions that are at least as old as this one and
have some kind of lock on that table (according to pg_locks).
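
As a starting point, a minimal sketch of such a query against pg_locks
joined to pg_stat_activity (using the table and pid from the example above):

  SELECT l.pid, l.mode, l.granted, a.state, a.xact_start, a.query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
  WHERE l.relation = 'public.resource_build'::regclass
    AND l.pid <> 18986
  ORDER BY a.xact_start;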

If there are no such transactions, then the corrupt-shared-memory
hypothesis becomes likely, and a postmaster restart is indicated.

BTW, you really ought to be running something newer than 9.6.2.

regards, tom lane




Re: killing vacuum analyze process

2019-09-03 Thread Julie Nishimura
Thank you Tom. I can see a bunch of old processes running ROLLBACK... Should I 
kill them, or is the only way to clear them to restart the server?

postgres=# select * from pg_stat_activity where datname ='symphony_middleware' 
and  query_start < '08/20/2019' and query like '%ROLLBACK%';
  datid  |   datname   |  pid   | usesysid |   usename   | 
application_name | client_addr  | client_hostname | client_port | 
backend_start | xact_start |  query_start  | 
state_change  | wait_event_type | wait_event | state | backend_xid | 
backend_xmin |  query
-+-++--+-+--+--+-+-+---++---+---+-++---+-+--+--
 2342921 | symphony_middleware |  72165 |16407 | symphony_middleware |  
| 10.24.33.226 | |   37252 | 2019-06-04 
13:17:56.204483+00 || 2019-06-24 13:51:52.834401+00 | 2019-06-24 
13:51:52.834479+00 | || idle  | |   
   | ROLLBACK
 2342921 | symphony_middleware | 159407 |16407 | symphony_middleware |  
| 10.24.33.6   | |   45482 | 2019-07-29 
16:23:36.271366+00 || 2019-08-16 18:28:27.924116+00 | 2019-08-16 
18:28:27.92419+00  | || idle  | |   
   | ROLLBACK
 2342921 | symphony_middleware | 135443 |16407 | symphony_middleware |  
| 10.24.33.226 | |   47712 | 2019-05-31 
16:17:55.143017+00 || 2019-08-13 15:17:01.685057+00 | 2019-08-13 
15:17:01.685103+00 | || idle  | |   
   | ROLLBACK
 2342921 | symphony_middleware | 135442 |16407 | symphony_middleware |  
| 10.24.33.226 | |   47710 | 2019-05-31 
16:17:55.132574+00 || 2019-08-13 15:17:32.973151+00 | 2019-08-13 
15:17:32.97322+00  | || idle  | |   
   | ROLLBACK
 2342921 | symphony_middleware | 135440 |16407 | symphony_middleware |  
| 10.24.33.226 | |   47706 | 2019-05-31 
16:17:55.082091+00 || 2019-06-24 17:23:15.519127+00 | 2019-06-24 
17:23:15.519181+00 | || idle  | |   
   | ROLLBACK
 2342921 | symphony_middleware |  38211 |16407 | symphony_middleware |  
| 10.24.33.6   | |   43318 | 2019-06-04 
10:33:19.742976+00 || 2019-06-24 17:23:09.212067+00 | 2019-06-24 
17:23:09.212117+00 | || idle  | |   
   | ROLLBACK
 2342921 | symphony_middleware | 136304 |16407 | symphony_middleware |  
| 10.24.33.6   | |   51012 | 2019-05-31 
16:20:23.911493+00 || 2019-08-19 10:17:33.284189+00 | 2019-08-19 
10:17:33.284256+00 | || idle  | |   
   | ROLLBACK
(7 rows)


From: Tom Lane 
Sent: Tuesday, September 3, 2019 4:11 PM
To: Julie Nishimura 
Cc: pgsql-general@lists.postgresql.org 
Subject: Re: killing vacuum analyze process

Julie Nishimura  writes:
> PostgreSQL 9.6.2 on x86_64-pc-linux-gnu
> It seems like we have one process running since 8/19. Is it any repercussion 
> if we kill it?

> postgres=# select * from pg_stat_activity where pid = '18986';
> -[ RECORD 1 
> ]+-
> datid| 2342921
> datname  | symphony_middleware
> pid  | 18986
> usesysid | 10
> usename  | postgres
> application_name |
> client_addr  |
> client_hostname  |
> client_port  |
> backend_start| 2019-08-19 02:00:59.683198+00
> xact_start   | 2019-08-19 02:03:54.191741+00
> query_start  | 2019-08-19 02:03:54.191741+00
> state_change | 2019-08-19 02:03:54.191742+00
> wait_event_type  | BufferPin
> wait_event   | BufferPin
> state| active
> backend_xid  |
> backend_xmin | 1075730757
> query| autovacuum: VACUUM ANALYZE public.resource_build (to 
> prevent wraparound)

Since this is an anti-wraparound vacuum, autovacuum is just going to
launch another one pretty soon if you kill this one.  Assuming that
the buffer pin blockage is real and not some kind of broken shared
memory state, the new one will hang up at the same spot.  You'd be
better advised to find out what's pinning that buffer and kill that.

Admittedly this is easier said than done, since there's not much
infrastructure for seeing what's happening at that level.  But you
could look for transactions that are at least as ol

Re: Unexpected "canceling statement due to user request" error

2019-09-03 Thread Will Storey
On Sun 2019-09-01 20:58:30 -0400, Tom Lane wrote:
> >> A separate question is how come the particular query you're complaining
> >> about has (seemingly) a fairly wide window where it never does any
> >> CHECK_FOR_INTERRUPTS call before terminating.  Perhaps there's someplace
> >> we need to sprinkle a few more of those.
> 
> > Yeah, it is strange. There are many queries in this system with statement
> > timeouts and this is the only one where I see this and it happens several
> > times a day.
> 
> It'd be interesting to see EXPLAIN ANALYZE output from a typical execution
> of that query.  Even better would be a test case (w/ tables and dummy
> data) so somebody else could reproduce it; but maybe looking at EXPLAIN
> ANALYZE would be enough to determine where we're missing checks.

I put output from a query with values that caused the error here:

https://gist.githubusercontent.com/horgh/653e60b8c2f071d859424517d653eb4e/raw/5d16e11d0ac884ed9dc120c4521e2de8f2e7c3d6/test.txt

I sanitised some of the names and values again. Sorry about that.

As far as a test case: I'm not sure how to reproduce it myself. It just
happens periodically (15 times in the past 24 hours). The tables are
somewhat large (60+ GiB per partition). I could try to generate dummy
versions if you like, but they would have to be significantly dummied, so
I'm not sure it would end up being representative.




Re: Upgrade 96 -> 11

2019-09-03 Thread Adrian Klaver

On 9/3/19 3:45 PM, James Sewell wrote:







-- For binary upgrade, create an empty extension and insert objects
into it
DROP EXTENSION IF EXISTS tablefunc;
SELECT pg_catalog.binary_upgrade_create_empty_extension('tablefunc',
'public', true, '1.0', NULL, NULL, ARRAY[]::pg_catalog.text[]);


Try the above on your schema and see what you get.


Yep - an empty extension. I think this logic is wrong. Creating an empty 
extension is fine and makes sense but extension owned relations should 
be created as the next step, not just at some time later.




So to be clear, you ran pg_dump with --binary-upgrade and the extension 
elements were created after the user table that tripped the error?




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: killing vacuum analyze process

2019-09-03 Thread Ron
Those are all idle.  I've got a cron job to kill all idle processes more 
than 60 (or 30, I don't remember) minutes old.
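
A minimal sketch of what such a job might run (the 60-minute cutoff is just
an example; make sure you really want to terminate those sessions):

  SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
  WHERE state = 'idle'
    AND pid <> pg_backend_pid()
    AND state_change < now() - interval '60 minutes';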


On 9/3/19 6:29 PM, Julie Nishimura wrote:
Thank you Tom. I can see bunch of old processes running ROLLBACK... Should 
I kill them or they only way to clear those is to restart the server?


postgres=# select * from pg_stat_activity where datname 
='symphony_middleware' and  query_start < '08/20/2019' and query like 
'%ROLLBACK%';
  datid  |       datname       |  pid   | usesysid |   usename       | 
application_name | client_addr  | client_hostname | client_port |         
backend_start   | xact_start |          query_start          | 
state_change          | wait_event_type | wait_event | state | backend_xid 
| backend_xmin |  query

-+-++--+-+--+--+-+-+---++---+---+-++---+-+--+--
 2342921 | symphony_middleware |  72165 |    16407 | symphony_middleware | 
                 | 10.24.33.226 |             |       37252 | 2019-06-04 
13:17:56.204483+00 |            | 2019-06-24 13:51:52.834401+00 | 
2019-06-24 13:51:52.834479+00 |                 |            | idle  |     
        |              | ROLLBACK
 2342921 | symphony_middleware | 159407 |    16407 | symphony_middleware | 
                 | 10.24.33.6   |             |       45482 | 2019-07-29 
16:23:36.271366+00 |            | 2019-08-16 18:28:27.924116+00 | 
2019-08-16 18:28:27.92419+00  |                 |            | idle  |     
        |              | ROLLBACK
 2342921 | symphony_middleware | 135443 |    16407 | symphony_middleware | 
                 | 10.24.33.226 |             |       47712 | 2019-05-31 
16:17:55.143017+00 |            | 2019-08-13 15:17:01.685057+00 | 
2019-08-13 15:17:01.685103+00 |                 |            | idle  |     
        |              | ROLLBACK
 2342921 | symphony_middleware | 135442 |    16407 | symphony_middleware | 
                 | 10.24.33.226 |             |       47710 | 2019-05-31 
16:17:55.132574+00 |            | 2019-08-13 15:17:32.973151+00 | 
2019-08-13 15:17:32.97322+00  |                 |            | idle  |     
        |              | ROLLBACK
 2342921 | symphony_middleware | 135440 |    16407 | symphony_middleware | 
                 | 10.24.33.226 |             |       47706 | 2019-05-31 
16:17:55.082091+00 |            | 2019-06-24 17:23:15.519127+00 | 
2019-06-24 17:23:15.519181+00 |                 |            | idle  |     
        |              | ROLLBACK
 2342921 | symphony_middleware |  38211 |    16407 | symphony_middleware | 
                 | 10.24.33.6   |             |       43318 | 2019-06-04 
10:33:19.742976+00 |            | 2019-06-24 17:23:09.212067+00 | 
2019-06-24 17:23:09.212117+00 |                 |            | idle  |     
        |              | ROLLBACK
 2342921 | symphony_middleware | 136304 |    16407 | symphony_middleware | 
                 | 10.24.33.6   |             |       51012 | 2019-05-31 
16:20:23.911493+00 |            | 2019-08-19 10:17:33.284189+00 | 
2019-08-19 10:17:33.284256+00 |                 |            | idle  |     
        |              | ROLLBACK

(7 rows)


*From:* Tom Lane 
*Sent:* Tuesday, September 3, 2019 4:11 PM
*To:* Julie Nishimura 
*Cc:* pgsql-general@lists.postgresql.org 
*Subject:* Re: killing vacuum analyze process
Julie Nishimura  writes:
> PostgreSQL 9.6.2 on x86_64-pc-linux-gnu
> It seems like we have one process running since 8/19. Is it any 
repercussion if we kill it?


> postgres=# select * from pg_stat_activity where pid = '18986';
> -[ RECORD 1 
]+-

> datid    | 2342921
> datname  | symphony_middleware
> pid  | 18986
> usesysid | 10
> usename  | postgres
> application_name |
> client_addr  |
> client_hostname  |
> client_port  |
> backend_start    | 2019-08-19 02:00:59.683198+00
> xact_start   | 2019-08-19 02:03:54.191741+00
> query_start  | 2019-08-19 02:03:54.191741+00
> state_change | 2019-08-19 02:03:54.191742+00
> wait_event_type  | BufferPin
> wait_event   | BufferPin
> state    | active
> backend_xid  |
> backend_xmin | 1075730757
> query    | autovacuum: VACUUM ANALYZE public.resource_build (to 
prevent wraparound)


Since this is an anti-wraparound vacuum, autovacuum is just going to
launch another one pretty soon if you kill this one. Assuming that
the buffer pin blockage is real and not some kind of broken shared
memory state, the new one will hang up at the same spot.  You'd be
better advised to find out what's pinning that buffer and kill that.

Admittedly 

Re: Upgrade 96 -> 11

2019-09-03 Thread James Sewell
On Wed, 4 Sep 2019 at 5:47 am, Adrian Klaver 
wrote:

> On 9/2/19 5:52 PM, Adrian Klaver wrote:
>
> >> It's still creating the schema elements when it fails, it hasn't
> >> started linking yet
> >
> > Alright at least you still a working 9.6 cluster .
> >
> > Not sure where to go from here. Like you I am not sure how it can CREATE
> > EXTENSION and not actually follow through on that. Especially with no
> > errors for that operation. I'm going to have to think on this. Hopefully
> > someone else has an idea on this and can chime in.
>
> Aah. I don't have postgis installed, still:
>
> pg_dump --binary-upgrade -s  -d production -U postgres >
> production_binary.sql
>
> -- For binary upgrade, create an empty extension and insert objects into it
> DROP EXTENSION IF EXISTS tablefunc;
> SELECT pg_catalog.binary_upgrade_create_empty_extension('tablefunc',
> 'public', true, '1.0', NULL, NULL, ARRAY[]::pg_catalog.text[]);
>
>
> Try the above on your schema and see what you get.


Yep - an empty extension. I think this logic is wrong. Creating an empty
extension is fine and makes sense but extension owned relations should be
created as the next step, not just at some time later.

>
>
>
> >
> >>
> >>  >
> >>  > I have set PGBINOLD, PGBINNEW, PGDATAOLD, PGDATANEW correctly.
> >>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
-- 
James Sewell,
Chief Architect

Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
P (+61) 2 8099 9000  W www.jirotech.com  F (+61) 2 8099 9099

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: killing vacuum analyze process

2019-09-03 Thread Tom Lane
Julie Nishimura  writes:
> Thank you Tom. I can see bunch of old processes running ROLLBACK... Should I 
> kill them or they only way to clear those is to restart the server?

As Ron pointed out, those processes aren't actually running ROLLBACK;
that was just their last query.  They're idle (and not "idle in
transaction") so in theory they're not holding any locks.

After thinking more about it I realize that the process holding the
buffer pin doesn't have to be as old as the VACUUM; so you probably
should just look at pg_locks and see what else has got a lock on
the table the VACUUM is trying to process.  The hypothetical offender
is probably somewhat old, or you'd not have noticed the blockage,
but we don't have any clear evidence as to how old.

regards, tom lane




Re: Upgrade 96 -> 11

2019-09-03 Thread James Sewell
If anyone hits this: it is an issue with using the Geography type with a
non-4326 SRID on a table and pg_upgrade.

It should be fixed (the check dropped, as it's something of a relic) in the
next version of PostGIS. In the meantime you would have to patch it out
yourself.

(
https://github.com/postgis/postgis/blob/svn-trunk/postgis/gserialized_typmod.c#L296
 )


On Wed, 4 Sep 2019 at 10:03, Adrian Klaver 
wrote:

> On 9/3/19 3:45 PM, James Sewell wrote:
> >
> >
>
> >
> > -- For binary upgrade, create an empty extension and insert objects
> > into it
> > DROP EXTENSION IF EXISTS tablefunc;
> > SELECT pg_catalog.binary_upgrade_create_empty_extension('tablefunc',
> > 'public', true, '1.0', NULL, NULL, ARRAY[]::pg_catalog.text[]);
> >
> >
> > Try the above on your schema and see what you get.
> >
> >
> > Yep - an empty extension. I think this logic is wrong. Creating an empty
> > extension is fine and makes sense but extension owned relations should
> > be created as the next step, not just at some time later.
> >
>
> So to be clear you ran pg_dump with  --binary-upgrade and the extension
> elements where created after the user table that tripped the error?
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
