Postgres Crashing

2020-02-03 Thread Doug Roberts
Hello,

I'm having an issue where a process in Postgres is crashing and cause the
server to go into recovery mode.

I'm getting the following errors in the log.

2020-02-03 14:12:57.473 EST [11992] [0]WARNING:  57P02: terminating
connection because of crash of another server process
2020-02-03 14:12:57.473 EST [11992] [0]DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2020-02-03 14:12:57.473 EST [11992] [0]HINT:  In a moment you should be
able to reconnect to the database and repeat your command.
2020-02-03 14:12:57.473 EST [11992] [0]CONTEXT:  while locking tuple
(4101,2) in relation "containers"
SQL statement "UPDATE containers
   SET type_uid = COALESCE(declared_type_uid, type_uid),
   carton_type_uid = COALESCE(declared_carton_type_uid,
carton_type_uid),
   status_uid = COALESCE(declared_status_uid, status_uid),
   order_uid = COALESCE(in_order_uid, order_uid),
   wave_uid = COALESCE(in_wave_uid, wave_uid),
   length = COALESCE(in_length, carton_length, length),
   width = COALESCE(in_width, carton_width, width),
   height = COALESCE(in_height, carton_height, height),
   weight = COALESCE(in_weight, weight),
   weight_minimum = COALESCE(in_weight_minimum, weight_minimum),
   weight_maximum = COALESCE(in_weight_maximum, weight_maximum),
   weight_expected = COALESCE(in_weight_expected,
weight_expected),
   first_seen_decision_point_id =
COALESCE(first_seen_decision_point_id, in_last_seen_decision_point_id),
   first_seen_datetime = COALESCE(first_seen_datetime,
last_seen_date_time),
   last_seen_decision_point_id =
COALESCE(in_last_seen_decision_point_id, last_seen_decision_point_id),
   last_seen_datetime = COALESCE(last_seen_date_time,
last_seen_datetime),
   recirculation_count = COALESCE(in_recirculation_count,
recirculation_count),
   project_flags = COALESCE(in_project_flags, project_flags),
   passed_weight_check = COALESCE(in_passed_weight_check,
passed_weight_check)
   WHERE uid = in_uid"
PL/pgSQL function
containers_add_update(integer,integer,integer,integer,integer,integer,double
precision,double precision,double precision,double precision,double
precision,double precision,double precision,integer,timestamp without time
zone,character varying,bigint,boolean) line 60 at SQL statement
2020-02-03 14:12:57.473 EST [11992] [0]LOCATION:  quickdie, postgres.c:2717
2020-02-03 14:12:57.473 EST [12260] [0]WARNING:  57P02: terminating
connection because of crash of another server process
2020-02-03 14:12:57.473 EST [12260] [0]DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2020-02-03 14:12:57.473 EST [12260] [0]HINT:  In a moment you should be
able to reconnect to the database and repeat your command.
2020-02-03 14:12:57.473 EST [12260] [0]LOCATION:  quickdie, postgres.c:2717
2020-02-03 14:12:57.476 EST [24552] [0]WARNING:  57P02: terminating
connection because of crash of another server process
2020-02-03 14:12:57.476 EST [24552] [0]DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2020-02-03 14:12:57.476 EST [24552] [0]HINT:  In a moment you should be
able to reconnect to the database and repeat your command.
2020-02-03 14:12:57.476 EST [24552] [0]LOCATION:  quickdie, postgres.c:2717
2020-02-03 14:12:57.479 EST [23844] [0]WARNING:  57P02: terminating
connection because of crash of another server process
2020-02-03 14:12:57.479 EST [23844] [0]DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2020-02-03 14:12:57.479 EST [23844] [0]HINT:  In a moment you should be
able to reconnect to the database and repeat your command.
2020-02-03 14:12:57.479 EST [23844] [0]LOCATION:  quickdie, postgres.c:2717
2020-02-03 14:12:57.586 EST [25992] [0]FATAL:  57P03: the database system
is in recovery mode
2020-02-03 14:12:57.586 EST [25992] [0]LOCATION:  ProcessStartupPacket,
postmaster.c:2275
2020-02-03 14:12:57.587 EST [19428] [0]FATAL:  57P03: the database system
is in recovery mode
2020-02-03 14:12:57.587 EST [19428] [0]LOCATION:  ProcessStartupPacket,
postmaster.c:2275
2020-02-03 14:12:57.627 EST [24968] [0]LOG:  0: all server processes
terminated; reinitializing
2020-02-03 14:12:57.627 EST [24968] [0]LOCATION:  PostmasterStateMachine,
postmaster.c:3912
2020-02-03 14:12:57.697 EST [16620] [0]LOG:  0: database system was

Re: Postgres Crashing

2020-02-04 Thread Doug Roberts
> So how did containers_reset_recirc() come to clash with
> containers_add_update()?

They are clashing because another portion of our system is running and
updating containers. The reset recirc function was run at the same time to
see how our system and the database would handle it.

The recirc string is formatted like 2000=3,1000=6,5000=0. So the reset
recirc function with take a UID (1000 for example) and use that to remove
1000=x from all of the recirc counts for all of the containers that have
1000=x.

We are currently using PG 12.0.

Thanks,

Doug

On Mon, Feb 3, 2020 at 6:21 PM Tom Lane  wrote:

> Adrian Klaver  writes:
> > Please reply to list also.
>
> > On 2/3/20 2:18 PM, Doug Roberts wrote:
> >> Here is what the reset recirc function is doing.
> >> ...
> >> UPDATE containers
> >> ...
>
> > So how did containers_reset_recirc() come to clash with
> > containers_add_update()?
>
> If this is PG 12.0 or 12.1, a likely theory is that this is an
> EvalPlanQual bug (which'd be triggered during concurrent updates
> of the same row in the table, so that squares with the observation
> that locking the table prevents it).  The known bugs in that area
> require either before-row-update triggers on the table, or
> child tables (either partitioning or traditional inheritance).
> So I wonder what the schema of table "containers" looks like.
>
> Or you could have hit some new bug ... but there's not enough
> info here to diagnose.
>
> regards, tom lane
>


Re: Postgres Crashing

2020-02-04 Thread Doug Roberts
LOCATION:  StartupXLOG, xlog.c:7307
2020-02-04 10:26:55.861 EST [16700] [0] FATAL:  57P03: the database system
is in recovery mode
2020-02-04 10:26:55.861 EST [16700] [0] LOCATION:  ProcessStartupPacket,
postmaster.c:2275
2020-02-04 10:26:57.016 EST [20788] [0] LOG:  0: database system is
ready to accept connections

On Tue, Feb 4, 2020 at 10:50 AM Doug Roberts  wrote:

> Here is a stacktrace with what happened before and after the crash.
>
> 2020-02-04 10:26:16.841 EST [20788] [0] LOG:  0: server process (PID
> 12168) was terminated by exception 0xC005
> 2020-02-04 10:26:16.841 EST [20788] [0] DETAIL:  Failed process was
> running: select CONTAINERS_RESET_RECIRC_BY_DP(3000)
> 2020-02-04 10:26:16.841 EST [20788] [0] HINT:  See C include file
> "ntstatus.h" for a description of the hexadecimal value.
> 2020-02-04 10:26:16.841 EST [20788] [0] LOCATION:  LogChildExit,
> postmaster.c:3670
> 2020-02-04 10:26:16.841 EST [20788] [0] LOG:  0: terminating any other
> active server processes
> 2020-02-04 10:26:16.841 EST [20788] [0] LOCATION:  HandleChildCrash,
> postmaster.c:3400
> 2020-02-04 10:26:16.873 EST [1212] [0] WARNING:  57P02: terminating
> connection because of crash of another server process
> 2020-02-04 10:26:16.873 EST [1212] [0] DETAIL:  The postmaster has
> commanded this server process to roll back the current transaction and
> exit, because another server process exited abnormally and possibly
> corrupted shared memory.
> 2020-02-04 10:26:16.873 EST [1212] [0] HINT:  In a moment you should be
> able to reconnect to the database and repeat your command.
> 2020-02-04 10:26:16.873 EST [1212] [0] LOCATION:  quickdie, postgres.c:2717
> 2020-02-04 10:26:16.873 EST [19436] [0] WARNING:  57P02: terminating
> connection because of crash of another server process
> 2020-02-04 10:26:16.873 EST [19436] [0] DETAIL:  The postmaster has
> commanded this server process to roll back the current transaction and
> exit, because another server process exited abnormally and possibly
> corrupted shared memory.
> 2020-02-04 10:26:16.873 EST [19436] [0] HINT:  In a moment you should be
> able to reconnect to the database and repeat your command.
> 2020-02-04 10:26:16.873 EST [19436] [0] LOCATION:  quickdie,
> postgres.c:2717
> 2020-02-04 10:26:16.874 EST [13428] [0] WARNING:  57P02: terminating
> connection because of crash of another server process
> 2020-02-04 10:26:16.874 EST [13428] [0] DETAIL:  The postmaster has
> commanded this server process to roll back the current transaction and
> exit, because another server process exited abnormally and possibly
> corrupted shared memory.
> 2020-02-04 10:26:16.874 EST [13428] [0] HINT:  In a moment you should be
> able to reconnect to the database and repeat your command.
> 2020-02-04 10:26:16.874 EST [13428] [0] CONTEXT:  while locking tuple
> (0,115) in relation "containers"
> SQL statement "UPDATE containers
>SET type_uid = COALESCE(declared_type_uid, type_uid),
>carton_type_uid = COALESCE(declared_carton_type_uid,
> carton_type_uid),
>status_uid = COALESCE(declared_status_uid, status_uid),
>order_uid = COALESCE(in_order_uid, order_uid),
>wave_uid = COALESCE(in_wave_uid, wave_uid),
>length = COALESCE(in_length, carton_length, length),
>width = COALESCE(in_width, carton_width, width),
>height = COALESCE(in_height, carton_height, height),
>weight = COALESCE(in_weight, weight),
>weight_minimum = COALESCE(in_weight_minimum,
> weight_minimum),
>weight_maximum = COALESCE(in_weight_maximum,
> weight_maximum),
>weight_expected = COALESCE(in_weight_expected,
> weight_expected),
>first_seen_DP_id = COALESCE(first_seen_DP_id,
> in_last_seen_DP_id),
>first_seen_datetime = COALESCE(first_seen_datetime,
> last_seen_date_time),
>last_seen_DP_id = COALESCE(in_last_seen_DP_id,
> last_seen_DP_id),
>last_seen_datetime = COALESCE(last_seen_date_time,
> last_seen_datetime),
>recirculation_count = COALESCE(in_recirculation_count,
> recirculation_count),
>project_flags = COALESCE(in_project_flags, project_flags),
>passed_weight_check = COALESCE(in_passed_weight_check,
> passed_weight_check)
>WHERE uid = in_uid"
> PL/pgSQL function
> containers_add_update(integer,integer,integer,integer,integer,integer,double
> precision,double precision,double precision,double precision,double
> precision,double precision,double precision,integer,timestamp without time
> zone,character varying,bigint,boolean) line 60 at SQL

Re: Postgres Crashing

2020-02-04 Thread Doug Roberts
Sure. Ok then.

On Tue, Feb 4, 2020 at 11:18 AM Adrian Klaver 
wrote:

> On 2/4/20 8:06 AM, Doug Roberts wrote:
> > Hello,
> >
> > Here is a stacktrace of what happened before and after the crash.
>
> Actually the below is the Postgres log. Per Tom's previous post the
> procedure to get a stack trace can be found here:
>
>
> https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
>
> >
> > Thanks,
> >
> > Doug
> >
> > 2020-02-04 10:26:16.841 EST [20788] [0] LOG:  0: server process (PID
> > 12168) was terminated by exception 0xC005
> > 2020-02-04 10:26:16.841 EST [20788] [0] DETAIL:  Failed process was
> > running: select CONTAINERS_RESET_RECIRC_BY_DP(3000)
> > 2020-02-04 10:26:16.841 EST [20788] [0] HINT:  See C include file
> > "ntstatus.h" for a description of the hexadecimal value.
> > 2020-02-04 10:26:16.841 EST [20788] [0] LOCATION:  LogChildExit,
> > postmaster.c:3670
> > 2020-02-04 10:26:16.841 EST [20788] [0] LOG:  0: terminating any
> > other active server processes
> > 2020-02-04 10:26:16.841 EST [20788] [0] LOCATION:  HandleChildCrash,
> > postmaster.c:3400
> > 2020-02-04 10:26:16.873 EST [1212] [0] WARNING:  57P02: terminating
> > connection because of crash of another server process
> > 2020-02-04 10:26:16.873 EST [1212] [0] DETAIL:  The postmaster has
> > commanded this server process to roll back the current transaction and
> > exit, because another server process exited abnormally and possibly
> > corrupted shared memory.
> > 2020-02-04 10:26:16.873 EST [1212] [0] HINT:  In a moment you should be
> > able to reconnect to the database and repeat your command.
> > 2020-02-04 10:26:16.873 EST [1212] [0] LOCATION:  quickdie,
> postgres.c:2717
> > 2020-02-04 10:26:16.873 EST [19436] [0] WARNING:  57P02: terminating
> > connection because of crash of another server process
> > 2020-02-04 10:26:16.873 EST [19436] [0] DETAIL:  The postmaster has
> > commanded this server process to roll back the current transaction and
> > exit, because another server process exited abnormally and possibly
> > corrupted shared memory.
> > 2020-02-04 10:26:16.873 EST [19436] [0] HINT:  In a moment you should be
> > able to reconnect to the database and repeat your command.
> > 2020-02-04 10:26:16.873 EST [19436] [0] LOCATION:  quickdie,
> postgres.c:2717
> > 2020-02-04 10:26:16.874 EST [13428] [0] WARNING:  57P02: terminating
> > connection because of crash of another server process
> > 2020-02-04 10:26:16.874 EST [13428] [0] DETAIL:  The postmaster has
> > commanded this server process to roll back the current transaction and
> > exit, because another server process exited abnormally and possibly
> > corrupted shared memory.
> > 2020-02-04 10:26:16.874 EST [13428] [0] HINT:  In a moment you should be
> > able to reconnect to the database and repeat your command.
> > 2020-02-04 10:26:16.874 EST [13428] [0] CONTEXT:  while locking tuple
> > (0,115) in relation "containers"
> > SQL statement "UPDATE containers
> > SET type_uid = COALESCE(declared_type_uid, type_uid),
> > carton_type_uid = COALESCE(declared_carton_type_uid,
> > carton_type_uid),
> > status_uid = COALESCE(declared_status_uid, status_uid),
> > order_uid = COALESCE(in_order_uid, order_uid),
> > wave_uid = COALESCE(in_wave_uid, wave_uid),
> > length = COALESCE(in_length, carton_length, length),
> > width = COALESCE(in_width, carton_width, width),
> > height = COALESCE(in_height, carton_height, height),
> > weight = COALESCE(in_weight, weight),
> > weight_minimum = COALESCE(in_weight_minimum,
> > weight_minimum),
> > weight_maximum = COALESCE(in_weight_maximum,
> > weight_maximum),
> > weight_expected = COALESCE(in_weight_expected,
> > weight_expected),
> > first_seen_DP_id = COALESCE(first_seen_DP_id,
> > in_last_seen_DP_id),
> > first_seen_datetime = COALESCE(first_seen_datetime,
> > last_seen_date_time),
> > last_seen_DP_id = COALESCE(in_last_seen_DP_id,
> > last_seen_DP_id),
> > last_seen_datetime = COALESCE(last_seen_date_time,
> > last_seen_datetime),
> > recirculation_count = COALESCE(in_recirculation_count,
> > recirculation_count),
> > project_flags = COALESCE(in_project_flags,
> project_flags),
> >

Re: Postgres Crashing

2020-02-04 Thread Doug Roberts
Hello,

Hopefully the following stack trace is more helpful.

Exception thrown at 0x000140446403 in postgres.exe: 0xC005: Access
violation reading location 0xFFF8. occurred

> postgres.exe!pfree(void * pointer) Line 1033 C
  postgres.exe!tts_buffer_heap_clear(TupleTableSlot * slot) Line 653 C
  [Inline Frame] postgres.exe!ExecClearTuple(TupleTableSlot *) Line 428 C
  postgres.exe!ExecForceStoreHeapTuple(HeapTupleData * tuple,
TupleTableSlot * slot, bool shouldFree) Line 1448 C
  postgres.exe!ExecBRUpdateTriggers(EState * estate, EPQState * epqstate,
ResultRelInfo * relinfo, ItemPointerData * tupleid, HeapTupleData *
fdw_trigtuple, TupleTableSlot * newslot) Line 3117 C
  postgres.exe!ExecUpdate(ModifyTableState * mtstate, ItemPointerData *
tupleid, HeapTupleData * oldtuple, TupleTableSlot * slot, TupleTableSlot *
planSlot, EPQState * epqstate, EState * estate, bool canSetTag) Line 1072 C
  postgres.exe!ExecModifyTable(PlanState * pstate) Line 2223 C
  [Inline Frame] postgres.exe!ExecProcNode(PlanState *) Line 239 C
  postgres.exe!ExecutePlan(EState * estate, PlanState * planstate, bool
use_parallel_mode, CmdType operation, bool sendTuples, unsigned __int64
numberTuples, ScanDirection direction, _DestReceiver * dest, bool
execute_once) Line 1652 C
  postgres.exe!standard_ExecutorRun(QueryDesc * queryDesc, ScanDirection
direction, unsigned __int64 count, bool execute_once) Line 378 C
  postgres.exe!_SPI_pquery(QueryDesc * queryDesc, bool fire_triggers,
unsigned __int64 tcount) Line 2523 C
  postgres.exe!_SPI_execute_plan(_SPI_plan * plan, ParamListInfoData *
paramLI, SnapshotData * snapshot, SnapshotData * crosscheck_snapshot, bool
read_only, bool fire_triggers, unsigned __int64 tcount) Line 2298 C
  postgres.exe!SPI_execute_plan_with_paramlist(_SPI_plan * plan,
ParamListInfoData * params, bool read_only, long tcount) Line 581 C
  plpgsql.dll!exec_stmt_execsql(PLpgSQL_execstate * estate,
PLpgSQL_stmt_execsql * stmt) Line 4162 C
  plpgsql.dll!exec_stmt(PLpgSQL_execstate * estate, PLpgSQL_stmt * stmt)
Line 2033 C
  [Inline Frame] plpgsql.dll!exec_stmts(PLpgSQL_execstate * stmts, List *)
Line 1924 C
  plpgsql.dll!exec_stmt_block(PLpgSQL_execstate * estate,
PLpgSQL_stmt_block * block) Line 1865 C
  plpgsql.dll!exec_stmt(PLpgSQL_execstate * estate, PLpgSQL_stmt * stmt)
Line 1957 C
  plpgsql.dll!plpgsql_exec_function(PLpgSQL_function * func,
FunctionCallInfoBaseData * fcinfo, EState * simple_eval_estate, bool
atomic) Line 590 C
  plpgsql.dll!plpgsql_call_handler(FunctionCallInfoBaseData * fcinfo) Line
267 C
  postgres.exe!ExecInterpExpr(ExprState * state, ExprContext * econtext,
bool * isnull) Line 626 C
  [Inline Frame] postgres.exe!ExecEvalExprSwitchContext(ExprState *) Line
307 C
  postgres.exe!ExecProject(ProjectionInfo * projInfo) Line 351 C
  [Inline Frame] postgres.exe!ExecProcNode(PlanState *) Line 239 C
  postgres.exe!ExecutePlan(EState * estate, PlanState * planstate, bool
use_parallel_mode, CmdType operation, bool sendTuples, unsigned __int64
numberTuples, ScanDirection direction, _DestReceiver * dest, bool
execute_once) Line 1652 C
  postgres.exe!standard_ExecutorRun(QueryDesc * queryDesc, ScanDirection
direction, unsigned __int64 count, bool execute_once) Line 378 C
  postgres.exe!PortalRunSelect(PortalData * portal, bool forward, long
count, _DestReceiver * dest) Line 931 C
  postgres.exe!PortalRun(PortalData * portal, long count, bool isTopLevel,
bool run_once, _DestReceiver * dest, _DestReceiver * altdest, char *
completionTag) Line 777 C
  postgres.exe!exec_execute_message(const char * portal_name, long
max_rows) Line 2098 C
  postgres.exe!PostgresMain(int argc, char * * argv, const char * dbname,
const char * username) Line 4299 C
  postgres.exe!BackendRun(Port * port) Line 4432 C
  postgres.exe!SubPostmasterMain(int argc, char * * argv) Line 4955 C
  postgres.exe!main(int argc, char * * argv) Line 216 C
  [External Code]

On Tue, Feb 4, 2020 at 11:40 AM Adrian Klaver 
wrote:

> On 2/4/20 6:20 AM, Doug Roberts wrote:
> >> So how did containers_reset_recirc() come to clash with
> >> containers_add_update()?
> >
> > They are clashing because another portion of our system is running and
> > updating containers. The reset recirc function was run at the same time
> > to see how our system and the database would handle it.
>
> So does your system have the things Tom mentioned below?:
>
> "The known bugs in that area
> require either before-row-update triggers on the table, or
> child tables (either partitioning or traditional inheritance).
> So I wonder what the schema of table "containers" looks like."
>
> >
> > The recirc string is formatted like 2000=3,1000=6,5000=0. So the reset
> > recirc function with take a UID (1000 for example) and use that to
> > remove 1000=x from all of the recirc counts for all of the containers
> > that have 1000=x.
> &g

Re: Postgres Crashing

2020-02-04 Thread Doug Roberts
Seems to be working fine now that I've upgraded to 12.1. I'll keep an eye
out for 12.2. However, we are not using a before row update trigger. We are
using an after insert trigger on the containers table though.

Thanks,

Doug

On Tue, Feb 4, 2020 at 2:34 PM Tom Lane  wrote:

> Doug Roberts  writes:
> > Hopefully the following stack trace is more helpful.
>
> > Exception thrown at 0x000140446403 in postgres.exe: 0xC005:
> Access
> > violation reading location 0xFFF8. occurred
>
> >> postgres.exe!pfree(void * pointer) Line 1033 C
> >   postgres.exe!tts_buffer_heap_clear(TupleTableSlot * slot) Line 653 C
> >   [Inline Frame] postgres.exe!ExecClearTuple(TupleTableSlot *) Line 428 C
> >   postgres.exe!ExecForceStoreHeapTuple(HeapTupleData * tuple,
> > TupleTableSlot * slot, bool shouldFree) Line 1448 C
> >   postgres.exe!ExecBRUpdateTriggers(EState * estate, EPQState * epqstate,
> > ResultRelInfo * relinfo, ItemPointerData * tupleid, HeapTupleData *
> > fdw_trigtuple, TupleTableSlot * newslot) Line 3117 C
>
> Ah, so you *are* using before-row update triggers.  Almost certainly,
> this is the same bug fixed by commit 60e97d63e, so you should be okay
> if you update to 12.1.  (There are some related issues that will be
> fixed in 12.2, due out next week.)
>
> regards, tom lane
>