Windows 10 got stuck with PostgreSQL at starting up. Adding delay lets it avoid.

2018-06-28 Thread TAKATSUKA Haruka
I ran into a problem with PostgreSQL 9.3.x on Windows 10.
I would like to propose adding delay code as an official build option.

Windows 10 sometimes (approximately once in 300 tries) hangs
during OS startup. The logs say it happened while the PostgreSQL
service was starting. When the OS froze, some postgres auxiliary
processes had been started and some had not been started yet.

The Windows dump shows that some threads of the postgres auxiliary processes
are waiting on OS-level locks, and that threads of the logon processes are
also waiting on a lock. The MS help desk said that an OS-level deadlock in
PostgreSQL caused the OS freeze. I think that is a strange story. But,
in fact, the hang did not happen in repeated tests once I removed
PostgreSQL from the services that start automatically at boot.

I tweaked PostgreSQL 9.3.x (the newest from the repository) to add
a delay of 0.5 seconds (after a backend starts) or 3.0 seconds (after an
auxiliary process starts), and then the hang went away. The test patch
is attached. It is implemented only for Windows. Also, I did not use the
existing pg_usleep because it contains locking code (e.g. WaitForSingleObject
and Enter/LeaveCriticalSection).

Although Windows itself may have some problems, I think we should have
a means to avoid the hang. Could PostgreSQL accept such delay code
as a build-time option controlled by preprocessor variables?
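
To make the build-time option idea concrete, here is a minimal sketch of how the delay could be driven by a preprocessor variable given on the compiler command line instead of being hard-coded. The names AFTER_AUX_FORK_SLEEP_MS, after_aux_fork_sleep() and start_child_stub() are hypothetical and do not appear in the attached patch; only SleepEx() is the real Windows API call that the patch uses.

```c
#include <assert.h>

#if defined(WIN32) || defined(_WIN32)
#include <windows.h>
#endif

/*
 * Hypothetical knob (not in the posted patch): the delay in milliseconds is
 * taken from the compiler command line, e.g. -DAFTER_AUX_FORK_SLEEP_MS=3000,
 * and defaults to 0 (disabled) when nothing is passed.
 */
#ifndef AFTER_AUX_FORK_SLEEP_MS
#define AFTER_AUX_FORK_SLEEP_MS 0
#endif

/* Sleep only on Windows, and only when a positive delay was configured. */
static void
after_aux_fork_sleep(void)
{
#if defined(WIN32) || defined(_WIN32)
    if (AFTER_AUX_FORK_SLEEP_MS > 0)
        SleepEx(AFTER_AUX_FORK_SLEEP_MS, FALSE);    /* non-alertable sleep */
#endif
}

/*
 * Stand-in for the parent-side path after a successful fork, as in
 * StartChildProcess(); illustrative only.
 */
static int
start_child_stub(int pid)
{
    assert(pid > 0);
    after_aux_fork_sleep();
    return pid;
}
```

With no -D flag the function compiles to a no-op on every platform, so a default build would behave as it does today; only a Windows build that passes a positive value would pay the delay.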


Thanks,
Takatsuka Haruka
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d6fc2ed..ff03ebd 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -398,6 +398,30 @@ extern int optreset;   /* might not be declared by system headers */
 static DNSServiceRef bonjour_sdref = NULL;
 #endif
 
+#define USE_AFTER_AUX_FORK_SLEEP 3000
+
+#ifdef USE_AFTER_AUX_FORK_SLEEP
+#ifndef WIN32
+#define AFTER_AUX_FORK_SLEEP()
+#else
+#define AFTER_AUX_FORK_SLEEP() do { SleepEx(USE_AFTER_AUX_FORK_SLEEP, FALSE); } while(0)
+#endif
+#else
+#define AFTER_AUX_FORK_SLEEP()
+#endif
+
+#define USE_AFTER_BACKEND_FORK_SLEEP 500
+
+#ifdef USE_AFTER_BACKEND_FORK_SLEEP
+#ifndef WIN32
+#define AFTER_BACKEND_FORK_SLEEP()
+#else
+#define AFTER_BACKEND_FORK_SLEEP() do { SleepEx(USE_AFTER_BACKEND_FORK_SLEEP, FALSE); } while(0)
+#endif
+#else
+#define AFTER_BACKEND_FORK_SLEEP()
+#endif
+
 /*
  * postmaster.c - function prototypes
  */
@@ -1709,6 +1733,7 @@ ServerLoop(void)
 */
StreamClose(port->sock);
ConnFree(port);
+   AFTER_BACKEND_FORK_SLEEP();
}
}
}
@@ -2801,11 +2826,20 @@ reaper(SIGNAL_ARGS)
 * situation, some of them may be alive already.
 */
if (!IsBinaryUpgrade && AutoVacuumingActive() && AutoVacPID == 0)
+   {
AutoVacPID = StartAutoVacLauncher();
+   AFTER_AUX_FORK_SLEEP(); 
+   }
if (XLogArchivingActive() && PgArchPID == 0)
+   {
PgArchPID = pgarch_start();
+   AFTER_AUX_FORK_SLEEP();
+   }
if (PgStatPID == 0)
+   {
PgStatPID = pgstat_start();
+   AFTER_AUX_FORK_SLEEP();
+   }
 
/* some workers may be scheduled to start now */
maybe_start_bgworker();
@@ -5259,6 +5293,7 @@ StartChildProcess(AuxProcType type)
/*
 * in parent, successful fork
 */
+   AFTER_AUX_FORK_SLEEP();
return pid;
 }
 


Re: Windows 10 got stuck with PostgreSQL at starting up. Adding delay lets it avoid.

2018-06-29 Thread TAKATSUKA Haruka


On Fri, 29 Jun 2018 08:34:18 +0200
Thomas Kellerer  wrote:

> Did you try setting the service to "delayed start"?

We haven't tried it yet. Thanks for the idea. I think that
MS would have advised us already if it were a suitable solution for this case.
Anyway, we will try it and confirm.

Thanks,
Takatsuka Haruka


> TAKATSUKA Haruka wrote on 29.06.2018 at 08:03:
> > I got a trouble in PostgreSQL 9.3.x on Windows 10.
> > I would like to add new delay code as an official build option.




Re: Windows 10 got stuck with PostgreSQL at starting up. Adding delay lets it avoid.

2018-07-03 Thread TAKATSUKA Haruka
> On Fri, 29 Jun 2018 08:34:18 +0200
> Thomas Kellerer  wrote:
> 
> > Did you try setting the service to "delayed start"?
> 
> We haven't tried it yet. Thanks for the idea. I think that
> MS would have advised us already if it were a suitable solution for this case.
> Anyway, we will try it and confirm.

We tried it. "Delayed start" was ineffective.
Placing a script that starts the PostgreSQL service
in the "startup" folder was also ineffective.

So far, nothing has solved it except modifying the PostgreSQL code (inserting the delay).

Thanks,
Haruka Takatsuka






Hot standby failing with PANIC: WAL contains references to invalid pages

2017-12-14 Thread TAKATSUKA Haruka
Hi folks,

I got the same failure as in the following report, on Amazon Linux 2017.09
with postgresql95-9.5.8-1.73.amzn1.x86_64.

The WAL record's RMGR ID and its type (Heap2/VISIBLE) are also the same.
The servers had been running for over 2 weeks when the standby crashed.

At first we suspected corruption in some lower layer, but no exact
cause has been found yet.
Given the other similar report, I now think a bug is possible.

Does anyone have any ideas or similar experience?

[Hot standby failing with page # of relation # is uninitialized]
https://www.postgresql.org/message-id/a46709a5-3c41-7acb-1db0-60a317283...@tracktrans.com

(log output on the standby)
2017-12-11 18:14:16 JST [5094]: [13-1] db=,user=,state=01000 WARNING: page 1646627 of relation pg_tblspc/16407/PG_9.5_201510051/16406/31164 is uninitialized
2017-12-11 18:14:16 JST [5094]: [14-1] db=,user=,state=01000 CONTEXT: xlog redo Heap2/VISIBLE: cutoff xid 210320
2017-12-11 18:14:16 JST [5094]: [15-1] db=,user=,state=XX000 PANIC: WAL contains references to invalid pages
2017-12-11 18:14:16 JST [5094]: [16-1] db=,user=,state=XX000 CONTEXT: xlog redo Heap2/VISIBLE: cutoff xid 210320
2017-12-11 18:14:16 JST [5088]: [5-1] db=,user=,state=0 LOG: startup process (PID 5094) was terminated by signal 6: Aborted
2017-12-11 18:14:16 JST [5088]: [6-1] db=,user=,state=0 LOG: terminating any other active server processes
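
For anyone who wants to inspect the page named in the WARNING line on disk, the following is a rough sketch that splits the message into its parts and works out which segment file and byte offset the block would fall in. It assumes the default block size (8192 bytes) and the default segment size (131072 blocks, i.e. 1 GB); a build with other settings would give different numbers. The struct and function names are hypothetical helpers, not PostgreSQL code.

```c
#include <assert.h>
#include <stdio.h>

#define ASSUMED_BLCKSZ      8192u    /* default block size; an assumption */
#define ASSUMED_RELSEG_SIZE 131072u  /* blocks per 1 GB segment; an assumption */

/* Hypothetical container for the pieces of the WARNING message. */
typedef struct PageLocation
{
    unsigned int tablespace_oid;
    unsigned int database_oid;
    unsigned int relfilenode;
    unsigned int block;        /* block number within the relation */
    unsigned int segment;      /* segment file number (0 means no suffix) */
    unsigned long long offset; /* byte offset inside that segment file */
} PageLocation;

/*
 * Parse "page N of relation pg_tblspc/<tblspc>/<version dir>/<db>/<relfilenode> ..."
 * Returns 1 on success, 0 if the text does not match that shape.
 */
static int
parse_uninitialized_page(const char *msg, PageLocation *loc)
{
    char version_dir[64];

    assert(msg != NULL && loc != NULL);

    if (sscanf(msg, "page %u of relation pg_tblspc/%u/%63[^/]/%u/%u",
               &loc->block, &loc->tablespace_oid, version_dir,
               &loc->database_oid, &loc->relfilenode) != 5)
        return 0;

    loc->segment = loc->block / ASSUMED_RELSEG_SIZE;
    loc->offset = (unsigned long long) (loc->block % ASSUMED_RELSEG_SIZE)
        * ASSUMED_BLCKSZ;
    return 1;
}
```

Under those assumptions, page 1646627 of the relation in the log falls in segment 12 (a file named 31164.12 in the database directory) at byte offset 604266496.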

 with best regards,
 Haruka Takatsuka