postgresql systemd service fails to start only on boot but not manually

2018-09-22 Thread Doron Behar
Hello everyone,

I'm having trouble with my PostgreSQL Systemd service and I wonder
whether this is a problem with the software or some problem with the
packaging or build.

My server fails to start PostgreSQL only on boot, if I restart it
manually afterwards it doesn't have any problem starting. Here is the
log extracted from the journal:

```
2018-09-21 20:46:40.028 CEST [306] LOG:  listening on IPv4 address "127.0.0.1", 
port 5432
2018-09-21 20:46:40.036 CEST [306] LOG:  listening on Unix socket 
"/run/postgresql/.s.PGSQL.5432"
2018-09-21 20:46:40.233 CEST [337] LOG:  database system was shut down at 
2018-09-21 20:46:21 CEST
2018-09-21 20:48:10.441 CEST [352] WARNING:  worker took too long to start; 
canceled
2018-09-21 20:49:10.469 CEST [352] WARNING:  worker took too long to start; 
canceled
2018-09-21 20:49:10.478 CEST [306] LOG:  database system is ready to accept 
connections
2018-09-21 20:49:10.486 CEST [306] LOG:  received fast shutdown request
2018-09-21 20:49:10.497 CEST [306] LOG:  aborting any active transactions
2018-09-21 20:49:10.498 CEST [637] FATAL:  terminating autovacuum process due 
to administrator command
2018-09-21 20:49:10.500 CEST [306] LOG:  worker process: logical replication 
launcher (PID 636) exited with code 1
2018-09-21 20:49:10.501 CEST [349] LOG:  shutting down
2018-09-21 20:49:10.518 CEST [306] LOG:  database system is shut down
```

Any Ideas? I'd really like to fix this since now after every reboot I have to
remember to manually restart it.

Thanks.



Re: postgresql systemd service fails to start only on boot but not manually

2018-09-23 Thread Doron Behar
On Sat, Sep 22, 2018 at 07:14:33AM -0700, Adrian Klaver wrote:
> 
> Linux distro and version?

Arch Linux

$ uname -a
Linux vps 4.18.9-arch1-1-ARCH #1 SMP PREEMPT Wed Sep 19 21:19:17 UTC 
2018 x86_64 GNU/Linux

It's a VPS with one CPU core on it and 2G RAM - not very much I know,
hosted on OVH

> 
> Assuming Postgres version 10+ given logical replication warning. Still
> actual version would be nice.
> 
> How did you install Postgres?

With the package manager and the files used in it's build are viewable
from here:
https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/postgresql

> 
> Where did systemd script come from?
> 
> What is in the systemd script?

The systemd service was installed with the package

[Unit]
Description=PostgreSQL database server
After=network.target

[Service]
Type=notify
TimeoutSec=120
User=postgres
Group=postgres

Environment=PGROOT=/var/lib/postgres

SyslogIdentifier=postgres
PIDFile=/var/lib/postgres/data/postmaster.pid
RuntimeDirectory=postgresql
RuntimeDirectoryMode=755

ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGROOT}/data
ExecStart=/usr/bin/postgres -D ${PGROOT}/data
ExecReload=/bin/kill -HUP ${MAINPID}
KillMode=mixed
KillSignal=SIGINT

# Due to PostgreSQL's use of shared memory, OOM killer is often 
overzealous in
# killing Postgres, so adjust it downward
OOMScoreAdjust=-200

# Additional security-related features
PrivateTmp=true
ProtectHome=true
ProtectSystem=full
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

> 
> What does the log show when you do a successful manual start?

2018-09-22 09:38:44.470 CEST [15251] LOG:  listening on IPv4 address 
"127.0.0.1", port 5432
2018-09-22 09:38:44.472 CEST [15251] LOG:  listening on Unix socket 
"/run/postgresql/.s.PGSQL.5432"
2018-09-22 09:38:44.485 CEST [15253] LOG:  database system was shut 
down at 2018-09-21 20:49:10 CEST
2018-09-22 09:38:44.490 CEST [15251] LOG:  database system is ready to 
accept connections

> 
> What does the system log show when the Postgres reboot startup fails?

This was posted in my 1st message.



Re: postgresql systemd service fails to start only on boot but not manually

2018-09-23 Thread Doron Behar
On Sat, Sep 22, 2018 at 04:58:18PM +0200, Christoph Moench-Tegeder wrote:
> ## Doron Behar (doron.be...@gmail.com):
> 
> > My server fails to start PostgreSQL only on boot, if I restart it
> > manually afterwards it doesn't have any problem starting. Here is the
> > log extracted from the journal:
> > 
> > ```
> > 2018-09-21 20:46:40.028 CEST [306] LOG:  listening on IPv4 address 
> > "127.0.0.1", port 5432
> > 2018-09-21 20:46:40.036 CEST [306] LOG:  listening on Unix socket 
> > "/run/postgresql/.s.PGSQL.5432"
> > 2018-09-21 20:46:40.233 CEST [337] LOG:  database system was shut down at 
> > 2018-09-21 20:46:21 CEST
> > 2018-09-21 20:48:10.441 CEST [352] WARNING:  worker took too long to start; 
> > canceled
> > 2018-09-21 20:49:10.469 CEST [352] WARNING:  worker took too long to start; 
> > canceled
> 
> This would indicate that your machine is overloaded during start -
> perhaps there's just too much being started at the same time?
> ObRant: that's what happens if people take "system startup duration"
> as a benchmark and optimize for that - sure, running one clumsy shell
> script after another isn't effective usage of today's systems,
> but starting eight dozens programs all at once may have other
> side effects. Really, with the hardware taking small ages to find
> it's own arse before even loading the boot loader, those few seconds
> weren't worth optimizing - and if people reboot their computers so
> often that startup time takes a measurable toll on their productive
> day, perhaps they should rather spend their time thinking about their
> usage pattern than "optimizing" the startup process.
> 
> So, now that I've got that off my chest... your machine propably tries to
> do too much at the same time when booting: the worker processes take
> longer than 90 seconds to start. Slow CPU or storage maybe?
> 
> > 2018-09-21 20:49:10.478 CEST [306] LOG:  database system is ready to accept 
> > connections
> > 2018-09-21 20:49:10.486 CEST [306] LOG:  received fast shutdown request
> 
> And in the mean time, systemd has lost it's patience, declares the
> start as failed and terminates the process group. (The default systemd
> timeout is 90 seconds, at least in some releases of systemd, so
> this fits quite nicely).
> 
> You could try to work around this by increasing TimeoutStartSec
> in postgresql's systemd unit (or even globally), which perhaps
> only hides the problem until the next service suddenly doesn't
> start anymore.
> You could move postgresql to the end of the boot order by
> adding "After=..." to the Unit section of the systemd service
> file, the value behind "After=" being all the other services in
> the same target, which should reduce parallelism and improve
> PostgreSQL's startup behaviour.
> A more advanced variant of that would be to create a new
> systemd target, make that start "After" multiuser.target
> or even graphical.target (depending on your setup), make sure
> it "Requires" the current default systemd target and make
> postgresql the only additional service in that target.
> (This would be the cleanest solution, but you should get some
> grasp of systemd and how your specific distribution uses it
> before meddling with the default targets; I don't know every
> distribution/version variant of systemd integration, so I
> can't give that specific instructions here).
> Or you figure out what the heck your machine is running
> during startup any why it is that slow, and try to fix that.
> 
> Regards,
> Christoph

Thanks for your very detailed answer, that helped me a lot. I've
increased `TimeoutSec=` to infinity in the systemd service since it was
set initially to 120 seconds which apparently wasn't enough for my poor
VPS with 2G RAM and 1 CPU core. That worked great, I still feel like I
have slow startups but at least PostgreSQL doesn't totally fail to start
on boot.

I'll try to debug the slow startups on my own, thanks again for
everything!

Doron.