[Bug 1939898] Re: Unnatended postgresql-12 upgrade caused MAAS internal error

Christian Ehrhardt  Mon, 16 Aug 2021 07:40:57 -0700

Ok, I've set up a system as instructed - thanks again Nobuto!

Using maas from the snap 
  3.0/stable:       3.0.0-10029-g.986ea3e45        2021-06-22 (15003) 141MB -



And postgresql (old) from packaging.
root@f-pgdebug:/# dpkg -l | grep postgres
ii  postgresql                     12+214                                all          object-relational SQL database (supported version)
ii  postgresql-12                  12.2-4                                amd64        object-relational SQL database, version 12 server
ii  postgresql-client-12           12.2-4                                amd64        front-end programs for PostgreSQL 12
ii  postgresql-client-common       214                                   all          manager for multiple PostgreSQL client versions
ii  postgresql-common              214                                   all          PostgreSQL database-cluster manager

We see the maas DB active under that service (as it is the default)
root@f-pgdebug:/# systemctl status postgresql@12-main.service
● postgresql@12-main.service - PostgreSQL Cluster 12-main
     Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
     Active: active (running) since Mon 2021-08-16 13:36:08 UTC; 14min ago
   Main PID: 3176 (postgres)
      Tasks: 9 (limit: 38266)
     Memory: 39.7M
     CGroup: /system.slice/system-postgresql.slice/postgresql@12-main.service
             ├─3176 /usr/lib/postgresql/12/bin/postgres -D /var/lib/postgresql/12/main -c config_file=/etc/postgresql/12/main/postgresql.conf
             ├─3178 postgres: 12/main: checkpointer
             ├─3179 postgres: 12/main: background writer
             ├─3180 postgres: 12/main: walwriter
             ├─3181 postgres: 12/main: autovacuum launcher
             ├─3182 postgres: 12/main: stats collector
             ├─3183 postgres: 12/main: logical replication launcher
             ├─5861 postgres: 12/main: maasdb maasdb 127.0.0.1(51262) idle
             └─6214 postgres: 12/main: maasdb maasdb 127.0.0.1(51660) idle


And we can see the tables maas created:

root@f-pgdebug:/# sudo -u postgres psql -d maasdb -c '\dt'

                             List of relations
 Schema |                       Name                       | Type  | Owner  
--------+--------------------------------------------------+-------+--------
 public | auth_group                                       | table | maasdb
 public | auth_group_permissions                           | table | maasdb
 public | auth_permission                                  | table | maasdb
 public | auth_user                                        | table | maasdb
...


Now, as Nobuto already explained, with the default packaging the server is 
"restarted in the postinst of the postgres-common package so there should be no 
orphaned library in memory.".
Conversely, if you set up your database in any other way - be it for HA or 
anything else - you need to track and manage restarts yourself.
But we want to do it the wrong way here - so I'm disabling the services and 
starting the DB as a local unmanaged process via pg_ctl.
(Again, thanks Nobuto for the steps)
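
To make the "orphaned library in memory" condition visible, here is a minimal sketch that lists files a process still has mapped although they were deleted or replaced on disk. The function name stale_maps is made up for illustration; it relies only on the /proc/<pid>/maps format, where such mappings carry a trailing "(deleted)". It assumes paths without spaces, and reading another user's maps file usually needs root.

```shell
#!/bin/sh
# Hypothetical helper: print mapped files that no longer exist on disk
# in the version the process loaded (marked "(deleted)" by the kernel).
# Input: text in /proc/<pid>/maps format on stdin.
stale_maps() {
    # Field 6 is the path; the last field is "(deleted)" for stale mappings.
    awk '$NF == "(deleted)" && $6 ~ /^\// { print $6 }' | sort -u
}

# Usage (PID is a placeholder for the PID of the running postmaster):
#   stale_maps < /proc/PID/maps
```

An empty result would suggest the process is in sync with the files on disk; any path printed under /usr/lib/postgresql/ would hint that a restart is pending.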

At this point now the service is down and we have the manually started
one:

root@f-pgdebug:/# systemctl status postgresql@12-main.service
● postgresql@12-main.service - PostgreSQL Cluster 12-main
     Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
     Active: inactive (dead)


0   112    7152       1  20   0  87204 16800 poll_s Ss   ?          0:00 /usr/lib/postgresql/12/bin/postgres -D /var/lib/postgresql/12/main
1   112    7154    7152  20   0  87204  3840 ep_pol Ss   ?          0:00  \_ postgres: checkpointer
1   112    7155    7152  20   0  87204  5116 ep_pol Ss   ?          0:00  \_ postgres: background writer
1   112    7156    7152  20   0  87204  5096 ep_pol Ss   ?          0:00  \_ postgres: walwriter
1   112    7157    7152  20   0  87612  7196 ep_pol Ss   ?          0:00  \_ postgres: autovacuum launcher
1   112    7158    7152  20   0  72060  4752 ep_pol Ss   ?          0:00  \_ postgres: stats collector
1   112    7159    7152  20   0  87496  5952 ep_pol Ss   ?          0:00  \_ postgres: logical replication launcher



$ v=12.8-0ubuntu0.20.04.1; apt install postgresql-12=$v postgresql-client-12=$v libpq5=$v
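
Once such an upgrade is applied without a restart, the running server and the installed package disagree on their version. A minimal sketch to detect that follows; the function name needs_restart is hypothetical, and it assumes the server reports its version like "12.2 (Ubuntu 12.2-4)" and the package version has no epoch, as in the dpkg output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the running server's version
# differs from the upstream part of the installed package version.
#   $1: output of "SHOW server_version", e.g. "12.2 (Ubuntu 12.2-4)"
#   $2: package version, e.g. "12.8-0ubuntu0.20.04.1"
needs_restart() {
    running=${1%% *}      # keep text before the first space
    installed=${2%%-*}    # keep text before the first dash
    [ "$running" != "$installed" ]
}

# Usage:
#   needs_restart "$(sudo -u postgres psql -At -c 'SHOW server_version')" \
#                 "$(dpkg-query -W -f='${Version}' postgresql-12)" \
#       && echo "running server is older/newer than the files on disk"
```

This only compares version strings, so it cannot tell which libraries a process has actually loaded.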

The upgrade itself worked, but starting from that moment the running server 
will have issues.
It will continue to report (no maas needed):


2021-08-16 14:00:28.681 GMT [8940] ERROR:  could not load library "/usr/lib/postgresql/12/lib/plpgsql.so": /usr/lib/postgresql/12/lib/plpgsql.so: undefined symbol: EnsurePortalSnapshotExists
2021-08-16 14:00:28.681 GMT [8940] STATEMENT:  INSERT INTO "metadataserver_script" ("created", "updated", "name", "title", "description", "tags", "script_type", "hardware_type", "parallel", "results", "parameters", "packages", "timeout", "destructive", "default", "script_id", "for_hardware", "may_reboot", "recommission", "apply_configured_networking") VALUES ('2021-08-16T14:00:28.679137'::timestamp, '2021-08-16T14:00:28.679137'::timestamp, '20-maas-01-install-lldpd', 'Install and configure lldpd for passive capture.', 'Install and configure lldpd for passive capture.', ARRAY['node']::text[], 0, 0, 0, '{}', '{}', '{"apt": ["lldpd"]}', '0 days 30.000000 seconds'::interval, false, true, 4439, '{}'::varchar(255)[], false, false, false) RETURNING "metadataserver_script"."id"

This is triggered by an insert that maas runs in the background, but
inserts do not fail in general.
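
To see which symbols the server log complains about, a small sketch like the following can help. The function name undefined_symbols is made up for illustration, and it assumes each log entry sits on a single line in the real log file (unlike the wrapped paste in this mail).

```shell
#!/bin/sh
# Hypothetical helper: print each distinct symbol named in
# "undefined symbol: NAME" errors. Input: PostgreSQL log text on stdin.
undefined_symbols() {
    sed -n 's/.*undefined symbol: \([A-Za-z_][A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Usage (LOGFILE is a placeholder for the server log):
#   undefined_symbols < LOGFILE
# To check whether the on-disk server binary exports a symbol
# (nm is from binutils):
#   nm -D --defined-only /usr/lib/postgresql/12/bin/postgres \
#       | grep EnsurePortalSnapshotExists
```

If the on-disk binary exports the symbol while the running server cannot resolve it, that would be consistent with the running process still being the pre-upgrade one.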

I can run new selects and inserts (obviously using the new lib as it is a new 
process) like:
root@f-pgdebug:/# sudo -u postgres psql -d maasdb -c "INSERT INTO auth_group VALUES ('5', 'test');"
INSERT 0 1
root@f-pgdebug:/# sudo -u postgres psql -d maasdb -c 'SELECT * FROM auth_group;'
 id | name 
----+------
  5 | test
(1 row)

So only that background insert from maas is affected; a new psql call is
not, even when running against the old database.

@MAAS team:
Which component would issue that failing insert into metadataserver_script that 
we see occurring?
I guess that is one of the python processes of maas.
Could those by any chance have loaded an "old" lib from the host? It isn't a 
devmode/classic snap so I'd not expect it.

Maybe if we know the component that issues the failing statement we could
first try "snap run --shell ..." to trigger the same issue. And once we
know what inside the snap can cause it, we could try the very same on the
host to eliminate snap/maas entirely from the equation (or find the cause
while doing so).

The list of candidates (if it is indeed from the snap) would be
root@f-pgdebug:/# systemctl status snap.maas.supervisor.service
● snap.maas.supervisor.service - Service for snap application maas.supervisor
     Loaded: loaded (/etc/systemd/system/snap.maas.supervisor.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2021-08-16 13:42:24 UTC; 47min ago
   Main PID: 5451 (python3)
      Tasks: 29 (limit: 38266)
     Memory: 242.1M
     CGroup: /system.slice/snap.maas.supervisor.service
             ├─5451 python3 /snap/maas/15003/bin/supervisord -d /var/snap/maas/15003/supervisord -c /var/snap/maas/15003/supervisord/supervisord.conf -n
             ├─5810 /snap/maas/15003/usr/sbin/named -c /var/snap/maas/15003/bind/named.conf -g
             ├─5812 /snap/maas/15003/usr/sbin/chronyd -u root -d -f /var/snap/maas/15003/etc/chrony/chrony.conf -x
             ├─5813 python3 /snap/maas/15003/sbin/rackd
             ├─5814 python3 /snap/maas/15003/sbin/regiond
             ├─5887 nginx: master process /snap/maas/15003/usr/sbin/nginx -c /var/snap/maas/15003/http/nginx.conf
             ├─5891 nginx: worker process
             ├─5892 nginx: worker process
             ├─5893 nginx: worker process
             └─5894 nginx: worker process
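
One possible way to narrow down the candidates, as a sketch only: the backend process titles shown earlier (e.g. "maasdb maasdb 127.0.0.1(51262) idle") carry the client's TCP port, and that port can be mapped to the owning process with ss. The function name client_ports is hypothetical, and it assumes the clients connect over TCP on localhost as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: extract client TCP ports from PostgreSQL backend
# process titles such as
#   postgres: 12/main: maasdb maasdb 127.0.0.1(51262) idle
# Input: ps or systemctl status output on stdin.
client_ports() {
    sed -n 's/.*postgres: .* [0-9.]*(\([0-9][0-9]*\)).*/\1/p'
}

# Usage (run as root so that ss can show the owning process):
#   ps -u postgres -o args= | client_ports | while read -r port; do
#       ss -tnp "sport = :$port"
#   done
```

The process listed by ss for a given port should be the one holding that database connection, which may then be matched against the PIDs in the list above.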

Nothing that I do on the host fails the same way :-/
I've pinged the maas team to help find the component.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1939898

Title:
  Unnatended postgresql-12 upgrade caused MAAS internal error

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1939898/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
