Unable to Connect to DB Instance

2018-07-04 Thread Boblitz John
Forwarded from pgadmin-support – wrong list

From: Ashesh Vashi [mailto:ashesh.va...@enterprisedb.com]
Sent: Mittwoch, 4. Juli 2018 12:14
To: Boblitz John 
Cc: pgadmin-support 
Subject: Re: Unable to Connect to DB Instance


On Wed, Jul 4, 2018, 15:19 Boblitz John 
mailto:john.bobl...@bertschi.com>> wrote:
Good Morning,

Beginning yesterday morning, users have been unable to fully connect to our DB 
Instance.


1.   At the time of the initial report – I was connected to the DB via 
pgAdmin and could perform queries without problem.

2.   Users reported messages similar to “could not open file 
"global/11801": No such file or directory”

3.   At that time, connection logging was turned off and there were no 
messages in the log files.

4.   As this is a development environment, I turned logging on in the 
config and restarted the DB

5.   After restart, neither I, nor the Users could fully reconnect.

6.   I have performed a SYS Level backup (tar of the whole postgres 
directory tree)

7.   I cannot perform a DB level Backup (same errors occur)

System Details
Linux Debian  7.11
Postgres  9.1 (9.1.24lts-0+d)

Please send your queries to 
pgsql-gene...@postgresql.org<mailto:pgsql-gene...@postgresql.org> for database 
server issues.
This is a pgAdmin support list.

-- Thanks, Ashesh
It appears that we can connect to the DB Server itself as I get “connection 
received” and “connection authorized” – but when trying to access the DB 
itself, several errors are raised (see below).
I am assuming that some internals are no longer consistent – the file 
“global/11801” for instance really does not exist on the system.

Questions:


1.   Is there any way to recover from this (backup is unfortunately rather 
old)

2.   What are possible causes?  I’d like to prevent this from happening on 
my production servers.

** I am aware that we are on older releases, and yes, we plan to migrate to 
more current releases “soon” ™ …


Thanks in advance.

John Boblitz


Exceprt from Log:
2018-07-04 09:15:13 CEST 192.168.250.50(28559) [unknown]LOG:  connection 
received: host=192.168.250.50 port=28559
2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresLOG:  connection 
authorized: user=dbadmin database=postgres
2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresERROR:  could not open 
file "global/11801": No such file or directory
2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresSTATEMENT:  SELECT 
usecreatedb, usesuper, CASE WHEN usesuper THEN pg_postmaster_start_time() ELSE 
NULL END as upsince, CASE WHEN usesuper THEN pg_conf_load_time() ELSE NULL END 
as confloadedsince, CASE WHEN usesuper THEN pg_is_in_recovery() ELSE NULL END 
as inrecovery, CASE WHEN usesuper THEN pg_last_xlog_receive_location() ELSE 
NULL END as receiveloc, CASE WHEN usesuper THEN pg_last_xlog_replay_location() 
ELSE NULL END as replayloc, CASE WHEN usesuper THEN 
pg_last_xact_replay_timestamp() ELSE NULL END as replay_timestamp, CASE WHEN 
usesuper AND pg_is_in_recovery() THEN pg_is_xlog_replay_paused() ELSE NULL END 
as isreplaypaused
  FROM pg_user WHERE usename=current_user
2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresERROR:  could not open 
file "global/11801": No such file or directory
2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresSTATEMENT:  SELECT 
rolcreaterole, rolcreatedb FROM pg_roles WHERE rolname = current_user;
2018-07-04 09:15:22 CEST 192.168.250.50(28561) [unknown]LOG:  connection 
received: host=192.168.250.50 port=28561
2018-07-04 09:15:22 CEST 192.168.250.50(28561) g11BaseLOG:  connection 
authorized: user=dbadmin database=g11Base
2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseERROR:  could not open 
file "pg_tblspc/24579/PG_9.1_201105231/24580/11866": No such file or directory
2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseSTATEMENT:  SELECT CASE 
WHEN nspname LIKE E'pg\\_temp\\_%' THEN 1
WHEN (nspname LIKE E'pg\\_%') THEN 0
ELSE 3 END AS nsptyp,
   nsp.nspname, nsp.oid, pg_get_userbyid(nspowner) AS 
namespaceowner, nspacl, description,   has_schema_privilege(nsp.oid, 
'CREATE') as cancreate,
(SELECT array_agg(label) FROM pg_seclabels sl1 WHERE 
sl1.objoid=nsp.oid) AS labels,
(SELECT array_agg(provider) FROM pg_seclabels sl2 WHERE 
sl2.objoid=nsp.oid) AS providers
  FROM pg_namespace nsp
  LEFT OUTER JOIN pg_description des ON (des.objoid=nsp.oid AND 
des.classoid='pg_namespace'::regclass)
WHERE NOT ((nspname = 'pg_catalog' AND EXISTS (SELECT 1 FROM 
pg_class WHERE relname = 'pg_class' AND relnamespace = nsp.oid LIMIT 1)) OR
(nspname = 'pgagent' AND EXISTS (SELECT 1 FROM pg_class WHERE 
relname = &

RE: Unable to Connect to DB Instance

2018-07-04 Thread Boblitz John
Hello Tom,

Thanks - I get "pg_db_role_setting" as a response.

I have already attempted to reindex system but get:

NOTICE:  table "pg_catalog.pg_class" was reindexed
NOTICE:  table "pg_catalog.pg_statistic" was reindexed
NOTICE:  table "pg_catalog.pg_type" was reindexed
NOTICE:  table "pg_catalog.pg_attribute" was reindexed
NOTICE:  table "pg_catalog.pg_authid" was reindexed
ERROR:  could not open file "base/11919/11680": No such file or directory


John


> -Original Message-
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: Mittwoch, 4. Juli 2018 17:50
> To: Boblitz John 
> Cc: pgsql-gene...@postgresql.org
> Subject: Re: Unable to Connect to DB Instance
> 
> Boblitz John  writes:
> > 2.   Users reported messages similar to "could not open file
> "global/11801": No such file or directory"
> 
> I'd try "select relname from pg_class where pg_relation_filenode(oid) =
> 11801" to see if you can identify the problematic relation that way.
> 
> If you're lucky, this is just loss of some system catalog index in which case
> reindexing will fix it.  The fact that you're able to get through connecting,
> and the errors only show up with queries, is somewhat promising given that
> the problem looks like it's related to pg_authid or pg_db_role_setting.
> 
>   regards, tom lane



RE: Unable to Connect to DB Instance (SOLVED)

2018-07-06 Thread Boblitz John
Good Morning,

First and foremost - thank you for the info.

While we still cannot determine that cause of the missing files (I see no 
indication of them being deleted)
I was able to use a combination of touch and the "blunderbuss" to fully 
recreate the two databases.  While it may not
always be useful - here is what I did

1) Run psql
2) Connect to db1
3) Run REINDEX SYSTEM db1 (I'm on 9.1 so options may vary)
4) Wait -> message some file not found - 
5) in second session touch the file (pay attention to the directory)
6) goto 3) 

This until no error, then same procedure for REINDEX DATABASE

There were files missing from base, global and pg_tblspc!

Rinse and repeat for the second DB.  This obviously took quite a bit of time - 
but in the end, I got it to work.

Again, this worked for me and I had a backup of the postgres instance itself, 
so I could take the chance.


Regards,

John Boblitz



> -Original Message-
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: Mittwoch, 4. Juli 2018 21:37
> To: Boblitz John 
> Cc: pgsql-gene...@postgresql.org
> Subject: Re: Unable to Connect to DB Instance
> 
> Boblitz John  writes:
> > Thanks - I get "pg_db_role_setting" as a response.
> 
> Hm ... not its index?  If the table itself is gone, it's surprising that you 
> can get
> through session startup.
> 
> > I have already attempted to reindex system but get:
> 
> I had in mind just reindexing the specific table you're having trouble with 
> ...
> but this:
> 
> > ERROR:  could not open file "base/11919/11680": No such file or
> > directory
> 
> shows that there's another table that also has a problem, and there may be
> more :-(.  I don't know what the odds are that you can get out of this
> completely.  I would NOT recommend "reindex system" as a blunderbuss
> solution.  You do not know how much is corrupted and there's a significant
> chance of making things worse by tromping over the whole database using
> catalogs of uncertain reliability.
> 
> Did you identify which table 11680 is?
> 
> In the case of pg_db_role_setting, a possible solution is to "touch" the
> missing file so it exists; it'll be empty, which means that you'll have lost 
> any
> ALTER DATABASE/ROLE SET settings, but that's better than not being able to
> dump at all.  (You might then need to REINDEX pg_db_role_setting to get its
> indexes in sync with it being empty.)
> 
> Whether an equally drastic answer is tolerable for your other missing
> table(s) depends on what they are...
> 
>   regards, tom lane