Re: No xmin in pg_database

2021-09-08 Thread Alexander Kass
I've asked the user to perform `SELECT xmin, * from pg_attribute WHERE
attrelid = 'pg_catalog.pg_database'::regclass` to check the
attributes.
The user has sent me that a couple of times: both times xmin is
different, attrelid is different, both have a big value.
Does that mean that pg_database is recreated?
Also system attributes are different. My result looks like:
++++--+
|xmin|attrelid|attname |attnum|
++++--+
|1   |1262|tableoid|-6|
|1   |1262|cmax|-5|
|1   |1262|xmax|-4|
|1   |1262|cmin|-3|
|1   |1262|xmin|-2|
|1   |1262|ctid|-1|
|1   |1262|oid |1 |
|1   |1262|datname |2 |
...

His result (apart from dat* attributes) contains only tableoid and
oid. Also his attnum numeration starts from tableoid = 1, oid = 2,
datname = 3
I'm puzzled

PS: I'm not familiar with mailing lists. Is it ok to attach images here?

On Thu, Sep 2, 2021 at 1:41 PM Tom Lane  wrote:
>
> Laurenz Albe  writes:
> > On Thu, 2021-09-02 at 08:50 +0300, Alexander Kass wrote:
> >> We have a small number of users that do not have xmin in pg_database
> >> (we've asked them to try `select xmin from pg_database` and got
> >> `column xmin does not exist`).
>
> > All PostgreSQL tables have "xmin", and all catalog tables do as well.
>
> Indeed.  This seems to be evidence of corruption in the pg_attribute
> catalog.  If you're really lucky, reindexing pg_attribute might fix
> it, though I wonder what other problems there are.  (It's odd though
> that identical corruption would happen to different installations.)
>
> regards, tom lane




Re: No xmin in pg_database

2021-09-08 Thread Tom Lane
Alexander Kass  writes:
> I've asked the user to perform `SELECT xmin, * from pg_attribute WHERE
> attrelid = 'pg_catalog.pg_database'::regclass` to check the
> attributes.
> The user has sent me that a couple of times: both times xmin is
> different, attrelid is different, both have a big value.
> Does that mean that pg_database is recreated?
> Also system attributes are different. My result looks like:
> ...
> His result (apart from dat* attributes) contains only tableoid and
> oid. Also his attnum numeration starts from tableoid = 1, oid = 2,
> datname = 3
> I'm puzzled

It's pretty hard to read these details and not conclude that somebody
has been manually manipulating that catalog.  Missing rows might possibly
be explained by index corruption ... but there is no scenario under which
pg_database could have an OID different from 1262, nor could this change
of attnums for the surviving rows ever have happened via Postgres-internal
processes.

I'm about ready to file this under "you broke it, you get to keep both
pieces".

regards, tom lane




Logical Replication to Older Version

2021-09-08 Thread Miles Elam
We recently upgraded from v9.6 to v13 but are seeing some problems. It's on
AWS Aurora, so I won't ask to diagnose a heavily altered version. We're
hoping for better results with v12, but when we set up logical replication
from v13 to v12, while the initial data snapshot copies the data,
pg_stat_replication remains empty.

There are no v13-specific features or idioms in use in the databases. As a
sanity check, should it be possible to set up logical replication from a
newer version of PostgreSQL to an older version? Has this scenario been
tested? Has the logical replication protocol changed between versions?


Thanks in advance,

Miles Elam


Re: Logical Replication to Older Version

2021-09-08 Thread Miles Elam
Follow up to this. Turns out we had a table without a primary key which
halted the ongoing replication.

Reviewing this document in detail now.
https://pgdash.io/blog/postgres-replication-gotchas.html

- Miles Elam


How does postgres behave if several indexes have (nearly) identical conditions?

2021-09-08 Thread Koen De Groote
Greetings all.

Example table:

CREATE TABLE my_table (
id serial PRIMARY KEY,
a001 BOOLEAN default 't',
a002 BOOLEAN default 'f',
a003 BOOLEAN default 't',
a004 BOOLEAN default 'f'
);

And these 2 indexes:

create index index_001 on my_table using btree (a001,a002,a003) where
a001=true and a002=false;

create index index_002 on my_table using btree (a003) where a001=true and
a002=false;

Now take this query:

select * from my_table where a001=true;

Which index will postgres pick? I'm wondering how postgres goes about
picking an index to consider.

And if it will consider others if the analysis of the first says a seqscan
would be better than the index it first considered?

Regards,
Koen De Groote


Re: How does postgres behave if several indexes have (nearly) identical conditions?

2021-09-08 Thread Koen De Groote
Forgot to mention, this is on Postgres 11.2

On Wed, Sep 8, 2021 at 11:04 PM Koen De Groote  wrote:

> Greetings all.
>
> Example table:
>
> CREATE TABLE my_table (
> id serial PRIMARY KEY,
> a001 BOOLEAN default 't',
> a002 BOOLEAN default 'f',
> a003 BOOLEAN default 't',
> a004 BOOLEAN default 'f'
> );
>
> And these 2 indexes:
>
> create index index_001 on my_table using btree (a001,a002,a003) where
> a001=true and a002=false;
>
> create index index_002 on my_table using btree (a003) where a001=true and
> a002=false;
>
> Now take this query:
>
> select * from my_table where a001=true;
>
> Which index will postgres pick? I'm wondering how postgres goes about
> picking an index to consider.
>
> And if it will consider others if the analysis of the first says a seqscan
> would be better than the index it first considered?
>
> Regards,
> Koen De Groote
>
>


Re: How does postgres behave if several indexes have (nearly) identical conditions?

2021-09-08 Thread David G. Johnston
On Wednesday, September 8, 2021, Koen De Groote  wrote:

>
>
> create index index_001 on my_table using btree (a001,a002,a003) where
> a001=true and a002=false;
>
> create index index_002 on my_table using btree (a003) where a001=true and
> a002=false;
>
> Now take this query:
>
> select * from my_table where a001=true;
>
> Which index will postgres pick? I'm wondering how postgres goes about
> picking an index to consider.
>

Neither…since neither partial index condition is present in the where
clause of the query.

David J.


Re: How does postgres behave if several indexes have (nearly) identical conditions?

2021-09-08 Thread David G. Johnston
On Wednesday, September 8, 2021, Koen De Groote  wrote:

> Forgot to mention, this is on Postgres 11.2
>

You should stop worrying about performance and  indexes and instead focus
on system stability and security - i.e., upgrade to a supported version.

David J.


spiexceptions.UndefinedFile: could not open file "base/16482/681830": No such file or directory

2021-09-08 Thread Celia McInnis
Help - I don't know why I am getting this message, and I don't know how to
fix it. Any advice will be greatly appreciated.

Note that the file does exist:! (How do I know if it is looking under the
correct directory? Other times I have done similar temporary table
creations with no problems!):


*ls -l /zfs/postgres/postgres13/base/16482/681830-rw--- 1 postgres
postgres 122880 Sep  7 18:37 /zfs/postgres/postgres13/base/16482/681830*

I am using postgresql 13 on ubuntu 20.04, and postgresql is on a zfs
filesysem which reports no errors.  I can access every table in my
database, and can run every stored procedure (plpgsql and plpython3u) on it
except the one that generates the above error..

Here is the output from my run:









*select cmm_tsv_to_tables('/zfs/EXPERIMENT_A_C_NORMAL_0_1_2/10');NOTICE:
 ['found_patterns.tsv', 'classification.tsv', 'mined_patterns.tsv']NOTICE:
 ['tmp_2021_09_08_19_49_42_354417_found_patterns',
'tmp_2021_09_08_19_49_42_354417_classification',
'tmp_2021_09_08_19_49_42_354417_mined_patterns']NOTICE:  CREATE  TABLE
tmp_2021_09_08_19_49_42_354417_found_patterns(Class TEXT,filename
TEXT,pattern TEXT)ERROR:  spiexceptions.UndefinedFile: could not open file
"base/16482/681830": No such file or directoryLINE 1: CREATE  TABLE
tmp_2021_09_08_19_49_42_354417_found_patterns(...
^QUERY:  CREATE  TABLE tmp_2021_09_08_19_49_42_354417_found_patterns(Class
TEXT,filename TEXT,pattern TEXT)CONTEXT:  Traceback (most recent call
last):  PL/Python function "cmm_tsv_to_tables", line 39, in 


*plpy.execute(sql1)PL/Python function "cmm_tsv_to_tables"*

and here (and attached) is the function up to the point where the error
happens:
-
CREATE OR REPLACE FUNCTION cmm_tsv_to_tables( tsv_dir TEXT) RETURNS TEXT AS
$$
import csv
import datetime
import glob
import os
import regex
import time

ts = time.time()
prefix = "tmp_" +
regex.sub(r"\D+","_",str(datetime.datetime.fromtimestamp(time.time()).isoformat()))
+ "_"

if not os.path.exists(tsv_dir):
plpy.notice("Scout output directory " + tsv_dir + " does not exist.
Rerun this software with a directory which contains the tsv files you wish
to database.")
return ""

try:
os.chdir(tsv_dir)
except:
plpy.notice("postgresql does not have access to the directory " +
tsv_dir + " of tsv files. Exiting!")
return ""

tsv_file = glob.glob('*.tsv')
if len(tsv_file) == 0:
plpy.notice("There are no tsv files in " + tsv_dir + ". Exiting!")
return ""
plpy.notice(tsv_file)

table = [ prefix+os.path.splitext(x)[0] for x in tsv_file]
plpy.notice(table)

# create tables with text columns, some of which will later be casted to
numbers, arrays, etc.
for j in range(len(tsv_file)):
with open(tsv_file[j]) as f:
cols = f.readline().split()
colsdef = ','.join([x + ' TEXT' for x in cols])
sql1 = "CREATE  TABLE {}({})".format(table[j], colsdef)
plpy.notice(sql1)
#return 'exit'
plpy.execute(sql1)
sql1 = "COPY {} FROM '{}' WITH (FORMAT 'csv', HEADER, DELIMITER E'\t',
NULL 'NULL')".format(table[j], tsv_file[j]);
plpy.notice(sql1)
plpy.execute(sql1)
return "Tables were written"
$$ LANGUAGE plpython3u;


cmm_tsv_to_tables.plpython3u
Description: Binary data


Re: How does postgres behave if several indexes have (nearly) identical conditions?

2021-09-08 Thread Koen De Groote
And initial setup is wrong. There should be no 'and a002=false' in the
indexes.

On Wed, Sep 8, 2021 at 11:15 PM Koen De Groote  wrote:

> Forgot to mention, this is on Postgres 11.2
>
> On Wed, Sep 8, 2021 at 11:04 PM Koen De Groote  wrote:
>
>> Greetings all.
>>
>> Example table:
>>
>> CREATE TABLE my_table (
>> id serial PRIMARY KEY,
>> a001 BOOLEAN default 't',
>> a002 BOOLEAN default 'f',
>> a003 BOOLEAN default 't',
>> a004 BOOLEAN default 'f'
>> );
>>
>> And these 2 indexes:
>>
>> create index index_001 on my_table using btree (a001,a002,a003) where
>> a001=true and a002=false;
>>
>> create index index_002 on my_table using btree (a003) where a001=true and
>> a002=false;
>>
>> Now take this query:
>>
>> select * from my_table where a001=true;
>>
>> Which index will postgres pick? I'm wondering how postgres goes about
>> picking an index to consider.
>>
>> And if it will consider others if the analysis of the first says a
>> seqscan would be better than the index it first considered?
>>
>> Regards,
>> Koen De Groote
>>
>>


Re: How does postgres behave if several indexes have (nearly) identical conditions?

2021-09-08 Thread David G. Johnston
On Wednesday, September 8, 2021, Koen De Groote  wrote:

> And initial setup is wrong. There should be no 'and a002=false' in the
> indexes.
>
>
>>> create index index_001 on my_table using btree (a001,a002,a003) where
>>> a001=true and a002=false;
>>>
>>> create index index_002 on my_table using btree (a003) where a001=true
>>> and a002=false;
>>>
>>> Now take this query:
>>>
>>> select * from my_table where a001=true;
>>>
>>> Which index will postgres pick? I'm wondering how postgres goes about
>>> picking an index to consider.
>>>
>>> And if it will consider others if the analysis of the first says a
>>> seqscan would be better than the index it first considered?
>>>
>>

Still probably neither since the sequential scan is likely the better
choice (it depends on the number of true rows compared to all rows).  It
will have to look at the statistical data for both but given that the
three-column one is strictly worse than the single column version (because
the indexed columns don’t contribute anything worthwhile) it will mostly
likely be a choice between a sequential scan and the index 002.  Though if
002 has lots of bloat compared to index 001 the later may beat it out - but
that just means your system needs index maintenance performed.

David J.


Re: spiexceptions.UndefinedFile: could not open file "base/16482/681830": No such file or directory

2021-09-08 Thread Thomas Munro
On Thu, Sep 9, 2021 at 9:19 AM Celia McInnis  wrote:
> Note that the file does exist:! (How do I know if it is looking under the 
> correct directory? Other times I have done similar temporary table creations 
> with no problems!):

PostgreSQL internally uses relative paths.  It's probably not a very
good idea to use 'chdir' in a procedure.




Re: spiexceptions.UndefinedFile: could not open file "base/16482/681830": No such file or directory

2021-09-08 Thread Celia McInnis
YES!!! The chdir caused the error! Thanks very much for this help. BTW it
was worse than just "not a very good idea" to use chdir - After generating
the error, I would lose the ability to see things in my database until
restarting postgresql. All's okay now!

On Wed, Sep 8, 2021 at 8:56 PM Thomas Munro  wrote:

> On Thu, Sep 9, 2021 at 9:19 AM Celia McInnis 
> wrote:
> > Note that the file does exist:! (How do I know if it is looking under
> the correct directory? Other times I have done similar temporary table
> creations with no problems!):
>
> PostgreSQL internally uses relative paths.  It's probably not a very
> good idea to use 'chdir' in a procedure.
>


Postgresql 11: terminating walsender process due to replication timeout

2021-09-08 Thread Abhishek Bhola
I have found some questions about the same error, but didn't find any of
them answering my problem.

The setup is that I have two Postgres11 clusters (A and B) and they are
making use of publication and subscription features to copy data from A to
B.

A (source DB- publication) --> B (target DB - subscription)

This works fine, but often (not always) when the data volume being inserted
on a table in node A increases, it gives the following error.

"terminating walsender process due to replication timeout"

The data volume at the moment being entered is about 30K rows per second
continuously for hours through COPY command.

Earlier the wal_sender_timeout was set to 5 sec and I would see this error
much often. I then increased it to 1 min and the frequency of this error
reduced. But I don't want to keep increasing it without understanding what
is causing it. I looked at the code of walsender.c and know the exact lines
where it's coming from.

But I am still not clear which parameter is making the sender assume that
the receiver node is inactive and therefore it should stop the wal_sender.

Can anyone please suggest what changes I should make to remove this error?

sourcedb=# show wal_sender_timeout;
 wal_sender_timeout
 1min
(1 row)

sourcedb=# select * from pg_replication_slots;
 slot_name  |  plugin  | slot_type | datoid |
database | temporary | active | active_pid | xmin | catalog_xmin |
restart_lsn   | confirmed_flush_lsn
+--+---++--+---+++--+--++-
 sub_target_DB  | pgoutput | logical   |  16501 |
sourcedb | f | t  |  68229 |  | 98839088 |
116D0/C36886F8 | 116D0/C3E5D370



targetdb=# show wal_receiver_timeout;
 wal_receiver_timeout--
 1min
(1 row)


targetdb=# show wal_retrieve_retry_interval ;
 wal_retrieve_retry_interval-
 5s
(1 row)

targetdb=# show wal_receiver_status_interval;
 wal_receiver_status_interval--
 2s
(1 row)

targetdb=# select * from pg_stat_subscription;
   subid|  subname   |  pid  | relid |
received_lsn  |  last_msg_send_time   |
last_msg_receipt_time | latest_end_lsn |l
atest_end_time++---+---++---+---++---
 2378695757 | sub_target_DB  | 62371 |   |
116D1/2BA8F170 | 2021-08-20 09:05:15.398423+09 | 2021-08-20
09:05:15.398471+09 | 116D1/2BA8F170 | 2021-08-20 09:05:15.398423+09

Increased the wal_sender_timeout to 5 mins and the error started
appearing more frequently instead. Not only that, it even killed the
active subscription and stopped replicating data. Had to restart it.
So clearly, just increasing the wal_sender_timeout hasn't helped.

-- 
_This correspondence (including any attachments) is for the intended 
recipient(s) only. It may contain confidential or privileged information or 
both. No confidentiality or privilege is waived or lost by any 
mis-transmission. If you receive this correspondence by mistake, please 
contact the sender immediately, delete this correspondence (and all 
attachments) and destroy any hard copies. You must not use, disclose, copy, 
distribute or rely on any part of this correspondence (including any 
attachments) if you are not the intended 
recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、本情報及びそのコピーすべてを削除・破棄し、本電子メールが誤って届いた旨を発信者宛てにご通知下さいますようお願いします。本情報の閲覧、発信または本情報に基づくいかなる行為も明確に禁止されていることをご了承ください。_


Subscriber to Get Only Some of The Tables From Publisher

2021-09-08 Thread Avi Weinberg
Hi,

I have a publisher with around 30 tables.  I have two types of subscribers.  
Both types needs 25 "common" tables from the publisher plus 2-3 specific tables 
for each type of subscriber.

For maintenance and monitoring reasons it is better for me to have both 
subscribers point to the same publisher but "skip/ignore" 2-3 tables that are 
not needed for them.  If this is not possible, I will be forced to create 2 
publishers.

Any idea hot to ignore tables from publisher?
IMPORTANT - This email and any attachments is intended for the above named 
addressee(s), and may contain information which is confidential or privileged. 
If you are not the intended recipient, please inform the sender immediately and 
delete this email: you should not copy or use this e-mail for any purpose nor 
disclose its contents to any person.


Re: Postgresql 11: terminating walsender process due to replication timeout

2021-09-08 Thread Kyotaro Horiguchi
At Thu, 9 Sep 2021 14:52:25 +0900, Abhishek Bhola 
 wrote in 
> I have found some questions about the same error, but didn't find any of
> them answering my problem.
> 
> The setup is that I have two Postgres11 clusters (A and B) and they are
> making use of publication and subscription features to copy data from A to
> B.
> 
> A (source DB- publication) --> B (target DB - subscription)
> 
> This works fine, but often (not always) when the data volume being inserted
> on a table in node A increases, it gives the following error.
> 
> "terminating walsender process due to replication timeout"
> 
> The data volume at the moment being entered is about 30K rows per second
> continuously for hours through COPY command.
> 
> Earlier the wal_sender_timeout was set to 5 sec and I would see this error
> much often. I then increased it to 1 min and the frequency of this error
> reduced. But I don't want to keep increasing it without understanding what
> is causing it. I looked at the code of walsender.c and know the exact lines
> where it's coming from.
> 
> But I am still not clear which parameter is making the sender assume that
> the receiver node is inactive and therefore it should stop the wal_sender.
> 
> Can anyone please suggest what changes I should make to remove this error?

What minor-version is the Postgres server mentioned? PostgreSQL 11
have gotten the following fix at 11.6, which could be related to the
trouble.

https://www.postgresql.org/docs/11/release-11-6.html

> Fix timeout handling in logical replication walreceiver processes
> (Julien Rouhaud)
> 
> Erroneous logic prevented wal_receiver_timeout from working in
> logical replication deployments.

The details of the fix is here.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3f60f690fac1bf375b92cf2f8682e8fe8f69098
> Fix timeout handling in logical replication worker
> 
> The timestamp tracking the last moment a message is received in a
> logical replication worker was initialized in each loop checking if a
> message was received or not, causing wal_receiver_timeout to be ignored
> in basically any logical replication deployments.  This also broke the
> ping sent to the server when reaching half of wal_receiver_timeout.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center