Re: No xmin in pg_database
I've asked the user to perform `SELECT xmin, * from pg_attribute WHERE attrelid = 'pg_catalog.pg_database'::regclass` to check the attributes. The user has sent me that a couple of times: both times xmin is different, attrelid is different, both have a big value. Does that mean that pg_database is recreated? Also system attributes are different. My result looks like: ++++--+ |xmin|attrelid|attname |attnum| ++++--+ |1 |1262|tableoid|-6| |1 |1262|cmax|-5| |1 |1262|xmax|-4| |1 |1262|cmin|-3| |1 |1262|xmin|-2| |1 |1262|ctid|-1| |1 |1262|oid |1 | |1 |1262|datname |2 | ... His result (apart from dat* attributes) contains only tableoid and oid. Also his attnum numeration starts from tableoid = 1, oid = 2, datname = 3 I'm puzzled PS: I'm not familiar with mailing lists. Is it ok to attach images here? On Thu, Sep 2, 2021 at 1:41 PM Tom Lane wrote: > > Laurenz Albe writes: > > On Thu, 2021-09-02 at 08:50 +0300, Alexander Kass wrote: > >> We have a small number of users that do not have xmin in pg_database > >> (we've asked them to try `select xmin from pg_database` and got > >> `column xmin does not exist`). > > > All PostgreSQL tables have "xmin", and all catalog tables do as well. > > Indeed. This seems to be evidence of corruption in the pg_attribute > catalog. If you're really lucky, reindexing pg_attribute might fix > it, though I wonder what other problems there are. (It's odd though > that identical corruption would happen to different installations.) > > regards, tom lane
Re: No xmin in pg_database
Alexander Kass writes: > I've asked the user to perform `SELECT xmin, * from pg_attribute WHERE > attrelid = 'pg_catalog.pg_database'::regclass` to check the > attributes. > The user has sent me that a couple of times: both times xmin is > different, attrelid is different, both have a big value. > Does that mean that pg_database is recreated? > Also system attributes are different. My result looks like: > ... > His result (apart from dat* attributes) contains only tableoid and > oid. Also his attnum numeration starts from tableoid = 1, oid = 2, > datname = 3 > I'm puzzled It's pretty hard to read these details and not conclude that somebody has been manually manipulating that catalog. Missing rows might possibly be explained by index corruption ... but there is no scenario under which pg_database could have an OID different from 1262, nor could this change of attnums for the surviving rows ever have happened via Postgres-internal processes. I'm about ready to file this under "you broke it, you get to keep both pieces". regards, tom lane
Logical Replication to Older Version
We recently upgraded from v9.6 to v13 but are seeing some problems. It's on AWS Aurora, so I won't ask to diagnose a heavily altered version. We're hoping for better results with v12, but when we set up logical replication from v13 to v12, while the initial data snapshot copies the data, pg_stat_replication remains empty. There are no v13-specific features or idioms in use in the databases. As a sanity check, should it be possible to set up logical replication from a newer version of PostgreSQL to an older version? Has this scenario been tested? Has the logical replication protocol changed between versions? Thanks in advance, Miles Elam
Re: Logical Replication to Older Version
Follow up to this. Turns out we had a table without a primary key which halted the ongoing replication. Reviewing this document in detail now. https://pgdash.io/blog/postgres-replication-gotchas.html - Miles Elam
How does postgres behave if several indexes have (nearly) identical conditions?
Greetings all. Example table: CREATE TABLE my_table ( id serial PRIMARY KEY, a001 BOOLEAN default 't', a002 BOOLEAN default 'f', a003 BOOLEAN default 't', a004 BOOLEAN default 'f' ); And these 2 indexes: create index index_001 on my_table using btree (a001,a002,a003) where a001=true and a002=false; create index index_002 on my_table using btree (a003) where a001=true and a002=false; Now take this query: select * from my_table where a001=true; Which index will postgres pick? I'm wondering how postgres goes about picking an index to consider. And if it will consider others if the analysis of the first says a seqscan would be better than the index it first considered? Regards, Koen De Groote
Re: How does postgres behave if several indexes have (nearly) identical conditions?
Forgot to mention, this is on Postgres 11.2 On Wed, Sep 8, 2021 at 11:04 PM Koen De Groote wrote: > Greetings all. > > Example table: > > CREATE TABLE my_table ( > id serial PRIMARY KEY, > a001 BOOLEAN default 't', > a002 BOOLEAN default 'f', > a003 BOOLEAN default 't', > a004 BOOLEAN default 'f' > ); > > And these 2 indexes: > > create index index_001 on my_table using btree (a001,a002,a003) where > a001=true and a002=false; > > create index index_002 on my_table using btree (a003) where a001=true and > a002=false; > > Now take this query: > > select * from my_table where a001=true; > > Which index will postgres pick? I'm wondering how postgres goes about > picking an index to consider. > > And if it will consider others if the analysis of the first says a seqscan > would be better than the index it first considered? > > Regards, > Koen De Groote > >
Re: How does postgres behave if several indexes have (nearly) identical conditions?
On Wednesday, September 8, 2021, Koen De Groote wrote: > > > create index index_001 on my_table using btree (a001,a002,a003) where > a001=true and a002=false; > > create index index_002 on my_table using btree (a003) where a001=true and > a002=false; > > Now take this query: > > select * from my_table where a001=true; > > Which index will postgres pick? I'm wondering how postgres goes about > picking an index to consider. > Neither…since neither partial index condition is present in the where clause of the query. David J.
Re: How does postgres behave if several indexes have (nearly) identical conditions?
On Wednesday, September 8, 2021, Koen De Groote wrote: > Forgot to mention, this is on Postgres 11.2 > You should stop worrying about performance and indexes and instead focus on system stability and security - i.e., upgrade to a supported version. David J.
spiexceptions.UndefinedFile: could not open file "base/16482/681830": No such file or directory
Help - I don't know why I am getting this message, and I don't know how to fix it. Any advice will be greatly appreciated. Note that the file does exist:! (How do I know if it is looking under the correct directory? Other times I have done similar temporary table creations with no problems!): *ls -l /zfs/postgres/postgres13/base/16482/681830-rw--- 1 postgres postgres 122880 Sep 7 18:37 /zfs/postgres/postgres13/base/16482/681830* I am using postgresql 13 on ubuntu 20.04, and postgresql is on a zfs filesysem which reports no errors. I can access every table in my database, and can run every stored procedure (plpgsql and plpython3u) on it except the one that generates the above error.. Here is the output from my run: *select cmm_tsv_to_tables('/zfs/EXPERIMENT_A_C_NORMAL_0_1_2/10');NOTICE: ['found_patterns.tsv', 'classification.tsv', 'mined_patterns.tsv']NOTICE: ['tmp_2021_09_08_19_49_42_354417_found_patterns', 'tmp_2021_09_08_19_49_42_354417_classification', 'tmp_2021_09_08_19_49_42_354417_mined_patterns']NOTICE: CREATE TABLE tmp_2021_09_08_19_49_42_354417_found_patterns(Class TEXT,filename TEXT,pattern TEXT)ERROR: spiexceptions.UndefinedFile: could not open file "base/16482/681830": No such file or directoryLINE 1: CREATE TABLE tmp_2021_09_08_19_49_42_354417_found_patterns(... ^QUERY: CREATE TABLE tmp_2021_09_08_19_49_42_354417_found_patterns(Class TEXT,filename TEXT,pattern TEXT)CONTEXT: Traceback (most recent call last): PL/Python function "cmm_tsv_to_tables", line 39, in *plpy.execute(sql1)PL/Python function "cmm_tsv_to_tables"* and here (and attached) is the function up to the point where the error happens: - CREATE OR REPLACE FUNCTION cmm_tsv_to_tables( tsv_dir TEXT) RETURNS TEXT AS $$ import csv import datetime import glob import os import regex import time ts = time.time() prefix = "tmp_" + regex.sub(r"\D+","_",str(datetime.datetime.fromtimestamp(time.time()).isoformat())) + "_" if not os.path.exists(tsv_dir): plpy.notice("Scout output directory " + tsv_dir + " does not exist. Rerun this software with a directory which contains the tsv files you wish to database.") return "" try: os.chdir(tsv_dir) except: plpy.notice("postgresql does not have access to the directory " + tsv_dir + " of tsv files. Exiting!") return "" tsv_file = glob.glob('*.tsv') if len(tsv_file) == 0: plpy.notice("There are no tsv files in " + tsv_dir + ". Exiting!") return "" plpy.notice(tsv_file) table = [ prefix+os.path.splitext(x)[0] for x in tsv_file] plpy.notice(table) # create tables with text columns, some of which will later be casted to numbers, arrays, etc. for j in range(len(tsv_file)): with open(tsv_file[j]) as f: cols = f.readline().split() colsdef = ','.join([x + ' TEXT' for x in cols]) sql1 = "CREATE TABLE {}({})".format(table[j], colsdef) plpy.notice(sql1) #return 'exit' plpy.execute(sql1) sql1 = "COPY {} FROM '{}' WITH (FORMAT 'csv', HEADER, DELIMITER E'\t', NULL 'NULL')".format(table[j], tsv_file[j]); plpy.notice(sql1) plpy.execute(sql1) return "Tables were written" $$ LANGUAGE plpython3u; cmm_tsv_to_tables.plpython3u Description: Binary data
Re: How does postgres behave if several indexes have (nearly) identical conditions?
And initial setup is wrong. There should be no 'and a002=false' in the indexes. On Wed, Sep 8, 2021 at 11:15 PM Koen De Groote wrote: > Forgot to mention, this is on Postgres 11.2 > > On Wed, Sep 8, 2021 at 11:04 PM Koen De Groote wrote: > >> Greetings all. >> >> Example table: >> >> CREATE TABLE my_table ( >> id serial PRIMARY KEY, >> a001 BOOLEAN default 't', >> a002 BOOLEAN default 'f', >> a003 BOOLEAN default 't', >> a004 BOOLEAN default 'f' >> ); >> >> And these 2 indexes: >> >> create index index_001 on my_table using btree (a001,a002,a003) where >> a001=true and a002=false; >> >> create index index_002 on my_table using btree (a003) where a001=true and >> a002=false; >> >> Now take this query: >> >> select * from my_table where a001=true; >> >> Which index will postgres pick? I'm wondering how postgres goes about >> picking an index to consider. >> >> And if it will consider others if the analysis of the first says a >> seqscan would be better than the index it first considered? >> >> Regards, >> Koen De Groote >> >>
Re: How does postgres behave if several indexes have (nearly) identical conditions?
On Wednesday, September 8, 2021, Koen De Groote wrote: > And initial setup is wrong. There should be no 'and a002=false' in the > indexes. > > >>> create index index_001 on my_table using btree (a001,a002,a003) where >>> a001=true and a002=false; >>> >>> create index index_002 on my_table using btree (a003) where a001=true >>> and a002=false; >>> >>> Now take this query: >>> >>> select * from my_table where a001=true; >>> >>> Which index will postgres pick? I'm wondering how postgres goes about >>> picking an index to consider. >>> >>> And if it will consider others if the analysis of the first says a >>> seqscan would be better than the index it first considered? >>> >> Still probably neither since the sequential scan is likely the better choice (it depends on the number of true rows compared to all rows). It will have to look at the statistical data for both but given that the three-column one is strictly worse than the single column version (because the indexed columns don’t contribute anything worthwhile) it will mostly likely be a choice between a sequential scan and the index 002. Though if 002 has lots of bloat compared to index 001 the later may beat it out - but that just means your system needs index maintenance performed. David J.
Re: spiexceptions.UndefinedFile: could not open file "base/16482/681830": No such file or directory
On Thu, Sep 9, 2021 at 9:19 AM Celia McInnis wrote: > Note that the file does exist:! (How do I know if it is looking under the > correct directory? Other times I have done similar temporary table creations > with no problems!): PostgreSQL internally uses relative paths. It's probably not a very good idea to use 'chdir' in a procedure.
Re: spiexceptions.UndefinedFile: could not open file "base/16482/681830": No such file or directory
YES!!! The chdir caused the error! Thanks very much for this help. BTW it was worse than just "not a very good idea" to use chdir - After generating the error, I would lose the ability to see things in my database until restarting postgresql. All's okay now! On Wed, Sep 8, 2021 at 8:56 PM Thomas Munro wrote: > On Thu, Sep 9, 2021 at 9:19 AM Celia McInnis > wrote: > > Note that the file does exist:! (How do I know if it is looking under > the correct directory? Other times I have done similar temporary table > creations with no problems!): > > PostgreSQL internally uses relative paths. It's probably not a very > good idea to use 'chdir' in a procedure. >
Postgresql 11: terminating walsender process due to replication timeout
I have found some questions about the same error, but didn't find any of them answering my problem. The setup is that I have two Postgres11 clusters (A and B) and they are making use of publication and subscription features to copy data from A to B. A (source DB- publication) --> B (target DB - subscription) This works fine, but often (not always) when the data volume being inserted on a table in node A increases, it gives the following error. "terminating walsender process due to replication timeout" The data volume at the moment being entered is about 30K rows per second continuously for hours through COPY command. Earlier the wal_sender_timeout was set to 5 sec and I would see this error much often. I then increased it to 1 min and the frequency of this error reduced. But I don't want to keep increasing it without understanding what is causing it. I looked at the code of walsender.c and know the exact lines where it's coming from. But I am still not clear which parameter is making the sender assume that the receiver node is inactive and therefore it should stop the wal_sender. Can anyone please suggest what changes I should make to remove this error? sourcedb=# show wal_sender_timeout; wal_sender_timeout 1min (1 row) sourcedb=# select * from pg_replication_slots; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn +--+---++--+---+++--+--++- sub_target_DB | pgoutput | logical | 16501 | sourcedb | f | t | 68229 | | 98839088 | 116D0/C36886F8 | 116D0/C3E5D370 targetdb=# show wal_receiver_timeout; wal_receiver_timeout-- 1min (1 row) targetdb=# show wal_retrieve_retry_interval ; wal_retrieve_retry_interval- 5s (1 row) targetdb=# show wal_receiver_status_interval; wal_receiver_status_interval-- 2s (1 row) targetdb=# select * from pg_stat_subscription; subid| subname | pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn |l atest_end_time++---+---++---+---++--- 2378695757 | sub_target_DB | 62371 | | 116D1/2BA8F170 | 2021-08-20 09:05:15.398423+09 | 2021-08-20 09:05:15.398471+09 | 116D1/2BA8F170 | 2021-08-20 09:05:15.398423+09 Increased the wal_sender_timeout to 5 mins and the error started appearing more frequently instead. Not only that, it even killed the active subscription and stopped replicating data. Had to restart it. So clearly, just increasing the wal_sender_timeout hasn't helped. -- _This correspondence (including any attachments) is for the intended recipient(s) only. It may contain confidential or privileged information or both. No confidentiality or privilege is waived or lost by any mis-transmission. If you receive this correspondence by mistake, please contact the sender immediately, delete this correspondence (and all attachments) and destroy any hard copies. You must not use, disclose, copy, distribute or rely on any part of this correspondence (including any attachments) if you are not the intended recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、本情報及びそのコピーすべてを削除・破棄し、本電子メールが誤って届いた旨を発信者宛てにご通知下さいますようお願いします。本情報の閲覧、発信または本情報に基づくいかなる行為も明確に禁止されていることをご了承ください。_
Subscriber to Get Only Some of The Tables From Publisher
Hi, I have a publisher with around 30 tables. I have two types of subscribers. Both types needs 25 "common" tables from the publisher plus 2-3 specific tables for each type of subscriber. For maintenance and monitoring reasons it is better for me to have both subscribers point to the same publisher but "skip/ignore" 2-3 tables that are not needed for them. If this is not possible, I will be forced to create 2 publishers. Any idea hot to ignore tables from publisher? IMPORTANT - This email and any attachments is intended for the above named addressee(s), and may contain information which is confidential or privileged. If you are not the intended recipient, please inform the sender immediately and delete this email: you should not copy or use this e-mail for any purpose nor disclose its contents to any person.
Re: Postgresql 11: terminating walsender process due to replication timeout
At Thu, 9 Sep 2021 14:52:25 +0900, Abhishek Bhola wrote in > I have found some questions about the same error, but didn't find any of > them answering my problem. > > The setup is that I have two Postgres11 clusters (A and B) and they are > making use of publication and subscription features to copy data from A to > B. > > A (source DB- publication) --> B (target DB - subscription) > > This works fine, but often (not always) when the data volume being inserted > on a table in node A increases, it gives the following error. > > "terminating walsender process due to replication timeout" > > The data volume at the moment being entered is about 30K rows per second > continuously for hours through COPY command. > > Earlier the wal_sender_timeout was set to 5 sec and I would see this error > much often. I then increased it to 1 min and the frequency of this error > reduced. But I don't want to keep increasing it without understanding what > is causing it. I looked at the code of walsender.c and know the exact lines > where it's coming from. > > But I am still not clear which parameter is making the sender assume that > the receiver node is inactive and therefore it should stop the wal_sender. > > Can anyone please suggest what changes I should make to remove this error? What minor-version is the Postgres server mentioned? PostgreSQL 11 have gotten the following fix at 11.6, which could be related to the trouble. https://www.postgresql.org/docs/11/release-11-6.html > Fix timeout handling in logical replication walreceiver processes > (Julien Rouhaud) > > Erroneous logic prevented wal_receiver_timeout from working in > logical replication deployments. The details of the fix is here. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3f60f690fac1bf375b92cf2f8682e8fe8f69098 > Fix timeout handling in logical replication worker > > The timestamp tracking the last moment a message is received in a > logical replication worker was initialized in each loop checking if a > message was received or not, causing wal_receiver_timeout to be ignored > in basically any logical replication deployments. This also broke the > ping sent to the server when reaching half of wal_receiver_timeout. regards. -- Kyotaro Horiguchi NTT Open Source Software Center