Re: Errors with physical replication
Hello. At Mon, 21 May 2018 05:18:57 -0700 (MST), greigwise wrote in <1526905137308-0.p...@n3.nabble.com>

> Hello.
>
> We are on Postgresql version 9.6.6. We have 2 EC2 instances in different
> Amazon regions and we are doing physical replication via VPN. It all seems
> to work just fine most of the time. I'm noticing in the logs that we have
> recurring erros (maybe 10 or 12 times per day) that look like this:
> 2018-05-17 06:36:14 UTC 5af0599f.210d LOG: invalid resource manager ID 49
> 2018-05-17 06:36:14 UTC 5afd22de.7ac4 LOG: started streaming WAL from
> 2018-05-17 07:20:17 UTC 5afd22de.7ac4 FATAL: could not receive data from
> WAL stream: server closed the connection unexpectedly
>
> Or some that also look like this:
>
> 2018-05-17 07:20:17 UTC 5af0599f.210d LOG: record with incorrect prev-link
> 2018-05-17 07:20:18 UTC 5afd2d31.1889 LOG: started streaming WAL from
> 2018-05-17 08:03:28 UTC 5afd2d31.1889 FATAL: could not receive data from
> WAL stream: server closed the connection unexpectedly
>
> And some like this:
>
> 2018-05-17 23:00:13 UTC 5afd63ec.26fc LOG: invalid magic number in
> log segment 00010385003C, offset 10436608
> 2018-05-17 23:00:14 UTC 5afe097d.49aa LOG: started streaming WAL from
> primary at 385/3C00 on timeline 1

Your replication connection seems quite unstable and is disconnected frequently. After a disconnection you will see several kinds of "I found a broken record in my WAL file" messages; they are the cues for the standby to switch back to streaming. That in itself is normal PostgreSQL operation, with one known exception.

> Then, like maybe once every couple months or so, we have a crash with logs
> looking like this:
> 2018-05-17 08:03:28 UTC hireology 5af47b75.2670 hireology WARNING:
> terminating connection because of crash of another server process

I think those lines follow an error message like "FATAL: invalid memory alloc request size 3075129344". This is a kind of "broken record", but it is known to lead the standby to crash. It is discussed here:

> [bug fix] Cascaded standby cannot start after a clean shutdown

https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05#0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05

> When this last error occurs, the recovery is to go on the replica and remove
> all the WAL logs from the pg_xlog director and then restart Postgresql.
> Everything seems to recover and come up fine. I've done some tests
> comparing counts between the replica and the primary and everything seems
> synced just fine from all I can tell.

Those are the right recovery steps, as far as I can tell from the attached log messages.

> So, a couple of questions. 1) Should I be worried that my replica is
> corrupt in some way or given that everything *seems* ok, is it reasonable to
> believe that things are working correctly in spite of these errors being
> reported. 2) Is there something I should configure differently to avoid
> some of these errors?

It doesn't seem worth worrying about from the viewpoint of data integrity, but if walsender/walreceiver timeouts fire too frequently, you might need to increase them for better stability.

> Thanks in advance for any help.
>
> Greig Wise

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
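For illustration only, a minimal sketch of raising those timeouts (the 5-minute value is an arbitrary example; wal_sender_timeout is set on the primary and wal_receiver_timeout on the standby):

-- on the primary
ALTER SYSTEM SET wal_sender_timeout = '5min';
SELECT pg_reload_conf();

-- on the standby
ALTER SYSTEM SET wal_receiver_timeout = '5min';
SELECT pg_reload_conf();

Both parameters can also be set directly in postgresql.conf instead of via ALTER SYSTEM.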
Re: Why must AUTOCOMMIT be ON to do txn control in plpgsql procedure?
Hello, Bryn. At Tue, 6 Aug 2019 15:18:51 -0700, Bryn Llewellyn wrote in

> Here’s how I’ve tried to describe what I see for p2() with AUTOCOMMIT ON for
> myself:
>
> 1. my call p2() starts a txn.
>
> 2. However, during the execution of the proc, the usual autocommit behavior
> is programmatically turned off by explicit PostgreSQL code.
>
> 3. Other explicit PostgreSQL code makes “start transaction” inside a proc
> simply cause a runtime error under all circumstances. However, txns can be
> ended by “commit” or “rollback”. And new ones can be started—but only
> implicitly by executing a SQL statement that, as a top level SQL, would start
> a txn.

In-procedure transaction control presupposes that no transaction is active before the procedure is called.

https://www.postgresql.org/docs/11/sql-call.html

> If CALL is executed in a transaction block, then the called
> procedure cannot execute transaction control
> statements. Transaction control statements are only allowed if
> CALL is executed in its own transaction.

With AUTOCOMMIT=off, an implicit BEGIN is issued just before CALLing p2() if no transaction is active. Thus p2() is always called with a transaction already active, which inhibits in-procedure transaction control.

I'm not sure why you want to turn AUTOCOMMIT off, but even with AUTOCOMMIT on, an explicit BEGIN prevents per-command COMMIT, as you perhaps know.

https://www.postgresql.org/docs/11/app-psql.html

> When on (the default), each SQL command is automatically
> committed upon successful completion. To postpone commit in
> this mode, you must enter a BEGIN or START TRANSACTION SQL
> command.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
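As a minimal sketch of the behavior described above (the table and procedure here are hypothetical):

CREATE TABLE t (i int);

CREATE PROCEDURE p2() LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO t VALUES (1);
    COMMIT;   -- allowed only when CALL runs in its own transaction
END;
$$;

-- With AUTOCOMMIT on (the psql default) this works:
CALL p2();

-- Inside an explicit transaction block the in-procedure COMMIT is
-- rejected with "invalid transaction termination":
BEGIN;
CALL p2();
ROLLBACK;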
Re: question about zeroes in the wal file names
At Sun, 18 Aug 2019 16:17:03 +0200, Luca Ferrari wrote in > I'm just curious to better understand the naming convention behind wal > files, because I've seen on a system of mine that the wals created > were: > > 0005020E00FF > 0005020F > > while I was expecting 20E0x100. So I digged into the code and I've > seen, from the XLogFileName macro, that the last part is built as the > reminder of the number of segments per wal file: > > #define XLogFileName(fname, tli, logSegNo, wal_segsz_bytes)\ > snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli,\ > (uint32) ((logSegNo) / XLogSegmentsPerXLogId(wal_segsz_bytes)), \ > (uint32) ((logSegNo) % XLogSegmentsPerXLogId(wal_segsz_bytes))) > > > and with the default wal size of 16 MB that gives a remainder of 256 > (FF + 1). Assuming I haven't missed anything, this means that there > are 6 zeroes that will never change in the last 8 chars of the wal > filename. Is therefore this only done to handle PostgreSQL WAL sizes > of 4 GB each? I think that's right. I don't see it used in other than WAL file names. As the macro name suggests, the 4GB block is internally called as "xlog id" (XLogId). The block size is determined so that the location within it can be handled as a 32-bit integer. Older versions of postgresql defined XLogRecPtr as a pair of two 32-bit integers. xlogdefs.h.~REL9_1_0~:17 /* * Pointer to a location in the XLOG. These pointers are 64 bits wide, * because we don't want them ever to overflow. * * NOTE: xrecoff == 0 is used to indicate an invalid pointer. This is OK * because we use page headers in the XLOG, so no XLOG record can start * right at the beginning of a file. * * NOTE: the "log file number" is somewhat misnamed, since the actual files * making up the XLOG are much smaller than 4Gb. Each actual file is an * XLogSegSize-byte "segment" of a logical log file having the indicated * xlogid. The log file number and segment number together identify a * physical XLOG file. Segment number and offset within the physical file * are computed from xrecoff div and mod XLogSegSize. */ typedef struct XLogRecPtr { uint32 xlogid; /* log file #, 0 based */ uint32 xrecoff;/* byte offset of location in log file */ } XLogRecPtr; regards. -- Kyotaro Horiguchi NTT Open Source Software Center
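As an illustrative check of the naming (a sketch assuming the default 16MB segment size and a server whose current timeline is 5, as in the file names above), pg_walfile_name() shows how an LSN maps to a file name, and that the low 8 hex digits never exceed FF:

SELECT pg_walfile_name('20E/FF000058'::pg_lsn);
--> 000000050000020E000000FF

SELECT pg_walfile_name('20F/00000058'::pg_lsn);
--> 000000050000020F00000000

The leading 8 digits come from the timeline of the server you run this on.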
Re: postmaster utilization
At Mon, 19 Aug 2019 10:07:30 -0400, Stephen Frost wrote in <20190819140730.gh16...@tamriel.snowman.net> > Greetings, > > * Ron (ronljohnso...@gmail.com) wrote: > > On 8/19/19 5:40 AM, Shiwangini Shishulkar wrote: > > >We have scheduled postgres full backup on centos 7 machine. DB size is > > >around 66 GB. We observed while backup is running, postmaster CPU % > > >reaches to 90 - 100%,which results very strange behavior: that we are not > > >able to perform any DDL and DML on to the database during the time. > > >Please suggest a way to limit it's CPU utilization or else let us know how > > >we can access the database during the backup window. Please let us know on > > >priority since this is our prod database. > > > > Are you compressing the backup? > > How many threads are you running? > > How many CPUs on your system? > > What version of Postgres? > > ... What are you using to run the backup? It seems to have been reported as BUG #15961.. https://www.postgresql.org/message-id/15965-413bf5d18aaef...@postgresql.org > PostgreSQL version: 11.4 > Operating system: CentOS Linux release 7.6.1810 (Core) > pg_dump -U postgres -d wg -f wg.sql regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: pg full text search very slow for Chinese characters
Hi. At Tue, 10 Sep 2019 18:42:26 +0200 (CEST), Andreas Joseph Krogh wrote in

> On Tuesday, 10 September 2019 at 18:21:45, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Jimmy Huang writes:
> > I tried pg_trgm and my own customized token parser
> > https://github.com/huangjimmy/pg_cjk_parser
>
> pg_trgm is going to be fairly useless for indexing text that's mostly
> multibyte characters, since its unit of indexable data is just 3 bytes
> (not characters). I don't know of any comparable issue in the core
> tsvector logic, though. The numbers you're quoting do sound quite awful,
> but I share Cory's suspicion that it's something about your setup rather
> than an inherent Postgres issue.
>
> regards, tom lane
>
> We experienced quite awful performance when we hosted the
> DB on virtual servers (~5 years ago) and it turned out we hit the write-cache
> limit (then 8GB), which resulted in ~1MB/s IO thruput. Running iozone might
> help tracing down IO-problems.
> --
> Andreas Joseph Krogh

Multibyte characters also quickly bloat the index with a huge number of small buckets, one for every 3-character combination out of thousands of distinct characters, which makes it useless. pg_bigm, which is based on bigrams (2-grams), works better for multibyte characters.

https://pgbigm.osdn.jp/index_en.html

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
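For reference, a rough sketch of trying pg_bigm (the extension has to be installed separately, and the table and column names here are made up):

CREATE EXTENSION pg_bigm;

CREATE INDEX documents_body_bigm_idx
    ON documents USING gin (body gin_bigm_ops);

-- the 2-gram index is usable for pattern matching on CJK text
SELECT * FROM documents WHERE body LIKE '%東京都%';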
Re: Referncing a calculated column in a select?
Hello. At Thu, 12 Sep 2019 23:16:01 +0200, "Peter J. Holzer" wrote in <20190912211601.ga3...@hjp.at> > On 2019-09-12 15:35:56 -0500, Ron wrote: > > On 9/12/19 2:23 PM, stan wrote: > > > I am creating some views, that have columns with fairly complex > > > calculations > > > in them. The I want to do further calculations using the result of this > > > calculation. Right now, I am just duplicating the first calculation in the > > > select fro the 2nd calculated column. There must be a batter way to do > > > this, right? > > > > I must be misunderstanding you, because you can just do more of what you're > > doing now. > > > > Here's an example of doing some calculations in the creation of the view, > > and then more calculations when selecting from the view: > > I think he wants to refer to other columns in the view. .. > What you can do is nest views: Doesn't subquery work? SELECT x, y, z FROM (SELECT f * 3 AS x, f AS y, g + 2 AS z FROM (SELECT a + 3 AS f, b + c AS g FROM t) AS t1 ) AS t2; t2 uses f in two columns, where f is calculated from t.a. Or CTE (WITH clause) might look better. WITH t1 AS (SELECT a + 3 AS f, b + c AS g FROM t), t2 AS (SELECT f * 3 AS x, f AS y, g + 2 AS z FROM t1) SELECT x, y, z FROM t2; regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Query Tuning
Hello. At Tue, 1 Oct 2019 12:42:24 +0530, Sonam Sharma wrote in

> We have a query which is running slow and it's taking 26secs to complete..
> we have run the analyzer also and it's taking the same time.
>
> Any tool is there for query optimization or any suggestions.

EXPLAIN ANALYZE (not just EXPLAIN) would be that tool. In many cases where a query takes an unexpectedly long time, the row estimate in some node is largely different from the actual row count, and that leads to a wrong query plan. EXPLAIN ANALYZE may give you a clue to that kind of problem.

The following output comes from plain EXPLAIN. I suppose that you already have a similar output that has an "(actual time=...)" clause after the "(cost=...)" clause.

> My query plan looks like this :
>
> CTE Scan on approvalwflscreen (cost=8736.21..8737.25 rows=52 width=1214)
>
> CTE constants
>
> -> Result (cost=0.00..0.01 rows=1 width=44)
>
> CTE approval
>
> -> Sort (cost=7793.89..7805.22 rows=4530 width=292)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
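For reference, a minimal example of running it (the query is only a placeholder; the point is to compare the estimated "rows=" with the "actual ... rows=" in each node):

EXPLAIN (ANALYZE, BUFFERS)
SELECT c.id, count(*)
  FROM customers c
  JOIN orders o ON o.customer_id = c.id
 GROUP BY c.id;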
Re: Create a logical and physical replication
Hello. At Mon, 4 Nov 2019 19:05:24 +0530, Ashok Kumar Tiwari wrote in

> is it possible in postgres to create one physical replica for DR(data
> recovery purpose) and another one logical replica for reporting purpose

A physical replication standby cannot be a publisher. You will find that CREATE PUBLICATION fails.

> master
> physical standby (for DR Purpose)
> |
> |
> |
> |
> logical replica for reporting purpose
>
> -- or cascading logic replication
>
> master
> physical standby (for DR Purpose) -
>
> logical replica for reporting purpose

The former works (as documented in [1]) but the latter doesn't.

By the way, if I'm not missing anything, I don't find a description of this restriction in the documentation. Shouldn't it be in [2], or [1]?

[1]: https://www.postgresql.org/docs/12/logical-replication.html
[2]: https://www.postgresql.org/docs/12/logical-replication-restrictions.html

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
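For reference, the working (former) layout is just an ordinary logical replication setup against the primary; a minimal sketch with made-up host, database, and user names:

-- on the primary (the publisher)
CREATE PUBLICATION report_pub FOR ALL TABLES;

-- on the reporting server (the subscriber)
CREATE SUBSCRIPTION report_sub
    CONNECTION 'host=primary.example.com port=5432 dbname=appdb user=repuser'
    PUBLICATION report_pub;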
Re: Wall shiping replica failed to recover database with error: invalid contrecord length 1956 at FED/38FFE208
Hello. At Wed, 2 Oct 2019 19:24:02 -0400, Stephen Frost wrote in > Greetings, > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > Stephen Frost writes: > > > * Aleš Zelený (zeleny.a...@gmail.com) wrote: > > >> But recovery on replica failed to proceed WAL file > > >> 00010FED0039 with log message: " invalid contrecord length > > >> 1956 at FED/38FFE208". > > > > > Err- you've drawn the wrong conclusion from that message (and you're > > > certainly not alone- it's a terrible message and we should really have a > > > HINT there or something). > > > > Yeah, those messages are all pretty ancient, from when WAL was new and not > > to be trusted much. Perhaps the thing to do is move the existing info > > into DETAIL and make the primary message be something like "reached > > apparent end of WAL stream". > > Yes, +1 on that. The attached is something like that. AFAICS we can assume that an invalid record means the end of WAL as long as we are fetching successive records and XLogPageRead() has not logged anything. As the current comment in ReadRecord says, we don't have a message if standby has been triggered, but that's not always the case. Still we may have a message if new segment is available after triggering. I used a kind-of-tricky way to handle optional errdetail but it may be better to have some additional feature as an ereport subfunction. (Maybe named "errif" or something?) I think it works fine as expected but I find one arguable behavior. In the first patch, the "end of WAL" message is accompanied by source, LSN and timeline. LOG: reached end of WAL in archive at 0/A360 on timeline 8 DETAIL: invalid record length at 0/A360: wanted 24, got 0 The last two just seems good but the first one shows "archive" even on promotion. It is right that we scan archive after promotion but seems confusing. So I attached another patch that shows operation instead of WAL source. LOG: reached end of WAL during streaming reaplication at 0/A560 on timeline 8 DETAIL: invalid record length at 0/A560: wanted 24, got 0 What do you think about this? regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 2bd3d0e5e5..70fd34659a 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -852,6 +852,9 @@ static bool bgwriterLaunched = false; static int MyLockNo = 0; static bool holdingAllLocks = false; +/* Have we complaind about the record at the location? */ +static XLogRecPtr lastComplaint = 0; + #ifdef WAL_DEBUG static MemoryContext walDebugCxt = NULL; #endif @@ -889,6 +892,7 @@ static int XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, bool fetching_ckpt, XLogRecPtr tliRecPtr); static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr); +static bool have_complained_at(XLogRecPtr RecPtr); static void XLogFileClose(void); static void PreallocXlogFiles(XLogRecPtr endptr); static void RemoveTempXlogFiles(void); @@ -4264,8 +4268,11 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode, record = XLogReadRecord(xlogreader, RecPtr, &errormsg); ReadRecPtr = xlogreader->ReadRecPtr; EndRecPtr = xlogreader->EndRecPtr; + if (record == NULL) { + XLogRecPtr reportptr = RecPtr ? 
RecPtr : EndRecPtr; + if (readFile >= 0) { close(readFile); @@ -4273,13 +4280,24 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode, } /* - * We only end up here without a message when XLogPageRead() - * failed - in that case we already logged something. + * When we end up here while reading successive records, we reached + * the end of WAL unless XLogPageRead() has logged something. We + * may or may not have a message here depending on the situation. + * Otherwise we just report the received message if any and if + * needed. + * + * Note: errormsg is alreay translated. */ - if (errormsg) -ereport(emode_for_corrupt_record(emode, - RecPtr ? RecPtr : EndRecPtr), - (errmsg_internal("%s", errormsg) /* already translated */ )); + if (RecPtr == InvalidXLogRecPtr && !have_complained_at(EndRecPtr)) +ereport(emode, + (errmsg("reached end of WAL in %s at %X/%X on timeline %u", +xlogSourceNames[currentSource], +(uint32) (reportptr >> 32), (uint32) reportptr, +ThisTimeLineID), + (errormsg ? errdetail_internal("%s", errormsg) : 0))); + else if (errormsg) +ereport(emode_for_corrupt_record(emode, reportptr), + (errmsg_
Re: pg_basebackup
Hello. At Mon, 23 Dec 2019 03:38:12 +, Daulat Ram wrote in > thanks Adrian, what about the > postmaster.opts file, this file was also skipped in backup. The file is overwritten at startup so there's no point in having it in a backup. Thus, it is deliberately excluded from a backup. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: How can I pushdown of functions used in targetlist with FDW ?
At Thu, 09 Jan 2020 14:12:25 -0500, Tom Lane wrote in

> Stephen Frost writes:
> > I agree with all of that- but it seems like we've got people asking for
> > it, and I can understand why they are, and therefore I'd be inclined to
> > provide a way for users to get that behavior and I'm even inclined to
> > say that pushing down should be the 'default' if everything else is
> > equal and it's possible to do so.
>
> [ shrug... ] I think that's a completely arbitrary value judgment
> unsupported by any facts. Furthermore, it'd greatly expand our exposure
> to problems with "we thought this function behaves the same on the remote
> end, but it doesn't". That hazard makes me feel that the default should
> *never* be to encourage pushing down, unless there's a pretty clear
> potential win to justify the risk of breakage.
>
> If SQL/MED has a notion of a "foreign function" that is only executable on
> the remote side, and that fails if you try to execute it locally, then
> it'd make sense to implement that feature and then expect postgres_fdw to
> try hard to push down such calls. But we don't have that (yet). I'm
> unconvinced that ordinary functions ought to be treated as if remote
> execution is preferable.

Isn't ROUTINE MAPPING [1] that? Definers should define one at their own risk, as they do with table constraints.

[1] https://www.postgresql.org/message-id/CAD21AoCSzZO%2Bd3EL-9w%3D97aC28%3DRb3%2BBh77wRLcP7HCH8-Bn9A%40mail.gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: How can I pushdown of functions used in targetlist with FDW ?
At Thu, 09 Jan 2020 20:25:20 -0500, Tom Lane wrote in

> Kyotaro Horiguchi writes:
> > Isn't ROUTINE MAPPING [1] that? Definers should define one at their
> > own risk, as they do with table constraints.
>
> Hmm. It looks like that patch is moribund, and I can't say that I'm
> excited about reviving it. Aside from the syntactical reserved-word
> problems, it seems like a mighty heavyweight way of attacking the issue

I don't mean that the patch itself is usable right now; I am simply asking about the feature.

> --- that is, instead of a property directly attached to the function of
> interest, you've now got this entire other system catalog structure that
> has to be searched for relevant information. And that needs all the usual
> support for a new object type, eg ALTER infrastructure, pg_dump support,
> etc etc etc.

Yeah, that sounds really cumbersome to me.

> Plus, once you've created a mapping, it's only good for one server so you
> have to do it over again for each server. I guess in the general case
> you have to have that, but I'd sure want some kind of shortcut for the
> common case where e.g. ABS() means the same thing everywhere.

As for the user side, I understand that they want to push down certain functions despite the trouble, but it's better if it can be simpler.

About the idea of function attributes, I don't think pushdown-ability is a property of the function itself. postgres_fdw can push down many intrinsic functions, oracle_fdw can push down fewer, and file_fdw cannot push down any at all. So it seems to be rather an ability of the FDW drivers. Putting aside the maintenance trouble, a function attribute would work for intrinsic functions.

Even with that feature, we still cannot push down user-defined functions, or incompatible intrinsic functions that the user wants to push down knowing about the incompatibility. If we allow that, we need a means to tell foreign servers about such functions. Means other than ROUTINE MAPPING that I can come up with are server options and foreign table options.

> The SQL committee do love verbosity, don't they.

Sure..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Wall shiping replica failed to recover database with error: invalid contrecord length 1956 at FED/38FFE208
Thank you, and sorry for overlooking your comment. At Thu, 14 Nov 2019 12:28:13 -0500, Tom Lane wrote in > Kyotaro Horiguchi writes: > > At Wed, 2 Oct 2019 19:24:02 -0400, Stephen Frost wrote > > in > >> * Tom Lane (t...@sss.pgh.pa.us) wrote: > >>> Yeah, those messages are all pretty ancient, from when WAL was new and not > >>> to be trusted much. Perhaps the thing to do is move the existing info > >>> into DETAIL and make the primary message be something like "reached > >>> apparent end of WAL stream". > > >> Yes, +1 on that. > > > What do you think about this? > > It seems overcomplicated. Why do you need to reach into > emode_for_corrupt_record's private state? > emode_for_corrupt_record's private state? I think we could just > change the message text without that, and leave the emode > machinations as-is. I tried to avoid messages at the same LSN. But it is done by emode_for_corrupt_record by reducing error level. I reverted that part. > I don't understand the restriction to "if (RecPtr == InvalidXLogRecPtr)" > either? Maybe that's fine but the comment fails to explain it. "When we end up here while reading successive recored" meant that, but it is not explicit. If RecPtr was not invalid, it means that the caller is requesting for a specific record that should exist. It is not end-of-wal at all. I rewrote the comment. > Another point is that, as the comment for emode_for_corrupt_record > notes, we don't really want to consider that we've hit end-of-WAL > when reading any source except XLOG_FROM_PG_WAL. So I think the > change of message ought to include a test of the source, but this > doesn't. Maybe no. The function just mutes messages for repeated error on XLOG_FROM_WAL. We consider end-of-WAL while XLOG_FROM_ARCHIVE. In this version the "reached end of WAL" is not emitted when emode_for_corrupt_record decided not to show the message. > Also, the business with an "operation" string violates the message > translatability guideline about "don't assemble a message out of > phrases". If we want to have that extra detail, it's better just > to make three separate ereport() calls with separately translatable > messages. Though I thought that the operation is just a noun and it is safely processed separately, I followed your comment. And this version takes the more verbose one of the previous two. > Also, it seems wrong that the page TLI check, just below, is where > it is and isn't part of the main set of page header sanity checks. > That's sort of unrelated to this patch, except not really, because > shouldn't a failure of that test also be treated as an "end of WAL" > condition? It seems checking if xlogreader did something wrong, since it is pluggable. I'm not sure there is any concrete reason for that, though. As for recovery, it is actually redundant because XLogFileReadAnyTLI already checked that for the record, but it doesn't matter. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 94ee9c982afc5b39bb062c59a95ac2323a4109c8 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 17 Jan 2020 17:15:58 +0900 Subject: [PATCH] Change end-of-WAL message less scary At the end of WAL during recovery, users read just "invalid record" message in logs, which is too scary. This patch change the message for the case to "reached end of WAL" to let users know that they are not in a trouble. 
--- src/backend/access/transam/xlog.c | 48 +-- 1 file changed, 40 insertions(+), 8 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 7f4f784c0e..a7757ea4bf 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -4279,6 +4279,10 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode, EndRecPtr = xlogreader->EndRecPtr; if (record == NULL) { + int actual_emode = +emode_for_corrupt_record(emode, + RecPtr ? RecPtr : EndRecPtr); + if (readFile >= 0) { close(readFile); @@ -4286,15 +4290,43 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode, } /* - * We only end up here without a message when XLogPageRead() - * failed - in that case we already logged something. In - * StandbyMode that only happens if we have been triggered, so we - * shouldn't loop anymore in that case. + * Invalid RecPtr here means we are reading successive records + * during recovery. If we get here during recovery, we assume that + * we reached the end of WAL. Otherwise something's really wrong + * and we report just only the errormsg if any. If we
Re: How to handle CASE statement with PostgreSQL without need for typecasting
Hello. At Tue, 18 Feb 2020 12:43:21 -0800, Adrian Klaver wrote in

> test=> UPDATE t_update SET F1 = '02/23/2020'::unknown;
> UPDATE 1
>
> test=> UPDATE t_update SET F1 = (CASE WHEN ('02/18/2020' IS NULL )
> THEN ('02/18/2020' ) ELSE ('02/19/2020') END)::unknown;
> ERROR: failed to find conversion function from unknown to timestamp
> without time zone
>
> test=> UPDATE t_update SET F1 = (CASE WHEN ('02/18/2020' IS NULL )
> THEN ('02/18/2020' ) ELSE ('02/19/2020') END)::timestamp;
> UPDATE 1
>
> So there is some sort of different evaluation going on in the CASE
> statement.

The documentation says:

https://www.postgresql.org/docs/10/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS

> A cast applied to an unadorned string literal represents the initial
> assignment of a type to a literal constant value, and so it will
> succeed for any type (if the contents of the string literal are
> acceptable input syntax for the data type).
..
> However, automatic casting is only done for casts that are marked “OK
> to apply implicitly” in the system catalogs. Other casts must be
> invoked with explicit casting syntax. This restriction is intended to
> prevent surprising conversions from being applied silently.

Conversions from the type unknown are not registered in pg_cast. Also, CREATE CAST on pseudo-types like unknown is not allowed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
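As a hedged illustration of the workaround, reusing the t_update table from the quoted session: give the literals a concrete type up front so the CASE expression never has the type unknown:

UPDATE t_update
   SET F1 = CASE WHEN '02/18/2020'::timestamp IS NULL
                 THEN '02/18/2020'::timestamp
                 ELSE '02/19/2020'::timestamp
            END;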
Re: polygon && polygon and precision
Hello, Patrick. At Thu, 12 Mar 2020 10:16:15 -0700, Patrick Dowler wrote in

> I am using polygon datatype with some very small values and discovered that
> the overlaps operator appears to break down and give false positives for
> values smaller than ~1e-6. I have tested this on 9.5 (.3 and .16) and 10.10.

For usability, geometric arithmetic is performed with a tolerance of 1.0E-6. For example, equality of two floating point values is defined as:

#define EPSILON 1.0E-06
#define FPeq(A, B) (fabs((A) - (B)) <= EPSILON)

The point "same as" operator is defined using that comparison, so the following comparison gives true.

=# select '(1.5E-06, 0)'::point ~= '(1.0E-06, 0.5E-06)'::point;
 ?column?
----------
 t

All comparisons are performed with that degree of tolerance. By a rough calculation, that tolerance in longitude/latitude coordinates corresponds to about 40 meters on the ground.

> To reproduce (these examples only differ in the x-axis exponents) and
> should be false but the first one is t(rue)
>
> select '((3.0e-07,-2),(9.0e-07,-2),(9.0e-07,1),(3.0e-07,1))'::polygon
> &&
> '((2.0e-07,-0.1),(2.0e-07,0.1),(2.01e-07,0.1),(2.01e-07,-0.1))'::polygon;
>
> select '((3.0e-06,-2),(9.0e-06,-2),(9.0e-06,1),(3.0e-06,1))'::polygon
> &&
> '((2.0e-06,-0.1),(2.0e-06,0.1),(2.01e-06,0.1),(2.01e-06,-0.1))'::polygon;
>
> Maybe suggests some single-precision floating point use in the
> calculations...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: It is possible to force periodically switch xlog?
At Wed, 1 Apr 2020 21:09:20 -0400, David Steele wrote in

> On 4/1/20 7:19 PM, Edson Richter wrote:
> > IIRC archive_timeout does precisely that.
> > I would assume that "archive_mode" must be "on" or "always" - but I do
> > have "archive_mode=off".
>
> I'm not sure what the point of switching the WAL segment would be with
> archive_mode=off.
>
> What are you trying to accomplish?

I'm also a bit curious about that, but anyway archive_timeout works even with archive_mode=off. The parameter's name seems a bit off from what it actually does.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
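For illustration, the periodic switch can be configured, or a one-off switch forced, like this (the interval is an arbitrary example):

ALTER SYSTEM SET archive_timeout = '10min';
SELECT pg_reload_conf();

-- force a segment switch right now
SELECT pg_switch_wal();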
EINTR while resizing dsm segment.
I provided the subject, and added -hackers. > Hello, > I am running postgres 11.5 and we were having issues with shared segments. > So I increased the max_connection as suggested by you guys and reduced my > work_mem to 600M. > > Right now instead, it is the second time I see this error : > > ERROR: could not resize shared memory segment "/PostgreSQL.2137675995" to > 33624064 bytes: Interrupted system call The function posix_fallocate is protected against EINTR. | do | { | rc = posix_fallocate(fd, 0, size); | } while (rc == EINTR && !(ProcDiePending || QueryCancelPending)); But not for ftruncate and write. Don't we need to protect them from ENTRI as the attached? -- Kyotaro Horiguchi NTT Open Source Software Center >From 590b783f93995bfd1ec05dbcb2805a577372604d Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 2 Apr 2020 17:09:35 +0900 Subject: [PATCH] Protect dsm_impl from EINTR dsm_impl functions should not error-out by EINTR. --- src/backend/storage/ipc/dsm_impl.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c index 1972aecbed..f4e7350a5e 100644 --- a/src/backend/storage/ipc/dsm_impl.c +++ b/src/backend/storage/ipc/dsm_impl.c @@ -360,7 +360,11 @@ dsm_impl_posix_resize(int fd, off_t size) int rc; /* Truncate (or extend) the file to the requested size. */ - rc = ftruncate(fd, size); + do + { + rc = ftruncate(fd, size); + } while (rc < 0 && errno == EINTR && + !(ProcDiePending || QueryCancelPending)); /* * On Linux, a shm_open fd is backed by a tmpfs file. After resizing with @@ -874,11 +878,19 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size, while (success && remaining > 0) { Size goal = remaining; + Size rc; if (goal > ZBUFFER_SIZE) goal = ZBUFFER_SIZE; pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE); - if (write(fd, zbuffer, goal) == goal) + + do + { +rc = write(fd, zbuffer, goal); + } while (rc < 0 && errno == EINTR && + !(ProcDiePending || QueryCancelPending)); + + if (rc == goal) remaining -= goal; else success = false; -- 2.18.2
Re: pg_restore: could not close data file: Success
Hello. Added -hackers. At Wed, 15 Apr 2020 12:14:25 +0200, "Peter J. Holzer" wrote in > On 2020-04-15 12:01:46 +0200, Peter J. Holzer wrote: > > I'm trying to restore a backup on a different machine and it terminates > > with the not really helpful messages: > > > > pg_restore: [directory archiver] could not close data file: Success > > pg_restore: [parallel archiver] a worker process died unexpectedly > [...] > > My guess is that maybe one of the data files is damaged > > As is often the case the matter became obvious a few minutes after > writing the mail. > > There were indeed two file with length 0 in the dump. That happened > because the backup failed because it couldn't obtain a lock on a table. > > I nicer error message (something like "cannot decompress '13503.dat.gz': > Empty file") would have helped. Unfortunately, just emptying .dat.gz file doesn't worked for me. Anyway the message is emitted the following way. pg_backup_directoy.c: > if (cfclose(cfp) !=0) >fatal("could not close data file: %m"); %m doesn't work for some kinds of errors about compressed files but cfclose conseals the true cause. I'm surprised to find an old thread about the same issue. https://www.postgresql.org/message-id/20160307.174354.251049100.horiguchi.kyotaro%40lab.ntt.co.jp But I don't think it's not acceptable that use fake errno for gzclose, but cfclose properly passes-through the error code from gzclose, so it is enought that the caller should recognize the difference. Please find the attached. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c index 1417401086..3a8394d7f2 100644 --- a/src/bin/pg_dump/compress_io.c +++ b/src/bin/pg_dump/compress_io.c @@ -645,6 +645,13 @@ cfgets(cfp *fp, char *buf, int len) return fgets(buf, len, fp->uncompressedfp); } +/* + * cfclose close the stream + * + * Returns 0 if successfully closed the cfp. Most of errors are reported as -1 + * and errno is set. Otherwise the return value is the return value from + * gzclose and errno doesn't hold a meangful value. + */ int cfclose(cfp *fp) { @@ -665,6 +672,11 @@ cfclose(cfp *fp) #endif { result = fclose(fp->uncompressedfp); + + /* normalize error return, just in case EOF is not -1 */ + if (result != 0) + result = -1; + fp->uncompressedfp = NULL; } free_keep_errno(fp); diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c index c9cce5ed8a..ecc6aa5fbb 100644 --- a/src/bin/pg_dump/pg_backup_directory.c +++ b/src/bin/pg_dump/pg_backup_directory.c @@ -108,6 +108,7 @@ void InitArchiveFmt_Directory(ArchiveHandle *AH) { lclContext *ctx; + int ret; /* Assuming static functions, this can be copied for each format. */ AH->ArchiveEntryPtr = _ArchiveEntry; @@ -218,8 +219,14 @@ InitArchiveFmt_Directory(ArchiveHandle *AH) ReadToc(AH); /* Nothing else in the file, so close it again... 
*/ - if (cfclose(tocFH) != 0) - fatal("could not close TOC file: %m"); + ret = cfclose(tocFH); + if (ret < 0) + { + if (ret == -1) +fatal("could not close TOC file: %m"); + else +fatal("could not close TOC file: zlib error (%d)", ret); + } ctx->dataFH = NULL; } } @@ -378,6 +385,7 @@ _PrintFileData(ArchiveHandle *AH, char *filename) char *buf; size_t buflen; cfp *cfp; + int ret; if (!filename) return; @@ -396,8 +404,15 @@ _PrintFileData(ArchiveHandle *AH, char *filename) } free(buf); - if (cfclose(cfp) !=0) - fatal("could not close data file: %m"); + + ret = cfclose(cfp); + if (ret < 0) + { + if (ret == -1) + fatal("could not close data file: %m"); + else + fatal("could not close data file: zlib error (%d)", ret); + } } /* @@ -429,6 +444,7 @@ _LoadBlobs(ArchiveHandle *AH) lclContext *ctx = (lclContext *) AH->formatData; char fname[MAXPGPATH]; char line[MAXPGPATH]; + int ret; StartRestoreBlobs(AH); @@ -460,9 +476,16 @@ _LoadBlobs(ArchiveHandle *AH) fatal("error reading large object TOC file \"%s\"", fname); - if (cfclose(ctx->blobsTocFH) != 0) - fatal("could not close large object TOC file \"%s\": %m", - fname); + ret = cfclose(ctx->blobsTocFH); + if (ret < 0) + { + if (ret == -1) + fatal("could not close large object TOC file \"%s\": %m", + fname); + else + fatal("could not close large object TOC file \"%s\": zlib error (%d)", + fname, ret); + } ctx->blobsTocFH = NULL; @@ -555,6 +578,7 @@ _CloseArchive(ArchiveHandle *AH) { cfp *tocFH; char fname[MAXPGPATH]; + int ret; setFilePath(AH, fname, "toc.dat"); @@ -
Re: pg_restore: could not close data file: Success
At Thu, 16 Apr 2020 14:40:09 +0900, Michael Paquier wrote in

> On Thu, Apr 16, 2020 at 12:08:09PM +0900, Kyotaro Horiguchi wrote:
> > I'm surprised to find an old thread about the same issue.
> >
> > https://www.postgresql.org/message-id/20160307.174354.251049100.horiguchi.kyotaro%40lab.ntt.co.jp
> >
> > But I don't think it's not acceptable that use fake errno for gzclose,
> > but cfclose properly passes-through the error code from gzclose, so it
> > is enought that the caller should recognize the difference.
>
> A problem with this patch is that we may forget again to add this
> special error handling if more code paths use cfclose().

Definitely. The reason for the patch is that the error codes are different depending on the caller, and some callers don't even check the error (seemingly intentionally).

> As of HEAD, there are three code paths where cfclose() is called but
> it does not generate an error: two when ending a blob and one when
> ending a data file. Perhaps it would make sense to just move all this
> error within the routine itself? Note that it would also mean
> registering file names in lclContext or equivalent as that's an
> important piece of the error message.

Hmm. Sounds reasonable. I'm going to do that. Thanks!

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: walreceiver termination
Hello. At Mon, 4 May 2020 09:09:15 -0500, Justin King wrote in

> Would there be anyone that might be able to help troubleshoot this
> issue -- or at least give me something that would be helpful to look
> for?
>
> https://www.postgresql.org/message-id/flat/CAGH8ccdWLLGC7qag5pDUFbh96LbyzN_toORh2eY32-2P1%3Dtifg%40mail.gmail.com
> https://www.postgresql.org/message-id/flat/CANQ55Tsoa6%3Dvk2YkeVUN7qO-2YdqJf_AMVQxqsVTYJm0qqQQuw%40mail.gmail.com
> https://dba.stackexchange.com/questions/116569/postgresql-docker-incorrect-resource-manager-data-checksum-in-record-at-46f-6
>
> I'm not the first one to report something similar and all the
> complaints have a different filesystem in common -- particularly ZFS
> (or btrfs, in the bottom case). Is there anything more we can do here
> to help narrow down this issue? I'm happy to help, but I honestly
> wouldn't even know where to begin.

The sendto() call at the end of your strace output is the "close connection" request to the walsender and normally should be followed by close() and kill(). If that is really the last strace output, the sendto() is blocked because the send buffer is full.

My diagnosis of the situation is that your replication connection had trouble and the TCP session was broken in a way the walreceiver couldn't notice. As a result, feedback packets from the walreceiver were held up in the TCP send buffer, and finally the last sendto() blocked while sending the close-connection message. If this happens constantly, routers or firewalls between the primary and the standby may be discarding sessions inadvertently.

I'm not sure how ZFS can be involved in this trouble, though.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
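If an undetected dead connection is the cause, one thing worth trying (a sketch only, not verified for this case; on v12 and later, where primary_conninfo is a server parameter, otherwise it goes into recovery.conf) is enabling libpq keepalives on the replication connection and lowering wal_receiver_timeout on the standby, followed by a restart of the standby. The host and user below are placeholders:

ALTER SYSTEM SET primary_conninfo =
    'host=primary.example.com port=5432 user=repuser keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=5';
ALTER SYSTEM SET wal_receiver_timeout = '1min';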
Re: PostgreSQL-12 replication failover, pg_rewind fails
Hello. At Tue, 12 May 2020 06:32:30 +, Mariya Rampurawala wrote in

> I am working on providing HA for replication, using automation scripts.
> My set up consists on two nodes, Master and Slave. When master fails, The
> slave is promoted to master. But when I try to re-register the old master as
> slave, the pg_rewind command fails. Details below.
...
> 1. Rewind again:
> 2. -bash-4.2$ /usr/pgsql-12/bin/pg_rewind -D /pg_mnt/pg-12/data
> --source-server="host=10.209.57.17 port=5432 user=postgres dbname=postgres"
>
> pg_rewind: servers diverged at WAL location 6/B9D8 on timeline 53
>
> pg_rewind: error: could not open file
> "/pg_mnt/pg-12/data/pg_wal/0035000600B9": No such file or
> directory
>
> pg_rewind: fatal: could not find previous WAL record at 6/B9D8
>
>
> I have tried this multiple times but always face the same error. Can someone
> help me resolve this?

As the error message says, the required WAL file has been removed on the old master. This is normal behavior and is described in the documentation.

https://www.postgresql.org/docs/12/app-pgrewind.html

> but if the target cluster ran for a long time after the divergence,
> the old WAL files might no longer be present. In that case, they can
> be manually copied from the WAL archive to the pg_wal directory, or
> fetched on startup by configuring primary_conninfo or restore_command.

So you seem to need to restore the required WAL files from the archive or from the current master.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails
At Tue, 12 May 2020 17:29:50 +0200, Laurenz Albe wrote in

> On Tue, 2020-05-12 at 09:40 +, Mariya Rampurawala wrote:
> > > but if the target cluster ran for a long time after the divergence,
> > > the old WAL files might no longer be present. In that case, they can
> > > be manually copied from the WAL archive to the pg_wal directory, or
> > > fetched on startup by configuring primary_conninfo or restore_command.
> >
> > I hit this issue every time I follow the aforementioned steps, manually as
> > well as with scripts.
> > How long is "long time after divergence"? Is there a way I can make some
> > configuration changes so that I don’t hit this issue?
> > Is there anything I must change in my restore command?

As mentioned in the documentation, pg_rewind uses the WAL records starting from the last checkpoint just before the divergence point. The divergence point is shown as follows in your log messages.

> pg_rewind: servers diverged at WAL location 6/B9D8 on timeline 53

pg_rewind scans backward from that location to find a checkpoint record, which is the oldest WAL record pg_rewind needs. As you can see, it is not a matter of calculation; there is no way other than actually restoring the WAL segment files, reading them, and trying to find it.

> What you can do is to use a higher value for "wal_keep_segments".
> Then PostgreSQL will keep around that number of old WAL segments,
> which increases the chance for "pg_rewind" to succeed.

So this is one effective way to reduce the chance of losing the required WAL (segment) files.

On PG12, an easy way to restore all the required WAL files would be to restore each missing WAL file from the archive every time pg_rewind complains that it is missing. Or, you could use pg_waldump to find a checkpoint record.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
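As a concrete illustration of that knob on PostgreSQL 12 (the value is arbitrary; on version 13 and later the parameter is wal_keep_size instead):

-- on the current primary
ALTER SYSTEM SET wal_keep_segments = 1024;  -- 1024 x 16MB = 16GB of retained WAL
SELECT pg_reload_conf();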
Re: Hot and PITR backups in same server
At Mon, 18 May 2020 11:11:44 +0300, "Andrus" wrote in

> Hi!
>
> >This is confused or at least confusing.
> >- "max_wal_size" of 5TB is clearly insane.

As Laurenz said, even if you set it to 5TB, no WAL files older than the ones needed by the last checkpoint are kept around. If you don't need a hot standby, you can use pg_receivewal to save WAL files on the "standby" server. If you do need the hot standby, WAL files should be archived.

> >- I don't understand what you mean by "Start backup server for hot
> >- standby backups".
> >Do I get it right that you want to copy a streaming replication
> >standby server's data
> directory to perform PITR?
>
> I want to copy only pg_wal directory contents from this.

If so, what you need seems to be pg_receivewal, not a full-fledged server.

> After pg_basebackup has finished, copy of its data directory is saved
> for possible PITR.
> Its pg_wal contents will be replaced with current pg_wal directory
> contents.
>
> After that recovery time will set it postgres.conf and separate server
> in 5433 is used to preform PITR using this data directory.
>
> >That doesn't see like a good plan, because the standby
> >usually won't be much behind the primary server, and you can only
> >recover to a later
> >point in time.
>
> I will get data from copy taken when hot backup server was created and
> replace its pg_wal directory contents from pg_wal in current backup
> server pg_wal directory.
> Since max pg wal size is big it should contain all WAL segments from
> time where base backup was taken.

As mentioned above, that assumption is wrong. You need to archive WAL files out of the pg_wal directory, or use pg_receivewal.

> >If you care to elaborate, perhaps the question can be answered.
>
> Currently binary async hot standby backup server is working OK and
> replicates main sever database almost immediately.
>
> However if important data is deleted in same day, it cannot recovered
> since hot standby server database has replicated same data as in
> master server.
> Master server disk space is limited. It cannot keep large number of
> wal segments.
> Standby server has lot of free disk space.

recovery_min_apply_delay might be useful for you. With that setting, the standby follows the primary with a delay of the specified minutes or hours.

https://www.postgresql.org/docs/12/runtime-config-replication.html#GUC-RECOVERY-MIN-APPLY-DELAY

I haven't tried it myself, but the following steps would perhaps work.

0. Data is found to be broken on the primary.
1. Stop the standby immediately and take a cold backup including the pg_wal directory.
2. You may find the just-dropped data still present on the standby.
3. If you need to advance the standby, you can recover up to an arbitrary LSN or time using recovery_target_lsn/recovery_target_time. If you go too far, start again from the backup taken in step 1.

https://www.postgresql.org/docs/12/runtime-config-wal.html#RUNTIME-CONFIG-WAL-RECOVERY-TARGET

> I'm looking for a way to collect data for PITR recovery (WAL segments)
> in standby server.
>
> I have initial base backup of data directory created using
> pg_basebackup.
> All WAL segments from master server after pg_basebackup should saved
> in backup server and should be used for PITR recovery when needed.
>
> How to use hot standby server for this ?
>
> Or should separate cluster in 5433 port created and pg_rewind or
> something other used for PITR.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
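A minimal sketch of that delayed-standby setting on PostgreSQL 12 (the delay value is only an example):

-- on the standby
ALTER SYSTEM SET recovery_min_apply_delay = '12h';
SELECT pg_reload_conf();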
Re: Configuring more than one hot standby server
At Tue, 19 May 2020 22:39:20 +0530, Santhosh Kumar wrote in

> Hi Community,
> I read about "hot stand by" server and as far as the document goes, it
> does not explicitly explain about, whether is it possible to configure more
> than on database as a Hot stand by server. If we can configure more than
> one hot stand by server, which database will take the role of master, in
> case the original master instance crashes? Does a leader election happen?

You can configure two or more standbys for one primary, and configure some or all of them as synchronous standbys. However, PostgreSQL doesn't choose the next primary automatically.

A replication cluster involving three or more servers can be configured in a couple of ways. You may configure them in a cascaded replication chain, in a fanned-out configuration, or in a mixture of the two.

If you made a cascaded replication set, you don't have a choice other than the first standby in the chain.

before: [primary] - [standby 1] - [standby 2] ...
after : [old standby 1] - [standby 2] ...

If you made a fanned-out replication set, you can configure some or all of the standbys to be synchronous standbys, which are guaranteed to have the same data as the primary for all committed transactions. If you configured priority-mode synchronous standbys, you can choose the next primary among the first n active standbys. If you didn't configure synchronous standbys, you would need to check all the standbys and choose the most "advanced" one. That is, the next primary is the standby having the largest receive LSN. You need to reconstruct the replication set anyway.

before: [primary] -+- [standby 1] LSN = x - 8
                   +- [standby 2] LSN = x - 2  <= the next primary
                   +- [standby 3] LSN = x - 6
after : [old standby 2] -+- [standby 1]
                         +- [standby 3]

The same applies to a quorum-mode synchronous standby setting.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
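As an illustration of the fanned-out case (the standby names are made up): set synchronous_standby_names on the primary, and at failover time compare how far each standby has received in order to pick the most advanced one:

-- on the primary: priority-mode synchronous replication
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby1, standby2)';
SELECT pg_reload_conf();

-- at failover time, run on each standby and pick the largest value
SELECT pg_last_wal_receive_lsn();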
Re: How to recover from compressed wal archieve in windows
At Wed, 20 May 2020 19:25:50 -0700, Adrian Klaver wrote in > On 5/20/20 6:27 PM, Michael Paquier wrote: > > On Wed, May 20, 2020 at 11:36:09AM -0700, Adrian Klaver wrote: > >> The next problem is that I'm pretty sure a WAL file with *.gz > >> extension will > >> not be able to be processed directly by the server. So you are going > >> to have > >> to uncompress it at some point before it gets restored. > > The short answer to that question is no. The backend does not > > uncompress the segment file. What happens is that the restore command > > copies the file defined by %f to the location of %p where is gets > > renamed to RECOVERYXLOG, and we expect the restore command to drop a > > 16MB file in og_wal/. There is a check on the size, which would fail > > if the WAL segment is still compressed. This logic is in > > RestoreArchivedFile() in xlogarchive.c. > > I figured that would be the case. > > So how is this handled?: > > wal_compression (boolean) > > When this parameter is on, the PostgreSQL server compresses a full > page image written to WAL when full_page_writes is on or during a base > backup. A compressed page image will be decompressed during WAL > replay. The default value is off. Only superusers can change this > setting. Difference from decompression by restore_command? A WAL (segment) file is filled with multiple WAL records. The "full page image", which is described to be compressed by the parameter, is a part of WAL record. A WAL file is filled with maybe-compressed WAL records and has the same size in the case where wal_compression is on. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: How to recover from compressed wal archieve in windows
At Thu, 21 May 2020 06:55:41 +0200, Paul Förster wrote in > Hi Adrian, > > > On 21. May, 2020, at 01:03, Adrian Klaver wrote: > > > > Yeah, the issue would be getting a version of gunzip that works on Windows. > > In the past I have had luck with: > > > > https://www.7-zip.org/ > > > > Not sure how well it works with redirects/pipes. > > I'm no windoze guru, so I don't know about that, but you may have a look at: > > http://gnuwin32.sourceforge.net/packages/gzip.htm Or cygwin or mingw? http://www.mingw.org/ regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: canceling statement due to conflict with recovery after pg_basebackup
At Wed, 3 Jun 2020 10:07:14 +0300, "Andrus" wrote in

> Hi!

Hi.

> Async binary replication hot standby was started after pg_basebackup.
> Running query in slave throws error
>
> ERROR: canceling statement due to conflict with recovery
>
> Why ?

As written in the messages.

> 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba DETAIL: User query
> might have needed to see row versions that must be removed.

The master removed some rows by vacuum, and that removal must be replayed on your standby, but the standby cannot replay it because of a transaction that started before the rows were vacuumed. More technically, the transaction on the standby was using a snapshot with older transaction IDs than the vacuum cutoff transaction ID on the master, so the standby needed to cancel the statement or session in order to continue replication.

> Query should return table and other sizes in decreasing order.
> How to improve it so that this error does not occur.

hot_standby_feedback would work.

https://www.postgresql.org/docs/12/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK

> Log:
>
> 2020-06-03 09:40:52 EEST LOG: database system was interrupted; last
> known up at 2020-06-03 07:59:56 EEST
> 2020-06-03 09:41:10 EEST LOG: entering standby mode
> 2020-06-03 09:41:10 EEST LOG: redo starts at 2E2/28
> 2020-06-03 09:41:19 EEST LOG: consistent recovery state reached at
> 2E2/B5A56C8
> 2020-06-03 09:41:19 EEST LOG: database system is ready to accept read
> only connections
> 2020-06-03 09:41:19 EEST LOG: started streaming WAL from primary at
> 2E2/C00 on timeline 1
> 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba ERROR: canceling
> statement due to conflict with recovery
> 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba DETAIL: User query
> might have needed to see row versions that must be removed.
> 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba STATEMENT: select
> company_name(n.nspname)::char(20) as company,
> relname::char(25),
> pg_size_pretty(pg_total_relation_size(c.oid))::char(10) as totalsize,
> n.nspname::char(12),
> case
> when c.relkind='i' then 'index'
> when c.relkind='t' then 'toast'
> when c.relkind='r' then 'table'
> when c.relkind='v' then 'view'
> when c.relkind='c' then 'composite type'
> when c.relkind='S' then 'sequence'
> else c.relkind::text
> end ::char(14) as "type"
> from
> pg_class c
> left join pg_namespace n on n.oid = c.relnamespace
> left join pg_tablespace t on t.oid = c.reltablespace
> where
> (pg_total_relation_size(c.oid)>>21)>0 and c.relkind!='t'
> order by
> pg_total_relation_size(c.oid) desc

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
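A minimal sketch of enabling it (run on the standby):

ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();

Note that this can cause some extra bloat on the primary, because vacuum there will hold back cleanup for the standby's open transactions.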
Re: Conflict with recovery on PG version 11.6
At Thu, 18 Jun 2020 23:29:49 +0300, Toomas Kristin wrote in

> Hi,
>
> > There can be other reasons:
> >
> > - replicated ACCESS EXCLUSIVE locks that conflict with queries
> > - replicated ACCESS EXCLUSIVE locks that cause deadlocks
> > - buffer pins that are needed for replication but held by a query
> > - dropped tablespaces that hold temporary files on the standby
>
> Thank you for ideas what to verify.
>
> > I told you the remedies above, why don't you like them?
>
> Basically I want to achieve situation where replication is not suspended
> (lag is not more than 3 minutes) and statements on standby are not
> terminated. Based on collected information I don’t see any connection between
> vacuuming on master and termination of statements on standby. I can
> temporarily disable vacuuming in order to be 100% sure this is the case. And
> when I set max_standby_streaming_delay either -1 or as a very big number then
> it helps avoid query termination but doesn’t help me about suspended
> replication. All worked with same configuration on Postgres version 10.6, the
> issue started after version upgrade.
>
> This is the reason why I am very keen to find out real cause for the conflict.

FWIW, in case you haven't tried it yet: if you can find a DETAIL: line following the "ERROR: canceling ..." message in the server log, it would narrow down the possibilities.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
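If the DETAIL lines are hard to find, the per-database conflict counters on the standby can also narrow it down; for example:

SELECT datname, confl_tablespace, confl_lock, confl_snapshot,
       confl_bufferpin, confl_deadlock
  FROM pg_stat_database_conflicts
 WHERE datname = current_database();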
Re: A query in Streaming Replication
At Sat, 13 Jun 2020 23:12:25 +0530, Sreerama Manoj wrote in

> Hello,
>
> Forgot to add the version details. I'm using postgres 10.7 .
>
> On Sat, Jun 13, 2020, 20:26 Sreerama Manoj
> wrote:
>
> > Hello,
> > I use streaming replication in async mode. When master gets down,
> > slave will be promoted using a trigger file. During this process ".partial"
> > file will be created and a WAL file with same ID will be created in a
> > different time line in slave. When master comes back as slave, it will be
> > synced to the current master. This is the normal procedure as far as I
> > understood. But in some cases, ".partial" file is not getting created and

It is not guaranteed that a crashed master can be used as a new standby as-is, since there can be unsent WAL records on the old master after the LSN where the new master was promoted. If the files are assumed to be sound, pg_rewind will adjust the old master so that it can run as a new standby.

The .partial file is created when the timeline diverges in the middle of a WAL segment. It is useful when performing PITR to the end of the same timeline, rather than going into the next timeline. I don't have an idea at hand of how this can be relevant to the reusability of the old master.

> > peer DB which comes back as slave is unable to sync when this happens..
> > Please suggest if this happens in any scenario and how to overcome this.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Questions about Logical Replication - Issue ???
Hi, At Thu, 30 Jul 2020 14:54:08 +0100, FOUTE K. Jaurès wrote in > Hi everyone, > > Situation: > >- A Master Database on the HQ >- i make a dump of the master database to the Subdivision Server >- I create à Publication like: CREATE PUBLICATION >iNOV_MasterData_Table_Pub FOR TABLE M_Product; On the Master Database >- On the Subdivision Server, I create a Subscription like: CREATE >SUBSCRIPTION iNOV_MasterData_Table_XXX_Sub CONNECTION '' >PUBLICATION iNOV_MasterData_Table_Pub; >- On the log, I have this error: > - 2020-07-30 14:32:59.366 WAT [8022] ERROR: duplicate key value > violates unique constraint "m_product_pkey" > 2020-07-30 14:32:59.366 WAT [8022] DETAIL: Key > (m_product_id)=(1001426) already exists. > 2020-07-30 14:32:59.366 WAT [8022] CONTEXT: COPY m_product, line 1 > 2020-07-30 14:32:59.369 WAT [1536] LOG: background worker "logical > replication worker" (PID 8022) exited with exit code 1 > > What can I do to solve this? Is it normal ? It > BTW: When I create Subscription With the option (copy_data = false), I am > able to replicate the new record. As you know, the initial table copy happens by default at subscription creation (the COPY command in the above log lines was doing that). If you are sure that the publisher table is in sync with the subscriber one, you can safely use copy_data=false, and that is the proper operation here. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
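A sketch of the copy_data = false variant mentioned above, using the names from this thread (the connection string is a placeholder):
=# CREATE SUBSCRIPTION iNOV_MasterData_Table_XXX_Sub
     CONNECTION 'host=hq-server dbname=masterdb user=repuser'  -- placeholder conninfo
     PUBLICATION iNOV_MasterData_Table_Pub
     WITH (copy_data = false);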
Re: 12.3 replicas falling over during WAL redo
At Sat, 1 Aug 2020 09:58:05 -0700, Ben Chobot wrote in > > > Alvaro Herrera wrote on 8/1/20 9:35 AM: > > On 2020-Aug-01, Ben Chobot wrote: > > > >> We have a few hundred postgres servers in AWS EC2, all of which do > >> streaming > >> replication to at least two replicas. As we've transitioned our fleet > >> to > >> from 9.5 to 12.3, we've noticed an alarming increase in the frequency > >> of a > >> streaming replica dying during replay. Postgres will log something > >> like: > >> > >> |2020-07-31T16:55:22.602488+00:00 hostA postgres[31875]: [19137-1] > >> |db=,user= > >> LOG: restartpoint starting: time 2020-07-31T16:55:24.637150+00:00 > >> hostA > >> postgres[24076]: [15754-1] db=,user= FATAL: incorrect index offsets > >> supplied > >> 2020-07-31T16:55:24.637261+00:00 hostA postgres[24076]: [15754-2] > >> db=,user= > >> CONTEXT: WAL redo at BCC/CB7AF8B0 for Btree/VACUUM: lastBlockVacuumed > >> 1720 > >> 2020-07-31T16:55:24.642877+00:00 hostA postgres[24074]: [8-1] > >> db=,user= LOG: > >> startup process (PID 24076) exited with exit code 1| > > I've never seen this one. > > > > Can you find out what the index is being modified by those LSNs -- is > > it > > always the same index? Can you have a look at nearby WAL records that > > touch the same page of the same index in each case? > > > > One possibility is that the storage forgot a previous write. > > I'd be happy to, if you tell me how. :) > > We're using xfs for our postgres filesystem, on ubuntu bionic. Of > course it's always possible there's something wrong in the filesystem > or the EBS layer, but that is one thing we have not changed in the > migration from 9.5 to 12.3. All of the cited log lines seem suggesting relation with deleted btree page items. As a possibility I can guess, that can happen if the pages were flushed out during a vacuum after the last checkpoint and full-page-writes didn't restored the page to the state before the index-item deletion happened(that is, if full_page_writes were set to off.). (If it found to be the cause, I'm not sure why that didn't happen on 9.5.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: 12.3 replicas falling over during WAL redo
At Tue, 4 Aug 2020 09:53:36 -0400, Alvaro Herrera wrote in > On 2020-Aug-03, Alvaro Herrera wrote: > > > > lsn | checksum | flags | lower | upper | special | pagesize | > > > version | prune_xid > > > --+--+---+---+---+-+--+-+--- > > > A0A/99BA11F8 | -215 | 0 | 180 | 7240 | 8176 | 8192 > > > | 4 | 0 > > > > > > As I understand what we're looking at, this means the WAL stream was > > > assuming this page was last touched by A0A/AB2C43D0, but the page itself > > > thinks it was last touched by A0A/99BA11F8, which means at least one write > > > to the page is missing? > > > > Yeah, that's exactly what we're seeing. Somehow an older page version > > was resurrected. Of course, this should never happen. > > ... although, the block should have been in shared buffers, and it is > there that the previous WAL record would have updated -- not necessarily > flushed to disk. Yeah. On the other hand, the WAL records shown upthread desn't have a FPW. > rmgr: Btree len (rec/tot): 72/72, tx: 76393394, lsn: > A0A/AB2C43D0, prev A0A/AB2C4378, desc: INSERT_LEAF off 41, blkref #0: rel > 16605/16613/60529051 blk 6501 > rmgr: Btree len (rec/tot): 72/72, tx: 76396065, lsn: > A0A/AC4204A0, prev A0A/AC420450, desc: INSERT_LEAF off 48, blkref #0: rel > 16605/16613/60529051 blk 6501 There must be a record for the page 6501 conveying FPW after the last checkpoint. If it is not found, something wrong on deciding whether to attach FPW. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: How bad is using queries with thousands of values for operators IN or ANY?
At Mon, 31 Aug 2020 16:04:43 +0200, Pavel Stehule wrote in > po 31. 8. 2020 v 13:29 odesílatel Thomas Kellerer napsal: > > > Thorsten Schöning schrieb am 31.08.2020 um 12:37: > > > So for what query size or number of IDs to compare in IN would you > > > consider a different approach at all? > > > > > > In my experience "hundreds" of IDs tend to be quite slow if used with an > > IN clause. > > > > Rewriting the IN to a JOIN against a VALUES clause is very often faster: > > > > So instead of: > > > > select * > > from t > > where id in (1,2,3, ,500); > > > > using this: > > > > select * > > from t > > join ( > >values (1),(2),(3),...(500) > > ) as x(id) on x.id = t.id > > > > produces more often than not a more efficient execution plan (assuming no > > values are duplicated in the IN list) > > > > Obviously I don't know if such a re-write is even feasible though. > > > > yes - this query probably will have a slow start, but the execution will be > fast. Unfortunately, there are not available statistics. FWIW, the attached is the dusted-off version of a part of a stalled development of mine, which unconditionally(!) creates on-the-fly statistics on VALUES list. It seems to work for certain cases, although the planning time increases significantly. =$ CREATE TABLE t1 AS SELECT a, a * 2 AS b FROM generate_series(0, 9) a; =$ CREATE INDEX ON t1 (a); > perl q.pl(*) | psql *: q.pl: > print "explain analyze select b from t1 join (values "; > foreach $i (0..1) { > print ", " if ($i > 0); > printf("(%d)", $i/10 + 1000); > } > print ") as v(v) on (v.v = t1.a);"; patched: Merge Join (cost=824.25..1005.19 rows=10001 width=4) (actual time=13.513..24.285 rows=10001 loops=1) Merge Cond: (t1.a = "*VALUES*".column1) -> Index Scan using t1_a_idx on t1 (cost=0.29..3050.29 rows=10 width=8) (actual time=0.033..1.629 rows=2002 loops=1) -> Sort (cost=789.47..814.47 rows=10001 width=4) (actual time=12.557..14.546 rows=10001 loops=1) Sort Key: "*VALUES*".column1 Sort Method: quicksort Memory: 931kB -> Values Scan on "*VALUES*" (cost=0.00..125.01 rows=10001 width=4) (actual time=0.002..8.271 rows=10001 loops=1) Planning Time: 17.290 ms Execution Time: 26.344 ms (9 rows) master: Hash Join (cost=250.03..2168.03 rows=10001 width=4) (actual time=14.482..77.205 rows=10001 loops=1) Hash Cond: (t1.a = "*VALUES*".column1) -> Seq Scan on t1 (cost=0.00..1443.00 rows=10 width=8) (actual time=0.017..23.540 rows=10 loops=1) -> Hash (cost=125.01..125.01 rows=10001 width=4) (actual time=13.786..13.788 rows=10001 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 480kB -> Values Scan on "*VALUES*" (cost=0.00..125.01 rows=10001 width=4) (actual time=0.002..8.503 rows=10001 loops=1) Planning Time: 12.365 ms Execution Time: 78.567 ms (8 rows) regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From c6680488178bb7eeae66c92f46d475ac69102481 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 6 Jul 2017 14:46:49 +0900 Subject: [PATCH] Temporary statistics on VALUES list When using "IN (VALUES" instead of "IN (list.." to obtain better plans, sometimes the final estimate goes bad because of the lack of that for VALUES. This patch allowes VALUES to have statistics and improves the estimate in the case. 
--- src/backend/commands/analyze.c | 289 +-- src/backend/utils/adt/selfuncs.c | 59 ++- src/include/commands/vacuum.h| 2 + src/include/nodes/parsenodes.h | 2 + 4 files changed, 260 insertions(+), 92 deletions(-) diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 8af12b5c6b..aba06415c8 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -102,6 +102,9 @@ static int compare_rows(const void *a, const void *b); static int acquire_inherited_sample_rows(Relation onerel, int elevel, HeapTuple *rows, int targrows, double *totalrows, double *totaldeadrows); +static HeapTuple construct_stats_tuple(VacAttrStats *stats, + Oid attrrelid, bool inh, + Relation statsrel, HeapTuple oldtup); static void update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats); static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull); @@ -278,6 +281,86 @@ analyze_rel(Oid relid, RangeVar *relation, pgstat_progress_end_command(); } +/* + * analyze_values() -- analyze one attribute in a values list then put the + * result into va
Re: How bad is using queries with thousands of values for operators IN or ANY?
At Tue, 1 Sep 2020 18:46:01 +0200, Pavel Stehule wrote in > nice :) thanks ;p regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Date Format 9999-12-31-00.00.00.000000
At Thu, 15 Oct 2020 17:59:39 -0700, Adrian Klaver wrote in > On 10/15/20 1:58 PM, Dirk Krautschick wrote: > > Hi, > > because of a migration from DB2 we have a lot of timestamps like > > -12-31-00.00.00.00 > > I'm assuming these got stored in a varchar field? It seems like an (old-style?) ISO string representation of a date-time, and like DB2's default output format of a timestamp. The default parser of PostgreSQL doesn't handle that format. It doesn't identify '-' as a separator between date and time, nor a dot as a time field separator. > > What would be the best way to handle this in Postgres also related > > to overhead and performance (index usage?). > > Or is > > TO_TIMESTAMP('-12-31-00.00.00.00', '-MM-DD-HH24.MI.SS.US') > > the only way? And isn't it possible to define this like NLS parameters > > in Oracle > > system wide? I'm not sure what you're going to do exactly, but a timestamp is generally more efficient than a string in regard to both space and speed, and I think that the above conversion is the only way to do that conversion. PostgreSQL doesn't have a feature like the NLS parameters. If the database won't have to handle timestamps in that format in the future, converting that column into timestamptz (perhaps it is currently in text or varchar) works. ALTER TABLE ALTER COLUMN TYPE timestamp USING TO_TIMESTAMP(, 'format'); Otherwise that format needs to be translated on the fly. I'm not sure which is more efficient in that case. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
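A concrete sketch of that one-time conversion, with hypothetical table and column names and the format string taken from the sample value in the subject line:
=# ALTER TABLE legacy_data
     ALTER COLUMN created_at TYPE timestamptz
     USING to_timestamp(created_at, 'YYYY-MM-DD-HH24.MI.SS.US');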
Re: Problem with ssl and psql in Postgresql 13
Hi. At Thu, 17 Dec 2020 07:21:08 +, Gustavsson Mikael wrote in > But not from psql 13: > $ /usr/pgsql-13/bin/psql -d postgres -Ukalle -hserver -W > Password: > psql: error: FATAL: no pg_hba.conf entry for host "nnn.nn.n.nnn", user > "kalle", database "postgres", SSL off > FATAL: no pg_hba.conf entry for host "nnn.nn.n.nnn", user "kalle", database > "postgres", SSL off It looks like the option "ssl" (in postgresql.conf) is turned off on the v13 server. You will find lines like the following in the server log in that case. LOG: hostssl record cannot match because SSL is disabled HINT: Set ssl = on in postgresql.conf. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
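Assuming the server certificate and key are already configured, enabling it is a reload-only change on v13 (a sketch):
=# ALTER SYSTEM SET ssl = on;
=# SELECT pg_reload_conf();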
Re: Problem with ssl and psql in Postgresql 13
At Wed, 23 Dec 2020 17:34:05 -0500, Tom Lane wrote in > Oh ... wait a second. I've just managed to partially reproduce this > report. Assume that we have a working Kerberos environment on both If I kninit'ed and pg_hba.conf doesn't have a hostgssenc line and has a hostssl line, I got the following error from psql. $ psql -d postgres --set=sslmode=require -Uhoriguti -hmyhost.mydomain psql: error: FATAL: no pg_hba.conf entry for host "192.168.56.101", user "horiguti", database "postgres", SSL off FATAL: no pg_hba.conf entry for host "192.168.56.101", user "horiguti", database "postgres", SSL off The state machine enables gssenc if a valid credential is available regardless of whether a valid gsshostenc definition is available or not. I rembmer that I am annoyed a bit by this behavior, but I didn't care of that since I thought that SSL and GSS are not used togehter in an environemt. > ends. Then libpq will establish a connection with GSS encryption > and continue on with the usual PG authentication exchange. If that > part fails, then this logic (fe_connect.c around line 3350): > > /* > * If gssencmode is "prefer" and we're using GSSAPI, retry > * without it. > */ > if (conn->gssenc && conn->gssencmode[0] == 'p') > { > /* postmaster expects us to drop the connection */ > conn->try_gss = false; > pqDropConnection(conn, true); > conn->status = CONNECTION_NEEDED; > goto keep_going; > } > > drops the connection and tries again without GSS. But *we already > cleared allow_ssl_try*, in pqsecure_open_gss, so the next try will > not bother with SSL either. > So this directly explains the original report: if the only applicable > pg_hba.conf entry is "hostssl", right here is where libpq will see > that error, and it will go around and fail again because the next > try uses neither GSS nor SSL. > > However, in the tests Mikael ran after backing that pg_hba.conf entry > off to just "host", pg_hba wouldn't have caused an authentication-stage > failure, so it's not so clear why we'd have looped back at this step. In the hostssl case, there are two errors emitted on the server side: [24494] FATAL: no pg_hba.conf entry for host "192.168.56.101", user "horiguti", database "postgres", SSL off [24495] FATAL: no pg_hba.conf entry for host "192.168.56.101", user "horiguti", database "postgres", SSL off However, the first line is bogus. It should be like the following. FATAL: no pg_hba.conf entry for host "192.168.56.101", user "horiguti", database "postgres", GSSENC on FATAL: no pg_hba.conf entry for host "192.168.56.101", user "horiguti", database "postgres", SSL off The attached the first patch does that. The client picks up only the second error to show from psql. If I changed the hostssl to just host, host line matches GSSENC-connection and authentication succeeds without an error. > We'd need to explain the later cases to have a full theory. It's > plenty plausible that something else caused an auth-stage failure, > but if so, why would the second GSS-free try work? Does my expample above works? As the result, psql loses the cause of the first failure and shows the bogus message. On the other hand the server doesn't know the second try is the result of the failure of the first try. If we want to show meaningful messages to users, psql should show both the two messages, or need to modify the protocol to pass around the information on the failure of the previous try. > In any case, that allow_ssl_try reset in pqsecure_open_gss is definitely > even more broken than I thought before. 
I think we need to rejigger the > state machine so that it doesn't do that at all, but instead skips SSL > establishment when GSS is already active via an explicit test, rather > than munging the state variables. It could make things simpler, but I'm not sure though. Anyway the states are needed since PQconnectPoll(the state machine) needs to once exit for negotiation and each of the two paths needs a re-entry point. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c index 3d80930968..204630e383 100644 --- a/src/backend/libpq/auth.c +++ b/src/backend/libpq/auth.c @@ -511,6 +511,26 @@ ClientAuthentication(Port *port) } else { +#ifdef ENABLE_GS
Re: Problem with ssl and psql in Postgresql 13
At Thu, 24 Dec 2020 11:54:32 -0500, Tom Lane wrote in > I wrote: > > Kyotaro Horiguchi writes: > >> The attached the first patch does that. > > > +1, it seems like a clear oversight that the GSSENC patches didn't adjust > > these messages. The reason SSL state is mentioned is that it's relevant > > to which pg_hba entry gets chosen; and once we invented "hostgssenc" > > entries, GSSENC state is also relevant. > > Thinking a little more about that: there are not four distinct states, > because GSS and SSL can't both be enabled (ProcessStartupPacket enforces > that). So I propose that instead of adding a new field, we make the > existing field say one of three things: "GSS encryption", "SSL > encryption", or "no encryption". As per attached. In the back branches, > it might be best to spell these as "GSS encryption", "SSL on", and "SSL > off", just to minimize the cosmetic change. Looks good to me. I tried the same case where - did kinit - pg_hba has hostssl line only I saw the following lines in server log, which seems good. FATAL: no pg_hba.conf entry for host "192.168.56.101", user "horig...@mydomain.com", database "postgres", GSS encryption FATAL: no pg_hba.conf entry for host "192.168.56.101", user "horig...@mydomain.com", database "postgres", no encryption regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Right way to restore logical replication
At Mon, 08 Feb 2021 22:42:21 +0700, Игорь Выскорко wrote in > Hi, community! > Unfortunately can't find answer in docs and google. Hope only for you) > [local]:5433 postgres@postgres=# drop publication pub ; > DROP PUBLICATION > Time: 3,793 ms > > [local]:5433 postgres@postgres=# insert into tbl(d) values ('test2'); > INSERT 0 1 > Time: 9,002 ms > > [local]:5433 postgres@postgres=# create publication pub for table tbl; > CREATE PUBLICATION > Time: 6,646 ms > > result: nothing changed, same errors appears again and again. I couldn't find > how to restore replication without drop&create subscription again. If you recreated the publication before the insert, replication would continue. > Questions here: > 1. what is going under the hood here - why walsender thinks that "publication > "pub" does not exist" when it actually exists? The answer is "because the publication did not exist at the time of the INSERT". Thus the insert cannot be replicated using the new publication. It is because logical replication tries to find publications using the same snapshot with the WAL record to be sent. Although it is the designed behavior, I'm not sure that is true also for pg_publication. > 2. what is the right way to restore replication in my example? The most conservative way is just to drop the subscription then delete all rows from the subscriber table then recreate the subscription. This allows the newly created publication to work. Also you can drop the subscription, then manually fix the subscriber table to sync with the publisher table, then create a new subscription using WITH (copy_data = false); regards. -- Kyotaro Horiguchi NTT Open Source Software Center
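The conservative path above, spelled out as a sketch (the subscription name and connection string are placeholders; the publication and table names are from the thread):
=# DROP SUBSCRIPTION sub;
=# TRUNCATE tbl;
=# CREATE SUBSCRIPTION sub
     CONNECTION 'host=publisher dbname=postgres user=repuser'  -- placeholder conninfo
     PUBLICATION pub;  -- copy_data defaults to true, so the table is copied again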
Re: Postgres Wal Full
At Thu, 3 Mar 2022 07:46:45 +, pgdba pgdba wrote in > I have a problem from pg_wal. I am using postgresql version 11 and taking > backup and writing archive_command in postgresql.conf but did not archive wal > and my disk ise full from pg_wal. I research why is my wal is full and dont > found anything could you help me please ? The immediate cause of the WAL retention is the archive failure. PostgreSQL, while archiving is enabled, won't remove a WAL file until it is successfully archived. You will see lines like the following in the server log if an archive failure happened. > 2022-03-03 17:15:21.314 JST LOG: archive command failed with exit code 1 If the archive command emitted any error messages, they are captured in the server log as well. The following error is emitted by the archive_command "cp %p /archive/%f". > cp: cannot create regular file '/archive/00010001': No such > file or directory regards. -- Kyotaro Horiguchi NTT Open Source Software Center
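A quick way to confirm that the archiver is the culprit and to see which WAL file it is stuck on, without reading the log:
=# SELECT archived_count, last_archived_wal, failed_count, last_failed_wal, last_failed_time FROM pg_stat_archiver;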
Re: PSQL bug?
At Thu, 17 Mar 2022 11:38:30 +0100, Dominique Devienne wrote in > Made a typo, using } instead of ) to test a weird table name, and got > disconnected. Normal? --DD > > ddevienne=> create table "t |" ( id int}; > ddevienne(> ); > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Succeeded. The last line "Attempting reset: Succeeded." means that the connection is once disconnected before the second input line then reconnected by the input. Specifically I can reproduce the same result by the following steps. > ddevienne=> create table "t |" ( id int}; $ pg_ctl restart > ddevienne(> ); > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Succeeded. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: pg_create_logical_replication_slot in DB1 is blocked by a session in DB2
At Wed, 25 May 2022 11:01:43 -0400, Fred Habash wrote in > I'm running this command while connected to pg cluster DB1: > > SELECT * FROM pg_create_logical_replication_slot('test_slot_99', > 'test_decoding'); > > When I examine pg_locks, I see the session is waiting on virtualxid and > blocked and blocking sessions are on two different DBs. > > After doing some research, it looks like locks across DB can happen in > postgres if the session queries rely on 'shared tables'. Not sure if this > applies here. > > How can this be explained? The "blocked_pid" you showed is of PID=14305 but the reportedly "blocked" session is of PID=13405. So the 8602 doesn't seem to be involved the "trouble". You might need to reinspect the situation. > This is the session issuing the create slot command > > datid|datname |pid |leader_pid|usesysid|usename > |application_name|client_addr |client_hostname|client_port|backend_start > |xact_start |query_start|state_change > |wait_event_type|wait_event|state |backend_xid|backend_xmin|query > > > > |backend_type | > -|-|-|--||-|||---|---|---|---|---|---|---|--|--|---||---|--| > 16408|db1 |13405| | 16394|test99 | > |xx.xxx.xxx.x| | 53398|2022-05-25 09:12:41|2022-05-25 > 09:12:42|2022-05-25 09:12:42|2022-05-25 09:12:42|Lock > |virtualxid|active| |171577399 |BEGIN;declare > "SQL_CUR0x14680c0bace0" cursor with hold for SELECT lsn FROM > pg_create_logical_replication_slot('qitx6iolfhy5zfkl_00016408_66eb6ba3_1fe1_4ccd_95ed_fd3d2d5d4ad8', > 'test_decoding');fetch 1 in "SQL_CUR0x14680c0bace0"|client backend| > > > Session above is blocked by pid 8602 > > blocked_pid|blocked_user|blocking_pid|blocking_user |blocked_statement > > > > |current_statement_in_blocking_process > > > | > ---|||---|---|---| > 14305|pq_devops |8602|service_con|BEGIN;declare > "SQL_CUR0x1464680d6a60" cursor with hold for SELECT lsn FROM > pg_create_logical_replication_slot('4iipu5a2hnuyfp3u_00016408_036cac77_3854_4320_b329_e7209b4cccf9', > 'test_decoding');fetch 1 in "SQL_CUR0x1464680d6a60"|¶ SELECT ** > | > > > The blocked and blocking sessions are on two different DBs > > datid|datname |pid |leader_pid|usesysid|usename|application_name > |cl > -----|-----|-|--||---|--|-- > 16408|db1 |13405| | 16394|test99 | > |10 > 16407|db2 | 8602| |29429933|service_con_9 |PostgreSQL JDBC > Driver|10 > > Thank you regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: SQL state: 42601. Execption handling.
At Tue, 07 Jun 2022 10:36:52 -0400, Tom Lane wrote in > =?UTF-8?B?0KY=?= writes: > > -- 2 -- > > do $$ > > begin > >select p.result from dbo.func() p into v_result; > > exception when SQLSTATE '42601' then > > raise '42601' ; > >when others then > > raise 'others' ; > > end ; > > $$ > > > > --Exception is not handled. This is the question. > > ERROR: "v_result" is not a known variable > > LINE 3 :select p.result from dbo.func() p into v_result; > > This error is thrown by the plpgsql parser, so you can't trap it > with "exception", any more than you could trap other syntax errors > detected by the parser. (As an extreme example, you could certainly > not trap it if you misspelled "exception".) FWIW, you can see the difference as the following difference. =# CREATE OR REPLACE FUNCTION f1() RETURNS void AS $$ declare v_result int; begin select p.result from dbo.func(d) p into v_result; end ; $$ LANGUAGE plpgsql; CREATE FUNCTION (Succeeds. That is, plpgsql parser doesn't catch it.) =# select f1(); ERROR: column "d" does not exist (Caught by SQL parser executed at runtime) =# CREATE OR REPLACE FUNCTION f2() RETURNS void AS $$ begin select p.result from dbo.func() p into v_result; end ; $$ LANGUAGE plpgsql; ERROR: "v_result" is not a known variable LINE 3:select p.result from dbo.func() p into v_result; (Fails, as plpgsql parser caught it.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: FATAL: could not receive timeline history file from the primary server: ERROR: could not open file "pg_wal/0000000x.history": No such file or directory
At Wed, 8 Jun 2022 16:43:55 +, pgdba pgdba wrote in > hello everyone, > > I want to install a two-node streaming replication with postgresql version > 11, but I'm getting this error when starting the service of my slave node. > Can you support me? I couldn't find the solution. > > FATAL: could not receive timeline history file from the primary server: > ERROR: could not open file "pg_wal/0002.history": No such file or > directory It seems that the standby is on TLI=1 and the primary is on 2 or greater (call it n). In that case, at replication start, the standby requests from the primary every history file (000n.history) from TLI=2 to TLI=n that it doesn't already have. Thus, the message means 0002.history has somehow been removed from pg_wal on the primary, but I'm not sure how that happened. If you did remove some history files on the primary, do not remove the ones for TLIs between those of the primary and the standby. > slave node has recovery.conf file and include primary_con_info informations. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
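To confirm the timeline mismatch described above, the current timeline ID can be compared on the two nodes (a sketch):
=# SELECT timeline_id FROM pg_control_checkpoint();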
Re: message log merge (streaming replication)
At Thu, 9 Jun 2022 16:26:24 +0900, Ian Lawrence Barwick wrote in > 2022年6月9日(木) 14:32 Peter Adlersburg : > > > > Dear fellow DBA's, > > > > > > While troubleshooting one of our production replication clusters (phys. > > streaming replication using the patroni framework) > > I stumbled over a - at least for me - strange phenomenon in the postgres > > logs of the two cluster members: > > > > *** node-01 *** > > > > [postgres@db-node-01 main]$ grep 'LOG: database' postgresql-2022-06-05.log > > time=2022-06-05 18:25:26 CEST, pid=1720 LOG: database system is shut down > > time=2022-06-05 18:25:29 CEST, pid=3252374 LOG: database system was shut > > down at 2022-06-05 18:25:23 CEST > > time=2022-06-05 18:25:31 CEST, pid=3252371 LOG: database system is ready > > to accept read only connections > > time=2022-06-05 18:29:11 CEST, pid=3252371 LOG: database system is ready > > to accept connections > > time=2022-06-05 18:32:01 CEST, pid=1816 LOG: database system was > > interrupted while in recovery at log time 2022-06-05 18:29:11 CEST > > time=2022-06-05 18:32:03 CEST, pid=1813 LOG: database system is ready to > > accept read only connections > > time=2022-06-05 19:00:26 CEST, pid=1813 LOG: database system is ready to > > accept connections > > > > *** node-02 *** > > > > [postgres@db-node-02 main]$ grep 'LOG: database' postgresql-2022-06-05.log > > time=2022-06-05 18:25:26 CEST, pid=1720 LOG: database system is shut down > > time=2022-06-05 18:25:29 CEST, pid=3252374 LOG: database system was shut > > down at 2022-06-05 18:25:23 CEST > > time=2022-06-05 18:25:31 CEST, pid=3252371 LOG: database system is ready > > to accept read only connections > > time=2022-06-05 18:29:11 CEST, pid=3252371 LOG: database system is ready > > to accept connections > > time=2022-06-05 18:32:01 CEST, pid=1816 LOG: database system was > > interrupted while in recovery at log time 2022-06-05 18:29:11 CEST > > time=2022-06-05 18:32:03 CEST, pid=1813 LOG: database system is ready to > > accept read only connections > > time=2022-06-05 19:00:26 CEST, pid=1813 LOG: database system is ready to > > accept connections > > > > The output is by no means complete - I only kept the duplicate entries. > > > > My question: > > > > How is it possible that the error logs are 'merged' across the two database > > nodes? > > That's impossible to determine on the basis of the available information. > > How is logging set up? What kind of environments are the nodes running in? > Is it possible they are able to write to a shared disk of some sort? > > > Are the message/error-logs also replicated? > > No. > > > Is this the intended behaviour? > > It's not typical behaviour, but I suppose it's conceivable someone designed > such a setup for some reason. If the node-02 is a replica of the node-01 or vise-versa, and the log file is in $PGDATA, it's possible that pg_basebackup (or just cp) copies in the server log file to the replica. In that case, the first half of the log file of the day of taking the backup is taken over from the primary and the last half differs. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Any way to understand state after data corruption failures during startup.
At Thu, 16 Jun 2022 19:46:25 -0700, Harinath Kanchu wrote in > > Hello, > > If postgres fail when it is starting as standby due to WAL file corruption, > can we get the status through any exit code or by parsing pg_control file ? > > I can see the “XX001” code is emitted in logs but parsing logs doesn’t seem > to be a good option. Please suggest. I'm not sure exactly what you want to achieve here, but here is a piece of negative evidence regarding the reliability of detecting a WAL corruption failure without searching the server logs. [64923:startup] [XX000] 2022-06-17 13:30:52.493 JST PANIC: incorrect resource manager data checksum in record at 0/360 At least in this case, there's no trace of a server stop due to WAL corruption left behind anywhere other than the server log. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: help for pg_wal issue
At Mon, 27 Jun 2022 12:28:18 +0200, Laurenz Albe wrote in > You forgot to tell us how exactly you are performing that backup. Yeah, but Google told me that Veritas may complain with that message. Anywhat the tools is, it seems like that the tool could not continue after it found that a file had been deleted before copying it. That means the server is live. If you take a copy of files from a live server, the backup tool should allow some files are removed or added while copying. If you are taking a base backup (aka online backup), you need to take several subsidiary steps. https://www.postgresql.org/docs/devel/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP If you still see the point, feel free to ask further here but with showing exactly how you are peforming that backup:p regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: help for pg_wal issue
At Tue, 28 Jun 2022 16:28:31 +0900 (JST), Kyotaro Horiguchi wrote in > At Mon, 27 Jun 2022 12:28:18 +0200, Laurenz Albe > wrote in > > You forgot to tell us how exactly you are performing that backup. > > Yeah, but Google told me that Veritas may complain with that > message. Anywhat the tools is, it seems like that the tool could not > continue after it found that a file had been deleted before copying > it. That means the server is live. If you take a copy of files from a > live server, the backup tool should allow some files are removed or > added while copying. > > If you are taking a base backup (aka online backup), you need to take > several subsidiary steps. > > https://www.postgresql.org/docs/devel/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP > - If you still see the point, feel free to ask further here but with Mmm.. My fingers are fat and eyes are slipperly. + If you don't still see the point, feel free to ask further here but with > showing exactly how you are peforming that backup:p > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center -- Kyotaro Horiguchi NTT Open Source Software Center
Re: General Inquiry
At Tue, 5 Jul 2022 08:49:40 +, "Cloete, F. (Francois)" wrote in > Good morning, > Can you please confirm if I need to contact the postgresql community fro some > assistance can I use this e-mail address or another e-mail ? pgsql-general is the right place for that. Welcome. > We keep on seeing the below entries in our postgresql.log file getting > written what seems to be every millisecond. > UTF [32640]: [1-1] user=[unknown],db=[unknown],app=[unknown],client=[local] L Unfortunately the screen capture is missing the most significant part. But I guess that they are "LOG: connection received". If this is correct, log_connections is set to on on your server, which makes the server record every incoming connection like this. The frequent log lines mean that someone on the local host is frequently making (or trying to make) connections to the server. If they are unexpected incoming connections, one possibility is that something like a health check or keep-alive mechanism has a wrong interval setting. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Unable to start replica after failover
At Fri, 29 Jul 2022 15:01:44 +, "Lahnov, Igor" wrote in > * "could not find previous WAL record at E6F/C2436F50: invalid > resource manager ID 139 at E6F/C2436F50"; or > * "could not find previous WAL record at 54E/FB348118: unexpected > pageaddr 54E/7B34A000 in log segment 0005054E00FB, offset > 3448832". .. > To avoid the problem, we decided to stop using restore_command. Could you > please let us know if there is a better solution to the problem we've > described? Perhaps almost none of us can see what is happening there, since you didn't give us sufficient information on the configuration and the exact steps. But roughly it looks like shuffling/mixing of WAL files among several systems (or WAL archives) with different histories. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Unable to start replica after failover
> What additional information is needed? Usually server logs and the output of pg_rewind at the trouble time are needed as the first step. > Next, pg_rewind returns errors while reading the log from the backup > back, looking for the last checkpoint, which is quite reasonable > because, once a new leader starts, the point of divergence normally > ends up in the next timeline and the previous timeline's backup log > does not have a block with the LSN of the divergence. That would make pg_rewind sound broken. pg_rewind reads timeline history files from both servers to find the last timeline up to which the two servers share the same history, then determines the divergence point as the latest LSN the two servers are known to share. Then it overwrites the pages on the former leader that were modified in the *previous* timeline between the common checkpoint and the former leader's last (shutdown) checkpoint, using the data of the same pages *on the new leader*. No page data from the older timeline is needed. If nothing has gone wrong, pg_rewind is not expected to run into messages like: > could not find previous WAL record at E6F/C2436F50: invalid resource manager > ID 139 at E6F/C2436F50 > could not find previous WAL record at 54E/FB348118: unexpected pageaddr > 54E/7B34A000 in log segment 0005054E00FB, These mean the WAL files are somehow broken. > When pg_rewind is run, it also uses the log from the backup (the > lagging log from the new leader) instead of the partial log with > which the former leader has already been started. I don't see why the former leader wouldn't have access to the partial log (or the latest WAL file, I suppose). It is essential for pg_rewind to work (since it exists nowhere else), and it must be in the pg_wal directory unless someone removed it. Thus, I think we need the exact steps, in PostgreSQL terms, that you and your system took after the failover happened. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: recovery_command has precedence over phisical slots?
At Fri, 19 Aug 2022 18:37:53 +0200, Laurenz Albe wrote in > On Fri, 2022-08-19 at 16:54 +0200, Giovanni Biscontini wrote: > > Hello everyone, > > I'm experiencing a behaviour I don't really understand if is a > > misconfiguration or a wanted behaviour: > > 1) I set up a primary server (a.k.a. db1) with and archive_command to a > > storage > > 2) I set up a replica (a.k.a. db2) that created a slot named as slot_2 and > > that has the recovery_command set to read archived wal on the storage. > > If I shutdown replica db2 during a pgbench I see the safe_wal_size queried > > from pg_replication_slots on the primary decrease to a certain amount but > > still in the max_slot_wal_kepp_size window: even > > if I restart the replica db2 before the slot_state changes to unreserved or > > lost I see that the replica gets needed wals from the storage using > > recovery_command but doesn't use slot on primary. > > Only if I comment the recovery command on the .conf of the replica then it > > uses slot. > > If this is a wanted behaviour I can't understand the need of slots on > > primary. > > This is normal behavior and is no problem. > > After the standby has caught up using "restore_command", it will connection to > the primary as defined in "primary_conninfo" and stream WAL from there. The reason that db2 ran recovery beyond the slot LSN is the db2's restore_command (I guess) points to db1's archive. If db2 had its own archive directory or no archive (that is, restore_command is empty), archive recovery stops at (approximately) the slot LSN and replication will start from there (from the beginning of the segment, to be exact). regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Unable to archive logs in standby server
At Tue, 30 Aug 2022 05:22:56 +, Meera Nair wrote in > Hi Guillaume/team, > > I set archive_mode = always in master and standby. > Archival to standby WAL directory completed when > > * standby server was restarted > * pg_stop_backup was executed in master > > But archival hangs when pg_stop_backup is executed in standby. > Could someone help to get this working? https://www.postgresql.org/docs/12/continuous-archiving.html > In the same connection as before, issue the command: > > SELECT * FROM pg_stop_backup(false, true); > > This terminates backup mode. On a primary, it also performs an > automatic switch to the next WAL segment. On a standby, it is not > possible to automatically switch WAL segments, so you may wish to run > pg_switch_wal on the primary to perform a manual switch. The reason > for the switch is to arrange for the last WAL segment file written > during the backup interval to be ready to archive. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
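In other words, while pg_stop_backup() is waiting on the standby, a manual segment switch on the primary lets it finish (a sketch):
=# -- on the primary
=# SELECT pg_switch_wal();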
Re: Streaming wal from primiry terminating
At Wed, 28 Sep 2022 08:50:12 +, "Lahnov, Igor" wrote in > Hi, > After failover all stand by nodes could not start streaming wal recovery. > Streaming recovery start from 1473/A500, but standby start at > 1473/A5FFEE08, this seems to be the problem. It's not a problem at all. It is quite normal for a standby to start streaming from the beginning of a WAL segment. > What can we do in this case to restore? > Is it possible to shift wal streaming recovery point on primary? > Can checkpoint on primary help in this situation? > 2022-09-26 14:08:23.672 [3747868] LOG: started streaming WAL from > primary at 1473/A500 on timeline 18 > 2022-09-26 14:08:24.363 [3747796] LOG: invalid record length at > 1473/A5FFEE08: wanted 24, got 0 > 2022-09-26 14:08:24.366 [3747868] FATAL: terminating walreceiver > process due to administrator command This seems to mean someone emptied primary_conninfo. > 2022-09-26 14:08:24.366 [3747796] LOG: invalid record length at > 1473/A5FFEE08: wanted 24, got 0 > 2022-09-26 14:08:24.366 [3747796] LOG: invalid record length at > 1473/A5FFEE08: wanted 24, got 0 I don't fully understand the situation. One situation I can come up with that leads to this state is that the standby somehow restored an incomplete WAL segment from the primary. For example, a case where someone copied the current active WAL file from pg_wal to the archive on the primary, or a case where restore_command on the standby fetches WAL files from pg_wal on the primary instead of its archive. Neither is a normal operation. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: AW: Reducing bandwidth usage of database replication
At Fri, 4 Nov 2022 09:25:44 +, Sascha Zenglein wrote in > I have set the status interval to 600s and it still sends a status message > every 30s. > I also tried setting every available option higher, with no success: I guess you are not setting wal_sender_timeout on the primary side. Status messages are also sent in response to sender pings, which are controlled by that variable. The WAL sender sends a ping once half of that timeout has elapsed since the last status message, to request that the walreceiver send a response. > checkpoint_warning = 93s > max_standby_streaming_delay = 130s > max_standby_archive_delay = 45s > wal_receiver_status_interval = 600s > wal_receiver_timeout = 1200s > wal_receiver_timeout = 65s > recovery_min_apply_delay = 600s > > The random values were to see which setting is limiting if I got above the > 30s limit. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
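A sketch of adjusting it on the primary so that the half-timeout ping interval stays above the desired 600s status interval (the value is only an example; wal_sender_timeout takes effect on reload):
=# ALTER SYSTEM SET wal_sender_timeout = '1200s';
=# SELECT pg_reload_conf();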
Re: How to check stream replication latest history status
At Mon, 14 Nov 2022 00:09:36 +0800, 徐志宇徐 wrote in > If the primary and standby have long replay_lag. I don't want to > standby server promote. > > Is there any view or function which store the latest stream > replication gap(replay_lag) between primary and standby ? > Or it there any workaround to know this stream replication > gap(replay_lag) infomation ? Thanks for your help. replay_lag vanishes when the stream is gone. But commit timestamps might work for you. You can see when the last replayed transaction on the standby was committed on the primary. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
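One way to see that, sketched below: pg_last_xact_replay_timestamp() reports the primary-side commit time of the last transaction replayed on the standby and needs no extra settings; the track_commit_timestamp feature (and pg_last_committed_xact()) gives per-transaction detail but must be enabled on both nodes and requires a restart.
=# -- on the standby
=# SELECT pg_last_xact_replay_timestamp();
=# SELECT now() - pg_last_xact_replay_timestamp() AS approx_replay_delay;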
Re: Segmentation fault on RelationGetDescr in my first extension
At Thu, 29 Dec 2022 13:52:18 +0300, Дмитрий Цветков wrote in > I'm trying to write my first extension and open a table in it. > I use check_password_hook and my function executes at the moment of > changing user password. > > But if I try to open a table inside this function, I get Segmentation fault > on the line with "RelationGetDescr". > What am I doing wrong? table_open() should error out rather than return an invalid Relation*, and RelationGetDescr(rel) is just "rel->rd_att". So RelationGetDescr() cannot cause a segv there even if the table does not exist. # I modified contrib/passwordcheck that way and saw RelationGetDescr() # doesn't get a segv in the shown context. I guess the segv comes from another place, or something else has corrupted memory before that point. But anyway, more information is needed for people to diagnose your situation more closely. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: How could I elog the tupleTableSlot to the fronted terminal?
At Wed, 1 Feb 2023 00:12:44 +0800, "jack...@gmail.com" wrote in > I use the debugtup to print, and I find out there are "printf", it doesn't > print anything > to the terminal. I need to know how to use this debugtup func. I think I use > it as a mistake It's not clear to me which terminal you mean, but can you find the output in the server log instead? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Logical replication fails when adding multiple replicas
At Wed, 22 Mar 2023 09:25:37 +, Will Roper wrote in > Thanks for the response Hou, > > I've had a look and when the tablesync workers are spinning up there are > some errors of the form: > > "2023-03-17 18:37:06.900 UTC [4071] LOG: logical replication table > synchronization worker for subscription > ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onspd"" has > started" > "2023-03-17 18:37:06.976 UTC [4071] ERROR: could not create replication > slot ""pg_37986_sync_37922_7210774007126708177"": ERROR: replication slot > ""pg_37986_sync_37922_7210774007126708177"" already exists" The slot name format is "pg_<subscription oid>_sync_<relation oid>_<system identifier>". It's no surprise this happens if the subscribers come from the same backup. If that's true, the simplest workaround would be to recreate the subscription multiple times, using a different number of repetitions for each subscriber so that the subscribers have subscriptions with different OIDs. I believe it's not prohibited for subscribers to have the same system identifier, but the slot name generation logic for tablesync doesn't account for cases like this. We might need some server-wide value that's unique among subscribers and stable while table sync is running. I can't think of a better place than pg_subscription but I don't like it because it's not really necessary for most of the subscription's life. Do you think using the postmaster's startup time would work for this purpose? I'm assuming that the slot name doesn't need to persist across server restarts, but I'm not sure that's really true. diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index 07eea504ba..a5b4f7cf7c 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -1214,7 +1214,7 @@ ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, Size szslot) { snprintf(syncslotname, szslot, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, -relid, GetSystemIdentifier()); +relid, PgStartTime); } /* regards. -- Kyotaro Horiguchi NTT Open Source Software Center
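A quick check of whether two subscribers indeed share a system identifier (that is, were cloned from the same backup):
=# -- run on each subscriber and compare the results
=# SELECT system_identifier FROM pg_control_system();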
Re: terminating walsender process due to replication timeout
Hello. At Wed, 15 May 2019 10:04:12 +0300, ayaho...@ibagroup.eu wrote in > Hello, > Thank You for the response. > > Yes that's possible to monitor replication delay. But my questions were > not about monitoring network issues. > > I use exactly wal_sender_timeout=1s because it allows to detect > replication problems quickly. Though I don't have an exact idea of your configuration, it seems to me that your standby is simply falling more than one second behind the master. If you regard that as a replication problem, then the configuration can be said to be detecting the problem correctly. Since the keep-alive packet is sent in-band, it doesn't get to the standby before already-sent-but-not-processed packets. > So, I need clarification to the following questions: > Is it possible to use exactly this configuration and be sure that it will > be work properly. > What did I do wrong? Should I correct my configuration somehow? > Is this the same issue as mentioned here: > https://www.postgresql.org/message-id/e082a56a-fd95-a250-3bae-0fff93832... > > ? If it is so, why I do I face this problem again? It is not the same "problem". What was mentioned there is a fast network making the sender-side loop busy, which prevents the keepalive packet from being sent. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
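In case it helps, on version 10 or later the sender side exposes how far each standby is behind, which is handy for judging whether a 1s timeout is simply too tight (a sketch; run on the master):
=# SELECT application_name, state, sent_lsn, replay_lsn, replay_lag FROM pg_stat_replication;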
Re: terminating walsender process due to replication timeout
Hello. At Fri, 17 May 2019 11:04:58 +0300, ayaho...@ibagroup.eu wrote in > Can frequent database operations cause getting a standby server behind? Is > there a way to avoid this situation? > I checked that walsender works well in my test if I set > wal_sender_timeout at least to 5 second. It depends on the transaction (WAL) traffic and the bandwidth of your network. The performance difference between master and standby also matters. The possibilities I can guess are: - The bandwidth is too narrow for the traffic. - The standby performs more poorly than the master. - Your network is having some sort of trouble. A virtual network (a local network in a virtualized environment) tends to suffer troubles caused by CPU saturation or the like. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: PostgreSQL 13 - Logical Replication - ERROR: could not receive data from WAL stream: SSL SYSCALL error: EOF detected
At Thu, 18 May 2023 21:00:08 +0100, FOUTE K. Jaurès wrote in > Hello everyone, > > I have a replication between PostgreSQL 12 to 13 in the production system. > Using Ubuntu 18.04 LTS > We have this error today. > > > > *2023-05-18 18:34:04.374 WAT [117322] ERROR: could not receive data from > WAL stream: SSL SYSCALL error: EOF detected > 2023-05-18 18:34:04.381 WAT [118393] LOG: logical replication apply worker > for subscription "inov_transactionnal_table_bertoua_sub_00" has started > 2023-05-18 18:34:04.423 WAT [1039] LOG: background worker "logical > replication worker" (PID 117322) exited with exit code 1 > 2023-05-18 18:47:51.485 WAT [66836] postgres@inov_henrietfreres_v71_00 LOG: > could not receive data from client: Connection timed out.* > > *Any idea how to solve this?* According to the message, the SSL-encrypted replication connection got disconnected unexpectedly. I suppose it is due to the death of the upstream server or some reason outside of PostgreSQL. It seems the issue persisted for a few minutes after that. Other than the server's death, I would suspect network hardware problems or changes to the firewall or the OS networking setup. I think it would be a good idea to check for them first. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Problematic enforcement of "ERROR: functions in index predicate must be marked IMMUTABLE"
At Sun, 9 Jul 2023 14:22:37 +, Avi Weinberg wrote in > Hi, > > If you attempt to create an index based on function that is not IMMUTABLE you > will get an exception "ERROR: functions in index predicate must be marked > IMMUTABLE". However, if you created the index when the function was > IMMUTABLE, but later on you updated the function and mistakenly removed the > IMMUTABLE key, you will not get any error to alert you that there is an index > based on this function and it should remain IMMUTABLE. > > I suggest triggering error message also when updating a function that is used > by index if it is no longer IMMUTABLE There's no way to truly verify whether a function is really immutable. So, as mentioned in the documentation, the function volatility categories are essentially a promise to the optimizer regarding the function's behavior. Even so, on the premise that users keep the volatility marks in line with the actual behavior of the corresponding functions, it might be beneficial to prohibit changes to the volatility category while the function is used by an index, if only because that protects indexes from entering an inconsistent state, at least on the surface. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
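A minimal illustration of the gap being discussed, with hypothetical names; the CREATE OR REPLACE silently resets the function to the default VOLATILE even though an index predicate depends on it:
=# CREATE TABLE t (id int, status int);
=# CREATE FUNCTION f_active(status int) RETURNS boolean LANGUAGE sql IMMUTABLE AS $$ SELECT status = 1 $$;
=# CREATE INDEX t_active_idx ON t (id) WHERE f_active(status);  -- accepted
=# CREATE OR REPLACE FUNCTION f_active(status int) RETURNS boolean LANGUAGE sql AS $$ SELECT status = 1 $$;  -- no error, volatility silently becomes VOLATILE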
Re: Problematic enforcement of "ERROR: functions in index predicate must be marked IMMUTABLE"
At Tue, 11 Jul 2023 10:14:29 +0900 (JST), Kyotaro Horiguchi wrote in > At Sun, 9 Jul 2023 14:22:37 +, Avi Weinberg wrote in > > Hi, > > > > If you attempt to create an index based on function that is not IMMUTABLE > > you will get an exception "ERROR: functions in index predicate must be > > marked IMMUTABLE". However, if you created the index when the function was > > IMMUTABLE, but later on you updated the function and mistakenly removed the > > IMMUTABLE key, you will not get any error to alert you that there is an > > index based on this function and it should remain IMMUTABLE. > > > > I suggest triggering error message also when updating a function that is > > used by index if it is no longer IMMUTABLE > > There's no way to truly verify whether a function is really > immutable. So, as mentioned in the documentation, the function volatility > categories are essentially a promise to the optimizer regarding the > function's behavior. > > Even so, on the premise that users keep the volatility marks in line > with the actual behavior of the corresponding functions, it might be > beneficial to prohibit changes to the volatility category while the > function is used by an index, if only because that protects indexes > from entering an inconsistent state, at least on the surface. Even after doing that, any functions used indirectly are still allowed to be, or to become, non-immutable. Therefore, we might want to reject this proposal because we can't enforce it perfectly. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Problematic enforcement of "ERROR: functions in index predicate must be marked IMMUTABLE"
At Mon, 10 Jul 2023 21:28:12 -0400, Tom Lane wrote in > In the end, adding such restrictions would just give a false sense > of security, because there would always be gaps in whatever we did. > As you quote from the documentation, volatility markings are a promise > by the user to the system, not vice versa. If you break your promise, > you get to keep both pieces of whatever trouble ensues. I agree with you, as I mentioned in my a-bit-too-late message. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Bogus temp file reporting?
At Thu, 27 Jul 2023 10:30:41 +0300, kg.postgre...@olympiakos.com wrote in > Hello, > > Is there a known issue with temp file reporting? I have a > production db that is reporting 231TB of temp space usage, which > can’t be true. temp_blks_written in pg_stat_statements sort of > confirms this, as the reported queries show just a few MB of temp > space usage. I suspect a reboot will clear this (to do) and wanted > to check with you to see if someone else has had similar > experiences. Just to clarify, "temporary files" and "temp blocks" refer to different entities in this context. The "temporary files" are ephemeral storage that is created and kept only for the duration of tasks like tuple sorting, whereas "temp blocks" refer to the storage for temporary tables. If you're performing a massive insert into a temporary table, you'll find that pg_stat_database.temp_bytes doesn't increase at all. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
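The distinction is visible in the cumulative statistics; the counters below only cover spill files from sorts, hashes and the like, not temporary tables:
=# SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_bytes FROM pg_stat_database WHERE datname IS NOT NULL;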
Re: Understanding pg_stat_io.evictions
At Fri, 28 Jul 2023 10:36:46 +, "Daniel Westermann (DWE)" wrote in > postgres=# insert into t select i, md5(i::text), md5(i::text), md5(i::text) > from generate_series(1,100) i; .. > client backend | 207 | normal > client backend | 0 | vacuum > > Shouldn't these evictions show up under context blkwrite? The description in > docs is: No, that's not the case, because INSERT doesn't execute a bulk write. A bulk write is only performed during COPY FROM and on the write side of some other commands, such as CREATE TABLE ... AS (and some internal table-copy operations). > As far as I understand this, a ring buffer is used in this case. Do I miss > something? Maybe you're confusing it with bulk-read cases? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: How does pg parse 'select '(1,2)''
At Sat, 5 Aug 2023 21:46:33 +0800, jacktby jacktby wrote in > I’m trying to add new grammar for my application. So I’m do research on > gram.y. > I think it will make the '(1,2)' as " '(' a_expr ')' "; but I can’t find out > something like > " a_expr ',' a_expr “, can you give me the details? If you are trying literally "SELECT (1, 2)", I think that the parser comes to the terminating a_expr through the following path. simple_select: SELECT .. opt_target_list ... opt_target_list: target_list target_list: target_el | target_list ',' target_el target_el: a_expr a_expr: c_expr c_expr: implicit_row implicit_row: '(' expr_list ',' a_expr ')' expr_list: a_expr | expr_list ',' a_expr regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Could not read from file "pg_subtrans/00F5" at offset 122880: Success.
At Mon, 27 Nov 2023 19:58:13 +0100, Laurenz Albe wrote in > On Mon, 2023-11-27 at 11:50 +, Sri Mrudula Attili wrote: > > ERROR: could not access status of transaction 16087052 > > DETAIL: Could not read from file "pg_subtrans/00F5" at offset 122880: > > Success. By the way, just out of curiosity, but errno should not be zero at the time the message above was output, yet "%m" is showing "success", which implies errno = 0 in Linux. How can that happen? slru.c:961 > case SLRU_READ_FAILED: > if (errno) > ereport(ERROR, > (errcode_for_file_access(), > errmsg("could not access status of transaction %u", xid), > errdetail("Could not read from file \"%s\" at offset %d: %m.", regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Could not read from file "pg_subtrans/00F5" at offset 122880: Success.
At Wed, 29 Nov 2023 18:29:15 +0100, Alvaro Herrera wrote in > The code in master is completely different (it uses pg_pread rather than > seek + read): it does test for errno and reports accordingly. > > So, nothing to do here. Oops! Thank you and sorry for the noise. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: walreceiver fails on asynchronous replica [SEC=UNOFFICIAL]
At Fri, 23 Feb 2024 04:04:03 +, Mark Schloss wrote in > <2024-02-23 07:50:05.637 AEDT [1957121]: [1-1] user=,db= > LOG: started > streaming WAL from primary at 6/B000 on timeline 5 > <2024-02-23 07:50:05.696 AEDT [1957117]: [6-1] user=,db= > LOG: invalid > magic number in log segment 0005000600B0, offset 0 This appears to suggest that the WAL file the standby fetched was zero-filled on the primary side, which cannot happen under normal operation. A preallocated WAL segment can be zero-filled, but it cannot be replicated under normal operation. > <2024-02-22 14:20:23.383 AEDT [565231]: [6-1] user=,db= > FATAL: terminating > walreceiver process due to administrator command This may suggest a config reload with some parameter changes. One possible scenario matching the log lines is that someone switched primary_conninfo to a newly restored primary. If the new primary had older data than the previously connected primary, the segment 0..5..6..B0 on it could have been a preallocated one filled with zeros, and the standby could end up fetching that zero-filled WAL segment (file) and failing this way. If this is the case, such operations should be avoided. Unfortunately, due to the lack of a reproducer or details of the operations that took place there, the best I can do for now is to guess at a possible scenario as described above. I'm not sure how the situation actually arose. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: walreceiver fails on asynchronous replica [EXTERNAL] [SEC=UNOFFICIAL]
At Wed, 28 Feb 2024 03:23:37 +, Mark Schloss wrote in > Thank you for your reply. I can confirm there were no changes made to the > config of the replica. Okay. > Is there any significance in the parameters in the commit record - > 'inval msgs: catcache 21; sync'. I think it is not relevant. > - The walreceiver on the barman server did not fail but the WAL file does not > contain the commit transaction I don't have detailed knowledge of barman's behavior, but it seems to be quite normal for barman to write out only on receiving a commit record. What I don't understand here is how those WAL files on the barman server are related to the failed replicas. From the information you provided, I guess that the replicas somehow obtain the currently-written WAL file from the barman server at a certain time through a route unknown to me, but you seem to know it. I think that relationship has not been explained here. Could you explain the routes and timings by which WAL files are copied between the servers? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: walreceiver fails on asynchronous replica [EXTERNAL] [SEC=UNOFFICIAL]
Hi, Mark. At Thu, 29 Feb 2024 04:39:49 +, Mark Schloss wrote in > Thanks for looking at this. I think I complicated things by > including barman. I was just wanting to point out each primary > streams to two locations - the walreceiver on the replica and the > walreciver used by barman. We think the reason the barman > WAL-receiver didn't fail is because there is no apply of the WAL in > Barman but the Standby is applying and it's WAL-receiver got > terminated, so the barman server can be taken out of this picture > completely, they were just there as a by-product in trying to > determine the effect. We are only interested in the killing of the > standby wal-receiver and that the pg_waldump showed the failing lsn > was a commit. It seems that an issue raised in the -hackers thread [1] might be the same issue as yours. The discussion might be a help for you, although it's not clear what is happening yet. [1] https://www.postgresql.org/message-id/CAFh8B%3DmozC%2Be1wGJq0H%3D0O65goZju%2B6ab5AU7DEWCSUA2OtwDg%40mail.gmail.com regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: PITR for an only object in postgres
At Thu, 18 Mar 2021 17:12:49 -0400, aslı cokay wrote in > Hi all, > > Is there any way to get for example a table version of 5 minutes ago, or we > should PITR to 5 minutes ago, dump the table and restore with this dump ? If you want to revert the recent changes on the table, I think there's no way other than time-travelling using PITR. Of course you can't do that for only a part of the database. If you often want to access an image of the database as of 5 minutes ago, you could configure a delayed standby using recovery_min_apply_delay. I'm not sure whether PostgreSQL has the same feature for logical replication, but I don't think it does. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
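For a delayed standby, the setting could look like this (a sketch; on PostgreSQL 12 or later recovery_min_apply_delay is an ordinary reloadable setting, on older releases it goes into recovery.conf, and the 5min value is only an example):

-- on the delayed standby
ALTER SYSTEM SET recovery_min_apply_delay = '5min';
SELECT pg_reload_conf();

-- how far behind the standby currently is
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;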
Re: "invalid contrecord" error on replica
At Sun, 2 May 2021 22:43:44 +0200, Adrien Nayrat wrote in > I also dumped 000100AA00A1 on the secondary and it > contains all the records until AA/A1004018. > > It is really weird, I don't understand how the secondary can miss the > last 2 records of A0? It seems he did not received the > CHECKPOINT_SHUTDOWN record? > > Any idea? This seems like the same issue as [1]: in short, the secondary received an incomplete record, but the primary forgot about that record after a restart. Specifically, the primary was writing a WAL record that starts at A0FFFB70 and continues into the A1xx segment. The secondary successfully received the first half of the record, but the primary failed to write (then send) the last half of the record due to disk full. At this time it seems that the primary's last completed record ended at A0FFB70. Then the CHECKPOINT_SHUTDOWN record written while restarting overwrote the already half-sent record up to A0FFBE8. On the secondary side, there is only the first half of the record, which the primary had forgotten, and the last half starting at LSN A100 was still in the future in the primary's new history. After some time the primary reaches A100, but the first record in that segment of course disagrees with the secondary's history. 1: https://www.postgresql.org/message-id/CBDDFA01-6E40-46BB-9F98-9340F4379505%40amazon.com regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: some questions regarding replication issues and timeline/history files
At Tue, 15 Jun 2021 07:05:07 -0700 (MST), "email2ssk...@gmail.com" wrote in > Even I have this problem when I had to recover the database failed > switchover. > This is error is new primary server. > > < 2021-06-15 16:05:02.480 CEST > ERROR: requested starting point > AF/7D00 on timeline 1 is not in this server's history > < 2021-06-15 16:05:02.480 CEST > DETAIL: This server's history forked from > timeline 1 at AF/7C0F8D58. Your old primary looks like it continued running beyond 7D00 after the old standby promoted at 7C0F8D58. In short, the new standby has a history that diverged from the new primary. You can use pg_rewind to adjust the new standby server in that case. FYI, you can reproduce the error by the following steps. 1. Create a primary (A). 2. Create a standby (B) connecting to A. 3. Promote B. 4. Connect to A and run the following commands. =# select pg_switch_wal(); checkpoint; 5. Stop A, then add a primary_conninfo pointing to B to the conf file of A, then create the standby.signal file in the data directory of A. 6. You will get a similar error. To recover from the situation, run pg_rewind like the following, for example. $ pg_rewind --target-pgdata= --source-server='connstr to B' pg_rewind: servers diverged at WAL location 0/360 on timeline 1 pg_rewind: rewinding from last common checkpoint at 0/260 on timeline 1 regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: wal_keep_segments and max_wal_size
At Mon, 5 Jul 2021 16:45:09 +0530, Atul Kumar wrote in > hi, > > (on postgres 9.6) I was just practicing to make my no. of wal files in > pg_xlog directory to be the same/equal as I pass on wal_keep_segments > paramater. > > say for example > show wal_keep_segments; > wal_keep_segments > --- > 125 This setting ensures that this number of WAL files before the current WAL location (shown by pg_current_xlog_location() on 9.6) is preserved. This means the current WAL record is written to roughly the 126th live (not-recycled) file after a checkpoint, because the setting instructs checkpoints to leave that number of WAL files regardless of other settings. > so I want my wal files should not be grow in numbers more than 125 so > for that I have changed the parameter of max_wal_size to 2 GB to limit > the size of pg_xlog directory. > > show max_wal_size; > max_wal_size > -- > 2GB > (1 row) Note that max_wal_size is a soft limit, which can be exceeded in certain cases, for reasons including wal_keep_segments. https://www.postgresql.org/docs/9.6/runtime-config-wal.html#GUC-MAX-WAL-SIZE > but I did some testing by bulk inserts then the number of wal files > has grown more than 125. > > and the size of pg_xlog directory also reached to 2.7 GB. > > /data/apps/edb/as96/data/pg_xlog 04:05:08]$ ls | wc -l > 173 > > /data/apps/edb/as96/data 04:05:11]$ du -sh pg_xlog/ > 2.7G pg_xlog/ I'm not sure what value you set checkpoint_completion_target to, but supposing 0.5, the default, with that setting and max_wal_size = 2GB a checkpoint happens every 85 segments [*1]. Note that those 85 files come on top of the 125 files kept in pg_xlog by wal_keep_segments, so it is no wonder the WAL files can grow to more than 3GB. *1: max_wal_size / (checkpoint_completion_target + 1.0) = 1365MB = 85 files > I wonder why I faced such behavior bcz I limited the size of pg_xlog > directory by setting max_wal_size to 2GB that should be equivalent to > around 125 number of wal file in the pg_xlog directory. Does the above explanation make sense to you? > please suggest how should I make both identical (wal_keep_segments and > max_wal_size). wal_keep_segments should be the minimal amount required for pg_basebackup or replication, otherwise set it to zero. If wal_keep_segments is greater than max_wal_size - [*1], that is, 43 16MB files, max_wal_size would be exceeded before the XLOG-triggered checkpoint caused by max_wal_size starts. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
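To watch how the settings relate to the actual number of files, something like this could be run on the 9.6 server (a sketch; pg_ls_dir() requires superuser there, and the regexp counts only real segment files):

-- settings involved in WAL retention
SELECT name, setting, unit
  FROM pg_settings
 WHERE name IN ('wal_keep_segments', 'max_wal_size', 'min_wal_size',
                'checkpoint_completion_target');

-- number of 16MB segment files currently in pg_xlog
SELECT count(*)
  FROM pg_ls_dir('pg_xlog') AS f
 WHERE f ~ '^[0-9A-F]{24}$';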
Re: Removing a subscription that does not exist
At Fri, 9 Jul 2021 10:49:46 -0600, Jeff Ross wrote in > Hello, > > I'm working with an RDS instance running 12 that has an old > subscription that I can't seem to drop. ... > It shows up here: > > mirror_admin@metro_logical> select * from pg_subscription; >oid │ subdbid │ subname │ subowner │ ... > ┼─┼─┼─┼ ... > 83645 │66754 │ cargowel_common │16394 │ ... > 83646 │66754 │ metro_prod │16394 │ ... > 51490 │14313 │ metro│16394 │ ... > (3 rows) ... > But not in here: > > mirror_admin@metro_logical> \dRs+ > List of subscriptions >Name │ Owner│ Enabled │ Publication > ─┼───┼─┼─── > cargowel_common │ mirror_admin │ t │ {cargowel_common_prod} > metro_prod │ mirror_admin │ t │ {metro_prod} > (2 rows) > > And it can't be disabled or dropped: Look at the subdbid field in the first query result. You are logged into the database with OID=66754, while the subscription "metro" belongs to the database with OID=14313. The second command doesn't show metro because it does not belong to the current database. > mirror_admin@metro_logical> alter subscription metro disable; > ERROR: subscription "metro" does not exist > Time: 24.263 ms > mirror_admin@metro_logical> drop subscription metro; > ERROR: subscription "metro" does not exist > Time: 23.648 ms > > I did try deleting it directly from the pg_subscription table but that > failed with a permission denied error. My suspicion is that's because > of the RDS environment. > > What else can I try to remove this old non-functional subscription? Thus you need to log in to the database with OID=14313 to manipulate the subscription metro. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
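Concretely, the steps could look like this (a sketch; the OID comes from the pg_subscription output above, and RDS permissions may still get in the way):

-- find out which database the orphaned subscription belongs to
SELECT datname FROM pg_database WHERE oid = 14313;

-- then, connected to that database (\c in psql), drop it:
ALTER SUBSCRIPTION metro DISABLE;
ALTER SUBSCRIPTION metro SET (slot_name = NONE);  -- only if the remote slot is already gone
DROP SUBSCRIPTION metro;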
Re: oldest WAL files not removed
At Wed, 1 Sep 2021 12:48:51 +0200, wrote in > Hi, > > > > Looking at WAL folder after a crash, I noticed that new files after > restarting overwrite the more recent files before the crash and not the > oldest, which was what I expected. > > Is that normal ? I got only one file marked .deleted. Does that happens > when a WAL file hase been completed updated in the database and if then > while all oldest files aren't marked .deleted after restarting ? > > > Example : > > Crash occurs Aug 31 22:03 which is the more recent Wal file, the oldest file > is Aug 30 17:20 (and 105 files between those two) > > After restarting Aug 30 17:20 is still there, Aug 31 22:03 disappeared, one > new file is Sep 1 12:15 marked .deleted (restarting date) and one new Sep 1 > 12:36 which I guess is normal. Right now, I see an new wal file and the > previous one marked .deleted which is ok. > > Why are the oldest wal files still there ?? Can I remove them ? > > Hope I'm clear enough and thanks for explanations, It would be very helpful if you gave us the names of the files. Due to WAL file recycling, timestamps are frequently shuffled against names. In any case, no WAL files ought to be manually removed. If you don't need the recycled-for-future files that much, consider reducing min_wal_size. If you look at the files only in timestamp order, the "oldest" file is, with high odds, a recycled file to be used in the future, and the "newest" file is the one currently being written. If so, the reason that the oldest-in-timestamp file is still there is that it is still waiting to be used. Even if you removed the to-be-used-in-future files, such files would grow back to the same extent according to the setting of min_wal_size. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
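If shrinking the set of preallocated files is the goal, the setting can be checked and lowered with plain SQL (a sketch; the 200MB value is only an example, and the change takes effect gradually as files are recycled):

SHOW min_wal_size;
ALTER SYSTEM SET min_wal_size = '200MB';
SELECT pg_reload_conf();  -- min_wal_size only needs a reload, not a restart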
Re: oldest WAL files not removed
At Thu, 2 Sep 2021 08:21:37 +0200, wrote in > Hy Kyotaro, > > Thanks for this explanation. I joined the files to be complete. Thanks, it is informative. It looks like this if sorted in file-name order. 01/09/2021 12:15 16 777 216 0001008E0061.deleted 01/09/2021 16:35 16 777 216 0001008E0086.deleted 01/09/2021 16:35 16 777 216 0001008E0087 30/08/2021 20:40 16 777 216 0001008E0088 ... 31/08/2021 17:58 16 777 216 0001008E00CF > If you look at the files only in timestamp order, the "oldest" file is, with > high odds, a recycled file to be used in the future, and the "newest" file > is the one currently being written. If so, the reason that the > oldest-in-timestamp file is still there is that it is still waiting to be used. > Even if you removed the to-be-used-in-future files, such files would > grow back to the same extent according to the setting of min_wal_size. The reason that 0001008E0088 is still there is that it is still waiting to be used. The files 0001008E0061/86.deleted have already been removed from postgres' point of view but remain because someone else is still using them. If they persist too long, they could be removed manually if possible. The files 88 to CF look like preallocated or recycled files. Since there are about 72 such files, it seems like min_wal_size is set to 1GB or so. If you don't need that many files preallocated in your pg_wal directory, consider reducing min_wal_size. But note that the preallocated files won't be gone right away just by doing that. If you really want to delete them right away, the preallocated files are theoretically removable. You can see the name of the file currently being written with the following query, then move the files whose names are larger than the current file (in file-name order) somewhere else, and remove them only after making sure the system can restart. =# select pg_walfile_name(pg_current_wal_lsn()); If the system is active, the current file may advance, so be careful not to remove files that have substantial contents. This is why I said "In any case, no WAL files ought to be manually removed." regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Postgresql 11: terminating walsender process due to replication timeout
At Thu, 9 Sep 2021 14:52:25 +0900, Abhishek Bhola wrote in > I have found some questions about the same error, but didn't find any of > them answering my problem. > > The setup is that I have two Postgres11 clusters (A and B) and they are > making use of publication and subscription features to copy data from A to > B. > > A (source DB- publication) --> B (target DB - subscription) > > This works fine, but often (not always) when the data volume being inserted > on a table in node A increases, it gives the following error. > > "terminating walsender process due to replication timeout" > > The data volume at the moment being entered is about 30K rows per second > continuously for hours through COPY command. > > Earlier the wal_sender_timeout was set to 5 sec and I would see this error > much often. I then increased it to 1 min and the frequency of this error > reduced. But I don't want to keep increasing it without understanding what > is causing it. I looked at the code of walsender.c and know the exact lines > where it's coming from. > > But I am still not clear which parameter is making the sender assume that > the receiver node is inactive and therefore it should stop the wal_sender. > > Can anyone please suggest what changes I should make to remove this error? What minor version is the Postgres server mentioned here? PostgreSQL 11 got the following fix in 11.6, which could be related to the trouble. https://www.postgresql.org/docs/11/release-11-6.html > Fix timeout handling in logical replication walreceiver processes > (Julien Rouhaud) > > Erroneous logic prevented wal_receiver_timeout from working in > logical replication deployments. The details of the fix are here. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3f60f690fac1bf375b92cf2f8682e8fe8f69098 > Fix timeout handling in logical replication worker > > The timestamp tracking the last moment a message is received in a > logical replication worker was initialized in each loop checking if a > message was received or not, causing wal_receiver_timeout to be ignored > in basically any logical replication deployments. This also broke the > ping sent to the server when reaching half of wal_receiver_timeout. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
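Both the exact minor version and the current timeouts can be confirmed with plain SQL on each node (a sketch):

SELECT version();           -- full version string, e.g. 11.x
SHOW wal_sender_timeout;    -- on the publisher
SHOW wal_receiver_timeout;  -- on the subscriber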
Re: Postgresql 11: terminating walsender process due to replication timeout
At Thu, 9 Sep 2021 16:06:25 +0900, Abhishek Bhola wrote in > sourcedb:~$ postgres --version > postgres (PostgreSQL) 11.6 > > Sorry for missing this information. > But looks like this fix is already included in the version I am running. Ok. I'm not sure, but there may be a case where a too-busy (or too weak relative to the publisher) subscriber cannot send a response for a long time. Usually keep-alive packets sent from the publisher cause the subscriber to respond even while it is busy, but it seems that if the subscriber applies changes more than two times slower than the publisher sends them, the subscriber doesn't send a response within the timeout window. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Postgresql 11: terminating walsender process due to replication timeout
At Fri, 10 Sep 2021 16:55:48 +0900, Abhishek Bhola wrote in > So is there any solution to this issue? > I did try to increase the wal_sender_timeout and it broke the pub/sub. > I increased the wal_receiver_timeout and it wouldn't attempt to restart the > subscription until that time elapsed. > Due to that, the WAL segments got removed by the time it came up again and > it stopped working. > So given that the publisher is publishing at a higher rate than the > subscriber is subscribing, what can be done? Assuming my guess is right: to be able to send a response, the subscriber needs to see a keepalive packet from the publisher (sent at intervals of wal_sender_timeout/2) within every interval of wal_sender_timeout. Otherwise it needs to "rest" by finding a gap in the data stream from the publisher at intervals shorter than wal_sender_timeout. The reason the subscriber is kept busy is that it receives the next data before it finishes applying the previous data. So the possible workarounds I came up with for now are: - Increase the processing power of the subscriber, so as to increase the possibility that it can finish applying changes before the next data block comes from the publisher, or so that the subscriber can keep catching up with the publisher. This is the most appropriate solution, I think. - Throttle network bandwidth to obtain the same effect as the first item above. (This may have the side effect that the bandwidth eventually becomes insufficient.) - Break large transactions on the publisher into smaller pieces. The publisher sends the data of a transaction all at once at transaction commit, so this could average out the data transfer rate. - If you are setting *sufficient* logical_decoding_work_mem for such problematic large transactions, *decreasing* it might mitigate the issue. A lower logical_decoding_work_mem causes transaction data to spill out to disk, and the spilled data could be sent at a slower rate than on-memory data. Of course this is in exchange for total performance. - Streaming mode of logical replication introduced in PostgreSQL 14 might be able to mitigate the problem. It starts sending transaction data before the transaction completes. I'm not sure this will be "fixed" for 13 or earlier, because a straightforward resolution surely decreases the maximum processing rate at the subscriber. > On Fri, Sep 10, 2021 at 9:26 AM Kyotaro Horiguchi > wrote: > > > At Thu, 9 Sep 2021 16:06:25 +0900, Abhishek Bhola < > > abhishek.bh...@japannext.co.jp> wrote in > > > sourcedb:~$ postgres --version > > > postgres (PostgreSQL) 11.6 > > > > > > Sorry for missing this information. > > > But looks like this fix is already included in the version I am running. > > > > Ok. I'm not sure, but there may be a case where a too-busy (or too weak > > relative to the publisher) subscriber cannot send a response for a > > long time. Usually keep-alive packets sent from the publisher cause > > the subscriber to respond even while it is busy, but it seems that if > > the subscriber applies changes more than two times slower than the > > publisher sends them, the subscriber doesn't send a response within > > the timeout window. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
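If upgrading is an option, the last two points could be put into practice like this (a sketch; logical_decoding_work_mem exists from PostgreSQL 13, streaming subscriptions from 14, 'mysub' is a placeholder name and 16MB is only an example value):

-- on the publisher: make large transactions spill to disk earlier than the 64MB default
ALTER SYSTEM SET logical_decoding_work_mem = '16MB';
SELECT pg_reload_conf();

-- on the subscriber (PostgreSQL 14 or later): apply large transactions as they stream in
ALTER SUBSCRIPTION mysub SET (streaming = on);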
Re: WAL File Recovery on Standby Server Stops Before End of WAL Files
At Wed, 27 Oct 2021 16:42:52 +, "Ryan, Les" wrote in > 2021-10-27 10:26:31.467 MDT [2012] LOG: redo starts at 419/5229A858 ... > 2021-10-27 10:26:36.188 MDT [2012] LOG: restored log file > "00010419005A" from archive > 2021-10-27 10:26:36.750 MDT [2012] LOG: consistent recovery state reached at > 419/5AB8 > 2021-10-27 10:26:36.752 MDT [6204] LOG: database system is ready to accept > read only connections > 2021-10-27 10:26:36.823 MDT [6040] LOG: started streaming WAL from primary > at 419/5A00 on timeline 1 > > * There are many more WAL files available starting with > 00010419005B but the restore process always stops at > 00010419005A. > > I have two questions: > > * Why does the WAL file recovery process now stop after it reads > 00010419005A? > * What do I need to do to get PostgreSQL to recover the rest of the > available WAL files. The info above alone doesn't clearly suggest anything about the reason. Could you show the result of "pg_waldump 00010419005A 2>&1 | tail -5"? What I'm expecting to see is an error message from pg_waldump before the end of the file. It would be the immediate cause of the problem. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
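In parallel, the standby's own view of recovery can be checked from SQL, which shows where replay actually stopped (a sketch for PostgreSQL 10 or later; the functions have pg_last_xlog_* names on 9.x):

SELECT pg_is_in_recovery(),
       pg_last_wal_receive_lsn()       AS received_up_to,
       pg_last_wal_replay_lsn()        AS replayed_up_to,
       pg_last_xact_replay_timestamp() AS last_replayed_commit;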
Re: WAL File Recovery on Standby Server Stops Before End of WAL Files
At Thu, 28 Oct 2021 13:52:36 +, "Ryan, Les" wrote in > Hi Kyotaro and Dilip, > > Thank you for getting back to me. > > Kyotaro: I ran pg_dump and the output was "pg_waldump: fatal: could not read > file "00010419005A": read 50 of 8192". I'm guessing that it > means that wal file 00010419005A is corrupted and that is why the > recovery process stops there. Is there any way to fix the file? No way, unless the segment is still living in the primary's pg_wal. Your archive storage is unstable or archive_command is not performing its task reliably enough. > Dilip: setting the log level to debug2 did not provide any additional > information. Here are the log entries: > > 2021-10-28 06:51:06.166 MDT [7556] LOG: restored log file > "000104190059" from archive > 2021-10-28 06:51:06.464 MDT [7556] DEBUG: got WAL segment from archive > 2021-10-28 06:51:06.579 MDT [7556] LOG: restored log file > "00010419005A" from archive > 2021-10-28 06:51:06.854 MDT [7556] DEBUG: got WAL segment from archive > 2021-10-28 06:51:07.107 MDT [7556] LOG: consistent recovery state reached at > 419/5AB8 > 2021-10-28 06:51:07.107 MDT [7556] DEBUG: switched WAL source from archive > to stream after failure > 2021-10-28 06:51:07.109 MDT [7844] LOG: database system is ready to accept > read only connections > 2021-10-28 06:51:07.152 MDT [7844] DEBUG: forked new backend, pid=6900 > socket=6068 > > I set the log level to debug5 and here is what I got: > > 2021-10-28 06:25:41.262 MDT [6288] CONTEXT: WAL redo at 419/5ABFFF60 for > Btree/INSERT_LEAF: off 130 > 2021-10-28 06:25:41.262 MDT [6288] DEBUG: record known xact 33776257 > latestObservedXid 33776257 > 2021-10-28 06:25:41.262 MDT [6288] CONTEXT: WAL redo at 419/5ABFFFA0 for > Heap/INSERT: off 95 flags 0x00 > 2021-10-28 06:25:41.262 MDT [6288] LOG: consistent recovery state reached at > 419/5AB8 > 2021-10-28 06:25:41.263 MDT [6288] DEBUG: switched WAL source from archive > to stream after failure > 2021-10-28 06:25:41.264 MDT [5512] LOG: database system is ready to accept > read only connections > > Does the "switched WAL source from archive to stream after failure" indicate > a problem with the WAL file? Actually the message says that some trouble happened while reading the file, but that always happens at the end of WAL. We might need a more detailed message about the trouble at higher debug levels. > Anyway, it looks like I need to restore the standby server from a new backup. > Thank you both for your help. Unfortunately I think so, too. > Sincerely, > -Les -- Kyotaro Horiguchi NTT Open Source Software Center
Re: ssl_crl_file Certificate Revocation List doesn't work for postgresql 11
At Tue, 30 Nov 2021 21:53:06 +0800, Yi Sun wrote in > # cat /home/sunyi/tls/root.crt /home/sunyi/tls/1/root.crl > /tmp/test_1.pem > # openssl verify -extended_crl -verbose -CAfile /tmp/test_1.pem -crl_check > /home/sunyi/tls/1/server.crt I guess what you really wanted to revoke was not server.crt but postgresql.crt. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: ssl_crl_file Certificate Revocation List doesn't work for postgresql 11
At Wed, 1 Dec 2021 16:56:11 +0800, Yi Sun wrote in > We want to revoke server certificate, just don't know why doesn't take > affect > https://www.postgresql.org/docs/11/ssl-tcp.html > https://www.postgresql.org/docs/11/runtime-config-connection.html#GUC-SSL-CRL-FILE Understood. ~/.postgresql/root.crl is required to check server certificate revocation. https://www.postgresql.org/docs/11/libpq-ssl.html > To allow server certificate verification, one or more root > certificates must be placed in the file ~/.postgresql/root.crt in the > user's home directory. (On Microsoft Windows the file is named > %APPDATA%\postgresql\root.crt.) Intermediate certificates should also > be added to the file if they are needed to link the certificate chain > sent by the server to the root certificates stored on the client. > > Certificate Revocation List (CRL) entries are also checked if the file > ~/.postgresql/root.crl exists (%APPDATA%\postgresql\root.crl on > Microsoft Windows). regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: ssl_crl_file Certificate Revocation List doesn't work for postgresql 11
Hi. At Thu, 2 Dec 2021 11:31:26 +0800, Yi Sun wrote in > Hi Kyotaro > > From the description, seems ~/.postgresql/root.crl is store client > revoked certificate No. Revocation is checked on the peer. There's no point for a server to check for revocation of its own certificate, and actually that doesn't happen. Revocation of a client certificate is checked on the server side referencing server.crl. Revocation of a server certificate is checked on the client side referencing postgresql.crl. For example, some web browsers make use of CRLs of web *servers*, which are automatically maintained in the background. You will see it work if you copy the server.crl to ~/.postgresql/root.crl on the client. (I spelled this wrongly in the previous message.) > https://www.postgresql.org/docs/11/libpq-ssl.html > ~/.postgresql/root.crl certificates revoked by certificate authorities server > certificate must not be on this list > Just don't know why server parameter ssl_crl_file parameter configured but > don't take affect As explained above, it is because the CRL specified by ssl_crl_file can only be used to verify client certificates. > https://www.postgresql.org/docs/11/runtime-config-connection.html#GUC-SSL-CRL-FILE > > ssl_crl_file (string) > > Specifies the name of the file containing the SSL server certificate > revocation list (CRL). Relative paths are relative to the data directory. > This parameter can only be set in the postgresql.conf file or on the server > command line. The default is empty, meaning no CRL file is loaded. Ah, the "server" in "SSL server certificate revocation list" looks like a noise word; it is rather misleading or plain wrong, I'm not sure which it actually is. Anyway I propose rephrasing it as "SSL client certificate revocation list", as attached. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index ab617c7b86..4ac617615c 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1248,7 +1248,7 @@ include_dir 'conf.d' -Specifies the name of the file containing the SSL server certificate +Specifies the name of the file containing the SSL client certificate revocation list (CRL). Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf @@ -1267,7 +1267,7 @@ include_dir 'conf.d' -Specifies the name of the directory containing the SSL server +Specifies the name of the directory containing the SSL client certificate revocation list (CRL). Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf file or on the server command diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index c17d33a54f..eb3a0c6b55 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -1742,11 +1742,10 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname sslcrl -This parameter specifies the file name of the SSL certificate +This parameter specifies the file name of the SSL server certificate revocation list (CRL). Certificates listed in this file, if it -exists, will be rejected while attempting to authenticate the -server's certificate. If neither - nor +exists, will be rejected while attempting to authenticate the server's +certificate. If neither nor is set, this setting is taken as ~/.postgresql/root.crl. @@ -1758,9 +1757,9 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname sslcrldir -This parameter specifies the directory name of the SSL certificate -revocation list (CRL).
Certificates listed in the files in this -directory, if it exists, will be rejected while attempting to +This parameter specifies the directory name of the SSL server +certificate revocation list (CRL). Certificates listed in the files +in this directory, if it exists, will be rejected while attempting to authenticate the server's certificate.
Re: ssl_crl_file Certificate Revocation List doesn't work for postgresql 11
At Fri, 3 Dec 2021 15:41:51 +0800, Yi Sun wrote in > Hi Kyotaro, > > Thank you for your explanation, after putting the crl file to client, it works now, thanks. Good to hear that. That portion of the documentation has been fixed in the repository, and the fix will be included in the next minor releases. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: One Partition by list is always chosen by planner
Hello. It seems to me that the type of category is not text and the right side in the query is actually cast to text. At Tue, 21 Nov 2017 12:23:02 -0700 (MST), legrand legrand wrote in <1511292182893-0.p...@n3.nabble.com> > and if I explain the same query with an other filter > > explain select * from wiki_data_part where category='fr' > > | Append (cost=0.00..14010.76 rows=291609 width=48) > | -> Seq Scan on wiki_data_part_f (cost=0.00..9975.04 rows=291339 > width=48) > | Filter: ((category)::text = 'fr'::text) > | -> Seq Scan on wiki_data_part_s (cost=0.00..4035.72 rows=270 width=50) > | Filter: ((category)::text = 'fr'::text) > > wiki_data_part_s is always chosen in the plan > | Filter: ((category)::text = 'fr'::text) The "::text" on the left side hinders constraint exclusion. Usually the left side won't get cast unless the right side is explicitly cast; in that case the left side sometimes gets cast so that it can be compared with the right side. A self-contained example is shown below. = create table p1 (cat char(1), b int) partition by list(cat); create table c11 partition of p1 for values in ('a', 'b', 'c'); create table c12 partition of p1 for values in ('x', 'y', 'z'); explain select * from p1 where cat = 'b'; | QUERY PLAN | | Append (cost=0.00..35.50 rows=10 width=12) |-> Seq Scan on c11 (cost=0.00..35.50 rows=10 width=12) | Filter: (cat = 'b'::bpchar) | (3 rows) explain select * from p1 where cat = 'b'::text; | QUERY PLAN | | Append (cost=0.00..81.20 rows=20 width=12) |-> Seq Scan on c11 (cost=0.00..40.60 rows=10 width=12) | Filter: ((cat)::text = 'b'::text) |-> Seq Scan on c12 (cost=0.00..40.60 rows=10 width=12) | Filter: ((cat)::text = 'b'::text) | (5 rows) regards, -- Kyotaro Horiguchi NTT Open Source Software Center
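As a follow-up to the example above, casting the constant to the column's type (instead of letting the column be cast to text) keeps constraint exclusion working (a sketch against the p1/c11/c12 tables defined above):

explain select * from p1 where cat = 'b'::bpchar;  -- only c11 is scanned, same as with the plain literal
explain select * from p1 where cat = 'b'::text;    -- both partitions are scanned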