Re: Errors with physical replication

2018-05-22 Thread Kyotaro HORIGUCHI
Hello.

At Mon, 21 May 2018 05:18:57 -0700 (MST), greigwise  
wrote in <1526905137308-0.p...@n3.nabble.com>
> Hello.  
> 
> We are on Postgresql version 9.6.6.  We have 2 EC2 instances in different
> Amazon regions and we are doing physical replication via VPN.  It all seems
> to work just fine most of the time.   I'm noticing in the logs that we have
> recurring erros (maybe 10 or 12 times per day) that look like this:



> 2018-05-17 06:36:14 UTC  5af0599f.210d  LOG:  invalid resource manager ID 49
> 2018-05-17 06:36:14 UTC  5afd22de.7ac4  LOG:  started streaming WAL from
> 2018-05-17 07:20:17 UTC  5afd22de.7ac4  FATAL:  could not receive data from
> WAL stream: server closed the connection unexpectedly

> Or some that also look like this:
> 
> 2018-05-17 07:20:17 UTC  5af0599f.210d  LOG:  record with incorrect prev-link
> 2018-05-17 07:20:18 UTC  5afd2d31.1889  LOG:  started streaming WAL from
> 2018-05-17 08:03:28 UTC  5afd2d31.1889  FATAL:  could not receive data from
> WAL stream: server closed the connection unexpectedly

> And some like this:
> 
> 2018-05-17 23:00:13 UTC  5afd63ec.26fc  LOG:  invalid magic number  in
> log segment 00010385003C, offset 10436608
> 2018-05-17 23:00:14 UTC  5afe097d.49aa  LOG:  started streaming WAL from
> primary at 385/3C00 on timeline 1

Your replication connection seems quite unstable and gets disconnected
frequently. After a disconnection, you will see several kinds of "I
found a broken record in my WAL file" messages; they are the cue for
the standby to switch back to streaming. That is in itself normal
PostgreSQL operation, with one known exception.

> Then, like maybe once every couple months or so, we have a crash with logs
> looking like this:


> 2018-05-17 08:03:28 UTC hireology 5af47b75.2670 hireology WARNING: 
> terminating connection because of crash of another server process

I think those lines follow an error message like "FATAL: invalid
memory alloc request size 3075129344". This is also a kind of "broken
record", but it is known to lead the standby to crash. It is
discussed here:

> [bug fix] Cascaded standby cannot start after a clean shutdown

https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05#0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05

> When this last error occurs, the recovery is to go on the replica and remove
> all the WAL logs from the pg_xlog director and then restart Postgresql. 
> Everything seems to recover and come up fine.  I've done some tests
> comparing counts between the replica and the primary and everything seems
> synced just fine from all I can tell.  

Those are the right recovery steps, as far as I can tell from the
attached log messages.

> So, a couple of questions.  1) Should I be worried that my replica is
> corrupt in some way or given that everything *seems* ok, is it reasonable to
> believe that things are working correctly in spite of these errors being
> reported.  2)  Is there something I should configure differently to avoid
> some of these errors?

It doesn't seem worth worrying about from the viewpoint of data
integrity, but if the walsender/walreceiver timeouts fire too
frequently, you might need to increase them for better stability.
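
For example, a minimal sketch (the value is only illustrative; set
each parameter on the side it belongs to and reload):

ALTER SYSTEM SET wal_sender_timeout = '5min';    -- on the primary
ALTER SYSTEM SET wal_receiver_timeout = '5min';  -- on the standby
SELECT pg_reload_conf();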

> Thanks in advance for any help.
> 
> Greig Wise

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Why must AUTOCOMMIT be ON to do txn control in plpgsql procedure?

2019-08-07 Thread Kyotaro Horiguchi
Hello, Bryn.

At Tue, 6 Aug 2019 15:18:51 -0700, Bryn Llewellyn  wrote in 

> Here’s how I’ve tried to describe what I see for p2() with AUTOCOMMIT ON for 
> myself:
> 
> 1. my call p2() starts a txn.
> 
> 2. However, during the execution of the proc, the usual autocommit behavior 
> is programmatically turned off by explicit PostgreSQL code.
> 
> 3. Other explicit PostgreSQL code makes “start transaction” inside a proc 
> simply cause a runtime error under all circumstances. However, txns can be 
> ended by “commit” or “rollback”. And new ones can be started—but only 
> implicitly by executing a SQL statement that, as a top level SQL, would start 
> a txn.

In-procedure transaction control presumes that no transaction is
active before the procedure is called.

https://www.postgresql.org/docs/11/sql-call.html

> If CALL is executed in a transaction block, then the called
> procedure cannot execute transaction control
> statements. Transaction control statements are only allowed if
> CALL is executed in its own transaction.

With AUTOCOMMIT=off, an implicit BEGIN is issued just before CALLing
p2() if no transaction is active. Thus p2() is always called with a
transaction already active, which inhibits in-procedure transaction
control.
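
A minimal illustration in psql, assuming p2() is your procedure that
issues COMMIT internally:

\set AUTOCOMMIT on
CALL p2();   -- runs in its own transaction, so the internal COMMIT works

\set AUTOCOMMIT off
CALL p2();   -- psql has already sent an implicit BEGIN, so the internal
             -- COMMIT fails with "invalid transaction termination"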

I'm not sure why you want to turn AUTOCOMMIT off, but even with
AUTOCOMMIT on, an explicit BEGIN prevents per-command COMMIT, as you
perhaps know.

https://www.postgresql.org/docs/11/app-psql.html
> When on (the default), each SQL command is automatically
> committed upon successful completion. To postpone commit in
> this mode, you must enter a BEGIN or START TRANSACTION SQL
> command.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: question about zeroes in the wal file names

2019-08-18 Thread Kyotaro Horiguchi
At Sun, 18 Aug 2019 16:17:03 +0200, Luca Ferrari  wrote in 

> I'm just curious to better understand the naming convention behind wal
> files, because I've seen on a system of mine that the wals created
> were:
> 
> 0005020E00FF
>  0005020F
> 
> while I was expecting 20E0x100. So I digged into the code and I've
> seen, from the XLogFileName macro, that the last part is built as the
> reminder of the number of segments per wal file:
> 
> #define XLogFileName(fname, tli, logSegNo, wal_segsz_bytes)\
> snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli,\
>  (uint32) ((logSegNo) / XLogSegmentsPerXLogId(wal_segsz_bytes)), \
>  (uint32) ((logSegNo) % XLogSegmentsPerXLogId(wal_segsz_bytes)))
> 
> 
> and with the default wal size of 16 MB that gives a remainder of 256
> (FF + 1). Assuming I haven't missed anything, this means that there
> are 6 zeroes that will never change in the last 8 chars of the wal
> filename.  Is therefore this only done to handle PostgreSQL WAL sizes
> of 4 GB each?

I think that's right. I don't see it used anywhere other than in WAL
file names.  As the macro name suggests, the 4GB block is internally
called an "xlog id" (XLogId). The block size is chosen so that the
location within it can be handled as a 32-bit integer. Older versions
of PostgreSQL defined XLogRecPtr as a pair of 32-bit integers.

xlogdefs.h.~REL9_1_0~:17
/*
 * Pointer to a location in the XLOG.  These pointers are 64 bits wide,
 * because we don't want them ever to overflow.
 *
 * NOTE: xrecoff == 0 is used to indicate an invalid pointer.  This is OK
 * because we use page headers in the XLOG, so no XLOG record can start
 * right at the beginning of a file.
 *
 * NOTE: the "log file number" is somewhat misnamed, since the actual files
 * making up the XLOG are much smaller than 4Gb.  Each actual file is an
 * XLogSegSize-byte "segment" of a logical log file having the indicated
 * xlogid.  The log file number and segment number together identify a
 * physical XLOG file.  Segment number and offset within the physical file
 * are computed from xrecoff div and mod XLogSegSize.
 */
typedef struct XLogRecPtr
{
uint32  xlogid; /* log file #, 0 based */
uint32  xrecoff;    /* byte offset of location in log file */
} XLogRecPtr;
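
Coming back to the file names, a quick way to see the mapping from an
LSN to a segment file name is pg_walfile_name(), assuming a server of
version 10 or later (the timeline part of the result is that of the
local server):

SELECT pg_walfile_name('20E/FF000001');  -- e.g. 000000010000020E000000FF
SELECT pg_walfile_name('20F/00000001');  -- e.g. 000000010000020F00000000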


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: postmaster utilization

2019-08-19 Thread Kyotaro Horiguchi
At Mon, 19 Aug 2019 10:07:30 -0400, Stephen Frost  wrote in 
<20190819140730.gh16...@tamriel.snowman.net>
> Greetings,
> 
> * Ron (ronljohnso...@gmail.com) wrote:
> > On 8/19/19 5:40 AM, Shiwangini Shishulkar wrote:
> > >We have scheduled postgres full backup on centos 7 machine. DB size is
> > >around 66 GB. We observed while backup is running, postmaster CPU %
> > >reaches to 90 - 100%,which results very strange behavior: that we are not
> > >able to perform any DDL and DML on to the database during the time.
> > >Please suggest a way to limit it's CPU utilization or else let us know how
> > >we can access the database during the backup window. Please let us know on
> > >priority since this is our prod database.
> > 
> > Are you compressing the backup?
> > How many threads are you running?
> > How many CPUs on your system?
> > What version of Postgres?
> 
> ... What are you using to run the backup?

It seems to have been reported as BUG #15961..

https://www.postgresql.org/message-id/15965-413bf5d18aaef...@postgresql.org

> PostgreSQL version: 11.4
> Operating system:   CentOS Linux release 7.6.1810 (Core)
> pg_dump -U postgres -d wg -f wg.sql 

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg full text search very slow for Chinese characters

2019-09-10 Thread Kyotaro Horiguchi
Hi.

At Tue, 10 Sep 2019 18:42:26 +0200 (CEST), Andreas Joseph Krogh 
 wrote in 

> På tirsdag 10. september 2019 kl. 18:21:45, skrev Tom Lane 
>  <mailto:t...@sss.pgh.pa.us>>: Jimmy Huang  writes:
>  > I tried pg_trgm and my own customized token parser 
> https://github.com/huangjimmy/pg_cjk_parser
> 
>  pg_trgm is going to be fairly useless for indexing text that's mostly
>  multibyte characters, since its unit of indexable data is just 3 bytes
>  (not characters). I don't know of any comparable issue in the core
>  tsvector logic, though. The numbers you're quoting do sound quite awful,
>  but I share Cory's suspicion that it's something about your setup rather
>  than an inherent Postgres issue.
> 
>  regards, tom lane We experienced quite awful performance when we hosted the 
> DB on virtual servers (~5 years ago) and it turned out we hit the write-cache 
> limit (then 8GB), which resulted in ~1MB/s IO thruput. Running iozone might 
> help tracing down IO-problems. --
>  Andreas Joseph Krogh

Multibyte characters also quickly bloat the index with a huge number
of small buckets, one for every 3-character combination out of
thousands of distinct characters, which makes it useless.

pg_bigm, which is based on bigrams (2-grams), works better on
multibyte characters.

https://pgbigm.osdn.jp/index_en.html
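
A minimal sketch, assuming the pg_bigm extension is installed and a
hypothetical table docs(body text):

CREATE EXTENSION pg_bigm;
CREATE INDEX docs_body_bigm_idx ON docs USING gin (body gin_bigm_ops);
SELECT * FROM docs WHERE body LIKE '%全文検索%';  -- can use the bigram index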

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Referncing a calculated column in a select?

2019-09-12 Thread Kyotaro Horiguchi
Hello.

At Thu, 12 Sep 2019 23:16:01 +0200, "Peter J. Holzer"  wrote 
in <20190912211601.ga3...@hjp.at>
> On 2019-09-12 15:35:56 -0500, Ron wrote:
> > On 9/12/19 2:23 PM, stan wrote:
> > > I am creating some views, that have columns with fairly complex 
> > > calculations
> > > in them. The I want to do further calculations using the result of this
> > > calculation. Right now, I am just duplicating the first calculation in the
> > > select fro the 2nd calculated column. There must be a batter way to do
> > > this, right?
> > 
> > I must be misunderstanding you, because you can just do more of what you're
> > doing now.
> > 
> > Here's an example of doing some calculations in the creation of the view,
> > and then more calculations when selecting from the view:
> 
> I think he wants to refer to other columns in the view. 
..
> What you can do is nest views:

Doesn't a subquery work?

SELECT x, y, z
FROM (SELECT f * 3 AS x, f AS y, g + 2 AS z
  FROM (SELECT a + 3 AS f, b + c AS g
FROM t) AS t1
 ) AS t2;

t2 uses f in two columns, where f is calculated from t.a.
Or a CTE (WITH clause) might look better.

WITH t1 AS (SELECT a + 3 AS f, b + c AS g FROM t),
 t2 AS (SELECT f * 3 AS x, f AS y, g + 2 AS z FROM t1)
SELECT x, y, z FROM t2;

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Query Tuning

2019-10-01 Thread Kyotaro Horiguchi
Hello.

At Tue, 1 Oct 2019 12:42:24 +0530, Sonam Sharma  wrote in 

> We have a query which is running slow and it's taking 26secs to complete..
> we have run the analyzer also and it's taking the same time.
> 
> Any tool is there for query optimization or any suggestions.

EXPLAIN ANALYZE (not just EXPLAIN) would be that tool. In many cases
where a query takes unexpectedly long, the row estimates of some plan
nodes differ greatly from the actual row counts, which leads to a bad
query plan. EXPLAIN ANALYZE may give you a clue to that kind of
problem.
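
For example (the table and column here are only placeholders for your
own query):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM some_table WHERE some_column = 42;

Large differences between the estimated rows (rows=...) and the actual
rows (actual ... rows=...) of a node point at the misestimated part of
the plan.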

The following output comes from plain EXPLAIN. I suppose that you
already have similar output that also carries an "(actual time=...)"
clause after the "(cost=...)" clause.

> My query plan looks like this :
> 
> CTE Scan on approvalwflscreen (cost=8736.21..8737.25 rows=52 width=1214)
> 
> CTE constants
> 
> -> Result (cost=0.00..0.01 rows=1 width=44)
> 
> CTE approval
> 
> -> Sort (cost=7793.89..7805.22 rows=4530 width=292)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Create a logical and physical replication

2019-11-04 Thread Kyotaro Horiguchi
Hello.

At Mon, 4 Nov 2019 19:05:24 +0530, Ashok Kumar Tiwari 
 wrote in 
> is it possible in postgres to create one physical replica for DR(data
> recovery purpose) and another one logical replica for reporting purpose

A physical replication standby cannot be a publisher. You will find
that CREATE PUBLICATION fails.

> master > physical standby (for DR  Purpose)
> |
> |
> |
> |
> logical replica for reporting purpose
> 
> -- or cascading logic replication
> 
>   master > physical standby (for DR  Purpose) - >
> logical replica for reporting purpose

The former works (as documented in [1]) but the latter doesn't.
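
A rough sketch of the working (former) layout, with placeholder names:

-- on the primary
CREATE PUBLICATION reporting_pub FOR ALL TABLES;

-- on the reporting server (a separate instance, not the physical standby)
CREATE SUBSCRIPTION reporting_sub
    CONNECTION 'host=primary.example.com dbname=mydb user=repuser'
    PUBLICATION reporting_pub;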


By the way, if I'm not missing anything, I don't find a description of
this restriction in the documentation.  Shouldn't it be in [2], or
[1]?

[1]: https://www.postgresql.org/docs/12/logical-replication.html
[2]: https://www.postgresql.org/docs/12/logical-replication-restrictions.html

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Wall shiping replica failed to recover database with error: invalid contrecord length 1956 at FED/38FFE208

2019-11-13 Thread Kyotaro Horiguchi
Hello.

At Wed, 2 Oct 2019 19:24:02 -0400, Stephen Frost  wrote in 
> Greetings,
> 
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Stephen Frost  writes:
> > > * Aleš Zelený (zeleny.a...@gmail.com) wrote:
> > >> But recovery on replica failed to proceed WAL file
> > >> 00010FED0039  with log message: " invalid contrecord length
> > >> 1956 at FED/38FFE208".
> > 
> > > Err- you've drawn the wrong conclusion from that message (and you're
> > > certainly not alone- it's a terrible message and we should really have a
> > > HINT there or something).
> > 
> > Yeah, those messages are all pretty ancient, from when WAL was new and not
> > to be trusted much.  Perhaps the thing to do is move the existing info
> > into DETAIL and make the primary message be something like "reached
> > apparent end of WAL stream".
> 
> Yes, +1 on that.

The attached is something like that. AFAICS we can assume that an
invalid record means the end of WAL as long as we are fetching
successive records and XLogPageRead() has not logged anything.

As the current comment in ReadRecord says, we don't have a message if
the standby has been triggered, but that's not always the case. We may
still have a message if a new segment becomes available after
triggering.

I used a kind-of-tricky way to handle the optional errdetail, but it
may be better to have an additional ereport subfunction for this.
(Maybe named "errif" or something?)

I think it works as expected, but I found one arguable behavior.

In the first patch, the "end of WAL" message is accompanied by source,
LSN and timeline.

LOG:  reached end of WAL in archive at 0/A360 on timeline 8
DETAIL:  invalid record length at 0/A360: wanted 24, got 0

The last two seem just fine, but the first one shows "archive" even on
promotion. It is true that we scan the archive after promotion, but it
seems confusing. So I attached another patch that shows the operation
instead of the WAL source.

LOG:  reached end of WAL during streaming replication at 0/A560 on 
timeline 8
DETAIL:  invalid record length at 0/A560: wanted 24, got 0

What do you think about this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2bd3d0e5e5..70fd34659a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -852,6 +852,9 @@ static bool bgwriterLaunched = false;
 static int	MyLockNo = 0;
 static bool holdingAllLocks = false;
 
+/* Have we complained about the record at the location? */
+static XLogRecPtr lastComplaint = 0;
+
 #ifdef WAL_DEBUG
 static MemoryContext walDebugCxt = NULL;
 #endif
@@ -889,6 +892,7 @@ static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		bool fetching_ckpt, XLogRecPtr tliRecPtr);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
+static bool have_complained_at(XLogRecPtr RecPtr);
 static void XLogFileClose(void);
 static void PreallocXlogFiles(XLogRecPtr endptr);
 static void RemoveTempXlogFiles(void);
@@ -4264,8 +4268,11 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 		record = XLogReadRecord(xlogreader, RecPtr, &errormsg);
 		ReadRecPtr = xlogreader->ReadRecPtr;
 		EndRecPtr = xlogreader->EndRecPtr;
+
 		if (record == NULL)
 		{
+			XLogRecPtr reportptr = RecPtr ? RecPtr : EndRecPtr;
+
 			if (readFile >= 0)
 			{
 close(readFile);
@@ -4273,13 +4280,24 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 			}
 
 			/*
-			 * We only end up here without a message when XLogPageRead()
-			 * failed - in that case we already logged something.
+			 * When we end up here while reading successive records, we reached
+			 * the end of WAL unless XLogPageRead() has logged something. We
+			 * may or may not have a message here depending on the situation.
+			 * Otherwise we just report the received message if any and if
+			 * needed.
+			 *
+			 * Note: errormsg is already translated.
 			 */
-			if (errormsg)
-ereport(emode_for_corrupt_record(emode,
- RecPtr ? RecPtr : EndRecPtr),
-		(errmsg_internal("%s", errormsg) /* already translated */ ));
+			if (RecPtr == InvalidXLogRecPtr && !have_complained_at(EndRecPtr))
+ereport(emode,
+		(errmsg("reached end of WAL in %s at %X/%X on timeline %u",
+xlogSourceNames[currentSource],
+(uint32) (reportptr >> 32), (uint32) reportptr,
+ThisTimeLineID),
+		 (errormsg ? errdetail_internal("%s", errormsg) : 0)));
+			else if (errormsg)
+ereport(emode_for_corrupt_record(emode, reportptr),
+		(errmsg_

Re: pg_basebackup

2019-12-23 Thread Kyotaro Horiguchi
Hello.

At Mon, 23 Dec 2019 03:38:12 +, Daulat Ram  
wrote in 
> thanks Adrian, what about the
> postmaster.opts file, this file was also skipped in backup.

The file is overwritten at startup so there's no point in having it in
a backup. Thus, it is deliberately excluded from a backup.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: How can I pushdown of functions used in targetlist with FDW ?

2020-01-09 Thread Kyotaro Horiguchi
At Thu, 09 Jan 2020 14:12:25 -0500, Tom Lane  wrote in 
> Stephen Frost  writes:
> > I agree with all of that- but it seems like we've got people asking for
> > it, and I can understand why they are, and therefore I'd be inclined to
> > provide a way for users to get that behavior and I'm even inclined to
> > say that pushing down should be the 'default' if everything else is
> > equal and it's possible to do so.
> 
> [ shrug... ]  I think that's a completely arbitrary value judgment
> unsupported by any facts.  Furthermore, it'd greatly expand our exposure
> to problems with "we thought this function behaves the same on the remote
> end, but it doesn't".  That hazard makes me feel that the default should
> *never* be to encourage pushing down, unless there's a pretty clear
> potential win to justify the risk of breakage.
> 
> If SQL/MED has a notion of a "foreign function" that is only executable on
> the remote side, and that fails if you try to execute it locally, then
> it'd make sense to implement that feature and then expect postgres_fdw to
> try hard to push down such calls.  But we don't have that (yet).  I'm
> unconvinced that ordinary functions ought to be treated as if remote
> execution is preferable.

Isn't ROUTINE MAPPING [1] exactly that?  Definers should define one at
their own risk, as they do for table constraints.

[1] 
https://www.postgresql.org/message-id/CAD21AoCSzZO%2Bd3EL-9w%3D97aC28%3DRb3%2BBh77wRLcP7HCH8-Bn9A%40mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: How can I pushdown of functions used in targetlist with FDW ?

2020-01-09 Thread Kyotaro Horiguchi
At Thu, 09 Jan 2020 20:25:20 -0500, Tom Lane  wrote in 
> Kyotaro Horiguchi  writes:
> > Isn't ROUTING MAPPING [1] that?  Definers should define one at their
> > own risk as table constraints are.
> 
> Hmm.  It looks like that patch is moribund, and I can't say that I'm
> excited about reviving it.  Aside from the syntactical reserved-word
> problems, it seems like a mighty heavyweight way of attacking the issue

I don't mean that the patch itself is usable right now, but am simply
asking about the feature.

> --- that is, instead of a property directly attached to the function of
> interest, you've now got this entire other system catalog structure that
> has to be searched for relevant information.  And that needs all the usual
> support for a new object type, eg ALTER infrastructure, pg_dump support,
> etc etc etc.

Yeah, that sounds really cumbersome to me.

> Plus, once you've created a mapping, it's only good for one server so you
> have to do it over again for each server.  I guess in the general case
> you have to have that, but I'd sure want some kind of shortcut for the
> common case where e.g. ABS() means the same thing everywhere.

As for the user side, I understand that they want to push down certain
functions despite the trouble, but simpler is better.

About the idea of function attributes, I think push-down'ability is
not a property of the function itself. postgres_fdw can push down many
built-in functions, oracle_fdw can push down fewer, and file_fdw
cannot push down any at all. So it seems to be rather an ability of
the FDW driver. Putting aside the maintenance trouble, it would work
for built-in functions.

Even with the feature, we still could not push down user-defined
functions, or incompatible built-in functions that the user wants to
push down knowing about the incompatibility.  If we allow that, we
need a means to tell foreign servers about such functions.  A means
other than ROUTINE MAPPING that I can come up with is server options
and foreign table options.

> The SQL committee do love verbosity, don't they.

Sure..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Wall shiping replica failed to recover database with error: invalid contrecord length 1956 at FED/38FFE208

2020-01-17 Thread Kyotaro Horiguchi
Thank you, and sorry for overlooking your comment.

At Thu, 14 Nov 2019 12:28:13 -0500, Tom Lane  wrote in 
> Kyotaro Horiguchi  writes:
> > At Wed, 2 Oct 2019 19:24:02 -0400, Stephen Frost  wrote 
> > in 
> >> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >>> Yeah, those messages are all pretty ancient, from when WAL was new and not
> >>> to be trusted much.  Perhaps the thing to do is move the existing info
> >>> into DETAIL and make the primary message be something like "reached
> >>> apparent end of WAL stream".
> 
> >> Yes, +1 on that.
> 
> > What do you think about this?
> 
> It seems overcomplicated.  Why do you need to reach into
> emode_for_corrupt_record's private state?  I think we could just
> change the message text without that, and leave the emode
> machinations as-is.

I tried to avoid repeated messages at the same LSN, but that is
already done by emode_for_corrupt_record by reducing the error level.
I reverted that part.

> I don't understand the restriction to "if (RecPtr == InvalidXLogRecPtr)"
> either?  Maybe that's fine but the comment fails to explain it.

"When we end up here while reading successive recored" meant that, but
it is not explicit. If RecPtr was not invalid, it means that the
caller is requesting for a specific record that should exist.  It is
not end-of-wal at all. I rewrote the comment.

> Another point is that, as the comment for emode_for_corrupt_record
> notes, we don't really want to consider that we've hit end-of-WAL
> when reading any source except XLOG_FROM_PG_WAL.  So I think the
> change of message ought to include a test of the source, but this
> doesn't.

Maybe not. The function just mutes messages for repeated errors on
XLOG_FROM_WAL. We do consider end-of-WAL while on XLOG_FROM_ARCHIVE.
In this version the "reached end of WAL" message is not emitted when
emode_for_corrupt_record decides not to show the message.

> Also, the business with an "operation" string violates the message
> translatability guideline about "don't assemble a message out of
> phrases".  If we want to have that extra detail, it's better just
> to make three separate ereport() calls with separately translatable
> messages.

Though I thought that the operation was just a noun and could safely
be handled separately, I followed your comment. This version takes the
more verbose of the previous two wordings.

> Also, it seems wrong that the page TLI check, just below, is where
> it is and isn't part of the main set of page header sanity checks.
> That's sort of unrelated to this patch, except not really, because
> shouldn't a failure of that test also be treated as an "end of WAL"
> condition?

It seems to be checking whether xlogreader did something wrong, since
it is pluggable.  I'm not sure there is any concrete reason for that,
though.  As for recovery, it is actually redundant because
XLogFileReadAnyTLI has already checked that for the record, but it
doesn't matter.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 94ee9c982afc5b39bb062c59a95ac2323a4109c8 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 17 Jan 2020 17:15:58 +0900
Subject: [PATCH] Make the end-of-WAL message less scary

At the end of WAL during recovery, users see just an "invalid record"
message in the logs, which is too scary. This patch changes the
message for that case to "reached end of WAL" to let users know that
they are not in trouble.
---
 src/backend/access/transam/xlog.c | 48 +--
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7f4f784c0e..a7757ea4bf 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4279,6 +4279,10 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 		EndRecPtr = xlogreader->EndRecPtr;
 		if (record == NULL)
 		{
+			int actual_emode =
+emode_for_corrupt_record(emode,
+		 RecPtr ? RecPtr : EndRecPtr);
+
 			if (readFile >= 0)
 			{
 close(readFile);
@@ -4286,15 +4290,43 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 			}
 
 			/*
-			 * We only end up here without a message when XLogPageRead()
-			 * failed - in that case we already logged something. In
-			 * StandbyMode that only happens if we have been triggered, so we
-			 * shouldn't loop anymore in that case.
+			 * Invalid RecPtr here means we are reading successive records
+			 * during recovery. If we get here during recovery, we assume that
+			 * we reached the end of WAL.  Otherwise something's really wrong
+			 * and we report just only the errormsg if any. If we

Re: How to handle CASE statement with PostgreSQL without need for typecasting

2020-02-18 Thread Kyotaro Horiguchi
Hello.

At Tue, 18 Feb 2020 12:43:21 -0800, Adrian Klaver  
wrote in 
> test=> UPDATE t_update SET F1 = '02/23/2020'::unknown;
> UPDATE 1
> 
> test=> UPDATE t_update SET F1 = (CASE WHEN ('02/18/2020' IS NULL )
> THEN ('02/18/2020' ) ELSE ('02/19/2020') END)::unknown;
> ERROR: failed to find conversion function from unknown to timestamp
> without time zone
> 
> test=> UPDATE t_update SET F1 = (CASE WHEN ('02/18/2020' IS NULL )
> THEN ('02/18/2020' ) ELSE ('02/19/2020') END)::timestamp;
> UPDATE 1
> 
> So there is some sort of different evaluation going on in the CASE
> statement.

The documentation says:

https://www.postgresql.org/docs/10/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS

> A cast applied to an unadorned string literal represents the initial
> assignment of a type to a literal constant value, and so it will
> succeed for any type (if the contents of the string literal are
> acceptable input syntax for the data type).
..
> However, automatic casting is only done for casts that are marked “OK
> to apply implicitly” in the system catalogs. Other casts must be
> invoked with explicit casting syntax. This restriction is intended to
> prevent surprising conversions from being applied silently.

Conversions from the type unknown are not registered in pg_cast.
Also, CREATE CAST on pseudo-types like unknown is not allowed.
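
This can be checked against the catalogs; for example, the following
query returns no rows, and the CREATE CAST attempt is rejected:

SELECT * FROM pg_cast WHERE castsource = 'unknown'::regtype;
CREATE CAST (unknown AS timestamp) WITH INOUT;  -- fails: unknown is a pseudo-type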

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: polygon && polygon and precision

2020-03-12 Thread Kyotaro Horiguchi
Hello, Patrick.

At Thu, 12 Mar 2020 10:16:15 -0700, Patrick Dowler  
wrote in 
> I am using polygon datatype with some very small values and discovered that
> the overlaps operator appears to break down and give false positives for
> values smaller than ~1e-6. I have tested this on 9.5 (.3 and .16) and 10.10.

For usability, geometric arithmetic is performed with a tolerance of
1.0E-6. For example, equality between two floating point values is
defined as:

#define EPSILON 1.0E-06
#define FPeq(A, B)  (fabs((A) - (B)) <= EPSILON)

The point "same as" operator (~=) is defined using that comparison, so
the following comparison gives true:

=# select '(1.5E-06, 0)'::point ~= '(1.0E-06, 0.5E-06)'::point;
 ?column? 
----------
 t

All comparisons are performed with that degree of tolerance. By a
rough calculation, that tolerance in longitude-latitude coordinates
corresponds to about 40 meters on the ground.

> To reproduce (these examples only differ in the x-axis exponents) and
> should be false but the first one is t(rue)
> 
> select '((3.0e-07,-2),(9.0e-07,-2),(9.0e-07,1),(3.0e-07,1))'::polygon
> &&
> '((2.0e-07,-0.1),(2.0e-07,0.1),(2.01e-07,0.1),(2.01e-07,-0.1))'::polygon;
> 
> select '((3.0e-06,-2),(9.0e-06,-2),(9.0e-06,1),(3.0e-06,1))'::polygon
> &&
> '((2.0e-06,-0.1),(2.0e-06,0.1),(2.01e-06,0.1),(2.01e-06,-0.1))'::polygon;
> 
> Maybe suggests some single-precision floating point use in the
> calculations...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: It is possible to force periodically switch xlog?

2020-04-01 Thread Kyotaro Horiguchi
At Wed, 1 Apr 2020 21:09:20 -0400, David Steele  wrote in 
> On 4/1/20 7:19 PM, Edson Richter wrote:
> > IIRC archive_timeout does precisely that.
> > I would assume that "archive_mode" must be "on" or "always" - but I do
> > have "archive_mode=off".
> 
> I'm not sure what the point of switching the WAL segment would be with
> archive_mode=off.
> 
> What are you trying to accomplish?

I'm also a bit curious about that, but anyway archive_timeout works
even with archive_mode=off. The name looks a bit off from its actual
behavior.
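
For what it's worth, a minimal way to set it (the interval is only
illustrative):

ALTER SYSTEM SET archive_timeout = '5min';
SELECT pg_reload_conf();

With that, a WAL segment switch is forced at most every five minutes,
provided some new WAL has been written since the last switch.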

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




EINTR while resizing dsm segment.

2020-04-02 Thread Kyotaro Horiguchi
I provided the subject, and added -hackers.

> Hello,
> I am running postgres 11.5 and we were having issues with shared segments.
> So I increased the max_connection as suggested by you guys and reduced my
> work_mem to 600M.
> 
> Right now instead, it is the second time I see this error :
> 
> ERROR:  could not resize shared memory segment "/PostgreSQL.2137675995" to
> 33624064 bytes: Interrupted system call

The function posix_fallocate is protected against EINTR.

| do
| {
|   rc = posix_fallocate(fd, 0, size);
| } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));

But not ftruncate and write. Don't we need to protect them from EINTR
as in the attached?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 590b783f93995bfd1ec05dbcb2805a577372604d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 2 Apr 2020 17:09:35 +0900
Subject: [PATCH] Protect dsm_impl from EINTR

dsm_impl functions should not error out on EINTR.
---
 src/backend/storage/ipc/dsm_impl.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 1972aecbed..f4e7350a5e 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -360,7 +360,11 @@ dsm_impl_posix_resize(int fd, off_t size)
 	int			rc;
 
 	/* Truncate (or extend) the file to the requested size. */
-	rc = ftruncate(fd, size);
+	do
+	{
+		rc = ftruncate(fd, size);
+	} while (rc < 0 && errno == EINTR &&
+			 !(ProcDiePending || QueryCancelPending));
 
 	/*
 	 * On Linux, a shm_open fd is backed by a tmpfs file.  After resizing with
@@ -874,11 +878,19 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
 		while (success && remaining > 0)
 		{
 			Size		goal = remaining;
+			Size		rc;
 
 			if (goal > ZBUFFER_SIZE)
 goal = ZBUFFER_SIZE;
 			pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE);
-			if (write(fd, zbuffer, goal) == goal)
+
+			do
+			{
+rc = write(fd, zbuffer, goal);
+			} while (rc < 0 && errno == EINTR &&
+	 !(ProcDiePending || QueryCancelPending));
+
+			if (rc == goal)
 remaining -= goal;
 			else
 success = false;
-- 
2.18.2



Re: pg_restore: could not close data file: Success

2020-04-15 Thread Kyotaro Horiguchi
Hello.

Added -hackers.

At Wed, 15 Apr 2020 12:14:25 +0200, "Peter J. Holzer"  wrote 
in 
> On 2020-04-15 12:01:46 +0200, Peter J. Holzer wrote:
> > I'm trying to restore a backup on a different machine and it terminates
> > with the not really helpful messages:
> > 
> > pg_restore: [directory archiver] could not close data file: Success
> > pg_restore: [parallel archiver] a worker process died unexpectedly
> [...]
> > My guess is that maybe one of the data files is damaged
> 
> As is often the case the matter became obvious a few minutes after
> writing the mail. 
> 
> There were indeed two file with length 0 in the dump. That happened
> because the backup failed because it couldn't obtain a lock on a table.
> 
> I nicer error message (something like "cannot decompress '13503.dat.gz':
> Empty file") would have helped.

Unfortunately, just emptying a .dat.gz file didn't reproduce the error
for me.  Anyway, the message is emitted in the following way.

pg_backup_directory.c:
>  if (cfclose(cfp) !=0)
>fatal("could not close data file: %m");

%m doesn't work for some kinds of errors about compressed files, but
cfclose conceals the true cause.

I'm surprised to find an old thread about the same issue.

https://www.postgresql.org/message-id/20160307.174354.251049100.horiguchi.kyotaro%40lab.ntt.co.jp

But I don't think it's acceptable to use a fake errno for gzclose.
cfclose properly passes through the error code from gzclose, so it is
enough that the caller recognizes the difference.

Please find the attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index 1417401086..3a8394d7f2 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -645,6 +645,13 @@ cfgets(cfp *fp, char *buf, int len)
 		return fgets(buf, len, fp->uncompressedfp);
 }
 
+/*
+ * cfclose closes the stream
+ *
+ * Returns 0 if the cfp was successfully closed. Most errors are reported
+ * as -1 and errno is set.  Otherwise the return value is the return value
+ * from gzclose and errno doesn't hold a meaningful value.
+ */
 int
 cfclose(cfp *fp)
 {
@@ -665,6 +672,11 @@ cfclose(cfp *fp)
 #endif
 	{
 		result = fclose(fp->uncompressedfp);
+
+		/* normalize error return, just in case EOF is not -1 */
+		if (result != 0)
+			result = -1;
+
 		fp->uncompressedfp = NULL;
 	}
 	free_keep_errno(fp);
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index c9cce5ed8a..ecc6aa5fbb 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -108,6 +108,7 @@ void
 InitArchiveFmt_Directory(ArchiveHandle *AH)
 {
 	lclContext *ctx;
+	int			ret;
 
 	/* Assuming static functions, this can be copied for each format. */
 	AH->ArchiveEntryPtr = _ArchiveEntry;
@@ -218,8 +219,14 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
 		ReadToc(AH);
 
 		/* Nothing else in the file, so close it again... */
-		if (cfclose(tocFH) != 0)
-			fatal("could not close TOC file: %m");
+		ret = cfclose(tocFH);
+		if (ret < 0)
+		{
+			if (ret == -1)
+fatal("could not close TOC file: %m");
+			else
+fatal("could not close TOC file: zlib error (%d)", ret);
+		}
 		ctx->dataFH = NULL;
 	}
 }
@@ -378,6 +385,7 @@ _PrintFileData(ArchiveHandle *AH, char *filename)
 	char	   *buf;
 	size_t		buflen;
 	cfp		   *cfp;
+	int			ret;
 
 	if (!filename)
 		return;
@@ -396,8 +404,15 @@ _PrintFileData(ArchiveHandle *AH, char *filename)
 	}
 
 	free(buf);
-	if (cfclose(cfp) !=0)
-		fatal("could not close data file: %m");
+
+	ret = cfclose(cfp);
+	if (ret < 0)
+	{
+		if (ret == -1)
+			fatal("could not close data file: %m");
+		else
+			fatal("could not close data file: zlib error (%d)", ret);
+	}
 }
 
 /*
@@ -429,6 +444,7 @@ _LoadBlobs(ArchiveHandle *AH)
 	lclContext *ctx = (lclContext *) AH->formatData;
 	char		fname[MAXPGPATH];
 	char		line[MAXPGPATH];
+	int			ret;
 
 	StartRestoreBlobs(AH);
 
@@ -460,9 +476,16 @@ _LoadBlobs(ArchiveHandle *AH)
 		fatal("error reading large object TOC file \"%s\"",
 			  fname);
 
-	if (cfclose(ctx->blobsTocFH) != 0)
-		fatal("could not close large object TOC file \"%s\": %m",
-			  fname);
+	ret = cfclose(ctx->blobsTocFH);
+	if (ret < 0)
+	{
+		if (ret == -1)
+			fatal("could not close large object TOC file \"%s\": %m",
+  fname);
+		else
+			fatal("could not close large object TOC file \"%s\": zlib error (%d)",
+  fname, ret);
+	}
 
 	ctx->blobsTocFH = NULL;
 
@@ -555,6 +578,7 @@ _CloseArchive(ArchiveHandle *AH)
 	{
 		cfp		   *tocFH;
 		char		fname[MAXPGPATH];
+		int			ret;
 
 		setFilePath(AH, fname, "toc.dat");
 
@@ -

Re: pg_restore: could not close data file: Success

2020-04-16 Thread Kyotaro Horiguchi
At Thu, 16 Apr 2020 14:40:09 +0900, Michael Paquier  wrote 
in 
> On Thu, Apr 16, 2020 at 12:08:09PM +0900, Kyotaro Horiguchi wrote:
> > I'm surprised to find an old thread about the same issue.
> > 
> > https://www.postgresql.org/message-id/20160307.174354.251049100.horiguchi.kyotaro%40lab.ntt.co.jp
> > 
> > But I don't think it's acceptable to use a fake errno for gzclose.
> > cfclose properly passes through the error code from gzclose, so it is
> > enough that the caller recognizes the difference.
> 
> A problem with this patch is that we may forget again to add this
> special error handling if more code paths use cfclose().

Definitely.  The reason for the patch is that the error codes are
different depending on the caller, and some of the callers don't even
check the error (seemingly intentionally).

> As of HEAD, there are three code paths where cfclose() is called but
> it does not generate an error: two when ending a blob and one when
> ending a data file.  Perhaps it would make sense to just move all this
> error within the routine itself?  Note that it would also mean
> registering file names in lclContext or equivalent as that's an
> important piece of the error message.

Hmm. Sounds reasonable.  I'm going to do that.  Thanks!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: walreceiver termination

2020-05-07 Thread Kyotaro Horiguchi
Hello.

At Mon, 4 May 2020 09:09:15 -0500, Justin King  wrote in 
> Would there be anyone that might be able to help troubleshoot this
> issue -- or at least give me something that would be helpful to look
> for?
> 
> https://www.postgresql.org/message-id/flat/CAGH8ccdWLLGC7qag5pDUFbh96LbyzN_toORh2eY32-2P1%3Dtifg%40mail.gmail.com
> https://www.postgresql.org/message-id/flat/CANQ55Tsoa6%3Dvk2YkeVUN7qO-2YdqJf_AMVQxqsVTYJm0qqQQuw%40mail.gmail.com
> https://dba.stackexchange.com/questions/116569/postgresql-docker-incorrect-resource-manager-data-checksum-in-record-at-46f-6
> 
> I'm not the first one to report something similar and all the
> complaints have a different filesystem in common -- particularly ZFS
> (or btrfs, in the bottom case).  Is there anything more we can do here
> to help narrow down this issue?  I'm happy to help, but I honestly
> wouldn't even know where to begin.

The sendto() call at the end of your strace output is the "close
connection" request to the wal sender, and normally it should be
followed by close() and kill().  If it is really the last strace
output, the sendto() is blocked because the send buffer is full.

My diagnosis of the situation is that your replication connection had
trouble and the TCP session was broken in a way the wal receiver
couldn't be aware of.  As a result, feedback messages from the wal
receiver were held up in the TCP send buffer, and finally the last
sendto() blocked while sending the close-connection message.

If it happens constantly, routers or firewalls between the primary and
standby may be discarding sessions inadvertently.

I'm not sure how ZFS can be involved in this trouble, though.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: PostgreSQL-12 replication failover, pg_rewind fails

2020-05-12 Thread Kyotaro Horiguchi
Hello.

At Tue, 12 May 2020 06:32:30 +, Mariya Rampurawala 
 wrote in 
> I am working on providing HA for replication, using automation scripts.
> My set up consists on two nodes, Master and Slave. When master fails, The 
> slave is promoted to master. But when I try to re-register the old master as 
> slave, the pg_rewind command fails. Details below.
...
>   1.  Rewind again:
>   2.  -bash-4.2$ /usr/pgsql-12/bin/pg_rewind -D /pg_mnt/pg-12/data 
> --source-server="host=10.209.57.17  port=5432 user=postgres dbname=postgres"
> 
> pg_rewind: servers diverged at WAL location 6/B9D8 on timeline 53
> 
> pg_rewind: error: could not open file 
> "/pg_mnt/pg-12/data/pg_wal/0035000600B9": No such file or 
> directory
> 
> pg_rewind: fatal: could not find previous WAL record at 6/B9D8
> 
> 
> I have tried this multiple times but always face the same error. Can someone 
> help me resolve this?

As the error message says, a required WAL file has been removed on the
old master.  This is normal behavior and is described in the
documentation.

https://www.postgresql.org/docs/12/app-pgrewind.html

> but if the target cluster ran for a long time after the divergence,
> the old WAL files might no longer be present. In that case, they can
> be manually copied from the WAL archive to the pg_wal directory, or
> fetched on startup by configuring primary_conninfo or restore_command.

So you seem to need to restore the required WAL files from archive or
the current master.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails

2020-05-12 Thread Kyotaro Horiguchi
At Tue, 12 May 2020 17:29:50 +0200, Laurenz Albe  
wrote in 
> On Tue, 2020-05-12 at 09:40 +, Mariya Rampurawala wrote:
> > > but if the target cluster ran for a long time after the divergence,
> > > the old WAL files might no longer be present. In that case, they can
> > > be manually copied from the WAL archive to the pg_wal directory, or
> > > fetched on startup by configuring primary_conninfo or restore_command.
> > 
> > I hit this issue every time I follow the aforementioned steps, manually as 
> > well as with scripts.
> > How long is "long time after divergence"? Is there a way I can make some
> > configuration changes so that I don’t hit this issue?
> > Is there anything I must change in my restore command?

As mentioned in the documentation, pg_rewind uses the WAL records
starting from the last checkpoint just before the divergence point.
The divergence point is shown as follows in your log messages.

> pg_rewind: servers diverged at WAL location 6/B9D8 on timeline 53

pg_rewind scans backward starting from that location to find a
checkpoint record, which is the oldest WAL record pg_rewind needs.  As
you see, it is not a matter of calculation. There's no other way than
actually restoring the WAL segment files and trying to find it.

> What you can do is to use a higher value for "wal_keep_segments".
> Then PostgreSQL will keep around that number of old WAL segments,
> which increases the chance for "pg_rewind" to succeed.

So this is one effective way to reduce the chance of losing the
required WAL (segment) files.
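
For example, on the primary (the value is only illustrative; with the
default 16MB segments, 1024 segments retain roughly 16GB of WAL):

ALTER SYSTEM SET wal_keep_segments = 1024;
SELECT pg_reload_conf();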

On PG12, an easy way to restore all the required WAL files would be to
restore the WAL file every time pg_rewind complains that one is
missing.

Or, you could use pg_waldump to find a checkpoint record.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Hot and PITR backups in same server

2020-05-19 Thread Kyotaro Horiguchi
At Mon, 18 May 2020 11:11:44 +0300, "Andrus"  wrote in 
> Hi!
> 
> >This is confused or at least confusing.
> >- "max_wal_size" of 5TB is clearly insane.

As Laurenz said, even if you set it to 5TB, WAL files older than those
needed by the last checkpoint do not remain. If you don't need a
hot standby, you can use pg_receivewal to save WAL files on the
"standby" server.  If you need the hot standby, WAL files should be
archived.

> >- I don't understand what you mean by "Start backup server for hot
> >- standby backups".
> >Do I get it right that you want to copy a streaming replication
> >standby server's data
> directory to perform PITR?
> 
> I want to copy only pg_wal directory contents from this.

If so, what you need seems to be pg_receivewal, not a full-fledged
server.

> After pg_basebackup has finished, copy of its data directory is saved
> for possible PITR.
> Its pg_wal contents will be replaced with current pg_wal directory
> contents.
> 
> After that recovery time will set it postgres.conf and separate server
> in 5433 is used to preform PITR using this data directory.
> 
> >That doesn't see like a good plan, because the standby
> >usually won't be much behind the primary server, and you can only
> >recover to a later
> >point in time.
> 
> I will get data from copy taken when hot backup server was created and
> replace its pg_wal directory contents from pg_wal in current backup
> server pg_wal directory.
> Since max pg wal size is big it should contain all WAL segments from
> time where base backup was taken.

As mentioned above, that assumption is wrong.  You need to archive WAL
files out of the pg_wal directory, or use pg_receivewal.

> >If you care to elaborate, perhaps the question can be answered.
> 
> Currently binary async hot standby backup server is working OK and
> replicates main sever database almost immediately.
> 
> However if important data is deleted in same day, it cannot recovered
> since hot standby server database has replicated same data as in
> master server.
> Master server disk space is limited. It cannot keep large number of
> wal segments.
> Standby server has lot of free disk space.

recovery_min_apply_delay might be useful for you.  With that setting,
the standby follows the primary with a delay of the specified minutes
or hours.
https://www.postgresql.org/docs/12/runtime-config-replication.html#GUC-RECOVERY-MIN-APPLY-DELAY
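
For example, on the standby (the value is only illustrative; in
version 12 this is an ordinary configuration parameter and can be
reloaded):

ALTER SYSTEM SET recovery_min_apply_delay = '1h';
SELECT pg_reload_conf();

The standby then keeps itself about one hour behind the primary, which
leaves a window to stop it before an accidental deletion is replayed.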

I haven't tried it myself, but the following steps would perhaps
work.

0. Data is found to be broken on the primary.

1. Stop the standby immediately and take a cold backup, including the
 pg_wal directory.

2. You may find the just-dropped data still present on the standby.

3. If you need to advance the standby, you can recover up to an
  arbitrary LSN within the last x minutes using
  recovery_target_lsn/time.  If you go too far, start again from the
  backup taken in step 1.
  
https://www.postgresql.org/docs/12/runtime-config-wal.html#RUNTIME-CONFIG-WAL-RECOVERY-TARGET


> I'm looking for a way to collect data for PITR recovery (WAL segments)
> in standby server.
> 
> I have initial base backup of data directory created using
> pg_basebackup.
> All WAL segments from master server after pg_basebackup should saved
> in backup server and should be used for PITR recovery when needed.
> 
> How to use hot standby server for this ?
> 
> Or should separate cluster in 5433 port created and pg_rewind or
> something other used for PITR.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Configuring more than one hot standby server

2020-05-19 Thread Kyotaro Horiguchi
At Tue, 19 May 2020 22:39:20 +0530, Santhosh Kumar  
wrote in 
> Hi Community,
>  I read about "hot stand by" server and as far as the document goes, it
> does not explicitly explain about, whether is it possible to configure more
> than on database as a Hot stand by server. If we can configure more than
> one hot stand by server, which database will take the role of master, in
> case the original master instance crashes? Does a leader election happen?

You can configure two or more standbys for one primary and configure
some or all of them as synchronous standbys.  However, PostgreSQL
doesn't choose the next primary automatically.

A replication cluster involving three or more servers can be
configured in a couple of ways.  You may configure them in a cascaded
replication chain, in a fanned-out configuration, or in a mixture of
the two.

If you made a cascaded replication set, you don't have a choice other
than the first standby in the chain.

   before: [primary] - [standby 1] - [standby 2] ...
   after : [old standby 1] - [standby 2] ...

If you made a fanned-out replication set, you can configure some or
all of the standbys to be synchronous standbys, which are guaranteed
to have the same data for all transactions committed on the primary.
If you configured priority-mode synchronous standbys, you can choose
the next primary among the first n active standbys.  If you didn't
configure synchronous standbys, you would need to check all the
standbys and choose the most "advanced" one; that is, the next primary
is the standby having the largest receive LSN.  You need to
reconstruct the replication set anyway.

   before: [primary] -+- [standby 1] LSN = x - 8
                      +- [standby 2] LSN = x - 2  <= the next primary
                      +- [standby 3] LSN = x - 6

   after : [old standby 2] -+- [standby 1]
                            +- [standby 3]

The same applies to the quorum-mode synchronous standby setting.
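
To find the most-advanced standby, you could run something like the
following on each standby and pick the one with the largest value
(assuming version 10 or later; pg_lsn values compare directly):

SELECT pg_last_wal_receive_lsn();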


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: How to recover from compressed wal archieve in windows

2020-05-20 Thread Kyotaro Horiguchi
At Wed, 20 May 2020 19:25:50 -0700, Adrian Klaver  
wrote in 
> On 5/20/20 6:27 PM, Michael Paquier wrote:
> > On Wed, May 20, 2020 at 11:36:09AM -0700, Adrian Klaver wrote:
> >> The next problem is that I'm pretty sure a WAL file with *.gz
> >> extension will
> >> not be able to be processed directly by the server. So you are going
> >> to have
> >> to uncompress it at some point before it gets restored.
> > The short answer to that question is no.  The backend does not
> > uncompress the segment file.  What happens is that the restore command
> > copies the file defined by %f to the location of %p where is gets
> > renamed to RECOVERYXLOG, and we expect the restore command to drop a
> > 16MB file in og_wal/.  There is a check on the size, which would fail
> > if the WAL segment is still compressed.  This logic is in
> > RestoreArchivedFile() in xlogarchive.c.
> 
> I figured that would be the case.
> 
> So how is this handled?:
> 
> wal_compression (boolean)
> 
> When this parameter is on, the PostgreSQL server compresses a full
> page image written to WAL when full_page_writes is on or during a base
> backup. A compressed page image will be decompressed during WAL
> replay. The default value is off. Only superusers can change this
> setting.

How does it differ from decompression by restore_command?

A WAL (segment) file is filled with multiple WAL records. The "full
page image", which that parameter causes to be compressed, is a part
of a WAL record. A WAL file is thus filled with possibly-compressed
WAL records and has the same size whether or not wal_compression is
on.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: How to recover from compressed wal archieve in windows

2020-05-20 Thread Kyotaro Horiguchi
At Thu, 21 May 2020 06:55:41 +0200, Paul Förster  
wrote in 
> Hi Adrian,
> 
> > On 21. May, 2020, at 01:03, Adrian Klaver  wrote:
> > 
> > Yeah, the issue would be getting a version of gunzip that works on Windows. 
> > In the past I have had luck with:
> > 
> > https://www.7-zip.org/
> > 
> > Not sure how well it works with redirects/pipes.
> 
> I'm no windoze guru, so I don't know about that, but you may have a look at:
> 
> http://gnuwin32.sourceforge.net/packages/gzip.htm

Or cygwin or mingw?

http://www.mingw.org/

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: canceling statement due to conflict with recovery after pg_basebackup

2020-06-03 Thread Kyotaro Horiguchi
At Wed, 3 Jun 2020 10:07:14 +0300, "Andrus"  wrote in 
> Hi!

Hi.

> Async binary replication hot standby was started after pg_basebackup.
> Running query in slave throws error 
> 
> ERROR:  canceling statement due to conflict with recovery
> 
> Why ?

As written in the messages.
> 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba DETAIL: User query
> might have needed to see row versions that must be removed.

The master removed some rows by vacuum, and that removal must be
reflected on your standby, but the standby cannot apply it because of
a transaction that started before the rows were vacuumed. More
technically, the transaction on the standby was using a snapshot with
older transaction ids than the vacuum cutoff transaction id on the
master. So the standby needed to cancel the statement or session in
order to continue replication.

> Query should return table and other sizes in decreasing order.
> How to improve it so that this error does not occur.

Turning on hot_standby_feedback would work.

https://www.postgresql.org/docs/12/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK
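
A minimal way to enable it on the standby (and reload):

ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();

Note that it lets the primary retain dead rows still needed by standby
queries, so it can cause some extra bloat on the primary.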

> Log:
> 
> 2020-06-03 09:40:52 EEST LOG: database system was interrupted; last
> known up at 2020-06-03 07:59:56 EEST
> 2020-06-03 09:41:10 EEST  LOG:  entering standby mode
> 2020-06-03 09:41:10 EEST  LOG:  redo starts at 2E2/28
> 2020-06-03 09:41:19 EEST LOG: consistent recovery state reached at
> 2E2/B5A56C8
> 2020-06-03 09:41:19 EEST LOG: database system is ready to accept read
> only connections
> 2020-06-03 09:41:19 EEST LOG: started streaming WAL from primary at
> 2E2/C00 on timeline 1
> 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba ERROR: canceling
> statement due to conflict with recovery
> 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba DETAIL: User query
> might have needed to see row versions that must be removed.
> 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba STATEMENT:  select
> company_name(n.nspname)::char(20) as company,
>  relname::char(25),
>pg_size_pretty(pg_total_relation_size(c.oid))::char(10) as totalsize,
> n.nspname::char(12),
>case
>   when c.relkind='i' then 'index'
>when c.relkind='t' then 'toast'
>   when c.relkind='r' then 'table'
>   when c.relkind='v' then 'view'
>   when c.relkind='c' then 'composite type'
>   when c.relkind='S' then 'sequence'
>else c.relkind::text
>  end ::char(14) as "type"
> from
>    pg_class c
>left join pg_namespace n on n.oid = c.relnamespace
>left join pg_tablespace t on t.oid = c.reltablespace
> where
>(pg_total_relation_size(c.oid)>>21)>0 and c.relkind!='t'
> order by
>pg_total_relation_size(c.oid) desc

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Conflict with recovery on PG version 11.6

2020-06-18 Thread Kyotaro Horiguchi
At Thu, 18 Jun 2020 23:29:49 +0300, Toomas Kristin  
wrote in 
> Hi,
> 
> > There can be other reasons:
> > 
> > - replicated ACCESS EXCLUSIVE locks that conflict with queries
> > - replicated ACCESS EXCLUSIVE locks that cause deadlocks
> > - buffer pins that are needed for replication but held by a query
> > - dropped tablespaces that hold temporary files on the standby
> 
> Thank you for ideas what to verify.
> 
> > I told you the remedies above, why don't you like them?
> 
> Basically I want to achieve situation where replication is not suspended  
> (lag is not more than 3 minutes) and statements on standby are not 
> terminated. Based on collected information I don’t see any connection between 
> vacuuming on master and termination of statements on standby. I can 
> temporarily disable vacuuming in order to be 100% sure this is the case. And 
> when I set max_standby_streaming_delay either -1 or as a very big number then 
> it helps avoid query termination but doesn’t help me about suspended 
> replication. All worked with same configuration on Postgres version 10.6, the 
> issue started after version upgrade.
> 
> This is the reason why I am very keen to find out real cause for the conflict.

FWIW in case you haven't tried yet, if you could find a DETAILS: line
following to the ERROR: canceling.." message in server log, it would
narrow the possibility.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: A query in Streaming Replication

2020-06-19 Thread Kyotaro Horiguchi
At Sat, 13 Jun 2020 23:12:25 +0530, Sreerama Manoj 
 wrote in 
> Hello,
> 
> Forgot to add the version details. I'm using postgres 10.7 .
> 
> On Sat, Jun 13, 2020, 20:26 Sreerama Manoj 
> wrote:
> 
> > Hello,
> >   I use streaming replication in async mode. When master gets down,
> > slave will be promoted using a trigger file. During this process ".partial"
> > file will be created and a WAL file with same ID will be created in a
> > different time line in slave. When master comes back as slave, it will be
> > synced to the current master. This is the normal procedure as far as I
> > understood. But in some cases, ".partial" file is not getting created and

It is not guaranteed that a crashed master can be used as a new
standby as-is, since there can be unsent WAL records on the old master
after the LSN where the new master promoted.  If the data files are
sound, pg_rewind will adjust the old master so it can run as a new standby.

The .partial file is created when the timeline diverges in the middle
of a WAL segment. It is useful when performing PITR to the end of the
same timeline, rather than going into the next timeline.  Offhand I
don't see how this is relevant to the reusability of the old master.

> > peer DB which comes back as slave is unable to sync when this happens..
> > Please suggest if this happens in any scenario and how to overcome this.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Questions about Logical Replication - Issue ???

2020-07-30 Thread Kyotaro Horiguchi
Hi,

At Thu, 30 Jul 2020 14:54:08 +0100, FOUTE K. Jaurès  
wrote in 
> Hi everyone,
> 
> Situation:
> 
>- A Master Database on the HQ
>- i make a dump of the master database to the Subdivision Server
>- I create à Publication like: CREATE PUBLICATION
>iNOV_MasterData_Table_Pub FOR TABLE M_Product; On the Master Database
>- On the Subdivision Server, I create a Subscription like: CREATE
>SUBSCRIPTION iNOV_MasterData_Table_XXX_Sub CONNECTION ''
>PUBLICATION  iNOV_MasterData_Table_Pub;
>- On the log, I have this error:
>   - 2020-07-30 14:32:59.366 WAT [8022] ERROR:  duplicate key value
>   violates unique constraint "m_product_pkey"
>   2020-07-30 14:32:59.366 WAT [8022] DETAIL:  Key
>   (m_product_id)=(1001426) already exists.
>   2020-07-30 14:32:59.366 WAT [8022] CONTEXT:  COPY m_product, line 1
>   2020-07-30 14:32:59.369 WAT [1536] LOG:  background worker "logical
>   replication worker" (PID 8022) exited with exit code 1
> 
> What can I do to solve this? Is it normal ? It
> BTW: When I create Subscription With the option  (copy_data = false), I am
> able to replicate the new record.

As you know, the initial table copy happens by default at subscription
creation (the COPY command in the above log lines was doing that). If
you are sure that the publisher table is in sync with the subscriber
one, using copy_data=false is safe and is the proper operation here.
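For example, once you have confirmed that the dumped table on the
subdivision server matches the master, something along these lines
should work (the connection string here is only a placeholder):

CREATE SUBSCRIPTION iNOV_MasterData_Table_XXX_Sub
  CONNECTION 'host=hq-master dbname=mydb user=repuser'
  PUBLICATION iNOV_MasterData_Table_Pub
  WITH (copy_data = false);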

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: 12.3 replicas falling over during WAL redo

2020-08-02 Thread Kyotaro Horiguchi
At Sat, 1 Aug 2020 09:58:05 -0700, Ben Chobot  wrote in 
> 
> 
> Alvaro Herrera wrote on 8/1/20 9:35 AM:
> > On 2020-Aug-01, Ben Chobot wrote:
> >
> >> We have a few hundred postgres servers in AWS EC2, all of which do
> >> streaming
> >> replication to at least two replicas. As we've transitioned our fleet
> >> to
> >> from 9.5 to 12.3, we've noticed an alarming increase in the frequency
> >> of a
> >> streaming replica dying during replay. Postgres will log something
> >> like:
> >>
> >> |2020-07-31T16:55:22.602488+00:00 hostA postgres[31875]: [19137-1]
> >> |db=,user=
> >> LOG: restartpoint starting: time 2020-07-31T16:55:24.637150+00:00
> >> hostA
> >> postgres[24076]: [15754-1] db=,user= FATAL: incorrect index offsets
> >> supplied
> >> 2020-07-31T16:55:24.637261+00:00 hostA postgres[24076]: [15754-2]
> >> db=,user=
> >> CONTEXT: WAL redo at BCC/CB7AF8B0 for Btree/VACUUM: lastBlockVacuumed
> >> 1720
> >> 2020-07-31T16:55:24.642877+00:00 hostA postgres[24074]: [8-1]
> >> db=,user= LOG:
> >> startup process (PID 24076) exited with exit code 1|
> > I've never seen this one.
> >
> > Can you find out what the index is being modified by those LSNs -- is
> > it
> > always the same index?  Can you have a look at nearby WAL records that
> > touch the same page of the same index in each case?
> >
> > One possibility is that the storage forgot a previous write.
> 
> I'd be happy to, if you tell me how. :)
> 
> We're using xfs for our postgres filesystem, on ubuntu bionic. Of
> course it's always possible there's something wrong in the filesystem
> or the EBS layer, but that is one thing we have not changed in the
> migration from 9.5 to 12.3.

All of the cited log lines seem to point to deleted btree page
items. One possibility I can guess at is that this can happen if the
pages were flushed out during a vacuum after the last checkpoint and
full-page writes didn't restore the page to the state before the
index-item deletion happened (that is, if full_page_writes were set to
off). (If that turns out to be the cause, I'm not sure why it didn't
happen on 9.5.)
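To rule that out quickly, the setting can be checked on the primary:

SHOW full_page_writes;   -- normally on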

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: 12.3 replicas falling over during WAL redo

2020-08-04 Thread Kyotaro Horiguchi
At Tue, 4 Aug 2020 09:53:36 -0400, Alvaro Herrera  
wrote in 
> On 2020-Aug-03, Alvaro Herrera wrote:
> 
> > >  lsn  | checksum | flags | lower | upper | special | pagesize |
> > > version | prune_xid
> > > --+--+---+---+---+-+--+-+---
> > >  A0A/99BA11F8 | -215 | 0 |   180 |  7240 |    8176 | 8192
> > > |   4 | 0
> > > 
> > > As I understand what we're looking at, this means the WAL stream was
> > > assuming this page was last touched by A0A/AB2C43D0, but the page itself
> > > thinks it was last touched by A0A/99BA11F8, which means at least one write
> > > to the page is missing?
> > 
> > Yeah, that's exactly what we're seeing.  Somehow an older page version
> > was resurrected.  Of course, this should never happen.
> 
> ... although, the block should have been in shared buffers, and it is
> there that the previous WAL record would have updated -- not necessarily
> flushed to disk.

Yeah. On the other hand, the WAL records shown upthread don't have an FPW.

> rmgr: Btree   len (rec/tot): 72/72, tx:   76393394, lsn:
> A0A/AB2C43D0, prev A0A/AB2C4378, desc: INSERT_LEAF off 41, blkref #0: rel
> 16605/16613/60529051 blk 6501

> rmgr: Btree   len (rec/tot): 72/72, tx:   76396065, lsn:
> A0A/AC4204A0, prev A0A/AC420450, desc: INSERT_LEAF off 48, blkref #0: rel
> 16605/16613/60529051 blk 6501

There must be a record for page 6501 conveying an FPW after the last
checkpoint.  If no such record is found, something went wrong in
deciding whether to attach the FPW.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: How bad is using queries with thousands of values for operators IN or ANY?

2020-09-01 Thread Kyotaro Horiguchi
At Mon, 31 Aug 2020 16:04:43 +0200, Pavel Stehule  
wrote in 
> po 31. 8. 2020 v 13:29 odesílatel Thomas Kellerer  napsal:
> 
> > Thorsten Schöning schrieb am 31.08.2020 um 12:37:
> > > So for what query size or number of IDs to compare in IN would you
> > > consider a different approach at all?
> >
> >
> > In my experience "hundreds" of IDs tend to be quite slow if used with an
> > IN clause.
> >
> > Rewriting the IN to a JOIN against a VALUES clause is very often faster:
> >
> > So instead of:
> >
> >   select *
> >   from t
> >   where id in (1,2,3,  ,500);
> >
> > using this:
> >
> >   select *
> >   from t
> > join (
> >values (1),(2),(3),...(500)
> > ) as x(id) on x.id = t.id
> >
> > produces more often than not a more efficient execution plan (assuming no
> > values are duplicated in the IN list)
> >
> > Obviously I don't know if such a re-write is even feasible though.
> >
> 
> yes - this query probably will have a slow start, but the execution will be
> fast. Unfortunately, there are not available statistics.

 FWIW, the attached is the dusted-off version of a part of a stalled
 development of mine, which unconditionally(!) creates on-the-fly
 statistics on VALUES list. It seems to work for certain cases,
 although the planning time increases significantly.
 
=$ CREATE TABLE t1 AS SELECT a, a * 2 AS b FROM generate_series(0, 9) a;
=$ CREATE INDEX ON t1 (a);
> perl q.pl(*) | psql

*: q.pl:
> print "explain analyze select b from t1 join (values ";
> foreach $i (0..1) {
>   print ", " if ($i > 0);
>   printf("(%d)", $i/10 + 1000);
> }
> print ") as v(v) on (v.v = t1.a);";


patched:

 Merge Join  (cost=824.25..1005.19 rows=10001 width=4) (actual 
time=13.513..24.285 rows=10001 loops=1)
   Merge Cond: (t1.a = "*VALUES*".column1)
   ->  Index Scan using t1_a_idx on t1  (cost=0.29..3050.29 rows=10 
width=8) (actual time=0.033..1.629 rows=2002 loops=1)
   ->  Sort  (cost=789.47..814.47 rows=10001 width=4) (actual 
time=12.557..14.546 rows=10001 loops=1)
 Sort Key: "*VALUES*".column1
 Sort Method: quicksort  Memory: 931kB
 ->  Values Scan on "*VALUES*"  (cost=0.00..125.01 rows=10001 width=4) 
(actual time=0.002..8.271 rows=10001 loops=1)
 Planning Time: 17.290 ms
 Execution Time: 26.344 ms
(9 rows)

master:
 Hash Join  (cost=250.03..2168.03 rows=10001 width=4) (actual 
time=14.482..77.205 rows=10001 loops=1)
   Hash Cond: (t1.a = "*VALUES*".column1)
   ->  Seq Scan on t1  (cost=0.00..1443.00 rows=10 width=8) (actual 
time=0.017..23.540 rows=10 loops=1)
   ->  Hash  (cost=125.01..125.01 rows=10001 width=4) (actual 
time=13.786..13.788 rows=10001 loops=1)
 Buckets: 16384  Batches: 1  Memory Usage: 480kB
     ->  Values Scan on "*VALUES*"  (cost=0.00..125.01 rows=10001 width=4) 
(actual time=0.002..8.503 rows=10001 loops=1)
 Planning Time: 12.365 ms
 Execution Time: 78.567 ms
(8 rows)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From c6680488178bb7eeae66c92f46d475ac69102481 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 6 Jul 2017 14:46:49 +0900
Subject: [PATCH] Temporary statistics on VALUES list

When using "IN (VALUES" instead of "IN (list.." to obtain better
plans, sometimes the final estimate goes bad because of the lack of
that for VALUES. This patch allowes VALUES to have statistics and
improves the estimate in the case.
---
 src/backend/commands/analyze.c   | 289 +--
 src/backend/utils/adt/selfuncs.c |  59 ++-
 src/include/commands/vacuum.h|   2 +
 src/include/nodes/parsenodes.h   |   2 +
 4 files changed, 260 insertions(+), 92 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8af12b5c6b..aba06415c8 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -102,6 +102,9 @@ static int	compare_rows(const void *a, const void *b);
 static int	acquire_inherited_sample_rows(Relation onerel, int elevel,
 		  HeapTuple *rows, int targrows,
 		  double *totalrows, double *totaldeadrows);
+static HeapTuple construct_stats_tuple(VacAttrStats *stats,
+	   Oid attrrelid, bool inh,
+	   Relation statsrel, HeapTuple oldtup);
 static void update_attstats(Oid relid, bool inh,
 			int natts, VacAttrStats **vacattrstats);
 static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
@@ -278,6 +281,86 @@ analyze_rel(Oid relid, RangeVar *relation,
 	pgstat_progress_end_command();
 }
 
+/*
+ * analyze_values() -- analyze one attribute in a values list then put the
+ * result into va

Re: How bad is using queries with thousands of values for operators IN or ANY?

2020-09-01 Thread Kyotaro Horiguchi
At Tue, 1 Sep 2020 18:46:01 +0200, Pavel Stehule  
wrote in 
> nice :)

thanks ;p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Date Format 9999-12-31-00.00.00.000000

2020-10-15 Thread Kyotaro Horiguchi
At Thu, 15 Oct 2020 17:59:39 -0700, Adrian Klaver  
wrote in 
> On 10/15/20 1:58 PM, Dirk Krautschick wrote:
> > Hi,
> > because of a migration from DB2 we have a lot of timestamps like
> > 9999-12-31-00.00.00.000000
> 
> I'm assuming these got stored in a varchar field?

It seems to be an (old-style?) ISO-like string representation of a
date-time, and looks like DB2's default output format for a timestamp.

The default parser of PostgreSQL doesn't handle that format: it doesn't
recognize '-' as a separator between date and time, nor a dot as a
time-field separator.

> > What would be the best way to handle this in Postgres also related
> > to overhead and performance (index usage?).
> > Or is
> > TO_TIMESTAMP('9999-12-31-00.00.00.000000', 'YYYY-MM-DD-HH24.MI.SS.US')
> > the only way? And isn't it possible to define this like NLS parameters
> > in Oracle
> > system wide?

I'm not sure exactly what you're going to do, but a timestamp is
generally more efficient than a string in regard to both space and
speed, and I think the above conversion is the only way to do it.
PostgreSQL doesn't have a feature like the NLS parameters.

If the database won't need to handle timestamps in that format in the
future, converting that column into a timestamp type (perhaps it is
currently text or varchar) works:

ALTER TABLE <table> ALTER COLUMN <column> TYPE timestamp
  USING TO_TIMESTAMP(<column>, '<format>');
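As a concrete sketch of that conversion (the table and column names
here are made up; the format string matches the DB2-style value above):

ALTER TABLE mytab
  ALTER COLUMN ts TYPE timestamp
  USING to_timestamp(ts, 'YYYY-MM-DD-HH24.MI.SS.US')::timestamp;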

Otherwise the format needs to be translated on the fly. I'm not sure
which is more efficient in that case.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Problem with ssl and psql in Postgresql 13

2020-12-17 Thread Kyotaro Horiguchi
Hi.

At Thu, 17 Dec 2020 07:21:08 +, Gustavsson Mikael 
 wrote in 
> But not from psql 13:
> $ /usr/pgsql-13/bin/psql -d postgres -Ukalle -hserver -W
> Password:
> psql: error: FATAL:  no pg_hba.conf entry for host "nnn.nn.n.nnn", user 
> "kalle", database "postgres", SSL off
> FATAL:  no pg_hba.conf entry for host "nnn.nn.n.nnn", user "kalle", database 
> "postgres", SSL off

It looks like the option "ssl" (in postgresql.conf) is turned off on
the v13 server.  You will find lines like the following in the server
log in that case.

LOG:  hostssl record cannot match because SSL is disabled
HINT:  Set ssl = on in postgresql.conf.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Problem with ssl and psql in Postgresql 13

2020-12-23 Thread Kyotaro Horiguchi
At Wed, 23 Dec 2020 17:34:05 -0500, Tom Lane  wrote in 
> Oh ... wait a second.  I've just managed to partially reproduce this
> report.  Assume that we have a working Kerberos environment on both

If I have run kinit and pg_hba.conf doesn't have a hostgssenc line but
has a hostssl line, I get the following error from psql.

$ psql -d postgres --set=sslmode=require -Uhoriguti -hmyhost.mydomain
psql: error: FATAL:  no pg_hba.conf entry for host "192.168.56.101", user 
"horiguti", database "postgres", SSL off
FATAL:  no pg_hba.conf entry for host "192.168.56.101", user "horiguti", 
database "postgres", SSL off

The state machine enables gssenc if a valid credential is available,
regardless of whether a valid hostgssenc entry exists or not.

I remember being a bit annoyed by this behavior, but I didn't pursue
it since I assumed that SSL and GSS would not be used together in one
environment.

> ends.  Then libpq will establish a connection with GSS encryption
> and continue on with the usual PG authentication exchange.  If that
> part fails, then this logic (fe_connect.c around line 3350):
> 
> /*
>  * If gssencmode is "prefer" and we're using GSSAPI, retry
>  * without it.
>  */
> if (conn->gssenc && conn->gssencmode[0] == 'p')
> {
> /* postmaster expects us to drop the connection */
> conn->try_gss = false;
> pqDropConnection(conn, true);
> conn->status = CONNECTION_NEEDED;
> goto keep_going;
> }
>
> drops the connection and tries again without GSS.  But *we already
> cleared allow_ssl_try*, in pqsecure_open_gss, so the next try will
> not bother with SSL either.
> So this directly explains the original report: if the only applicable
> pg_hba.conf entry is "hostssl", right here is where libpq will see
> that error, and it will go around and fail again because the next
> try uses neither GSS nor SSL.
>
> However, in the tests Mikael ran after backing that pg_hba.conf entry
> off to just "host", pg_hba wouldn't have caused an authentication-stage
> failure, so it's not so clear why we'd have looped back at this step.

In the hostssl case, there are two errors emitted on the server side:

[24494] FATAL:  no pg_hba.conf entry for host "192.168.56.101", user 
"horiguti", database "postgres", SSL off
[24495] FATAL:  no pg_hba.conf entry for host "192.168.56.101", user 
"horiguti", database "postgres", SSL off

However, the first line is bogus. It should be like the following.

FATAL:  no pg_hba.conf entry for host "192.168.56.101", user "horiguti", 
database "postgres", GSSENC on
FATAL:  no pg_hba.conf entry for host "192.168.56.101", user "horiguti", 
database "postgres", SSL off

The attached first patch does that.
On the client side, psql picks up only the second error to display.

If I change the hostssl line to just host, the host line matches the
GSSENC connection and authentication succeeds without an error.

> We'd need to explain the later cases to have a full theory.  It's
> plenty plausible that something else caused an auth-stage failure,
> but if so, why would the second GSS-free try work?

Does my example above explain it?

As a result, psql loses the cause of the first failure and shows the
bogus message. On the other hand, the server doesn't know that the
second try is the result of the failure of the first try. If we want to
show meaningful messages to users, psql should show both messages, or
the protocol needs to be modified to pass around information about the
failure of the previous try.

> In any case, that allow_ssl_try reset in pqsecure_open_gss is definitely
> even more broken than I thought before.  I think we need to rejigger the
> state machine so that it doesn't do that at all, but instead skips SSL
> establishment when GSS is already active via an explicit test, rather
> than munging the state variables.

It could make things simpler, though I'm not sure. In any case, the
states are needed since PQconnectPoll (the state machine) needs to exit
once for negotiation, and each of the two paths needs a re-entry point.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 3d80930968..204630e383 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -511,6 +511,26 @@ ClientAuthentication(Port *port)
 }
 else
 {
+#ifdef ENABLE_GS

Re: Problem with ssl and psql in Postgresql 13

2020-12-24 Thread Kyotaro Horiguchi
At Thu, 24 Dec 2020 11:54:32 -0500, Tom Lane  wrote in 
> I wrote:
> > Kyotaro Horiguchi  writes:
> >> The attached the first patch does that.
> 
> > +1, it seems like a clear oversight that the GSSENC patches didn't adjust
> > these messages.  The reason SSL state is mentioned is that it's relevant
> > to which pg_hba entry gets chosen; and once we invented "hostgssenc"
> > entries, GSSENC state is also relevant.
> 
> Thinking a little more about that: there are not four distinct states,
> because GSS and SSL can't both be enabled (ProcessStartupPacket enforces
> that).  So I propose that instead of adding a new field, we make the
> existing field say one of three things: "GSS encryption", "SSL
> encryption", or "no encryption".  As per attached.  In the back branches,
> it might be best to spell these as "GSS encryption", "SSL on", and "SSL
> off", just to minimize the cosmetic change.

Looks good to me.

I tried the same case where

- did kinit
- pg_hba has hostssl line only

I saw the following lines in the server log, which seem good.

FATAL:  no pg_hba.conf entry for host "192.168.56.101", user 
"horig...@mydomain.com", database "postgres", GSS encryption
FATAL:  no pg_hba.conf entry for host "192.168.56.101", user 
"horig...@mydomain.com", database "postgres", no encryption

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Right way to restore logical replication

2021-02-08 Thread Kyotaro Horiguchi
At Mon, 08 Feb 2021 22:42:21 +0700, Игорь Выскорко  
wrote in 
> Hi, community!
> Unfortunately can't find answer in docs and google. Hope only for you)
> [local]:5433 postgres@postgres=# drop publication pub ;
> DROP PUBLICATION
> Time: 3,793 ms
> 
> [local]:5433 postgres@postgres=# insert into tbl(d) values ('test2');
> INSERT 0 1
> Time: 9,002 ms
> 
> [local]:5433 postgres@postgres=# create publication pub for table tbl;
> CREATE PUBLICATION
> Time: 6,646 ms
> 
> result: nothing changed, same errors appears again and again. I couldn't find
> how to restore replication without drop&create subscription again.

If you recreated the publication before the insert, replication would
continue.

> Questions here:
> 1. what is going under the hood here - why walsender thinks that "publication
> "pub" does not exist" when it actually exists?

The answer is "because the publication did not exist at the time of
the INSERT".  Thus the insert cannot be replicated using the new
publication.

This is because logical replication looks up publications using the
same snapshot as the WAL record being sent. Although this is the
designed behavior, I'm not sure it is intended also for pg_publication.

> 2. what is the right way to restore replication in my example?

The most conservative way is just to drop the subscription, then delete
all rows from the subscriber table, then recreate the
subscription. This allows the newly created publication to work.

Alternatively, you can drop the subscription, then manually fix the
subscriber table to sync with the publisher table, then create a new
subscription using WITH (copy_data = false).
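A sketch of that second approach, assuming the subscription is named
"sub" and reusing the table and publication names from your example
(the connection string is illustrative):

-- on the subscriber
DROP SUBSCRIPTION sub;
-- manually bring the subscriber's tbl in sync with the publisher's tbl here
CREATE SUBSCRIPTION sub
  CONNECTION 'host=localhost port=5433 dbname=postgres'
  PUBLICATION pub
  WITH (copy_data = false);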

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Postgres Wal Full

2022-03-03 Thread Kyotaro Horiguchi
At Thu, 3 Mar 2022 07:46:45 +, pgdba pgdba  
wrote in 
> I have a problem from pg_wal. I am using postgresql version 11 and taking 
> backup and writing archive_command in postgresql.conf but did not archive wal 
> and my disk ise full from pg_wal. I research why is my wal is full and dont 
> found anything could  you help me please ?

The immediate cause of the WAL retention is the archive failure.
While archiving is enabled, PostgreSQL won't remove a WAL file until
it has been successfully archived.

You will see lines like the following in the server log if an archive
failure happened.

> 2022-03-03 17:15:21.314 JST LOG:  archive command failed with exit code 1

If the archive command emitted error messages, they are captured in
the server log as well.  The following error is emitted by the
archive_command "cp %p /archive/%f".

> cp: cannot create regular file '/archive/000000010000000000000001': No such 
> file or directory
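You can also check the archiver's status from SQL; if archive_command
keeps failing, failed_count keeps growing and last_failed_wal shows the
file it is stuck on:

SELECT archived_count, last_archived_wal,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;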

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: PSQL bug?

2022-03-17 Thread Kyotaro Horiguchi
At Thu, 17 Mar 2022 11:38:30 +0100, Dominique Devienne  
wrote in 
> Made a typo, using } instead of ) to test a weird table name, and got
> disconnected. Normal? --DD
> 
> ddevienne=> create table "t |" ( id int};
> ddevienne(> );
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.

The last line "Attempting reset: Succeeded." means that the connection
is once disconnected before the second input line then reconnected by
the input.

Specifically I can reproduce the same result by the following steps.

> ddevienne=> create table "t |" ( id int};
$ pg_ctl restart
> ddevienne(> );
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_create_logical_replication_slot in DB1 is blocked by a session in DB2

2022-05-26 Thread Kyotaro Horiguchi
At Wed, 25 May 2022 11:01:43 -0400, Fred Habash  wrote in 
> I'm running this command while connected to pg cluster DB1:
> 
> SELECT * FROM pg_create_logical_replication_slot('test_slot_99',
> 'test_decoding');
> 
> When I examine pg_locks, I see the session is waiting on virtualxid and
> blocked and blocking sessions are on two different DBs.
> 
> After doing some research, it looks like locks across DB can happen in
> postgres if the session queries rely on 'shared tables'. Not sure if this
> applies here.
> 
> How can this be explained?

The "blocked_pid" you showed is of PID=14305 but the reportedly
"blocked" session is of PID=13405. So the 8602 doesn't seem to be
involved the "trouble".  You might need to reinspect the situation.


> This is the session issuing the create slot command
> 
> datid|datname  |pid  |leader_pid|usesysid|usename
> |application_name|client_addr |client_hostname|client_port|backend_start
>   |xact_start |query_start|state_change
>  |wait_event_type|wait_event|state |backend_xid|backend_xmin|query
> 
> 
> 
> |backend_type  |
> -|-|-|--||-|||---|---|---|---|---|---|---|--|--|---||---|--|
> 16408|db1  |13405|  |   16394|test99   |
> |xx.xxx.xxx.x|   |  53398|2022-05-25 09:12:41|2022-05-25
> 09:12:42|2022-05-25 09:12:42|2022-05-25 09:12:42|Lock
>  |virtualxid|active|   |171577399   |BEGIN;declare
> "SQL_CUR0x14680c0bace0" cursor with hold for SELECT lsn FROM
> pg_create_logical_replication_slot('qitx6iolfhy5zfkl_00016408_66eb6ba3_1fe1_4ccd_95ed_fd3d2d5d4ad8',
> 'test_decoding');fetch 1 in "SQL_CUR0x14680c0bace0"|client backend|
> 
> 
> Session above is blocked by pid 8602
> 
> blocked_pid|blocked_user|blocking_pid|blocking_user  |blocked_statement
> 
> 
> 
> |current_statement_in_blocking_process
> 
> 
> |
> ---|||---|---|---|
>   14305|pq_devops   |8602|service_con|BEGIN;declare
> "SQL_CUR0x1464680d6a60" cursor with hold for SELECT lsn FROM
> pg_create_logical_replication_slot('4iipu5a2hnuyfp3u_00016408_036cac77_3854_4320_b329_e7209b4cccf9',
> 'test_decoding');fetch 1 in "SQL_CUR0x1464680d6a60"|¶  SELECT **
>   |
> 
> 
> The blocked and blocking sessions are on two different DBs
> 
> datid|datname  |pid  |leader_pid|usesysid|usename|application_name
> |cl
> -----|-----|-|--||---|--|--
> 16408|db1  |13405|  |   16394|test99 |
> |10
> 16407|db2  | 8602|  |29429933|service_con_9  |PostgreSQL JDBC
> Driver|10
> 
> Thank you

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: SQL state: 42601. Execption handling.

2022-06-07 Thread Kyotaro Horiguchi
At Tue, 07 Jun 2022 10:36:52 -0400, Tom Lane  wrote in 
> =?UTF-8?B?0KY=?=  writes:
> > -- 2 -- 
> > do  $$ 
> > begin  
> >select  p.result  from  dbo.func() p  into  v_result; 
> > exception   when  SQLSTATE  '42601'   then
> >  raise   '42601' ;
> >when   others   then  
> >  raise   'others' ; 
> > end ;
> > $$
> >  
> > --Exception is not handled. This is the question.
> > ERROR:   "v_result"   is   not  a known variable
> > LINE  3 :select  p.result  from  dbo.func() p  into  v_result;
> 
> This error is thrown by the plpgsql parser, so you can't trap it
> with "exception", any more than you could trap other syntax errors
> detected by the parser.  (As an extreme example, you could certainly
> not trap it if you misspelled "exception".)

FWIW, you can see the difference in the following way.

=# CREATE OR REPLACE FUNCTION f1() RETURNS void AS  $$ 
declare
v_result  int; 
begin  
   select  p.result  from  dbo.func(d) p  into  v_result; 
end ;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
(Succeeds. That is, plpgsql parser doesn't catch it.)
=# select f1();
ERROR:  column "d" does not exist
(Caught by SQL parser executed at runtime)

=# CREATE OR REPLACE FUNCTION f2() RETURNS void AS  $$ 
begin  
   select  p.result  from  dbo.func() p  into  v_result; 
end ;
$$ LANGUAGE plpgsql;
ERROR:  "v_result" is not a known variable
LINE 3:select  p.result  from  dbo.func() p  into  v_result; 
(Fails, as plpgsql parser caught it.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: FATAL: could not receive timeline history file from the primary server: ERROR: could not open file "pg_wal/0000000x.history": No such file or directory

2022-06-08 Thread Kyotaro Horiguchi
At Wed, 8 Jun 2022 16:43:55 +, pgdba pgdba  
wrote in 
> hello everyone,
> 
> I want to install a two-node streaming replication with postgresql version 
> 11, but I'm getting this error when starting the service of my slave node. 
> Can you support me? I couldn't find the solution.
> 
> FATAL:  could not receive timeline history file from the primary server: 
> ERROR:  could not open file "pg_wal/00000002.history": No such file or 
> directory

It seems that the standby is on TLI=1 and the primary is on 2 or
greater (call it n).  In that case, at replication start, the standby
requests from the primary every history file (0000000n.history) from
TLI=2 to TLI=n that the standby doesn't have.

Thus, the message means 00000002.history has somehow been removed from
pg_wal on the primary, but I'm not sure how that happened.

If you did remove some history files on the primary, do not remove the
ones for the TLIs between the standby's and the primary's.
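To see which history files the primary still has, you can list pg_wal
from SQL (run on the primary, as a superuser or a member of pg_monitor):

SELECT name FROM pg_ls_waldir() WHERE name LIKE '%.history' ORDER BY name;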

> slave node has recovery.conf file and include primary_con_info informations.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: message log merge (streaming replication)

2022-06-09 Thread Kyotaro Horiguchi
At Thu, 9 Jun 2022 16:26:24 +0900, Ian Lawrence Barwick  
wrote in 
> 2022年6月9日(木) 14:32 Peter Adlersburg :
> >
> > Dear fellow DBA's,
> >
> >
> > While troubleshooting one of our production replication clusters (phys. 
> > streaming replication using the patroni framework)
> > I stumbled over a - at least for me - strange phenomenon in the postgres 
> > logs of the two cluster members:
> >
> > *** node-01 ***
> >
> > [postgres@db-node-01 main]$ grep 'LOG:  database' postgresql-2022-06-05.log
> > time=2022-06-05 18:25:26 CEST, pid=1720 LOG:  database system is shut down
> > time=2022-06-05 18:25:29 CEST, pid=3252374 LOG:  database system was shut 
> > down at 2022-06-05 18:25:23 CEST
> > time=2022-06-05 18:25:31 CEST, pid=3252371 LOG:  database system is ready 
> > to accept read only connections
> > time=2022-06-05 18:29:11 CEST, pid=3252371 LOG:  database system is ready 
> > to accept connections
> > time=2022-06-05 18:32:01 CEST, pid=1816 LOG:  database system was 
> > interrupted while in recovery at log time 2022-06-05 18:29:11 CEST
> > time=2022-06-05 18:32:03 CEST, pid=1813 LOG:  database system is ready to 
> > accept read only connections
> > time=2022-06-05 19:00:26 CEST, pid=1813 LOG:  database system is ready to 
> > accept connections
> >
> > *** node-02 ***
> >
> > [postgres@db-node-02 main]$ grep 'LOG:  database' postgresql-2022-06-05.log
> > time=2022-06-05 18:25:26 CEST, pid=1720 LOG:  database system is shut down
> > time=2022-06-05 18:25:29 CEST, pid=3252374 LOG:  database system was shut 
> > down at 2022-06-05 18:25:23 CEST
> > time=2022-06-05 18:25:31 CEST, pid=3252371 LOG:  database system is ready 
> > to accept read only connections
> > time=2022-06-05 18:29:11 CEST, pid=3252371 LOG:  database system is ready 
> > to accept connections
> > time=2022-06-05 18:32:01 CEST, pid=1816 LOG:  database system was 
> > interrupted while in recovery at log time 2022-06-05 18:29:11 CEST
> > time=2022-06-05 18:32:03 CEST, pid=1813 LOG:  database system is ready to 
> > accept read only connections
> > time=2022-06-05 19:00:26 CEST, pid=1813 LOG:  database system is ready to 
> > accept connections
> >
> > The output is by no means complete - I only kept the duplicate entries.
> >
> > My question:
> >
> > How is it possible that the error logs are 'merged' across the two database 
> > nodes?
> 
> That's impossible to determine on the basis of the available information.
> 
> How is logging set up? What kind of environments are the nodes running in?
> Is it possible they are able to write to a shared disk of some sort?
> 
> > Are the message/error-logs also replicated?
> 
> No.
> 
> > Is this the intended behaviour?
> 
> It's not typical behaviour, but I suppose it's conceivable someone designed
> such a setup for some reason.

If node-02 is a replica of node-01 or vice versa, and the log file is
in $PGDATA, it's possible that pg_basebackup (or just cp) copied the
server log file to the replica.  In that case, the first half of the
log file for the day the backup was taken is carried over from the
primary and the last half differs.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Any way to understand state after data corruption failures during startup.

2022-06-16 Thread Kyotaro Horiguchi
At Thu, 16 Jun 2022 19:46:25 -0700, Harinath Kanchu  wrote 
in 
> 
> Hello,
> 
> If postgres fail when it is starting as standby due to WAL file corruption, 
> can we get the status through any exit code or by parsing pg_control file ?
> 
> I can see the  “XX001” code is emitted in logs but parsing logs doesn’t seem 
> to be a good option. Please suggest.

I'm not sure exactly what you want to achieve here, but there is
negative evidence for the reliability of detecting a WAL corruption
failure without searching the server logs.
[64923:startup] [XX000] 2022-06-17 13:30:52.493 JST PANIC:  incorrect resource 
manager data checksum in record at 0/360

At least in this case, no trace of a server stop due to WAL
corruption is left behind anywhere other than the server log.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: help for pg_wal issue

2022-06-28 Thread Kyotaro Horiguchi
At Mon, 27 Jun 2022 12:28:18 +0200, Laurenz Albe  
wrote in 
> You forgot to tell us how exactly you are performing that backup.

Yeah, but Google told me that Veritas may complain with that
message. Whatever the tool is, it seems it could not continue after it
found that a file had been deleted before it was copied. That means the
server is live.  If you take a copy of files from a live server, the
backup tool must tolerate files being removed or added while the copy
is in progress.

If you are taking a base backup (aka online backup), you need to take
several subsidiary steps.

https://www.postgresql.org/docs/devel/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP

If you still see the point, feel free to ask further here but with
showing exactly how you are peforming that backup:p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: help for pg_wal issue

2022-06-28 Thread Kyotaro Horiguchi
At Tue, 28 Jun 2022 16:28:31 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Mon, 27 Jun 2022 12:28:18 +0200, Laurenz Albe  
> wrote in 
> > You forgot to tell us how exactly you are performing that backup.
> 
> Yeah, but Google told me that Veritas may complain with that
> message. Anywhat the tools is, it seems like that the tool could not
> continue after it found that a file had been deleted before copying
> it. That means the server is live.  If you take a copy of files from a
> live server, the backup tool should allow some files are removed or
> added while copying.
> 
> If you are taking a base backup (aka online backup), you need to take
> several subsidiary steps.
> 
> https://www.postgresql.org/docs/devel/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP
> 

- If you still see the point, feel free to ask further here but with

Mmm.. My fingers are fat and eyes are slipperly.

+ If you don't still see the point, feel free to ask further here but with

> showing exactly how you are peforming that backup:p
> 
> regards.
> 
> -- 
> Kyotaro Horiguchi
> NTT Open Source Software Center

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: General Inquiry

2022-07-05 Thread Kyotaro Horiguchi
At Tue, 5 Jul 2022 08:49:40 +, "Cloete, F. (Francois)" 
 wrote in 
> Good morning,
> Can you please confirm if I need to contact the postgresql community fro some 
> assistance can I use this e-mail address or another e-mail ?

pgsql-general is the right place for that. Welcome.

> We keep on seeing the below entries in our postgresql.log file getting 
> written what seems to be every millisecond.

> UTF [32640]: [1-1] user=[unknown],db=[unknown],app=[unknown],client=[local] L

Unfortunately the screen capture is missing the most significant
part. But I guess that they are "LOG: connection received".

If this is correct, log_connections is set to on on your server, which
makes the server record every incoming connection like this.  The
frequent log lines mean that someone on the local host is frequently
making (or trying to make) connections to the server.

If these are unexpected incoming connections, one possibility is that
something like a health check or keep-alive has a wrong
interval setting.
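If you just want to confirm that this setting is what produces the
lines, checking it is a one-liner; turning it off (a judgment call,
since it also hides legitimate connection history) only needs a reload:

SHOW log_connections;
-- if you decide to silence it:
-- ALTER SYSTEM SET log_connections = off;
-- SELECT pg_reload_conf();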

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Unable to start replica after failover

2022-08-04 Thread Kyotaro Horiguchi
At Fri, 29 Jul 2022 15:01:44 +, "Lahnov, Igor"  
wrote in 
> * "could not find previous WAL record at E6F/C2436F50: invalid 
> resource manager ID 139 at E6F/C2436F50"; or
> * "could not find previous WAL record at 54E/FB348118: unexpected 
> pageaddr 54E/7B34A000 in log segment 0005054E00FB, offset 
> 3448832".
..
> To avoid the problem, we decided to stop using restore_command. Could you 
> please let us know if there is a better solution to the problem we've 
> described?

Perhaps almost none of us can see what is happening there, since you
didn't give us sufficient information on the configuration and exact steps.

But roughly it looks like shuffling/mixing of WAL files among several
systems (or WAL archives) with different histories.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Unable to start replica after failover

2022-08-22 Thread Kyotaro Horiguchi
> What additional information is needed?

Usually the server logs and the output of pg_rewind from around the
time of the trouble are needed as a first step.

> Next, pg_rewind returns errors while reading the log from the backup
> back, looking for the last checkpoint, which is quite reasonable
> because, once a new leader starts, the point of divergence normally
> ends up in the next timeline and the previous timeline's backup log
> does not have a block with the LSN of the divergence.

That description makes pg_rewind sound broken.

pg_rewind reads the timeline history files from both servers to find
the last timeline up to which the two servers share the same history,
then determines the divergence point as the latest LSN the two servers
are known to share. Then, for the pages on the former leader that were
modified in the *previous* timeline between that common checkpoint and
its last (shutdown) checkpoint, it overwrites them with the data of the
same pages *on the new leader*. No page data from the older timeline is
needed. If nothing goes wrong, pg_rewind is not expected to face a
situation like:

> could not find previous WAL record at E6F/C2436F50: invalid resource manager 
> ID 139 at E6F/C2436F50
> could not find previous WAL record at 54E/FB348118: unexpected pageaddr 
> 54E/7B34A000 in log segment 0005054E00FB, 

Which means the WAL files are somehow broken.

> When pg_rewind is run, it also uses the log from the backup (the
> lagging log from the new leader) instead of the partial log with
> which the former leader has already been started.

I don't see why the former leader wouldn't have access to the
partial log (the latest WAL file, I suppose). It is essential for
pg_rewind to work (since it exists nowhere else) and it must be in the
pg_wal directory unless someone removed it.

Thus, I think we need the exact steps, as far as PostgreSQL is
concerned, that you and your system took after the failover happened.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: recovery_command has precedence over phisical slots?

2022-08-23 Thread Kyotaro Horiguchi
At Fri, 19 Aug 2022 18:37:53 +0200, Laurenz Albe  
wrote in 
> On Fri, 2022-08-19 at 16:54 +0200, Giovanni Biscontini wrote:
> > Hello everyone, 
> > I'm experiencing a behaviour I don't really understand if is a 
> > misconfiguration or a wanted behaviour:
> > 1) I set up a primary server (a.k.a. db1) with and archive_command to a 
> > storage
> > 2) I set up a replica (a.k.a. db2) that created a slot named as slot_2 and 
> > that has the recovery_command set to read archived wal on the storage.
> > If I shutdown replica db2 during a pgbench I see the safe_wal_size queried 
> > from pg_replication_slots on the primary decrease to a certain amount but 
> > still in the max_slot_wal_kepp_size window: even
> > if I restart the replica db2 before the slot_state changes to unreserved or 
> > lost I see that the replica gets needed wals from the storage using 
> > recovery_command but doesn't use slot on primary.
> > Only if I comment the recovery command on the .conf of the replica then it 
> > uses slot.
> > If this is a wanted behaviour I can't understand the need of slots on 
> > primary.
> 
> This is normal behavior and is no problem.
> 
> After the standby has caught up using "restore_command", it will connection to
> the primary as defined in "primary_conninfo" and stream WAL from there.

The reason db2 ran recovery beyond the slot LSN is that db2's
restore_command (I guess) points to db1's archive.  If db2 had its own
archive directory or no archive (that is, an empty restore_command),
archive recovery would stop at (approximately) the slot LSN and
replication would start from there (from the beginning of the segment,
to be exact).
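If what you want is for db2 to rely on the slot instead of replaying
from db1's archive, one option is to clear restore_command on the
standby (a sketch; before version 15 the change needs a restart to take
effect):

ALTER SYSTEM SET restore_command = '';
-- then restart the standby (a reload is enough on 15 and later)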

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Unable to archive logs in standby server

2022-09-04 Thread Kyotaro Horiguchi
At Tue, 30 Aug 2022 05:22:56 +, Meera Nair  wrote in 
> Hi Guillaume/team,
> 
> I set archive_mode = always in master and standby.
> Archival to standby WAL directory completed when
> 
>   *   standby server was restarted
>   *   pg_stop_backup was executed in master
> 
> But archival hangs when pg_stop_backup is executed in standby.
> Could someone help to get this working?

https://www.postgresql.org/docs/12/continuous-archiving.html

> In the same connection as before, issue the command:
> 
> SELECT * FROM pg_stop_backup(false, true);
> 
> This terminates backup mode. On a primary, it also performs an
> automatic switch to the next WAL segment. On a standby, it is not
> possible to automatically switch WAL segments, so you may wish to run
> pg_switch_wal on the primary to perform a manual switch. The reason
> for the switch is to arrange for the last WAL segment file written
> during the backup interval to be ready to archive.
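That is, after running pg_stop_backup() on the standby, the last
segment may not be archived until a segment switch happens; a minimal
way to force one, run on the primary:

SELECT pg_switch_wal();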

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Streaming wal from primiry terminating

2022-09-28 Thread Kyotaro Horiguchi
At Wed, 28 Sep 2022 08:50:12 +, "Lahnov, Igor"  
wrote in 
> Hi,
> After failover all stand by nodes could not start streaming wal recovery.
> Streaming recovery start from 1473/A500, but standby start at 
> 1473/A5FFEE08, this seems to be the problem.

It's not a problem at all. It is quite normal for a standby to start
streaming from the beginning of a WAL segment.

> What can we do in this case to restore?
> Is it possible to shift wal streaming recovery point on primary?
> Can checkpoint on primary help in this situation?


> 2022-09-26 14:08:23.672  [3747868] LOG:  started streaming WAL from 
> primary at 1473/A500 on timeline 18
> 2022-09-26 14:08:24.363  [3747796] LOG:  invalid record length at 
> 1473/A5FFEE08: wanted 24, got 0
> 2022-09-26 14:08:24.366  [3747868] FATAL:  terminating walreceiver 
> process due to administrator command

This seems to mean someone emptied primary_conninfo.

> 2022-09-26 14:08:24.366  [3747796] LOG:  invalid record length at 
> 1473/A5FFEE08: wanted 24, got 0
> 2022-09-26 14:08:24.366  [3747796] LOG:  invalid record length at 
> 1473/A5FFEE08: wanted 24, got 0

I don't fully understand the situation. One scenario I can come up
with that leads to this state is that the standby somehow restored an
incomplete WAL segment from the primary: for example, a case where
someone copied the currently active WAL file from pg_wal to the archive
on the primary, or a case where restore_command on the standby fetches
WAL files from pg_wal on the primary instead of from its archive.
Neither is a normal operation.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: AW: Reducing bandwidth usage of database replication

2022-11-06 Thread Kyotaro Horiguchi
At Fri, 4 Nov 2022 09:25:44 +, Sascha Zenglein  wrote 
in 
> I have set the status interval to 600s and it still sends a status message 
> every 30s.
> I also tried setting every available option higher, with no success:

I guess you are not setting wal_sender_timeout on the primary side.

Status messages are also sent in response to sender pings, which are
controlled by that variable.  The wal sender sends a ping once half of
that setting has elapsed since the last status message, requesting that
the walreceiver send a response.  (With the default wal_sender_timeout
of 60s, that would explain a status message roughly every 30s.)

> checkpoint_warning = 93s
> max_standby_streaming_delay = 130s
> max_standby_archive_delay = 45s
> wal_receiver_status_interval = 600s
> wal_receiver_timeout = 1200s
> wal_receiver_timeout = 65s
> recovery_min_apply_delay = 600s
> 
> The random values were to see which setting is limiting if I got above the 
> 30s limit.
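So if the goal is fewer status messages, wal_sender_timeout on the
primary needs to be raised along with wal_receiver_status_interval; a
sketch with purely illustrative values (note that a large value also
delays detection of a dead connection):

-- on the primary
ALTER SYSTEM SET wal_sender_timeout = '1200s';
SELECT pg_reload_conf();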

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: How to check stream replication latest history status

2022-11-14 Thread Kyotaro Horiguchi
At Mon, 14 Nov 2022 00:09:36 +0800, 徐志宇徐  wrote in 
>   If the primary and standby have long replay_lag. I don't want to
> standby server promote.
> 
>   Is there any view or function which store the latest stream
> replication gap(replay_lag) between primary and standby ?
>   Or it there any workaround to know this stream replication
> gap(replay_lag) infomation ?  Thanks for your help.

replay_lag vanishes when the stream is gone. But commit timestamps
might work for you: you can see when the last transaction replayed on
the standby was committed on the primary.
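A rough sketch of that approach, assuming track_commit_timestamp is
turned on (a restart is needed; keeping the primary's and standby's
settings in sync is simplest, and the function errors out when commit
timestamps are unavailable). Clock differences between the servers skew
the estimate, and the value is only meaningful while commits keep
arriving from the primary:

-- on the standby
SELECT now() - "timestamp" AS approx_replay_lag
FROM pg_last_committed_xact();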

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Segmentation fault on RelationGetDescr in my first extension

2023-01-09 Thread Kyotaro Horiguchi
At Thu, 29 Dec 2022 13:52:18 +0300, Дмитрий Цветков  
wrote in 
> I'm trying to write my first extension and open a table in it.
> I use check_password_hook and my function executes at the moment of
> changing user password.
> 
> But if I try to open a table inside this function, I get Segmentation fault
> on the line with   "RelationGetDescr".
> What am I doing wrong?

table_open() should error out when it would otherwise return an invalid
Relation*, and RelationGetDescr(rel) is just "rel->rd_att". So
RelationGetDescr() cannot cause a segv even if the table does not exist.

# I modified contrib/passwordcheck that way and saw RelationGetDescr()
# doesn't get segv in the shown context.

I guess the segv comes from somewhere else, or something else has
corrupted memory by that point. In any case, more information is needed
for people to diagnose your situation more closely.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: How could I elog the tupleTableSlot to the fronted terminal?

2023-02-01 Thread Kyotaro Horiguchi
At Wed, 1 Feb 2023 00:12:44 +0800, "jack...@gmail.com"  
wrote in 
> I use the debugtup to print, and I find out there are "printf", it doesn't 
> print anything
> to the terminal. I need to know how to use this debugtup func. I think I use 
> it as a mistake

It's not clear to me what "the terminal" means here, but can you find
the output in the server log instead?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Logical replication fails when adding multiple replicas

2023-03-23 Thread Kyotaro Horiguchi
At Wed, 22 Mar 2023 09:25:37 +, Will Roper 
 wrote in 
> Thanks for the response Hou,
> 
> I've had a look and when the tablesync workers are spinning up there are
> some errors of the form:
> 
> "2023-03-17 18:37:06.900 UTC [4071] LOG:  logical replication table
> synchronization worker for subscription
> ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onspd"" has
> started"
> "2023-03-17 18:37:06.976 UTC [4071] ERROR:  could not create replication
> slot ""pg_37986_sync_37922_7210774007126708177"": ERROR:  replication slot
> ""pg_37986_sync_37922_7210774007126708177"" already exists"

The slot name format is "pg_<subscription oid>_sync_<table oid>_<system
identifier>". It's no surprise this happens if the subscribers come
from the same backup.

If that's true, the simplest workaround would be to recreate the
subscription multiple times, using a different number of repetitions
for each subscriber so that the subscribers have subscriptions with
different OIDs.



I believe it's not prohibited for subscribers to have the same system
identifier, but the slot name generation logic for tablesync doesn't
account for cases like this.  We might need some server-wide value
that's unique among subscribers and stable while table sync is
running.  I can't think of a better place than pg_subscription, but I
don't like that because it's not really necessary for most of the
subscription's life.

Do you think using the postmaster's startup time would work for this
purpose?  I'm assuming that the slot name doesn't need to persist
across server restarts, but I'm not sure that's really true.


diff --git a/src/backend/replication/logical/tablesync.c 
b/src/backend/replication/logical/tablesync.c
index 07eea504ba..a5b4f7cf7c 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -1214,7 +1214,7 @@ ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
char 
*syncslotname, Size szslot)
 {
    snprintf(syncslotname, szslot, "pg_%u_sync_%u_" UINT64_FORMAT, suboid,
-relid, GetSystemIdentifier());
+relid, PgStartTime);
 }
 
 /*


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: terminating walsender process due to replication timeout

2019-05-16 Thread Kyotaro HORIGUCHI
Hello.

At Wed, 15 May 2019 10:04:12 +0300, ayaho...@ibagroup.eu wrote in 

> Hello,
> Thank You for the response.
> 
> Yes that's possible to monitor replication delay. But my questions were 
> not about monitoring network issues. 
> 
> I use exactly wal_sender_timeout=1s because it allows to detect 
> replication problems quickly.

Though I don't have an exact idea of your configuration, it seems
to me that your standby is simply getting more than one second behind
the master. If you regard that as a replication problem, the
configuration can be said to be detecting the problem correctly.

Since the keep-alive packet is sent in-band, it doesn't get to
the standby before already-sent-but-not-processed packets.

> So, I need clarification to the following  questions:
> Is  it possible to use exactly this configuration and be sure that it will 
> be work properly.
> What did I do wrong? Should I correct my configuration somehow?
> Is this the same issue  as mentioned here: 
> https://www.postgresql.org/message-id/e082a56a-fd95-a250-3bae-0fff93832...@2ndquadrant.com
>  
> ? If it is so, why I do I face this problem again?

It is not the same "problem". What was mentioned there is fast
network making the sender-side loop busy, which prevents
keepalive packet from sending.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: terminating walsender process due to replication timeout

2019-05-23 Thread Kyotaro HORIGUCHI
Hello.

At Fri, 17 May 2019 11:04:58 +0300, ayaho...@ibagroup.eu wrote in 

> Can frequent database operations cause getting a standby server behind? Is 
> there a way to avoid this situation?
> I checked that walsender works well in my test  if I set 
> wal_sender_timeout at least to 5 second.

It depends on the transaction (WAL) traffic and the bandwidth of
your network. The performance difference between the master and the
standby also matters.

The possibilities I can guess at are:

- The bandwidth is too narrow for the traffic.

- The standby performs worse than the master.

- Your network is having some sort of trouble. A virtual network
  (a local network in a virtual environment) tends to suffer
  network troubles caused by CPU saturation or something else.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: PostgreSQL 13 - Logical Replication - ERROR: could not receive data from WAL stream: SSL SYSCALL error: EOF detected

2023-05-18 Thread Kyotaro Horiguchi
At Thu, 18 May 2023 21:00:08 +0100, FOUTE K. Jaurès  
wrote in 
> Hello everyone,
> 
> I have a replication between PostgreSQL 12 to 13 in the production system.
> Using Ubuntu 18.04 LTS
> We have this error today.
> 
> 
> 
> *2023-05-18 18:34:04.374 WAT [117322] ERROR:  could not receive data from
> WAL stream: SSL SYSCALL error: EOF detected
> 2023-05-18 18:34:04.381 WAT [118393] LOG:  logical replication apply worker 
> for subscription "inov_transactionnal_table_bertoua_sub_00" has started
> 2023-05-18 18:34:04.423 WAT [1039] LOG:  background worker "logical 
> replication worker" (PID 117322) exited with exit code 1
> 2023-05-18 18:47:51.485 WAT [66836] postgres@inov_henrietfreres_v71_00 LOG:  
> could not receive data from client: Connection timed out.*
> 
> *Any idea how to solve this?*

According to the message, the SSL-encrypted replication connection got
disconnected unexpectedly. I suppose it is due to the death of the
upstream server or some reason outside of PostgreSQL. It seems like
the issue persisted for a few minutes after that. Other than the
server's death, I would suspect network hardware problems or changes
to the firewall or the OS networking setup. I think it would be a good
idea to check for those first.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Problematic enforcement of "ERROR: functions in index predicate must be marked IMMUTABLE"

2023-07-10 Thread Kyotaro Horiguchi
At Sun, 9 Jul 2023 14:22:37 +, Avi Weinberg  wrote in 
> Hi,
> 
> If you attempt to create an index based on function that is not IMMUTABLE you 
> will get an exception "ERROR:  functions in index predicate must be marked 
> IMMUTABLE".  However, if you created the index when the function was 
> IMMUTABLE, but later on you updated the function and mistakenly removed the 
> IMMUTABLE key, you will not get any error to alert you that there is an index 
> based on this function and it should remain IMMUTABLE.
> 
> I suggest triggering error message also when updating a function that is used 
> by index if it is no longer IMMUTABLE

There's no way to truly verify whether a function is really immutable.
So, as mentioned in the documentation, the function volatility
categories are essentially a promise to the optimizer regarding the
function's behavior.

Even so, on the premise that users keep the volatility marks in line
with the actual behavior of the corresponding functions, it might be
beneficial to prohibit changes to the volatility category while the
function is used by an index. That would, at least on the surface,
protect indexes from entering an inconsistent state.
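If you want to guard against this operationally, you can at least list
the indexes that depend on a function before redefining it (a sketch;
'myfunc(integer)' is a made-up signature):

SELECT c.relname AS index_name
  FROM pg_depend d
  JOIN pg_class c ON c.oid = d.objid
 WHERE d.classid = 'pg_class'::regclass
   AND d.refclassid = 'pg_proc'::regclass
   AND d.refobjid = 'myfunc(integer)'::regprocedure
   AND c.relkind = 'i';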

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Problematic enforcement of "ERROR: functions in index predicate must be marked IMMUTABLE"

2023-07-10 Thread Kyotaro Horiguchi
At Tue, 11 Jul 2023 10:14:29 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Sun, 9 Jul 2023 14:22:37 +, Avi Weinberg  wrote in 
> > Hi,
> > 
> > If you attempt to create an index based on function that is not IMMUTABLE 
> > you will get an exception "ERROR:  functions in index predicate must be 
> > marked IMMUTABLE".  However, if you created the index when the function was 
> > IMMUTABLE, but later on you updated the function and mistakenly removed the 
> > IMMUTABLE key, you will not get any error to alert you that there is an 
> > index based on this function and it should remain IMMUTABLE.
> > 
> > I suggest triggering error message also when updating a function that is 
> > used by index if it is no longer IMMUTABLE
> 
> There's no way to truly verify whether a function is really immutable.
> So, as mentioned in the documentation, the function volatility
> categories are essentially a promise to the optimizer regarding the
> function's behavior.
>
> Even so, on the premise that users keep the volatility marks in line
> with the actual behavior of the corresponding functions, it might be
> beneficial to prohibit changes to the volatility category while the
> function is used by an index. That would, at least on the surface,
> protect indexes from entering an inconsistent state.

Even after doing that, any functions used indirectly would still be
allowed to be, or to become, non-immutable. So we might want to reject
this proposal, since we cannot enforce it perfectly.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Problematic enforcement of "ERROR: functions in index predicate must be marked IMMUTABLE"

2023-07-11 Thread Kyotaro Horiguchi
At Mon, 10 Jul 2023 21:28:12 -0400, Tom Lane  wrote in 
> In the end, adding such restrictions would just give a false sense
> of security, because there would always be gaps in whatever we did.
> As you quote from the documentation, volatility markings are a promise
> by the user to the system, not vice versa.  If you break your promise,
> you get to keep both pieces of whatever trouble ensues.

I agree with you, as I mentioned in my a-bit-too-late message.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Bogus temp file reporting?

2023-07-27 Thread Kyotaro Horiguchi
At Thu, 27 Jul 2023 10:30:41 +0300, kg.postgre...@olympiakos.com wrote in 
> Hello,
> 

> Is there a known issue with temp file reporting?  I have a
> production db that is reporting 231TB of temp space usage, which
> can’t be true.  temp_blks_written in pg_stat_statements sort of
> confirms this, as the reported queries show just a few MB of temp
> space usage.  I suspect a reboot will clear this (to do) and wanted
> to check with you to see if someone else has had similar
> experiences.


Just to clarify, "temporary files" and "temp blocks" refer to different
entities in this context. "Temporary files" are ephemeral storage
created and kept alive only for tasks like tuple sorting, whereas
"temp blocks" refers to the storage for temporary tables.

If you perform a massive insert into a temporary table, you'll find
that pg_stat_database.temp_bytes doesn't increase at all.
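
For reference, the per-database temporary file usage (the thing that
grows with sorts or hashes spilling to disk) can be checked like this:

select datname, temp_files, pg_size_pretty(temp_bytes) as temp_bytes
  from pg_stat_database where datname = current_database();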

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Understanding pg_stat_io.evictions

2023-07-30 Thread Kyotaro Horiguchi
At Fri, 28 Jul 2023 10:36:46 +, "Daniel Westermann (DWE)" 
 wrote in 
> postgres=# insert into t select i,  md5(i::text), md5(i::text), md5(i::text) 
> from generate_series(1,100) i;
..
>  client backend |   207 | normal
>  client backend | 0 | vacuum
> 
> Shouldn't these evictions show up under context blkwrite? The description in 
> docs is:

No, that's not the case because INSERT doesn't perform a bulk
write. A bulk write is only performed during COPY FROM and on the
write side of some other commands, such as CREATE TABLE AS (and some
internal table-copy operations).

> As far as I understand this, a ring buffer is used in this case. Do I miss 
> something?

Maybe you're confusing it with the bulk-read case?
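
If it helps, a quick way to look at the per-context breakdown directly
(this assumes PostgreSQL 16 or later, where pg_stat_io exists):

select backend_type, context, writes, evictions, reuses
  from pg_stat_io
 where backend_type = 'client backend' and object = 'relation';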

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: How does pg parse 'select '(1,2)''

2023-08-06 Thread Kyotaro Horiguchi
At Sat, 5 Aug 2023 21:46:33 +0800, jacktby jacktby  wrote in 
> I’m trying to add new grammar for my application. So I’m do research on 
> gram.y.
> I think it will make the '(1,2)' as " '(' a_expr ')' "; but I can’t find out 
> something like
> " a_expr ',' a_expr “, can you give me the details?

If you mean literally "SELECT (1, 2)", I think the parser reaches
the terminating a_expr through the following path.


simple_select: SELECT .. opt_target_list ...
opt_target_list: target_list
target_list: target_el | target_list ',' target_el
target_el: a_expr
a_expr: c_expr
c_expr: implicit_row
implicit_row: '(' expr_list ',' a_expr ')'
expr_list: a_expr | expr_list ',' a_expr
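
For what it's worth, you can see the result of that implicit_row
production directly; the parenthesized list becomes a composite value:

=# select (1, 2);
  row
-------
 (1,2)
(1 row)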

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Could not read from file "pg_subtrans/00F5" at offset 122880: Success.

2023-11-28 Thread Kyotaro Horiguchi
At Mon, 27 Nov 2023 19:58:13 +0100, Laurenz Albe  
wrote in 
> On Mon, 2023-11-27 at 11:50 +, Sri Mrudula Attili wrote:
> > ERROR:  could not access status of transaction 16087052
> > DETAIL:  Could not read from file "pg_subtrans/00F5" at offset 122880: 
> > Success.

By the way, just out of curiosity: errno should not be zero at the
time the message above was output, yet "%m" shows "success",
which implies errno = 0 on Linux. How can that happen?

slru.c:961
> case SLRU_READ_FAILED:
>   if (errno)
> ereport(ERROR,
> (errcode_for_file_access(),
>  errmsg("could not access status of transaction %u", xid),
>  errdetail("Could not read from file \"%s\" at offset %d: %m.",

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Could not read from file "pg_subtrans/00F5" at offset 122880: Success.

2023-11-29 Thread Kyotaro Horiguchi
At Wed, 29 Nov 2023 18:29:15 +0100, Alvaro Herrera  
wrote in 
> The code in master is completely different (it uses pg_pread rather than
> seek + read): it does test for errno and reports accordingly.
> 
> So, nothing to do here.

Oops! Thank you and sorry for the noise.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: walreceiver fails on asynchronous replica [SEC=UNOFFICIAL]

2024-02-26 Thread Kyotaro Horiguchi
At Fri, 23 Feb 2024 04:04:03 +, Mark Schloss  
wrote in 
> <2024-02-23 07:50:05.637 AEDT [1957121]: [1-1] user=,db= > LOG:  started 
> streaming WAL from primary at 6/B000 on timeline 5
> <2024-02-23 07:50:05.696 AEDT [1957117]: [6-1] user=,db= > LOG:  invalid 
> magic number  in log segment 0005000600B0, offset 0

This appears to suggest that the WAL file the standby fetched was
zero-filled on the primary side, which cannot happen in normal
operation. A preallocated WAL segment can be zero-filled, but it cannot
be replicated under normal operation.

> <2024-02-22 14:20:23.383 AEDT [565231]: [6-1] user=,db= > FATAL:  terminating 
> walreceiver process due to administrator command

This may suggest a config reload with some parameter changes.

One possible scenario matching the log lines is that someone switched
primary_conninfo to a newly-restored primary. If the new primary had
older data than the previously connected primary, the segment
0..5..6..B0 on it could have been a preallocated one filled with
zeros, and the standby could end up fetching that zero-filled WAL
segment (file) and fail this way. If this is the case, such operations
should be avoided.

Unfortunately, due to the lack of a reproducer or of details about the
operations that took place there, the best I can do for now is to
guess at a possible scenario as described above. I'm not sure how the
situation actually arose.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: walreceiver fails on asynchronous replica [EXTERNAL] [SEC=UNOFFICIAL]

2024-02-27 Thread Kyotaro Horiguchi
At Wed, 28 Feb 2024 03:23:37 +, Mark Schloss  
wrote in 
> Thank you for your reply. I can confirm there were no changes made to the 
> config of the replica.

Okay.

> Is there any significance in the parameters in the commit record -
> 'inval msgs: catcache 21; sync'.

I think it is not relevant.

> - The walreceiver on the barman server did not fail but the WAL file does not 
> contain the commit transaction

I don't have detailed knowledge of barman's behavior, but it seems to
be quite normal for barman to write out only on receiving a commit
record. What I don't understand here is how those WAL files on the
barman server are related to the failed replicas.

From the information you provided, I guess that the replicas somehow
obtain the currently-written WAL file from the barman server at some
point, through a route unknown to me, but you seem to know it. I think
that relationship has not been explained here.

Could you explain the routes and timings that WAL files are copied
between the servers?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: walreceiver fails on asynchronous replica [EXTERNAL] [SEC=UNOFFICIAL]

2024-02-29 Thread Kyotaro Horiguchi
Hi, Mark.

At Thu, 29 Feb 2024 04:39:49 +, Mark Schloss  
wrote in 
> Thanks for looking at this. I think I complicated things by
> including barman. I was just wanting to point out each primary
> streams to two locations - the walreceiver on the replica and the
> walreciver used by barman. We think the reason the barman
> WAL-receiver didn't fail is because there is no apply of the WAL in
> Barman but the Standby is applying and it's WAL-receiver got
> terminated, so the barman server can be taken out of this picture
> completely, they were just there as a by-product in trying to
> determine the effect.  We are only interested in the killing of the
> standby wal-receiver and that the pg_waldump showed the failing lsn
> was a commit.

It seems that an issue raised in the -hackers thread [1] might be the
same as yours. The discussion might be of help to you, although it's
not clear yet what is happening.

[1] 
https://www.postgresql.org/message-id/CAFh8B%3DmozC%2Be1wGJq0H%3D0O65goZju%2B6ab5AU7DEWCSUA2OtwDg%40mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: PITR for an only object in postgres

2021-03-22 Thread Kyotaro Horiguchi
At Thu, 18 Mar 2021 17:12:49 -0400, aslı cokay  wrote in 
> Hi all,
> 
> Is there any way to get for example a table version of 5 minutes ago, or we
> should PITR to 5 minutes ago, dump the table and restore with this dump ?

If you want to revert recent changes on the table, I think there's
no way other than time-travelling using PITR.  Of course you can't do
that for only a part of the database.

If you often want to access an image of the database as of 5 minutes
ago, you could configure a delayed standby using
recovery_min_apply_delay.
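
A minimal sketch of such a delayed standby, assuming PostgreSQL 12 or
later where recovery settings live in postgresql.conf (older versions
put the same line in recovery.conf):

# postgresql.conf on the standby
recovery_min_apply_delay = '5min'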

I'm not sure whether PostgreSQL has the same feature for logical
replication, but I think we don't.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: "invalid contrecord" error on replica

2021-05-05 Thread Kyotaro Horiguchi
At Sun, 2 May 2021 22:43:44 +0200, Adrien Nayrat  
wrote in 
> I also dumped 000100AA00A1 on the secondary and it
> contains all the records until AA/A1004018.
> 
> It is really weird, I don't understand how the secondary can miss the
> last 2 records of A0? It seems he did not received the
> CHECKPOINT_SHUTDOWN record?
> 
> Any idea?

This looks like the same issue as [1]: in short, the secondary
received an incomplete record, but the primary forgot about that
record after its restart.

Specifically, primary was writing a WAL record that starts at A0FFFB70
and continues to A1xx segment. The secondary successfully received
the first half of the record but the primary failed to write (then
send) the last half of the record due to disk full.

At this point it seems that the primary's last completed record ended
at A0FFB70. Then, while restarting, the CHECKPOINT_SHUTDOWN record
overwrote the already-half-sent record up to A0FFBE8.

On the secondary side, there is only the first half of the record,
which the primary has forgotten, and the last half starting at
LSN A100 was still in the future in the primary's new history.

After some time the primary reaches A100, but the first record in
that segment of course disagrees with the history on the secondary.

1: 
https://www.postgresql.org/message-id/CBDDFA01-6E40-46BB-9F98-9340F4379505%40amazon.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: some questions regarding replication issues and timeline/history files

2021-06-15 Thread Kyotaro Horiguchi
At Tue, 15 Jun 2021 07:05:07 -0700 (MST), "email2ssk...@gmail.com" 
 wrote in 
> Even I have this problem when I had to recover the database failed
> switchover.
> This is error is new primary server. 
> 
> < 2021-06-15 16:05:02.480 CEST > ERROR:  requested starting point
> AF/7D00 on timeline 1 is not in this server's history
> < 2021-06-15 16:05:02.480 CEST > DETAIL:  This server's history forked from
> timeline 1 at AF/7C0F8D58.

Your old primary seems to have continued running beyond 7D00
after the old standby was promoted at 7C0F8D58.  In short, the new
standby's history has diverged from the new primary's.

You can use pg_rewind to adjust the new standby server in that case.


FYI, you can reproduce the error with the following steps.

1. create a primary  (A)
2. create a standby  (B) connecting to A.
3. promote B.

4. connecting to A and run the following commands.

   =# select pg_switch_wal(); checkpoint;

5. Stop A, then add a primary_conninfo pointing to B to the conf file
   of A, then create the standby.signal file in the data directory of
   A.

6. You will get the similar error.


To recover from the situation, run pg_rewind like the following, for example.

$ pg_rewind --target-pgdata= --source-server='connstr to B'
pg_rewind: servers diverged at WAL location 0/360 on timeline 1
pg_rewind: rewinding from last common checkpoint at 0/260 on timeline 1


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: wal_keep_segments and max_wal_size

2021-07-07 Thread Kyotaro Horiguchi
At Mon, 5 Jul 2021 16:45:09 +0530, Atul Kumar  wrote in 
> hi,
> 
> (on postgres 9.6) I was just practicing to make my no. of wal files in
> pg_xlog directory to be the same/equal as I pass on wal_keep_segments
> paramater.
> 
> say for example
> show wal_keep_segments;
>  wal_keep_segments
> ---
>  125

This setting ensures that this number of WAL files is kept before the
current WAL location (shown by pg_current_wal_lsn()). This means the
WAL record currently being written goes to roughly the 126th live
(non-recycled) file after a checkpoint, because the setting instructs
checkpoints to leave that number of WAL files regardless of other
settings.
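
For reference, the current WAL location mentioned above can be checked
as follows (note that on 9.6 the function is named
pg_current_xlog_location(); it was renamed to pg_current_wal_lsn() in
version 10):

=# select pg_current_xlog_location();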

> so I want my wal files should not be grow in numbers more than 125 so
> for that I have changed the parameter of max_wal_size to 2 GB to limit
> the size of pg_xlog directory.
> 
>  show max_wal_size;
>  max_wal_size
> --
>  2GB
> (1 row)

Note that max_wal_size is a soft limit, which can be exceeded in
certain cases, for reasons including wal_keep_segments.

https://www.postgresql.org/docs/9.6/runtime-config-wal.html#GUC-MAX-WAL-SIZE


> but I did some testing by bulk inserts then the number of wal files
> has grown more than 125.
> 
> and the size of pg_xlog directory also reached to 2.7 GB.
> 
> /data/apps/edb/as96/data/pg_xlog 04:05:08]$ ls | wc -l
> 173
> 
> /data/apps/edb/as96/data 04:05:11]$ du -sh pg_xlog/
> 2.7Gpg_xlog/

I'm not sure what value you set checkpoint_completion_target to, but
supposing 0.5 (the default), with that setting and max_wal_size = 2GB a
checkpoint happens every 85 segments [*1].  Note that those 85 files
come in addition to the 125 files kept in pg_xlog by wal_keep_segments,
so it is no wonder even if the WAL files grow to more than 3GB.

*1: max_wal_size / (checkpoint_completion_target + 1.0) = 1365MB = 85 files

> I wonder why I faced such behavior bcz I  limited the size of pg_xlog
> directory by setting max_wal_size to 2GB that should be equivalent to
> around 125 number of wal file in the pg_xlog directory.

Does the above explanation make sense to you?

> please suggest how should I make both identical (wal_keep_segments and
> max_wal_size).

wal_keep_segments should be the minimal amount required for
pg_basebackup or replication, and zero otherwise.

If wal_keep_segments is greater than max_wal_size minus [*1] (that is,
43 16MB files), max_wal_size will be exceeded before the XLOG-triggered
checkpoint caused by max_wal_size starts.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Removing a subscription that does not exist

2021-07-11 Thread Kyotaro Horiguchi
At Fri, 9 Jul 2021 10:49:46 -0600, Jeff Ross  wrote in 
> Hello,
> 
> I'm working with an RDS instance running 12 that has an old
> subscription that I can't seem to drop.
...
> It shows up here:
> 
> mirror_admin@metro_logical> select * from pg_subscription;
>oid  │ subdbid  │ subname  │ subowner │ ...
> ┼─┼─┼─┼ ...
>   83645 │66754 │ cargowel_common  │16394 │ ...
>   83646 │66754 │ metro_prod   │16394 │ ...
>   51490 │14313 │ metro│16394 │ ...
> (3 rows)
...
> But not in here:
> 
> mirror_admin@metro_logical> \dRs+
> List of subscriptions
>Name   │ Owner│ Enabled  │ Publication 
> ─┼───┼─┼───
>  cargowel_common  │ mirror_admin │ t   │ {cargowel_common_prod} 
>  metro_prod   │ mirror_admin │ t   │ {metro_prod} 
> (2 rows)
> 
> And it can't be disabled or dropped:

Look at the subdbid field in the first query result.  You are logged
into the database with OID=66754, while the subscription "metro"
belongs to the database 14313.  The second command doesn't show metro
because it does not belong to the current database.

> mirror_admin@metro_logical> alter subscription metro disable;
> ERROR:  subscription "metro" does not exist
> Time: 24.263 ms
> mirror_admin@metro_logical> drop subscription metro;
> ERROR:  subscription "metro" does not exist
> Time: 23.648 ms
> 
> I did try deleting it directly from the pg_subscription table but that
> failed with a permission denied error.  My suspicion is that's because
> of the RDS environment.
> 
> What else can I try to remove this old non-functional subscription?

Thus you need to log in to the database with OID=14313 to manipulate
the subscription metro.
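
As a sketch, you can look up which database each subscription belongs
to, then connect there and drop it:

select s.subname, d.datname
  from pg_subscription s join pg_database d on d.oid = s.subdbid;

-- then, after \c to that database:
-- drop subscription metro;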

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: oldest WAL files not removed

2021-09-01 Thread Kyotaro Horiguchi
At Wed, 1 Sep 2021 12:48:51 +0200,  wrote in 
> Hi,
> 
>  
> 
> Looking at WAL folder after a crash, I noticed that new files after
> restarting overwrite the more recent files before the crash and not the
> oldest, which was what I expected. 
> 
> Is that normal ?  I got only one file marked .deleted. Does that happens
> when a WAL file hase been completed updated in the database and if then
> while all oldest files aren't marked .deleted after restarting ?
> 
> 
> Example :
> 
> Crash occurs Aug 31 22:03 which is the more recent Wal file, the oldest file
> is Aug 30 17:20 (and 105 files between those two)
> 
> After restarting Aug 30 17:20 is still there, Aug 31 22:03 disappeared, one
> new file is Sep 1 12:15 marked .deleted (restarting date) and one new Sep 1
> 12:36 which I guess is normal. Right now, I see an new wal file and the
> previous one marked .deleted which is ok. 
> 
> Why are the oldest wal files still there ?? Can I remove them ?
> 
> Hope I'm clear enough and thanks for explanations,

It would be very helpful if you gave us the names of the files.  Due to
WAL file recycling, timestamps are frequently shuffled with respect to
names.

In any case, no WAL files ought to be manually removed.  If you don't
really need the recycled-for-future files, consider reducing
min_wal_size.
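
If you do decide to lower it, a minimal sketch (the value here is just
an example, pick one that suits you):

=# alter system set min_wal_size = '256MB';
=# select pg_reload_conf();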

If you looked at the files only in timestamp order, the odds are high
that the "oldest" file is a recycled file to be used in the future, and
the "newest" file is the one currently being written.  If so, the
reason the oldest-in-timestamp file is still there is that it is still
waiting to be used.  Even if you removed the to-be-used-in-future
files, such files would grow back to the same extent according to the
setting of min_wal_size.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: oldest WAL files not removed

2021-09-02 Thread Kyotaro Horiguchi
At Thu, 2 Sep 2021 08:21:37 +0200,  wrote in 
> Hy Kyotaro,
> 
> Thanks for this explanation. I joined the files to be complete.

Thanks, that is informative.  It looks like this when sorted in file-name order.

01/09/2021  12:1516 777 216 0001008E0061.deleted
01/09/2021  16:3516 777 216 0001008E0086.deleted
01/09/2021  16:3516 777 216 0001008E0087
30/08/2021  20:4016 777 216 0001008E0088
...
31/08/2021  17:5816 777 216 0001008E00CF

> If you looked the files only in timestamp order, with a high odds, the
> "oldest" file is a recycled file to be used in future, and the "newest" file
> is the currently written one.  If so, the reason that the
> oldest-in-timestamp file is still there is it is still waiting to be used.
> Even if you removed the to-be-used-in-future files, such files would
> increase to the same extent according to the setting of min_wal_size.

The reason 0001008E0088 is still there is that it is still
waiting to be used.  The files 0001008E0061/86.deleted have
already been removed from PostgreSQL's point of view but remain because
someone else is still using them. If they persist too long, they can
be removed manually, if possible.

The files 88 to CF look like preallocated, or recycled, files. Since
there are 76 files, it seems min_wal_size is set to 1GB or so. If
you don't need that many files preallocated in your pg_wal directory,
consider reducing min_wal_size.  But note that the preallocated files
won't be gone right away just by doing that.

If you really want to delete those files right away, the preallocated
files are theoretically removable.  You can find the name of the file
currently being written with the following query, move the files whose
names are larger than it (in file-name order) somewhere else, and then
remove them after making sure the system can restart.

=# select pg_walfile_name(pg_current_wal_lsn());

If the system is active, the current file may advance, so be careful
not to remove files that already contain real WAL data. This is why I
said "In any case, no WAL files ought to be manually removed."

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Postgresql 11: terminating walsender process due to replication timeout

2021-09-08 Thread Kyotaro Horiguchi
At Thu, 9 Sep 2021 14:52:25 +0900, Abhishek Bhola 
 wrote in 
> I have found some questions about the same error, but didn't find any of
> them answering my problem.
> 
> The setup is that I have two Postgres11 clusters (A and B) and they are
> making use of publication and subscription features to copy data from A to
> B.
> 
> A (source DB- publication) --> B (target DB - subscription)
> 
> This works fine, but often (not always) when the data volume being inserted
> on a table in node A increases, it gives the following error.
> 
> "terminating walsender process due to replication timeout"
> 
> The data volume at the moment being entered is about 30K rows per second
> continuously for hours through COPY command.
> 
> Earlier the wal_sender_timeout was set to 5 sec and I would see this error
> much often. I then increased it to 1 min and the frequency of this error
> reduced. But I don't want to keep increasing it without understanding what
> is causing it. I looked at the code of walsender.c and know the exact lines
> where it's coming from.
> 
> But I am still not clear which parameter is making the sender assume that
> the receiver node is inactive and therefore it should stop the wal_sender.
> 
> Can anyone please suggest what changes I should make to remove this error?

Which minor version is the Postgres server in question? PostgreSQL 11
got the following fix in 11.6, which could be related to the
trouble.

https://www.postgresql.org/docs/11/release-11-6.html

> Fix timeout handling in logical replication walreceiver processes
> (Julien Rouhaud)
> 
> Erroneous logic prevented wal_receiver_timeout from working in
> logical replication deployments.

The details of the fix are here.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3f60f690fac1bf375b92cf2f8682e8fe8f69098
> Fix timeout handling in logical replication worker
> 
> The timestamp tracking the last moment a message is received in a
> logical replication worker was initialized in each loop checking if a
> message was received or not, causing wal_receiver_timeout to be ignored
> in basically any logical replication deployments.  This also broke the
> ping sent to the server when reaching half of wal_receiver_timeout.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Postgresql 11: terminating walsender process due to replication timeout

2021-09-09 Thread Kyotaro Horiguchi
At Thu, 9 Sep 2021 16:06:25 +0900, Abhishek Bhola 
 wrote in 
> sourcedb:~$ postgres --version
> postgres (PostgreSQL) 11.6
> 
> Sorry for missing this information.
> But looks like this fix is already included in the version I am running.

Ok. I'm not sure, but there may be a case where a too-busy subscriber
(or one too underpowered relative to the publisher) cannot send a
response for a long time. Usually the keep-alive packets sent from the
publisher cause the subscriber to respond even while it is busy, but it
seems that if the subscriber applies changes more than two times slower
than the publisher sends them, the subscriber doesn't send a response
within the timeout window.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Postgresql 11: terminating walsender process due to replication timeout

2021-09-12 Thread Kyotaro Horiguchi
At Fri, 10 Sep 2021 16:55:48 +0900, Abhishek Bhola 
 wrote in 
> So is there any solution to this issue?
> I did try to increase the wal_sender_timeout and it broke the pub/sub.
> I increased the wal_receiver_timeout and it wouldn't attempt to restart the
> subscription until that time elapsed.
> Due to that, the WAL segments got removed by the time it came up again and
> it stopped working.
> So given that the publisher is publishing at a higher rate than the
> subscriber is subscribing, what can be done?

Assuming my guess is right, to be able to send a response the
subscriber needs to see a keepalive packet from the publisher (sent at
intervals of wal_sender_timeout/2) within every interval of
wal_sender_timeout. Otherwise it needs to "rest" by finding gaps in the
data stream from the publisher at intervals shorter than
wal_sender_timeout.

The reason the subscriber is kept busy is that it receives the next
chunk of data before it finishes applying the previous one.  So the
possible workarounds I came up with for now are:

- Increase the processing power of the subscriber, to increase the
  chance that it can finish applying changes before the next data
  block comes from the publisher, or so that the subscriber can keep
  catching up to the publisher. This is the most appropriate solution,
  I think.

- Throttle the network bandwidth to obtain the same effect as the
  first item above. (A side effect may be that the bandwidth finally
  becomes insufficient.)

- Break large transactions on the publisher into smaller pieces.
  The publisher sends a transaction's data all at once at commit,
  so this could even out the data transfer rate.

- If you are setting a *sufficient* logical_decoding_work_mem for such
  problematic large transactions, *decreasing* it might mitigate the
  issue.  A lower logical_decoding_work_mem causes transaction data to
  spill out to disk, and the spilled data on disk may be sent at a
  slower rate than in-memory data.  Of course this comes at the cost
  of total performance.

- The streaming mode of logical replication, introduced in PostgreSQL
  14, might be able to mitigate the problem.  It starts sending
  transaction data before the transaction completes (see the sketch
  just after this list).
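
A sketch of enabling it, assuming both ends run PostgreSQL 14 or later
(the subscription and publication names are made up):

alter subscription my_sub set (streaming = on);
-- or at creation time:
-- create subscription my_sub connection 'host=pub dbname=src'
--   publication my_pub with (streaming = on);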

I'm not sure this will be "fixed" for 13 or earlier, because a
straightforward resolution would surely decrease the maximum processing
rate at the subscriber.


> On Fri, Sep 10, 2021 at 9:26 AM Kyotaro Horiguchi 
> wrote:
> 
> > At Thu, 9 Sep 2021 16:06:25 +0900, Abhishek Bhola <
> > abhishek.bh...@japannext.co.jp> wrote in
> > > sourcedb:~$ postgres --version
> > > postgres (PostgreSQL) 11.6
> > >
> > > Sorry for missing this information.
> > > But looks like this fix is already included in the version I am running.
> >
> > Ok. I'm not sure but there may be a case where too-busy (or too poor
> > relative to the publisher) subscriber cannot send a response for a
> > long time. Usually keep-alive packets sent from publisher causes
> > subscriber response even while busy time but it seems that if
> > subscriber applies changes more than two times slower than the
> > publisher sends, subscriber doesn't send a response in the timeout
> > window.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: WAL File Recovery on Standby Server Stops Before End of WAL Files

2021-10-27 Thread Kyotaro Horiguchi
At Wed, 27 Oct 2021 16:42:52 +, "Ryan, Les"  wrote in 
> 2021-10-27 10:26:31.467 MDT [2012] LOG:  redo starts at 419/5229A858
...
> 2021-10-27 10:26:36.188 MDT [2012] LOG:  restored log file 
> "00010419005A" from archive
> 2021-10-27 10:26:36.750 MDT [2012] LOG:  consistent recovery state reached at 
> 419/5AB8
> 2021-10-27 10:26:36.752 MDT [6204] LOG:  database system is ready to accept 
> read only connections
> 2021-10-27 10:26:36.823 MDT [6040] LOG:  started streaming WAL from primary 
> at 419/5A00 on timeline 1
> 
>   *   There are many more WAL files available starting with 
> 00010419005B but the restore process always stops at 
> 00010419005A.
> 
> I have two questions:
> 
>   *   Why does the WAL file recovery process now stop after it reads 
> 00010419005A?
>   *   What do I need to do to get PostgreSQL to recover the rest of the 
> available WAL files.

The info above alone doesn't clearly suggest anything about the
reason. Could you show the result of "pg_waldump
00010419005A 2>&1 | tail -5"?  What I'm expecting to see
is an error message from pg_waldump before the end of the file. That
would be the immediate cause of the problem.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: WAL File Recovery on Standby Server Stops Before End of WAL Files

2021-10-28 Thread Kyotaro Horiguchi
At Thu, 28 Oct 2021 13:52:36 +, "Ryan, Les"  wrote in 
> Hi Kyotaro and Dilip,
> 
> Thank you for getting back to me.
> 
> Kyotaro: I ran pg_dump and the output was "pg_waldump: fatal: could not read 
> file "00010419005A": read 50 of 8192".  I'm guessing that it 
> means that wal file 00010419005A is corrupted and that is why the 
> recovery process stops there.  Is there any way to fix the file?

No way, unless the segment is still present in the primary's
pg_wal. Either your archive storage is unstable or archive_command is
not performing its task reliably enough.

> Dilip:   setting the log level to debug2 did not provide any additional 
> information.  Here are the log entries:
> 
> 2021-10-28 06:51:06.166 MDT [7556] LOG:  restored log file 
> "000104190059" from archive
> 2021-10-28 06:51:06.464 MDT [7556] DEBUG:  got WAL segment from archive
> 2021-10-28 06:51:06.579 MDT [7556] LOG:  restored log file 
> "00010419005A" from archive
> 2021-10-28 06:51:06.854 MDT [7556] DEBUG:  got WAL segment from archive
> 2021-10-28 06:51:07.107 MDT [7556] LOG:  consistent recovery state reached at 
> 419/5AB8
> 2021-10-28 06:51:07.107 MDT [7556] DEBUG:  switched WAL source from archive 
> to stream after failure
> 2021-10-28 06:51:07.109 MDT [7844] LOG:  database system is ready to accept 
> read only connections
> 2021-10-28 06:51:07.152 MDT [7844] DEBUG:  forked new backend, pid=6900 
> socket=6068
> 
> I set the log level to debug5 and here is what I got:
> 
> 2021-10-28 06:25:41.262 MDT [6288] CONTEXT:  WAL redo at 419/5ABFFF60 for 
> Btree/INSERT_LEAF: off 130
> 2021-10-28 06:25:41.262 MDT [6288] DEBUG:  record known xact 33776257 
> latestObservedXid 33776257
> 2021-10-28 06:25:41.262 MDT [6288] CONTEXT:  WAL redo at 419/5ABFFFA0 for 
> Heap/INSERT: off 95 flags 0x00
> 2021-10-28 06:25:41.262 MDT [6288] LOG:  consistent recovery state reached at 
> 419/5AB8
> 2021-10-28 06:25:41.263 MDT [6288] DEBUG:  switched WAL source from archive 
> to stream after failure
> 2021-10-28 06:25:41.264 MDT [5512] LOG:  database system is ready to accept 
> read only connections
> 
> Does the "switched WAL source from archive to stream after failure" indicate 
> a problem with the WAL file?

Actually the message says that some trouble happened while reading the
file, but that always happens at the end of WAL.  We might need a bit
more detail about the trouble in higher debug-level messages.

> Anyway, it looks like I need to restore the standby server from a new backup. 
>  Thank you both for your help.

Unfortunately I think so, too.

> Sincerely,
> -Les

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: ssl_crl_file Certificate Revocation List doesn't work for postgresql 11

2021-11-30 Thread Kyotaro Horiguchi
At Tue, 30 Nov 2021 21:53:06 +0800, Yi Sun  wrote in 
> # cat /home/sunyi/tls/root.crt /home/sunyi/tls/1/root.crl > /tmp/test_1.pem
> # openssl verify -extended_crl -verbose -CAfile /tmp/test_1.pem -crl_check
> /home/sunyi/tls/1/server.crt

I guess what you really wanted to revoke was not server.crt but
postgresql.crt.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: ssl_crl_file Certificate Revocation List doesn't work for postgresql 11

2021-12-01 Thread Kyotaro Horiguchi
At Wed, 1 Dec 2021 16:56:11 +0800, Yi Sun  wrote in 
> We want to revoke server certificate, just don't know why doesn't take
> affect
> https://www.postgresql.org/docs/11/ssl-tcp.html
> https://www.postgresql.org/docs/11/runtime-config-connection.html#GUC-SSL-CRL-FILE

Understood.  ~/.postgresql/root.crl is required to check server
revocation.

https://www.postgresql.org/docs/11/libpq-ssl.html

> To allow server certificate verification, one or more root
> certificates must be placed in the file ~/.postgresql/root.crt in the
> user's home directory. (On Microsoft Windows the file is named
> %APPDATA%\postgresql\root.crt.) Intermediate certificates should also
> be added to the file if they are needed to link the certificate chain
> sent by the server to the root certificates stored on the client.
> 
> Certificate Revocation List (CRL) entries are also checked if the file
> ~/.postgresql/root.crl exists (%APPDATA%\postgresql\root.crl on
> Microsoft Windows).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: ssl_crl_file Certificate Revocation List doesn't work for postgresql 11

2021-12-01 Thread Kyotaro Horiguchi
Hi.

At Thu, 2 Dec 2021 11:31:26 +0800, Yi Sun  wrote in 
> Hi Kyotaro
> 
> From the description, seems  ~/.postgresql/root.crl  is store client
> revoked certificate

No.  Revocation is checked by the peer. There's no point for a server
to check for revocation of its own certificate, and indeed that doesn't
happen. Revocation of a client certificate is checked on the server
side, referencing server.crl. Revocation of a server certificate is
checked on the client side, referencing postgresql.crl. For example,
some web browsers make use of CRLs of web *servers*, which are
automatically maintained in the background.

You will see it work if you copy server.crl to
~/.postgresql/root.crl on the client.  (I spelled this wrongly in the
previous message..)
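
For example, a hedged sketch of a client connection that then checks
the server certificate against that CRL (host and file names are
illustrative):

psql "host=dbhost dbname=postgres sslmode=verify-ca sslrootcert=root.crt sslcrl=root.crl"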

> https://www.postgresql.org/docs/11/libpq-ssl.html
> ~/.postgresql/root.crl certificates revoked by certificate authorities server
> certificate must not be on this list
> Just don't know why server parameter ssl_crl_file parameter configured but
> don't take affect

As explained above, it is because the CRL specified by ssl_crl_file
can only be used to verify client certificates.

> https://www.postgresql.org/docs/11/runtime-config-connection.html#GUC-SSL-CRL-FILE
> 
> ssl_crl_file (string)
> 
> Specifies the name of the file containing the SSL server certificate
> revocation list (CRL). Relative paths are relative to the data directory.
> This parameter can only be set in the postgresql.conf file or on the server
> command line. The default is empty, meaning no CRL file is loaded.

Ah, the "server" in "SSL server certificate revocation list" looks
like a noise word, rather misleading, or plain wrong, I'm not sure
which one it actually is.


Anyway, I propose rephrasing it as "SSL client certificate
revocation list", as attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ab617c7b86..4ac617615c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1248,7 +1248,7 @@ include_dir 'conf.d'
   
   

-Specifies the name of the file containing the SSL server certificate
+Specifies the name of the file containing the SSL client certificate
 revocation list (CRL).
 Relative paths are relative to the data directory.
 This parameter can only be set in the postgresql.conf
@@ -1267,7 +1267,7 @@ include_dir 'conf.d'
   
   

-Specifies the name of the directory containing the SSL server
+Specifies the name of the directory containing the SSL client
 certificate revocation list (CRL).  Relative paths are relative to the
 data directory.  This parameter can only be set in
 the postgresql.conf file or on the server command
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index c17d33a54f..eb3a0c6b55 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1742,11 +1742,10 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   sslcrl
   

-This parameter specifies the file name of the SSL certificate
+This parameter specifies the file name of the SSL server certificate
 revocation list (CRL).  Certificates listed in this file, if it
-exists, will be rejected while attempting to authenticate the
-server's certificate.  If neither
- nor
+exists, will be rejected while attempting to authenticate the server's
+certificate.  If neither  nor
  is set, this setting is
 taken as
 ~/.postgresql/root.crl.
@@ -1758,9 +1757,9 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   sslcrldir
   

-This parameter specifies the directory name of the SSL certificate
-revocation list (CRL).  Certificates listed in the files in this
-directory, if it exists, will be rejected while attempting to
+This parameter specifies the directory name of the SSL server
+certificate revocation list (CRL).  Certificates listed in the files
+in this directory, if it exists, will be rejected while attempting to
 authenticate the server's certificate.

 


Re: ssl_crl_file Certificate Revocation List doesn't work for postgresql 11

2021-12-05 Thread Kyotaro Horiguchi
At Fri, 3 Dec 2021 15:41:51 +0800, Yi Sun  wrote in 
> Hi Kyotaro,
> 
> Thank you for your explanation, after putting the crl file to client, it
> works now, thanks.

Good to hear that. That portion of the documentation has been fixed in
the repository, and the fix will appear in the next minor releases.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: One Partition by list is always chosen by planner

2017-11-21 Thread Kyotaro HORIGUCHI
Hello,

It seems to me that the type of category is not text and the
right side in the query is actually cast to text.

At Tue, 21 Nov 2017 12:23:02 -0700 (MST), legrand legrand 
 wrote in <1511292182893-0.p...@n3.nabble.com>
> and if I explain the same query with an other filter
> 
> explain select * from wiki_data_part where category='fr'
> 
> | Append  (cost=0.00..14010.76 rows=291609 width=48) 
> |   ->  Seq Scan on wiki_data_part_f  (cost=0.00..9975.04 rows=291339
> width=48)
> | Filter: ((category)::text = 'fr'::text)
> |   ->  Seq Scan on wiki_data_part_s  (cost=0.00..4035.72 rows=270 width=50)
> | Filter: ((category)::text = 'fr'::text)
> 
> wiki_data_part_s is always chosen in the plan


> | Filter: ((category)::text = 'fr'::text)

"::text" on the left side hinders constraint-exclusion.

Usually the left side won't get casted unless the right side is
explicitly casted. Otherwise sometimes the left side gets casted
so that it can be compared with the right side.


A self-contained example is shown below.

=
create table p1 (cat char(1), b int) partition by list(cat);
create table c11 partition of p1 for values in ('a', 'b', 'c');
create table c12 partition of p1 for values in ('x', 'y', 'z');
explain select * from p1 where cat = 'b';
|  QUERY PLAN 
| 
|  Append  (cost=0.00..35.50 rows=10 width=12)
|->  Seq Scan on c11  (cost=0.00..35.50 rows=10 width=12)
|  Filter: (cat = 'b'::bpchar)
| (3 rows)
explain select * from p1 where cat = 'b'::text;
|  QUERY PLAN 
| 
|  Append  (cost=0.00..81.20 rows=20 width=12)
|->  Seq Scan on c11  (cost=0.00..40.60 rows=10 width=12)
|  Filter: ((cat)::text = 'b'::text)
|->  Seq Scan on c12  (cost=0.00..40.60 rows=10 width=12)
|  Filter: ((cat)::text = 'b'::text)
| (5 rows)
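
In the original case, explicitly casting the constant to the column's
own type (rather than text) should let constraint exclusion prune
again; with the toy tables above:

explain select * from p1 where cat = 'b'::char(1);
-- yields the same single-partition plan as the uncast case.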


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center