Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

2019-02-17 Thread Thomas Munro
On Sun, Feb 17, 2019 at 4:56 AM Tom Lane  wrote:
> Thomas Munro  writes:
> >>> Really?  The specification says that it starts I/O, not that it waits
> >>> around for any to finish.
>
> > Right, there was some discussion of that, and I didn't (and still
> > don't) think it'd be wise to rely on undocumented knowledge about
> > which flags can eat errors based on a drive-by reading of a particular
> > snapshot of the Linux tree.  The man page says it can return EIO; I
> > think we should assume that it might actually do that.
>
> I had a thought about this: maybe we should restrict the scope of this
> behavior to be "panic on EIO", not "panic on anything within hailing
> distance of fsync".
>
> The direction you and Andres seem to want to go in is to add a pile of
> unprincipled exception cases, which seems like a recipe for constant
> pain to me.  I think we might be better off with a whitelist of errnos
> that mean trouble, instead of a blacklist of some that don't.  I'm
> especially troubled by the idea that blacklisting some errnos might
> reduce to ignoring them completely, which would be a step backwards
> from our pre-PANIC behavior.

Hmm.  Well, at least ENOSPC should be treated the same way as EIO.
Here's an experiment that seems to confirm some speculations about NFS
on Linux from the earlier threads:

$ uname -a
Linux debian 4.18.0-3-amd64 #1 SMP Debian 4.18.20-2 (2018-11-23)
x86_64 GNU/Linux
$ dpkg -l nfs-kernel-server | tail -1
ii  nfs-kernel-server 1:1.3.4-2.4  amd64support for NFS kernel server

First, set up a 10MB loop-back filesystem:

$ dd if=/dev/zero of=/tmp/10mb.loopback bs=1024 count=10240
$ sudo losetup /dev/loop0 /tmp/10mb.loopback
$ sudo mkfs -t ext3 -m 1 -v /dev/loop0
...
$ sudo mkdir /mnt/test_loopback
$ sudo mount -t ext3 /dev/loop0 /mnt/test_loopback

Then, export that via NFS:

$ tail -1 /etc/exports
/mnt/test_loopback localhost(rw,sync,no_subtree_check)
$ sudo exportfs -av
exporting localhost:/mnt/test_loopback

Next, mount that over NFS:

$ sudo mkdir /mnt/test_loopback_remote
$ sudo mount localhost:/mnt/test_loopback /mnt/test_loopback_remote

Now, fill up the whole disk with a file full of newlines:

$ sudo mkdir /mnt/test_loopback/dir
$ sudo chown $USER:$USER /mnt/test_loopback/dir
$ tr "\000" "\n" < /dev/zero > /mnt/test_loopback_remote/dir/file
tr: write error: No space left on device
tr: write error
$ df -h /mnt/test_loopback*
FilesystemSize  Used Avail Use% Mounted on
/dev/loop08.5M  8.4M 0 100% /mnt/test_loopback
localhost:/mnt/test_loopback  8.5M  8.4M 0 100% /mnt/test_loopback_remote

Now, run a program that appends a greeting and then calls fsync() twice:

$ cat test.c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int fd, rc;

    /* append a greeting to the file that filled the disk */
    fd = open("/mnt/test_loopback_remote/dir/file", O_RDWR | O_APPEND);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    rc = write(fd, "hello world\n", 12);
    if (rc < 0)
        perror("write");
    else if (rc < 12)
        fprintf(stderr, "only managed to write %d bytes\n", rc);
    /* fsync twice, to see whether an error is reported more than once */
    rc = fsync(fd);
    if (rc < 0)
        perror("fsync 1");
    rc = fsync(fd);
    if (rc < 0)
        perror("fsync 2");
    rc = close(fd);
    if (rc < 0)
        perror("close");

    return 0;
}
$ cc test.c
$ ./a.out
fsync 1: No space left on device
$

The write() and the second fsync() reported success.  Great, let's go
and look at our precious data, both through NFS and locally:

$ tail -3 /mnt/test_loopback_remote/dir/file



$ tail -3 /mnt/test_loopback/dir/file



$

It's gone.  If you try it again with a file containing just a few
newlines, so that there is free space, it works correctly and you see
the appended greeting.  Perhaps the same sort of thing might happen with
remote EDQUOT, but I haven't tried that.  Perhaps there are some
settings that could be tuned to avoid this?

(Some speculation about NFS:  To avoid data-loss from running out of
disk space, I think PostgreSQL requires either a filesystem that
reserves space when we're extending a file, so that we can exclude the
possibility of ENOSPC before we evict data from our own shared
buffers, or a page cache that doesn't drop dirty flags or whole
buffers on failure so we can meaningfully retry once space becomes
available.  As far as I know, the former would be theoretically
possible with NFS, if the client and server are using NFSv4.2+ with
ALLOCATE support and glibc and kernel versions both support true
fallocate() and pass it all the way through, but current versions
either don't support fallocate() on NFS files at all (this 4.18 kernel
doesn't) or sometimes emulate it by writing zeroes, which is useless
for remote space reservation purposes and (according to some sources I
found) there is currently no reliable way to find out about that
through libc.  If that situation i

Re: FDW, too long to run explain

2019-02-17 Thread auxsvr
Related to this question:

PostgreSQL cursors are, in most cases I've tried, extremely slow; in my
experience the cause is as described in my previous answer. Is there any
plan or way to improve this situation? For example, for FDW one would
expect the plan on the remote side to be similar, if not identical, to the
local one, with the exception of the setup cost.
-- 
Regards,
Peter





Re: [External] Re: FDW, too long to run explain

2019-02-17 Thread Vijaykumar Jain
I am yet to figure out the reason. What we have done is implement fake
columns to represent samples, giving them random numbers, and using them
to fake a limit.

Most of the queries that were impacted were the ones that did not push
order by and limit to the foreign servers.
I am also trying to upgrade to pg11 to make use of parallelisation.
For now I am making use of a materialised view on each shard and using
predicates that get pushed down directly to ensure a simple plan is
created. There is a compromise, but this is what is reasonable for now.
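
A rough sketch of the trick, with hypothetical names (a persisted random
"sample" column yields a simple predicate that postgres_fdw can push down,
standing in for the LIMIT that does not get pushed):

-- on the shard (remote) side:
ALTER TABLE big_table
    ADD COLUMN sample_bucket int DEFAULT floor(random() * 100)::int;
-- on the coordinator, against the foreign table; the predicate ships:
SELECT col1, col2 FROM big_table_remote WHERE sample_bucket = 42;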

On Sun, 17 Feb 2019 at 4:27 PM auxsvr  wrote:

> Related to this question:
>
> Postgresql cursors are in most cases I've tried extremely slow. The cause
> is as described in my previous answer, in my experience. Is there any plan
> or way to improve this situation? For example, for FDW one would expect the
> plan on the remote side to be similar, if not identical, to the one
> locally, with the exception of the setup cost.
> --
> Regards,
> Peter
>
>
>
> --

Regards,
Vijay


Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

2019-02-17 Thread Ravi Krishna

If this one appears in the list, then it means the problem is with AOL.




Re: Postgresql RDS DB Latency Chossing Hash join Plan

2019-02-17 Thread github kran
On Thu, Feb 14, 2019 at 4:58 PM Michael Lewis  wrote:

> This is beyond my expertise except to say that if your storage is SSDs in
> AWS, then you definitely want random_page_cost close to the same as
> seq_page_cost (1 by default) assuming your data is likely to be in cache as
> discussed in the documentation
> .
> As it says- "Reducing this value relative to seq_page_cost will cause the
> system to prefer index scans" as you saw. Changing the value on production
> would again depend on the storage type used, and how good the cache hit
> rate is.
>
> As far as I know, dropping old partitions should not be significantly
> impactful to the system other than no longer needing to store that data
> (cost, time for full backups, etc).
>
> Again, as I understand things, there is not a big impact from having old
> unused tables in terms of maintenance. They should be ignored by normal
> processes.
>
> Glad you got your issue resolved.
>
>
> *Michael Lewis*
>

Thanks for the feedback. You have been giving your thoughts/suggestions
since the beginning of the case, and it was helpful.  I think I realized
later, based on your suggestion to increase the default statistics target
from 100, that it was not set correctly initially, as I had set it at the
session level without setting it on the partition tables. As a next step I
have set the statistics target to 1000 on all of the partition tables we
use from Nov 2018 to Feb 2019 and then ran ANALYZE. I am currently running
a load test to see how the DB performance is behaving right now, and so
far it is running better than before. (I have reset all the previous
changes except the statistics change.)  I will keep you posted after the
test finishes.
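
For reference, the per-column form is the one that persists across ANALYZE
runs (the table and column names below are placeholders for our partition
tables):

ALTER TABLE events_2018_w48 ALTER COLUMN event_ts SET STATISTICS 1000;
ANALYZE events_2018_w48;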

Questions.
1) Can I further increase the setting to 3000 and see how the system
behaves? How do I know the best value for my database in terms of the
sampling limit with the default statistics setting?
2) Apart from analyzing the tables, do I need to make any other changes
related to the statistics setting?
3) Also, the current work_mem is set to 4 MB and we didn't play with this
value so far. For future needs, can I increase the work_mem setting?

Appreciate your reply.

Thanks


> On Thu, Feb 14, 2019 at 3:11 PM github kran  wrote:
>
>>
>>
>> On Thu, Feb 14, 2019 at 12:43 PM Michael Lewis 
>> wrote:
>>
>>> How many total rows in these tables? I am assuming these are partitions
>>> and those 100 million rows got distributed. If the data difference is
>>> significant, then you aren't guaranteed similar performance. You may want
>>> to follow more of the suggested steps on.
>>>
>>> https://wiki.postgresql.org/wiki/SlowQueryQuestions
>>>
>>>
>>> *Michael Lewis  |  Software Engineer*
>>> *Entrata*
>>> *c: **619.370.8697 <619-370-8697>*
>>>
>>> Michael - Yes correct the data of 100 million rows is distributed to all
>> the partitions.
>> Finally I feel we have come to a conclusion after we changed the
>> random_page_cost from 4 to 2 in the non-prod instance, and we see improvements
>> in the query, which used to take 1 minute and now takes 1-2 seconds.
>> That's correct we have around 490 million rows in few of our partition
>> tables. The partition tables are created for every 7 days.
>>
>> We ran our API test which hits the backend database Aurora RDS PostgreSQL
>> and see our query response times , requests/sec are better than before. Do
>> you recommend this setting on a Production instance? Right now we are
>> planning to go
>> implement this option of random_page_cost to 2. ( We know it also works
>> if we turn off the hash_join but we dont want to implement this change but
>> rather use random_page_cost to 2).
>>
>> Questions.
>> 1) What is your recommendation on this ? Can we modify this change on
>> Prod instance which is performing better today or only keep this change to
>> Non prod instance ?. ( Actually we want to implement this change on Non
>> Prod instance)
>> 2) Are there are any downside in changing this value ?.
>> 3) Also another question we have a 64TB storage and running on r4 16 x
>> large EC2 instance. Few of our partitions are bigger and few of them are
>> smaller. We have data from last  9 months and we are planning to keep the
>> data for about
>> close to 1 year till May. Do you see any concerns on this ?. Eventually
>> we are thinking to archive this data in next 2 months by dropping of older
>> partitions.
>> 3) What could be the problems of keeping the data longer if there is a
>> regular maintenance like  VACUUM and other maintenace activities
>>
>>
>>
>>>
>>> On Thu, Feb 14, 2019 at 8:48 AM github kran 
>>> wrote:
>>>


 On Wed, Feb 13, 2019 at 11:38 AM Michael Lewis 
 wrote:

> I didn't see your email yesterday, sorry about that. Index scans
> instead of sequential scans and nested loop instead of hash join means 
> that
> you have bad row count estimates on "Non prod Aurora R

Re: FDW, too long to run explain

2019-02-17 Thread Jeff Janes
On Mon, Feb 4, 2019 at 2:15 AM Vijaykumar Jain  wrote:

>
> now we have some long-running queries via FDW that take minutes and get
> killed; explain runs as idle in transaction on the remote servers.
>

Are you saying the EXPLAIN itself gets killed, or execution of the plan
generated based on the EXPLAIN (issued under use_remote_estimate = true)
gets killed?  Who is doing the killing, the local side or the foreign
side?  Can you include verbatim log entries for this?


> now the explain for
> select * from sh01.view1  keeps running for minutes sometimes,
>
> then fetch too keeps running for minutes, although the total rows are <
> 1 maybe.
> idle in transaction | FETCH 1 FROM c1
>

What is this?  Is it from some monitoring tool, or pg_stat_activity, or
what?  And is it on the local side or the foreign side?


> other questions:
> also, what is the cost of fetch_size?
>

It will always fetch rows from the foreign server in chunks of this size.  A
larger fetch_size will have less network latency and computational overhead
if many rows are going to be consumed, but also consume more memory on the
local server as all rows are stored in memory per each chunk.  Also, in the
case of a LIMIT, it reads a large number of rows even if most of them may
be unneeded.  Conceptually, the LIMIT could be used to modify the FETCH
downward to match the LIMIT, but that is not implemented.  In the case of a
view over UNION ALL, I don't think the individual subqueries even know what
the global LIMIT is.
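
For what it's worth, fetch_size is adjustable as a postgres_fdw option at
the server or table level, e.g. (names hypothetical):

ALTER SERVER shard01 OPTIONS (ADD fetch_size '1000');
ALTER FOREIGN TABLE sh01.view1 OPTIONS (ADD fetch_size '5000');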


>
> I mean given we have a query
>
> select * from foobar limit 1; via FDW
> limit 1 does not get pushed.
> so it seems all rows come to the FDW node and then the limit is applied?
>

It should not read all rows.  It should read as many multiples of
fetch_size as needed, which should just be 1 multiple in this case.

Cheers,

Jeff

>


Re: [External] Re: FDW, too long to run explain

2019-02-17 Thread Vijaykumar Jain
Assuming your questions are 1, 2, 3, please find my answers below.

1) "explain" on the foreign servers runs as "idle in transaction", because
they were running very long (in the order of some minutes); the pgbouncer
(transaction-level pooling) settings kill them (as the idle-in-transaction
time limit of 5 mins is exceeded), or else it results in too many
connections piling up.
2) yes, I get those from pg_stat_activity; it truncates the full statement,
but it shows up as * EXPLAIN select col1, col2  *  00:00:44 | idle in
transaction (this is just one of the screenshots I have). (on the foreign
side)
3) yes, I think we kind of understood that part (fetch and memory), but I
am not sure if that is used as any hint in plan generation too. I am sorry,
I did not put auto_explain on on the foreign servers, as that required a
restart of the server.

(This is the real content of the screenshot, yes, 13 mins), masking the
column name and view name:
20678 | 00:13:38.990025 | EXPLAIN SELECT cols from view | idle in
transaction

The explain analyze of the same query on the foreign server completes in
milliseconds.


I am sorry I am vague about the queries in the email. I cannot reproduce
it, as we do not have multiple shards of 500G in my QA environment, and I
cannot take a dump of prod to test it in our test env because of GDPR :)
But as I said in the mail, we were speculating that since the limit was
not pushed down, the plans may have been bad. We tricked the foreign
server by using a sample column to fake limit pushdown, and now have
improved response times. We made vacuum/analyze very aggressive to ensure
stats are never stale after large updates or deletes.

Unless someone else can reproduce it, I guess I'll close this mail. (I'll
try to reproduce it myself again, but for now I have too little data to
share to convince anyone that it happened.)




Regards,
Vijay


On Sun, Feb 17, 2019 at 11:11 PM Jeff Janes  wrote:

> On Mon, Feb 4, 2019 at 2:15 AM Vijaykumar Jain 
> wrote:
>
>>
>> now we have some long-running queries via FDW that take minutes and get
>> killed; explain runs as idle in transaction on the remote servers.
>>
>
> Are you saying the EXPLAIN itself gets killed, or execution of the plan
> generated based on the EXPLAIN (issued under use_remote_estimate = true)
> gets killed?  Who is doing the killing, the local side or the foreign
> side?  Can you include verbatim log entries for this?
>

explain on the foreign servers runs as "idle in transaction", because they
were running very long (in the order of some minutes); the pgbouncer
settings kill them (as the idle-in-transaction time limit of 5 mins is
exceeded), or else it results in too many connections piling up.


>
>> now the explain for
>> select * from sh01.view1  keeps running for minutes sometimes,
>>
>> then fetch too keeps running for minutes, although the total rows are <
>> 1 maybe.
>> idle in transaction | FETCH 1 FROM c1
>>
>
> What is this?  Is it from some monitoring tool, or pg_stat_activity, or
> what?  And is it on the local side or the foreign side?
>
Yes, pg_stat_activity; it truncates the full statement, but it shows up as
* EXPLAIN select col1, col2  *  00:00:44 | idle in transaction (this is
just one of the screenshots I have). (on the foreign side)


>
>
>> other questions:
>> also, what is the cost of fetch_size?
>>
>
> It will always fetch rows from the foreign server in chunks of this size.  A
> larger fetch_size will have less network latency and computational overhead
> if many rows are going to be consumed, but also consume more memory on the
> local server as all rows are stored in memory per each chunk.  Also, in the
> case of a LIMIT, it reads a large number of rows even if most of them may
> be unneeded.  Conceptually, the LIMIT could be used to modify the FETCH
> downward to match the LIMIT, but that is not implemented.  In the case of a
> view over UNION ALL, I don't think the individual subqueries even know what
> the global LIMIT is.
>
Yep, I guess that is where I think the plan may have

>
>
>>
>> I mean given we have a query
>>
>> select * from foobar limit 1; via FDW
>> limit 1 does not get pushed.
>> so it seems all rows come to the FDW node and then the limit is applied?
>>
>
> It should not read all rows.  It should read as many multiples of
> fetch_size as needed, which should just be 1 multiple in this case.
>
Yep, I think we kind of understood that part, but I am not sure if that is
used to generate the plan too. I am sorry, I did not put auto_explain on
on the foreign servers, as that required a restart of the server.

>
> Cheers,
>
> Jeff
>
>>


Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

2019-02-17 Thread Andres Freund
Hi,

On 2019-02-17 23:29:09 +1300, Thomas Munro wrote:
> Hmm.  Well, at least ENOSPC should be treated the same way as EIO.
> Here's an experiment that seems to confirm some speculations about NFS
> on Linux from the earlier threads:

I wish we had a good way to have test scripts in the tree for something
like that, but using postgres. Unfortunately it's not easy to write
portable setup scripts for it.


> So far I still think that we should panic if fsync() returns any error
> number at all.  For sync_file_range(), it sounds like maybe you think
> we should leave the warning-spewing in there for ENOSYS, to do exactly
> what we did before on principle since that's what back-branches are
> all about?  Something like:
> 
>   ereport(errno == ENOSYS ? WARNING : data_sync_elevel(WARNING),
> 
> Perhaps for master we could skip it completely, or somehow warn just
> once, and/or switch to one of our other implementations at runtime?  I
> don't really have a strong view on that, not being a user of that
> system.  Will they ever implement it?  Are there other systems we care
> about that don't implement it?  (Android?)

I'm not sure I see much need for leaving the warning in out of
principle. Feels like we ought to call sync_file_range() once at postmaster
startup and then just force-disable the flush GUCs if it's not available?

Greetings,

Andres Freund



Re: FDW, too long to run explain

2019-02-17 Thread Jeff Janes
On Sun, Feb 17, 2019 at 12:41 PM Jeff Janes  wrote:

> On Mon, Feb 4, 2019 at 2:15 AM Vijaykumar Jain 
> wrote:
>
>>
>> now we have some long-running queries via FDW that take minutes and get
>> killed; explain runs as idle in transaction on the remote servers.
>>
>
> Are you saying the EXPLAIN itself gets killed, or execution of the plan
> generated based on the EXPLAIN (issued under use_remote_estimate = true)
> gets killed?  Who is doing the killing, the local side or the foreign
> side?  Can you include verbatim log entries for this?
>

After thinking about it a bit more, I think I see the issue here.  The
EXPLAIN pursuant to use_remote_estimate is issued in the same remote
transaction as the following DECLARE and FETCH's are.  But after the
EXPLAIN is issued, the local server executes the query for a different FDW
to satisfy some other branch of the UNION ALL, giving the first FDW
connection time to do an idle-in-transaction timeout.  This happens even if
no rows need to fetched from that FDW, because another branch of the UNION
ALL satisfied the LIMIT.

A question for the PostgreSQL hackers would be, Is it necessary and
desirable that the EXPLAIN be issued in the same transaction as the
eventual DECLARE and FETCHes?  I don't think it is.  I guess if the foreign
side table definition got changed between EXPLAIN and DECLARE it would
cause problems, but changing the foreign side definition out of sync with
the local side can cause problems anyway, so is that important to preserve?

Changing that might narrow but not completely fix the problem, as there
might still be delays between the DECLARE and the FETCH or between
successive FETCHes.

So a question for you would be, why do you have such an aggressive setting
for idle_in_transaction_session_timeout that it causes this to happen?
Couldn't you relax it, perhaps just for the role used for the FDW
connections?
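
For example, something like this (role name hypothetical) scopes a more
relaxed timeout to just the role used for the FDW connections:

ALTER ROLE fdw_user SET idle_in_transaction_session_timeout = '15min';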

Cheers,

Jeff

>


Re: FDW, too long to run explain

2019-02-17 Thread Tom Lane
Jeff Janes  writes:
> A question for the PostgreSQL hackers would be, Is it necessary and
> desirable that the EXPLAIN be issued in the same transaction as the
> eventual DECLARE and FETCHes?  I don't think it is.

It seems like a good idea to me.  I certainly don't think "I've got
an idle-in-transaction timeout on the remote that's shorter than my
local transaction runtime" is a plausible argument for changing that.
You could trip over that with a slow query regardless of whether we
separated the EXPLAIN step, just because there's no guarantee how
often we'll ask the FDW to fetch some rows.

> I guess if the foreign
> side table definition got changed between EXPLAIN and DECLARE it would
> cause problems, but changing the foreign side definition out of sync with
> the local side can cause problems anyway, so is that important to preserve?

I believe that the EXPLAIN will leave the remote transaction holding
AccessShareLock on the query's tables, meaning that doing it in one
transaction provides some positive protection against such problems,
which we'd lose if we changed this.
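
That is easy to check on the remote side while the local transaction is
open, e.g. (the pid here is a stand-in for the remote backend serving the
FDW connection):

SELECT locktype, relation::regclass, mode, granted
FROM pg_locks
WHERE pid = 12345;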

regards, tom lane



Re: [External] Re: FDW, too long to run explain

2019-02-17 Thread Jeff Janes
On Sun, Feb 17, 2019 at 1:52 PM Vijaykumar Jain  wrote:

> Assuming your questions are 1, 2, 3, please find my answers below.
>
> 1) "explain" on the foreign servers runs as "idle in transaction", because
> they were running very long (in the order of some minutes); the pgbouncer
> (transaction-level pooling) settings kill them (as the idle-in-transaction
> time limit of 5 mins is exceeded), or else it results in too many
> connections piling up.
> 2) yes, I get those from pg_stat_activity; it truncates the full statement,
> but it shows up as * EXPLAIN select col1, col2  *  00:00:44 | idle in
> transaction (this is just one of the screenshots I have). (on the foreign
> side)
>

You are misinterpreting that data.  The EXPLAIN is not currently running.
It is the last statement that was running prior to the connection going
idle-in-transaction.  See my previous email--I think the reason it is
idle is that the local side is servicing some other part of the query (probably
on a different FDW), and that is taking a long time.

Are all the connections piling up from postgres_fdw, or are many of them
from other applications?  I think your timeout is just shifting symptoms
around without fixing the underlying problem, while also making that
underlying problem hard to diagnose.



> 3) yes, I think we kind of understood that part (fetch and memory), but I
> am not sure if that is used as any hint in plan generation too.
>

The query is planned as part of a cursor.  As such, it will
use cursor_tuple_fraction as the "hint".  Perhaps you could tweak this
parameter on the foreign side.  I think that a low setting for this
parameter should give similar plans as a small LIMIT would give you, while
large settings would give the same plans as a large (or no) LIMIT would.
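
For instance, on the foreign side (the default is 0.1; lower values favor
fast-start plans):

SET cursor_tuple_fraction = 0.01;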

I think postgres_fdw should pass down the LIMIT when it can do so, but it
doesn't currently.

Cheers,

Jeff

>


Re: [External] Re: FDW, too long to run explain

2019-02-17 Thread Vijaykumar Jain
Hey Jeff,

yes, we have now relaxed the idle-in-transaction setting to 15 mins.

I was hesitant to increase the setting as it blocked autovacuum. We also
use hot_standby_feedback = true, as we split reads/writes and allow
long-running queries on read replicas; this too affects autovacuum.
So overall, all the options I set to ensure autovacuum gets triggered are
impacted by the increased idle-in-tx setting and hot_standby_feedback =
true, both of which seem to be necessary for the setup now.

we have been trying to make sharding work using a multi-coordinator FDW
setup on our own (and have been successful, although with hiccups), using
directory-based sharding in pg10. (if we cannot handle growth, everything
goes to mongo for its automatic sharding and failover)

I have to admit we can do better here though. We need to rebalance the
data in the shards when we come close to 90% disk; those are long
delete/upsert queries. We have very aggressive autovacuum to ensure we do
not have a lot of stale stats.
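
Roughly, "very aggressive" here means per-table settings along these lines
(the numbers and table name are illustrative only):

ALTER TABLE shard_events SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);
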
I have plans to rearchitect the whole setup with pg11, where we plan to
introduce time-based sharding, then partition the tables in each shard
further by time, and also use materialized views for day-old data with
pre-aggregated fields on each shard, so that explain does not have to work
too hard :)

and then create foreign tables and attach them as partitions, similar to
https://github.com/MasahikoSawada/pgconf-asia-demo/tree/c47e25bf589c7d401c9d342329b400ec26eb61db
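
The attach step would look roughly like this (hypothetical names; foreign
tables can be partitions as of pg10):

CREATE FOREIGN TABLE orders_2019_02
    PARTITION OF orders
    FOR VALUES FROM ('2019-02-01') TO ('2019-03-01')
    SERVER shard02
    OPTIONS (table_name 'orders_2019_02');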

I guess I am diverting from the query, but just saying :)
Thanks for the suggestions and help, Jeff. Appreciate it.

Regards,
Vijay


On Mon, Feb 18, 2019 at 12:39 AM Tom Lane  wrote:

> Jeff Janes  writes:
> > A question for the PostgreSQL hackers would be, Is it necessary and
> > desirable that the EXPLAIN be issued in the same transaction as the
> > eventual DECLARE and FETCHes?  I don't think it is.
>
> It seems like a good idea to me.  I certainly don't think "I've got
> an idle-in-transaction timeout on the remote that's shorter than my
> local transaction runtime" is a plausible argument for changing that.
> You could trip over that with a slow query regardless of whether we
> separated the EXPLAIN step, just because there's no guarantee how
> often we'll ask the FDW to fetch some rows.
>
> > I guess if the foreign
> > side table definition got changed between EXPLAIN and DECLARE it would
> > cause problems, but changing the foreign side definition out of sync with
> > the local side can cause problems anyway, so is that important to
> preserve?
>
> I believe that the EXPLAIN will leave the remote transaction holding
> AccessShareLock on the query's tables, meaning that doing it in one
> transaction provides some positive protection against such problems,
> which we'd lose if we changed this.
>
> regards, tom lane
>


Re: [External] Re: FDW, too long to run explain

2019-02-17 Thread Vijaykumar Jain
Regards,
Vijay


On Mon, Feb 18, 2019 at 12:56 AM Jeff Janes  wrote:

> On Sun, Feb 17, 2019 at 1:52 PM Vijaykumar Jain 
> wrote:
>
>> Assuming your questions are 1, 2, 3, please find my answers below.
>>
>> 1) "explain" on the foreign servers runs as "idle in transaction", because
>> they were running very long (in the order of some minutes); the pgbouncer
>> (transaction-level pooling) settings kill them (as the idle-in-transaction
>> time limit of 5 mins is exceeded), or else it results in too many
>> connections piling up.
>> 2) yes, I get those from pg_stat_activity; it truncates the full
>> statement, but it shows up as * EXPLAIN select col1, col2  *  00:00:44
>> | idle in transaction (this is just one of the screenshots I have). (on the
>> foreign side)
>>
>
> You are misinterpreting that data.  The EXPLAIN is not currently running.
> It is the last statement that was running prior to the connection going
> idle-in-transaction.  See my previous email--I think the reason it is
> idle is that the local side is servicing some other part of the query (probably
> on a different FDW), and that is taking a long time.
>
Ok, I raked this from the logs where we enabled log_min_duration_statement =
10s

2019-01-31 12:48:18 UTC LOG:  duration: 29863.311 ms  statement: EXPLAIN
SELECT blah, FROM public.view WHERE ((scheduled_bdt >= '2019-01-20'::date))
AND ((scheduled_bdt <= '2019-01-26'::date)) AND ((somekey = ANY
('{269029,123399,263164,261487}'::bigint[])))   (both the columns are
indexed)

>
> Are all the connections piling up from postgres_fdw, or are many of them
> from other applications?  I think your timeout is just shifting symptoms
> around without fixing the underlying problem, while also making that
> underlying problem hard to diagnose.
>
same application, but when more than one person is using the analytical
tool that runs the underlying query.

>
>
>
>> 3) yes, I think we kind of understood that part (fetch and memory), but I
>> am not sure if that is used as any hint in plan generation too.
>>
>
> The query is planned as part of a cursor.  As such, it will
> use cursor_tuple_fraction as the "hint".  Perhaps you could tweak this
> parameter on the foreign side.  I think that a low setting for this
> parameter should give similar plans as a small LIMIT would give you, while
> large settings would give the same plans as a large (or no) LIMIT would.
>
> I think postgres_fdw should pass down the LIMIT when it can do so, but it
> doesn't currently.
>
As I already said, we have overcome the limit issue with a fake sample
column in the huge tables. That way we limit the number of rows on the
foreign server itself before the fetch. This is not the best and has its
edge cases, but yeah, it works for now.


>
> Cheers,
>
> Jeff
>
>>


Re: FDW, too long to run explain

2019-02-17 Thread auxsvr
On Sunday, 17 February 2019 20:58:47 EET Jeff Janes wrote:
> A question for the PostgreSQL hackers would be, Is it necessary and
> desirable that the EXPLAIN be issued in the same transaction as the
> eventual DECLARE and FETCHes?  I don't think it is.  I guess if the foreign
> side table definition got changed between EXPLAIN and DECLARE it would
> cause problems, but changing the foreign side definition out of sync with
> the local side can cause problems anyway, so is that important to preserve?

Won't separate transactions cause issues if the statistics of the table change 
in the meantime in a way that affects the plan?

> Cheers,
> 
> Jeff
-- 
Regards,
Peter





Re: [External] Re: FDW, too long to run explain

2019-02-17 Thread Jeff Janes
On Sun, Feb 17, 2019 at 2:37 PM Vijaykumar Jain  wrote:

>
> Ok, I raked this from the logs where we enabled log_min_duration_statement =
> 10s
>
> 2019-01-31 12:48:18 UTC LOG:  duration: 29863.311 ms  statement: EXPLAIN
> SELECT blah, FROM public.view WHERE ((scheduled_bdt >= '2019-01-20'::date))
> AND ((scheduled_bdt <= '2019-01-26'::date)) AND ((somekey = ANY
> ('{269029,123399,263164,261487}'::bigint[])))   (both the columns are
> indexed)
>

That is interesting.  Was that in the logs for the local or the foreign
side?  And is it common, or rare?

If on the local side, could it be that the EXPLAINs sent to the foreign
side are being made to wait by the connection pooler, leading to long
delays?  If that is from the foreign side, then it should be conceptually
unrelated to FDW.  Any chance you could reproduce the slowness in your test
environment?  Slowness in the planner is probably related to the schema
structure, not the data itself.

I don't think this would be related to the idle-in-transaction, except that
one FDW connection may be idle-in-transaction after its EXPLAIN is done
because it is waiting for another FDW connection to slowly run its EXPLAIN.

Cheers,

Jeff

>


Re: [External] Re: FDW, too long to run explain

2019-02-17 Thread Vijaykumar Jain
Ok.
I’ll try to work on it this week and see if I am able to reproduce anything.

On Mon, 18 Feb 2019 at 2:30 AM Jeff Janes  wrote:

>
>
> On Sun, Feb 17, 2019 at 2:37 PM Vijaykumar Jain 
> wrote:
>
>>
>> Ok, I raked this from the logs where we enabled log_min_duration_statement =
>> 10s
>>
>> 2019-01-31 12:48:18 UTC LOG:  duration: 29863.311 ms  statement: EXPLAIN
>> SELECT blah, FROM public.view WHERE ((scheduled_bdt >= '2019-01-20'::date))
>> AND ((scheduled_bdt <= '2019-01-26'::date)) AND ((somekey = ANY
>> ('{269029,123399,263164,261487}'::bigint[])))   (both the columns are
>> indexed)
>>
>
> That is interesting.  Was that in the logs for the local or the foreign
> side?  And is it common, or rare?
>
> If on the local side, could it be that the EXPLAINs sent to the foreign
> side are being made to wait by the connection pooler, leading to long
> delays?  If that is from the foreign side, then it should be conceptually
> unrelated to FDW.  Any chance you could reproduce the slowness in your test
> environment?  Slowness in the planner is probably related to the schema
> structure, not the data itself.
>
> I don't think this would be related to the idle-in-transaction, except
> that one FDW connection may be idle-in-transaction after its EXPLAIN is done
> because it is waiting for another FDW connection to slowly run its EXPLAIN.
>
> Cheers,
>
> Jeff
>
>> --

Regards,
Vijay


Re: [External] Re: FDW, too long to run explain

2019-02-17 Thread Jeff Janes
On Sun, Feb 17, 2019 at 6:32 AM Vijaykumar Jain  wrote:

> I am yet to figure out the reason. What we have done is implement fake
> columns to represent samples, giving them random numbers, and using them
> to fake a limit.
>
> Most of the queries that were impacted were the ones that did not push
> order by and limit to foreign servers.
> I am also trying to upgrade pg11 to make use of parallelisation.
>

postgres_fdw operates through declared cursors, and declared cursors
inhibit parallel query.  This doesn't change in v11, see
https://www.postgresql.org/docs/11/when-can-parallel-query-be-used.html

I'm not aware of any other changes in v11 that are likely to help you out.

Cheers,

Jeff

>


pg_stat_statements doesn't track commit from pl/pgsql blocks

2019-02-17 Thread legrand legrand
Hello,

It seems that pgss doesn't track commit (or rollback) commands issued from
pl/pgsql blocks.
Using psql in version 11.1:

select pg_stat_statements_reset();
do $$ begin commit; end $$;
select calls,query from pg_stat_statements;

...

I don't know how difficult it would be to fix it,
but this could help in troubleshooting.

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Table Inheritance and Foreign Keys

2019-02-17 Thread Riaan Stander
Good day

I've got a scenario where I'm building a table structure that allows for
building a tree. In order to properly separate the different column
requirements I've looked at using table inheritance.

Example structure:

CREATE TABLE base_table
(
    id         bigint primary key,
    parent_id  bigint,
    some_common_columns ...
);

CREATE TABLE folder
(
    id  bigint primary key,
    some_folder_specific_columns ...
) INHERITS (base_table);

CREATE TABLE node_type_x
(
    id  bigint primary key,
    some_node_x_specific_columns ...
) INHERITS (base_table);

Now obviously when inserting folders and node type x records they would go
to their respective tables, but with the table inheritance I can query
base_table to get them all.

The rules state that folders can only be inserted under folders, but node
type x can go under folders or other node type x's.

Now if I want to add a foreign key to the base table from node type x I get
an error due to Postgres not supporting this scenario. The documentation
talks about this, but also states:

"These deficiencies will probably be fixed in some future release, but in
the meantime considerable care is needed in deciding whether inheritance is
useful for your application"

My question is whether anybody knows more as to if and when this will
actually be allowed/solved.

Regards

Riaan



Re: Table Inheritance and Foreign Keys

2019-02-17 Thread Tom Lane
"Riaan Stander"  writes:
> I've got a scenario where I'm building a table structure that allows for
> building a tree. In order to properly separate the different column
> requirements I've looked at using table inheritance.
> [ but can't point a foreign key at an inheritance tree ]
> My question is if anybody knows more as to if and when this actually will be
> allowed/solved.

Don't hold your breath :-(.  It's been like that for decades.

We do, as of v11, support foreign keys referencing partitioned tables,
but that relies on the ability to declare a unique constraint across
a partitioned table, which in turn relies on there being a known
partitioning key (which has to be part of the unique column set).
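
Illustratively (hypothetical tables), the unique constraint on a
partitioned table must include the partition key:

CREATE TABLE base_node (
    id    bigint NOT NULL,
    kind  text   NOT NULL,
    PRIMARY KEY (id, kind)
) PARTITION BY LIST (kind);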

Generic inheritance doesn't have that much semantic knowledge about
how the data is divvied up, so it's not really practical to support
these things in a generic inheritance tree.

Probably we should revise that bit of documentation to point people
at the partitioning features; I doubt anything's ever going to be
done about this for generic inheritance.

regards, tom lane



Re: Channel binding not supported using scram-sha-256 passwords

2019-02-17 Thread Michael Paquier
On Fri, Feb 15, 2019 at 04:18:40PM -0500, Hugh Ranalli wrote:
> I did see that. However, I'm not *trying* to use it. I set up accounts with
> scram-sha-256 passwords, and when trying to connect I get this message.
> Hence why I tried to disable it.

tls-server-end-point is implemented as the channel binding type, and the
only things which got removed are the connection parameter
scram_channel_binding and the channel binding type tls-unique.  So if
you use SSL then channel binding will be used.

On my side, if I connect to a server built with SSL and SCRAM then
channel binding is used and works.

Now, the error message "channel binding not supported by this build"
would show up from either the backend or the frontend if
X509_get_signature_nid() is not present in the version of OpenSSL that
your libpq (for the frontend) or your backend is linked to.
This function has been added in OpenSSL 1.0.2, so it seems to me that
you have an OpenSSL version mismatch between your client and the
server.  My guess is that the client uses OpenSSL 1.0.2, but the
server is linked to OpenSSL 1.0.1 or older.

(Note: I am not seeing anything bad in the code.)
--
Michael




Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

2019-02-17 Thread Michael Paquier
On Sun, Feb 17, 2019 at 10:54:54AM -0800, Andres Freund wrote:
> On 2019-02-17 23:29:09 +1300, Thomas Munro wrote:
>> Hmm.  Well, at least ENOSPC should be treated the same way as EIO.
>> Here's an experiment that seems to confirm some speculations about NFS
>> on Linux from the earlier threads:
> 
> I wish we had a good way to have test scripts in the tree for something
> like that, but using postgres. Unfortunately it's not easy to write
> portable setup scripts for it.

Yes, it seems to me as well that ENOSPC should be treated the same way as
EIO.  Just looking at the code for data_sync_retry, we should really
have some errno filtering.
--
Michael




Re: Promoted slave tries to archive previously archived WAL file

2019-02-17 Thread Michael Paquier
On Fri, Feb 15, 2019 at 12:03:39PM -0800, Andre Piwoni wrote:
> My slave nodes were created using pg_basebackup with --wal-method=stream.
> If I understand right this option streams WAL files generated during backup
> and this WAL file was 00010002 but its contents were
> different from what was on the primary and in WAL archive. When I changed
> --wal-method=fetch which collects WAL files generated during the backup at
> the end then diff did not detect any changes. When I failover I don't seem
> to have the issue with new primary to archive.
> 
> It seems streaming WAL segments created during backup produced a corrupt
> file, based on diff.

Nope, --wal-method=stream and fetch behave as designed.
"fetch" would include all the contents of pg_wal/ as part of a
BASE_BACKUP command using the replication protocol, while "stream"
would make the base backup avoid copying all WAL files, copying them
in parallel with a secondary process which uses the same facility as
pg_receivewal, in which case the set of files in
pg_wal/archive_status/ has little meaning.

I am also curious about the way you do failovers.  Let me
guess: you stop the standby, delete its recovery.conf, and then restart
the former standby?  This would prevent a timeline jump at promotion,
which would explain the conflicts you are seeing when archiving the
same segment twice.
--
Michael




Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

2019-02-17 Thread Thomas Munro
On Mon, Feb 18, 2019 at 2:19 PM Michael Paquier  wrote:
> On Sun, Feb 17, 2019 at 10:54:54AM -0800, Andres Freund wrote:
> > On 2019-02-17 23:29:09 +1300, Thomas Munro wrote:
> >> Hmm.  Well, at least ENOSPC should be treated the same way as EIO.
> >> Here's an experiment that seems to confirm some speculations about NFS
> >> on Linux from the earlier threads:
> >
> > I wish we had a good way to have test scripts in the tree for something
> > like that, but using postgres. Unfortunately it's not easy to write
> > portable setup scripts for it.
>
> Yes, it seems to me as well that ENOSPC should be treated the same way as
> EIO.  Just looking at the code for data_sync_retry, we should really
> have some errno filtering.

I agree with you up to a point:  It would make some amount of sense
for data_sync_elevel() not to promote to PANIC if errno == ENOSYS;
then for sync_file_range() you'd get WARNING and for fsync() you'd get
ERROR (since that's what those call sites pass in) on hypothetical
kernels that lack those syscalls.  As I argued earlier, ENOSYS seems
to be the only errno that we know for sure to be non-destructive to
the page cache since it promises it didn't run any kernel code at all
(or rather didn't make it past the front door), so it's the *only*
errno that belongs on such a whitelist IMHO.  That would get us back
to where we were for WSL users in 11.1.

The question is whether we want to go further than that and provide a
better experience for WSL users, now that we know that it was already
spewing warnings.  One way to do that might be not to bother with
errno filtering at all, but instead (as Andres suggested) do a test of
whether sync_file_range() is implemented on this kernel/emulator at
startup and if not, just disable it somehow.  Then we don't need any
filtering.

Here is a restatement of my rationale for not including errno
filtering in the first place:  Take a look at the documented errnos
for fsync() on Linux.  Which of those tell us that it's sane to retry?
AFAICS they all either mean that it ran filesystem code that is known
to toast your data on error (data loss has already occurred), or your
file descriptor is bogus (code somewhere is seriously busted and all
bets are off).

-- 
Thomas Munro
http://www.enterprisedb.com