Re: psql cli tool and connection pooling

2018-12-28 Thread Sebastiaan Mannem
Hi D.J.,

Hope this helps.

Generally, I tend to think of it as three separate features provided by
connection poolers:
- Connection pooling, where you are trying to save connection overhead
- Offloading read-only queries to standbys
- Delivering transparent client failover, where a master/standby failover is
  transparent to client connections

Depending on the solution you choose, it might implement some of these features.
Reading your mail, you seem to be looking for all of them and are not yet clear
which to focus on.
I would bring in a specialist at this point, but let me try to give you a head
start:

I am aware of two main kinds of connection pooling implementations, and each
delivers some of these features:
1. Pooling built into the application language:
   - Java has a connection pooling mechanism built in
   - .NET has one too
   - There might be others
   - Libpq has native functionality for transparent client failover (psql is
     based on libpq)
2. Connection poolers that mimic a postgres backend:
   - Pgpool-II is one like that
   - PgBouncer is another example
There are others, but let's stick to these two for now.

Since you mention psql, the first kind will not help you that much
(except for transparent client failover).
The second kind will do what you require: you connect to the pooler,
and the pooler connects to postgres.
To psql, connecting to the pooler is transparent. It connects to a port and
gets a Postgres connection.
What happens behind that is invisible to the client.
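To illustrate, a minimal sketch of connecting psql through a pooler (assuming
PgBouncer listening on its default port 6432 on the same host, with a database
alias mydb and a role app_user configured in pgbouncer.ini; both names are
hypothetical):

    psql -h localhost -p 6432 -U app_user mydb

psql connects to PgBouncer exactly as it would to a Postgres server.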

Now, getting into your comments / questions:
> I would love to see a feature on psql cli tool where i can point it to a 
> connection pool provider back end instead of a postgres host.
Great, look at Pgpool-II and PgBouncer. They have overlapping use cases, but
depending on the exact situation, one might fit better than the other.

> I read that pgpool2 emulates a postgres server, but i've never tried it myself
Yes it does (as do all connection poolers that mimic a postgres backend).

> I know that some languages provide connection pooling, but i'm relying 
> heavily on psql and its features and i think built-in master/slave connection 
> pooling in psql cli would make my day, and that of any other dirty bash 
> scripters out there.
Sounds like you are looking for client connection failover here.
Read this:
https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING
and specifically '34.1.1.3. Specifying Multiple Hosts' for the most basic
approach to implementing this.
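For illustration, a minimal sketch of such a connection string (the host names
are hypothetical; target_session_attrs=read-write, available since PostgreSQL
10, makes libpq skip servers that only accept read-only connections):

    psql "postgresql://db1.example.com:5432,db2.example.com:5432/mydb?target_session_attrs=read-write"

If db1 is unreachable or is a read-only standby, libpq transparently moves on
and connects to db2.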
> 
> Can anyone comment on their experience with pgpool2 in a high activity 
> production environment?
Yes. It works, and depending on your use case, it can even add
performance-enhancing options.
On the other hand, it tries to fix many things in one tool, and that makes it
a complex solution too. It also adds its own limitations. I have seen a lot of
implementations where people focused on one thing but neglected another
important one.
My best advice is: bring in a specialist for this one.

> Are there other tools or suggestions anyone can point me to?
Well, read the documentation on Pgpool-II: 
http://www.pgpool.net/mediawiki/index.php/Documentation 

And look into PGBouncer too: https://pgbouncer.github.io/faq.html 


> Is there any appetite to support connection pooling natively by either the 
> postmaster or the psql cli or some other device that could be contrib to the 
> source tree?
There is client failover in libpq.
I think it was specifically decided not to implement connection pooling in
core, since handling it in the app layer / external connection poolers keeps
the Postgres core code cleaner.
And connection pooler features are wanted in so many different situations that
keeping them out of core keeps the code lean.
The 'offload read-only queries' feature must be implemented on the client side
in any case.

> Does it even matter? Is server version 10 ddos-proof, other than 
> max_connections?
Nothing is truly DDoS-proof. In the end, any system can be brought down by a
DDoS attack mounted under the right circumstances,
and every DDoS mitigation can be circumvented in one way or another.
This is not specific to Postgres; it is a very generic thing.
You can build a very DDoS-resilient solution with Postgres, but that greatly
depends on what you want to mitigate and how much effort you want to put into
it.

An example is connection exhaustion. You can manage that in a lot of ways:
- separate superuser connections from normal connections
- limit max connections per user
- you can do a lot with custom logon triggers
- etc.
But every mitigation needs some thinking and setting some limit, and depending
on what you want to do, you might need to write code (like a logon trigger).
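For instance, a minimal sketch of the first two mitigations (the role name and
the numbers are only illustrations):

    -- postgresql.conf: keep a few slots free for superusers even when
    -- max_connections is otherwise exhausted
    --   superuser_reserved_connections = 3

    -- Cap the concurrent connections of one application role:
    ALTER ROLE app_user CONNECTION LIMIT 50;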

EnterpriseDB has a lot of experience in this regard, and we have a product
that extends the possibilities here.

Re: Query Performance Issue

2018-12-28 Thread Alexey Bashtanov



> https://explain.depesz.com/s/Pra8a


Could you share the query itself please?
And the tables definitions including indexes.


> work_mem : 8MB
That's not a lot. The 16-batch hash join may have worked faster if you
had the resources to increase work_mem.
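For instance, one could test with a larger value for the current session only
(the 64MB figure is just an illustration; what's safe depends on available RAM
and the number of concurrent sessions):

    -- per-session experiment, does not change the server-wide setting
    SET work_mem = '64MB';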




Re: Query Performance Issue

2018-12-28 Thread Justin Pryzby
On Thu, Dec 27, 2018 at 10:25:47PM +0300, neslişah demirci wrote:
> Have this explain analyze output :
> 
> https://explain.depesz.com/s/Pra8a

Row counts are being badly underestimated, leading to nested loop joins:
|Index Scan using 
product_content_recommendation_main2_recommended_content_id_idx on 
product_content_recommendation_main2 prm (cost=0.57..2,031.03 ROWS=345 width=8) 
(actual time=0.098..68.314 ROWS=3,347 loops=1)
|Index Cond: (recommended_content_id = 3371132)
|Filter: (version = 1)

Apparently, recommended_content_id and version aren't independent conditions,
but postgres thinks they are.

Would you send statistics about those tables? MCVs, ndistinct, etc.
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram
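For instance, a sketch of pulling those numbers from pg_stats (table and
column names taken from the plan above):

    SELECT attname, n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'product_content_recommendation_main2'
      AND attname IN ('recommended_content_id', 'version');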

I think the solution is to upgrade (at least) to PG10 and CREATE STATISTICS
(dependencies).

https://www.postgresql.org/docs/10/catalog-pg-statistic-ext.html
https://www.postgresql.org/docs/10/sql-createstatistics.html
https://www.postgresql.org/docs/10/planner-stats.html#PLANNER-STATS-EXTENDED
https://www.postgresql.org/docs/10/multivariate-statistics-examples.html
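A sketch of what that would look like on PG10+ (the statistics object name is
arbitrary; the columns are taken from the plan above):

    CREATE STATISTICS prm_dep_stats (dependencies)
      ON recommended_content_id, version
      FROM product_content_recommendation_main2;
    ANALYZE product_content_recommendation_main2;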

Justin



Re: Query Performance Issue

2018-12-28 Thread David Rowley
On Sat, 29 Dec 2018 at 04:32, Justin Pryzby  wrote:
> I think the solution is to upgrade (at least) to PG10 and CREATE STATISTICS
> (dependencies).

Unfortunately, I don't think that'll help this situation. Extended
statistics are currently only handled for base quals, not join quals.
See dependency_is_compatible_clause().

It would be interesting to see how far out the estimate is without the
version = 1 clause.  If just the recommended_content_id clause is
underestimated enough it could be enough to have the planner choose
the nested loop. Perhaps upping the stats on that column may help, but
it may only help so far as to reduce the chances of a nested loop. If
the number of distinct recommended_content_id values is higher than
the statistic targets and is skewed enough then there still may be
some magic values in there that end up causing a bad plan.
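A sketch of raising the per-column statistics target as suggested (1000 is
only an illustration; the default is 100):

    ALTER TABLE product_content_recommendation_main2
      ALTER COLUMN recommended_content_id SET STATISTICS 1000;
    ANALYZE product_content_recommendation_main2;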

It would also be good to know what random_page_cost is set to, and
also if effective_cache_size isn't set too high.  Increasing
random_page_cost would help reduce the chances of this nested loop
plan, but it's a pretty global change and could also have a negative
effect on other queries.
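One way to experiment without making that global change is a session-level
override (the value is illustrative):

    -- affects only the current session; re-run EXPLAIN afterwards to see
    -- whether the nested loop plan goes away
    SET random_page_cost = 4.0;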

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Query Performance Issue

2018-12-28 Thread Justin Pryzby
On Thu, Dec 27, 2018 at 10:25:47PM +0300, neslişah demirci wrote:
> Have this explain analyze output :
> 
> https://explain.depesz.com/s/Pra8a

On Sat, Dec 29, 2018 at 07:58:28PM +1300, David Rowley wrote:
> On Sat, 29 Dec 2018 at 04:32, Justin Pryzby  wrote:
> > I think the solution is to upgrade (at least) to PG10 and CREATE STATISTICS
> > (dependencies).
> 
> Unfortunately, I don't think that'll help this situation. Extended
> statistics are currently only handled for base quals, not join quals.
> See dependency_is_compatible_clause().

Right, understood.

Correct me if I'm wrong though, but I think the first major misestimate is in
the scan, not the join:

|Index Scan using 
product_content_recommendation_main2_recommended_content_id_idx on 
product_content_recommendation_main2 prm (cost=0.57..2,031.03 rows=345 width=8) 
(actual time=0.098..68.314 rows=3,347 loops=1)
|Index Cond: (recommended_content_id = 3371132)
|Filter: (version = 1)
|Rows Removed by Filter: 2708
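One way to check that, per David's suggestion, would be to look at the
estimate for the index condition alone (the constant is taken from the plan
above):

    EXPLAIN ANALYZE
    SELECT *
    FROM product_content_recommendation_main2
    WHERE recommended_content_id = 3371132;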

Justin