impact of auto explain on overall performance

2019-03-14 Thread Stephan Schmidt
Hello,

I'm currently working on a high-performance database and want to make sure that
whenever there are slow queries during regular operations, I've got all
information about the query in my logs. So auto_explain comes to mind, but the
documentation explicitly states that it comes at a cost. My question is: how
much latency does auto_explain add, in percent or in ms?

Best regards,

stephan


Re: impact of auto explain on overall performance

2019-03-14 Thread Justin Pryzby
On Thu, Mar 14, 2019 at 07:29:17AM +, Stephan Schmidt wrote:
> I'm currently working on a high-performance database and want to make sure
> that whenever there are slow queries during regular operations, I've got all
> information about the query in my logs. So auto_explain comes to mind, but
> the documentation explicitly states that it comes at a cost. My question is:
> how much latency does auto_explain add, in percent or in ms?

https://www.postgresql.org/docs/current/auto-explain.html
|log_analyze
...
|When this parameter is on, per-plan-node timing occurs for all statements 
executed, whether or not they run long enough to actually get logged. This can 
have an extremely negative impact on performance. Turning off 
auto_explain.log_timing ameliorates the performance cost, at the price of 
obtaining less information.

|auto_explain.log_timing (boolean)
|auto_explain.log_timing controls whether per-node timing information is 
printed when an execution plan is logged; it's equivalent to the TIMING option 
of EXPLAIN. The overhead of repeatedly reading the system clock can slow down 
queries significantly on some systems, so it may be useful to set this 
parameter to off when only actual row counts, and not exact times, are needed. 
This parameter has no effect unless auto_explain.log_analyze is enabled. This 
parameter is on by default. Only superusers can change this setting.

I believe the cost actually varies significantly with the type of plan "node",
with "nested loops" incurring much higher overhead.

I think you could compare using explain(analyze) vs explain(analyze,timing
off).  While you're at it, compare without explain at all.
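For instance, a quick three-way check with psql's \timing, run against a
representative query of your own (big_table is a placeholder):

\timing on
EXPLAIN (ANALYZE)              SELECT count(*) FROM big_table;  -- per-node timing
EXPLAIN (ANALYZE, TIMING OFF)  SELECT count(*) FROM big_table;  -- row counts only
SELECT count(*) FROM big_table;                                 -- no instrumentation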

I suspect the overhead is inconsequential if you set log_timing=off and set
log_min_duration such that only the slowest queries are logged.
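A minimal postgresql.conf sketch of that setup (the 1s threshold is purely
illustrative; pick one that matches what "slowest" means for your workload):

shared_preload_libraries = 'auto_explain'   # requires a server restart
auto_explain.log_min_duration = '1s'        # log plans only for statements slower than this
auto_explain.log_analyze = on               # include actual row counts per node
auto_explain.log_timing = off               # skip the per-node clock reads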

Then, you can manually run "explain (analyze,costs on)" on any problematic
queries to avoid interfering with production clients.

Justin



Re: impact of auto explain on overall performance

2019-03-14 Thread Adrien NAYRAT

On 3/14/19 9:23 AM, Justin Pryzby wrote:

> [...]



You should also consider auto_explain.sample_rate: 
auto_explain.sample_rate causes auto_explain to only explain a fraction 
of the statements in each session. The default is 1, meaning explain all 
the queries. In case of nested statements, either all will be explained 
or none. Only superusers can change this setting.


This option has been available since PostgreSQL 9.6.
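A one-line sketch, assuming explaining roughly one statement in ten is enough:

auto_explain.sample_rate = 0.1   # explain ~10% of statements in each session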

Regards



Re: Distributing data over "spindles" even on AWS EBS (followup to the work queue saga)

2019-03-14 Thread Justin Pryzby
On Wed, Mar 13, 2019 at 02:44:10PM -0400, Gunther wrote:
> You see that I already did a lot to balance IO out to many different
> tablespaces that's why there are so many volumes.

I wonder if it wouldn't be both better and much easier to have just 1 or 2
tablespaces and combine drives into a single LVM VG and do something like
lvcreate --stripes 
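A hedged sketch of that idea, assuming four attached EBS volumes (device
names, stripe count, and stripe size are all illustrative):

vgcreate pgdata_vg /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1
lvcreate --stripes 4 --stripesize 64k --extents 100%FREE --name pgdata_lv pgdata_vg
mkfs.ext4 /dev/pgdata_vg/pgdata_lv   # then mount it and point a single tablespace at it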

> I am not sure my autovacuum setup is working right though. I wonder if there
> isn't some autovacuum statistics which I can query that would give me
> confidence that it is actually running?

pg_stat_all_tables
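For example, a quick look at when autovacuum last ran and how many dead
tuples remain (a sketch; filter and limit to taste):

SELECT relname, last_autovacuum, autovacuum_count, n_dead_tup
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC
LIMIT 20;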

Did you do this?
ALTER TABLE ... SET (autovacuum_vacuum_scale_factor=0.001, 
autovacuum_vacuum_threshold=1);

Justin



Re: impact of auto explain on overall performance

2019-03-14 Thread Jeff Janes
On Thu, Mar 14, 2019 at 3:29 AM Stephan Schmidt  wrote:

> Hello,
>
>
>
> I'm currently working on a high-performance database and want to make sure
> that whenever there are slow queries during regular operations, I've got all
> information about the query in my logs. So auto_explain comes to mind, but
> the documentation explicitly states that it comes at a cost. My question is:
> how much latency does auto_explain add, in percent or in ms?
>

You will have to measure it yourself and see.  It depends on your hardware,
your OS and OS version, your PostgreSQL version, and the nature of your
queries.  If you have auto_explain.log_timing=on, then I find that large
sorts are the worst impacted, so if you have a lot of those, you should be
careful.

On older kernels, I would run with auto_explain.log_timing=off.  On newer
kernels, where the clock can be read from user space, I run with
auto_explain.log_timing=on.  The slowdown is noticeable under careful
investigation (around 3%, the last time I measured it), but usually well
worth paying to have actual data to work with when I find slow queries in
the log.  I made a special role with auto_explain disabled for use with a
few reporting queries with large sorts, both to circumvent the overhead and
to avoid spamming the log with slow queries I already know about.
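A sketch of that per-role opt-out, assuming auto_explain is loaded via
shared_preload_libraries and a superuser runs the ALTER (the role name is
illustrative):

CREATE ROLE reporting LOGIN;
ALTER ROLE reporting SET auto_explain.log_min_duration = -1;  -- -1 disables plan logging for this role's sessions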

Cheers,

Jeff



Re: impact of auto explain on overall performance

2019-03-14 Thread Jeremy Schneider
On 3/14/19 00:29, Stephan Schmidt wrote:
> I'm currently working on a high-performance database and want to make
> sure that whenever there are slow queries during regular operations,
> I've got all information about the query in my logs. So auto_explain
> comes to mind, but the documentation explicitly states that it comes at
> a cost. My question is: how much latency does auto_explain add, in
> percent or in ms?

One thought - what if the problem query is a 4ms query that just went to
6ms but it's executed millions of times per second? That alone would push
the load that query puts on the system to 150% of what it was.

The approach I've had the most success with is to combine active session
sampling (from pg_stat_activity) with pg_stat_statements (ideally with
periodic snapshots) to identify problematic SQL statements, then use
explain analyze after you've identified them.
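As a sketch, the heaviest statements can be pulled from pg_stat_statements
like this (assuming the extension is installed; these column names are from
PostgreSQL 11 and earlier, renamed to total_exec_time/mean_exec_time in 13):

SELECT queryid, calls,
       round(total_time::numeric, 1) AS total_ms,
       round(mean_time::numeric, 2)  AS mean_ms
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;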

There are a handful of extensions on the internet that can do active
session sampling for you, and I've seen a few scripts that can be put
into a scheduler to capture snapshots of stats tables.

Maybe something to consider in addition to the auto_explain stuff.

-Jeremy

-- 
http://about.me/jeremy_schneider



Facing issue in using special characters

2019-03-14 Thread M Tarkeshwar Rao
Hi all,

We are facing an issue with special characters: we are trying to insert
records into a remote Postgres server, and our application is not able to do
so because of errors. The issue seems to be caused by special characters
used in one of the fields of a row.

Regards
Tarkeshwar