Re: benchmarking effective_io_concurrency

2019-07-22 Thread Rick Otten
On Mon, Jul 22, 2019 at 2:42 AM Fabio Pardi  wrote:

> Hello,
>
>
> I recently spent a bit of time benchmarking effective_io_concurrency on
> Postgres.
>
> I would like to share my findings with you:
>
>
> https://portavita.github.io/2019-07-19-PostgreSQL_effective_io_concurrency_benchmarked/
>
> Comments are welcome.
>
> regards,
>
> fabio pardi
>

You didn't mention what type of disk storage you are using, or if that
matters.  The number of cores in your database could also matter.

Does the max_parallel_workers setting have any influence on how
effective_io_concurrency works?

Based on your data, one should set effective_io_concurrency at the highest
possible setting with no ill effects with the possible exception that your
disk will get busier.  Somehow I suspect that as you scale the number of
concurrent disk i/o tasks, other things may start to suffer.  For example
does CPU wait time start to increase as more and more threads are consumed
waiting for i/o instead of doing other processing?  Do you run into lock
contention on the i/o subsystem?  (Back in the day, lock contention for
/dev/tcp was a major bottleneck for scaling busy webservers vertically.  I
have no idea if modern linux kernels could run into the same issue waiting
for locks for /dev/sd0.  Surely if anything was going to push that issue,
it would be setting effective_io_concurrency really high and then demanding
a lot of concurrent disk accesses.)


Re: benchmarking effective_io_concurrency

2019-07-22 Thread Fabio Pardi
Hi Rick, 

thanks for your inputs.

On 22/07/2019 14:06, Rick Otten wrote:
> 
> 
> 
> You didn't mention what type of disk storage you are using, or if that 
> matters. 

I actually mentioned I m using SSD, in RAID 10. Also is mentioned I tested in a 
no-RAID setup. Is that what you mean?

 The number of cores in your database could also matter.
> 

True, when scaling I think it can actually bring up problems as you mention 
here below. (BTW, Tested on a VM with 6 cores and on HW with 32. I updated the 
blogpost, thanks)


> Does the max_parallel_workers setting have any influence on how 
> effective_io_concurrency works?
> 

I m not sure about that one related to the tests I ran, because the query plan 
does not show parallelism. 

> Based on your data, one should set effective_io_concurrency at the highest 
> possible setting with no ill effects with the possible exception that your 
> disk will get busier.  Somehow I suspect that as you scale the number of 
> concurrent disk i/o tasks, other things may start to suffer.  For example 
> does CPU wait time start to increase as more and more threads are consumed 
> waiting for i/o instead of doing other processing?  Do you run into lock 
> contention on the i/o subsystem?  (Back in the day, lock contention for 
> /dev/tcp was a major bottleneck for scaling busy webservers vertically.  I 
> have no idea if modern linux kernels could run into the same issue waiting 
> for locks for /dev/sd0.  Surely if anything was going to push that issue, it 
> would be setting effective_io_concurrency really high and then demanding a 
> lot of concurrent disk accesses.)
> 
> 
>  

My suggestion would be to try by your own and find out what works for you, 
maybe slowly increasing the value of effective_io_concurrency. 

Every workload is peculiar, so I suspect there is no silver bullet here. Also 
the documentation gives you directions in that way...



regards,

fabio pardi




Re: Speeding up query pulling comments from pg_catalog

2019-07-22 Thread Ken Tanzer
On Sat, Jul 20, 2019 at 12:25 PM Tom Lane  wrote:

> Ken Tanzer  writes:
> > On Sat, Jul 20, 2019 at 7:46 AM Tom Lane  wrote:
> >> and then to add insult to injury, has to search pg_description a second
> >> time for each hit.
>
> > Not sure if I'm understanding this correctly, but are you saying that
> > because col_description() is specified in two places in the query, that
> it
> > actually will get called twice?
>
> Yes.
>
> > I was under the impression that a function
> > (at least a non-volatile one) specified multiple times, but with the same
> > arguments, would only get called once. Is that just wishful thinking?
>
> Afraid so.


That's good to know!  Just to help me understand:



> There's been assorted talk about various optimizations to
> avoid unnecessary duplicate function calls,


So I had read the sentence below to mean my functions would only get called
once.  But is that sentence only supposed to apply to index scans?  Or does
it mean the planner is allowed to optimize, but it just doesn't know how
yet?

A STABLE function cannot modify the database and is guaranteed to return
the same results given the same arguments for all rows within a single
statement. *This category allows the optimizer to optimize multiple calls
of the function to a single call.* In particular, it is safe to use an
expression containing such a function in an index scan condition. (Since an
index scan will evaluate the comparison value only once, not once at each
row, it is not valid to use a VOLATILE function in an index scan condition.)
(https://www.postgresql.org/docs/9.6/xfunc-volatility.html)

Cheers,
Ken



-- 
AGENCY Software
A Free Software data system
By and for non-profits
*http://agency-software.org/ *
*https://demo.agency-software.org/client
*
[email protected]
(253) 245-3801

Subscribe to the mailing list
 to
learn more about AGENCY or
follow the discussion.


Re: benchmarking effective_io_concurrency

2019-07-22 Thread Merlin Moncure
On Mon, Jul 22, 2019 at 1:42 AM Fabio Pardi  wrote:
>
> Hello,
>
>
> I recently spent a bit of time benchmarking effective_io_concurrency on 
> Postgres.
>
> I would like to share my findings with you:
>
> https://portavita.github.io/2019-07-19-PostgreSQL_effective_io_concurrency_benchmarked/
>
> Comments are welcome.

I did very similar test a few years back and came up with very similar results:
https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azs...@mail.gmail.com

effective_io_concurrency is an oft overlooked tuning parameter and I'm
curious if the underlying facility (posix_fadvise) can't be used for
more types of queries.  For ssd storage, which is increasingly common
these days, it really pays of to crank it with few downsides from my
measurement.

merlin




Re: Speeding up query pulling comments from pg_catalog

2019-07-22 Thread Tom Lane
Ken Tanzer  writes:
> On Sat, Jul 20, 2019 at 12:25 PM Tom Lane  wrote:
>> There's been assorted talk about various optimizations to
>> avoid unnecessary duplicate function calls,

> So I had read the sentence below to mean my functions would only get called
> once.  But is that sentence only supposed to apply to index scans?  Or does
> it mean the planner is allowed to optimize, but it just doesn't know how
> yet?

> A STABLE function cannot modify the database and is guaranteed to return
> the same results given the same arguments for all rows within a single
> statement. *This category allows the optimizer to optimize multiple calls
> of the function to a single call.*

It says "allows", not "requires".  But in particular, we've interpreted
that to mean trying to call a stable function (with constant or at least
stable arguments) once per query rather than once per row, as the naive
interpretation of SQL semantics would have us do.  Matching up textually
distinct calls has not been on the radar --- it seems fairly expensive
to do, with no return in typical queries, and relatively small return
even if we find a match.

regards, tom lane