Re: select distinct runs slow on pg 10.6

2019-09-12 Thread Merlin Moncure
On Wed, Sep 11, 2019 at 12:57 PM Rick Otten  wrote:
>
> On Wed, Sep 11, 2019 at 12:38 PM Dinesh Somani  wrote:
>>
>> I think Merlin has outlined pretty much all the options and very neatly. (As 
>> an asides Merlin could you possibly elaborate on the "C Hack" how that might 
>> be accomplished.)
>>
>> To OP, I am curious if the performance changes were the query rewritten such 
>> that all timestamp columns were listed first in the selection. I understand 
>> it might not be feasible to make this change in your real application 
>> without breaking the contract.
>>
>> Regards
>> Dinesh
>
>
> It looks like AWS has a pgbouncer query re-writer service that might be a 
> starting point:
> https://aws.amazon.com/blogs/big-data/query-routing-and-rewrite-introducing-pgbouncer-rr-for-amazon-redshift-and-postgresql/
>
> I've never used it.

Yeah, I haven't either.  Side note: this system also provides the
ability to load balance queries across distributed system; that's a
huge benefit.  Say you have master server and five replica, it seems
that you can round robin the read only queries using this system or
other neat little tricks.   I would be cautious about pgbouncer-rr
becoming the bottleneck itself for certain workloads though.

Anyways, a 'hack' strategy on linux might be to:
*) Check and verify that libpq is dynamically linked (which is almost
alwasys the case).  ldd /your/application should give the dynamic
library dependency to libpq.
*) Grab postgres sources for same version as production
*) configure
*) switch to interfaces/libpq
*) figure out which interface routine(s) being called into.  The
approach will be slightly different if the query is
prepared/paramterized or not.  Assuming it isn't, you'd have to modify
the PQsendQuery routine to check for the signature  (say, with
strcmp), create a new string, and have that be put instead of the
incoming const char* query.  The parameterized versions
(PQsendQueryParams) would be easier since you'd be able to use a
static string rather than parsing it out.
*) Build the library, do some testing with hand written C program
*) inject the modified libpq with LD_LIBRARY_PATH

It must be stated that some people might read this and be compelled to
barf :-) -- it's pretty gross. Having said that, sometimes you have to
find a solution.   I would definitely try the pgbouncer-rr approach
first however; this has a *lot* of potential benefit.

merlin




Re: select distinct runs slow on pg 10.6

2019-09-12 Thread Dinesh Somani
Thanks a lot, Merlin.

Yes, it could appear kinda gross to some ;-)

On Thu, Sep 12, 2019 at 7:19 AM Merlin Moncure  wrote:

> On Wed, Sep 11, 2019 at 12:57 PM Rick Otten 
> wrote:
> >
> > On Wed, Sep 11, 2019 at 12:38 PM Dinesh Somani 
> wrote:
> >>
> >> I think Merlin has outlined pretty much all the options and very
> neatly. (As an asides Merlin could you possibly elaborate on the "C Hack"
> how that might be accomplished.)
> >>
> >> To OP, I am curious if the performance changes were the query rewritten
> such that all timestamp columns were listed first in the selection. I
> understand it might not be feasible to make this change in your real
> application without breaking the contract.
> >>
> >> Regards
> >> Dinesh
> >
> >
> > It looks like AWS has a pgbouncer query re-writer service that might be
> a starting point:
> >
> https://aws.amazon.com/blogs/big-data/query-routing-and-rewrite-introducing-pgbouncer-rr-for-amazon-redshift-and-postgresql/
> >
> > I've never used it.
>
> Yeah, I haven't either.  Side note: this system also provides the
> ability to load balance queries across distributed system; that's a
> huge benefit.  Say you have master server and five replica, it seems
> that you can round robin the read only queries using this system or
> other neat little tricks.   I would be cautious about pgbouncer-rr
> becoming the bottleneck itself for certain workloads though.
>
> Anyways, a 'hack' strategy on linux might be to:
> *) Check and verify that libpq is dynamically linked (which is almost
> alwasys the case).  ldd /your/application should give the dynamic
> library dependency to libpq.
> *) Grab postgres sources for same version as production
> *) configure
> *) switch to interfaces/libpq
> *) figure out which interface routine(s) being called into.  The
> approach will be slightly different if the query is
> prepared/paramterized or not.  Assuming it isn't, you'd have to modify
> the PQsendQuery routine to check for the signature  (say, with
> strcmp), create a new string, and have that be put instead of the
> incoming const char* query.  The parameterized versions
> (PQsendQueryParams) would be easier since you'd be able to use a
> static string rather than parsing it out.
> *) Build the library, do some testing with hand written C program
> *) inject the modified libpq with LD_LIBRARY_PATH
>
> It must be stated that some people might read this and be compelled to
> barf :-) -- it's pretty gross. Having said that, sometimes you have to
> find a solution.   I would definitely try the pgbouncer-rr approach
> first however; this has a *lot* of potential benefit.
>
> merlin
>
>
>