Re: [External] Join queries slow with predicate, limit, and ordering

2019-12-03 Thread Aufar Gilbran
Thanks for the answer!

On Tue, Dec 3, 2019 at 8:39 AM Jeff Janes  wrote:
> What happens if you set enable_sort to off before running it?

Turning enable_sort off makes the first query skip the sort [1], but it
then runs much slower than the original query [2]. This time I ran
VACUUM ANALYZE first, so even the slow query is much faster than
before, though still much slower than the fast query [3].

> It thinks it will find 1 row, and actually finds 89,222.  I don't know 
> exactly why that would be, I suppose tag_id has an extremely skewed 
> distribution.  But yeah, that is going to cause some problems.  For one 
> thing, if there was actually just one qualifying row, then it wouldn't get to 
> stop early, as the LIMIT would never be satisfied.  So it thinks that if it 
> chose to walk the index backwards, it would have to walk the **entire** 
> index.

I'm not really sure what a skewed distribution means here. If by skewed
you mean that one particular tag_id maps to many entities while other
tag_ids map to only a few, then yes: this particular key value covers
80% of the entities. For this kind of dataset, is there anything I can
do to improve it, or is it just impossible?
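(For reference, my understanding is that the planner's view of this
distribution can be inspected through the standard pg_stats view, along
these lines:

    SELECT most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'entity_tag'
      AND attname = 'tag_id';

If one value shows a frequency near 0.8, that matches the 80% skew I
described.)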

> With this query, it can use the join condition to transfer the knowledge of 
> tag.id=24 to become entity_tag.tag_id=24, and then look up stats on 
> entity_tag.tag_id for the value 24.  When you specify the single row of tag 
> indirectly, it can't do that as it doesn't know what specific value of tag.id 
> is going to be the one it finds (until after the query is done being planned 
> and starts executing, at which point it is too late).  But the row with id=24 
> doesn't seem to be the same one with "tag.key = 'status' AND tag.value = 
> 'SUCCEEDED'", so you have basically changed the query entirely on us.

Apologies, I had previously run the query against a database in another
environment. The correct one uses tag_id=18 [3], so it becomes:

SELECT entity.id
FROM (
    SELECT entity_tag.entity_id
    FROM tag
    JOIN entity_tag ON tag.id = entity_tag.tag_id
    WHERE tag.id = 18
) matched
JOIN entity ON matched.entity_id = entity.id
WHERE entity.type = 'execution'
ORDER BY entity.id DESC
LIMIT 10;

It's still very fast and the query plan looks similar to me.

> If you replanned this query with ORDER BY entity.id+0 DESC, (and with the 
> true value of tag_id) that might give you some more insight into the hidden 
> "thought process" behind the planner.

I tried this on the fast query and it became very slow [4]. I guess
that is because the planner can no longer use the index to provide the
ordering, so it has to fetch and sort every matching row before it can
apply the LIMIT? I'm not so sure.
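For clarity, the suggested variant only changes the ORDER BY
expression, so the index on entity.id can no longer supply the
ordering:

    SELECT entity.id
    FROM (
        SELECT entity_tag.entity_id
        FROM tag
        JOIN entity_tag ON tag.id = entity_tag.tag_id
        WHERE tag.id = 18
    ) matched
    JOIN entity ON matched.entity_id = entity.id
    WHERE entity.type = 'execution'
    ORDER BY entity.id + 0 DESC
    LIMIT 10;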

[1] https://explain.depesz.com/s/aEmR
[2] https://explain.depesz.com/s/kmNY
[3] https://explain.depesz.com/s/pD5v
[4] https://explain.depesz.com/s/4s7Q

--
Best regards,

Aufar Gilbran



Re: Make recently inserted/updated records available in the buffer/cache

2019-12-03 Thread Michael Lewis
"I am going to use it as a queue"

You may want to look at lowering fillfactor if this queue is going to have
frequent updates, and also make autovacuum/analyze much more aggressive
assuming many updates and deletes.
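
For illustration, something along these lines (hypothetical table name
and values; tune to your workload):

    ALTER TABLE job_queue SET (
        fillfactor = 70,                        -- leave free space on each page for HOT updates
        autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% of rows are dead
        autovacuum_analyze_scale_factor = 0.01  -- re-analyze often to keep planner stats fresh
    );

Note that changing fillfactor only affects newly written pages;
existing data must be rewritten (e.g. with VACUUM FULL or CLUSTER) to
gain the free space.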


Re: Make recently inserted/updated records available in the buffer/cache

2019-12-03 Thread MichaelDBA
Yep, I concur completely!  For tables treated like queues you gotta do 
this stuff or deal with bloat and fragmented indexes.


Michael Lewis wrote on 12/3/2019 12:29 PM:
> "I am going to use it as a queue"
>
> You may want to look at lowering fillfactor if this queue is going to
> have frequent updates, and also make autovacuum/analyze much more
> aggressive assuming many updates and deletes.






Re: Make recently inserted/updated records available in the buffer/cache

2019-12-03 Thread Sachin Divekar
Thank you, Michaels.
I didn't know about fillfactor and table bloat. Did some reading on those
topics. We will definitely need to tweak these settings.

I am also going to use _SELECT FOR UPDATE_ with SKIP LOCKED. Any
suggestions on tuning parameters for SKIP LOCKED?
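
The pattern I have in mind looks like this (hypothetical queue table):

    -- Each worker claims one pending job; rows locked by other workers
    -- are skipped rather than waited on, so workers never block each other.
    BEGIN;
    SELECT id, payload
    FROM job_queue
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED;
    -- process the job and mark it done, then
    COMMIT;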

Thanks

On Tue, Dec 3, 2019 at 11:02 PM MichaelDBA  wrote:

> Yep, I concur completely!  For tables treated like queues you gotta do
> this stuff or deal with bloat and fragmented indexes.
>
> Michael Lewis wrote on 12/3/2019 12:29 PM:
> > "I am going to use it as a queue"
> >
> > You may want to look at lowering fillfactor if this queue is going to
> > have frequent updates, and also make autovacuum/analyze much more
> > aggressive assuming many updates and deletes.
>
>


Re: Make recently inserted/updated records available in the buffer/cache

2019-12-03 Thread Michael Lewis
On Tue, Dec 3, 2019 at 11:46 AM Sachin Divekar  wrote:

> I am also going to use _SELECT FOR UPDATE_ with SKIP LOCKED. Any
> suggestions on tuning parameters for SKIP LOCKED?
>

I am not aware of any. Either you use it because it fits your need, or not.

Note: please don't top-post (replying at the top and including all the
previous conversation below) on the Postgres mailing lists. Quote only
the part(s) you are responding to and reply beneath each.


Re: performance degredation after upgrade from 9.6 to 12

2019-12-03 Thread Andres Freund
Hi,

On 2019-11-24 15:50:20 -0500, Jeff Janes wrote:
> OK, but do you agree that a 15% slow down is more realistic than 3 fold
> one?  Or are you still getting 3 fold slow down with more careful testing
> and over a wide variety of queries?
> 
> I find that the main regression (about 15%) in your example occurs in major
> version 10, at the following commit:

Huh, that's somewhat surprising. <5% I can see - there were some
tradeoffs to be made, and some performance issues to be worked around,
but 15% seems large. Is this with assertions enabled? Optimized?
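
A quick way to check, for anyone reproducing this:

    SHOW debug_assertions;   -- 'on' means the server was built with --enable-cassert

and `pg_config --cflags` on the server host shows the optimization
flags the binaries were compiled with.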


> I also tested the same example, only 100 times
> more rows, and still see the regression at about 16%.  This is a major
> infrastructure change patch which has been extensively built on since then,
> the chances of reverting it are very small.  It is making an omelette, and
> your example is one of the eggs that got broken.

Yea, there's zero chance of a revert.


> Performance changes in a large body of queries are usually not all due to
> the same thing.  Are you a position to custom compile your own PostgreSQL?
> It would be nice to test this commit against the one before it, and see how
> much of the change in your real queries is explained by this one thing (or
> whether any of it is)

In particular, artificial queries will often show bottlenecks that are
not relevant in practice...



> commit b8d7f053c5c2bf2a7e8734fe3327f6a8bc711755
> Author: Andres Freund 
> Date:   Tue Mar 14 15:45:36 2017 -0700
> 
> Faster expression evaluation and targetlist projection.
> 
> It is disappointing that this made this case slower rather than faster, and
> that the "future work" alluded to either hasn't happened, or wasn't
> effective for this example.

I wonder if the improvements in
https://www.postgresql.org/message-id/20191023163849.sosqbfs5yenocez3%40alap3.anarazel.de
would at least partially address this.

Greetings,

Andres Freund