Re: Erratically behaving query needs optimization

2019-09-03 Thread Barbu Paul - Gheorghe
On Tue, Sep 3, 2019 at 12:57 AM Jeff Janes wrote:
>
> On Mon, Aug 26, 2019 at 4:26 AM Barbu Paul - Gheorghe wrote:
>>
>> On Sun, Aug 25, 2019 at 5:51 PM Jeff Janes wrote:
>> >
>> > Yes, it certainly looks like it is due to cold caches.  But you say it is 
>> > slow at first, and then say it varies greatly during a run.  Is being slow 
>> > at first the only way it varies greatly, or is there large variation even 
>> > beyond that?
>>
>> There is great variation in run times (hundreds of ms to several
>> seconds) even beyond server startup.
>> The query runs several times with different device_id, object_id, and
>> attribute_id values, and the run time varies from one parameterization
>> to another.
>
>
> If you run the exact same query (with the same parameters) once the cache is
> hot, is the performance then pretty consistent within a given
> parameterization?  Or is it still variable even within one parameterization?
>
> If they are consistent, could you capture a fast parameterization and a slow
> parameterization and show them and the plans for them?

Cannot test right now, but I think I had both cases.
Within the same parameterization I had both fast and slow runs, and of
course run times varied when the parameterization changed.

>>
>> EXPLAIN (ANALYZE,BUFFERS)
>>  SELECT DISTINCT ON (results.attribute_id) results.timestamp,
>> results.data FROM results
>>  WHERE
>>  results.data <> ''
>>  AND results.data IS NOT NULL
>>  AND results.object_id = 1955
>>  AND results.attribute_id IN (4, 5) -- possibly a longer list here
>>  AND results.data_access_result = 'SUCCESS'
>>  ORDER BY results.attribute_id, results.timestamp DESC
>>  LIMIT 2 -- limit by the length of the attributes list
>>
>> Limit  (cost=166793.28..167335.52 rows=2 width=54) (actual time=134783.510..134816.941 rows=2 loops=1)
>>   Buffers: shared hit=19086 read=46836, temp read=1522 written=3311
>>   ->  Unique  (cost=166793.28..168420.01 rows=6 width=54) (actual time=134783.507..134816.850 rows=2 loops=1)
>>         Buffers: shared hit=19086 read=46836, temp read=1522 written=3311
>>         ->  Sort  (cost=166793.28..167606.64 rows=325346 width=54) (actual time=134783.505..134802.602 rows=205380 loops=1)
>>               Sort Key: attribute_id, "timestamp" DESC
>
>
> Do you have an index on (attribute_id, "timestamp" DESC)?  That might really 
> help if it can step through the rows already sorted, filter out the ones that 
> need filtering out (building the partial index might help here), hit the 
> other two tables for each of those rows using a nested loop, and stop after 2 
> rows which meet those conditions.  The problem is if you have to step through 
> an enormous number of rows before finding 2 of them with device_id=97.

I tried that index and it wasn't used; the planner still chose to do an
in-memory quicksort of ~600 kB. I wonder why?
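
For reference, such an index would look roughly like this (the index names
are illustrative, and the partial-index predicate simply mirrors the
query's filter conditions):

  -- plain index on the sort columns (hypothetical name)
  CREATE INDEX results_attr_ts_idx
      ON results (attribute_id, "timestamp" DESC);

  -- partial variant matching the query's filters
  CREATE INDEX results_attr_ts_success_idx
      ON results (attribute_id, "timestamp" DESC)
      WHERE data IS NOT NULL
        AND data <> ''
        AND data_access_result = 'SUCCESS';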

>>
>> So maybe I should de-normalize and place the device_id column into the
>> "results" table and add it to the index in your suggestion above?
>
>
> Yes, if nothing else works, that should.  How hard would it be to maintain 
> that column in the correct state?

In the end I used this solution. It works ... fine; I still see slow
response times when the caches are cold, but afterwards things seem to
be fine (for now at least).
I had this in mind for a while but wasn't convinced it was "good
design", since it meant denormalizing the DB. Seeing the erratic
behaviour of the query, though, I finally gave up on using smart
indices to try to satisfy the planner.
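
A rough sketch of that denormalization (only "results" and device_id are
named in the thread; the parent table and its columns here are assumptions):

  -- copy device_id down into results (parent table/columns are assumed)
  ALTER TABLE results ADD COLUMN device_id integer;
  UPDATE results r
     SET device_id = o.device_id
    FROM objects o
   WHERE r.object_id = o.id;

  -- index covering the device filter plus the sort
  CREATE INDEX results_dev_attr_ts_idx
      ON results (device_id, attribute_id, "timestamp" DESC);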

It's also the first time I do this outside of a controlled learning
environment so there could be things that I missed.

Thanks for the help, all of you!

> Cheers,
>
> Jeff



-- 

Barbu Paul - Gheorghe




Re: UPGRADE TO PG11 CAUSED DEGRADATION IN PERFORMANCE

2019-09-03 Thread Merlin Moncure
On Sun, Aug 18, 2019 at 1:57 AM Mariel Cherkassky wrote:
>
> Hey,
> I upgraded my pg9.6 cluster to pg11.2.
> After the upgrade, the duration of the same flow in my application rose 
> from 13 minutes to 19 minutes.
>
> The test I did:
> 1. reset pg_stat_statements
> 2. run the applicative flow
> 3. collect everything from pg_stat_statements
>
> I ran this test on the environment before the upgrade and after the 
> upgrade, and summed the total_time values from pg_stat_statements.
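
That measurement, as a sketch in SQL (total_time is reported in
milliseconds in these versions):

  SELECT pg_stat_statements_reset();       -- step 1
  -- step 2: run the applicative flow, then:
  SELECT sum(total_time) AS total_ms       -- step 3
    FROM pg_stat_statements;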

Since you have performance data, do you see any trends? For example,
is it a generalized performance issue, or are there specific queries
that have degraded? We would need more specific detail before we can
give better advice on how to fix the performance issue.
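
For example, comparing the top entries from each environment would show
whether a few statements account for the regression (a sketch; these
columns exist in both 9.6 and 11):

  SELECT query, calls, total_time, mean_time
    FROM pg_stat_statements
   ORDER BY total_time DESC
   LIMIT 10;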

merlin




Incorrect choice of Nested Loop for a skewed distribution

2019-09-03 Thread Oleg Kharin

Hi All,

With PostgreSQL 10 and 11, the planner doesn't use the lists of most 
common values (MCVs) to determine the selectivity of "=" for a Nested 
Loop join as it does for a normal inner join in eqjoinsel_inner(). The 
resulting incorrect choice of a nested-loop join strategy causes poor 
query performance.

To demonstrate it one can make the following test case:

  create table t(f integer not null,g integer not null);
  create table u(f integer not null,g integer not null);
  create sequence s cache 1000;
  insert into t select 0,s from (select nextval('s') as s) as d;
  insert into t select 0,s from (select nextval('s') as s) as d;
  insert into t select 0,s from (select nextval('s') as s from t,t t1,t t2) as d;
  insert into t select 0,s from (select nextval('s') as s from t,t t1,t t2,t t3) as d;

  insert into t(f,g) select g,f from t;
  insert into u select * from t;
  create index t_f on t(f);
  vacuum analyze;

The columns f and g of both tables t and u have a skewed distribution: 
10010 values of 0 and 10010 unique values starting from 1.
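
A quick check of the skew (not part of the original test case):

  -- expect 10010 rows with f = 0 and 10010 rows with f <> 0
  SELECT f = 0 AS is_zero, count(*) FROM t GROUP BY 1;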

Let's see query plan for the join of t and u:

  explain analyze select * from t,u where t.f=u.f and t.g=u.g;

                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.29..7629.95 rows=25055030 width=16) (actual time=0.042..22540.629 rows=20020 loops=1)
   ->  Seq Scan on u  (cost=0.00..289.20 rows=20020 width=8) (actual time=0.011..3.025 rows=20020 loops=1)
   ->  Index Scan using t_f on t  (cost=0.29..0.36 rows=1 width=8) (actual time=0.565..1.125 rows=1 loops=20020)
         Index Cond: (f = u.f)
         Filter: (u.g = g)
         Rows Removed by Filter: 5004
 Planning Time: 0.394 ms
 Execution Time: 22542.639 ms

After dropping the index
  drop index t_f;
we obtain a much better query plan (without the Nested Loop):

                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=3439.09..442052.26 rows=25055030 width=16) (actual time=15.708..32.735 rows=20020 loops=1)
   Merge Cond: ((t.f = u.f) AND (t.g = u.g))
   ->  Sort  (cost=1719.54..1769.59 rows=20020 width=8) (actual time=8.189..10.189 rows=20020 loops=1)
         Sort Key: t.f, t.g
         Sort Method: quicksort  Memory: 1707kB
         ->  Seq Scan on t  (cost=0.00..289.20 rows=20020 width=8) (actual time=0.012..2.958 rows=20020 loops=1)
   ->  Sort  (cost=1719.54..1769.59 rows=20020 width=8) (actual time=7.510..9.459 rows=20020 loops=1)
         Sort Key: u.f, u.g
         Sort Method: quicksort  Memory: 1707kB
         ->  Seq Scan on u  (cost=0.00..289.20 rows=20020 width=8) (actual time=0.008..2.748 rows=20020 loops=1)
 Planning Time: 0.239 ms
 Execution Time: 34.585 ms

Using the MCV lists in var_eq_non_const() would prevent the planner from 
choosing a Nested Loop in such cases.
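
The MCV statistics are already collected and visible; for example, via the
standard pg_stats view:

  -- f = 0 should show up as a most-common value with frequency ~0.5
  SELECT tablename, attname, most_common_vals, most_common_freqs
    FROM pg_stats
   WHERE tablename IN ('t', 'u') AND attname = 'f';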


Regards,
Oleg Kharin