date:20180202

Re: SV: bad plan using nested loops

2018-02-02 Thread Johan Fredriksson

tor 2018-02-01 klockan 20:34 + skrev Johan Fredriksson:
> > Johan Fredriksson  writes:
> > > Bad plan: https://explain.depesz.com/s/avtZ
> > > Good plan: https://explain.depesz.com/s/SJSt
> > > Any suggestions on how to make the planner make better decisions
> > > for
> > > this query?
> > 
> > Core of the problem looks to be the misestimation here:
> > 
> >    Index Only Scan using shredder_cgm1 on
> > public.cachedgroupmembers cachedgroupmembers_4
> > (cost=0.43..2.33 rows=79 width=8) (actual time=0.020..0.903
> > rows=1492 loops=804)
> >  Output: cachedgroupmembers_4.memberid,
> > cachedgroupmembers_4.groupid,
> > cachedgroupmembers_4.disabled
> >  Index Cond: ((cachedgroupmembers_4.memberid =
> > principals_1.id) AND
> > (cachedgroupmembers_4.disabled = 0))
> >  Heap Fetches: 5018
> > 
> > Probably, memberid and disabled are correlated but the planner
> > doesn't
> > know that, so it thinks the index condition is way more selective
> > than it
> > actually is.  In PG 10, you could very possibly fix that by
> > installing
> > extended statistics on that pair of columns.  See
> > 
> > https://www.postgresql.org/docs/current/static/planner-stats.html#P
> > LANNER-STATS-EXTENDED
> 
> I'm not sure what you mean by correlated, but there are only a
> handful (164 when I check it) disabled groupmembers out of total 7.5
> million.
> I'll give CREATE STATISTICS on those columns a shot and see if it
> gets any better.

It looks like you are right, Tom. There actually exists full
correlation between memberid, groupid and disabled.

rt4=# SELECT stxname, stxkeys, stxdependencies FROM pg_statistic_ext;
 stxname  | stxkeys |   stxdependencies
---+-+--
 cgm_stat2 | 2 6 | {"2
=> 6": 1.00}
 cgm_stat1 | 3 6 | {"3 => 6": 1.00}
(2 rows)

However, this does not help the planner. It still picks the bad plan.


   / Eskil

Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used

2018-02-02 Thread Laurenz Albe

On Thu, 2018-02-01 at 20:00 +0530, Nandakumar M wrote:
> Hi,
> 
> I am using Postgres version 9.4.4 on a Mac machine.
> I have 2 queries that differ only in the order by clause.
> One of it has 'nulls last' and the other one does not have it.
> The performance difference between the two is considerable.
> 
> The slower of the two queries is
> 
> SELECT [...]
> FROM   workorder wo
> left join  workorder_fields wof
> ON wo.workorderid=wof.workorderid
> left join  servicecatalog_fields scf
> ON wo.workorderid=scf.workorderid
[...]
> ORDER BY   7 DESC nulls last limit 25
> 
> 
> 
> On removing 'nulls last' from the order by clause the query becomes very fast.
> I have attached the query plan for both the queries.

In the above case, the optimizer does not know that it will get the rows
in the correct order: indexes are sorted ASC NULLS LAST by default,
so a backwards index scan will produce the results NULLS FIRST,
which is the default for ORDER BY ... DESC.

If you want the nulls last, PostgreSQL has to retrieve *all* the rows and sort
them rather than using the first 25 results it gets by scanning then indexes.

To have the above query perform fast, add additional indexes with either
ASC NULLS FIRST or DESC NULLS LAST for all used keys.

Yours,
Laurenz Albe

Re: effective_io_concurrency on EBS/gp2

2018-02-02 Thread Vitaliy Garnashevich

I did some more tests. I've made an SQL dump of the table. Then used 
head/tail commands to cut the data part. Then used shuf command to 
shuffle rows, and then joined the pieces back and restored the table 
back into DB.


Before:
select array_agg(aid) from (select aid from pgbench_accounts order by 
ctid limit 20)_;

{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}

effective_io_concurrency=0 Execution time: 1455.336 ms
effective_io_concurrency=1 Execution time: 8365.070 ms
effective_io_concurrency=2 Execution time: 4791.961 ms
effective_io_concurrency=4 Execution time: 4113.713 ms
effective_io_concurrency=8 Execution time: 1584.862 ms
effective_io_concurrency=16 Execution time: 1533.096 ms
effective_io_concurrency=8 Execution time: 1494.494 ms
effective_io_concurrency=4 Execution time: 3235.892 ms
effective_io_concurrency=2 Execution time: 4624.334 ms
effective_io_concurrency=1 Execution time: 7831.310 ms
effective_io_concurrency=0 Execution time: 1422.203 ms

After:
select array_agg(aid) from (select aid from pgbench_accounts order by 
ctid limit 20)_;

{6861090,18316007,2361004,11880097,5079470,9859942,13776329,12687163,3793362,18312052,15912971,9928864,10179242,9307499,2737986,13911147,5337329,12582498,3019085,4631617}

effective_io_concurrency=0 Execution time: 71321.723 ms
effective_io_concurrency=1 Execution time: 180230.742 ms
effective_io_concurrency=2 Execution time: 98635.566 ms
effective_io_concurrency=4 Execution time: 91464.375 ms
effective_io_concurrency=8 Execution time: 91048.939 ms
effective_io_concurrency=16 Execution time: 97682.475 ms
effective_io_concurrency=8 Execution time: 91262.404 ms
effective_io_concurrency=4 Execution time: 90945.560 ms
effective_io_concurrency=2 Execution time: 97019.504 ms
effective_io_concurrency=1 Execution time: 180331.474 ms
effective_io_concurrency=0 Execution time: 71469.484 ms

The numbers are not directly comparable with the previous tests, because 
this time I used scale factor 200.


Regards,
Vitaliy

On 2018-02-01 20:39, Claudio Freire wrote:

On Wed, Jan 31, 2018 at 11:21 PM, hzzhangjiazhi
 wrote:

HI

  I think this parameter will be usefull when the storage using RAID
stripe , otherwise turn up this parameter is meaningless when only has one
device。

Not at all. Especially on EBS, where keeping a relatively full queue
is necessary to get max thoughput out of the drive.

Problem is, if you're scanning a highly correlated index, the
mechanism is counterproductive. I had worked on some POC patches for
correcting that, I guess I could work something out, but it's
low-priority for me. Especially since it's actually a kernel "bug" (or
shortcoming), that could be fixed in the kernel rather than worked
around by postgres.




effective_io_concurrency=0
   QUERY PLAN   


 Bitmap Heap Scan on pgbench_accounts  (cost=12838.24..357966.26 rows=1 
width=97) (actual time=1454.570..1454.570 rows=0 loops=1)
   Recheck Cond: ((aid >= 1000) AND (aid <= 100))
   Filter: (abalance <> 0)
   Rows Removed by Filter: 999001
   Heap Blocks: exact=16378
   Buffers: shared hit=2 read=19109
   ->  Bitmap Index Scan on pgbench_accounts_pkey  (cost=0.00..12838.24 
rows=986230 width=0) (actual time=447.877..447.877 rows=999001 loops=1)
 Index Cond: ((aid >= 1000) AND (aid <= 100))
 Buffers: shared hit=2 read=2731
 Planning time: 15.782 ms
 Execution time: 1455.336 ms
(11 rows)

effective_io_concurrency=1
   QUERY PLAN   


 Bitmap Heap Scan on pgbench_accounts  (cost=12838.24..357966.26 rows=1 
width=97) (actual time=8364.272..8364.272 rows=0 loops=1)
   Recheck Cond: ((aid >= 1000) AND (aid <= 100))
   Filter: (abalance <> 0)
   Rows Removed by Filter: 999001
   Heap Blocks: exact=16378
   Buffers: shared hit=2 read=19109
   ->  Bitmap Index Scan on pgbench_accounts_pkey  (cost=0.00..12838.24 
rows=986230 width=0) (actual time=448.043..448.043 rows=999001 loops=1)
 Index Cond: ((aid >= 1000) AND (aid <= 100))
 Buffers: shared hit=2 read=2731
 Planning time: 15.036 ms
 Execution time: 8365.070 ms
(11 rows)

effective_io_concurrency=2
   QUERY PLAN   


 Bitmap Heap Scan on pgbench_accounts  (cost=12838.24..357966.26 ro

Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used

2018-02-02 Thread Nandakumar M

Hi,

On 2 Feb 2018 15:06, "Laurenz Albe"  wrote:


>In the above case, the optimizer does >not know that it will get the rows
>in the correct order: indexes are >sorted ASC NULLS LAST by default,
>so a backwards index scan will >produce the results NULLS FIRST,
>which is the default for ORDER BY ... >DESC.


The order by column has a not null constraint on it and so nulls last or
first shouldn't make any difference.


>If you want the nulls last, PostgreSQL >has to retrieve *all* the rows and
sort
>them rather than using the first 25 >results it gets by scanning then
>indexes.

>To have the above query perform >fast, add additional indexes with either
>ASC NULLS FIRST or DESC NULLS >LAST for all used keys.


For now this is exactly what I have done. But it is in effect a duplicate
index on a PK column and I would be happy not to create it in the first
place.

Regards
Nanda

Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used

2018-02-02 Thread Tom Lane

Nandakumar M  writes:
> The order by column has a not null constraint on it and so nulls last or
> first shouldn't make any difference.

The planner does not consider this and it doesn't really seem like
something worth expending cycles on.  If you know that there won't be
nulls in the column, why are you insisting on specifying a nondefault
value of NULLS FIRST/LAST in the query?

regards, tom lane

Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used

2018-02-02 Thread Nandakumar M

Hi,

On Fri, Feb 2, 2018 at 8:30 PM, Tom Lane  wrote:
>
> The planner does not consider this and it doesn't really seem like
> something worth expending cycles on.  If you know that there won't be
> nulls in the column, why are you insisting on specifying a nondefault
> value of NULLS FIRST/LAST in the query?

The query is generated by a framework that adds 'nulls last' to all
order by clause.

This is done apparently to provide common behaviour in our application
irrespective of the database that is used.
SQL server treats nulls as lesser than non null values which is
opposite to what Postgres does.

For any indexes that we create manually, we can do a

--> create index on table_name(column_name nulls first);

But, for the PK column we are not in control of the index that is created.

Regards,
Nanda

Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used

2018-02-02 Thread David G. Johnston

On Fri, Feb 2, 2018 at 8:49 AM, Nandakumar M  wrote:

> But, for the PK column we are not in control of the index that is created.
>

You probably can (I assume the nulls aspect of the index doesn't prevent
PK usage), but you must add the PK to the table after creating the index
and not let the system auto-generate the index for you.

https://www.postgresql.org/docs/10/static/sql-altertable.html

ALTER TABLE name ADD PRIMARY KEY USING INDEX index_name;

David J.

Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used

2018-02-02 Thread Nandakumar M

Hi,

On Fri, Feb 2, 2018 at 9:28 PM, David G. Johnston
 wrote:

> You probably can (I assume the nulls aspect of the index doesn't prevent PK
> usage), but you must add the PK to the table after creating the index and
> not let the system auto-generate the index for you.
>
> https://www.postgresql.org/docs/10/static/sql-altertable.html
>
> ALTER TABLE name ADD PRIMARY KEY USING INDEX index_name;
>

I missed to notice this in the docs. Thank you David for pointing it out.

Regards,
Nanda

Re: SV: bad plan using nested loops

Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used

Re: effective_io_concurrency on EBS/gp2

Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used

Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used

Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used

Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used

Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used

8 matches

Site Navigation

Mail list logo

Footer information