Re: SV: bad plan using nested loops
tor 2018-02-01 klockan 20:34 + skrev Johan Fredriksson:
> > Johan Fredriksson writes:
> > > Bad plan: https://explain.depesz.com/s/avtZ
> > > Good plan: https://explain.depesz.com/s/SJSt
> > > Any suggestions on how to make the planner make better decisions
> > > for
> > > this query?
> >
> > Core of the problem looks to be the misestimation here:
> >
> > Index Only Scan using shredder_cgm1 on
> > public.cachedgroupmembers cachedgroupmembers_4
> > (cost=0.43..2.33 rows=79 width=8) (actual time=0.020..0.903
> > rows=1492 loops=804)
> > Output: cachedgroupmembers_4.memberid,
> > cachedgroupmembers_4.groupid,
> > cachedgroupmembers_4.disabled
> > Index Cond: ((cachedgroupmembers_4.memberid =
> > principals_1.id) AND
> > (cachedgroupmembers_4.disabled = 0))
> > Heap Fetches: 5018
> >
> > Probably, memberid and disabled are correlated but the planner
> > doesn't
> > know that, so it thinks the index condition is way more selective
> > than it
> > actually is. In PG 10, you could very possibly fix that by
> > installing
> > extended statistics on that pair of columns. See
> >
> > https://www.postgresql.org/docs/current/static/planner-stats.html#P
> > LANNER-STATS-EXTENDED
>
> I'm not sure what you mean by correlated, but there are only a
> handful (164 when I check it) disabled groupmembers out of total 7.5
> million.
> I'll give CREATE STATISTICS on those columns a shot and see if it
> gets any better.
It looks like you are right, Tom. There actually exists full
correlation between memberid, groupid and disabled.
rt4=# SELECT stxname, stxkeys, stxdependencies FROM pg_statistic_ext;
stxname | stxkeys | stxdependencies
---+-+--
cgm_stat2 | 2 6 | {"2
=> 6": 1.00}
cgm_stat1 | 3 6 | {"3 => 6": 1.00}
(2 rows)
However, this does not help the planner. It still picks the bad plan.
/ Eskil
Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used
On Thu, 2018-02-01 at 20:00 +0530, Nandakumar M wrote: > Hi, > > I am using Postgres version 9.4.4 on a Mac machine. > I have 2 queries that differ only in the order by clause. > One of it has 'nulls last' and the other one does not have it. > The performance difference between the two is considerable. > > The slower of the two queries is > > SELECT [...] > FROM workorder wo > left join workorder_fields wof > ON wo.workorderid=wof.workorderid > left join servicecatalog_fields scf > ON wo.workorderid=scf.workorderid [...] > ORDER BY 7 DESC nulls last limit 25 > > > > On removing 'nulls last' from the order by clause the query becomes very fast. > I have attached the query plan for both the queries. In the above case, the optimizer does not know that it will get the rows in the correct order: indexes are sorted ASC NULLS LAST by default, so a backwards index scan will produce the results NULLS FIRST, which is the default for ORDER BY ... DESC. If you want the nulls last, PostgreSQL has to retrieve *all* the rows and sort them rather than using the first 25 results it gets by scanning then indexes. To have the above query perform fast, add additional indexes with either ASC NULLS FIRST or DESC NULLS LAST for all used keys. Yours, Laurenz Albe
Re: effective_io_concurrency on EBS/gp2
I did some more tests. I've made an SQL dump of the table. Then used
head/tail commands to cut the data part. Then used shuf command to
shuffle rows, and then joined the pieces back and restored the table
back into DB.
Before:
select array_agg(aid) from (select aid from pgbench_accounts order by
ctid limit 20)_;
{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}
effective_io_concurrency=0 Execution time: 1455.336 ms
effective_io_concurrency=1 Execution time: 8365.070 ms
effective_io_concurrency=2 Execution time: 4791.961 ms
effective_io_concurrency=4 Execution time: 4113.713 ms
effective_io_concurrency=8 Execution time: 1584.862 ms
effective_io_concurrency=16 Execution time: 1533.096 ms
effective_io_concurrency=8 Execution time: 1494.494 ms
effective_io_concurrency=4 Execution time: 3235.892 ms
effective_io_concurrency=2 Execution time: 4624.334 ms
effective_io_concurrency=1 Execution time: 7831.310 ms
effective_io_concurrency=0 Execution time: 1422.203 ms
After:
select array_agg(aid) from (select aid from pgbench_accounts order by
ctid limit 20)_;
{6861090,18316007,2361004,11880097,5079470,9859942,13776329,12687163,3793362,18312052,15912971,9928864,10179242,9307499,2737986,13911147,5337329,12582498,3019085,4631617}
effective_io_concurrency=0 Execution time: 71321.723 ms
effective_io_concurrency=1 Execution time: 180230.742 ms
effective_io_concurrency=2 Execution time: 98635.566 ms
effective_io_concurrency=4 Execution time: 91464.375 ms
effective_io_concurrency=8 Execution time: 91048.939 ms
effective_io_concurrency=16 Execution time: 97682.475 ms
effective_io_concurrency=8 Execution time: 91262.404 ms
effective_io_concurrency=4 Execution time: 90945.560 ms
effective_io_concurrency=2 Execution time: 97019.504 ms
effective_io_concurrency=1 Execution time: 180331.474 ms
effective_io_concurrency=0 Execution time: 71469.484 ms
The numbers are not directly comparable with the previous tests, because
this time I used scale factor 200.
Regards,
Vitaliy
On 2018-02-01 20:39, Claudio Freire wrote:
On Wed, Jan 31, 2018 at 11:21 PM, hzzhangjiazhi
wrote:
HI
I think this parameter will be usefull when the storage using RAID
stripe , otherwise turn up this parameter is meaningless when only has one
device。
Not at all. Especially on EBS, where keeping a relatively full queue
is necessary to get max thoughput out of the drive.
Problem is, if you're scanning a highly correlated index, the
mechanism is counterproductive. I had worked on some POC patches for
correcting that, I guess I could work something out, but it's
low-priority for me. Especially since it's actually a kernel "bug" (or
shortcoming), that could be fixed in the kernel rather than worked
around by postgres.
effective_io_concurrency=0
QUERY PLAN
Bitmap Heap Scan on pgbench_accounts (cost=12838.24..357966.26 rows=1
width=97) (actual time=1454.570..1454.570 rows=0 loops=1)
Recheck Cond: ((aid >= 1000) AND (aid <= 100))
Filter: (abalance <> 0)
Rows Removed by Filter: 999001
Heap Blocks: exact=16378
Buffers: shared hit=2 read=19109
-> Bitmap Index Scan on pgbench_accounts_pkey (cost=0.00..12838.24
rows=986230 width=0) (actual time=447.877..447.877 rows=999001 loops=1)
Index Cond: ((aid >= 1000) AND (aid <= 100))
Buffers: shared hit=2 read=2731
Planning time: 15.782 ms
Execution time: 1455.336 ms
(11 rows)
effective_io_concurrency=1
QUERY PLAN
Bitmap Heap Scan on pgbench_accounts (cost=12838.24..357966.26 rows=1
width=97) (actual time=8364.272..8364.272 rows=0 loops=1)
Recheck Cond: ((aid >= 1000) AND (aid <= 100))
Filter: (abalance <> 0)
Rows Removed by Filter: 999001
Heap Blocks: exact=16378
Buffers: shared hit=2 read=19109
-> Bitmap Index Scan on pgbench_accounts_pkey (cost=0.00..12838.24
rows=986230 width=0) (actual time=448.043..448.043 rows=999001 loops=1)
Index Cond: ((aid >= 1000) AND (aid <= 100))
Buffers: shared hit=2 read=2731
Planning time: 15.036 ms
Execution time: 8365.070 ms
(11 rows)
effective_io_concurrency=2
QUERY PLAN
Bitmap Heap Scan on pgbench_accounts (cost=12838.24..357966.26 ro
Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used
Hi, On 2 Feb 2018 15:06, "Laurenz Albe" wrote: >In the above case, the optimizer does >not know that it will get the rows >in the correct order: indexes are >sorted ASC NULLS LAST by default, >so a backwards index scan will >produce the results NULLS FIRST, >which is the default for ORDER BY ... >DESC. The order by column has a not null constraint on it and so nulls last or first shouldn't make any difference. >If you want the nulls last, PostgreSQL >has to retrieve *all* the rows and sort >them rather than using the first 25 >results it gets by scanning then >indexes. >To have the above query perform >fast, add additional indexes with either >ASC NULLS FIRST or DESC NULLS >LAST for all used keys. For now this is exactly what I have done. But it is in effect a duplicate index on a PK column and I would be happy not to create it in the first place. Regards Nanda
Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used
Nandakumar M writes: > The order by column has a not null constraint on it and so nulls last or > first shouldn't make any difference. The planner does not consider this and it doesn't really seem like something worth expending cycles on. If you know that there won't be nulls in the column, why are you insisting on specifying a nondefault value of NULLS FIRST/LAST in the query? regards, tom lane
Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used
Hi, On Fri, Feb 2, 2018 at 8:30 PM, Tom Lane wrote: > > The planner does not consider this and it doesn't really seem like > something worth expending cycles on. If you know that there won't be > nulls in the column, why are you insisting on specifying a nondefault > value of NULLS FIRST/LAST in the query? The query is generated by a framework that adds 'nulls last' to all order by clause. This is done apparently to provide common behaviour in our application irrespective of the database that is used. SQL server treats nulls as lesser than non null values which is opposite to what Postgres does. For any indexes that we create manually, we can do a --> create index on table_name(column_name nulls first); But, for the PK column we are not in control of the index that is created. Regards, Nanda
Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used
On Fri, Feb 2, 2018 at 8:49 AM, Nandakumar M wrote: > But, for the PK column we are not in control of the index that is created. > You probably can (I assume the nulls aspect of the index doesn't prevent PK usage), but you must add the PK to the table after creating the index and not let the system auto-generate the index for you. https://www.postgresql.org/docs/10/static/sql-altertable.html ALTER TABLE name ADD PRIMARY KEY USING INDEX index_name; David J.
Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used
Hi, On Fri, Feb 2, 2018 at 9:28 PM, David G. Johnston wrote: > You probably can (I assume the nulls aspect of the index doesn't prevent PK > usage), but you must add the PK to the table after creating the index and > not let the system auto-generate the index for you. > > https://www.postgresql.org/docs/10/static/sql-altertable.html > > ALTER TABLE name ADD PRIMARY KEY USING INDEX index_name; > I missed to notice this in the docs. Thank you David for pointing it out. Regards, Nanda
