Need help with optimising simple query

2018-07-09 Thread Nandakumar M
Hi,

I am having a query that has an order by and a limit clause. The
column on which I am doing order by is indexed (default b tree index).
However the index is not being used. On tweaking the query a bit I
found that when I use left join index is not used whereas when I use
inner join the index is used.

Unfortunately, the behaviour we expect is that of left join only. My
question is, is there any way to modify/improve the query to improve
the query speed or is this the best that is possible for this case.

Please find below a simplified version of the queries. I tried the
queries on 9.3 and 10 versions and both gave similar results.


Table structure

performance_test=# \d+ child
Table "public.child"
 Column |  Type  | Collation | Nullable |  Default
 | Storage  | Stats target | Description
++---+--+---+--+--+-
 id | bigint |   | not null |
nextval('child_id_seq'::regclass) | plain|  |
 name   | text   |   | not null |
 | extended |  |
Indexes:
"child_pkey" PRIMARY KEY, btree (id)
"child_name_unique" UNIQUE CONSTRAINT, btree (name)
Referenced by:
TABLE "parent" CONSTRAINT "parent_child_id_fkey" FOREIGN KEY
(child_id) REFERENCES child(id)


performance_test=# \d+ parent
 Table "public.parent"
  Column  |  Type  | Collation | Nullable |  Default
| Storage  | Stats target | Description
--++---+--++--+--+-
 id   | bigint |   | not null |
nextval('parent_id_seq'::regclass) | plain|  |
 name | text   |   | not null |
| extended |  |
 child_id | bigint |   |  |
| plain|  |
Indexes:
"parent_pkey" PRIMARY KEY, btree (id)
"parent_name_unique" UNIQUE CONSTRAINT, btree (name)
"parent_child_id_idx" btree (child_id)
Foreign-key constraints:
"parent_child_id_fkey" FOREIGN KEY (child_id) REFERENCES child(id)



Query used to populate data

performance_test=# insert into child(name) select concat('child ',
gen.id) as name  from (select generate_series(1,10) as id) as gen;

performance_test=# insert into parent(name, child_id) select
concat('parent ', gen.id) as name, (id%10) + 1  from (select
generate_series(1,100) as id) as gen;


Left join with order by using child name

performance_test=# explain analyze select * from parent left join
child on parent.child_id = child.id order by child.name limit 10;
  QUERY
PLAN
--
 Limit  (cost=69318.55..69318.58 rows=10 width=59) (actual
time=790.708..790.709 rows=10 loops=1)
   ->  Sort  (cost=69318.55..71818.55 rows=100 width=59) (actual
time=790.705..790.706 rows=10 loops=1)
 Sort Key: child.name
 Sort Method: top-N heapsort  Memory: 27kB
 ->  Hash Left Join  (cost=3473.00..47708.91 rows=100
width=59) (actual time=51.066..401.028 rows=100 loops=1)
   Hash Cond: (parent.child_id = child.id)
   ->  Seq Scan on parent  (cost=0.00..17353.00
rows=100 width=29) (actual time=0.026..67.848 rows=100
loops=1)
   ->  Hash  (cost=1637.00..1637.00 rows=10 width=19)
(actual time=50.879..50.879 rows=10 loops=1)
 Buckets: 65536  Batches: 2  Memory Usage: 3053kB
 ->  Seq Scan on child  (cost=0.00..1637.00
rows=10 width=19) (actual time=0.018..17.281 rows=10 loops=1)
 Planning time: 1.191 ms
 Execution time: 790.797 ms
(12 rows)



Inner join with sorting according to child name

performance_test=# explain analyze select * from parent inner join
child on parent.child_id = child.id order by child.name limit 10;

QUERY PLAN
--
 Limit  (cost=0.84..2.03 rows=10 width=59) (actual time=0.156..0.193
rows=10 loops=1)
   ->  Nested Loop  (cost=0.84..119132.56 rows=100 width=59)
(actual time=0.154..0.186 rows=10 loops=1)
 ->  Index Scan using child_name_unique on child
(cost=0.42..5448.56 rows=10 width=19) (actual time=0.126..0.126
rows=1 loops=1)
 ->  Index Scan using parent_child_id_idx on parent
(cost=0.42..1.04 rows=10 width=29) (actual time=0.019..0.045 rows=10
loops=1)
   Index Cond: (child_id = child.id)
 Planning time: 0.941 ms
 Execution time: 0.283 ms
(7 rows)




Version

performance_test=# select version();
  version
--

Re: Need help with optimising simple query

2018-07-09 Thread Tom Lane
Nandakumar M  writes:
> I am having a query that has an order by and a limit clause. The
> column on which I am doing order by is indexed (default b tree index).
> However the index is not being used. On tweaking the query a bit I
> found that when I use left join index is not used whereas when I use
> inner join the index is used.

The reason the index isn't being used is that the sort order the query
requests isn't the same as the order provided by the index.  Here:

> performance_test=# explain analyze select * from parent left join
> child on parent.child_id = child.id order by child.name limit 10;

you're asking to sort by a column that will include null values for
child.name anywhere that there's a parent row without a match for
child_id.  Those rows aren't even represented in the index on child.name,
much less placed in the right order.

regards, tom lane



Re: Need help with optimising simple query

2018-07-09 Thread Nandakumar M
Hi Tom,

Is there something that I can do to improve the performance of such
queries (where ordering is done based on child table column and join
is left join)? Maybe a combined index or something like that? Or is it
possible to modify the query to get same result but execute faster.
One ad-hoc optimisation (which gives somewhat better performance) that
came to mind is to have a sub query for child table like

performance_test=# explain analyze select * from parent left join
(select * from child order by name limit 10) as child on
parent.child_id = child.id order by child.name limit 10;


QUERY PLAN
-
 Limit  (cost=42714.84..42714.86 rows=10 width=59) (actual
time=311.623..311.624 rows=10 loops=1)
   ->  Sort  (cost=42714.84..45214.84 rows=100 width=59) (actual
time=311.622..311.622 rows=10 loops=1)
 Sort Key: child.name
 Sort Method: top-N heapsort  Memory: 26kB
 ->  Hash Left Join  (cost=1.19..21105.20 rows=100
width=59) (actual time=0.120..204.386 rows=100 loops=1)
   Hash Cond: (parent.child_id = child.id)
   ->  Seq Scan on parent  (cost=0.00..17353.00
rows=100 width=29) (actual time=0.073..73.052 rows=100
loops=1)
   ->  Hash  (cost=1.06..1.06 rows=10 width=19) (actual
time=0.035..0.035 rows=10 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Limit  (cost=0.42..0.96 rows=10 width=19)
(actual time=0.014..0.027 rows=10 loops=1)
   ->  Index Scan using child_name_unique on
child  (cost=0.42..5448.56 rows=10 width=19) (actual
time=0.013..0.024 rows=10 loops=1)
 Planning time: 0.505 ms
 Execution time: 311.682 ms
(13 rows)

Time: 312.673 ms

Is there something I can do that will improve the query performance
much more than this?

Thanks.

Regards,
Nanda

On Mon, 9 Jul 2018, 19:53 Tom Lane,  wrote:
>
> Nandakumar M  writes:
> > I am having a query that has an order by and a limit clause. The
> > column on which I am doing order by is indexed (default b tree index).
> > However the index is not being used. On tweaking the query a bit I
> > found that when I use left join index is not used whereas when I use
> > inner join the index is used.
>
> The reason the index isn't being used is that the sort order the query
> requests isn't the same as the order provided by the index.  Here:
>
> > performance_test=# explain analyze select * from parent left join
> > child on parent.child_id = child.id order by child.name limit 10;
>
> you're asking to sort by a column that will include null values for
> child.name anywhere that there's a parent row without a match for
> child_id.  Those rows aren't even represented in the index on child.name,
> much less placed in the right order.
>
> regards, tom lane