WHERE IN for JOIN subquery?

2017-12-18 Thread Dave Johansen
I want to do a JOIN against a subquery that is doing an aggregation. The
query itself is relatively straightforward, but has poor performance.

Here it is:
SELECT a.*, b.*
FROM base AS a
LEFT OUTER JOIN
(SELECT other, COUNT(value), COUNT(DISTINCT value) FROM other GROUP
BY other) AS b
USING (other)
WHERE id IN (4, 56, 102);

It's significantly faster, but more complicated (and repetitive), if I add
the following:
WHERE other = ANY(ARRAY(SELECT DISTINCT other FROM base WHERE id IN (4, 56,
102)))

I tried adding the following:
other IN (a.other)
Or:
other = a.other
But I get this error:
ERROR:  invalid reference to FROM-clause entry for table "a"

LINE 1: ...ue), COUNT(DISTINCT value) FROM other WHERE other=a.other GR...

 ^

HINT:  There is an entry for table "a", but it cannot be referenced from
this part of the query.

Is there a way to do something like that simpler query so the subquery can
get better performance by filtering only to what it needs instead of doing
the GROUP BY on the whole table?

Thanks,
Dave

In case it's helpful, here's the table definitions:
CREATE TABLE base (id INTEGER PRIMARY KEY, value TEXT, other INTEGER);
CREATE TABLE other (other INTEGER, value INTEGER);

And the explain results:
EXPLAIN ANALYZE SELECT a.*, b.* FROM base AS a LEFT OUTER JOIN (SELECT
other, COUNT(value), COUNT(DISTINCT value) FROM other WHERE other =
ANY(ARRAY(SELECT DISTINCT other FROM base WHERE id IN (4, 56, 102))) GROUP
BY other) AS b USING (other) WHERE id IN (4, 56, 102);

QUERY PLAN


---

 Merge Right Join  (cost=27619.21..27741.23 rows=3 width=33) (actual
time=105.045..115.539 rows=3 loops=1)

   Merge Cond: (other.other = a.other)

   ->  GroupAggregate  (cost=27602.28..27711.74 rows=1001 width=20) (actual
time=104.989..115.452 rows=3 loops=1)

 Group Key: other.other

 InitPlan 1 (returns $0)

   ->  Unique  (cost=16.93..16.95 rows=3 width=4) (actual
time=0.083..0.127 rows=3 loops=1)

 ->  Sort  (cost=16.93..16.94 rows=3 width=4) (actual
time=0.073..0.085 rows=3 loops=1)

   Sort Key: base.other

   Sort Method: quicksort  Memory: 25kB

   ->  Index Scan using base_pkey on base
 (cost=0.29..16.91 rows=3 width=4) (actual time=0.019..0.042 rows=3 loops=1)

 Index Cond: (id = ANY
('{4,56,102}'::integer[]))

 ->  Sort  (cost=27585.34..27610.20 rows=9945 width=8) (actual
time=99.401..107.199 rows=3035 loops=1)

   Sort Key: other.other

   Sort Method: quicksort  Memory: 239kB

   ->  Seq Scan on other  (cost=0.00..26925.00 rows=9945
width=8) (actual time=0.708..90.738 rows=3035 loops=1)

 Filter: (other = ANY ($0))

 Rows Removed by Filter: 996965

   ->  Sort  (cost=16.93..16.94 rows=3 width=13) (actual time=0.044..0.051
rows=3 loops=1)

 Sort Key: a.other

 Sort Method: quicksort  Memory: 25kB

 ->  Index Scan using base_pkey on base a  (cost=0.29..16.91 rows=3
width=13) (actual time=0.016..0.027 rows=3 loops=1)

   Index Cond: (id = ANY ('{4,56,102}'::integer[]))

 Planning time: 4.163 ms

 Execution time: 115.665 ms


EXPLAIN ANALYZE SELECT a.*, b.* FROM base AS a LEFT OUTER JOIN (SELECT
other, COUNT(value), COUNT(DISTINCT value) FROM other GROUP BY other) AS b
USING (other) WHERE id IN (4, 56, 102);

 QUERY PLAN




 Merge Right Join  (cost=127786.02..137791.07 rows=3 width=60) (actual
time=7459.042..12060.805 rows=3 loops=1)

   Merge Cond: (other.other = a.other)

   ->  GroupAggregate  (cost=127763.19..137765.69 rows=200 width=20)
(actual time=7143.486..12057.835 rows=830 loops=1)

 Group Key: other.other

 ->  Sort  (cost=127763.19..130263.31 rows=150 width=8) (actual
time=7137.594..9624.119 rows=829088 loops=1)

   Sort Key: other.other

   Sort Method: external merge  Disk: 17576kB

   ->  Seq Scan on other  (cost=0.00..14425.50 rows=150
width=8) (actual time=0.555..2727.461 rows=100 loops=1)

   ->  Sort  (cost=22.83..22.84 rows=3 width=40) (actual time=0.103..0.112
rows=3 loops=1)

 Sort Key: a.other

 Sort Method: quicksort  Memory: 25kB

 ->  Bitmap Heap Scan on base a  (cost=12.87..22.81 rows=3
width=40) (actual time=0.048..0.064 rows=3 loops=1)

   Recheck Cond: (id = ANY ('{4,56,102}'::integer[]))

   Heap Blocks: exact=1

   ->  Bitmap Index Scan on bas

Re: WHERE IN for JOIN subquery?

2017-12-18 Thread David G. Johnston
On Mon, Dec 18, 2017 at 5:00 PM, Dave Johansen 
wrote:

>
> other = a.other
> But I get this error:
> ERROR:  invalid reference to FROM-clause entry for table "a"
>
> LINE 1: ...ue), COUNT(DISTINCT value) FROM other WHERE other=a.other GR...
>
> HINT:  There is an entry for table "a", but it cannot be referenced from
> this part of the query.
>

One possible solution to this error is to add the word "LATERAL" before
LEFT JOIN so that the right side of the join can reference variables from
the left side.

David J.
​


Re: WHERE IN for JOIN subquery?

2017-12-18 Thread Dave Johansen
On Mon, Dec 18, 2017 at 5:10 PM, David G. Johnston <
[email protected]> wrote:

> On Mon, Dec 18, 2017 at 5:00 PM, Dave Johansen 
> wrote:
>
>>
>> other = a.other
>> But I get this error:
>> ERROR:  invalid reference to FROM-clause entry for table "a"
>>
>> LINE 1: ...ue), COUNT(DISTINCT value) FROM other WHERE other=a.other GR...
>>
>> HINT:  There is an entry for table "a", but it cannot be referenced from
>> this part of the query.
>>
>
> One possible solution to this error is to add the word "LATERAL" before
> LEFT JOIN so that the right side of the join can reference variables from
> the left side.
>

That appears to be what I was looking for.
Thanks,
Dave