date:20220612

Missed query planner optimization: `n in (select q)` -> `n in (q)`

2022-06-12 Thread Josh

Hey y'all!

So recently, I ran into an issue where a query I wrote wasn't using an index, 
presumably because what I was doing was too hard for the query planner to 
figure out. I've distilled the problem into its essence to the best of my 
ability, and found that it's because `select` seems to hinder it.

The problem boils down to the planner not figuring out that these two queries 
should use an index:

```sql
-- Setup
create table numbers(n int);
insert into numbers (n) select generate_series(1, 100);
create index numbers_n_idx on numbers(n);

-- Non-indexed queries
explain analyze select numbers.n from (values (500)) quantities(q)
join numbers on numbers.n in (select q);

explain analyze select numbers.n from (values (500)) quantities(q)
join numbers on numbers.n = any(select q);
```

These examples may seem silly, so let me provide a "case study" query that 
should justify the need for such an optimization. I had a query that was 
generating an array of items, and wanted to join it to a table given that some 
column of that table was present in the array. It looked like so:

```sql
select numbers.n from quantities join numbers on numbers.n in (select 
unnest(quantities.q));
```

This turned out to be horrendously slow, because it was performing a sequential 
scan! I did however end up settling on the following form:

```sql
select numbers.n from quantities join numbers on numbers.n = any(quantities.q);
```

This was only possible because I was dealing with arrays though, and an 
operation such as `in (select unnest...)` can be easily converted to `= 
any(...)`. However for the general case, I believe an optimization in this area 
may provide benefit as there may exist a circumstance that does not have an 
alternative to a sub-query select (`= any()` was my alternative), but I am just 
a database newbie.

I've noticed this problem has been around since at least 11.7, and is still 
present as of the `postgres:15beta1` docker image. I've attached a script which 
reproduces the issue. It uses docker, so I'm confident you'll be able to run it 
without issue.

Finally, I ask:

- Is this an issue that should be fixed? I'm a database newbie so I have no 
idea about the deep semantics of SQL and what a select inside a 
`join_condition` could imply to the planner to prevent it from optimizing it.

- If "yes" to the previous question, what would be the precise semantics of 
such an optimization? I loosely say `n in (select q)` -> `n in (q)` for all n 
and q, but of course I don't have enough knowledge to know that this is correct 
in terms of whatever Postgres' internal query optimization IR is.

- Can a database newbie like myself contribute an optimization pass in Postgres 
to fix this? I'm fascinated by the work y'all do, and submitting a patch to 
Postgres that makes it into production would make my week.

Thank you for your time, and have a great day!


repro.sh
Description: Binary data

Re: Missed query planner optimization: `n in (select q)` -> `n in (q)`

2022-06-12 Thread David G. Johnston

On Sun, Jun 12, 2022 at 2:47 PM Josh  wrote:

>
> This was only possible because I was dealing with arrays though, and an
> operation such as `in (select unnest...)` can be easily converted to `=
> any(...)`. However for the general case,

In the general case you don't have subqueries inside join conditions.

> I believe an optimization in this area may provide benefit as there may
> exist a circumstance that does not have an alternative to a sub-query
> select (`= any()` was my alternative)

I think we'd want a concrete example of a non-poorly written query (or at
least a poorly written one that, say, is generated by a framework, not just
inexperienced human SQL writers) before we'd want to even entertain
spending time on something like this.

> - Is this an issue that should be fixed?

Probably not worth the effort.

I'm fascinated by the work y'all do, and submitting a patch to Postgres
> that makes it into production would make my week.
>
>
Maybe you'll find almost as much good is done helping others get their
patches committed.  There are many in need of reviewers.

https://commitfest.postgresql.org/

Missed query planner optimization: `n in (select q)` -> `n in (q)`

Re: Missed query planner optimization: `n in (select q)` -> `n in (q)`

2 matches

Site Navigation

Mail list logo

Footer information