date:20211108

EXISTS by itself vs SELECT EXISTS much slower in query.

2021-11-08 Thread Jimmy A

A description of what you are trying to achieve and what results you
expect:
I have two equivalent queries, one with an EXISTS clause by itself and one
wrapped in (SELECT EXISTS ...), and the "naked" EXISTS is much slower.
I would expect both to run at the same speed and have the same execution plan.

-- slow
explain (analyze, buffers)
SELECT
parent.*,
EXISTS (SELECT * FROM child WHERE child.parent_id=parent.parent_id) AS
child_exists
FROM parent
ORDER BY parent_id LIMIT 10;

-- fast
explain (analyze, buffers)
SELECT
parent.*,
(SELECT EXISTS (SELECT * FROM child WHERE
child.parent_id=parent.parent_id)) AS child_exists
FROM parent
ORDER BY parent_id LIMIT 10;

-- slow
https://explain.depesz.com/s/DzcK

-- fast
https://explain.depesz.com/s/EftS

Setup:
CREATE TABLE parent(parent_id BIGSERIAL PRIMARY KEY, name text);
CREATE TABLE child(child_id BIGSERIAL PRIMARY KEY, parent_id bigint
references parent(parent_id), name text);

-- random name and sequential primary key for 100 thousand parents.
INSERT INTO parent
SELECT
nextval('parent_parent_id_seq'),
md5(random()::text)
FROM generate_series(1, 100000);

-- 1 million children.
-- every odd-id parent gets children; even-id parents get none.
INSERT INTO child
SELECT
   nextval('child_child_id_seq'),
   ((generate_series/2*2) % 100000)::bigint + 1,
   md5(random()::text)
FROM generate_series(1, 1000000);

CREATE INDEX ON child(parent_id);
VACUUM ANALYZE parent, child;
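A quick way to check the comment about odd and even parents is to group the generated children by parent-id parity. This is a sketch run against the setup tables above, not part of the original report:

```sql
-- Sanity check on the generated data: split the parents that have
-- children by whether their id is odd or even.
-- Per the setup comment, only the odd_id = true group should appear.
SELECT parent_id % 2 = 1         AS odd_id,
       count(DISTINCT parent_id) AS parents_with_children,
       count(*)                  AS children
FROM child
GROUP BY 1;
```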

Both queries return the same results; I have taken an md5 of both queries'
output without the LIMIT clause to confirm.
Tables have been vacuumed and analyzed.
No other queries are being executed.
Reproducible with LIMIT 1 or LIMIT 100 or LIMIT 500.
Changing work_mem makes no difference.
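The report does not show the exact command used for the md5 comparison. One way to fingerprint each query's full result (LIMIT removed) is sketched below; `string_agg(... ORDER BY ...)` makes the hash independent of the order in which rows come back, and the column aliases are illustrative only:

```sql
-- Fingerprint the full result of the "naked" EXISTS form (no LIMIT).
SELECT md5(string_agg(t::text, ',' ORDER BY t.parent_id)) AS naked_exists_md5
FROM (
    SELECT parent.*,
           EXISTS (SELECT * FROM child
                   WHERE child.parent_id = parent.parent_id) AS child_exists
    FROM parent
) t;

-- Same fingerprint for the (SELECT EXISTS ...) form; the two hashes should match.
SELECT md5(string_agg(t::text, ',' ORDER BY t.parent_id)) AS wrapped_exists_md5
FROM (
    SELECT parent.*,
           (SELECT EXISTS (SELECT * FROM child
                           WHERE child.parent_id = parent.parent_id)) AS child_exists
    FROM parent
) t;
```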

-[ RECORD 1 ]--+---------
relname        | parent
relpages       | 935
reltuples      | 100000
relallvisible  | 935
relkind        | r
relnatts       | 2
relhassubclass | f
reloptions     |
pg_table_size  | 7700480
-[ RECORD 2 ]--+---------
relname        | child
relpages       | 10310
reltuples      | 1e+06
relallvisible  | 10310
relkind        | r
relnatts       | 3
relhassubclass | f
reloptions     |
pg_table_size  | 84516864
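Figures in this shape can be pulled from the `pg_class` catalog. A query along these lines (a sketch; `\x on` switches psql to the expanded display that produces the `-[ RECORD n ]` layout) returns the same columns:

```sql
\x on
-- Planner-relevant size and visibility statistics for both tables.
SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
       relhassubclass, reloptions, pg_table_size(oid)
FROM pg_class
WHERE relname IN ('parent', 'child');
```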

PostgreSQL version number you are running:
PostgreSQL 13.4 on arm-apple-darwin20.5.0, compiled by Apple clang version
12.0.5 (clang-1205.0.22.9), 64-bit

How you installed PostgreSQL:
Using homebrew for mac.
brew install postgres

Changes made to the settings in the postgresql.conf file:  see Server
Configuration for a quick way to list them all.
checkpoint_completion_target     | 0.9               | configuration file
checkpoint_timeout               | 30min             | configuration file
client_encoding                  | UTF8              | client
cpu_tuple_cost                   | 0.03              | configuration file
effective_cache_size             | 4GB               | configuration file
log_directory                    | log               | configuration file
log_min_duration_statement       | 25ms              | configuration file
log_statement                    | none              | configuration file
log_temp_files                   | 0                 | configuration file
log_timezone                     | America/Anchorage | configuration file
maintenance_work_mem             | 512MB             | configuration file
max_parallel_maintenance_workers | 2                 | configuration file
max_parallel_workers             | 4                 | configuration file
max_parallel_workers_per_gather  | 4                 | configuration file
max_stack_depth                  | 2MB               | environment variable
max_wal_size                     | 10GB              | configuration file
max_worker_processes             | 4                 | configuration file
min_wal_size                     | 80MB              | configuration file
random_page_cost                 | 1.1               | configuration file
shared_buffers                   | 512MB             | configuration file
shared_preload_libraries         | auto_explain      | configuration file
track_io_timing                  | on                | configuration file
vacuum_cost_limit                | 1000              | configuration file
wal_buffers                      | 64MB              | configuration file
wal_compression                  | on                | configuration file
work_mem                         | 128MB             | configuration file
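A query along these lines lists every setting that was changed from its built-in default, in the same name / value / source layout as above (a sketch of the kind of query the Server Configuration page describes, not necessarily its exact text):

```sql
-- Every setting whose value did not come from the compiled-in default.
SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');
```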

Operating system and version:
macOS Big Sur 11.2.3
I have also confirmed this happens on Ubuntu Linux.

What program you're using to connect to PostgreSQL:
psql

Is there anything relevant or unusual in the PostgreSQL server logs?:
no

Hardware specs:
MacBook Air10,1 M1
8GB RAM
APPLE SSD AP0512Q 500.28GB



Re: EXISTS by itself vs SELECT EXISTS much slower in query.

2021-11-08 Thread Vasya Boytsov

PostgreSQL 14, Linux, with:
CREATE TABLE parent(parent_id bigint generated always as identity
PRIMARY KEY, name text);
CREATE TABLE child(child_id bigint generated always as identity
PRIMARY KEY, parent_id bigint references parent(parent_id), name
text);

INSERT INTO parent(name)
SELECT
md5(random()::text)
FROM generate_series(1, 100000);

INSERT INTO child(parent_id, name)
SELECT
   ((generate_series/2*2) % 100000)::bigint + 1,
   md5(random()::text)
FROM generate_series(1, 1000000);

CREATE INDEX ON child(parent_id);
VACUUM ANALYZE parent, child;

slow:
explain (analyze, buffers)
SELECT
parent.*,
EXISTS (SELECT * FROM child WHERE
child.parent_id=parent.parent_id) AS child_exists
FROM parent
ORDER BY parent_id LIMIT 10;
https://explain.depesz.com/s/Sx9t
fast:
explain (analyze, buffers)
SELECT
parent.*,
(SELECT EXISTS (SELECT * FROM child WHERE
child.parent_id=parent.parent_id)) AS child_exists
FROM parent
ORDER BY parent_id LIMIT 10;

https://explain.depesz.com/s/mIXR

---

So, this looks strange.


Re: EXISTS by itself vs SELECT EXISTS much slower in query.

2021-11-08 Thread Tom Lane

Jimmy A  writes:
> I have two equivalent queries, one with an EXISTS clause by itself and one
> wrapped in a (SELECT EXISTS) and the "naked" exists is much slower.
> I would expect both to be the same speed / have same execution plan.

That is a dangerous assumption.  In general, wrapping (SELECT ...) around
something has a significant performance impact, because it pushes Postgres
to try to decouple the sub-select's execution from the outer query.
As an example,

postgres=# select x, random() from generate_series(1,3) x;
 x |       random
---+---------------------
 1 | 0.08595356832524814
 2 |  0.6444265043474005
 3 |  0.6878852071694332
(3 rows)

postgres=# select x, (select random()) from generate_series(1,3) x;
 x |       random
---+--------------------
 1 | 0.7028987801136708
 2 | 0.7028987801136708
 3 | 0.7028987801136708
(3 rows)

That's not a bug: it's expected that the second query will evaluate
random() only once.
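Adding VERBOSE to the example above should make the difference visible (a sketch; plan labels differ between versions): in the second query the uncorrelated `(select random())` is expected to appear as an InitPlan that runs once, with every output row reusing its single result.

```sql
-- random() is evaluated per row: it appears directly in the scan's Output list.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT x, random() FROM generate_series(1, 3) x;

-- (SELECT random()) is uncorrelated, so it should be planned as an InitPlan
-- that runs once; each output row then references that one result.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT x, (SELECT random()) FROM generate_series(1, 3) x;
```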

In the case at hand, I suspect you're getting a "hashed subplan"
in one query and not the other.  The depesz.com display doesn't
really show that, but EXPLAIN VERBOSE would.
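Following that suggestion, both thread queries can be rerun with VERBOSE added, assuming the parent and child tables from the setup above. The line to compare is the `Output:` of the scan on parent, which names the subplan that computes child_exists and, when hashing is chosen, labels it as a hashed SubPlan (some versions may list both alternatives the planner considered). A hashed subplan has to read all of child to build its hash table before returning the first row, so LIMIT 10 saves little, which would be consistent with the slow timings reported above.

```sql
-- Slow form: check the Output: line of the scan on parent for
-- "SubPlan" versus "hashed SubPlan" on the child_exists column.
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT parent.*,
       EXISTS (SELECT * FROM child
               WHERE child.parent_id = parent.parent_id) AS child_exists
FROM parent
ORDER BY parent_id LIMIT 10;

-- Fast form for comparison: the (SELECT EXISTS ...) wrapper is expected
-- to show a plain SubPlan executed once per returned parent row.
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT parent.*,
       (SELECT EXISTS (SELECT * FROM child
                       WHERE child.parent_id = parent.parent_id)) AS child_exists
FROM parent
ORDER BY parent_id LIMIT 10;
```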

regards, tom lane
