EXISTS by itself vs SELECT EXISTS much slower in query.

2021-11-08 Thread Jimmy A
A description of what you are trying to achieve and what results you
expect.:
I have two equivalent queries, one with an EXISTS clause by itself and one
wrapped in a (SELECT EXISTS) and the "naked" exists is much slower.
I would expect both to be the same speed / have same execution plan.

-- slow
explain (analyze, buffers)
SELECT
parent.*,
EXISTS (SELECT * FROM child WHERE child.parent_id=parent.parent_id) AS
child_exists
FROM parent
ORDER BY parent_id LIMIT 10;

-- fast
explain (analyze, buffers)
SELECT
parent.*,
(SELECT EXISTS (SELECT * FROM child WHERE
child.parent_id=parent.parent_id)) AS child_exists
FROM parent
ORDER BY parent_id LIMIT 10;

-- slow
https://explain.depesz.com/s/DzcK

-- fast
https://explain.depesz.com/s/EftS

Setup:
CREATE TABLE parent(parent_id BIGSERIAL PRIMARY KEY, name text);
CREATE TABLE child(child_id BIGSERIAL PRIMARY KEY, parent_id bigint
references parent(parent_id), name text);

-- random name and sequential primary key for 100 thousand parents.
INSERT INTO parent
SELECT
nextval('parent_parent_id_seq'),
md5(random()::text)
FROM generate_series(1, 10);

-- 1 million children.
-- set every odd id parent to have children. even id parent gets none.
INSERT INTO child
SELECT
   nextval('child_child_id_seq'),
   ((generate_series/2*2) % 10)::bigint + 1,
   md5(random()::text)
FROM generate_series(1, 100);

CREATE INDEX ON child(parent_id);
VACUUM ANALYZE parent, child;

Both queries return the same results - I have taken a md5 of both queries
without the LIMIT clause to confirm.
Tables have been vacuumed and analyzed.
No other queries are being executed.
Reproducible with LIMIT 1 or LIMIT 100 or LIMIT 500.
Changing work_mem makes no difference.

-[ RECORD 1 ]--+-
relname| parent
relpages   | 935
reltuples  | 10
relallvisible  | 935
relkind| r
relnatts   | 2
relhassubclass | f
reloptions |
pg_table_size  | 7700480
-[ RECORD 2 ]--+-
relname| child
relpages   | 10310
reltuples  | 1e+06
relallvisible  | 10310
relkind| r
relnatts   | 3
relhassubclass | f
reloptions |
pg_table_size  | 84516864

PostgreSQL version number you are running:
PostgreSQL 13.4 on arm-apple-darwin20.5.0, compiled by Apple clang version
12.0.5 (clang-1205.0.22.9), 64-bit

How you installed PostgreSQL:
Using homebrew for mac.
brew install postgres

Changes made to the settings in the postgresql.conf file:  see Server
Configuration for a quick way to list them all.
checkpoint_completion_target | 0.9  | configuration file
checkpoint_timeout   | 30min| configuration file
client_encoding  | UTF8 | client
cpu_tuple_cost   | 0.03 | configuration file
effective_cache_size | 4GB  | configuration file
log_directory| log  | configuration file
log_min_duration_statement   | 25ms | configuration file
log_statement| none | configuration file
log_temp_files   | 0| configuration file
log_timezone | America/Anchorage| configuration file
maintenance_work_mem | 512MB| configuration file
max_parallel_maintenance_workers | 2| configuration file
max_parallel_workers | 4| configuration file
max_parallel_workers_per_gather  | 4| configuration file
max_stack_depth  | 2MB  | environment
variable
max_wal_size | 10GB | configuration file
max_worker_processes | 4| configuration file
min_wal_size | 80MB | configuration file
random_page_cost | 1.1  | configuration file
shared_buffers   | 512MB| configuration file
shared_preload_libraries | auto_explain | configuration file
track_io_timing  | on   | configuration file
vacuum_cost_limit| 1000 | configuration file
wal_buffers  | 64MB | configuration file
wal_compression  | on   | configuration file
work_mem | 128MB| configuration file

Operating system and version:
macOS Big Sur 11.2.3
I have confirmed this to happen on ubuntu linux however.

What program you're using to connect to PostgreSQL:
psql

Is there anything relevant or unusual in the PostgreSQL server logs?:
no

Hardware specs:
MacBook Air10,1 M1
8GB RAM
APPLE SSD AP0512Q 500.28GB


setup.sql
Description: Binary data


Re: EXISTS by itself vs SELECT EXISTS much slower in query.

2021-11-08 Thread Vasya Boytsov
postgresql 14, linux
with:
CREATE TABLE child(child_id bigint generated always as identity
PRIMARY KEY, parent_id bigint references parent(parent_id), name
text);
CREATE TABLE child(child_id bigint generated always as identity
PRIMARY KEY, parent_id bigint references parent(parent_id), name
text);
-
INSERT INTO parent(name)
SELECT
md5(random()::text)
FROM generate_series(1, 10);
-
INSERT INTO child(parent_id, name)
SELECT
   ((generate_series/2*2) % 10)::bigint + 1,
   md5(random()::text)
FROM generate_series(1, 100);
-
 CREATE INDEX ON child(parent_id);
VACUUM ANALYZE parent, child;

slow:
explain (analyze, buffers)
SELECT
parent.*,
EXISTS (SELECT * FROM child WHERE
child.parent_id=parent.parent_id) AS child_exists
FROM parent
ORDER BY parent_id LIMIT 10;
https://explain.depesz.com/s/Sx9t
fast:
explain (analyze, buffers)
SELECT
parent.*,
(SELECT EXISTS (SELECT * FROM child WHERE
child.parent_id=parent.parent_id)) AS child_exists
FROM parent
ORDER BY parent_id LIMIT 10;

https://explain.depesz.com/s/mIXR

---

so, this looks strange.

On 11/8/21, Jimmy A  wrote:
> A description of what you are trying to achieve and what results you
> expect.:
> I have two equivalent queries, one with an EXISTS clause by itself and one
> wrapped in a (SELECT EXISTS) and the "naked" exists is much slower.
> I would expect both to be the same speed / have same execution plan.
>
> -- slow
> explain (analyze, buffers)
> SELECT
> parent.*,
> EXISTS (SELECT * FROM child WHERE child.parent_id=parent.parent_id) AS
> child_exists
> FROM parent
> ORDER BY parent_id LIMIT 10;
>
> -- fast
> explain (analyze, buffers)
> SELECT
> parent.*,
> (SELECT EXISTS (SELECT * FROM child WHERE
> child.parent_id=parent.parent_id)) AS child_exists
> FROM parent
> ORDER BY parent_id LIMIT 10;
>
> -- slow
> https://explain.depesz.com/s/DzcK
>
> -- fast
> https://explain.depesz.com/s/EftS
>
> Setup:
> CREATE TABLE parent(parent_id BIGSERIAL PRIMARY KEY, name text);
> CREATE TABLE child(child_id BIGSERIAL PRIMARY KEY, parent_id bigint
> references parent(parent_id), name text);
>
> -- random name and sequential primary key for 100 thousand parents.
> INSERT INTO parent
> SELECT
> nextval('parent_parent_id_seq'),
> md5(random()::text)
> FROM generate_series(1, 10);
>
> -- 1 million children.
> -- set every odd id parent to have children. even id parent gets none.
> INSERT INTO child
> SELECT
>nextval('child_child_id_seq'),
>((generate_series/2*2) % 10)::bigint + 1,
>md5(random()::text)
> FROM generate_series(1, 100);
>
> CREATE INDEX ON child(parent_id);
> VACUUM ANALYZE parent, child;
>
> Both queries return the same results - I have taken a md5 of both queries
> without the LIMIT clause to confirm.
> Tables have been vacuumed and analyzed.
> No other queries are being executed.
> Reproducible with LIMIT 1 or LIMIT 100 or LIMIT 500.
> Changing work_mem makes no difference.
>
> -[ RECORD 1 ]--+-
> relname| parent
> relpages   | 935
> reltuples  | 10
> relallvisible  | 935
> relkind| r
> relnatts   | 2
> relhassubclass | f
> reloptions |
> pg_table_size  | 7700480
> -[ RECORD 2 ]--+-
> relname| child
> relpages   | 10310
> reltuples  | 1e+06
> relallvisible  | 10310
> relkind| r
> relnatts   | 3
> relhassubclass | f
> reloptions |
> pg_table_size  | 84516864
>
> PostgreSQL version number you are running:
> PostgreSQL 13.4 on arm-apple-darwin20.5.0, compiled by Apple clang version
> 12.0.5 (clang-1205.0.22.9), 64-bit
>
> How you installed PostgreSQL:
> Using homebrew for mac.
> brew install postgres
>
> Changes made to the settings in the postgresql.conf file:  see Server
> Configuration for a quick way to list them all.
> checkpoint_completion_target | 0.9  | configuration
> file
> checkpoint_timeout   | 30min| configuration
> file
> client_encoding  | UTF8 | client
> cpu_tuple_cost   | 0.03 | configuration
> file
> effective_cache_size | 4GB  | configuration
> file
> log_directory| log  | configuration
> file
> log_min_duration_statement   | 25ms | configuration
> file
> log_statement| none | configuration
> file
> log_temp_files   | 0| configuration
> file
> log_timezone | America/Anchorage| configuration
> file
> maintenance_work_mem | 512MB| configuration
> file
> max_parallel_maintenance_workers | 2| configuration
> file
> max_parallel_workers | 4| configuration
> file
> max_parallel_workers_per_gather  | 4 

Re: EXISTS by itself vs SELECT EXISTS much slower in query.

2021-11-08 Thread Tom Lane
Jimmy A  writes:
> I have two equivalent queries, one with an EXISTS clause by itself and one
> wrapped in a (SELECT EXISTS) and the "naked" exists is much slower.
> I would expect both to be the same speed / have same execution plan.

That is a dangerous assumption.  In general, wrapping (SELECT ...) around
something has a significant performance impact, because it pushes Postgres
to try to decouple the sub-select's execution from the outer query.
As an example,

postgres=# select x, random() from generate_series(1,3) x;
 x |   random
---+-
 1 | 0.08595356832524814
 2 |  0.6444265043474005
 3 |  0.6878852071694332
(3 rows)

postgres=# select x, (select random()) from generate_series(1,3) x;
 x |   random   
---+
 1 | 0.7028987801136708
 2 | 0.7028987801136708
 3 | 0.7028987801136708
(3 rows)

That's not a bug: it's expected that the second query will evaluate
random() only once.

In the case at hand, I suspect you're getting a "hashed subplan"
in one query and not the other.  The depesz.com display doesn't
really show that, but EXPLAIN VERBOSE would.

regards, tom lane