Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)
*A description of what you are trying to achieve and what results you
expect.:*
My end goal was to test the execution time difference between using an
IF(SELECT COUNT(*)...) and an IF EXISTS() when no indexes were used and
when a string match was not found. My expectation was that my 2 functions
would behave fairly similarly, but they most certainly did not. Here are
the table, functions, test queries, and test query results I received, as
well as comments as I present the pieces and talk about the results from my
perspective.
This is the table and data that I used for my tests. A table with 1
million sequenced records. No indexing on any columns. I ran ANALYZE on
this table and a VACUUM on the entire database, just to be sure.
CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 99) AS
int_distinct, 'Test'::text || generate_series(0, 99)::text AS
text_distinct;
These are the 2 functions that I ran my final tests with. My goal was to
determine which function would perform the fastest and my expectation was
that they would still be somewhat close in execution time comparison.
--Test Function #1
CREATE OR REPLACE FUNCTION zz_spx_ifcount_noidx(p_findme text)
RETURNS text
LANGUAGE 'plpgsql'
STABLE
AS $$
BEGIN
IF (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) =
LOWER(p_findme)) > 0 THEN
RETURN 'Found';
ELSE
RETURN 'Not Found';
END IF;
END;
$$;
--Test Function #2
CREATE OR REPLACE FUNCTION zz_spx_ifexists_noidx(p_findme text)
RETURNS text
LANGUAGE 'plpgsql'
STABLE
AS $$
BEGIN
IF EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) =
LOWER(p_findme)) THEN
RETURN 'Found';
ELSE
RETURN 'Not Found';
END IF;
END;
$$;
The first thing I did was to run some baseline tests using the basic
queries inside of the IF() checks found in each of the functions to see how
the query planner handled them. I ran the following two queries.
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
LOWER(text_distinct) = LOWER('Test501');
EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
LOWER(text_distinct) = LOWER('Test501');
The execution time results and query plans for these two were very similar,
as expected. In the results I can see that 2 workers were employed for
each query plan.
--Results for the SELECT COUNT(*) query.
QUERY
PLAN
Finalize Aggregate (cost=12661.42..12661.43 rows=1 width=8) (actual
time=172.105..172.105 rows=1 loops=1)
Buffers: shared
read=1912
-> Gather (cost=12661.21..12661.42 rows=2 width=8) (actual
time=172.020..172.099 rows=3 loops=1)
Workers Planned:
2
Workers Launched:
2
Buffers: shared
read=1912
-> Partial Aggregate (cost=11661.21..11661.22 rows=1 width=8) (actual
time=155.123..155.123 rows=1 loops=3)
Buffers: shared
read=5406
-> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083
width=0) (actual time=155.103..155.103 rows=0 loops=3)
Filter: (lower(text_distinct) =
'test501'::text)
Rows Removed by Filter:
33
Buffers: shared
read=5406
Planning time: 0.718
ms
Execution time: 187.601 ms
--Results for the SELECT 1 query.
QUERY
PLAN
Gather (cost=1000.00..13156.00 rows=5000 width=4) (actual
time=175.682..175.682 rows=0 loops=1)
Workers Planned:
2
Workers Launched:
2
Buffers: shared
read=2021
-> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083
width=4) (actual time=159.769..159.769 rows=0 loops=3)
Filter: (lower(text_distinct) =
'test501'::text)
Rows Removed by Filter:
33
Buffers: shared
read=5406
Planning time: 0.874
ms
Execution time: 192.045 ms
After running these baseline tests and viewing the fairly similar results,
right or wrong, I expected my queries that tested the functions to behave
similarly. I started with the following query...
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
zz_spx_ifcount_noidx('Test501');
and I got the following "auto_explain" results...
2018-04-16 14:57:22.624 EDT [17812] LOG: duration: 155.239 ms plan:
Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE
LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
Partial Aggregate (cost=11661.21..11661.22 rows=1 width=8) (actual
time=155.230..155.230 rows=1 loops=1)
Buffers: shared read=1682
-> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083
width=0) (actual time=155.222..155.222 rows=0 loops=1)
Filter: (lower(text_distinct) = 'test501'::text)
Rows Removed by Filter: 311170
Buffers: shared read=1682
2018-04-16 14:57:22.624 EDT [9096] LOG: duration: 154.603 ms plan:
Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE
LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
Partial Aggregate (cost=11661.
Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)
On Tue, Apr 17, 2018 at 6:49 AM, Tomas Vondra
wrote:
>
>
> On 04/16/2018 10:42 PM, Hackety Man wrote:
>
>> ...
>> The first thing I did was to run some baseline tests using the basic
>> queries inside of the IF() checks found in each of the functions to
>> see how the query planner handled them. I ran the following two
>> queries.
>>
>> EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
>> LOWER(text_distinct) = LOWER('Test501');
>> EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(text_distinct) = LOWER('Test501');
>>
>>
> Those are not the interesting plans, though. The EXISTS only cares about
> the first row, so you should be looking at
>
> EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
> LOWER(text_distinct) = LOWER('Test501') LIMIT 1;
Okay. I tested this query and it did return an execution time on par with
my tests of the "zz_spx_ifexists_noidx" function.
>
>
> I moved on to test the other function with the following query...
>>
>> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
>> zz_spx_ifcount_noidx('Test501');
>>
>> and I got the following "auto_explain" results...
>>
>> 2018-04-16 14:58:34.134 EDT [12616] LOG: duration: 426.279 ms
>> plan:
>> Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>> Result (cost=4.08..4.09 rows=1 width=1) (actual
>> time=426.274..426.274 rows=1 loops=1)
>> Buffers: shared read=5406
>> InitPlan 1 (returns $0)
>>-> Seq Scan on zz_noidx1 (cost=0.00..20406.00 rows=5000
>> width=0) (actual time=426.273..426.273 rows=0 loops=1)
>> Filter: (lower(text_distinct) = 'test501'::text)
>> Rows Removed by Filter: 100
>> Buffers: shared read=5406
>> 2018-04-16 14:58:34.134 EDT [12616] CONTEXT: SQL statement
>> "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
>> PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
>> 2018-04-16 14:58:34.134 EDT [12616] LOG: duration: 428.077 ms
>> plan:
>> Query Text: explain (analyze, buffers) select * from
>> zz_spx_ifexists_noidx('Test501')
>> Function Scan on zz_spx_ifexists_noidx (cost=0.25..0.26
>> rows=1 width=32) (actual time=428.076..428.076 rows=1 loops=1)
>> Buffers: shared hit=30 read=5438
>>
>> Definitely not the execution time, or query plan, results I was
>> expecting. As we can see, no workers were employed here and my
>> guess was that this was the reason or the large execution time
>> difference between these 2 tests? 199 milliseconds versus 428
>> milliseconds, which is a big difference. Why are workers not being
>> employed here like they were when I tested the query found inside of
>> the IF() check in a standalone manner? But then I ran another test
>> and the results made even less sense to me.
>>
>>
> The plan difference is due to not realizing the EXISTS essentially implies
> LIMIT 1. Secondly, it expects about 5000 rows matching the condition,
> uniformly spread through the table. But it apparently takes much longer to
> find the first one, hence the increased duration.
>
Ah. I did not know that. So EXISTS inherently applies a LIMIT 1, even
though it doesn't show in the query plan, correct? Is it not possible for
parallel scans to be implemented while applying an implicit, or explicit,
LIMIT 1?
>
> How did you generate the data?
I used generate_series() to create 1 million records in sequence at the
same time that I created the table using the following script...
CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 99) AS
int_distinct, 'Test'::text || generate_series(0, 99)::text AS
text_distinct;
>
> When I ran the above query the first 5 times after starting my
>> Postgres service, I got the same results each time (around 428
>> milliseconds), but when running the query 6 or more times, the
>> execution time jumps up to almost double that. Here are the
>> "auto_explain" results running this query a 6th time...
>>
>>
> This is likely due to generating a generic plan after the fifth execution.
> There seems to be only sma
Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)
Hi Pavel,
Thanks for sharing that information. I was not aware that the parallel
query functionality was not yet fully implemented.
Thanks,
Ryan
On Tue, Apr 17, 2018 at 1:17 AM, Pavel Stehule
wrote:
> Hi
>
> 2018-04-16 22:42 GMT+02:00 Hackety Man :
>
>> *A description of what you are trying to achieve and what results you
>> expect.:*
>>
>> My end goal was to test the execution time difference between using an
>> IF(SELECT COUNT(*)...) and an IF EXISTS() when no indexes were used and
>> when a string match was not found. My expectation was that my 2 functions
>> would behave fairly similarly, but they most certainly did not. Here are
>> the table, functions, test queries, and test query results I received, as
>> well as comments as I present the pieces and talk about the results from my
>> perspective.
>>
>> This is the table and data that I used for my tests. A table with 1
>> million sequenced records. No indexing on any columns. I ran ANALYZE on
>> this table and a VACUUM on the entire database, just to be sure.
>>
>> CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 99) AS
>> int_distinct, 'Test'::text || generate_series(0, 99)::text AS
>> text_distinct;
>>
>> These are the 2 functions that I ran my final tests with. My goal was to
>> determine which function would perform the fastest and my expectation was
>> that they would still be somewhat close in execution time comparison.
>>
>> --Test Function #1
>> CREATE OR REPLACE FUNCTION zz_spx_ifcount_noidx(p_findme text)
>> RETURNS text
>> LANGUAGE 'plpgsql'
>> STABLE
>> AS $$
>>
>> BEGIN
>> IF (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct)
>> = LOWER(p_findme)) > 0 THEN
>> RETURN 'Found';
>> ELSE
>> RETURN 'Not Found';
>> END IF;
>> END;
>> $$;
>>
>> --Test Function #2
>> CREATE OR REPLACE FUNCTION zz_spx_ifexists_noidx(p_findme text)
>> RETURNS text
>> LANGUAGE 'plpgsql'
>> STABLE
>> AS $$
>>
>> BEGIN
>> IF EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct)
>> = LOWER(p_findme)) THEN
>> RETURN 'Found';
>> ELSE
>> RETURN 'Not Found';
>> END IF;
>> END;
>> $$;
>>
>> The first thing I did was to run some baseline tests using the basic
>> queries inside of the IF() checks found in each of the functions to see how
>> the query planner handled them. I ran the following two queries.
>>
>> EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
>> LOWER(text_distinct) = LOWER('Test501');
>> EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(text_distinct) = LOWER('Test501');
>>
>> The execution time results and query plans for these two were very
>> similar, as expected. In the results I can see that 2 workers were
>> employed for each query plan.
>>
>> --Results for the SELECT COUNT(*) query.
>> QUERY PLAN
>>
>>
>>
>>
>> Finalize Aggregate (cost=12661.42..12661.43 rows=1 width=8) (actual
>> time=172.105..172.105 rows=1 loops=1)
>> Buffers: shared read=1912
>>
>>
>> -> Gather (cost=12661.21..12661.42 rows=2 width=8) (actual
>> time=172.020..172.099 rows=3 loops=1)
>> Workers Planned: 2
>>
>>
>> Workers Launched: 2
>>
>>
>> Buffers: shared read=1912
>>
>>
>> -> Partial Aggregate (cost=11661.21..11661.22 rows=1 width=8) (actual
>> time=155.123..155.123 rows=1 loops=3)
>> Buffers: shared read=5406
>>
>>
>> -> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083
>> width=0) (actual time=155.103..155.103 rows=0 loops=3)
>> Filter: (lower(text_distinct) = 'test501'::text)
>>
>> Rows Removed by Filter: 33
>>
>> Buffers: shared read=5406
>>
>> Planning time: 0.718 ms
>>
>>
>> Execution time: 187.601 ms
>>
>> --Results for the SELECT 1 query.
>> QUERY PLAN
>>
>>
>>
>> Gather (cost=1000.00..13156.00 rows=5000 width=4) (actual
>> time=175.682..175.682 rows=0 loops=1)
>> Workers Planned: 2
>>
>>
>>
Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)
On Tue, Apr 17, 2018 at 10:23 AM, Tomas Vondra wrote: > > > On 04/17/2018 04:01 PM, Hackety Man wrote: > >> >> >> On Tue, Apr 17, 2018 at 6:49 AM, Tomas Vondra < >> [email protected] <mailto:[email protected]>> >> wrote: >> >> >> >> On 04/16/2018 10:42 PM, Hackety Man wrote: >> >> ... >> The first thing I did was to run some baseline tests using >> the basic >> queries inside of the IF() checks found in each of the >> functions to >> see how the query planner handled them. I ran the >> following two >> queries. >> >> EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM >> zz_noidx1 WHERE >> LOWER(text_distinct) = LOWER('Test501'); >> EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE >> LOWER(text_distinct) = LOWER('Test501'); >> >> >> Those are not the interesting plans, though. The EXISTS only cares >> about the first row, so you should be looking at >> >> EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE >> LOWER(text_distinct) = LOWER('Test501') LIMIT 1; >> >> >> >> Okay. I tested this query and it did return an execution time on par >> with my tests of the "zz_spx_ifexists_noidx" function. >> * >> * >> >> >> >> >> I moved on to test the other function with the following >> query... >> >> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM >> zz_spx_ifcount_noidx('Test501'); >> >> and I got the following "auto_explain" results... >> >> 2018-04-16 14:58:34.134 EDT [12616] LOG: duration: >> 426.279 ms plan: >>Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 >> WHERE >> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) >>Result (cost=4.08..4.09 rows=1 width=1) (actual >> time=426.274..426.274 rows=1 loops=1) >> Buffers: shared read=5406 >> InitPlan 1 (returns $0) >> -> Seq Scan on zz_noidx1 (cost=0.00..20406.00 >> rows=5000 >> width=0) (actual time=426.273..426.273 rows=0 loops=1) >>Filter: (lower(text_distinct) = >> 'test501'::text) >>Rows Removed by Filter: 100 >>Buffers: shared read=5406 >> 2018-04-16 14:58:34.134 EDT [12616] CONTEXT: SQL >> statement >> "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE >> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))" >>PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 >> at IF >> 2018-04-16 14:58:34.134 EDT [12616] LOG: duration: >> 428.077 ms plan: >>Query Text: explain (analyze, buffers) select * from >> zz_spx_ifexists_noidx('Test501') >>Function Scan on zz_spx_ifexists_noidx >> (cost=0.25..0.26 >> rows=1 width=32) (actual time=428.076..428.076 rows=1 >> loops=1) >> Buffers: shared hit=30 read=5438 >> >> Definitely not the execution time, or query plan, results I >> was >> expecting. As we can see, no workers were employed here and >> my >> guess was that this was the reason or the large execution >> time >> difference between these 2 tests? 199 milliseconds versus >> 428 >> milliseconds, which is a big difference. Why are workers >> not being >> employed here like they were when I tested the query found >> inside of >> the IF() check in a standalone manner? But then I ran >> another test >> and the results made even less sense to me. >> >> >> The plan difference is due to not realizing the EXISTS essentially >> implies LIMIT 1. Secondly, it expects about 5000 rows matching the >> condition, uniformly spread through the table. But it apparently >> takes much longer to find the first one, hence the increased d
