Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

2021-08-29 Thread Ranier Vilela
On Sat, Aug 28, 2021 at 22:55, [email protected] <
[email protected]> wrote:

>
>
>>  -Original Message-
>>  From: Tom Lane 
>>  Sent: Saturday, August 28, 2021 15:51
>>  To: [email protected]
>>  Cc: Andrew Dunstan ; Justin Pryzby
>>  ; Ranier Vilela ; pgsql-
>>  [email protected]
>>  Subject: Re: Big Performance drop of Exceptions in UDFs between V11.2
>>  and 13.4
>>
>>  "[email protected]"  writes:
>>  > SET lc_messages = 'C';
>>  > show lc_messages; --> OK 'C'
>>
>>  > explain (analyze,buffers,COSTS,TIMING) select MAX(toFloat(b,
> null)) as
>>  > "b" from sampletest ...
>>  > Execution Time: 175.600 ms
>>
>>  > explain (analyze,buffers,COSTS,TIMING) select MAX(toFloat(a,
> null)) as
>>  > "a" from sampletest ...
>>  > Execution Time: 88031.585 ms
>>
>>  > Doesn't seem to make a difference unless I misunderstood what you
>>  were asking for regarding the locale?
>>
>>  Hmm.  This suggests that whatever effect Andrew found with NLS is
>>  actually not the explanation for your problem.  So I'm even more
>>  confused than before.
>>
>>regards, tom lane
>
> I am so sorry to hear... So, curious on my end: is this something that you
> are not able to reproduce in your environments? On my end, I reproduced it
> on different VMs and my local laptop, across Windows Server 2012 and
> Windows 10, so I figured it would be pretty easy to reproduce?
>
What does reproduction have to do with solving the problem?
Can you tell how many commits there are between the affected versions?

I retested this case with HEAD, and it seems to me that NLS does affect it.

postgres=# create table sampletest (a varchar, b varchar);
CREATE TABLE
postgres=# insert into sampletest (a, b)
postgres-# select substr(md5(random()::text), 0, 15),
(1*random())::integer::varchar
postgres-#   from generate_series(1,10);
INSERT 0 10
postgres=#
postgres=# CREATE OR REPLACE FUNCTION toFloat(str varchar, val real)
postgres-# RETURNS real AS $$
postgres$# BEGIN
postgres$#   RETURN case when str is null then val else str::real end;
postgres$# EXCEPTION WHEN OTHERS THEN
postgres$#   RETURN val;
postgres$# END;
postgres$# $$ LANGUAGE plpgsql COST 1 IMMUTABLE;
CREATE FUNCTION
postgres=# explain (analyze,buffers,COSTS,TIMING)
postgres-# select MAX(toFloat(b, null)) as "b" from sampletest;
   QUERY PLAN

 Aggregate  (cost=1477.84..1477.85 rows=1 width=4) (actual
time=386.990..386.991 rows=1 loops=1)
   Buffers: shared hit=643 read=1
   ->  Seq Scan on sampletest  (cost=0.00..1197.56 rows=56056 width=32)
(actual time=0.032..17.325 rows=10 loops=1)
 Buffers: shared hit=637
 Planning:
   Buffers: shared hit=13 read=13
 Planning Time: 0.967 ms
 Execution Time: 387.989 ms
(8 rows)


postgres=# explain (analyze,buffers,COSTS,TIMING)
postgres-# select MAX(toFloat(a, null)) as "a" from sampletest;
   QUERY PLAN

 Aggregate  (cost=1477.84..1477.85 rows=1 width=4) (actual
time=1812.556..1812.557 rows=1 loops=1)
   Buffers: shared hit=639 read=1
   ->  Seq Scan on sampletest  (cost=0.00..1197.56 rows=56056 width=32)
(actual time=0.026..20.866 rows=10 loops=1)
 Buffers: shared hit=637
 Planning Time: 0.152 ms
 Execution Time: 1812.587 ms
(6 rows)


postgres=# SET lc_messages = 'C';
SET
postgres=# drop table sampletest;
DROP TABLE
postgres=# create table sampletest (a varchar, b varchar);
CREATE TABLE
postgres=# insert into sampletest (a, b)
postgres-# select substr(md5(random()::text), 0, 15),
(1*random())::integer::varchar
postgres-#   from generate_series(1,10);
INSERT 0 10
postgres=#
postgres=# CREATE OR REPLACE FUNCTION toFloat(str varchar, val real)
postgres-# RETURNS real AS $$
postgres$# BEGIN
postgres$#   RETURN case when str is null then val else str::real end;
postgres$# EXCEPTION WHEN OTHERS THEN
postgres$#   RETURN val;
postgres$# END;
postgres$# $$ LANGUAGE plpgsql COST 1 IMMUTABLE;
CREATE FUNCTION
postgres=# explain (analyze,buffers,COSTS,TIMING)
postgres-# select MAX(toFloat(b, null)) as "b" from sampletest;
   QUERY PLAN

 Aggregate  (cost=1477.84..1477.85 rows=1 width=4) (actual
time=278.993..278.994 rows=1 loops=1)
   Buffers: shared hit=637
   ->  Seq Scan on sampletest  (cost=0.00..1197.56 rows=56056 width=32)
(actual time=0.029..16.837 rows=10 loops=1)
 Buffers: shared hit=637
 Plann

Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

2021-08-29 Thread Tom Lane
Ranier Vilela  writes:
> I retested this case with HEAD, and it seems to me that NLS does affect it.

Sure, there's no question that message translation will have *some* cost.
But on my machine it is an incremental tens-of-percent kind of cost,
and that is the result you're getting as well.  So it's not very clear
where these factor-of-several-hundred differences are coming from.

regards, tom lane




Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

2021-08-29 Thread Ranier Vilela
On Sun, Aug 29, 2021 at 10:35, Tom Lane wrote:

> Ranier Vilela  writes:
> > I retested this case with HEAD, and it seems to me that NLS does affect
> it.
>
> Sure, there's no question that message translation will have *some* cost.
> But on my machine it is an incremental tens-of-percent kind of cost,
> and that is the result you're getting as well.  So it's not very clear
> where these factor-of-several-hundred differences are coming from.
>
A hypothesis that has not yet come up: there may be some defect in the code
generation by the previous msvc compiler used, because in all my tests I
always use the latest version,
which has several fixes in the code-generation area.

See this test, one of the attempts to reproduce the problem.
msvc: 19.29.30133 for x64
Windows 10, 64-bit
Postgres: 12.8

postgres=# select version();
  version

 PostgreSQL 12.8, compiled by Visual C++ build 1929, 64-bit
(1 row)


postgres=# drop table sampletest;
DROP TABLE
postgres=# create table sampletest (a varchar, b varchar);
CREATE TABLE
postgres=# insert into sampletest (a, b)
postgres-# select substr(md5(random()::text), 0, 15),
(1*random())::integer::varchar
postgres-#   from generate_series(1,10);
INSERT 0 10
postgres=#
postgres=# CREATE OR REPLACE FUNCTION toFloat(str varchar, val real)
postgres-# RETURNS real AS $$
postgres$# BEGIN
postgres$#   RETURN case when str is null then val else str::real end;
postgres$# EXCEPTION WHEN OTHERS THEN
postgres$#   RETURN val;
postgres$# END;
postgres$# $$ LANGUAGE plpgsql COST 1 IMMUTABLE;
CREATE FUNCTION
postgres=# explain (analyze,buffers,COSTS,TIMING)
postgres-# select MAX(toFloat(b, null)) as "b" from sampletest;
   QUERY PLAN

 Aggregate  (cost=1477.84..1477.85 rows=1 width=4) (actual
time=339.978..339.979 rows=1 loops=1)
   Buffers: shared hit=644
   ->  Seq Scan on sampletest  (cost=0.00..1197.56 rows=56056 width=32)
(actual time=0.032..18.132 rows=10 loops=1)
 Buffers: shared hit=637
 Planning Time: 3.631 ms
 Execution Time: 340.330 ms
(6 rows)


postgres=# explain (analyze,buffers,COSTS,TIMING)
postgres-# select MAX(toFloat(a, null)) as "a" from sampletest;
   QUERY PLAN

 Aggregate  (cost=1477.84..1477.85 rows=1 width=4) (actual
time=1724.902..1724.903 rows=1 loops=1)
   Buffers: shared hit=640
   ->  Seq Scan on sampletest  (cost=0.00..1197.56 rows=56056 width=32)
(actual time=0.021..23.489 rows=10 loops=1)
 Buffers: shared hit=637
 Planning Time: 0.150 ms
 Execution Time: 1724.930 ms
(6 rows)

regards,
Ranier Vilela


RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

2021-08-29 Thread [email protected]
>Sure, there's no question that message translation will have *some* cost.
>But on my machine it is an incremental tens-of-percent kind of cost,
>and that is the result you're getting as well.  So it's not very clear
>where these factor-of-several-hundred differences are coming from.
>A hypothesis that has not yet come up, may be some defect in the code 
>generation, 
>by the previous msvc compiler used, because in all my tests I always use the 
>latest version, 
>which has several corrections in the code generation part.



Hello all,

I don't think this reproduces the issue I experience. I saw a difference of 
around 500x! What you see is 5x, which according to Tom would be expected for 
an execution path involving exceptions. And NLS should have an impact as well 
since more work happens. From the numbers you published, I see 10-15% change 
which again would be expected?

I cannot think of anything that would be specific to me with regards to this 
scenario given that I have tried it in quite a few environments from plain 
stock installs. Until one of you is able to reproduce this, you may be chasing 
other issues. 

Is it possible that the client I am using or the way I am creating the test 
database might affect this scenario? I use DBeaver and use the default settings 
to create the database:
- default encoding: UTF8
- collate: English_United States.1252
- ctype: English_United States.1252
- default tablespace: pg_default

Settings:
Name    Value   Unit
allow_system_table_mods off [NULL]
application_name    DBeaver 21.1.3 - Main   [NULL]
archive_cleanup_command [NULL]
archive_command (disabled)  [NULL]
archive_modeoff [NULL]
archive_timeout 0   s
array_nulls on  [NULL]
authentication_timeout  60  s
autovacuum  on  [NULL]
autovacuum_analyze_scale_factor 0.1 [NULL]
autovacuum_analyze_threshold50  [NULL]
autovacuum_freeze_max_age   2   [NULL]
autovacuum_max_workers  3   [NULL]
autovacuum_multixact_freeze_max_age 4   [NULL]
autovacuum_naptime  60  s
autovacuum_vacuum_cost_delay2   ms
autovacuum_vacuum_cost_limit-1  [NULL]
autovacuum_vacuum_insert_scale_factor   0.2 [NULL]
autovacuum_vacuum_insert_threshold  1000[NULL]
autovacuum_vacuum_scale_factor  0.2 [NULL]
autovacuum_vacuum_threshold 50  [NULL]
autovacuum_work_mem -1  kB
backend_flush_after 0   8kB
backslash_quote safe_encoding   [NULL]
backtrace_functions [NULL]
bgwriter_delay  200 ms
bgwriter_flush_after0   8kB
bgwriter_lru_maxpages   100 [NULL]
bgwriter_lru_multiplier 2   [NULL]
block_size  8192[NULL]
bonjour off [NULL]
bonjour_name    [NULL]
bytea_outputhex [NULL]
check_function_bodies   on  [NULL]
checkpoint_completion_target0.5 [NULL]
checkpoint_flush_after  0   8kB
checkpoint_timeout  300 s
checkpoint_warning  30  s
client_encoding UTF8[NULL]
client_min_messages notice  [NULL]
cluster_name    [NULL]
commit_delay0   [NULL]
commit_siblings 5   [NULL]
config_file C:/Program Files/PostgreSQL/13/data/postgresql.conf [NULL]
constraint_exclusionpartition   [NULL]
cpu_index_tuple_cost0.005   [NULL]
cpu_operator_cost   0.0025  [NULL]
cpu_tuple_cost  0.01[NULL]
cursor_tuple_fraction   0.1 [NULL]
data_checksums  off [NULL]
data_directory  C:/Program Files/PostgreSQL/13/data [NULL]
data_directory_mode 700 [NULL]
data_sync_retry off [NULL]
DateStyle   ISO, YMD[NULL]
db_user_namespace   off [NULL]
deadlock_timeout1000ms
debug_assertionsoff [NULL]
debug_pretty_print  on  [NULL]
debug_print_parse   off [NULL]
debug_print_planoff [NULL]
debug_print_rewritten   off [NULL]
default_statistics_target   100 [NULL]
default_table_access_method heap[NULL]
default_tablespace  [NULL]
default_text_search_config  pg_catalog.english  [NULL]
default_transaction_deferrable  off [NULL]
default_transaction_isolation   read committed  [NULL]
default_transaction_read_only   off [NULL]
dynamic_library_path$libdir [NULL]
dynamic_shared_memory_type  windows [NULL]
effective_cache_size524288  8kB
effective_io_concurrency0   [NULL]
enable_bitmapscan   on  [NULL]
enable_gathermerge  on  [NULL]
enable_hashagg  on  [NULL]
enable_hashjoin on  [NULL]
enable_incremental_sort on  [NULL]
enable_indexonlyscanon  [NULL]
enable_indexscanon  [NULL]
enable_material on  [NULL]
enable_mergejoinon  [NULL]
enable_nestloop on  [NULL]
enable_parallel_append  on  [NULL]
enable_parallel_hashon  [NULL]
enable_partition_pruning  

Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

2021-08-29 Thread Tom Lane
"[email protected]"  writes:
> Is it possible that the client I am using or the way I am creating the test 
> database might affect this scenario? I use DBeaver and use the default 
> settings to create the database:
> - default encoding: UTF8
> - collate: English_United States.1252
> - ctype: English_United States.1252

Yeah, I was thinking of quizzing you about that.  I wonder whether
something is thinking it needs to transcode to WIN1252 encoding and then
back to UTF8, based on the .1252 property of the LC_XXX settings.  That
shouldn't account for any 500X factor either, but we're kind of grasping
at straws here.

Does Windows have any locale choices that imply UTF8 encoding exactly,
and if so, do your results change when using that?  Alternatively,
try creating a database with WIN1252 encoding and those locale settings.

regards, tom lane
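Tom's suggestion could be tried with something like the following (a sketch
only; the database name is illustrative, and the exact locale names available
depend on the Windows install):

```sql
-- Sketch: a database whose encoding matches the .1252 locale, so no
-- WIN1252 <-> UTF8 transcoding should be needed for messages.
-- "enctest" is an illustrative name.
CREATE DATABASE enctest
    ENCODING   'WIN1252'
    LC_COLLATE 'English_United States.1252'
    LC_CTYPE   'English_United States.1252'
    TEMPLATE   template0;  -- template0 avoids encoding-compatibility errors
```

Cloning from template0 rather than template1 is needed because a new encoding
must not conflict with the template database's encoding.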




Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

2021-08-29 Thread Ranier Vilela
On Sun, Aug 29, 2021 at 13:03, [email protected] <
[email protected]> wrote:

> >Sure, there's no question that message translation will have *some* cost.
> >But on my machine it is an incremental tens-of-percent kind of cost,
> >and that is the result you're getting as well.  So it's not very clear
> >where these factor-of-several-hundred differences are coming from.
> >A hypothesis that has not yet come up, may be some defect in the code
> generation,
> >by the previous msvc compiler used, because in all my tests I always use
> the latest version,
> >which has several corrections in the code generation part.
>
>
> 
>
> Hello all,
>
> I don't think this reproduces the issue I experience. I saw a difference
> of around 500x! What you see is 5x, which according to Tom would be
> expected for an execution path involving exceptions. And NLS should have an
> impact as well since more work happens. From the numbers you published, I
> see 10-15% change which again would be expected?
>
Yes, it seems to me that this is expected for NLS usage.


>
> I cannot think of anything that would be specific to me with regards to
> this scenario given that I have tried it in quite a few environments from
> plain stock installs. Until one of you is able to reproduce this, you may
> be chasing other issues.
>
I think I'm unable to reproduce the issue because I didn't use any plain
stock install.
The Postgres environment tested here is a fresh build with the latest msvc.
I have no intention of reproducing the issue with something exactly the same
as your environment, but rather with a very different one.

Can you show the version of Postgres, in your Windows 10 env, that got this
result?
Planning Time: 0.171 ms
Execution Time: 88031.585 ms

regards,
Ranier Vilela


RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

2021-08-29 Thread [email protected]


From: Ranier Vilela  
Sent: Sunday, August 29, 2021 14:20
To: [email protected]
Cc: Tom Lane ; Andrew Dunstan ; Justin 
Pryzby ; [email protected]
Subject: Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

On Sun, Aug 29, 2021 at 13:03, [email protected]
<[email protected]> wrote:
>Sure, there's no question that message translation will have *some* cost.
>But on my machine it is an incremental tens-of-percent kind of cost,
>and that is the result you're getting as well.  So it's not very clear
>where these factor-of-several-hundred differences are coming from.
>A hypothesis that has not yet come up, may be some defect in the code 
>generation, 
>by the previous msvc compiler used, because in all my tests I always use the 
>latest version, 
>which has several corrections in the code generation part.



Hello all,

I don't think this reproduces the issue I experience. I saw a difference of 
around 500x! What you see is 5x, which according to Tom would be expected for 
an execution path involving exceptions. And NLS should have an impact as well 
since more work happens. From the numbers you published, I see 10-15% change 
which again would be expected?
Yes, It seems to me that is expected for NLS usage.
 

I cannot think of anything that would be specific to me with regards to this 
scenario given that I have tried it in quite a few environments from plain 
stock installs. Until one of you is able to reproduce this, you may be chasing 
other issues. 
I think I'm unable to reproduce the issue, because I didn't use any plain stock 
installs.
Postgres env tests here, is a fresh build with the latest msvc.
I have no intention of repeating the issue, with something exactly the same as 
your environment, 
but with a very different environment.

Can you show the version of Postgres, at your Windows 10 env, who got this 
result?
Planning Time: 0.171 ms
Execution Time: 88031.585 ms

regards,
Ranier Vilela



---
Hello Ranier,

All my tests were on the latest 13.4 install I downloaded from the main site.

SELECT version();
PostgreSQL 13.4, compiled by Visual C++ build 1914, 64-bit


As per the following:

> I think I'm unable to reproduce the issue, because I didn't use any plain 
> stock installs.
> Postgres env tests here, is a fresh build with the latest msvc.
> I have no intention of repeating the issue, with something exactly the same 
> as your environment, 
> but with a very different environment.

I am not sure I understand. Are you saying the standard installs may be faulty? 
A stock install from the stock installer on a Windows machine should take 10 
minutes tops. If it doesn't reproduce the issue out of the box, then at least I 
have confirmation that there may be something weird that I am somehow repeating 
across all the installs I have performed???

Thank you,
Laurent.




RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

2021-08-29 Thread [email protected]


   >  -Original Message-
   >  From: Tom Lane 
   >  Sent: Sunday, August 29, 2021 12:19
   >  To: [email protected]
   >  Cc: Ranier Vilela ; Andrew Dunstan
   >  ; Justin Pryzby ; pgsql-
   >  [email protected]
   >  Subject: Re: Big Performance drop of Exceptions in UDFs between V11.2
   >  and 13.4
   >  
   >  "[email protected]"  writes:
   >  > Is it possible that the client I am using or the way I am creating the 
test
   >  database might affect this scenario? I use DBeaver and use the default
   >  settings to create the database:
   >  > - default encoding: UTF8
   >  > - collate: English_United States.1252
   >  > - ctype: English_United States.1252
   >  
   >  Yeah, I was thinking of quizzing you about that.  I wonder whether
   >  something is thinking it needs to transcode to WIN1252 encoding and
   >  then back to UTF8, based on the .1252 property of the LC_XXX settings.
   >  That shouldn't account for any 500X factor either, but we're kind of
   >  grasping at straws here.
   >  
   >  Does Windows have any locale choices that imply UTF8 encoding
   >  exactly, and if so, do your results change when using that?  
Alternatively,
   >  try creating a database with WIN1252 encoding and those locale
   >  settings.
   >  
   >regards, tom lane

Yeah, grasping at straws... and no material changes 😊 This is mystifying.

show lc_messages;
-- English_United States.1252

create table sampletest (a varchar, b varchar);
insert into sampletest (a, b)
select substr(md5(random()::text), 0, 15), 
(1*random())::integer::varchar
  from generate_series(1,10);

CREATE OR REPLACE FUNCTION toFloat(str varchar, val real)
RETURNS real AS $$
BEGIN
  RETURN case when str is null then val else str::real end;
EXCEPTION WHEN OTHERS THEN
  RETURN val;
END;
$$ LANGUAGE plpgsql COST 1 IMMUTABLE;

explain (analyze,buffers,COSTS,TIMING) 
select MAX(toFloat(a, null)) as "a" from sampletest
--Aggregate  (cost=1477.84..1477.85 rows=1 width=4) (actual 
time=89527.032..89527.033 rows=1 loops=1)
--  Buffers: shared hit=647
--  ->  Seq Scan on sampletest  (cost=0.00..1197.56 rows=56056 width=32) 
(actual time=0.024..37.811 rows=10 loops=1)
--Buffers: shared hit=637
--Planning:
--  Buffers: shared hit=24
--Planning Time: 0.347 ms
--Execution Time: 89527.501 ms

explain (analyze,buffers,COSTS,TIMING) 
select MAX(toFloat(b, null)) as "b" from sampletest
--Aggregate  (cost=2137.00..2137.01 rows=1 width=4) (actual 
time=186.605..186.606 rows=1 loops=1)
--  Buffers: shared hit=637
--  ->  Seq Scan on sampletest  (cost=0.00..1637.00 rows=10 width=8) 
(actual time=0.008..9.679 rows=10 loops=1)
--Buffers: shared hit=637
--Planning:
--  Buffers: shared hit=4
--Planning Time: 0.339 ms
--Execution Time: 186.641 ms


At this point, I am not sure how to proceed except to rethink that toFloat() 
function and many other places where we use exceptions. We get such dirty data 
that I need a "safe" way to convert a string to float without throwing an 
exception. BTW, I tried other combinations in case there may have been some 
weird interactions with the ::REAL conversion operator, but nothing made any 
difference. Could you recommend another approach off the top of your head? I 
could use regexes for testing, etc. Or maybe there is another option like a 
no-throw conversion that's built in, or in some extension that you may know 
of? Like the "SAFE." prefix in BigQuery.

Thank you,
Laurent.





Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

2021-08-29 Thread Ranier Vilela
On Sun, Aug 29, 2021 at 21:29, [email protected] <
[email protected]> wrote:

>
>
> From: Ranier Vilela 
> Sent: Sunday, August 29, 2021 14:20
> To: [email protected]
> Cc: Tom Lane ; Andrew Dunstan ;
> Justin Pryzby ; [email protected]
> Subject: Re: Big Performance drop of Exceptions in UDFs between V11.2 and
> 13.4
>
> On Sun, Aug 29, 2021 at 13:03, [email protected]
> <[email protected]> wrote:
> >Sure, there's no question that message translation will have *some* cost.
> >But on my machine it is an incremental tens-of-percent kind of cost,
> >and that is the result you're getting as well.  So it's not very clear
> >where these factor-of-several-hundred differences are coming from.
> >A hypothesis that has not yet come up, may be some defect in the code
> generation,
> >by the previous msvc compiler used, because in all my tests I always use
> the latest version,
> >which has several corrections in the code generation part.
>
>
> 
>
> Hello all,
>
> I don't think this reproduces the issue I experience. I saw a difference
> of around 500x! What you see is 5x, which according to Tom would be
> expected for an execution path involving exceptions. And NLS should have an
> impact as well since more work happens. From the numbers you published, I
> see 10-15% change which again would be expected?
> Yes, It seems to me that is expected for NLS usage.
>
>
> I cannot think of anything that would be specific to me with regards to
> this scenario given that I have tried it in quite a few environments from
> plain stock installs. Until one of you is able to reproduce this, you may
> be chasing other issues.
> I think I'm unable to reproduce the issue, because I didn't use any plain
> stock installs.
> Postgres env tests here, is a fresh build with the latest msvc.
> I have no intention of repeating the issue, with something exactly the
> same as your environment,
> but with a very different environment.
>
> Can you show the version of Postgres, at your Windows 10 env, who got this
> result?
> Planning Time: 0.171 ms
> Execution Time: 88031.585 ms
>
> regards,
> Ranier Vilela
>
>
>
>
> ---
> Hello Ranier,
>
> All my tests were on latest 13.4 install I downloaded from the main site.
>
> SELECT version();
> PostgreSQL 13.4, compiled by Visual C++ build 1914, 64-bit
>
>
> As per the following:
>
> > I think I'm unable to reproduce the issue, because I didn't use any
> plain stock installs.
> > Postgres env tests here, is a fresh build with the latest msvc.
> > I have no intention of repeating the issue, with something exactly the
> same as your environment,
> > but with a very different environment.
>
> I am not sure I understand. Are you saying the standard installs may be
> faulty?

Not exactly.

A stock install from the stock installer on a windows machine should take
> 10mn top. If it doesn't reproduce the issue out of the box, then at least I
> have a confirmation that there may be something weird that I am somehow
> repeating across all the installs I have performed???
>
Most likely it's something in your environment, along with your client.

All I can say is that it is unreproducible with a build/test made with the
latest version of msvc:
Windows 10, 64-bit.
msvc 2019, 64-bit.

git clone --branch REL_13_4
https://github.com/postgres/postgres/ postgres_13_4
cd postgres_13_4
cd src
cd tools
cd msvc
build
install c:\postgres_bench
cd\postgres_bench\bin
initdb -D c:\postgres_bench\data -E UTF-8 -U postgres -W
pg_ctl -D c:\postgres_bench\data -l c:\postgres_bench\log\log1 start
psql -U postgres

postgres=# select version();
  version

 PostgreSQL 13.4, compiled by Visual C++ build 1929, 64-bit
(1 row)

postgres=# create table sampletest (a varchar, b varchar);
CREATE TABLE
postgres=# insert into sampletest (a, b)
postgres-# select substr(md5(random()::text), 0, 15),
(1*random())::integer::varchar
postgres-#   from generate_series(1,10);
INSERT 0 10
postgres=#
postgres=# CREATE OR REPLACE FUNCTION toFloat(str varchar, val real)
postgres-# RETURNS real AS $$
postgres$# BEGIN
postgres$#   RETURN case when str is null then val else str::real end;
postgres$# EXCEPTION WHEN OTHERS THEN
postgres$#   RETURN val;
postgres$# END;
postgres$# $$ LANGUAGE plpgsql COST 1 IMMUTABLE;
CREATE FUNCTION
postgres=# explain (analyze,buffers,COSTS,TIMING)
postgres-# select MAX(toFloat(a, null)) as "a" from sampletest;
   QUERY PLAN

Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

2021-08-29 Thread Julien Rouhaud
On Mon, Aug 30, 2021 at 8:44 AM [email protected]
 wrote:
>
> Yeah, grasping at straws... and no material changes 😊 This is mystifying.
>
> show lc_messages;
> -- English_United States.1252
>
> create table sampletest (a varchar, b varchar);
> insert into sampletest (a, b)
> select substr(md5(random()::text), 0, 15), 
> (1*random())::integer::varchar
>   from generate_series(1,10);
>
> CREATE OR REPLACE FUNCTION toFloat(str varchar, val real)
> RETURNS real AS $$
> BEGIN
>   RETURN case when str is null then val else str::real end;
> EXCEPTION WHEN OTHERS THEN
>   RETURN val;
> END;
> $$ LANGUAGE plpgsql COST 1 IMMUTABLE;
>
> explain (analyze,buffers,COSTS,TIMING)
> select MAX(toFloat(a, null)) as "a" from sampletest
> --Aggregate  (cost=1477.84..1477.85 rows=1 width=4) (actual 
> time=89527.032..89527.033 rows=1 loops=1)
> --  Buffers: shared hit=647
> --  ->  Seq Scan on sampletest  (cost=0.00..1197.56 rows=56056 width=32) 
> (actual time=0.024..37.811 rows=10 loops=1)
> --Buffers: shared hit=637
> --Planning:
> --  Buffers: shared hit=24
> --Planning Time: 0.347 ms
> --Execution Time: 89527.501 ms
>
> explain (analyze,buffers,COSTS,TIMING)
> select MAX(toFloat(b, null)) as "b" from sampletest
> --Aggregate  (cost=2137.00..2137.01 rows=1 width=4) (actual 
> time=186.605..186.606 rows=1 loops=1)
> --  Buffers: shared hit=637
> --  ->  Seq Scan on sampletest  (cost=0.00..1637.00 rows=10 width=8) 
> (actual time=0.008..9.679 rows=10 loops=1)
> --Buffers: shared hit=637
> --Planning:
> --  Buffers: shared hit=4
> --Planning Time: 0.339 ms
> --Execution Time: 186.641 ms
>
>
> At this point, I am not sure how to proceed except to rethink that toFloat() 
> function and many other places where we use exceptions. We get such dirty 
> data that I need a "safe" way to convert a string to float without throwing 
> an exception. BTW, I tried other combinations in case there may have been 
> some weird interactions with the ::REAL conversion operator, but nothing made 
> any change. Could you recommend another approach off the top of your head? I 
> could use regexes for testing etc... Or maybe there is another option like a 
> no-throw conversion that's built in or in some extension that you may know 
> of? Like the "SAFE." Prefix in BigQuery.

I tried this scenario using edb's 13.3 x64 install:

postgres=# select version();
  version

 PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit
(1 row)


postgres=# \l postgres
  List of databases
   Name   |  Owner   | Encoding | Collate | Ctype | Access privileges
----------+----------+----------+---------+-------+-------------------
 postgres | postgres | UTF8     | C       | C     |
(1 row)

postgres=# explain (analyze,buffers,COSTS,TIMING)
postgres-# select MAX(toFloat(a, null)) as "a" from sampletest;
   QUERY PLAN

Aggregate  (cost=2137.00..2137.01 rows=1 width=4) (actual
time=44962.279..44962.280 rows=1 loops=1)
   Buffers: shared hit=657
   ->  Seq Scan on sampletest  (cost=0.00..1637.00 rows=10
width=15) (actual time=0.009..8.900 rows=10 loops=1)
 Buffers: shared hit=637
 Planning:
   Buffers: shared hit=78
 Planning Time: 0.531 ms
 Execution Time: 44963.747 ms
(8 rows)

and with locally compiled REL_13_STABLE's head on the same machine:

rjuju=# select version();
  version

 PostgreSQL 13.4, compiled by Visual C++ build 1929, 64-bit
(1 row)

rjuju=# \l rjuju
   List of databases
 Name  | Owner | Encoding | Collate | Ctype | Access privileges
-------+-------+----------+---------+-------+-------------------
 rjuju | rjuju | UTF8     | C       | C     |
(1 row)

rjuju-# select MAX(toFloat(a, null)) as "a" from sampletest;
  QUERY PLAN
---
 Aggregate  (cost=1477.84..1477.85 rows=1 width=4) (actual
time=460.334..460.334 rows=1 loops=1)
   Buffers: shared hit=646 read=1
   ->  Seq Scan on sampletest  (cost=0.00..1197.56 rows=56056
width=32) (actual time=0.010..7.612 rows=10 loops=1)
 Buffers: shared hit=637
 Planning:
   Buffers: shared hit=20 read=1
 Planning Time: 0.125 ms
 Execution Time: 460.527 ms
(8 rows)

Note that I followed [1], so I simply used "build" and "install".  I
have no idea what is done by default and if NLS is included or not.

So if default build on windows has NLS included, it probably means
that either there's something specific on edb's build (I have no idea
how their build is produced) or their version of msvc is responsible

Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

2021-08-29 Thread Pavel Stehule
Hi

On Mon, Aug 30, 2021 at 2:44, [email protected] <
[email protected]> wrote:

>
>
>
> At this point, I am not sure how to proceed except to rethink that
> toFloat() function and many other places where we use exceptions. We get
> such dirty data that I need a "safe" way to convert a string to float
> without throwing an exception. BTW, I tried other combinations in case
> there may have been some weird interactions with the ::REAL conversion
> operator, but nothing made any change. Could you recommend another approach
> off the top of your head? I could use regexes for testing etc... Or maybe
> there is another option like a no-throw conversion that's built in or in
> some extension that you may know of? Like the "SAFE." Prefix in BigQuery.
>

CREATE OR REPLACE FUNCTION safe_to_double_precision(t text)
RETURNS double precision AS $$
BEGIN
  IF $1 SIMILAR TO '[+-]?([0-9]*[.])?[0-9]+' THEN
RETURN $1::double precision;
  ELSE
RETURN NULL;
  END IF;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
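For what it's worth, the SIMILAR TO pattern above admits plain decimal
notation only; inputs that a float cast would otherwise accept, such as
scientific notation, fall through to the ELSE branch and return NULL:

```sql
-- Illustrative probes of the pattern's behavior:
SELECT safe_to_double_precision('3.14');  -- 3.14
SELECT safe_to_double_precision('-.5');   -- -0.5
SELECT safe_to_double_precision('abc');   -- NULL
SELECT safe_to_double_precision('1e5');   -- NULL: '1e5' does not match the
                                          -- pattern, though '1e5'::float8
                                          -- would succeed
```

Whether that matters depends on whether the dirty data ever contains
scientific notation.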

Regards

Pavel


>
> Thank you,
> Laurent.
>
>
>
>


Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 (workarounds)

2021-08-29 Thread Justin Pryzby
On Mon, Aug 30, 2021 at 04:43:23AM +0200, Pavel Stehule wrote:
> On Mon, Aug 30, 2021 at 2:44, [email protected] wrote:
> > At this point, I am not sure how to proceed except to rethink that
> > toFloat() function and many other places where we use exceptions. We get
> > such dirty data that I need a "safe" way to convert a string to float
> > without throwing an exception. BTW, I tried other combinations in case
> > there may have been some weird interactions with the ::REAL conversion
> > operator, but nothing made any change. Could you recommend another approach
> > off the top of your head? I could use regexes for testing etc... Or maybe
> > there is another option like a no-throw conversion that's built in or in
> > some extension that you may know of? Like the "SAFE." Prefix in BigQuery.
> 
> CREATE OR REPLACE FUNCTION safe_to_double_precision(t text)
> RETURNS double precision AS $$
> BEGIN
>   IF $1 SIMILAR TO '[+-]?([0-9]*[.])?[0-9]+' THEN
> RETURN $1::double precision;
>   ELSE
> RETURN NULL;
>   END IF;
> END;
> $$ LANGUAGE plpgsql IMMUTABLE STRICT;

This tries to use a regex to determine whether something is a "Number" or not,
which has all the issues enumerated in painful detail by long answers on Stack
Overflow and other wikis/blogs/forums.

Rather than trying to define Numbers using regex, I'd try to avoid only the
most frequent exceptions and get 90% of the performance back.  I don't know
what your data looks like, but you might try things like this:

IF $1 IS NULL THEN
    RETURN $2;
ELSIF $1 ~ '^$' THEN
    RETURN $2;
ELSIF $1 ~ '[[:alpha:]]{2}' THEN
    RETURN $2;
ELSIF $1 !~ '[[:digit:]]' THEN
    RETURN $2;
END IF;

BEGIN
    RETURN $1::float;
EXCEPTION WHEN OTHERS THEN
    RETURN $2;
END;
You can check the stackoverflow page for ideas as to what kind of thing to
reject, but it may depend mostly on your data (what is the most common string?
The most common exceptional string?).

I think it's possible this could even be *faster* than the original, since it
avoids the exception block for values which are sure to cause an exception
anyway.  It might be that using alternation (|) is faster (if less readable)
than using a handful of IF branches.
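The pre-filter idea is easy to prototype outside the database. A minimal Python sketch under the same assumptions (the branches mirror the IF tests above; `to_float` and the default-argument shape are hypothetical names, not from the thread's tested function):

```python
import re

# Cheap pre-filters mirroring the IF branches above.  Python's re has no
# POSIX classes, so [[:alpha:]] / [[:digit:]] become [A-Za-z] / [0-9].
TWO_ALPHA = re.compile(r'[A-Za-z]{2}')
ANY_DIGIT = re.compile(r'[0-9]')

def to_float(s, default=None):
    """Screen out obvious garbage cheaply, then fall back to the
    (expensive) conversion attempt only for plausible inputs."""
    if s is None or s == '':
        return default
    if TWO_ALPHA.search(s) or not ANY_DIGIT.search(s):
        return default
    try:
        return float(s)
    except ValueError:
        return default
```

The point of the design is that the common garbage (empty strings, obviously alphabetic junk) never reaches the costly exception path; only near-misses like 'x1y' still pay for a failed conversion.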

-- 
Justin




RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 (workarounds)

2021-08-29 Thread [email protected]


   >  -Original Message-
   >  From: Justin Pryzby 
   >  Sent: Sunday, August 29, 2021 23:17
   >  To: Pavel Stehule 
   >  Cc: [email protected]; Tom Lane ; Ranier
   >  Vilela ; Andrew Dunstan
   >  ; [email protected]
   >  Subject: Re: Big Performance drop of Exceptions in UDFs between V11.2
   >  and 13.4 (workarounds)
   >  
   >  On Mon, Aug 30, 2021 at 04:43:23AM +0200, Pavel Stehule wrote:
   >  > On Mon, Aug 30, 2021 at 2:44 AM, [email protected] wrote:
   >  > > At this point, I am not sure how to proceed except to rethink that
   >  > > toFloat() function and many other places where we use exceptions.
   >  We
   >  > > get such dirty data that I need a "safe" way to convert a string to
   >  > > float without throwing an exception. BTW, I tried other
   >  combinations
   >  > > in case there may have been some weird interactions with the ::REAL
   >  > > conversion operator, but nothing made any change. Could you
   >  > > recommend another approach off the top of your head? I could use
   >  > > regexes for testing etc... Or maybe there is another option like a
   >  > > no-throw conversion that's built in or in some extension that you
   >  may know of? Like the "SAFE." Prefix in BigQuery.
   >  >
   >  > CREATE OR REPLACE FUNCTION safe_to_double_precision(t text)
   >  RETURNS
   >  > double precision AS $$ BEGIN
   >  >   IF $1 SIMILAR TO '[+-]?([0-9]*[.])?[0-9]+' THEN
   >  > RETURN $1::double precision;
   >  >   ELSE
   >  > RETURN NULL;
   >  >   END IF;
   >  > END;
   >  > $$ LANGUAGE plpgsql IMMUTABLE STRICT;
   >  
   >  This tries to use a regex to determine if something is a "Number" or not.
   >  Which has all the issues enumerated in painful detail by long answers on
   >  stack overflow, and other wiki/blog/forums.
   >  
   >  Rather than trying to define Numbers using regex, I'd try to avoid only
   >  the most frequent exceptions and get 90% of the performance back.  I
   >  don't know what your data looks like, but you might try things like this:
   >  
   >  IF $1 IS NULL THEN RETURN $2
   >  ELSE IF $1 ~ '^$' THEN RETURN $2
   >  ELSE IF $1 ~ '[[:alpha:]]{2}' THEN RETURN $2
   >  ELSE IF $1 !~ '[[:digit:]]' THEN RETURN $2
   >  BEGIN
   >     RETURN $1::float;
   >  EXCEPTION WHEN OTHERS THEN
   >     RETURN $2;
   >  END;
   >  
   >  You can check the stackoverflow page for ideas as to what kind of thing
   >  to reject, but it may depend mostly on your data (what is the most
   >  common string?
   >  The most common exceptional string?).
   >  
   >  I think it's possible that could even be *faster* than the original,
   >  since it avoids the exception block for values which are for sure going
   >  to cause an exception anyway.  It might be that using alternation (|) is
   >  faster (if less readable) than using a handful of IF branches.
   >  
   >  --
   >  Justin

That's exactly where my head was at. I have looked at different ways to test
for a floating point number and recognize the challenge 😊

The data is very messy, with people entering it by hand. We have seen alpha
and punctuation characters, and people copy/pasting from Excel, so large
numbers get the "e" notation. It's a total mess. The application that authors
that data is a piece of crap, and unfortunately we have no chance to change it.
Short of rolling out an ETL process, which is painful for the way our data
comes in, I need an in-db solution.
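One hedged sketch for the "e" notation case: extend the earlier validation pattern with an optional exponent clause so Excel-style scientific notation passes the pre-check (the exponent clause is an illustration of ours, not something tested in this thread):

```python
import re

# The earlier [+-]?([0-9]*[.])?[0-9]+ pattern, extended with an optional
# exponent so Excel-style scientific notation like '1.2e+07' is admitted.
NUMERIC_E = re.compile(r'[+-]?([0-9]*[.])?[0-9]+([eE][+-]?[0-9]+)?')

def looks_numeric(s: str) -> bool:
    return NUMERIC_E.fullmatch(s) is not None
```

The same extension would apply to the SIMILAR TO pattern in PL/pgSQL, since both use the same character-class syntax for this fragment.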

Thank you!
Laurent.