Re: Parallel Query - Can it be used within functions?

2018-02-28 Thread Michael Krüger
Ah, and for completeness, the simple Python function I use for the test:

create or replace function reports.generic_query_python(_sql text)
  RETURNS SETOF record
  LANGUAGE 'plpythonu'
  PARALLEL SAFE
  COST 100
  VOLATILE
  ROWS 5000
AS $BODY$
return plpy.execute( _sql )
$BODY$;
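As an aside, the difference discussed in this thread (rows streamed back through a cursor vs. a result materialized in memory before returning) can be pictured with a loose Python analogy. This is an illustration only, not PostgreSQL code:

```python
# Loose analogy, not PostgreSQL behavior: a set-returning plpgsql function
# streams rows back somewhat like a generator (the consumer pulls rows
# through a cursor), while plpy.execute() materializes the full result
# before anything is returned, like building a list.

def rows_streamed(n):
    """Yield rows one at a time; nothing happens until a row is pulled."""
    for i in range(n):
        yield i

def rows_materialized(n):
    """Assemble the complete result in memory, then hand it back at once."""
    return [i for i in range(n)]

gen = rows_streamed(3)
print(next(gen))             # rows arrive on demand: 0
print(rows_materialized(3))  # the whole result exists up front: [0, 1, 2]
```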


Michael Krüger  wrote on Wed, 28 Feb 2018 at 09:05:

> Ok, to close this thread: the problem is that plpgsql functions seem to
> return data using a cursor, and that fact disables parallel execution. If
> we instead hand the SQL over to a function with, e.g., a Python body, then
> parallel execution does happen, because the data is first assembled in
> memory and only then returned, without using a cursor:
>
> mkrueger=# explain analyze select * from reports.generic_query_python($$
> select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where
> mediatrunkid in
> (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
> group by 1 $$) as foo (mediatrunkid bigint, count numeric);
> LOG:  0: duration: 35.158 ms  plan:
> Query Text:  select mediatrunkid,count(*)::numeric from
> reports.mediatrunkkpi where mediatrunkid in
> (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
> group by 1
> Partial HashAggregate  (cost=56693.72..56696.72 rows=300 width=16) (actual
> time=35.144..35.149 rows=17 loops=1)
>   Group Key: mediatrunkid
>   Buffers: shared hit=1641
>   ->  Parallel Bitmap Heap Scan on mediatrunkkpi  (cost=4525.01..56279.28
> rows=82889 width=8) (actual time=2.350..24.584 rows=63794 loops=1)
> Recheck Cond: (mediatrunkid = ANY
> ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
> Heap Blocks: exact=1641
> Buffers: shared hit=1641
> ->  Bitmap Index Scan on idx_mediatrunkkpi_trunk
> (cost=0.00..4475.27 rows=198933 width=0) (never executed)
>   Index Cond: (mediatrunkid = ANY
> ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
> LOCATION:  explain_ExecutorEnd, auto_explain.c:359
> LOG:  0: duration: 35.165 ms  plan:
> Query Text:  select mediatrunkid,count(*)::numeric from
> reports.mediatrunkkpi where mediatrunkid in
> (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
> group by 1
> Partial HashAggregate  (cost=56693.72..56696.72 rows=300 width=16) (actual
> time=35.152..35.157 rows=17 loops=1)
>   Group Key: mediatrunkid
>   Buffers: shared hit=1630
>   ->  Parallel Bitmap Heap Scan on mediatrunkkpi  (cost=4525.01..56279.28
> rows=82889 width=8) (actual time=2.364..24.702 rows=63272 loops=1)
> Recheck Cond: (mediatrunkid = ANY
> ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
> Heap Blocks: exact=1630
> Buffers: shared hit=1630
> ->  Bitmap Index Scan on idx_mediatrunkkpi_trunk
> (cost=0.00..4475.27 rows=198933 width=0) (never executed)
>   Index Cond: (mediatrunkid = ANY
> ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
> LOCATION:  explain_ExecutorEnd, auto_explain.c:359
> LOG:  0: duration: 47.855 ms  plan:
> Query Text:  select mediatrunkid,count(*)::numeric from
> reports.mediatrunkkpi where mediatrunkid in
> (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
> group by 1
> Finalize GroupAggregate  (cost=57784.41..57792.66 rows=300 width=40)
> (actual time=45.331..45.344 rows=17 loops=1)
>   Group Key: mediatrunkid
>   Buffers: shared hit=2735
>   ->  Sort  (cost=57784.41..57785.91 rows=600 width=16) (actual
> time=45.322..45.325 rows=51 loops=1)
> Sort Key: mediatrunkid
> Sort Method: quicksort  Memory: 27kB
> Buffers: shared hit=2735
> ->  Gather  (cost=57693.72..57756.72 rows=600 width=16) (actual
> time=45.270..45.295 rows=51 loops=1)
>   Workers Planned: 2
>   Workers Launched: 2
>   Buffers: shared hit=2735
>   ->  Partial HashAggregate  (cost=56693.72..56696.72 rows=300
> width=16) (actual time=38.387..38.391 rows=17 loops=3)
> Group 

1241 loops=1)
Index Cond: (mediatrunkid = ANY
('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
Buffers: shared hit=823
LOCATION:  explain_ExecutorEnd, auto_explain.c:359
LOG:  0: duration: 49.924 ms  plan:
Query Text: explain analyze select * from reports.generic_query_python($$
select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where
mediatrunkid in
(2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
group by 1 $$) as foo (mediatrunkid bigint, count numeric);
Function Scan on generic_query_python foo  (cost=0.25..50.25 rows=5000
width=40) (actual time=49.920..49.922 rows=17 loops=1)
  Buffers: shared hit=6388
LOCATION:  explain_ExecutorEnd, auto_explain.c:359
  QUERY PLAN
---
 Function Scan on generic_query_python foo  (cost=0.25..50.25 rows=5000
width=40) (actual time=49.920..49.922 rows=17 loops=1)
 Planning time: 0.029 ms
 Execution time: 49.977 ms
(3 rows)





Michael Krüger  wrote on Fri, 16 Feb 2018 at 11:42:

> Dear all,
>
> Still the same behavior with Postgres 10.2 ...
>
> Just a reminder that the issue still exists.
>
> Regards,
> Michael
>
> Andreas Kretschmer  wrote on Tue, 6 Feb 2018 at 08:35:
>
>> Hi,
>>
>>
>> On 06.02.2018 at 08:24, Michael Krüger wrote:
>> > create or replace function reports.generic_query(_sql text)
>> >   RETURNS SETOF record
>> >   LANGUAGE 'plpgsql'
>> >   PARALLEL SAFE
>> >   COST 100
>>
>> there is another parameter, parallel_setup_cost, with default = 1000. I
>> think you should set this parameter too.
>>
>> Please keep me informed; this interests me.
>>
>>
>> Regards, Andreas
>>
>> --
>> 2ndQuadrant - The PostgreSQL Support Company.
>> www.2ndQuadrant.com
>>
>>
>>


Postgres C-API: How to get the Oid for a custom type defined in a schema outside of the current search path

2020-12-03 Thread Michael Krüger
Dear all,

I need to find out the Oid of a custom type from the C programming language.
The type is defined in a schema outside of the standard search path (i.e. outside public).
I know that I can call TypenameGetTypid to look up an Oid, but this does not
work for me when the type is not in the public namespace.
How can I look up such an Oid? Do I have to change the search path? If so,
how do I do that? As mentioned, I need to do this via the C interface.

Regards,
Michael

Email:  michael@kruegers.email


Re: Postgres C-API: How to get the Oid for a custom type defined in a schema outside of the current search path

2020-12-03 Thread Michael Krüger
Hi,

> On 03.12.2020 at 11:01, Pavel Stehule  wrote:
> 
> Hi
> 
> On Thu, 3 Dec 2020 at 10:32, Michael Krüger  wrote:
> Dear all,
> 
> I need to find out the Oid of a custom type from the C programming language.
> The type is defined in a schema outside of the standard search path (i.e. outside public).
> I know that I can call TypenameGetTypid to look up an Oid, but this does not
> work for me when the type is not in the public namespace.
> How can I look up such an Oid? Do I have to change the search path? If so,
> how do I do that? As mentioned, I need to do this via the C interface.
> 
> Looking at the source code:
> 
> Oid
> TypenameGetTypidExtended(const char *typname, bool temp_ok)
> {
>     Oid         typid;
>     ListCell   *l;
> 
>     recomputeNamespacePath();
> 
>     foreach(l, activeSearchPath)
>     {
>         Oid         namespaceId = lfirst_oid(l);
> 
>         if (!temp_ok && namespaceId == myTempNamespace)
>             continue;           /* do not look in temp namespace */
> 
>         typid = GetSysCacheOid2(TYPENAMENSP, Anum_pg_type_oid,
>                                 PointerGetDatum(typname),
>                                 ObjectIdGetDatum(namespaceId));
>         if (OidIsValid(typid))
>             return typid;
>     }
> 
>     /* Not found in path */
>     return InvalidOid;
> }
> 
> You can get the value from the syscache by calling GetSysCacheOid2. It can
> look like:
> 
>     char       *schemaname;
>     char       *typname;
> 
>     namespaceId = LookupExplicitNamespace(schemaname, missing_ok);
>     if (OidIsValid(namespaceId))
>         typoid = GetSysCacheOid2(TYPENAMENSP, Anum_pg_type_oid,
>                                  PointerGetDatum(typname),
>                                  ObjectIdGetDatum(namespaceId));
>     else
>         typoid = InvalidOid;
> 

Thank you so much. That worked.

Regards,
Michael

Slow alter sequence with PG10.1

2018-01-22 Thread Michael Krüger
Dear community,

I'm using PG10.1 on CentOS Linux release 7.4.1708 (Core) after upgrading it
from PG9.6.6. My application heavily uses sequences and requires different
increments of sequence numbers, e.g. a range of 100, 1000 or 5000 numbers,
so it is not possible to set a fixed increment on a sequence that can be
used by my application.

With PG10.1, performance has dropped so seriously that my application
becomes unusable. After investigating different aspects, I was able to
isolate the issue to the sequence handling in Postgres 10.1.

The simple test script below shows the problem:
-- 1) Create a sequence
CREATE SEQUENCE my_sequence_1 INCREMENT BY 1 MINVALUE 1 NO MAXVALUE START
WITH 1 CYCLE;

-- 2) Create a function that allows to request a number range
CREATE OR REPLACE FUNCTION multi_nextval(
use_seqname text,
use_increment integer)
RETURNS bigint
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $BODY$
DECLARE
reply int8;
lock_id int4;
BEGIN
SELECT oid::int4 INTO lock_id FROM pg_class WHERE relname =
split_part(use_seqname, '.', 2);
perform pg_advisory_lock(lock_id);
execute 'ALTER SEQUENCE ' || use_seqname || ' INCREMENT BY ' ||
use_increment::text;
reply := nextval(use_seqname);
execute 'ALTER SEQUENCE ' || use_seqname || ' INCREMENT BY 1';
perform pg_advisory_unlock(lock_id);
return reply - use_increment + 1;
END;
$BODY$;

-- 3) Loop 2 times and request 5000 values each time
DO $$
DECLARE
--
  i_index integer;
  i_value bigint;
BEGIN
  FOR i_index IN select * from generate_series(1,2,1)
  LOOP
SELECT multi_nextval('my_sequence_1',5000) INTO i_value ;
if (i_index % 250 = 0) THEN
  raise notice 'Loop: % - NextVal: %', i_index, i_value;
end if;
  END LOOP;
END$$;

On my computer I tried this code on PG9.6.6 and it executed in roughly 3
seconds.
When running it on PG10.1 it takes over 7 minutes.
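For reference, the block-allocation arithmetic of multi_nextval can be sketched in plain Python. FakeSeq is a hypothetical stand-in for a sequence that always advances by the requested increment; it deliberately ignores Postgres's is_called subtlety on the very first nextval() of a fresh sequence:

```python
# Sketch of the ALTER SEQUENCE batching arithmetic (illustration only).
# FakeSeq is a hypothetical stand-in, not a PostgreSQL API.

class FakeSeq:
    def __init__(self):
        self.last = 0

    def nextval(self, increment=1):
        # like nextval() after ALTER SEQUENCE ... INCREMENT BY increment
        self.last += increment
        return self.last

def multi_nextval(seq, use_increment):
    reply = seq.nextval(use_increment)
    # the caller owns the block [reply - use_increment + 1 .. reply]
    return reply - use_increment + 1

seq = FakeSeq()
print(multi_nextval(seq, 5000))  # 1     -> block [1 .. 5000]
print(multi_nextval(seq, 5000))  # 5001  -> block [5001 .. 10000]
```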

Further investigation showed that the problem is related to ALTER
SEQUENCE...

I can't believe that PG10.1 was changed that dramatically without providing
a workaround or a way to get back the old PG9.6 performance; at least I
can't find anything in the documentation.

Is this a bug?

Thanks in advance,
Michael






-- 
Email:   michael@kruegers.email
Mobile: 0152 5891 8787


Re: Slow alter sequence with PG10.1

2018-01-22 Thread Michael Krüger
Hi Adrian and all,

I do not fully understand the reasoning behind making sequences transactional
in the first place. As far as I know, it's not done that way in Oracle or SQL
Server either, though I may be wrong about that. What I question is a change
in the behavior of existing functionality with such a big impact, without a
config option to restore the old behavior, or perhaps another variant such as:

create sequence [non] [transactional]

to at least be able to decide which variant to use: keep the performance of
the old behavior, or, if transaction safety is needed, accept the new
behavior with its performance impact but more safety.

I will try whether Adrian's proposal does the trick for my application.
Sounds promising, thanks.

Regards,
Michael

Adrian Klaver  wrote on Mon, 22 Jan 2018 at 22:29:

> On 01/22/2018 07:24 AM, Michael Krüger wrote:
> > Dear community,
> >
> > I'm using PG10.1 on CentOS Linux release 7.4.1708 (Core) after upgrading
> > it from PG9.6.6. My application heavily uses sequences and requires
> > different increments of sequence numbers, e.g. a range of 100, 1000 or
> > 5000 numbers, so it is not possible to set a fixed increment on a
> > sequence that can be used by my application.
> >
> > With PG10.1 the performance has dropped seriously so that my application
> > becomes unusable. After investigating different aspects, I was able to
> > isolate the issue to be related to the sequences in Postgres 10.1.
> >
> > Below shows a simple test script showing the problem:
> > -- 1) Create a sequence
> > CREATE SEQUENCE my_sequence_1 INCREMENT BY 1 MINVALUE 1 NO MAXVALUE
> > START WITH 1 CYCLE;
> >
> > -- 2) Create a function that allows to request a number range
> > CREATE OR REPLACE FUNCTION multi_nextval(
> > use_seqname text,
> > use_increment integer)
> >  RETURNS bigint
> >  LANGUAGE 'plpgsql'
> >  COST 100
> >  VOLATILE
> > AS $BODY$
> > DECLARE
> >  reply int8;
> >  lock_id int4;
> > BEGIN
> >  SELECT oid::int4 INTO lock_id FROM pg_class WHERE relname =
> > split_part(use_seqname, '.', 2);
> >  perform pg_advisory_lock(lock_id);
> >  execute 'ALTER SEQUENCE ' || use_seqname || ' INCREMENT BY ' ||
> > use_increment::text;
> >  reply := nextval(use_seqname);
> >  execute 'ALTER SEQUENCE ' || use_seqname || ' INCREMENT BY 1';
> >  perform pg_advisory_unlock(lock_id);
> >  return reply - use_increment + 1;
> > END;
> > $BODY$;
>
> Not entirely sure I understand how the batching above works, still maybe
> something like this:
>
> CREATE OR REPLACE FUNCTION public.multi_nextval(use_seqname text,
> use_increment integer)
>   RETURNS bigint
>   LANGUAGE plpgsql
> AS $function$
> DECLARE
>  reply int8;
>  lock_id int4;
>  seq_idx int8 :=nextval(use_seqname);
> BEGIN
>  SELECT oid::int4 INTO lock_id FROM pg_class WHERE relname =
> split_part(use_seqname, '.', 2);
>  perform pg_advisory_lock(lock_id);
>  perform setval(use_seqname, seq_idx + use_increment, 't');
>  reply := nextval(use_seqname);
>  perform pg_advisory_unlock(lock_id);
>  return reply;
> END;
> $function$
>
> On an older laptop this does the 2 loops in about 1.6 secs.
>
> >
> > -- 3) Loop 2 times and request 5000 values each time
> > DO $$
> > DECLARE
> > --
> >i_index integer;
> >i_value bigint;
> > BEGIN
> >FOR i_index IN select * from generate_series(1,2,1)
> >LOOP
> >  SELECT multi_nextval('my_sequence_1',5000) INTO i_value ;
> >  if (i_index % 250 = 0) THEN
> >raise notice 'Loop: % - NextVal: %', i_index, i_value;
> >  end if;
> >END LOOP;
> > END$$;
> >
> > On my computer I tried this code on PG9.6.6 and it executed in roughly 3
> > seconds.
> > When running it on PG10.1 it takes over 7 minutes.
> >
> > Further investigation showed that the problem is related to ALTER
> > SEQUENCE...
> >
> > I can't believe that PG10.1 was changed that dramatically without
> > providing a workaround or a way to switch to the old PG9.6 performance,
> > at least I can't find anything in the documentation.
> >
> > Is this a bug?
> >
> > Thanks in advance,
> > Michael
> >
> >
> >
> >
> >
> >
> > --
> > Email:   michael@kruegers.email
> > Mobile: 0152 5891 8787
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: Slow alter sequence with PG10.1

2018-01-22 Thread Michael Krüger
Hello all,

after changing the function to this:

CREATE OR REPLACE FUNCTION multi_nextval(
use_seqname text,
use_increment integer)
RETURNS bigint
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $BODY$
DECLARE
reply int8;
lock_id int4;
seq_idx int8;
BEGIN
SELECT oid::int4 INTO lock_id FROM pg_class WHERE relname =
split_part(use_seqname, '.', 2);
perform pg_advisory_lock(lock_id);

seq_idx :=nextval(use_seqname);
perform setval(use_seqname, seq_idx + use_increment - 1, 't');

perform pg_advisory_unlock(lock_id);
return seq_idx;
END;
$BODY$;
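The setval()-based variant hands out the same kind of disjoint blocks. A minimal Python sketch of its arithmetic (SimSeq is a hypothetical stand-in that mimics setval(..., 't'), i.e. the next nextval() returns value + 1):

```python
# Sketch of the nextval()/setval() block allocation (illustration only).
# SimSeq is a hypothetical stand-in, not a PostgreSQL API.

class SimSeq:
    def __init__(self):
        self.last = 0

    def nextval(self):
        self.last += 1
        return self.last

    def setval(self, value):
        # like setval(seq, value, 't'): next nextval() yields value + 1
        self.last = value

def multi_nextval_setval(seq, use_increment):
    seq_idx = seq.nextval()                  # first value of the new block
    seq.setval(seq_idx + use_increment - 1)  # reserve the rest of the block
    return seq_idx

seq = SimSeq()
print(multi_nextval_setval(seq, 5000))  # 1     -> block [1 .. 5000]
print(multi_nextval_setval(seq, 5000))  # 5001  -> block [5001 .. 10000]
```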

I now get a total execution time of 5922,428 ms (00:05,922), much better
than before.

Is there any drawback to using setval() compared to the ALTER SEQUENCE I
used before? If not, this could be the workaround to go with, as it has
performance similar to the original function.
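Why the pg_advisory_lock() around the nextval()/setval() pair matters can also be sketched: without mutual exclusion, two sessions could interleave between the two calls and hand out overlapping ranges. A hypothetical threaded simulation (not PostgreSQL code; the Lock stands in for the advisory lock):

```python
import threading

# Hypothetical simulation: a lock serializes the two-step allocation so
# concurrent callers always receive disjoint ranges.

class LockedSeq:
    def __init__(self):
        self.last = 0

    def nextval(self):
        self.last += 1
        return self.last

    def setval(self, value):
        self.last = value

seq = LockedSeq()
alloc_lock = threading.Lock()  # stands in for pg_advisory_lock(lock_id)
blocks = []

def allocate(increment):
    with alloc_lock:  # serialize nextval() + setval(), like the advisory lock
        first = seq.nextval()
        seq.setval(first + increment - 1)
        blocks.append((first, first + increment - 1))

threads = [threading.Thread(target=allocate, args=(100,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

blocks.sort()
# every allocated block ends before the next one starts
assert all(blocks[i][1] < blocks[i + 1][0] for i in range(len(blocks) - 1))
print(blocks[0], blocks[-1])  # (1, 100) (701, 800)
```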

I guess the reason I'm a bit disappointed by the new behavior is that we
have used PostgreSQL for more than 10 years and it has never let us down. We
have been able to improve our product with every new release of Postgres.
This is the first time that a new release of Postgres has caused severe
headaches among our customers.
If you all agree that this changed function is equivalent to the original
one, then it's at least an easy fix.

Thank you all for your fast responses.

Regards,
Michael

Michael Krüger  wrote on Mon, 22 Jan 2018 at 23:11:

> Hi Adrian and all,
>
> I do not fully understand the reasoning of making sequences transactional
> in the first place. As far as I know its also not done on Oracle or SQL
> Server, but maybe I'm even wrong on that. What I question is a change in
> behavior of existing functionality with such big impact, without config
> option to restore old behavior, or maybe to have another variant in place
> like:
>
> create sequence [non] [transactional] to at least be able to decide which
> variant to use. Maintain the performance of the old behavior, or if
> transactions safety is needed, the new behavior with the performance impact
> but more safety if needed.
>
> I will try if Adrians proposal does the trick for my application. Sounds
> promising, thanks.
>
> Regards,
> Michael
>
> Adrian Klaver  wrote on Mon, 22 Jan 2018 at 22:29:
>
>> On 01/22/2018 07:24 AM, Michael Krüger wrote:
>> > Dear community,
>> >
>> > I'm using PG10.1 on CentOS Linux release 7.4.1708 (Core) after upgrading
>> > it from PG9.6.6. My application heavily uses sequences and requires
>> > different increments of sequence numbers, e.g. a range of 100, 1000 or
>> > 5000 numbers, so it is not possible to set a fixed increment on a
>> > sequence that can be used by my application.
>> >
>> > With PG10.1 the performance has dropped seriously so that my application
>> > becomes unusable. After investigating different aspects, I was able to
>> > isolate the issue to be related to the sequences in Postgres 10.1.
>> >
>> > Below shows a simple test script showing the problem:
>> > -- 1) Create a sequence
>> > CREATE SEQUENCE my_sequence_1 INCREMENT BY 1 MINVALUE 1 NO MAXVALUE
>> > START WITH 1 CYCLE;
>> >
>> > -- 2) Create a function that allows to request a number range
>> > CREATE OR REPLACE FUNCTION multi_nextval(
>> > use_seqname text,
>> > use_increment integer)
>> >  RETURNS bigint
>> >  LANGUAGE 'plpgsql'
>> >  COST 100
>> >  VOLATILE
>> > AS $BODY$
>> > DECLARE
>> >  reply int8;
>> >  lock_id int4;
>> > BEGIN
>> >  SELECT oid::int4 INTO lock_id FROM pg_class WHERE relname =
>> > split_part(use_seqname, '.', 2);
>> >  perform pg_advisory_lock(lock_id);
>> >  execute 'ALTER SEQUENCE ' || use_seqname || ' INCREMENT BY ' ||
>> > use_increment::text;
>> >  reply := nextval(use_seqname);
>> >  execute 'ALTER SEQUENCE ' || use_seqname || ' INCREMENT BY 1';
>> >  perform pg_advisory_unlock(lock_id);
>> >  return reply - use_increment + 1;
>> > END;
>> > $BODY$;
>>
>> Not entirely sure I understand how the batching above works, still maybe
>> something like this:
>>
>> CREATE OR REPLACE FUNCTION public.multi_nextval(use_seqname text,
>> use_increment integer)
>>   RETURNS bigint
>>   LANGUAGE plpgsql
>> AS $function$
>> DECLARE
>>  reply int8;
>>  lock_id int4;
>>  seq_idx int8 :=nextval(use_seqname);
>> BEGIN
>

Re: Slow alter sequence with PG10.1

2018-01-22 Thread Michael Krüger
Hello all,

I think a good alternative has been found and it seems to be working fine. I
really appreciate all the help and feedback.
Many thanks.

Regards,
Michael

Adrian Klaver  wrote on Tue, 23 Jan 2018 at 02:12:

> On 01/22/2018 02:47 PM, Michael Krüger wrote:
> > Hello all,
> >
> > after changing the function to this:
> >
> > CREATE OR REPLACE FUNCTION multi_nextval(
> > use_seqname text,
> > use_increment integer)
> >  RETURNS bigint
> >  LANGUAGE 'plpgsql'
> >  COST 100
> >  VOLATILE
> > AS $BODY$
> > DECLARE
> >  reply int8;
> >  lock_id int4;
> >  seq_idx int8;
> > BEGIN
> >  SELECT oid::int4 INTO lock_id FROM pg_class WHERE relname =
> > split_part(use_seqname, '.', 2);
> >  perform pg_advisory_lock(lock_id);
> >
> >  seq_idx :=nextval(use_seqname);
> >  perform setval(use_seqname, seq_idx + use_increment - 1, 't');
> >
> >  perform pg_advisory_unlock(lock_id);
> >  return seq_idx;
> > END;
> > $BODY$;
> >
> > I do get a total execution time of Time: 5922,428 ms (00:05,922) - much
> > better than before.
> >
> > Is there any drawback to use setval compared to the ALTER SEQUENCE which
> > I have used before? If not, then this could be the work around to go
> > with as it has a similar performance to the original function.
> >
> > I guess - the reason I'm a bit disappointed from the new behavior is
> > that we have used Postgresql for more than 10 years and it has never let
> > us down. We have been able to improve our product with every new release
>
> Well, the nature of major version releases is that they can break
> backwards compatibility. This is one of the reasons there is five years of
> community support for each version: time to develop a migration plan. I
> have been caught by changes before, e.g. the 8.3 change in casting rules,
> and a later change that made plpythonu use Python rules for truthiness
> instead of SQL's, etc. You seem to have had a run of good luck. Going
> forward, I would assume a major release will contain breaking changes and
> test thoroughly.
>
> > of Postgres. This is the first time for me that a new release of
> > Postgres caused some severe headaches among our customers.
> > If you all agree that this changed function should be equivalent to the
> > original one, then its at least an easy fix.
> >
> > Thank you all for your fast responses.
> >
> > Regards,
> > Michael
> >
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Parallel Query - Can it be used within functions?

2018-02-05 Thread Michael Krüger
Dear community,

I need a bit of advice on parallel query execution.
I have a table with roughly 2 million rows. The rows are distributed over a
set of IDs, so I can group them by ID.

If I query the table directly like this:

select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where
mediatrunkid in
(2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
group by 1;
 mediatrunkid | count
--+---
   409612 | 11862
   409613 | 11862
  2101260 | 11860
  2101261 | 11882
  2101263 | 11809
  2109452 | 11751
  2117644 | 11833
  2125836 | 11832
  2125839 | 11852
  2125841 | 11882
  2129932 | 11882
  2129933 | 11854
  2134028 | 11718
  2138124 | 11850
  2142220 | 11861
  2154508 | 11825
  2187276 | 11826
(17 rows)

Time: 44,056 ms

I see from the query plan that it executed nicely in parallel using two
workers:
2018-02-06 08:18:47.381 CET [18898] LOG:  duration: 43.072 ms  plan:
Query Text: select mediatrunkid,count(*)::numeric from
reports.mediatrunkkpi where mediatrunkid in
(2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
group by 1;
Finalize GroupAggregate  (cost=57784.41..57792.66 rows=300 width=40)
(actual time=40.583..40.596 rows=17 loops=1)
  Group Key: mediatrunkid
  Buffers: shared hit=3076
  ->  Sort  (cost=57784.41..57785.91 rows=600 width=16) (actual
time=40.578..40.582 rows=51 loops=1)
Sort Key: mediatrunkid
Sort Method: quicksort  Memory: 27kB
Buffers: shared hit=3076
->  Gather  (cost=57693.72..57756.72 rows=600 width=16) (actual
time=40.534..40.561 rows=51 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=3076
  ->  Partial HashAggregate  (cost=56693.72..56696.72 rows=300
width=16) (actual time=33.901..33.909 rows=17 loops=3)
Group Key: mediatrunkid
Buffers: shared hit=6006
->  Parallel Bitmap Heap Scan on mediatrunkkpi
(cost=4525.01..56279.28 rows=82889 width=8) (actual time=4.575..23.145
rows=67080 loops=3)
  Recheck Cond: (mediatrunkid = ANY
('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
  Heap Blocks: exact=2253
  Buffers: shared hit=6006
  ->  Bitmap Index Scan on idx_mediatrunkkpi_trunk
(cost=0.00..4475.27 rows=198933 width=0) (actual time=10.057..10.057
rows=201241 loops=1)
Index Cond: (mediatrunkid = ANY
('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
Buffers: shared hit=823

But when I do the very same within a function:

create or replace function reports.generic_query(_sql text)
  RETURNS SETOF record
  LANGUAGE 'plpgsql'
  PARALLEL SAFE
  COST 100
  SET "TimeZone"='utc'
  STABLE
  ROWS 1
AS $BODY$
DECLARE
  --
BEGIN
  RETURN QUERY EXECUTE _sql;
END
$BODY$;

select * from reports.generic_query($$ select
mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where
mediatrunkid in
(2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
group by 1 $$) as foo (trunkid bigint, count numeric);
 trunkid | count
-+---
 2109452 | 11751
  409613 | 11862
 2125839 | 11852
 2125841 | 11882
 2142220 | 11861
 2117644 | 11833
 2154508 | 11825
 2134028 | 11718
 2101263 | 11809
 2101261 | 11882
 2129933 | 11854
 2129932 | 11882
 2125836 | 11832
 2138124 | 11850
  409612 | 11862
 2187276 | 11826
 2101260 | 11860
(17 rows)

Time: 86,275 ms

the execution time almost doubles, and the query is no longer considered
for parallel execution:

2018-02-06 08:20:30.553 CET [18898] LOG:  duration: 85.358 ms  plan:
Query Text:  select mediatrunkid,count(*)::numeric from
reports.mediatrunkkpi where mediatrunkid in
(2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
group by 1
HashAggregate  (cost=60900.33..60904.08 rows=300 width=40) (actual
time=85.341..85.348 rows=17 loops=1)
  Group Key: mediatrunkid
  Buffers: shared hit=6006
  ->  Bitmap Heap Scan on mediatrunkkpi  (cost=4525.01..59905.66
rows=198933 width=8) (actual time=9.084..39.655 rows=201241 loops=1)
Recheck Cond: (mediatrunkid = ANY
('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
Heap Blocks: exact=5183
Buffers: shared hit=6006
->  Bitm

Re: Parallel Query - Can it be used within functions?

2018-02-16 Thread Michael Krüger
Dear all,

Still the same behavior with Postgres 10.2 ...

Just a reminder that the issue still exists.

Regards,
Michael

Andreas Kretschmer  wrote on Tue, 6 Feb 2018 at 08:35:

> Hi,
>
>
> On 06.02.2018 at 08:24, Michael Krüger wrote:
> > create or replace function reports.generic_query(_sql text)
> >   RETURNS SETOF record
> >   LANGUAGE 'plpgsql'
> >   PARALLEL SAFE
> >   COST 100
>
> there is another parameter, parallel_setup_cost, with default = 1000. I
> think you should set this parameter too.
>
> Please keep me informed; this interests me.
>
>
> Regards, Andreas
>
> --
> 2ndQuadrant - The PostgreSQL Support Company.
> www.2ndQuadrant.com
>
>
>