Re: new stored procedure with OUT parameters
Hi

út 11. 12. 2018 v 7:20 odesílatel Anton Shen <4175geo...@gmail.com> napsal:

> Hi all,
>
> I was playing around with the stored procedure support in v11 and found
> that pure OUT parameters are not supported. Is there any reason we only
> support INOUT but not OUT parameters?

The procedure implementation in v11 is at an initial stage - only functionality with a simple implementation and no design issues was included. If I remember correctly, it was not clear what the correct and expected behavior of OUT variables should be when a procedure is called from the SQL environment versus from plpgsql. On Oracle, OUT variables are part of the procedure signature - you can write procedures P1(OUT a int) and P1(OUT a text). Currently we do not have variables in the SQL environment. So if Peter had implemented OUT variables now, then either:

a) only IN parameters would be part of the signature - like functions - but that is different from Oracle, and we would lose the possibility of an interesting feature,

b) procedures with OUT variables would not be callable from the SQL environment - which would be messy for users, or

c) disallow it.

I hope PostgreSQL 12 will have schema variables, and then we can implement OUT variables. For now, it is not possible to do it most correctly due to that missing feature. INOUT parameters are good enough, and we have left the door open for a future correct design.

Regards

Pavel

> psql (11.0 (Homebrew petere/postgresql))
> dev=# CREATE PROCEDURE test_sp(a OUT int) LANGUAGE plpgsql AS $$
> dev$# BEGIN
> dev$# a = 5;
> dev$# END; $$;
> ERROR: procedures cannot have OUT arguments
> HINT: INOUT arguments are permitted.
>
> Thanks,
> Anton
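As a sketch of the INOUT workaround the HINT points at (this example is not from the original thread; it assumes a v11 psql session):

```sql
-- Hypothetical rewrite of Anton's procedure using INOUT instead of OUT.
CREATE PROCEDURE test_sp(INOUT a int) LANGUAGE plpgsql AS $$
BEGIN
    a := 5;
END; $$;

-- The caller supplies a placeholder argument; CALL returns the final
-- values of all INOUT parameters as a one-row result set.
CALL test_sp(NULL);
```

In v11 this returns a single row with column a set to 5, which is the closest available substitute for a pure OUT parameter.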
simple query on why a merge join plan got selected
Hey Guys,

I was just playing with exploring joins and plans and I came across this:

create table t1(a int);
create table t2(a int);
insert into t1 select (x % 10) from generate_series(1, 100000) x;
insert into t2 select (x % 100) from generate_series(1, 100000) x;

pgtesting=> analyze t1;
ANALYZE
pgtesting=> analyze t2;
ANALYZE

This is reproducible. The query below by default makes use of a merge join, which takes way longer to return rows; when I explicitly disable merge join it returns in half the time taken by the merge join. I am not able to figure out why, although I have run analyze on the tables.

pgtesting=> explain (analyze, buffers) select * from t1 join t2 using (a);
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=19495.64..1039705.09 rows=97241600 width=4) (actual time=124.153..22243.262 rows=100000000 loops=1)
   Merge Cond: (t1.a = t2.a)
   Buffers: shared hit=886, temp read=320384 written=616
   ->  Sort  (cost=9747.82..9997.82 rows=100000 width=4) (actual time=56.442..81.611 rows=100000 loops=1)
         Sort Key: t1.a
         Sort Method: external merge  Disk: 1376kB
         Buffers: shared hit=443, temp read=172 written=173
         ->  Seq Scan on t1  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.030..10.003 rows=100000 loops=1)
               Buffers: shared hit=443
   ->  Sort  (cost=9747.82..9997.82 rows=100000 width=4) (actual time=67.702..9469.366 rows=10001 loops=1)
         Sort Key: t2.a
         Sort Method: external sort  Disk: 1768kB
         Buffers: shared hit=443, temp read=220222 written=443
         ->  Seq Scan on t2  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.013..8.186 rows=100000 loops=1)
               Buffers: shared hit=443
 Planning time: 0.402 ms
 Execution time: 26093.192 ms
(17 rows)

pgtesting=> set enable_mergejoin TO FALSE;
SET
pgtesting=> explain (analyze, buffers) select * from t1 join t2 using (a);
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3084.00..1117491.00 rows=97241600 width=4) (actual time=26.893..10229.924 rows=100000000 loops=1)
   Hash Cond: (t1.a = t2.a)
   Buffers: shared hit=889, temp read=273 written=271
   ->  Seq Scan on t1  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.028..18.123 rows=100000 loops=1)
         Buffers: shared hit=443
   ->  Hash  (cost=1443.00..1443.00 rows=100000 width=4) (actual time=26.255..26.255 rows=100000 loops=1)
         Buckets: 131072  Batches: 2  Memory Usage: 2713kB
         Buffers: shared hit=443, temp written=152
         ->  Seq Scan on t2  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.017..9.163 rows=100000 loops=1)
               Buffers: shared hit=443
 Planning time: 0.099 ms
 Execution time: 14095.975 ms
(12 rows)

pgtesting=> show work_mem;
 work_mem
----------
 4MB
(1 row)

pgtesting=> show shared_buffers;
 shared_buffers
----------------
 1GB
(1 row)

pgtesting=> select version();
                                                                   version
---------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.5 (Ubuntu 10.5-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit
(1 row)

Maybe I am missing something way obvious :) but my only concern is that high cardinality joins may not be using the statistics correctly?

Regards,
Vijay
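As a quick sanity check of the sizes involved (plain arithmetic, not from the thread, assuming the generate_series length is 100000, which is consistent with the seq-scan costs in the plans):

```python
# t1: 100000 rows of x % 10  -> keys 0..9,  10000 copies of each key
# t2: 100000 rows of x % 100 -> keys 0..99,  1000 copies of each key
t1_copies_per_key = 100_000 // 10
t2_copies_per_key = 100_000 // 100
common_keys = 10  # only keys 0..9 appear in both tables

# Every (t1 row, t2 row) pair with equal keys appears in the join output:
join_rows = common_keys * t1_copies_per_key * t2_copies_per_key
print(join_rows)  # 100000000 -- a 100M-row result from two 100k-row inputs
```

So either join method must emit on the order of 100 million rows, which is why both plans take tens of seconds; the interesting question is only why the merge join was costed cheaper.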
Re: simple query on why a merge join plan got selected
I'd run each query multiple times -- before and after disabling mergejoin -- to ensure that all the data is safely in RAM.

On 12/15/2018 02:13 PM, Vijaykumar Jain wrote:
> Hey Guys,
> I was just playing with exploring joins and plans i came across this
> [snip]
> May be i am missing something way obvious :) but my only concern being
> high cardinality joins may not use the statistics correctly?

--
Angular momentum makes the world go 'round.
date_trunc not immutable
Version: PG 10.6 on AWS Linux.

I am trying to create an index on the function date_trunc('month', timestamp).

PG is complaining that the function must be marked as IMMUTABLE. So I assume that date_trunc is not marked as immutable.

Definition of immutable from the PG documentation:

===
All functions and operators used in an index definition must be "immutable", that is, their results must depend only on their arguments and never on any outside influence (such as the contents of another table or the current time). This restriction ensures that the behavior of the index is well-defined. To use a user-defined function in an index expression or WHERE clause, remember to mark the function immutable when you create it.
===

What am I missing? date_trunc will always return the same value for a given value. Not sure how I can mark a PG function as immutable.
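A minimal reproduction sketch of the error Ravi describes (the table and column names here are made up for illustration):

```sql
CREATE TABLE events (ts timestamptz);

-- date_trunc(text, timestamptz) is only STABLE, not IMMUTABLE, so this fails:
CREATE INDEX ON events (date_trunc('month', ts));
-- ERROR:  functions in index expression must be marked IMMUTABLE
```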
Re: date_trunc not immutable
On 12/15/18 3:26 PM, Ravi Krishna wrote:
> Version: PG 10.6 on AWS Linux.
>
> I am trying to create an index on function date_trunc('month',timestamp)
>
> PG is complaining that the function must be marked as IMMUTABLE. So I
> assume that date_trunc is not marked as immutable.
> [snip]
> What am I missing? date_trunc will always return the same value for a
> given value. Not sure how I can mark a PG function as immutable.

No it won't:

show timezone;
  TimeZone
------------
 US/Pacific

select date_trunc('hour', now());
       date_trunc
------------------------
 2018-12-15 15:00:00-08

set timezone='UTC';

select date_trunc('hour', now());
       date_trunc
------------------------
 2018-12-15 23:00:00+00

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: simple query on why a merge join plan got selected
Vijaykumar Jain writes:
> I was just playing with exploring joins and plans i came across this
> create table t1(a int);
> create table t2(a int);
> insert into t1 select (x % 10) from generate_series(1, 100000) x;
> insert into t2 select (x % 100) from generate_series(1, 100000) x;
> ...
> select * from t1 join t2 using (a);

Hm.  This is a fairly extreme case for mergejoining.

In the first place, because of the disparity in the key ranges (t1.a goes from 0..9, t2.a from 0..99) the planner can figure out that a merge join can stop after scanning only 10% of t2.  That doesn't help much here, since we still have to sort all of t2, but nonetheless the planner is going to take that into account.

In the second place, because you have so many duplicate values, most rows in t1 will require "rescanning" 1000 rows that were already read and joined to the previous row of t1 (assuming t1 is on the left of the join; it's worse if t2 is on the left).

The planner estimates each of those situations properly, but it looks to me like it is not handling the combination of both effects correctly.  In costsize.c we've got

    /*
     * The number of tuple comparisons needed is approximately number of outer
     * rows plus number of inner rows plus number of rescanned tuples (can we
     * refine this?).  At each one, we need to evaluate the mergejoin quals.
     */
    startup_cost += merge_qual_cost.startup;
    startup_cost += merge_qual_cost.per_tuple *
        (outer_skip_rows + inner_skip_rows * rescanratio);
    run_cost += merge_qual_cost.per_tuple *
        ((outer_rows - outer_skip_rows) +
         (inner_rows - inner_skip_rows) * rescanratio);

where outer_rows and inner_rows are the numbers of rows we're predicting to actually read from each input, the xxx_skip_rows values are zero for this example, and rescanratio was previously computed as

    /* We'll inflate various costs this much to account for rescanning */
    rescanratio = 1.0 + (rescannedtuples / inner_path_rows);

where inner_path_rows is the *total* size of the inner relation, including rows that we're predicting won't get read because of the stop-short effect.

As far as I can tell, that comment's claim about the number of tuple comparisons needed is on-target ... but the code is computing a number of tuple comparisons 10x less than that.  The reason is that rescanratio is wrong: it should be

    rescanratio = 1.0 + (rescannedtuples / inner_rows);

instead, so that it's something that makes sense to multiply inner_rows by.  In the existing uses of rescanratio, one multiplies it by inner_path_rows and needs to be changed to inner_rows to agree with this definition, but the other uses are already consistent with this.

This doesn't make a significant difference if either rescannedtuples is small, or inner_rows isn't much less than inner_path_rows.  But when neither is true, we can greatly underestimate the number of tuple comparisons we'll have to do, as well as the number of re-fetches from the inner plan node.  I think in practice it doesn't matter that often, because in such situations we'd usually not have picked a mergejoin anyway.  But in your example the buggy mergejoin cost estimate is about 10% less than the hashjoin cost estimate, so we go with mergejoin.

The attached proposed patch fixes this, raising the mergejoin cost estimate to about 35% more than the hashjoin estimate, which seems a lot closer to reality.  It doesn't seem to change any results in the regression tests, which I find unsurprising: there are cases like this in the tests, but as I just said, they pick hashjoins already.

Also interesting is that after this fix, the estimated costs of a mergejoin for this example are about the same whether t1 or t2 is on the left.  I think that's right: t2-on-the-left has 10x more rescanning to do per outer tuple, but it stops after scanning only 10% of the outer relation, canceling that out.

I'm not sure whether to back-patch this.  It's a pretty clear thinko, but there's the question of whether we'd risk destabilizing plan choices that are working OK in the real world.

			regards, tom lane

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7bf67a0..480fd25 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*** final_cost_mergejoin(PlannerInfo *root,
*** 2941,2948 ****
          if (rescannedtuples < 0)
              rescannedtuples = 0;
      }

!     /* We'll inflate various costs this much to account for rescanning */
!     rescanratio = 1.0 + (rescannedtuples / inner_path_rows);

      /*
       * Decide whether we want to materialize the inner input to shield it from
--- 2941,2953 ----
          if (rescannedtuples < 0)
              rescannedtuples = 0;
      }

!     /*
!      * We'll inflate various costs this much to account for rescanning.  Note
!      * that this is to be multiplied by something involving inner_rows, or
!      * another number related to
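The arithmetic behind the ~10x underestimate can be sketched in a few lines. The figures below are illustrative round numbers shaped like this thread's example, not actual planner output:

```python
# Back-of-envelope sketch of the rescanratio thinko described above.
inner_path_rows = 100_000      # total size of the inner relation (t2)
inner_rows = 10_000            # rows actually read, after the 10% stop-short
rescannedtuples = 97_000_000   # assumed rescans due to duplicate join keys

# Buggy ratio: divides by the *total* inner relation size ...
buggy_ratio = 1.0 + rescannedtuples / inner_path_rows
# ... yet it gets multiplied by inner_rows, so comparisons are undercounted:
buggy_comparisons = inner_rows * buggy_ratio

# Fixed ratio: divide by the same quantity the ratio will later multiply.
fixed_ratio = 1.0 + rescannedtuples / inner_rows
fixed_comparisons = inner_rows * fixed_ratio

# The gap is inner_path_rows / inner_rows, i.e. roughly 10x in this example.
print(fixed_comparisons / buggy_comparisons)
```

Since the merge join stops short after 10% of the inner relation here, the buggy formula charges for roughly ten times too few tuple comparisons, which is enough to tip the choice away from the hash join.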
Re: date_trunc not immutable
Ravi Krishna writes:
> I am trying to create an index on function date_trunc('month',timestamp)
> PG is complaining that the function must be marked as IMMUTABLE.

The timestamptz version of it is not immutable, because its effects depend on the timezone setting:

regression=# set timezone = 'America/New_York';
SET
regression=# select date_trunc('month', now());
       date_trunc
------------------------
 2018-12-01 00:00:00-05
(1 row)

regression=# set timezone = 'Europe/Paris';
SET
regression=# select date_trunc('month', now());
       date_trunc
------------------------
 2018-12-01 00:00:00+01
(1 row)

If you want immutability, you need to be working with timestamp-without-tz or date input, so that timezone isn't a factor.

			regards, tom lane
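A sketch of what this advice looks like in practice (the table here is made up; AT TIME ZONE pins a fixed zone, converting timestamptz to timestamp-without-tz so the whole expression is immutable):

```sql
CREATE TABLE events (ts timestamptz);

-- date_trunc over timestamp WITHOUT time zone is immutable, so this works:
CREATE INDEX events_month_idx
    ON events (date_trunc('month', ts AT TIME ZONE 'UTC'));

-- Queries must use the same expression for the planner to use the index:
SELECT count(*) FROM events
WHERE date_trunc('month', ts AT TIME ZONE 'UTC') = '2018-12-01';
```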
Re: date_trunc not immutable
On 2018-12-15, Adrian Klaver wrote:
> On 12/15/18 3:26 PM, Ravi Krishna wrote:
>> Version: PG 10.6 on AWS Linux.
>>
>> I am trying to create an index on function date_trunc('month',timestamp)
>>
>> PG is complaining that the function must be marked as IMMUTABLE. So I
>> assume that date_trunc is not marked as immutable.
>> [snip]
>> What am I missing? date_trunc will always return the same value for a
>> given value. Not sure how I can mark a PG function as immutable.
>
> No it won't:
>
> show timezone;
>   TimeZone
> ------------
>  US/Pacific
>
> select date_trunc('hour', now());
>        date_trunc
> ------------------------
>  2018-12-15 15:00:00-08
>
> set timezone='UTC';
>
> select date_trunc('hour', now());
>        date_trunc
> ------------------------
>  2018-12-15 23:00:00+00

Ravi, the date_trunc('month',timestamp) version is already immutable (at least in PG11):

postgres=# \df+ date_trunc
   Schema   |    Name    |      Result data type       |        Argument data types        | Volatility | ...
------------+------------+-----------------------------+-----------------------------------+------------+-...
 pg_catalog | date_trunc | timestamp without time zone | text, timestamp without time zone | immutable  | ...

For the date_trunc(text, timestampTZ) version, see Adrian's response for why it does not always return the same value for the same input.

--
Best regards,
Vitaly Burovoy
Re: date_trunc not immutable
Thanks all. I forgot the TZ part.