Re: Need optimization in query
Hi Team,

Has anybody had a chance to look at this? Any suggested optimisations would
be welcome. Thanks.

On Sat, Jun 11, 2022 at 12:48 AM Shubham Mittal wrote:
> Hi Team,
>
> I have a use case to get the result as follows:
>
> 1. asin_ymm is never null.
> 2. If there is more than one entry for an asin_ymm with both null and
>    non-null SubmodelID, return only the rows with non-null SubmodelID;
>    otherwise, if no SubmodelID is present for an asin_ymm, return that row
>    with null SubmodelID.
> 3. If SubmodelID is null, fitment_key is assumed to always be null in the
>    table.
> 4. Using that result set, if there is more than one entry for an
>    (asin_ymm, SubmodelID) pair with both null and non-null fitment_key,
>    return only the rows with non-null fitment_key; otherwise, if no
>    fitment_key is present for an (asin_ymm, SubmodelID) pair, return that
>    row with null fitment_key.
> 5. Using that result set, return the rows having the maximum number of
>    values in fitment_key (e.g. out of ('A', 'D', '1--3-4') and
>    ('A', 'D', '1-2-3-4-7'), return the row ('A', 'D', '1-2-3-4-7')).
>
> create table fitment_records(asin_ymm varchar(50), SubmodelID varchar(50),
>                              fitment_key varchar(50));
>
> insert into fitment_records(asin_ymm, SubmodelID, fitment_key) values('A', null, null);
> insert into fitment_records(asin_ymm, SubmodelID, fitment_key) values('B', null, null);
> insert into fitment_records(asin_ymm, SubmodelID, fitment_key) values('C', null, null);
> insert into fitment_records(asin_ymm, SubmodelID, fitment_key) values('A', 'D', '1--3-4');
> insert into fitment_records(asin_ymm, SubmodelID, fitment_key) values('B', 'E', '2-3-4-5');
> insert into fitment_records(asin_ymm, SubmodelID, fitment_key) values('B', 'E', null);
> insert into fitment_records(asin_ymm, SubmodelID, fitment_key) values('A', 'F', '2-3');
> insert into fitment_records(asin_ymm, SubmodelID, fitment_key) values('A', 'E', null);
> insert into fitment_records(asin_ymm, SubmodelID, fitment_key) values('A', 'D', null);
> insert into fitment_records(asin_ymm, SubmodelID, fitment_key) values('A', 'D', '1-2-3-4-7');
>
> The output should be:
>
> asin_ymm | submodelid | fitment_key
> ---------+------------+-------------
> A        | D          | 1-2-3-4-7
> A        | E          | null
> A        | F          | 2-3
> B        | E          | 2-3-4-5
> C        | null       | null
>
> Currently I have written the queries below for this use case. Can we
> optimise them further, considering the data runs into millions of rows?
>
> create temporary view tv1 as (
>   SELECT *
>   FROM fitment_records fr_1
>   WHERE fitment_key IS NOT NULL OR
>         (fitment_key IS NULL AND
>          NOT EXISTS (SELECT 1 FROM fitment_records fr_2
>                      WHERE fr_2.asin_ymm = fr_1.asin_ymm AND
>                            fr_2.SubmodelID = fr_1.SubmodelID AND
>                            fr_2.fitment_key IS NOT NULL)));
>
> create temporary view tv2 as (
>   SELECT *
>   FROM tv1 fr_1
>   WHERE SubmodelID IS NOT NULL OR
>         (SubmodelID IS NULL AND
>          NOT EXISTS (SELECT 1 FROM fitment_records fr_2
>                      WHERE fr_2.asin_ymm = fr_1.asin_ymm AND
>                            fr_2.SubmodelID IS NOT NULL)));
>
> create temporary view fitment_records_with_fitment_key_size as (
>   select asin_ymm, SubmodelID, fitment_key,
>          array_length(string_to_array(fitment_key, '-'), 1) as fitment_key_size
>   from tv2
>   where SubmodelID is not null
>     and fitment_key is not null
>   group by asin_ymm, SubmodelID, fitment_key
> );
>
> create temporary view fitment_records_with_fitment_key_max_size as (
>   select asin_ymm, SubmodelID, max(fitment_key_size) as max_fitment_key_size
>   from fitment_records_with_fitment_key_size
>   group by asin_ymm, SubmodelID
> );
>
> select * from tv2
> except
> select f2.*
> from fitment_records_with_fitment_key_size frws,
>      fitment_records_with_fitment_key_max_size frwms,
>      tv2 f2
> where frws.asin_ymm = frwms.asin_ymm
>   AND frws.SubmodelID = frwms.SubmodelID
>   AND frws.fitment_key_size < frwms.max_fitment_key_size
>   AND frws.SubmodelID = f2.SubmodelID
>   AND frws.asin_ymm = f2.asin_ymm
>   AND frws.fitment_key = f2.fitment_key;
>
> Thanks & Regards
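One possible direction, offered as an untested sketch rather than a drop-in
replacement: the three preference rules can be expressed as two rank()
passes over the base table, avoiding the temporary views and the EXCEPT
anti-join entirely (window functions are assumed, i.e. PostgreSQL 8.4+):

with s1 as (
  -- Prefer non-null SubmodelID per asin_ymm: boolean false sorts before
  -- true, so the non-null rows (if any exist) all tie at rank 1.
  select fr.*,
         rank() over (partition by asin_ymm
                      order by (SubmodelID is null)) as r1
  from fitment_records fr
),
s2 as (
  -- Within each (asin_ymm, SubmodelID) pair: prefer non-null fitment_key,
  -- then the fitment_key with the most '-'-separated values.
  select s1.*,
         rank() over (partition by asin_ymm, SubmodelID
                      order by (fitment_key is null),
                               coalesce(array_length(string_to_array(fitment_key, '-'), 1), 0) desc) as r2
  from s1
  where r1 = 1
)
select asin_ymm, SubmodelID, fitment_key
from s2
where r2 = 1;

Whether this actually beats the original on millions of rows is a question
for EXPLAIN ANALYZE on real data; on the face of it, though, it reads
fitment_records once instead of re-scanning tv2 on both sides of the EXCEPT.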
Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"
> pavel.steh...@gmail.com wrote:
>
>> b...@yugabyte.com wrote:
>>
>> Does the “Tip” call-out box, from which the “Subject” here is copied, and
>> the larger story that I copied below, apply even when the executable
>> section of the block statement in question does nothing at all that could
>> be rolled back?
>>
>> This is my block:
>>
>> begin
>>   year_as_int := year_as_text;
>> exception
>>   when invalid_text_representation then
>>     bad_integer := true;
>> end;
>>
>> The typecast to integer will cause an error if the input text does not
>> represent an integer.
>
> The block is a PLpgSQL statement (internally it is not just syntax) - and a
> safepoint is created before execution of any statement inside the block,
> when the block has EXCEPTION parts. PL/pgSQL is an AST interpreter. It
> knows nothing about statements inside the block. The lazy implementation
> theoretically can be possible, but why? Any statement can raise an
> exception. There is not any evidence what statements are safe and what not.

Thanks for the quick response. I'll take this to mean that the present
behavior will never change—in spite of your:

> The lazy implementation [that Bryn sketched] theoretically can be possible

You then said:

> but why?

OK, so I'm obliged to answer.

Because SQL rests on the principle that you just say *what* you want but not
*how*. Here, I want to cast my string, which putatively represents an
integer, to an "int" value. The text comes from the outside world, and what
is meant to be "42017" might arrive as "42O17". Or, it might arrive properly,
as "42000e-03".

Consider this:

create function is_int_nn(t in text)
  returns boolean
  language plpgsql
as $body$
declare
  ok boolean not null := (t is not null);
begin
  if ok then
    -- Placeholder naïve REGEXPR test.
    ok := (t != '') and not (t ~ 'x');
  end if;
  if ok then
    declare
      n constant numeric not null := t;
      r constant numeric not null := round(n);
    begin
      ok := (r = n);
    end;
  end if;
  return ok;
end;
$body$;

select
  (select is_int_nn(null       )::text) as test_1,
  (select is_int_nn(''         )::text) as test_2,
  (select is_int_nn('42000x-04')::text) as test_3,
  (select is_int_nn('42000e-04')::text) as test_4,
  (select is_int_nn('42000e-03')::text) as test_5;

Neither the design nor the implementation of the code is by any means
finished yet. And this holds, therefore, for the tests too. So this is a
loud denial of « just say *what* you want ».

You might argue that any junior programmer could manage the complete
exercise in a morning. But then somebody else has to review it. And it's
another artifact to be managed. Generic utilities like this always present a
challenge when they need to be used in more than one distinct application.
You need to invent a "single source of truth" scheme.

Compare everything that "function is_int_nn(t in text)" implies with the
block that I showed above.

Oracle Database 12c Release 2 (and later) has a validate_conversion()
built-in:

https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/VALIDATE_CONVERSION.html#GUID-DC485EEB-CB6D-42EF-97AA-4487884CB2CD

This arrived in (some time around) 2017. Is there any chance that PG might
ship an equivalent?
Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"
Bryn Llewellyn writes:
> OK, so I'm obliged to answer.
> Because SQL rests on the principle that you just say *what* you want but
> not *how*.

It also rests on the principle that the programmer shouldn't be too
concerned about micro-efficiencies. You've given a perfectly good six-line
implementation of what you want; use it and be happy.

> Oracle Database 12c Release 2 (and later) has a validate_conversion()
> built-in.

[ shrug... ] We are not Oracle. One of the main ways in which we are not
Oracle is that we support extensible database functionality. To write a
"validate_conversion" function that supports extension datatypes, but
doesn't use something morally equivalent to a subtransaction, would be a
nightmare: large, fragile, and probably not all that much faster.

			regards, tom lane
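For concreteness, a sketch of the kind of short implementation Tom is
pointing at, wrapped as a reusable function. The name try_cast_int and its
null-on-failure contract are inventions for this example, not anything
PostgreSQL ships:

create function try_cast_int(t text)
  returns int
  language plpgsql
as $body$
begin
  -- The assignment cast raises invalid_text_representation (22P02)
  -- when t does not represent an integer.
  return t::int;
exception
  when invalid_text_representation then
    return null;
end;
$body$;

-- select try_cast_int('42017');  --> 42017
-- select try_cast_int('42O17');  --> null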
Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"
On Mon, Jun 13, 2022 at 21:02, Bryn Llewellyn wrote:

> [...]
>
> Oracle Database 12c Release 2 (and later) has a validate_conversion()
> built-in.
>
> https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/VALIDATE_CONVERSION.html#GUID-DC485EEB-CB6D-42EF-97AA-4487884CB2CD
>
> This arrived in (some time around) 2017.
>
> Is there any chance that PG might ship an equivalent?

I remember a long and still not closed discussion about fault-tolerant COPY
implementation.
The problem, if I remember well, is the large amount of possibly redundant
code needed for exception-safe input functions, and even that is still not
enough for a fault-tolerant COPY. Maybe it needs some refactoring of the
PostgreSQL exception-handling system so that some exceptions coming from
outside the storage engine can be handled without the need for safepoints.
I have no idea if somebody is working on this issue now, but I don't expect
it is easy to fix. A more likely fix may be to reduce the overhead of
savepoints. This issue is more complex than it looks from the user's
perspective, and the complexity comes from how PostgreSQL has implemented
exceptions.

Regards

Pavel
Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"
> On Jun 12, 2022, at 23:07, Pavel Stehule wrote:
>
> The lazy implementation theoretically can be possible, but why?

Isn't one of the reasons for the savepoint (in fact, the principal reason)
to reset the connection back to a non-error state so that execution can
continue? In that case, it really does need to create the savepoint at the
start of the block, regardless of what's in it, since any statement can
raise an error.
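A plain-SQL sketch (not from the thread itself) of the recovery behavior
being described here; the implicit savepoint in a PL/pgSQL EXCEPTION block
does the moral equivalent of this:

begin;
savepoint block_entry;
-- This cast fails and puts the transaction into an aborted state...
select '42O17'::int;
-- ...but rolling back to the savepoint restores a usable state, so
-- execution can continue within the same transaction.
rollback to savepoint block_entry;
select 1;   -- succeeds
commit;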
Tools to convert timestamp data to another time zone in PostgreSQL
Hi,

We have recently modified our application to work with PostgreSQL databases
and schemas. We also support Oracle and SQL Server databases.

Along with adding support for PostgreSQL, we have upgraded our
infrastructure so that all environments are configured to use the UTC time
zone. Previously, the environments were configured to use the time zone
where the database server and application server were installed.

As a result, we have hit an issue where we need to convert data in timestamp
columns in existing records to reflect that the time is in UTC. The time
zone is not specified in our timestamp columns (i.e. they are defined as
timestamp without time zone). We need to do this for interoperability
between the database engines we support.

After a bit of searching, we found we can write SQL similar to the statement
below to do the conversion:

update client
set create_stamp = (create_stamp at time zone 'America/New_York' at time zone 'UTC')
where client_code = 'HOANA';

This does work correctly. However, we have some limitations in terms of
using SQL statements like this:

- We would need to identify the timestamp columns that would be affected
  across many tables and multiple schemas.
- We also store date-only information in timestamp without time zone
  columns. This was done as a result of migrating our application from
  Oracle, where the DATE data type was used at the time (Oracle now supports
  timestamp columns).

I was wondering if you are aware of any open source and/or commercial tools
that could allow us to easily identify the affected columns, exclude columns
if necessary, and apply the necessary conversion. If not, we would have to
write a utility that does this for us, which could be a lengthy process.

Thanks,

Joel
Re: Tools to convert timestamp data to another time zone in PostgreSQL
On 6/13/22 2:52 PM, Joel Rabinovitch wrote:
> Hi,
>
> We have recently modified our application to work with PostgreSQL
> databases and schemas. We also support Oracle and SQL Server databases.
>
> [...]
>
> This does work correctly. However, we have some limitations in terms of
> using SQL statements like this.
>
> - We would need to identify the timestamp columns that would be affected
>   across many tables and multiple schemas.

select table_schema, table_name, column_name
from information_schema.columns
where data_type = 'timestamp without time zone';

> - We also store date-only information in timestamp without time zone
>   columns. This was done as a result of migrating our application from
>   Oracle, where the DATE data type was used at the time (Oracle now
>   supports timestamp columns).

A date stored in a timestamp field is going to be a timestamp at midnight:

              Table "public.timestamp_test"
 Column |            Type             | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+---------
 ts     | timestamp without time zone |           |          |
 tsz    | timestamp with time zone    |           |          |

insert into timestamp_test values (current_date, current_date);

         ts          |          tsz
---------------------+------------------------
 2022-06-13 00:00:00 | 2022-06-13 00:00:00-07

> I was wondering if you are aware of any open source and/or commercial
> tools that could allow us to easily identify the affected columns,
> exclude columns if necessary, and apply the necessary conversion. If not,
> we would have to write a utility that does this for us, which could be a
> lengthy process.
>
> Thanks,
>
> Joel

-- 
Adrian Klaver
adrian.kla...@aklaver.com
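Building on the information_schema query above, one hypothetical way to take
care of Joel's last point is to have the catalog generate the per-column
UPDATE statements. This is a sketch only; the emitted SQL needs review
before being run (in particular, the date-only columns should be excluded by
hand):

select format(
         'update %I.%I set %I = (%I at time zone ''America/New_York'' at time zone ''UTC'');',
         table_schema, table_name, column_name, column_name)
from information_schema.columns
where data_type = 'timestamp without time zone'
  and table_schema not in ('pg_catalog', 'information_schema');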
Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"
> x...@thebuild.com wrote:
>
> Isn't one of the reasons for the savepoint (in fact, the principal reason)
> to reset the connection back to non-error state so that execution can
> continue? In that case, it really does need to create the savepoint at the
> start of the block, regardless of what's in it, since any statement can
> raise an error.

> t...@sss.pgh.pa.us wrote:
>
> It also rests on the principle that the programmer shouldn't be too
> concerned about micro-efficiencies. You've given a perfectly good six-line
> implementation of what you want; use it and be happy.
>
> ...
>
> [ shrug... ] We are not Oracle. One of the main ways in which we are not
> Oracle is that we support extensible database functionality. To write a
> "validate_conversion" function that supports extension datatypes, but
> doesn't use something morally equivalent to a subtransaction, would be a
> nightmare: large, fragile, and probably not all that much faster.

> pavel.steh...@gmail.com wrote:
>
> I remember a long and still not closed discussion about fault tolerant
> copy implementation. The problem is a lot of possibly redundant code for
> exception safe input functions, if I remember well. [...]

——

Thanks, Tom Lane, for your reply.

The note in the doc:

> A block containing an EXCEPTION clause is significantly more expensive to
> enter and exit than a block without one.

uses the word "significantly". This scares the application programmer. Would
you (all) consider this revised wording:

« If the executable section of a block that has an exception section makes
changes to the database, then this brings significantly more expense than
when there is no exception section. However, if the executable section makes
only changes to local variables or session parameters, then the additional
expense brought by an exception section is negligible. »

Oracle Database users had to wait about 40 years for the
"validate_conversion()" built-in—despite strong and persistent
user-pressure. PostgreSQL is about 25 years old. So there's plenty of time
to reconsider...

——

Thanks, Pavel Stehule, for your reply.

Forget savepoints and blocks with exception sections. And forget the general
case that Oracle's "validate_conversion()" handles. It would seem to me that
the code that implements "year_as_int := year_as_text" and, maybe, says
"text into int won't go", is distinct from the code that it notifies to
raise an error. It ought to be possible to factor out this code for re-use
and use it both to do what it presently does and to implement a built-in
"is_int_nn(t in text)". And maybe the same for numeric and for
timestamp[tz]. These must be the common cases when such putative values come
in from the UI. Proper practice insists on re-validating the conversions in
the database even though it's to be hoped that the UI will have done this.

However, in the light of Tom's « You've given a perfectly good six-line
implementation of what you want; use it and be happy. », the need (almost)
vanishes—except for the point that I mentioned earlier about packaging up
the test in the application development shop's "utilities" kit.

——

Thanks, Christophe Pettus, for your reply.

> it really does need to create the savepoint at the start of the block,
> regardless of what's in it, since any statement can raise an error.

This is necessary only when it can't be proved that the executable section
that precedes the exception section cannot make database changes—hereinafter
"is safe". Such safety tests are always over-cautious. So, for example, it
would be assumed that any transfer of control out of the executable section,
and then back, was unsafe. However, the experts have insisted that even the
safety of this case cannot be proved:

> an executable section with only simple assignments that use only built-in
> functionality.

I suppose that "PL/pgSQL is an AST interpreter" is the clue here. Given
this, then yes, you're right.
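The cost that the doc's Tip warns about is easy to measure directly. As a
rough sketch (assuming psql with \timing enabled), compare a loop of blocks
with an EXCEPTION section against the same loop without one:

-- Variant A: a subtransaction is entered and exited on every iteration.
do $$
declare
  v int;
begin
  for i in 1..1000000 loop
    begin
      v := i;
    exception
      when others then null;
    end;
  end loop;
end;
$$;

-- Variant B: the same work with no EXCEPTION section, for comparison.
do $$
declare
  v int;
begin
  for i in 1..1000000 loop
    v := i;
  end loop;
end;
$$;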
Re: multiple entries for synchronous_standby_names
On Fri, Jun 10, 2022 at 05:04:30PM +0100, Nitesh Nathani wrote:
> Trying to achieve sync streaming to barman server and i need to add an
> entry to postgresql.conf for this parameter, which already has an entry
> and tried a few variations but does not work. Any ideas? Also tried '&&'
> but in vain
>
> synchronous_standby_names='ANY 1 (*)',barman-wal-archive

This grammar flavor is not supported (see also syncrep_gram.y for the code):
https://www.postgresql.org/docs/devel/runtime-config-replication.html

And here is the actual list of grammars supported:

[FIRST] num_sync ( standby_name [, ...] )
ANY num_sync ( standby_name [, ...] )
standby_name [, ...]

In short, you can specify a list of node names within one ANY or FIRST
clause, but you cannot specify a list made of ANY/FIRST items. Without
knowing what kind of priority policy you are trying to achieve, it is hard
to recommend one method over the others. What we support now has proven to
be hard enough to implement and to make robust, and supporting sub-groups of
nodes was also on the table back in the day, but the lack of use cases did
not justify the extra implementation complexity, as far as I recall this
matter.
--
Michael
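To illustrate, here are examples of what the supported grammar does and does
not allow (the standby names are made up):

# Supported: a single ANY or FIRST clause over a list of names.
synchronous_standby_names = 'ANY 1 (node_a, node_b, barman)'
synchronous_standby_names = 'FIRST 2 (node_a, node_b, node_c)'
synchronous_standby_names = 'node_a, node_b'

# Not supported: a list made of ANY/FIRST items.
# synchronous_standby_names = 'ANY 1 (*)', barman-wal-archive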
RE: Build Postgres On AIX
Postgres will not build on AIX with either of the following options to the
configure script:

--with-uuid=e2fs
--with-uuid=ossp

so I was using --with-uuid=bsd, which does work except for the arc4random
undefined-symbol error I was getting.

I removed the --with-uuid=bsd option to the configure script and everything
builds, including the contrib subdir (except the uuid-ossp subdir, as we
want). Here's the configure command:

./configure \
    --prefix="$BUILD_DIR/pgsql-$POSTGRES_VERSION" \
    --without-readline \
    --without-zlib \
    --with-openssl \
    --with-includes="$BUILD_DIR"/openssl/include/openssl/ \
    --with-libraries="$BUILD_DIR"/openssl/lib/ \
    "$ADDITIONAL_FLAGS"

However, when I try to create a database I'm getting an out-of-memory error
(see below). Any suggestions?

Thanks, Mark

---

Command: ${postgresDir}/bin/initdb -D ./data -U dbmsowner -W

TopMemoryContext: 75328 total in 5 blocks; 33264 free (32 chunks); 42064 used
  TopTransactionContext: 524288 total in 7 blocks; 106872 free (26 chunks); 417416 used
  TableSpace cache: 8192 total in 1 blocks; 5064 free (0 chunks); 3128 used
  <...>
  Relcache by OID: 8192 total in 1 blocks; 1760 free (0 chunks); 6432 used
  CacheMemoryContext: 2097152 total in 9 blocks; 808960 free (10 chunks); 1288192 used
    index info: 2048 total in 2 blocks; 808 free (0 chunks); 1240 used: pg_description_o_c_o_index
    relation rules: 16384 total in 5 blocks; 4840 free (0 chunks); 11544 used: pg_replication_slots
    relation rules: 16384 total in 5 blocks; 4544 free (1 chunks); 11840 used: pg_statio_all_sequences
    relation rules: 49152 total in 9 blocks; 7160 free (3 chunks); 41992 used: pg_statio_all_indexes
  <...etc...>
Grand total: 3779872 bytes in 241 blocks; 1220984 free (137 chunks); 255 used

2022-06-13 23:20:12.911 EDT [15270042] FATAL:  out of memory
2022-06-13 23:20:12.911 EDT [15270042] DETAIL:  Failed on request of size 8192 in memory context "ExprContext".
2022-06-13 23:20:12.911 EDT [15270042] STATEMENT:  SELECT pg_import_system_collations('pg_catalog');

-----Original Message-----
From: Tom Lane
Sent: Tuesday, May 31, 2022 11:20 AM
To: Mark Hill
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Build Postgres On AIX

Mark Hill writes:
> I'm building Postgres 14.2 on AIX. Apparently the arc4random function is
> not available in the AIX system libraries. The build fails when it tries
> to build postgresql-14.2/contrib/uuid-ossp/uuid-ossp.c because of the use
> of arc4random in that source file.

AFAICS arc4random is only referenced in the "BSD" code paths, so you must
have tried to specify --with-uuid=bsd. It's not too surprising that that
didn't work. As per the docs, your other alternatives are to use the OSSP
library or the e2fsprogs library. Or you could just not build uuid-ossp;
it's fairly vestigial these days, now that we offer gen_random_uuid() in
core.

			regards, tom lane
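As an aside on Tom's closing point: the build here is 14.2, and
gen_random_uuid() has been in core since PostgreSQL 13, so skipping
uuid-ossp costs nothing once the initdb problem is solved. For example:

select gen_random_uuid();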
Re: Tools to convert timestamp data to another time zone in PostgreSQL
On Mon, Jun 13, 2022 at 09:52:00PM +0000, Joel Rabinovitch wrote:
> Hi,
>
> We have recently modified our application to work with PostgreSQL
> databases and schemas. We also support Oracle and SQL Server Databases.

[skipped]

> As a result, we have hit an issue where we need to convert data in
> timestamp columns in existing records to reflect that the time is in UTC.
> The time zone is not specified in our timestamp columns (i.e. they are
> defined as timestamp without time zone). We need to do this for

By the way, it's not specified in timestamptz either. timestamptz always
stores the time as UTC microseconds and displays it in the time zone given
by the session settings.

> interoperability between the database engines we support.

It's better to use the timestamptz type:
https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_timestamp_.28without_time_zone.29
and I think this is a good time to make that change.

[skipped]

Setting the session time zone to 'America/New_York' and converting the
column to timestamptz should do it just fine (on reasonably sized tables):

set timezone = 'America/New_York';
alter table client alter column create_stamp type timestamptz;

Other approaches, such as creating a new column and renaming it after a
proper backfill, are also possible.
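If the conversion must not disturb the session's time zone afterwards, the
change can be wrapped in a transaction. A sketch, reusing the client table
from upthread:

begin;
-- SET LOCAL reverts automatically at commit, leaving the session's own
-- time zone setting untouched.
set local timezone = 'America/New_York';
-- Existing values are reinterpreted as New York local time and stored
-- as timestamptz (UTC internally).
alter table client alter column create_stamp type timestamptz;
commit;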