How to keep format of views source code as entered?

2021-01-07 Thread Markhof, Ingolf
Hi!

Switching from Oracle SLQ to PostgreSQL I am facing the issue that the SQL code 
the system returns when I open a views source code is different from the code I 
entered. The code is formatted differently, comments are gone and e.g. all text 
constants got an explicit cast to ::text added. (see sample below).

I want the SLQ code of my views stored as I entered it. Is there any way to 
achieve this? Or will I be forced to maintain my views SQL code outside of 
PostgreSQL views?

Any hints welcome!

Here is an example:

I enter this code to define a simple view:

create or replace view myview as
select
  product_id,
  product_acronym
from
  products -- my comment here
where
  product_acronym = 'ABC'
;

However, when I open the view my SQL client (DBeaver) again, this is what I get:

CREATE OR REPLACE VIEW myview
AS SELECT product_id,
product_acronym
   FROM products
  WHERE product_acronym = 'ABC'::text;

So, the formatting changed, keywords are capitalized, the comment I added in 
the from-part has gone and the text constant 'ABC' changed to 'ABC'::text.






Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht 
Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des 
Aufsichtsrats: Francesco de Maio


RE: How to keep format of views source code as entered?

2021-01-07 Thread Markhof, Ingolf
So, it looks like PostgreSQL does support saving the original source code of a 
view.

What's best practise to use as a code repository?

I would expect support of multi-user access, access-right management and 
perhaps versioning as well…?

Thanks for your help!

Ingolf

From: Markhof, Ingolf [mailto:ingolf.mark...@de.verizon.com]
Sent: 07 January 2021 17:19
To: pgsql-general@lists.postgresql.org
Subject: [E] How to keep format of views source code as entered?

Hi!

Switching from Oracle SLQ to PostgreSQL I am facing the issue that the SQL code 
the system returns when I open a views source code is different from the code I 
entered. The code is formatted differently, comments are gone and e.g. all text 
constants got an explicit cast to ::text added. (see sample below).

I want the SLQ code of my views stored as I entered it. Is there any way to 
achieve this? Or will I be forced to maintain my views SQL code outside of 
PostgreSQL views?

Any hints welcome!

Here is an example:

I enter this code to define a simple view:

create or replace view myview as
select
  product_id,
  product_acronym
from
  products -- my comment here
where
  product_acronym = 'ABC'
;

However, when I open the view my SQL client (DBeaver) again, this is what I get:

CREATE OR REPLACE VIEW myview
AS SELECT product_id,
product_acronym
   FROM products
  WHERE product_acronym = 'ABC'::text;

So, the formatting changed, keywords are capitalized, the comment I added in 
the from-part has gone and the text constant 'ABC' changed to 'ABC'::text.


Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht 
Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des 
Aufsichtsrats: Francesco de Maio


RE: How to keep format of views source code as entered?

2021-01-08 Thread Markhof, Ingolf
Thanks for your comments and thoughts.

I am really surprised that PostgreSQL is unable to keep the source text of a 
view. Honestly, for me the looks like an implementation gap. Consider software 
development. You are writing code in C++ maybe on a UNIX host. And whenever you 
feed you source code into the compiler, it will delete it, keeping the 
resulting executable, only. And you could not even store your source code on 
the UNIX system. Instead, you'd be forced to do so in a separate system, like 
GitHub. Stupid, isn't it? Right. There are good reasons to store the source 
code on GitHub or alike anyhow. Especially when working on larger project and 
when collaborating with many people. But in case of rather small project with a 
few people only, this might be an overkill.

It shouldn't be rocket science to enable PostgreSQL to store the original 
source code as well. It's weird PostgreSQL is not doing it.

Regards,
Ingolf



Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht 
Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des 
Aufsichtsrats: Francesco de Maio


RE: How to keep format of views source code as entered?

2021-01-09 Thread Markhof, Ingolf
Tom, all,

when I change a tables column name in Oracle SQL , SQLdeveloper (a SQL client) 
marks views that refer to the table using the old column name as erroneous. So, 
I can easily identify these cases. And of course I, as a user, I am acting in 
my context, i.e. my schema. So it is perfectly clear what table I am referring 
to.

Please note: I am not developing any PL/SQL code. I don't have big development 
projects. I have more the role of an data analyst. I just create rather complex 
SQL queries which, from time to time, may need to be adopted to some new 
requirements. Or peers want to (re-) use (part of) my SQL queries. There is not 
really much versioning required. 

What I understood so far is: I can use e.g. DBeaver to interact with 
PostgreSQL, to develop my SQL code. But I finally need to copy the SQL code 
into e.g. Github. Which is a manual process. I'd mark the SQL code in the 
DBeaver editor window and copy&paste it into some file in e.g. GitHub. Using 
Github, I'd get version control and other enhanced collaboration features which 
I don't really need. At the price that code transfer from SQL (DBeaver) to the 
code repository and vice versa is complete manually?! This doesn't really look 
like an enhancement.

Most likely, there are more professional ways to do that. I'd be glad to get 
advice.

What I would like to have is something that would automatically update the SQL 
code in the software repository when I run a CREATE OR REPLACE VIEW.

Ingolf



Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht 
Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des 
Aufsichtsrats: Francesco de Maio


Re: How to keep format of views source code as entered?

2021-01-26 Thread Markhof, Ingolf
Hi!

Today, I made an astonishing / disappointing experience related to that source 
code topic:

You may not be able to delete tables / views that are referenced by other users 
objects, 
e.g. views. Unless you add the CASCADE option which will cause all depending 
views to be
deleted as well. And the CASCASE will work and delete the other users view even 
when you 
don't have the permission to drop that other users view!

Looks like the Oracle concept of marking views as invalid makes some sense...

Regards,
Ingolf



-Original Message-
From: Paul Förster [mailto:paul.foers...@gmail.com] 
Sent: 14 January 2021 07:16
To: Cybertec Schönig & Schönig GmbH 
Cc: Adam Brusselback ; David G. Johnston 
; raf ; 
pgsql-generallists.postgresql.org 
Subject: [E] Re: How to keep format of views source code as entered?

Hi Laurenz,

> On 14. Jan, 2021, at 04:59, Laurenz Albe  wrote:
> 
> If PostgreSQL were to store the original text, either that text would become
> wrong, or you would have to forbid renaming of anything that is referenced
> by a view.

this is why views, procedures, functions and packages can become invalid in 
Oracle, which I really hate because as a DBA, it's almost impossible to quickly 
see (or in fact see at all) why this happens, or having to debug applications 
that you don't know and/or can't fix anyway. Oracle's invalid object concept 
doesn't make sense.

So, I'm not at all in favor of saving the original statement text.

Cheers,
Paul



Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht 
Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des 
Aufsichtsrats: Francesco de Maio





Foreign table performance issue / PostgreSQK vs. ORACLE

2021-01-29 Thread Markhof, Ingolf
Hi!

I am struggling with the slow performance when running queries referring to 
foreign tables. - Yes, I know... - Please read the whole story!

The set-up basically is a production database and a reporting database. As 
names indicate, the production database is used for production, the reporting 
database is for analysis. On the reporting database, the only way to access 
product data is via foreign tables that link to the related production tables.

Now, while some queries on the reporting service run fine, some don't even 
return any data after hours.

However, the same set-up worked fine in Oracle before. Reporting wasn't always 
fast, but it delivered results in acceptable time. A query executed on the 
Oracle reporting server returns data in e.g. 30 seconds. But running the query 
translated to PostgreSQL on the PostgreSQL DB does not deliver a single row 
after hours (!) of run time.

So, I wonder: Is there a fundamental difference between Oracle database links 
and foreign tables in PostgreSQL that could explain the different run times? 
Could there be some tuning option in PostgreSQL to make queries via foreign 
tables faster (e.g. I heard about option fetch_size)?

Your pointes welcome!


Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht 
Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des 
Aufsichtsrats: Francesco de Maio


RE: Foreign table performance issue / PostgreSQK vs. ORACLE

2021-02-02 Thread Markhof, Ingolf
Hi!

My PostgreSQL version is 11.8. 

The query I am running is referring to a number of foreign tables. The first 
one (table1) has to IDs, let's say ID_A and ID_B. While ID_A is unique, ID_B is 
not. In my case, I am pulling formation for a value of IB_B for which about 800 
rows (with unique ID_A) exist. I found:

While

select * from my_view where id_b='some value';

seemingly runs "forever" (I cancelled execution after a few hours), the 
following completes in about 1 hr:

select * from my_view where ia_a in (
select id_a from table1 where id_b='some value'
 );

So, I tried smaller chunks of ID_a and found the execution time is non-linear 
with respect to number of IDs. For e.g. 50 ID_A's, it was completed in about 12 
sec.

Next I have split the ~800 ID_A's into chunks of 50 and submitted these 16 
queries one after another. They all completed in about 12 secs, each.

I then found the option fetch_size, e.g. ALTER SERVER some_server OPTIONS 
(fetch_size '5'). A chunk of 50 now executes in 2 seconds (instead of 12 
before).

So, I found the "size" of the query has a serious impact to the execution time. 
I don't really understand why execution 16*50 takes 16*2 secs only, but 
executing 1*800 takes about 3000 seconds...

Regards,
Ingolf


-Original Message-
From: Sebastian Dressler [mailto:sebast...@swarm64.com] 
Sent: 30 January 2021 11:45
To: Markhof, Ingolf 
Cc: pgsql-general@lists.postgresql.org
Subject: [E] Re: Foreign table performance issue / PostgreSQK vs. ORACLE

Hi Ingolf,

> On 29. Jan 2021, at 13:56, Markhof, Ingolf  
> wrote:
> 
> Hi!
>  
> I am struggling with the slow performance when running queries referring to 
> foreign tables. – Yes, I know… - Please read the whole story!

Done and it rings a bell or two.

> The set-up basically is a production database and a reporting database. As 
> names indicate, the production database is used for production, the reporting 
> database is for analysis. On the reporting database, the only way to access 
> product data is via foreign tables that link to the related production tables.
>  
> Now, while some queries on the reporting service run fine, some don't even 
> return any data after hours.
>  
> However, the same set-up worked fine in Oracle before. Reporting wasn't 
> always fast, but it delivered results in acceptable time. A query executed on 
> the Oracle reporting server returns data in e.g. 30 seconds. But running the 
> query translated to PostgreSQL on the PostgreSQL DB does not deliver a single 
> row after hours (!) of run time.
>  
> So, I wonder: Is there a fundamental difference between Oracle database links 
> and foreign tables in PostgreSQL that could explain the different run times? 
> Could there be some tuning option in PostgreSQL to make queries via foreign 
> tables faster (e.g. I heard about option fetch_size)?

You did not explicitly mention it, but I assume you are using postgres_fdw to 
connect from reporting (R) to production (P). Thomas and Tom already mentioned 
incomplete/non-existing/non-applicable filter pushdowns. I want to add another 
probable root cause to the list explaining the behavior you experience.

The postgres_fdw uses a CURSOR on P to execute the query. While this guarantees 
transaction safety, it also prohibits parallelism (PostgreSQL server-side 
cursors enforce a sequential plan).

As a result, depending on the size of tables, indexes, and filters pushed down 
(or not), this probably results in slow-running queries. IMO, the worst-case 
scenario is that a sequential table scan without any filtering, and a single 
worker runs on the target.

Of course, you can try to optimize schemas on P and queries on R, enabling more 
filter pushdown and eventually a faster execution. However, I believe this does 
not work with your entire workload, i.e. there will always be performance gaps.

The parallelism issue is theoretically fixable by utilizing partitions on P. R 
then connects to P with multiple postgres_fdw-backed child tables. However, 
this will only work with a patch to postgres_fdw to implement 
"IsForeignScanParallelSafe" (see [1] for a possible implementation). Without 
this method, there will be no parallelism again. Without, the partitions scan 
occurs sequentially, not showing a performance gain.

I want to mention there are proprietary options available (re-)enabling 
PostgreSQL parallelism with cursors. Such an extension can potentially fix your 
performance issue. However, I have not tried it so far with a setup similar to 
yours.

Cheers,
Sebastian


[1]: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_swarm64_parallel-2Dpostgres-2Dfdw-2Dpatch&d=DwIGaQ&c=udBTRvFvXC5Dhqg7UHpJlPps3mZ3LRxpb6__0PomBTQ&r=ivZWA-ECVj3XrXBe0obDwKY7Ui7K5Nj9oD2KKWLm0Bw&

dealing with dependencies

2021-07-16 Thread Markhof, Ingolf
Hi!

I recently switched from Oracle SQL to PostgreSQL.

In Oracle, I can easily delete a table or view that is used by existing
views. The system marks the affected views. I can then re-define the
deleted table or view and have all dependent views easily re-compiled. Done.

PostgreSQL instead is preventing inconsistency. It simply refuses to delete
a view or table that is referenced by other views. Consequently, I need to
delete all dependent views first, re-define the one I want to change and
then create all dependent views deleted before... - Which is much more
difficult to handle.

What I especially dislike is that you cannot even insert a column into an
existing view if that view is used by some other views. E.g.:

create table my_table (col1 text, col2 text);
create view my_view1 as select col1, col2 from my_table;
create view my_view2 as select col1, col2 from my_view1;
create or replace view my_view1 as select col1, col1||col2, col2 from
my_table; --> ERROR: Cannot change name of view column "col2" to ..

The create or replace of view 2 fails. Clear, the manual states about
create or replace view: "the new query must generate the same columns that
were generated by the existing view query (that is, the same column names
in the same order and with the same data types), but it may add additional
columns to the end of the list.". Obviously, the columns are internally
referenced by index, not by name.  But if I want my new column between to
exiting ones, I need to deleted my_view2, first...

I wonder how you deal with it in a professional way. Sounds like some type
of "make" (that UNIX tool dealing with dependencies in the context of e.g.
programming in C) would be helpful...

So, in an environment of rapid prototyping, if you develop the data-base
design and view for tables etc. and you then need to make changes to a base
table that affect all the views using it, there should be another way than
doing all this manually...?!

Thx for your pointers!
I.

==

Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht 
Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des 
Aufsichtsrats: Francesco de Maio



Regexp_replace bug / does not terminate on long strings

2021-08-19 Thread Markhof, Ingolf
BRIEF:

regexp_replace(source,pattern,replacement,flags) needs very (!) long to
complete or does not complete at all (?!) for big input strings (a few k
characters). (Oracle SQL completes the same in a few ms)

VERBOSE

Given a comma-separated list of "words" (whereas a word is any sequence of
characters not containing a comma) I want to delete sequences of
duplicates, e.g.

'one,one,one,two,two,three' --> 'one,two,three'

I use this regexp_replace to delete duplicate "words" in a comma separated
string:

select regexp_replace(
  'one,one,one,two,two,three', -- input string
  '([^,]+)(,\1)*($|,)', -- pattern
  '\1\3', -- replacement
  'g'  -- apply globally (all matches)
);

However, I found that this regexp_replace seemingly runs "endless" (I
cancelled the query after a few minutes) for bigger input strings. Such as:

 
'1/1,1/1,1/1,1/1,1/1,1/1,1/1,1/1,1/1,1/1,1/1,2/1,2/1,2/1,2/1,2/2,2/1,2/1,2/2,2/2,2/1,2/1,2/2,2/2,2/1,2/2,2/1,2/1,2/2,2/1,2/1,2/2,3/1,3/1,3/2,3/1,3/1,3/1,3/1,3/1,3/1,3/3,3/1,3/1,3/1,3/3,3/1,3/1,3/2,3/1,3/1,3/1,3/3,3/3,3/1,3/1,4/1,4/1,4/1,4/1,4/1,4/1,5/2,5/5,5/1,5/5,5/5,5/2,5/1,5/1,5/5,5/1,6/1,6/1,6/1,6/1,6/3,6/6,6/1,6/1,6/1,6/3,6/1,6/1,6/1,6/1,6/1,6/1,6/6,6/1,7/1,7/3,7/1,7/1,7/1,7/5,7/1,7/1,7/1,7/1,7/1,7/1,7/1,7/5,7/1,7/3,7/1,8/1,8/1,8/2,8/2,8/1,8/6,8/1,8/1,8/1,8/1,8/6,8/1,8/1,8/1,9/2,9/1,9/1,9/2,10/4,10/2,10/2,10/2,10/2,10/1,10/4,10/10,10/10,10/2,10/1,10/1,10/2,10/1,10/8,10/1,10/3,10/2,10/5,10/10,10/2,10/10,10/2,10/3,10/1,10/1,10/1,10/1,10/8,10/5,12/5,12/3,12/5,12/5,12/1,12/5,12/1,12/3,12/1,12/1,12/5,12/2,12/1,12/0.768,12/1,12/2,12/2,12/2,12/2,12/2,12/1,12/1,14/3,15/1,15/10,15/1,15/2,15/3,15/2,15/1,15/15,15/1,15/2,15/4,15/15,15/5,15/1,15/2,15/15,15/1,15/5,15/1,15/3,15/5,15/5,15/1,15/10,15/4,15/2,15/2,15/15,15/3,15/2,15/2,15/3,15/3,16/3,16/3,18/4,18/3,18/1,18/2,18/2,18/2,18/4,18/2,18/2,18/1,18/3,20/20,20/5,20/0.896,20/5,20/1,20/2,20/1,20/3,20/4,20/5,20/10,20/20,20/10,20/5,20/1,20/1,20/4,20/2,20/1,20/1,20/3,20/4,20/20,20/2,20/20,20/2,20/20,20/20,24/3,24/4,24/2,24/4,24/3,24/3,24/3,24/3,25/3,25/4,25/10,25/25,25/6,25/1,25/7,25/2,25/5,25/2,25/2,25/25,25/3,25/25,25/25,25/10,25/3,25/10,25/25,25/6,25/4,25/25,25/5,25/5,25/3,25/1,25/5,25/10,25/25,25/7,25/14,25/5,25/5,25/5,25/3,25/5,25/14,25/2,30/2,30/7,30/3,30/8,30/15,30/1,30/4,30/7,30/2,30/5,30/30,30/8,30/5,30/5,30/5,30/8,30/8,30/1,30/10,30/3,30/30,30/10,30/4,30/30,30/30,30/3,30/1,30/15,30/3,30/5,30/3,35/5,35/12,35/5,35/10,35/3,35/3,35/4,35/10,35/12,35/5,35/5,35/4,40/15,40/4,40/40,40/2,40/10,40/10,40/5,40/3,40/3,40/40,40/4,40/15,40/10,40/2,40/5,45/6,45/6,45/6,45/6,45/6,50/8,50/4,50/50,50/8,50/15,50/7,50/3,50/20,50/25,50/50,50/5,50/50,50/12,50/7,50/4,50/15,50/10,50/8,50/3,50/2,50/20,50/25,50/10,50/8,50/50,50/5,50/5,50/50,50/10,50/10,50/10,50/5,50/5,50/4,50/10,50/50,50/5,50/50,50/8,50/8,50/50,50/50,50/10,50/12,50/2,50/5,55/10,55/10,55/5,55/5,60/3,60/25,60/4,60/60,60/30,60/25,60/6,60/6,60/10,60/5,60/5,60/3,60/10,60/5,60/5,60/10,60/30,60/6,60/5,60/10,60/60,70/10,70/10,75/15,75/3,75/4,75/10,75/20,75/5,75/6,75/8,75/6,75/7,75/75,75/7,75/15,75/25,75/15,75/7,75/3,75/15,75/8,75/30,75/75,75/8,75/8,75/5,75/20,75/75,75/6,75/30,75/15,75/75,75/8,75/25,75/15,75/7,75/75,75/10,75/4,80/5,80/5,90/50,90/50,100/7,100/10,100/10,100/10,100/10,100/10,100/8,100/20,100/10,100/20,100/20,100/5,100/25,100/8,100/5,100/100,100/8,100/20,100/8,100/10,100/20,100/7,100/6,100/50,100/15,100/10,100/2,100/35,100/10,100/10,100/35,100/30,100/100,100/5,100/40,100/35,100/100,100/50,100/35,100/30,100/7,100/10,100/10,100/7,100/25,100/100,100/40,100/5,100/15,100/6,100/7,100/20,100/10,100/2,100/20,105/10,105/20,105/10,105/20,120/15,120/10,120/15,120/15,150/150,150/150,150/75,150/20,150/20,150/20,150/30,150/75,150/25,150/15,150/25,150/150,150/15,150/6,150/6,150/30,150/20,200/15,200/15,200/20,200/10,200/40,200/15,200/40,200/50,200/7,200/20,200/15,200/25,200/20,200/7,200/15,200/7,200/15,200/20,200/20,200/200,200/15,200/50,200/10,200/20,200/20,200/20,200/200,200/20,200/25,200/7,240/15,240/15,250/20,250/50,250/20,250/10,250/10,250/25,250/250,250/250,250/25,250/50,250/25,300/20,300/20,300/30,300/7,300/20,300/300,300/300,300/20,300/30,300/20,300/7,300/10,300/20,300/20,300/30,300/20,300/7,300/7,300/20,300/30,300/30,300/300,300/50,300/300,300/30,300/10,300/300,300/300,300/50,300/300,400/20,400/20,400/25,400/25,450/50,450/50,500/500,500/50,500/50,500/50,500/50,500/500,500/500,500/500,500/35,500/25,500/35,500/25,500/500,600/40,600/40,1000/20,1000/40,1000/20,1000/1000,1000/20,1000/35,1000/1000,1000/20,1000/50,1000/500,1000/50,1000/50,1000/1000,1000/500,1000/1000,1000/35,1000/40'

I also run the same in Oracle SQL where the same completes "immediately"
(with in a few ms).

There seems to be some issue in the implementation of regexp_replace or
regular expressions in general in PostgeSQL...

FYI: I have:

select version();
PostgreSQL 11.9 on aarch64-unknown-linux-gnu, compiled by
aarch64-unknown-linux-gnu-gcc (GCC) 7.4.0, 64-bit

Any comments?

==

string_agg distinct order by

2021-08-19 Thread Markhof, Ingolf
I am looking for something like

string_agg(distinct col_x order by col_y)

Unfortunately, you can either have the distinct, but then the order by
needs to be identical to what's aggregated, or you can have the order be
determined by another column. Not both...

Here is the playground

Given:

create table sites (
  state text,
  city text,
  col_a text,
  col_b text
);

insert into sites values ('Texas','Dallas','green','green');
insert into sites values ('Texas','Houston','green','green');
insert into sites values ('Texas','Austin','yellow','green');
insert into sites values ('Texas','Waco','yellow','yellow');
insert into sites values ('Texas','Midland','red','red');
insert into sites values ('Texas','Amarillo','red','yellow');

For each city, there is a status denoted by colour combination, e.g. 'green
/ green'. This is stored in two different columns in the table.

There is an order in the colours:

create table colours (
  colour text,
  value integer
);

insert into colours values ('red', 1);
insert into colours values ('yellow', 2);
insert into colours values ('green', 3);

So, red first, yellow second, green last.

I want an aggregated view showing for each state the list of existing
status combinations, such as:

with site_status as (
select
  state,
  city,
  col_a || '/' || col_b as status,
  ca.value as val_a,
  cb.value as val_b
from
  sites a
  join colours ca on ca.colour=a.col_a
  join colours cb on cb.colour=a.col_b
)
select
  state,
  string_agg(distinct status,',') as list
from
  site_status
group by
  state
;

This results in:

Texas  green/green,red/red,red/yellow,yellow/green,yellow/yellow

By using distinct in the string_agg, I avoid double entries. Fine. But now,
I want the data ordered. E.g. in the order of the first colour. I SQL, this
could read...

string_agg(distinct status,',' order by val_a) as list

but this doesn't work. I get:

SQL Error [42P10]: ERROR: in an aggregate with DISTINCT, ORDER BY
expressions must appear in argument list

So, I could say:

string_agg(distinct status,',' order by status) as list

but this is not what I want: 'green' would get first, red second, yellow
last...

I could also drop the distinct and say:

string_agg(status,',' order by val_a) as list

This would return the list in correct order, but with double values
('green/green') showing up twice.

I tried to delete the double entries via regexp_replace, but found this
doesn't work for rather long strings (where a single regexp_replace can run
many minutes!)

Any pointers?

Thank you very much for any idea.

==

Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht 
Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des 
Aufsichtsrats: Francesco de Maio



Re: [E] Re: Regexp_replace bug / does not terminate on long strings

2021-08-20 Thread Markhof, Ingolf
Hi Tom,

thank you very much for your reply. Actually, I was assuming all these
regular expressions are based on the same core implementation.
Interestingly, this doesn't seem to be true...

I am also surprised that you say the (\1)+ subpattern is computationally
expensive. Regular expressions are greedy by default. I.e. in case of a*
matching against a string of 1000 a's, the system will not try a, aa, aaa,
... and so on, right? Instead, it will consume all the a's in one go.
Likewise, I was expecting that the system would eat all the repetitions of
a word in one go. Your proposal to use (\1)? instead at the first glance
seems to require more effort, because the same word and it's matching
successor will need to be matched again and again and again. Roughly, 2N
matches are to be done instead of just N.

However, you are perfectly right: When I use (\1)? instead of (\1)+, the
expression is evaluates quickly!

Thank you very much for looking into this and for proposing the alternative
approach which is working fine.

Regards
Ingolf



On Fri, Aug 20, 2021 at 12:42 AM Tom Lane  wrote:

> "Markhof, Ingolf"  writes:
> > BRIEF:
> > regexp_replace(source,pattern,replacement,flags) needs very (!) long to
> > complete or does not complete at all (?!) for big input strings (a few k
> > characters). (Oracle SQL completes the same in a few ms)
>
> Regexps containing backrefs are inherently hard --- every engine has
> strengths and weaknesses.  I doubt it'd be hard to find cases where
> our engine is orders of magnitude faster than Oracle's; but you've
> hit on a case where the opposite is true.
>
> The core of the problem is that it's hard to tell how much of the
> string could be matched by the (,\1)* subpattern.  In principle, *all*
> of the remaining string could be, if it were N repetitions of the
> initial word.  Or it could be N-1 repetitions followed by one other
> word, and so on.  The difficulty is that since our engine guarantees
> to find the longest feasible match, it tries these options from
> longest to shortest.  Usually the actual match (if any) will be pretty
> short, so that you have O(N) wasted work per word, making the runtime
> at least O(N^2).
>
> I think your best bet is to not try to eliminate multiple duplicates
> at a time.  Get rid of one dup at a time, say by
>  str := regexp_replace(str, '([^,]+)(,\1)?($|,)', '\1\3', 'g');
> and repeat till the string doesn't get any shorter.
>
> I did come across a performance bug [1] while poking at this, but
> alas fixing it doesn't move the needle very much for this example.
>
> regards, tom lane
>
> [1]
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.postgresql.org_message-2Did_1808998.1629412269-2540sss.pgh.pa.us&d=DwIBAg&c=udBTRvFvXC5Dhqg7UHpJlPps3mZ3LRxpb6__0PomBTQ&r=ivZWA-ECVj3XrXBe0obDwKY7Ui7K5Nj9oD2KKWLm0Bw&m=q11bVTHCxVx8BQu2pjn6-3nOY8aN8hORXofVK38HqF8&s=hJxrzmTT6G7AUomoeFgh0IGDO3NcUP4gB9kvYHnt3m0&e=
>

==

Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht 
Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des 
Aufsichtsrats: Francesco de Maio



Re: [E] Re: Regexp_replace bug / does not terminate on long strings

2021-08-20 Thread Markhof, Ingolf
Thank you very much for all your proposals!

Ingolf

==

Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht 
Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des 
Aufsichtsrats: Francesco de Maio



Re: [E] Re: Regexp_replace bug / does not terminate on long strings

2021-08-23 Thread Markhof, Ingolf
Right. Considering a longer sequence of a's, "(a*)\1" allows a wide variety
of matches. But in fact, this is not what I was trying to use. I was more
looking at "(a)\1*" which shall match exactly what "a+" matches. As
matching is greedy, "(a)\1*" shall consume all a's in a sequence in one go,
just like "a+" does...?!

Regards,
Ingolf


On Fri, Aug 20, 2021 at 6:52 PM Tom Lane  wrote:

> "Markhof, Ingolf"  writes:
> > thank you very much for your reply. Actually, I was assuming all these
> > regular expressions are based on the same core implementation.
>
> They are not.  There are at least three fundamentally different
> implementation technologies (DFA, NFA, hybrid).  Friedl's "Mastering
> Regular Expressions" cites multiple different programs using each
> of those, every one of which behaves a bit differently when you start
> poking at corner cases.  And that's just in the open-source world;
> I don't know what Oracle is using, but I bet it ain't open source.
>
> > I am also surprised that you say the (\1)+ subpattern is computationally
> > expensive. Regular expressions are greedy by default. I.e. in case of a*
> > matching against a string of 1000 a's, the system will not try a, aa,
> aaa,
> > ... and so on, right? Instead, it will consume all the a's in one go.
>
> "a*" is easy.  "(a*)\1" is less easy --- if you let the a* consume the
> whole string, you will not get a match, even though one is possible.
> In general, backrefs create a mess in what would otherwise be a pretty
> straightforward concept :-(.
>
> regards, tom lane
>

==

Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht 
Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des 
Aufsichtsrats: Francesco de Maio



Re: [E] Re: string_agg distinct order by

2021-08-23 Thread Markhof, Ingolf
Tom, I see your point. Which is valid considering there could be any value
in col_y for some value in col_x.  But in my case, col_y is a a function of
col_x, i.e. two rows with the same value in row_x will have the same value
in row_y as well.

Consider, you need to store some length values. Like this:

create table items (
  id text,
  len integer,
  unit text
);

insert into items values (1,1,'mm');
insert into items values (2,5,'mm');
insert into items values (3,5,'mm');
insert into items values (4,1,'cm');
insert into items values (5,1,'cm');
insert into items values (6,1,'m');
insert into items values (7,1,'m');
insert into items values (7,2,'m');
insert into items values (8,2,'m');
insert into items values (9,5,'m');

With the view...

create view vu_items as
select
  id,
  len || unit as descr,
  len*case unit when 'mm' then 1 when 'cm' then 10 when 'm' then 1000 end
as len_mm
from items;

...I now want to have a list of all distinct descr ordered by length. But...

select
  string_agg(descr,',' order by len_mm)
from vu_items;

...creates a list with duplicates, only:

1mm,5mm,5mm,1cm,1cm,1m,1m,2m,2m,5m

And...

select
  string_agg(distinct descr,',' order by descr)
from vu_items;

...gives a list of distinct values, but in the wrong order:

1cm,1m,1mm,2m,5m,5mm

My solution to this (now) is:

select
  regexp_replace(
string_agg(descr,',' order by len_mm),
'([^,]+)(,\1)?($|,)',
'\1\3',
    'g'
  )
from vu_items;

Thx again for your hint in the regexp_replacy issue in my other post...

Regards,
Ingolf




On Thu, Aug 19, 2021 at 6:30 PM Tom Lane  wrote:

> "Markhof, Ingolf"  writes:
> > I am looking for something like
> > string_agg(distinct col_x order by col_y)
>
> > Unfortunately, you can either have the distinct, but then the order by
> > needs to be identical to what's aggregated, or you can have the order be
> > determined by another column. Not both...
>
> The reason for that restriction is that the case you propose is
> ill-defined.  If we combine rows with the same col_x, which row's
> value of col_y is to be used to sort the merged row?  I think once
> you answer that question, a suitable query will suggest itself.
>
> regards, tom lane
>

==

Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht 
Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des 
Aufsichtsrats: Francesco de Maio



Re: [E] Re: Regexp_replace bug / does not terminate on long strings

2021-08-23 Thread Markhof, Ingolf
Argh...

Yes, When I use (\1)? instead of (\1)+, the expression is evaluated
quickly, but it doesn't return what I want. Once a word is written, it is
not subject to matching again. i.e.

select regexp_replace( --> remove double entries
  'one,one,one,two,two,three,three',
  '([^,]+)(,\1)?($|,)',
'\1\3',
'g'
   ) as res;

returns 'one,one,two,three'. The first 'one' is found, followed by a ',one,
so the 'one,one,' is replaced by a 'one,'. It looks like the system is
'reading' from an input string and writing to output string. And the 'one,'
send to the output is not matched again. So the system instead finds just
another 'one,' in the input string and writes another 'one,' to the output
string.

Honestly, this behaviour seems to be incorrect for me. Once the system
replaces the first two 'one,one,' by a single 'one,', I'd expect to match
this replaced one 'one,' with the next 'one,' following, replacing these
two by another, single 'one,', again...

Regards,
Ingolf



Ingolf Markhof 
International Network Product - International Access <
inp-intlacc...@verizon.com>



Sebrathweg 20, 44149 Dortmund, Germany
Office: +49 231 972 1475 | Vnet: 317-1475


On Fri, Aug 20, 2021 at 6:11 PM Markhof, Ingolf <
ingolf.mark...@de.verizon.com> wrote:

> Hi Tom,
>
> thank you very much for your reply. Actually, I was assuming all these
> regular expressions are based on the same core implementation.
> Interestingly, this doesn't seem to be true...
>
> I am also surprised that you say the (\1)+ subpattern is computationally
> expensive. Regular expressions are greedy by default. I.e. in case of a*
> matching against a string of 1000 a's, the system will not try a, aa, aaa,
> ... and so on, right? Instead, it will consume all the a's in one go.
> Likewise, I was expecting that the system would eat all the repetitions of
> a word in one go. Your proposal to use (\1)? instead at the first glance
> seems to require more effort, because the same word and it's matching
> successor will need to be matched again and again and again. Roughly, 2N
> matches are to be done instead of just N.
>
> However, you are perfectly right: When I use (\1)? instead of (\1)+, the
> expression is evaluates quickly!
>
> Thank you very much for looking into this and for proposing the
> alternative approach which is working fine.
>
> Regards
> Ingolf
>
>
>
> On Fri, Aug 20, 2021 at 12:42 AM Tom Lane  wrote:
>
>> "Markhof, Ingolf"  writes:
>> > BRIEF:
>> > regexp_replace(source,pattern,replacement,flags) needs very (!) long to
>> > complete or does not complete at all (?!) for big input strings (a few k
>> > characters). (Oracle SQL completes the same in a few ms)
>>
>> Regexps containing backrefs are inherently hard --- every engine has
>> strengths and weaknesses.  I doubt it'd be hard to find cases where
>> our engine is orders of magnitude faster than Oracle's; but you've
>> hit on a case where the opposite is true.
>>
>> The core of the problem is that it's hard to tell how much of the
>> string could be matched by the (,\1)* subpattern.  In principle, *all*
>> of the remaining string could be, if it were N repetitions of the
>> initial word.  Or it could be N-1 repetitions followed by one other
>> word, and so on.  The difficulty is that since our engine guarantees
>> to find the longest feasible match, it tries these options from
>> longest to shortest.  Usually the actual match (if any) will be pretty
>> short, so that you have O(N) wasted work per word, making the runtime
>> at least O(N^2).
>>
>> I think your best bet is to not try to eliminate multiple duplicates
>> at a time.  Get rid of one dup at a time, say by
>>  str := regexp_replace(str, '([^,]+)(,\1)?($|,)', '\1\3', 'g');
>> and repeat till the string doesn't get any shorter.
>>
>> I did come across a performance bug [1] while poking at this, but
>> alas fixing it doesn't move the needle very much for this example.
>>
>> regards, tom lane
>>
>> [1]
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.postgresql.org_message-2Did_1808998.1629412269-2540sss.pgh.pa.us&d=DwIBAg&c=udBTRvFvXC5Dhqg7UHpJlPps3mZ3LRxpb6__0PomBTQ&r=ivZWA-ECVj3XrXBe0obDwKY7Ui7K5Nj9oD2KKWLm0Bw&m=q11bVTHCxVx8BQu2pjn6-3nOY8aN8hORXofVK38HqF8&s=hJxrzmTT6G7AUomoeFgh0IGDO3NcUP4gB9kvYHnt3m0&e=
>>
>


Re: [E] Re: Regexp_replace bug / does not terminate on long strings

2021-08-23 Thread Markhof, Ingolf
You are right, I also found the same behaviour when using e.g the UNIX sed
command.

Ingolf


On Mon, Aug 23, 2021 at 4:24 PM Francisco Olarte 
wrote:

> Ingolf:
>
> On Mon, Aug 23, 2021 at 2:39 PM Markhof, Ingolf
>  wrote:
> > Yes, When I use (\1)? instead of (\1)+, the expression is evaluated
> quickly, but it doesn't return what I want. Once a word is written, it is
> not subject to matching again. i.e.
> > select regexp_replace( --> remove double entries
> >   'one,one,one,two,two,three,three',
> >   '([^,]+)(,\1)?($|,)',
> > '\1\3',
> > 'g'
> >) as res;
> >
> ...
> > Honestly, this behaviour seems to be incorrect for me. Once the system
> replaces the first two 'one,one,' by a single 'one,', I'd expect to match
> this replaced one 'one,' with the next 'one,' following, replacing these
> two by another, single 'one,', again...
>
> I think your expectation is misguided. All the regexp engines I've
> used do it this way, when asked to match "g"lobally they do
> non-overlapping matches, they do not substitute and recurse with the
> modified string.
>
> Also, your way opens the door to run-away or infinite loops (
> rr('a','a','aa','g') or rr('a','a','a','g'), not to speak of
> r('x','','','g') ). Even a misguided r(str, '_+','_','g'), used
> sometimes to normalize space runs and similar things, can go into a
> loop.
>
> Francisco Olarte.
>

==

Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht 
Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des 
Aufsichtsrats: Francesco de Maio