ProjectSet

2018-08-02 Thread Oliver Kohll
Hi,

I've just been running explain analyze on a slow query and came across
something new to me - a node type of ProjectSet in the plan. What is that?

The plan is here: https://explain.depesz.com/s/4rqA

The query does a cross join of two tables to get every row in one combined
with every row in the other, then runs the similarity function to compare
text in them. The full query is below; apologies that its obfuscation
differs from the plan's. If that's an issue, let me know.

Is there an explanation of ProjectSet anywhere? I can't see one with a
quick google or search of these archives.

Cheers
Oliver

---

SELECT table1.id,
similarity(table2.field1::text, regexp_matches(table1.field3::text,
'product.ame:s*([^-]*)'::text, 'g'::text)::character
varying(10)::text)::double precision AS similarityscore,
table1.ourid_g3a11eruac8ct55b,
regexp_matches(table1.field3::text, 'product.ame:s*([^-]*)'::text,
'g'::text)::character varying(10) AS products,
table2.field1,
table2.field2,
table2.abaccount
   FROM table1,
table2
  WHERE lower(table2.statusofingredient::text) < '6'::text AND
lower(table2.statusofproduct::text) < '5'::text AND table1.fsacreated >=
(date_trunc('day'::text, now()) - '30 days'::interval) AND
lower(table2.bought::text) = 'bought'::text
  ORDER BY table1.id DESC NULLS LAST;


Re: ProjectSet

2018-08-02 Thread Oliver Kohll
Ah thanks David, at least I know what it is now. I don't think I have any
set-returning functions, though I will double-check in case there's a joined
view that has one. Perhaps it could be something to do with the cross
product, which similarly creates multiple rows on the right for each row on
the left side.

In any case, after deleting some obsolete rows, the plan seems to have
changed as the query now executes in a fraction of a second as opposed to
25 seconds.

Oliver

On 2 August 2018 at 13:21:32, David Rowley (david.row...@2ndquadrant.com)
wrote:

On 2 August 2018 at 21:17, Oliver Kohll  wrote:
> Is there an explanation of ProjectSet anywhere?

Plan node types and what they each do are not very well documented
outside of the source code.

ProjectSet appears when the SELECT or ORDER BY clause of the query
contains set-returning functions.
They basically just execute the set-returning function(s) for each
tuple until none of the functions return any more records.

Simple Example:

EXPLAIN SELECT generate_series(1,2);
                     QUERY PLAN
-----------------------------------------------------
 ProjectSet  (cost=0.00..5.02 rows=1000 width=4)
   ->  Result  (cost=0.00..0.01 rows=1 width=0)
(2 rows)

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: ProjectSet

2018-08-02 Thread Oliver Kohll
Of course! Doh.

On 2 August 2018 at 14:27:02, David Rowley (david.row...@2ndquadrant.com)
wrote:

postgres=# select proretset from pg_proc where proname = 'regexp_matches';
 proretset
-----------
 t
 t
(2 rows)
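
Sure enough, a minimal illustration (my own, not from David's mail):
because regexp_matches() is set-returning, simply calling it in a SELECT
list produces a ProjectSet node:

postgres=# EXPLAIN (COSTS OFF) SELECT regexp_matches('a-b-c', '([^-]+)', 'g');
   QUERY PLAN
-----------------
 ProjectSet
   ->  Result
(2 rows)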


Interconnected views

2023-06-02 Thread Oliver Kohll
Hi,

Just wondering, does anyone else create apps which might not have 'big'
data, but quite complex arrangements of views joining to each other?

If so, do you have scripts to aid refactoring them e.g. drop/recreate/test
them in the right order etc.?
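
The kind of building block I have in mind is a catalog query listing the
views that depend directly on a given one. A rough sketch, not production
code ('my_view' is a placeholder; views hang off pg_depend via their
rewrite rules):

SELECT DISTINCT dependent.relname AS dependent_view
FROM pg_depend d
JOIN pg_rewrite r ON d.objid = r.oid  -- the rule implementing each view
JOIN pg_class dependent ON r.ev_class = dependent.oid
JOIN pg_class source ON d.refobjid = source.oid
WHERE d.classid = 'pg_rewrite'::regclass
  AND d.refclassid = 'pg_class'::regclass
  AND source.relname = 'my_view'
  AND dependent.relname <> source.relname;

Applied recursively, that gives an order for dropping and recreating.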

I'm really impressed with the way Postgres handles these multi-layered
views. Some of our explain analyze outputs could fill a book each! Tools
like Depesz's explain viewer can be very useful. Sometimes a little tweaking or a judicious
index is necessary, but the performance is very reliable and scalable.

Blog post about it here:
https://blog.agilebase.co.uk/2023/05/21/refactoring-sql-views/

Oliver

-- 
See us at the Bath Digital Festival, 12th July


Re: Is there any good optimization solution to improve the query efficiency?

2023-06-05 Thread Oliver Kohll
On Mon, 5 Jun 2023 at 07:56, gzh  wrote:

> Hi everyone,
> I'm running into some performance issues with my SQL query.
> The following SQL query is taking a long time to execute.
>
> Execution Plan:
> explain analyse
> select * from TBL_RES
> left outer join(select T_CUST.RSNO RSNO2 ,
> T_CUST.KNO ,
> T_CUST.AGE ,
> T_CUST.GST
> from TBL_CUST T_CUST ,
> (select T_CUST.RSNO ,
> T_CUST.KNO ,
> MIN(T_CUST.GSTSEQ) GSTSEQ
> from TBL_CUST T_CUST ,
> TBL_POV T_POV ,
> TBL_RES T_RES
> where T_CUST.STSFLG = 'T'
> and T_CUST.DISPSEQ <> 
> AND T_CUST.KFIX = '0'
> and T_POV.CRSNO = T_RES.CRSNO
> and T_RES.RSNO = T_CUST.RSNO
> group by T_CUST.RSNO , T_CUST.KNO) T_POV2
> where T_POV2.RSNO = T_CUST.RSNO
> and T_POV2.KNO = T_CUST.KNO
> and T_POV2.GSTSEQ = T_CUST.GSTSEQ) T_POV3 on TBL_RES.RSNO = T_POV3.RSNO2
> and TBL_RES.KNO = T_POV3.KNO
> where TBL_RES.CID >= to_date('2022/07/01', 'YYYY/MM/DD')
> and TBL_RES.CID <= to_date('2022/07/31', 'YYYY/MM/DD')
> and TBL_RES.COD >= to_date('2022/07/01', 'YYYY/MM/DD')
> and TBL_RES.COD <= to_date('2022/07/31', 'YYYY/MM/DD')
> ----- Execution Plan -----
> Nested Loop Left Join  (cost=254388.44..452544.70 rows=473 width=3545)
> (actual time=3077.312..996048.714 rows=15123 loops=1)
>   Join Filter: ((TBL_RES.RSNO = T_CUST.RSNO) AND ((TBL_RES.KNO)::text =
> (T_CUST.KNO)::text))
>   Rows Removed by Join Filter: 4992268642
>   ->  Gather  (cost=1000.00..58424.35 rows=473 width=3489) (actual
> time=0.684..14.158 rows=15123 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> ->  Parallel Seq Scan on TBL_RES  (cost=0.00..57377.05 rows=197
> width=3489) (actual time=0.096..279.504 rows=5041 loops=3)
>   Filter: ((CID >= to_date('2022/07/01'::text,
> 'YYYY/MM/DD'::text)) AND (CID <= to_date('2022/07/31'::text,
> 'YYYY/MM/DD'::text)) AND (COD >= to_date('2022/07/01'::text,
> 'YYYY/MM/DD'::text)) AND (COD <= to_date('2022/07/31'::text,
> 'YYYY/MM/DD'::text)))
>   Rows Removed by Filter: 161714
>   ->  Materialize  (cost=253388.44..394112.08 rows=1 width=56) (actual
> time=0.081..26.426 rows=330111 loops=15123)
> ->  Hash Join  (cost=253388.44..394112.07 rows=1 width=56) (actual
> time=1197.484..2954.084 rows=330111 loops=1)
>   Hash Cond: ((T_CUST.RSNO = T_CUST_1.RSNO) AND
> ((T_CUST.KNO)::text = (T_CUST_1.KNO)::text) AND (T_CUST.gstseq =
> (min(T_CUST_1.gstseq))))
>   ->  Seq Scan on TBL_CUST T_CUST  (cost=0.00..79431.15
> rows=2000315 width=61) (actual time=0.015..561.005 rows=2000752 loops=1)
>   ->  Hash  (cost=246230.90..246230.90 rows=262488 width=50)
> (actual time=1197.025..1209.957 rows=330111 loops=1)
> Buckets: 65536  Batches: 8  Memory Usage: 2773kB
> ->  Finalize GroupAggregate
> (cost=205244.84..243606.02 rows=262488 width=50) (actual
> time=788.552..1116.074 rows=330111 loops=1)
>   Group Key: T_CUST_1.RSNO, T_CUST_1.KNO
>   ->  Gather Merge  (cost=205244.84..238964.80
> rows=268846 width=50) (actual time=788.547..982.479 rows=330111 loops=1)
> Workers Planned: 2
> Workers Launched: 1
> ->  Partial GroupAggregate
> (cost=204244.81..206933.27 rows=134423 width=50) (actual
> time=784.032..900.979 rows=165056 loops=2)
>   Group Key: T_CUST_1.RSNO,
> T_CUST_1.KNO
>   ->  Sort  (cost=204244.81..204580.87
> rows=134423 width=23) (actual time=784.019..833.791 rows=165061 loops=2)
> Sort Key: T_CUST_1.RSNO,
> T_CUST_1.KNO
> Sort Method: external merge
> Disk: 5480kB
> Worker 0:  Sort Method:
> external merge  Disk: 5520kB
> ->  Parallel Hash Join
> (cost=111758.80..190036.38 rows=134423 width=23) (actual
> time=645.302..716.247 rows=165061 loops=2)
>   Hash Cond:
> (T_CUST_1.RSNO = T_RES.RSNO)
>   ->  Parallel Seq Scan on
> TBL_CUST T_CUST_1  (cost=0.00..74013.63 rows=204760 width=23) (actual
> time=0.018..264.390 rows=165058 loops=2)
> Filter: ((dispseq
> <> ''::numeric) AND ((stsflg)::text = 'T'::text) AND ((KFIX)::text =
> '0'::text))
> Rows Removed by
> Filter: 835318
>   ->  Parallel Hash
> (cost=109508.52..109508.52 rows=137142 width=8) (actual
> time=343.593..343.896 rows=165058 loops=2)
> Buckets: 131072
> Batches: 8  Memory Usage: 3008kB
> ->  Parallel Hash
> Join  (cost=51834.70..109508.52 rows

Setting up replication

2021-05-26 Thread Oliver Kohll
Hi,

We currently have an app with the database on the same server as the app
itself. I'd like to transition to a system where

1) in the short term, the db replicates to a different server. This will
allow us to take the daily pg_dump backups from the replica rather than the
primary server. They're currently slowing down the system too much as they
run.

2) in the medium term, switch the replica to be the primary and connect to
that from the app, i.e. app and db will be on separate servers, letting us
resource each appropriately. A 3rd server can then be used to replicate to
for backup purposes.

3) in the long run, depending on demand that also gives us the option of
scaling the db horizontally e.g. with a distributed db like Citus.

Are there any suggestions / good walkthroughs of how to do number 1? There
are many options!

All I know so far is we can probably use streaming replication as I can
make sure the PostgreSQL versions on each server are the same.

One thing I'm wondering is how often should a base backup be taken? Also
should we set up everything manually with scripts or use a 3rd party backup
tool like barman?

Any suggestions appreciated.

Oliver


Re: Setting up replication

2021-05-26 Thread Oliver Kohll
That is helpful, thanks Vijay.

I will wade in and give it a go. For some reason I had it in my head that
it was a good idea to run pg_basebackup frequently, e.g. once a day, but it
looks like it's only necessary once for the initial transfer to the replica.
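
For my own notes, the one-time bootstrap looks something like this
(hostname, address, path and role name are placeholders, not our real
setup):

-- on the primary: wal_level = replica and max_wal_senders > 0 (the
-- defaults in recent versions), plus a role allowed to replicate
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';

# pg_hba.conf on the primary: let the replica connect for replication
host  replication  replicator  192.0.2.10/32  md5

# then once on the new replica: clone the primary; -R writes
# standby.signal plus primary_conninfo so the clone starts as a standby
pg_basebackup -h primary.example.com -U replicator \
    -D /var/lib/postgresql/13/main -R -X stream -P

After that, streaming replication keeps the replica current by itself.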

Oliver

On Wed, 26 May 2021 at 20:37, Vijaykumar Jain <
vijaykumarjain.git...@gmail.com> wrote:

> core ref:
> PostgreSQL: Documentation: 13: Part III. Server Administration
> <https://www.postgresql.org/docs/13/admin.html>
> although this is quite verbose, you will keep coming back to it to
> tune your setup.
>
>
> To understand basic setups, some starting points are:
> How to Set Up Streaming Replication in PostgreSQL 12 - Percona Database
> Performance Blog
> <https://www.percona.com/blog/2019/10/11/how-to-set-up-streaming-replication-in-postgresql-12/>
> How to setup Postgres 13 WAL streaming replication on Ubuntu 18.04 -
> Highgo Software Inc.
> <https://www.highgo.ca/2021/02/03/how-to-setup-postgres-13-wal-streaming-replication-on-ubuntu-18-04/>
>
> some other references.
> dhamaniasad/awesome-postgres: A curated list of awesome PostgreSQL
> software, libraries, tools and resources, inspired by awesome-mysql
> (github.com) <https://github.com/dhamaniasad/awesome-postgres>
>
> a typical setup
>
> Primary ---streaming replication --->  (Replica1, Replica2 )
>
> Primary - writes
> Replicas R1, R2 - reads (depending on load, these can be put behind a
> load balancer like haproxy and a connection pooler like pgbouncer)
> Scaling PostgreSQL using Connection Poolers and Load Balancers for an
> Enterprise Grade environment - Percona Database Performance Blog
> <https://www.percona.com/blog/2018/10/02/scaling-postgresql-using-connection-poolers-and-load-balancers-for-an-enterprise-grade-environment/>
> https://tinyurl.com/f2zk76yc  (EDB link; the original URL is too long to paste)
>
>
> backups:
> vm snapshots ( optional )
> physical disk backups. ( optional )
> pg_dumpall from replica and save it to external storage daily. (PostgreSQL:
> Documentation: 13: pg_dumpall
> <https://www.postgresql.org/docs/13/app-pg-dumpall.html>)
> barman (point in time recovery, can configure to save 7 days of WALs for
> point in time recovery ) on external server. (Barman Manual (pgbarman.org)
> <http://docs.pgbarman.org/release/2.12/>)
> Implement backup with Barman. This tutorial is part of a multipage… | by
> Sylvain | coderbunker | Medium
> <https://medium.com/coderbunker/implement-backup-with-barman-bb0b44af71f9>
>
> Ideally, I would always go with the core docs, as many tutorials get
> stale, but I mention these to help you get started quickly; then come
> back to the core docs.
>
> Things can get more complex (or simpler) if you go with auto failover
> solutions
> pg_auto_failover
> patroni
> enterprise solutions from EDB, Crunchy, etc.
>
> this channel on youtube is pretty neat too. Scaling Postgres - YouTube
> <https://www.youtube.com/channel/UCnfO7IhkmJu_azn0WbIcV9A>
>
> I am not sure whether my reply is helpful or too much for a simple
> setup.
> anyways :)
>
>
> On Wed, 26 May 2021 at 23:28, Oliver Kohll  wrote:
>
>> Hi,
>>
>> We currently have an app with the database on the same server as the app
>> itself. I'd like to transition to a system where
>>
>> 1) in the short term, the db replicates to a different server. This will
>> allow us to take the daily pg_dump backups from the replica rather than the
>> primary server. They're currently slowing down the system too much as they
>> run.
>>
>> 2) in the medium term, switch the replica to be the primary and connect
>> to that from the app, i.e. app and db will be on separate servers, letting
>> us resource each appropriately. A 3rd server can then be used to replicate
>> to for backup purposes.
>>
>> 3) in the long run, depending on demand that also gives us the option of
>> scaling the db horizontally e.g. with a distributed db like Citus.
>>
>> Are there any suggestions / good walkthroughs of how to do number 1?
>> There are many options!
>>
>> All I know so far is we can probably use streaming replication as I can
>> make sure the PostgreSQL versions on each server are the same.
>>
>> One thing I'm wondering is how often should a base backup be taken? Also
>> should we set up everything manually with scripts or use a 3rd party backup
>> tool like barman?
>>
>> Any suggestions appreciated.
>>
>> Oliver
>>
>
>
> --
> Thanks,
> Vijay
> Mumbai, India
>


replace inside regexp_replace

2021-06-21 Thread Oliver Kohll
Hi,

I have some text

'here is [[my text]] to replace and [[some more]]'

which I want to transform to

'here is my_text to replace and some_more'

i.e. wherever there are double square brackets, remove them and replace
spaces in the contents with underscores.

My attempt to do that is the regex

select regexp_replace(
'here is [[my text]] to replace and [[some more]]',
E'\\[\\[(.*?)\\]\\]',
replace(E'\\1', ' ', '_'),
'g'
);

which results in

'here is my text to replace and some more'

It half works, i.e. it removes the brackets but doesn't seem to process the
inner replace. It's as if the select were just

select regexp_replace(
'here is [[my text]] to replace and [[some more]]',
E'\\[\\[(.*?)\\]\\]',
E'\\1',
'g'
);

I've a feeling I'm missing something fundamental, any idea what?

Thanks
Oliver


Re: replace inside regexp_replace

2021-06-22 Thread Oliver Kohll
On Mon, 21 Jun 2021 at 15:09, Francisco Olarte 
wrote:

> Oliver:
>
> On Mon, Jun 21, 2021 at 3:27 PM Oliver Kohll 
> wrote:
> ...
> > My attempt to do that is the regex
> > select regexp_replace(
> > 'here is [[my text]] to replace and [[some more]]',
> > E'\\[\\[(.*?)\\]\\]',
> > replace(E'\\1', ' ', '_'),
> > 'g'
> > );
> > which results in
> > 'here is my text to replace and some more'
>
> > It half works, i.e. it removes the brackets but doesn't seem to process
> the inner replace. It's as if the select were just
> > select regexp_replace(
> > 'here is [[my text]] to replace and [[some more]]',
> > E'\\[\\[(.*?)\\]\\]',
> > E'\\1',
> > 'g'
> > );
>
> > I've a feeling I'm missing something fundamental, any idea what?
>
> You are assuming replace will magically work in a way it does not. The
> inner replace is evaluated first:
>
> select replace(E'\\1', ' ', '_');
>  replace
> ---------
>  \1
>
> and its result is passed as the 3rd argument to the outer replace, so
> both selects are equivalent.
>
> What you want to do can be done in some languages by passing a closure,
> or a function, to their replace function, or with special forms (like
> the e modifier in perl s///), but I'm not sure it can be done in SQL.
>
> In languages with basic regex support, as I think SQL is, you
> normally have to either split the string into match/no-match parts or do
> a multiple match (match something like (.*?)\[\[(.*?)\]\] with two
> captures) and loop over the result applying your second replacement
> (which is what perl does behind the scenes, and other languages do).
>
> In perl you can do it with something like:
>
> $ perl -pe 's{\[\[(.*?)\]\]}{ $1=~s/ /_/gr}eg'
> here is [[my text]] to replace and [[some more]]',
> here is my_text to replace and some_more',
>
> But note the magic e there.
>
> In python you can use the function form:
>
> re.sub(pattern, repl, string, count=0, flags=0)
>
> Return the string obtained by replacing ... repl can be a string or
> a function; if it is a string, ...
> If repl is a function, it is called for every non-overlapping
> occurrence of pattern. The function takes a single match object
> argument, and returns the replacement string.
>
> And so on in other languages. But in SQL,
>
> regexp_replace ( string text, pattern text, replacement text [, flags
> text ] ) → text
>
> The replacement is plain text (and AFAIK you cannot use functions
> as values in SQL).
>
> You could probably define your own function to do that if you have a PL
> installed in your DB.
>
> Francisco Olarte.
>

Right, thanks, I have a better understanding now. The calling app is
written in Java so I will write a routine there to do it instead.

Cheers
Oliver
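
For anyone who'd rather keep it in the database, here's a PL/pgSQL sketch
along the lines Francisco suggested (assuming plpgsql is installed;
replace_bracketed is an illustrative name, not a built-in):

CREATE OR REPLACE FUNCTION replace_bracketed(input text) RETURNS text
LANGUAGE plpgsql IMMUTABLE AS $$
DECLARE
    result text := input;
    m      text[];
BEGIN
    -- 'g' makes regexp_matches return one row per [[...]] match
    FOR m IN SELECT regexp_matches(input, '\[\[(.*?)\]\]', 'g') LOOP
        -- swap the bracketed original for an underscored copy of the capture
        result := replace(result, '[[' || m[1] || ']]',
                          replace(m[1], ' ', '_'));
    END LOOP;
    RETURN result;
END;
$$;

SELECT replace_bracketed('here is [[my text]] to replace and [[some more]]');
-- here is my_text to replace and some_more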


Incremental Materialized Views

2021-08-23 Thread Oliver Kohll
Hi,

Just wondering if anyone knows which release (if any) this is targeted for?

https://wiki.postgresql.org/wiki/Incremental_View_Maintenance

Asking because this could make a massive difference to some of our
workload, even when limited to relatively simple queries. It's quite
exciting.

Regards
Oliver
www.agilebase.co.uk


Re: Incremental Materialized Views

2021-08-23 Thread Oliver Kohll
On Mon, 23 Aug 2021 at 11:20, Marc  wrote:

> On 23 Aug 2021, at 11:55, Oliver Kohll wrote:
>
> > Hi,
> >
> > Just wondering if anyone knows which release (if any) this is targeted
> for?
> >
> > https://wiki.postgresql.org/wiki/Incremental_View_Maintenance
> >
> > Asking because this could make a massive difference to some of our
> > workload, even when limited to relatively simple queries. It's quite
> > exciting.
> >
> > Regards
> > Oliver
> > www.agilebase.co.uk
>
> Oliver,
>
> According to this info maybe version 15
>
> https://commitfest.postgresql.org/23/2138/
>
> Regards,
>
>
> Marc
>

[thumbsup]


Re: DataDirect PostgreSQL

2021-08-25 Thread Oliver Kohll
On Wed, 25 Aug 2021 at 15:16, jagadeeshwaraiah, Govindu <
govindu.jagadeeshwara...@dsm.com> wrote:

> Hello Team,
>
> Good day!!!
>
>
>
> We are using PostgreSQL as the source for Power BI reports. We have
> established a connection and created a test dashboard in the Power BI
> report, but we have only installed a *trial of Progress DataDirect
> PostgreSQL*.
>
> So now we need to use the full version of DataDirect PostgreSQL.
>
> Could you please give us information about the full version of DataDirect
> PostgreSQL? How do we purchase and use the software, and what is the
> process?
>
>
>
> Thanks,
>
> Jag
>

I didn't even know there was a commercial ODBC driver for PostgreSQL; I've
always used the open source one, which can be downloaded here:

https://odbc.postgresql.org/

However if you do want to purchase the commercial driver, you'll probably
need to use the Contact Us or How To Buy links here:
https://www.progress.com/odbc/postgresql

Oliver