Re: Foreign table performance issue / PostgreSQL vs. ORACLE

2021-02-03 Thread Sebastian Dressler
Hi Ingolf,

On 2. Feb 2021, at 13:05, Markhof, Ingolf <ingolf.mark...@de.verizon.com> wrote:

Hi!

My PostgreSQL version is 11.8.

The query I am running refers to a number of foreign tables. The first 
one (table1) has two IDs, let's say ID_A and ID_B. While ID_A is unique, ID_B is 
not. In my case, I am pulling information for a value of ID_B for which about 800 
rows (with unique ID_A) exist. I found:

While

select * from my_view where id_b='some value';

seemingly runs "forever" (I cancelled execution after a few hours), the 
following completes in about 1 hr:

select * from my_view where id_a in (
   select id_a from table1 where id_b='some value'
);

So, I tried smaller chunks of ID_A and found the execution time is non-linear 
with respect to the number of IDs. For e.g. 50 ID_As, it completed in about 12 
sec.

[...]

I then found the option fetch_size, e.g. ALTER SERVER some_server OPTIONS 
(fetch_size '5'). A chunk of 50 now executes in 2 seconds (instead of 12 
before).

Thanks for the additional info. I tried to replicate this; you can find the 
GitHub Gist at [1]. Happy to hear your feedback about it.

What I can see from the execution plans in my example is that postgres_fdw 
pushes down this part

Remote SQL: SELECT id_a FROM public.a WHERE ((id_b = 1))

to the remote. On the remote DB this query can result in either an 
index-only scan, an index scan or a full table scan. Which method is chosen 
depends on table size and indexes. Given the nature of postgres_fdw, this will 
in any case be executed as a sequential query. For this part I would claim:

- If the planner expects few rows, it will choose an index lookup, which is a 
good thing because it effectively reduces the amount of data that needs to be 
queried. This would make it fast, given that the query is executed sequentially.
- If the planner expects many rows, it might choose a sequential scan, which can 
be slow depending on the overall size of the table and on whether it is 
cached or not.

So, I found the "size" of the query has a serious impact to the execution time. 
I don't really understand why execution 16*50 takes 16*2 secs only, but 
executing 1*800 takes about 3000 seconds...

The mentioned fetch_size parameter has a positive effect because one can grab 
many more rows and return them at the same time. Worst case (and this is just 
pure assumption), on each new fetch the query might be re-executed and thus the 
runtime grows considerably.
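
For reference, a minimal sketch of how fetch_size can be raised. The server 
name, table name and value are just placeholders from this thread, not a 
recommendation:

-- postgres_fdw fetches fetch_size rows per round trip (default 100)
ALTER SERVER some_server OPTIONS (ADD fetch_size '10000');
-- it can also be set per foreign table, overriding the server setting
-- (use SET instead of ADD if the option is already present)
ALTER FOREIGN TABLE table1 OPTIONS (ADD fetch_size '10000');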

Further up in the plan, I see

Remote SQL: SELECT id_a, id_b FROM public.a

which is the "SELECT * FROM my_view" part. Meaning, here it will definitely do 
a full table scan on remote since it cannot push down the IN condition. I don't 
really see right now why this query at all is slower than your original form. 
In my experiment it is not, but maybe I am doing something wrong in the schema.

One thought would be, however, that the full table scan on the remote is more 
efficient than pushing down the filter and thus returns faster.

To really figure out more, I would suggest increasing the logging level on 
your remote server in order to see which queries are actually executed. Even 
better, maybe use auto_explain to fetch plans and see whether these claims 
apply.
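
A minimal sketch of such an auto_explain setup on the remote, assuming you can 
change its configuration (the threshold is an arbitrary example value):

-- per session; for all sessions put auto_explain into
-- session_preload_libraries in postgresql.conf instead
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '1s';  -- log plans of statements slower than 1s
SET auto_explain.log_analyze = on;         -- include actual row counts and timings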

Best,
Sebastian

--


[1]: https://gist.github.com/sdressler/9a93d66b7052dc75ec45c0a4bf5c61de

Sebastian Dressler, Solution Architect, Swarm64
+49 30 994 0496 72 | sebast...@swarm64.com



Re: count(*) vs count(id)

2021-02-03 Thread Karsten Hilbert
On Wed, Feb 03, 2021 at 01:43:14AM -0500, Cherio wrote:

> I just ran a few practical tests on large (~14mil rows) tables that have
> multiple indexes.
>
> SELECT COUNT(id) forces PostgreSQL to use the primary key index.
> SELECT COUNT(*) allows PostgreSQL to choose an index to use and it seems to
> be choosing one of smaller size, which leads to less IO and hence returns
> the result faster.

Would you mind throwing in a test for

select count(1) ...

?

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B




Re: count(*) vs count(id)

2021-02-03 Thread Laurenz Albe
On Wed, 2021-02-03 at 10:54 +0100, Karsten Hilbert wrote:
> > I just ran a few practical tests on large (~14mil rows) tables that have
> > multiple indexes.
> > SELECT COUNT(id) forces PostgreSQL to use the primary key index.
> > SELECT COUNT(*) allows PostgreSQL to choose an index to use and it seems to
> > be choosing one of smaller size, which leads to less IO and hence returns
> > the result faster.
> 
> Would you mind throwing in a test for
> 
> select count(1) ...

No need to test.  That will be slightly slower because it repeatedly checks
if 1 is NULL or not.
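
If you do want to see it for yourself anyway, a quick sketch against a 
hypothetical table big_table with primary key id:

EXPLAIN (ANALYZE, BUFFERS) SELECT count(*)  FROM big_table;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(id) FROM big_table;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(1)  FROM big_table;

The plans show which index, if any, each variant uses.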

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





UPDATE Syntax

2021-02-03 Thread Ulrich Goebel

Hi,

in a Python Script I build an UPDATE using the syntax:

update tbl set (col1, col2, ...) = (val1, val2, ...) where id = xx

That works as long as the two lists contain more than one column and value.
If I have just one column and value, I get an error message:


==
cg=# select id, name, vorname, status from tbl_person;
 id |  name  | vorname | status
----+--------+---------+--------
  3 | Goebel | Ulrich  | a
(1 row)

cg=#
cg=#
cg=#
cg=# update tbl_person set status = 'a' where id=3;
UPDATE 1
cg=#
cg=#
cg=#
cg=# update tbl_person set (status) = ('a') where id=3;
ERROR:  source for a multiple-column UPDATE item must be a sub-SELECT or 
ROW() expression

LINE 1: update tbl_person set (status) = ('a') where id=3;
  ^
cg=#
cg=#
cg=#
cg=# update tbl_person set (status, name) = ('a', 'Goebel') where id=3;
UPDATE 1
cg=#
==


In fact, in the script I have both cases: one or more columns.

Is there a way to avoid this error message other than coding a case
distinction? Well, I could use the other syntax:

update tbl set col1=val1, col2=val2, ... where id=xx

but I just like the former...

Thanks
Ulrich


--
Ulrich Goebel
Am Büchel 57, 53173 Bonn




Re: UPDATE Syntax - solved

2021-02-03 Thread Ulrich Goebel

Solved. Use

update tbl set (col1, col2, ...) = ROW(val1, val2, ...) where id=xx

Thanks to Laurenz Albe on the German mailing list.
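
For a single column the ROW() form works as well, e.g. (a sketch against the 
tbl_person example from the original message):

update tbl_person set (status) = ROW('a') where id=3;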


On 03.02.21 at 11:31, Ulrich Goebel wrote:

[...]




--
Ulrich Goebel
Am Büchel 57, 53173 Bonn




table returning function for each row in other resultset

2021-02-03 Thread Niels Jespersen
Hello all

I have some data in a resultset. E.g.:

id date_begin date_end  amount
1  2021-01-04 2021-02-06  100
2  2021-03-17 2021-05-11  234

I have a table returning function that can take one row, split it into its 
constituent month pieces and distribute the amount proportionally.

select * from func(1, '2021-01-04', '2021-02-06', 100); 

returns 

1,2021-01-04,2021-01-31,84.848485
1,2021-02-01,2021-02-06,18.181818

So far, so good. 
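
My actual function is not shown here, but for completeness a minimal sketch 
that reproduces the proportions above could look like this (it distributes by 
days per month over date_end - date_begin, and it ignores edge cases such as 
date_begin = date_end):

CREATE FUNCTION func(p_id bigint, p_begin date, p_end date, p_amount numeric)
RETURNS TABLE (id bigint, date_start date, date_end date, amount numeric)
LANGUAGE sql AS $$
    WITH months AS (
        -- one row per calendar month touched by the interval
        SELECT gs::date AS month_start,
               ((gs + interval '1 month')::date - 1) AS month_end
        FROM generate_series(date_trunc('month', p_begin::timestamp),
                             date_trunc('month', p_end::timestamp),
                             interval '1 month') AS gs
    )
    SELECT p_id,
           greatest(month_start, p_begin),
           least(month_end, p_end),
           -- days in this piece, proportional to the whole interval
           p_amount * (least(month_end, p_end) - greatest(month_start, p_begin) + 1)
                    / (p_end - p_begin)
    FROM months
$$;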

Now, what I want is to run the function on the resultset from above and get 

1,2021-01-04,2021-01-31,84.848485
1,2021-02-01,2021-02-06,18.181818
2,2021-03-17,2021-03-31,63.818182
2,2021-04-01,2021-04-30,127.636364
2,2021-05-01,2021-05-11,46.8

How can I accomplish this, please?

Regards Niels Jespersen






Re: table returning function for each row in other resultset

2021-02-03 Thread David G. Johnston
On Wed, Feb 3, 2021 at 8:01 AM Niels Jespersen  wrote:

> Hello all
>
> I have some data in a resultset. E.g.:
>
> id date_begin date_end  amount
> 1  2021-01-04 2021-02-06  100
> 2  2021-03-17 2021-05-11  234
>
> I have a table returning function that can take one row, split it into its
> constituent month pieces and distribute the amount proportionally.
>
> select * from func(1, '2021-01-04', '2021-02-06', 100);
> [...]
> How can I accomplish this, please?
>

Lateral Join.

(not tested)
SELECT *
FROM resultset, func(id, date_begin, date_end, amount);

David J.


Tool for migrating Gupta SQLBase to PostgreSQL

2021-02-03 Thread Marco Lechner
Hi,

we have an old application that uses Gupta SQLBase as its storage backend. While 
developing a new open-source-based application to replace the old one, we have 
to migrate the data in the Gupta SQLBase DB into PostgreSQL (because 
PostgreSQL/PostGIS has become our standard DB storage backend, for good reasons).

Anyone here who knows of a tool to convert or migrate Gupta SQLBase (11.6.x) to 
PostgreSQL?

Happy for any hint.

Best regards
Marco

i.A. Dr. Marco Lechner
Leiter Fachgebiet RN 1 │ Head RN 1

--
Bundesamt für Strahlenschutz │ Federal Office for Radiation Protection
Koordination Notfallschutzsysteme │ Coordination Emergency Systems │ RN 1
Rosastr. 9
D-79098 Freiburg

Tel.: +49 30 18333-6724
Coronakontakt: +49 7661 9098245
E-Mail: mlech...@bfs.de
www.bfs.de




Re: SV: Npgsql and the Connection Service File

2021-02-03 Thread Dave Cramer
On Mon, 1 Feb 2021 at 04:51, Niels Jespersen  wrote:

> >On Sat, 2021-01-30 at 15:56 +, Niels Jespersen wrote:
> >> It would be nice if Npgsql (and jdbc and others) emulated the libpq
> behaviour.
> >>  Because in my mind, abstracting hostname, portnumber and databasename
> >> away is a really useful feature.
> >>
> >> How do others manage this?
> >
> >Either they don't or they write their own class that does that.
> >
> >Perhaps you should contact Npgsql and suggest such a feature.
> >
> Yes, thank you.
> I did just that. It turns out there is already an existing pull request
> with exactly that feature. I hope for it to be included in a release soon.
> Regards Niels Jespersen
>
>
Ya, I intend to push one in sooner than later.

Dave




Database upgrade Pg11 to Pg12 issues

2021-02-03 Thread Simon Windsor

Hi

I have upgraded many Pg databases from 9 to 10 to 11 using the Ubuntu 
pg_upgradecluster command (wrapper around pg_upgrade) without issues; however, 
today I upgraded a DB from 11 to 12.


Using the syntax

pg_upgradecluster -k -m upgrade 11 main

with the latest 12.5 binaries, all appeared to work well, and for a 330G 
DB this took 2 minutes. On starting the DB using


systemctl start postgresql@12.service

the DB started, but the screen showed the default directory was 
/var/lib/postgresql/11/main. Checking the open database with

show data_directory;

gave /var/lib/postgresql/12/main.

Then using lsof | grep '11/main' gave no files, but lsof | grep '12/main' 
gave over 30K files. Checking the directory sizes suggests that 11/main is 
300G and 12/main is 40G.


So all appears OK, with files hard linked between 11/main and 12/main, 
and I assume the issue relates to the PGDATA/postgresql.auto.conf file 
being just copied from the original 11/main with data_directory being 
set to /var/lib/postgresql/11/main.


If I were to run pg_dropcluster 11 main to remove the old database 
and conf files, will this destroy my running Pg12 database with hard-linked 
files in 11/main and 12/main? In theory it shouldn't, and the hard-linked 
files will all still be there under 12/main; I just have doubts.


Is there an easy way to tidy this up?

Simon


--
Simon Windsor

Eml: simon.wind...@cornfield.me.uk


“There is nothing in the world that some man cannot make a little worse and 
sell a little cheaper, and he who considers price only is that man's lawful 
prey.”



Re: Database upgrade Pg11 to Pg12 issues

2021-02-03 Thread Bruce Momjian
On Wed, Feb  3, 2021 at 11:25:11PM +, Simon Windsor wrote:
> Hi
> 
> I have upgraded many Pg databases from 9 to 10 to 11 using the Ubuntu
> pg_upgradecluster command (wrapper around pg_upgrade) without issues; however,
> today I upgraded a DB from 11 to 12.
> 
> Using the syntax
> 
> pg_upgradecluster -k -m upgrade 11 main
> 
> using the latest 12.5 binaries all appeared to work well, and for a 330G DB
> this took 2 minutes. On starting the DB using
> 
> systemctl start postgresql@12.service
> 
> the DB started, but the screen showed the default directory was /var/lib/
> postgresql/11/main. Checking the open database with
> 
> show data_directory; gave /var/lib/postgresql/12/main
> 
> Then using lsof | grep '11/main' gave no files, but lsof | grep '12/main' gave
> over 30K files. Checking the directory sizes suggests that 11/main is 300G and
> 12/main is 40G

This doesn't make sense to me.  Since we hard-linked, why would 12 be so
much smaller?  If it were symlinks, I could imagine that, but it doesn't
use symlinks, just hard links, so it should be similar.  Please look at
the size of main/base on both, since that is where your data is.  Maybe
11 just has a lot of old WAL that isn't copied.

> So all appears OK, with files hard linked between 11/main and 12/main, and I
> assume the issue relates to the PGDATA/postgresql.auto.conf file being just
> copied from the original 11/main with data_directory being set to /var/lib/
> postgresql/11/main.
> 
> If I were to run pg_dropcluster 11 main to remove the old database and conf
> files, will this destroy my running Pg12 database with hard-linked files in
> 11/main and 12/main? In theory it shouldn't and files from 11/main and 12/main
> will all be under 12/main, I just have doubts.
> 
> Is there an easy way to tidy this up?

Yes, if the sizes made sense, removing 11 would be fine, but right now,
it doesn't sound right.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Database upgrade Pg11 to Pg12 issues

2021-02-03 Thread Craig McIlwee
(replying to the entire list instead of Bruce only this time...)


> This doesn't make sense to me.  Since we hard-linked, why would 12 be so
> much smaller?  If it was symlinks, I could imaging that, but it doesn't
> use symlinks, just hard links, so it should be similar.  Please look at
> the size of main/base on both, since that is where your data is.  Maybe
> 11 just has a lot of old WAL that isn't copied.
>
>
It depends on how the directory sizes were measured. "du" won't count the
same file more than once, so if space was measured using one of the
following commands:

du -hs /var/lib/postgresql/
du -hs /var/lib/postgresql/11/main /var/lib/postgresql/12/main

Then you would see a large value for the 11 directory and then the 12
directory would only show the files that weren't already counted when
measuring the 11 directory.  Running du on each version's directory
separately might give results that are more in line with expectations.

Craig


Re: Database upgrade Pg11 to Pg12 issues

2021-02-03 Thread Bruce Momjian
On Wed, Feb  3, 2021 at 10:07:03PM -0500, Craig McIlwee wrote:
> 
> (replying to the entire list instead of Bruce only this time...)
> 
> 
> 
> This doesn't make sense to me.  Since we hard-linked, why would 12 be so
> much smaller?  If it were symlinks, I could imagine that, but it doesn't
> use symlinks, just hard links, so it should be similar.  Please look at
> the size of main/base on both, since that is where your data is.  Maybe
> 11 just has a lot of old WAL that isn't copied.
> 
> 
> 
> It depends on how the directory sizes were measured. "du" won't count the same
> file more than once, so if space was measured using one of the following
> commands:
> 
> du -hs /var/lib/postgresql/
> du -hs /var/lib/postgresql/11/main /var/lib/postgresql/12/main 
> 
> Then you would see a large value for the 11 directory and then the 12 
> directory
> would only show the files that weren't already counted when measuring the 11
> directory.  Running du on each version's directory separately might give
> results that are more in line with expectations.

Yes, I was assuming the du commands were run separately on each
directory.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





SV: SV: Npgsql and the Connection Service File

2021-02-03 Thread Niels Jespersen
From: Dave Cramer 

>>> It would be nice if Npgsql (and jdbc and others) emulated the libpq 
>>> behaviour.
>>>  Because in my mind, abstracting hostname, portnumber and databasename
>>> away is a really useful feature.
>>>
>>> How do others manage this?
>>
>>Either they don't or they write their own class that does that.
>>
>>Perhaps you should contact Npgsql and suggest such a feature.
>>
>Yes, thank you.
>I did just that. It turns out there is already an existing pull request with 
>exactly that feature. I hope for it to be included in a release soon.
>Regards Niels Jespersen
>
>Ya, I intend to push one in sooner than later.
>
>Dave
>

Thank you for that. Looking very much forward to it.

Niels


pg_trgm for address search

2021-02-03 Thread Sumit Raja
Hello,

I am trying to get a functioning Postgres address search capability for
Australian addresses using tsearch or pg_trgm. pg_trgm is actually better
suited in this case as it allows progressive entry, e.g. ' 1,  20 Kle' gives
a good set of results across Kelm Avenue, Kleins Av etc.

I get good performance with tsearch (<1s response times) for non-progressive
matches, but pg_trgm performance varies from 3.5 to 15 seconds.

Table is very simple:

CREATE TABLE address_search.tsearch_address_detail (
  address_detail_pid character varying(15) NOT NULL,
  address_state_abbreviation character varying(3) NOT NULL,
  address_concat text not null,
  address_concat_ts tsvector
);

CREATE INDEX idx_places_trgm_gin_addr ON
address_search.tsearch_address_detail USING gin(address_concat
gin_trgm_ops);

CREATE INDEX idx_places_trgm_gist_addr ON
address_search.tsearch_address_detail USING gist(address_concat
gist_trgm_ops);

CREATE INDEX idx_places_ts_gin_addr ON
address_search.tsearch_address_detail USING GIN (address_concat_ts);

Actual data looks like the below:

 address_detail_pid | address_state_abbreviation |                 address_concat                 | address_concat_ts
--------------------+----------------------------+------------------------------------------------+-------------------------------------------------------------------------------
 GAWA_163274127     | WA                         | Unit 1,  20 Klem Avenue, Salter Point, WA 6152 | '1':2 '20':3 '6152':9 'avenu':5 'klem':4 'point':7 'salter':6 'unit':1 'wa':8
 GAWA_163274129     | WA                         | Unit 2,  20 Klem Avenue, Salter Point, WA 6152 | '2':2 '20':3 '6152':9 'avenu':5 'klem':4 'point':7 'salter':6 'unit':1 'wa':8
 GAWA_163274130     | WA                         | Unit 2,  3 Klem Avenue, Salter Point, WA 6152  | '2':2 '3':3 '6152':9 'avenu':5 'klem':4 'point':7 'salter':6 'unit':1 'wa':8
 GAWA_163274255     | WA                         | 11 Lancefield Street, Laverton, WA 6440        | '11':1 '6440':6 'lancefield':2 'laverton':4 'street':3 'wa':5
 GAWA_163274256     | WA                         | 13 Lancefield Street, Laverton, WA 6440        | '13':1 '6440':6 'lancefield':2 'laverton':4 'street':3 'wa':5

and query is:

SELECT address_detail_pid, address_concat, word_similarity('1, 20 kle',
address_concat) AS sml
FROM address_search.tsearch_address_detail
WHERE '1, 20 kle' <% address_concat
ORDER BY sml DESC limit 10;

The explain (analyze, buffers) output is (https://explain.depesz.com/s/tvZ9):
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=52551.77..52551.79 rows=10 width=65) (actual time=3119.791..3119.793 rows=10 loops=1)
   Buffers: shared hit=6432 read=4564
   ->  Sort  (cost=52551.77..52590.50 rows=15492 width=65) (actual time=3119.788..3119.789 rows=10 loops=1)
         Sort Key: (word_similarity('1, 20 kle'::text, address_concat)) DESC
         Sort Method: top-N heapsort  Memory: 26kB
         Buffers: shared hit=6432 read=4564
         ->  Bitmap Heap Scan on tsearch_address_detail  (cost=252.06..52216.99 rows=15492 width=65) (actual time=547.976..3119.067 rows=394 loops=1)
               Recheck Cond: ('1, 20 kle'::text <% address_concat)
               Rows Removed by Index Recheck: 3791
               Heap Blocks: exact=3991
               Buffers: shared hit=6429 read=4564
               ->  Bitmap Index Scan on idx_places_trgm_gin_addr  (cost=0.00..248.19 rows=15492 width=0) (actual time=547.380..547.380 rows=4185 loops=1)
                     Index Cond: ('1, 20 kle'::text <% address_concat)
                     Buffers: shared hit=3991 read=3011
 Planning Time: 44.701 ms
 Execution Time: 3120.052 ms
(16 rows)

Table size is 3026MB and GIN index size is 293 MB.

I've increased shared_buffers to 800MB, work_mem=1500MB,
effective_cache_size=2GB.

Are there any optimisations I can make or should I be building the data set
differently for better searching by pg_trgm? Or using a combination of
tsearch and pg_trgm?
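
For the combined approach, this is roughly what I have in mind, just a sketch 
with the tsquery built naively from the user input and prefix matching assumed 
on the last word:

SELECT address_detail_pid, address_concat,
       word_similarity('1, 20 kle', address_concat) AS sml
FROM address_search.tsearch_address_detail
WHERE address_concat_ts @@ to_tsquery('english', '20 & kle:*')  -- narrow via GIN ts index
ORDER BY sml DESC  -- then rank the survivors by trigram similarity
LIMIT 10;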

Thanks

Sumit


SV: table returning function for each row in other resultset

2021-02-03 Thread Niels Jespersen
From: David G. Johnston 
Sent: 3 February 2021 16:08

>On Wed, Feb 3, 2021 at 8:01 AM Niels Jespersen  wrote:
>Hello all
>
>I have some data in a resultset. E.g.:
>
>id date_begin date_end  amount
>1  2021-01-04 2021-02-06  100
>2  2021-03-17 2021-05-11  234
>
>I have a table returning function that can take one row, split it into its 
>constituent month pieces and distribute the amount proportionally.
>
>select * from func(1, '2021-01-04', '2021-02-06', 100);
>[...]
>How can I accomplish this, please?
>
>Lateral Join.
>
>(not tested)
>SELECT *
>FROM resultset, func(id, date_begin, date_end, amount);
>
>David J.

Of course, yes. It works. My Oracle background isn't very helpful when it comes 
to including lateral joins in my thinking.

with res(id, date_begin, date_end, amount) as (
    select *
    from (values
        (1::bigint, '2021-01-04'::date, '2021-02-06'::date, 100::numeric),
        (2::bigint, '2021-03-17'::date, '2021-05-11'::date, 234::numeric)) a
)
select r.id, m.date_start, m.date_end, m.amount
from res r,
     month_parts_low_freq(r.id, r.date_begin, r.date_end, r.amount) m;

produces the expected result. Now I will see how it goes with about 150 million 
rows of input.
