Re: PG 10 logical replication version dependency?

2018-03-02 Thread Thomas Kellerer
Colin 't Hart wrote on 02.03.2018 at 09:23:
> Is there a version dependency when using logical replication? Do both
> sides have to be running the same major version? Or is it a wire
> protocol that will be backwards compatible in future releases?
> 
> I sincerely hope it's the latter so that users of this aren't forced
> to upgrade all of their interconnected servers at the same time.

I think the only restriction is that connected servers need to be Postgres 10 
or higher. 

So you should be able to replicate from Postgres 10 to Postgres 11 once the 
next version is released.
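
For illustration, a minimal sketch of such a cross-version setup (table, database
and connection details are made up):

   -- on the Postgres 10 server (the publisher)
   create publication orders_pub for table orders;

   -- on the Postgres 11 server (the subscriber); the table must already exist there
   create subscription orders_sub
     connection 'host=pg10-server dbname=sales user=repl'
     publication orders_pub;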






query_to_xml() returns invalid XML when query returns no rows

2018-03-12 Thread Thomas Kellerer
I am not sure if this qualifies as a bug: 

query_to_xml() returns an empty XML document when the query returns no rows, 
e.g.:

   select query_to_xml('select 42 where false', false, true, '');

The problem with this is that if the resulting XML is then fed into e.g. the 
xpath() function, that function fails because the "empty" document is invalid XML.

So the following query:

select xpath('/row/col/text()', query_to_xml('select 42 as col where 
false', false, true, ''));

fails with "ERROR:  could not parse XML document"

I would have expected query_to_xml() to return NULL if the query returns no 
rows, rather than an invalid XML document.

Note that IS DOCUMENT is false for the empty XML returned, so it can be trapped 
in a query. 
But if query_to_xml() is nested somehow (like the above) this is really 
complicated to apply, and I find it a bit surprising that query_to_xml() returns 
invalid XML at all.
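
To illustrate what I mean by trapping it, a sketch using the same dummy query as above:

   with q as (
     select query_to_xml('select 42 as col where false', false, true, '') as doc
   )
   select case
            when doc is document then xpath('/row/col/text()', doc)
            else null
          end as result
   from q;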

Tested with 10.2 on Windows 10

Thomas







Re: query_to_xml() returns invalid XML when query returns no rows

2018-03-13 Thread Thomas Kellerer

Peter Eisentraut wrote on 12.03.2018 at 23:31:

I am not sure if this qualifies as a bug:

query_to_xml() returns an empty XML document when the query returns no rows, 
e.g:

select query_to_xml('select 42 where false', false, true, '');

The problem with this is, that if the resulting XML is then fed
into e.g. the xpath() function, that function fails because the
"empty" document is an invalid XML:


That's because you have the tableforest argument set to true.  If you
want a proper XML document, then you should write

 select query_to_xml('select 42 where false', false, false, '');



Hmm, that indeed works.
I didn't want the extra level introduced by the wrapping <table> element, that's why 
I used tableforest = true.
But that's easier to deal with than checking for an invalid document.
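
For the record, with tableforest = false the xpath just needs the additional level - 
assuming the wrapping element is named <table>, as it is in my tests:

   select xpath('/table/row/col/text()',
                query_to_xml('select 42 as col where false', false, false, ''));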

I still think it's incorrect to return an empty (=invalid) XML instead of a 
NULL value though.

Regards
Thomas
 






Re: Hello all, I wanted to install any postgresql database for my development

2018-03-27 Thread Thomas Kellerer
sanjeev kumar wrote on 28.03.2018 at 08:01:
> Please suggest me any version of PostgreSQL Database for my development 
> testing and for improving my skills.

If you are starting with Postgres, the best choice is to use the current version, 
which is 10.

If you are using Linux, please follow the instructions here: 
https://www.postgresql.org/download/ to add the Postgres repository to your 
Linux system, so that you can use your package manager to install it. 

If you are using Windows, you can download an installer from here: 
https://www.enterprisedb.com/downloads/postgres-postgresql-downloads

Thomas



Re: dealing with lock

2018-04-06 Thread Thomas Kellerer

Adrian Klaver wrote on 07.04.2018 at 00:02:

Is there a way to identify the list of statements that have to rewrite the 
table.


https://www.postgresql.org/docs/10/static/sql-altertable.html

Notes

"Adding a column with a DEFAULT clause or changing the type of an existing column will require the entire table and its indexes to be rewritten. 

Note that this will change with Postgres 11

https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/
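
One (indirect) way I know of to check whether a specific statement rewrote the 
table is to compare the table's filenode before and after - a sketch (table and 
column names are made up):

   select pg_relation_filenode('some_table');  -- note the value

   alter table some_table add column new_col integer default 42;

   select pg_relation_filenode('some_table');  -- a different value means the table was rewritten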





Re: Postgresql with JDK

2018-04-17 Thread Thomas Kellerer
vaibhav zaveri wrote on 17.04.2018 at 08:47:
> Hi, 
> 
> Yes that is the link. 
> But is JDK 1.8 supported by PostgreSQL?? 
> 

Postgres itself does not need or "support" Java. 

Only the JDBC driver needs Java, and that is a client-side technology.
And yes, the JDBC driver does support Java 8, which is clearly stated on the 
download page:

https://jdbc.postgresql.org/download.html

It supports PostgreSQL 8.2 or newer and requires Java 6 or newer. It contains 
support for SSL and the javax.sql package.

If you are using Java 8 or newer then you should use the JDBC 4.2 version.
If you are using Java 7 then you should use the JDBC 4.1 version.

Thomas






Re: How to fetch data from tables in PostgreSQL

2018-04-18 Thread Thomas Kellerer
vaibhav zaveri wrote on 18.04.2018 at 09:22:
> How to fetch data from tables in PostgreSQL.

That is done using the SELECT statement: 
https://www.postgresql.org/docs/current/static/sql-select.html

For an introduction on how to write queries, you might want to have a look at 
the Postgres tutorial:

https://www.postgresql.org/docs/current/static/tutorial-select.html

Thomas







Re: postgres with graph model

2018-04-19 Thread Thomas Kellerer

Philipp Kraus wrote on 19.04.2018 at 16:53:

I'm using PostgreSQL and PostGIS in a project for a geospatial data
model, but now I also need a graph in this structure. So my question
is: is there any existing extension for Postgres to build a graph? I
found ltree, but that is for tree structures only, not for graphs. In
general I have different undirected weighted graphs, so I need some
routing algorithms based on the graph weights and distance calculation
between nodes. My first idea was to use a graph database, e.g. neo4j,
but I have the limitation that the whole system should be designed
in Postgres.
Can you give me some ideas on how to build a graph within Postgres?


There is a Postgres fork that claims to combine the graph and relational world:

http://bitnine.net/agensgraph
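
If it has to be stock Postgres, simple traversals over an edge table can also be 
done with a recursive CTE - a rough sketch with made-up table and column names 
(real routing algorithms are of course a different story):

   -- edges of an undirected weighted graph, stored once per direction
   create table edge (node_a int, node_b int, weight numeric);

   -- all nodes reachable from node 1, with the accumulated weight of the path
   with recursive walk (node, total_weight, path) as (
     select node_b, weight, array[node_a, node_b]
     from edge
     where node_a = 1
     union all
     select e.node_b, w.total_weight + e.weight, w.path || e.node_b
     from edge e
       join walk w on w.node = e.node_a
     where e.node_b <> all(w.path)   -- avoid cycles
   )
   select *
   from walk;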







Re: How to find the hits on the databases and tables in Postgres

2018-05-04 Thread Thomas Kellerer

nikhil raj wrote on 04.05.2018 at 18:45:

But the issue here is that we have 200+ databases and 5 servers, so I
cannot always run this command manually. Is there any 3rd party
tool that would give me the hits on the DBs and the tables in them? That
would be very helpful for me.

I have currently prepared a script for that which goes through each DB,
but after that I still have a lot of manual work with it, so is there any
third party tool for that?



Have a look at

* http://dalibo.github.io/powa/
* http://opm.io/
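
For a quick look without any extra software, the statistics views already provide 
the raw numbers - e.g. (a sketch):

   -- "hits" per database
   select datname, xact_commit, blks_read, blks_hit, tup_returned, tup_fetched
   from pg_stat_database;

   -- usage per table in the current database
   select schemaname, relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
   from pg_stat_user_tables
   order by seq_scan + coalesce(idx_scan, 0) desc;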




Re: How to reply to an existing bug?

2018-05-27 Thread Thomas Kellerer
Erwin Brandstetter wrote on 28.05.2018 at 02:00:
> I found an existing bug report and have something to add to it.
> 
> What's the best way to reply to it? Just using a browser, with no newsreader 
> installed.
> 
> This one:
> https://www.postgresql.org/message-id/flat/20170925084522.1442.32786%40wrigleys.postgresql.org#20170925084522.1442.32...@wrigleys.postgresql.org
> 
> I want to add a related question on stackoverflow.com 
> 
> https://stackoverflow.com/questions/50553558/postgresql-add-serial-column-if-not-exists-still-creating-sequences/50557433
> 
> And that the same bug has been carried over to IDENTITY columns.
> 
> https://dbfiddle.uk/?rdbms=postgres_10&fiddle=ac978b262727fa842aa1a71349a23767

I think that warrants a new bug report (maybe with a reference to the previous 
one). 





binaries for 11 beta compiled with --with-llvm?

2018-05-28 Thread Thomas Kellerer
Are there any plans to provide a binary download for the Postgres 11 beta with 
JITting enabled? 

Currently I can't find any packages under https://www.postgresql.org/download/ 
or http://www.enterprisedb.com/products-services-training/pgbindownload

The only one I found was from BigSQL 
https://www.openscg.com/bigsql/package-manager/

But neither the Linux binaries nor the Windows binaries were compiled with the 
--with-llvm option 
(will JITting be possible with Windows at all?)

Thanks
Thomas




Re: binaries for 11 beta compiled with --with-llvm?

2018-05-28 Thread Thomas Kellerer
Andres Freund wrote on 29.05.2018 at 08:28:
>> But neither the Linux binaries nor the Windows binaries were compiled with 
>> the --with-llvm option 
>> (will JITting be possible with Windows at all?)
> 
> Not in 11.

I assumed that ;) 
Not a real problem.

But what about Linux binaries with JITting enabled? 
I would like to test some of our DWH queries to see if that improves performance.

Or do I need to compile Postgres 11 myself to get that? 

Thomas





Postgres 11 beta - no JITing

2018-06-05 Thread Thomas Kellerer
Hello,

I am trying to assess the benefits of the new JIT feature in Postgres 11. 
However I can't figure out how to enable it. 

I have a test server with "CentOS Linux release 7.5.1804"

I installed the beta using:

  yum install 
https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7.4-x86_64/postgresql11-server-11.0-beta1_1PGDG.rhel7.x86_64.rpm
  yum install postgresql11
  yum install postgresql11-server

Then initdb and everything was up and running. 

I checked the installation:

  "show jit_provider" gives me: "llvmjit"
  "show jit" gives me: "on"

pg_config reports that the beta was built with LLVM support. 

I see --with-llvm in the output as well as 
'LLVM_CONFIG=/usr/lib64/llvm5.0/bin/llvm-config'

But on my system there is no /usr/lib64/llvm5.0

So I ran "yum install llvm" which gave me: "llvm-libs-3.4.2-8.el7.x86_64" and 
"llvm-3.4.2-8.el7.x86_64" but still no /usr/lib64/llvm5.0)
And installing clang, gave me: clang.x86_64 3.4.2-8.el7 

The cost of my test query is around 25 - so even with the default 
configuration I would expect JITing to jump in. 

As it didn't happen, I set jit_optimize_above_cost and jit_above_cost to 10, 
just to see what happens - still no JITing

I have uploaded one sample plan: https://explain.depesz.com/s/aVj2

Is there anything I need to install or enable in addition to what I have 
described? 

I am not really experienced with managing a Linux system, so if I did something 
blatantly wrong, please bear with me ;)

Thanks
Thomas





Re: Postgres 11 beta - no JITing

2018-06-05 Thread Thomas Kellerer
Andres Freund wrote on 05.06.2018 at 14:34:
>> I am trying to assess the benefits of the new JIT feature in Postgres 11. 
>> However I can't figure out how to enable it. 
>>
>> I have a test server with "CentOS Linux release 7.5.1804"
>>
>> I installed the beta using:
>>
>>   yum install 
>> https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7.4-x86_64/postgresql11-server-11.0-beta1_1PGDG.rhel7.x86_64.rpm
>>   yum install postgresql11
>>   yum install postgresql11-server
> 
> I suspect the issue is that you also need the postgresql11-llvmjit
> package. The LLVM support adds dependencies, so it's not included in the
> main package.

Ah, that was it, thanks. 

(It took me a while to figure out how to also install llvm-5.0 for Centos 7, 
but I finally managed)




Using distinct in an aggregate prevents parallel execution?

2018-06-06 Thread Thomas Kellerer
Consider this simplified example:

select c.id, 
   count(*) as total_orders,
   sum(p.price) as total_value
from customer c
  join orders o ON c.id = o.customer_id 
  join order_line ol ON o.id = ol.order_id 
  join product p ON ol.product_id = p.id
group by c.id;

This uses parallel execution quite nicely: https://explain.depesz.com/s/aSPNn

However, the query is incorrect as it does not count the number of orders, but 
(essentially) the number of order_lines.

This can easily be fixed using:

select c.id, 
   count(distinct o.id) as total_orders,
   sum(p.price) as total_value
from customer c
  join orders o ON c.id = o.customer_id 
  join order_line ol ON o.id = ol.order_id 
  join product p ON ol.product_id = p.id
group by c.id;  

But in that case Postgres 10.4 decides to no longer use parallel execution: 
https://explain.depesz.com/s/7Ua3

Which increases the query execution time quite a bit (from 3 to 8 seconds). 

Is this a known limitation? 

Thomas




Re: Using distinct in an aggregate prevents parallel execution?

2018-06-06 Thread Thomas Kellerer

Tom Lane wrote on 06.06.2018 at 16:32:

Thomas Kellerer  writes:

Is this a known limitation?


Yes, unless somebody has done radical restructuring of the aggregation
code while I wasn't looking.

agg(DISTINCT ...) is currently implemented inside the Agg plan node,
so it's an indivisible black box to everything else.  That was a
simple, minimum-code-footprint method for implementing the feature
back when; but it's got lots of drawbacks, and one is that there's
no reasonable way to parallelize.

I'd anticipate that before we could even start to think of parallelizing,
we'd have to split out the distinct-ification processing into a separate
plan node.

agg(... ORDER BY ...) has got the same problem, and it'd likely be
advisable to fix that at the same time.


Thanks for the explanation.






Re: Catching unique_violation exception on specific column/index

2018-06-11 Thread Thomas Kellerer
Alexey Dokuchaev wrote on 11.06.2018 at 12:10:
> I have a table with several UNIQUE and CHECK constraints.  One of these
> UNIQUE constraints actually *can* be violated -- not on the table level,
> of course, but on the application level -- meaning, if the entry with
> particular foo_key is already in there, do not throw an exception, just
> silently do nothing.
> 
> The usual approach ("EXCEPTION WHEN unique_violation THEN ... END") does
> not really cut it because I want to catch unique_violation only when it
> happens on "foo_key", and still rightfully complain on others.  However,
> there is no "EXCEPTION WHEN unique_violation ON (foo_key)" or something.
> Is there a way to do this without using triggers and in a less ugly way
> than the code below?
> 
> IF SQLERRM = 'duplicate key value violates unique constraint' ||
>   ' "foo_key"' THEN
> RAISE NOTICE '%: %', SQLSTATE, SQLERRM;
> ELSE
> RAISE EXCEPTION '%: %', SQLSTATE, SQLERRM;
> END IF;
> 
> ./danfe
 
What's wrong  with: 

  INSERT ...
  ON CONFLICT (foo_key) DO NOTHING
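
A complete (made-up) example of what I mean, assuming foo_key is the unique 
constraint on a column called foo:

   insert into some_table (foo, bar)
   values ('x', 42)
   on conflict (foo) do nothing;

   -- or reference the constraint by name instead of the column list:
   insert into some_table (foo, bar)
   values ('x', 42)
   on conflict on constraint foo_key do nothing;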


 




Re: Catching unique_violation exception on specific column/index

2018-06-11 Thread Thomas Kellerer
Alexey Dokuchaev wrote on 11.06.2018 at 12:58:
>>> I have a table with several UNIQUE and CHECK constraints.  One of these
>>> UNIQUE constraints actually *can* be violated -- not on the table level,
>>> of course, but on the application level -- meaning, if the entry with
>>> particular foo_key is already in there, do not throw an exception, just
>>> silently do nothing.
>>  
>> What's wrong with:
>>
>>   INSERT ...
>>   ON CONFLICT (foo_key) DO NOTHING
> 
> Nothing I guess, except that it is available since 9.5 (right?), and I try
> to stay compatible with 9.3.  Sorry for not saying this in the first place.

Andreas already mentioned that 9.3 will be EOL soon (3 months from now), but 
the performance and efficiency for concurrent execution of ON CONFLICT is 
much better than anything you can implement yourself. 

If that functionality is an important part of your code, you should consider 
upgrading to 10 (or 9.6 if you are really conservative) sooner rather
than later. 





Re: First query on each connection is too slow

2018-06-13 Thread Thomas Kellerer
Vadim Nevorotin wrote on 13.06.2018 at 11:55:
> I have a very strange problem. I'm using PostgreSQL 9.6 with PostGIS
> 2.3 (both from Debian Stretch repos) to store the DB for an OSM server (but
> that actually doesn't matter). And I've noticed that on each new
> connection to the DB the first query is much slower (10x) than all others.
> E.g.:
> 
> If I run other instances of psql in parallel while the first is
> active, they have absolutely the same problem. In one instance of
> psql the query is fast (if it's not the first query), in the others the first
> query is slow, but all others are fast.

Do you have many(!) tables and schemas? 

Caching of schema information is done per connection, so maybe the 
lookup of the table in the system catalogs is what is taking so much time.

But I think you would need thousands of schemas with thousands of tables in 
each schema in order to notice an impact there. 






Re: PostgreSQL Volume Question

2018-06-20 Thread Thomas Kellerer
Data Ace wrote on 15.06.2018 at 18:26:
> Well, I think my question strayed somewhat from my intention because of
> my poor understanding and questioning :(
> 
> Actually, I have 1TB of data and hardware specs sufficient to handle
> this amount of data, but the problem is that it needs too many join
> operations and the analysis process is going too slow right now.
> 
> I've searched and found that a graph model nicely fits network data
> like social data in terms of query performance.
> 
> Should I change my DB (I mean my DB for analysis)? Or do I need some
> other solution or any extension?


AgensGraph is a Postgres fork implementing a graph database that supports 
Cypher as the query language while at the same time still supporting SQL 
(and even queries mixing both).

I have never used it, but maybe it's worth a try.

http://bitnine.net/agensgraph/

Thomas




Plan output: actual execution time not considering loops?

2018-06-20 Thread Thomas Kellerer
Consider the following execution plan:

Seq Scan on orders o1  (cost=0.00..18818840.86 rows=3500 width=16) (actual time=0.033..8625.104 rows=99906 loops=1)
  Filter: (amount = (SubPlan 1))
  Rows Removed by Filter: 600094
  Buffers: shared hit=7719778
  SubPlan 1
    ->  Aggregate  (cost=26.87..26.87 rows=1 width=32) (actual time=0.012..0.012 rows=1 loops=700000)
          Buffers: shared hit=7714631
          ->  Bitmap Heap Scan on orders o2  (cost=3.45..26.85 rows=8 width=8) (actual time=0.004..0.008 rows=8 loops=700000)
                Recheck Cond: (customer_id = o1.customer_id)
                Heap Blocks: exact=5597311
                Buffers: shared hit=7714631
                ->  Bitmap Index Scan on orders_customer_id_order_date_idx  (cost=0.00..3.45 rows=8 width=0) (actual time=0.003..0.003 rows=8 loops=700000)
                      Index Cond: (customer_id = o1.customer_id)
                      Buffers: shared hit=2117320
Planning time: 0.136 ms
Execution time: 8628.724 ms

My expectation would have been that the "Aggregate" step shows the actual time 
as a product of the number of loops.

The Bitmap Heap Scan takes 0.008ms for each execution, so shouldn't the "actual 
time" for the "Aggregate" step be 0.008ms * 700,000 (= 5,600ms)?

The plan was generated using Postgres 10.4 (on Windows 10, but I think that is 
irrelevant)

Thomas







Re: Plan output: actual execution time not considering loops?

2018-06-20 Thread Thomas Kellerer
Tom Lane wrote on 20.06.2018 at 16:03:
>> Consider the following execution plan:
>> ...
>>     ->  Aggregate  (cost=26.87..26.87 rows=1 width=32) (actual time=0.012..0.012 rows=1 loops=700000)
>>           ->  Bitmap Heap Scan on orders o2  (cost=3.45..26.85 rows=8 width=8) (actual time=0.004..0.008 rows=8 loops=700000)
>>                 ->  Bitmap Index Scan on orders_customer_id_order_date_idx  (cost=0.00..3.45 rows=8 width=0) (actual time=0.003..0.003 rows=8 loops=700000)
> 
>> My expectation would have been that the "Aggregate" step shows the actual 
>> time as a product of the number of loops.
> 
> No, that looks fine to me.  The rule of thumb for reading this is total
> time spent in/below this node is "actual time" times "number of loops".

OK, if that is the rule I can live with that ;)

> It seems a bit odd that the Agg node would account for a third of the
> total execution time when it's only processing 8 rows on average ...
> but maybe it's a really expensive aggregate.

But it's processing those 8 rows 700,000 times - so the total time seems 
correct. 
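
Spelled out with the numbers from the plan (roughly, since the instrumentation 
itself adds some overhead):

   0.012 ms per Aggregate call * 700,000 loops ~ 8,400 ms

which is pretty much the ~8,625 ms reported for the enclosing Seq Scan.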

FWIW, the query looks like this:

select customer_id, 
   amount, 
   sales_person_id
from orders o1
where amount = (select max(o2.amount)
from orders o2
where o2.customer_id = o1.customer_id);

It's not a real world query - it's just there to illustrate the drawbacks of 
correlated sub-queries.
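
For completeness, a version without the correlated sub-query that avoids the 
per-row sub-plan altogether - a sketch using a window function (untested against 
the real data):

   select customer_id, amount, sales_person_id
   from (
     select customer_id,
            amount,
            sales_person_id,
            max(amount) over (partition by customer_id) as max_amount
     from orders
   ) t
   where amount = max_amount;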
 
> Another thought is that the EXPLAIN ANALYZE instrumentation itself
> can account for significant per-node-invocation overhead.  If the
> total execution time drops significantly when you add "timing off"
> to the EXPLAIN options, then that's probably a factor in making
> the Agg node look relatively expensive.

"timing off" doesn't really change the execution time (it's about 60ms faster 
without)

Thanks for the answer, I am not really concerned about the query performance 
itself, just about the plan ;) 

Thomas




Re: Schema/Data conversion opensource tools from MySQL to PostgreSQL

2018-06-25 Thread Thomas Kellerer

chiru r wrote on 25.06.2018 at 18:47:

Please suggest Schema/Data conversion opensource tools from MySQL to PostgreSQL.


ora2pg also supports MySQL: https://github.com/darold/ora2pg






Re: CTE optimization fence

2018-06-26 Thread Thomas Kellerer
Tom Lane wrote on 27.06.2018 at 05:48:
>> I see there was some discussion last year about removing the CTE 
>> optimization fence (e.g. 
>> http://www.postgresql-archive.org/CTE-inlining-td5958992.html) but can't 
>> find anything more recent. Does anyone know if this is still under 
>> consideration?
> 
> but we have to settle on a way of controlling it.

+1 from me. 

I am running more and more into situations where people consider this a bug 
rather than a feature.

FWIW, I think a GUC that switches between the current (mostly unwanted, at 
least surprising) way and one where the CTE is optimized together with the 
main query would suit "most" people.

For the sake of compatibility this could default to the current behaviour.
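
For anyone not familiar with the issue, a trivial made-up example of the surprise: 
in the current releases the CTE is evaluated as written, so the outer WHERE clause 
is not pushed down into it:

   with all_orders as (
     select *
     from orders
   )
   select *
   from all_orders
   where customer_id = 42;  -- materializes all of "orders" instead of using an index on customer_id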



Re: Windows 10 got stuck with PostgreSQL at starting up. Adding delay lets it avoid.

2018-06-28 Thread Thomas Kellerer
TAKATSUKA Haruka wrote on 29.06.2018 at 08:03:
> I got a trouble in PostgreSQL 9.3.x on Windows 10.
> I would like to add new delay code as an official build option.
> 
> Windows 10 sometime (approximately once in 300 tries) hung up 
> at OS starting up. The logs say it happened while the PostgreSQL 
> service was starting. When OS stopped, some postgres auxiliary 
> process were started and some were not started yet. 
> 
> The Windows dump says some threads of the postgres auxiliary processes
> are waiting on OS level locks and the logon process's threads are
> also waiting on a lock. MS help desk said that PostgreSQL's OS level 
> deadlock caused the OS freeze. I think that is a strange story. But, 
> in fact, it not happened in repeated tests when I got rid of 
> PostgreSQL from the initial auto-starting services.
> 
> I tweaked PostgreSQL 9.3.x (the newest from the repository) to add 
> 0.5 or 3.0 seconds delay after each sub process starts. 
> And then the hung up was gone. This test patch is attached. 
> It is only implemented for Windows. Also, I did not use existing 
> pg_usleep because it contains locking codes (e.g. WaitForSingleObject
> and Enter/LeaveCriticalSection).
> 
> Although Windows OS may have some problems, I think we should have
> a means to avoid it. Can PostgreSQL be accepted such delay codes
> as build-time options by preprocessor variables?

Did you try setting the service to "delayed start"?






When exactly is a TIMESTAMPTZ converted to the sessions time zone?

2018-07-03 Thread Thomas Kellerer

A recent discussion around timestamptz behaviour has led me to question my own 
understanding of how a TIMESTAMPTZ is converted to the session's time zone.

I assumed this conversion happens *on the server* before the value is sent to 
the client.

A co-worker of mine claims that this is purely a client side thing, and that the server 
will always send the "plain" UTC value that is stored in a timestamptz column.

The manual is ambiguous - at least to me:

   When a timestamp with time zone value is output, it is always converted from UTC
   to the current timezone zone, and displayed as local time in that zone

Does "is output" refer to the output on the client (after receiving a UTC 
value) or the sending of a converted value?

In this email: 
https://www.postgresql.org/message-id/19896.1152889217%40sss.pgh.pa.us

Tom Lane states:

   They are converted to local time in the zone specified by the timezone
   configuration parameter before being displayed to the client

but the "before being displayed" could also refer to a conversion on the server.

Could someone enlighten me, please?

Thomas




Re: How to set array element to null value

2018-07-09 Thread Thomas Kellerer
Brahmam Eswar wrote on 09.07.2018 at 11:58:
> I'm trying to reset an array element to null, but the 3rd line of the below snippet is 
> giving a compilation error.
> 
> 
> FOR indx_1 IN array_lower(X, 1)..array_upper(X, 1) LOOP
> IF X[indx_1].REFERENCE_VALUE = 'ABC' THEN
> X[indx_1].REFERENCE_VALUE:='';
> END IF;
> END LOOP;

What data type is X exactly? It looks like a composite record type. 

(Also: an empty string '' is not the same as NULL)




Re: Create event triger

2018-07-10 Thread Thomas Kellerer
Łukasz Jarych wrote on 09.07.2018 at 13:03:
> I have a small database and I am tracking changes using a trigger:
> 
> CREATE TRIGGER t BEFORE INSERT OR UPDATE OR DELETE ON t_trig
> 
> FOR EACH ROW EXECUTE PROCEDURE change_trigger();
> 
> Is it possible to create a general trigger for all tables?
> Like an event trigger?
> 
> It would be very helpful for me. 
> Now I have to set up this trigger on each table.

Maybe pgaudit is a better solution? 

https://www.pgaudit.org/
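
If you want to stay with your own change_trigger(), one low-tech alternative is a 
DO block that generates the trigger for every table in a schema - a sketch (schema 
name and trigger details assumed from your example):

   do
   $$
   declare
     l_rec record;
   begin
     for l_rec in select schemaname, tablename
                  from pg_tables
                  where schemaname = 'public'
     loop
       execute format('create trigger change_tracker
                         before insert or update or delete on %I.%I
                         for each row execute procedure change_trigger()',
                      l_rec.schemaname, l_rec.tablename);
     end loop;
   end;
   $$;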




Re: Disabling/Enabling index before bulk loading

2018-07-10 Thread Thomas Kellerer
Ravi Krishna wrote on 10.07.2018 at 16:08:
> 
> We recently did a test on COPY and found that on large tables (47 million 
> rows , 20GB of raw data) the 
> difference in COPY with 16 indexes and COPY without any index is 1:14. That 
> is, COPY is 14 times slower 
> when data is ingested with all indexes as opposed to COPY first without index 
> and then create all index.
> 
> I googled for earlier posting on this and it looks like this has been asked 
> before too.  
> 
> This is what I am thinking to do:
> 
> 1 - Extract index definition and save it as a SQL somewhere, either a file or 
> a table.
> 2 - Drop all indexes.
> 3 - Ingest data via COPY
> 4 - Recreate all indexes saved in (1).
> 
> Is there a generic sql or script or tool to accomplish (1).

You can do this with a script like this:

-- First store the definitions of all the indexes in a table:

create table index_backup 
as
select *
from pg_indexes
where schemaname = 'public'
  and tablename = 'the_table';

-- now drop all the indexes:
do
$$
declare
  l_rec record;
begin
  for l_rec in select schemaname, indexname from index_backup
  loop
execute format('drop index %I.%I', l_rec.schemaname, l_rec.indexname);
  end loop;
end;
$$;

-- now insert the data

...

-- and restore all indexes

do
$$
declare
  l_rec record;
begin
  for l_rec in select indexdef from index_backup
  loop
 execute l_rec.indexdef;
  end loop;
end;
$$;




Re: Open Source tool to deploy/promote PostgreSQL DDL

2018-07-11 Thread Thomas Kellerer
Hustler DBA wrote on 11.07.2018 at 00:13:
> A client of mine is looking for an open source tool to
> deploy and promote PostgreSQL DDL changes through database
> environments as part of SDLC. What tools (open source) does the
> community members use? I normally use scripts, but they want
> something open source.
We are using Liquibase (with the XML format) and that has served us well.




Re: Do we need yet another IDE (SQL development assistant) for PostgreSQL?

2018-07-16 Thread Thomas Kellerer
Tim Clarke wrote on 16.07.2018 at 11:52:
> +1 for not re-inventing the wheel - building on Netbeans or the Eclipse
> project would save you heaps of time and effort and provide
> cross-platform out of the box. I use Eclipse all the time.

The NetBeans platform (and Eclipse as well) is based on Java however. 

But Dmitry stated that he is using C++ so that won't really help.





Re: User documentation vs Official Docs

2018-07-16 Thread Thomas Kellerer

Joshua D. Drake wrote on 16.07.2018 at 22:32:

-general.

Over the last year as I have visited many meetups and interacted with
people at conferences etc... There are three prevailing issues that
continue to come up in contributing to the community. This email is
about one of them. Where is the "user" documentation? The official
documentation is awesome, if you know what you are doing. It is not
particularly useful for HOWTO style docs. There is some user
documentation in the wiki but let's be honest, writing a
blog/article/howto in a wiki is a pain in the butt.

What does the community think about a community run, community
organized, sub project for USER documentation? This type of
documentation would be things like, "10 steps to configure
replication", "Dumb simple Postgres backups",  "5 things to NEVER do
with Postgres". I imagine we would sort it by version (9.6/10.0
etc...) as well as break it down via type (Administration, Tuning,
Gotchas) etc...



What about: https://en.wikibooks.org/wiki/PostgreSQL



A bit confused about "pgsql_tmp" vs "temp tablespace"

2018-07-18 Thread Thomas Kellerer
In the chapter "Database File layout" the pgsql_tmp is explained as follows:

   Temporary files (for operations such as sorting more data than can fit in memory)
   are created within PGDATA/base/pgsql_tmp, or within a pgsql_tmp subdirectory
   of a tablespace directory

However the documentation for "temp_tablespaces" states: 

   Temporary files for purposes such as sorting large data sets are also created
   in these tablespaces.


How do these two things relate to each other? 

Does this mean that if I do not explicitly create a dedicated "temp tablespace" 
then the pgsql_tmp subdirectory is used? 
But _if_ I do create a temp tablespace (by creating one, and adding it to 
temp_tablespaces) then the sorting is done *there*? 

So far I thought that a temp tablespace is only used for temporary tables (and 
indexes on them) but that paragraph in the 
temp_tablespaces documentation seems to indicate otherwise. 

Background: we are setting up a new server that has a regular (large) SSD and 
very fast NVMe SSD (which is too small to hold all tables). 
So we would like to put anything that is "temporary" onto the NVMe drive. 

But I'm unsure if that is better done through a symlink for pgsql_tmp or a temp 
tablespace. 
Currently no temporary tables are used (but that might change in the future), 
so only intermediate results (e.g. CTEs, sorting etc) would wind up there. 
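
For reference, the tablespace variant would be just this (path and name made up), 
with the open question being whether the sort/CTE spill files then go there too:

   create tablespace fast_temp location '/mnt/nvme/pg_temp';

   alter system set temp_tablespaces = 'fast_temp';
   select pg_reload_conf();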











Re: A bit confused about "pgsql_tmp" vs "temp tablespace"

2018-07-18 Thread Thomas Kellerer
Adrian Klaver wrote on 18.07.2018 at 15:06:
>> In the chapter "Database File layout" the pgsql_tmp is explained as follows:
>>
>>     Temporary files (for operations such as sorting more data than can fit 
>> in memory)
>>     are created within PGDATA/base/pgsql_tmp, or within a pgsql_tmp 
>> subdirectory of
>>     a tablespace directory
>>
>> However the documentation for "temp_tablespaces" states:
>>
>>     Temporary files for purposes such as sorting large data sets are also 
>> created
>>     in these tablespaces.
>>
>>
>> How do these two things related to each other?
>>
>> Does this mean that if I do not explicitly create a dedicated "temp 
>> tablespace" then the pgsql_tmp subdirectory is used.
>> But _if_ I do create a temp tablespace (by creating one, and adding it to 
>> temp_tablespaces) then the sorting is done *there*?
> 
> Yes, for those objects that do not have a tablespace specified in their 
> CREATE statement.
>
>>
>> So far I thought that a temp tablespace is only used for temporary tables 
>> (and indexes on them) but that paragraph in the
>> temp_tablespaces documentation seems to indicate otherwise.
> 
> The Database File Layout section you quoted above says the same
> thing. Basically setting temp_tablespaces just overrides where temp
> objects and operation files are placed when a tablespace is not
> specified in their creation.

Thanks.

I understand the relation between explicitly CREATEd objects and the temp 
tablespace(s).

But what about the (temp) space needed for e.g. sorting, grouping or 
intermediate results from CTEs or derived tables? 
Is that also controlled through the temp_tablespaces? 


 



Re: How to split an array into columns

2018-08-24 Thread Thomas Kellerer
a wrote on 24.08.2018 at 11:01:
> Say I have a float8 array:
> 
> id | data
> ---+-------
> a  | {1,2}
> b  | {2,4}
> 
> Could I use a query to make it look like this:
> 
> id | data[1] | data[2]
> ---+---------+--------
> a  |       1 |       2
> b  |       2 |       4
> 
> Since I would have around 200,000 rows, I would prefer something with
> enough capacity to carry out calculations such as sum().

Maybe I am missing something, but: 

   select id, data[1], data[2]
   from the_table;

will work just fine. 




Arrays, casting and "constrained" data types

2018-08-24 Thread Thomas Kellerer
I stumbled across the following:

Consider the following (simplified) table:

  create table test
  (
val numeric(20,0),
ref_val numeric(20,0)
  );
 
and the following very simple recursive CTE:

with recursive tree as (
  select val, array[val] as path
  from test
  union all
  select child.val, parent.path||child.val 
  from test child
join tree parent on parent.val = child.ref_val
)
select *
from tree;

The above fails with: recursive query "tree" column 2 has type numeric(20,0)[] 
in non-recursive term but type numeric[] overall

However, when casting the array in the non-recursive part, it still doesn't 
work:

with recursive tree as (
  select val, array[val]::numeric[] as path
  from test
  union all
  select child.val, parent.path||child.val 
  from test child
join tree parent on parent.val = child.ref_val
)
select *
from tree;

same error as before. Neither does array[val::numeric] work. 

However, appending the column to an empty array works: 

with recursive tree as (
  select val, array[]::numeric[] || val as path
  from test
  union all
  select child.val, parent.path||child.val 
  from test child
join tree parent on parent.val = child.ref_val
)
select *
from tree;


My question is: why isn't "array[val]::numeric[]" enough to create a numeric[] 
array in the non-recursive part? 

I have seen the same problem with "varchar(x)" 

Thomas







Re: PostgreSQL intenal scheduler?

2018-09-05 Thread Thomas Kellerer
Thiemo Kellner wrote on 05.09.2018 at 17:06:
> I am designing a framework for historisation implementation (SCD).
> One feature I would like to provide is a table in that the actual
> state of an entity is put and if this is complete, this history table
> is "updated":
> 
>      --------------       ---------------
> ==>  | ENTITY_ACT |  ==>  | ENTITY_HIST |
>      --------------       ---------------
> 

> I plan to use instead-of-triggers on the hist table that read the
> actual table and perform all necessary inserts and updates on the
> history table. If I want the termination of a record version
> (actually the record of a specific business key with a specific
> payload) to get propagated up and/or down referential integrities (no
> overlapping validities) I have to make sure that only one of those
> processes is modifying a table. I was thinking of a scheduler queue
> where the trigger would put a process request and PostgreSQL would
> work through. Is there a scheduler within PostgreSQL? I read the
> documentation and searched the web but could not find a hint. But
> before going another road or implementing something myself, I ask.
> Maybe this design is no good at all.


There is no built-in scheduler, but there is an extension that supplies one:

https://github.com/citusdata/pg_cron
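
Once the extension is installed, registering a job is a single call - a made-up 
example (propagate_history() is just a placeholder for your own procedure):

   create extension pg_cron;

   -- run the propagation every 5 minutes
   select cron.schedule('*/5 * * * *', $$select propagate_history()$$);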





Re: A Timeseries Case Study: InfluxDB VS PostgreSQL

2018-09-07 Thread Thomas Kellerer
Fabio Pardi wrote on 07.09.2018 at 10:07:
> Hi,
> I recently published a blog article reporting a small research I made on
> the usage of InfluxDB and PostgreSQL for time series, together with
> Grafana on my specific use case.
> 
> I think that some of you might find it interesting, maybe inspiring or
> perhaps it can trigger some interesting discussion, given the high level
> of expertise of people in this mailing list.
> 
> I reached out to InfluxDB guys, but after an initial ack, I did not hear
> from them any longer.
> 
> https://portavita.github.io/2018-07-31-blog_influxdb_vs_postgresql
> 
> All comments, critics, suggestions and corrections are very welcome

Did you ever look at Timescale?  https://www.timescale.com/

It's implemented as a Postgres extension.

It would be interesting to compare it with a "native" time series database.



Re: COPY threads

2018-10-11 Thread Thomas Kellerer

Rob Sargent wrote on 10.10.2018 at 00:45:
> Can anyone here tell me whether or not the CopyManager facility in
> JDBC via org.postgresql:postgresql:42.1.4 is internally
> multithreaded? Running on CentOS 7 (all participants), java8,
> postgres 10.5


An alternative to creating your own multi-threaded importer might be to use 
pgLoader, which supports that out of the box:

https://pgloader.readthedocs.io/en/latest/pgloader.html#a-note-about-parallelism

I have never used it though, but heard good things about it.



Re: Regarding varchar max length in postgres

2018-10-15 Thread Thomas Kellerer
Durgamahesh Manne wrote on 15.10.2018 at 11:05:
> On 10/5/18 8:18 AM, Durgamahesh Manne wrote:
> > Hi
> >
> > please let me know the max length of varchar  & text in postgres
> 
> https://www.postgresql.org/docs/10/static/datatype-character.html
> 
>  Hi 
> 
> Thank you for this information
> 
> As per the postgresql documentation (please ignore the incomplete message I 
> mailed to the respected community members):
> 
>  If character varying is used without length specifier, the type 
> accepts strings of any size 
> 
> but varchar does not accept more than this value of 10485760:
> 
> 
>    create table test(id serial primary key, str varchar(10485761));
> 
>      ERROR: length for type varchar cannot exceed 10485760 

Further down on that page the overall limit is documented: 

In any case, the longest possible character string that can be stored is 
about 1 GB

So the part that you quoted implicitly means "accepts strings of any size - up 
to the maximum of 1GB".
Maybe it makes sense to make that clearer at that point.

Regards
Thomas





Re: Regarding varchar max length in postgres

2018-10-15 Thread Thomas Kellerer
Durgamahesh Manne wrote on 15.10.2018 at 11:18:
> Was there any specific reason that the max length for varchar was 
> limited to the value 10485760?
> 
> Why was the max length for varchar not made unlimited like for the text 
> datatype?
> 
> character varying(n), varchar(n)    variable-length with limit
> character(n), char(n)               fixed-length, blank padded
> text                                variable unlimited length

It "text" type is a "character string" just like all the other character types 
and thus is also limited to 1GB

"text", "varchar", "character varying" and "character" are all identical in how 
they are stored and processed. 

Thomas







How to extract information from pg_ddl_command type

2019-07-11 Thread Thomas Kellerer
The function pg_event_trigger_ddl_commands() returns several columns, one of 
them being "command" that is of the type "pg_ddl_command". 

The manual[1] describes this column as: 

> A complete representation of the command, in internal format. 
> This cannot be output directly, but it can be passed to other 
> functions to obtain different pieces of information about the 
> command.

However, I can not find any of those "other functions" to extract information 
from that column. 

My goal is to get the complete SQL text that fired the event trigger - is that 
even possible through pg_event_trigger_ddl_commands()? 
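
For what it's worth, calling current_query() inside the event trigger function at 
least returns the statement text as sent by the client (which may contain more 
than just the single DDL command) - a sketch:

   create or replace function log_ddl()
     returns event_trigger
     language plpgsql
   as
   $$
   declare
     l_cmd record;
   begin
     for l_cmd in select * from pg_event_trigger_ddl_commands()
     loop
       raise notice 'tag: %, object: %, statement: %',
                    l_cmd.command_tag, l_cmd.object_identity, current_query();
     end loop;
   end;
   $$;

   create event trigger trg_log_ddl on ddl_command_end
     execute procedure log_ddl();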

Thanks
Thomas

[1] https://www.postgresql.org/docs/current/functions-event-triggers.html





Re: Download link for postgres 10.8 edb software

2019-07-11 Thread Thomas Kellerer
Nanda Kumar wrote on 12.07.2019 at 04:55:
> Currently we are using postgres database 9.7 version. We need to test
> our application in 10.8 edb version. So can you please share the
> download link for PostgreSQL 10.8 edb software. This is required for
> testing purpose .
> 

The current minor release for Postgres 10 is 10.9 - you should use that rather than 10.8.

You can download it from here:

   https://www.enterprisedb.com/downloads/postgres-postgresql-downloads

If you are running Linux, it is better to install it through your Linux package 
manager as described here:

   https://www.postgresql.org/download/

Thomas




Re: Tablespace column value null on select * from pg_tables

2019-07-15 Thread Thomas Kellerer

Alex Williams wrote on 15.07.2019 at 20:35:

But in my case, I have a database that's in a user-defined tablespace
(data2) and all the tables/indexes there are also in data2, and I want
to select into a table all the tables of that database together with
the tablespaces they are in... when doing this:

SELECT distinct tablespace FROM pg_tables;


I get 2 rows: null and pg_global (I think I should expect null for
pg_default, but if the table is in a user-defined tablespace, shouldn't
we expect it to show that tablespace - in my case, data2?)



If data2 is the default tablespace of the database, then this is expected.

The tablespace column is null in pg_tables if the table is located in the
default tablespace of the database.

See here:

https://www.postgresql.org/message-id/flat/15901-e5cfe2dd7298a3a4%40postgresql.org

And the answer on SO:

https://stackoverflow.com/a/56950950
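
To get the effective tablespace for every table rather than null, something like 
this sketch should work - it falls back to the database's default tablespace when 
pg_tables reports null:

   select schemaname,
          tablename,
          coalesce(tablespace,
                   (select t.spcname
                    from pg_tablespace t
                      join pg_database d on d.dattablespace = t.oid
                    where d.datname = current_database())) as effective_tablespace
   from pg_tables;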

Thomas




Re: Too slow to create new schema and their tables, functions, triggers.

2019-07-26 Thread Thomas Kellerer
PegoraroF10 wrote on 26.07.2019 at 00:16:
> statement: create or replace function valoresdfe... 0 mins 1.135 secs
> statement: create or replace function dadosorigem...0 mins 0.055 secs
> statement: CREATE OR REPLACE FUNCTION SONU...   0 mins 0.013 secs
> statement: create or replace function contatoscampa...2 mins 13.492 
> secs
> statement: create or replace function FORMATARTELEF...  0 mins 0.013 secs
> statement: create or replace function ChecaVar  0 mins 0.012 secs
> statement: CREATE or replace FUNCTION criatrigge... 1 mins 16.42 secs
> 

Is it possible that those functions (where creation was slow) were still in use 
by another session and the create script had to wait for an exclusive lock to 
replace the function? 
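
While the script is running, something along these lines (a sketch) should show 
whether the CREATE statements are waiting on locks held by other sessions:

   select pid,
          state,
          wait_event_type,
          wait_event,
          pg_blocking_pids(pid) as blocked_by,
          query
   from pg_stat_activity
   where wait_event_type = 'Lock';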







Re: Which version to upgrade upto

2019-07-31 Thread Thomas Kellerer

Vikas Sharma wrote on 31.07.2019 at 17:57:

The architects and developers have the perception that the latest release
will always have bugs and that others might already be using it in production. They
feel 11.2 will be a better bet than 11.4.



You should always use the latest minor version, so 11.4 is preferred over 11.2

Quote from the homepage[1]:


For minor releases, the community considers not upgrading to be riskier than 
upgrading


So definitely go with 11.4

If you want to see how many (and which) bugs have been fixed between 11.2 and 
11.4 you can check:

   https://why-upgrade.depesz.com/show?from=11.2&to=11.4&keywords=

Thomas


[1] https://www.postgresql.org/support/versioning/




Re: Cursors for PGJDBC queries

2019-08-01 Thread Thomas Kellerer
Rashmi V Bharadwaj wrote on 01.08.2019 at 09:10:
> I am trying to set the fetch size for my ResultSet to avoid Out of
> Memory exception. I have created the Statement with
> ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY and
> ResultSet.HOLD_CURSORS_OVER_COMMIT and I've also disabled auto commit
> as mentioned in the link "Getting results based on a cursor".
> I am still getting Out of memory error. My SQL query is a simple
> SELECT statement to retrieve all the rows from a table. According to
> https://postgrespro.com/list/thread-id/2370772, the holdability must
> be CLOSE_CURSORS_AT_COMMIT. Could you please confirm this is a
> requirement?

To rule out the obvious: you did call Statement.setFetchSize() before calling 
executeQuery()? 


Using

  connection.setAutoCommit(false);
  Statement stmt = connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                              ResultSet.CONCUR_READ_ONLY);
  stmt.setFetchSize(100);
  ResultSet rs = stmt.executeQuery("");
  while (rs.next()) {
    ...
  }

works perfectly for me, even with really large results.











Re: How to check if a field exists in NEW in trigger

2019-08-04 Thread Thomas Kellerer
Igal @ Lucee.org wrote on 05.08.2019 at 00:52:
> I have the following statement in a trigger:
> 
>     new.email = lower(new.email);
> 
> When I try to update a record without setting the email column however, I get 
> an error:
> 
> SQL Error [42703]: ERROR: record "new" has no field "email"
>   Where: SQL statement "SELECT lower(new.email)"
> PL/pgSQL function on_record_modified() line 26 at assignment
> 
> I have seen some hacks suggesting TRY/CATCH or converting to a JSON
> and checking if the field exists, but I would think that there's a
> better way to check if the field is in the NEW record, no?


I assume using to_jsonb(new) and then checking for the key in the json value 
will be faster than checking e.g. information_schema.columns 
or pg_catalog.pg_attribute.




Re: How to check if a field exists in NEW in trigger

2019-08-05 Thread Thomas Kellerer
Pavel Stehule wrote on 05.08.2019 at 08:19:
>>> I have seen some hacks suggesting TRY/CATCH or converting to a JSON
>>> and checking if the field exists, but I would think that there's a
>>> better way to check if the field is in the NEW record, no?
> 
>> I assume using to_jsonb(new) and then check for the key in the json value
>> will be faster than checking e.g. information_schema.column
>> or pg_catalog.pg_attribute
> 
> Alternative solution can be using other language than PLpgSQL -
> PLPythonu or PLPerl (there it is simple task). This language is not
> designed for too dynamic code. PLpgSQL triggers are designed for
> stable schema - you should to know if table has email column or not.
> 
> Catching errors in PLpgSQL is relative expensive solution due related
> savepoint overhead in background.

Yes, exception handling (or a catalog lookup) is expensive. 
That's why I suggested that using to_jsonb() has the least overhead. 

The check is then as simple as:

   if (to_jsonb(new) ? 'email') then 
 ... do something
   end if;
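
Put into a complete (made-up) trigger function, it could look like this:

   create or replace function on_record_modified()
     returns trigger
     language plpgsql
   as
   $$
   begin
     if to_jsonb(new) ? 'email' then
       new.email := lower(new.email);
     end if;
     return new;
   end;
   $$;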

Thomas




lead() with arrays - strange behaviour

2019-08-08 Thread Thomas Kellerer
Consider the following dummy table (this is a simplified example from a bigger 
query):

create table sample_data (id int, id_list int[]);
insert into sample_data (id, id_list)
values 
   (1, array[1,2,3]),
   (2, array[2,3,4]),
   (3, array[4,5,6]);

The following statement tries to find the overlapping values in id_list between 
the current row and the next row:

select id, 
   id_list,
   lead(id_list) over (order by id) as next_list,
   array(select unnest(id_list) intersect select unnest(lead(id_list) 
over (order by id))) as common_ids
from sample_data;   

The above returns: 

id | id_list | next_list | common_ids
---+-+---+---
 1 | {1,2,3} | {2,3,4}   | {}
 2 | {2,3,4} | {4,5,6}   | {}
 3 | {4,5,6} |   | {}

The empty array for "common_ids" is obviously incorrect. 

However, when the evaluation of the "next_list" is put into a derived table, 
then this works as expected:

select id, id_list, next_list, 
   array(select unnest(id_list) intersect select unnest(next_list)) as 
common_ids
from (
  select id, 
 id_list,
 lead(id_list) over (order by id) as next_list
  from sample_data
) t

returns:

id | id_list | next_list | common_ids
---+-+---+---
 1 | {1,2,3} | {2,3,4}   | {2,3} 
 2 | {2,3,4} | {4,5,6}   | {4}   
 3 | {4,5,6} |   | {}

This is with Postgres 11.4

Is this a bug or simply not supported?

It does work correctly with intarray's "intersect" operator:

select id, 
   id_list,
   id_list & lead(id_list) over (order by id) as next_list
from sample_data;   


However, the actual data uses a bigint, so intarray isn't an option.

Thomas



  





Re: lead() with arrays - strange behaviour

2019-08-08 Thread Thomas Kellerer
David Rowley wrote on 08.08.2019 at 13:03:
>> The following statement tries to find the overlapping values in id_list 
>> between the current row and the next row:
>>
>> select id,
>>id_list,
>>lead(id_list) over (order by id) as next_list,
>>array(select unnest(id_list) intersect select 
>> unnest(lead(id_list) over (order by id))) as common_ids
>> from sample_data;
>>
>> The above returns:
>>
>> id | id_list | next_list | common_ids
>> ---+-+---+---
>>  1 | {1,2,3} | {2,3,4}   | {}
>>  2 | {2,3,4} | {4,5,6}   | {}
>>  3 | {4,5,6} |   | {}
>>
>> The empty array for "common_ids" is obviously incorrect.
> 
> I think you're confused with what the SELECT with the empty FROM
> clause does here.  In your subquery "id_list" is just a parameter from
> the outer query. LEAD(id_list) OVER (ORDER BY id) is never going to
> return anything since those are both just effectively scalar values,
> to which there is no "next" value.

id_list is a column in the table and as you can see in the output 
lead(id_list) most definitely returns the array from the next row. 

and "select unnest(some_array)" works just fine as you can see 
when "next_list" is taken from the derived table. 

Thomas






Re: lead() with arrays - strange behaviour

2019-08-08 Thread Thomas Kellerer
Tom Lane wrote on 08.08.2019 at 16:10:
> David's point is that the two occurrences of lead() don't mean the
> same thing.  A window function is directly tied to the SELECT that
> it is in the select-list of, and its notion of next and previous
> rows is concerned with the set of rows that that SELECT's FROM-clause
> generates.  In this example, the inner SELECT has an empty FROM that
> returns one row, so the lead() in that SELECT doesn't do anything
> useful.

Ah! Now I get it ;) 

Thanks for clearing that up.

> You could probably get where you want to go with something along
> the lines of

Yes, that's what I did in the end (see my initial post)





Re: Determining table and column access based on query

2019-08-13 Thread Thomas Kellerer
Thomas Rosenstein wrote on 13.08.2019 at 12:17:
> we would like to evaluate what data a SQL query will access, is there
> a possibility to ask the Planner directly which tables and columns
> will be affected in the end?
> 

explain (verbose)  will show the output columns for each step. 

Is that what you are looking for? 






Re: SQL equivalint of #incude directive ?

2019-08-30 Thread Thomas Kellerer

stan wrote on 30.08.2019 at 15:48:

I thought this would be common. But a quick Google only revealed what look to be
workarounds.

I am defining a bunch of functions, and I would prefer to store them in a
separate file, which then gets "source" by the main DB init file.

Is there a standard way to do this?



Define "standard".

With psql you can use the \i directive from within a SQL script.
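
As a (made-up) example for the psql route - a main init script run with 
"psql -f init.sql" can simply contain:

   \i tables.sql
   \i functions.sql
   \i grants.sql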

But that does not work with other SQL clients.

It all depends on the SQL client you use to run those scripts.

I use Liquibase to manage schema setup and migration - that has a totally different 
"include" directive then psql.







Re: PG SQL and LIKE clause

2019-09-13 Thread Thomas Kellerer
Matthias Apitz wrote on 13.09.2019 at 07:28:
> We're porting a huge Library Management System, written using all kind
> of languages one can think of (C, C++, ESQL/C, Perl, Java, ...) on Linux
> from the DBS Sybase to PG, millions of lines of code, which works also
> with DBS Oracle and in the past with INFORMIX-SE and -ONLINE.
> 
> We got to know that in CHAR columns with trailing blanks a 

In a nutshell: do not use the CHAR data type (neither in Oracle nor in Postgres)

Unless your application actively checks the data type name of columns in a 
table you can simply change the type of those columns to varchar (or text) and 
get rid of the annoying (but required) behaviour of the CHAR type. 
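
The change itself is a one-liner per column, e.g. (table and column names made up):

   alter table some_table alter column some_col type varchar;

   -- or, if a length limit should be kept:
   alter table some_table alter column some_col type varchar(50);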

To be honest, I am surprised this didn't show up in Oracle, as I think the CHAR 
behaviour there is the same as in Postgres.

Thomas





Re: Advice for geographically dispersed multi master

2019-10-03 Thread Thomas Kellerer

Nikolai Lusan wrote on 03.10.2019 at 11:05:

I have read the various replication and clustering documentation for
postgresql 11, and it looks like what I want is "Synchronous Multimaster
Replication". The organisation I am doing this for does not have the money
to throw at a commercial solution like BDR, but from my reading it looks
like PG v12 may have a method to make such a solution easier to achieve
with stock PG.


You might be interested in this blog post:

https://info.crunchydata.com/blog/a-guide-to-building-an-active-active-postgresql-cluster






Postgres 12: backend crashes when creating non-deterministic collation

2019-10-04 Thread Thomas Kellerer

I was trying to learn how the new non-deterministic collations in v12 work, but 
the following makes the backend crash:

CREATE COLLATION de_ci (provider = icu, locale = 'de-x-icu', deterministic = 
false);

Which leads to:

2019-10-04 11:54:23 CEST   LOG:  server process (PID 7540) was terminated by 
exception 0xC005
2019-10-04 11:54:23 CEST   DETAIL:  Failed process was running:
CREATE COLLATION de_ci (provider = icu, locale = 'de-x-icu', 
deterministic = false)
2019-10-04 11:54:23 CEST   HINT:  See C include file "ntstatus.h" for a 
description of the hexadecimal value.
2019-10-04 11:54:23 CEST   LOG:  terminating any other active server processes
2019-10-04 11:54:23 CEST   WARNING:  terminating connection because of crash of 
another server process
2019-10-04 11:54:23 CEST   DETAIL:  The postmaster has commanded this server 
process to roll back the current transaction and exit, because another server 
process exited abnormally and possibly corrupted shared memory.

This is on Windows 10 with the Postgres 12 binaries from EDB.
Exact Postgres version is: PostgreSQL 12.0, compiled by Visual C++ build 1914, 
64-bit
The database was pg_upgraded if that makes any difference

I might have misunderstood how to use deterministic to create a 
case-insensitive collation, but I don't think the backend should crash if I do 
something wrong ;)

Regards
Thomas





Re: Postgres 12: backend crashes when creating non-deterministic collation

2019-10-04 Thread Thomas Kellerer

Laurenz Albe wrote on 04.10.2019 at 16:04:

I was trying to learn how the new non-deterministic collations in v12
work, but the following makes the backend crash:

CREATE COLLATION de_ci (provider = icu, locale = 'de-x-icu',
deterministic = false);

Which leads to:

2019-10-04 11:54:23 CEST   LOG:  server process (PID 7540) was
terminated by exception 0xC005

I might have misunderstood how to use deterministic to create a case-
insensitive collation, but I don't think the backend should crash if
I do something wrong ;)


Yes, there is a bug somewhere.  FWIW, it works on my Linux system.


It also works on Windows when I specify "correct" locale names - the above 
seems to be an edge case.
Is it worth the effort to report that through the bug reporting form?


To get a case insensitive collation you'd have to use something like

 LOCALE = 'de-DE-u-ks-level2'


Creating works, but apparently on Windows ICU does not support this.

The following works fine on Linux (returns both rows), but not on Windows 
(returns nothing)

  create collation de_ci (provider = icu, locale = 'de-DE-u-ks-level2', 
deterministic = false);
  create table test (name text);
  insert into test values ('FOO'), ('Foo');

  select *
  from test
  where name = 'foo' collate de_ci;

Not a big deal, but might surprise some people.

Thomas




Re: Postgres 12: backend crashes when creating non-deterministic collation

2019-10-04 Thread Thomas Kellerer

Daniel Verite wrote on 04.10.2019 at 18:49:

Creating works, but apparently on Windows ICU does not support this.


After installing v12 on windows with the EDB installer, I notice
that it ships with ICU 53, a relatively old version (2014).

Concerning the problem just above (not the crash), ICU 53 is too old
to support BCP47 tags as collation attributes, as mentioned
at https://www.postgresql.org/docs/12/collation.html :

   "The first example selects the ICU locale using a “language tag” per
   BCP 47. The second example uses the traditional ICU-specific locale
   syntax. The first style is preferred going forward, but it is not
   supported by older ICU versions.

With ICU 53 or older, instead of the locale above, we must use the
old-style syntax:

  locale = 'de-DE@colStrength=secondary'

If you use that in your example, the case insensitive lookups should
work.


That indeed works, thanks a lot.





Re: Postgres 12: backend crashes when creating non-deterministic collation

2019-10-04 Thread Thomas Kellerer

Tom Lane wrote on 04.10.2019 at 16:52:

Is it worth the effort to report that through the bug reporting form?


No, this thread is a sufficient report.  What *would* be a good use
of time is to get a stack trace from the crash, if you can.


I don't know if I did everything correctly, but here it is. I hope it helps


icuuc53.dll!64964a80()  Unbekannt

  icuuc53.dll!64964c2d()Unbekannt
  icuuc53.dll!64966328()Unbekannt
  icuuc53.dll!64965469()Unbekannt
  icuuc53.dll!6495ef28()Unbekannt
  icuuc53.dll!64961501()Unbekannt
  icuuc53.dll!6495b330()Unbekannt
  icuuc53.dll!64959b9e()Unbekannt
  icuin53.dll!64a8bd92()Unbekannt
  postgres.exe!get_collation_actual_version(char collprovider, const char * 
collcollate) Zeile 1533 C
  postgres.exe!DefineCollation(ParseState * pstate, List * names, List * 
parameters, bool if_not_exists) Zeile 218  C
  postgres.exe!ProcessUtilitySlow(ParseState * pstate, PlannedStmt * pstmt, 
const char * queryString, ProcessUtilityContext context, ParamListInfoData * 
params, QueryEnvironment * queryEnv, _DestReceiver * dest, char * 
completionTag) Zeile 1292C
  postgres.exe!standard_ProcessUtility(PlannedStmt * pstmt, const char * 
queryString, ProcessUtilityContext context, ParamListInfoData * params, 
QueryEnvironment * queryEnv, _DestReceiver * dest, char * completionTag) Zeile 
933 C
  postgres.exe!ProcessUtility(PlannedStmt * pstmt, const char * queryString, 
ProcessUtilityContext context, ParamListInfoData * params, QueryEnvironment * 
queryEnv, _DestReceiver * dest, char * completionTag) Zeile 363  C
  postgres.exe!PortalRunUtility(PortalData * portal, PlannedStmt * pstmt, bool 
isTopLevel, bool setHoldSnapshot, _DestReceiver * dest, char * completionTag) 
Zeile 1184 C
  postgres.exe!PortalRunMulti(PortalData * portal, bool isTopLevel, bool 
setHoldSnapshot, _DestReceiver * dest, _DestReceiver * altdest, char * 
completionTag) Zeile 1323   C
  postgres.exe!PortalRun(PortalData * portal, long count, bool isTopLevel, bool 
run_once, _DestReceiver * dest, _DestReceiver * altdest, char * completionTag) 
Zeile 800C
  postgres.exe!exec_execute_message(const char * portal_name, long max_rows) 
Zeile 2098 C
  postgres.exe!PostgresMain(int argc, char * * argv, const char * dbname, const 
char * username) Zeile 4299 C
  [Inlineframe] postgres.exe!BackendRun(Port *) Zeile 4431  C
  postgres.exe!SubPostmasterMain(int argc, char * * argv) Zeile 4953C
  postgres.exe!main(int argc, char * * argv) Zeile 216  C

So it happens somewhere inside the ICU DLL - but I don't have the symbols for 
that






Re: Postgres 12: backend crashes when creating non-deterministic collation

2019-10-05 Thread Thomas Kellerer

Tom Lane schrieb am 04.10.2019 um 19:36:

Hm.  This trace says that the crash happened somewhere down inside ICU
itself, during the ucol_open() call in get_collation_actual_version().
There isn't much we could have done to mess up the arguments to that
function.  That would seem to mean that it's ICU's bug not ours.
Maybe another reason not to be using such an old ICU version :-(.


I would like to test this with a newer ICU version.

So I managed to setup the build environment with Visual Studio, but I can't 
figure out how to enable ICU for the build.

I created a config.pl to specify the location of the downloaded ICU libraries 
with the following content

# Configuration arguments for vcbuild.
use strict;
use warnings;

our $config = {
  icu   => "d:\Projects\postgres\libs\icu\lib64\"# --with-icu=
};

The build is successful, but when I run "install targetfolder" no ICU libraries 
are included and pg_config only shows:

   CONFIGURE = --enable-thread-safety --with-ldap --without-zlib

I have no idea what I am missing.

I also tried building with Msys2 but even when I run configure with the 
--with-icu option, no ICU DLLs are copied

Regards
Thomas





Re: Postgres 12: backend crashes when creating non-deterministic collation

2019-10-05 Thread Thomas Kellerer

Thomas Kellerer schrieb am 05.10.2019 um 13:39:

Hm.  This trace says that the crash happened somewhere down inside ICU
itself, during the ucol_open() call in get_collation_actual_version().
There isn't much we could have done to mess up the arguments to that
function.  That would seem to mean that it's ICU's bug not ours.
Maybe another reason not to be using such an old ICU version :-(.


I would like to test this with a newer ICU version.

So I managed to setup the build environment with Visual Studio, but I can't 
figure out how to enable ICU for the build.


Ah, figured it out.

config.pl has a different format compared to config_default.pl

   $config->{icu}='d:\Projects\postgres\libs\icu'

did the trick, and postgres was built with ICU support.

I can confirm that with ICU 65 the crash does not occur and the case 
insensitive comparison works fine as well.

Regards
Thomas





Re: Case Insensitive Comparison with Postgres 12

2019-10-10 Thread Thomas Kellerer
Igal @ Lucee.org schrieb am 10.10.2019 um 14:41:
> Thank you all for replying.  I tried to use the locale suggested by
> both Laurenz and Pavel, en-US-u-ks-level2, but I'm still getting
> false for a simple comparison of 'Abc' = 'abc'.  I tried the locale
> both as a 'string' and as an "identifier":
> 
>> drop collation if exists case_insensitive;
> 
>> create collation case_insensitive (
>    provider=icu, locale="en-US-u-ks-level2", deterministic=false
> );
> 
>> select 'Abc' = 'abc' collate case_insensitive as is_equal;
> 
> is_equal|
> |
> false   |
> 
> What am I doing wrong here?

Check the version of libicu that your Linux is using. 
That locale format requires version 54 or later.
(My up-to-date CentOS 7.7 for example is still using version 50 and the EDB 
Windows binaries include version 53). 

In another thread about ICU problems, Daniel Verite explained that in more 
detail:

> With ICU 53 or older, instead of the locale above, we must use the old-style 
> syntax:
> 
>  locale = 'de-DE@colStrength=secondary'

In your case I guess, it should be 

   locale = 'en-US@colStrength=secondary'
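
If your libicu is indeed older than 54, a sketch of the complete statements with the
old-style syntax (re-using the collation name from your example) would be:

    drop collation if exists case_insensitive;

    create collation case_insensitive (
        provider = icu,
        locale = 'en-US@colStrength=secondary',
        deterministic = false
    );

    select 'Abc' = 'abc' collate case_insensitive as is_equal;  -- should now return true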

Thomas




JSON vs. JSONB storage size

2019-10-11 Thread Thomas Kellerer
I recently stumbled over the presentation "How to Use JSON in MySQL Wrong" by 
Bill Karwin[1]

While most of the indexing part simply doesn't apply to Postgres, I was curious 
about the statement that the data type of a json value inside the json matters 
as well (Slide 56)

Apparently in MySQL storing {"a": 123456789} takes less space than {"a": 
'123456789'}

So I tested that with Postgres, both using json and jsonb - my expectation was that 
this would be similar in Postgres as well. 

However, it turned out that for a json column there was no difference at all 
(both versions would show up the same with pg_total_relation_size())

The table size with jsonb was bigger in general, but the one with the "integer" 
value was even bigger than the one with the "string" storage. 

The following little test script:

create table json_length_test1 (id serial primary key, d json);
insert into json_length_test1
select i, jsonb_build_object('a', 1234567890)
from generate_series(1,1e6) t(i);

create table json_length_test2 (id serial primary key, d json);
insert into json_length_test2
select i, jsonb_build_object('a', '1234567890')
from generate_series(1,1e6) t(i);

create table jsonb_length_test1 (id serial primary key, d jsonb);
insert into jsonb_length_test1
select i, jsonb_build_object('a', 1234567890)
from generate_series(1,1e6) t(i);

create table jsonb_length_test2 (id serial primary key, d jsonb);
insert into jsonb_length_test2
select i, jsonb_build_object('a', '1234567890')
from generate_series(1,1e6) t(i);

select 'json',
       pg_size_pretty(pg_total_relation_size('json_length_test1'))  as json_int_size,
       pg_size_pretty(pg_total_relation_size('json_length_test2'))  as json_text_size
union all
select 'jsonb',
       pg_size_pretty(pg_total_relation_size('jsonb_length_test1')) as json_int_size,
       pg_size_pretty(pg_total_relation_size('jsonb_length_test2')) as json_text_size;


Returns (Postgres 12, Windows 10)

 ?column? | json_int_size | json_text_size
----------+---------------+----------------
 json     | 71 MB         | 71 MB
 jsonb    | 87 MB         | 79 MB

I am a bit surprised by this (not because the jsonb sizes are generally bigger, 
but that the string value takes less space)

Is this caused by the fact that a string value compresses better internally? 


Thomas

  [1] https://www.slideshare.net/billkarwin/how-to-use-json-in-mysql-wrong




A little confusion about JSON Path

2019-10-16 Thread Thomas Kellerer
Hello,

I don't understand why the following two JSON Path expressions aren't doing the 
same thing in Postgres 12:

with sample (data) as (
  values
('{"k1": {"list":[1,2,3]}}'::jsonb)
)
select data, 
       jsonb_path_exists(data, '$.k1.list.type() ? (@ == "array")'), -- returns true as expected
       jsonb_path_exists(data, '$.k1.list ? (@.type() == "array")')  -- returns false - not expected
from sample;


Apparently "@.type()" returns something different then "$.k1.list.type()"

But maybe I simply don't understand how the @ is supposed to work.

Regards
Thomas




Re: A little confusion about JSON Path

2019-10-18 Thread Thomas Kellerer
Laurenz Albe schrieb am 17.10.2019 um 13:25:
>> I don't understand why the following two JSON Path expressions aren't doing 
>> the same thing in Postgres 12:
>>
>>  jsonb_path_exists(data, '$.k1.list.type() ? (@ == "array")'), -- returns 
>> true as expected
>>  jsonb_path_exists(data, '$.k1.list ? (@.type() == "array")') -- returns 
>> false - not expected
>>
>> Apparently "@.type()" returns something different then "$.k1.list.type()"
>>
> This seems to be a consequence of "lax" mode:
> 
>  "Besides, comparison operators automatically unwrap their operands in the 
> lax mode,
>   so you can compare SQL/JSON arrays out-of-the-box. An array of size 1 is 
> considered
>   equal to its sole element. Automatic unwrapping is not performed only when:
> 
>   - The path expression contains type() or size() methods that return the 
> type and
> the number of elements in the array, respectively.
> 
> (from https://www.postgresql.org/docs/12/functions-json.html)
> 
> with sample (data) as (
>   values
> ('{"k1": {"list":[1,2,3]}}'::jsonb)
> )
> select data,
>jsonb_path_exists(data, '$.k1.list ? (@.type() == "number")'),   
> -- lax mode unwraps the array
>jsonb_path_exists(data, 'strict $.k1.list ? (@.type() == "array")')  
> -- strict mode doesn't
> from sample;


Ah, thanks. I did not see that part. 






Re: Calling jsonb_array_elements 4 times in the same query

2019-10-21 Thread Thomas Kellerer
Alexander Farber schrieb am 21.10.2019 um 15:39:
> I am trying to construct a query, which would draw a game board when given a 
> move id (aka mid):
> 
>     SELECT
>     hand,
>     JSONB_ARRAY_ELEMENTS(tiles)->'col' AS col,
>     JSONB_ARRAY_ELEMENTS(tiles)->'row' AS row,
>     JSONB_ARRAY_ELEMENTS(tiles)->'letter' AS letter,
>     JSONB_ARRAY_ELEMENTS(tiles)->'value' AS value
>     FROM words_moves
>     WHERE action = 'play' AND
>     gid = (SELECT gid FROM words_moves WHERE mid = 391416)
>     AND played <= (SELECT played FROM words_moves WHERE WHERE mid = 391416)
>     ORDER BY played DESC
> 
> The above query works for me and fetches all moves performed in a game id 
> (aka gid) up to the move id 391416.
> 
> 1. Is it okay to call JSONB_ARRAY_ELEMENTS four times in the query, will 
> PostgreSQL optimize that to a single call?

Typically set returning functions should be used in the FROM clause, not the 
SELECT list: 

SELECT
hand,
t.tile -> 'col' AS col,
t.tile -> 'row' AS row,
t.tile -> 'letter' AS letter,
t.tile -> 'value' AS value
FROM words_moves
  cross join jsonb_array_elements(tiles) as t(tile)
WHERE action = 'play' 
  AND gid = (SELECT gid FROM words_moves WHERE mid = 391416)
  AND played <= (SELECT played FROM words_moves WHERE mid = 391416)
ORDER BY played DESC








Re: Calling jsonb_array_elements 4 times in the same query

2019-10-21 Thread Thomas Kellerer
> I am trying to create the following strored function based on your suggestion 
> (and I have forgotten to mention, that I also need the board id aka bid from 
> another table, words_games), but hit the next problem:
> 
> CREATE OR REPLACE FUNCTION words_get_move(
>     in_mid integer
>     ) RETURNS TABLE (
>     out_bid    integer,
>     out_mid    bigint,
>     out_hand   text,
>     out_col    integer,
>     out_row    integer,
>     out_letter text,
>     out_value  integer
>     ) AS
> $func$
>     SELECT
>     g.bid,
>     m.mid,
>     m.hand,
>     (t->'col')::int AS col,
>     (t->'row')::int AS row,
>     (t->'letter')::text AS letter,
>     (t->'value')::int   AS value
>     FROM words_moves m
>     CROSS JOIN JSONB_ARRAY_ELEMENTS(m.tiles) AS t(tile)
>     LEFT JOIN words_games g USING(gid)
>     WHERE m.action = 'play' AND
>     m.gid = (SELECT gid FROM words_moves WHERE mid = in_mid)
>     AND m.played <= (SELECT played FROM words_moves WHERE mid = in_mid)
>     ORDER BY m.played DESC;
> $func$ LANGUAGE sql;
> 
> words_ru=> \i src/slova/dict/words_get_move.sql
> psql:src/slova/dict/words_get_move.sql:28: ERROR:  cannot cast type jsonb to 
> integer
> LINE 17: (t->'col')::int AS col,
>    ^
> 

Use ->> to return the value as text (not as JSONB) and you need to use the 
column alias, not the table alias:

(t.tile ->> 'col')::int
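
A minimal, self-contained illustration of the column alias together with ->>
(the array literal is just made-up sample data):

    select (t.tile ->> 'col')::int as col,
           t.tile ->> 'letter'     as letter
    from jsonb_array_elements('[{"col": 3, "letter": "A"}]'::jsonb) as t(tile);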





Re: Composite type storage overhead

2019-10-23 Thread Thomas Kellerer
> 3. The value is logically defined as a 128-bit integer, that is in
> itself a compound value split into a few "bit groups". Extracting
> these parts can be done by simple (and supposedly efficient) bitwise
> operators when stored as integer, but becomes much more cumbersome
> with UUID, I guess.

This is usually a bad idea. 

Putting logic into the primary key value and merging different types of 
information in a single column is typically not such a good idea. 
(And it violates first normal form to begin with) 

I would strongly recommend revisiting this idea and, e.g., thinking about a 
multi-column primary key instead, where each of these "groups" is stored in a 
separate column and the actual (business) value can be retrieved without any 
bit shifting or similar operations. 
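
A sketch of what that could look like - the column names and the way the value is
split are made up for illustration only:

    create table some_entity
    (
        group_a  integer not null,  -- first "bit group"
        group_b  integer not null,  -- second "bit group"
        group_c  bigint  not null,  -- remaining bits
        payload  text,
        primary key (group_a, group_b, group_c)
    );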

Thomas






Re: Search path

2019-10-24 Thread Thomas Kellerer
stan schrieb am 24.10.2019 um 18:41:
> I just was educated on the security issues of search path. As a result
> I am going to define a schema for the project we are working on.
> I set this in  postgresql.conf
> 
> search_path = 'ica , "$user", public'
> 
> Here is the question. Will this path be in effect for users connecting from
> MS Access clients? The name of this file makes me think that it only
> controls instances of psql, but I need this to be effective for all
> connections to the database.

As Adrian already pointed out, yes this will work for any connection including 
one from MS Access.

Note that you can also set this on a per user level, if you don't want all 
users having the same setting.

   ALTER USER ms_access_user SET search_path = ica, "$user", public;




 





Re: Upgrade PGSQL main version without backup/restore all databases?

2019-11-06 Thread Thomas Kellerer
Durumdara schrieb am 06.11.2019 um 14:09:
> We have PGSQL 9.6.xxx on a Linux server which heavily used.
> More than 100 databases, and more than 300 active users, and it is a master 
> of a cluster (the data replicated on a slave).
> 
> Somewhere we have read that 9.6 will become unsupported shortly.

"Shortly" is relative. It will fall out of support in 2021: 
https://www.postgresql.org/support/versioning/

But it's a good idea to plan the upgrade now. 


> We can't stop to lock out all users, make a dumpall, upgrade, restore
> them all in new version, and then leave them to connect (200 GB of
> data), because it is too long.
> 
> Is there any way to upgrade PG and databases without backup/restore?

Yes, you can use pg_upgrade. However, it will still copy the 200GB (using a 
file copy, not dump/restore), so it could still take some time. 

If you use it with the --link option, the upgrade will be very quick as only 
the catalog tables need to be copied (export/import).

 
> Maybe the solution is to upgrade slave without sync the data changes
> before; and if all is newer, leave to get the new data from the
> master. I don't know it's possible or not. The newer PG slave could
> make mistakes if the master have lower PG version...

There are some ways to do a near-zero upgrade using logical replication, but 
it's not easy to configure.

See this blog post for example: 
https://www.cybertec-postgresql.com/en/upgrading-postgres-major-versions-using-logical-replication/

Thomas






How to convert return values from JSON Path functions to text

2019-11-06 Thread Thomas Kellerer

The new JSON path functions in Postgres 12 are really convenient; however, I 
cannot figure out how to properly convert their return values to a text value.

E.g. select jsonb_path_query_first('{"a": "foo"}', '$.a') returns a JSONB value.
Casting it to text still yields "foo" (with quotes), rather than foo (without 
quotes).

For the time being I am using something like this:

create function jsonb_to_text(p_value jsonb)
  returns text
as
$$
  select case jsonb_typeof(p_value)
   when 'string' then trim('"' from p_value::text)
   else p_value::text
 end;
$$
language sql
immutable
strict;
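
Used e.g. like this, it returns the unquoted value:

    select jsonb_to_text(jsonb_path_query_first('{"a": "foo"}', '$.a'));
    -- returns: foo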

But that feels a bit "heavyweight" - I was hoping for an easier (and more 
efficient) way to do that.

Thomas






Re: SQL SERVER migration to PostgreSql

2019-11-07 Thread Thomas Kellerer
İlyas Derse schrieb am 07.11.2019 um 14:28:
> I'm trying to migration to PostgreSql from SQL Server. I have Stored
> Procedures what have output parameters and returning tables.But you
> know what, we can not returning tables in stored procedures in
> PostgreSql and we can not use output parameters in functions in
> PostgreSql.

The correct migration path is to rewrite them to set-returning functions and 
use them in the FROM clause:

so instead of 

   sp_foobar 42;

use

  select *
  from fn_foobar(42);
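
A minimal sketch of such a set-returning function - the table and column names are
made up, the real function would of course contain the logic of the procedure:

    create or replace function fn_foobar(p_id integer)
      returns table (id integer, name text)
    as
    $$
      select e.id, e.name
      from employees e
      where e.department_id = p_id;
    $$
    language sql
    stable;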

Thomas




Re: INOUT PARAMETERS WITH RETURN TABLES IN FUNCTION

2019-11-08 Thread Thomas Kellerer
İlyas Derse schrieb am 08.11.2019 um 09:18:
> I'm trying to migration to PostgreSql from SQL Server.  I have Stored 
> Procedures what have output parameters and return tables. How can i do both 
> together. 
> 
> CREATE or replace FUNCTION public."test" (INOUT "x" integer, INOUT "y" 
> character varying(36))
> RETURNS TABLE  (
> "id" integer,
> "filesize" character varying(36)
> )
> AS $$
>  BEGIN
>   x=6;
> RETURN QUERY
> SELECT * FROM    public."tbl_employees" ;
> 
> END;
> $$ LANGUAGE plpgsql;
> 
> I can not create that because of inout parameters.
> Another place;
> 
> do $$
> DECLARE b integer = 1;
> DECLARE d integer = 2 ;
> BEGIN
>   select * from public."test"();
> END;
> $$;
> 
>  Anybody have an idea ? 


Can't you just include the "out" parameters in the result? 

CREATE or replace FUNCTION public.test(x integer, y character varying(36))
  RETURNS TABLE (id integer, filesize character varying(36), x_out integer, y_out varchar)
AS $$
begin
  x := 42;
  y := 'foo';

  RETURN QUERY
    SELECT t.*, x, y
    FROM public.tbl_employees t;
END;
$$ LANGUAGE plpgsql;
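
It can then be called like any other set-returning function:

    select *
    from public.test(1, 'anything');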

It's different because x and y are repeated for every row, but that's the only 
thing I can think of.





Re: How should I specify work_mem/max_worker_processes if I want to do big queries now and then?

2019-11-20 Thread Thomas Kellerer
James(王旭) schrieb am 20.11.2019 um 08:56:
> I am doing a query to fetch about 1000 records in one time. But
> the query seems very slow, like "mission impossible". I am very
> confident that these records should be fit into my shared_buffers
> settings(20G), and my query is totally on my index, which is this
> big:(19M x 100 partitions), this index size can also be put into
> shared_buffers easily.(actually I even made a new partial index which
> is smaller and delete the bigger old index)
> 
> This kind of situation makes me very disappointed.How can I make my
> queries much faster if my data grows more than 1000 in one
> partition? I am using pg11.6.

max_worker_processes can't be changed without a restart. 

But work_mem and max_parallel_workers_per_gather can be changed for a single 
session.

If you normally don't need parallel execution, I would suggest to configure 
max_worker_processes and max_parallel_workers to a sensible "maximum" value.
Set max_parallel_workers_per_gather to a very low value or even 0 if you want 
to disable it by default. 

Then, when you run a really "big" query, you can set 
max_parallel_workers_per_gather to a sensible value to make the query use 
parallel execution and increase work_mem so that potential sort, hash or 
grouping operations don't spill to disk. To give you some hints there we would 
need to see the current execution plan generated using explain (analyze, 
buffers, format text). If you turn on track_io_timing before that, it would be 
even more helpful.

Thomas









Re: Return Table in StoredProceure/Function

2019-11-20 Thread Thomas Kellerer
İlyas Derse schrieb am 20.11.2019 um 09:18:

> How can I return table in Stored Procedure ? I can do it in function but I
> have inout parameters.So I can not create in function. What can I do this
> case ?
> 
> I guess,It should be like for function :
> 
> CREATE or REPLACE FUNCTION public."test" (INOUT "x" integer, INOUT "y" text)
> 
> RETURNS TABLE  ("id" integer,"filesize" character varying(36))AS $$
>  BEGINRETURN QUERYSELECT * FROMpublic."tbl_employees" ;
> END;$$ LANGUAGE plpgsql;
> 

Procedures aren't meant to return anything (in Postgres specifically and in 
Computer Science in general). 

If you want to return something use a function.






Re: Return Table in StoredProceure/Function

2019-11-21 Thread Thomas Kellerer
Tony Shelver schrieb am 21.11.2019 um 07:33:
> Well then SQL Server breaks that rule big time :)

I am aware of that - but at the end it's essentially the only DBMS (except for 
Sybase because of their common roots) that works that way.

A migration from SQL Server to Oracle (or MySQL or DB2 or Firebird) would have 
the same problems.

> Most people coming from a SQL Server background expect procedures to
> return a result set that can be queried, and in-out or out parameters
> to return variables for further information.

One very important aspect of a migration is to also migrate your mindset and 
the way you solve problems (especially when migrating between two products that 
behave so differently).
The best practices for System A are not always the best practices for System B

Insisting on "But this is the way we did it in System A" is a pretty sure 
recipe for a failing migration. 

Note that this is true in both directions: if you apply best practices from 
Postgres or Oracle when migrating _to_ SQL Server you are in for very nasty 
surprises as well.

Thomas







Re: Why are clobs always "0"

2019-12-01 Thread Thomas Kellerer

Arnie Morein schrieb am 01.12.2019 um 18:31:

I have tested the most recent driver in three different SQL IDEs, and
now with an application I'm writing that uses JDBC metadata, the
comment on a field definition also isn't available as a string
value.

The only thing I ever see regarding data type "text" field values are
either a 0 or a 1; neither of which applies.

So why is this happening, even from the JDBC metadata results as
well?


The Postgres JDBC driver does not have any problems with the "text"
data type, neither with reporting it properly through DatabaseMetaData
(it's reported as Types.VARCHAR) nor with retrieving values from
such a column.

The column size for such a column is reported as 2147483647

Column comments are reliably returned in the column "REMARKS" in the
result of DatabaseMetaData.getColumns()

Given the following table:

create table test (data text);
comment on column test.data is 'The text column';

then the following Java code:

ResultSet rs = connection.getDatabaseMetaData().getColumns(null, "public", "test", "%");
rs.next();
System.out.println("column_name: " + rs.getString("COLUMN_NAME"));
System.out.println("column_size: " + rs.getInt("COLUMN_SIZE"));
System.out.println("data_type: " + rs.getInt("DATA_TYPE"));
System.out.println("remarks: " + rs.getString("REMARKS"));

will output:

  column_name: data
  column_size: 2147483647
  data_type: 12
  remarks: The text column

With 12 being the value of java.sql.Types.VARCHAR

Thomas





Re: upgrade and migrate

2019-12-03 Thread Thomas Kellerer
Michael Paquier schrieb am 04.12.2019 um 05:48:
> On Tue, Dec 03, 2019 at 10:32:22PM +, Julie Nishimura wrote:
>> Hello, what is the best way to migrate from PostgreSQL 8.3.11 on
>> x86_64-redhat-linux-gnu to PostgreSQL 9.6.16 on x86_64-pc-linux-gnu
>> server, with minimal downtime?
>> The caveat is the source has about 80 databases overall almost 30
>> TB. I could migrate the smallest ones (up to 1 tb) using pg_dump and
>> pg_restore, but the largest hot database is almost 17 tb, and I am
>> not sure how to approach this effort in a better and efficient way?
> 
> pg_upgrade could be one way to go here.  That's not the scale pg_dump
> would be very good at.  

But pg_upgrade only supports 8.4+. 

At least according to the manual.





Identity columns, DEFAULT keyword and multi-row inserts

2019-12-09 Thread Thomas Kellerer
Hello,

assume the following table:

create table test 
(
  id integer not null  generated always as identity,
  data integer not null 
);

The following insert works fine:

insert into test (id, data)
values (default,1);


However, a multi-row insert like the following:

insert into test (id, data)
values 
  (default,1),
  (default,2);


fails with: 

ERROR: cannot insert into column "id"
  Detail: Column "id" is an identity column defined as GENERATED ALWAYS.
  Hint: Use OVERRIDING SYSTEM VALUE to override.


My question is: 

* If DEFAULT is not allowed for identity columns, then why does the single-row 
insert work? 
* If DEFAULT _is_ allowed, then why does the multi-row insert fail? 

The above happens with Postgres 10,11 and 12

Regards
Thomas




Re: Identity columns, DEFAULT keyword and multi-row inserts

2019-12-10 Thread Thomas Kellerer
Patrick FICHE schrieb am 10.12.2019 um 08:56:
>> -Original Message-
>> From: Thomas Kellerer  
>> 
>> assume the following table:
>> 
>> create table test 
>> (
>>   id integer not null  generated always as identity,
>>   data integer not null 
>> );
>> 
>> However, a multi-row insert like the following:
>> 
>> insert into test (id, data)
>> values 
>>   (default,1),
>>   (default,2);
>> 
>> fails with: 
>> 
>> ERROR: cannot insert into column "id"
>>   Detail: Column "id" is an identity column defined as GENERATED ALWAYS.
>>   Hint: Use OVERRIDING SYSTEM VALUE to override.
>> 
>> 
>> My question is: 
>> 
>> * If DEFAULT is not allowed for identity columns, then why does the 
>> single-row insert work? 
>> * If DEFAULT _is_ allowed, then why does the multi-row insert fail? 
>
>
> I agree that it does not seem very consistent.
>
> But is there any specific reason why are you using DEFAULT ?
> 
> If you want / have to specify DEFAULT, then you should probably
> create your identity as "generated by default".

I don't really need (or use) it, I just stumbled upon this: 
https://stackoverflow.com/questions/59261048

And I think if the single-row insert is allowed, the multi-row one should be as 
well. 

Not sure if this is a bug - and if it is, which one is the bug: the failing 
statement or the working one?




 
 





Re: READ UNCOMMITTED in postgres

2019-12-18 Thread Thomas Kellerer
Matthew Phillips schrieb am 19.12.2019 um 00:12:
> Hi, With the current READ UNCOMMITTED discussion happening on
> pgsql-hackers [1], It did raise a question/use-case I recently
> encountered and could not find a satisfactory solution for. If
> someone is attempting to poll for new records on a high insert volume
> table that has a monotonically increasing id, what is the best way to
do it? As is, with a naïve implementation, rows are not guaranteed to
> appear in monotonic order; so if you were to keep a $MAX_ID, and
> SELECT WHERE p_id > $MAX_ID, you would hit gaps. Is there a clean way
> to do this? I've seen READ UNCOMMITTED used for this with DB2.

In my understanding READ UNCOMMITTED in other databases is typically used to 
avoid read-locks which Postgres doesn't have. 
So I wonder what benefits READ UNCOMMITTED would have to begin with.

But, if you want to poll for new rows, then why don't you use a timestamp 
column?

  select *
  from the_table
  where created_at >= :last_poll_timestamp;  -- the timestamp of the previous poll

  




Re: BigSQL pgc alternative

2019-12-22 Thread Thomas Kellerer

Samuel Teixeira Santos schrieb am 20.12.2019 um 16:15:

BigSQL still allow to install Postgres and others resources as like a
portable install.

But today, it's only offer your tool (pgc) for newer Postgresql
versions.

I would like to install as portable option because it's more easy to
config and use in my own user in my development station.

What you recommend to do to replace pgc tool as alternative to
install  postgresql 10 and the respective postgis version as like
portable option?


On Windows, I use the ZIP archives provided by Enterprise DB to spin up 
Postgres without an installer:

https://www.enterprisedb.com/download-postgresql-binaries

Unfortunately they stopped providing them for Linux (for which I used pgc in 
the past), 10.x is the last version for which they provide this.





Re: SQL operator '*='

2019-12-23 Thread Thomas Kellerer

Matthias Apitz schrieb am 23.12.2019 um 15:33:

I've here a smaller problem of our porting from Sybase/Oracle/Informix
code to PostgreSQL; the code reads for the mentioned DBS:


#ifdef DBSORA
EXEC SQL DECLARE land_cursor CURSOR FOR
SELECT stammprio, lkz, landbez, plkz, postbez, karenz1, karenz2,
karenz3, land.wkz, webez, we, kurs, land.del
FROM   land, devisen
WHERE  land.wkz = devisen.wkz (+) AND land.brgroup = 
devisen.brgroup (+) AND land.brgroup = :brgroupHost_for_helpland_cursor
ORDER  BY stammprio, landbez;
#endif

#ifdef DBSSYB
EXEC SQL DECLARE land_cursor CURSOR FOR
SELECT stammprio, lkz, landbez, plkz, postbez, karenz1, karenz2,
karenz3, land.wkz, webez, we, kurs, land.del
FROM   land, devisen
WHERE  land.wkz *= devisen.wkz AND land.brgroup *= 
devisen.brgroup AND land.brgroup = :brgroupHost_for_helpland_cursor
ORDER  BY stammprio, landbez;
#endif

(the code for DBSPOS was just copied from Sybase). It compiles fine but
raises on execution en error about operator '*=' is not supported...


T-SQL (Sybase and SQL Server) uses *= for outer joins, just as Oracle uses (+)

Haven't used either of those outdated operators in decades, but I think the 
equivalent would be:

FROM land
  LEFT JOIN devisen
         ON land.wkz = devisen.wkz
        AND land.brgroup = devisen.brgroup
WHERE land.brgroup = :brgroupHost_for_helpland_cursor





Re: Date created for tables

2019-12-24 Thread Thomas Kellerer

Ron schrieb am 24.12.2019 um 03:14:

Having moved to PostgreSQL from Oracle a few years ago I have been generally
very impressed by Postgres, but there are a few things that I still miss. One
of those is being able to see the created and last modified dates for database
objects.

Is this something that has been considered for implementation?

I wrote a blog about this:

https://momjian.us/main/blogs/pgblog/2017.html#November_21_2017


You all are *grossly* over-complicating this.

By creation time, "we DBAs" think the time we ran "CREATE object", not when 
pg_dump, pg_basebackup and pg_update ran.

Likewise, modification time is when we last ran an ALTER command ran, not when 
VACUUM ran (that's tracked elsewhere) or DML ran.

That's all.


+1

Although I don't really need this, there were a few situations where this came 
in handy in Oracle.

I think _any_ tracking would already help those people that need something like 
that.
Simply picking the easiest implementation and documenting the situations where 
those columns are updated would probably be enough.






Re: How can I set a timeout for a locked table in Function ?

2020-01-03 Thread Thomas Kellerer

Michael Lewis schrieb am 03.01.2020 um 18:00:


Why take an exclusive lock on an entire table to update a single row?


That's what I was asking myself as well.






Re: Multiple Aggregations Order

2020-01-14 Thread Thomas Kellerer

João Haas schrieb am 14.01.2020 um 18:26:

I'm working on a query where I need to fetch information from a table
along with some data from a many-to-many connection table in a single
query. My idea is to do an outer join with the connection query and
aggregate the needed data in multiple 'array_agg's, and then handle
this aggregated data later in code.

The issue is, there are a lot of aggs (4 by now, may increase later),
and I need to order these by a 'order' field on the connection table.
I can put an 'ORDER BY "order"' statement inside each 'array_agg',
but I don't think that would be the most efficient way. Doing the
join with a sorted connection table didn't work for me as well,
probably due to other joins on the query. I tried doing some stuff
with subqueries, but all attempts ended up in either failure or
increased query time.



What about aggregating into a single jsonb array?
You lose some of the data type information, but maybe that's OK for the backend 
that processes the data.

Something along the lines:

  SELECT tb.*,
         array_length(tree.tree_path, 1) AS depth,
         jsonb_agg(jsonb_build_object('child_id', conn.child_id,
                                      'kind', conn.kind,
                                      'restrictions', conn.restrictions)
                   ORDER BY conn."order")
  FROM tb
  ...
  GROUP BY ...





Re: Declare variable from other variable

2020-02-05 Thread Thomas Kellerer
Raul Kaubi schrieb am 05.02.2020 um 12:21:
> How can I declare another variable from another variable.
> Basically from oracle, I can just: 
>
> var1 := 'asda'||var2;
>
> In postgres, I have the following example, I would like to use variable j to 
> add number of months there.
>
> " interval 'j month')::date; "
>
>
> DO $$
> DECLARE
> v_var integer := 1;
> v_from_date date;
> BEGIN
> for j in 0..v_var LOOP
> v_from_date := (date_trunc('month',current_date) + interval 'j 
> month')::date;
> RAISE NOTICE '%', v_from_date;
> END LOOP;
> END;
> $$ LANGUAGE plpgsql;

The easiest way is to use make_interval()

v_from_date := (date_trunc('month', current_date) + make_interval(months => j))::date;


But it sounds as if generate_series() is what you are really looking for.
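
E.g. the loop from your example boils down to a plain query (0..1 being the range
from your DO block):

    select (date_trunc('month', current_date) + make_interval(months => g.i))::date as from_date
    from generate_series(0, 1) as g(i);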




Re: Backup & Restore

2020-02-25 Thread Thomas Kellerer
sivapostg...@yahoo.com schrieb am 25.02.2020 um 02:55:
> Can u suggest a good backup solution for a windows installation ?
> Looks like the suggested two [ pgbarman, pgbackrest ] works only in
> Linux.
pg_probackup provides Windows binaries: 
https://github.com/postgrespro/pg_probackup/






Re: Postgresql 12.x on Windows (vs Linux)

2020-03-02 Thread Thomas Kellerer


Robert Ford schrieb am 02.03.2020 um 15:42:
> I am aware that this might be a broad question, but I am not
> expecting *very *specific answers either:
>
> When it come to running a modern PostgreSQL server, which serves say
> 1 TB of data, are there substantial differences in performance
> between Windows Server 2019 and Linux today?  I know there are some
> issues with shared_buffers and windows, but does it manifest in real
> performance issues?
>
> I have searched the web for this question, but is surprised to find
> very few concrete benchmarks on this.  Anyone with real world
> experience on this topic?

I think the major Linux file systems do still outperform NTFS.

But you might want to consider something else:

Most of the interesting extensions and tools only work on Linux - that includes 
administration tools like barman, pg_back or repmgr.

So preferring one over the other is not only about performance, but might also 
be about administration.






Re: How to get RAISE INFO in JDBC

2020-03-20 Thread Thomas Kellerer

Ravi Krishna schrieb am 20.03.2020 um 15:48:

How do we get the notification mentioned in RAISE INFO of PL/PGSQL in JDBC.
I can get it in psql, but not in JDBC.

I am talking about redshift which has the same procedural language like 
postgres.


I can only answer this for Postgres.
As Redshift has its own JDBC driver, I don't know if this applies to that as 
well:

All messages from a RAISE are reported as SQL Warnings on the Statement 
instance that you used.

So you can use Statement.getWarnings() to retrieve the RAISE output.
Note that you need a loop that checks SQLWarning.getNextWarning() to find out 
if there was more than one RAISE.

Thomas






Re: dbeaver

2020-04-01 Thread Thomas Kellerer

negora schrieb am 01.04.2020 um 21:44:

It has lots of useful features, such as good query completion, row
coloring, virtual columns, virtual foreign keys

What kind of feature is "virtual foreign keys"?

Or "virtual columns" in the context of a SQL GUI tool






Re: script libraries?

2020-04-30 Thread Thomas Kellerer

Chris Stephens schrieb am 30.04.2020 um 15:54:

as another Oracle DBA trying to pick up Postgresql one thing i
haven't come across are script libraries such as there are for Oracle
(https://github.com/tanelpoder/tpt-oracle and
https://oracle-base.com/dba/scripts as examples).

Does anything like that exist for PG? Would be nice to see how people
navigate through PG on the command line.


The Postgres Wiki has some scripts:

https://wiki.postgresql.org/wiki/Category:Snippets

Then there is a collection from pgExperts:

https://github.com/pgexperts/pgx_scripts






Re: pg_temp schema created while using DB Link

2020-05-08 Thread Thomas Kellerer
Jill Jade schrieb am 08.05.2020 um 09:32:
> We do not have any temporary tables in the Oracle database. The
> temporary schema is created only while using the db_link. These temp
> schemas (pg_temp, pg_toast) are not disappearing even after
> restarting the database.

The temp tables would be created in Postgres, not in Oracle.

> Why the db_link is causing these schemas? Is there any issue with db_link?

Which DB_Link are you referring to exactly?

Postgres' "dblink" module can only connect to another Postgres server.
So it must be something on the Oracle side.






Re: Oracle vs. PostgreSQL - a comment

2020-06-01 Thread Thomas Kellerer

Stefan Knecht schrieb am 01.06.2020 um 07:36:

Oracle is also the single most feature-rich database out there - the
feature set of Postgres isn't even 1% of what Oracle has.


I try to stay out of discussions like this, but the above is simply
not true.

Oracle indeed has more features, but 1% is far from correct.
Monitoring and analyzing performance problems (using AWR, ASH) is one point,
rolling upgrades without downtime are another one.

I'd say something in the vicinity of 80% or 90%, depending on which features
you find more important, would be more realistic.

But then Postgres has features that Oracle has not, like
transactional DDL, a much richer set of data types (Oracle still
has no proper DATE or BOOLEAN type) and I think the extension system
is something that Oracle lacks as well (at least I am not aware
of any API that would let self-written code e.g. influence the
query optimizer). Postgres also has a more flexible indexing infrastructure
and its full text search is much more stable and reliable.

So bottom line is - as far as I see it: you can't really come up with a 
percentage.

From a DBA's point of view, the percentage is probably lower than 80%; from
a developer's point of view, Oracle lacks a lot of things and the percentage
would be greater than 100%.

My €0.02




Re: Using JSONB with nested key-value structures, and skipping/wildcarding the key

2020-06-02 Thread Thomas Kellerer
> One question: as a novice here, I think I understand the right hand
> side of your JOIN "... k(value)" is shorthand for:
>
> ... AS table_name(column_name)
>
> except that I don't see any clues in the docs that
> jsonb_object_keys() is a "table function".> Can you kindly clarify?

The clue is in the column "return type" which states: "setof text" for 
jsonb_object_keys()
A function returning "setof" is the same as a "table function"





Re: Oracle vs. PostgreSQL - a comment

2020-06-02 Thread Thomas Kellerer

Ron schrieb am 02.06.2020 um 20:38:



   PG's individual clusters are relatively lightweight, after all.


But require a new port, and Enterprises have Processes that must be followed.


I am not 100% sure, but I think you can get around that by putting pgPool or 
pgBouncer in front and making all connections through that (with a single port).







Re: Monitoring for long running transactions

2020-06-04 Thread Thomas Kellerer

Samuel Smith schrieb am 04.06.2020 um 21:59:

Sorry, I should have clarified that I was aware of the
pg_stat_activity table. That is how we found the problem in the first
place. And yes I could just write a bash script and run it in cron. I
just didn't know if there was a more "official" way to go about this
since it is probably a common monitoring point and/or if something
like this was already made.


What about setting idle_in_transaction_session_timeout to some reasonably high 
value (15 minutes?)

Then you would get an error in your application because the connection was 
terminated and you would learn early about the problem.
Even with monitoring enabled, you would probably still kill those sessions 
manually as the application most probably can't commit them properly any more.
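
Setting it could look like this (15 minutes and the user name are just examples):

    -- for the whole instance (requires a configuration reload)
    alter system set idle_in_transaction_session_timeout = '15min';
    select pg_reload_conf();

    -- or only for a specific application user
    alter user app_user set idle_in_transaction_session_timeout = '15min';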

Thomas






Re: Oracle vs. PostgreSQL - a comment

2020-06-05 Thread Thomas Kellerer
Achilleas Mantzios schrieb am 05.06.2020 um 14:05:
>> Plus PG does not directly support cross database queries using 3 part name, 
>> something
>> sqlserver excels at.
>
> Maybe because SQL server does not have real databases but schemas instead ?
> This sucks security wise.

That is wrong.

SQL Server has both databases and schemas, and it allows for standard-compliant 
catalog.schema.table references.

I think you are confusing that with MySQL, where a schema and a database are the 
same thing.




Re: checking existence of a table before updating its SERIAL

2020-06-08 Thread Thomas Kellerer
Matthias Apitz schrieb am 08.06.2020 um 09:53:
> We're updating the SERIAL of a bunch of tables with a SQL script which
> does for any table:
>
> /* table: idm_tasktab */
> DO $$
> DECLARE
>   max_id int;
> BEGIN
>   SELECT INTO max_id GREATEST(COALESCE(max(taskid), 0),0) + 1 FROM 
> idm_tasktab;
>   RAISE NOTICE '% % %', 'idm_tasktab', 'taskid', max_id ;
>   EXECUTE 'ALTER SEQUENCE idm_tasktab_taskid_seq RESTART ' || max_id::text;
> END $$ LANGUAGE plpgsql;
>
> Can some kind soul help me with doing a test for the existence of the
> table to avoid the error message about non existing relation?


I think the easiest way is to use to_regclass():

DO $$
DECLARE
  max_id int;
BEGIN
  if to_regclass('idm_tasktab') is not null then
SELECT INTO max_id GREATEST(COALESCE(max(taskid), 0),0) + 1 FROM 
idm_tasktab;
RAISE NOTICE '% % %', 'idm_tasktab', 'taskid', max_id ;
EXECUTE 'ALTER SEQUENCE idm_tasktab_taskid_seq RESTART ' || max_id::text;
  end if;
END $$ LANGUAGE plpgsql;

Note that you don't really need dynamic SQL for this, you can simplify this to:

  select setval('idm_tasktab_taskid_seq', GREATEST(COALESCE(max(taskid), 0),0))
  from idm_tasktab;


I also don't think greatest() is necessary.

Thomas




  1   2   3   4   >