checking existence of a table before updating its SERIAL

2020-06-08 Thread Matthias Apitz


Hello,

We're updating the SERIAL of a bunch of tables with a SQL script which
does for any table:

/* table: idm_tasktab */
DO $$
DECLARE
  max_id int;
BEGIN
  SELECT INTO max_id GREATEST(COALESCE(max(taskid), 0),0) + 1 FROM idm_tasktab;
  RAISE NOTICE '% % %', 'idm_tasktab', 'taskid', max_id ;
  EXECUTE 'ALTER SEQUENCE idm_tasktab_taskid_seq RESTART ' || max_id::text;
END $$ LANGUAGE plpgsql;

Can some kind soul help me with doing a test for the existence of the
table, to avoid the error message about a non-existent relation?

Thanks in advance.

matthias

-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub




Re: checking existence of a table before updating its SERIAL

2020-06-08 Thread David G. Johnston
On Monday, June 8, 2020, Matthias Apitz  wrote:

>
> Can some kind soul help me with doing a test for the existence of the
> table, to avoid the error message about a non-existent relation?
>


 https://www.postgresql.org/docs/12/catalogs-overview.html

David J.


Re: Multitenant architecture

2020-06-08 Thread Vasu Madhineni
Hi All,

Thanks a lot for information, I will look into it and get back to you.

Regards,
Vasu Madhineni

On Sun, Jun 7, 2020 at 1:21 AM Michel Pelletier 
wrote:

>
> On Sat, Jun 6, 2020 at 3:14 AM Vasu Madhineni 
> wrote:
>
>> Hi Rob,
>>
>> Our environment is medical clinical data, so each clinic is a tenant.
>> Approximately 500+ tenants with 6TB data.
>>
>> Thank you in advance.
>>
>>
> There's a good article on the AWS blog on multi-tenancy with Postgres:
>
>
> https://aws.amazon.com/blogs/database/multi-tenant-data-isolation-with-postgresql-row-level-security/
>
> This is similar to Laurenz's second suggestion.
>
> -Michel
>
>
>> Regards,
>> Vasu Madhineni
>>
>> On Fri, Jun 5, 2020 at 6:09 PM Rob Sargent  wrote:
>>
>>>
>>>
>>> On Jun 5, 2020, at 2:54 AM, Vasu Madhineni 
>>> wrote:
>>>
>>> 
>>> If the data size is more than 6TB, which approach is better?
>>>
>>> On Fri, Jun 5, 2020 at 2:57 PM Laurenz Albe 
>>> wrote:
>>>
 On Thu, 2020-06-04 at 23:52 +0800, Vasu Madhineni wrote:
 > We are planning a POC on multitenant architecture in Postgres, Could you
 > please help us with steps for multitenant using a schema for each
 > application model.

 For a few tenants, you can keep identical tables in several schemas and
 set "search_path" to select a tenant.

 With many tenants, you are better off with one table that holds the
 data for all clients.  You can use Row Level Security to have each
 tenant see only his or her data, and it might be a good idea to
 use list partitioning on the tenant ID.

 Yours,
 Laurenz Albe
 --
 Cybertec | https://www.cybertec-postgresql.com

 The question is: how many separate data owners?
>>>
>>
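For reference, a minimal sketch of the RLS approach Laurenz describes (every
name below is an assumption for illustration, not from this thread):

    -- one shared table for all tenants; each clinic is identified by tenant_id
    CREATE TABLE clinic_data (tenant_id text NOT NULL, payload jsonb);

    ALTER TABLE clinic_data ENABLE ROW LEVEL SECURITY;

    -- each session announces its tenant, e.g. SET app.current_tenant = 'clinic42';
    -- note: RLS does not apply to the table owner unless FORCE ROW LEVEL SECURITY is set
    CREATE POLICY tenant_isolation ON clinic_data
        USING (tenant_id = current_setting('app.current_tenant', true));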


Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

2020-06-08 Thread Thorsten Schöning
Hi all,

I have an app exposing web services to generate reports. Those web
services accept multiple reports per request and calculate them 
concurrently. There's one transaction spanning each entire request and
used by ALL spawned threads. The app makes sure that e.g. committing
transactions is handled by ONE thread only and individual statements
are NEVER reused by multiple threads.

There are two types of reports: Some need to read data from the DB
themselves, others can work on the data provided by the former ones. The
current implementation always creates temporary tables for results of
some query, so that one can simply check if some temporary table is
available and either only query that or create it before as necessary.

I'm using "CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]" and the
associated queries can take a long time. So the following lists some
questions about executing those concurrently, even though I've
already read threads like the following:

> The bottom line is that CREATE TABLE IF NOT EXISTS doesn't pretend
> to handle concurrency issues any better than regular old CREATE
> TABLE, which is to say not very well.[...]

https://www.postgresql.org/message-id/ca+tgmozadyvtwbfp1fl2smzbihcwt4uprzrlnnx1nb30ku3...@mail.gmail.com

When the table needs to be created, when is it visible to other
threads using the same transaction, before or after executing the
additional query?

Am I correct that with "IF NOT EXISTS" the associated query is
only executed if the table actually needs to be created? In theory
those two things could be independent of each other, maybe resulting
in duplicated rows or the like per execution, but that doesn't seem
to be the case according to my tests.

Am I correct that in my described setup I need to make sure on my own
that only one thread creates each individual temporary table and
executes the associated query? Otherwise it might happen that multiple
attempts to create the table at the same time simply fail, and the only
question is whether this happens before or after the associated query.

Or would creating the table itself succeed even with many threads,
with some of them simply not reading any data, because the
associated query is executed in a second step by the thread that
created the table in the first place? I don't think so, though.

Am I correct that because of the same transaction used by multiple
threads I need to synchronize them on web service-level? E.g. things
like advisory locks won't work because they are session- or
transaction-scoped and would be granted to all threads instantly.

Thanks for your insights!

Mit freundlichen Grüßen,

Thorsten Schöning

-- 
Thorsten Schöning   E-Mail: thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme  http://www.AM-SoFT.de/

Telefon...05151-  9468- 55
Fax...05151-  9468- 88
Mobil..0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow





Re: checking existence of a table before updating its SERIAL

2020-06-08 Thread Thomas Kellerer
Matthias Apitz schrieb am 08.06.2020 um 09:53:
> We're updating the SERIAL of a bunch of tables with a SQL script which
> does for any table:
>
> /* table: idm_tasktab */
> DO $$
> DECLARE
>   max_id int;
> BEGIN
>   SELECT INTO max_id GREATEST(COALESCE(max(taskid), 0),0) + 1 FROM idm_tasktab;
>   RAISE NOTICE '% % %', 'idm_tasktab', 'taskid', max_id ;
>   EXECUTE 'ALTER SEQUENCE idm_tasktab_taskid_seq RESTART ' || max_id::text;
> END $$ LANGUAGE plpgsql;
>
> Can some kind soul help me with doing a test for the existence of the
> table, to avoid the error message about a non-existent relation?


I think the easiest way is to use to_regclass():

DO $$
DECLARE
  max_id int;
BEGIN
  if to_regclass('idm_tasktab') is not null then
SELECT INTO max_id GREATEST(COALESCE(max(taskid), 0),0) + 1 FROM idm_tasktab;
RAISE NOTICE '% % %', 'idm_tasktab', 'taskid', max_id ;
EXECUTE 'ALTER SEQUENCE idm_tasktab_taskid_seq RESTART ' || max_id::text;
  end if;
END $$ LANGUAGE plpgsql;

Note that you don't really need dynamic SQL for this, you can simplify this to:

  select setval('idm_tasktab_taskid_seq', GREATEST(COALESCE(max(taskid), 0),0))
  from idm_tasktab;


I also don't think greatest() is necessary.
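One caveat with the setval() variant: setval() rejects values below the
sequence's minimum, so on an empty table the expression above yields 0 and
the call fails. A sketch that handles the empty case (same names as above):

    select setval('idm_tasktab_taskid_seq',
                  coalesce(max(taskid), 1),
                  max(taskid) is not null)  -- is_called = false: nextval starts at 1
    from idm_tasktab;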

Thomas




Re: When to use PARTITION BY HASH?

2020-06-08 Thread Oleksandr Shulgin
On Sat, Jun 6, 2020 at 6:14 PM Michel Pelletier 
wrote:

>
> Well, let's take a step back here and look at the question: hash
> partitioning exists in Postgres; is it useful?  While I appreciate the need
> to see a fact demonstrated, and generally avoiding argument by authority,
> it is true that many of the very smartest database people in the world
> conceived of, discussed, implemented and documented this feature for us.
> It stands to reason that it is useful, or it wouldn't exist.  So maybe this
> is more about finding or needing better partitioning documentation.
>

Fair point.

I've found the original commit adding this feature in version 11:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1aba8e651ac3e37e1d2d875842de1e0ed22a651e
It says:

"Hash partitioning is useful when you want to partition a growing data
set evenly.  This can be useful to keep table sizes reasonable, which
makes maintenance operations such as VACUUM faster, or to enable
partition-wise join."

It also includes a link to discussion, though that starts in the middle of
a long thread.
The original thread is here:
https://www.postgresql.org/message-id/flat/20170228233313.fc14d8b6.nagata%40sraoss.co.jp

However, these threads only argue about implementation details, and it's not
easy to find a discussion of the motivation for this particular partitioning
scheme being supported.
I guess it was quite obvious to the participants at that point already.

>> With hash partitioning you are not expected, in general, to end up with a
>> small number of partitions being accessed more heavily than the rest.  So
>> your indexes will also not fit into memory.
>>
>
> Indexes are not (usually) constant time structures, they take more time
> the bigger they get.  So partitioned indexes will be smaller, quicker to
> insert into, and quicker to vacuum, and also gain possible pruning
> advantages on query when you split them up.  If the planner can, knowing
> the key, exclude all but one partition, it won't even look at the other
> tables, so if you hash partition by primary key, you reduce the search
> space to 1/N immediately.
>
> Indexes with high update activity also suffer from a problem called "index
> bloat" where sparse "holes" get punched in the buckets of btree indexes
> from updates and deletes.  These holes are minimized by vacuuming
> but the bigger the index gets, the harder that process is to maintain.
> Smaller indexes suffer less from index bloat, and remedying the situation
> is easier because you can reindex partitions independently of each other.
> You're not just reducing the query load to an Nth, you're reducing the
> maintenance load.
>

Thanks for taking the time to explain it in detail.  Though I do not tend
to believe the insert/scan performance benefit is measurable without trying
it, I do see the benefits for maintenance.

>> I have the feeling that using a hash function to distribute rows simply
>> contradicts the basic assumption of when you would think of partitioning
>> your table at all: that is to make sure the most active part of the table
>> and indexes is small enough to be cached in memory.
>>
>
> I think you might be framing this with a specific data pattern in mind,
> not all data distributions have a "most active" or power law distribution
> of data.
>

I'm just referring to the first bullet-point in the docs:

"Query performance can be improved dramatically in certain situations,
particularly when most of the heavily accessed rows of the table are in a
single partition or a small number of partitions. The partitioning
substitutes for leading columns of indexes, reducing index size and making
it more likely that the heavily-used parts of the indexes fit in memory."

I think it does not apply to hash partitioning in the general case.

--
Alex


Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

2020-06-08 Thread Daniel Verite
Thorsten Schöning wrote:

> I'm using "CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]" and the
> associated queries can take a long time. So the following lists some
> questions about executing those concurrently, even thouzgh I've
> already read threads like the following:

> > The bottom line is that CREATE TABLE IF NOT EXISTS doesn't pretend
> > to handle concurrency issues any better than regular old CREATE
> > TABLE, which is to say not very well.[...]
> 
> https://www.postgresql.org/message-id/ca+tgmozadyvtwbfp1fl2smzbihcwt4uprzrlnnx1nb30ku3...@mail.gmail.com


The caveat you mention about IF NOT EXISTS does not apply to 
temporary tables, as they're not shared across sessions.
That is, if two concurrent transactions execute at the same time
CREATE TEMP TABLE IF NOT EXISTS foo(...)
it can't fail as described above because that creates two distinct tables,
each private to their session.
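This is easy to see from the per-session temporary schemas (a sketch; the
schema numbers vary):

    -- session 1:
    CREATE TEMP TABLE IF NOT EXISTS foo(id int);
    SELECT pg_my_temp_schema()::regnamespace;  -- e.g. pg_temp_3

    -- session 2, running at the same time:
    CREATE TEMP TABLE IF NOT EXISTS foo(id int);  -- succeeds: lives in e.g. pg_temp_4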


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

2020-06-08 Thread Thorsten Schöning
Guten Tag Daniel Verite,
am Montag, 8. Juni 2020 um 11:05 schrieben Sie:

> The caveat you mention about IF NOT EXISTS does not apply to
> temporary tables, as they're not shared across sessions.[...]

That's what I understood as well, but I'm creating those concurrently
WITHIN one and the same session and transaction. :-) And that's where
I wanted to make sure that I properly understood things using my
questions.

Mit freundlichen Grüßen,

Thorsten Schöning

-- 
Thorsten Schöning   E-Mail: thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme  http://www.AM-SoFT.de/

Telefon...05151-  9468- 55
Fax...05151-  9468- 88
Mobil..0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow





Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

2020-06-08 Thread Daniel Verite
Thorsten Schöning wrote:

> > The caveat you mention about IF NOT EXISTS does not apply to
> > temporary tables, as they're not shared across sessions.[...]
> 
> That's what I understood as well, but I'm creating those concurrently
> WITHIN one and the same session and transaction. :-) 

But a SQL session on the server takes its statements from a FIFO queue
and processes them serially, so there's no intra-session concurrency.
In fact multi-threaded SQL clients *must* make sure that they don't
send concurrent queries to the same connection. The best they can
do in terms of throughput is to queue up a new query while
the server is busy executing a previous one, but that's pipelining,
not parallelism.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Postgres 12 RLS

2020-06-08 Thread Laura Smith
Hi,

I'm having a little trouble with RLS in Postgres 12, although it's the first 
time I've used RLS, so it might just be me!

The problem is that I am calling a function from a web-app, but the function 
seems to be executing as "postgres" even though the web-app logs in as a 
completely different role?

This means that current_user in the function resolves to "postgres" instead of 
the app user.

This is an example of a function :
create function addses(p_regid text, p_msgid text, p_reqid text) returns integer AS $$
BEGIN
UPDATE foo_regs set reg_aws_ses_msgid=p_msgid, reg_aws_amzn_requestid=p_reqid where uuid=p_regid;
IF FOUND THEN
return 1;
ELSE
return 0;
END IF;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
grant execute on function addses(p_regid text, p_msgid text, p_reqid text) to myappuser;

The foo_regs table has the following RLS policy:
Policies:
POLICY "foo_regs_policy"
  USING (((event_id = CURRENT_USER) AND (reg_hide IS FALSE)))




Re: Postgres 12 RLS

2020-06-08 Thread Paul Förster
Hi Laura,

> On 08. Jun, 2020, at 12:17, Laura Smith wrote:
> $$ LANGUAGE plpgsql SECURITY DEFINER;

you might want to use security invoker instead of definer.

https://www.postgresql.org/docs/current/sql-createfunction.html#SQL-CREATEFUNCTION-SECURITY

https://www.cybertec-postgresql.com/en/abusing-security-definer-functions/

Cheers,
Paul





Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

2020-06-08 Thread Ravi Krishna

> That's what I understood as well, but I'm creating those concurrently
> WITHIN one and the same session and transaction. :-)

Do I interpret this as "two different sessions via application threads within 
the same transaction of PG"?
Does the app create its own PG session for each thread, or share one across 
threads?

I would be wary of running concurrent threads within the same transaction. 
RDBMSs are not designed to allow multiple SQL statements running concurrently 
within the same session, if at all possible.


Re: Postgres 12 RLS

2020-06-08 Thread Laura Smith


On Monday, 8 June 2020 11:25, Paul Förster  wrote:

> Hi Laura,
>
> > On 08. Jun, 2020, at 12:17, Laura Smith n5d9xq3ti233xiyif...@protonmail.ch 
> > $$ LANGUAGE plpgsql SECURITY DEFINER;
>
> you might want to use security invoker instead of definer.
>
> https://www.postgresql.org/docs/current/sql-createfunction.html#SQL-CREATEFUNCTION-SECURITY
>
> https://www.cybertec-postgresql.com/en/abusing-security-definer-functions/
>
> Cheers,
> Paul


Hi Paul,

I had a lightbulb moment just now and tried that, but it doesn't seem to be 
working.

The app returns "pg_execute(): Query failed: ERROR:  permission denied for 
table"

This is despite me:
• Changing to SECURITY INVOKER on the PG function.
• Granting the app user relevant perms on the underlying table
• Re-granting execute for the app on the function

Am I missing something?






ts_debug() style functions for jsonpath debugging

2020-06-08 Thread Alastair McKinley
Hello everyone,

I am working with jsonpaths heavily and was wondering if there is any method 
for debugging a jsonpath expression in a manner similar to ts_debug() for text 
search?

Essentially what I would like to do is debug the splitting of a path into 
tokens or logical components using the same parser as postgres itself.

Is there any method currently available that could be exposed at the SQL layer 
to inspect jsonpath expressions?

Best regards,

Alastair




Re: Postgres 12 RLS

2020-06-08 Thread Paul Förster
Hi Laura,

> On 08. Jun, 2020, at 12:46, Laura Smith wrote:
> I had a lightbulb moment just now and tried that, but it doesn't seem to be 
> working.
> 
> The app returns "pg_execute(): Query failed: ERROR:  permission denied for 
> table"
> 
> This is despite me:
> • Changing to SECURITY INVOKER on the PG function.
> • Granting the app user relevant perms on the underlying table
> • Re-granting execute for the app on the function
> 
> Am I missing something?

another possibility maybe is to use session_user instead of current_user in 
your policy.

current_user    name    user name of current execution context
session_user    name    session user name

The latter is the name of the user who actually started the session. So it 
should be myappuser in your case.

https://www.postgresql.org/docs/current/functions-info.html
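A quick way to see the difference from inside a SECURITY DEFINER function (a
sketch, with the role names assumed from this thread):

    -- owned by postgres, executable by myappuser:
    create function whoami() returns text as
    $$ select format('current_user=%s session_user=%s', current_user, session_user) $$
    language sql security definer;

    -- called while logged in as myappuser:
    select whoami();  -- current_user=postgres session_user=myappuser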

Cheers,
Paul



Re: Postgres 12 RLS

2020-06-08 Thread Laura Smith


On Monday, 8 June 2020 12:42, Paul Förster  wrote:

> Hi Laura,
>
> > On 08. Jun, 2020, at 12:46, Laura Smith n5d9xq3ti233xiyif...@protonmail.ch 
> > I had a lightbulb moment just now and tried that, but it doesn't seem to be 
> > working.
> > The app returns "pg_execute(): Query failed: ERROR: permission denied for 
> > table"
> > This is despite me:
> > • Changing to SECURITY INVOKER on the PG function.
> > • Granting the app user relevant perms on the underlying table
> > • Re-granting execute for the app on the function
> > Am I missing something?
>
> another possibility maybe is to use session_user instead of current_user in 
> your policy.
>
> current_user    name    user name of current execution context
> session_user    name    session user name
>
> The latter is the name of the user who actually started the session. So it 
> should be myappuser in your case.
>
> https://www.postgresql.org/docs/current/functions-info.html
>
> Cheers,
> Paul


Thanks Paul, will experiment with session_user.

But actually I found the solution: the function I was testing was using "INSERT 
ON CONFLICT UPDATE", and it seems that requires SELECT permissions due to "ON 
CONFLICT" (appuser was previously only granted INSERT and UPDATE).





"INSERT ON CONFLICT UPDATE" - Use of indexes ?

2020-06-08 Thread Laura Smith
Hi,

What's the current state of play with indexes and ON CONFLICT?  The docs seem 
to vaguely suggest it is possible, but this SO question 
(https://stackoverflow.com/questions/38945027/) seems to suggest it is not.

I've got a named unique index on a table (i.e. "create unique index xyz...") 
but I cannot seem to refer to it in a function?
ON CONFLICT (index_name): does not work
ON CONFLICT ON CONSTRAINT index_name: does not work




Re: When to use PARTITION BY HASH?

2020-06-08 Thread Ron

On 6/8/20 3:40 AM, Oleksandr Shulgin wrote:
[snip]
I've found the original commit adding this feature in version 11: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1aba8e651ac3e37e1d2d875842de1e0ed22a651e

It says:

"Hash partitioning is useful when you want to partition a growing data
set evenly.  This can be useful to keep table sizes reasonable, which
makes maintenance operations such as VACUUM faster, or to enable
partition-wise join."


How does hashed (meaning "random"?) distribution of records make 
partition-wise joins more efficient?


Or -- since I interpret that as having to do with "locality of data" -- am I 
misunderstanding the meaning of "partition-wise joins"?


--
Angular momentum makes the world go 'round.


Re: "INSERT ON CONFLICT UPDATE" - Use of indexes ?

2020-06-08 Thread Adrian Klaver

On 6/8/20 5:42 AM, Laura Smith wrote:

Hi,

What's the current state of play with indexes and ON CONFLICT?  The docs seem 
to vaguely suggest it is possible, but this SO question 
(https://stackoverflow.com/questions/38945027/) seems to suggest it is not.

I've got a named unique index on a table (i.e. "create unique index xyz...") 
but I cannot seem to refer to it in a function?
ON CONFLICT (index_name): does not work
ON CONFLICT ON CONSTRAINT index_name: does not work



The above is going to need more information:

1) Table definition.

2) Actual index definition

3) Complete ON CONFLICT definition

4) Error message returned.



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: "INSERT ON CONFLICT UPDATE" - Use of indexes ?

2020-06-08 Thread Thomas Kellerer
Laura Smith schrieb am 08.06.2020 um 14:42:
> Hi,
>
> What's the current state of play with indexes and ON CONFLICT?  The docs 
> seem to vaguely suggest it is possible, but this SO question 
> (https://stackoverflow.com/questions/38945027/) seems to suggest it is not.
>
> I've got a named unique index on a table (i.e. "create unique index xyz...") 
> but I cannot seem to refer to it in a function?
> ON CONFLICT (index_name): does not work
> ON CONFLICT ON CONSTRAINT index_name: does not work

If c1 and c2 make up a unique index, then

ON CONFLICT (c1, c2)

works
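For example (hypothetical table t):

    CREATE TABLE t (c1 int, c2 int, v text);
    CREATE UNIQUE INDEX xyz ON t (c1, c2);

    -- the column list lets Postgres infer the unique index; no index name needed:
    INSERT INTO t (c1, c2, v) VALUES (1, 2, 'new')
    ON CONFLICT (c1, c2) DO UPDATE SET v = EXCLUDED.v;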






Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

2020-06-08 Thread Ron

On 6/8/20 3:14 AM, Thorsten Schöning wrote:

Hi all,

I have an app exposing web services to generate reports. Those web
services accept multiple reports per request and calculate them
concurrently. There's one transaction spanning each entire request and
used by ALL spawned threads. The app makes sure that e.g. committing
transactions is handled by ONE thread only and individual statements
are NEVER reused by multiple threads.

There are two types of reports: Some need to read data from the DB
themselves, others can work on the data provided by the former ones. The
current implementation always creates temporary tables for results of
some query, so that one can simply check if some temporary table is
available and either only query that or create it before as necessary.

I'm using "CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]" and the
associated queries can take a long time.


Would a set of GLOBAL temporary tables be a better fit for your problem 
(eliminating the need to create temporary tables each time)?


--
Angular momentum makes the world go 'round.




Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

2020-06-08 Thread Thorsten Schöning
Guten Tag Daniel Verite,
am Montag, 8. Juni 2020 um 12:07 schrieben Sie:

> But a SQL session on the server takes its statements from a FIFO queue
> and processes them serially, so there's no intra-session concurrency.
> In fact multi-threaded SQL clients *must* make sure that they don't
> send concurrent queries to the same connection.[...]

I was looking at JDBC-docs and those explicitly allow sharing one
connection by multiple threads, while at the same time not explicitly
forbidding concurrent access.

https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts23499.html
https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts89498.html

Therefore I had assumed that concurrent access is available as long as
one follows the recommended practices, which is what I do. Though, that
concurrent access seems to not be available because of additional
synchronization in the JDBC-driver of Postgres.

https://stackoverflow.com/a/52798543/2055163
https://github.com/pgjdbc/pgjdbc/blob/ecbc48ff6ad293a2555465d4f7fb6a33a403af26/pgjdbc/src/main/java/org/postgresql/core/v3/QueryExecutorImpl.java#L276

Is that most likely the case with all JDBC drivers, or simply an
implementation detail of some, while others might allow truly concurrent
access within one session?

Nevertheless, I think I can at least partly answer some of my
questions now. So thanks for forcing me to dig deeper into this than
I obviously did before. :-)

Mit freundlichen Grüßen,

Thorsten Schöning

-- 
Thorsten Schöning   E-Mail: thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme  http://www.AM-SoFT.de/

Telefon...05151-  9468- 55
Fax...05151-  9468- 88
Mobil..0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow





Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

2020-06-08 Thread Thorsten Schöning
Guten Tag Ron,
am Montag, 8. Juni 2020 um 15:35 schrieben Sie:

> Would a set of GLOBAL temporary tables be a better fit for your problem
> (eliminating the need to create temporary tables each time)?

Do you mean plain tables simply created using CREATE TABLE or is there
some additional concept of temporary tables I'm not aware of? Because I
thought truly temporary tables are always session-specific: 

> Temporary tables are automatically dropped at the end of a session,
> or optionally at the end of the current transaction (see ON COMMIT
> below).

https://www.postgresql.org/docs/9.5/sql-createtable.html

It totally depends on runtime and users whether those tables are necessary
at all in theory. So the benefit of using session-private ones, automatically
maintained, in my opinion is that I can introduce them easily as necessary
without customizing my schema. My schema and all changes to it are versioned,
and I would like to avoid the maintenance overhead of adding temp tables
which can easily change with the implementation using them.

OTOH, if they are necessary at all at some point, the schema might be
a good place to make them known, document their usage and so on. I
will think about that, so thanks for the hint!

Mit freundlichen Grüßen,

Thorsten Schöning

-- 
Thorsten Schöning   E-Mail: thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme  http://www.AM-SoFT.de/

Telefon...05151-  9468- 55
Fax...05151-  9468- 88
Mobil..0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow





Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

2020-06-08 Thread Thorsten Schöning
Guten Tag Thorsten Schöning,
am Montag, 8. Juni 2020 um 10:14 schrieben Sie:

> When the table needs to be created, when is it visible to other
> threads using the same transaction, before or after executing the
> additional query?

There is a misconception here: Multiple concurrent executions don't seem
to be implemented at all, even though the JDBC docs seem to allow the
sharing of connections. Execution of a statement seems to lock the
underlying connection, preventing truly concurrent access by multiple
threads.

https://stackoverflow.com/a/52798543/2055163
https://github.com/pgjdbc/pgjdbc/blob/ecbc48ff6ad293a2555465d4f7fb6a33a403af26/pgjdbc/src/main/java/org/postgresql/core/v3/QueryExecutorImpl.java#L276
https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts23499.html
https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts89498.html

So while things don't work as I originally expected, this makes
answering my question easy: Because "CREATE TEMP TABLE IF NOT EXISTS
[...] AS [...]" is ONE statement only forwarded to the driver, its
execution should be atomic with respect to other threads using the
same connection.

It shouldn't make any difference how long the calculation of the
associated query takes, because all other threads need to wait for the
whole statement anyway. And if some other thread comes with the query
subsequently, "IF NOT EXISTS" comes into play and the query should
succeed instantly.

> Am I correct that with using "IF NOT EXISTS" the associated query is
> only executed as well if the table needs to be created?[...]

An answer to that question would still be of interest to me.

> Am I correct that in my described setup I need to make sure on my own
> that only one thread creates each individual temporary table and
> executes the associated query?[...]

No, not unless truly concurrent access is available in the
JDBC-driver. If it would be OTOH, I guess the answer would be yes.

> Am I correct that because of the same transaction used by multiple
> threads I need to synchronize them on web service-level?[...]

Yes, and this should be the most performant implementation anyway. In
case of really one and the same connection, everything is handled by
the same JVM already, so it can be synchronized within that JVM without
roundtrips to Postgres as well.

[1]: https://stackoverflow.com/a/24089729/2055163

Mit freundlichen Grüßen,

Thorsten Schöning

-- 
Thorsten Schöning   E-Mail: thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme  http://www.AM-SoFT.de/

Telefon...05151-  9468- 55
Fax...05151-  9468- 88
Mobil..0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow





Re: Index no longer being used, destroying and recreating it restores use.

2020-06-08 Thread Koen De Groote
So, this query:

select * from item where shouldbebackedup=true and itemCreated<='2020-06-05
00:00:00.000' and backupperformed=false order by filepath asc, id asc limit
100 offset 10400;

Was made into a function:

create or replace function NeedsBackup(text, int, int default 100)
returns setof item as $$
BEGIN
return query select * from item where shouldbebackedup=true and
itemCreated<=$1::timestamp without time zone and backupperformed=false
order by filepath asc, id asc limit $3 offset $2;
END;
$$
language 'plpgsql';

Having read the documentation, I do think this is STABLE and not VOLATILE?
While the external process performing the query will eventually update
these rows, setting backupperformed to true instead of false, that doesn't
happen within this function.

Doing an EXPLAIN ANALYZE yields this output:

 QUERY PLAN

---
 Function Scan on public.needsbackup  (cost=0.25..10.25 rows=1000
width=5275) (actual time=80.107..80.115 rows=100 loops=1)
   Output: id, shouldbebackedup, backupperformed, itemCreated, filepath
   Function Call: needsbackup('2020-06-08 12:00:00.016'::text, 2100, 100)
   Buffers: shared read=1572
 Planning Time: 0.199 ms
 Execution Time: 80.499 ms
(6 rows)

And the actual output is identical to the full query.

I'm kind of worried I'm not seeing any more details as to how it all
worked. Yes, it's as fast as I hoped, but that also happened last time when
I re-created the index. Fast for a few weeks, then suddenly not anymore.

I tried looking up how to create indexes on functions, but found nothing
that could help, only the suggestion that this is not possible.

So, at this point, what should I still look at, I wonder? EXPLAIN says it
did a function call, but what happened under the hood there?

And before I forget: thank you most kindly for the advice so far, to all
involved.

Regards,
Koen


On Sun, Jun 7, 2020 at 12:45 AM Koen De Groote  wrote:

> I'll attempt this next week.
>
> On Fri, Jun 5, 2020, 21:11 Michael Lewis  wrote:
>
>> Those row estimates are pretty far off.
>>
>> Standard indexes and partial indexes don't get custom statistics created
>> on them, but functional indexes do. I wonder if a small function
>> needs_backup( shouldbebackedup, backupperformed ) and an index created on
>> that function would nicely alleviate the pain. I would expect PG12 and
>> extended statistics of type MCVs would bypass the need for that work around
>> though.
>>
>


troubleshooting postgresql ldap authentication

2020-06-08 Thread Chris Stephens
postgresql 12
centos 7

i am trying to configure ldap authentication.

i have the following pg_hba.conf entry (server and basedn are correct but
not shown):

hostssl all all 0.0.0.0/0 ldap ldapserver="ldaps://xxx" ldapbasedn="yyy" ldaptls=1

when i attempt to authenticate, i see the following in pg logs:

2020-06-08 15:58:43 CDT [20235]: [1-1]
user=[unknown],db=[unknown],app=[unknown],client=zzz LOG:  connection
received: host=zzz port=12378
2020-06-08 15:58:43 CDT [20235]: [2-1]
user=uuu,db=postgres,app=[unknown],client=zzz LOG:  could not initialize
LDAP: Bad parameter to an ldap routine
2020-06-08 15:58:43 CDT [20235]: [3-1]
user=uuu,db=postgres,app=[unknown],client=zzz FATAL:  LDAP authentication
failed for user "uuu"
2020-06-08 15:58:43 CDT [20235]: [4-1]
user=uuu,db=postgres,app=[unknown],client=zzz DETAIL:  Connection matched
pg_hba.conf line 90: "hostssl all all 0.0.0.0/0
ldap ldapserver="ldaps://xxx" ldapbasedn="yyy" ldaptls=1"

does anyone know what might be causing "LDAP: Bad parameter to an ldap
routine"

ldapsearch works fine w/ the supplied ldapserver and base dn.


Re: Index no longer being used, destroying and recreating it restores use.

2020-06-08 Thread Michael Lewis
On Mon, Jun 8, 2020 at 2:34 PM Koen De Groote  wrote:

> So, this query:
>
> select * from item where shouldbebackedup=true and
> itemCreated<='2020-06-05 00:00:00.000' and backupperformed=false order by
> filepath asc, id asc limit 100 offset 10400;
>
> Was made into a function:
>
> create or replace function NeedsBackup(text, int, int default 100)
> returns setof item as $$
> BEGIN
> return query select * from item where shouldbebackedup=true and
> itemCreated<=$1::timestamp without time zone and backupperformed=false
> order by filepath asc, id asc limit $3 offset $2;
> END;
> $$
> language 'plpgsql';
>


What I had meant was a function perhaps called backup_needed_still(
backupperformed bool, shouldbebackedup bool) which would return bool; This
could be written in SQL only with no need for plpgsql. By the way, the
language name being in single quotes is deprecated.

Then you could create an index on the table- either on the timestamp column
where that function returns true, or just creating the index directly on
the boolean result of that function call if that is what is needed to get
the custom stats from a functional index. Then you would include the
function call in your query instead of the two individual boolean columns.
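A sketch of that suggestion, using the column and table names from the query
above (the exact index shape is a guess):

    create function backup_needed_still(backupperformed bool, shouldbebackedup bool)
    returns bool as
    $$ select shouldbebackedup and not backupperformed $$
    language sql immutable;

    -- an expression index gets its own statistics, unlike a plain partial index:
    create index item_needs_backup_idx
        on item (backup_needed_still(backupperformed, shouldbebackedup), itemCreated);

    -- and in the query:
    -- where backup_needed_still(backupperformed, shouldbebackedup)
    --   and itemCreated <= ...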


Re: When to use PARTITION BY HASH?

2020-06-08 Thread David Rowley
On Tue, 9 Jun 2020 at 01:07, Ron  wrote:
>
> On 6/8/20 3:40 AM, Oleksandr Shulgin wrote:
> [snip]
>
> I've found the original commit adding this feature in version 11: 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1aba8e651ac3e37e1d2d875842de1e0ed22a651e
> It says:
>
> "Hash partitioning is useful when you want to partition a growing data
> set evenly.  This can be useful to keep table sizes reasonable, which
> makes maintenance operations such as VACUUM faster, or to enable
> partition-wise join."
>
>
> How does hashed (meaning "random"?) distribution of records make 
> partition-wise joins more efficient?

Hash partitioning certainly does not mean putting the tuple in some
random partition. It means putting the tuple in the partition with the
correct remainder value after dividing the hash value by the largest
partition modulus.
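Concretely, a minimal sketch (table name assumed):

    CREATE TABLE events (id bigint, payload text) PARTITION BY HASH (id);

    CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
    CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);

    -- a row is stored in the partition whose REMAINDER matches hash(id) mod 4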

> Or -- since I interpret that as having to do with "locality of data" -- am I 
> misunderstanding the meaning of "partition-wise joins"?

If it was not a partitioned table before then partition-wise joins
wouldn't be possible.  Having partition-wise joins could make joining
two identically partitioned tables faster. We need only look in the
corresponding partition on the other side of the join for join
partners for each tuple. For hash joins, hash tables can be smaller,
which can mean not having to batch, and possibly having the hash table
fit better into a CPU cache. For merge joins, sorts, having the data
partially pre-sorted in chunks means fewer operations for qsort which
can result in speedups.

David




Is it possible to use keywords (date units) in a function definition?

2020-06-08 Thread Alistair Johnson
Hello,

I recently tried to write a wrapper function to calculate the difference
between two dates, mainly as a convenience. I'd essentially be emulating
EXTRACT(datepart FROM date1 - date2), in various ways. I got a bit stuck
on allowing specification of the datepart: is this possible in function
definitions? I'd like to be able to write something along the lines of:

CREATE OR REPLACE FUNCTION DATETIME_DIFF(end TIMESTAMP(3), start
TIMESTAMP(3), datepart UNIT) RETURNS DOUBLE PRECISION AS $$
BEGIN
RETURN EXTRACT(datepart FROM end - start);
END; $$
LANGUAGE PLPGSQL;

One option would be to treat datepart as a string, but this doesn't work
for my use case. (Background: I'm trying to refactor a bunch of SQL scripts
to work on Google BigQuery and PostgreSQL by writing PostgreSQL functions
to emulate BigQuery functions. Unfortunately BigQuery does not recognize
the third argument if it is a string (i.e. 'HOUR' does not work but HOUR
does)).

Any ideas? Is this even possible?

-Alistair


Re: Is it possible to use keywords (date units) in a function definition?

2020-06-08 Thread David G. Johnston
On Mon, Jun 8, 2020 at 2:57 PM Alistair Johnson  wrote:

>
> RETURN EXTRACT(datepart FROM end - start);
> Any ideas? Is this even possible?
>

Use the "date_part" function.

David J.


Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

2020-06-08 Thread Ron

On 6/8/20 9:23 AM, Thorsten Schöning wrote:

Guten Tag Ron,
am Montag, 8. Juni 2020 um 15:35 schrieben Sie:


Would a set of GLOBAL temporary tables be a better fit for your problem
(eliminating the need to create temporary tables each time)?

Do you mean plain tables simply created using CREATE TABLE or is there
some additional concept of temporary tables I'm not aware of? Because I
thought truly temporary tables are always session-specific:


Hmm...  Postgres doesn't support global temporary tables.  Never mind. :)

--
Angular momentum makes the world go 'round.




Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

2020-06-08 Thread Adrian Klaver

On 6/8/20 3:31 PM, Ron wrote:

On 6/8/20 9:23 AM, Thorsten Schöning wrote:

Guten Tag Ron,
am Montag, 8. Juni 2020 um 15:35 schrieben Sie:


Would a set of GLOBAL temporary tables be a better fit for your problem
(eliminating the need to create temporary tables each time)?

Do you mean plain tables simply created using CREATE TABLE or is there
some additional concept of temporary tables I'm not aware of? Because I
thought truly temporary tables are always session-specific:


Hmm...  Postgres doesn't support global temporary tables.  Never mind. :)



There is a newly released extension that offers that:

https://www.postgresql.org/message-id/4a390375-0a69-8901-fc5a-4a0336c5c6b4%40darold.net

--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Is it possible to use keywords (date units) in a function definition?

2020-06-08 Thread Adrian Klaver

On 6/8/20 3:10 PM, David G. Johnston wrote:
On Mon, Jun 8, 2020 at 2:57 PM Alistair Johnson wrote:



RETURN EXTRACT(datepart FROM end - start);
Any ideas? Is this even possible?


Use the "date_part" function.


That would work on the Postgres side, but the OP also wants this to run 
on BigQuery and unfortunately it won't work there.




David J.



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: "INSERT ON CONFLICT UPDATE" - Use of indexes ?

2020-06-08 Thread David Rowley
On Tue, 9 Jun 2020 at 00:42, Laura Smith
 wrote:
> What's the current state of play with indexes and ON CONFLICT?  The docs 
> seem to vaguely suggest it is possible, but this SO question 
> (https://stackoverflow.com/questions/38945027/) seems to suggest it is not.
>
> I've got a named unique index on a table (i.e. "create unique index xyz...") 
> but I cannot seem to refer to it in a function?
> ON CONFLICT (index_name): does not work
> ON CONFLICT ON CONSTRAINT index_name: does not work

Creating a unique index does not create a unique constraint.  If you
create a unique constraint, it'll create a unique index to enforce the
constraint.  ON CONSTRAINT requires a constraint name, not an index
name.
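In other words (hypothetical names):

    -- a unique index alone has no constraint name to refer to:
    CREATE UNIQUE INDEX xyz ON t (c1, c2);
    -- ON CONFLICT ON CONSTRAINT xyz  -- fails: xyz is an index, not a constraint

    -- a unique constraint creates a backing index and can be named in ON CONFLICT:
    ALTER TABLE t ADD CONSTRAINT t_c1_c2_key UNIQUE (c1, c2);
    INSERT INTO t (c1, c2, v) VALUES (1, 2, 'new')
    ON CONFLICT ON CONSTRAINT t_c1_c2_key DO UPDATE SET v = EXCLUDED.v;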

David




Re: Is it possible to use keywords (date units) in a function definition?

2020-06-08 Thread David G. Johnston
On Mon, Jun 8, 2020 at 2:57 PM Alistair Johnson  wrote:

> Hello,
>
> I recently tried to write a wrapper function to calculate the difference
> between two dates, mainly as a convenience. I'd essentially be emulating
> EXTRACT(datepart FROM date1 - date2), in various ways. I got a bit stuck
> on allowing specification of the datepart: is this possible in function
> definitions? I'd like to be able to write something along the lines of:
>
> CREATE OR REPLACE FUNCTION DATETIME_DIFF(end TIMESTAMP(3), start
> TIMESTAMP(3), datepart UNIT) RETURNS DOUBLE PRECISION AS $$
> BEGIN
> RETURN EXTRACT(datepart FROM end - start);
> END; $$
> LANGUAGE PLPGSQL;
>
> One option would be to treat datepart as a string, but this doesn't work
> for my use case. (Background: I'm trying to refactor a bunch of SQL scripts
> to work on Google BigQuery and PostgreSQL by writing PostgreSQL functions
> to emulate BigQuery functions. Unfortunately BigQuery does not recognize
> the third argument if it is a string (i.e. 'HOUR' does not work but HOUR
> does)).
>
> Any ideas? Is this even possible?
>
>
I think you need to be more specific as to what "this" means.

Looking again after Adrian's comment: are you trying to write, in the
script file:

datetime_diff('start time as string'::timestamp, 'end time as
string'::timestamp, HOUR)

and get PostgreSQL to recognize the value HOUR as a custom type
value without single quotes surrounding it?

If that is the question the answer is no.  The only type literals that can
be written without single quotes are numbers.

The parsing of SQL can handle some standard mandated non-quoted constants
but they are basically keywords, not values.

David J.


Is it possible to use keywords (date units) in a function definition?

2020-06-08 Thread Martin Gainty
Hello,

I recently tried to write a wrapper function to calculate the difference 
between two dates, mainly as a convenience. I'd essentially be emulating 
EXTRACT(datepart FROM date1 - date2), in various ways. I got a bit stuck on 
allowing specification of the datepart: is this possible in function 
definitions? I'd like to be able to write something along the lines of:

CREATE OR REPLACE FUNCTION DATETIME_DIFF(end TIMESTAMP(3), start TIMESTAMP(3), 
datepart UNIT) RETURNS DOUBLE PRECISION AS $$
BEGIN
RETURN EXTRACT(datepart FROM end - start);
END; $$
LANGUAGE PLPGSQL;

One option would be to treat datepart as a string, but this doesn't work for my 
use case. (Background: I'm trying to refactor a bunch of SQL scripts to work on 
Google BigQuery and PostgreSQL by writing PostgreSQL functions to emulate 
BigQuery functions. Unfortunately BigQuery does not recognize the third 
argument if it is a string (i.e. 'HOUR' does not work but HOUR does)).

Any ideas? Is this even possible?

-Alistair


12.2: Why do my Redo Logs disappear??

2020-06-08 Thread Peter
Hi all,
  this is a 12.2 Release on FreeBSD 11.3.

I am doing RedoLog Archiving according to Docs Chapter 25.1. 

During the last week I have lost 4 distinct Redo Logs; they are
not in the backup.

Losing a RedoLog is very bad, because there is no redundancy;
losing a single one of them makes the timeline disappear, and it
will only reappear after another Base Backup. Very very bad.

So I did analyze the matter.

There are three ways to restart a Berkeley machine:

1. Cowboy it (aka pull-the-plug). This makes sure that everything is
   certainly dead immediately, and, given all hard- and software is
   well designed, nothing bad should happen.

2. Shut it down. This is the official means, and it takes very long,
   because each and every applications are signalled and given time to
   bring up whatever objections they may have.

   In this case "pg_ctl stop" will be invoked with whatever options the
   sysop has configured, and postgres will copy out a full log into
   archive before terminating.

3. Halt/Reboot it, like this:
   
https://www.freebsd.org/cgi/man.cgi?query=reboot&sektion=8&manpath=FreeBSD+11.3-RELEASE
   This is considered more safe than pull-the-plug, and still fast.
   Applications are killed without much waiting, but all disk buffers
   are flushed to permanent storage and filesystems closed.
   
   In this case, it seems,  Postgres will delete the current log
   without archiving it. :(

   What precisely happens (according to the OS sources) during reboot
   is this: processes will be sent SIGTERM, and after some 2-10
   seconds followed by SIGKILL.

--

Lets have a closer look:

We did a regular shutdown at 17:09, and then we did a reboot at 19:24.

Here is the content of the staging area (where the logs are
copied to and accumulated until it is worth to run a backup job):

# dir arch/
total 240
drwx--  2 postgres  postgres 5 Jun  8 17:09 .
drwxr-xr-x  6 postgres  postgres 7 Jun  8 17:09 ..
-rw---  1 postgres  postgres  16777216 Jun  8 09:38 
0001001700FC.ok
-rw---  1 postgres  postgres  16777216 Jun  8 10:48 
0001001700FD.ok
-rw---  1 postgres  postgres  16777216 Jun  8 17:09 
0001001700FE.ok

And here is the pg_wal directory:

# dir data12/pg_wal/
total 89256
drwx--   3 postgres  postgres10 Jun  8 19:28 .
drwx--  19 postgres  postgres23 Jun  8 19:28 ..
-rw---   1 postgres  postgres   335 Jun  7 07:36 
0001001700EF.0060.backup
-rw---   1 postgres  postgres  16777216 Jun  8 19:38 
00010018
-rw---   1 postgres  postgres  16777216 Jun  7 07:17 
000100180001
-rw---   1 postgres  postgres  16777216 Jun  7 07:17 
000100180002
-rw---   1 postgres  postgres  16777216 Jun  7 07:17 
000100180003
-rw---   1 postgres  postgres  16777216 Jun  7 07:17 
000100180004
-rw---   1 postgres  postgres  16777216 Jun  7 07:36 
000100180005
drwx--   2 postgres  postgres 3 Jun  8 17:09 archive_status
# dir data12/pg_wal/archive_status
total 23
drwx--  2 postgres  postgres   3 Jun  8 17:09 .
drwx--  3 postgres  postgres  10 Jun  8 19:28 ..
-rw---  1 postgres  postgres   0 Jun  7 07:36 
0001001700EF.0060.backup.done

Now where the hell is my "FF" log ???

Lets check syslog - this was the full shutdown at 17:09:

Jun  8 17:09:38  admn pg-bck[73534]: [10-1] :[] LOG:  0: 
received fast shutdown request
Jun  8 17:09:38  admn pg-bck[73534]: [10-2] :[] LOCATION:  pmdie, 
postmaster.c:2780
Jun  8 17:09:38  admn pg-bck[73534]: [11-1] :[] LOG:  0: 
aborting any active transactions
Jun  8 17:09:38  admn pg-bck[73534]: [11-2] :[] LOCATION:  pmdie, 
postmaster.c:2813
Jun  8 17:09:38  admn pg-bck[73549]: [10-1] :[] DEBUG:  0: 
logical replication launcher shutting down
Jun  8 17:09:38  admn pg-bck[73549]: [10-2] :[] LOCATION:  
ProcessInterrupts, postgres.c:2981
Jun  8 17:09:38  admn pg-bck[73534]: [12-1] :[] LOG:  0: 
background worker "logical replication launcher" (PID 73549) exited with exit 
code 1
Jun  8 17:09:38  admn pg-bck[73534]: [12-2] :[] LOCATION:  
LogChildExit, postmaster.c:3657
Jun  8 17:09:38  admn pg-bck[73544]: [13-1] :[] LOG:  0: 
shutting down
Jun  8 17:09:38  admn pg-bck[73544]: [13-2] :[] LOCATION:  
ShutdownXLOG, xlog.c:8321
Jun  8 17:09:45  admn pg-bck[82223]: RedoLog.bck invoked 
pg_wal/0001001700FE 0001001700FE
Jun  8 17:09:45  admn pg-bck[82223]: RedoLog.bck 
pg_wal/0001001700FE 0001001700FE returns 0
Jun  8 17:09:45  admn pg-bck[73547]: [8-1] :[] DEBUG:  0: 
archived write-ahead log file "0001001700FE"
Jun  8 17:09:45  admn pg-bck[73547]: [8-2] :[] LOCATION:  
pgarch_archiveXlog, pgarch.c:675
Jun  8 17:09:55  admn pg-bck[73544]: [14-1] :[] LOG:  0: 
checkpoint starting: shutdown immediate
Jun  8 17:09:55  admn pg-bck[735

Re: Is it possible to use keywords (date units) in a function definition?

2020-06-08 Thread David G. Johnston
On Mon, Jun 8, 2020 at 5:06 PM Martin Gainty  wrote:

> CREATE OR REPLACE FUNCTION DATETIME_DIFF(end TIMESTAMP(3), start
> TIMESTAMP(3), datepart UNIT) RETURNS DOUBLE PRECISION AS $$
>

Duplicate email from account (same sender) - already answered on the
original/correct thread.

David J.


Re: 12.2: Why do my Redo Logs disappear??

2020-06-08 Thread Adrian Klaver

On 6/8/20 5:02 PM, Peter wrote:

Hi all,
   this is a 12.2 Release on FreeBSD 11.3.

I am doing RedoLog Archiving according to Docs Chapter 25.1.


There is no ReDo logging, there is WAL logging.

What docs, because section 25.1 in the Postgres docs is :

https://www.postgresql.org/docs/12/backup-dump.html

25.1. SQL Dump



During the last week I have lost 4 distinct Redo Logs; they are
not in the backup.

Losing a RedoLog is very bad, because there is no redundancy;
losing a single one of them makes the timeline disappear, and it
will only reappear after another Base Backup. Very very bad.

So I did analyze the matter.

There are three ways to restart a Berkeley machine:

1. Cowboy it (aka pull-the-plug). This makes sure that everything is
certainly dead immediately, and, given all hard- and software is
well designed, nothing bad should happen.

2. Shut it down. This is the official means, and it takes very long,
because each and every applications are signalled and given time to
bring up whatever objections they may have.

In this case "pg_ctl stop" will be invoked with whatever options the
sysop has configured, and postgres will copy out a full log into
archive before terminating.

3. Halt/Reboot it, like this:

https://www.freebsd.org/cgi/man.cgi?query=reboot&sektion=8&manpath=FreeBSD+11.3-RELEASE
This is considered more safe than pull-the-plug, and still fast.
Applications are killed without much waiting, but all disk buffers
are flushed to permanent storage and filesystems closed.

In this case, it seems, Postgres will delete the current log
without archiving it. :(

What precisely happens (according to the OS sources) during reboot
is this: processes will be sent SIGTERM, and after some 2-10
seconds followed by SIGKILL.


https://www.postgresql.org/docs/12/server-shutdown.html
"
Important

It is best not to use SIGKILL to shut down the server. Doing so will 
prevent the server from releasing shared memory and semaphores. 
Furthermore, SIGKILL kills the postgres process without letting it relay 
the signal to its subprocesses, so it might be necessary to kill the 
individual subprocesses by hand as well.


To terminate an individual session while allowing other sessions to 
continue, use pg_terminate_backend() (see Table 9.83) or send a SIGTERM 
signal to the child process associated with the session."


What is RedoLog.bck?



--

Lets have a closer look:

We did a regular shutdown at 17:09, and then we did a reboot at 19:24.

Here is the content of the staging area (where the logs are
copied to and accumulated until it is worth to run a backup job):

# dir arch/
total 240
drwx--  2 postgres  postgres 5 Jun  8 17:09 .
drwxr-xr-x  6 postgres  postgres 7 Jun  8 17:09 ..
-rw---  1 postgres  postgres  16777216 Jun  8 09:38 
0001001700FC.ok
-rw---  1 postgres  postgres  16777216 Jun  8 10:48 
0001001700FD.ok
-rw---  1 postgres  postgres  16777216 Jun  8 17:09 
0001001700FE.ok

And here is the pg_wal directory:

# dir data12/pg_wal/
total 89256
drwx--   3 postgres  postgres10 Jun  8 19:28 .
drwx--  19 postgres  postgres23 Jun  8 19:28 ..
-rw---   1 postgres  postgres   335 Jun  7 07:36 
0001001700EF.0060.backup
-rw---   1 postgres  postgres  16777216 Jun  8 19:38 
00010018
-rw---   1 postgres  postgres  16777216 Jun  7 07:17 
000100180001
-rw---   1 postgres  postgres  16777216 Jun  7 07:17 
000100180002
-rw---   1 postgres  postgres  16777216 Jun  7 07:17 
000100180003
-rw---   1 postgres  postgres  16777216 Jun  7 07:17 
000100180004
-rw---   1 postgres  postgres  16777216 Jun  7 07:36 
000100180005
drwx--   2 postgres  postgres 3 Jun  8 17:09 archive_status
# dir data12/pg_wal/archive_status
total 23
drwx--  2 postgres  postgres   3 Jun  8 17:09 .
drwx--  3 postgres  postgres  10 Jun  8 19:28 ..
-rw---  1 postgres  postgres   0 Jun  7 07:36 
0001001700EF.0060.backup.done

Now where the hell is my "FF" log ???

Lets check syslog - this was the full shutdown at 17:09:

Jun  8 17:09:38  admn pg-bck[73534]: [10-1] :[] LOG:  0: 
received fast shutdown request
Jun  8 17:09:38  admn pg-bck[73534]: [10-2] :[] LOCATION:  pmdie, 
postmaster.c:2780
Jun  8 17:09:38  admn pg-bck[73534]: [11-1] :[] LOG:  0: 
aborting any active transactions
Jun  8 17:09:38  admn pg-bck[73534]: [11-2] :[] LOCATION:  pmdie, 
postmaster.c:2813
Jun  8 17:09:38  admn pg-bck[73549]: [10-1] :[] DEBUG:  0: 
logical replication launcher shutting down
Jun  8 17:09:38  admn pg-bck[73549]: [10-2] :[] LOCATION:  
ProcessInterrupts, postgres.c:2981
Jun  8 17:09:38  admn pg-bck[73534]: [12-1] :[] LOG:  0: background 
worker "logical replication launcher" (PID 73549) exited 

Re: 12.2: Why do my Redo Logs disappear??

2020-06-08 Thread Peter Geoghegan
On Mon, Jun 8, 2020 at 5:17 PM Peter  wrote:
> Losing a RedoLog is very bad, because there is no redundancy;
> losing a single one of them makes the timeline disappear, and it
> will only reappear after another Base Backup. Very very bad.

>In this case, it seems,  Postgres will delete the current log
>without archiving it. :(

I strongly suspect that you were hit by the bug fixed in commit
4e87c483. You should upgrade to Postgres 12.3 ASAP, to get that fix:

"Avoid premature recycling of WAL segments during crash recovery
(Jehan-Guillaume de Rorthais)

WAL segments that become ready to be archived during crash recovery
were potentially recycled without being archived."

Sorry that you were affected by this bug -- it really sucks.

-- 
Peter Geoghegan




Re: troubleshooting postgresql ldap authentication

2020-06-08 Thread Thomas Munro
On Tue, Jun 9, 2020 at 9:05 AM Chris Stephens  wrote:
> hostssl all all 0.0.0.0/0 ldap ldapserver="ldaps://xxx" ldapbasedn="yyy" ldaptls=1

> does anyone know what might be causing "LDAP: Bad parameter to an ldap 
> routine"

You probably want ldapurl="ldaps://xxx" (note: ldapurl, not
ldapserver).  Or you could use ldapscheme="ldaps" and
ldapserver="xxx".




Re: 12.2: Why do my Redo Logs disappear??

2020-06-08 Thread Peter
On Mon, Jun 08, 2020 at 05:40:20PM -0700, Peter Geoghegan wrote:
! 
! I strongly suspect that you were hit by the bug fixed in commit
! 4e87c483. You should upgrade to Postgres 12.3 ASAP, to get that fix:
! 
! "Avoid premature recycling of WAL segments during crash recovery
! (Jehan-Guillaume de Rorthais)
! 
! WAL segments that become ready to be archived during crash recovery
! were potentially recycled without being archived."

Ahh, thank you so much; this is good news: if it is an already known
bug, I can close all efforts, remove the debug switches again, stuff
the sources back into their box and relax. ;)

I did check whether I would find something about lost archive logs, but
didn't - and then it is always possible that it's just a mistake
of mine - as I know I do make mistakes.

And now for the nitpicking part :)

On Mon, Jun 08, 2020 at 05:35:40PM -0700, Adrian Klaver wrote:

! > I am doing RedoLog Archiving according to Docs Chapter 25.1.
! 
! There is no ReDo logging, there is WAL logging.

Yes, I know - and I don't care. Technically they're no longer WAL when
they're going to be archived. Their purpose then becomes to redo the
transactions, and even the messages say so:
! > LOG:  0: redo done at 17/FF01C098

! What docs, because section 25.1 in the Postgres docs is :
! 
! https://www.postgresql.org/docs/12/backup-dump.html
! 
! 25.1. SQL Dump

Oops, my mistake. It should be 25.3.1.

! https://www.postgresql.org/docs/12/server-shutdown.html
! "
! Important
! 
! It is best not to use SIGKILL to shut down the server. Doing so will prevent
! the server from releasing shared memory and semaphores. Furthermore, SIGKILL
! kills the postgres process without letting it relay the signal to its
! subprocesses, so it might be necessary to kill the individual subprocesses
! by hand as well.

And which of these would be of any concern if the machine is rebooted
anyway?

I had to install new hardware, and currently I'm trying to identify
a memory exhaustion issue. This makes it necessary to reboot the full
system quite often, and I neither want to wait for orderly termination
of dozens of subsystems, nor do I want to need fsck at restart. This
would make SIGKILL/reboot the method of choice.

! What is RedoLog.bck?

The script which one has to write according to the docs' section which
would be correctly numbered 25.3.1.

cheerio,
PMc




Re: 12.2: Why do my Redo Logs disappear??

2020-06-08 Thread Adrian Klaver

On 6/8/20 6:38 PM, Peter wrote:

On Mon, Jun 08, 2020 at 05:40:20PM -0700, Peter Geoghegan wrote:




And now for the nitpicking part :)

On Mon, Jun 08, 2020 at 05:35:40PM -0700, Adrian Klaver wrote:

! > I am doing RedoLog Archiving according to Docs Chapter 25.1.
!
! There is no ReDo logging, there is WAL logging.

Yes, I know - and I don't care. Technically they're no longer WAL when
they're going to be archived. Their purpose then becomes to redo the


Not according to the section you are referring to:

25.3.1. Setting Up WAL Archiving

Redoing is the process of replaying the WAL logs.



Oops, my mistake. It should be 25.3.1.




And which of these would be of any concern if the machine is rebooted
anyway?

I had to install new hardware, and currently I'm trying to identify
a memory exhaustion issue. This makes it necessary to reboot the full
system quite often, and I neither want to wait for orderly termination
of dozens of subsystems, nor do I want to need fsck at restart. This
would make SIGKILL/reboot the method of choice.


That is your prerogative, just not sure it is conducive to the health of 
your system.




! What is RedoLog.bck?

The script which one has to write according to the docs' section which
would be correctly numbered 25.3.1.

cheerio,
PMc





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Something else about Redo Logs disappearing

2020-06-08 Thread Peter


Actually, the affair had some good side: as usual I was checking
my own designs first and looking for flaws, and indeed I found one: 

If you do copy out the archive logs not directly to tape, but to
some disk area for further processing, then there is an issue with  
possible loss. If you do it like the docs say, with a command like  
this:   

archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix

That "cp" is usually not synchronous. So there is the possibility   
that this command terminates successfully, and reports exitcode zero
back to the Postgres, and then the Postgres will consider that log  
being safely away.  

But the target of the copy may not yet have been written to disk. If
at that point a power loss happens, the log may become missing/damaged/ 
incomplete, while the database may or may not consider it done  
when restarting.

Therefore, mounting such a target filesystem in all-synchronous mode
might be a good idea. (UFS: "-o sync", ZFS: "set sync=always")  
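A sketch of an archive script that forces the copy out to disk before
reporting success (script name and paths are placeholders; FreeBSD ships an
fsync(1) utility, GNU coreutils has "sync -d"):

    #!/bin/sh
    # archive_command = '/usr/local/bin/archive-wal.sh %p %f'
    set -e
    DST="/mnt/server/archivedir/$2"

    test ! -f "$DST"            # never overwrite an already-archived segment
    cp "$1" "$DST.tmp"
    fsync "$DST.tmp"            # force the file contents to disk
    mv "$DST.tmp" "$DST"        # rename is atomic within one filesystem
    fsync "$(dirname "$DST")"   # persist the directory entry too, where supported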

cheerio,
PMc 




Re: Something else about Redo Logs disappearing

2020-06-08 Thread Adrian Klaver



On 6/8/20 7:33 PM, Peter wrote:


Actually, the affair had some good side: as usual I was checking
my own designs first and looking for flaws, and indeed I found one:
 
If you do copy out the archive logs not directly to tape, but to

some disk area for further processing, then there is an issue with
possible loss. If you do it like the docs say, with a command like
this:
 
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix
 
That "cp" is usually not synchronous. So there is the possibility

that this command terminates successfully, and reports exitcode zero
back to the Postgres, and then the Postgres will consider that log
being safely away.


Which is why just following the above command in the docs is:

"(This is an example, not a recommendation, and might not work on all 
platforms.) "


Generally for peace of mind folks use third party tools like:

pg_backrest(https://pgbackrest.org/),
pg_probackup(https://postgrespro.com/products/extensions/pg_probackup) 
or Barman(https://www.pgbarman.org/).


as they offer safety checks for your backups.

I use pg_backrest, but it does not look promising for running on BSD:
https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html

Not sure about pg_probackup.

Barman is Python package:
http://docs.pgbarman.org/release/2.10/#installation-from-sources

 
But the target of the copy may not yet have been written to disk. If

at that point a power loss happens, the log may become missing/damaged/
incomplete, while the database may or may not consider it done
when restarting.
 
Therefore, mounting such a target filesystem in all-synchronous mode

might be a good idea. (UFS: "-o sync", ZFS: "set sync=always")
 
cheerio,

PMc





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Something else about Redo Logs disappearing

2020-06-08 Thread Tom Lane
Adrian Klaver  writes:
> On 6/8/20 7:33 PM, Peter wrote:
>> That "cp" is usually not synchronous. So there is the possibility
>> that this command terminates successfully, and reports exitcode zero
>> back to the Postgres, and then the Postgres will consider that log
>> being safely away.

> Which is why just following the above command in the docs is:
> "(This is an example, not a recommendation, and might not work on all 
> platforms.) "

Yeah.  There have been discussions about changing that disclaimer to be
more forceful, because in point of fact a plain "cp" is generally not safe
enough.  You need to fsync the written file, and on many filesystems you
also have to fsync the directory it's in.

> Generally for peace of mind folks use third party tools like:

+1.  Rolling your own archive script is seldom advisable.

regards, tom lane




Re: troubleshooting postgresql ldap authentication

2020-06-08 Thread Paul Förster
Hi Chris,

> On 08. Jun, 2020, at 23:05, Chris Stephens  wrote:
> posgresql 12
> centos 7

here: PostgreSQL 11.6 & 12.3, SLES 12 & 15

> hostssl all all 0.0.0.0/0 ldap ldapserver="ldaps://xxx" ldapbasedn="yyy" ldaptls=1

we use:

hostssl all all 0.0.0.0/0 ldap ldapserver="host1 host2" ldapport=389 
ldapsearch="dc=domain,dc=example,dc=com" ldapsearchattribute="uid" ldaptls=1

Works pretty well.

Cheers,
Paul