Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]
Hi all,

I have an app exposing web services to generate reports. Those web services accept multiple reports per request and calculate them concurrently. There's one transaction spanning each entire request and used by ALL spawned threads. The app makes sure that e.g. committing transactions is handled by ONE thread only and individual statements are NEVER reused by multiple threads.

There are two types of reports: some need to read data from the DB themselves, others can work on the data provided by the former ones. The current implementation always creates temporary tables for the results of some query, so that one can simply check if a given temporary table is available and either just query it or create it first as necessary. I'm using "CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]" (see the sketch at the end of this mail) and the associated queries can take a long time. So the following lists some questions about executing those concurrently, even though I've already read threads like the following:

> The bottom line is that CREATE TABLE IF NOT EXISTS doesn't pretend
> to handle concurrency issues any better than regular old CREATE
> TABLE, which is to say not very well.[...]

https://www.postgresql.org/message-id/ca+tgmozadyvtwbfp1fl2smzbihcwt4uprzrlnnx1nb30ku3...@mail.gmail.com

When the table needs to be created, when is it visible to other threads using the same transaction: before or after executing the additional query?

Am I correct that with "IF NOT EXISTS" the associated query is only executed if the table needs to be created? In theory those two things could be independent of each other, maybe resulting in duplicated rows or the like per execution. But it doesn't seem so according to my tests.

Am I correct that in my described setup I need to make sure on my own that only one thread creates each individual temporary table and executes the associated query? Otherwise multiple attempts to create the table at the same time might simply fail, and the only question is whether this happens before or after the associated query. Or would creating the table itself succeed even with many threads, but some of them simply wouldn't read any data, because the associated query is executed in a second step by the thread that created the table in the first place? I don't think so, though.

Am I correct that because of the same transaction being used by multiple threads I need to synchronize them on web service-level? E.g. things like advisory locks won't work because they have session or transaction scope and would be granted to all threads instantly.

Thanks for your insights!
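For illustration, the pattern in question looks roughly like this (a minimal sketch; "tmp_meter_readings" and "some_meter_data" are made-up names, not my real schema):

> CREATE TEMP TABLE IF NOT EXISTS tmp_meter_readings AS
>   SELECT m.id, m.captured_at
>   FROM   some_meter_data m
>   WHERE  m.captured_at >= now() - INTERVAL '15 minutes';
>
> -- other reports then only need to query the temp table:
> SELECT count(*) FROM tmp_meter_readings;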
Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]
Hello Daniel Verite, on Monday, 8 June 2020 at 11:05, you wrote:

> The caveat you mention about IF NOT EXISTS does not apply to
> temporary tables, as they're not shared across sessions.[...]

That's what I understood as well, but I'm creating those concurrently WITHIN one and the same session and transaction. :-) And that's where I wanted to make sure that I properly understood things, using my questions.
Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]
Hello Daniel Verite, on Monday, 8 June 2020 at 12:07, you wrote:

> But a SQL session on the server takes its statements from a FIFO queue
> and processes them serially, so there's no intra-session concurrency.
> In fact multi-threaded SQL clients *must* make sure that they don't
> send concurrent queries to the same connection.[...]

I was looking at the JDBC docs and those explicitly allow sharing one connection between multiple threads, while at the same time not explicitly forbidding concurrent access.

https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts23499.html
https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts89498.html

Therefore I had assumed that concurrent access is available as long as one follows the recommended practices, which is what I do. Though, that concurrent access seems not to be available because of additional synchronization in the JDBC driver of Postgres.

https://stackoverflow.com/a/52798543/2055163
https://github.com/pgjdbc/pgjdbc/blob/ecbc48ff6ad293a2555465d4f7fb6a33a403af26/pgjdbc/src/main/java/org/postgresql/core/v3/QueryExecutorImpl.java#L276

Is that most likely the case with all JDBC drivers, or is it simply an implementation detail of some, and others might allow truly concurrent access within one session?

Nevertheless, I think I can at least partly answer some of my questions now. So thanks for forcing me to dig deeper into this than I obviously did before. :-)
Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]
Hello Ron, on Monday, 8 June 2020 at 15:35, you wrote:

> Would a set of GLOBAL temporary tables be a better fit for your problem
> (eliminating the need to create temporary tables each time)?

Do you mean plain tables simply created using CREATE TABLE, or is there some additional concept of temporary tables I'm not aware of? Because I thought truly temporary tables are always session-specific:

> Temporary tables are automatically dropped at the end of a session,
> or optionally at the end of the current transaction (see ON COMMIT
> below).

https://www.postgresql.org/docs/9.5/sql-createtable.html

In theory it totally depends on runtime and users whether those tables are necessary at all. So the benefit of using session-private ones, automatically maintained, is in my opinion that I can introduce them easily as necessary without customizing my schema. My schema and all changes to it are versioned, and I would like to avoid the maintenance overhead of adding temp tables which can easily change with the implementation using them. OTOH, if they are necessary at all at some point, the schema might be a good place to make them known, document their usage and such.

I will think about that, so thanks for the hint!
Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]
Hello Thorsten Schöning, on Monday, 8 June 2020 at 10:14, you wrote:

> When the table needs to be created, when is it visible to other
> threads using the same transaction, before or after executing the
> additional query?

There is a misconception here: multiple concurrent executions don't seem to be implemented at all, even though the JDBC docs seem to allow the sharing of connections. Executing a statement seems to lock the underlying connection, preventing truly concurrent access by multiple threads.

https://stackoverflow.com/a/52798543/2055163
https://github.com/pgjdbc/pgjdbc/blob/ecbc48ff6ad293a2555465d4f7fb6a33a403af26/pgjdbc/src/main/java/org/postgresql/core/v3/QueryExecutorImpl.java#L276
https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts23499.html
https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts89498.html

So while things don't work as I originally expected, this makes answering my question easy: because "CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]" is ONE statement only, forwarded to the driver as a whole, its execution should be atomic with respect to other threads using the same connection. It shouldn't make any difference how long the calculation of the associated query takes, because all other threads need to wait for the whole statement anyway. And if some other thread issues the same statement subsequently, "IF NOT EXISTS" comes into play and the statement should succeed instantly.

> Am I correct that with using "IF NOT EXISTS" the associated query is
> only executed as well if the table needs to be created?[...]

An answer to that question would still be of interest to me.

> Am I correct that in my described setup I need to make sure on my own
> that only one thread creates each individual temporary table and
> executes the associated query?[...]

No, not unless truly concurrent access is available in the JDBC driver. If it were, OTOH, I guess the answer would be yes.

> Am I correct that because of the same transaction used by multiple
> threads I need to synchronize them on web service-level?[...]

Yes, and this should be the most performant implementation anyway. In case of really one and the same connection, everything is handled by the same JVM already, so it can be synchronized within that JVM without round trips to Postgres as well.[1]

[1]: https://stackoverflow.com/a/24089729/2055163
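For reference, this is the kind of quick psql test that made me believe the associated query is skipped when the table already exists (a minimal sketch, not from the real schema):

> CREATE TEMP TABLE IF NOT EXISTS t AS SELECT 1 AS x;
> -- SELECT 1
> CREATE TEMP TABLE IF NOT EXISTS t AS SELECT 2 AS x;
> -- NOTICE:  relation "t" already exists, skipping
> SELECT * FROM t;
> -- returns only the single row with x = 1, so the second SELECT was never run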
How to restore a dump containing CASTs into a database with a new user?
Hi all,

one of my apps and databases uses custom CASTs and is used with the user "postgres" for historical reasons. I would like to change that and use a non-superuser for that app+database only. So I dumped the DB using the C-format and tried to restore into a newly created DB:

> createuser --encrypted --pwprompt ams_sm_mtg
> createdb --encoding=UTF-8 --locale=de_DE.UTF-8 --owner=ams_sm_mtg --template=template0 ams_sm_mtg
> pg_restore --dbname=ams_sm_mtg --no-owner --role=ams_sm_mtg < pg_dump/dump.c

The expectation was that whatever gets created in that DB is owned by the new user. But I'm running into the following error:

> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 2721; 2605 342334 CAST CAST (character varying AS inet)
> pg_restore: [archiver (db)] could not execute query: ERROR:  must be owner of type character varying or type inet
> Command was: CREATE CAST (character varying AS inet) WITH FUNCTION public.fn_cast_inet_from_varchar(character varying, integer, boolean) AS ASSIGNMENT;

Most things I've found related to that topic are about PL/pgSQL, which is not the problem I have (yet?). Somewhere the use of "-n" has been suggested, and restoring succeeds with that, but my CASTs are missing afterwards. So that is not a solution.

> pg_restore --dbname=ams_sm_mtg --no-owner --role=ams_sm_mtg -n public < pg_dump/dump.c

I'm having trouble understanding the root cause and how things should work: is there some wrong owner in the dump related to anything of the CAST? Or can the CAST really only be created when the new DB-owner owns those types? Is that necessary per database then, or globally for the public schema, or for one of the template databases, or ...? What is the proper way to restore a dump containing arbitrary CASTs? I would have expected pg_restore to handle everything, like it does for all other objects.

Thanks!
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Tom Lane, on Sunday, 19 July 2020 at 20:37, you wrote:

> It's a security thing. A user who can create such a cast can thereby
> change the behavior of other people's queries.

I'm not sure what you are telling me: can CASTs only be created by superusers? I didn't read that in the docs.

If they can be created by non-superusers as well, how can I restore CASTs to a database owned by some other user? There are no other users than the one owning the database in my case. Even if there were, I don't see why CASTs should be handled differently than all the other objects successfully restored with a new owner, when CASTs seem to be per database as well. Looking at the source database, the CASTs in question are only shown for the database they are used in, no other.
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Thorsten Schöning, on Sunday, 19 July 2020 at 21:51, you wrote:

> If they are not only created by superusers, how can I restore CASTs to
> a database owned by some other user? There are no other users than
> the one owning the database in my case.

I've retried things and must have done something wrong before, because the following works now:

> dropdb ams_sm_mtg
> createdb --encoding=UTF-8 --locale=de_DE.UTF-8 --owner=ams_sm_mtg --template=template0 ams_sm_mtg
> psql --dbname=ams_sm_mtg
> ALTER TYPE inet OWNER TO ams_sm_mtg;

The latter is the important part and needs to be done as someone who owns the type right now. I might have missed issuing the ALTER in the correct database before. The following doesn't work either:

> ams_sm_mtg=# SET ROLE ams_sm_mtg;
> SET
> ams_sm_mtg=> ALTER TYPE inet OWNER TO ams_sm_mtg;
> ERROR:  must be owner of type inet

So only owners of types can give them to someone else. What I still don't understand is whether that ownership is per database or per user or per cluster or per schema or ...? "\dT+" doesn't show that type in "ams_sm_mtg", only really custom-created ones. But I could have different databases with different users deploying the same schema for the same apps, and each of those would need to own that type individually. Is that possible?
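To see who owns a type in the current database, the catalog can be queried directly; a sketch of what I use for checking (nothing special assumed beyond the system catalogs):

> SELECT t.typname, r.rolname AS owner
> FROM   pg_type t
> JOIN   pg_roles r ON r.oid = t.typowner
> WHERE  t.typname IN ('inet', '_inet');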
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Tom Lane, on Sunday, 19 July 2020 at 22:37, you wrote:

> In this case, you have to restore the cast as superuser, because nobody
> else is going to be treated as owning these built-in types.

How do I do that when I have large dumps with lots of those CASTs and in the worst case don't even know it, because it's some proprietary app? Would I need to restore the whole dump as superuser? But how do I then make everything restored owned by some other database user afterwards? And I don't mean individually GRANTing permissions on tables and sequences and ..., and additionally taking care of defaults in case new tables and sequences and ... get added. That reads so complicated.

I simply want to make everything in a database owned by one user, like is easily possible in e.g. file systems, and like is the case with how I restored. But then CASTs don't work easily.
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Christophe Pettus, on Sunday, 19 July 2020 at 23:38, you wrote:

> In this case, you may need to change the ownership of the various
> objects directly in the database, rather than using dump/restore as
> a way of changing ownership all at once.

Does Postgres support that in an easy way, without the need to reverse engineer an otherwise unknown schema? From what I've read it does not: one always needs to know which objects are used, needs to additionally handle newly created objects not used yet, etc. That seems very complicated when one simply wants to restore a backup into a newly created database.

Additionally, who owns types on which level in the end? To successfully restore, I needed to change ownership of the type "inet" to one new user. Is that per database restored into, or for the whole cluster now? Because I easily have multiple copies of the same database for different tenants, and all those tenants are individual users which would need to own that type "inet" to be able to restore their individual databases.

> This is not infrequent
> when you have an existing database in which a superuser owns everything

In my setup it only seems to be necessary for the CASTs; everything else seems to be properly reowned by how I restore. And that approach seems much easier than reowning individual objects. So depending on who owns the type "inet" on which level in the end, Postgres could reown those types automatically during restores as well. Looking at the provided arguments and used commands, it's very clear what the user wants. OTOH, if that type is owned on a higher level than the restored database itself, that seems to be a design issue, because it would prevent other users from owning that type for their individual databases as well.
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Christophe Pettus, on Monday, 20 July 2020 at 07:19, you wrote:

> No, you don't, and you (probably) can't change the ownership of
> "inet". "inet" is a built-in type.

I'm somewhat sure I did and can answer my own questions now:

> dropdb ams_sm_mtg
> createdb --encoding=UTF-8 --locale=de_DE.UTF-8 --owner=ams_sm_mtg --template=template0 ams_sm_mtg
> psql --dbname=ams_sm_mtg
> ALTER TYPE inet OWNER TO ams_sm_mtg;

This makes the type really owned by the given user, but as it seems only for the current database! This can be seen in the catalog "pg_type", in which the column "typowner" really changes to the new user ID:

> psql --dbname=ams_sm_mtg
> ALTER TYPE inet OWNER TO postgres;
> inet  | 11 | 10
> _inet | 11 | 10

vs.

> psql --dbname=ams_sm_mtg
> ALTER TYPE inet OWNER TO ams_sm_mtg;
> inet  | 11 | 16389
> _inet | 11 | 16389

"pg_type" contains some other user-created types and those have the same user ID. The important thing is that doing this in another database doesn't seem to influence the one of interest:

> psql --dbname=template1
> ALTER TYPE inet OWNER TO postgres;

This keeps the following in the database of interest:

> inet  | 11 | 16389
> _inet | 11 | 16389

Using other manually created databases and users results in the same:

> psql --dbname=ams_db_login
> ALTER TYPE inet OWNER TO ams_db_login;

This leads to the following in the given database:

> inet  | 11 | 16390
> _inet | 11 | 16390

But keeps things in other databases:

> inet  | 11 | 16389
> _inet | 11 | 16389

Additionally, when viewing "pg_type" connected as and to "ams_db_login", it shows different types than are available when viewing "pg_type" as and for "ams_sm_mtg". This is another strong hint that those types are managed per database.

So things seem to work as expected, with the only caveat that one needs to make some types owned by new users BEFORE actually restoring. The problem of course is knowing which types those are; it seems one needs to restore, look for errors, ALTER, drop, restore again, etc. It would be far easier if Postgres did that automatically, like it seems to do for most other objects. The important point is that ownership of those types seems to be per database, so it should be safe to do automatically.

> [...]The issue is that you have
> user-defined objects which are owned by the user "postgres"; you
> should change those to the user that you want, leaving the CASTs owned by
> "postgres".

The error messages and docs say otherwise, and changing the owner to a user which doesn't exist at all in the source cluster doesn't make sense either. When creating the dump, I can't know into which target database owned by which user it gets restored at some point.
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Christophe Pettus, on Monday, 20 July 2020 at 16:34, you wrote:

> I'm not sure I understand exactly how this "feature" would work.

Make internal types used in CASTs owned by the restoring user, like all other objects are automatically owned as well. Postgres obviously is able to detect those problems already, and reowning automatically would save people from having to research first and do things manually afterwards. One could optionally introduce an additional flag, but if things like "--role" and "--no-owner" are specified, the intention of the user seems pretty clear to me.

In the worst case this problem even makes people NOT use individual DB users and use a superuser for everything. Just search for the topic of backing up databases using a read-only user: it's so difficult that people suggest creating a special superuser set into read-only mode, from which it can recover itself, and things like that. Simply because granting permissions on unknown schemas and maintaining default permissions for objects nobody knows will ever be needed isn't really straightforward. Using one concrete owner for everything right from the start is the best one can do, and that includes being able to reown in case of restoring backups. And Postgres supports most of that already, only the CAST thing has not been thought through to the end, I guess.
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Christophe Pettus, on Monday, 20 July 2020 at 19:21, you wrote:

> On a database with multiple users, you can't just get away
> with changing the ownership of the types; you have to make sure that
> the USAGE is granted appropriately to other users.

If a user specifies "--no-owner" and "--role=...", it's pretty clear that the user wants things to be reowned. And that use case could be supported automatically. If other use cases need additional manual work, that would be fine, but this concrete one does not in theory.

> * A database has user-defined objects in it that only a superuser can create, and,
> * The rest of the database objects are owned by that superuser, and,
> * You want to change the ownership of the database objects that can be changed, and,
> * You want to have a single backup that you can restore multiple
> times, changing the ownership in a different way each time, and,
> * You want to use pg_restore to do it.

And pg_restore does all that already; it only misses the special case of CASTs.

> This would require a fair amount of surgery to pg_restore. Right
> now, pg_restore doesn't really have a "remap these users"
> functionality.

It's not necessarily about remapping users in general, but instead something like recognizing that creating a CAST first needs the associated types reowned as well. This could be a fallback strategy: try to create the CAST, recognize that it fails because of wrong ownership of the types, and simply reown the types to the current role. Or creating the CAST itself could be changed to optionally do that.

In setups like mine with one user per database, it's absolutely safe and totally makes sense to reown types for an individual DB; otherwise creating the CAST fails anyway. But obviously I want that CAST, so I would do it manually, which is unnecessary in theory and which one could tell the statement with some additional flag or such. Even in cases with multiple different users per DB, reowning types makes sense, because one has the problem when creating the CAST anyway. So either there's some user designated to create the CAST with, which by definition needs to own the associated type anyway. Or it is done as superuser, in which case Postgres could simply not reown, because it's not necessary. That depends on whether even different superusers need to own types or not.

> --no-owner *looks* like it does that, and can be
> used for that in certain cases, but the user-remapping functionality
> of it is really a side-effect. It happens to change the user
> because instead of altering the user to what it is in the backup, it
> just accepts the default ownership based on the user it is connected as.

And why do I need to care why things work, when they totally make sense and I need them? :-) I just see that things work already besides one minor annoyance. So what is a side effect in your opinion now could easily be communicated as a feature as well.

> You can accomplish the same thing by restoring as the superuser,
> not having to alter the ownership of any internal type, and then
> changing the ownership of the user-created objects in the new
> database once it is restored.[...]

But that is far more complicated, because one needs to know ALL objects in the restored schema to reown them manually. It takes more time because one needs to do it manually and for each object individually. It is error-prone because one can easily miss things, especially if schemas are developed further over time, etc.
My current, already supported approach is far easier: I only need to take care of those CASTs manually now, nothing else yet.
How to properly query lots of rows based on timestamps?
Hi all,

I have a table containing around 95 million rows, pretty much only storing a timestamp and further IDs of related tables containing the actual data in the end.

> CREATE TABLE clt_rec
> (
>   id          BIGSERIAL                   NOT NULL,
>   oms_rec     BIGINT                      NOT NULL,
>   captured_at TIMESTAMP(6) WITH TIME ZONE NOT NULL,
>   rssi        SMALLINT                    NOT NULL,
>   CONSTRAINT pk_clt_rec PRIMARY KEY (id),
>   CONSTRAINT fk_clt_rec_oms_rec FOREIGN KEY (oms_rec) REFERENCES "oms_rec" ("id"),
>   CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec)
> );

In many use cases I need to search all of those rows based on their timestamp, to find rows arbitrarily far in the past: sometimes it's only 15 minutes into the past, sometimes it's 2 years, sometimes it's finding the first day of each month over 15 months for some of those telegrams, etc. In the end, I pretty often need to compare those timestamps, and some queries simply take multiple seconds, especially adding up when multiple, slightly different queries need to be executed one after another. The following are two abstracts of Postgres' query plans:

Plan 1:

> ->  Nested Loop  (cost=1.14..343169.49 rows=43543 width=20) (actual time=0.313..113.974 rows=34266 loops=3)
>       ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..3437.90 rows=43543 width=24) (actual time=0.153..20.192 rows=34266 loops=3)
>             Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
>       ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.80 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=102799)
>             Index Cond: (id = clt_rec.oms_rec)

Plan 2:

> ->  Nested Loop  (cost=1.14..836381.50 rows=111934 width=20) (actual time=0.379..911.697 rows=334465 loops=3)
>       ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..8819.57 rows=111934 width=24) (actual time=0.193..154.176 rows=334465 loops=3)
>             Index Cond: ((captured_at >= ('2020-08-28 10:21:06.968+02'::timestamp with time zone - '14 days'::interval)) AND (captured_at <= ('2020-08-28 10:21:06.968+02'::timestamp with time zone + '00:00:00'::interval)))
>       ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.39 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=1003394)
>             Index Cond: (id = clt_rec.oms_rec)

Postgres seems to properly use available indexes, parallel workers and such. But looking at the actual times, and compared to all the other parts of the query, comparing those timestamps simply takes the most time.

I've looked into this topic and found statements that one shouldn't put too many rows into the index[1] and such, or it will be ignored entirely. But that doesn't seem to be the case for me according to the plan. OTOH, my index really is simply on the column containing the timestamp, with no function reducing things to dates or the like to cut down the number of rows.

> CREATE INDEX idx_clt_rec_captured_at ON clt_rec USING btree ( captured_at );

So where does the actual runtime in the above plans come from? Is it simply comparing lots of timestamps and that takes how long it takes? Or is it already calculating with those timestamps?

Thanks!
[1]: https://stackoverflow.com/questions/15977741/postgres-is-ignoring-a-timestamp-index-why
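The plans above come from queries of roughly the following shape (a sketch; the real column list is longer and the bounds are bound as parameters from the app):

> SELECT c.captured_at, c.rssi, o.id
> FROM   clt_rec c
> JOIN   oms_rec o ON o.id = c.oms_rec
> WHERE  c.captured_at >= TIMESTAMPTZ '2020-08-01 00:00:00+02' - INTERVAL '0 seconds'
> AND    c.captured_at <= TIMESTAMPTZ '2020-08-01 00:00:00+02' + INTERVAL '1 day';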
Re: How to properly query lots of rows based on timestamps?
Hello Tom Lane, on Saturday, 29 August 2020 at 17:19, you wrote:

> TBH, this seems like a pretty awful data design. If you included the
> timestamp column into oms_rec, and had an index on it, then you would
> not need a join at all.

clt_rec and oms_rec are different types of data and only the former includes the timestamp; that's simply how the stored data works. So if at all, oms_rec would need to be embedded into clt_rec entirely. While I considered that in the past already, I wasn't sure if that is the best approach to model those two different data types. I'll make a note and consider refactoring at some point, so thanks for your opinion!
Re: How to properly query lots of rows based on timestamps?
Hello Alban Hertroys, on Saturday, 29 August 2020 at 11:29, you wrote:

> [...]Next, it loops over those to match the other condition of
> your query (id = clt_rec.oms_rec). You didn't tell whether there's
> an index on that column.

Which column, oms_rec.id or clt_rec.oms_rec? The former has one because it's the unique key, and the plan says so as well:

> Index Scan using pk_oms_rec on oms_rec

clt_rec.oms_rec OTOH is only marked as unique, without explicitly creating an index:

> CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec)

But doesn't that mean an index is available behind the scenes, maintained by Postgres? So it should have been used if it provided any benefit?

> Adding a unique constraint will automatically create a unique B-tree
> index on the column or group of columns listed in the constraint.

https://www.postgresql.org/docs/9.4/ddl-constraints.html

> You'd probably see a performance improvement were you to create an
> index on (captured_at, id). If your Postgres version is somewhat
> recent, that could even lead to an Index Only Scan.

Tried that with your statement at the bottom and it didn't seem to change anything, even when using Postgres 11:

> ->  Nested Loop  (cost=1.14..343169.49 rows=43543 width=20) (actual time=0.228..95.554 rows=34266 loops=3)
>       ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..3437.90 rows=43543 width=24) (actual time=0.119..16.895 rows=34266 loops=3)
>             Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
>       ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.80 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=102799)
>             Index Cond: (id = clt_rec.oms_rec)

vs. with your suggested change:

> ->  Nested Loop  (cost=1.14..513397.11 rows=43543 width=20) (actual time=0.236..97.044 rows=34266 loops=3)
>       ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..173665.52 rows=43543 width=24) (actual time=0.183..17.464 rows=34266 loops=3)
>             Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
>       ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.80 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=102799)
>             Index Cond: (id = clt_rec.oms_rec)

Good news is that having either of those indexes in place at all makes a huge difference compared to having neither. :-D
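For reference, the suggested index, plus an INCLUDE variant possible since PostgreSQL 11 (the INCLUDE column list is just my guess at what the queries would need, not something from Alban's mail):

> CREATE INDEX idx_clt_rec_captured_at_id ON clt_rec USING btree (captured_at, id);
>
> -- alternative: keep the key narrow and append payload columns for index-only scans
> CREATE INDEX idx_clt_rec_captured_at_incl ON clt_rec (captured_at) INCLUDE (oms_rec);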
Re: How to properly query lots of rows based on timestamps?
Hello Tom Lane, on Saturday, 29 August 2020 at 17:19, you wrote:

> Timestamp comparison reduces to comparison of int64's, so it's
> hard to imagine that anything could be much cheaper.[...]

Do you think it matters much if the timestamp-related calculations are done outside the query, by the executing Java code, instead?

> ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)

In Java, I currently only have the timestamp and the intervals as ISO 8601 expressions like P1D, P14D etc., and leave all of their parsing to Postgres as much as possible. This results in calculations like the above being necessary before actually comparing against stored timestamps. I would expect that it doesn't make much difference whether Postgres or Java calculates the necessary timestamps, as long as Postgres is clever enough to NOT calculate per row, but only once instead. Which I guess it is? Is there any hint about such things in the plan? I didn't see or understand one.
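If the calculation moved to Java, the query would simply receive precomputed bounds as parameters, e.g. like this (a sketch using PREPARE in place of the JDBC parameter binding):

> PREPARE recent(timestamptz, timestamptz) AS
>   SELECT count(*)
>   FROM   clt_rec
>   WHERE  captured_at >= $1   -- lower bound computed in Java
>   AND    captured_at <= $2;  -- upper bound computed in Java
>
> EXECUTE recent('2020-08-01 00:00:00+02', '2020-08-02 00:00:00+02');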
Re: How bad is using queries with thousands of values for operators IN or ANY?
Hello Laurenz Albe, on Monday, 31 August 2020 at 10:23, you wrote:

> Is it an option to change that usage pattern?

Not everywhere easily, because I simply have places in which I'm working with IDs only and e.g. need to check if they are owned by some user. Those checks are abstracted away into individual functions instead of JOINing everywhere, as the latter makes queries more difficult to maintain. Of course there are alternatives like row level security I might have a look at, etc. And that's exactly why I asked the question: to make up my mind about the whole thing, and whether it's worth or necessary to look into alternatives at all.
Re: How bad is using queries with thousands of values for operators IN or ANY?
Hello Pavel Stehule, on Monday, 31 August 2020 at 10:51, you wrote:

> It is not good - it increases the memory necessary for query parsing,
> optimizer and executor are slower.

At least memory and I/O were not a problem in my tests; CPU load was pretty high. But I was unable to see whether that's because of comparing IDs or parsing the query or the like. Can one see the difference somehow? That would be the only way to tell whether splitting the large query up into multiple smaller ones is an overall improvement.

> Postgres currently has not any optimization for processing searching in
> these long lists - so this search is very slow against other methods.

Which other methods do you have in mind, populating a temporary table and joining that? I tested things like that in the context of other queries, and creating those tables, indexes etc. introduced more overhead than the query benefitted. Additionally, it's not always these large queries. Most of the time it's far fewer IDs and things are fast. I even introduced embedding those IDs because the query was faster that way than with some JOIN and comparing the IDs, even though indexes were in place and used, etc.

So at what query size or number of IDs to compare in IN would you consider a different approach at all? (Some of the alternatives I know of are sketched below.)
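The alternatives I know of look roughly like this (a sketch against a hypothetical stand-in table "t"; in the real app the array would be bound as a JDBC parameter instead of via PREPARE):

> CREATE TEMP TABLE t (id bigint PRIMARY KEY);
>
> -- 1) embedded list: statement text grows with the number of IDs
> SELECT * FROM t WHERE id IN (1, 2, 3);
>
> -- 2) one array parameter: statement text stays constant, plans can be reused
> PREPARE by_ids(bigint[]) AS SELECT * FROM t WHERE id = ANY ($1);
> EXECUTE by_ids(ARRAY[1, 2, 3]);
>
> -- 3) VALUES list the planner can join against
> SELECT t.* FROM t JOIN (VALUES (1::bigint), (2), (3)) AS v(id) ON v.id = t.id;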
What's your experience with using Postgres in IoT-contexts?
Hi all,

TL;DR: Does anyone actually use Postgres with ARM-based low-performance hardware and only 256 MiB of RAM? What are your experiences in other stripped-down environments? Is there some lower RAM limit below which using Postgres doesn't make any sense anymore? Is Postgres able to compete with SQLite regarding lower performance boundaries, or is there some point at which SQLite will be the better choice, because Postgres needs more resources to work properly? Thanks!

Background: I have some server-side backend exposing web services and storing data in Postgres. The stored data are telegrams from smart metering devices, lots of them, mostly small. The provided web services take care of generating reports etc. based on those telegrams. It's now considered to put large parts of that backend onto some ARM-based IoT device, simply because that device would do a lot of the same things. The available backend is Java-based, so this would simply be possible as well, and I already ran things on some Raspi-like hardware and it worked pretty well. OTOH, the targeted hardware is less capable than some Raspi; we're talking about only 256 MiB of RAM currently. The exact details of the hardware are not clear yet.

In the easiest case, the newly created device would only need to store some hundreds of telegrams per day in some known time window and forward those telegrams afterwards. In most cases it would be those of the same day only, but it might be those of the last X days as well in case of problems. In the long term, though, that device should be able to store telegrams for years and might generate reports on its own as well.

I've already found some interesting info like the following, but would be interested in other experiences as well.

https://www.postgresql.eu/events/pgconfeu2019/sessions/session/2781/slides/251/PostgreSQL%20for%20IoT.pdf
Re: What's your experience with using Postgres in IoT-contexts?
Hello Tony Shelver, on Monday, 5 October 2020 at 15:44, you wrote:

> Not sure about PG in that environment. Have you thought about something
> like H2 java database? https://www.h2database.com/html/main.html

Yes, like SQLite as well. The point is that I'm really interested in keeping as much as possible, and we e.g. use some Postgres-specific constructs currently. Of course things can be changed, but the question is whether that's necessary at all, especially after reading the PDF:

https://www.postgresql.eu/events/pgconfeu2019/sessions/session/2781/slides/251/PostgreSQL%20for%20IoT.pdf

> That is included as the standard DB in a vehicle tracking system we use,
> although we have replaced it with PG.

And is your tracking system more like the server that I already have, or an embedded system within the vehicles themselves?
Re: What's your experience with using Postgres in IoT-contexts?
Hello Tony Shelver, on Tuesday, 6 October 2020 at 16:33, you wrote:

> Each vehicle reports position to the database at least every 30 seconds
> when moving, along with any other data as it happens. So it's quite a bit.
> H2 seemed to handle it fine for a limited number of vehicles, but we moved
> it to PG once we moved to production.

But where does Postgres run in your setup? :-) On the tracked vehicle itself with limited resources, or do you simply send things over the network to some cloud/server/...? Your first paragraph reads like the former, but the second like the latter. :-) If it's on the vehicle, I would be interested to know roughly which hardware you use, to compare with what I have in mind. Thanks!
Which is the setup with lowest resources you know Postgres is used in?
Hi all,

I'm regularly reading that Postgres is often used with containers and in cloud environments these days, even on some not-too-powerful NAS. What are the lowest-resource setups you know of, or even host Postgres on successfully yourself? It's especially about RAM and CPU, and whether you needed to e.g. configure anything special to make things work in your stripped-down environment (the kind of settings I expect to matter are sketched below).

Is there any point at which one is most likely forced to switch to more specialized embedded databases like SQLite? E.g. because Postgres requires a higher amount of resources because of its architecture? Or could Postgres in theory be used everywhere SQLite is used as well, as long as one is allowed to e.g. start an additional process?

I would like to know if there's any realistic chance to use Postgres in a low-resource environment with a small amount of RAM and a somewhat slow CPU like the following:

http://ww1.microchip.com/downloads/en/DeviceDoc/ATSAMA5D27-WLSOM1-Datasheet-60001590b.pdf
http://ww1.microchip.com/downloads/en/DeviceDoc/SAMA5D2-Series-Data-sheet-ds60001476F.pdf

One point is that I most likely need somewhat concurrent access to the data, because of having web services exposing that data to clients, daemons storing data locally, etc. OTOH, the number of concurrent accesses won't be too high; there won't be much load most of the time. Things heavily depend on the actual users of the device. Postgres' architecture seems to fit that use case better than e.g. SQLite's.

Thanks for sharing your experiences and suggestions!
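In case it helps the discussion, these are the kinds of settings I would expect to need tuning on such a device (the starting values are my guesses for a 256 MiB box, not tested recommendations):

> ALTER SYSTEM SET shared_buffers = '16MB';          -- restart required
> ALTER SYSTEM SET work_mem = '1MB';
> ALTER SYSTEM SET maintenance_work_mem = '16MB';
> ALTER SYSTEM SET max_connections = 10;             -- restart required
> ALTER SYSTEM SET max_worker_processes = 2;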
Re: What's your experience with using Postgres in IoT-contexts?
Hello Peter J. Holzer, on Saturday, 10 October 2020 at 10:56, you wrote:

> Do you plan to move some of that reporting to the IoT devices? (Maybe
> equip them with a display with a dashboard, or something like that)

Not necessarily with a display, but something like a dashboard for a web UI on the device itself definitely needs to be kept in mind. Though some display has been considered as well.

> Because for simply storing a few hundred records which are written and
> read sequentially, an RDBMS of any kind seems overkill. I'd simply write
> them to files.

That's still considered as well and is what's done right now. But at some point we need to maintain individual logical devices where those telegrams come from anyway. It has also been considered to store only metadata in the database and the telegrams in files, but as telegrams are mostly small, this doesn't seem worth it. And because telegrams are small, we would need some kind of container format anyway, so as not to run out of inodes and the like too easily.
Re: What's your experience with using Postgres in IoT-contexts?
Hello Peter J. Holzer, on Saturday, 10 October 2020 at 12:00, you wrote:

> Not necessarily. You would have to allocate the appropriate number of
> inodes, of course. Using one file per telegram wastes some space, but
> for a few thousand telegrams that may not matter[1]. Indeed, with ext4 you
> could make the inodes large enough that all or most of your telegrams
> fit inside the inode and don't need an extra data block at all.

Thanks for suggesting what I had already noted as at least worth discussing, and what "other interested parties" said was a really stupid idea. Doesn't seem that stupid to the two of us, obviously, especially if ext4 is already used anyway. ;-)

But it's really not only about a simple queue: I want long-term storage and access to individual telegrams according to different criteria like individual devices, when received, etc.
Declarative partitioning and automatically generated row-IDs using BIGSERIAL
Hi all,

I have the following table, containing 100+ million rows currently, which needs to be queried by "captured_at" a lot. That table stores rows for the last 6 years, but most of the queries focus on the last 15 months, 15 days or really only 15 minutes.

> CREATE TABLE public.clt_rec
> (
>   id          bigserial                NOT NULL,
>   oms_rec     bigint                   NOT NULL,
>   captured_at timestamp with time zone NOT NULL,
>   rssi        smallint                 NOT NULL,
>   CONSTRAINT pk_clt_rec PRIMARY KEY (id) WITH (FILLFACTOR = 10),
>   CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec)
> );

We want to test if the performance of some queries can be improved by using declarative partitioning with far fewer rows per table, e.g. one table per year or half-year or the like.

One important thing to keep in mind is the handling of the column "id", because for historical reasons that column is not only used to address individual rows. It's used as a transaction ID for some rudimentary exporting of those rows as well: there's some app requesting NEW rows to export, simply providing the LAST "id" it received from former requests. Therefore I need to keep that ID when splitting the table into individual partitions, and I need to guarantee that IDs are unique across all partitions as well. I've read a lot of similar examples using BIGSERIAL or the like already, but would like to make sure I understood correctly how those SERIAL values are generated.

https://alexey-soshin.medium.com/dealing-with-partitions-in-postgres-11-fa9cc5ecf466

Am I correct that after migrating the available table to a partitioned one, I keep INSERTing into the partitioned table only in my app? Because of the declarative partitioning used, the only thing I need to assure is having the necessary partitions available when the INSERT happens? I would create them beforehand as part of some maintenance, but read about a DEFAULT partition as well.

Because I keep INSERTing into the partitioned table, the semantics of my ID don't change, correct? There's a SEQUENCE associated with the column "id" of the partitioned table, and that generates my value, which is AFTERWARDS stored with all other values of a row in whichever partition fits the partition key. Or is the target partition calculated first and AFTERWARDS a SEQUENCE private to each partition table used to calculate the ID? I don't think so, but according to the docs, indexes etc. are inherited by partitions as well. So maybe Postgres maintains multiple SEQUENCEs in the background for some reason as well.

> Unique constraints (and hence primary keys) on partitioned tables
> must include all the partition key columns. This limitation exists
> because PostgreSQL can only enforce uniqueness in each partition
> individually.

https://www.postgresql.org/docs/11/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-BEST-PRACTICES

Doesn't the above allow manually updating different rows in different partition tables to contain the same ID in the end? So the main remaining benefit of the PRIMARY KEY is uniqueness per partition, correct?

> While primary keys are supported on partitioned tables, foreign keys
> referencing partitioned tables are not supported. (Foreign key
> references from a partitioned table to some other table are
> supported.)

The docs for Postgres 11 mention that partitioned tables cannot be used as the target of foreign keys, while that statement is removed from the docs of Postgres 13. How is a 1:1 relationship guaranteed in newer versions when PRIMARY KEYs are still local to their individual partition tables?

Thanks for your help!
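For completeness, the kind of migration I have in mind looks roughly like this (a sketch; yearly partitions just as an example, and the partition key added to the PK as the docs require):

> CREATE TABLE clt_rec
> (
>   id          bigserial                NOT NULL,
>   oms_rec     bigint                   NOT NULL,
>   captured_at timestamp with time zone NOT NULL,
>   rssi        smallint                 NOT NULL,
>   CONSTRAINT pk_clt_rec PRIMARY KEY (id, captured_at)
> ) PARTITION BY RANGE (captured_at);
>
> CREATE TABLE clt_rec_y2020 PARTITION OF clt_rec
>   FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
> CREATE TABLE clt_rec_y2021 PARTITION OF clt_rec
>   FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
>
> -- INSERTs target the parent; all partitions share the one sequence behind "id":
> INSERT INTO clt_rec (oms_rec, captured_at, rssi) VALUES (42, now(), -70);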
Re: Declarative partitioning and automatically generated row-IDs using BIGSERIAL
Hello Michael Lewis, on Thursday, 31 December 2020 at 18:20, you wrote:

> Why is your fillfactor so low?[...]

I've just copied what my GUI tool pgModeler generated as SQL right now; that fill factor might have never been applied at all.

> Perhaps a silly question, but do you have an index on captured_on?

Yes, and that changed a lot some years ago, but doesn't seem to be sufficient anymore.

> You included the full DDL for the table it seems so I wanted to ask.

No, only the part relevant to my question. How SEQUENCEs, PKs etc. are handled is the most important thing right now.

> Can you
> provide some example queries that you want to speed up? What are you
> configuration settings (select * from pg_settings;)?

No thanks, I don't want to discuss such things in this thread. I've asked lots of other questions regarding my concrete bottlenecks, schema, app logic etc. already and simply want to test with partitions now. That has been suggested in the other questions as well and is what I focus on now.

> [...]That is, you could not have a primary key on ID and
> partition on captured_on for instance.

That's what I understood as well and is the reason why I asked: it means IDs could be manually duplicated within individual partition tables, while I need them to be unique across all of those. Additionally, I wonder when IDs are generated by which SEQUENCE, etc.
Re: Declarative partitioning and automatically generated row-IDs using BIGSERIAL
Hello Michael Lewis, on Thursday, 31 December 2020 at 19:28, you wrote:

> select
>   t.reloptions
> from pg_class t
> join pg_namespace n on n.oid = t.relnamespace
> where t.relname = 'clt_rec'
>   and n.nspname = 'public';

That outputs NULL, as it does for other tested indexes as well. Additionally, the following two lines only apply to the index associated with the primary key, not the table itself, don't they?

> CONSTRAINT pk_clt_rec PRIMARY KEY (id)
> WITH (FILLFACTOR = 10),

I've checked pgModeler and it seems to set such low fill factors automatically on old primary keys only. Will remove those.

> Sequences are incremented and return the new value whenever they are
> called. If all partitions inherit the same sequence (behavior determined by
> which options you use when you create partitions LIKE parent)[...]

Am I allowed to use LIKE, or is it even necessary? The docs don't use it, and the description sounds wrong as well:

> CREATE TABLE measurement_y2006m02 PARTITION OF measurement
>   FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

> Unlike INHERITS, the new table and original table are completely
> decoupled after creation is complete. Changes to the original table
> will not be applied to the new table, and it is not possible to
> include data of the new table in scans of the original table.

The option you mention is INCLUDING IDENTITY, and that would need to be AVOIDED so as to NOT get multiple SEQUENCEs?
How to REMOVE a fillfactor set by accident?
Hi all,

I'm using the GUI tool pgModeler to maintain schemas, and at some point in the past it used fillfactor 10 for some indexes by default. That seems to have changed in the meantime, and most of my indexes don't have ANY fillfactor set explicitly.

Is there some way to REMOVE the fillfactor where it is set, so that Postgres applies its defaults? It would be great to have an output of NULL everywhere fillfactor is shown, instead of sometimes e.g. 100 and more often NULL. I've already tried setting it to NULL, which failed with a message that a decimal number is necessary. Even setting 0 fails, because a value between 10 and 100 is required.

> ALTER INDEX pk_clt_rec_src SET (fillfactor = NULL);
> ALTER INDEX pk_clt_rec_src SET (fillfactor = 0);

So removing isn't possible at all, and I can only set concrete values like 100, or not care at all instead? Because fillfactor=10 is only set for some low-throughput test databases currently.

Thanks!
Re: How to REMOVE a fillfactor set by accident?
Guten Tag Thomas Kellerer, am Donnerstag, 31. Dezember 2020 um 21:31 schrieben Sie: > ALTER INDEX pk_clt_rec_src RESET (fillfactor); Thanks, should have looked into the docs instead of googling only! :-) Mit freundlichen Grüßen, Thorsten Schöning -- Thorsten Schöning AM-SoFT IT-Service - Bitstore Hameln GmbH E-Mail: thorsten.schoen...@am-soft.de Web: http://www.AM-SoFT.de/ Telefon: 05151- 9468-55 Fax: 05151- 9468-88 Mobil:0178-8 9468-04 Firmensitz: Bitstore IT-Consulting, Frankfurter Allee 285, 10317 Berlin Steuernummer 037/230/30566, HR 27198, Amtsgericht Potsdam Geschäftsführer Janine Galonska
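One caveat worth adding: RESET only clears the stored reloption; index pages already split at the old fillfactor stay sparsely packed until the index is rebuilt. A minimal sketch:

> ALTER INDEX pk_clt_rec_src RESET (fillfactor);
> -- The reloption is gone now, but existing pages keep their old
> -- packing; rebuild to actually repack the index at the default:
> REINDEX INDEX pk_clt_rec_src;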
Increased size of database dump even though LESS consumed storage
Hi all, for various reasons I've migrated my database schema from 4 tables, with some additional indexes to keep integrity and stuff, to 1 table only. That made some of the formerly used indexes obsolete and resulted in overall less consumed storage: the old schema consumed ~42 GiB, while the new one is ~16 GiB without the formerly available indexes and ~25 GiB with the same logical indexes.

Though, a created dump of the new schema has increased from ~5,52 GiB to 6,38 GiB. Of course I'm using the same settings to create both dumps:

> pg_dump.exe "--username=%USERNAME%" "--encoding=UTF-8" "--compress=9" "--format=c" "--dbname=%DB_NAME%" > "%DMP_PATH%"

My expectation was that the dump would be smaller as well, because the data itself is the same, while lots of duplicate IDs, obsolete indexes etc. in the no longer existing tables have been removed. Any explanation for the increased dump size? Thanks!

The tables themselves were changed like the following:

> CREATE TABLE clt_rec
> (
>   id          bigserial NOT NULL,
>   oms_rec     bigint    NOT NULL,
>   captured_at timestamp with time zone NOT NULL,
>   rssi        smallint  NOT NULL
> );
> CREATE TABLE oms_rec
> (
>   id        bigserial NOT NULL,
>   meter     integer   NOT NULL,
>   encrypted bytea,
>   decrypted bytea
> );
> CREATE TABLE clt_rec_src
> (
>   id          bigserial NOT NULL,
>   real_estate integer   NOT NULL,
>   collector   integer   NOT NULL,
>   clt_rec     bigserial
> );
> CREATE TABLE meter_status_history
> (
>   id      serial NOT NULL,
>   oms_rec bigint NOT NULL,
>   status  smallint
> );

vs.

> CREATE TABLE datagram
> (
>   id            bigserial NOT NULL,
>   src_re        integer   NOT NULL,
>   src_clt       integer   NOT NULL,
>   src_meter     integer   NOT NULL,
>   captured_at   timestamp with time zone NOT NULL,
>   captured_rssi smallint  NOT NULL,
>   oms_status    smallint  NOT NULL,
>   oms_enc       bytea,
>   oms_dec       bytea
> );

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
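To double-check such before/after numbers on disk, a sketch using the table names from the schemas above (pg_total_relation_size includes indexes and TOAST, which is what the ~42 vs. ~25 GiB figures should roughly correspond to):

> SELECT relname,
>        pg_size_pretty(pg_total_relation_size(oid)) AS total_size
>   FROM pg_class
>  WHERE relname IN ('clt_rec', 'oms_rec', 'clt_rec_src',
>                    'meter_status_history', 'datagram')
>  ORDER BY pg_total_relation_size(oid) DESC;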
Re: Increased size of database dump even though LESS consumed storage
Guten Tag Thorsten Schöning, am Dienstag, 9. Februar 2021 um 10:56 schrieben Sie:

> Any explanation for the increased dump size? Thanks!

Some more data (NEW vs. OLD schema):

> --compress=0:       20  vs. 25  GiB
> --compress=DEFAULT: 6,5 vs. 5,6 GiB
> --compress=9:       6,4 vs. 5,5 GiB

From the docs:

> For the custom archive format, this specifies compression of
> individual table-data segments[...]

https://www.postgresql.org/docs/10/app-pgdump.html

So the new table layout really seems to negatively impact compression, even though in both cases the "bytea" are at the end of the table. The only difference is the overall number of columns per table. I wouldn't have expected that.

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
How does Postgres decide whether to use additional workers?
Hi all, I have the following table with an added BTREE index on "captured_at":

> CREATE TABLE datagram
> (
>   id            bigserial NOT NULL,
>   src_re        integer   NOT NULL,
>   src_clt       integer   NOT NULL,
>   src_meter     integer   NOT NULL,
>   captured_at   timestamp with time zone NOT NULL,
>   captured_rssi smallint  NOT NULL,
>   oms_status    smallint  NOT NULL,
>   oms_enc       bytea,
>   oms_dec       bytea
> );

I have a query containing the following condition:

> AND "datagram"."captured_at" BETWEEN (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) - CAST('P5D' AS INTERVAL)) AND (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) + CAST('P0D' AS INTERVAL))

When that query is applied to the above table with its index on "captured_at", it results in the following query plan. The important thing to note is that NO parallel workers are used.

> -> Hash Join (cost=246164.35..2004405.07 rows=11323 width=51) (actual time=93.802..5776.755 rows=104607 loops=1)
>    Hash Cond: (meter.meter_bcd = meter_bcd.id)
>    -> Hash Join (cost=246019.19..2003889.83 rows=68494 width=37) (actual time=93.067..5744.787 rows=104607 loops=1)
>       Hash Cond: (datagram.src_meter = meter.id)
>       -> Index Scan using idx_datagram_captured_at_btree on datagram (cost=0.57..1756571.73 rows=495033 width=20) (actual time=0.054..5451.417 rows=514369 loops=1)
>          Index Cond: ((captured_at >= ('2020-08-28 10:34:32.855+02'::timestamp with time zone - '5 days'::interval)) AND (captured_at <= ('2020-08-28 10:34:32.855+02'::timestamp with time zone + '00:00:00'::interval)))

For various reasons I tested the above table as a partitioned one as well, with individual partitions containing the rows of one year only. The important thing to note is that I simply kept the same index on "captured_at" as before, though the query plan looks different now:

> Workers Planned: 2
> Workers Launched: 2
> -> Hash Join (cost=245966.53..272335.67 rows=5419 width=51) (actual time=625.846..1560.103 rows=34869 loops=3)
>    Hash Cond: (datagram_y2020_h2.src_meter = meter.id)
>    -> Parallel Append (cost=4.19..25430.72 rows=236911 width=20) (actual time=2.827..863.298 rows=171456 loops=3)
>       Subplans Removed: 23
>       -> Parallel Index Scan using datagram_y2020_h2_captured_at_idx on datagram_y2020_h2 (cost=0.44..24051.22 rows=236888 width=20) (actual time=2.826..848.388 rows=171456 loops=3)
>          Index Cond: ((captured_at >= ('2020-08-28 10:34:32.855+02'::timestamp with time zone - '5 days'::interval)) AND (captured_at <= ('2020-08-28 10:34:32.855+02'::timestamp with time zone + '00:00:00'::interval)))

It seems that additional workers are used only because of a different number of rows per individual table.
Though, in the past I had all of those "captured_at" in one table as well, only with far fewer columns, and for that table additional workers have been used, pretty much like is the case now:

> Workers Planned: 2
> Workers Launched: 2
> -> Hash Join (cost=264793.42..1666293.23 rows=4332 width=51) (actual time=96.080..638.802 rows=34869 loops=3)
>    Hash Cond: (oms_rec.meter = meter.id)
>    -> Nested Loop (cost=1.14..1400747.39 rows=189399 width=20) (actual time=0.145..496.366 rows=171456 loops=3)
>       -> Parallel Index Scan using idx_clt_rec_captured_at on clt_rec (cost=0.57..14853.95 rows=189399 width=24) (actual time=0.098..81.556 rows=171456 loops=3)
>          Index Cond: ((captured_at >= ('2020-08-28 10:34:32.855+02'::timestamp with time zone - '5 days'::interval)) AND (captured_at <= ('2020-08-28 10:34:32.855+02'::timestamp with time zone + '00:00:00'::interval)))
>       -> Index Scan using pk_oms_rec on oms_rec (cost=0.57..7.32 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=514369)
>          Index Cond: (id = clt_rec.oms_rec)
>    -> Hash (cost=264709.53..264709.53 rows=6620 width=39) (actual time=95.521..95.528 rows=40044 loops=3)
>       Buckets: 65536 (originally 8192) Batches: 1 (originally 1) Memory Usage: 3016kB
>       -> Hash Join (cost=145.59..264709.53 rows=6620 width=39) (actual time=9.883..86.390 rows=40044 loops=3)
>          Hash Cond: (meter.meter_bcd = meter_bcd.id)

So, based on which facts does Postgres decide if additional workers are used or not?
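For reference, a hedged sketch of the knobs that typically drive this decision: the planner only considers a parallel scan when the relation exceeds min_parallel_table_scan_size or min_parallel_index_scan_size, and it weighs parallel_setup_cost and parallel_tuple_cost against the serial plan, so per-partition sizes change the outcome:

> SHOW max_parallel_workers_per_gather;  -- 0 disables parallel query entirely
> SHOW min_parallel_table_scan_size;     -- default 8MB
> SHOW min_parallel_index_scan_size;     -- default 512kB
> -- For experiments only: make parallel plans artificially cheap,
> -- then compare the resulting EXPLAIN output with the plans above.
> SET parallel_setup_cost = 0;
> SET parallel_tuple_cost = 0;
> EXPLAIN (ANALYZE)
> SELECT count(*) FROM datagram
>  WHERE captured_at BETWEEN now() - interval '5 days' AND now();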
Re: Increased size of database dump even though LESS consumed storage
Guten Tag Thorsten Schöning, am Dienstag, 9. Februar 2021 um 10:56 schrieben Sie:

> Any explanation for the increased dump size? Thanks!

While I don't have a detailed explanation myself, there's the following hint in the docs:

> For the custom archive format, this specifies compression of
> individual table-data segments[...]

https://www.postgresql.org/docs/10/app-pgdump.html

I've changed the layout of my tables entirely and most likely have changed the "individual table-data segments" as well. So I did some further testing and got the following numbers, always OLD vs. NEW schema for the schemas documented earlier:

> --compress=0: 25  vs. 20  GiB
> --compress=?: 5,6 vs. 6,5 GiB
> --compress=9: 5,5 vs. 6,4 GiB

"?" means default settings, i.e. "--compress" has been REMOVED from the shell command. As can be seen, pretty much as soon as compression comes into play, the new table layout is less efficient, even though in OLD and NEW schema exactly the same "bytea" were used, placed at the end of their individual tables and alike.

But things become more interesting with another version of my schema, which likewise indicates a strong relationship of dump size, compression and table layout/size. I've changed the new table "datagram" to be a partitioned one, with partitions containing rows per year and per half-year. Each partition contains far fewer rows than before this way, and while I only tested with "--compress=9" this time, the numbers are quite interesting:

> unpartitioned:     6,4 GiB
> half-yearly parts: 4,8 GiB
> yearly parts:      4,8 GiB

The interesting part this time is that the table layout for all partitions is the same as before; only the number of rows per table differs. The overall number of rows, the data, the IDs etc. are the same as before. Yet this time the dump really is smaller than with the OLD schema, which contained far more data because of duplicate IDs and stuff. I wouldn't have expected table layout to be that important.

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
Re: Increased size of database dump even though LESS consumed storage
Guten Tag Francisco Olarte, am Mittwoch, 10. Februar 2021 um 10:38 schrieben Sie:

> [...]and partitioning sorts partially
> by date ( or fully if you have made the partitions by range-querying
> via index scan ).[...]

That statement is especially interesting not only because of my dump size, but because I'm running into the problem that queries spanning more than one partition seem to prefer a sequential scan over using indexes. My index only seems to be used when querying the rows of one partition.

So, the following is my definition; should that be "range-queried via index scan" properly? :-)

> CREATE TABLE datagram
> (
>   id            bigserial NOT NULL,
>   src_re        integer   NOT NULL,
>   src_clt       integer   NOT NULL,
>   src_meter     integer   NOT NULL,
>   captured_at   timestamp with time zone NOT NULL,
>   captured_rssi smallint  NOT NULL,
>   oms_status    smallint  NOT NULL,
>   oms_enc       bytea,
>   oms_dec       bytea
> ) PARTITION BY RANGE (captured_at);
> CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM ('1970-01-01') TO ('1970-07-01');
> CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM ('1970-07-01') TO ('1971-01-01');
> [...]
> INSERT INTO datagram([...]) SELECT * FROM datagram_unpart;
> CREATE INDEX idx_datagram_for_time_window ON datagram USING btree (src_meter, captured_at DESC);

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
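Whether the planner actually prunes partitions for such a range can be checked along these lines (a sketch; the constant date range is made up):

> -- With a constant range the plan should list only the matching
> -- partitions; seeing every datagram_y*_h* child means no pruning.
> EXPLAIN (COSTS OFF)
> SELECT * FROM datagram
>  WHERE captured_at >= '2020-08-23'
>    AND captured_at <  '2020-08-29';
> SHOW enable_partition_pruning;  -- PostgreSQL 11+; version 10 relies on
>                                 -- constraint exclusion instead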
Re: Increased size of database dump even though LESS consumed storage
Guten Tag Thorsten Schöning, am Mittwoch, 10. Februar 2021 um 09:58 schrieben Sie:

>> unpartitioned:     6,4 GiB
>> half-yearly parts: 4,8 GiB
>> yearly parts:      4,8 GiB

The above number for `unpartitioned` might be wrong: I've re-created the same database, applied the same data migration as in the past and created a dump, which was 4,8 GiB in size. Maybe I simply did something wrong and left some renamed table with data in the formerly used database, while NOT doing so for the other test databases. So my entire question might simply be based on user error. :-) Sadly, I don't seem to have the former dumps anymore to have a look at their content.

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
Which partition scheme makes sense for my time based IoT-datagrams?
Hi all, I have a table storing datagrams from some IoT devices, with most likely one datagram per device per day, for around 75'000 devices currently. I want to test query performance with a partitioned table and am mostly interested in the following queries:

* querying arbitrary datagrams by their ID
* querying datagrams at most X old, based on some timestamp:
  * 15 minutes
  * 15 days
  * 15 months

My app isn't prepared to e.g. migrate things in the background; instead I would like to have Postgres deal with all aspects as transparently as possible, at least for the next few years. Things worked without partitions in the past as well.

Therefore I thought of simply partitioning by RANGE of the timestamp when the datagram has been received, and creating individual partitions per half-year. That means that in most cases only the most current partition needs to be queried, with the last 3 in many other cases.

Other approaches I've read on this list were e.g. partitioning the most current months individually and afterwards moving rows to some other "archive"-like table. Besides the lack of the necessary infrastructure, in my use case users in theory need to be able to ask for the last 15 months at some arbitrary point in history at any time. I wouldn't like to deal with different tables or alike in my app.

My approach would result in 24 tables already, with 2 more per year. Does that in itself sound bad already? Is that a limit the planner can most likely deal with, or do I not even need to care about hundreds of partitions?

If partitioned by timestamp, how are lookups by ID performed? Is that a sequential scan on all partitions, e.g. using available indexes per partition?

Is there some way to RANGE by timestamp and ID, while keeping the half-year tables only? So that the planner knows more easily which partition to look at in case of IDs only? Or is that simply like querying an ID index of each partition?

The following is what I have currently, extra verbose to support Postgres 10:

> CREATE TABLE datagram
> (
>   id            bigserial NOT NULL,
>   src_re        integer   NOT NULL,
>   src_clt       integer   NOT NULL,
>   src_meter     integer   NOT NULL,
>   captured_at   timestamp with time zone NOT NULL,
>   captured_rssi smallint  NOT NULL,
>   oms_status    smallint  NOT NULL,
>   oms_enc       bytea,
>   oms_dec       bytea
> ) PARTITION BY RANGE (captured_at);
> CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM ('1970-01-01') TO ('1970-07-01');
> CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM ('1970-07-01') TO ('1971-01-01');
> [...]
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT pk_datagram_y1970_h1 PRIMARY KEY (id);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT ck_datagram_y1970_h1_oms_data_avail CHECK(oms_enc IS NOT NULL OR oms_dec IS NOT NULL);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_re FOREIGN KEY (src_re) REFERENCES real_estate (id);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_clt FOREIGN KEY (src_clt) REFERENCES collector (id);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_meter FOREIGN KEY (src_meter) REFERENCES meter (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT pk_datagram_y1970_h2 PRIMARY KEY (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT ck_datagram_y1970_h2_oms_data_avail CHECK(oms_enc IS NOT NULL OR oms_dec IS NOT NULL);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_re FOREIGN KEY (src_re) REFERENCES real_estate (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_clt FOREIGN KEY (src_clt) REFERENCES collector (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_meter FOREIGN KEY (src_meter) REFERENCES meter (id);
> [...]
> CREATE INDEX idx_datagram_y1970_h1_for_time_window ON datagram_y1970_h1 USING btree (src_meter, captured_at DESC);
> CREATE INDEX idx_datagram_y1970_h2_for_time_window ON datagram_y1970_h2 USING btree (src_meter, captured_at DESC);
> [...]

An example query condition is the following, while "P5D" can simply be "P15M" or alike instead:

> WHERE (
>   "real_estate"."id" IN ([...])
>   AND "meter"."id" IN ([...])
>   AND "datagram"."captured_at" BETWEEN (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) - CAST('P5D' AS INTERVAL)) AND (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) + CAST('P0D' AS INTERVAL))
> )
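Regarding the ID lookups above: from PostgreSQL 11 on, a primary key can be declared on the partitioned parent itself, but only if it includes the partition key; a hedged sketch (the constraint name is made up):

> -- Declared once on the parent (PostgreSQL 11+); every partition gets
> -- the backing index automatically. Uniqueness of id *alone* across
> -- partitions still has to come from the shared sequence.
> ALTER TABLE datagram
>   ADD CONSTRAINT pk_datagram PRIMARY KEY (id, captured_at);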
Re: Which partition scheme makes sense for my time based IoT-datagrams?
Guten Tag Laurenz Albe, am Mittwoch, 10. Februar 2021 um 17:45 schrieben Sie:

> They will scan all partitions. You have to be prepared that most queries
> will become at least slightly slower with partitioning. That is expected.

Does "most queries" address those for IDs scanning all partitions, or the time-based ones as well? In the end, I'm trying to improve query performance by reducing the size of indexes, the number of rows etc. per table using partitions. :-)

The docs read like my use case would fit partitioning as well, and I've already tried all sorts of indexing on the unpartitioned table to improve some of my example queries, with not much luck.

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
Re: Which partition scheme makes sense for my time based IoT-datagrams?
Guten Tag Peter J. Holzer, am Donnerstag, 11. Februar 2021 um 15:03 schrieben Sie:

> If you get one datagram per day, how it is useful to query all datagrams
> received in the last 15 minutes?[...]

There's a special installation mode during which users are setting up their IoT devices. During that mode datagrams might be received more often, while newer ones still replace older ones. 15 minutes is simply the default value in the corresponding UI to check if anything has been received properly, but users might send+check more often.

15 days is a different default value regarding monitoring of IoT devices, whether they have been received at all and stuff like that. 15 months again is a value users are interested in, because they most likely need one measuring value per month. And 3 times 15 simply "looks" good as well... :-)

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
Re: PostgreSQL occasionally unable to rename WAL files (NTFS)
Guten Tag Guy Burgess, am Donnerstag, 11. Februar 2021 um 01:21 schrieben Sie:

> What appears to be happening is the affected WAL files (which is
> usually only 2 or 3 WAL files at a time) are somehow "losing" their
> NTFS permissions, so the PG process can't rename them - though of
> course the PG process created them. Even running icacls as admin
> gives "Access is denied" on those files. A further oddity is the
> affected files do end up disappearing after a while.

If you see that somewhat frequently, use Process Monitor and Process Explorer to see who accesses those files and how. ProcExp easily allows you to find all open handles per file. If it's not AV, it might be something like the Windows Search Indexer as well, if that is enabled by default in Server 2019 at all. Though, even with my Windows 10 and Search Indexer enabled, I didn't run into such problems yet. And especially when rewriting large parts of my databases with lots of created WAL files, I see the Indexer working on those files, but NOT conflicting with Postgres yet.

The behaviour you describe happens exactly when two processes e.g. concurrently hold HANDLEs on the same file and one of them then deletes the file. Windows keeps file names until all open HANDLEs are closed, and depending on how those HANDLEs have been opened by the first app, concurrent deletion is perfectly fine for Windows. Though, a file deleted this way can't easily be opened anymore and only looks like it has lost its permissions. But that's not the case: it's deleted already. It might be that Postgres does this to itself somehow while some other app has an open HANDLE. I don't think some other app deletes that file on purpose; reading it for some reason seems more likely to me.

> dwShareMode
> FILE_SHARE_DELETE
> Enables subsequent open operations on a file or device to request
> delete access. Otherwise, other processes cannot open the file or
> device if they request delete access.

https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
Re: PostgreSQL occasionally unable to rename WAL files (NTFS)
Guten Tag Guy Burgess, am Montag, 15. Februar 2021 um 11:52 schrieben Sie:

> The mystery now is that the only process logged as touching the
> affected WAL files is postgres.exe (of which there are many separate
> processes). Could it be that one of the postgres.exe instances is
> holding the affected WAL files in use after another postgres.exe
> instance has flagged the file as deleted?[...]

I suggest checking your WAL-related and archive/backup settings for Postgres again. There's e.g. "archive_command", optionally copying WALs to some other place, where postgres.exe would wait until that process has finished, maybe holding the file being copied open as well. Or "archive_timeout" interfering with some other operations or alike.

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
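A sketch of what to inspect on the affected server (all three are standard settings):

> SHOW archive_mode;     -- 'on' means completed WAL segments get archived
> SHOW archive_command;  -- runs once per segment; a slow or hung command
>                        -- keeps the segment (and possibly a handle) around
> SHOW archive_timeout;  -- forces segment switches even when mostly idle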
How to hash a large amount of data within Postgres?
Hi all, I'm one of those people still storing user-uploaded files within Postgres instead of some file system, and over the years this increased to a point where individual uploads of multiple GiB are reached.

Some years ago I implemented some SQL to read all files, build a table of SHA256 hashes and tell me how much data is redundant. The goal was to have a look at which files share the same hash with different LOIDs and optionally change that, so that in the end all those files are only stored once. While the approach was pretty naive, because it simply read all files into memory to calculate the hashes, I'm somewhat sure it worked in the past with Postgres 9.6. The executing server had enough free RAM available as well to process the at most ~4 GiB large files one after another.

I tried that SQL today with Postgres 11 on Ubuntu 18.04 and it failed:

> [Code: 0, SQL State: XX000] FEHLER: invalid memory alloc request size 1898107949
> Wobei: PL/pgSQL-Funktion loid_hash_calc(oid,text)[...]
> PostgreSQL 11.12 (Ubuntu 11.12-1.pgdg18.04+1)
> PostgreSQL JDBC Driver 42.2.9

I searched regarding that issue and only found two relevant results: corrupted rows for some reason, and simply size restrictions when allocating memory. The latter is more likely than the former in my case, as the restriction seems to be 1 GiB and I do have larger files.

I'm currently simply doing the following, because I didn't find any interfaces allowing to forward blocks of data, LOIDs, file descriptors or anything like that working with smaller buffers or alike:

> fd      := lo_open(loid, INV_READ);
> size    := lo_lseek(fd, 0, SEEK_END);
> PERFORM    lo_lseek(fd, 0, SEEK_SET);
> hashBin := digest(loread(fd, size), algorithm);
> hashHex := encode(hashBin, 'hex');

So, is there any way to work around the problem I currently have? Can I increase the memory restriction somewhere in the config? Are there any functions available working with blocks of data that I'm missing? I didn't find any state-maintaining HASH calls. Thanks!

Mit freundlichen Grüßen
Thorsten Schöning

--
AM-SoFT IT-Service - Bitstore Hameln GmbH
Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Tel: 05151-9468-0, Tel: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 221853 - Geschäftsführer: Janine Galonska
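A sketch of the chunked workaround hinted at above, assuming pgcrypto's digest() and the server-side large-object functions; the function name and the digest-folding scheme are made up here. Because digest() keeps no state across calls, the sketch hashes fixed-size chunks and folds the per-chunk digests: the result is NOT the SHA-256 of the whole object, but it is stable per content, which is enough for duplicate detection.

> CREATE OR REPLACE FUNCTION loid_hash_chunked(p_loid oid, p_algorithm text)
> RETURNS text AS $$
> DECLARE
>   c_chunk CONSTANT integer := 1024 * 1024;  -- 1 MiB, far below the 1 GiB limit
>   v_fd    integer;
>   v_part  bytea;
>   v_acc   bytea := '';
> BEGIN
>   v_fd := lo_open(p_loid, 262144);  -- 262144 = 0x40000 = INV_READ
>   LOOP
>     v_part := loread(v_fd, c_chunk);
>     EXIT WHEN length(v_part) = 0;   -- EOF
>     -- Fold the chunk digest into the accumulator; deterministic per
>     -- content, but intentionally NOT equal to digest(<whole object>).
>     v_acc := digest(v_acc || digest(v_part, p_algorithm), p_algorithm);
>   END LOOP;
>   PERFORM lo_close(v_fd);
>   RETURN encode(v_acc, 'hex');
> END;
> $$ LANGUAGE plpgsql;
> -- Usage (hypothetical LOID): SELECT loid_hash_chunked(12345, 'sha256');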
Re: [SPAM] Re: How to hash a large amount of data within Postgres?
Guten Tag Tomas Vondra, am Donnerstag, 24. Juni 2021 um 00:56 schrieben Sie:

> Not sure where you searched, but there definitely are interfaces to read
> chunks of data from large objects - see this:

That wasn't the point, but rather the lack of HASH functions working with such read blocks of larger data and maintaining state internally. I only see functions outputting a calculated hash for a given block of data, not start/progress/end interfaces like in many other libs. So, am I missing something, or do I need to build something based on multiple individually output hash results instead, like suggested?

Mit freundlichen Grüßen
Thorsten Schöning

--
AM-SoFT IT-Service - Bitstore Hameln GmbH
Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Tel: 05151-9468-0, Tel: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 221853 - Geschäftsführer: Janine Galonska
How much data does server side "loread" return?
Hi all, I need to read some large object and am wondering how much data "loread" returns in case it successfully reads at all AND the object contains at least as much data as requested. In that case, does "loread" always return the requested amount of data, or does one need to loop when calling that function anyway?

I didn't find any explicit docs regarding that point, and almost all examples I saw were issuing one call with large amounts of requested data, regardless of the size of the large object. The latter most likely means that, if available, all requested data is always returned.

Besides that, if "loread" is used in a loop for various reasons already, one would need to count the overall number of bytes read to know when the last chunk was read. Instead, it would be enough to check if the last returned chunk is smaller than requested.

> LOOP
>   chunk     := loread(fd, CHUNK_SIZE);
>   chunkSize := length(chunk);
>   fdRead    := fdRead + chunkSize;
>   [...]
>   --EXIT WHEN (chunkSize < CHUNK_SIZE);
>   EXIT WHEN (fdRead = fdSize);
> END LOOP;

Is the first EXIT sufficient already and "fdRead" unnecessary? Thanks!

Mit freundlichen Grüßen
Thorsten Schöning

--
AM-SoFT IT-Service - Bitstore Hameln GmbH
Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Tel: 05151-9468-0, Tel: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 221853 - Geschäftsführer: Janine Galonska
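Independently of how loread behaves mid-object, exiting on an *empty* chunk is correct under either interpretation and makes counting against fdSize unnecessary; a sketch reusing the variables above:

> LOOP
>   chunk  := loread(fd, CHUNK_SIZE);
>   EXIT WHEN length(chunk) = 0;       -- nothing left: safe either way
>   fdRead := fdRead + length(chunk);  -- only needed for progress reporting
>   -- [...] process chunk
> END LOOP;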
pg_dumpall with individual output files per database?
Hi everyone, I need to back up multiple hosts with multiple Postgres databases each. In all of those cases I'm interested in backing up all databases, which makes pg_dumpall a natural choice at first glance. Though, from my understanding of the docs, it is only capable of storing all databases per host in one single file.

That's not too good an approach for my databases, because some of those store files of a few GiB in size for legacy reasons. I would like to avoid creating only one large file which then mixes those file-databases with others containing less or different types of data.

So, is there some option I'm missing telling pg_dumpall to dump into individual files, simply named after e.g. the dumped databases? If not, has a feature like that been discussed already, or what are the reasons not to do that?

There are a lot of search results on how to dump all databases, with lots of different scripting approaches. Many of those could simply be avoided if pg_dumpall supported that already. Tools like BorgMatic making use of pg_dumpall might benefit from such a feature as well:

https://projects.torsion.org/witten/borgmatic/issues/393

Thanks!

Mit freundlichen Grüßen
Thorsten Schöning

--
AM-SoFT IT-Service - Bitstore Hameln GmbH
Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Tel: 05151-9468-0, Tel: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 221853 - Geschäftsführer: Janine Galonska
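For what it's worth, the scripting approaches mentioned above usually boil down to feeding a list like this (a sketch) to one pg_dump invocation per database:

> -- Every connectable, non-template database on the host; a wrapper
> -- script runs pg_dump once per returned name, e.g. into <datname>.dump.
> SELECT datname
>   FROM pg_database
>  WHERE NOT datistemplate
>    AND datallowconn
>  ORDER BY datname;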