Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]
Hi all,

I have an app exposing web services to generate reports. Those web services accept multiple reports per request and calculate them concurrently. There's one transaction spanning each entire request and used by ALL spawned threads. The app makes sure that e.g. committing transactions is handled by ONE thread only and individual statements are NEVER reused by multiple threads.

There are two types of reports: some need to read data from the DB themselves, others can work on the data provided by the former ones. The current implementation always creates temporary tables for the results of some query, so that one can simply check if a given temporary table is available and either just query it or create it first as necessary. I'm using "CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]" (see the sketch at the end of this mail) and the associated queries can take a long time. So the following lists some questions about executing those concurrently, even though I've already read threads like the following:

> The bottom line is that CREATE TABLE IF NOT EXISTS doesn't pretend
> to handle concurrency issues any better than regular old CREATE
> TABLE, which is to say not very well.[...]

https://www.postgresql.org/message-id/ca+tgmozadyvtwbfp1fl2smzbihcwt4uprzrlnnx1nb30ku3...@mail.gmail.com

When the table needs to be created, when is it visible to other threads using the same transaction: before or after executing the additional query?

Am I correct that with "IF NOT EXISTS" the associated query is only executed if the table needs to be created? In theory those two things could be independent of each other, maybe resulting in duplicated rows or the like per execution. But it doesn't seem so according to my tests.

Am I correct that in my described setup I need to make sure on my own that only one thread creates each individual temporary table and executes the associated query? Otherwise multiple attempts to create the table at the same time might simply fail, and the only question is whether this happens before or after the associated query. Or would creating the table itself succeed even with many threads, but some of them simply wouldn't read any data, because the associated query is executed in a second step by the thread that created the table in the first place? I don't think so, though.

Am I correct that because of the same transaction being used by multiple threads I need to synchronize them on web service-level? E.g. things like advisory locks won't work because they have session or transaction scope and would be granted to all threads instantly.

Thanks for your insights!
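For illustration, the pattern in question looks roughly like this (a minimal sketch; "tmp_meter_readings" and "some_meter_data" are made-up names, not my real schema):

> CREATE TEMP TABLE IF NOT EXISTS tmp_meter_readings AS
>   SELECT m.id, m.captured_at
>   FROM   some_meter_data m
>   WHERE  m.captured_at >= now() - INTERVAL '15 minutes';
>
> -- other reports then only need to query the temp table:
> SELECT count(*) FROM tmp_meter_readings;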
Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]
Hello Daniel Verite, on Monday, 8 June 2020 at 11:05, you wrote:

> The caveat you mention about IF NOT EXISTS does not apply to
> temporary tables, as they're not shared across sessions.[...]

That's what I understood as well, but I'm creating those concurrently WITHIN one and the same session and transaction. :-) And that's where I wanted to make sure that I properly understood things, using my questions.
Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]
Hello Daniel Verite, on Monday, 8 June 2020 at 12:07, you wrote:

> But a SQL session on the server takes its statements from a FIFO queue
> and processes them serially, so there's no intra-session concurrency.
> In fact multi-threaded SQL clients *must* make sure that they don't
> send concurrent queries to the same connection.[...]

I was looking at the JDBC docs and those explicitly allow sharing one connection between multiple threads, while at the same time not explicitly forbidding concurrent access.

https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts23499.html
https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts89498.html

Therefore I had assumed that concurrent access is available as long as one follows the recommended practices, which is what I do. Though, that concurrent access seems not to be available because of additional synchronization in the JDBC driver of Postgres.

https://stackoverflow.com/a/52798543/2055163
https://github.com/pgjdbc/pgjdbc/blob/ecbc48ff6ad293a2555465d4f7fb6a33a403af26/pgjdbc/src/main/java/org/postgresql/core/v3/QueryExecutorImpl.java#L276

Is that most likely the case with all JDBC drivers, or is it simply an implementation detail of some, and others might allow truly concurrent access within one session?

Nevertheless, I think I can at least partly answer some of my questions now. So thanks for forcing me to dig deeper into this than I obviously did before. :-)
Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]
Hello Ron, on Monday, 8 June 2020 at 15:35, you wrote:

> Would a set of GLOBAL temporary tables be a better fit for your problem
> (eliminating the need to create temporary tables each time)?

Do you mean plain tables simply created using CREATE TABLE, or is there some additional concept of temporary tables I'm not aware of? Because I thought truly temporary tables are always session-specific:

> Temporary tables are automatically dropped at the end of a session,
> or optionally at the end of the current transaction (see ON COMMIT
> below).

https://www.postgresql.org/docs/9.5/sql-createtable.html

In theory it totally depends on runtime and users whether those tables are necessary at all. So the benefit of using session-private ones, automatically maintained, is in my opinion that I can introduce them easily as necessary without customizing my schema. My schema and all changes to it are versioned, and I would like to avoid the maintenance overhead of adding temp tables which can easily change with the implementation using them. OTOH, if they are necessary at all at some point, the schema might be a good place to make them known, document their usage and such.

I will think about that, so thanks for the hint!
Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]
Hello Thorsten Schöning, on Monday, 8 June 2020 at 10:14, you wrote:

> When the table needs to be created, when is it visible to other
> threads using the same transaction, before or after executing the
> additional query?

There is a misconception here: multiple concurrent executions don't seem to be implemented at all, even though the JDBC docs seem to allow the sharing of connections. Executing a statement seems to lock the underlying connection, preventing truly concurrent access by multiple threads.

https://stackoverflow.com/a/52798543/2055163
https://github.com/pgjdbc/pgjdbc/blob/ecbc48ff6ad293a2555465d4f7fb6a33a403af26/pgjdbc/src/main/java/org/postgresql/core/v3/QueryExecutorImpl.java#L276
https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts23499.html
https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts89498.html

So while things don't work as I originally expected, this makes answering my question easy: because "CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]" is ONE statement only, forwarded to the driver as a whole, its execution should be atomic with respect to other threads using the same connection. It shouldn't make any difference how long the calculation of the associated query takes, because all other threads need to wait for the whole statement anyway. And if some other thread issues the same statement subsequently, "IF NOT EXISTS" comes into play and the statement should succeed instantly.

> Am I correct that with using "IF NOT EXISTS" the associated query is
> only executed as well if the table needs to be created?[...]

An answer to that question would still be of interest to me.

> Am I correct that in my described setup I need to make sure on my own
> that only one thread creates each individual temporary table and
> executes the associated query?[...]

No, not unless truly concurrent access is available in the JDBC driver. If it were, OTOH, I guess the answer would be yes.

> Am I correct that because of the same transaction used by multiple
> threads I need to synchronize them on web service-level?[...]

Yes, and this should be the most performant implementation anyway. In case of really one and the same connection, everything is handled by the same JVM already, so it can be synchronized within that JVM without round trips to Postgres as well.[1]

[1]: https://stackoverflow.com/a/24089729/2055163
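For reference, this is the kind of quick psql test that made me believe the associated query is skipped when the table already exists (a minimal sketch, not from the real schema):

> CREATE TEMP TABLE IF NOT EXISTS t AS SELECT 1 AS x;
> -- SELECT 1
> CREATE TEMP TABLE IF NOT EXISTS t AS SELECT 2 AS x;
> -- NOTICE:  relation "t" already exists, skipping
> SELECT * FROM t;
> -- returns only the single row with x = 1, so the second SELECT was never run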
How to restore a dump containing CASTs into a database with a new user?
Hi all,

one of my apps and databases uses custom CASTs and is used with the user "postgres" for historical reasons. I would like to change that and use a non-superuser for that app+database only. So I dumped the DB using the C-format and tried to restore into a newly created DB:

> createuser --encrypted --pwprompt ams_sm_mtg
> createdb --encoding=UTF-8 --locale=de_DE.UTF-8 --owner=ams_sm_mtg --template=template0 ams_sm_mtg
> pg_restore --dbname=ams_sm_mtg --no-owner --role=ams_sm_mtg < pg_dump/dump.c

The expectation was that whatever gets created in that DB is owned by the new user. But I'm running into the following error:

> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 2721; 2605 342334 CAST CAST (character varying AS inet)
> pg_restore: [archiver (db)] could not execute query: ERROR:  must be owner of type character varying or type inet
> Command was: CREATE CAST (character varying AS inet) WITH FUNCTION public.fn_cast_inet_from_varchar(character varying, integer, boolean) AS ASSIGNMENT;

Most things I've found related to that topic are about PL/pgSQL, which is not the problem I have (yet?). Somewhere the use of "-n" has been suggested, and restoring succeeds with that, but my CASTs are missing afterwards. So that is not a solution.

> pg_restore --dbname=ams_sm_mtg --no-owner --role=ams_sm_mtg -n public < pg_dump/dump.c

I'm having trouble understanding the root cause and how things should work: is there some wrong owner in the dump related to anything of the CAST? Or can the CAST really only be created when the new DB-owner owns those types? Is that necessary per database then, or globally for the public schema, or for one of the template databases, or ...? What is the proper way to restore a dump containing arbitrary CASTs? I would have expected pg_restore to handle everything, like it does for all other objects.

Thanks!
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Tom Lane, on Sunday, 19 July 2020 at 20:37, you wrote:

> It's a security thing. A user who can create such a cast can thereby
> change the behavior of other people's queries.

I'm not sure what you are telling me: can CASTs only be created by superusers? I didn't read that in the docs.

If they can be created by non-superusers as well, how can I restore CASTs to a database owned by some other user? There are no other users than the one owning the database in my case. Even if there were, I don't see why CASTs should be handled differently than all the other objects successfully restored with a new owner, when CASTs seem to be per database as well. Looking at the source database, the CASTs in question are only shown for the database they are used in, no other.
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Thorsten Schöning, on Sunday, 19 July 2020 at 21:51, you wrote:

> If they are not only created by superusers, how can I restore CASTs to
> a database owned by some other user? There are no other users than
> the one owning the database in my case.

I've retried things and must have done something wrong before, because the following works now:

> dropdb ams_sm_mtg
> createdb --encoding=UTF-8 --locale=de_DE.UTF-8 --owner=ams_sm_mtg --template=template0 ams_sm_mtg
> psql --dbname=ams_sm_mtg
> ALTER TYPE inet OWNER TO ams_sm_mtg;

The latter is the important part and needs to be done as someone who owns the type right now. I might have missed issuing the ALTER in the correct database before. The following doesn't work either:

> ams_sm_mtg=# SET ROLE ams_sm_mtg;
> SET
> ams_sm_mtg=> ALTER TYPE inet OWNER TO ams_sm_mtg;
> ERROR:  must be owner of type inet

So only owners of types can give them to someone else. What I still don't understand is whether that ownership is per database or per user or per cluster or per schema or ...? "\dT+" doesn't show that type in "ams_sm_mtg", only really custom-created ones. But I could have different databases with different users deploying the same schema for the same apps, and each of those would need to own that type individually. Is that possible?
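To see who owns a type in the current database, the catalog can be queried directly; a sketch of what I use for checking (nothing special assumed beyond the system catalogs):

> SELECT t.typname, r.rolname AS owner
> FROM   pg_type t
> JOIN   pg_roles r ON r.oid = t.typowner
> WHERE  t.typname IN ('inet', '_inet');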
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Tom Lane, on Sunday, 19 July 2020 at 22:37, you wrote:

> In this case, you have to restore the cast as superuser, because nobody
> else is going to be treated as owning these built-in types.

How do I do that when I have large dumps with lots of those CASTs and in the worst case don't even know it, because it's some proprietary app? Would I need to restore the whole dump as superuser? But how do I then make everything restored owned by some other database user afterwards? And I don't mean individually GRANTing permissions on tables and sequences and ..., and additionally taking care of defaults in case new tables and sequences and ... get added. That reads so complicated.

I simply want to make everything in a database owned by one user, like is easily possible in e.g. file systems, and like is the case with how I restored. But then CASTs don't work easily.
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Christophe Pettus, on Sunday, 19 July 2020 at 23:38, you wrote:

> In this case, you may need to change the ownership of the various
> objects directly in the database, rather than using dump/restore as
> a way of changing ownership all at once.

Does Postgres support that in an easy way, without the need to reverse engineer an otherwise unknown schema? From what I've read it does not: one always needs to know which objects are used, needs to additionally handle newly created objects not used yet, etc. That seems very complicated when one simply wants to restore a backup into a newly created database.

Additionally, who owns types on which level in the end? To successfully restore, I needed to change ownership of the type "inet" to one new user. Is that per database restored into, or for the whole cluster now? Because I easily have multiple copies of the same database for different tenants, and all those tenants are individual users which would need to own that type "inet" to be able to restore their individual databases.

> This is not infrequent
> when you have an existing database in which a superuser owns everything

In my setup it only seems to be necessary for the CASTs; everything else seems to be properly reowned by how I restore. And that approach seems much easier than reowning individual objects. So depending on who owns the type "inet" on which level in the end, Postgres could reown those types automatically during restores as well. Looking at the provided arguments and used commands, it's very clear what the user wants. OTOH, if that type is owned on a higher level than the restored database itself, that seems to be a design issue, because it would prevent other users from owning that type for their individual databases as well.
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Christophe Pettus, on Monday, 20 July 2020 at 07:19, you wrote:

> No, you don't, and you (probably) can't change the ownership of
> "inet". "inet" is a built-in type.

I'm somewhat sure I did and can answer my own questions now:

> dropdb ams_sm_mtg
> createdb --encoding=UTF-8 --locale=de_DE.UTF-8 --owner=ams_sm_mtg --template=template0 ams_sm_mtg
> psql --dbname=ams_sm_mtg
> ALTER TYPE inet OWNER TO ams_sm_mtg;

This makes the type really owned by the given user, but as it seems only for the current database! This can be seen in the catalog "pg_type", in which the column "typowner" really changes to the new user ID:

> psql --dbname=ams_sm_mtg
> ALTER TYPE inet OWNER TO postgres;
> inet  | 11 | 10
> _inet | 11 | 10

vs.

> psql --dbname=ams_sm_mtg
> ALTER TYPE inet OWNER TO ams_sm_mtg;
> inet  | 11 | 16389
> _inet | 11 | 16389

"pg_type" contains some other user-created types and those have the same user ID. The important thing is that doing this in another database doesn't seem to influence the one of interest:

> psql --dbname=template1
> ALTER TYPE inet OWNER TO postgres;

This keeps the following in the database of interest:

> inet  | 11 | 16389
> _inet | 11 | 16389

Using other manually created databases and users results in the same:

> psql --dbname=ams_db_login
> ALTER TYPE inet OWNER TO ams_db_login;

This leads to the following in the given database:

> inet  | 11 | 16390
> _inet | 11 | 16390

But keeps things in other databases:

> inet  | 11 | 16389
> _inet | 11 | 16389

Additionally, when viewing "pg_type" connected as and to "ams_db_login", it shows different types than are available when viewing "pg_type" as and for "ams_sm_mtg". This is another strong hint that those types are managed per database.

So things seem to work as expected, with the only caveat that one needs to make some types owned by new users BEFORE actually restoring. The problem of course is knowing which types those are; it seems one needs to restore, look for errors, ALTER, drop, restore again, etc. It would be far easier if Postgres did that automatically, like it seems to do for most other objects. The important point is that ownership of those types seems to be per database, so it should be safe to do automatically.

> [...]The issue is that you have
> user-defined objects which are owned by the user "postgres"; you
> should change those to the user that you want, leaving the CASTs owned by
> "postgres".

The error messages and docs say otherwise, and changing the owner to a user which doesn't exist at all in the source cluster doesn't make sense either. When creating the dump, I can't know into which target database owned by which user it gets restored at some point.
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Christophe Pettus, on Monday, 20 July 2020 at 16:34, you wrote:

> I'm not sure I understand exactly how this "feature" would work.

Make internal types used in CASTs owned by the restoring user, like all other objects are automatically owned as well. Postgres obviously is able to detect those problems already, and reowning automatically would save people from having to research first and do things manually afterwards. One could optionally introduce an additional flag, but if things like "--role" and "--no-owner" are specified, the intention of the user seems pretty clear to me.

In the worst case this problem even makes people NOT use individual DB users and use a superuser for everything. Just search for the topic of backing up databases using a read-only user: it's so difficult that people suggest creating a special superuser set into read-only mode, from which it can recover itself, and things like that. Simply because granting permissions on unknown schemas and maintaining default permissions for objects nobody knows will ever be needed isn't really straightforward. Using one concrete owner for everything right from the start is the best one can do, and that includes being able to reown in case of restoring backups. And Postgres supports most of that already, only the CAST thing has not been thought through to the end, I guess.
Re: How to restore a dump containing CASTs into a database with a new user?
Hello Christophe Pettus, on Monday, 20 July 2020 at 19:21, you wrote:

> On a database with multiple users, you can't just get away
> with changing the ownership of the types; you have to make sure that
> the USAGE is granted appropriately to other users.

If a user specifies "--no-owner" and "--role=...", it's pretty clear that the user wants things to be reowned. And that use case could be supported automatically. If other use cases need additional manual work, that would be fine, but this concrete one does not in theory.

> * A database has user-defined objects in it that only a superuser can create, and,
> * The rest of the database objects are owned by that superuser, and,
> * You want to change the ownership of the database objects that can be changed, and,
> * You want to have a single backup that you can restore multiple
> times, changing the ownership in a different way each time, and,
> * You want to use pg_restore to do it.

And pg_restore does all that already; it only misses the special case of CASTs.

> This would require a fair amount of surgery to pg_restore. Right
> now, pg_restore doesn't really have a "remap these users"
> functionality.

It's not necessarily about remapping users in general, but instead something like recognizing that creating a CAST first needs the associated types reowned as well. This could be a fallback strategy: try to create the CAST, recognize that it fails because of wrong ownership of the types, and simply reown the types to the current role. Or creating the CAST itself could be changed to optionally do that.

In setups like mine with one user per database, it's absolutely safe and totally makes sense to reown types for an individual DB; otherwise creating the CAST fails anyway. But obviously I want that CAST, so I would do it manually, which is unnecessary in theory and which one could tell the statement with some additional flag or such. Even in cases with multiple different users per DB, reowning types makes sense, because one has the problem when creating the CAST anyway. So either there's some user designated to create the CAST with, which by definition needs to own the associated type anyway. Or it is done as superuser, in which case Postgres could simply not reown, because it's not necessary. That depends on whether even different superusers need to own types or not.

> --no-owner *looks* like it does that, and can be
> used for that in certain cases, but the user-remapping functionality
> of it is really a side-effect. It happens to change the user
> because instead of altering the user to what it is in the backup, it
> just accepts the default ownership based on the user it is connected as.

And why do I need to care why things work, when they totally make sense and I need them? :-) I just see that things work already besides one minor annoyance. So what is a side effect in your opinion now could easily be communicated as a feature as well.

> You can accomplish the same thing by restoring as the superuser,
> not having to alter the ownership of any internal type, and then
> changing the ownership of the user-created objects in the new
> database once it is restored.[...]

But that is far more complicated, because one needs to know ALL objects in the restored schema to reown them manually. It takes more time because one needs to do it manually and for each object individually. It is error-prone because one can easily miss things, especially if schemas are developed further over time, etc.
My current, already supported approach is far easier: I only need to take care of those CASTs manually now, nothing else yet.
How to properly query lots of rows based on timestamps?
Hi all,

I have a table containing around 95 million rows, pretty much only storing a timestamp and further IDs of related tables containing the actual data in the end.

> CREATE TABLE clt_rec
> (
>   id          BIGSERIAL                   NOT NULL,
>   oms_rec     BIGINT                      NOT NULL,
>   captured_at TIMESTAMP(6) WITH TIME ZONE NOT NULL,
>   rssi        SMALLINT                    NOT NULL,
>   CONSTRAINT pk_clt_rec PRIMARY KEY (id),
>   CONSTRAINT fk_clt_rec_oms_rec FOREIGN KEY (oms_rec) REFERENCES "oms_rec" ("id"),
>   CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec)
> );

In many use cases I need to search all of those rows based on their timestamp, to find rows arbitrarily far in the past: sometimes it's only 15 minutes into the past, sometimes it's 2 years, sometimes it's finding the first day of each month over 15 months for some of those telegrams, etc. In the end, I pretty often need to compare those timestamps, and some queries simply take multiple seconds, especially adding up when multiple, slightly different queries need to be executed one after another. The following are two abstracts of Postgres' query plans:

Plan 1:

> ->  Nested Loop  (cost=1.14..343169.49 rows=43543 width=20) (actual time=0.313..113.974 rows=34266 loops=3)
>       ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..3437.90 rows=43543 width=24) (actual time=0.153..20.192 rows=34266 loops=3)
>             Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
>       ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.80 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=102799)
>             Index Cond: (id = clt_rec.oms_rec)

Plan 2:

> ->  Nested Loop  (cost=1.14..836381.50 rows=111934 width=20) (actual time=0.379..911.697 rows=334465 loops=3)
>       ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..8819.57 rows=111934 width=24) (actual time=0.193..154.176 rows=334465 loops=3)
>             Index Cond: ((captured_at >= ('2020-08-28 10:21:06.968+02'::timestamp with time zone - '14 days'::interval)) AND (captured_at <= ('2020-08-28 10:21:06.968+02'::timestamp with time zone + '00:00:00'::interval)))
>       ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.39 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=1003394)
>             Index Cond: (id = clt_rec.oms_rec)

Postgres seems to properly use available indexes, parallel workers and such. But looking at the actual times, and compared to all the other parts of the query, comparing those timestamps simply takes the most time.

I've looked into this topic and found statements that one shouldn't put too many rows into the index[1] and such, or it will be ignored entirely. But that doesn't seem to be the case for me according to the plan. OTOH, my index really is simply on the column containing the timestamp, with no function reducing things to dates or the like to cut down the number of rows.

> CREATE INDEX idx_clt_rec_captured_at ON clt_rec USING btree ( captured_at );

So where does the actual runtime in the above plans come from? Is it simply comparing lots of timestamps and that takes how long it takes? Or is it already calculating with those timestamps?

Thanks!
[1]: https://stackoverflow.com/questions/15977741/postgres-is-ignoring-a-timestamp-index-why
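The plans above come from queries of roughly the following shape (a sketch; the real column list is longer and the bounds are bound as parameters from the app):

> SELECT c.captured_at, c.rssi, o.id
> FROM   clt_rec c
> JOIN   oms_rec o ON o.id = c.oms_rec
> WHERE  c.captured_at >= TIMESTAMPTZ '2020-08-01 00:00:00+02' - INTERVAL '0 seconds'
> AND    c.captured_at <= TIMESTAMPTZ '2020-08-01 00:00:00+02' + INTERVAL '1 day';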
Re: How to properly query lots of rows based on timestamps?
Hello Tom Lane, on Saturday, 29 August 2020 at 17:19, you wrote:

> TBH, this seems like a pretty awful data design. If you included the
> timestamp column into oms_rec, and had an index on it, then you would
> not need a join at all.

clt_rec and oms_rec are different types of data and only the former includes the timestamp; that's simply how the stored data works. So if at all, oms_rec would need to be embedded into clt_rec entirely. While I considered that in the past already, I wasn't sure if that is the best approach to model those two different data types. I'll make a note and consider refactoring at some point, so thanks for your opinion!
Re: How to properly query lots of rows based on timestamps?
Hello Alban Hertroys, on Saturday, 29 August 2020 at 11:29, you wrote:

> [...]Next, it loops over those to match the other condition of
> your query (id = clt_rec.oms_rec). You didn't tell whether there's
> an index on that column.

Which column, oms_rec.id or clt_rec.oms_rec? The former has one because it's the unique key, and the plan says so as well:

> Index Scan using pk_oms_rec on oms_rec

clt_rec.oms_rec OTOH is only marked as unique, without explicitly creating an index:

> CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec)

But doesn't that mean an index is available behind the scenes, maintained by Postgres? So it should have been used if it provided any benefit?

> Adding a unique constraint will automatically create a unique B-tree
> index on the column or group of columns listed in the constraint.

https://www.postgresql.org/docs/9.4/ddl-constraints.html

> You'd probably see a performance improvement were you to create an
> index on (captured_at, id). If your Postgres version is somewhat
> recent, that could even lead to an Index Only Scan.

Tried that with your statement at the bottom and it didn't seem to change anything, even when using Postgres 11:

> ->  Nested Loop  (cost=1.14..343169.49 rows=43543 width=20) (actual time=0.228..95.554 rows=34266 loops=3)
>       ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..3437.90 rows=43543 width=24) (actual time=0.119..16.895 rows=34266 loops=3)
>             Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
>       ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.80 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=102799)
>             Index Cond: (id = clt_rec.oms_rec)

vs. with your suggested change:

> ->  Nested Loop  (cost=1.14..513397.11 rows=43543 width=20) (actual time=0.236..97.044 rows=34266 loops=3)
>       ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..173665.52 rows=43543 width=24) (actual time=0.183..17.464 rows=34266 loops=3)
>             Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
>       ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.80 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=102799)
>             Index Cond: (id = clt_rec.oms_rec)

Good news is that having either of those indexes in place at all makes a huge difference compared to having neither. :-D
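For reference, the suggested index, plus an INCLUDE variant possible since PostgreSQL 11 (the INCLUDE column list is just my guess at what the queries would need, not something from Alban's mail):

> CREATE INDEX idx_clt_rec_captured_at_id ON clt_rec USING btree (captured_at, id);
>
> -- alternative: keep the key narrow and append payload columns for index-only scans
> CREATE INDEX idx_clt_rec_captured_at_incl ON clt_rec (captured_at) INCLUDE (oms_rec);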
Re: How to properly query lots of rows based on timestamps?
Hello Tom Lane, on Saturday, 29 August 2020 at 17:19, you wrote:

> Timestamp comparison reduces to comparison of int64's, so it's
> hard to imagine that anything could be much cheaper.[...]

Do you think it matters much if the timestamp-related calculations are done outside the query, by the executing Java code, instead?

> ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)

In Java, I currently only have the timestamp and the intervals as ISO 8601 expressions like P1D, P14D etc., and leave all of their parsing to Postgres as much as possible. This results in calculations like the above being necessary before actually comparing against stored timestamps. I would expect that it doesn't make much difference whether Postgres or Java calculates the necessary timestamps, as long as Postgres is clever enough to NOT calculate per row, but only once instead. Which I guess it is? Is there any hint about such things in the plan? I didn't see or understand one.
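If the calculation moved to Java, the query would simply receive precomputed bounds as parameters, e.g. like this (a sketch using PREPARE in place of the JDBC parameter binding):

> PREPARE recent(timestamptz, timestamptz) AS
>   SELECT count(*)
>   FROM   clt_rec
>   WHERE  captured_at >= $1   -- lower bound computed in Java
>   AND    captured_at <= $2;  -- upper bound computed in Java
>
> EXECUTE recent('2020-08-01 00:00:00+02', '2020-08-02 00:00:00+02');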
Re: How bad is using queries with thousands of values for operators IN or ANY?
Hello Laurenz Albe, on Monday, 31 August 2020 at 10:23, you wrote:

> Is it an option to change that usage pattern?

Not everywhere easily, because I simply have places in which I'm working with IDs only and e.g. need to check if they are owned by some user. Those checks are abstracted away into individual functions instead of JOINing everywhere, as the latter makes queries more difficult to maintain. Of course there are alternatives like row level security I might have a look at, etc. And that's exactly why I asked the question: to make up my mind about the whole thing, and whether it's worth or necessary to look into alternatives at all.
Re: How bad is using queries with thousands of values for operators IN or ANY?
Hello Pavel Stehule, on Monday, 31 August 2020 at 10:51, you wrote:

> It is not good - it increases the memory necessary for query parsing,
> optimizer and executor are slower.

At least memory and I/O were not a problem in my tests; CPU load was pretty high. But I was unable to see whether that's because of comparing IDs or parsing the query or the like. Can one see the difference somehow? That would be the only way to tell whether splitting the large query up into multiple smaller ones is an overall improvement.

> Postgres currently has not any optimization for processing searching in
> these long lists - so this search is very slow against other methods.

Which other methods do you have in mind, populating a temporary table and joining that? I tested things like that in the context of other queries, and creating those tables, indexes etc. introduced more overhead than the query benefitted. Additionally, it's not always these large queries. Most of the time it's far fewer IDs and things are fast. I even introduced embedding those IDs because the query was faster that way than with some JOIN and comparing the IDs, even though indexes were in place and used, etc.

So at what query size or number of IDs to compare in IN would you consider a different approach at all? (Some of the alternatives I know of are sketched below.)
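The alternatives I know of look roughly like this (a sketch against a hypothetical stand-in table "t"; in the real app the array would be bound as a JDBC parameter instead of via PREPARE):

> CREATE TEMP TABLE t (id bigint PRIMARY KEY);
>
> -- 1) embedded list: statement text grows with the number of IDs
> SELECT * FROM t WHERE id IN (1, 2, 3);
>
> -- 2) one array parameter: statement text stays constant, plans can be reused
> PREPARE by_ids(bigint[]) AS SELECT * FROM t WHERE id = ANY ($1);
> EXECUTE by_ids(ARRAY[1, 2, 3]);
>
> -- 3) VALUES list the planner can join against
> SELECT t.* FROM t JOIN (VALUES (1::bigint), (2), (3)) AS v(id) ON v.id = t.id;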
What's your experience with using Postgres in IoT-contexts?
Hi all,

TL;DR: Does anyone actually use Postgres with ARM-based low-performance hardware and only 256 MiB of RAM? What are your experiences in other stripped-down environments? Is there some lower RAM limit below which using Postgres doesn't make any sense anymore? Is Postgres able to compete with SQLite regarding lower performance boundaries, or is there some point at which SQLite will be the better choice, because Postgres needs more resources to work properly? Thanks!

Background: I have some server-side backend exposing web services and storing data in Postgres. The stored data are telegrams from smart metering devices, lots of them, mostly small. The provided web services take care of generating reports etc. based on those telegrams. It's now considered to put large parts of that backend onto some ARM-based IoT device, simply because that device would do a lot of the same things. The available backend is Java-based, so this would simply be possible as well, and I already ran things on some Raspi-like hardware and it worked pretty well. OTOH, the targeted hardware is less capable than some Raspi; we're talking about only 256 MiB of RAM currently. The exact details of the hardware are not clear yet.

In the easiest case, the newly created device would only need to store some hundreds of telegrams per day in some known time window and forward those telegrams afterwards. In most cases it would be those of the same day only, but it might be those of the last X days as well in case of problems. In the long term, though, that device should be able to store telegrams for years and might generate reports on its own as well.

I've already found some interesting info like the following, but would be interested in other experiences as well.

https://www.postgresql.eu/events/pgconfeu2019/sessions/session/2781/slides/251/PostgreSQL%20for%20IoT.pdf
Re: What's your experience with using Postgres in IoT-contexts?
Hello Tony Shelver, on Monday, 5 October 2020 at 15:44, you wrote:

> Not sure about PG in that environment. Have you thought about something
> like H2 java database? https://www.h2database.com/html/main.html

Yes, like SQLite as well. The point is that I'm really interested in keeping as much as possible, and we e.g. use some Postgres-specific constructs currently. Of course things can be changed, but the question is whether that's necessary at all, especially after reading the PDF:

https://www.postgresql.eu/events/pgconfeu2019/sessions/session/2781/slides/251/PostgreSQL%20for%20IoT.pdf

> That is included as the standard DB in a vehicle tracking system we use,
> although we have replaced it with PG.

And is your tracking system more like the server that I already have, or an embedded system within the vehicles themselves?
Re: What's your experience with using Postgres in IoT-contexts?
Hello Tony Shelver, on Tuesday, 6 October 2020 at 16:33, you wrote:

> Each vehicle reports position to the database at least every 30 seconds
> when moving, along with any other data as it happens. So it's quite a bit.
> H2 seemed to handle it fine for a limited number of vehicles, but we moved
> it to PG once we moved to production.

But where does Postgres run in your setup? :-) On the tracked vehicle itself with limited resources, or do you simply send things over the network to some cloud/server/...? Your first paragraph reads like the former, but the second like the latter. :-) If it's on the vehicle, I would be interested to know roughly which hardware you use, to compare with what I have in mind. Thanks!
Which is the setup with lowest resources you know Postgres is used in?
Hi all,

I'm regularly reading that Postgres is often used with containers and in cloud environments these days, even on some not-too-powerful NAS. What are the lowest-resource setups you know of, or even host Postgres on successfully yourself? It's especially about RAM and CPU, and whether you needed to e.g. configure anything special to make things work in your stripped-down environment (the kind of settings I expect to matter are sketched below).

Is there any point at which one is most likely forced to switch to more specialized embedded databases like SQLite? E.g. because Postgres requires a higher amount of resources because of its architecture? Or could Postgres in theory be used everywhere SQLite is used as well, as long as one is allowed to e.g. start an additional process?

I would like to know if there's any realistic chance to use Postgres in a low-resource environment with a small amount of RAM and a somewhat slow CPU like the following:

http://ww1.microchip.com/downloads/en/DeviceDoc/ATSAMA5D27-WLSOM1-Datasheet-60001590b.pdf
http://ww1.microchip.com/downloads/en/DeviceDoc/SAMA5D2-Series-Data-sheet-ds60001476F.pdf

One point is that I most likely need somewhat concurrent access to the data, because of having web services exposing that data to clients, daemons storing data locally, etc. OTOH, the number of concurrent accesses won't be too high; there won't be much load most of the time. Things heavily depend on the actual users of the device. Postgres' architecture seems to fit that use case better than e.g. SQLite's.

Thanks for sharing your experiences and suggestions!
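In case it helps the discussion, these are the kinds of settings I would expect to need tuning on such a device (the starting values are my guesses for a 256 MiB box, not tested recommendations):

> ALTER SYSTEM SET shared_buffers = '16MB';          -- restart required
> ALTER SYSTEM SET work_mem = '1MB';
> ALTER SYSTEM SET maintenance_work_mem = '16MB';
> ALTER SYSTEM SET max_connections = 10;             -- restart required
> ALTER SYSTEM SET max_worker_processes = 2;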
Re: What's your experience with using Postgres in IoT-contexts?
Hello Peter J. Holzer, on Saturday, 10 October 2020 at 10:56, you wrote:

> Do you plan to move some of that reporting to the IoT devices? (Maybe
> equip them with a display with a dashboard, or something like that)

Not necessarily with a display, but something like a dashboard for a web UI on the device itself definitely needs to be kept in mind. Though some display has been considered as well.

> Because for simply storing a few hundred records which are written and
> read sequentially, an RDBMS of any kind seems overkill. I'd simply write
> them to files.

That's still considered as well and is what's done right now. But at some point we need to maintain individual logical devices where those telegrams come from anyway. It has also been considered to store only metadata in the database and the telegrams in files, but as telegrams are mostly small, this doesn't seem worth it. And because telegrams are small, we would need some kind of container format anyway, so as not to run out of inodes and the like too easily.
Re: What's your experience with using Postgres in IoT-contexts?
Hello Peter J. Holzer, on Saturday, 10 October 2020 at 12:00, you wrote:

> Not necessarily. You would have to allocate the appropriate number of
> inodes, of course. Using one file per telegram wastes some space, but
> for a few thousand telegrams that may not matter[1]. Indeed, with ext4 you
> could make the inodes large enough that all or most of your telegrams
> fit inside the inode and don't need an extra data block at all.

Thanks for suggesting what I had already noted as at least worth discussing, and what "other interested parties" said was a really stupid idea. Doesn't seem that stupid to the two of us, obviously, especially if ext4 is already used anyway. ;-)

But it's really not only about a simple queue: I want long-term storage and access to individual telegrams according to different criteria like individual devices, when received, etc.
Declarative partitioning and automatically generated row-IDs using BIGSERIAL
Hi all,

I have the following table, containing 100+ million rows currently, which needs to be queried by "captured_at" a lot. That table stores rows for the last 6 years, but most of the queries focus on the last 15 months, 15 days or really only 15 minutes.

> CREATE TABLE public.clt_rec
> (
>   id          bigserial                NOT NULL,
>   oms_rec     bigint                   NOT NULL,
>   captured_at timestamp with time zone NOT NULL,
>   rssi        smallint                 NOT NULL,
>   CONSTRAINT pk_clt_rec PRIMARY KEY (id) WITH (FILLFACTOR = 10),
>   CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec)
> );

We want to test if the performance of some queries can be improved by using declarative partitioning with far fewer rows per table, e.g. one table per year or half-year or the like.

One important thing to keep in mind is the handling of the column "id", because for historical reasons that column is not only used to address individual rows. It's used as a transaction ID for some rudimentary exporting of those rows as well: there's some app requesting NEW rows to export, simply providing the LAST "id" it received from former requests. Therefore I need to keep that ID when splitting the table into individual partitions, and I need to guarantee that IDs are unique across all partitions as well. I've read a lot of similar examples using BIGSERIAL or the like already, but would like to make sure I understood correctly how those SERIAL values are generated.

https://alexey-soshin.medium.com/dealing-with-partitions-in-postgres-11-fa9cc5ecf466

Am I correct that after migrating the available table to a partitioned one, I keep INSERTing into the partitioned table only in my app? Because of the declarative partitioning used, the only thing I need to assure is having the necessary partitions available when the INSERT happens? I would create them beforehand as part of some maintenance, but read about a DEFAULT partition as well.

Because I keep INSERTing into the partitioned table, the semantics of my ID don't change, correct? There's a SEQUENCE associated with the column "id" of the partitioned table, and that generates my value, which is AFTERWARDS stored with all other values of a row in whichever partition fits the partition key. Or is the target partition calculated first and AFTERWARDS a SEQUENCE private to each partition table used to calculate the ID? I don't think so, but according to the docs, indexes etc. are inherited by partitions as well. So maybe Postgres maintains multiple SEQUENCEs in the background for some reason as well.

> Unique constraints (and hence primary keys) on partitioned tables
> must include all the partition key columns. This limitation exists
> because PostgreSQL can only enforce uniqueness in each partition
> individually.

https://www.postgresql.org/docs/11/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-BEST-PRACTICES

Doesn't the above allow manually updating different rows in different partition tables to contain the same ID in the end? So the main remaining benefit of the PRIMARY KEY is uniqueness per partition, correct?

> While primary keys are supported on partitioned tables, foreign keys
> referencing partitioned tables are not supported. (Foreign key
> references from a partitioned table to some other table are
> supported.)

The docs for Postgres 11 mention that partitioned tables cannot be used as the target of foreign keys, while that statement is removed from the docs of Postgres 13. How is a 1:1 relationship guaranteed in newer versions when PRIMARY KEYs are still local to their individual partition tables?

Thanks for your help!
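For completeness, the kind of migration I have in mind looks roughly like this (a sketch; yearly partitions just as an example, and the partition key added to the PK as the docs require):

> CREATE TABLE clt_rec
> (
>   id          bigserial                NOT NULL,
>   oms_rec     bigint                   NOT NULL,
>   captured_at timestamp with time zone NOT NULL,
>   rssi        smallint                 NOT NULL,
>   CONSTRAINT pk_clt_rec PRIMARY KEY (id, captured_at)
> ) PARTITION BY RANGE (captured_at);
>
> CREATE TABLE clt_rec_y2020 PARTITION OF clt_rec
>   FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
> CREATE TABLE clt_rec_y2021 PARTITION OF clt_rec
>   FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
>
> -- INSERTs target the parent; all partitions share the one sequence behind "id":
> INSERT INTO clt_rec (oms_rec, captured_at, rssi) VALUES (42, now(), -70);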
Re: Declarative partitioning and automatically generated row-IDs using BIGSERIAL
Hello Michael Lewis, on Thursday, 31 December 2020 at 18:20, you wrote:

> Why is your fillfactor so low?[...]

I've just copied what my GUI tool pgModeler generated as SQL right now; that fill factor might have never been applied at all.

> Perhaps a silly question, but do you have an index on captured_on?

Yes, and that changed a lot some years ago, but doesn't seem to be sufficient anymore.

> You included the full DDL for the table it seems so I wanted to ask.

No, only the part relevant to my question. How SEQUENCEs, PKs etc. are handled is the most important thing right now.

> Can you
> provide some example queries that you want to speed up? What are you
> configuration settings (select * from pg_settings;)?

No thanks, I don't want to discuss such things in this thread. I've asked lots of other questions regarding my concrete bottlenecks, schema, app logic etc. already and simply want to test with partitions now. That has been suggested in the other questions as well and is what I focus on now.

> [...]That is, you could not have a primary key on ID and
> partition on captured_on for instance.

That's what I understood as well and is the reason why I asked: it means IDs could be manually duplicated within individual partition tables, while I need them to be unique across all of those. Additionally, I wonder when IDs are generated by which SEQUENCE, etc.
Re: Declarative partitioning and automatically generated row-IDs using BIGSERIAL
Hello Michael Lewis, on Thursday, 31 December 2020 at 19:28, you wrote:

> select
>   t.reloptions
> from pg_class t
> join pg_namespace n on n.oid = t.relnamespace
> where t.relname = 'clt_rec'
>   and n.nspname = 'public';

That outputs NULL, as it does for other tested indexes as well. Additionally, the following two lines only apply to the index associated with the primary key, not the table itself, don't they?

> CONSTRAINT pk_clt_rec PRIMARY KEY (id)
> WITH (FILLFACTOR = 10),

I've checked pgModeler and it seems to set such low fill factors automatically on old primary keys only. Will remove those.

> Sequences are incremented and return the new value whenever they are
> called. If all partitions inherit the same sequence (behavior determined by
> which options you use when you create partitions LIKE parent)[...]

Am I allowed to use LIKE, or is it even necessary? The docs don't use it, and the description sounds wrong as well:

> CREATE TABLE measurement_y2006m02 PARTITION OF measurement
>   FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

> Unlike INHERITS, the new table and original table are completely
> decoupled after creation is complete. Changes to the original table
> will not be applied to the new table, and it is not possible to
> include data of the new table in scans of the original table.

The option you mention is INCLUDING IDENTITY, and that would need to be AVOIDED so as to NOT get multiple SEQUENCEs?
How to REMOVE a fillfactor set by accident?
Hi all,

I'm using the GUI tool pgModeler to maintain schemas, and at some point in the past it used fillfactor 10 for some indexes by default. That seems to have changed in the meantime, and most of my indexes don't have ANY fillfactor set explicitly.

Is there some way to REMOVE the fillfactor where it is set, so that Postgres applies its defaults? It would be great to have an output of NULL everywhere fillfactor is shown, instead of sometimes e.g. 100 and more often NULL. I've already tried setting it to NULL, which failed with a message that a decimal number is necessary. Even setting 0 fails, because a value between 10 and 100 is required.

> ALTER INDEX pk_clt_rec_src SET (fillfactor = NULL);
> ALTER INDEX pk_clt_rec_src SET (fillfactor = 0);

So removing isn't possible at all, and I can only set concrete values like 100, or not care at all instead? Because fillfactor=10 is only set for some low-throughput test databases currently.

Thanks!
Re: How to REMOVE a fillfactor set by accident?
Guten Tag Thomas Kellerer, am Donnerstag, 31. Dezember 2020 um 21:31 schrieben Sie: > ALTER INDEX pk_clt_rec_src RESET (fillfactor); Thanks, should have looked into the docs instead of googling only! :-) Mit freundlichen Grüßen, Thorsten Schöning -- Thorsten Schöning AM-SoFT IT-Service - Bitstore Hameln GmbH E-Mail: thorsten.schoen...@am-soft.de Web: http://www.AM-SoFT.de/ Telefon: 05151- 9468-55 Fax: 05151- 9468-88 Mobil:0178-8 9468-04 Firmensitz: Bitstore IT-Consulting, Frankfurter Allee 285, 10317 Berlin Steuernummer 037/230/30566, HR 27198, Amtsgericht Potsdam Geschäftsführer Janine Galonska
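One caveat worth adding: RESET only clears the stored reloption; index pages already split at the old fillfactor stay sparsely packed until the index is rebuilt. A minimal sketch:

> ALTER INDEX pk_clt_rec_src RESET (fillfactor);
> -- The reloption is gone now, but existing pages keep their old
> -- packing; rebuild to actually repack the index at the default:
> REINDEX INDEX pk_clt_rec_src;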
Increased size of database dump even though LESS consumed storage
Hi all, for various reasons I've migrated my database schema from 4 tables, with some additional indexes to keep integrity and stuff, to 1 table only. That made some of the formerly used indexes obsolete and resulted in overall less consumed storage: the old schema consumed ~42 GiB, while the new one is ~16 GiB without the formerly available indexes and ~25 GiB with the same logical indexes.

Though, a created dump of the new schema has increased from ~5,52 GiB to 6,38 GiB. Of course I'm using the same settings to create both dumps:

> pg_dump.exe "--username=%USERNAME%" "--encoding=UTF-8" "--compress=9" "--format=c" "--dbname=%DB_NAME%" > "%DMP_PATH%"

My expectation was that the dump would be smaller as well, because the data itself is the same, while lots of duplicate IDs, obsolete indexes etc. in the no longer existing tables have been removed. Any explanation for the increased dump size? Thanks!

The tables themselves were changed like the following:

> CREATE TABLE clt_rec
> (
>   id          bigserial NOT NULL,
>   oms_rec     bigint    NOT NULL,
>   captured_at timestamp with time zone NOT NULL,
>   rssi        smallint  NOT NULL
> );
> CREATE TABLE oms_rec
> (
>   id        bigserial NOT NULL,
>   meter     integer   NOT NULL,
>   encrypted bytea,
>   decrypted bytea
> );
> CREATE TABLE clt_rec_src
> (
>   id          bigserial NOT NULL,
>   real_estate integer   NOT NULL,
>   collector   integer   NOT NULL,
>   clt_rec     bigserial
> );
> CREATE TABLE meter_status_history
> (
>   id      serial NOT NULL,
>   oms_rec bigint NOT NULL,
>   status  smallint
> );

vs.

> CREATE TABLE datagram
> (
>   id            bigserial NOT NULL,
>   src_re        integer   NOT NULL,
>   src_clt       integer   NOT NULL,
>   src_meter     integer   NOT NULL,
>   captured_at   timestamp with time zone NOT NULL,
>   captured_rssi smallint  NOT NULL,
>   oms_status    smallint  NOT NULL,
>   oms_enc       bytea,
>   oms_dec       bytea
> );

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
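To double-check such before/after numbers on disk, a sketch using the table names from the schemas above (pg_total_relation_size includes indexes and TOAST, which is what the ~42 vs. ~25 GiB figures should roughly correspond to):

> SELECT relname,
>        pg_size_pretty(pg_total_relation_size(oid)) AS total_size
>   FROM pg_class
>  WHERE relname IN ('clt_rec', 'oms_rec', 'clt_rec_src',
>                    'meter_status_history', 'datagram')
>  ORDER BY pg_total_relation_size(oid) DESC;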
Re: Increased size of database dump even though LESS consumed storage
Guten Tag Thorsten Schöning, am Dienstag, 9. Februar 2021 um 10:56 schrieben Sie:

> Any explanation for the increased dump size? Thanks!

Some more data (NEW vs. OLD schema):

> --compress=0:       20  vs. 25  GiB
> --compress=DEFAULT: 6,5 vs. 5,6 GiB
> --compress=9:       6,4 vs. 5,5 GiB

From the docs:

> For the custom archive format, this specifies compression of
> individual table-data segments[...]

https://www.postgresql.org/docs/10/app-pgdump.html

So the new table layout really seems to negatively impact compression, even though in both cases the "bytea" are at the end of the table. The only difference is the overall number of columns per table. I wouldn't have expected that.

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
How does Postgres decide whether to use additional workers?
Hi all, I have the following table with an added BTREE index on "captured_at":

> CREATE TABLE datagram
> (
>   id            bigserial NOT NULL,
>   src_re        integer   NOT NULL,
>   src_clt       integer   NOT NULL,
>   src_meter     integer   NOT NULL,
>   captured_at   timestamp with time zone NOT NULL,
>   captured_rssi smallint  NOT NULL,
>   oms_status    smallint  NOT NULL,
>   oms_enc       bytea,
>   oms_dec       bytea
> );

I have a query containing the following condition:

> AND "datagram"."captured_at" BETWEEN (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) - CAST('P5D' AS INTERVAL)) AND (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) + CAST('P0D' AS INTERVAL))

When that query is applied to the above table with its index on "captured_at", it results in the following query plan. The important thing to note is that NO parallel workers are used.

> -> Hash Join (cost=246164.35..2004405.07 rows=11323 width=51) (actual time=93.802..5776.755 rows=104607 loops=1)
>    Hash Cond: (meter.meter_bcd = meter_bcd.id)
>    -> Hash Join (cost=246019.19..2003889.83 rows=68494 width=37) (actual time=93.067..5744.787 rows=104607 loops=1)
>       Hash Cond: (datagram.src_meter = meter.id)
>       -> Index Scan using idx_datagram_captured_at_btree on datagram (cost=0.57..1756571.73 rows=495033 width=20) (actual time=0.054..5451.417 rows=514369 loops=1)
>          Index Cond: ((captured_at >= ('2020-08-28 10:34:32.855+02'::timestamp with time zone - '5 days'::interval)) AND (captured_at <= ('2020-08-28 10:34:32.855+02'::timestamp with time zone + '00:00:00'::interval)))

For various reasons I tested the above table as a partitioned one as well, with individual partitions containing the rows of one year only. The important thing to note is that I simply kept the same index on "captured_at" as before, though the query plan looks different now:

> Workers Planned: 2
> Workers Launched: 2
> -> Hash Join (cost=245966.53..272335.67 rows=5419 width=51) (actual time=625.846..1560.103 rows=34869 loops=3)
>    Hash Cond: (datagram_y2020_h2.src_meter = meter.id)
>    -> Parallel Append (cost=4.19..25430.72 rows=236911 width=20) (actual time=2.827..863.298 rows=171456 loops=3)
>       Subplans Removed: 23
>       -> Parallel Index Scan using datagram_y2020_h2_captured_at_idx on datagram_y2020_h2 (cost=0.44..24051.22 rows=236888 width=20) (actual time=2.826..848.388 rows=171456 loops=3)
>          Index Cond: ((captured_at >= ('2020-08-28 10:34:32.855+02'::timestamp with time zone - '5 days'::interval)) AND (captured_at <= ('2020-08-28 10:34:32.855+02'::timestamp with time zone + '00:00:00'::interval)))

It seems that additional workers are used only because of a different number of rows per individual table.
Though, in the past I had all of those "captured_at" in one table as well, only with far fewer columns, and for that table additional workers have been used, pretty much like is the case now:

> Workers Planned: 2
> Workers Launched: 2
> -> Hash Join (cost=264793.42..1666293.23 rows=4332 width=51) (actual time=96.080..638.802 rows=34869 loops=3)
>    Hash Cond: (oms_rec.meter = meter.id)
>    -> Nested Loop (cost=1.14..1400747.39 rows=189399 width=20) (actual time=0.145..496.366 rows=171456 loops=3)
>       -> Parallel Index Scan using idx_clt_rec_captured_at on clt_rec (cost=0.57..14853.95 rows=189399 width=24) (actual time=0.098..81.556 rows=171456 loops=3)
>          Index Cond: ((captured_at >= ('2020-08-28 10:34:32.855+02'::timestamp with time zone - '5 days'::interval)) AND (captured_at <= ('2020-08-28 10:34:32.855+02'::timestamp with time zone + '00:00:00'::interval)))
>       -> Index Scan using pk_oms_rec on oms_rec (cost=0.57..7.32 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=514369)
>          Index Cond: (id = clt_rec.oms_rec)
>    -> Hash (cost=264709.53..264709.53 rows=6620 width=39) (actual time=95.521..95.528 rows=40044 loops=3)
>       Buckets: 65536 (originally 8192) Batches: 1 (originally 1) Memory Usage: 3016kB
>       -> Hash Join (cost=145.59..264709.53 rows=6620 width=39) (actual time=9.883..86.390 rows=40044 loops=3)
>          Hash Cond: (meter.meter_bcd = meter_bcd.id)

So, based on which facts does Postgres decide if additional workers are used or not?
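For reference, a hedged sketch of the knobs that typically drive this decision: the planner only considers a parallel scan when the relation exceeds min_parallel_table_scan_size or min_parallel_index_scan_size, and it weighs parallel_setup_cost and parallel_tuple_cost against the serial plan, so per-partition sizes change the outcome:

> SHOW max_parallel_workers_per_gather;  -- 0 disables parallel query entirely
> SHOW min_parallel_table_scan_size;     -- default 8MB
> SHOW min_parallel_index_scan_size;     -- default 512kB
> -- For experiments only: make parallel plans artificially cheap,
> -- then compare the resulting EXPLAIN output with the plans above.
> SET parallel_setup_cost = 0;
> SET parallel_tuple_cost = 0;
> EXPLAIN (ANALYZE)
> SELECT count(*) FROM datagram
>  WHERE captured_at BETWEEN now() - interval '5 days' AND now();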
Re: Increased size of database dump even though LESS consumed storage
Guten Tag Thorsten Schöning, am Dienstag, 9. Februar 2021 um 10:56 schrieben Sie:

> Any explanation for the increased dump size? Thanks!

While I don't have a detailed explanation myself, there's the following hint in the docs:

> For the custom archive format, this specifies compression of
> individual table-data segments[...]

https://www.postgresql.org/docs/10/app-pgdump.html

I've changed the layout of my tables entirely and most likely have changed the "individual table-data segments" as well. So I did some further testing and got the following numbers, always OLD vs. NEW schema for the schemas documented earlier:

> --compress=0: 25  vs. 20  GiB
> --compress=?: 5,6 vs. 6,5 GiB
> --compress=9: 5,5 vs. 6,4 GiB

"?" means default settings, i.e. "--compress" has been REMOVED from the shell command. As can be seen, pretty much as soon as compression comes into play, the new table layout is less efficient, even though in OLD and NEW schema exactly the same "bytea" were used, placed at the end of their individual tables and alike.

But things become more interesting with another version of my schema, which likewise indicates a strong relationship of dump size, compression and table layout/size. I've changed the new table "datagram" to be a partitioned one, with partitions containing rows per year and per half-year. Each partition contains far fewer rows than before this way, and while I only tested with "--compress=9" this time, the numbers are quite interesting:

> unpartitioned:     6,4 GiB
> half-yearly parts: 4,8 GiB
> yearly parts:      4,8 GiB

The interesting part this time is that the table layout for all partitions is the same as before; only the number of rows per table differs. The overall number of rows, the data, the IDs etc. are the same as before. Yet this time the dump really is smaller than with the OLD schema, which contained far more data because of duplicate IDs and stuff. I wouldn't have expected table layout to be that important.

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
Re: Increased size of database dump even though LESS consumed storage
Guten Tag Francisco Olarte, am Mittwoch, 10. Februar 2021 um 10:38 schrieben Sie:

> [...]and partitioning sorts partially
> by date ( or fully if you have made the partitions by range-querying
> via index scan ).[...]

That statement is especially interesting not only because of my dump size, but because I'm running into the problem that queries spanning more than one partition seem to prefer a sequential scan over using indexes. My index only seems to be used when querying the rows of one partition.

So, the following is my definition; should that be "range-queried via index scan" properly? :-)

> CREATE TABLE datagram
> (
>   id            bigserial NOT NULL,
>   src_re        integer   NOT NULL,
>   src_clt       integer   NOT NULL,
>   src_meter     integer   NOT NULL,
>   captured_at   timestamp with time zone NOT NULL,
>   captured_rssi smallint  NOT NULL,
>   oms_status    smallint  NOT NULL,
>   oms_enc       bytea,
>   oms_dec       bytea
> ) PARTITION BY RANGE (captured_at);
> CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM ('1970-01-01') TO ('1970-07-01');
> CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM ('1970-07-01') TO ('1971-01-01');
> [...]
> INSERT INTO datagram([...]) SELECT * FROM datagram_unpart;
> CREATE INDEX idx_datagram_for_time_window ON datagram USING btree (src_meter, captured_at DESC);

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
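Whether the planner actually prunes partitions for such a range can be checked along these lines (a sketch; the constant date range is made up):

> -- With a constant range the plan should list only the matching
> -- partitions; seeing every datagram_y*_h* child means no pruning.
> EXPLAIN (COSTS OFF)
> SELECT * FROM datagram
>  WHERE captured_at >= '2020-08-23'
>    AND captured_at <  '2020-08-29';
> SHOW enable_partition_pruning;  -- PostgreSQL 11+; version 10 relies on
>                                 -- constraint exclusion instead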
Re: Increased size of database dump even though LESS consumed storage
Guten Tag Thorsten Schöning, am Mittwoch, 10. Februar 2021 um 09:58 schrieben Sie:

>> unpartitioned:     6,4 GiB
>> half-yearly parts: 4,8 GiB
>> yearly parts:      4,8 GiB

The above number for `unpartitioned` might be wrong: I've re-created the same database, applied the same data migration as in the past and created a dump, which was 4,8 GiB in size. Maybe I simply did something wrong and left some renamed table with data in the formerly used database, while NOT doing so for the other test databases. So my entire question might simply be based on user error. :-) Sadly, I don't seem to have the former dumps anymore to have a look at their content.

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
Which partition scheme makes sense for my time based IoT-datagrams?
Hi all, I have a table storing datagrams from some IoT devices, with most likely one datagram per device per day, for around 75'000 devices currently. I want to test query performance with a partitioned table and am mostly interested in the following queries:

* querying arbitrary datagrams by their ID
* querying datagrams at most X old, based on some timestamp:
  * 15 minutes
  * 15 days
  * 15 months

My app isn't prepared to e.g. migrate things in the background; instead I would like to have Postgres deal with all aspects as transparently as possible, at least for the next few years. Things worked without partitions in the past as well.

Therefore I thought of simply partitioning by RANGE of the timestamp when the datagram has been received, and creating individual partitions per half-year. That means that in most cases only the most current partition needs to be queried, with the last 3 in many other cases.

Other approaches I've read on this list were e.g. partitioning the most current months individually and afterwards moving rows to some other "archive"-like table. Besides the lack of the necessary infrastructure, in my use case users in theory need to be able to ask for the last 15 months at some arbitrary point in history at any time. I wouldn't like to deal with different tables or alike in my app.

My approach would result in 24 tables already, with 2 more per year. Does that in itself sound bad already? Is that a limit the planner can most likely deal with, or do I not even need to care about hundreds of partitions?

If partitioned by timestamp, how are lookups by ID performed? Is that a sequential scan on all partitions, e.g. using available indexes per partition?

Is there some way to RANGE by timestamp and ID, while keeping the half-year tables only? So that the planner knows more easily which partition to look at in case of IDs only? Or is that simply like querying an ID index of each partition?

The following is what I have currently, extra verbose to support Postgres 10:

> CREATE TABLE datagram
> (
>   id            bigserial NOT NULL,
>   src_re        integer   NOT NULL,
>   src_clt       integer   NOT NULL,
>   src_meter     integer   NOT NULL,
>   captured_at   timestamp with time zone NOT NULL,
>   captured_rssi smallint  NOT NULL,
>   oms_status    smallint  NOT NULL,
>   oms_enc       bytea,
>   oms_dec       bytea
> ) PARTITION BY RANGE (captured_at);
> CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM ('1970-01-01') TO ('1970-07-01');
> CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM ('1970-07-01') TO ('1971-01-01');
> [...]
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT pk_datagram_y1970_h1 PRIMARY KEY (id);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT ck_datagram_y1970_h1_oms_data_avail CHECK(oms_enc IS NOT NULL OR oms_dec IS NOT NULL);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_re FOREIGN KEY (src_re) REFERENCES real_estate (id);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_clt FOREIGN KEY (src_clt) REFERENCES collector (id);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_meter FOREIGN KEY (src_meter) REFERENCES meter (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT pk_datagram_y1970_h2 PRIMARY KEY (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT ck_datagram_y1970_h2_oms_data_avail CHECK(oms_enc IS NOT NULL OR oms_dec IS NOT NULL);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_re FOREIGN KEY (src_re) REFERENCES real_estate (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_clt FOREIGN KEY (src_clt) REFERENCES collector (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_meter FOREIGN KEY (src_meter) REFERENCES meter (id);
> [...]
> CREATE INDEX idx_datagram_y1970_h1_for_time_window ON datagram_y1970_h1 USING btree (src_meter, captured_at DESC);
> CREATE INDEX idx_datagram_y1970_h2_for_time_window ON datagram_y1970_h2 USING btree (src_meter, captured_at DESC);
> [...]

An example query condition is the following, while "P5D" can simply be "P15M" or alike instead:

> WHERE (
>   "real_estate"."id" IN ([...])
>   AND "meter"."id" IN ([...])
>   AND "datagram"."captured_at" BETWEEN (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) - CAST('P5D' AS INTERVAL)) AND (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) + CAST('P0D' AS INTERVAL))
> )
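Regarding the ID lookups above: from PostgreSQL 11 on, a primary key can be declared on the partitioned parent itself, but only if it includes the partition key; a hedged sketch (the constraint name is made up):

> -- Declared once on the parent (PostgreSQL 11+); every partition gets
> -- the backing index automatically. Uniqueness of id *alone* across
> -- partitions still has to come from the shared sequence.
> ALTER TABLE datagram
>   ADD CONSTRAINT pk_datagram PRIMARY KEY (id, captured_at);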
Re: Which partition scheme makes sense for my time based IoT-datagrams?
Guten Tag Laurenz Albe, am Mittwoch, 10. Februar 2021 um 17:45 schrieben Sie:

> They will scan all partitions. You have to be prepared that most queries
> will become at least slightly slower with partitioning. That is expected.

Does "most queries" address those for IDs scanning all partitions, or the time-based ones as well? In the end, I'm trying to improve query performance by reducing the size of indexes, the number of rows etc. per table using partitions. :-)

The docs read like my use case would fit partitioning as well, and I've already tried all sorts of indexing on the unpartitioned table to improve some of my example queries, with not much luck.

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
Re: Which partition scheme makes sense for my time based IoT-datagrams?
Guten Tag Peter J. Holzer, am Donnerstag, 11. Februar 2021 um 15:03 schrieben Sie:

> If you get one datagram per day, how it is useful to query all datagrams
> received in the last 15 minutes?[...]

There's a special installation mode during which users are setting up their IoT devices. During that mode datagrams might be received more often, while newer ones still replace older ones. 15 minutes is simply the default value in the corresponding UI to check if anything has been received properly, but users might send+check more often.

15 days is a different default value regarding monitoring of IoT devices, whether they have been received at all and stuff like that. 15 months again is a value users are interested in, because they most likely need one measuring value per month. And 3 times 15 simply "looks" good as well... :-)

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
Re: PostgreSQL occasionally unable to rename WAL files (NTFS)
Guten Tag Guy Burgess, am Donnerstag, 11. Februar 2021 um 01:21 schrieben Sie:

> What appears to be happening is the affected WAL files (which is
> usually only 2 or 3 WAL files at a time) are somehow "losing" their
> NTFS permissions, so the PG process can't rename them - though of
> course the PG process created them. Even running icacls as admin
> gives "Access is denied" on those files. A further oddity is the
> affected files do end up disappearing after a while.

If you see that somewhat frequently, use Process Monitor and Process Explorer to see who accesses those files and how. ProcExp easily allows you to find all open handles per file. If it's not AV, it might be something like the Windows Search Indexer as well, if that is enabled by default in Server 2019 at all. Though, even with my Windows 10 and Search Indexer enabled, I didn't run into such problems yet. And especially when rewriting large parts of my databases with lots of created WAL files, I see the Indexer working on those files, but NOT conflicting with Postgres yet.

The behaviour you describe happens exactly when two processes e.g. concurrently hold HANDLEs on the same file and one of them then deletes the file. Windows keeps file names until all open HANDLEs are closed, and depending on how those HANDLEs have been opened by the first app, concurrent deletion is perfectly fine for Windows. Though, a file deleted this way can't easily be opened anymore and only looks like it has lost its permissions. But that's not the case: it's deleted already. It might be that Postgres does this to itself somehow while some other app has an open HANDLE. I don't think some other app deletes that file on purpose; reading it for some reason seems more likely to me.

> dwShareMode
> FILE_SHARE_DELETE
> Enables subsequent open operations on a file or device to request
> delete access. Otherwise, other processes cannot open the file or
> device if they request delete access.

https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
Re: PostgreSQL occasionally unable to rename WAL files (NTFS)
Guten Tag Guy Burgess, am Montag, 15. Februar 2021 um 11:52 schrieben Sie:

> The mystery now is that the only process logged as touching the
> affected WAL files is postgres.exe (of which there are many separate
> processes). Could it be that one of the postgres.exe instances is
> holding the affected WAL files in use after another postgres.exe
> instance has flagged the file as deleted?[...]

I suggest checking your WAL-related and archive/backup settings for Postgres again. There's e.g. "archive_command", optionally copying WALs to some other place, where postgres.exe would wait until that process has finished, maybe holding the file being copied open as well. Or "archive_timeout" interfering with some other operations or alike.

Mit freundlichen Grüßen
Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Telefon: 05151-9468-0, Telefon: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
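A sketch of what to inspect on the affected server (all three are standard settings):

> SHOW archive_mode;     -- 'on' means completed WAL segments get archived
> SHOW archive_command;  -- runs once per segment; a slow or hung command
>                        -- keeps the segment (and possibly a handle) around
> SHOW archive_timeout;  -- forces segment switches even when mostly idle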
How to hash a large amount of data within Postgres?
Hi all, I'm one of those people still storing user-uploaded files within Postgres instead of some file system, and over the years this increased to a point where individual uploads of multiple GiB are reached.

Some years ago I implemented some SQL to read all files, build a table of SHA256 hashes and tell me how much data is redundant. The goal was to have a look at which files share the same hash with different LOIDs and optionally change that, so that in the end all those files are only stored once. While the approach was pretty naive, because it simply read all files into memory to calculate the hashes, I'm somewhat sure it worked in the past with Postgres 9.6. The executing server had enough free RAM available as well to process the at most ~4 GiB large files one after another.

I tried that SQL today with Postgres 11 on Ubuntu 18.04 and it failed:

> [Code: 0, SQL State: XX000] FEHLER: invalid memory alloc request size 1898107949
> Wobei: PL/pgSQL-Funktion loid_hash_calc(oid,text)[...]
> PostgreSQL 11.12 (Ubuntu 11.12-1.pgdg18.04+1)
> PostgreSQL JDBC Driver 42.2.9

I searched regarding that issue and only found two relevant results: corrupted rows for some reason, and simply size restrictions when allocating memory. The latter is more likely than the former in my case, as the restriction seems to be 1 GiB and I do have larger files.

I'm currently simply doing the following, because I didn't find any interfaces allowing to forward blocks of data, LOIDs, file descriptors or anything like that working with smaller buffers or alike:

> fd      := lo_open(loid, INV_READ);
> size    := lo_lseek(fd, 0, SEEK_END);
> PERFORM    lo_lseek(fd, 0, SEEK_SET);
> hashBin := digest(loread(fd, size), algorithm);
> hashHex := encode(hashBin, 'hex');

So, is there any way to work around the problem I currently have? Can I increase the memory restriction somewhere in the config? Are there any functions available working with blocks of data that I'm missing? I didn't find any state-maintaining HASH calls. Thanks!

Mit freundlichen Grüßen
Thorsten Schöning

--
AM-SoFT IT-Service - Bitstore Hameln GmbH
Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Tel: 05151-9468-0, Tel: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 221853 - Geschäftsführer: Janine Galonska
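A sketch of the chunked workaround hinted at above, assuming pgcrypto's digest() and the server-side large-object functions; the function name and the digest-folding scheme are made up here. Because digest() keeps no state across calls, the sketch hashes fixed-size chunks and folds the per-chunk digests: the result is NOT the SHA-256 of the whole object, but it is stable per content, which is enough for duplicate detection.

> CREATE OR REPLACE FUNCTION loid_hash_chunked(p_loid oid, p_algorithm text)
> RETURNS text AS $$
> DECLARE
>   c_chunk CONSTANT integer := 1024 * 1024;  -- 1 MiB, far below the 1 GiB limit
>   v_fd    integer;
>   v_part  bytea;
>   v_acc   bytea := '';
> BEGIN
>   v_fd := lo_open(p_loid, 262144);  -- 262144 = 0x40000 = INV_READ
>   LOOP
>     v_part := loread(v_fd, c_chunk);
>     EXIT WHEN length(v_part) = 0;   -- EOF
>     -- Fold the chunk digest into the accumulator; deterministic per
>     -- content, but intentionally NOT equal to digest(<whole object>).
>     v_acc := digest(v_acc || digest(v_part, p_algorithm), p_algorithm);
>   END LOOP;
>   PERFORM lo_close(v_fd);
>   RETURN encode(v_acc, 'hex');
> END;
> $$ LANGUAGE plpgsql;
> -- Usage (hypothetical LOID): SELECT loid_hash_chunked(12345, 'sha256');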
Re: [SPAM] Re: How to hash a large amount of data within Postgres?
Guten Tag Tomas Vondra, am Donnerstag, 24. Juni 2021 um 00:56 schrieben Sie:

> Not sure where you searched, but there definitely are interfaces to read
> chunks of data from large objects - see this:

That wasn't the point, but rather the lack of HASH functions working with such read blocks of larger data and maintaining state internally. I only see functions outputting a calculated hash for a given block of data, not start/progress/end interfaces like in many other libs. So, am I missing something, or do I need to build something based on multiple individually output hash results instead, like suggested?

Mit freundlichen Grüßen
Thorsten Schöning

--
AM-SoFT IT-Service - Bitstore Hameln GmbH
Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Tel: 05151-9468-0, Tel: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 221853 - Geschäftsführer: Janine Galonska
How much data does server side "loread" return?
Hi all, I need to read some large object and am wondering how much data "loread" returns in case it successfully reads at all AND the object contains at least as much data as requested. In that case, does "loread" always return the requested amount of data, or does one need to loop when calling that function anyway?

I didn't find any explicit docs regarding that point, and almost all examples I saw were issuing one call with large amounts of requested data, regardless of the size of the large object. The latter most likely means that, if available, all requested data is always returned.

Besides that, if "loread" is used in a loop for various reasons already, one would need to count the overall number of bytes read to know when the last chunk was read. Instead, it would be enough to check if the last returned chunk is smaller than requested.

> LOOP
>   chunk     := loread(fd, CHUNK_SIZE);
>   chunkSize := length(chunk);
>   fdRead    := fdRead + chunkSize;
>   [...]
>   --EXIT WHEN (chunkSize < CHUNK_SIZE);
>   EXIT WHEN (fdRead = fdSize);
> END LOOP;

Is the first EXIT sufficient already and "fdRead" unnecessary? Thanks!

Mit freundlichen Grüßen
Thorsten Schöning

--
AM-SoFT IT-Service - Bitstore Hameln GmbH
Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Tel: 05151-9468-0, Tel: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 221853 - Geschäftsführer: Janine Galonska
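Independently of how loread behaves mid-object, exiting on an *empty* chunk is correct under either interpretation and makes counting against fdSize unnecessary; a sketch reusing the variables above:

> LOOP
>   chunk  := loread(fd, CHUNK_SIZE);
>   EXIT WHEN length(chunk) = 0;       -- nothing left: safe either way
>   fdRead := fdRead + length(chunk);  -- only needed for progress reporting
>   -- [...] process chunk
> END LOOP;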
pg_dumpall with individual output files per database?
Hi everyone, I need to back up multiple hosts with multiple Postgres databases each. In all of those cases I'm interested in backing up all databases, which makes pg_dumpall a natural choice at first glance. Though, from my understanding of the docs, it is only capable of storing all databases per host in one single file.

That's not too good an approach for my databases, because some of those store files of a few GiB in size for legacy reasons. I would like to avoid creating only one large file which then mixes those file-databases with others containing less or different types of data.

So, is there some option I'm missing telling pg_dumpall to dump into individual files, simply named after e.g. the dumped databases? If not, has a feature like that been discussed already, or what are the reasons not to do that?

There are a lot of search results on how to dump all databases, with lots of different scripting approaches. Many of those could simply be avoided if pg_dumpall supported that already. Tools like BorgMatic making use of pg_dumpall might benefit from such a feature as well:

https://projects.torsion.org/witten/borgmatic/issues/393

Thanks!

Mit freundlichen Grüßen
Thorsten Schöning

--
AM-SoFT IT-Service - Bitstore Hameln GmbH
Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: thorsten.schoen...@am-soft.de
Web: http://www.AM-SoFT.de/
Tel: 05151-9468-0, Tel: 05151-9468-55, Fax: 05151-9468-88, Mobil: 0178-89468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 221853 - Geschäftsführer: Janine Galonska
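For what it's worth, the scripting approaches mentioned above usually boil down to feeding a list like this (a sketch) to one pg_dump invocation per database:

> -- Every connectable, non-template database on the host; a wrapper
> -- script runs pg_dump once per returned name, e.g. into <datname>.dump.
> SELECT datname
>   FROM pg_database
>  WHERE NOT datistemplate
>    AND datallowconn
>  ORDER BY datname;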