Re: Enforce primary key on every table during dev?
> On 1 Mar 2018, at 1:47, Melvin Davidson wrote: > > I think you would be better off having an automated report which alerts > >you to tables lacking a primary key and deal with that policy through > >other means. > > Perhaps a better solution is to have a meeting with the developers and > explain to them > WHY the policy of enforcing a primary key is important. Also, explain the > purpose of > primary keys and why it is not always suitable to just use an integer or > serial as the key, > but rather why natural unique (even multi column) keys are better. But this > begs the question, > why are "developers" allowed to design database tables? That should be the > job of the DBA! At > the very minimum, the DBA should be reviewing and have the authority to > approve of disapprove > of table/schema designs/changes . Not to mention that not all types of tables necessarily have suitable candidates for a primary key. You could add a surrogate key based on a serial type, but in such cases that may not serve any purpose other than to have some arbitrary primary key. An example of such tables is a monetary transaction table that contains records for deposits and withdrawals to accounts. It will have lots of foreign key references to other tables, but rows containing the same values are probably not duplicates. Adding a surrogate key to such a table just adds overhead, although that could be useful in case specific rows need updating or deleting without also modifying the other rows with that same data - normally, only insertions and selections happen on such tables though, and updates or deletes are absolutely forbidden - corrections happen by inserting rows with an opposite transaction. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
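As to Melvin's automated-report suggestion: a catalog query along these lines should list the tables that lack a primary key (untested as typed here, but this is the usual approach):

select n.nspname as schema_name, c.relname as table_name
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'
  and n.nspname not in ('pg_catalog', 'information_schema')
  and not exists (
        select 1
        from pg_constraint con
        where con.conrelid = c.oid
          and con.contype = 'p'
      )
order by 1, 2;

Scheduling that and mailing the result to the team is the easy part; whether anyone acts on it is the policy question discussed above.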
Re: Enforce primary key on every table during dev?
On 1 March 2018 at 17:22, Steven Lembark wrote:
>
>> On 03/01/2018 02:20 AM, Alban Hertroys wrote:
>> [snip]
>> > Not to mention that not all types of tables necessarily have
>> > suitable candidates for a primary key. You could add a surrogate
>> > key based on a serial type, but in such cases that may not serve
>> > any purpose other than to have some arbitrary primary key.
>> >
>> > An example of such tables is a monetary transaction table that
>> > contains records for deposits and withdrawals to accounts.
(...)
> Start with Date's notion that a database exists to correctly represent
> data about the real world. Storing un-identified data breaks this
> since we have no idea what the data means or have any good way of
> getting it back out. Net result is that any workable relational
> database will have at least one candidate key for any table in it.
(...)
> If you have a design with un-identified data it means that you haven't
> normalized it properly: something is missing from the table with
> un-identifiable rows.

While that holds true for a relational model, in reporting, for example, it is common practice to denormalize data without a requirement to be able to identify a single record. The use case for such tables is providing quick aggregates on the data. Often this deals with derived data.

It's not that uncommon to not have a primary or even a uniquely identifiable key on such tables.

I'm not saying that having a primary key on a table is a bad thing, but I do disagree that a primary key is a requirement for all tables.

More generally: for every rule there are exceptions. Even for this one.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: Programmatically duplicating a schema
> On 13 Mar 2018, at 4:23, matt.f...@internode.on.net wrote: > > Hi all, > > What is a reliable way to programmatically & generically populate an empty > schema with all the objects in the public schema as a template? > > We are using the multi tenancy ruby gem Apartment ( > https://github.com/influitive/apartment ), which was recently broken by the > changes made to pg_dump to address CVE-2018-1058 > https://nvd.nist.gov/vuln/detail/CVE-2018-1058 > > Apartment attempts to duplicate the public schema whenever creating a new > schema by running: > > pg_dump -s -x -0 -n public > > to get the SQL statements needed to recreate the public schema & then > executes the pg_dump's sql output after creating & switching to the new > schema ( via set search_path to ; ) > > After the fix to CVE-2018-1058, all table references in pg_dump's output > (including within SQL of stored procedures) are prefixed by the public. > schema, which means you cannot just reuse this output in a different schema > context without first manually changing the sql. > As a temporary fix so we can handle new customers in production, we are using > a regex search/replace for public. in the pg_dump output, but clearly this is > not a reliable solution for a generic gem such as Apartment. In my opinion, it makes sense that if you have the option of dumping the contents of a specific schema, it should be possible to restore that dump into a different schema. Unfortunately, looking at pg_restore, there does not appear to be such an option (yet). I'd even go so far to suggest that every single object type that can be dumped with pg_dump (single database, single schema, single table, single function, etc) should be restorable under a different name. I realise that this could make pg_restore options potentially more confusing. I suppose people currently manually edit the dumps to this effect, but that risks silent corruption of data when for example a data value contains a string such as 'The building is now open to public.'. Regular expressions don't know the difference between data and identifiers in a dump file - pg_restore does. Whether psql needs the same treatment? I'd qualify this as "advanced" use and limit it to pg_restore. But then, I'm just a list-lurker, I currently have but the option of voicing my opinion. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
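For completeness, the regex stop-gap you describe boils down to something like the below (tenant schema name made up, GNU sed assumed, and I take it the -0 in your message was meant to be -O, no owners). As said, a regex cannot tell identifiers apart from data, so this is not a reliable general solution:

psql -d mydb -c 'CREATE SCHEMA tenant_42'
pg_dump -s -x -O -n public mydb \
    | sed 's/\bpublic\./tenant_42./g' \
    | psql -d mydb -v ON_ERROR_STOP=1

A pg_restore option to remap the target schema would make all of that unnecessary.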
Re: Use pgloader with FDW
I suppose MySQL has the concept of views. Perhaps you can create a view over the table that translates those 0-datetimes to NULL and have the FDW connect to the view instead? On 23 March 2018 at 14:27, Patricia DECHANDOL wrote: > Thanks a lot Adrian. > We can't update the mysql columns values, so I will adopt the datetime -> > text solution. > Thanks again > > > -Message d'origine- > De : Adrian Klaver > Envoyé : vendredi 23 mars 2018 14:25 > À : Patricia DECHANDOL ; > pgsql-general@lists.postgresql.org > Objet : Re: Use pgloader with FDW > > On 03/23/2018 06:07 AM, Patricia DECHANDOL wrote: >> Hello Adrian, >> >> So, if I well understand, the only solution is to wrap the mysql datetime >> columns to "text" columns in my foreign tables. >> And then use a cast function to convert from text to date in Postgre when I >> want to use these columns ? > > There is also the option of changing the values in the MySQL database to > either an actual datetime or NULL. Though there is the potential issue of > what that would do to code that is pulling from the MySQL database. > >> >> No other way. >> The pgloader can't be used by the FDW to manage this point ? > > It has been a while since I used pgloader, but from what I remember it is a > tool for doing the initial migration of data from MySQL/Sqlite/SQL Server to > Postgres. What you seem to be looking for is pgloader to sit between the > MySQL database and the Postgres one and do the transformation in real time. > AFAIK that is not possible. > >> >> >> >> -Message d'origine- >> De : Adrian Klaver Envoyé : vendredi 23 >> mars 2018 13:52 À : Patricia DECHANDOL ; >> pgsql-general@lists.postgresql.org >> Objet : Re: Use pgloader with FDW >> >> On 03/23/2018 03:55 AM, Patricia DECHANDOL wrote: >>> Hello, >>> >>> I'm new to Postgresql and try to use the FDW with mysql database. >>> >>> Everything is OK to create my FDW and foreign tables, but I meet a >>> problem when I try to do a select on a FOREIGN table containing >>> datetime column which contains the value "-00-00 00:00:00" in mysql. >> >> That is MySQL's version of NULL for datetimes. It is not a valid datetime >> though. >> >>> >>> Select on the foreign table fails. >>> >>> The mysql datetime column has been automatically wrapped to >>> "timestamp without timezone" column in the foreign table by the >>> instruction "import foreign schema" that I used. >> >>> >>> How can I deal with this ? >> >> https://github.com/EnterpriseDB/mysql_fdw/issues/38 >> >>> >>> I read about the pgloader with can manage this king of problem, but >>> can't figure out how to use it with FDW. >>> >>> Thanks for your help. >>> >> >> >> -- >> Adrian Klaver >> adrian.kla...@aklaver.com >> > > > -- > Adrian Klaver > adrian.kla...@aklaver.com -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
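To illustrate the view idea from the top of this message, on the MySQL side that could look something like this (table and column names are made up):

CREATE VIEW orders_clean AS
SELECT o.id
     , NULLIF(o.created_at, '0000-00-00 00:00:00') AS created_at
     , o.amount
FROM orders o;

On the Postgres side you would then point the foreign table at orders_clean instead of orders, for example:

IMPORT FOREIGN SCHEMA mydb LIMIT TO (orders_clean)
  FROM SERVER mysql_server INTO public;

That is assuming mysql_fdw's IMPORT FOREIGN SCHEMA picks up views; otherwise a hand-written CREATE FOREIGN TABLE against the view should do the same.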
Re: Concatenate of values in hierarchical data
010,439755 > 'CTS', '536', 2009, 2010,439765 > > There are primary keys on indexlist (on indexlistid) and indexlistapp (on > indexlistid) but there is no foreign key pointing to the other table. The > indexlistid in indexlist > points directly to the indexlistid in indexlistapp. The parent column is > simply calculated from the treelevel. The tree is built entirely from the seq > and treelevel. > > I need the data to be returned in this format: > > app price ic > --- > 'Front-V-Series-opt J56-R.', '$693','536-01132AR' > 'Front-V-Series-opt J56-L.', '$693','536-01132AL' > 'Front-V-Series-opt J63-R.', null, '536-01130R' > 'Front-V-Series-opt J63-L.', null, '536-01131L' > 'Front-exc. V-Series-opt JE5-AWD-R.', null, '536-01142' > 'Front-exc. V-Series-opt JE5-AWD-L.', null, '536-01143' > 'Front-exc. V-Series-opt JE5-RWD-R.', null, '536-01143' > 'Front-exc. V-Series-opt JE5-RWD-L.', null, '536-01142' > 'Front-exc. V-Series-opt J55-AWD-R.', null, '536-01136' > 'Front-exc. V-Series-opt J55-AWD-L.', null, '536-01137' > 'Front-exc. V-Series-opt J55-RWD-R.', null, '536-01137' > 'Front-exc. V-Series-opt J55-RWD-L.', null, '536-01136' > 'Rear-Base-opt JE5-R.', null, '536-01038R' > 'Rear-Base-opt JE5-L.', null, '536-01039L' > 'Rear-Base-opt J55-R.', null, '536-01042R' > 'Rear-Base-opt J55-L.', null, '536-01043L' > 'Rear-V-Series-R.', '$403.00', '536-01134AR' > 'Rear-V-Series-L.', '$466.00', '536-01135AL' > > I am unsure how to do this in SQL. (…) > Also, using this recursive sql > (REF:https://stackoverflow.com/questions/26280379/how-to-concatenate-field-values-with-recursive-query-in-postgresql), > I'm able to get it fairly close. > Just not sure why it is returning 476 rows vs. 34. > http://sqlfiddle.com/#!15/ca1ee/3 > > WITH RECURSIVE the_tree AS ( > > SELECT g.seq, g.app::TEXT, g.price, g.ic, g.treelevel::INTEGER > FROM indexlistapp g > WHERE g.indexlistid in (SELECT il.indexlistid > FROM indexlist il > WHERE il.model = 'CTS' > AND ('2010' BETWEEN il.beginyear AND > il.endyear) > AND il.type = '536') > AND g.treelevel = 1 > > > UNION > > SELECT t.seq, t.app || ', ' || g.app AS app, t.price, t.ic, > t.treelevel::INTEGER + 1 > FROM the_tree AS t > INNER JOIN indexlistapp g ON g.treelevel = t.treelevel + 1 > WHERE g.indexlistid in (SELECT il.indexlistid > FROM indexlist il > WHERE il.model = 'CTS' > AND ('2010' BETWEEN il.beginyear AND > il.endyear) > AND il.type = '536') > ) > SELECT * from the_tree; > > > Can anyone help me out here? Your JOIN in the recursive part of the CTE should probably be ON g.parent = t.treelevel AND g.indexlistid = t.indexlistid. Since both the initial and recursive part use the same value for indexlistid, the subquery in the recursive part is superfluous. You should also take the order in seq into account for items on the same node, if that is possible in your data, otherwise there is no guarantee that you'll get your string components in the correct order. 
I think you need something closer to this: WITH RECURSIVE the_tree AS ( SELECT g.seq, g.app::TEXT, g.price, g.ic, g.treelevel::INTEGER, g.indexlistid FROM indexlistapp g WHERE g.indexlistid IN ( SELECT il.indexlistid FROM indexlist il WHERE il.model = 'CTS' AND '2010' BETWEEN il.beginyear AND il.endyear AND il.type = '536' ) AND g.parent = 0 UNION ALL SELECT t.seq, t.app || ', ' || g.app AS app, t.price, t.ic, g.treelevel, g.indexlistid FROM the_tree t INNER JOIN indexlistapp g ON g.parent = t.treelevel AND g.indexlistid = t.indexlistid ORDER BY g.seq ) SELECT * FROM the_tree Another observation is that you only seem interested in the leaf nodes of your tree. Recognising leaf nodes in your hierarchical query can be achieved by adding a column that tests the existence of a node with the current node as its parent - if it does not exist, this node is a leaf node. The result of SELECT * FROM the_tree can then be filtered to return leaf nodes only. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
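Regarding that last remark on leaf nodes: keeping to the same parent/indexlistid logic as in the query above, the final SELECT could become something like this (untested):

SELECT t.*
FROM the_tree t
WHERE NOT EXISTS (
        SELECT 1
        FROM indexlistapp g
        WHERE g.indexlistid = t.indexlistid
          AND g.parent = t.treelevel
      );

Alternatively, add that NOT EXISTS expression as an is_leaf column and filter on it outside the CTE.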
Re: Same condition in the CTE and in the subsequent JOIN using it
> On 25 Apr 2018, at 17:45, Alexander Farber wrote: (…) > And here is the function source code: > > CREATE OR REPLACE FUNCTION words_stat_scores( > in_social integer, > in_sidtext > ) RETURNS TABLE ( > out_day text, > out_diff numeric, > out_score numeric > ) AS > $func$ > WITH cte AS ( > SELECT > DATE_TRUNC('day', m.played) AS day, > m.mid, > EXTRACT(EPOCH FROM m.played - LAG(m.played) OVER > (PARTITION BY m.gid ORDER BY m.played))::int/60 AS diff > FROMwords_moves m > JOINwords_games g ON (m.gid = g.gid) > JOINwords_social s ON (s.uid IN (g.player1, g.player2)) > WHERE s.social = in_social -- CAN > THIS BE REFERRED TO FROM BELOW? > AND s.sid = in_sid > AND m.played > CURRENT_TIMESTAMP - interval '1 month' > ) > SELECT > TO_CHAR(c.day, 'DD.MM.'), > ROUND(AVG(c.diff)), > ROUND(AVG(m.score), 1) > FROMwords_moves m > JOINcte c using(mid) > JOINwords_social s USING(uid) > WHERE s.social = in_social > AND s.sid = in_sid > AND m.action = 'play' > GROUP BY c.day > ORDER BY c.day; > > $func$ LANGUAGE sql STABLE; > > By looking at the above source code, do you think, that the condition being > used twice (the s.social = in_social AND s.sid = in_sid) is "too much" and > can be optimized? :-) Actually, no. The conditions are part of different joins. Within the CTE, you have a join that boils down to: > FROMwords_games g ON (m.gid = g.gid) > JOINwords_social s ON (s.uid IN (g.player1, g.player2) AND s.social = > in_social AND s.sid = in_sid) In your outer query, you have: > FROMwords_moves m > JOINwords_social s ON (s.uid = m.uid AND s.social = in_social AND s.sid > = in_sid) The joins are on different fields, in different tables even, so you can't just leave the conditions out because they filter different rows. What you _can_ do is move the words_social JOIN and it's conditions into a new CTE and join with that instead. Something like so: WITH words_in_social AS ( SELECT sid, uid FROM words_social WHERE social = in_social AND sid = in_sid ), cte AS ( SELECT DATE_TRUNC('day', m.played) AS day, m.mid, EXTRACT(EPOCH FROM m.played - LAG(m.played) OVER (PARTITION BY m.gid ORDER BY m.played))::int/60 AS diff FROMwords_moves m JOINwords_games g ON (m.gid = g.gid) JOINwords_in_social s ON (s.uid IN (g.player1, g.player2)) WHERE m.played > CURRENT_TIMESTAMP - interval '1 ) SELECT TO_CHAR(c.day, 'DD.MM.'), ROUND(AVG(c.diff)), ROUND(AVG(m.score), 1) FROM words_moves m JOIN cte c using(mid) JOIN words_in_social s USING(uid) WHERE m.action = 'play' GROUP BY c.day ORDER BY c.day; That may look cleaner, but your original query probably performs better, since CTE's also act as an optimisation fence. BTW, I suggest to use a better name for your CTE than cte; I'd rather use a name that clarifies its purpose. > Thank you for any hints, I apologize if my question is too specific and > difficult to answer... > > Regards > Alex Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: ON CONFLICT DO UPDATE
> On 10 May 2018, at 7:13, tango ward wrote: > ON CONFLICT (school_system_id, > student_id, > campus_name > ) DO UPDATE > SET school_system_id = > excluded.school_system_id, > student_id = excluded.student_id, > campus_name = excluded.campus_name I'm pretty sure this ought to read: ON CONFLICT (school_system_id, student_id, campus_name) DO UPDATE SET modified = EXCLUDED.modified, balance = EXCLUDED.balance, balance_as_of = EXCLUDED.balance_as_of Instead, you were re-assigning the keys (school_system_id, student_id, campus_name) to the same values again. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
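Written out in full, that would be something like the below (the table name and the value placeholders are mine, the conflict target and SET list are as above):

INSERT INTO student_balance (school_system_id, student_id, campus_name,
                             modified, balance, balance_as_of)
VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (school_system_id, student_id, campus_name)
DO UPDATE
   SET modified      = EXCLUDED.modified,
       balance       = EXCLUDED.balance,
       balance_as_of = EXCLUDED.balance_as_of;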
Re: What to do when dynamic shared memory control segment is corrupt
> On 18 Jun 2018, at 17:34, Sherrylyn Branchaw wrote: > > In the other case, the logs recorded > > LOG: all server processes terminated; reinitializing > LOG: dynamic shared memory control segment is corrupt > LOG: incomplete data in "postmaster.pid": found only 1 newlines while trying > to add line 7 > > In that case, the database did not restart on its own. It was 5 am on Sunday, > so the on-call SRE just manually started the database up, and it appears to > have been running fine since. That rings a bell. Some versions of systemd apparently clean up shared memory belonging to a user when it detects the user logs out. ISTR that we had to disable that on our CentOS 7 server to stop crashes from happening. More details here: https://wiki.postgresql.org/wiki/Systemd Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
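If I remember correctly, the fix from that wiki page amounted to this on our CentOS 7 box (do check the wiki, our notes are a few years old by now):

# /etc/systemd/logind.conf
[Login]
RemoveIPC=no

# and then restart the login daemon:
systemctl restart systemd-logind

With RemoveIPC left at its default, systemd may remove the shared memory segments of a "logged out" postgres user, which matches the corruption message you saw.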
Is postorder tree traversal possible with recursive CTE's?
Hi all, I'm struggling with a hierarchical query where I'm tasked to calculate weights of items in an (exploded) Bill of Materials, based on the weights of their components. Not all components are measured with a weight, sometimes there are pieces, meters, areas, etc, and the hierarchy is of varying levels of depth. It would help if I could track a sum() throughout the explosion that would write back onto parent rows when the recursion returns: postorder traversal. I created a simplified example about making pizza: CREATE TABLE ingredient ( name text NOT NULL ); CREATE TABLE recipe ( name text NOT NULL, ingredient text NOT NULL, quantity numeric(6,2) NOT NULL, unit text NOT NULL, step integer NOT NULL ); COPY ingredient (name) FROM stdin; tomato basil salt tomato sauce flour water yeast dough pizza bottom pizza \. COPY recipe (name, ingredient, quantity, unit, step) FROM stdin; tomato saucetomato 100.00 g 1 dough flour 150.00 g 1 tomato saucebasil 10.00 g 2 pizza pizza bottom1.00pcs 2 tomato saucesalt3.00g 3 dough salt1.00pinch 3 pizza tomato sauce1.00pcs 1 pizza bottomdough 1.00pcs 2 dough water 50.00 g 2 \. ALTER TABLE ONLY ingredient ADD CONSTRAINT ingredient_pkey PRIMARY KEY (name); ALTER TABLE ONLY recipe ADD CONSTRAINT recipe_pkey PRIMARY KEY (name, ingredient); ALTER TABLE ONLY recipe ADD CONSTRAINT recipe_ingredient_fkey FOREIGN KEY (ingredient) REFERENCES ingredient(name); ALTER TABLE ONLY recipe ADD CONSTRAINT recipe_name_fkey FOREIGN KEY (name) REFERENCES ingredient(name); A query listing the recipe for 'pizza' would be as follows: development=> with recursive pizza (name, step, ingredient, quantity, unit, rel_qty, path, weight) as ( select name, step, ingredient, quantity, unit , quantity::numeric(10,2) , step::text , case when unit = 'g' then quantity::numeric(10,2) else null end from recipe where name = 'pizza' union all select recipe.name, recipe.step, recipe.ingredient, recipe.quantity, recipe.unit , (pizza.rel_qty * recipe.quantity)::numeric(10,2) , pizza.path || '.' || recipe.step , case when recipe.unit = 'g' then (pizza.rel_qty * recipe.quantity)::numeric(10,2) else null end from pizza join recipe on (recipe.name = pizza.ingredient) ) select path, ingredient, quantity, rel_qty, unit, weight from pizza order by path; path | ingredient | quantity | rel_qty | unit | weight ---+--+--+-+---+ 1 | tomato sauce | 1.00 |1.00 | pcs | 1.1 | tomato | 100.00 | 100.00 | g | 100.00 1.2 | basil|10.00 | 10.00 | g | 10.00 1.3 | salt | 3.00 |3.00 | g | 3.00 2 | pizza bottom | 1.00 |1.00 | pcs | 2.2 | dough| 1.00 |1.00 | pcs | 2.2.1 | flour| 150.00 | 150.00 | g | 150.00 2.2.2 | water|50.00 | 50.00 | g | 50.00 2.2.3 | salt | 1.00 |1.00 | pinch | (9 rows) With these results, I somehow need to calculate that the weights of 'tomato sauce', 'dough' and 'pizza bottom' are 113 g, 200 g and 200 g respectively, bringing the total weight of 'pizza' to 313 g. My first thought was to traverse the result of this recursive CTE using another one, but in the opposite direction. But since this tends to be kept as a temporary materialized result set with no indices, that's not performing great and it adds a fair amount of complexity to the query too. Then I realised that if we somehow could track the sum() of 'weight' throughout exploding these recipe items, by using a postorder tree traversal, the desired result would be readily available to pick up when the recursive CTE travels up through the hierarchy. 
In above example; When the CTE would reach '1.3 salt', it would write the summed 'weight' value 113 back on the result for '1 tomato sauce' and when it reached '2.2.2 salt' it would write back 200 to '2.2 dough' and then 200 to '2 pizza bottom'. Is that possible? I've seen a couple of "solutions" on the internet that just summed up the results of the CTE, but that won't do as it would not put the correct weights onto intermediate levels of the tree as far as I can see (in above, the weight of 'dough'). Regards, Alban Hertroys PS. Don't try to make pizza using this recipe, it probably won't succeed. I forgot the yeast, for one thing, and quantities are probably way off. Not t
Re: Is postorder tree traversal possible with recursive CTE's?
3.00 |3.00 | g | 3.00 | 313.00 > 2 | pizza bottom | 1.00 |1.00 | pcs | | 313.00 > 2.2 | dough| 1.00 |1.00 | pcs || 313.00 > 2.2.1 | flour| 150.00 | 150.00 | g | 150.00 | 313.00 > 2.2.2 | water|50.00 | 50.00 | g | 50.00 | 313.00 > 2.2.3 | salt | 1.00 |1.00 | pinch || 313.00 > (9 rows) > > > > > > El mar., 19 de jun. de 2018 a la(s) 08:39, Alban Hertroys > (haram...@gmail.com) escribió: > Hi all, > > I'm struggling with a hierarchical query where I'm tasked to calculate > weights of items in an (exploded) Bill of Materials, based on the weights of > their components. Not all components are measured with a weight, sometimes > there are pieces, meters, areas, etc, and the hierarchy is of varying levels > of depth. > > It would help if I could track a sum() throughout the explosion that would > write back onto parent rows when the recursion returns: postorder traversal. > > I created a simplified example about making pizza: > > CREATE TABLE ingredient ( > name text NOT NULL > ); > > CREATE TABLE recipe ( > name text NOT NULL, > ingredient text NOT NULL, > quantity numeric(6,2) NOT NULL, > unit text NOT NULL, > step integer NOT NULL > ); > > COPY ingredient (name) FROM stdin; > tomato > basil > salt > tomato sauce > flour > water > yeast > dough > pizza bottom > pizza > \. > > COPY recipe (name, ingredient, quantity, unit, step) FROM stdin; > tomato saucetomato 100.00 g 1 > dough flour 150.00 g 1 > tomato saucebasil 10.00 g 2 > pizza pizza bottom1.00pcs 2 > tomato saucesalt3.00g 3 > dough salt1.00pinch 3 > pizza tomato sauce1.00pcs 1 > pizza bottomdough 1.00pcs 2 > dough water 50.00 g 2 > \. > > ALTER TABLE ONLY ingredient > ADD CONSTRAINT ingredient_pkey PRIMARY KEY (name); > > ALTER TABLE ONLY recipe > ADD CONSTRAINT recipe_pkey PRIMARY KEY (name, ingredient); > > ALTER TABLE ONLY recipe > ADD CONSTRAINT recipe_ingredient_fkey FOREIGN KEY (ingredient) REFERENCES > ingredient(name); > > ALTER TABLE ONLY recipe > ADD CONSTRAINT recipe_name_fkey FOREIGN KEY (name) REFERENCES > ingredient(name); > > > A query listing the recipe for 'pizza' would be as follows: > development=> with recursive pizza (name, step, ingredient, quantity, unit, > rel_qty, path, weight) > as ( > select > name, step, ingredient, quantity, unit > , quantity::numeric(10,2) > , step::text > , case when unit = 'g' then quantity::numeric(10,2) else null > end > from recipe > where name = 'pizza' > union all > select > recipe.name, recipe.step, recipe.ingredient, recipe.quantity, > recipe.unit > , (pizza.rel_qty * recipe.quantity)::numeric(10,2) > , pizza.path || '.' || recipe.step > , case when recipe.unit = 'g' then (pizza.rel_qty * > recipe.quantity)::numeric(10,2) else null end > from pizza > join recipe on (recipe.name = pizza.ingredient) > ) > select path, ingredient, quantity, rel_qty, unit, weight > from pizza > order by path; > > path | ingredient | quantity | rel_qty | unit | weight > ---+--+--+-+---+ > 1 | tomato sauce | 1.00 |1.00 | pcs | > 1.1 | tomato | 100.00 | 100.00 | g | 100.00 > 1.2 | basil|10.00 | 10.00 | g | 10.00 > 1.3 | salt | 3.00 |3.00 | g | 3.00 > 2 | pizza bottom | 1.00 |1.00 | pcs | > 2.2 | dough| 1.00 |1.00 | pcs | > 2.2.1 | flour| 150.00 | 150.00 | g | 150.00 > 2.2.2 | water|50.00 | 50.00 | g | 50.00 > 2.2.3 | salt | 1.00 |1.00 | pinch | > (9 rows) > > > With these results, I somehow need to calculate that the weights of 'tomato > sauce', 'dough' and 'pizza bottom' are 113 g, 200 g and 200 g respectively, > bringing the total weight of 'pizza' to 313 g. 
> > My first thought was to traverse the result of this recursive CTE using > another one, but in the opposite direction. But since this tends to be kept > as a temporary materialized result set with no indices, that's n
Re: Load data from a csv file without using COPY
> On 19 Jun 2018, at 22:16, Ravi Krishna wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
> The requirement is that the load should happen like an application writing to the database (that is, no COPY command).
> Is there a tool which can do the job. Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate INSERT statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
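To illustrate, with made-up database, table and file names, and assuming the table definition already exists in the scratch database:

-- in psql, connected to the scratch database:
\copy target_table from 'data.csv' with (format csv, header)

-- then, from the shell:
pg_dump --data-only --inserts -t target_table scratch_db > row_by_row.sql

The resulting file contains one INSERT statement per row, which your benchmark can replay against the real database.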
Re: Is postorder tree traversal possible with recursive CTE's?
313.00 >> 1.3 | salt | 3.00 |3.00 | g | 3.00 | 313.00 >> 2 | pizza bottom | 1.00 |1.00 | pcs || 313.00 >> 2.2 | dough| 1.00 |1.00 | pcs || 313.00 >> 2.2.1 | flour| 150.00 | 150.00 | g | 150.00 | 313.00 >> 2.2.2 | water|50.00 | 50.00 | g | 50.00 | 313.00 >> 2.2.3 | salt | 1.00 |1.00 | pinch || 313.00 >> (9 rows) >> >> >> >> >> >> El mar., 19 de jun. de 2018 a la(s) 08:39, Alban Hertroys >> (haram...@gmail.com) escribió: >>> >>> Hi all, >>> >>> I'm struggling with a hierarchical query where I'm tasked to calculate >>> weights of items in an (exploded) Bill of Materials, based on the weights of >>> their components. Not all components are measured with a weight, sometimes >>> there are pieces, meters, areas, etc, and the hierarchy is of varying levels >>> of depth. >>> >>> It would help if I could track a sum() throughout the explosion that >>> would write back onto parent rows when the recursion returns: postorder >>> traversal. >>> >>> I created a simplified example about making pizza: >>> >>> CREATE TABLE ingredient ( >>> name text NOT NULL >>> ); >>> >>> CREATE TABLE recipe ( >>> name text NOT NULL, >>> ingredient text NOT NULL, >>> quantity numeric(6,2) NOT NULL, >>> unit text NOT NULL, >>> step integer NOT NULL >>> ); >>> >>> COPY ingredient (name) FROM stdin; >>> tomato >>> basil >>> salt >>> tomato sauce >>> flour >>> water >>> yeast >>> dough >>> pizza bottom >>> pizza >>> \. >>> >>> COPY recipe (name, ingredient, quantity, unit, step) FROM stdin; >>> tomato saucetomato 100.00 g 1 >>> dough flour 150.00 g 1 >>> tomato saucebasil 10.00 g 2 >>> pizza pizza bottom1.00pcs 2 >>> tomato saucesalt3.00g 3 >>> dough salt1.00pinch 3 >>> pizza tomato sauce1.00pcs 1 >>> pizza bottomdough 1.00pcs 2 >>> dough water 50.00 g 2 >>> \. >>> >>> ALTER TABLE ONLY ingredient >>> ADD CONSTRAINT ingredient_pkey PRIMARY KEY (name); >>> >>> ALTER TABLE ONLY recipe >>> ADD CONSTRAINT recipe_pkey PRIMARY KEY (name, ingredient); >>> >>> ALTER TABLE ONLY recipe >>> ADD CONSTRAINT recipe_ingredient_fkey FOREIGN KEY (ingredient) >>> REFERENCES ingredient(name); >>> >>> ALTER TABLE ONLY recipe >>> ADD CONSTRAINT recipe_name_fkey FOREIGN KEY (name) REFERENCES >>> ingredient(name); >>> >>> >>> A query listing the recipe for 'pizza' would be as follows: >>> development=> with recursive pizza (name, step, ingredient, quantity, >>> unit, rel_qty, path, weight) >>> as ( >>> select >>> name, step, ingredient, quantity, unit >>> , quantity::numeric(10,2) >>> , step::text >>> , case when unit = 'g' then quantity::numeric(10,2) else >>> null end >>> from recipe >>> where name = 'pizza' >>> union all >>> select >>> recipe.name, recipe.step, recipe.ingredient, >>> recipe.quantity, recipe.unit >>> , (pizza.rel_qty * recipe.quantity)::numeric(10,2) >>> , pizza.path || '.' || recipe.step >>> , case when recipe.unit = 'g' then (pizza.rel_qty * >>> recipe.quantity)::numeric(10,2) else null end >>> from pizza >>> join recipe on (recipe.name = pizza.ingredient) >>> ) >>> select path, ingredient, quantity, rel_qty, unit, weight >>> from pizza >>> order by path; >>> >>> path | ingredient | quantity | rel_qty | unit | weight >>> ---+--+--+-+---+ >>> 1 | tomato sauce | 1.00 |1.00 | pcs | >>> 1.1 | tomato | 100.00 | 100.00 | g | 100.00 >>> 1.2 | basil|10.00 | 10.00 | g | 10.00 >>> 1.3 | salt | 3.00 |3.0
Re: DB size growing exponentially when materialized view refreshed concurrently (postgres 9.6)
> On 25 Jun 2018, at 19:21, Vikas Sharma wrote: > > I am looking for advice in a issue where two materialized views are being > refreshed concurrently and dbsize has grown to 150gb from 4gb in two days. > > We use two materialized views to keep processed data for faster query results > for a search function. Earlier materialized views were refreshed not > concurrently and all was good on DB. > > We changed mv refresh to concurrently to take advantage of simultaneous > access when mv refreshed. Now the refresh takes slightly longer and but DB > size has grown exponentially. > > I ran full vacuum on DB and size again reduced to 4gb from 150gb. You did not disable or tune down autovacuum perchance? With materialized view refreshes that often, you probably need fairly aggressive autovacuuming on that table - you can tune autovacuum parameters per table (see Storage parameters). That probably won't put you at 4GB, more around double that size, but it should stay a reasonable size that way. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
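Something along these lines, for example; this assumes that materialized views accept the same per-relation autovacuum settings as ordinary tables (the view name is made up, tune the numbers to your refresh frequency):

ALTER MATERIALIZED VIEW my_matview
  SET (autovacuum_vacuum_scale_factor = 0.01,
       autovacuum_vacuum_threshold    = 1000);

A concurrent refresh deletes and re-inserts the changed rows instead of rewriting the relation, so the dead row versions pile up until (auto)vacuum gets to them.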
Re: Split daterange into sub periods
On 5 July 2018 at 16:16, hmidi slim wrote: > In fact I'm trying to split a period in sub periods. Following this example > : > If I have a period =[2018-01-01, 2018-01-31] and two other periods > [2018-01-04, 2018-01-06] and [2018-01-08, 2018-01-08]. > If I split the base period '[2018-01-01, 2018-01-31]' by the other two > periods '[2018-01-04, 2018-01-06]' and '[2018-01-08, 2018-01-08]' I will got > such a result: > [2018-01-01, 2018-01-03] > [2018-01-07, 2018-01-07] > [2018-01-09, 2018-01-31]. What about a recursive CTE? What about a recursive CTE? with recursive period as (select '[2018-01-01, 2018-01-31]'::daterange as range) ,exclude as ( select range from (values ('[2018-01-01, 2018-01-03]'::daterange) ,('[2018-01-07, 2018-01-07]'::daterange) ,('[2018-01-09, 2018-01-31]'::daterange) ) v(range) ) ,available (lo, hi, exclude, available) as ( select lower(p.range), upper(p.range) ,x.range ,p.range - x.range from period p,exclude x where not exists ( select 1 from exclude x2 where lower(x2.range) < lower(x.range) and lower(x2.range) >= lower(p.range) ) union all select upper(x.range), hi ,x.range ,daterange(upper(x.range), hi) from available a, exclude x where a.lo <= a.hi and lower(x.range) > lo and not exists ( select 1 from exclude x2 where lower(x2.range) < lower(x.range) and lower(x2.range) > lo ) ) select * from available; lo | hi | exclude |available ++-+- 2018-01-01 | 2018-02-01 | [2018-01-01,2018-01-04) | [2018-01-04,2018-02-01) 2018-01-08 | 2018-02-01 | [2018-01-07,2018-01-08) | [2018-01-08,2018-02-01) 2018-02-01 | 2018-02-01 | [2018-01-09,2018-02-01) | empty (3 rows) It can probably be optimized a bit, I haven't played with ranges much yet. Regards, Alban Hertroys -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Re: Restore from dumps
> On 25 Jul 2018, at 9:43, Nicola Contu wrote: > > Hello, > we recently moved from postgres 9.6.6 to 10.4 > > We perform a pg_dump in production to restore daily in a preprod env. > This process used to work perfectly, but now we have a tiny problem. > > We first restore data, we perform a vacuum and then we restore matviews. What are the commands you used? You don't seem to mention restoring the schema? > Restoring matviews now we have : > > pg_restore: [archiver (db)] Error while PROCESSING TOC: > pg_restore: [archiver (db)] Error from TOC entry 23090; 0 1912379424 > MATERIALIZED VIEW DATA matview_vrs_request_sla postgres > pg_restore: [archiver (db)] could not execute query: ERROR: relation > "all_days" does not exist > LINE 3: from all_days > ^ > QUERY: > select count(*)::numeric > from all_days > where (("date" between $2::date and $1::date) or ("date" between $1::date and > $2::date)) > and dow not in (0,6) > > CONTEXT: SQL function "bdays" during inlining > Command was: REFRESH MATERIALIZED VIEW public.matview_vrs_request_sla; Is all_days a table? Or is it perhaps another view, possibly materialized even? > The relation is there, in fact if I go there when I get in to the office, the > same command works. This sounds to me like you may be using a different version of pg_restore in the office. Are both versions 10.4 or newer? It can't hurt to check that you used version 10.4 of pg_dump as well. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Select rows when all all ids of its children records matches
> On 12 Sep 2018, at 17:44, Arup Rakshit wrote:
>
> Can you show me the SQL construction? Do I need to use `WITH`?

An option is to map each matching tag onto a bit value and sum those per post. If the result of these 3 bits is 7, then the post matches all three tags.

select p.id, p.name
from post p
join post_tag pt on (pt.post = p.id)
join tag t on (t.id = pt.tag)
where t.id in (1, 2, 3)
group by p.id, p.name
having sum(case t.id when 1 then 1 when 2 then 2 when 3 then 4 else 0 end) = 7;

I used ints here for the bitwise OR, a bitstring would probably be neater. This assumes post_tag holds at most one row per (post, tag) pair, otherwise the sum comes out too high.

Another approach is to aggregate the set of matching tags into an array using array_agg(). I think that's what David means. You could then check the length of the array to see if you have all 3 (or 4 or 5 or 9000).

>> On 12-Sep-2018, at 9:13 PM, David G. Johnston wrote:
>>
>> On Wednesday, September 12, 2018, Arup Rakshit wrote:
>> IN is OR, I want the AND logic. Select posts which has tag 1, 2 and 3 ( tag ids )
>>
>> Build arrays and then use the "contains" operator.
>>
>> David J.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
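For David's array approach, a sketch (again assuming one row per (post, tag) pair in post_tag):

select p.id, p.name
from post p
join post_tag pt on (pt.post = p.id)
group by p.id, p.name
having array_agg(pt.tag) @> array[1, 2, 3];

The @> ("contains") check does the AND logic for you, and the list of required tags can be passed in as a single array parameter.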
Re: Regrading brin_index on required column of the table
The results of explain analyze would shed light on the core problem. My guess is that your conditions are not very selective - ie. most records in both tables have bFetch = false - and therefore you are retrieving most of your data and that is what's taking 7 minutes. No index is going to fix that. If those boolean values are distributed very unevenly (say 99.9% has false and 0.1% has true), you may get better results by excluding the records with 'true' values (instead of including those that are 'false'), for example by using a where not exists(...) subquery. Obviously, that still won't help if you're just fetching a lot of data. On Wed, 19 Sep 2018 at 16:23, Durgamahesh Manne wrote: > > > > > > > On Wed, Sep 19, 2018 at 7:41 PM Igor Neyman wrote: >> >> >> >> From: Durgamahesh Manne [mailto:maheshpostgr...@gmail.com] >> Sent: Wednesday, September 19, 2018 10:04 AM >> To: Igor Neyman >> Subject: Re: Regrading brin_index on required column of the table >> >> On Wed, Sep 19, 2018 at 7:19 PM Igor Neyman wrote: >> >> >> >> From: Durgamahesh Manne [mailto:maheshpostgr...@gmail.com] >> Sent: Wednesday, September 19, 2018 9:43 AM >> To: PostgreSQL mailing lists >> Subject: Regrading brin_index on required column of the table >> >> Hi >> >> Respected postgres community members >> >> >> >> I have created BRIN index on few columns of the table without any issues. >> But i am unable to create BRIN index on one column of the table as i got >> error listed below >> >> >> >> >> >> [local]:6263 postgres@huawei=# CREATE INDEX brin_idx on huawei using brin >> ("dFetch"); >> >> ERROR: data type boolean has no default operator class for access method >> "brin" >> >> HINT: You must specify an operator class for the index or define a default >> operator class for the data type. >> >> >> >> below is the column description: >> >> Column datatype collationnullable defaultstorage >> >> >> >> dFetchboolean false >> plain >> >> >> >> >> >> >> >> so please help in creating of the BRIN index on above column of the table . >> >> >> >> >> >> >> >> Regards >> >> >> >> Durgamahesh Manne >> >> >> >> Why would you want BRIN index on Boolean-type column? >> >> What kind of interval will you specify? >> >> >> >> Regards, >> >> Igor Neyman >> >> >> >> >> >> >> >> >> >> Hi >> >> >> >> >> >> I have complex query like for ex select distinct >> sub_head."vchSubmittersCode" ,rec."vchFileName" , >> rec."vchCusipFundIDSubFundID" , rec."vchFundUnitPrice" , >> sub_head."vchValuationDate" ,rec."vchAdvisorLabel" from "table1" rec join >> "table2" sub_head on rec."vchSubmittersCode"=sub_head."vchSubmittersCode" >> where rec."bFetch"=false and sub_head."bFetch"=false ; >> >> >> >> >> >> Query taken around 7 minutes time to execute without indexes on required >> columns >> >> >> >> SO i need to execute this distinct query at less time by creating indexes >> on required columns of the tables >> >> >> >> i have created brin indexes on vchsubmitterscode of two tables >> >> >> >> i am not able to create brin indexes on bfetch tables as i got a error >> ERROR: data type boolean has no default operator class for access method >> "brin" >> >> HINT: You must specify an operator class for the index or define a default >> operator class for the data type. >> >> >> >> >> >> so please help in creating of the BRIN index on above column of the table as >> i need to reduce the query execution time >> >> >> >> >> >> Regards >> >> >> >> Durgamahesh Manne >> >> Again, BRIN indexes are not design to work on Boolean columns. 
If you want >> to index Boolean column, just create regular BTREE index. >> >> Regards, >> >> Igor >> >> >> >> > > > Hi > > I have already tried with BTREE indexes & HASH indexes on required columns > .but distinct query execution time was not reduced > > > Query taken around 7 minutes time to execute with BTREE indexes & HASH > indexes on required columns > > > Regards > > Durgamahesh Manne > -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Re: Regrading brin_index on required column of the table
On Thu, 20 Sep 2018 at 11:42, Durgamahesh Manne wrote: ... > | -> Sort (cost=5871873.64..5973124.65 rows=40500405 width=89) (actual > time=326397.550..372470.846 rows=4050 loops=1) > | > | Sort Key: sub_head."vchSubmittersCode", rec."vchFileName", > rec."vchCusipFundIDSubFundID", rec."vchFundUnitPrice", > sub_head."vchValuationDate", rec."vchAdvisorLabel" | > | Sort Method: external merge Disk: 3923224kB > >| The above is a clear sign of a problem. To get distinct records, the results need to be sorted, and that doesn't fit in the available memory and spills to disk. The actual filters on the boolean fields, even though they are performed in seq-scans, hardly take any time at all. The hash join to combine them takes a bit over 6s. > so i am unable to reduce the query execution time as it is taken around 7 > minutes to execute with indexes & without indexes > > please help in reducing the query execution time ... > On Wed, Sep 19, 2018 at 9:07 PM Alban Hertroys wrote: >> >> The results of explain analyze would shed light on the core problem. >> >> My guess is that your conditions are not very selective - ie. most >> records in both tables have bFetch = false - and therefore you are >> retrieving most of your data and that is what's taking 7 minutes. No >> index is going to fix that. >> >> If those boolean values are distributed very unevenly (say 99.9% has >> false and 0.1% has true), you may get better results by excluding the >> records with 'true' values (instead of including those that are >> 'false'), for example by using a where not exists(...) subquery. >> >> Obviously, that still won't help if you're just fetching a lot of data. >> On Wed, 19 Sep 2018 at 16:23, Durgamahesh Manne >> wrote: >> > >> > >> > >> > >> > >> > >> > On Wed, Sep 19, 2018 at 7:41 PM Igor Neyman wrote: >> >> >> >> >> >> >> >> From: Durgamahesh Manne [mailto:maheshpostgr...@gmail.com] >> >> Sent: Wednesday, September 19, 2018 10:04 AM >> >> To: Igor Neyman >> >> Subject: Re: Regrading brin_index on required column of the table >> >> >> >> On Wed, Sep 19, 2018 at 7:19 PM Igor Neyman >> >> wrote: >> >> >> >> >> >> >> >> From: Durgamahesh Manne [mailto:maheshpostgr...@gmail.com] >> >> Sent: Wednesday, September 19, 2018 9:43 AM >> >> To: PostgreSQL mailing lists >> >> Subject: Regrading brin_index on required column of the table >> >> >> >> Hi >> >> >> >> Respected postgres community members >> >> >> >> >> >> >> >> I have created BRIN index on few columns of the table without any issues. >> >> But i am unable to create BRIN index on one column of the table as i got >> >> error listed below >> >> >> >> >> >> >> >> >> >> >> >> [local]:6263 postgres@huawei=# CREATE INDEX brin_idx on huawei using brin >> >> ("dFetch"); >> >> >> >> ERROR: data type boolean has no default operator class for access method >> >> "brin" >> >> >> >> HINT: You must specify an operator class for the index or define a >> >> default operator class for the data type. >> >> >> >> >> >> >> >> below is the column description: >> >> >> >> Column datatype collationnullable defaultstorage >> >> >> >> >> >> >> >> dFetchboolean false >> >> plain >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> so please help in creating of the BRIN index on above column of the table >> >> . >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> Regards >> >> >> >> >> >> >> >> Durgamahesh Manne >> >> >> >> >> >> >> >> Why would you want BRIN index on Boolean-type column?
Re: *Regarding brin_index on required column of the table
Your plan is not readable to me (perhaps because of gmail). Does https://explain.depesz.com/ give you any useful insights? On Fri, 21 Sep 2018 at 16:15, Durgamahesh Manne wrote: > > > On Fri, Sep 21, 2018 at 7:38 PM Durgamahesh Manne > wrote: >> >> Hi >> >> Complex query taken around 30 minutes to execute even i have increased >> work_mem value to 4GB temporarily as total ram is 16gb >> >> Explain analyze query taken around 30 minutes to execute even i have created >> partial indexes with where condition on required columns >> >> >> >> Below is the query plan for explain analyze query : >> >> | HashAggregate (cost=16034967.98..16035010.75 rows=4277 width=1710) >> (actual time=1806653.536..1806680.802 rows=26098 loops=1) >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> | >> | Group Key: (max((v."vchSubmittersCode")::text)), >> (max((v."vchRecordType")::text)), (max((v."vchSequenceNumber")::text)), >> v."vchContractNumber", v."vchContractPartyRoleQualifier", v."vchPartyRole", >> (max((v."vchPartyNatural_Non_NaturalEntity")::text)), >> (max((v."vchPartyLastName")::text)), (max((v."vchPartyFirstName")::text)), >> (max((v."vchPartyMiddleName")::text)), (max((v."vchPartyPrefix")::text)), >> (max((v."vchPartySuffix")::text)), (NULL::text), v."vchPartyID", >> (max((v."vchPartyIDQualifier")::text)), >> (max((v."vchTrustRevocabilityIndicator")::text)), (NULL::text), >> (NULL::text), (max((v."vchFiller1")::text)), >> (max((v."vchRejectCode")::text)), >> (max((v."vchContractEntityAddressLine1")::text)), >> (max((v."vchContractEntityAddressLine2")::text)), >> (max((v."vchContractEntityCity")::text)), >> (max((v."vchContractEntityState")::text)), >> (max((v."vchContractEntityZip")::text)), >> (max((v."vchContractEntityAddressLine3")::text)), >> (max((v."vchContractEntityAddressLine4")::text)), >> (max((v."vchContractEntityAddressLine5")::text)), >> (max((v."vchPartyDateofBirth")::text)), >> (max((v."vchPartyAddressLine1")::text)), >> (max((v."vchContractStatus")::text)), (string_agg(DISTINCT >> btrim((s."vchAgentTaxID")::text), ','::text)), v."vchPartyRole", >> (max((v."vchAdvisorLabel")::text)), v."vchFileName", >> (max((v.vchpartycity)::text)), (max((v.vchpartystate)::text)), >> (max((v.vchpartypostalcode)::text)), (string_agg(DISTINCT >> btrim((s."vchAgentFirstName")::text) || ' '::text) || >> btrim((s."vchAgentMiddleName")::text)) || ' '::text) || >> btrim((s."vchAgentLastName")::text)), ','::text)) | >> | -> Append (cost=48944.67..16034550.97 rows=4277 width=1710) (actual >> time=3324.233..1806605.691 rows=26098 loops=1) >> >> >> >> >> >> >> >> >> >> >> >>
Re: *Regarding brin_index on required column of the table
> On 21 Sep 2018, at 17:49, Durgamahesh Manne wrote: > > Considering how hard you try to get rid of duplicates, I'm quite convinced that you're at least short a few join conditions. Getting rid of duplicates early has the added benefit of having to aggregate fewer rows, which should drastically improve the performance of this query. In addition, your approach of using both distinct and group by raises a red flag that you're fighting symptoms (most likely: duplicate results) without understanding their causes. I can't think of a single situation where both are justified in the same (sub-)query. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: using the nextval('sequence_name') in sql, the result maybe is not right
On Wed, 26 Sep 2018 at 14:08, Wanglin wrote: > > Hi, all: > PostgreSQL version : 10.3. I use "nextval" in the sql , but I think the > result is not right, maybe it is a bug. >The test case as bellow: >create sequence seq1; > select nextval('seq1'); > create table tx1(id1 int, id2 int); > insert into tx1 select generate_series(1,100), random()*102; > explain verbose select * from tx1 where id2 = nextval('seq1');; > select * from tx1 where id2 = nextval('seq1'); > postgres=# explain verbose select * from tx1 where id2 = nextval('seq1');; > QUERY PLAN Seq > Scan on public.tx1 (cost=0.00..43.90 rows=11 width=8) Output: id1, id2 > Filter: (tx1.id2 = nextval('seq1'::regclass)) (3 rows) > > postgres=# select * from tx1 where id2 = nextval('seq1'); -- here, may be the > result is not right id1 | id2 -+- 56 | 57 (1 row) > > :: I think "nextval('seq1')" equal 2, so " select * from tx1 where id2 = > nextval('seq1')" equals " select * from tx1 where id2 = 2", is it ? No. nextval("seq1") increments the sequence and returns the new value. It never[*] returns the same value in subsequent calls, that is the purpose of the function (and sequences in general). Normally, you would assign a sequence to a surrogate key field in your table, so that you automatically get unique values in that field (unless you mess around). That's not how you're using it, so I wonder what your purpose is for the sequence? [*] Never is never without exceptions, just like always always has. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
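Also note that nextval() is volatile, so in your WHERE clause it is evaluated again for every row the sequential scan visits; the row you got back is simply the first one whose id2 happened to equal the sequence's value at that point. A quick demonstration of the sequence behaviour:

CREATE SEQUENCE seq_demo;
SELECT nextval('seq_demo');   -- 1
SELECT nextval('seq_demo');   -- 2, it will not return 1 again
SELECT currval('seq_demo');   -- 2, the last value handed out in this session

If you want to compare against a fixed value, read the sequence once (or use currval()) and compare against that.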
Re: Want to acquire lock on tables where primary of one table is foreign key on othere
On Thu, 11 Oct 2018 at 16:38, Abhishek Tripathi wrote: > Actually I have acquired a "Select for Update" on a table whose id is > refrence as a foreign key on another table So I want those table won't update > until there is lock. Is it possible? Becuase postgres is acquiring lock but > AccessShare Lock which allow to write on those table How I restrict this. For what purpose do you want that? What is inadequate about the lock that Postgres acquires? Table locks are very rarely what you want, as it blocks all concurrent access to the entire table, while that is only necessary for a few rarely used corner cases; a foreign key update is not among those. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Re: Want to acquire lock on tables where primary of one table is foreign key on othere
You forgot to CC the list. Also, top-posting is generally not appreciated here. > On Thu, Oct 11, 2018 at 8:26 PM Alban Hertroys wrote: > On Thu, 11 Oct 2018 at 16:38, Abhishek Tripathi > wrote: > > Actually I have acquired a "Select for Update" on a table whose id is > > refrence as a foreign key on another table So I want those table won't > > update until there is lock. Is it possible? Becuase postgres is acquiring > > lock but AccessShare Lock which allow to write on those table How I > > restrict this. > > For what purpose do you want that? What is inadequate about the lock > that Postgres acquires? > > Table locks are very rarely what you want, as it blocks all concurrent > access to the entire table, while that is only necessary for a few > rarely used corner cases; a foreign key update is not among those. > On Fri, Oct 12, 2018 at 10:33 AM Abhishek Tripathi > wrote: > Thank you for your response. > > I am explaining my situation there is table A on which I have taken a row > lock and the primary key of table A is related with table B as a foreign key > so automatically I can not insert new row with that foreign key now the > primary key of table B is used as foreign key in table C and insertion can be > done on table C. I just want to lock table C also No insertion can be done on > table C related to table B primary key. > > On 12 Oct 2018, at 8:08, Abhishek Tripathi wrote: > > My bad sorry actually there is updation- > > there is table A on which I have taken a row lock and the primary key of > table A is related with table B as a foreign key so automatically I can not > insert new row with that foreign key on table B that is fine now table C > any insertion can be done on table C. I just want to lock table C also No > insertion can be done on table C becuse table C primary key is related to > table B as a foreign key of B. So your tables (simplified) are something like this?: create table A (id primary key) create table B (id primary key, a_id references A (id)) create table C (id primary key, b_id references B (id)) And you insert a new value for b_id in C? That's not possible, unless that value exists in table B, which in turn can't exist unless it's a_id exists in table A. That's how foreign key constraints are designed, you don't need to do manual locking for that to work. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
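Spelled out with types, a minimal version of that schema behaves like this:

CREATE TABLE a (id int PRIMARY KEY);
CREATE TABLE b (id int PRIMARY KEY, a_id int REFERENCES a (id));
CREATE TABLE c (id int PRIMARY KEY, b_id int REFERENCES b (id));

INSERT INTO a (id) VALUES (1);
INSERT INTO b (id, a_id) VALUES (10, 1);
INSERT INTO c (id, b_id) VALUES (100, 10);  -- fine, b.id = 10 exists
INSERT INTO c (id, b_id) VALUES (101, 42);  -- fails with a foreign key violation

No explicit locking is needed for that last statement to fail; the foreign key constraint takes care of it.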
Re: Optimizing Postgresql ILIKE while query
> On 22 Oct 2018, at 7:56, aman gupta wrote: > > Issue: > > We have the base table which contains 22M records and we created a view on > top of it while querying the view with ILIKE clause it took 44 seconds and > with LIKE Clause 20 Seconds > > Query: > > fm_db_custom_db=# EXPLAIN (ANALYZE, TIMING OFF) > select > destination,hostname,inputfilename,inputtime,logicalservername,outputfilename,outputtime,processinglink,source,totalinputbytes,totalinputcdrs,totaloutputbytes,totaloutputcdrs > from mmsuper.test_20m_view where inputfilename ilike > '%SDPOUTPUTCDR_4001_BLSDP09_ADM_4997_18-10-15-02549.ASN%'; Perhaps, when you have a question about timing, you shouldn't turn off the timing in the query plan? Now we can't see where the time is spent. > That's all sequential scans that each remove a significant amount of rows. That probably costs a significant amount of time to do. It looks like you don't have any indices on the underlying table(s) at all. I'd start there and then look at the ILIKE problem again. By that time, Pavel's suggestion for a trigram index on that text field is probably spot-on. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
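For reference, a trigram index would look roughly like this; I'm guessing at the name of the base table behind the view, so adjust accordingly:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX test_20m_inputfilename_trgm_idx
    ON mmsuper.test_20m
 USING gin (inputfilename gin_trgm_ops);

A gin_trgm_ops index can serve both LIKE and ILIKE with leading wildcards, which a plain btree index cannot.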
Re: Query plan: SELECT vs INSERT from same select
> On 23 Jul 2019, at 22:29, Alexander Voytsekhovskyy > wrote: > > I have quite complicated query: > > SELECT axis_x1, axis_y1, SUM(delivery_price) as v_1 FROM ( > SELECT to_char(delivery_data.delivery_date, '-MM') as axis_x1, > clients.id_client as axis_y1, delivery_data.amount * production_price.price * > groups.discount as delivery_price > > FROM delivery_data > JOIN client_tt ON (client_tt.id_client_tt = delivery_data.id_client_tt) > JOIN clients ON (client_tt.id_client = clients.id_client) > JOIN production ON (production.id = delivery_data.id_product) > JOIN groups ON (groups.id = delivery_data.delivery_group_id AND > client_tt.id_group = groups.id AND groups.id = clients.id_group) Are client_tt.id_group and clients.id_group ever different from each other? It looks like you might have redundant information there, but... If they are guaranteed to be the same then you don’t need the JOIN to clients, which would both remove a JOIN and reduce the complexity of the JOIN condition on groups. Or (assuming the group id’s are indeed supposed to be equal), you could JOIN clients ON (client_tt.id_client = clients.id_client AND client_tt.id_group = clients.id_group) instead of putting that condition within the JOIN condition on groups. I don’t think either option will make a huge difference (the first probably more than the second, as it reduces an entire join), but it could be enough to help the database figure out a better plan. > LEFT JOIN production_price on (delivery_data.id_product = > production_price.id_production AND groups.price_list_id = > production_price.price_list_id AND delivery_data.delivery_date BETWEEN > production_price.date_from AND production_price.date_to) > > WHERE delivery_data.delivery_date between '2019-03-01' AND '2019-06-30' > AND delivery_data.delivery_group_id IN (...short list of values...) > AND delivery_data.id_product IN ()) AS tmpsource You don’t have a price if your goods weren’t produced in the delivery window you set? Or do you have goods that get delivered without having a price? You seem to be using this query for a report on nett sales by month, but I have my doubts whether that LEFT JOIN, and especially the condition on the production date window, is really what you want: Your formula for delivery_price includes the price column from that LEFT JOIN, so you’re going to get 0 values when there is no production_price record in your delivery-window, resulting in a SUM that’s too low if the product was produced before (or after, but that seems unlikely) the delivery window. > WHERE TRUE This line is unnecessary. > GROUP BY GROUPING SETS ((axis_x1, axis_y1), (axis_x1), (axis_y1), ()) Apparently (I’m new to these statements), CUBE (axis_x1, axis_y1) is a shorthand for the above. They seem to have been introduced at the same time (in 9.6?). See: https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-GROUPING-SETS > It runs well, took 1s and returns 4000 rows. I won’t go into the performance issue ash this point, other more knowledgeable people already did. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
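If the two id_group columns are indeed always the same, the join list could shrink to something like this (untested; axis_y1 would then come from client_tt.id_client instead of clients.id_client, and the production_price LEFT JOIN stays as it is):

FROM delivery_data
JOIN client_tt  ON client_tt.id_client_tt = delivery_data.id_client_tt
JOIN production ON production.id = delivery_data.id_product
JOIN groups     ON groups.id = delivery_data.delivery_group_id
               AND groups.id = client_tt.id_group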
Re: Request for resolution || Support
> On 24 Jul 2019, at 10:08, jay chauhan wrote: > > Hi Thomas, David/Team, > > Thanks you for your response. However we need your confirmation whether my > Error/issue as mentioned below will be resolved if we upgrade our PostgreSQL > Version. It won’t, you are talking about Oracle-specific features. You need to change the code. PostgreSQL is not Oracle, some features are quite RDBMS-specific. > < compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit">> > David response: Use a newer version > Tomas response: Yeah, you should use release 11 for a new project. > > My Issue while migrating procedure/function from Oracle to PostgreSQL: > Error-1) > ERROR: cannot begin/end transactions in PL/pgSQL > HINT: Use a BEGIN block with an EXCEPTION clause instead. > CONTEXT: PL/pgSQL function > icmsuatnew.eload_commission_payout_active(text,text,text,text,text,text) line > 486 at SQL statement > SQL state: 0A000 > David Response on it : Rewrite your code as instructed How to handle these depends on your use of sub-transactions, but the HINT gives a pretty good general approach. > Error-2) > ERROR: schema "utl_http" does not exist > LINE 38: L_HTTP_REQUEST UTL_HTTP.REQ; > ^ > SQL state: 3F000 > Character: 1785 > Thomas response: That's an Oracle thing for doing HTTP requests from PL/SQL. > To do that from plpgsql you could try an extension like this one: > https://github.com/pramsey/pgsql-http > Or you could write your own function in Python or favourite PL>. That's what I'd probably do. > https://www.postgresql.org/docs/11/plpython-funcs.html Initiating TCP/IP from the database means that a database process needs to wait for a response. In the meantime, it cannot do anything else. You’re effectively blocking it for other transactions and keeping that particular transaction ‘waiting in transaction’ until, in the worst case, a time-out. That means that no maintenance can be done on records touched by this transaction, which can lead to bloat. This is generally considered a bad idea, at least in this community. You’re usually better off handing the connection over to an external process that reports back to the database when appropriate. The exception to that is if your transaction cannot be allowed to commit without a response from the other peer. In such cases it is appropriate to use plpython, plperl, etc All that said, I am talking about PostgreSQL here. If you’re instead using EnterpriseDB, which does have an Oracle compatibility layer that could perhaps support these features (I don’t know), you should talk to the EnterpriseDB guys. Alban Hertroys -- There is always an exception to always.
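Should you end up going the PL/Python route for the UTL_HTTP calls (again: only when the transaction really cannot be allowed to commit without the response), a minimal sketch could look like this, assuming plpython3u is installed:

CREATE FUNCTION http_get(url text) RETURNS text AS $$
    # this blocks the backend until the remote side answers, so keep the timeout short
    import urllib.request
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode('utf-8')
$$ LANGUAGE plpython3u;

-- SELECT http_get('https://example.org/some/endpoint');

The function name, URL and timeout are of course placeholders; error handling for timeouts and non-200 responses is left out here.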
Re: adding more space to the existing server
> On 2 Aug 2019, at 21:45, Julie Nishimura wrote: > 1) We use streaming replication, and due to hardware limitation, we cannot > add more drives to the existing host. That is why we thought by breaking the > existing streaming replication (from a->b), instead of currently identical > standby (b), we can introduce twice larger host, then start the replication > to the newly larger host, and when it is caught up, break it again. Then > break rep again, make modification to 'a" host, making it larger, then > replicate b->a. After it is caught up, break the rep again, switch > master->standby (if necessary). Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only done (streaming) replication once and I managed to mess that up in a minor way (disabled the wrong service during failover, so data still went to the database I was attempting to replace for, like, 15 minutes). > 2) I am not sure about the time, but it is understood it is required 2 full > replication cycles, and might be up to 2 weeks with no standby situation No standby situation? Murphy is probably just waiting for that to strike… I recall a fairly recent story on the FreeBSD ML about someone on Malta doing a migration of a couple dozen terabytes from her main server (because of some failing disks in her RAID set) using her backup server to move data around (with backups removed to make room), when, due to an accident outside the building, an aerial 10KV power line hit another power line in the ground, causing a fire in one UPS and frying the other one. Losing power at that point meant that the file systems (ZFS) on both servers ended up in an unrecoverable state with no backups. It didn’t help that the UPS’s were at the bottom of the rack, with the heat and smoke going up into the servers. What are the chances, right? (And then it turned out that it is really hard to try to recover data from a ZFS file system in such a state, which is what her actual inquiry was about) I would definitely prefer to add a 3rd machine into the mix, even if it were just a temporary machine - a rental perhaps? From there, I’m certain Adrian knows more about replication than I do. I’d go with the approach he suggested. > 4) by pg_basebackup and restore > > As of now, we are thinking about possibly other solutions, as of splitting > existing 37 databases on the cluster into 2 hosts with their own standbys. > This solution requires breaking up existing replication as well. Can you > please point me to some document which lists all steps describing breaking up > the existing replication properly? we are using 9.6 postgres I’m going to assume that you will have data coming in while this split is taking place and that you therefore cannot offline the entire set of databases for as long as this takes. If not, that would probably allow for a simpler (faster) scenario. I think the easiest for this scenario would be to add two more machines (c and d) and replicate them off the current setup. You want that to happen as parallel as possible, so perhaps replicate c off a and d off b. If you aren’t already using “replication slots”, I found that to make things both easier to understand and more reliable. You can query their status, for one thing. Those replicas will take extra time of course (about double) because you’re replicating twice what you need, but I don’t think you can replicate parts of a cluster with your setup unless you go for a different replication approach (I think per database replication requires statement level replication?). 
After that, decouple both sets into: a —> b (your current machine) c —> d (the new ones) (Although any order should be fine, really, as long as they have caught up.) At that point I would probably (temporarily) pause replication in at least one set and create a backup of that. This is the point to start removing superfluous databases from a and c (so that a+c make up the complete set again). After verifying that no databases are missing, unpause replication. If instead you find that you accidentally removed a database from both a and c, you still have replicas to recover it from. And the backups, of course, but that will not contain the data that came in after replication was paused. I do hope the remaining 3% disk space is enough to cover all that, though... Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
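A sketch of the replication-slot part (the slot name is made up): on the master, create a physical slot per standby before taking the base backup, and query at any time what the master is still holding back for it:

SELECT pg_create_physical_replication_slot('standby_c');

SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots;

The standby then names that slot in its recovery configuration (primary_slot_name), so the master keeps WAL around until the standby has consumed it.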
Re: adding more space to the existing server
On 5 Aug 2019, at 0:39, Julie Nishimura wrote: Alban, thank you for your reply. Your suggestion makes sense, and I will be talking to our engineers about it. Currently we need to understand: a) How do we break A -> B replication such that both can become independent primaries That is pretty much like normal failover from A to B, except that you don’t reverse replication. You will need to stop your clients from sending data for a bit (if it’s continuous data, having a buffer in between is a big help - at our company we’re looking into Apache Kafka for that), so that you can switch half of them to connect to B instead of A. Next, you promote B to master. I used the docs for that last time, and they were pretty clear on the subject. b) How do we reassign C from B->C replication to A->C replication I don’t think you need to. If you indeed already have A->B->C, after promoting B to master, you end up with B->C, which is alright. You just need to add A->D for the other set. c) Thoughts on why this isn’t a good plan That depends on your clients and how you decide which database in the current cluster they connect to. If you connect specific clients to specific databases, then all you need to do is to configure half your clients to connect to B instead. Another option is to put a virtual database layer in front, such that both clusters still look like a single database to the outside world. We have some experience with Dremio for similar purposes (although for read-only reporting). Mind that the community edition doesn’t do authorization. Current: A replicates to B all requests go to A Soon: A replicates to B -> cascading to C and D Transition: break A replication to B such that both can become primary Correct. stop B replication to C then setup A to replicate to C I would change this in: setup A to replicate to D End state: A replicates to C B replicates to D End state: A replicates to D B replicates to C we remove some of the dbs from A and B, then reassign the traffic based on db selections I hope it all makes sense... Thank you It does to me. Now would be a good time for people to chime in if they don't agree ;) From: Alban Hertroys Sent: Saturday, August 3, 2019 3:15 AM To: Julie Nishimura Cc: Adrian Klaver ; pgsql-general@lists.postgresql.org ; pgsql-general Subject: Re: adding more space to the existing server > On 2 Aug 2019, at 21:45, Julie Nishimura wrote: > 1) We use streaming replication, and due to hardware limitation, we cannot add more drives to the existing host. That is why we thought by breaking the existing streaming replication (from a->b), instead of currently identical standby (b), we can introduce twice larger host, then start the replication to the newly larger host, and when it is caught up, break it again. Then break rep again, make modification to 'a" host, making it larger, then replicate b->a. After it is caught up, break the rep again, switch master->standby (if necessary). Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only done (streaming) replication once and I managed to mess that up in a minor way (disabled the wrong service during failover, so data still went to the database I was attempting to replace for, like, 15 minutes). > 2) I am not sure about the time, but it is understood it is required 2 full replication cycles, and might be up to 2 weeks with no standby situation No standby situation? 
Re: adding more space to the existing server
> On 5 Aug 2019, at 17:27, Julie Nishimura wrote: > > Thanks for your reply Alban. Currently we only have A->B replication. Is > adding B->C replication difficult? I remember in the past I tried to seed > pg_basebackup from hot standby, and it was erroring out after awhile, so > needed to switch to run from master. I’ve never used multi-level replication (or cascading replication, as it’s called in the PG docs). I expect that replication slots (w/ streaming replication) would be resilient to streaming to multiple levels, provided that you have the disk space left to keep the logs that your farthest-behind replica is at, but I do not know that. I’m pretty sure that’s being done though and not just theoretically possible. The basic problem you have is a bit like Towers of Hanoi. The details of setting up each replica may be a little complicated, but if you look at it from a high-level design phase, you’re just shuffling around clusters. The details come later, when you’re actually designing how to apply those replicas/clusters. One of the main problems is that creating a new replica takes a lot of time, you want to minimise the total time that takes. Apparently, you have neither C nor D ready yet, so you need to create two replicas - being able to do those in parallel instead of sequentially would save you time. The other problem is that, to be on the safe side, you want to have a full replica of A at any point in time. If you disconnect B from A before either C or D is complete, you run a risk: If A fails, you don’t have the data that accumulated while B was ‘offline’. So that’s not the best scenario. That is why I think your initial set of replicas should look like: A — B — C \ D IIRC, streaming replication uses pg_basebackup to create the initial replica and then it streams what’s needed to catch up. With replication slots, the master knows what the slaves still need, so it won’t clean up too early. Apparently, the slave (B) knows that it needs to retain data for C as well. It looks perfectly safe on paper, except for the replication lag between A — B. You can, according to the docs, cascade replicas from each other and that even allows replication (among the slaves) to go on after the master gets disconnected - quite what you need for the B — C chain, I would think. Take a look at: https://www.postgresql.org/docs/11/warm-standby.html#STREAMING-REPLICATION And note the sections on Replication Slots and on Cascading Replication. The replication lag between A — B can be solved by pausing the clients connecting to A (half of them need to be changed to B anyway) while B is catching up on its lag. You probably have at least that much down-time to change the connections anyway. Regards, Alban. > From: Alban Hertroys > Sent: Monday, August 5, 2019 5:01 AM > To: Julie Nishimura > Cc: Adrian Klaver ; > pgsql-general@lists.postgresql.org ; > pgsql-general > Subject: Re: adding more space to the existing server > > >> On 5 Aug 2019, at 0:39, Julie Nishimura wrote: >> >> Alban, thank you for your reply. Your suggestion makes sense, and I will be >> talking to our engineers about it. Currently we need to understand: >> >> a) How do we break A -> B replication such that both can become >> independent primaries > > That is pretty much like normal failover from A to B, except that you don’t > reverse replication. 
Re: A GROUP BY question
> On 13 Aug 2019, at 13:10, stan wrote: > > select > project.proj_no , Removed columns that get in the way of your desired result. You can’t have both details and the sum over them in a meaningful way. > SUM (rate.rate * task_instance.hours) > from > task_instance > join rate on > rate.employee_key = task_instance.employee_key > AND > rate.work_type_key = task_instance.work_type_key (break) > inner join employee on > rate.employee_key = employee.employee_key > inner join work_type on > rate.work_type_key = work_type.work_type_key These are now probably redundant, you don’t need them unless they filter your results. > inner join project on > project.project_key = task_instance.project_key And this JOIN could be dropped if project_key and proj_no weren’t different fields. If both are unique in project, you could drop one of them and keep the same functionality with fewer joins. That said, in the “war” between surrogate and natural keys I’m on the natural keys side. Clearly, not everyone agrees on that. > GROUP BY > project.project_key , Same columns removed here too. > ORDER BY > project.proj_no > ; That should give you the total cost for each project. You could get the same result repeated per employee and per work type as you tried originally, by putting the above revised query as a subquery and joining that back into the full query in the place of your project-related tables (add the project_key so you have something to join against). The repeated sum risks getting multiplied in the final output though, especially if unaware people will be putting the results in an Excel sheet or something. From experience, that either results in people reporting the wrong financial results (several orders too high) or blaming your query. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
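Putting those edits together, the per-project total would look roughly like this (same tables and columns as the quoted query):

SELECT project.proj_no,
       SUM(rate.rate * task_instance.hours) AS total_cost
FROM task_instance
JOIN rate ON rate.employee_key = task_instance.employee_key
         AND rate.work_type_key = task_instance.work_type_key
JOIN project ON project.project_key = task_instance.project_key
GROUP BY project.project_key, project.proj_no
ORDER BY project.proj_no;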
Re: A GROUP BY question
> On 13 Aug 2019, at 15:19, David G. Johnston > wrote: > > On Tuesday, August 13, 2019, Alban Hertroys wrote: > > > On 13 Aug 2019, at 13:10, stan wrote: > > > > select > > project.proj_no , > > Removed columns that get in the way of your desired result. You can’t have > both details and the sum over them in a meaningful way. > > Sure you can, at least generally, with Window Functions/Expressions (i.e., > OVER) That’s why I added “in a meaningful way” ;) Repeating the same SUM-result on every line in a group is not what I’d call a meaningful result; the SUM has no bearing on the detailed line and leads to the kind of mistakes I already mentioned. (For the record; I do this kind of grouping in a hierarchical database regularly, but there the grouped SUM is at a different level in the hierarchy and I consider it thus sufficiently separated from the detail rows.) Besides, I figured the OP was already struggling with the query syntax, adding window functions into the mix didn’t seem a good idea in the context. Possible?, sure, desirable?, I would say not. Alban Hertroys -- There is always an exception to always.
Re: Question about password character in ECPG's connection string
> 2) "tcp:postgresql://localhost?user=myuser&password=password" looks like > > "tcp:postgresql://localhost?user=myuser&password=my&pwd" > > and password is parsed on the & and you also end up with an extra parameter > pwd Perhaps it helps to URL-encode the & in the password as %26? Alban Hertroys -- There is always an exception to always.
Re: Arrays and ANY problem
> On 25 Sep 2019, at 22:25, David Salisbury wrote: > db=# select name from table_name_ds_tmp where ARRAY[categoryid] = ANY ( > select string_to_array( '200,400', ',')::bigint[] ); > name > -- > (0 rows) You are comparing two arrays for equality. Since the left-hand array has only 1 item and the right-hand one has two, there’s not much equality between them. You probably meant: select name from table_name_ds_tmp where categoryid = ANY ( select string_to_array( '200,400', ',')::bigint[] ); Alban Hertroys -- There is always an exception to always.
Re: Arrays and ANY problem
> On 25 Sep 2019, at 22:50, Alban Hertroys wrote: > > >> On 25 Sep 2019, at 22:25, David Salisbury wrote: >> db=# select name from table_name_ds_tmp where ARRAY[categoryid] = ANY ( >> select string_to_array( '200,400', ',')::bigint[] ); >> name >> -- >> (0 rows) > > You are comparing two arrays for equality. Since the left-hand array has only > 1 item and the right-hand one has two, there’s not much equality between them. > > You probably meant: > select name from table_name_ds_tmp where categoryid = ANY ( select > string_to_array( '200,400', ',')::bigint[] ); Or rather: select name from table_name_ds_tmp where categoryid = ANY ( string_to_array( '200,400', ',')::bigint[] ); Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
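To see the difference in isolation (self-contained examples):

SELECT 200 = ANY (string_to_array('200,400', ',')::bigint[]);
-- true: the element is compared against each array element

SELECT ARRAY[200::bigint] = ANY (SELECT string_to_array('200,400', ',')::bigint[]);
-- false: a one-element array is compared to the whole two-element array, as in the original query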
Re: The connection to the server was lost. Attempting reset: Failed.
> On 10 Oct 2019, at 17:55, Yessica Brinkmann > wrote: > > I really thought a lot, but I don't understand why but the function fails > after the expression is executed: > appendStringInfo (& cols, "% s a.attnum =% d", (i> 0? "OR": ""), idxcd-> > varattno [i]); > The error appears only to me when entering the cycle: > foreach (cell, candidates) / * foreach cell in candidates * / > more than once, that is, when you have more than one candidate index. If the > cycle is entered only once, the function works correctly. > The error that appears to me is that the connection to the PostgreSQL server > is directly lost. I proved that the error occurs in that statement, printing > some values. There is probably an error in the Postgres log-file providing you more info. That said, at least the below bit in your code is dangerous:

foreach( cell, candidates ) /* foreach cell in candidates */
{
    idxcd = (IndexCandidate*)lfirst( cell );

    if( !idxcd->idxused )
        continue;

    if (idxcd!=NULL)
    {

You should at least check for NULL before referencing an attribute of that structure. Personally, I would invert the test like so (and then move it before the idxused test):

if (idxcd == NULL)
{
    elog( INFO, "idxcd IS NULL" );
    continue;    /* Or is that fatal enough to break instead? */
}

if (!idxcd->idxused)
    continue;

Alban Hertroys -- There is always an exception to always.
Re: Finding out about the dates of table modification
> On 23 Nov 2019, at 3:24, Martin Mueller > wrote: > > I've moved happily from MySQL to Postgres but miss one really good feature of > MYSQL: the table of tables that let you use SQL queries to find out metadata > about your table. Thus looking at the table of tables and sorting it by last > change, lets you quickly look at the most recently modified table. Which is > really useful if you have a bad memory, as I do, and can't remember the name > of a particular table that I worked on last Wednesday. > > Are those capabilities hidden somewhere in Postgres? There isn't an obvious > section in the documentation. At least I can't find it. AFAIK, there’s nothing like that built-in, but it’s not impossible to deduce. You could start with getting a list of files in $PG_DATA/base that were modified in that period (provided you have sufficient privileges on that directory): find base/ -mtime -2 -type f -print For figuring out to what tables these files belong [1]: pg_filenode_relation(0, ); and: pg_relation_filepath(); For example, I did: # create table creation_date(test text); [/home/postgres/10/data]$ find base/ -mtime -2 -type f -print base/16403/2608 base/16403/29784 base/16403/2659 base/16403/29789 base/16403/2678 base/16403/29787 base/16403/2662 base/16403/2703 base/16403/2679 base/16403/2673 base/16403/2658 base/16403/1249 base/16403/2610 base/16403/2704 base/16403/2674 base/16403/3455 base/16403/2663 base/16403/1247 base/16403/1259 The lower numbers are probably core tables, such as pg_depend: # SELECT pg_filenode_relation(0, 2608); -- 0 being the default table-space pg_filenode_relation -- pg_depend (1 row) But!: # SELECT pg_filenode_relation(0, 29784); pg_filenode_relation -- creation_date (1 row) And indeed: # select pg_relation_filepath('creation_date'); pg_relation_filepath -- base/16403/29784 (1 row) I was looking for the inverse function pg_filepath_relation(), but that does not appear to exist; That would have been useful in combination with file listings like those from `find`. Mind that larger tables consist of multiple files. I’m sure this would become a head-ache quick on a larger database. Having an actual creation-date of a file would be nice too, but that doesn’t necessarily mean much when growing tables create extra files too. Apparently, someone already turned the process into a number of queries[2]. As they mention though, it’s not 100% reliable though, as there are operations that recreate table files, such as CLUSTER. Then again, if you’re just looking for the table you created last Wednesday, that’s probably not a major concern. Another option is to add a DDL Event trigger on create table statements and log that to some table[3]. Regards, Alban Hertroys [1]: https://www.2ndquadrant.com/en/blog/postgresql-filename-to-table/ [2]: https://stackoverflow.com/questions/18849756/automatically-drop-tables-and-indexes-older-than-90-days/18852752#18852752 [3]: https://www.postgresql.org/docs/current/event-triggers.html -- There is always an exception to always.
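A sketch of that last option [3]: log CREATE TABLE commands with a timestamp into a tracking table (the table and function names are made up):

CREATE TABLE ddl_log (
    created_at      timestamptz NOT NULL DEFAULT now(),
    command_tag     text,
    object_identity text
);

CREATE OR REPLACE FUNCTION log_table_creation() RETURNS event_trigger AS $$
BEGIN
    INSERT INTO ddl_log (command_tag, object_identity)
    SELECT command_tag, object_identity
    FROM pg_event_trigger_ddl_commands()
    WHERE command_tag = 'CREATE TABLE';
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER track_table_creation
    ON ddl_command_end
    WHEN TAG IN ('CREATE TABLE')
    EXECUTE PROCEDURE log_table_creation();

After that, "what did I create last Wednesday" becomes a simple query on ddl_log.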
Re: Range contains element filter not using index of the element column
> On 27 Nov 2019, at 10:32, Lauri Kajan wrote: > > Hi all, > I'm wondering if there are anything to do to utilize a index when doing a > range contains element query. I have tested this with 9.6 and 12.0. > > I have a table with a timestamp column that has a btree index. > I would like to do a query: > SELECT * FROM table WHERE ts <@ tsrange($1, $2, '(]'); > The index is not used and a seq scan is done instead. > > To use the index correctly I have to do the query like this: > SELECT * FROM table WHERE ($1 IS null OR $1 < ts) AND ($2 IS null OR ts <= > $2); > I like the <@ syntax more. Is there something I can do differently? Maybe a > different type of index instead? Does it help to use timestamps -infinity and infinity instead of nulls in your case? => select t, t < current_timestamp, current_timestamp <= t from (values ('-infinity'::timestamp), ('infinity'::timestamp)) x(t); t | ?column? | ?column? -------+--+-- -infinity | t| f infinity | f| t (2 rows) Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
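If that works for your data, the filter can be written against the plain btree index along these lines (a sketch, with $1/$2 and the '(]' bounds as in the original query; the_table stands in for the real table name):

SELECT *
FROM the_table
WHERE ts >  coalesce($1, '-infinity'::timestamp)
  AND ts <= coalesce($2, 'infinity'::timestamp);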
Re: Conditional return of aggregated data
On Mon, 2 Dec 2019 at 12:11, Laura Smith wrote: > > My initial idea was something along the lines of : > select (select sum(statcount) from stats_residence) as > aggstat,statcount,short_name_en from stats_residence where > aggstat>some_number; > One option is to move the aggregate to the where-clause. If you also need the value in your select-list, you can just repeat the subselect there, usually the planner is smart enough to figure out that it can just re-use the result. select short_name_en from stats_residence where (select sum(statcount) from stats_residence) >some_number; -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Re: SQL operator '*='
> On 23 Dec 2019, at 15:33, Matthias Apitz wrote: > > #ifdef DBSPOS > EXEC SQL DECLARE land_cursor CURSOR FOR > SELECT stammprio, lkz, landbez, plkz, postbez, karenz1, > karenz2, > karenz3, land.wkz, webez, we, kurs, land.del > FROM land LEFT JOIN devisen ON land.wkz = devisen.wkz AND land.brgroup = devisen.brgroup > WHERE land.brgroup = :brgroupHost_for_helpland_cursor > ORDER BY stammprio, landbez; > #endif Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: UPDATE many records
> On 6 Jan 2020, at 21:15, Israel Brewster wrote: > >> On Jan 6, 2020, at 10:08 AM, Christopher Browne wrote: >> >> On Mon, 6 Jan 2020 at 13:36, Israel Brewster wrote: >> Thanks to a change in historical data, I have a need to update a large >> number of records (around 50 million). The update itself is straight >> forward, as I can just issue an "UPDATE table_name SET >> changed_field=new_value();" (yes, new_value is the result of a stored >> procedure, if that makes a difference) command via psql, and it should work. >> However, due to the large number of records this command will obviously take >> a while, and if anything goes wrong during the update (one bad value in row >> 45 million, lost connection, etc), all the work that has been done already >> will be lost due to the transactional nature of such commands (unless I am >> missing something). >> >> Given that each row update is completely independent of any other row, I >> have the following questions: >> >> 1) Is there any way to set the command such that each row change is >> committed as it is calculated? >> 2) Is there some way to run this command in parallel in order to better >> utilize multiple processor cores, other than manually breaking the data into >> chunks and running a separate psql/update process for each chunk? Honestly, >> manual parallelizing wouldn’t be too bad (there are a number of logical >> segregations I can apply), I’m just wondering if there is a more automatic >> option. >> >> Yeah, I'd be inclined to do this in batches. I think you’re overcomplicating the matter. I’d just do it as a single update in one transaction. It’s only 50M rows. It may take half an hour or so on decent hardware, depending on how resource-intensive your function is. If that fails[1], only then would I start looking into batching things. But then you still need to figure out why it fails and what to do about that; if it fails it will probably fail fast, and if not, then you’re looking at a one-off situation that won’t require more than a few workarounds - after which you can just run the update again. Ad 1). No harm has been done, it’s a single transaction that rolled back. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Trigger
> On 25 Feb 2020, at 17:53, Adrian Klaver wrote: > > On 2/25/20 12:01 AM, Sonam Sharma wrote: >> I have a trigger, like many other triggers that fire after >> update and checks a field of the OLD set. For some reason this trigger throw >> this error: >> ERROR: record "old" has no field "ivo_sts_cd" CONTEXT: SQL statement > >> if exc_count = 0 then >> UPDATE pps.T8071_CAI_IVO_HDR SET IVO_STS_CD = 1 where >> T616_VBU_NBR=old.T616_VBU_NBR and T617_FNC_TYP_CD=old.T617_FNC_TYP_CD and >> T8071_CAI_IVO_ID=old.T8071_CAI_IVO_ID and T8071_ADD_DM= old. T8071_ADD_DM >> and old.ivo_sts_cd != 10 and old.ivo_sts_cd != 3; > > Realized I went through the above to quickly. I do not see a SET, nor am I > clear what table you are trying to UPDATE. I’m pretty sure that if the OP were to format their query in a more readable and consistent way, they would spot their error pretty quickly. It’s a simple typo. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Advice request : simultaneous function/data updates on many databases
> On 4 Mar 2020, at 23:42, Guyren Howe wrote: > > On Mar 4, 2020, at 14:33 , Rory Campbell-Lange > wrote: >> >> Essentially we wish to reduce the window where the frontend and backend >> aren't synchronised. >> >> If we have (for example) 200 databases which each take 2 seconds to >> update, a client could be on the wrong frontend code for over 6 minutes. >> Send each of the servers a PL/PGSQL method that executes all the things in a >> transaction and then waits until the same clock time to commit. Then all the >> servers are committing at the same moment. They will still be out of synch >> somewhat, but this would reduce the degree. I’m wondering whether this could be done with a more generic event-based approach, where each server sends a ‘done’ event to a central machine once it’s ready to commit, and the central machine returns an ‘acknowledged’ once the last server sent it’s ‘done’ event. The challenge there is that the ‘ack’ needs to be caught and processed within the same waiting transaction… Not sure how to do that right now - maybe through web services, MQTT or similar. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Using JSONB with nested key-value structures, and skipping/wildcarding the key
> On 1 Jun 2020, at 20:18, Shaheed Haque wrote: > > Hi, > > I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB > column called 'snapshot'. In Python terms, each row's 'snapshot' looks like > this: > > == > snapshot = { > 'pay_definition' : { > '1234': {..., 'name': 'foo', ...}, > '99': {..., 'name': 'bar', ...}, > } > == > > I'd like to find all unique values of 'name' in all rows of MyModel. I have > this working using native JSON functions from the ORM like this: > > = > class PayDef(Func): > function='to_jsonb' > > template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')" > > MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef', > flat=True) > = > > So, skipping the ordering/distinct/ORM parts, the core looks like this: > > to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name’) I do something like this to get a set of sub-paths in a JSONB field (no idea how to write that in Django): select snapshot->’pay_definition’->k.value->’name’ from MyModel join lateral jsonb_object_keys(snapshot->’pay_definition’) k(value) on true I don’t know how that compares performance-wise to using jsonb_each, but perhaps worth a try. Obviously, the way it’s written above it doesn’t return distinct values of ’name’ yet, but that’s fairly easy to remedy. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
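With the distinct-values remedy applied, that could look like this (a sketch; "MyModel" stands in for whatever table name Django actually generates, and ->> is used to get text rather than jsonb):

SELECT DISTINCT snapshot->'pay_definition'->k.value->>'name' AS name
FROM "MyModel"
JOIN LATERAL jsonb_object_keys(snapshot->'pay_definition') k(value) ON true;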
Re: Using JSONB with nested key-value structures, and skipping/wildcarding the key
> On 2 Jun 2020, at 9:30, Shaheed Haque wrote: > > >> I do something like this to get a set of sub-paths in a JSONB field (no idea >> how to write that in Django): >> >> select snapshot->’pay_definition’->k.value->’name’ >> from MyModel >> join lateral jsonb_object_keys(snapshot->’pay_definition’) k(value) on true >> > I was unaware of the LATERAL keyword, so thanks. After a bit of Googling > however, it seems that it is tricky/impossible to use from the ORM (barring a > full scale escape to a "raw" query). One question: as a novice here, I think > I understand the right hand side of your JOIN "... k(value)" is shorthand for: > > ... AS table_name(column_name) > > except that I don't see any clues in the docs that jsonb_object_keys() is a > "table function". Can you kindly clarify? Correct. Thomas already explained the return type, but the plural form of the function name is also an indication that it returns multiple results. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: table name
> On 11 Jun 2020, at 20:58, Paul Förster wrote: > > Hi Marc, > >> On 11. Jun, 2020, at 20:54, Marc Millas wrote: >> sorry if my question is tooo simple :-) > > it's not. :-) > >> obviously if I ask: >> select * from regions-20180101; >> I get a syntax error. >> if I try select * from $$regions_20180101$$; >> I get another syntax error. >> If I try to rename that table, same thing. >> if I try a cte, same thing. >> >> What should I do ? > > you can just quote its name: > > select * from "regions-20180101"; > > Cheers, > Paul The background here is that ‘’ and $$ are quoting of literals (strings, integers, JSON objects, etc.), while “” is identifier quoting (tables, indices, types, etc.). Identifier quoting not only allows to include special symbols, it also makes the identifier case-sensitive. That’s probably why Paul suggested to rename the table to no longer require identifier quoting - many people consider it a PITA, but it can be used to get out of trouble like yours - some people insist on it, for example because it makes using camel-caps in identifiers meaningful. Regards, Alban Hertroys -- There is always an exception to always.
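For example, a one-time rename removes the need for quoting altogether:

ALTER TABLE "regions-20180101" RENAME TO regions_20180101;
SELECT * FROM regions_20180101;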
Re: Postgres and alias
> On 28 Aug 2020, at 2:14, Fontana Daniel C (Desartec S.R.L.) > wrote: > > Perfect. > > now let's imagine that '1234567890' is a function f_art_get_price(id_code), > which returns in a string like the following 'XXXZMMM1234567890123/mm/dd' > where 1234567890123 is the price and /mm/dd the date it was last changed > price. > How would you do in this case to obtain these values separately? > without calling the function 2 times avoiding overloading the base? > > something like this > > select art.description, >f_art_get_price_str( art.id ) as ls_price_and_date > SUBSTRING( ls_price_and_date, 7, 13 ) > from articulos; Let's assume art is supposed to be an alias for articulos. Something like this?: select art.description, p.ls_price_and_date, SUBSTRING( p.ls_price_and_date, 7, 13 ) from articulos art cross join lateral f_art_get_price_str( art.id ) p(ls_price_and_date); Alban Hertroys -- There is always an exception to always.
Re: How to properly query lots of rows based on timestamps?
> On 29 Aug 2020, at 10:24, Thorsten Schöning wrote: > > Hi all, > > I have a table containing around 95 million rows, pretty much only > storing a timestamp and further IDs of related tables containing the > actual data in the end. > >> CREATE TABLE clt_rec >> ( >> id BIGSERIAL NOT NULL, >> oms_rec BIGINT NOT NULL, >> captured_at TIMESTAMP(6) WITH TIME ZONE NOT NULL, >> rssiSMALLINT NOT NULL, >> CONSTRAINT pk_clt_rec PRIMARY KEY (id), >> CONSTRAINT fk_clt_rec_oms_rec FOREIGN KEY (oms_rec) REFERENCES "oms_rec" >> ("id"), >> CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec) >> ); > > In many use cases I need to search all of those rows based on their > timestamp to find rows arbitrary in the past: Sometimes it's only 15 > minutes into the past, sometimes it's 2 years, sometimes it's finding > the first day of each month over 15 months for some of those telegrams > etc. In the end, I pretty often need to compare those timestamps and > some queries simply take multiple seconds in the end, especially > adding up if multiple, but slightly different queries need to be > executed one after another. The following are two abstracts of > Postgres' query plans: > > Plan 1: > >> -> Nested Loop (cost=1.14..343169.49 rows=43543 width=20) (actual >> time=0.313..113.974 rows=34266 loops=3) >>-> Parallel Index Scan using idx_clt_rec_captured_at on clt_rec >> (cost=0.57..3437.90 rows=43543 width=24) (actual time=0.153..20.192 >> rows=34266 loops=3) >>Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp >> with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 >> 00:00:00+02'::timestamp with time zone + '1 day'::interval))) >>-> Index Scan using pk_oms_rec on oms_rec (cost=0.57..7.80 rows=1 >> width=12) (actual time=0.002..0.002 rows=1 loops=102799) >>Index Cond: (id = clt_rec.oms_rec) What happens here is that the planner looks up the lower and upper boundaries, everything in between those index nodes is a candidate record. Next, it loops over those to match the other condition of your query (id = clt_rec.oms_rec). You didn’t tell whether there’s an index on that column. You’d probably see a performance improvement were you to create an index on (captured_at, id). If your Postgres version is somewhat recent, that could even lead to an Index Only Scan. > Plan 2: > >> -> Nested Loop (cost=1.14..836381.50 rows=111934 width=20) (actual >> time=0.379..911.697 rows=334465 loops=3) >>-> Parallel Index Scan using idx_clt_rec_captured_at on clt_rec >> (cost=0.57..8819.57 rows=111934 width=24) (actual time=0.193..154.176 >> rows=334465 loops=3) >>Index Cond: ((captured_at >= ('2020-08-28 10:21:06.968+02'::timestamp >> with time zone - '14 days'::interval)) AND (captured_at <= ('2020-08-28 >> 10:21:06.968+02'::timestamp with time zone + '00:00:00'::interval))) >>-> Index Scan using pk_oms_rec on oms_rec (cost=0.57..7.39 rows=1 >> width=12) (actual time=0.002..0.002 rows=1 loops=1003394) >>Index Cond: (id = clt_rec.oms_rec) And this situation is very much the same issue, apart from the larger number of candidate records. > Postgres seems to properly use available indexes, parallel workers and > stuff like that. But looking at the actual times and compared to all > the other parts of the query, comparing those timestamps simply takes > the most time. It only needs to compare 2 timestamps. > I've looked into this topic and found statements about that one > shouldn't put too many rows into the index[1] and stuff like that or > it will be ignored at all. 
But that doesn't seem to be the case for me > according to the plan. OTOH, my index really simply is about the > column containing the timestamp, no function reducing things to dates > or stuff like that to reduce the number of rows. > >> CREATE INDEX idx_clt_rec_captured_at ON clt_rec USING btree ( captured_at ); Try this: CREATE INDEX idx_clt_rec_captured_at ON clt_rec USING btree ( captured_at, id ); Alban Hertroys -- There is always an exception to always.
How to enumerate partitions from a window function?
We are trying to add some information to a query over data from a continuous process. Most of what we want can be done quite nicely using window functions, but I got stuck on this particular problem: The data has records with a timestamp and a few properties that make subsequent rows belong to the same group. Say we have: create table process_data ( timestamp timestamp not null, property_A text not null, property_B text not null, value numeric(12, 3) ); And a query like: select datetime, property_A, property_B , first_value(datetime)::time over run as swap_time --, here I want to enumerate the runs themselves , value from process_data window run as (partition by property_A, property_B order by datetime) ; As stated above, I want to enumerate the runs, starting at 1 and incrementing by 1 every time a partition from the 'run' window closes, which would result in something like this: datetime | property_A | property_B | swap_time | run_nr | value 2020-09-03 15:06 | tea | earl grey | 15:06 | 1 | 0.23 2020-09-03 15:07 | tea | earl grey | 15:06 | 1 | 0.22 2020-09-03 15:08 | tea | ceylon | 15:08 | 2 | 0.34 2020-09-03 15:09 | coffee | cappucino | 15:09 | 3 | 0.45 2020-09-03 15:10 | coffee | cappucino | 15:09 | 3 | 0.43 etc. Is there a way to achieve this through window functions, or do we need to wrap the thing in a subquery to achieve this? Regards, Alban Hertroys -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Re: How to enumerate partitions from a window function?
On Thu, 3 Sep 2020 at 16:01, Tom Lane wrote: > Alban Hertroys writes: > > As stated above, I want to enumerate the runs, starting at 1 and > > incrementing by 1 every time a partition from the 'run' window closes, > > Is there a way to achieve this through window functions, or do we need to > > wrap the thing in a subquery to achieve this? > > I think this'll work: > > select datetime, property_A, property_B > , first_value(datetime)::time over run as swap_time > , dense_rank() over (order by property_A, property_B) > , value > from process_data > window run as (partition by property_A, property_B order by datetime) > ; > > You can't do it with a window function over the "run" window because > no window function ever looks outside the current partition. But > that's easy to fix by using a different window definition. The > planner is smart enough to see that these windows are compatible > and only need one sort to be performed. > > regards, tom lane > Thanks Tom, That gets us close, but it ignores the order of the runs over time. I think it also reassigns the same number to later runs at the same 'day' that happen to have the same values for property_A and _B. That's some crucial information that I forgot to include. To expand on my original example: datetime | property_A | property_B | swap_time | run_nr | value 2020-09-03 15:06 | tea | earl grey | 15:06 | 1 | 0.23 2020-09-03 15:07 | tea | earl grey | 15:06 | 1 | 0.22 2020-09-03 15:08 | tea | ceylon | 15:08 | 2 | 0.34 2020-09-03 15:09 | coffee | cappucino | 15:09 | 3 | 0.45 2020-09-03 15:10 | coffee | cappucino | 15:09 | 3 | 0.43 2020-09-03 15:11 | tea | earl grey | 15:11 | 4 | 0.23 etc. Where the last row has the same characteristic properties as the first 2 rows (from run 1), but is in run 4 due to it having started after run 3. The runs normally start at 1 hour before midnight, with run 1, and continue 24h from there (it's a shifted day-schedule). The above example starting at 15:06 is unlikely to occur in reality, although possible (with long downtime). That's mostly to clarify how the run numbers should function, it would require to at least partition run_nr by a date shifted 1 hour back, as long as they number their runs correctly along the time axis. Regards, Alban. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Re: How to enumerate partitions from a window function?
On Thu, 3 Sep 2020 at 20:59, Michael Lewis wrote: > It seems like you are maybe wanting this- If the previous row is the same, > then get the previous row's run_nr. If it is different, then increment. > > case when lag( property_A ) over() = property_A and lag( property_B ) > over() = property_B then coalesce( lag( run_nr ) over(), 1 ) else lag( > run_nr ) over() + 1 end > > Perhaps there is a much simpler implementation though. > That would work were it not that the very column we're defining is the one to be aliased run_nr. The data does not contain that information, it's what I'm trying to enrich it with and what I'm having trouble wrapping my head around. Your query (adopted a tiny bit) unfortunately results in: select datetime, property_A, property_B , first_value(datetime::time) over run_win as swap_time , case when lag(property_A) over time_win = property_A and lag(property_B) over time_win = property_B then coalesce(lag(run_nr) over time_win, 1) else lag(run_nr) over time_win +1 end , value from process_data window time_win as (order by datetime) , run_win as (partition by property_A, property_B order by datetime) order by datetime ; ERROR: column "run_nr" does not exist LINE 6: then coalesce(lag(run_nr) over time_win, 1) ^ SQL state: 42703 Character: 221 I turned my example into a proper test-case (better late than never): CREATE TABLE process_data ( datetime timestamp without time zone NOT NULL, property_a text NOT NULL, property_b text NOT NULL, value numeric(12,3) ); COPY process_data (datetime, property_a, property_b, value) FROM stdin; 2020-09-03 15:06:00 tea earl grey 0.230 2020-09-03 15:07:00 tea earl grey 0.220 2020-09-03 15:08:00 tea ceylon 0.340 2020-09-03 15:09:00 coffee cappucino 0.450 2020-09-03 15:10:00 coffee cappucino 0.430 2020-09-03 15:11:00 tea earl grey 0.230 \. With the desired result (note that swap_time and run_nr are calculated columns): datetime | property_a | property_b | swap_time | run_nr | value -+++---++--- 2020-09-03 15:06:00 | tea| earl grey | 15:06:00 | 1 | 0.230 2020-09-03 15:07:00 | tea| earl grey | 15:06:00 | 1 | 0.220 2020-09-03 15:08:00 | tea| ceylon | 15:08:00 | 2 | 0.340 2020-09-03 15:09:00 | coffee | cappucino | 15:09:00 | 3 | 0.450 2020-09-03 15:10:00 | coffee | cappucino | 15:09:00 | 3 | 0.430 2020-09-03 15:11:00 | tea| earl grey | 15:06:00 | 4 | 0.230 (6 rows) I've been looking around on the Internet in the meantime, and it seems people either solve this with a recursive CTE (referencing the previous row by row_number() over (...)) or by writing a set-returning function that walks over the data in datetime order using a cursor. Since the actual query is growing more and more state-tracking flags, using a function has the added benefit that referencing state columns from the previous row gets a lot easier (lots of repeated window functions otherwise). It would become a procedural solution instead of a set-based one, but considering that this data is order-sensitive (on datetime), that's probably what a set-based solution would also end up doing anyway. Regards, Alban. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
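For completeness, a set-based sketch that produces run_nr against the test case above, using the usual trick of flagging the rows where the properties change and taking a running sum of those flags:

SELECT datetime, property_a, property_b, value,
       sum(is_new_run) OVER (ORDER BY datetime) AS run_nr
FROM (
    SELECT *,
           CASE WHEN lag(property_a) OVER w IS DISTINCT FROM property_a
                  OR lag(property_b) OVER w IS DISTINCT FROM property_b
                THEN 1 ELSE 0 END AS is_new_run
    FROM process_data
    WINDOW w AS (ORDER BY datetime)
) flagged
ORDER BY datetime;

swap_time then follows as min(datetime) OVER (PARTITION BY run_nr) in one more query level, and the running sum can be partitioned by the shifted day if the numbering should restart every day.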
FATAL: terminating connection due to administrator command
h exit code 1 2020-09-30 22:27:56.459 CEST [6482] LOG: background worker "parallel worker" (PID 30658) exited with exit code 1 2020-09-30 22:27:56.459 CEST [6482] LOG: background worker "parallel worker" (PID 30659) exited with exit code 1 2020-09-30 22:43:08.459 CEST [8055] 172.30.2.25 selfservice_prd ERROR: schema "somethingelse" does not exist at character 71 Apparently, something is sending SIGTERM to our pg processes. I know that I'm not doing that, certainly not at those hours, and I'm the one who set up this system and am the only DBA of it. Advice I found on the Internet is to use systemtap with some tap-script, but the scripts that I found just displayed the PID's of processes without telling me their names, which I didn't find all that useful in figuring out who was responsible, so I made an attempt (I have no experience with stap) at modifying it to print process names of signal sender and target: /* * killsnoop-nd.stp Trace process signals. * For Linux, uses SystemTap (non-debuginfo). * * Copyright (C) 2015 Brendan Gregg. (etc) */ global target; global signal; probe begin { printf("%-6s %-12s %-5s %-6s %6s\n", "FROM", "COMMAND", "SIG", "TO", "COMMAND"); } probe nd_syscall.kill { target[tid()] = uint_arg(1); signal[tid()] = uint_arg(2); } probe nd_syscall.kill.return { if (signal[tid()] == 15 && target[tid()] != 0) { printf("%-6d %-12s %-5d %-6d %12s\n" , pid(), execname() , signal[tid()] , target[tid()], pid2execname(target[tid()])); } delete target[tid()]; delete signal[tid()]; } The output of last night was: FROM COMMAND SIG TO COMMAND 30068 systemd-udevd 1514151 systemd-udevd 30068 systemd-udevd 1514836 systemd-udevd (...) 6482 postmaster 1530649postmaster 6482 postmaster 1530648postmaster 6482 postmaster 1530647postmaster 6482 postmaster 1530646postmaster 6482 postmaster 1530645postmaster 6482 postmaster 1530659postmaster 6482 postmaster 1530658postmaster 6482 postmaster 1530657postmaster 6482 postmaster 15 30656postmaster 6482 postmaster 1530655postmaster 6482 postmaster 152065 postmaster 6482 postmaster 15 2064 postmaster 6482 postmaster 152063 postmaster Several of these TO-pid's match those in the PG log. $ ps aux | grep 6482 postgres 6482 0.0 0.5 51755776 1043872 ?Ss Sep28 3:25 /usr/pgsql-11/bin/postmaster -D /data/11/data Am I correct in concluding that postmaster is killing its own processes? If so, what is going on here? And more importantly, what do I do about it? Regards, Alban Hertroys. P.S. I'm mailing from my work account, as this is work related. You may remember me from my private account. Alban Hertroys D: +31 (0)53 4 888 888 | T: +31 (0)53 4888 888 | E: alban.hertr...@apollovredestein.com Apollo Vredestein B.V.| Ir. E.L.C. Schiffstraat 370, 7547 RD Enschede, The Netherlands Chamber of Commerce number: 34223268 The information contained in this e-mail is intended solely for the use of the individual or entity to whom it is addressed. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or action in relation to the contents of this information is strictly prohibited and may be unlawful and request you to delete this message and any attachments and advise the sender by return e-mail. The confidentiality of this message is not warranted. Apollo Vredestein and its subsidiaries rule out any and every liability resulting from this or any other electronic transmission Please consider the environment before printing this e-mail
Betr: Re: FATAL: terminating connection due to administrator command
"Srinivasa T N" wrote on 01/10/2020 11:47:33: > On Thu, Oct 1, 2020 at 2:47 PM Alban Hertroys < > alban.hertr...@apollovredestein.com> wrote: > Hi all, > > We're seeing the FATAL error message from the subject pop up in our > logs at regular intervals, but I haven't been able to pinpoint what > is causing it. I'm hoping for some insights here. > > We run a PostgreSQL 11.9 server on CentOS 7, within a vmware environment: > PostgreSQL 11.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 > 20150623 (Red Hat 4.8.5-39), 64-bit > > The package was installed from the PGDG repository. > > I'm not even sure I should be worried, there doesn't appear to be > any impact on the servers' functioning, but it does say 'FATAL'. > What we're seeing are lines like these two instances: > > 2020-09-30 22:27:56.446 CEST [30659] STATEMENT: select count(*) > from "dm_b2b"."prlwytzkofskiv1" > 2020-09-30 22:27:56.446 CEST [30658] FATAL: terminating > connection due to administrator command > 2020-09-30 22:27:56.446 CEST [30658] STATEMENT: select count(*) > from "dm_b2b"."prlwytzkofskiv1" > 2020-09-30 22:27:56.446 CEST [30657] FATAL: terminating > connection due to administrator command > 2020-09-30 22:27:56.446 CEST [30657] STATEMENT: select count(*) > from "dm_b2b"."prlwytzkofskiv1" > 2020-09-30 22:27:56.446 CEST [30656] FATAL: terminating > connection due to administrator command > 2020-09-30 22:27:56.446 CEST [30656] STATEMENT: select count(*) > from "dm_b2b"."prlwytzkofskiv1" > 2020-09-30 22:27:56.446 CEST [30655] FATAL: terminating > connection due to administrator command > 2020-09-30 22:27:56.446 CEST [30655] STATEMENT: select count(*) > from "dm_b2b"."prlwytzkofskiv1" > 2020-09-30 22:27:56.459 CEST [6482] LOG: background worker > "parallel worker" (PID 30655) exited with exit code 1 > 2020-09-30 22:27:56.459 CEST [6482] LOG: background worker > "parallel worker" (PID 30656) exited with exit code 1 > 2020-09-30 22:27:56.459 CEST [6482] LOG: background worker > "parallel worker" (PID 30657) exited with exit code 1 > 2020-09-30 22:27:56.459 CEST [6482] LOG: background worker > "parallel worker" (PID 30658) exited with exit code 1 > 2020-09-30 22:27:56.459 CEST [6482] LOG: background worker > "parallel worker" (PID 30659) exited with exit code 1 > 2020-09-30 22:43:08.459 CEST [8055] 172.30.2.25 selfservice_prd > ERROR: schema "somethingelse" does not exist at character 71 > I am guessing that 6 background workers are started, 1 worker had > the result and hence killing the other 5 workers. Maybe, some more > pg experts can comment. Anyway, explain of your query helps. I think you may have the right of it: QUERY PLAN -- Finalize Aggregate (cost=3065970.74..3065970.75 rows=1 width=8) -> Gather (cost=3065970.21..3065970.72 rows=5 width=8) Workers Planned: 5 -> Partial Aggregate (cost=3064970.21..3064970.22 rows=1 width=8) -> Nested Loop Left Join (cost=2772.30..2743631.23 rows=128535594 width=0) Join Filter: ((avl.xx)::text <> ''::text) -> Parallel Hash Left Join (cost=2772.01..943286.00 rows=5574292 width=13) Hash Cond: (avl.x = (dc.x)::integer) -> Parallel Seq Scan on avl (cost=0.00..596772.71 rows=5574171 width=21) -> Parallel Hash (cost=2262.01..2262.01 rows=40800 width=8) -> Parallel Index Only Scan using on dc (cost=0.42..2 -> Index Scan using ix_xxxx on xxx dm (cost=0.29..0.31 rows=1 width=19) Index Cond: ((avl.xx)::text = (xx)::text) Filter: ((xx)::text <> ''::text) (14 rows) So, apparently these FATAL errors are just caused by parallel workers being aborted because they're no longer needed. 
Good to know. Regards, Alban.
Re: How to update a table with the result of deleting rows in another table
> On 6 Oct 2020, at 7:37, Hemil Ruparel wrote: > > I am trying to delete orders for a given customer on a given date and add the > cost of those orders to credit for the customer. > > So far, I came up with this: > ``` > with data as ( > delete from orders > where customer_id = > and date = '2020-10-05' returning price > ), total as ( > select sum(price) from data > ) > update paymentdetail > set temp_credit = temp_credit + (select * from total) > where customer_id = > ``` > > which works. but is there a better way to update one table using the result > of deleting rows from another table given that I only want the aggregate of > the result? Adding the customer id to your returning clause and using update..from could help: with data as ( delete from orders where customer_id = returning customer_id, price ), total as ( select customer_id, sum(price) as total_price from data group by customer_id ) update paymentdetail set temp_credit = temp_credit + total.total_price from total where customer_id = total.customer_id You could also do this using subqueries instead of CTE’s, that may perform better as CTE’s act as optimisation fences. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Check constraints do not seem to be working!!!
On 11 Nov 2020, at 11:15, Jitendra Loyal wrote: > > > Thanks Nikolay > > I read that but is there a way to meet the above requirement. And I will like > to add that IS NULL and IS NOT NULL should evaluate to true/false. These > operators are made for this and should not be returning NULL. That is exactly what they do. Your problem is with the equality operator and its behaviour with NULL values, which is described in the referenced document. -- If you can't see the forest for the trees, Cut the trees and you'll find there is no forest. >> On Wed 11 Nov, 2020, 14:18 Nikolay Samokhvalov, >> wrote: >>> On Wed, Nov 11, 2020 at 12:26 AM Jitendra Loyal >>> wrote: >>> Despite the above two constraints, the following rows get into the table: >>> insert into t (b , c) values (null, true), (null, false); >> >> This behavior is described in the docs >> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS: >> >> > It should be noted that a check constraint is satisfied if the check >> > expression evaluates to true or the null value. Since most expressions >> > will evaluate to the null value if any operand is null, they will not >> > prevent null values in the constrained columns. To ensure that a column >> > does not contain null values, the not-null constraint described in the >> > next section can be used.
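The crux, in isolation (a generic illustration, since the original constraint definitions aren't shown here):

SELECT NULL = true;  -- yields NULL, not false

A CHECK constraint whose expression evaluates to NULL accepts the row, which is how those NULLs in b slip through; declaring the column NOT NULL, as the quoted documentation advises, is what actually rejects them.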
Re: Is it possible to write a generic UPSERT?
> On 12 Nov 2020, at 14:58, Mario Emmenlauer wrote: (…) > But the statement is slightly complex to type, and I find me and my > colleagues often spend more time on this than I would hope. Our two > main challenges are: > (1) we have to look up the uniqueness constraints on the table, and > (2) we have to duplicate the insert statement in the UPDATE section >again, because virtually all fields should get overwritten >(except for the conflicting ones). On long inserts this can be >quite annoying and error-prone. > > I can see how "ON CONFLICT" is very powerful. But that power seems > often a burden for us. We would prefer something that is less manual > effort for the specific use case. Basically, we would like: >INSERT if not exist, and >UPDATE _all_ non-conflicting fields in case of _any_ conflict > > In my (naiive) thinking, such a construct would cover 99% of our > use cases. Or did other people make very different experiences? (…) > Has anybody ever done something like this? Is there an SQL way to > achieve this? Or another programmatic way? We generate the SQL @work based on the definitions in, IIRC, the information_schema. It has tables for both the column lists per table and the primary key definitions. With that, an SQL statement that returns the required SQL statement is easy to generate, after which you can execute it either from a plpgsql execute statement in a function or in a do-block. We do this in plpgsql, but that’s mostly because this code is part of our ETL process and it has to perform some other logic on the same data anyway. I could look up our code for you tomorrow, but that looks to be a busy day, so I can’t promise. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
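A minimal sketch of that approach (not our actual code): build the statement text from the information_schema, assuming the primary key is the conflict target and using public.mytable as a stand-in for the real table:

SELECT format(
         'INSERT INTO %I.%I (%s) VALUES (%s) ON CONFLICT (%s) DO UPDATE SET %s',
         c.table_schema,
         c.table_name,
         string_agg(quote_ident(c.column_name), ', ' ORDER BY c.ordinal_position),
         string_agg('$' || c.ordinal_position, ', ' ORDER BY c.ordinal_position),
         (SELECT string_agg(quote_ident(kcu.column_name), ', ' ORDER BY kcu.ordinal_position)
          FROM information_schema.table_constraints tc
          JOIN information_schema.key_column_usage kcu
            ON kcu.constraint_name = tc.constraint_name
           AND kcu.constraint_schema = tc.constraint_schema
          WHERE tc.table_schema = c.table_schema
            AND tc.table_name = c.table_name
            AND tc.constraint_type = 'PRIMARY KEY'),
         string_agg(format('%I = EXCLUDED.%I', c.column_name, c.column_name), ', ' ORDER BY c.ordinal_position)
       ) AS upsert_sql
FROM information_schema.columns c
WHERE c.table_schema = 'public'
  AND c.table_name = 'mytable'
GROUP BY c.table_schema, c.table_name;

In practice you would leave the key columns out of the SET list and feed the generated text to EXECUTE ... USING from a plpgsql function, but the shape of the statement is the point here.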
Re: Accessing Postgres Server and database from other Machine
> On 5 Dec 2020, at 10:05, Paul Förster wrote: > > Hi Hemil, > >> On 05. Dec, 2020, at 07:50, Hemil Ruparel wrote: >> >> Did you restart postgres after changing pg_hba.conf? > > that shouldn't be necessary for changes in pg_hba.conf. Just do either on the > command line: > > $ pg_ctl reload While you’re in there, also verify that something is listening on the port (see below) $ netstat -an > or from psql: > > postgres=# select pg_reload_conf(); > > You can then see the effective result immediately in pg_hab_file_rules: > > postgres=# table pg_hba_file_rules; Also: postgres=# show listen_addresses; postgres=# show port; Those will tell you whether the server is listening on the network and on the expected port. Alban Hertroys -- There is always an exception to always.
Re: SQL group by help
> On 11 Dec 2020, at 18:24, Chris Stephens wrote: > > I'm trying to create a visual representation of a 6x8 grid of samples on a > rack using the following SQL format: > > with rack_display as ( > select sr.ts rack_ts > , sr.rack_id > , r.rack_barcode > , 1 as row_pos > , max(case when rack_well = 0 then 'A1: '||sample_barcode end) as col1 > , max(case when rack_well = 1 then 'A2: '||sample_barcode end) as col2 > , max(case when rack_well = 2 then 'A3: '||sample_barcode end) as col3 > , max(case when rack_well = 3 then 'A4: '||sample_barcode end) as col4 > , max(case when rack_well = 4 then 'A5: '||sample_barcode end) as col5 > , max(case when rack_well = 5 then 'A6: '||sample_barcode end) as col6 > from rack r > , sample_rack sr > , sample s > where r.rack_id = sr.rack_id >and sr.sample_id = s.sample_id >and sr.rack_well < 6 > group by sr.ts, sr.rack_id, r.rack_barcode, row_pos > union all > select sr.ts rack_ts > , sr.rack_id > , r.rack_barcode > , 2 as row_pos > , max(case when rack_well = 6 then 'B1: '||sample_barcode end) as col1 > , max(case when rack_well = 7 then 'B2: '||sample_barcode end) as col2 > , max(case when rack_well = 8 then 'B3: '||sample_barcode end) as col3 > , max(case when rack_well = 9 then 'B4: '||sample_barcode end) as col4 > , max(case when rack_well = 10 then 'B5: '||sample_barcode end) as > col5 > , max(case when rack_well = 11 then 'B6: '||sample_barcode end) as > col6 > from rack r > , sample_rack sr > , sample s > where r.rack_id = sr.rack_id >and sr.sample_id = s.sample_id >and sr.rack_well >= 6 >and sr.rack_well < 12 > group by sr.ts, sr.rack_id, r.rack_barcode, row_pos > union all > ... > ) > select * from rack_display order by rack_ts, rack_id, row_pos; > > the "union all"s continue for another 6 blocks. reports would filter on > rack_id and timestamp. Is time really what groups these batches? I would double-check whether you may be omitting to store some data relevant to this process. > if timestamps for each load of a rack were guaranteed to be the same, this > would work. however, the "sr.ts" values may vary by a few seconds so there is > potential for the "group by" to break. ts differences will be a minimum of 5 > minutes for each distinct load of a rack. > > what i think i need is to manufacture a group by column based off rows in > "sample_rack" that have "ts" values that are < 1 minute from each other and > rack_id is the same. i'm coming up blank on how to accomplish that though. > my first thought was to create an interval of +/- 1 min then find all rows > that overlap and assign a group number but i'm not sure how to accomplish > that. You could date_trunc those timestamps to the minute and group on that. > there's also no guarantee an entire rack is full of samples so some "cells" > of display might be null. i think that makes the use of tablefunc crosstab a > little harder. if i remember correctly, it does not handle missing values > well. i'm open to any pivoting strategy. Many reporting tools have features to support just that. We use WebFOCUS, which calls those ACROSS columns. It’s a common requirement in reporting. Alban Hertroys -- There is always an exception to always.
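A sketch of the date_trunc() idea applied to the first block of your query; note that it groups per calendar minute, so a load whose rows happen to straddle a minute boundary would still split:

select date_trunc('minute', sr.ts) as rack_ts
     , sr.rack_id
     , r.rack_barcode
     , 1 as row_pos
     , max(case when rack_well = 0 then 'A1: '||sample_barcode end) as col1
     -- ... col2 through col6 as in your query
from rack r
join sample_rack sr on sr.rack_id = r.rack_id
join sample s on s.sample_id = sr.sample_id
where sr.rack_well < 6
group by date_trunc('minute', sr.ts), sr.rack_id, r.rack_barcode, row_pos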
Re: Trigger with conditional predicates
> On 1 Jan 2021, at 16:56, Dirk Mika wrote: > > Hi all and a happy new Year! > > We have an Oracle schema that is to be converted to PostgreSQL, where > conditional predicates are used in some triggers. > > In particular, columns are populated with values if they are not specified in > the update statement which is used. > Usually with an expression like this: > > IF NOT UPDATING('IS_CANCELED') > THEN > :new.is_canceled := ...; > END IF; > > I have not found anything similar in PostgreSQL. What is the common approach > to this problem? > > BR > Dirk Can't you use column defaults to handle these cases? Alban Hertroys -- There is always an exception to always.
Re: Trigger with conditional predicates
> On 4 Jan 2021, at 20:02, Dirk Mika wrote: > >>> On 1 Jan 2021, at 16:56, Dirk Mika wrote: >>> >>> Hi all and a happy new Year! >>> >>> We have an Oracle schema that is to be converted to PostgreSQL, where >>> conditional predicates are used in some triggers. >>> >>> In particular, columns are populated with values if they are not specified >>> in the update statement which is used. >>> Usually with an expression like this: >>> >>> IF NOT UPDATING('IS_CANCELED') >>> THEN >>>:new.is_canceled := ...; >>> END IF; >>> >>> I have not found anything similar in PostgreSQL. What is the common >>> approach to this problem? > >> Can't you use column defaults to handle these cases? > > That would work for inserts, but not for updates. Usually, if you don’t mention a column in an UPDATE, you want the value to remain as it was, which is precisely what happens by default. That certainly makes sense to me when you’re dealing with an application that doesn’t know about the existence of said column; overwriting an existing value that some other application put there looks like a problem to me. But of course, that depends a lot on what you’re trying to achieve. What is your use-case that that is not the desired behaviour? Or are we talking about a mixed problem here, where this approach works for some fields, but other fields (such as a status change date) always need to be updated (regardless of whether a value was specified)? Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Define hash partition for certain column values
> On 12 Jan 2021, at 16:51, Голубева Яна wrote: > > Values for the key partitioning column are generated randomly and I can't > predict their distribution between ranges. > If I just create some ranges I won't have any guarantee that partitions will > have similar amount of data. It is possible that I will have 2 or 3 extremely > big partitions and a bit of data in others. A hash of a random number is also random, so when using hashes for partitioning you will get the same problem. If you want to distribute values equally over a fixed number of partitions, I suggest you partition on a modulo of a monotonously increasing number (a sequence for example), instead of relying on a random number. > 12.01.2021, 17:55, "Michael Lewis" : > On Tue, Jan 12, 2021 at 1:21 AM Голубева Яна wrote: > List or range partitioning isn't suitable for my case. > I am using a column of numeric(20) type as a base for partitioning. The > values of the column are generated randomly. > So there will be too many partitions if I use list partitioning as is. > > Sorry, but why is range not suited for this? It would seem fairly trivial to > create 50 or 1000 partitions to break up the range of values allowed by your > field definition. Alban Hertroys -- There is always an exception to always.
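A minimal sketch of that suggestion, with made-up names; the partition key is the modulo of a sequence-backed column rather than the random business value:

create sequence events_id_seq;

create table events (
    id      bigint not null default nextval('events_id_seq'),
    payload text
) partition by list ((id % 4));

create table events_p0 partition of events for values in (0);
create table events_p1 partition of events for values in (1);
create table events_p2 partition of events for values in (2);
create table events_p3 partition of events for values in (3);

Each partition then receives roughly a quarter of the rows, regardless of how the other column values are distributed.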
Re: How to keep format of views source code as entered?
> On 12 Jan 2021, at 20:54, Alex Williams wrote: > > Hi Ingolf, > > For comments in views, I create a unused CTE and put my comments there, e.g. > > WITH v_comments AS ( > SELECT 'this is my comment' AS comment > ) > > Alex You do know about COMMENT ON VIEW v_comments IS ’this is my comment’, right? > ‐‐‐ Original Message ‐‐‐ > On Thursday, January 7, 2021 11:19 AM, Markhof, Ingolf > wrote: > >> Hi! >> >> >> >> Switching from Oracle SLQ to PostgreSQL I am facing the issue that the SQL >> code the system returns when I open a views source code is different from >> the code I entered. The code is formatted differently, comments are gone and >> e.g. all text constants got an explicit cast to ::text added. (see sample >> below). >> >> >> >> I want the SLQ code of my views stored as I entered it. Is there any way to >> achieve this? Or will I be forced to maintain my views SQL code outside of >> PostgreSQL views? >> >> >> >> Any hints welcome! >> >> >> >> Here is an example: >> >> >> >> I enter this code to define a simple view: >> >> >> >> create or replace view myview as >> >> select >> >> product_id, >> >> product_acronym >> >> from >> >> products -- my comment here >> >> where >> >> product_acronym = 'ABC' >> >> ; >> >> >> >> However, when I open the view my SQL client (DBeaver) again, this is what I >> get: >> >> >> >> CREATE OR REPLACE VIEW myview >> >> AS SELECT product_id, >> >> product_acronym >> >>FROM products >> >> WHERE product_acronym = 'ABC'::text; >> >> >> >> So, the formatting changed, keywords are capitalized, the comment I added in >> the from-part has gone and the text constant 'ABC' changed to 'ABC'::text. >> >> >> >> >> >> >> >> >> >> >> Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - >> Amtsgericht Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - >> Vorsitzender des Aufsichtsrats: Francesco de Maio >> > Alban Hertroys -- There is always an exception to always.
Re: Puzzling CROSS JOIN when doing RECURSIVE CTE
> On 18 Apr 2022, at 11:56, Pól Ua Laoínecháin wrote: (…) > All of the code below is available on the fiddle here: > > https://dbfiddle.uk/?rdbms=postgres_13&fiddle=0cc20c9081867131260e6e3550bd08ab (…) > OK, grand, now I wish to perform a RECURSIVE CTE on it. So, I start by > trying something (I thought was) very simple. Obviously, I plan to do > more, but I wanted to get the "mechanics" correct to start with. So, > my query is: > > WITH RECURSIVE cte1 (n, ln) AS > ( > SELECT 1 AS n, string > FROM line Here is your first problem, this will yield a result for each row in your line table, numbering it ‘1’. You seem to have expected just a single result here, but that is something that you need to take care of in your query. This part is called the base case, base step or initial step. > UNION ALL > SELECT n + 1, ln > FROM cte1 > WHERE n < (SELECT COUNT(*) FROM line) And then for each of those rows, it will add all those rows (from the same CTE!) again. This part is called the recursive step. You did add a termination condition here, which indeed manages to terminate, but it does so too late. It seems that you do understand some of the concepts of recursive CTE’s, but you appear to be missing some crucial knowledge. For example, it is actually possible to query multiple trees with a single recursive CTE. It is not limited to a single tree. How many trees the CTE will navigate depends on how you selected the rows in the base case. > ) > SELECT * FROM cte1; > > i.e. have a counter variable and a string from the line table My first question is why you’re using a recursive CTE here? This doesn’t appear to be hierarchical data (such as a tree), unless perhaps you intended to actually traverse the HTML document hierarchy? > > But, then to my horror, the result of this query is > > 1with t(x) as (values( XMLPARSE(DOCUMENT > (' AlarmCode="mail" AlarmStartTime="10:00:00" AlarmTime="0" Id ="2" >> Id="2">
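For contrast, a sketch of a base case that starts from a single row and then walks the table row by row; it assumes the line table has an id column to order by, which your fiddle may name differently:

with recursive cte1 (n, ln, id) as (
    select 1, string, id
    from line
    where id = (select min(id) from line)          -- base case: one starting row
  union all
    select c.n + 1, l.string, l.id                 -- recursive step: only the next row
    from cte1 c
    join line l on l.id = (select min(id) from line where id > c.id)
)
select n, ln from cte1;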
Re: Puzzling CROSS JOIN when doing RECURSIVE CTE
> On 18 Apr 2022, at 14:51, Pól Ua Laoínecháin wrote: > > Hi Alban, and many thanks for your input. > >> My first question is why you’re using a recursive CTE here? This doesn’t >> appear to be hierarchical data (such as a tree), unless perhaps you intended >> to actually traverse the HTML document hierarchy? > > This is basically an exercise on my part. > > The question that I'm trying to answer is here: > > https://stackoverflow.com/questions/70574881/how-can-get-html-inner-tag-in-posgresql > > I've already answered it in 3 different ways - but I was trying to do > it with RCTEs in order to improve my comprehension of them. > > So, basically, I want to pick out a subsection of text from a "passage". > > So then, I wanted to establish a true/false state for the lines that I > want and don't want, going through line by line. I know that the RCTE > is a very contrived way of doing this, but it's for learning really. Considering that you’re already looking at the elements of a parsed DOM tree, the exercise boils down to traversing that tree. Due to how xmlparse() is implemented, you probably already get them in the right order even when not using an explicit order by. That is, if you’re looking for a DFT (depth first traversal) as opposed to a BFT (breadth first). One of the difficulties here is that there are some CDATA sections involved with more XML in them. My guess is that that’s the data that you’re actually after, but that’s just a matter of entering the document with the correct path I suppose? > I wonder if you could be so kind as to give me a "skeleton" RCTE for > this - I've been staring at this for hours - and it's not that I'm > lazy or haven't studied RCTEs - I wrote this RCTE > > https://stackoverflow.com/a/71674990/470530 > > recently, so it's not as if I'm completely ignorant of RCTEs - I'm > just stuck in a rut. Any help would be appreciated. You would first need to determine the root node(s). Those are the ones w/o parents, or you may have some other way of determining those. Next is finding all nodes that have an earlier node as their parent. You could go an extra step here with preserving the order of the siblings in the document, by numbering nodes (directly) under the same parent. I usually build an ltree structure with that information, while traversing the tree - that gets you an ltree with entries (1, 1.1, 1.1.1, 1.1.2, 1.2.1, etc) that you then can use for the final order by, for example. In case you didn’t know, ltree is a module you can install. I find it still very useful in tree traversals. The one drawback I see is that for these scenario’s you’d ideally want an ltree based on integers, such that 10 sorts after 9 instead of between 1 and 2. Padding enough zeroes before the ltree text items is a bit of an extra hassle that I’d prefer to do without. I haven’t actually looked at what DOM navigation functions exist for PG, so this is more or less pseudo code. Worse, my local copy of PG was compiled w/o XML support, so I don’t know what kind of result the query from that SO article produces. But then again, I don’t really know what you’re after anyway, so... This is basically how I would go about it. 
with recursive -- First we need to get the DOM-tree parsed (this is not actually recursive) domtree as ( select node from xmlparse(document(‘...')) ), -- Next we can traverse it cte (node, hierarchy, n) as ( select node, 1::text::ltree, 1 from domtree where parent(node) is null union all select node, cte.hierarchy || (cte.n+1)::text::ltree, n+1 from domtree t join cte on parent(t.node) = cte.node ) select * from cte order by hierarchy; Function parent() is made-up. It would return the parent node of a node, so that there is some way to connect the different parts in the hierarchy. I guess xpath() could fulfil that purpose, but I have no way of testing that hypothesis. I hope that’s a good enough starting point for you? Alban Hertroys -- There is always an exception to always.
Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?
> On 18 Jun 2022, at 2:14, Bryn Llewellyn wrote: > > I implemented two complementary functions: > > —"no_null_keys()" checks that a "jsonb" value has no occurrences of « "some > key": null » > > —"strip_null_keys()" removes « "some key": null » occurrences from a "jsonb" > value > > The code checks with "no_null_keys()" that, as expected, no ingested JSON > document has an occurrence of « "some key": null ». > > And it uses "strip_null_keys()" on the output of "to_jsonb()" — and, as > appropriate, any other built-in JSON function that produces a "jsonb" value. > > It was straightforward to implement these two functions by using REGEXP > built-in functionality on the canonically formatted "text" value produced by > the "jsonb::text" typecast. In my experience, using regular expressions applied to document formats tends to get you false positives. I’d be worried about books with titles similar to 'How we wrote a regular expression to detect occurrences of "some key": null in our JSON documents', for example. For stripping those null occurrences, you are aware of the json_strip_nulls(json) and jsonb_strip_nulls(jsonb) functions, right? For detecting them on a recent PG, the @? operator or json_path_exists(json, jsonpath) functions would probably do the trick. I am not too familiar with JSONPATH expressions, but I expect (it passed some preliminary testing) this would detect your nulls just fine, while taking JSON semantics into account: jsonb_path_exists(document, '$.** ? (@ == null)'::jsonpath) For PG-specifics on JSONPATH, see section 9.16.2 on: https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-OP-TABLE A recursive query is another possible solution. It would probably perform far worse, but I find them more rewarding to write. Some people prefer Sodoku. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
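The built-ins applied to the value from the subject line:

select jsonb_strip_nulls('{"x": 42, "y": null}'::jsonb);
-- {"x": 42}

select jsonb_path_exists('{"x": 42, "y": null}'::jsonb, '$.** ? (@ == null)');
-- true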
Re: Getting data from a record variable dynamically
> On 5 Jul 2022, at 1:02, Rhys A.D. Stewart wrote: > > Greetings All, > > I have a trigger that is attached to several different tables. In the > trigger function I retrieve a single row and I want the info from a > specific column. This column is dependent on the table in question. > and I have the column name stored in a variable as well. Without > writing a conditional for each table, what is the best way to > dynamically get the data from the record variable? I would create a simple trigger function for each of those tables that just extracts the value(s) from the field(s) you mentioned, and then pass those values on to a generic function that does the actual table-independent work. The usual trigger pseudo-columns and variables wouldn’t be available in that generic function, but considering that your triggers fire from different tables, you could either pass them on or they (probably) don’t make sense in the context of the generic function. Alban Hertroys -- There is always an exception to always.
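A minimal sketch of that split, with made-up table, column and function names (EXECUTE FUNCTION needs PostgreSQL 11 or later; use EXECUTE PROCEDURE before that):

-- generic part, no table-specific knowledge:
create function handle_value(val text)
returns void
language plpgsql
as $$
begin
    raise notice 'got value: %', val;  -- the table-independent work goes here
end;
$$;

-- one thin trigger function per table, only extracting that table's column:
create function orders_extract()
returns trigger
language plpgsql
as $$
begin
    perform handle_value(new.order_ref);  -- order_ref: this table's column of interest
    return new;
end;
$$;

create trigger extract_value
    before insert or update on orders
    for each row execute function orders_extract();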
Re: pg_restore creates public schema?
> On 6 Oct 2022, at 22:52, Ron wrote: > > On 10/6/22 12:46, Christophe Pettus wrote: >>> On Oct 6, 2022, at 10:44, Ron wrote: >>> Sadly, that VM doesn't have nearly enough disk space to hold the backup >>> folder. >> Use file mode, and stream the output via scp/ssh to a different machine? > > I thought of that, too. Unfortunately, the ssh version in RHEL 8.6 is > sufficiently old that "three way" ssh (person at HostA wanting to transfer a > file from Server1 to Server2) requires that port 22 be open from Server1 to > Server2. Bit of a wild idea, I haven’t used SSH tunnels in years, so a bit fuzzy on the details: Can you create an SSH tunnel to the new machine from the VM, then pipe that to an SSH connection from a machine that does have enough space to dump? And then vice versa to the new machine to restore? (Unless access to that one is easier of course) Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Question about how to print an INTERVAL field
> On 19 Nov 2022, at 4:58, Ken Tanzer wrote: > > On Thu, Nov 17, 2022 at 2:30 PM Alejandro Baeza Rangel > wrote: (…) > don't fully understand it. But what really confuses me is the example below. > How can these two intervals be equal and still yield different output in the > to_char function? And as a practical matter, and for the OPs question, how > can you convert from one to the other of these "equal" values? > > WITH inters AS ( > SELECT > '1 day 2 hours'::interval AS i1, > '26 hours'::interval AS i2 > ) > SELECT > *, > to_char(i1,'HH24:MM:SS') AS i1_char, > to_char(i2,'HH24:MM:SS') AS i2_char, > i1=i2 AS "Equal?" > FROM inters; > >i1 |i2| i1_char | i2_char | Equal? > +--+--+--+ > 1 day 02:00:00 | 26:00:00 | 02:00:00 | 26:00:00 | t > > Cheers, > Ken Those intervals are not identical. I think the reasoning is that due to DST changes, ‘1 day 2 hours’ is more specific than its conversion to ’26 hours’ (or 25 or 27 at DST change). And since you’re not converting the number of days in to_char, that information gets lost. That problem doesn’t seem to arise in the OP’s question (as far as I understand his question), he does have dates to base the intervals on. However, converting the differences in dates to intervals decouples the difference from the dates (the intervals could, for example, subsequently be added to an entirely different date) and he ends up in the same boat. It would seem that the way to do this is to convert the difference to (seconds since) epoch and do the math to convert that to a character string yourself. See for example: https://stackoverflow.com/questions/341384/how-to-convert-an-interval-like-1-day-013000-into-253000 That seems unnecessarily complicated, perhaps there is/could be a more convenient method? I’m sort of thinking of a "relative timestamp offset" type, that tracks an exact difference relative to a given timestamp? Alban Hertroys -- There is always an exception to always.
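A sketch of the epoch route for the i2 value from the example; the hour part is computed by hand so it can exceed 24:

select format('%s:%s:%s',
              floor(ep / 3600),
              to_char(floor((ep % 3600) / 60), 'FM00'),
              to_char(ep % 60, 'FM00')) as i2_char
from (select extract(epoch from interval '26 hours')::numeric as ep) t;
-- 26:00:00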
Re: Stored procedure code no longer stored in v14 and v15, changed behaviour
> On 2 Dec 2022, at 15:10, DAVID ROTH wrote: > > Is there a way to reverse engineer the original code (or its equivalent) from > what is saved in the database? I’m dumping an idea here… Treading back quite a bit with information from later in this thread. With the original procedure source code under version control and assuming the tokenization converts that source code consistently (since it’s done by a computer), you could store another level of diffs: From the source code you pushed, against the output of pg_get_functiondef. Assuming that changes to the tokenised code, when converted back to text, only involve renames of database objects, the result of a reversely applied diff could very well be comparable to the original source code. I suspect that would be sufficient for telling whether a developer is responsible for the changes, or that they were caused by renaming of database artefacts. You would need to wrap the function creation calls into some automation to generate and store those diffs, comparing it back, etc, but that may be doable. I would also generate new diffs right after major version updates of the database (a before and after of the output of pg_get_functiondef, applied to the stored diff?). I’m not so sure that would work for auditing, but that seems to have been tackled down-thread. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Stored procedure code no longer stored in v14 and v15, changed behaviour
> On 3 Dec 2022, at 20:55, Karsten Hilbert wrote: > >> You would need to wrap the function creation calls into some automation to >> generate and store those diffs, comparing it back, etc, but that may be >> doable. I would also generate new diffs right after major version updates of >> the database (a before and after of the output of pg_get_functiondef, >> applied to the stored diff?). > > I wonder whether that would tie the sanity check to a particular PG version. > > I mean, pg_get_functiondef output being a server runtime artifact it might > well change between server versions, no ? I meant to write: “I would also generate new diffs right _before and_ after…”, precisely for that reason. The before patch should get you the last ’sane’ situation to get back to the source code. Next, you can diff that to the newly tokenised version after the upgrade. It is a bit of a hassle, as you need to remember to do that before an upgrade, but at least you’d have something… Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Regular expression for lower case to upper case.
> On 10 Dec 2022, at 12:00, Eagna wrote: > > > Hi, and thanks for your input. > > >> RegExp by itself cannot do this. You have to match all parts of the input >> into different capturing groups, then use lower() combined with format() to >> build a new string. Putting the capturing groups into an array is the most >> useful option. > > > OK - I *_kind_* of see what you're saying. > > There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd care to > give an outline of the solution that you propose. If you put all the regexes and their replacements into a table[1], you could use an aggregate over them to combine all the replacements into the final string. It would need some aggregate like regex_replace_agg, which would probably be a custom aggregate. [1]: If you stick to ASCII, you could just calculate them and even omit storing them in a physical table. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
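A sketch of such an aggregate; the table and all names are made up, and the actual lower-to-upper pairs would go into the replacements table:

create table replacements (
    ord         integer primary key,
    pattern     text not null,
    replacement text not null
);

create function regexp_replace_step(state text, subject text, pattern text, replacement text)
returns text
language sql
immutable
as $$ select regexp_replace(coalesce(state, subject), pattern, replacement, 'g') $$;

create aggregate regexp_replace_agg(text, text, text) (
    sfunc = regexp_replace_step,
    stype = text
);

-- apply every replacement to the input string, in a well-defined order:
select regexp_replace_agg('input string', pattern, replacement order by ord)
from replacements;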
Re: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools?
> On 16 Jan 2023, at 15:37, HECTOR INGERTO wrote: > > > The database relies on the data being consistent when it performs crash > > recovery. > > Imagine that a checkpoint is running while you take your snapshot. The > > checkpoint > > syncs a data file with a new row to disk. Then it writes a WAL record and > > updates > > the control file. Now imagine that the table with the new row is on a > > different > > file system, and your snapshot captures the WAL and the control file, but > > not > > the new row (it was still sitting in the kernel page cache when the > > snapshot was taken). > > You end up with a lost row. > > > > That is only one scenario. Many other ways of corruption can happen. > > Can we say then that the risk comes only from the possibility of a checkpoint > running inside the time gap between the non-simultaneous snapshots? I recently followed a course on distributed algorithms and recognised one of the patterns here. The problem boils down to a distributed snapshotting algorithm, where both ZFS filesystem processes each initiate their own snapshot independently. Without communicating with each other and with the database which messages (in this case traffic to and from the database to each FS) are part of their snapshots (sent or received), there are chances of lost messages, where either none of the process snapshots know that a 'message' was sent or none received it. Algorithms like Tarry, Lai-Yang or the Echo algorithm solve this by adding communication between those processes about messages in transit. Alban Hertroys -- There is always an exception to always.
Re: WHERE col = ANY($1) extended to 2 or more columns?
> On 9 Feb 2023, at 16:41, Dominique Devienne wrote: > > Hi. We are implementing an API which takes a list of row keys, and must > return info about those rows. To implement that efficiently, in as few > round-trips as possible, we bind a (binary) array of keys (ints, uuids, or > strings) and that works great, but only if the key is a scalar one. > > Now we'd like to do the same for composite keys, and I don't know how to do > that. > Is it possible? Could someone please help out or demo such a thing? > We are doing it in C++ using libpq, but a pure SQL or PL/pgSQL demo would > still help (I think). This works: => select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text), (1, 'one'::text), (2, 'two'::text)); ?column? -- t (1 row) Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: WHERE col = ANY($1) extended to 2 or more columns?
> On 9 Feb 2023, at 18:35, Dominique Devienne wrote: > > On Thu, Feb 9, 2023 at 5:37 PM David G. Johnston > wrote: > On Thu, Feb 9, 2023 at 9:28 AM Alban Hertroys wrote: > > On 9 Feb 2023, at 16:41, Dominique Devienne wrote: > > Now we'd like to do the same for composite keys, and I don't know how to do > > that. > > This works: > => select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text), (1, > 'one'::text), (2, 'two'::text)); > But you cannot write the right-side of the IN as a single parameter which > seems to be the primary constraint trying to be conformed to. > > Right. The goal is to (re)use a prepared statement (i.e. plan once), and bind > the RHS (binary) array > and do a single exec (single round-trip) to get the matching rows. AFAIK, > this is the fastest way. > If there's a better/faster way, I'm interested. --DD How would an ORM like that push a list of tuples into a single query parameter though? Is that feasible? Perhaps this is easier to use with an ORM then? It would need a list of placeholders for each item, but I suspect you would need that anyway… => with v(col1, col2) as ( values (1, 'two'::text), (2, 'one'::text), (1, 'one'::text), (2, 'two'::text) ) select * from v where (col1, col2) = (1, 'one'::text); col1 | col2 --+-- 1 | one (1 row) This could be written as a join to a table with fixed values in the OP’s case. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
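If the goal is still a single round-trip with one bound array per key column, a sketch using unnest() with two arrays may help; the table and column names are made up. Unnest with several array arguments is only allowed in the FROM clause and walks the arrays in lockstep:

prepare lookup(int[], text[]) as
select t.*
from my_table t
join unnest($1, $2) as k(key_a, key_b)
  on (t.key_a, t.key_b) = (k.key_a, k.key_b);

execute lookup(array[1, 2], array['one', 'two']);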
Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
> On 28 Feb 2023, at 3:54, Thorsten Glaser wrote: (…) >> Well, that may be what you want, but it's not what you wrote in >> the query. Follow David's advice and do > […] >> I'm pretty sure that this will only incur one evaluation of the >> common subexpression, so even though it's tedious to type it's not >> inefficient. > > Thanks. But I fear it’s not as simple as you wrote. More like: > > jsonb_build_object('opening_times', > jsonb_agg(DISTINCT jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour) > ORDER BY > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'weekday', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'from_hour', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'to_hour') > ) > > Isn’t that more like it? Perhaps you can use a lateral cross join to get the result of jsonb_build_object as a jsonb value to pass around? Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
> On 3 Mar 2023, at 0:02, Thorsten Glaser wrote: > > On Tue, 28 Feb 2023, Alban Hertroys wrote: > >> Perhaps you can use a lateral cross join to get the result of >> jsonb_build_object as a jsonb value to pass around? > > I don’t see how. (But then I’ve not yet worked with lateral JOINs.) You posted this bit: > jsonb_build_object('opening_times', > jsonb_agg(DISTINCT jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour) > ORDER BY > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'weekday', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'from_hour', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'to_hour') > ) You can rewrite that into something like this: select jsonb_build_object('opening_times’, obj ORDER BY obj->>'weekday’, obj->>'from_hour’, obj->>'to_hour') ) from cot cross join lateral jsonb_agg(jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour) obj That’s off the top of my head and I did leave out the DISTINCT. Just to show the concept here. A bit of experimenting and reading should get you there, I’m keeping $work waiting :P (…) > WITH >cgwaj AS ( > SELECT cgwa.id AS id, jsonb_build_object( > 'weekday', cgwa.weekday, > 'forenoon', cgwa.forenoon, > 'afternoon', cgwa.afternoon, > 'evening', cgwa.evening) AS obj > FROM core_generalworkavailability cgwa >), >-- … same for opening times > SELECT cp.email, …, > -- … > jsonb_build_object('possible_work_times', COALESCE( > jsonb_agg(DISTINCT cgwaj.obj ORDER BY cgwaj.obj->>'weekday', > cgwaj.obj->>'forenoon', cgwaj.obj->>'afternoon', > cgwaj.obj->>'evening') > FILTER (WHERE cgwaj.id IS NOT NULL))) || > -- … >FROM core_person cp > -- … > LEFT JOIN core_person_possible_work_times cppwt ON cppwt.person_id=cp.id > LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id > -- … > > That is, add a CTE for each m:n-attached table whose “value” is > an object, not a single field, keep the id field; LEFT JOIN that > (instead of the original table), then we have a field to use in > ORDER BY. > > I think. I’ve not yet tried it (I don’t have access to that DB > normally, I was just helping out). > > This avoids sub-SELECTs in the sense of needing to run one for > each user row, because the innermost JSON object building needs > to be done for each (connected (if the query is not filtering on > specific users)) row of the “property table”, anyway. (And even > if filtered, that can be passed down.) > > bye, > //mirabilos > -- > Solange man keine schmutzigen Tricks macht, und ich meine *wirklich* > schmutzige Tricks, wie bei einer doppelt verketteten Liste beide > Pointer XORen und in nur einem Word speichern, funktioniert Boehm ganz > hervorragend. -- Andreas Bogk über boehm-gc in d.a.s.r > Alban Hertroys -- There is always an exception to always.
Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
> On 3 Mar 2023, at 20:32, Thorsten Glaser wrote: > > On Fri, 3 Mar 2023, Alban Hertroys wrote: > >> You can rewrite that into something like this: >> >> select jsonb_build_object('opening_times’, >> obj >> ORDER BY >> obj->>'weekday’, >> obj->>'from_hour’, >> obj->>'to_hour') >> ) >> from cot >> cross join lateral jsonb_agg(jsonb_build_object( >> 'weekday', cot.weekday, >> 'from_hour', cot.from_hour, >> 'to_hour', cot.to_hour) obj > > But isn’t that the same as with a regular LEFT JOIN? Similar, but not the same, I’d say. I do now notice that I made some copying errors there, I was a bit nauseous at that time. That should have read: >> select jsonb_build_object('opening_times’, >> jsonb_agg(obj >> ORDER BY >> obj->>'weekday’, >> obj->>'from_hour’, >> obj->>'to_hour') >> ) >> from cot >> cross join lateral jsonb_build_object( >> 'weekday', cot.weekday, >> 'from_hour', cot.from_hour, >> 'to_hour', cot.to_hour) obj The lateral join applies the function to each row returned from the left side of the join and enriches that row with the function result. I used a cross join because there is no join condition to apply to the lateral, otherwise you could also use an inner join on true. I think you could also have used an implicit Cartesian product (using ‘,’ for the join), and that in that case the lateral would be implied. I prefer explicit notation though. A left join wouldn’t make much sense here, unless the function could return NULL - for example if it were a function marked as STRICT and some of the input parameter values (from the table) could be NULL. >>> cgwaj AS ( >>> SELECT cgwa.id AS id, jsonb_build_object( >>> 'weekday', cgwa.weekday, >>> 'forenoon', cgwa.forenoon, >>> 'afternoon', cgwa.afternoon, >>> 'evening', cgwa.evening) AS obj >>> FROM core_generalworkavailability cgwa > > plus There are some differences. You need a sub-select, which in turn creates its own result set. It’s up to the planner whether the left or the right side gets executed first, after which the results of the other side of the join get merged to this, or whether this can all be collected in one go. That’s up to the query planner to decide though, and it could be right. >>> LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id > > With the addition that I can aggregate… You can do so in both situations, but I guess that confusion stems from my copy/paste mistake. In my experience, lateral joins go well with the jsonb functions. They tend to reduce code repetition when referencing object members, such as in your case. Regards, Alban Hertroys -- There is always an exception to always.
Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause
> On 7 Mar 2023, at 4:11, David G. Johnston wrote: > > On Mon, Mar 6, 2023 at 7:51 PM David Rowley wrote: > On Tue, 7 Mar 2023 at 12:40, Tom Lane wrote: > > > > Ben Clements writes: > > > As shown above, the following calculated column can bring in the city > > > name, > > > even though the city name isn't in the GROUP BY: > > >max(city) keep (dense_rank first order by population desc) > > > > You haven't really explained what this does, let alone why it can't > > be implemented with existing features such as FILTER and ORDER BY. > > (It wasn't clear to me until I watched the youtube video.) > > Likely KEEP is more flexible than just the given example but I think > that something similar to the example given could be done by inventing > a TOP() and BOTTOM() aggregate. Then you could write something like: > > select >country, >count(*), >max(population), >bottom(city, population) > from >cities > group by >country > having >count(*) > 1 > > the transfn for bottom() would need to remember the city and the > population for the highest yet seen value of the 2nd arg. > > BOTTOM() remembers the highest value? > > Where this wouldn't work would be if multiple columns were > required to tiebreak the sort. > > TOP(city, ROW(population, land_area)) ? What should be the expected behaviour on a tie though? Say that we count the number of districts or airfields or train stations per city and query for the one(s) with the most or least of them? There could well be multiple cities with the same max number, and there will be many cities with the same minimum number (namely 0). Should the result be just the first of the maximums (or minimums) through some selection criterium (such as their alphabetical order), should that give each of the tied results, or should there be a means to define that behaviour? I suppose a combination with FIRST and LAST could solve that issue? Regards, Alban Hertroys -- There is always an exception to always.
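For what it's worth, the given example can already be approximated with an ordered array_agg, and the tie question then becomes a matter of adding tiebreaker sort keys. A sketch against the cities example from the thread:

select country,
       count(*),
       max(population),
       (array_agg(city order by population desc, city))[1] as largest_city
from cities
group by country
having count(*) > 1;

Here ties on population are broken alphabetically by city; leaving the tiebreaker out makes the picked city arbitrary among the tied ones.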
Re: Using CTID system column as a "temporary" primary key
> On 29 Mar 2023, at 21:11, Sebastien Flaesch wrote: > > Oh the use of default keyword is new to me, thanks for that. > > But to make PostgreSQL more Informix-compatible, zero should have been > considered as well. …No, I’m not going to be humble about this opinion… Postgres does a sane thing here. It’s Informix that you should be complaining about. Zero is not a sane value to specify special behaviour, it could mean zero and be just as valid. By consequence, Informix probably forbids zero as a sequence value, but that is an artefact (and a limitation) of its implementation, not a feature. The core of your problem however, is that you’re trying to get database-agnostic behaviour by relying on database-specific features. That is not going to work, you’ve just been lucky enough to get away with it until now. There’s really only one realistic answer here: Fix your design. Regards, Alban Hertroys -- There is always an exception to always.
Re: [EXTERNAL]: Re: UPSERT in Postgres
> On 7 Apr 2023, at 2:49, Louis Tian wrote: (…) > I am not expecting an error here. The problem is with no conflict it always > go down the insert path first and results in a not null constraint error. > While I am expecting the insert is never executed in the first place when > that row already exist (as identified by it primary key). So the update > execute without error. > I hope the pesudo code above is enough to clarify the difference? Your assumption on what the problem is, is not correct. The problem is not with the conflict resolution, it is with your statement violating a not null constraint. It doesn’t matter whether you insert first or update first, either operation is going to violate that constraint. You’re specifying a NULL value for a column that doesn’t accept that because it has a NOT NULL constraint. That is your problem. Alban Hertroys -- There is always an exception to always.
Re: "PANIC: could not open critical system index 2662" - twice
> On 14 Apr 2023, at 9:38, Evgeny Morozov wrote: (…) > I don't know whether ZFS zero-fills blocks on disk errors. As I > understood, ZFS should have been able to recover from disk errors (that > were "unrecoverable" at the hardware level) using the data on the other > two disks (which did not report any errors). Thus, PG should not have > seen any corrupted data (if ZFS was working correctly). > https://unix.stackexchange.com/questions/341614/understanding-the-error-reporting-of-zfs-on-linux > seems to confirm this. Am I misunderstanding something? Your problem coincides with a thread at freebsd-current with very similar data corruption after a recent OpenZFS import: blocks of all zeroes, but also missing files. So, perhaps these problems are related? Apparently, there was a recent fix for a data corruption issue with the block_cloning feature enabled, but people are still seeing corruption even when they never enabled that feature. I couldn’t really find the start of the thread in the archives, so this one kind of jumps into the middle of the thread at a relevant-looking point: https://lists.freebsd.org/archives/freebsd-current/2023-April/003446.html Regards, Alban Hertroys -- There is always an exception to always.
Re: Ltree: set of allowed characters is limited to [A-Za-z0-9_]. Could the dash "-" be included?
> On 27 Oct 2018, at 18:14, David G. Johnston > wrote: > >> On Saturday, October 27, 2018, joernbs wrote: >> Dear friends, >> >> I would like to use ltree for search paths in a warehouse application, >> something like "Material-Entry-01.Main-Aisle.Shelf-Aisle-R07/R08.R07-12-03" >> Unfortunately I can not use common separators like dash (-) or slash(/) >> >> Documentation states only thes characters [A-Za-z0-9_] are >> allowed.https://www.postgresql.org/docs/10/static/ltree.html >> > I don’t see how this would be possible to do with the existing type - too > much potential breakage of existing data. Your example itself shows why > using dash as a separator is a bad idea. David, I don't think he's talking about separators ;) Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Empty Range Bound Specified for Partition
> On 22 Nov 2018, at 7:21, ramsiddu007 wrote: > > pgsql> create table test_upto_100 partition of test_parent_partition > for values from ('51') to ('100'); > >It was showing error like this.. > > ERROR: empty range bound specified for partition "test_upto_100" > DETAIL: Specified lower bound ('51') is greater than or equal to upper bound > ('100'). > SQL state: 42P17 > > > I am not getting, please give me cause of this and solution. Alphabetically, '1' comes before '5', so '100' comes before '51'. Numerically, you would get what you intended, but you're not partitioning on numbers but on strings instead and those sort alphabetically: '1', '10', '100', '11', '2', '3', '51', etc. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
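A quick way to see the difference:

select '100' < '51' as text_order, 100 < 51 as numeric_order;
-- text_order: true, numeric_order: false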
Re: using a function in where
> On 3 Dec 2018, at 8:06, Glenn Schultz wrote: > > All, > I am using the function below to convert a continuous variable to a binned > value. Sometimes a value other than zero is passed through the query. For > example -.5 result value is passed to the query result. The basic of the > query is below. > > select > incentivebin(wac, rate, .25) > from > my_table > where incentivebin(was, rate, .25) = 0 > CREATE or REPLACE FUNCTION embs_owner.IncentiveBin(IN "Gwac" double > precision, > AS 'select ceiling(($1 - $2)/$3) *$3'; If the difference of (wac - rate) = -0.5, and you multiply that by 4 (divide by .25), ceil will round that correctly to -2. If you then divide by 4 again, you get -0.5 back (which is twice your bin size). The result matches your function definition with the given parameters. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
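Plugging the example numbers into the function body shows where the -0.5 comes from:

select ceiling(-0.5 / 0.25) * 0.25 as bin;
-- ceiling(-2) * 0.25 = -0.50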
Re: IF NOT EXIST
> On 18 Dec 2018, at 7:10, Igor Korot wrote: > > Hi, ALL, > I have a following statement: > > IF NOT EXIST( SELECT 1 SELECT 1 FROM pg_proc AS proc, pg_namespace AS > ns ) CREATE FUNCTION(); > > Unfortunately trying to execute it thru the ODBC interface with: > > ret = SQLExecDirect( m_hstmt, query, SQL_NTS ); > > gives syntax error near IF. > > What is the proper way to do that? It looks like you’re trying to create a function unless it already exists, but you’re missing several important parts. Firstly, the statement you’re looking for is CREATE OR REPLACE FUNCTION. Look it up in the docs for the various syntax options and for how to use it, as… Secondly, your function doesn’t have a name. A function requires a name, or you wouldn’t ever be able to call it. You will also have to specify a return type (functions return values) and the language the function is implemented in. The documentation will show you that there are several options you can provide too. And lastly, a function requires an implementation. Regards, Alban Hertroys -- There is always an exception to always.
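A minimal sketch of what such a statement needs; the name, return type, language and body here are all made up:

create or replace function my_function()
returns integer
language sql
as $$ select 1 $$;

Since CREATE OR REPLACE overwrites an existing function with the same signature, the separate existence check against pg_proc/pg_namespace is not needed.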
Reclaiming space for dropped database
Hi all,

Our current development database server is running a bit low on diskspace, so I dropped an old but rather large database with the intention of claiming back some space. However, the space remains claimed.

This server was upgraded from PG10 to PG11 using pg_upgrade's --link option.

I see this after having dropped the (300+GB) database:

-bash-4.2$ du -d 1 -h
4.0K    ./.cache
0       ./.config
4.0K    ./9.6
376G    ./10
72G     ./11
447G    .

The files of that database apparently still exist within the ./10 directory:

-bash-4.2$ du -d 1 -h ./10/data/
4.1G    ./10/data/pg_wal
816K    ./10/data/global
0       ./10/data/pg_commit_ts
0       ./10/data/pg_dynshmem
8.0K    ./10/data/pg_notify
0       ./10/data/pg_serial
0       ./10/data/pg_snapshots
208K    ./10/data/pg_subtrans
0       ./10/data/pg_twophase
16K     ./10/data/pg_multixact
372G    ./10/data/base
0       ./10/data/pg_replslot
0       ./10/data/pg_tblspc
3.5M    ./10/data/pg_stat
0       ./10/data/pg_stat_tmp
4.0K    ./10/data/pg_logical
96K     ./10/data/log
688K    ./10/data/pg_xact
376G    ./10/data/

How do I reclaim that space?

Regards,
Alban Hertroys
Re: Reclaiming space for dropped database
"Tom Lane" wrote on 2019-01-23 16:02:01: > Alban Hertroys writes: > > Our current development database server is running a bit low on diskspace, > > so I dropped an old but rather large database with the intention of > > claiming back some space. However, the space remains claimed. > > This server was upgraded from PG10 to PG11 using pg_upgrade's --link > > option. > > If you used --link, then all the files would remain hard-linked from both > the old and new database directories. You've got to remove them from the > old DB directory as well. > > There's not really any point in keeping around the source DB directory > once you've completed a --link migration. Starting the postmaster in > the old DB directory would be disastrous because the files are > inconsistent from its standpoint once the new postmaster has modified > them at all. (In fact, I think pg_upgrade intentionally makes the old > directory non-runnable to prevent that error.) So you might as well > just "rm -rf ./10", not only its biggest subdirectory. That explains what I'm seeing. After creating a dump (better safe than sorry), I'll remove that directory. Thanks! Regards, Alban. Alban Hertroys D: +31 (0)53 4 888 888 | T: +31 (0)53 4888 888 | E: alban.hertr...@apollovredestein.com Apollo Vredestein B.V.| Ir. E.L.C. Schiffstraat 370, 7547 RD Enschede, The Netherlands Chamber of Commerce number: 34223268 The information contained in this e-mail is intended solely for the use of the individual or entity to whom it is addressed. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or action in relation to the contents of this information is strictly prohibited and may be unlawful and request you to delete this message and any attachments and advise the sender by return e-mail. The confidentiality of this message is not warranted. Apollo Vredestein and its subsidiaries rule out any and every liability resulting from this or any other electronic transmission Please consider the environment before printing this e-mail
Re: How to model data in DB (PostgreSQL) for SNMP-like multiple configurations
Is there a reason not to use a relational model instead of json(b) here? I think that is in fact considered best practice. On Fri, 8 Mar 2019 at 15:40, Alexandru Lazarev wrote: > I am working on product managing and monitoring Network (NMS-like > products). > > Product manages configuration of network devices, for now each device has > stored its configuration in simple table - this was the original design. > > CREATE TABLE public.configuration( > id integer NOT NULL, > config json NOT NULL, > CONSTRAINT configuration_pkey PRIMARY KEY (id),) > > A config looks like: > > { > "_id": 20818132, > "type": "Modem", > "data": [{ > "oid": "1.3.6.1.4.1..3.10.2.2.25.4.1.40", > "instance": "24", > "value": "null" > }, > { > "oid": "1.3.6.1.4.1..3.5.10.1.86", > "instance": "0", > "value": "562" > }, > { > "oid": "1.3.6.1.4.1..3.5.10.3.92.4.1", > "instance": "0", > "value": "0" > }, > { > "oid": "1.3.6.1.4.1..3.10.2.2.25.4.1.43", > "instance": "24", > "value": "vlan24" > }, > { > "oid": "1.3.6.1.4.1..3.10.2.2.25.4.1.43", > "instance": "25", > "value": "vlan25" > } > ]} > > And there are many plv8 (java script procedural language extension for > PostgreSQL) stored procedures working on bulks of such config, reading some > OIDs, changing them conditionally, removing some of them and adding others, > especially in use cases like: There are some upper-level META-configuration > of different level, which during change have to update all their updated > parameters to all affected leaves configs. An simple test-example (but > without touching 'data' node) > > CREATE OR REPLACE FUNCTION public.process_jsonb_plv8() > RETURNS void AS$BODY$ > var CFG_TABLE_NAME = "configurations"; > var selPlan = plv8.prepare( "select c.config from " + CFG_TABLE_NAME + " c > where c.id = $1", ['int'] ); > var updPlan = plv8.prepare( 'update ' + CFG_TABLE_NAME + ' set config = $1 > where id = $2', ['jsonb','int'] ); > > try { > > var ids = plv8.execute('select id from devices'); > > for (var i = 0; i < ids.length; i++) { > var db_cfg = selPlan.execute(ids[i].id); //Get current json config > from DB > var cfg = db_cfg[0].config; > cfg["key0"] = 'plv8_json'; //-add some dummy key > updPlan.execute(cfg, ids[i].id); //put uopdated JSON config in DB > plv8.elog(NOTICE, "UPDATED = " + ids[i].id); > > > }} finally { > selPlan.free(); > updPlan.free();} > return;$BODY$ > LANGUAGE plv8 VOLATILE > COST 100; > > For real use-cases plv8 SPs are more complicated, doing FOR-LOOP through > ALL OIDs object of 'data' array, checking if it is looking for and update > value an/or remove it and/or add newer if necessary. > > Since number of devices in DB increased from several hundreds to 40K or > even 70K, and number of OID+Instance combinations also increased from > several hundred to ~1K and sometimes up to 10K within a config, we start > facing slowness in bulk (especially global -> update to ALL Devices) > updates/searches. > > In order to get rid off FOR LOOP step for each configuration I've > converted data-node from array to object (key-value model), something like > : > > { > "_id": 20818132, > "type": "Modem", > "data": { > "1.3.6.1.4.1..3.10.2.2.25.4.1.40": { > "24": "null" > }, > "1.3.6.1.4.1..3.5.10.1.86": { > "0": "562" > }, > "1.3.6.1.4.1..3.5.10.3.92.4.1": { > "0": "0" > }, > "1.3.6.1.4.1..3.10.2.2.25.4.1.43": { > "24": "vlan24", > "25": "vlan25" > } > }} > > Now in order to get a concrete OID (e.g. > "1.3.6.1.4.1..3.10.2.2.25.4.1.43") and/or its instance I do 1-2 *O(1)* > operations instead *O(n)*. And it become a bit faster. 
After I've changed > column type from json to jsonb - I've got a lot of memory issues with > plv8 stored procedures, so now ideas is: > > *What are the best practices to store such data and use cases in DB?* > taking in considerations following: - Bulk and global updates are often > enough (user-done operation) - several times per week and it takes long > time - several minutes, annoying user experience. - Consulting some OIDs > only from concrete config is medium frequency use case - Consulting ALL > devices have some specific OID (SNMP Parameter) settled to a specific value > - medium frequency cases. - Consult (read) a configuration for a specific > device as a whole document - often use case (it is send to device as json > or as converted CSV, it is send in modified json format to other utilities, > etc) > > One of suggestion from other oppinions is to move ALL configurations to > simple plain relational tabl
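To make the relational suggestion above concrete, a sketch with made-up names, derived from the JSON shown in the quoted message:

create table device (
    device_id   integer primary key,
    device_type text not null
);

create table device_parameter (
    device_id integer not null references device,
    oid       text    not null,
    instance  text    not null,
    value     text,
    primary key (device_id, oid, instance)
);

-- "all devices that have a specific OID set to a specific value":
select device_id
from device_parameter
where oid = '1.3.6.1.4.1..3.10.2.2.25.4.1.43'
  and value = 'vlan24';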
Re: WAL Archive Cleanup?
That seems to be a misconfigured client application that explicitly tries to connect to a non-existent database 'admin' (via db=admin). Instead of adding that database, it seems more logical to fix the client configuration. On Tue, 2 Apr 2019 at 09:53, Foo Bar wrote: > Hello All, > > Ok, so maybe something helpful? On my master node I am seeing a bunch of: > > 2019-03-28 23:54:44 GMT [2611]: [1-1] > user=admin,db=admin,client=17210.10.37 172.10.10.37(57552) (0:3D000)FATAL: > database "admin" does not exist > 2019-03-28 23:54:46 GMT [2613]: [1-1] > user=admin,db=admin,client=172.10.10.18 172.10.10.18(56970) > (0:3D000)FATAL: database "admin" does not exist > > Which is accurate, as there is no admin database... I usually connect > with > > psql -h localhost -U admin postgres > > Should there be? Will this fix my issue with pgsql filling up the disk? > > Thanks, > - QBR > > On Mon, Mar 25, 2019 at 10:21 AM Foo Bar wrote: > >> Hello All, >> >> Wow! Lots of awesome replies, Went away for the weekend thinking my >> email had been rejected and come back to a full inbox. Thanks for all the >> help! >> >> >> Postgres version? >> >> 9.6.11 >> >> Doh. Fairly important detail there. :) >> >> >> FYI, psql is the Postgres client program, Postgres(ql) is the server. >> >> "psql" is the name of a specific command line tool used to connect to >> a PostgreSQL database server, it is not the name of the database itself. >> The database is usually abbreviated "pgsql". >> >> Duly noted, thanks for the correction. >> >> >> It's the standby that has not seen any traffic? >> >> There's really no traffic. I built three nodes, connected them, created >> a test table and inserted some values, then left the cluster be for a >> couple weeks. >> >> >> And "restartpoint" is usually spelled as one work in technical >> discussions of it. Or at least, searching for it that way avoids finding >> things which mention each word separately in different senses. >> >> Ah ha. I had seen it that way but thought it was a typo. Thanks for the >> clarification! >> >> >> Are you sure it is the archive >> directory (/hab/svc/postgresql/data/archive) which is filling up, and not >> the live directory (pg_wal or pg_xlog)? This is often a point of confusion. >> >> Right before I sent the mail last week I deleted everything in >> /hab/svc/postgresql/data/archive, this morning I'm seeing: >> >> # du -h --max=1 /hab/svc/postgresql/data/ >> 198M /hab/svc/postgresql/data/pgdata >> 8.9G /hab/svc/postgresql/data/archive >> 9.1G /hab/svc/postgresql/data/ >> # du -hs /hab/svc/postgresql/data/pgdata/pg_xlog/ >> 177M /hab/svc/postgresql/data/pgdata/pg_xlog/ >> # ls -lah /hab/svc/postgresql/data/archive/ | wc -l >> 571 >> >> There is no pg_wal directory though (should there be?) >> >> # find /hab/svc/postgresql/ -name '*pg*wal*' >> # >> >> >> If the only reason you want an archive is for replication, then use >> streaming replication and do away with the archive completely >> >> To be honest, I thought it was required for streaming replication based >> on the guides linked above. >> >> >> There are reasons other than replication that one might want to keep a >> WAL archive, but those reasons don't seem to apply to you >> >> Like backup maybe? A wholly other topic, we recently had a power outage >> and I lost a pgsql node... having an archive would allow me to "replay" any >> transactions? >> >> >> What needs to be determined here is why the standby never consumed >> the WAL's from the master? >> Ok, so it the standby that's the problem. 
>> >> >> Do you still have the logs from the standby and do they show anything >> relevant? >> >> Sure, what am I looking for? I see a bunch of entries like: >> >> 2019-03-08 17:06:11 GMT [1813]: [815-1] user=,db=,client= (0:0)LOG: >> restartpoint complete: wrote 22 buffers (0.0%); 0 transaction log file(s) >> added, 0 removed, 1 recycled; write=2.211 s, sync=0.062 s, total=2.281 s; >> sync files=18, longest=0.062 s, average=0.003 s; distance=16383 kB, >> estimate=16383 kB >> 2019-03-08 17:06:11 GMT [1813]: [816-1] user=,db=,client= (0:0)LOG: >> recovery restart point at 0/8D28 >> >> On the 15th, around when I think I filled the disk, I see a bunch of: >> >> cp: cannot stat '/0002.history': No such file or directory >> cp: cannot stat '/000100040049': No such file or directory >> 2019-03-15 23:59:49 GMT [16691]: [1-1] user=,db=,client= >> (0:XX000)FATAL: could not connect to the primary server: could not connect >> to server: Connection refused >> Is the server running on host "172.16.10.23" and accepting >> TCP/IP connections on port 5432? >> >> Which makes sense since the pgsql service was down. >> >> This appears to be when I recovered the master on Thursday: >> >> cp: cannot stat '/0002.history': No such file or directory >> cp: cannot stat '/00010004004D': No such file or directory >> 2019-03-21 17:37:31 GMT [31338]: [1-1] user=,db
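Regarding the archive directory that keeps growing: nothing in the configuration shown so far ever prunes /hab/svc/postgresql/data/archive, and the 'cp: cannot stat' lines above suggest the restore_command on the standby expands to an empty archive path. A hedged sketch of what a 9.6 standby recovery.conf could look like for this setup; the connection settings are placeholders, and this assumes the archive directory is reachable from the standby at that same path:

    # recovery.conf on the 9.6 standby (a sketch, not the actual configuration)
    standby_mode = 'on'
    primary_conninfo = 'host=172.16.10.23 port=5432 user=replication_user'   # placeholder user
    restore_command = 'cp /hab/svc/postgresql/data/archive/%f "%p"'
    # Prune archived segments the standby no longer needs, at every restartpoint:
    archive_cleanup_command = 'pg_archivecleanup /hab/svc/postgresql/data/archive %r'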
Re: random generated string matching index in inexplicable ways
> On 7 May 2019, at 13:53, Myles Miller wrote: > > PROBLEM: > Strings or characters generated by any random function (including pg_crypto > gen_random_bytes) are matching a string/char index in surprising ways. I fail to see anything surprising in your examples. > Reduced down to its simplest example: (…) > -- if we use random-generated 'A' or 'B', things get inexplicable > > # SELECT y FROM x WHERE y = chr(round(random())::int + 65); > y > --- > A > B > (2 rows) Here you got a random value in the lower range of 0..1 for the record with value ‘A’, so that’s a match, and one in the higher range for value ‘B’, a match again, so you get 2 rows. > > # SELECT y FROM x WHERE y = chr(round(random())::int + 65); > y > --- > (0 rows) Here you got a random value in the higher range for the record with value ‘A’, so no match, and one in the lower range for value ‘B’, no match again, so you get 0 rows. > # SELECT y FROM x WHERE y = chr(round(random())::int + 65); > y > --- > B > (1 row) Here you got two random values in the higher range, so only the row with ‘B’ matches. You could also get two random values in the lower range and only get a match against ‘A’. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
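The underlying cause is that random() is volatile, so the expression on the right-hand side is evaluated again for every row the filter inspects, which is why each row independently matches or not. If the intent was to draw one random letter and return the rows equal to it, a scalar subquery forces a single evaluation; a minimal sketch against the same table x from the example:

    -- The uncorrelated subquery is evaluated once (as an InitPlan),
    -- so every row is compared against the same random letter.
    SELECT y
    FROM x
    WHERE y = (SELECT chr(round(random())::int + 65));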
Re: How To: A large [2D] matrix, 100,000+ rows/columns
> On 9 Jun 2023, at 04:17, Pat Trainor wrote: (…) > Imagine something akin to stocks, where you have a row for every stock, and a > column for every stock. Except where the same stock is the row & col, a > number is at each X-Y (row/column), and that is the big picture. I need to > have a very large matrix to maintain & query, and if not (1,600 column > limit), then how could such data be broken down to work? If your matrix contains values that are all of the same type, as matrices usually do, then a matrix can be described as the Cartesian product of rows and columns, with values connecting those. For rows and columns you could enumerate them using generate_series() or a pair of recursive CTEs, or you could put them into their own table. For the values (or cells), a tuple of (row, column, value) would be sufficient. Then in the end, the matrix would be a presentation of the left joins of the Cartesian product of rows and columns with your cell values. The left joins are to account for missing cell values (empty cells), or you could explicitly add tuples for those with an ‘empty’ value. For presentation, I would use something like Python Pandas and the xlsxwriter module. Data-entry is going to be a bit of a pain if you cannot automate it, and it’s not going to be very space-efficient, but it does fit the relational model this way and it would be easy to expand the matrix in either direction. > By wanting postgresql as a solution, am I round-hole, square-pegging myself? I expect that there are solutions that were explicitly designed for handling (large) matrices and that those would perhaps perform better. > I don't mind keeping, say, a 1,500 column max per table, but then adding new > items (stocks in the above analogy) might make it difficult to keep track of > things... That’s also a possibility, but that sort of pushes the column lookups down to the catalog level and induces overhead on all other catalog lookups as well. It’s not a huge amount though. An alternative approach would be a table of rows (or columns), each holding an array of values, especially since you don’t expect many updates. That would be a first attempt at optimisation if the pure relational model doesn’t work out. > Hoping someone has tackled this before, and performance isn't too big a > concern, as the data changes seldom. Not in practice, AFAIR, but I was thinking of a solution like this for small matrices (Sudokus; I hate the things, but I need some solutions to a few to help me with a number of test scripts). Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
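A minimal sketch of the relational layout described above; the table and column names are mine, not from the thread:

    CREATE TABLE matrix_row  (row_id int PRIMARY KEY);
    CREATE TABLE matrix_col  (col_id int PRIMARY KEY);
    CREATE TABLE matrix_cell (
        row_id int NOT NULL REFERENCES matrix_row,
        col_id int NOT NULL REFERENCES matrix_col,
        val    numeric,
        PRIMARY KEY (row_id, col_id)
    );

    -- The matrix as the Cartesian product of rows and columns,
    -- left joined to the cells so that empty cells come out as NULL.
    SELECT r.row_id, c.col_id, m.val
    FROM matrix_row r
    CROSS JOIN matrix_col c
    LEFT JOIN matrix_cell m USING (row_id, col_id)
    ORDER BY r.row_id, c.col_id;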
Re: Nu-B here
> On 20 Jul 2023, at 02:36, Amn Ojee Uw wrote: > > After this command 'sudo -u postgres psql' > I get this message : > could not change directory to "/home/my_account": Permission denied What’s the reason that you’re using the OS user postgres? If you’re simply trying to connect to the database named postgres as database user postgres, you can instead use the command 'psql -U postgres postgres'. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
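For what it's worth, the 'could not change directory' message is only a warning: sudo keeps the current working directory, which the postgres OS user is not allowed to read, but psql still connects. If you do want to keep using sudo, starting from a world-readable directory avoids the message; a sketch:

    # workaround sketch: run psql from a directory the postgres OS user can read
    cd /tmp && sudo -u postgres psql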
Re: How to improve the performance of my SQL query?
> On 29 Jul 2023, at 10:59, Peter J. Holzer wrote: > > On 2023-07-26 15:46:16 +0800, gzh wrote: >> SET enable_seqscan TO off; > [...] >>-> Parallel Bitmap Heap Scan on tbl_sha >> (cost=92112.45..2663789.14 rows=800650 width=18) (actual >> time=260.540..21442.169 rows=804500 loops=3) >> Recheck Cond: (ms_cd = 'MLD009'::bpchar) >> Rows Removed by Index Recheck: 49 >> Filter: (etrys = '0001'::bpchar) >> Rows Removed by Filter: 295500 >> Heap Blocks: exact=13788 lossy=10565 >> -> Bitmap Index Scan on index_search_04_mscd_cdate >> (cost=0.00..91632.06 rows=3402599 width=0) (actual time=249.718..249.718 >> rows=330 loops=1) >>Index Cond: (ms_cd = 'MLD009'::bpchar) > > So now it's using index_search_04_mscd_cdate which contains only ms_cd > (and - judging from the name, other fields not relevant to this query), > but it still doesn't use index_search_01 which would fit the query > exactly. I can understand that Postgres prefers a sequential scan over > an index scan (the number of matching rows is about 10% of the total > table size which is a lot), but why would it prefer a less specific > index to a more specific one? > > Can you get Postgres to use that index at all? > > Find a combination of ms_cd and etrys which doesn't cover millions of > rows and try that. > > Also try lowering random_page_cost. Wasn’t this an RDS server with just 4GB of memory? How large are those multi-column indices? Perhaps they don’t (all) fit into available cache memory and the server decided to use the one that it had cached? I’m frankly not at all certain how the server would behave around such resource shortage situations, but I suppose loading an uncached index into cache could get a higher cost than using a less optimal (costlier) index that’s already cached. Regarding lowering random_page_cost; If your index files are on SSD storage, lowering that sufficiently (to a realistic value) could then sufficiently lower the cost of loading that uncached index into memory, evicting the index it was using in above plan to make room (unless other active sessions are using it). Alban Hertroys -- There is always an exception to always.
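A hedged sketch of trying that out in a single session; the table, columns and values are taken from the quoted plan, so adjust them to the real query:

    -- Assuming the indexes live on SSD storage, lower random_page_cost for this
    -- session only and check whether the planner now picks index_search_01.
    SET random_page_cost = 1.1;
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*)
    FROM tbl_sha
    WHERE ms_cd = 'MLD009'
      AND etrys = '0001';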
Re: ident auth does not works as usual
> On 26 Aug 2023, at 11:31, pan snowave wrote: (…) > pg_indent.conf > > test rootcce If that is indeed the name of the file, that would explain your problem. No doubt it should be named pg_ident.conf instead, without the first 'n'. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
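For reference, the map file is whitespace-separated and only takes effect when pg_hba.conf refers to it. Assuming the quoted line was meant to map OS user root to database user cce (a guess on my part), the two files would contain something like:

    # pg_ident.conf
    # MAPNAME    SYSTEM-USERNAME    PG-USERNAME
    test         root               cce

    # pg_hba.conf - the map has to be referenced explicitly
    host    all    cce    127.0.0.1/32    ident    map=test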
Re: Even if I have a lot of free space PgSQL returns "ERROR: could not extend file - No space left on device"
> On 8 Sep 2023, at 13:25, Nature Conservation Geovista Space > wrote: > > Dear Pg-users, > I am coming back to Postgres/PostGIS after a few years. I am dealing with a > big database with a lot of geometries and too many vertices. So a lot of geometry going on then… That seems important in this case. > After hours running a query to Subdivide, I get this Postgres error > 2023-09-08 02:11:23.745 BST [328594] postgres@database ERROR: could not > extend file "base/16388/7985375.1020": No space left on device > 2023-09-08 02:11:23.745 BST [328594] postgres@database HINT: Check free disk > space. > 2023-09-08 02:11:23.745 BST [328594] postgres@database STATEMENT: CREATE > TABLE _gaul_administrative_subdivided100 AS ( > SELECT *, st_subdivide(geom,100) AS geom_subdivided100 > FROM gaul_administrative > ); That _looks_ like a query that could blow through space rather quickly. How large is gaul_administrative in GB? In rows? How many subdivisions does this generate per row on average? How many bytes are those subdivisions on average? Multiply those numbers, and you get a fair indication of how much space that table requires. Does that fit in 1.1TB? Frankly, I don’t see the point of repeating the geom column in that table after dividing it up, aren’t you just wasting space there? The original is still available in the source table, after all. And what about the other columns that you clone into this new table? My suggestion would be to minimise that with an operation like the above. This would be one of those situations where I’d introduce a (bigint) surrogate key and use that to link the two tables together, even though I’m a natural key person. Something like this (I had to make a few assumptions about your source table): CREATE TABLE _gaul_administrative_subdivided100 ( id bigint NOT NULL, geomid bigserial NOT NULL, geom_subdivided100 geometry NOT NULL ); -- If you care about referential integrity here ALTER TABLE _gaul_administrative_subdivided100 ADD FOREIGN KEY (id) REFERENCES gaul_administrative (id) ON DELETE CASCADE ON UPDATE RESTRICT; INSERT INTO _gaul_administrative_subdivided100 (id, geom_subdivided100) SELECT id, st_subdivide(geom,100) FROM gaul_administrative; (…) > It seems that it is not a problem of space. I wouldn’t be too sure of that, but I don’t have the numbers. > Command df -h returns: > Filesystem Size Used Avail Use% Mounted on > tmpfs 6.3G 1.1M 6.3G 1% /run > /dev/sda 1.3T 164G 1.1T 14% / > tmpfs 32G 3.2M 32G 1% /dev/shm > tmpfs 5.0M 0 5.0M 0% /run/lock > tmpfs 6.3G 4.0K 6.3G 1% /run/user/1000 Is this after the error and after PG finished rolling back? What does this show while that query is going on? If gaul_administrative takes up a large part of those 164G, then you probably don’t have enough space for a 10x multiplication in size from the original table to the new table. And that happening looks entirely possible from the information you provided. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
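To put numbers on the questions above, something along these lines reports the current size and row count of the source table (the count does scan the whole table):

    SELECT pg_size_pretty(pg_total_relation_size('gaul_administrative')) AS total_size
         , count(*) AS row_count
    FROM gaul_administrative;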
Re: Making Sure Primary and Secondary Keys Alligns
> On 11 Sep 2023, at 16:09, Anthony Apollis wrote: > > Fact Table: > CREATE TABLE IF NOT EXISTS system."IMETA_ZTRB_MP$F_ZTBR_TA_BW" > ( (…) > ) > and Dimension:CREATE TABLE IF NOT EXISTS dim."IMETA_Region_Mapping" > ( (…) > ) > How do i get that all these columns that are joined are aligned, meaning if > it starts with 1 in one column it must be 1 in the other columns. Or how > would you assign unique keys in Postgres? Are you perhaps asking how to define FOREIGN KEY CONSTRAINTs? https://www.postgresql.org/docs/15/ddl-constraints.html#DDL-CONSTRAINTS-FK Regards, Alban Hertroys -- There is always an exception to always.
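In case foreign key constraints are indeed what you are after, a hypothetical sketch; the quoted DDL elides the column names, so "Region_Code" is an assumption here, and the referenced column needs a PRIMARY KEY or UNIQUE constraint on the dimension table:

    ALTER TABLE system."IMETA_ZTRB_MP$F_ZTBR_TA_BW"
        ADD CONSTRAINT fk_ztbr_region
        FOREIGN KEY ("Region_Code")                        -- hypothetical column
        REFERENCES dim."IMETA_Region_Mapping" ("Region_Code");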
Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones
> On 4 Oct 2023, at 17:58, Lincoln Swaine-Moore wrote: > > > SELECT > > sub.gs AS ts_in_utc > > ,sub.gs AT TIME ZONE 'America/New_York' AS ts_in_local_tz > > ,date_bin('1 days', sub.gs AT TIME ZONE 'America/New_York', > > '2023-01-01') > > FROM (SELECT generate_series('2023-11-03 00:00:00Z'::timestamptz, > > '2023-11-07 00:00:00Z'::timestamptz, '5 minutes'::interval) AS gs) AS sub > > WHERE > > sub.gs >= '2023-11-05 00:00:00 America/New_York'::timestamptz AND > > sub.gs < '2023-11-06 00:00:00 America/New_York'::timestamptz > > I believe this query will be funky around DST borders, because `sub.gs AT > TIME ZONE 'America/New_York'` will be localized in a way that erases the > difference between hours with different offsets, which are genuinely > different. For instance, I ran this and there are two rows within it that > look like: > > ` 2023-11-05 05:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00` > and > ` 2023-11-05 06:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00` > > I think that the non-unique second column will pose an issue for the date > binning at a resolution finer than 1 day. What I do in such cases is to add an extra column with the UTC timestamp to serve as a linear scale to the local timestamps. That also helps with ordering buckets in reports and such during DST changes (especially the ones where an hour repeats). Filtering in the queries occurs on the UTC scale, with the local timestamps calculated back to UTC, so that it doesn’t matter whether the local time has 23, 24, 25 or 24.5 or 23.5 or whatever number of hours on a date-range - it all maps back because UTC always has 24 hours. Something that I also do is to create calendar tables and views for the buckets, with 2 timestamps per bucket: the start of the bucket and the start of the next bucket. That gives you a range to put actual timestamps between (not BETWEEN between, because that’s inclusive). You can store and index that, as opposed to generated results using generate_series - basically I materialise those. For hours and quarter hours I found it to be fairly convenient to base a view on a join between a date calendar and an (quarter of an) hour per UTC day table, but materialising that with some indexes may perform better (at the cost of disk space). I do materialise that currently, but our database server doesn’t have a lot of memory so I’m often not hitting the cache and performance suffers a bit (infrastructure is about to change for the better though). Regards, Alban Hertroys -- There is always an exception to always.
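A sketch of such a materialised bucket table for hourly buckets; the measurements table and its ts_utc column are assumed names:

    CREATE TABLE bucket_hour (
        bucket_start_utc timestamptz PRIMARY KEY,
        next_bucket_utc  timestamptz NOT NULL
    );

    INSERT INTO bucket_hour
    SELECT gs, gs + interval '1 hour'
    FROM generate_series('2023-01-01 00:00+00'::timestamptz,
                         '2024-01-01 00:00+00'::timestamptz,
                         interval '1 hour') AS gs;

    -- Half-open ranges, not BETWEEN, so a timestamp lands in exactly one bucket.
    SELECT b.bucket_start_utc, count(*)
    FROM bucket_hour b
    JOIN measurements m ON  m.ts_utc >= b.bucket_start_utc
                        AND m.ts_utc <  b.next_bucket_utc
    GROUP BY b.bucket_start_utc
    ORDER BY b.bucket_start_utc;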
Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones
> On 4 Oct 2023, at 21:30, Lincoln Swaine-Moore wrote: > > > What I do in such cases is to add an extra column with the UTC timestamp to > > serve as a linear scale to the local timestamps. That also helps with > > ordering buckets in reports and such during DST changes (especially the > > ones where an hour repeats). > > > For hours and quarter hours I found it to be fairly convenient to base a > > view on a join between a date calendar and an (quarter of an) hour per UTC > > day table, but materialising that with some indexes may perform better (at > > the cost of disk space). I do materialise that currently, but our database > > server doesn’t have a lot of memory so I’m often not hitting the cache and > > performance suffers a bit (infrastructure is about to change for the better > > though). > > That's an interesting idea, but I'm not sure I fully understand. Assuming > you're aggregating data: what do you group by? For instance, at an hourly > resolution, if you group by both the UTC timestamp and the local one, you > might end up, say, dividing an hour-long bucket in two for time zones with > half-hour-based offsets, no? > > Thanks for the detailed writeup! Definitely helpful to learn more about what > people are using in production to handle this sort of thing. Frankly, I haven’t had to deal with half-hour-based offsets since I got this idea. I’m using it with whole-hour-offsets, where it doesn’t affect bin boundaries. I suppose you could enrich your data in a similar fashion by adding a (virtual) column with the (client) time zone offset, so you could group by local timestamp + offset. That’s not going to match index expressions though, I fear… For sorting, UTC timestamps would probably still be a useful addition, but they’re simple to add by either converting back from the local timestamps or by taking the min and max of the UTC-based column on the above grouping. Both solutions require that offset, obviously. Now of course there are only 2 hours a year where this happens. Our data scientists chose to ignore the problem for simplicity’s sake and be slightly off with their numbers on those dates. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
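For what it's worth, a hypothetical way to derive that offset per row and group on local timestamp plus offset, so the repeated local hour around the DST change stays split into two buckets (measurements and ts_utc are assumed names):

    SELECT ts_utc AT TIME ZONE 'America/New_York'  AS ts_local
         , (ts_utc AT TIME ZONE 'America/New_York')
           - (ts_utc AT TIME ZONE 'UTC')           AS utc_offset
         , count(*)
    FROM measurements
    GROUP BY 1, 2
    ORDER BY min(ts_utc);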
Re: Issue in compiling postgres on latest macOS 14.1.1
> On 13 Nov 2023, at 18:00, Tom Lane wrote: (…) > * If you use MacPorts or Homebrew, maybe that's out of date? > Try removing the associated directories from your PATH to see > if it works better. Perhaps even worse: you had old binaries from an Intel architecture that were migrated onto a new ARM-based machine? In that case the Homebrew uninstall scripts won’t even work anymore - at least not w/o Rosetta 2 - as they’re Intel-based too. Apple's Migration Assistant can also work too well, I found. Alban Hertroys -- There is always an exception to always.
Re: Time zone offset in to_char()
> On 11 Jan 2024, at 17:43, Adrian Klaver wrote: > > On 1/11/24 07:06, Alban Hertroijs wrote: >> Hi all, > >> In the above, I worked around the issue using a couple of user-defined >> functions in PG. That should give a reasonable idea of the desired >> functionality, but it's not an ideal solution to my problem: >> 1). The first function has as a drawback that it changes the time zone for >> the entire transaction (not sufficiently isolated to my tastes), while >> 2). The second function has the benefit that it doesn't leak the time zone >> change, but has as drawback that the time zone is now hardcoded into the >> function definition, while > > I don't think the set_config and SET are acting the way you think they are: > > set_config(https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-SET) > > " > set_config ( setting_name text, new_value text, is_local boolean ) → text > > Sets the parameter setting_name to new_value, and returns that value. If > is_local is true, the new value will only apply during the current > transaction. If you want the new value to apply for the rest of the current > session, use false instead. This function corresponds to the SQL command SET. > > set_config('log_statement_stats', 'off', false) → off" > " I tried this like so: select ToDatetimeOffset(current_timestamp, 'Europe/Amsterdam'), to_char(current_timestamp, ‘-MM-DD HH24:MI:SS.SU0 TZH:TZM’). The result of the second call was based on time zone ‘Europe/Amsterdam’, where it wasn’t when called outside the transaction (when it was based on UTC corresponding to the server time zone). So the time zone set with set_config(…, …, true) appeared to leak out of function scope and applied to transaction scope (as described in the quoted text). For brevity I could run that query tomorrow when I’m back at work. > SET(https://www.postgresql.org/docs/current/sql-set.html) > > "If SET (or equivalently SET SESSION) is issued within a transaction that is > later aborted, the effects of the SET command disappear when the transaction > is rolled back. Once the surrounding transaction is committed, the effects > will persist until the end of the session, unless overridden by another SET. > > The effects of SET LOCAL last only till the end of the current transaction, > whether committed or not. A special case is SET followed by SET LOCAL within > a single transaction: the SET LOCAL value will be seen until the end of the > transaction, but afterwards (if the transaction is committed) the SET value > will take effect. It says transaction again here. > The effects of SET or SET LOCAL are also canceled by rolling back to a > savepoint that is earlier than the command. > > If SET LOCAL is used within a function that has a SET option for the same > variable (see CREATE FUNCTION), the effects of the SET LOCAL command > disappear at function exit; that is, the value in effect when the function > was called is restored anyway. This allows SET LOCAL to be used for dynamic > or repeated changes of a parameter within a function, while still having the > convenience of using the SET option to save and restore the caller's value. > However, a regular SET command overrides any surrounding function's SET > option; its effects will persist unless rolled back. > " I didn’t succeed in calling SET LOCAL TIMEZONE from within the function. Could be I missed something, then Google (stackoverflow) pointed me to set_config(). 
I did manage to apply it to the second function header, which I think behaves such that the time zone change stays within function scope. Right now I’m not 100% sure that I verified that. More to check tomorrow. Frankly, I do hope that you’re right here, that would make my work easier. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
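For the archives, a sketch of what attaching the time zone to the function header could look like; the signature, body and format string are guesses at the second function, the SET clause is the relevant part:

    CREATE OR REPLACE FUNCTION ciscache.ToDatetimeOffsetNL(ts timestamptz)
    RETURNS text
    LANGUAGE sql
    STABLE
    SET timezone = 'Europe/Amsterdam'   -- restored automatically at function exit
    AS $$
        SELECT to_char(ts, 'YYYY-MM-DD HH24:MI:SS.US TZH:TZM');
    $$;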
Re: Time zone offset in to_char()
> On 11 Jan 2024, at 18:27, Adrian Klaver wrote: > > On 1/11/24 08:48, Adrian Klaver wrote: >> On 1/11/24 08:04, Alban Hertroijs wrote: > >>> The drawback, as mentioned, being that we need to maintain those functions >>> in each deployment, which is a bit of a hassle (albeit a minor one) because >>> we need to customise both the TDV side and the PostgreSQL side in that >>> case. Our preferred solution would be to just add a few entries to the TDV >>> database-specific capabilities file (as described in my initial message) >> Are you referring to?: >> "It currently have this: >> ToDatetimeOffsetNL(~any) : ciscache.ToDatetimeOffsetNL($1) >> ToDatetimeOffset(~any,~any) : ciscache.ToDatetimeOffset($1, $2) >> " > > It finally dawned on me, you want to replace the user defined functions above > with Postgres builtins only. Try as I might I could not come with that > solution. Exactly. I was having the same problem of finding a solution, quite to my surprise. >> I thought the issue there was maintaining the two Postgres functions? Yup, those two functions in fact. There will be at least 3 separate deployments, while maintenance of the database(-schema) contents is the responsibility of the 3rd party application (TDV). PG is used as a caching DB here, we therefore intend to treat the data in it as volatile; it shouldn’t hurt if we decide to recreate the caches from scratch from source data. Having custom code in there not under control of the 3rd party application breaks that guideline. If they’re necessary, then so be it, but I can’t shake the feeling that we can achieve this without custom code in the database. Regards, Alban Hertroys -- There is always an exception to always.