Re: Enforce primary key on every table during dev?

2018-03-01 Thread Alban Hertroys

> On 1 Mar 2018, at 1:47, Melvin Davidson  wrote:

> > I think you would be better off having an automated report which alerts
> >you to tables lacking a primary key and deal with that policy through
> >other means. 
> 
> Perhaps a better solution is to have a meeting with the developers and 
> explain to them 
> WHY the policy of enforcing a primary key is important. Also, explain the 
> purpose of
> primary keys and why it is not always suitable to just use an integer or 
> serial as the key,
> but rather why natural unique (even multi column) keys are better. But this 
> begs the question, 
> why are "developers" allowed to design database tables? That should be the 
> job of the DBA! At
> the very minimum, the DBA should be reviewing and have the authority to 
> approve or disapprove 
> of table/schema designs/changes.

Not to mention that not all types of tables necessarily have suitable 
candidates for a primary key. You could add a surrogate key based on a serial 
type, but in such cases that may not serve any purpose other than to have some 
arbitrary primary key.

An example of such tables is a monetary transaction table that contains records 
for deposits and withdrawals to accounts. It will have lots of foreign key 
references to other tables, but rows containing the same values are probably 
not duplicates.
Adding a surrogate key to such a table just adds overhead, although it could 
be useful in case specific rows need updating or deleting without also 
modifying the other rows with that same data. Normally though, only insertions 
and selections happen on such tables, and updates or deletes are absolutely 
forbidden - corrections happen by inserting rows with an opposite transaction.
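
For illustration, a minimal sketch of such a ledger table (all names here are
hypothetical, they are not from the thread):

CREATE TABLE account_transaction (
    account_id  integer       NOT NULL,  -- would reference an accounts table
    booked_on   date          NOT NULL,
    amount      numeric(12,2) NOT NULL,  -- negative amounts are withdrawals
    description text
);

Two identical deposits for the same account on the same day are both valid,
distinct rows, so no combination of these columns makes a reliable natural
key; a serial column bolted on here would serve no purpose other than to have
some arbitrary primary key.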

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Enforce primary key on every table during dev?

2018-03-01 Thread Alban Hertroys
On 1 March 2018 at 17:22, Steven Lembark  wrote:
>
>> On 03/01/2018 02:20 AM, Alban Hertroys wrote:
>> [snip]
>> > Not to mention that not all types of tables necessarily have
>> > suitable candidates for a primary key. You could add a surrogate
>> > key based on a serial type, but in such cases that may not serve
>> > any purpose other than to have some arbitrary primary key.
>> >
>> > An example of such tables is a monetary transaction table that
>> > contains records for deposits and withdrawals to accounts.

(...)

> Start with Date's notion that a database exists to correctly represent
> data about the real world. Storing un-identified data breaks this
> since we have no idea what the data means or have any good way of
> getting it back out. Net result is that any workable relational
> database will have at least one candidate key for any table in it.

(...)

> If you have a design with un-identified data it means that you haven't
> normalized it properly: something is missing from the table with
> un-identifiable rows.

While that holds true for a relational model, in reporting for
example, it is common practice to denormalize data without a
requirement to be able to identify a single record. The use case for
such tables is providing quick aggregates on the data. Often this
deals with derived data. It's not that uncommon to not have a primary
or even a uniquely identifiable key on such tables.
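
As an illustration, such a denormalised reporting table might look like this
(hypothetical names, not from the thread):

CREATE TABLE sales_report_daily (
    sale_date    date,
    region       text,
    product_name text,
    units_sold   bigint,
    revenue      numeric(14,2)
);

It gets bulk-loaded from the normalised source and is only ever queried with
aggregates (SUM, AVG, ...), so there is no need to be able to single out an
individual row.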

I do not claim that having a primary key on a table is a bad thing,
but I do disagree that a primary key is a requirement for all tables.

More generally: For every rule there are exceptions. Even for this one.
-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.



Re: Programmatically duplicating a schema

2018-03-13 Thread Alban Hertroys

> On 13 Mar 2018, at 4:23, matt.f...@internode.on.net wrote:
> 
> Hi all,
> 
> What is a reliable way to programmatically & generically populate an empty 
> schema with all the objects in the public schema as a template?
> 
> We are using the multi tenancy ruby gem Apartment ( 
> https://github.com/influitive/apartment ), which was recently broken by the 
> changes made to pg_dump to address CVE-2018-1058 
> https://nvd.nist.gov/vuln/detail/CVE-2018-1058
> 
> Apartment attempts to duplicate the public schema whenever creating a new 
> schema by running:
> 
> pg_dump -s -x -O -n public
> 
> to get the SQL statements needed to recreate the public schema & then 
> executes the pg_dump's sql output after creating & switching to the new 
> schema ( via set search_path to ; )
> 
> After the fix to CVE-2018-1058, all table references in pg_dump's output 
> (including within SQL of stored procedures) are prefixed by the public. 
> schema, which means you cannot just reuse this output in a different schema 
> context without first manually changing the sql.
> As a temporary fix so we can handle new customers in production, we are using 
> a regex search/replace for public. in the pg_dump output, but clearly this is 
> not a reliable solution for a generic gem such as Apartment.

In my opinion, it makes sense that if you have the option of dumping the 
contents of a specific schema, it should be possible to restore that dump into 
a different schema. Unfortunately, looking at pg_restore, there does not appear 
to be such an option (yet).

I'd even go so far to suggest that every single object type that can be dumped 
with pg_dump (single database, single schema, single table, single function, 
etc) should be restorable under a different name. I realise that this could 
make pg_restore options potentially more confusing.

I suppose people currently manually edit the dumps to this effect, but that 
risks silent corruption of data when for example a data value contains a string 
such as 'The building is now open to public.'. Regular expressions don't know 
the difference between data and identifiers in a dump file - pg_restore does.
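
To illustrate that failure mode with a hedged example (the table and the
target schema name are made up): a dump line such as

INSERT INTO public.notices (body) VALUES ('The building is now open to public.');

after a blind search/replace of "public." with "tenant_x." becomes

INSERT INTO tenant_x.notices (body) VALUES ('The building is now open to tenant_x.');

so the stored text is silently changed along with the schema qualification.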

Whether psql needs the same treatment? I'd qualify this as "advanced" use and 
limit it to pg_restore.

But then, I'm just a list-lurker, I currently have but the option of voicing my 
opinion.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Use pgloader with FDW

2018-03-23 Thread Alban Hertroys
I suppose MySQL has the concept of views. Perhaps you can create a
view over the table that translates those 0-datetimes to NULL and have
the FDW connect to the view instead?
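
A hedged sketch of what that could look like on the MySQL side (table and
column names are made up, and whether the zero-date literal is accepted
depends on the server's sql_mode):

CREATE VIEW orders_for_fdw AS
SELECT id,
       NULLIF(created_at, '0000-00-00 00:00:00') AS created_at,
       amount
FROM   orders;

The foreign table would then be defined against orders_for_fdw instead of
orders, so the FDW only ever sees NULL or valid datetimes.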

On 23 March 2018 at 14:27, Patricia DECHANDOL
 wrote:
> Thanks a lot Adrian.
> We can't update the mysql columns values, so I will adopt the datetime -> 
> text solution.
> Thanks again
>
>
> -Message d'origine-
> De : Adrian Klaver 
> Envoyé : vendredi 23 mars 2018 14:25
> À : Patricia DECHANDOL ; 
> pgsql-general@lists.postgresql.org
> Objet : Re: Use pgloader with FDW
>
> On 03/23/2018 06:07 AM, Patricia DECHANDOL wrote:
>> Hello Adrian,
>>
>> So, if I well understand, the only solution is to wrap the mysql datetime 
>> columns to "text" columns in my foreign tables.
>> And then use a cast function to convert from text to date in Postgre when I 
>> want to use these columns ?
>
> There is also the option of changing the values in the MySQL database to 
> either an actual datetime or NULL. Though there is the potential issue of 
> what that would do to code that is pulling from the MySQL database.
>
>>
>> No other way.
>> The pgloader can't be used by the FDW to manage this point ?
>
> It has been a while since I used pgloader, but from what I remember it is a 
> tool for doing the initial migration of data from MySQL/Sqlite/SQL Server to 
> Postgres. What you seem to be looking for is pgloader to sit between the 
> MySQL database and the Postgres one and do the transformation in real time. 
> AFAIK that is not possible.
>
>>
>>
>>
>> -Message d'origine-
>> De : Adrian Klaver  Envoyé : vendredi 23
>> mars 2018 13:52 À : Patricia DECHANDOL ;
>> pgsql-general@lists.postgresql.org
>> Objet : Re: Use pgloader with FDW
>>
>> On 03/23/2018 03:55 AM, Patricia DECHANDOL wrote:
>>> Hello,
>>>
>>> I'm new to Postgresql and try to use the FDW with mysql database.
>>>
>>> Everything is OK to create my FDW and foreign tables, but I meet a
>>> problem when I try to do a select on a FOREIGN table containing
>>> datetime column which contains the value "0000-00-00 00:00:00" in mysql.
>>
>> That is MySQL's version of NULL for datetimes. It is not a valid datetime 
>> though.
>>
>>>
>>> Select on the foreign table fails.
>>>
>>> The mysql datetime column has been automatically wrapped to
>>> "timestamp without timezone" column in the foreign table by the
>>> instruction "import foreign schema" that I used.
>>
>>>
>>> How can I deal with this ?
>>
>> https://github.com/EnterpriseDB/mysql_fdw/issues/38
>>
>>>
>>> I read about the pgloader which can manage this kind of problem, but
>>> can't figure out how to use it with FDW.
>>>
>>> Thanks for your help.
>>>
>>
>>
>> --
>> Adrian Klaver
>> adrian.kla...@aklaver.com
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com



-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.



Re: Concatenate of values in hierarchical data

2018-04-04 Thread Alban Hertroys
010,439755
> 'CTS', '536', 2009,  2010,439765
> 
> There are primary keys on indexlist (on indexlistid) and indexlistapp (on 
> indexlistid) but there is no foreign key pointing to the other table. The 
> indexlistid in indexlist 
> points directly to the indexlistid in indexlistapp. The parent column is 
> simply calculated from the treelevel. The tree is built entirely from the seq 
> and treelevel.
> 
> I need the data to be returned in this format:
> 
> app   price  ic
> ---
> 'Front-V-Series-opt J56-R.',  '$693','536-01132AR'
> 'Front-V-Series-opt J56-L.',  '$693','536-01132AL'
> 'Front-V-Series-opt J63-R.',  null,  '536-01130R'
> 'Front-V-Series-opt J63-L.',  null,  '536-01131L'
> 'Front-exc. V-Series-opt JE5-AWD-R.', null,  '536-01142'
> 'Front-exc. V-Series-opt JE5-AWD-L.', null,  '536-01143'
> 'Front-exc. V-Series-opt JE5-RWD-R.', null,  '536-01143'
> 'Front-exc. V-Series-opt JE5-RWD-L.', null,  '536-01142'
> 'Front-exc. V-Series-opt J55-AWD-R.', null,  '536-01136'
> 'Front-exc. V-Series-opt J55-AWD-L.', null,  '536-01137'
> 'Front-exc. V-Series-opt J55-RWD-R.', null,  '536-01137'
> 'Front-exc. V-Series-opt J55-RWD-L.', null,  '536-01136'
> 'Rear-Base-opt JE5-R.',   null,  '536-01038R'
> 'Rear-Base-opt JE5-L.',   null,  '536-01039L'
> 'Rear-Base-opt J55-R.',   null,  '536-01042R'
> 'Rear-Base-opt J55-L.',   null,  '536-01043L'
> 'Rear-V-Series-R.',   '$403.00', '536-01134AR'
> 'Rear-V-Series-L.',   '$466.00', '536-01135AL'
> 
> I am unsure how to do this in SQL.

(…)

> Also, using this recursive sql 
> (REF:https://stackoverflow.com/questions/26280379/how-to-concatenate-field-values-with-recursive-query-in-postgresql),
>  I'm able to get it fairly close. 
> Just not sure why it is returning 476 rows vs. 34. 
> http://sqlfiddle.com/#!15/ca1ee/3
> 
> WITH RECURSIVE the_tree AS (
> 
>   SELECT g.seq, g.app::TEXT, g.price, g.ic, g.treelevel::INTEGER
>   FROM indexlistapp g
>   WHERE g.indexlistid in (SELECT il.indexlistid 
>   FROM indexlist il
>   WHERE il.model =  'CTS'
>   AND ('2010' BETWEEN il.beginyear AND 
> il.endyear) 
>   AND il.type = '536')
>   AND g.treelevel = 1
> 
> 
>   UNION
> 
>   SELECT t.seq, t.app || ', ' || g.app AS app, t.price, t.ic, 
> t.treelevel::INTEGER + 1
>   FROM the_tree AS t
>   INNER JOIN indexlistapp g ON g.treelevel = t.treelevel + 1
>   WHERE g.indexlistid in (SELECT il.indexlistid 
>   FROM indexlist il
>   WHERE il.model =  'CTS'
>   AND ('2010' BETWEEN il.beginyear AND 
> il.endyear) 
>   AND il.type = '536')
> )
> SELECT * from the_tree;
> 
> 
> Can anyone help me out here?

Your JOIN in the recursive part of the CTE should probably be ON g.parent = 
t.treelevel AND g.indexlistid = t.indexlistid. Since both the initial and 
recursive part use the same value for indexlistid, the subquery in the 
recursive part is superfluous.

You should also take the order in seq into account for items on the same node, 
if that is possible in your data, otherwise there is no guarantee that you'll 
get your string components in the correct order.

I think you need something closer to this:

WITH RECURSIVE the_tree AS (
SELECT g.seq, g.app::TEXT, g.price, g.ic, g.treelevel::INTEGER, 
g.indexlistid
FROM indexlistapp g
WHERE g.indexlistid IN (
SELECT il.indexlistid
FROM indexlist il
WHERE il.model = 'CTS'
AND '2010' BETWEEN il.beginyear AND il.endyear
AND il.type = '536'
)
AND g.parent = 0

UNION ALL

SELECT t.seq, t.app || ', ' || g.app AS app, t.price, t.ic, 
g.treelevel, g.indexlistid
FROM the_tree t
INNER JOIN indexlistapp g ON g.parent = t.treelevel AND g.indexlistid = 
t.indexlistid
)
SELECT * FROM the_tree
ORDER BY seq;

Another observation is that you only seem interested in the leaf nodes of your 
tree. Recognising leaf nodes in your hierarchical query can be achieved by 
adding a column that tests the existence of a node with the current node as its 
parent - if it does not exist, this node is a leaf node.
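
As a rough sketch (reusing the column names from this thread, otherwise
untested against your data), such a column could look like:

SELECT t.*,
       NOT EXISTS (SELECT 1
                     FROM indexlistapp c
                    WHERE c.indexlistid = t.indexlistid
                      AND c.parent = t.treelevel) AS is_leaf
  FROM the_tree t;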

The result of SELECT * FROM the_tree can then be filtered to return leaf nodes 
only.


Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Same condition in the CTE and in the subsequent JOIN using it

2018-04-25 Thread Alban Hertroys

> On 25 Apr 2018, at 17:45, Alexander Farber  wrote:

(…)

> And here is the function source code:
> 
> CREATE OR REPLACE FUNCTION words_stat_scores(
> in_social integer,
> in_sid    text
> ) RETURNS TABLE (
> out_day   text,
> out_diff  numeric,
> out_score numeric
> ) AS
> $func$
> WITH cte AS (
> SELECT
> DATE_TRUNC('day', m.played) AS day,
> m.mid,
> EXTRACT(EPOCH FROM m.played - LAG(m.played) OVER 
> (PARTITION BY m.gid ORDER BY m.played))::int/60 AS diff
> FROM    words_moves m
> JOIN    words_games g ON (m.gid = g.gid)
> JOIN    words_social s ON (s.uid IN (g.player1, g.player2))
> WHERE   s.social = in_social   -- CAN 
> THIS BE REFERRED TO FROM BELOW?
> AND s.sid = in_sid
> AND m.played > CURRENT_TIMESTAMP - interval '1 month'
> )
> SELECT
> TO_CHAR(c.day, 'DD.MM.YYYY'),
> ROUND(AVG(c.diff)),
> ROUND(AVG(m.score), 1)
> FROM    words_moves m
> JOIN    cte c using(mid)
> JOIN    words_social s USING(uid)
> WHERE   s.social = in_social
> AND s.sid = in_sid
> AND m.action = 'play'
> GROUP BY c.day
> ORDER BY c.day;
> 
> $func$ LANGUAGE sql STABLE;
> 
> By looking at the above source code, do you think, that the condition being 
> used twice (the s.social = in_social AND s.sid = in_sid) is "too much" and 
> can be optimized? :-)

Actually, no. The conditions are part of different joins.

Within the CTE, you have a join that boils down to:

>   FROM    words_games g ON (m.gid = g.gid)
>   JOIN    words_social s ON (s.uid IN (g.player1, g.player2) AND s.social = 
> in_social AND s.sid = in_sid)


In your outer query, you have:

>   FROM    words_moves m
>   JOIN    words_social s ON (s.uid = m.uid AND s.social = in_social AND s.sid 
> = in_sid)


The joins are on different fields, in different tables even, so you can't just 
leave the conditions out because they filter different rows.

What you _can_ do is move the words_social JOIN and its conditions into a new 
CTE and join with that instead.  Something like so:

WITH words_in_social AS (
SELECT sid, uid
  FROM words_social
 WHERE social = in_social
   AND sid = in_sid
),
cte AS (
SELECT
DATE_TRUNC('day', m.played) AS day,
m.mid,
EXTRACT(EPOCH FROM m.played - LAG(m.played) OVER (PARTITION BY 
m.gid ORDER BY m.played))::int/60 AS diff
  FROM    words_moves m
  JOIN    words_games g ON (m.gid = g.gid)
  JOIN    words_in_social s ON (s.uid IN (g.player1, g.player2))
  WHERE   m.played > CURRENT_TIMESTAMP - interval '1 month'
)
SELECT
    TO_CHAR(c.day, 'DD.MM.YYYY'),
ROUND(AVG(c.diff)),
ROUND(AVG(m.score), 1)
  FROM words_moves m
  JOIN cte c using(mid)
  JOIN words_in_social s USING(uid)
 WHERE m.action = 'play'
 GROUP BY c.day
 ORDER BY c.day;

That may look cleaner, but your original query probably performs better, since 
CTE's also act as an optimisation fence.

BTW, I suggest to use a better name for your CTE than cte; I'd rather use a 
name that clarifies its purpose.

> Thank you for any hints, I apologize if my question is too specific and 
> difficult to answer...
> 
> Regards
> Alex

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: ON CONFLICT DO UPDATE

2018-05-10 Thread Alban Hertroys

> On 10 May 2018, at 7:13, tango ward  wrote:

>  ON CONFLICT (school_system_id,
>   student_id,
>   campus_name
>   ) DO UPDATE
>   SET school_system_id = 
> excluded.school_system_id,
>   student_id = excluded.student_id,
>   campus_name = excluded.campus_name

I'm pretty sure this ought to read:
ON CONFLICT (school_system_id, student_id, campus_name)
DO
   UPDATE SET modified = EXCLUDED.modified,
balance = EXCLUDED.balance,
balance_as_of = EXCLUDED.balance_as_of

Instead, you were re-assigning the keys (school_system_id, student_id, 
campus_name) to the same values again.
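
Put together as a hedged, self-contained example (the table name and the
VALUES are made up, the column names are taken from this thread, and it
assumes a unique constraint over the three key columns):

INSERT INTO student_balance
       (school_system_id, student_id, campus_name, modified, balance, balance_as_of)
VALUES (1, 42, 'Main Campus', now(), 100.00, current_date)
ON CONFLICT (school_system_id, student_id, campus_name)
DO UPDATE
   SET modified      = EXCLUDED.modified,
       balance       = EXCLUDED.balance,
       balance_as_of = EXCLUDED.balance_as_of;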

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: What to do when dynamic shared memory control segment is corrupt

2018-06-19 Thread Alban Hertroys


> On 18 Jun 2018, at 17:34, Sherrylyn Branchaw  wrote:
> 
> In the other case, the logs recorded
> 
> LOG:  all server processes terminated; reinitializing
> LOG:  dynamic shared memory control segment is corrupt
> LOG:  incomplete data in "postmaster.pid": found only 1 newlines while trying 
> to add line 7
> 
> In that case, the database did not restart on its own. It was 5 am on Sunday, 
> so the on-call SRE just manually started the database up, and it appears to 
> have been running fine since.

That rings a bell. Some versions of systemd apparently clean up shared memory 
belonging to a user when it detects the user logs out. ISTR that we had to 
disable that on our CentOS 7 server to stop crashes from happening.

More details here: https://wiki.postgresql.org/wiki/Systemd
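
If that is what bit you here, the usual mitigation (described on that wiki
page) is to tell systemd-logind to leave IPC objects of "logged out" users
alone, something along these lines:

# in /etc/systemd/logind.conf
RemoveIPC=no

# then reload the setting
systemctl restart systemd-logind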

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Is postorder tree traversal possible with recursive CTE's?

2018-06-19 Thread Alban Hertroys
Hi all,

I'm struggling with a hierarchical query where I'm tasked to calculate weights 
of items in an (exploded) Bill of Materials, based on the weights of their 
components. Not all components are measured with a weight, sometimes there are 
pieces, meters, areas, etc, and the hierarchy is of varying levels of depth.

It would help if I could track a sum() throughout the explosion that would 
write back onto parent rows when the recursion returns: postorder traversal.

I created a simplified example about making pizza:

CREATE TABLE ingredient (
name text NOT NULL
);

CREATE TABLE recipe (
name text NOT NULL,
ingredient text NOT NULL,
quantity numeric(6,2) NOT NULL,
unit text NOT NULL,
step integer NOT NULL
);

COPY ingredient (name) FROM stdin;
tomato
basil
salt
tomato sauce
flour
water
yeast
dough
pizza bottom
pizza
\.

COPY recipe (name, ingredient, quantity, unit, step) FROM stdin;
tomato sauce	tomato	100.00	g	1
dough	flour	150.00	g	1
tomato sauce	basil	10.00	g	2
pizza	pizza bottom	1.00	pcs	2
tomato sauce	salt	3.00	g	3
dough	salt	1.00	pinch	3
pizza	tomato sauce	1.00	pcs	1
pizza bottom	dough	1.00	pcs	2
dough	water	50.00	g	2
\.

ALTER TABLE ONLY ingredient
ADD CONSTRAINT ingredient_pkey PRIMARY KEY (name);

ALTER TABLE ONLY recipe
ADD CONSTRAINT recipe_pkey PRIMARY KEY (name, ingredient);

ALTER TABLE ONLY recipe
ADD CONSTRAINT recipe_ingredient_fkey FOREIGN KEY (ingredient) REFERENCES 
ingredient(name);

ALTER TABLE ONLY recipe
ADD CONSTRAINT recipe_name_fkey FOREIGN KEY (name) REFERENCES 
ingredient(name);


A query listing the recipe for 'pizza' would be as follows:
development=> with recursive pizza (name, step, ingredient, quantity, unit, 
rel_qty, path, weight)
as (
select
name, step, ingredient, quantity, unit
,   quantity::numeric(10,2)
,   step::text
,   case when unit = 'g' then quantity::numeric(10,2) else null end
  from recipe
 where name = 'pizza'
union all
select
recipe.name, recipe.step, recipe.ingredient, recipe.quantity, 
recipe.unit
,   (pizza.rel_qty * recipe.quantity)::numeric(10,2)
,   pizza.path || '.' || recipe.step
,   case when recipe.unit = 'g' then (pizza.rel_qty * 
recipe.quantity)::numeric(10,2) else null end
  from pizza
  join recipe on (recipe.name = pizza.ingredient)
)
select path, ingredient, quantity, rel_qty, unit, weight
  from pizza
 order by path;

 path  |  ingredient  | quantity | rel_qty | unit  | weight 
---+--+--+-+---+
 1 | tomato sauce | 1.00 |1.00 | pcs   |   
 1.1   | tomato   |   100.00 |  100.00 | g | 100.00
 1.2   | basil|10.00 |   10.00 | g |  10.00
 1.3   | salt | 3.00 |3.00 | g |   3.00
 2 | pizza bottom | 1.00 |1.00 | pcs   |   
 2.2   | dough| 1.00 |1.00 | pcs   |   
 2.2.1 | flour|   150.00 |  150.00 | g | 150.00
 2.2.2 | water|50.00 |   50.00 | g |  50.00
 2.2.3 | salt | 1.00 |1.00 | pinch |   
(9 rows)


With these results, I somehow need to calculate that the weights of 'tomato 
sauce', 'dough' and 'pizza bottom' are 113 g, 200 g and 200 g respectively, 
bringing the total weight of 'pizza' to 313 g.

My first thought was to traverse the result of this recursive CTE using another 
one, but in the opposite direction. But since this tends to be kept as a 
temporary materialized result set with no indices, that's not performing great 
and it adds a fair amount of complexity to the query too.

Then I realised that if we somehow could track the sum() of 'weight' throughout 
exploding these recipe items, by using a postorder tree traversal, the desired 
result would be readily available to pick up when the recursive CTE travels up 
through the hierarchy.

In above example; When the CTE would reach '1.3 salt', it would write the 
summed 'weight' value 113 back on the result for '1 tomato sauce' and when it 
reached '2.2.2 salt' it would write back 200 to '2.2 dough' and then 200 to '2 
pizza bottom'.

Is that possible?

I've seen a couple of "solutions" on the internet that just summed up the 
results of the CTE, but that won't do as it would not put the correct weights 
onto intermediate levels of the tree as far as I can see (in above, the weight 
of 'dough').


Regards,

Alban Hertroys


PS. Don't try to make pizza using this recipe, it probably won't succeed. I 
forgot the yeast, for one thing, and quantities are probably way off. Not t

Re: Is postorder tree traversal possible with recursive CTE's?

2018-06-19 Thread Alban Hertroys
3.00 |3.00 | g |   3.00 |   313.00
>  2 | pizza bottom | 1.00 |1.00 | pcs   |    |   313.00
>  2.2   | dough| 1.00 |1.00 | pcs   ||   313.00
>  2.2.1 | flour|   150.00 |  150.00 | g | 150.00 |   313.00
>  2.2.2 | water|50.00 |   50.00 | g |  50.00 |   313.00
>  2.2.3 | salt | 1.00 |1.00 | pinch ||   313.00
> (9 rows)
> 
> 
> 
> 
> 
> On Tue, 19 Jun 2018 at 08:39, Alban Hertroys 
> (haram...@gmail.com) wrote:
> Hi all,
> 
> I'm struggling with a hierarchical query where I'm tasked to calculate 
> weights of items in an (exploded) Bill of Materials, based on the weights of 
> their components. Not all components are measured with a weight, sometimes 
> there are pieces, meters, areas, etc, and the hierarchy is of varying levels 
> of depth.
> 
> It would help if I could track a sum() throughout the explosion that would 
> write back onto parent rows when the recursion returns: postorder traversal.
> 
> I created a simplified example about making pizza:
> 
> CREATE TABLE ingredient (
> name text NOT NULL
> );
> 
> CREATE TABLE recipe (
> name text NOT NULL,
> ingredient text NOT NULL,
> quantity numeric(6,2) NOT NULL,
> unit text NOT NULL,
> step integer NOT NULL
> );
> 
> COPY ingredient (name) FROM stdin;
> tomato
> basil
> salt
> tomato sauce
> flour
> water
> yeast
> dough
> pizza bottom
> pizza
> \.
> 
> COPY recipe (name, ingredient, quantity, unit, step) FROM stdin;
> tomato sauce	tomato	100.00	g	1
> dough	flour	150.00	g	1
> tomato sauce	basil	10.00	g	2
> pizza	pizza bottom	1.00	pcs	2
> tomato sauce	salt	3.00	g	3
> dough	salt	1.00	pinch	3
> pizza	tomato sauce	1.00	pcs	1
> pizza bottom	dough	1.00	pcs	2
> dough	water	50.00	g	2
> \.
> 
> ALTER TABLE ONLY ingredient
> ADD CONSTRAINT ingredient_pkey PRIMARY KEY (name);
> 
> ALTER TABLE ONLY recipe
> ADD CONSTRAINT recipe_pkey PRIMARY KEY (name, ingredient);
> 
> ALTER TABLE ONLY recipe
> ADD CONSTRAINT recipe_ingredient_fkey FOREIGN KEY (ingredient) REFERENCES 
> ingredient(name);
> 
> ALTER TABLE ONLY recipe
> ADD CONSTRAINT recipe_name_fkey FOREIGN KEY (name) REFERENCES 
> ingredient(name);
> 
> 
> A query listing the recipe for 'pizza' would be as follows:
> development=> with recursive pizza (name, step, ingredient, quantity, unit, 
> rel_qty, path, weight)
> as (
> select
> name, step, ingredient, quantity, unit
> ,   quantity::numeric(10,2)
> ,   step::text
> ,   case when unit = 'g' then quantity::numeric(10,2) else null 
> end
>   from recipe
>  where name = 'pizza'
> union all
> select
> recipe.name, recipe.step, recipe.ingredient, recipe.quantity, 
> recipe.unit
> ,   (pizza.rel_qty * recipe.quantity)::numeric(10,2)
> ,   pizza.path || '.' || recipe.step
> ,   case when recipe.unit = 'g' then (pizza.rel_qty * 
> recipe.quantity)::numeric(10,2) else null end
>   from pizza
>   join recipe on (recipe.name = pizza.ingredient)
> )
> select path, ingredient, quantity, rel_qty, unit, weight
>   from pizza
>  order by path;
> 
>  path  |  ingredient  | quantity | rel_qty | unit  | weight 
> ---+--+--+-+---+
>  1 | tomato sauce | 1.00 |1.00 | pcs   |   
>  1.1   | tomato   |   100.00 |  100.00 | g | 100.00
>  1.2   | basil|10.00 |   10.00 | g |  10.00
>  1.3   | salt | 3.00 |3.00 | g |   3.00
>  2 | pizza bottom | 1.00 |1.00 | pcs   |   
>  2.2   | dough| 1.00 |1.00 | pcs   |   
>  2.2.1 | flour|   150.00 |  150.00 | g | 150.00
>  2.2.2 | water|50.00 |   50.00 | g |  50.00
>  2.2.3 | salt | 1.00 |1.00 | pinch |   
> (9 rows)
> 
> 
> With these results, I somehow need to calculate that the weights of 'tomato 
> sauce', 'dough' and 'pizza bottom' are 113 g, 200 g and 200 g respectively, 
> bringing the total weight of 'pizza' to 313 g.
> 
> My first thought was to traverse the result of this recursive CTE using 
> another one, but in the opposite direction. But since this tends to be kept 
> as a temporary materialized result set with no indices, that's n

Re: Load data from a csv file without using COPY

2018-06-19 Thread Alban Hertroys


> On 19 Jun 2018, at 22:16, Ravi Krishna  wrote:
> 
> In order to test a real life scenario (and use it for benchmarking) I want to 
> load large number of data from csv files.  
> The requirement is that the load should happen like an application writing to 
> the database ( that is, no COPY command). 
> Is there a tool which can do the job.  Basically parse the csv file and 
> insert it to the database row by row.
> 
> thanks

I think an easy approach would be to COPY the CSV files into a separate 
database using psql's \copy command and then pg_dump that as separate insert 
statements with pg_dump --inserts.
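
A rough sketch of that approach (file, table and database names are made up):

-- in psql, connected to a scratch database that already contains the table:
\copy staging_table from 'data.csv' csv header

-- then, from the shell:
pg_dump --inserts --table=staging_table scratch_db > inserts.sql

The resulting inserts.sql contains one INSERT statement per row, which can be
replayed against the target database much like an application would write it.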

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Is postorder tree traversal possible with recursive CTE's?

2018-06-20 Thread Alban Hertroys
 313.00
>>  1.3   | salt     |     3.00 |3.00 | g |   3.00 |   313.00
>>  2 | pizza bottom | 1.00 |1.00 | pcs   ||   313.00
>>  2.2   | dough| 1.00 |1.00 | pcs   ||   313.00
>>  2.2.1 | flour|   150.00 |  150.00 | g | 150.00 |   313.00
>>  2.2.2 | water|50.00 |   50.00 | g |  50.00 |   313.00
>>  2.2.3 | salt | 1.00 |1.00 | pinch ||   313.00
>> (9 rows)
>>
>>
>>
>>
>>
>> On Tue, 19 Jun 2018 at 08:39, Alban Hertroys
>> (haram...@gmail.com) wrote:
>>>
>>> Hi all,
>>>
>>> I'm struggling with a hierarchical query where I'm tasked to calculate
>>> weights of items in an (exploded) Bill of Materials, based on the weights of
>>> their components. Not all components are measured with a weight, sometimes
>>> there are pieces, meters, areas, etc, and the hierarchy is of varying levels
>>> of depth.
>>>
>>> It would help if I could track a sum() throughout the explosion that
>>> would write back onto parent rows when the recursion returns: postorder
>>> traversal.
>>>
>>> I created a simplified example about making pizza:
>>>
>>> CREATE TABLE ingredient (
>>> name text NOT NULL
>>> );
>>>
>>> CREATE TABLE recipe (
>>> name text NOT NULL,
>>> ingredient text NOT NULL,
>>> quantity numeric(6,2) NOT NULL,
>>> unit text NOT NULL,
>>> step integer NOT NULL
>>> );
>>>
>>> COPY ingredient (name) FROM stdin;
>>> tomato
>>> basil
>>> salt
>>> tomato sauce
>>> flour
>>> water
>>> yeast
>>> dough
>>> pizza bottom
>>> pizza
>>> \.
>>>
>>> COPY recipe (name, ingredient, quantity, unit, step) FROM stdin;
>>> tomato sauce	tomato	100.00	g	1
>>> dough	flour	150.00	g	1
>>> tomato sauce	basil	10.00	g	2
>>> pizza	pizza bottom	1.00	pcs	2
>>> tomato sauce	salt	3.00	g	3
>>> dough	salt	1.00	pinch	3
>>> pizza	tomato sauce	1.00	pcs	1
>>> pizza bottom	dough	1.00	pcs	2
>>> dough	water	50.00	g	2
>>> \.
>>>
>>> ALTER TABLE ONLY ingredient
>>> ADD CONSTRAINT ingredient_pkey PRIMARY KEY (name);
>>>
>>> ALTER TABLE ONLY recipe
>>> ADD CONSTRAINT recipe_pkey PRIMARY KEY (name, ingredient);
>>>
>>> ALTER TABLE ONLY recipe
>>> ADD CONSTRAINT recipe_ingredient_fkey FOREIGN KEY (ingredient)
>>> REFERENCES ingredient(name);
>>>
>>> ALTER TABLE ONLY recipe
>>> ADD CONSTRAINT recipe_name_fkey FOREIGN KEY (name) REFERENCES
>>> ingredient(name);
>>>
>>>
>>> A query listing the recipe for 'pizza' would be as follows:
>>> development=> with recursive pizza (name, step, ingredient, quantity,
>>> unit, rel_qty, path, weight)
>>> as (
>>> select
>>> name, step, ingredient, quantity, unit
>>> ,   quantity::numeric(10,2)
>>> ,   step::text
>>> ,   case when unit = 'g' then quantity::numeric(10,2) else
>>> null end
>>>   from recipe
>>>  where name = 'pizza'
>>> union all
>>> select
>>> recipe.name, recipe.step, recipe.ingredient,
>>> recipe.quantity, recipe.unit
>>> ,   (pizza.rel_qty * recipe.quantity)::numeric(10,2)
>>> ,   pizza.path || '.' || recipe.step
>>> ,   case when recipe.unit = 'g' then (pizza.rel_qty *
>>> recipe.quantity)::numeric(10,2) else null end
>>>   from pizza
>>>   join recipe on (recipe.name = pizza.ingredient)
>>> )
>>> select path, ingredient, quantity, rel_qty, unit, weight
>>>   from pizza
>>>  order by path;
>>>
>>>  path  |  ingredient  | quantity | rel_qty | unit  | weight
>>> ---+--+--+-+---+
>>>  1 | tomato sauce | 1.00 |1.00 | pcs   |
>>>  1.1   | tomato   |   100.00 |  100.00 | g | 100.00
>>>  1.2   | basil|10.00 |   10.00 | g |  10.00
>>>  1.3   | salt | 3.00 |3.0

Re: DB size growing exponentially when materialized view refreshed concurrently (postgres 9.6)

2018-06-25 Thread Alban Hertroys


> On 25 Jun 2018, at 19:21, Vikas Sharma  wrote:
> 
> I am looking for advice in a issue where two materialized views are being 
> refreshed concurrently and dbsize has grown to 150gb from 4gb in two days.
> 
> We use two materialized views to keep processed data for faster query results 
> for a search function. Earlier materialized views were refreshed not 
> concurrently and all was good on DB.
> 
> We changed mv refresh to concurrently to take advantage of simultaneous 
> access when mv refreshed. Now the refresh takes slightly longer and but DB 
> size has grown exponentially.
> 
> I ran full vacuum on DB and size again reduced to 4gb from 150gb. 

You did not disable or tune down autovacuum perchance?
With materialized view refreshes that often, you probably need fairly 
aggressive autovacuuming on that table - you can tune autovacuum parameters per 
table (see Storage parameters). That probably won't put you at 4GB, more around 
double that size, but it should stay a reasonable size that way.
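
For example, something along these lines (the view name is made up and the
numbers are only a starting point, not a recommendation):

ALTER MATERIALIZED VIEW my_matview
  SET (autovacuum_vacuum_scale_factor = 0.02,
       autovacuum_vacuum_threshold = 1000);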

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Split daterange into sub periods

2018-07-06 Thread Alban Hertroys
On 5 July 2018 at 16:16, hmidi slim  wrote:
> In fact I'm trying to split a period in sub periods. Following this example
> :
> If I have a period =[2018-01-01, 2018-01-31] and two other periods
> [2018-01-04, 2018-01-06] and [2018-01-08, 2018-01-08].
> If I split the base period '[2018-01-01, 2018-01-31]' by the other two
> periods '[2018-01-04, 2018-01-06]' and '[2018-01-08, 2018-01-08]' I will got
> such a result:
> [2018-01-01, 2018-01-03]
> [2018-01-07, 2018-01-07]
> [2018-01-09, 2018-01-31].

What about a recursive CTE?

with recursive
period as (select '[2018-01-01, 2018-01-31]'::daterange as range)
,exclude as (
select range
  from (values
  ('[2018-01-01, 2018-01-03]'::daterange)
,('[2018-01-07, 2018-01-07]'::daterange)
,('[2018-01-09, 2018-01-31]'::daterange)
  ) v(range)
)
,available (lo, hi, exclude, available) as (
select
lower(p.range), upper(p.range)
,x.range
,p.range - x.range
  from period p,exclude x
  where not exists (
  select 1
  from exclude x2
 where lower(x2.range) < lower(x.range)
   and lower(x2.range) >= lower(p.range)
  )

 union all

 select
upper(x.range), hi
,x.range
,daterange(upper(x.range), hi)
  from available a, exclude x
 where a.lo <= a.hi
   and lower(x.range) > lo
   and not exists (
select 1
  from exclude x2
 where lower(x2.range) < lower(x.range)
   and lower(x2.range) > lo
 )
)
select * from available;

 lo | hi | exclude |available
++-+-
 2018-01-01 | 2018-02-01 | [2018-01-01,2018-01-04) | [2018-01-04,2018-02-01)
 2018-01-08 | 2018-02-01 | [2018-01-07,2018-01-08) | [2018-01-08,2018-02-01)
 2018-02-01 | 2018-02-01 | [2018-01-09,2018-02-01) | empty
(3 rows)

It can probably be optimized a bit, I haven't played with ranges much yet.

Regards,
Alban Hertroys
-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.



Re: Restore from dumps

2018-07-25 Thread Alban Hertroys


> On 25 Jul 2018, at 9:43, Nicola Contu  wrote:
> 
> Hello,
> we recently moved from postgres 9.6.6 to 10.4
> 
> We perform a pg_dump in production to restore daily in a preprod env.
> This process used to work perfectly, but now we have a tiny problem.
> 
> We first restore data, we perform a vacuum and then we restore matviews.

What are the commands you used? You don't seem to mention restoring the schema?

> Restoring matviews now we have :
> 
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 23090; 0 1912379424 
> MATERIALIZED VIEW DATA matview_vrs_request_sla postgres
> pg_restore: [archiver (db)] could not execute query: ERROR:  relation 
> "all_days" does not exist
> LINE 3: from all_days
>  ^
> QUERY:
> select count(*)::numeric
> from all_days
> where (("date" between $2::date and $1::date) or ("date" between $1::date and 
> $2::date))
> and dow not in (0,6)
> 
> CONTEXT:  SQL function "bdays" during inlining
> Command was: REFRESH MATERIALIZED VIEW public.matview_vrs_request_sla;

Is all_days a table? Or is it perhaps another view, possibly materialized even?

> The relation is there, in fact if I go there when I get in to the office, the 
> same command works.

This sounds to me like you may be using a different version of pg_restore in 
the office. Are both versions 10.4 or newer?
It can't hurt to check that you used version 10.4 of pg_dump as well.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Select rows when all all ids of its children records matches

2018-09-12 Thread Alban Hertroys


> On 12 Sep 2018, at 17:44, Arup Rakshit  wrote:
> 
> Can you show me the SQL construction? Do I need to use `WITH`?


An option is to create a bit-wise OR and SUM the grouped results. If the result 
of these 3 bits is 7, then the post matches all three tags.

select p.id, p.name
  from post p
  join post_tag pt on (pt.post = p.id)
  join tag t on (t.id = pt.tag)
 where t.id in (1, 2, 3)
 group by p.id, p.name
 having sum(distinct case t.id
   when 1 then 1
   when 2 then 2
   when 3 then 4
   else 0
  end) = 7;

I used ints here for the bitwise OR, a bitstring would probably be neater.

Another approach is to aggregate the set of matching tags into an array using 
array_agg(). I think that's what David means. You could then check the length 
of the array to see if you have all 3 (or 4 or 5 or 9000).
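
A hedged sketch of that array-based variant, reusing the same (assumed)
post/post_tag schema as above:

select p.id, p.name
  from post p
  join post_tag pt on (pt.post = p.id)
 group by p.id, p.name
having array_agg(pt.tag) @> array[1, 2, 3];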

>> On 12-Sep-2018, at 9:13 PM, David G. Johnston  
>> wrote:
>> 
>> On Wednesday, September 12, 2018, Arup Rakshit  wrote:
>> IN is OR, I want the AND logic. Select posts which has tag 1, 2 and 3 ( tag 
>> ids )
>> 
>> Build arrays and then use the “contains” operator.
>> 
>> David J. 
> 

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Regrading brin_index on required column of the table

2018-09-19 Thread Alban Hertroys
The results of explain analyze would shed light on the core problem.

My guess is that your conditions are not very selective - ie. most
records in both tables have bFetch = false - and therefore you are
retrieving most of your data and that is what's taking 7 minutes. No
index is going to fix that.

If those boolean values are distributed very unevenly (say 99.9% has
false and 0.1% has true), you may get better results by excluding the
records with 'true' values (instead of including those that are
'false'), for example by using a where not exists(...) subquery.

Obviously, that still won't help if you're just fetching a lot of data.
On Wed, 19 Sep 2018 at 16:23, Durgamahesh Manne
 wrote:
>
>
>
>
>
>
> On Wed, Sep 19, 2018 at 7:41 PM Igor Neyman  wrote:
>>
>>
>>
>> From: Durgamahesh Manne [mailto:maheshpostgr...@gmail.com]
>> Sent: Wednesday, September 19, 2018 10:04 AM
>> To: Igor Neyman 
>> Subject: Re: Regrading brin_index on required column of the table
>>
>> On Wed, Sep 19, 2018 at 7:19 PM Igor Neyman  wrote:
>>
>>
>>
>> From: Durgamahesh Manne [mailto:maheshpostgr...@gmail.com]
>> Sent: Wednesday, September 19, 2018 9:43 AM
>> To: PostgreSQL mailing lists 
>> Subject: Regrading brin_index on required column of the table
>>
>> Hi
>>
>> Respected postgres community members
>>
>>
>>
>> I have created BRIN index on few columns of the table without any issues. 
>> But i am unable to create BRIN index on one column of the table as i got 
>> error listed below
>>
>>
>>
>>
>>
>> [local]:6263 postgres@huawei=# CREATE INDEX brin_idx on huawei using brin 
>> ("dFetch");
>>
>> ERROR:  data type boolean has no default operator class for access method 
>> "brin"
>>
>> HINT:  You must specify an operator class for the index or define a default 
>> operator class for the data type.
>>
>>
>>
>>  below is the column description:
>>
>> Column datatype   collationnullable   defaultstorage
>>
>>
>>
>>  dFetchboolean false
>> plain
>>
>>
>>
>>
>>
>>
>>
>> so please help in creating of the BRIN index on above column of the table .
>>
>>
>>
>>
>>
>>
>>
>> Regards
>>
>>
>>
>> Durgamahesh Manne
>>
>>
>>
>> Why would you want BRIN index on Boolean-type column?
>>
>> What kind of interval will you specify?
>>
>>
>>
>> Regards,
>>
>> Igor Neyman
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>  Hi
>>
>>
>>
>>
>>
>>  I have complex query like for ex  select  distinct 
>> sub_head."vchSubmittersCode" ,rec."vchFileName" , 
>> rec."vchCusipFundIDSubFundID" , rec."vchFundUnitPrice"  ,  
>> sub_head."vchValuationDate" ,rec."vchAdvisorLabel" from  "table1" rec join  
>> "table2" sub_head on rec."vchSubmittersCode"=sub_head."vchSubmittersCode" 
>> where  rec."bFetch"=false and sub_head."bFetch"=false ;
>>
>>
>>
>>
>>
>> Query taken around 7 minutes time to execute without indexes on required 
>> columns
>>
>>
>>
>>  SO  i need to execute this distinct query at less time by creating indexes 
>> on required columns of the tables
>>
>>
>>
>> i have created brin indexes on vchsubmitterscode of two tables
>>
>>
>>
>> i am not able to create brin indexes on bfetch tables as i got a error  
>> ERROR:  data type boolean has no default operator class for access method 
>> "brin"
>>
>> HINT:  You must specify an operator class for the index or define a default 
>> operator class for the data type.
>>
>>
>>
>>
>>
>> so please help in creating of the BRIN index on above column of the table as 
>> i need to reduce the query execution time
>>
>>
>>
>>
>>
>> Regards
>>
>>
>>
>> Durgamahesh Manne
>>
>> Again, BRIN indexes are not design to work on Boolean columns. If you want 
>> to index Boolean column, just create regular BTREE index.
>>
>> Regards,
>>
>> Igor
>>
>>
>>
>>
>
>
>  Hi
>
> I have already tried with BTREE indexes & HASH indexes on required columns 
> .but distinct query execution time was not reduced
>
>
> Query taken around 7 minutes time to execute with BTREE indexes & HASH 
> indexes on required columns
>
>
> Regards
>
> Durgamahesh Manne
>


-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.



Re: Regrading brin_index on required column of the table

2018-09-20 Thread Alban Hertroys
On Thu, 20 Sep 2018 at 11:42, Durgamahesh Manne
 wrote:

...

> |   ->  Sort  (cost=5871873.64..5973124.65 rows=40500405 width=89) (actual 
> time=326397.550..372470.846 rows=4050 loops=1)
>   |
> | Sort Key: sub_head."vchSubmittersCode", rec."vchFileName", 
> rec."vchCusipFundIDSubFundID", rec."vchFundUnitPrice", 
> sub_head."vchValuationDate", rec."vchAdvisorLabel" |
> | Sort Method: external merge  Disk: 3923224kB
>   
>|
The above is a clear sign of a problem.
To get distinct records, the results need to be sorted, and that
doesn't fit in the available memory and spills to disk.

The actual filters on the boolean fields, even though they are
performed in seq-scans, hardly take any time at all. The hash join to
combine them takes a bit over 6s.
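
If the query itself can't be trimmed down, one knob to try is to give just
this session more sort memory before running it (a sketch; the right value
depends on how much RAM and how many concurrent sessions you have, and it
won't help if the sort simply doesn't fit):

SET work_mem = '1GB';
-- or, inside a transaction: SET LOCAL work_mem = '1GB';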

> so i am unable to reduce the query execution time as it is taken around 7 
> minutes to execute with indexes & without indexes
>
>  please help in reducing the query execution time

...

> On Wed, Sep 19, 2018 at 9:07 PM Alban Hertroys  wrote:
>>
>> The results of explain analyze would shed light on the core problem.
>>
>> My guess is that your conditions are not very selective - ie. most
>> records in both tables have bFetch = false - and therefore you are
>> retrieving most of your data and that is what's taking 7 minutes. No
>> index is going to fix that.
>>
>> If those boolean values are distributed very unevenly (say 99.9% has
>> false and 0.1% has true), you may get better results by excluding the
>> records with 'true' values (instead of including those that are
>> 'false'), for example by using a where not exists(...) subquery.
>>
>> Obviously, that still won't help if you're just fetching a lot of data.
>> On Wed, 19 Sep 2018 at 16:23, Durgamahesh Manne
>>  wrote:
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Sep 19, 2018 at 7:41 PM Igor Neyman  wrote:
>> >>
>> >>
>> >>
>> >> From: Durgamahesh Manne [mailto:maheshpostgr...@gmail.com]
>> >> Sent: Wednesday, September 19, 2018 10:04 AM
>> >> To: Igor Neyman 
>> >> Subject: Re: Regrading brin_index on required column of the table
>> >>
>> >> On Wed, Sep 19, 2018 at 7:19 PM Igor Neyman  
>> >> wrote:
>> >>
>> >>
>> >>
>> >> From: Durgamahesh Manne [mailto:maheshpostgr...@gmail.com]
>> >> Sent: Wednesday, September 19, 2018 9:43 AM
>> >> To: PostgreSQL mailing lists 
>> >> Subject: Regrading brin_index on required column of the table
>> >>
>> >> Hi
>> >>
>> >> Respected postgres community members
>> >>
>> >>
>> >>
>> >> I have created BRIN index on few columns of the table without any issues. 
>> >> But i am unable to create BRIN index on one column of the table as i got 
>> >> error listed below
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> [local]:6263 postgres@huawei=# CREATE INDEX brin_idx on huawei using brin 
>> >> ("dFetch");
>> >>
>> >> ERROR:  data type boolean has no default operator class for access method 
>> >> "brin"
>> >>
>> >> HINT:  You must specify an operator class for the index or define a 
>> >> default operator class for the data type.
>> >>
>> >>
>> >>
>> >>  below is the column description:
>> >>
>> >> Column datatype   collationnullable   defaultstorage
>> >>
>> >>
>> >>
>> >>  dFetchboolean false  
>> >>   plain
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> so please help in creating of the BRIN index on above column of the table 
>> >> .
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> Regards
>> >>
>> >>
>> >>
>> >> Durgamahesh Manne
>> >>
>> >>
>> >>
>> >> Why would you want BRIN index on Boolean-type column?

Re: *Regarding brin_index on required column of the table

2018-09-21 Thread Alban Hertroys
Your plan is not readable to me (perhaps because of gmail). Does
https://explain.depesz.com/ give you any useful insights?
On Fri, 21 Sep 2018 at 16:15, Durgamahesh Manne
 wrote:
>
>
> On Fri, Sep 21, 2018 at 7:38 PM Durgamahesh Manne  
> wrote:
>>
>> Hi
>>
>> Complex query taken around 30 minutes to execute even i have increased 
>> work_mem value to 4GB  temporarily as total ram is 16gb
>>
>> Explain analyze query taken around 30 minutes to execute even i have created 
>> partial indexes with where condition on required columns
>>
>>
>>
>> Below is the query plan for explain analyze query :
>>
>> | HashAggregate  (cost=16034967.98..16035010.75 rows=4277 width=1710) 
>> (actual time=1806653.536..1806680.802 rows=26098 loops=1)
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>   |
>> |   Group Key: (max((v."vchSubmittersCode")::text)), 
>> (max((v."vchRecordType")::text)), (max((v."vchSequenceNumber")::text)), 
>> v."vchContractNumber", v."vchContractPartyRoleQualifier", v."vchPartyRole", 
>> (max((v."vchPartyNatural_Non_NaturalEntity")::text)), 
>> (max((v."vchPartyLastName")::text)), (max((v."vchPartyFirstName")::text)), 
>> (max((v."vchPartyMiddleName")::text)), (max((v."vchPartyPrefix")::text)), 
>> (max((v."vchPartySuffix")::text)), (NULL::text), v."vchPartyID", 
>> (max((v."vchPartyIDQualifier")::text)), 
>> (max((v."vchTrustRevocabilityIndicator")::text)), (NULL::text), 
>> (NULL::text), (max((v."vchFiller1")::text)), 
>> (max((v."vchRejectCode")::text)), 
>> (max((v."vchContractEntityAddressLine1")::text)), 
>> (max((v."vchContractEntityAddressLine2")::text)), 
>> (max((v."vchContractEntityCity")::text)), 
>> (max((v."vchContractEntityState")::text)), 
>> (max((v."vchContractEntityZip")::text)), 
>> (max((v."vchContractEntityAddressLine3")::text)), 
>> (max((v."vchContractEntityAddressLine4")::text)), 
>> (max((v."vchContractEntityAddressLine5")::text)), 
>> (max((v."vchPartyDateofBirth")::text)), 
>> (max((v."vchPartyAddressLine1")::text)), 
>> (max((v."vchContractStatus")::text)), (string_agg(DISTINCT 
>> btrim((s."vchAgentTaxID")::text), ','::text)), v."vchPartyRole", 
>> (max((v."vchAdvisorLabel")::text)), v."vchFileName", 
>> (max((v.vchpartycity)::text)), (max((v.vchpartystate)::text)), 
>> (max((v.vchpartypostalcode)::text)), (string_agg(DISTINCT 
>> btrim((s."vchAgentFirstName")::text) || ' '::text) || 
>> btrim((s."vchAgentMiddleName")::text)) || ' '::text) || 
>> btrim((s."vchAgentLastName")::text)), ','::text)) |
>> |   ->  Append  (cost=48944.67..16034550.97 rows=4277 width=1710) (actual 
>> time=3324.233..1806605.691 rows=26098 loops=1)   
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>

Re: *Regarding brin_index on required column of the table

2018-09-21 Thread Alban Hertroys



> On 21 Sep 2018, at 17:49, Durgamahesh Manne  wrote:
> 
> 

Considering how hard you try to get rid of duplicates, I'm quite convinced that 
you're at least short a few join conditions. Getting rid of duplicates early 
has the added benefit of having to aggregate fewer rows, which should 
drastically improve the performance of this query.

In addition, your approach of using both distinct and group by raises a red 
flag that you're fighting symptoms (most likely: duplicate results) without 
understanding their causes. I can't think of a single situation where both are 
justified in the same (sub-)query.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: using the nextval('sequence_name') in sql, the result maybe is not right

2018-09-26 Thread Alban Hertroys
On Wed, 26 Sep 2018 at 14:08, Wanglin  wrote:
>
> Hi, all:
> PostgreSQL version : 10.3.  I use "nextval" in the sql , but I think the 
> result is not right, maybe it is a bug.
>The test case as bellow:
>create sequence seq1;
> select nextval('seq1');
> create table tx1(id1 int, id2 int);
> insert into tx1 select generate_series(1,100), random()*102;
> explain verbose select * from tx1 where id2 = nextval('seq1');;
> select * from tx1 where id2 = nextval('seq1');
> postgres=# explain verbose select * from tx1 where id2 = nextval('seq1');;
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on public.tx1  (cost=0.00..43.90 rows=11 width=8)
>    Output: id1, id2
>    Filter: (tx1.id2 = nextval('seq1'::regclass))
> (3 rows)
>
> postgres=# select * from tx1 where id2 = nextval('seq1'); -- here, may be the 
> result is not right
>  id1 | id2
> -----+-----
>   56 |  57
> (1 row)
>
> :: I think "nextval('seq1')" equal 2, so " select * from tx1 where id2 = 
> nextval('seq1')" equals " select * from tx1 where id2 = 2", is it ?

No. nextval("seq1") increments the sequence and returns the new value.
It never[*] returns the same value in subsequent calls, that is the
purpose of the function (and sequences in general).

Normally, you would assign a sequence to a surrogate key field in your
table, so that you automatically get unique values in that field
(unless you mess around).
That's not how you're using it, so I wonder what your purpose is for
the sequence?
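
For illustration, the typical pattern looks like this (a sketch, unrelated to
your tx1 table):

create table t (
    id  serial primary key,  -- serial creates and attaches a sequence for you
    val int
);
insert into t (val) values (42);  -- id gets the next sequence value automatically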

[*] Never is never without exceptions, just like always always has.
-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.



Re: Want to acquire lock on tables where primary of one table is foreign key on othere

2018-10-11 Thread Alban Hertroys
On Thu, 11 Oct 2018 at 16:38, Abhishek Tripathi
 wrote:
> Actually I have acquired a "Select for Update" on a table whose id is 
> refrence as a foreign key on another table So I want those table won't update 
> until there is lock. Is it possible? Becuase postgres is acquiring lock but 
> AccessShare Lock which allow to write on those table How I restrict this.

For what purpose do you want that? What is inadequate about the lock
that Postgres acquires?

Table locks are very rarely what you want, as it blocks all concurrent
access to the entire table, while that is only necessary for a few
rarely used corner cases; a foreign key update is not among those.

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.



Re: Want to acquire lock on tables where primary of one table is foreign key on othere

2018-10-11 Thread Alban Hertroys
You forgot to CC the list. Also, top-posting is generally not appreciated here.

> On Thu, Oct 11, 2018 at 8:26 PM Alban Hertroys  wrote:
> On Thu, 11 Oct 2018 at 16:38, Abhishek Tripathi
>  wrote:
> > Actually I have acquired a "Select for Update" on a table whose id is 
> > refrence as a foreign key on another table So I want those table won't 
> > update until there is lock. Is it possible? Becuase postgres is acquiring 
> > lock but AccessShare Lock which allow to write on those table How I 
> > restrict this.
> 
> For what purpose do you want that? What is inadequate about the lock
> that Postgres acquires?
> 
> Table locks are very rarely what you want, as it blocks all concurrent
> access to the entire table, while that is only necessary for a few
> rarely used corner cases; a foreign key update is not among those.

> On Fri, Oct 12, 2018 at 10:33 AM Abhishek Tripathi 
>  wrote:
> Thank you for your response. 
> 
> I am explaining my situation there is table A on which I have taken a row 
> lock and the primary key of table A is related with table B as a foreign key 
> so automatically I can not insert new row with that  foreign key  now the 
> primary key of table B is used as foreign key in table C and insertion can be 
> done on table C. I just want to lock table C also No insertion can be done on 
> table C related to table B primary key.
> 

> On 12 Oct 2018, at 8:08, Abhishek Tripathi  wrote:
> 
> My bad sorry actually there is updation-
> 
> there is table A on which I have taken a row lock and the primary key of 
> table A is related with table B as a foreign key so automatically I can not 
> insert new row with that  foreign key on table B that is fine now  table C 
> any  insertion can be done on table C. I just want to lock table C also No 
> insertion can be done on table C becuse table C primary key is related to 
> table B as a foreign key of B.


So your tables (simplified) are something like this?:
create table A (id primary key)
create table B (id primary key, a_id references A (id))
create table C (id primary key, b_id references B (id))

And you insert a new value for b_id in C? That's not possible, unless that 
value exists in table B, which in turn can't exist unless its a_id exists in 
table A.
That's how foreign key constraints are designed, you don't need to do manual 
locking for that to work.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Optimizing Postgresql ILIKE while query

2018-10-22 Thread Alban Hertroys



> On 22 Oct 2018, at 7:56, aman gupta  wrote:
> 
> Issue:
> 
> We have the base table which contains 22M records and we created a view on 
> top of it while querying the view with ILIKE clause it took 44 seconds and 
> with LIKE Clause 20 Seconds
> 
> Query:
> 
> fm_db_custom_db=# EXPLAIN (ANALYZE, TIMING OFF)
> select 
> destination,hostname,inputfilename,inputtime,logicalservername,outputfilename,outputtime,processinglink,source,totalinputbytes,totalinputcdrs,totaloutputbytes,totaloutputcdrs
>  from mmsuper.test_20m_view  where inputfilename ilike 
> '%SDPOUTPUTCDR_4001_BLSDP09_ADM_4997_18-10-15-02549.ASN%';

Perhaps, when you have a question about timing, you shouldn't turn off the 
timing in the query plan? Now we can't see where the time is spent.

> 

That's all sequential scans that each remove a significant number of rows. That 
probably costs a significant amount of time to do.

It looks like you don't have any indices on the underlying table(s) at all. I'd 
start there and then look at the ILIKE problem again. By that time, Pavel's 
suggestion for a trigram index on that text field is probably spot-on.
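
For reference, a sketch of such a trigram index (the index name is made up,
and it has to be created on the base table behind the view, whose name I'm
guessing at here):

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX inputfilename_trgm_idx
    ON mmsuper.test_20m  -- hypothetical base table behind test_20m_view
 USING gin (inputfilename gin_trgm_ops);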

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Query plan: SELECT vs INSERT from same select

2019-07-24 Thread Alban Hertroys


> On 23 Jul 2019, at 22:29, Alexander Voytsekhovskyy  
> wrote:
> 
> I have quite complicated query:
> 
> SELECT axis_x1, axis_y1, SUM(delivery_price)  as v_1 FROM (
> SELECT to_char(delivery_data.delivery_date, 'YYYY-MM') as axis_x1, 
> clients.id_client as axis_y1, delivery_data.amount * production_price.price * 
> groups.discount as delivery_price
> 
> FROM delivery_data 
> JOIN client_tt ON (client_tt.id_client_tt = delivery_data.id_client_tt) 
> JOIN clients ON (client_tt.id_client = clients.id_client) 
> JOIN production ON (production.id = delivery_data.id_product) 
> JOIN groups ON (groups.id = delivery_data.delivery_group_id AND 
> client_tt.id_group = groups.id AND groups.id = clients.id_group) 

Are client_tt.id_group and clients.id_group ever different from each other? It 
looks like you might have redundant information there, but... If they are 
guaranteed to be the same then you don’t need the JOIN to clients, which would 
both remove a JOIN and reduce the complexity of the JOIN condition on groups.

Or (assuming the group id’s are indeed supposed to be equal), you could
 JOIN clients ON (client_tt.id_client = clients.id_client AND 
client_tt.id_group = clients.id_group)
instead of putting that condition within the JOIN condition on groups.

I don’t think either option will make a huge difference (the first probably 
more than the second, as it reduces an entire join), but it could be enough to 
help the database figure out a better plan.

> LEFT JOIN production_price on (delivery_data.id_product = 
> production_price.id_production AND groups.price_list_id = 
> production_price.price_list_id AND delivery_data.delivery_date BETWEEN 
> production_price.date_from AND production_price.date_to) 
> 
> WHERE delivery_data.delivery_date between '2019-03-01' AND '2019-06-30' 
> AND delivery_data.delivery_group_id IN (...short list of values...) 
> AND delivery_data.id_product IN ()) AS tmpsource 

You don’t have a price if your goods weren’t produced in the delivery window 
you set? Or do you have goods that get delivered without having a price?

You seem to be using this query for a report on nett sales by month, but I have 
my doubts whether that LEFT JOIN, and especially the condition on the 
production date window, is really what you want: Your formula for 
delivery_price includes the price column from that LEFT JOIN, so you’re going 
to get 0 values when there is no production_price record in your 
delivery-window, resulting in a SUM that’s too low if the product was produced 
before (or after, but that seems unlikely) the delivery window.

> WHERE TRUE 

This line is unnecessary.

> GROUP BY GROUPING SETS ((axis_x1, axis_y1), (axis_x1), (axis_y1), ())

Apparently (I’m new to these statements), CUBE (axis_x1, axis_y1) is a 
shorthand for the above. They were both introduced at the same time (in 9.5). See: 
https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-GROUPING-SETS
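
In other words, these two should produce the same groupings:

GROUP BY GROUPING SETS ((axis_x1, axis_y1), (axis_x1), (axis_y1), ())
GROUP BY CUBE (axis_x1, axis_y1)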

> It runs well, took 1s and returns 4000 rows.

I won’t go into the performance issue at this point; other, more knowledgeable 
people already did.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Request for resolution || Support

2019-07-25 Thread Alban Hertroys


> On 24 Jul 2019, at 10:08, jay chauhan  wrote:
> 
> Hi Thomas, David/Team,
> 
> Thanks you for your response. However we need your confirmation whether my 
> Error/issue as mentioned below will be resolved if we upgrade our PostgreSQL 
> Version.

It won’t, you are talking about Oracle-specific features. You need to change 
the code. PostgreSQL is not Oracle, some features are quite RDBMS-specific.

> < compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit">>
> David response: Use a newer version
> Tomas response: Yeah, you should use release 11 for a new project.  
> 
> My Issue while migrating procedure/function from Oracle to PostgreSQL:
> Error-1) 
> ERROR:  cannot begin/end transactions in PL/pgSQL
> HINT:  Use a BEGIN block with an EXCEPTION clause instead.
> CONTEXT:  PL/pgSQL function 
> icmsuatnew.eload_commission_payout_active(text,text,text,text,text,text) line 
> 486 at SQL statement
> SQL state: 0A000
> David Response on it : Rewrite your code as instructed

How to handle these depends on your use of sub-transactions, but the HINT gives 
a pretty good general approach.
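
As a rough sketch of that pattern (the table and values here are made up, not 
taken from your procedure):

DO $$
BEGIN
    BEGIN  -- this inner block acts like a savepoint
        UPDATE payout SET status = 'paid' WHERE id = 1;
    EXCEPTION WHEN OTHERS THEN
        -- only the work of the inner block is rolled back;
        -- the surrounding transaction carries on
        RAISE NOTICE 'payout step failed: %', SQLERRM;
    END;
END;
$$;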

> Error-2)
> ERROR:  schema "utl_http" does not exist
> LINE 38: L_HTTP_REQUEST UTL_HTTP.REQ;
> ^
> SQL state: 3F000
> Character: 1785
> Thomas response: That's an Oracle thing for doing HTTP requests from PL/SQL.  
> To do that from plpgsql you could try an extension like this one:
> https://github.com/pramsey/pgsql-http
> Or you could write your own function in Python or <your favourite PL>.  That's what I'd probably do.
> https://www.postgresql.org/docs/11/plpython-funcs.html

Initiating TCP/IP from the database means that a database process needs to wait 
for a response. In the meantime, it cannot do anything else. You’re effectively 
blocking it for other transactions and keeping that particular transaction 
‘waiting in transaction’ until, in the worst case, a time-out. That means that 
no maintenance can be done on records touched by this transaction, which can 
lead to bloat.

This is generally considered a bad idea, at least in this community. You’re 
usually better off handing the connection over to an external process that 
reports back to the database when appropriate.
The exception to that is if your transaction cannot be allowed to commit 
without a response from the other peer. In such cases it is appropriate to use 
plpython, plperl, etc.
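
For what it's worth, a minimal sketch of the plpython route (this assumes the 
untrusted plpython3u extension, which needs a superuser to install; the function 
name is just an example):

CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION http_get(url text) RETURNS text AS $$
    import urllib.request
    # this blocks the backend until the remote side answers, so keep the timeout short
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode('utf-8')
$$ LANGUAGE plpython3u;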

All that said, I am talking about PostgreSQL here. If you’re instead using 
EnterpriseDB, which does have an Oracle compatibility layer that could perhaps 
support these features (I don’t know), you should talk to the EnterpriseDB guys.

Alban Hertroys
--
There is always an exception to always.








Re: adding more space to the existing server

2019-08-03 Thread Alban Hertroys


> On 2 Aug 2019, at 21:45, Julie Nishimura  wrote:

> 1) We use streaming replication, and due to hardware limitation, we cannot 
> add more drives to the existing host. That is why we thought by breaking the 
> existing streaming replication (from a->b), instead of currently identical 
> standby (b), we can introduce twice larger host, then start the replication 
> to the newly larger host, and when it is caught up, break it again. Then 
> break rep again, make modification to 'a" host, making it larger, then 
> replicate b->a. After it is caught up, break the rep again, switch 
> master->standby (if necessary).

Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only done 
(streaming) replication once and I managed to mess that up in a minor way 
(disabled the wrong service during failover, so data still went to the database 
I was attempting to replace for, like, 15 minutes).

> 2) I am not sure about the time, but it is understood it is required 2 full 
> replication cycles, and might be up to 2 weeks with no standby situation

No standby situation? Murphy is probably just waiting for that to strike…
I recall a fairly recent story on the FreeBSD ML about someone on Malta doing a 
migration of a couple dozen terabytes from her main server (because of some 
failing disks in her RAID set) using her backup server to move data around 
(with backups removed to make room), when, due to an accident outside the 
building, an aerial 10KV power line hit another power line in the ground, 
causing a fire in one UPS and frying the other one. Losing power at that point 
meant that the file systems (ZFS) on both servers ended up in an unrecoverable 
state with no backups. It didn’t help that the UPS’s were at the bottom of the 
rack, with the heat and smoke going up into the servers. What are the chances, 
right? (And then it turned out that it is really hard to try to recover data 
from a ZFS file system in such a state, which is what her actual inquiry was 
about)

I would definitely prefer to add a 3rd machine into the mix, even if it were 
just a temporary machine - a rental perhaps?

From there, I’m certain Adrian knows more about replication than I do. I’d go 
with the approach he suggested.

> 4) by pg_basebackup and restore
> 
> As of now, we are thinking about possibly other solutions, as of splitting 
> existing 37 databases on the cluster into 2 hosts with their own standbys. 
> This solution requires breaking up existing replication as well. Can you 
> please point me to some document which lists all steps describing breaking up 
> the existing replication properly? we are using 9.6 postgres

I’m going to assume that you will have data coming in while this split is 
taking place and that you therefore cannot offline the entire set of databases 
for as long as this takes. If not, that would probably allow for a simpler 
(faster) scenario.

I think the easiest for this scenario would be to add two more machines (c and 
d) and replicate them off the current setup. You want that to happen as 
parallel as possible, so perhaps replicate c off a and d off b.

If you aren’t already using “replication slots”, I found that to make things 
both easier to understand and more reliable. You can query their status, for 
one thing.
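
For example, a physical replication slot can be created on the master and its 
state inspected with (the slot name is just an example):

select pg_create_physical_replication_slot('standby_c');
select slot_name, active, restart_lsn from pg_replication_slots;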

Those replicas will take extra time of course (about double) because you’re 
replicating twice what you need, but I don’t think you can replicate parts of a 
cluster with your setup unless you go for a different replication approach (I 
think per database replication requires statement level replication?).

After that, decouple both sets into:
a —> b (your current machine)
c —> d (the new ones)

(Although any order should be fine, really, as long as they have caught up.)

At that point I would probably (temporarily) pause replication in at least one 
set and create a backup of that.

This is the point to start removing superfluous databases from a and c (so that 
a+c make up the complete set again).
After verifying that no databases are missing, unpause replication.

If instead you find that you accidentally removed a database from both a and c, 
you still have replicas to recover it from. And the backups, of course, but 
that will not contain the data that came in after replication was paused.

I do hope the remaining 3% disk space is enough to cover all that, though...

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: adding more space to the existing server

2019-08-05 Thread Alban Hertroys
> On 5 Aug 2019, at 0:39, Julie Nishimura  wrote:
>
> Alban, thank you for your reply. Your suggestion makes sense, and I will be
> talking to our engineers about it. Currently we need to understand:
>
> a)   How do we break A -> B replication such that both can become
> independent primaries

That is pretty much like normal failover from A to B, except that you don’t
reverse replication.

You will need to stop your clients from sending data for a bit (if it’s
continuous data, having a buffer in between is a big help - at our company
we’re looking into Apache Kafka for that), so that you can switch half of
them to connect to B instead of A.

Next, you promote B to master. I used the docs for that last time, and they
were pretty clear on the subject.

> b)  How do we reassign C from B->C replication to A->C replication

I don’t think you need to. If you indeed already have A->B->C, after
promoting B to master, you end up with B->C, which is alright.
You just need to add A->D for the other set.

> c)   Thoughts on why this isn’t a good plan

That depends on your clients and how you decide which database in the
current cluster they connect to. If you connect specific clients to
specific databases, then all you need to do is to configure half your
clients to connect to B instead.

Another option is to put a virtual database layer in front, such that both
clusters still look like a single database to the outside world. We have
some experience with Dremio for similar purposes (although for read-only
reporting). Mind that the community edition doesn’t do authorization.

> Current:
>  A replicates to B
> all requests go to A
>
> Soon:
>   A replicates to B -> cascading to C and D
>
> Transition:
>   break A replication to B such that both can become primary

Correct.

>   stop B replication to C  then setup A to replicate to C

I would change this in:
   setup A to replicate to D

> End state:
> A replicates to C
> B replicates to D

End state:
A replicates to D
B replicates to C

> we remove some of the dbs from A and B, then reassign the traffic based on
> db selections
>
> I hope it all makes sense...
>
> Thank you

It does to me. Now would be a good time for people to chime in if they
don't agree ;)



Re: adding more space to the existing server

2019-08-05 Thread Alban Hertroys


> On 5 Aug 2019, at 17:27, Julie Nishimura  wrote:
> 
> Thanks for your reply Alban. Currently we only have A->B replication. Is 
> adding B->C replication difficult? I remember in the past I tried to seed 
> pg_basebackup from hot standby, and it was erroring out after awhile, so 
> needed to switch to run from master.

I’ve never used multi-level replication (or cascading replication, as it’s 
called in the PG docs).

I expect that replication slots (w/ streaming replication) would be resilient 
to streaming to multiple levels, provided that you have the disk space left to 
keep the logs that your farthest-behind replica is at, but I do not know that. 
I’m pretty sure that’s being done though and not just theoretically possible.

The basic problem you have is a bit like Towers of Hanoi. The details of 
setting up each replica may be a little complicated, but if you look at it from 
a high-level design phase, you’re just shuffling around clusters. The details 
come later, when you’re actually designing how to apply those replicas/clusters.

One of the main problems is that creating a new replica takes a lot of time, 
you want to minimise the total time that takes. Apparently, you have neither C 
nor D ready yet, so you need to create two replicas - being able to do those in 
parallel instead of sequentially would save you time.

The other problem is that, to be on the safe side, you want to have a full 
replica of A at any point in time. If you disconnect B from A before either C 
or D is complete, you run a risk: If A fails, you don’t have the data that 
accumulated while B was ‘offline’. So that’s not the best scenario.

That is why I think your initial set of replicas should look like:

A — B — C
 \
  D

IIRC, streaming replication uses pg_basebackup to create the initial replica 
and then it streams what’s needed to catch up. With replication slots, the 
master knows what the slaves still need, so it won’t clean up too early. 
Apparently, the slave (B) knows that it needs to retain data for C as well. It 
looks perfectly safe on paper, except for the replication lag between A — B.

You can, according to the docs, cascade replicas from each other and that even 
allows replication (among the slaves) to go on after the master gets 
disconnected - quite what you need for the B — C chain, I would think.

Take a look at: 
https://www.postgresql.org/docs/11/warm-standby.html#STREAMING-REPLICATION
And note the sections on Replication Slots and on Cascading Replication.

The replication lag between A — B can be solved by pausing the clients 
connecting to A (half of them need to be changed to B anyway) while B is 
catching up on its lag. You probably have at least that much down-time to 
change the connections anyway.

Regards,

Alban.



Re: A GROUP BY question

2019-08-13 Thread Alban Hertroys


> On 13 Aug 2019, at 13:10, stan  wrote:
> 
> select 
>   project.proj_no ,

Removed columns that get in the way of your desired result. You can’t have both 
details and the sum over them in a meaningful way.

>   SUM (rate.rate * task_instance.hours) 
> from 
>   task_instance
> join rate on 
>   rate.employee_key = task_instance.employee_key
>   AND
>   rate.work_type_key = task_instance.work_type_key

(break)

> inner join employee on
>   rate.employee_key = employee.employee_key
> inner join work_type on
>   rate.work_type_key = work_type.work_type_key

These are now probably redundant, you don’t need them unless they filter your 
results.

> inner join project on
>   project.project_key = task_instance.project_key

And this JOIN could be dropped if project_key and proj_no weren’t different 
fields. If both are unique in project, you could drop one of them and keep the 
same functionality with fewer joins. That said, in the “war” between surrogate 
and natural keys I’m on the natural keys side. Clearly, not everyone agrees on 
that.

> GROUP BY 
>   project.project_key ,

Same columns removed here too.

> ORDER BY 
>   project.proj_no 
>   ;

That should give you the total cost for each project.

You could get the same result repeated per employee and per work type as you 
tried originally, by putting the above revised query as a subquery and joining 
that back into the full query in the place of your project-related tables (add 
the project_key so you have something to join against).
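
For example, something along these lines (untested, reusing only the tables and 
columns from your query):

select totals.proj_no ,
       rate.employee_key ,
       rate.work_type_key ,
       rate.rate * task_instance.hours as line_cost ,
       totals.project_cost
from task_instance
join rate on
       rate.employee_key = task_instance.employee_key
       AND rate.work_type_key = task_instance.work_type_key
join (
       select project.project_key ,
              project.proj_no ,
              SUM (rate.rate * task_instance.hours) as project_cost
       from task_instance
       join rate on
              rate.employee_key = task_instance.employee_key
              AND rate.work_type_key = task_instance.work_type_key
       join project on
              project.project_key = task_instance.project_key
       group by project.project_key, project.proj_no
       ) totals on totals.project_key = task_instance.project_key
order by totals.proj_no ;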

The repeated sum risks getting multiplied in the final output though, 
especially if unaware people will be putting the results in an Excel sheet or 
something. From experience, that either results in people reporting the wrong 
financial results (several orders too high) or blaming your query.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: A GROUP BY question

2019-08-13 Thread Alban Hertroys


> On 13 Aug 2019, at 15:19, David G. Johnston  
> wrote:
> 
> On Tuesday, August 13, 2019, Alban Hertroys  wrote:
> 
> > On 13 Aug 2019, at 13:10, stan  wrote:
> > 
> > select 
> >   project.proj_no ,
> 
> Removed columns that get in the way of your desired result. You can’t have 
> both details and the sum over them in a meaningful way.
> 
> Sure you can, at least generally, with Window Functions/Expressions (i.e., 
> OVER)

That’s why I added “in a meaningful way” ;)

Repeating the same SUM-result on every line in a group is not what I’d call a 
meaningful result; the SUM has no bearing on the detailed line and leads to the 
kind of mistakes I already mentioned.
(For the record; I do this kind of grouping in a hierarchical database 
regularly, but there the grouped SUM is at a different level in the hierarchy 
and I consider it thus sufficiently separated from the detail rows.)

Besides, I figured the OP was already struggling with the query syntax, adding 
window functions into the mix didn’t seem a good idea in the context. 
Possible?, sure, desirable?, I would say not.

Alban Hertroys
--
There is always an exception to always.








Re: Question about password character in ECPG's connection string

2019-08-28 Thread Alban Hertroys


> 2)  "tcp:postgresql://localhost?user=myuser&password=password" looks like
> 
> "tcp:postgresql://localhost?user=myuser&password=my&pwd"
> 
> and password is parsed on the & and you also end up with an extra parameter 
> pwd

Perhaps it helps to URL-encode the & in the password as %26?
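
With the & encoded, the connection string from your example would read something 
like (untested):

tcp:postgresql://localhost?user=myuser&password=my%26pwd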

Alban Hertroys
--
There is always an exception to always.








Re: Arrays and ANY problem

2019-09-25 Thread Alban Hertroys


> On 25 Sep 2019, at 22:25, David Salisbury  wrote:
> db=# select name from table_name_ds_tmp where ARRAY[categoryid] = ANY ( 
> select string_to_array( '200,400', ',')::bigint[] );
>  name 
> --
> (0 rows)

You are comparing two arrays for equality. Since the left-hand array has only 1 
item and the right-hand one has two, there’s not much equality between them.

You probably meant:
select name from table_name_ds_tmp where categoryid = ANY ( select 
string_to_array( '200,400', ',')::bigint[] );


Alban Hertroys
--
There is always an exception to always.








Re: Arrays and ANY problem

2019-09-25 Thread Alban Hertroys


> On 25 Sep 2019, at 22:50, Alban Hertroys  wrote:
> 
> 
>> On 25 Sep 2019, at 22:25, David Salisbury  wrote:
>> db=# select name from table_name_ds_tmp where ARRAY[categoryid] = ANY ( 
>> select string_to_array( '200,400', ',')::bigint[] );
>> name 
>> --
>> (0 rows)
> 
> You are comparing two arrays for equality. Since the left-hand array has only 
> 1 item and the right-hand one has two, there’s not much equality between them.
> 
> You probably meant:
> select name from table_name_ds_tmp where categoryid = ANY ( select 
> string_to_array( '200,400', ',')::bigint[] );

Or rather:
select name from table_name_ds_tmp where categoryid = ANY ( string_to_array( 
'200,400', ',')::bigint[] );

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: The connection to the server was lost. Attempting reset: Failed.

2019-10-10 Thread Alban Hertroys


> On 10 Oct 2019, at 17:55, Yessica Brinkmann  
> wrote:
> 
> I really thought a lot, but I don't understand why but the function fails 
> after the expression is executed:
> appendStringInfo(&cols, "%s a.attnum = %d", (i > 0 ? "OR" : ""), idxcd->varattno[i]);
> The error appears only to me when entering the cycle:
> foreach(cell, candidates) /* foreach cell in candidates */
> more than once, that is, when you have more than one candidate index. If the 
> cycle is entered only once, the function works correctly.
> The error that appears to me is that the connection to the PostgreSQL server 
> is directly lost. I proved that the error occurs in that statement, printing 
> some values.

There is probably an error in the Postgres log-file providing you more info.

That said, at least the below bit in your code is dangerous:

foreach( cell, candidates ) /* foreach cell in candidates */
{
    idxcd = (IndexCandidate*)lfirst( cell );

    if( !idxcd->idxused )
        continue;

    if (idxcd!=NULL)
    {


You should at least check for NULL before referencing an attribute of that 
structure. Personally, I would invert the test like so (and then move it before 
the idxused test:

if (idxcd == NULL) {
    elog( INFO, "idxcd IS NULL" );
    continue; /* Or is that fatal enough to break instead? */
}

if (!idxcd->idxused)
continue;



Alban Hertroys
--
There is always an exception to always.








Re: Finding out about the dates of table modification

2019-11-23 Thread Alban Hertroys



> On 23 Nov 2019, at 3:24, Martin Mueller  
> wrote:
> 
> I've moved happily from MySQL to Postgres but miss one really good feature of 
> MYSQL: the table of tables that let you use SQL queries to find out metadata 
> about your table. Thus looking at the table of tables and sorting it by last 
> change, lets you quickly look at the most recently modified table. Which is 
> really useful if you have a bad memory, as I do, and can't remember the name 
> of a particular table that I worked on last Wednesday. 
> 
> Are those capabilities hidden somewhere in Postgres?  There isn't an obvious 
> section in the documentation. At least I can't find it. 

AFAIK, there’s nothing like that built-in, but it’s not impossible to deduce.

You could start with getting a list of files in $PG_DATA/base that were 
modified in that period (provided you have sufficient privileges on that 
directory):

find base/ -mtime -2 -type f -print

For figuring out to what tables these files belong [1]:

pg_filenode_relation(0, <filenode>);

and:

pg_relation_filepath(<relation name>);

For example, I did:
# create table creation_date(test text);

[/home/postgres/10/data]$ find base/ -mtime -2 -type f -print   
  
base/16403/2608
base/16403/29784
base/16403/2659
base/16403/29789
base/16403/2678
base/16403/29787
base/16403/2662
base/16403/2703
base/16403/2679
base/16403/2673
base/16403/2658
base/16403/1249
base/16403/2610
base/16403/2704
base/16403/2674
base/16403/3455
base/16403/2663
base/16403/1247
base/16403/1259

The lower numbers are probably core tables, such as pg_depend:
# SELECT pg_filenode_relation(0, 2608); -- 0 being the default table-space
 pg_filenode_relation 
--
 pg_depend
(1 row)

But!:
# SELECT pg_filenode_relation(0, 29784);
 pg_filenode_relation 
--
 creation_date
(1 row)


And indeed:
# select pg_relation_filepath('creation_date');   
 pg_relation_filepath 
--
 base/16403/29784
(1 row)


I was looking for the inverse function pg_filepath_relation(), but 
that does not appear to exist; That would have been useful in combination with 
file listings like those from `find`.

Mind that larger tables consist of multiple files. I’m sure this would become a 
head-ache quick on a larger database. Having an actual creation-date of a file 
would be nice too, but that doesn’t necessarily mean much when growing tables 
create extra files too.

Apparently, someone already turned the process into a number of queries[2]. As 
they mention though, it’s not 100% reliable though, as there are operations 
that recreate table files, such as CLUSTER.

Then again, if you’re just looking for the table you created last Wednesday, 
that’s probably not a major concern.


Another option is to add a DDL Event trigger on create table statements and log 
that to some table[3].
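
A rough sketch of that approach (all names here are made up; on versions before 
11, use EXECUTE PROCEDURE instead of EXECUTE FUNCTION):

CREATE TABLE ddl_log (
    created_at  timestamptz NOT NULL DEFAULT now(),
    command_tag text,
    object      text
);

CREATE OR REPLACE FUNCTION log_table_creation() RETURNS event_trigger AS $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        INSERT INTO ddl_log (command_tag, object)
        VALUES (r.command_tag, r.object_identity);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER trg_log_create_table
    ON ddl_command_end
    WHEN TAG IN ('CREATE TABLE')
    EXECUTE FUNCTION log_table_creation();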

Regards,
Alban Hertroys

[1]: https://www.2ndquadrant.com/en/blog/postgresql-filename-to-table/
[2]: 
https://stackoverflow.com/questions/18849756/automatically-drop-tables-and-indexes-older-than-90-days/18852752#18852752
[3]: https://www.postgresql.org/docs/current/event-triggers.html

--
There is always an exception to always.








Re: Range contains element filter not using index of the element column

2019-11-28 Thread Alban Hertroys


> On 27 Nov 2019, at 10:32, Lauri Kajan  wrote:
> 
> Hi all,
> I'm wondering if there are anything to do to utilize a index when doing a 
> range contains element  query. I have tested this with 9.6 and 12.0.
> 
> I have a table with a timestamp column that has a btree index.
> I would like to do a query:
> SELECT * FROM table WHERE ts <@ tsrange($1, $2, '(]');
> The index is not used and a seq scan is done instead.
> 
> To use the index correctly I have to do the query like this:
> SELECT * FROM table WHERE  ($1 IS null OR $1 < ts) AND ($2 IS null OR ts <= 
> $2);
> I like the <@ syntax more. Is there something I can do differently? Maybe a 
> different type of index instead?

Does it help to use timestamps -infinity and infinity instead of nulls in your 
case?

=> select t, t < current_timestamp, current_timestamp <= t from (values 
('-infinity'::timestamp), ('infinity'::timestamp)) x(t);
   
     t     | ?column? | ?column? 
-----------+----------+----------
 -infinity | t        | f
 infinity  | f        | t
(2 rows)

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Conditional return of aggregated data

2019-12-02 Thread Alban Hertroys
On Mon, 2 Dec 2019 at 12:11, Laura Smith 
wrote:

>
> My initial idea was something along the lines of :
>  select (select sum(statcount) from stats_residence) as
> aggstat,statcount,short_name_en from stats_residence where
> aggstat>some_number;
>

One option is to move the aggregate to the where-clause. If you also need
the value in your select-list, you can just repeat the subselect there,
usually the planner is smart enough to figure out that it can just re-use
the result.

select short_name_en from stats_residence where (select sum(statcount) from
stats_residence) >some_number;

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: SQL operator '*='

2019-12-23 Thread Alban Hertroys


> On 23 Dec 2019, at 15:33, Matthias Apitz  wrote:
> 
> #ifdef DBSPOS
>   EXEC SQL DECLARE land_cursor CURSOR FOR 
>   SELECT stammprio, lkz, landbez, plkz, postbez, karenz1, 
> karenz2, 
>   karenz3, land.wkz, webez, we, kurs, land.del
>   FROM   land

 LEFT JOIN devisen ON land.wkz = devisen.wkz AND land.brgroup = 
devisen.brgroup

>   WHERE  land.brgroup = :brgroupHost_for_helpland_cursor
>   ORDER  BY stammprio, landbez;
> #endif

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: UPDATE many records

2020-01-06 Thread Alban Hertroys


> On 6 Jan 2020, at 21:15, Israel Brewster  wrote:
> 
>> On Jan 6, 2020, at 10:08 AM, Christopher Browne  wrote:
>> 
>> On Mon, 6 Jan 2020 at 13:36, Israel Brewster  wrote:
>> Thanks to a change in historical data, I have a need to update a large 
>> number of records (around 50 million). The update itself is straight 
>> forward, as I can just issue an "UPDATE table_name SET 
>> changed_field=new_value();" (yes, new_value is the result of a stored 
>> procedure, if that makes a difference) command via psql, and it should work. 
>> However, due to the large number of records this command will obviously take 
>> a while, and if anything goes wrong during the update (one bad value in row 
>> 45 million, lost connection, etc), all the work that has been done already 
>> will be lost due to the transactional nature of such commands (unless I am 
>> missing something).
>> 
>> Given that each row update is completely independent of any other row, I 
>> have the following questions:
>> 
>> 1) Is there any way to set the command such that each row change is 
>> committed as it is calculated?
>> 2) Is there some way to run this command in parallel in order to better 
>> utilize multiple processor cores, other than manually breaking the data into 
>> chunks and running a separate psql/update process for each chunk? Honestly, 
>> manual parallelizing wouldn’t be too bad (there are a number of logical 
>> segregations I can apply), I’m just wondering if there is a more automatic 
>> option.
>> 
>> Yeah, I'd be inclined to do this in batches.

I think you’re overcomplicating the matter.

I’d just do it as a single update in one transaction. It’s only 50M rows. It 
may take half an hour or so on decent hardware, depending on how 
resource-intensive your function is.

If that fails[1], only then would I start looking into batching things. But 
then you still need to figure out why it fails and what to do about that; if it 
fails it will probably fail fast, and if not, then you’re looking at a one-off 
situation that won’t require more than a few workarounds - after which you can 
just run the update again.

Ad 1). No harm has been done, it’s a single transaction that rolled back.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Trigger

2020-02-25 Thread Alban Hertroys


> On 25 Feb 2020, at 17:53, Adrian Klaver  wrote:
> 
> On 2/25/20 12:01 AM, Sonam Sharma wrote:
>> I have a trigger, like many other triggers that fire after
>> update and checks a field of the OLD set. For some reason this trigger throw 
>> this error:
>> ERROR: record "old" has no field "ivo_sts_cd" CONTEXT: SQL statement
> 
>> if exc_count = 0 then
>> UPDATE pps.T8071_CAI_IVO_HDR SET IVO_STS_CD = 1 where 
>> T616_VBU_NBR=old.T616_VBU_NBR and T617_FNC_TYP_CD=old.T617_FNC_TYP_CD and
>> T8071_CAI_IVO_ID=old.T8071_CAI_IVO_ID and T8071_ADD_DM= old. T8071_ADD_DM 
>> and old.ivo_sts_cd != 10 and old.ivo_sts_cd != 3;
> 
> Realized I went through the above to quickly. I do not see a SET, nor am I 
> clear what table you are trying to UPDATE.

I’m pretty sure that if the OP were to format their query in a more readable 
and consistent way, they would spot their error pretty quickly. It’s a simple 
typo.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Advice request : simultaneous function/data updates on many databases

2020-03-05 Thread Alban Hertroys


> On 4 Mar 2020, at 23:42, Guyren Howe  wrote:
> 
> On Mar 4, 2020, at 14:33 , Rory Campbell-Lange  
> wrote:
>> 
>> Essentially we wish to reduce the window where the frontend and backend
>> aren't synchronised.
>> 
>> If we have (for example) 200 databases which each take 2 seconds to
>> update, a client could be on the wrong frontend code for over 6 minutes.
>> Send each of the servers a PL/PGSQL method that executes all the things in a 
>> transaction and then waits until the same clock time to commit. Then all the 
>> servers are committing at the same moment. They will still be out of synch 
>> somewhat, but this would reduce the degree.


I’m wondering whether this could be done with a more generic event-based 
approach, where each server sends a ‘done’ event to a central machine once it’s 
ready to commit, and the central machine returns an ‘acknowledged’ once the 
last server sent its ‘done’ event.
The challenge there is that the ‘ack’ needs to be caught and processed within 
the same waiting transaction… Not sure how to do that right now - maybe through 
web services, MQTT or similar.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Using JSONB with nested key-value structures, and skipping/wildcarding the key

2020-06-01 Thread Alban Hertroys


> On 1 Jun 2020, at 20:18, Shaheed Haque  wrote:
> 
> Hi,
> 
> I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB 
> column called 'snapshot'. In Python terms, each row's 'snapshot' looks like 
> this:
> 
> ==
> snapshot = {
> 'pay_definition' : {
> '1234': {..., 'name': 'foo', ...},
> '99': {..., 'name': 'bar', ...},
> }
> ==
> 
> I'd like to find all unique values of 'name' in all rows of MyModel. I have 
> this working using native JSON functions from the ORM like this:
> 
> =
> class PayDef(Func):
> function='to_jsonb'
> 
> template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"
> 
> MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef',
>  flat=True)
> =
> 
> So, skipping the ordering/distinct/ORM parts, the core looks like this:
> 
> to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name’)


I do something like this to get a set of sub-paths in a JSONB field (no idea 
how to write that in Django):

select snapshot->'pay_definition'->k.value->'name'
  from MyModel
  join lateral jsonb_object_keys(snapshot->'pay_definition') k(value) on true

I don’t know how that compares performance-wise to using jsonb_each, but 
perhaps worth a try. Obviously, the way it’s written above it doesn’t return 
distinct values of ’name’ yet, but that’s fairly easy to remedy.
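
For instance, something along these lines (note the ->> to get the name out as 
text rather than jsonb):

select distinct snapshot->'pay_definition'->k.value->>'name' as name
  from MyModel
  join lateral jsonb_object_keys(snapshot->'pay_definition') k(value) on true;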

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Using JSONB with nested key-value structures, and skipping/wildcarding the key

2020-06-02 Thread Alban Hertroys


> On 2 Jun 2020, at 9:30, Shaheed Haque  wrote:
> 
> 
>> I do something like this to get a set of sub-paths in a JSONB field (no idea 
>> how to write that in Django):
>> 
>> select snapshot->'pay_definition'->k.value->'name'
>>   from MyModel
>>   join lateral jsonb_object_keys(snapshot->'pay_definition') k(value) on true
>> 
> I was unaware of the LATERAL keyword, so thanks. After a bit of Googling 
> however, it seems that it is tricky/impossible to use from the ORM (barring a 
> full scale escape to a "raw" query). One question: as a novice here, I think 
> I understand the right hand side of your JOIN "... k(value)" is shorthand for:
> 
> ... AS table_name(column_name)
> 
> except that I don't see any clues in the docs that jsonb_object_keys() is a 
> "table function". Can you kindly clarify?

Correct. Thomas already explained the return type, but the plural form of the 
function name is also an indication that it returns multiple results.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: table name

2020-06-11 Thread Alban Hertroys


> On 11 Jun 2020, at 20:58, Paul Förster  wrote:
> 
> Hi Marc,
> 
>> On 11. Jun, 2020, at 20:54, Marc Millas  wrote:
>> sorry if my question is tooo simple :-)
> 
> it's not. :-)
> 
>> obviously if I ask:
>> select * from regions-20180101;
>> I get a syntax error.
>> if I try select * from $$regions_20180101$$;
>> I get another syntax error.
>> If I try to rename that table, same thing.
>> if I try a cte, same thing.
>> 
>> What should I do ?
> 
> you can just quote its name:
> 
> select * from "regions-20180101";
> 
> Cheers,
> Paul

The background here is that ‘’ and $$ are quoting of literals (strings, 
integers, JSON objects, etc.), while “” is identifier quoting (tables, indices, 
types, etc.).

Identifier quoting not only allows to include special symbols, it also makes 
the identifier case-sensitive. That’s probably why Paul suggested to rename the 
table to no longer require identifier quoting - many people consider it a PITA, 
but it can be used to get out of trouble like yours - some people insist on it, 
for example because it makes using camel-caps in identifiers meaningful.
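
A quick illustration of that case-sensitivity:

create table "CamelCase" (id int);
select * from CamelCase;    -- fails: unquoted names are folded to lower case ("camelcase")
select * from "CamelCase";  -- works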

Regards,

Alban Hertroys
--
There is always an exception to always.








Re: Postgres and alias

2020-08-28 Thread Alban Hertroys


> On 28 Aug 2020, at 2:14, Fontana Daniel C (Desartec S.R.L.) 
>  wrote:
> 
> Perfect.
> 
> now let's imagine that '1234567890' is a function f_art_get_price(id_code),
> which returns in a string like the following 'XXXZMMM1234567890123/mm/dd' 
> where 1234567890123 is the price and /mm/dd the date it was last changed 
> price.
> How would you do in this case to obtain these values ​​separately?
> without calling the function 2 times avoiding overloading the base?
> 
> something like this
> 
> select art.description,
>f_art_get_price_str( art.id ) as ls_price_and_date
>   SUBSTRING( ls_price_and_date, 7, 13 )
> from articulos;

Let's assume art is supposed to be an alias for articulos.
Something like this?:

select art.description,
p.ls_price_and_date,
SUBSTRING( p.ls_price_and_date, 7, 13 )
from articulos art
cross join lateral f_art_get_price_str( art.id ) p(ls_price_and_date);


Alban Hertroys
--
There is always an exception to always.








Re: How to properly query lots of rows based on timestamps?

2020-08-29 Thread Alban Hertroys


> On 29 Aug 2020, at 10:24, Thorsten Schöning  wrote:
> 
> Hi all,
> 
> I have a table containing around 95 million rows, pretty much only
> storing a timestamp and further IDs of related tables containing the
> actual data in the end.
> 
>> CREATE TABLE clt_rec
>> ( 
>>  id BIGSERIAL NOT NULL, 
>>  oms_rec BIGINT NOT NULL, 
>>  captured_at TIMESTAMP(6) WITH TIME ZONE NOT NULL, 
>>  rssi SMALLINT NOT NULL, 
>>  CONSTRAINT pk_clt_rec PRIMARY KEY (id), 
>>  CONSTRAINT fk_clt_rec_oms_rec FOREIGN KEY (oms_rec) REFERENCES "oms_rec" 
>> ("id"), 
>>  CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec) 
>> );
> 
> In many use cases I need to search all of those rows based on their
> timestamp to find rows arbitrary in the past: Sometimes it's only 15
> minutes into the past, sometimes it's 2 years, sometimes it's finding
> the first day of each month over 15 months for some of those telegrams
> etc. In the end, I pretty often need to compare those timestamps and
> some queries simply take multiple seconds in the end, especially
> adding up if multiple, but slightly different queries need to be
> executed one after another. The following are two abstracts of
> Postgres' query plans:
> 
> Plan 1:
> 
>> ->  Nested Loop  (cost=1.14..343169.49 rows=43543 width=20) (actual 
>> time=0.313..113.974 rows=34266 loops=3)
>>->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  
>> (cost=0.57..3437.90 rows=43543 width=24) (actual time=0.153..20.192 
>> rows=34266 loops=3)
>>Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp 
>> with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 
>> 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
>>->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.80 rows=1 
>> width=12) (actual time=0.002..0.002 rows=1 loops=102799)
>>Index Cond: (id = clt_rec.oms_rec)

What happens here is that the planner looks up the lower and upper boundaries, 
everything in between those index nodes is a candidate record. Next, it loops 
over those to match the other condition of your query (id = clt_rec.oms_rec). 
You didn’t tell whether there’s an index on that column.

You’d probably see a performance improvement were you to create an index on 
(captured_at, id). If your Postgres version is somewhat recent, that could even 
lead to an Index Only Scan.


> Plan 2:
> 
>> ->  Nested Loop  (cost=1.14..836381.50 rows=111934 width=20) (actual 
>> time=0.379..911.697 rows=334465 loops=3)
>>->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  
>> (cost=0.57..8819.57 rows=111934 width=24) (actual time=0.193..154.176 
>> rows=334465 loops=3)
>>Index Cond: ((captured_at >= ('2020-08-28 10:21:06.968+02'::timestamp 
>> with time zone - '14 days'::interval)) AND (captured_at <= ('2020-08-28 
>> 10:21:06.968+02'::timestamp with time zone + '00:00:00'::interval)))
>>->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.39 rows=1 
>> width=12) (actual time=0.002..0.002 rows=1 loops=1003394)
>>Index Cond: (id = clt_rec.oms_rec)

And this situation is very much the same issue, apart from the larger number of 
candidate records.

> Postgres seems to properly use available indexes, parallel workers and
> stuff like that. But looking at the actual times and compared to all
> the other parts of the query, comparing those timestamps simply takes
> the most time.

It only needs to compare 2 timestamps.

> I've looked into this topic and found statements about that one
> shouldn't put too many rows into the index[1] and stuff like that or
> it will be ignored at all. But that doesn't seem to be the case for me
> according to the plan. OTOH, my index really simply is about the
> column containing the timestamp, no function reducing things to dates
> or stuff like that to reduce the number of rows.
> 
>> CREATE INDEX idx_clt_rec_captured_at ON clt_rec USING btree ( captured_at );

Try this:
CREATE INDEX idx_clt_rec_captured_at ON clt_rec USING btree ( captured_at, id );


Alban Hertroys
--
There is always an exception to always.








How to enumerate partitions from a window function?

2020-09-03 Thread Alban Hertroys
We are trying to add some information to a query over data from a
continuous process. Most of what we want can be done quite nicely using
window functions, but I got stuck on this particular problem:

The data has records with a timestamp and a few properties that make
subsequent rows belong to the same group. Say we have:
create table process_data (
timestamp timestamp not null,
property_A text not null,
property_B text not null,
value numeric(12, 3)
);

And a query like:
select datetime, property_A, property_B
, first_value(datetime)::time over run as swap_time
--, here I want to enumerate the runs themselves
, value
from process_data
window run as (partition by property_A, property_B order by datetime)
;

As stated above, I want to enumerate the runs, starting at 1 and
incrementing by 1 every time a partition from the 'run' window closes,
which would result in something like this:

datetime | property_A | property_B | swap_time | run_nr | value

2020-09-03 15:06 | tea | earl grey | 15:06 | 1 | 0.23
2020-09-03 15:07 | tea | earl grey | 15:06 | 1 | 0.22
2020-09-03 15:08 | tea | ceylon | 15:08 | 2 | 0.34
2020-09-03 15:09 | coffee | cappucino | 15:09 | 3 | 0.45
2020-09-03 15:10 | coffee | cappucino | 15:09 | 3 | 0.43
etc.

Is there a way to achieve this through window functions, or do we need to
wrap the thing in a subquery to achieve this?

Regards,
Alban Hertroys
-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: How to enumerate partitions from a window function?

2020-09-03 Thread Alban Hertroys
On Thu, 3 Sep 2020 at 16:01, Tom Lane  wrote:

> Alban Hertroys  writes:
> > As stated above, I want to enumerate the runs, starting at 1 and
> > incrementing by 1 every time a partition from the 'run' window closes,
> > Is there a way to achieve this through window functions, or do we need to
> > wrap the thing in a subquery to achieve this?
>
> I think this'll work:
>
> select datetime, property_A, property_B
> , first_value(datetime)::time over run as swap_time
> , dense_rank() over (order by property_A, property_B)
> , value
> from process_data
> window run as (partition by property_A, property_B order by datetime)
> ;
>
> You can't do it with a window function over the "run" window because
> no window function ever looks outside the current partition.  But
> that's easy to fix by using a different window definition.  The
> planner is smart enough to see that these windows are compatible
> and only need one sort to be performed.
>
> regards, tom lane
>

Thanks Tom,

That gets us close, but it ignores the order of the runs over time. I think
it also reassigns the same number to later runs at the same 'day' that
happen to have the same values for property_A and _B. That's some crucial
information that I forgot to include.

To expand on my original example:

datetime | property_A | property_B | swap_time | run_nr | value

2020-09-03 15:06 | tea | earl grey | 15:06 | 1 | 0.23
2020-09-03 15:07 | tea | earl grey | 15:06 | 1 | 0.22
2020-09-03 15:08 | tea | ceylon | 15:08 | 2 | 0.34
2020-09-03 15:09 | coffee | cappucino | 15:09 | 3 | 0.45
2020-09-03 15:10 | coffee | cappucino | 15:09 | 3 | 0.43
2020-09-03 15:11 | tea | earl grey | 15:11 | 4 | 0.23
etc.

Where the last row has the same characteristic properties as the first 2
rows (from run 1), but is in run 4 due to it having started after run 3.

The runs normally start at 1 hour before midnight, with run 1, and continue
24h from there (it's a shifted day-schedule). The above example starting at
15:06 is unlikely to occur in reality, although possible (with long
downtime). That's mostly to clarify how the run numbers should function, it
would require to at least partition run_nr by a date shifted 1 hour back,
as long as they number their runs correctly along the time axis.

Regards,
Alban.
-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: How to enumerate partitions from a window function?

2020-09-04 Thread Alban Hertroys
On Thu, 3 Sep 2020 at 20:59, Michael Lewis  wrote:

> It seems like you are maybe wanting this- If the previous row is the same,
> then get the previous row's run_nr. If it is different, then increment.
>
> case when lag( property_A ) over() = property_A and lag( property_B )
> over() = property_B then coalesce( lag( run_nr ) over(), 1 ) else lag(
> run_nr ) over() + 1 end
>
> Perhaps there is a much simpler implementation though.
>

That would work were it not that the very column we're defining is the one
to be aliased run_nr. The data does not contain that information, it's what
I'm trying to enrich it with and what I'm having trouble wrapping my head
around.
Your query (adopted a tiny bit) unfortunately results in:

select datetime, property_A, property_B
, first_value(datetime::time) over run_win as swap_time
, case
when lag(property_A) over time_win = property_A
and lag(property_B) over time_win = property_B
then coalesce(lag(run_nr) over time_win, 1)
else lag(run_nr) over time_win +1
  end
, value
from process_data
window
time_win as (order by datetime)
, run_win as (partition by property_A, property_B order by datetime)
order by datetime
;

ERROR:  column "run_nr" does not exist
LINE 6:  then coalesce(lag(run_nr) over time_win, 1)
   ^
SQL state: 42703
Character: 221

I turned my example into a proper test-case (better late than never):

CREATE TABLE process_data (
datetime timestamp without time zone NOT NULL,
property_a text NOT NULL,
property_b text NOT NULL,
value numeric(12,3)
);

COPY process_data (datetime, property_a, property_b, value) FROM stdin;
2020-09-03 15:06:00 tea earl grey 0.230
2020-09-03 15:07:00 tea earl grey 0.220
2020-09-03 15:08:00 tea ceylon 0.340
2020-09-03 15:09:00 coffee cappucino 0.450
2020-09-03 15:10:00 coffee cappucino 0.430
2020-09-03 15:11:00 tea earl grey 0.230
\.

With the desired result (note that swap_time and run_nr are calculated
columns):
  datetime   | property_a | property_b | swap_time | run_nr | value
-+++---++---
 2020-09-03 15:06:00 | tea| earl grey  | 15:06:00  |  1 | 0.230
 2020-09-03 15:07:00 | tea| earl grey  | 15:06:00  |  1 | 0.220
 2020-09-03 15:08:00 | tea| ceylon | 15:08:00  |  2 | 0.340
 2020-09-03 15:09:00 | coffee | cappucino  | 15:09:00  |  3 | 0.450
 2020-09-03 15:10:00 | coffee | cappucino  | 15:09:00  |  3 | 0.430
 2020-09-03 15:11:00 | tea| earl grey  | 15:06:00  |  4 | 0.230
(6 rows)

I've been looking around on the Internet in the meantime, and it seems
people either solve this with a recursive CTE (referencing the previous row
by row_number() over (...)) or by writing a set-returning function that
walks over the data in datetime order using a cursor.

Since the actual query is growing more and more state-tracking flags, using
a function has the added benefit that referencing state columns from the
previous row gets a lot easier (lots of repeated window functions
otherwise). It would become a procedural solution instead of a set-based
one, but considering that this data is order-sensitive (on datetime),
that's probably what a set-based solution would also end up doing anyway.
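
For completeness, a purely set-based sketch of just the run numbering (a running
sum over a lag()-based change flag, using the test case above; untested beyond
that):

select datetime, property_a, property_b, value,
       sum(is_new_run) over (order by datetime) as run_nr
from (
    select pd.*,
           case when lag(property_a) over w is distinct from property_a
                  or lag(property_b) over w is distinct from property_b
                then 1 else 0
           end as is_new_run
    from process_data pd
    window w as (order by datetime)
) flagged
order by datetime;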

Regards,
Alban.

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


FATAL: terminating connection due to administrator command

2020-10-01 Thread Alban Hertroys
h exit code 1
2020-09-30 22:27:56.459 CEST [6482]   LOG:  background worker "parallel 
worker" (PID 30658) exited with exit code 1
2020-09-30 22:27:56.459 CEST [6482]   LOG:  background worker "parallel 
worker" (PID 30659) exited with exit code 1
2020-09-30 22:43:08.459 CEST [8055] 172.30.2.25 selfservice_prd ERROR: 
schema "somethingelse" does not exist at character 71

Apparently, something is sending SIGTERM to our pg processes. I know that 
I'm not doing that, certainly not at those hours, and I'm the one who set 
up this system and am the only DBA of it.

Advice I found on the Internet is to use systemtap with some tap-script, 
but the scripts that I found just displayed the PID's of processes without 
telling me their names, which I didn't find all that useful in figuring 
out who was responsible, so I made an attempt (I have no experience with 
stap) at modifying it to print process names of signal sender and target:

/*
 * killsnoop-nd.stp Trace process signals.
 *  For Linux, uses SystemTap (non-debuginfo).
 *
 * Copyright (C) 2015 Brendan Gregg.
(etc)
 */
global target;
global signal;

probe begin
{
    printf("%-6s %-12s %-5s %-6s %6s\n", "FROM", "COMMAND", "SIG", "TO", "COMMAND");
}

probe nd_syscall.kill
{
    target[tid()] = uint_arg(1);
    signal[tid()] = uint_arg(2);
}

probe nd_syscall.kill.return
{
    if (signal[tid()] == 15 && target[tid()] != 0) {
        printf("%-6d %-12s %-5d %-6d %12s\n"
            , pid(), execname()
            , signal[tid()]
            , target[tid()]
            , pid2execname(target[tid()]));
    }
    delete target[tid()];
    delete signal[tid()];
}


The output of last night was:

FROM   COMMAND        SIG   TO      COMMAND
30068  systemd-udevd   15   14151   systemd-udevd
30068  systemd-udevd   15   14836   systemd-udevd
(...)
6482   postmaster      15   30649   postmaster
6482   postmaster      15   30648   postmaster
6482   postmaster      15   30647   postmaster
6482   postmaster      15   30646   postmaster
6482   postmaster      15   30645   postmaster
6482   postmaster      15   30659   postmaster
6482   postmaster      15   30658   postmaster
6482   postmaster      15   30657   postmaster
6482   postmaster      15   30656   postmaster
6482   postmaster      15   30655   postmaster
6482   postmaster      15   2065    postmaster
6482   postmaster      15   2064    postmaster
6482   postmaster      15   2063    postmaster
Several of these TO-pid's match those in the PG log.

$ ps aux | grep 6482
postgres  6482  0.0  0.5 51755776 1043872 ?Ss   Sep28   3:25 
/usr/pgsql-11/bin/postmaster -D /data/11/data

Am I correct in concluding that postmaster is killing its own processes? 
If so, what is going on here? And more importantly, what do I do about it?

Regards,
Alban Hertroys.

P.S. I'm mailing from my work account, as this is work related. You may 
remember me from my private account.


Alban  Hertroys 
D: +31 (0)53 4 888 888  | T: +31 (0)53 4888 888 | E: 
alban.hertr...@apollovredestein.com
Apollo Vredestein B.V.| Ir. E.L.C. Schiffstraat 370, 7547 RD Enschede, The 
Netherlands
Chamber of Commerce number: 34223268




 

 


Re: FATAL: terminating connection due to administrator command

2020-10-01 Thread Alban Hertroys
"Srinivasa T N"  wrote on 01/10/2020 11:47:33:

> On Thu, Oct 1, 2020 at 2:47 PM Alban Hertroys <
> alban.hertr...@apollovredestein.com> wrote:
> Hi all, 
> 
> We're seeing the FATAL error message from the subject pop up in our 
> logs at regular intervals, but I haven't been able to pinpoint what 
> is causing it. I'm hoping for some insights here. 
> 
> We run a PostgreSQL 11.9 server on CentOS 7, within a vmware 
environment: 
>  PostgreSQL 11.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
> 20150623 (Red Hat 4.8.5-39), 64-bit 
> 
> The package was installed from the PGDG repository. 
> 
> I'm not even sure I should be worried, there doesn't appear to be 
> any impact on the servers' functioning, but it does say 'FATAL'. 
> What we're seeing are lines like these two instances: 
> 
> 2020-09-30 22:27:56.446 CEST [30659]   STATEMENT:  select count(*) 
> from "dm_b2b"."prlwytzkofskiv1" 
> 2020-09-30 22:27:56.446 CEST [30658]   FATAL:  terminating 
> connection due to administrator command 
> 2020-09-30 22:27:56.446 CEST [30658]   STATEMENT:  select count(*) 
> from "dm_b2b"."prlwytzkofskiv1" 
> 2020-09-30 22:27:56.446 CEST [30657]   FATAL:  terminating 
> connection due to administrator command 
> 2020-09-30 22:27:56.446 CEST [30657]   STATEMENT:  select count(*) 
> from "dm_b2b"."prlwytzkofskiv1" 
> 2020-09-30 22:27:56.446 CEST [30656]   FATAL:  terminating 
> connection due to administrator command 
> 2020-09-30 22:27:56.446 CEST [30656]   STATEMENT:  select count(*) 
> from "dm_b2b"."prlwytzkofskiv1" 
> 2020-09-30 22:27:56.446 CEST [30655]   FATAL:  terminating 
> connection due to administrator command 
> 2020-09-30 22:27:56.446 CEST [30655]   STATEMENT:  select count(*) 
> from "dm_b2b"."prlwytzkofskiv1" 
> 2020-09-30 22:27:56.459 CEST [6482]   LOG:  background worker 
> "parallel worker" (PID 30655) exited with exit code 1 
> 2020-09-30 22:27:56.459 CEST [6482]   LOG:  background worker 
> "parallel worker" (PID 30656) exited with exit code 1 
> 2020-09-30 22:27:56.459 CEST [6482]   LOG:  background worker 
> "parallel worker" (PID 30657) exited with exit code 1 
> 2020-09-30 22:27:56.459 CEST [6482]   LOG:  background worker 
> "parallel worker" (PID 30658) exited with exit code 1 
> 2020-09-30 22:27:56.459 CEST [6482]   LOG:  background worker 
> "parallel worker" (PID 30659) exited with exit code 1 
> 2020-09-30 22:43:08.459 CEST [8055] 172.30.2.25 selfservice_prd 
> ERROR:  schema "somethingelse" does not exist at character 71 

> I am guessing that 6 background workers are started, 1 worker had 
> the result and hence killing the other 5 workers.  Maybe, some more 
> pg experts can comment.  Anyway, explain of your query helps.

I think you may have the right of it:

   QUERY PLAN
--
 Finalize Aggregate  (cost=3065970.74..3065970.75 rows=1 width=8)
   ->  Gather  (cost=3065970.21..3065970.72 rows=5 width=8)
 Workers Planned: 5
 ->  Partial Aggregate  (cost=3064970.21..3064970.22 rows=1 
width=8)
   ->  Nested Loop Left Join  (cost=2772.30..2743631.23 
rows=128535594 width=0)
 Join Filter: ((avl.xx)::text <> ''::text)
 ->  Parallel Hash Left Join  (cost=2772.01..943286.00 
rows=5574292 width=13)
   Hash Cond: (avl.x = 
(dc.x)::integer)
   ->  Parallel Seq Scan on  
avl  (cost=0.00..596772.71 rows=5574171 width=21)
   ->  Parallel Hash  (cost=2262.01..2262.01 
rows=40800 width=8)
 ->  Parallel Index Only Scan using 
 on  dc  (cost=0.42..2
 ->  Index Scan using ix_xxxx 
on xxx dm  (cost=0.29..0.31 rows=1 width=19)
   Index Cond: ((avl.xx)::text = 
(xx)::text)
   Filter: ((xx)::text <> ''::text)
(14 rows)

So, apparently these FATAL errors are just caused by parallel workers 
being aborted because they're no longer needed. Good to know.
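
For what it's worth, an easy way to verify that would be to temporarily disable 
parallelism in a session and check that the messages no longer appear for that 
query:

set max_parallel_workers_per_gather = 0;
select count(*) from "dm_b2b"."prlwytzkofskiv1";
reset max_parallel_workers_per_gather;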

Regards,
Alban.


Alban  Hertroys 
D: +31 (0)53 4 888 888  | T: +31 (0)53 4888 888 | E: 
alban.hertr...@apollovredestein.com
Apollo Vredestein B.V.| Ir. E.L.C. Schiffstraat 370, 7547 RD Enschede, The 
Netherlands
Chamber of Commerce number: 34223268




 

 

Re: How to update a table with the result of deleting rows in another table

2020-10-06 Thread Alban Hertroys


> On 6 Oct 2020, at 7:37, Hemil Ruparel  wrote:
> 
> I am trying to delete orders for a given customer on a given date and add the 
> cost of those orders to credit for the customer. 
> 
> So far, I came up with this:
> ```
> with data as (
> delete from orders
> where customer_id = 
> and date = '2020-10-05' returning price
> ), total as (
> select sum(price) from data
> )
> update paymentdetail
> set temp_credit = temp_credit + (select * from total)
> where customer_id = 
> ```
> 
> which works. but is there a better way to update one table using the result 
> of deleting rows from another table given that I only want the aggregate of 
> the result?

Adding the customer id to your returning clause and using update..from could 
help:

with data as (
    delete from orders
    where customer_id = 
      and date = '2020-10-05'
    returning customer_id, price
), total as (
    select customer_id, sum(price) as total_price
      from data
     group by customer_id
)
update paymentdetail
   set temp_credit = temp_credit + total.total_price
  from total
 where customer_id = total.customer_id


You could also do this using subqueries instead of CTE’s; that may perform 
better, as CTE’s act as optimisation fences (at least prior to PostgreSQL 12).


Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Check constraints do not seem to be working!!!

2020-11-11 Thread Alban Hertroys

On 11 Nov 2020, at 11:15, Jitendra Loyal  wrote:
> 
> 
> Thanks Nikolay
> 
> I read that but is there a way to meet the above requirement. And I will like 
> to add that IS NULL and IS NOT NULL should evaluate to true/false. These 
> operators are made for this and should not be returning NULL.

That is exactly what they do. Your problem is with the equality operator and 
its behaviour with NULL values, which is described in the referenced document.
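
A quick way to see the difference:

select (null = true) as equality, (null is null) as is_null;
-- the first expression yields NULL (neither true nor false), the second yields true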

--
If you can't see the forest for the trees,
Cut the trees and you'll find there is no forest.

>> On Wed 11 Nov, 2020, 14:18 Nikolay Samokhvalov,  
>> wrote:
>>> On Wed, Nov 11, 2020 at 12:26 AM Jitendra Loyal  
>>> wrote:
>>> Despite the above two constraints, the following rows get into the table:
>>>  insert into t (b , c) values (null, true), (null, false);  
>> 
>> This behavior is described in the docs 
>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS:
>> 
>> > It should be noted that a check constraint is satisfied if the check 
>> > expression evaluates to true or the null value. Since most expressions 
>> > will evaluate to the null value if any operand is null, they will not 
>> > prevent null values in the constrained columns. To ensure that a column 
>> > does not contain null values, the not-null constraint described in the 
>> > next section can be used.


Re: Is it possible to write a generic UPSERT?

2020-11-12 Thread Alban Hertroys


> On 12 Nov 2020, at 14:58, Mario Emmenlauer  wrote:

(…)

> But the statement is slightly complex to type, and I find me and my
> colleagues often spend more time on this than I would hope. Our two
> main challenges are:
> (1) we have to look up the uniqueness constraints on the table, and
> (2) we have to duplicate the insert statement in the UPDATE section
>again, because virtually all fields should get overwritten
>(except for the conflicting ones). On long inserts this can be
>quite annoying and error-prone.
> 
> I can see how "ON CONFLICT" is very powerful. But that power seems
> often a burden for us. We would prefer something that is less manual
> effort for the specific use case. Basically, we would like:
>INSERT if not exist, and
>UPDATE _all_ non-conflicting fields in case of _any_ conflict
> 
> In my (naiive) thinking, such a construct would cover 99% of our
> use cases. Or did other people make very different experiences?

(…)

> Has anybody ever done something like this? Is there an SQL way to
> achieve this? Or another programmatic way?

We generate the SQL @work based on the definitions in, IIRC, the 
information_schema. It has tables for both the column lists per table and the 
primary key definitions.

With that, an SQL statement that returns the required SQL statement is easy to 
generate, after which you can execute it either from a plpgsql execute 
statement in a function or in a do-block.
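
As an illustration of the idea, here is a rough, untested sketch that generates 
the two variable pieces for a hypothetical table public.mytable (you still have 
to glue them into the final INSERT ... ON CONFLICT ... DO UPDATE statement):

-- the conflict target: the primary key columns
select string_agg(quote_ident(kcu.column_name), ', ' order by kcu.ordinal_position)
  from information_schema.table_constraints tc
  join information_schema.key_column_usage kcu
    on kcu.constraint_name = tc.constraint_name
   and kcu.table_schema = tc.table_schema
 where tc.constraint_type = 'PRIMARY KEY'
   and tc.table_schema = 'public'
   and tc.table_name = 'mytable';

-- the DO UPDATE SET list: overwrite every column from EXCLUDED
select string_agg(format('%I = excluded.%I', column_name, column_name), ', ')
  from information_schema.columns
 where table_schema = 'public'
   and table_name = 'mytable';

That way the column lists only need to be maintained in one place.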

We do this in plpgsql, but that’s mostly because this code is part of our ETL 
process and it has to perform some other logic on the same data anyway. I could 
look up our code for you tomorrow, but that looks to be a busy day, so I can’t 
promise.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Accessing Postgres Server and database from other Machine

2020-12-05 Thread Alban Hertroys


> On 5 Dec 2020, at 10:05, Paul Förster  wrote:
> 
> Hi Hemil,
> 
>> On 05. Dec, 2020, at 07:50, Hemil Ruparel  wrote:
>> 
>> Did you restart postgres after changing pg_hba.conf?
> 
> that shouldn't be necessary for changes in pg_hba.conf. Just do either on the 
> command line:
> 
> $ pg_ctl reload

While you’re in there, also verify that something is listening on the port (see 
below)

$ netstat -an


> or from psql:
> 
> postgres=# select pg_reload_conf();
> 
> You can then see the effective result immediately in pg_hab_file_rules:
> 
> postgres=# table pg_hba_file_rules;

Also:

postgres=# show listen_addresses;

postgres=# show port;

Those will tell you whether the server is listening on the network and on the 
expected port.

Alban Hertroys
--
There is always an exception to always.








Re: SQL group by help

2020-12-12 Thread Alban Hertroys


> On 11 Dec 2020, at 18:24, Chris Stephens  wrote:
> 
> I'm trying to create a visual representation of a 6x8 grid of samples on a 
> rack using the following SQL format:
> 
> with rack_display as (
> select sr.ts rack_ts
> , sr.rack_id
> , r.rack_barcode
> , 1 as row_pos
> , max(case when rack_well = 0 then 'A1: '||sample_barcode end) as col1
> , max(case when rack_well = 1 then 'A2: '||sample_barcode end) as col2
> , max(case when rack_well = 2 then 'A3: '||sample_barcode end) as col3
> , max(case when rack_well = 3 then 'A4: '||sample_barcode end) as col4
> , max(case when rack_well = 4 then 'A5: '||sample_barcode end) as col5
> , max(case when rack_well = 5 then 'A6: '||sample_barcode end) as col6
> from rack r
> , sample_rack sr
> , sample s
> where r.rack_id = sr.rack_id
>and sr.sample_id = s.sample_id
>and sr.rack_well < 6
> group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
> union all
> select sr.ts rack_ts
> , sr.rack_id
> , r.rack_barcode
> , 2 as row_pos
> , max(case when rack_well = 6 then 'B1: '||sample_barcode end) as col1
> , max(case when rack_well = 7 then 'B2: '||sample_barcode end) as col2
> , max(case when rack_well = 8 then 'B3: '||sample_barcode end) as col3
> , max(case when rack_well = 9 then 'B4: '||sample_barcode end) as col4
> , max(case when rack_well = 10 then 'B5: '||sample_barcode end) as 
> col5
> , max(case when rack_well = 11 then 'B6: '||sample_barcode end) as 
> col6
> from rack r
> , sample_rack sr
> , sample s
> where r.rack_id = sr.rack_id
>and sr.sample_id = s.sample_id
>and sr.rack_well >= 6
>and sr.rack_well < 12
> group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
> union all
> ...
> )
> select * from rack_display order by rack_ts, rack_id, row_pos;
> 
> the "union all"s continue for another 6 blocks. reports would filter on 
> rack_id and timestamp. 

Is time really what groups these batches? I would double-check whether you may 
be omitting to store some data relevant to this process.

> if timestamps for each load of a rack were guaranteed to be the same, this 
> would work. however, the "sr.ts" values may vary by a few seconds so there is 
> potential for the "group by" to break. ts differences will be a minimum of 5 
> minutes for each distinct load of a rack. 
> 
> what i think i need is to manufacture a group by column based off rows in 
> "sample_rack" that have "ts" values that are < 1 minute from each other and 
> rack_id is the same.  i'm coming up blank on how to accomplish that though. 
> my first thought was to create an interval of +/- 1 min then find all rows 
> that overlap and assign a group number but i'm not sure how to accomplish 
> that. 

You could date_trunc those timestamps to the minute and group on that.
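
For example (a self-contained illustration with made-up timestamps; this does 
assume the few-seconds spread within a load never straddles a minute boundary):

select date_trunc('minute', ts) as load_ts, count(*)
  from (values ('2020-12-11 10:15:02'::timestamp)
             , ('2020-12-11 10:15:07'::timestamp)
             , ('2020-12-11 10:21:04'::timestamp)
       ) as sr(ts)
 group by date_trunc('minute', ts)
 order by load_ts;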

> there's also no guarantee an entire rack is full of samples so some "cells" 
> of display might be null. i think that makes the use of tablefunc crosstab a 
> little harder. if i remember correctly, it does not handle missing values 
> well. i'm open to any pivoting strategy.

Many reporting tools have features to support just that. We use WebFOCUS, which 
calls those ACROSS columns. It’s a common requirement in reporting.


Alban Hertroys
--
There is always an exception to always.








Re: Trigger with conditional predicates

2021-01-02 Thread Alban Hertroys


> On 1 Jan 2021, at 16:56, Dirk Mika  wrote:
> 
> Hi all and a happy new Year!
> 
> We have an Oracle schema that is to be converted to PostgreSQL, where 
> conditional predicates are used in some triggers.
> 
> In particular, columns are populated with values if they are not specified in 
> the update statement which is used.
> Usually with an expression like this:
> 
>  IF NOT UPDATING('IS_CANCELED')
>  THEN
> :new.is_canceled := ...;
>  END IF;
> 
> I have not found anything similar in PostgreSQL. What is the common approach 
> to this problem?
> 
> BR
> Dirk

Can't you use column defaults to handle these cases?
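
For instance (using the column from your example; the actual default value is 
just a placeholder):

alter table my_table alter column is_canceled set default false;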

Alban Hertroys
--
There is always an exception to always.








Re: Trigger with conditional predicates

2021-01-04 Thread Alban Hertroys


> On 4 Jan 2021, at 20:02, Dirk Mika  wrote:
> 
>>> On 1 Jan 2021, at 16:56, Dirk Mika  wrote:
>>> 
>>> Hi all and a happy new Year!
>>> 
>>> We have an Oracle schema that is to be converted to PostgreSQL, where 
>>> conditional predicates are used in some triggers.
>>> 
>>> In particular, columns are populated with values if they are not specified 
>>> in the update statement which is used.
>>> Usually with an expression like this:
>>> 
>>> IF NOT UPDATING('IS_CANCELED')
>>> THEN
>>>:new.is_canceled := ...;
>>> END IF;
>>> 
>>> I have not found anything similar in PostgreSQL. What is the common 
>>> approach to this problem?
> 
>> Can't you use column defaults to handle these cases?
> 
> That would work for inserts, but not for updates.

Usually, if you don’t mention a column in an UPDATE, you want the value to 
remain as it was, which is precisely what happens by default. That certainly 
makes sense to me when you’re dealing with an application that doesn’t know 
about the existence of said column; overwriting an existing value that some 
other application put there looks like a problem to me. But of course, that 
depends a lot on what you’re trying to achieve.

What is your use-case that that is not the desired behaviour? Or are we talking 
about a mixed problem here, where this approach works for some fields, but 
other fields (such as a status change date) always need to be updated 
(regardless of whether a value was specified)?

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Define hash partition for certain column values

2021-01-12 Thread Alban Hertroys


> On 12 Jan 2021, at 16:51, Голубева Яна  wrote:
> 
> Values for the key partitioning column are generated randomly and I can't 
> predict their distribution between ranges.
> If I just create some ranges I won't have any guarantee that partitions will 
> have similar amount of data. It is possible that I will have 2 or 3 extremely 
> big partitions and a bit of data in others.

A hash of a random number is also random, so when using hashes for partitioning 
you will get the same problem.

If you want to distribute values equally over a fixed number of partitions, I 
suggest you partition on a modulo of a monotonically increasing number (a 
sequence for example), instead of relying on a random number.
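
For illustration, that could look something like this (a sketch; table and 
column names are made up):

create sequence events_id_seq;

create table events (
    id      bigint not null default nextval('events_id_seq'),
    payload text
) partition by list ((id % 4));

create table events_p0 partition of events for values in (0);
create table events_p1 partition of events for values in (1);
create table events_p2 partition of events for values in (2);
create table events_p3 partition of events for values in (3);

Since the id only ever increments, consecutive rows land in consecutive 
partitions and all partitions fill at the same rate, regardless of how the 
random business key is distributed.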

> 12.01.2021, 17:55, "Michael Lewis" :
> On Tue, Jan 12, 2021 at 1:21 AM Голубева Яна  wrote:
> List or range partitioning isn't suitable for my case.
> I am using a column of numeric(20) type as a base for partitioning. The 
> values of the column are generated randomly. 
> So there will be too many partitions if I use list partitioning as is.
> 
> Sorry, but why is range not suited for this? It would seem fairly trivial to 
> create 50 or 1000 partitions to break up the range of values allowed by your 
> field definition.

Alban Hertroys
--
There is always an exception to always.








Re: How to keep format of views source code as entered?

2021-01-12 Thread Alban Hertroys


> On 12 Jan 2021, at 20:54, Alex Williams  wrote:
> 
> Hi Ingolf,
> 
> For comments in views, I create a unused CTE and put my comments there, e.g.
> 
> WITH v_comments AS (
>  SELECT 'this is my comment' AS comment
> )
> 
> Alex

You do know about COMMENT ON VIEW v_comments IS ’this is my comment’, right?


> ‐‐‐ Original Message ‐‐‐
> On Thursday, January 7, 2021 11:19 AM, Markhof, Ingolf 
>  wrote:
> 
>> Hi!
>> 
>>  
>> 
>> Switching from Oracle SLQ to PostgreSQL I am facing the issue that the SQL 
>> code the system returns when I open a views source code is different from 
>> the code I entered. The code is formatted differently, comments are gone and 
>> e.g. all text constants got an explicit cast to ::text added. (see sample 
>> below).
>> 
>>  
>> 
>> I want the SLQ code of my views stored as I entered it. Is there any way to 
>> achieve this? Or will I be forced to maintain my views SQL code outside of 
>> PostgreSQL views?
>> 
>>  
>> 
>> Any hints welcome!
>> 
>>  
>> 
>> Here is an example:
>> 
>>  
>> 
>> I enter this code to define a simple view:
>> 
>>  
>> 
>> create or replace view myview as
>> 
>> select
>> 
>>   product_id,
>> 
>>   product_acronym
>> 
>> from
>> 
>>   products -- my comment here
>> 
>> where
>> 
>>   product_acronym = 'ABC'
>> 
>> ;
>> 
>>  
>> 
>> However, when I open the view my SQL client (DBeaver) again, this is what I 
>> get:
>> 
>>  
>> 
>> CREATE OR REPLACE VIEW myview
>> 
>> AS SELECT product_id,
>> 
>> product_acronym
>> 
>>FROM products
>> 
>>   WHERE product_acronym = 'ABC'::text;
>> 
>>  
>> 
>> So, the formatting changed, keywords are capitalized, the comment I added in 
>> the from-part has gone and the text constant 'ABC' changed to 'ABC'::text.
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>> 
>> Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - 
>> Amtsgericht Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - 
>> Vorsitzender des Aufsichtsrats: Francesco de Maio
>> 
> 

Alban Hertroys
--
There is always an exception to always.








Re: Puzzline CROSS JOIN when doing RECURSIVE CTE

2022-04-18 Thread Alban Hertroys


> On 18 Apr 2022, at 11:56, Pól Ua Laoínecháin  wrote:

(…)

> All  of the code below is available on the fiddle here:
> 
> https://dbfiddle.uk/?rdbms=postgres_13&fiddle=0cc20c9081867131260e6e3550bd08ab

(…)

> OK, grand, now I wish to perform a RECURSIVE CTE on it. So, I start by
> trying something (I thought was) very simple. Obviously, I plan to do
> more, but I wanted to get the "mechanics" correct to start with. So,
> my query is:
> 
> WITH RECURSIVE cte1 (n, ln) AS
> (
>  SELECT 1 AS n, string
>  FROM line

Here is your first problem: this will yield a result for each row in your line 
table, numbering it ‘1’. You seem to have expected just a single result here, 
but that is something that you need to take care of in your query.
This part is called the base case, base step or initial step.

>  UNION ALL
>  SELECT n + 1, ln
>  FROM cte1
>  WHERE n < (SELECT COUNT(*) FROM line)

And then for each of those rows, it will add all those rows (from the same 
CTE!) again.
This part is called the recursive step.

You did add a termination condition here, which indeed manages to terminate, 
but it does so too late.

It seems that you do understand some of the concepts of recursive CTE’s, but 
you appear to be missing some crucial knowledge.

For example, it is actually possible to query multiple trees with a single 
recursive CTE. It is not limited to a single tree. How many trees the CTE will 
navigate depends on how you selected the rows in the base case.

> )
> SELECT * FROM cte1;
> 
> i.e. have a counter variable and a string from the line table

My first question is why you’re using a recursive CTE here? This doesn’t appear 
to be hierarchical data (such as a tree), unless perhaps you intended to 
actually traverse the HTML document hierarchy?

> 
> But, then to my horror, the result of this query is
> 
> 1with t(x) as (values( XMLPARSE(DOCUMENT
> (' AlarmCode="mail" AlarmStartTime="10:00:00" AlarmTime="0" Id ="2"
>>  Id="2">

Re: Puzzline CROSS JOIN when doing RECURSIVE CTE

2022-04-18 Thread Alban Hertroys


> On 18 Apr 2022, at 14:51, Pól Ua Laoínecháin  wrote:
> 
> Hi Alban, and many thanks for your input.
> 
>> My first question is why you’re using a recursive CTE here? This doesn’t 
>> appear to be hierarchical data (such as a tree), unless perhaps you intended 
>> to actually traverse the HTML document hierarchy?
> 
> This is basically an exercise on my part.
> 
> The question that I'm trying to answer  is here:
> 
> https://stackoverflow.com/questions/70574881/how-can-get-html-inner-tag-in-posgresql
> 
> I've already answered it in 3 different ways - but I was trying to do
> it with RCTEs in order to improve my comprehension of them.
> 
> So, basically, I want to pick out a subsection of text from a "passage".
> 
> So then, I wanted to establish a true/false state for the lines that I
> want and don't want, going through line by line. I know that the RCTE
> is  a very contrived way of doing this, but it's for learning really.

Considering that you’re already looking at the elements of a parsed DOM tree, 
the exercise boils down to traversing that tree. Due to how xmlparse() is 
implemented, you probably already get them in the right order even when not 
using an explicit order by. That is, if you’re looking for a DFT (depth first 
traversal) as opposed to a BFT (breadth first).

One of the difficulties here is that there are some CDATA sections involved 
with more XML in them. My guess is that that’s the data that you’re actually 
after, but that’s just a matter of entering the document with the correct path 
I suppose?


> I wonder if you could be so kind as to give me  a "skeleton" RCTE for
> this - I've been staring at this for hours - and it's not that I'm
> lazy or haven't studied RCTEs - I wrote this RCTE
> 
> https://stackoverflow.com/a/71674990/470530
> 
> recently, so it's not as if I'm completely ignorant of RCTEs - I'm
> just stuck in a rut. Any help would be appreciated.

You would first need to determine the root node(s). Those are the ones w/o 
parents, or you may have some other way of determining those.

Next is finding all nodes that have an earlier node as their parent.
You could go an extra step here with preserving the order of the siblings in 
the document, by numbering nodes (directly) under the same parent.
I usually build an ltree structure with that information, while traversing the 
tree - that gets you an ltree with entries (1, 1.1, 1.1.1, 1.1.2, 1.2.1, etc) 
that you then can use for the final order by, for example.

In case you didn’t know, ltree is a module you can install. I find it still 
very useful in tree traversals. The one drawback I see is that for these 
scenarios you’d ideally want an ltree based on integers, such that 10 sorts 
after 9 instead of between 1 and 2. Padding enough zeroes before the ltree text 
items is a bit of an extra hassle that I’d prefer to do without.

I haven’t actually looked at what DOM navigation functions exist for PG, so 
this is more or less pseudo code. Worse, my local copy of PG was compiled w/o 
XML support, so I don’t know what kind of result the query from that SO article 
produces. But then again, I don’t really know what you’re after anyway, so...

This is basically how I would go about it.

with recursive
-- First we need to get the DOM-tree parsed (this is not actually recursive)
domtree as (
select node
  from xmlparse(document(‘...'))
),
-- Next we can traverse it
cte (node, hierarchy, n) as (
select node, 1::text::ltree, 1
  from domtree
 where parent(node) is null

union all

select node, cte.hierarchy || (cte.n+1)::text::ltree, n+1
  from domtree t
  join cte on parent(t.node) = cte.node
)
select *
  from cte
 order by hierarchy;

Function parent() is made-up. It would return the parent node of a node, so 
that there is some way to connect the different parts in the hierarchy. I guess 
xpath() could fulfil that purpose, but I have no way of testing that hypothesis.

I hope that’s a good enough starting point for you?


Alban Hertroys
--
There is always an exception to always.








Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-18 Thread Alban Hertroys


> On 18 Jun 2022, at 2:14, Bryn Llewellyn  wrote:
> 
> I implemented two complementary functions:
> 
> —"no_null_keys()" checks that a "jsonb" value has no occurrences of « "some 
> key": null »
> 
> —"strip_null_keys()" removes « "some key": null » occurrences from a "jsonb" 
> value
> 
> The code checks with "no_null_keys()" that, as expected, no ingested JSON 
> document has an occurrence of « "some key": null ».
> 
> And it uses "strip_null_keys()" on the output of "to_jsonb()" — and, as 
> appropriate, any other built-in JSON function that produces a "jsonb" value.
> 
> It was straightforward to implement these two functions by using REGEXP 
> built-in functionality on the canonically formatted "text" value produced by 
> the "jsonb::text" typecast.

In my experience, using regular expressions applied to document formats tends 
to get you false positives. I’d be worried about books with titles similar to 
'How we wrote a regular expression to detect occurrences of "some key": null in 
our JSON documents', for example.

For stripping those null occurrences, you are aware of the 
json_strip_nulls(json) and jsonb_strip_nulls(jsonb) functions, right?
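
For example:

select jsonb_strip_nulls('{"x": 42, "y": null}'::jsonb);   -- yields {"x": 42}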

For detecting them on a recent PG, the @? operator or json_path_exists(json, 
jsonpath) functions would probably do the trick.
I am not too familiar with JSONPATH expressions, but I expect (it passed some 
preliminary testing) this would detect your nulls just fine, while taking JSON 
semantics into account:

jsonb_path_exists(document, '$.** ? (@ == null)'::jsonpath)

For PG-specifics on JSONPATH, see section 9.16.2 on: 
https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-OP-TABLE

A recursive query is another possible solution. It would probably perform far 
worse, but I find them more rewarding to write. Some people prefer Sudoku.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Getting data from a record variable dynamically

2022-07-05 Thread Alban Hertroys



> On 5 Jul 2022, at 1:02, Rhys A.D. Stewart  wrote:
> 
> Greetings All,
> 
> I have a trigger that is attached to several different tables. In the
> trigger function I retrieve a single row and I want the info from a
> specific column. This column is dependent on the table in question.
> and I have the column name stored in a variable as well. Without
> writing a conditional for each table, what is the best way to
> dynamically get the data from the record variable?

I would create a simple trigger function for each of those tables that just 
extracts the value(s) from the field(s) you mentioned, and then pass those 
values on to a generic function that does the actual table-independent work.

The usual trigger pseudo-columns and variables wouldn’t be available in that 
generic function, but considering that your triggers fire from different 
tables, you could either pass them on or they (probably) don’t make sense in 
the context of the generic function.
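
A rough sketch of what I mean (all names are made up, untested):

create function generic_handler(src_table text, key_value text)
returns void language plpgsql as $$
begin
    -- the table-independent work goes here
    raise notice 'table %, key value %', src_table, key_value;
end;
$$;

create function orders_pass_key()
returns trigger language plpgsql as $$
begin
    -- this per-table trigger function knows which column is relevant here
    perform generic_handler(TG_TABLE_NAME::text, NEW.order_ref::text);
    return NEW;
end;
$$;

create trigger orders_pass_key
    after insert or update on orders
    for each row execute function orders_pass_key();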

Alban Hertroys
--
There is always an exception to always.








Re: pg_restore creates public schema?

2022-10-07 Thread Alban Hertroys


> On 6 Oct 2022, at 22:52, Ron  wrote:
> 
> On 10/6/22 12:46, Christophe Pettus wrote:
>>> On Oct 6, 2022, at 10:44, Ron  wrote:
>>> Sadly, that VM doesn't have nearly enough disk space to hold the backup 
>>> folder.
>> Use file mode, and stream the output via scp/ssh to a different machine?
> 
> I thought of that, too.  Unfortunately, the ssh version in RHEL 8.6 is 
> sufficiently old that "three way" ssh (person at HostA wanting to transfer a 
> file from Server1 to Server2) requires that port 22 be open from Server1 to 
> Server2.

Bit of a wild idea, I haven’t used SSH tunnels in years, so a bit fuzzy on the 
details:

Can you create an SSH tunnel to the new machine from the VM, then pipe that to 
an SSH connection from a machine that does have enough space to dump?

And then vice versa to the new machine to restore? (Unless access to that one 
is easier of course)

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Duda sobre como imprimir un campo INTERVAL

2022-11-19 Thread Alban Hertroys


> On 19 Nov 2022, at 4:58, Ken Tanzer  wrote:
> 
> On Thu, Nov 17, 2022 at 2:30 PM Alejandro Baeza Rangel 
>  wrote:

(…)

> don't fully understand it.  But what really confuses me is the example below. 
>  How can these two intervals be equal and still yield different output in the 
> to_char function?  And as a practical matter, and for the OPs question, how 
> can you convert from one to the other of these "equal" values?
> 
> WITH inters AS (
> SELECT
> '1 day 2 hours'::interval AS i1,
> '26 hours'::interval AS i2
> )
> SELECT
> *,
> to_char(i1,'HH24:MM:SS') AS i1_char,
> to_char(i2,'HH24:MM:SS') AS i2_char,
> i1=i2 AS "Equal?"
> FROM inters;
> 
>i1   |i2| i1_char  | i2_char  | Equal? 
> +--+--+--+
>  1 day 02:00:00 | 26:00:00 | 02:00:00 | 26:00:00 | t
> 
> Cheers,
> Ken

Those intervals are not identical. I think the reasoning is that due to DST 
changes, ‘1 day 2 hours’ is more specific than its conversion to ’26 hours’ (or 
25 or 27 at DST change).
And since you’re not converting the number of days in to_char, that information 
gets lost.

That problem doesn’t seem to arise in the OP’s question (as far as I understand 
his question), he does have dates to base the intervals on. However, converting 
the differences in dates to intervals decouples the difference from the dates 
(the intervals could, for example, subsequently be added to an entirely 
different date) and he ends up in the same boat.

It would seem that the way to do this is to convert the difference to (seconds 
since) epoch and do the math to convert that to a character string yourself.

See for example:
https://stackoverflow.com/questions/341384/how-to-convert-an-interval-like-1-day-013000-into-253000

That seems unnecessarily complicated, perhaps there is/could be a more 
convenient method? I’m sort of thinking of a "relative timestamp offset" type, 
that tracks an exact difference relative to a given timestamp?
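
In the meantime, something along these lines should work (a quick sketch, not 
thoroughly tested):

select h || ':' || to_char(m, 'FM00') || ':' || to_char(s, 'FM00') as hh_mm_ss
  from (values ('1 day 02:30:05'::interval)) v(i)
 cross join lateral (select extract(epoch from i)::bigint) e(sec)
 cross join lateral (select sec / 3600, (sec % 3600) / 60, sec % 60) t(h, m, s);
-- yields 26:30:05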

Alban Hertroys
--
There is always an exception to always.








Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

2022-12-03 Thread Alban Hertroys


> On 2 Dec 2022, at 15:10, DAVID ROTH  wrote:
> 
> Is there a way to reverse engineer the original code (or its equivalent) from 
> what is saved in the database?

I’m dumping an idea here… Treading back quite a bit with information from later 
in this thread.

With the original procedure source code under version control and assuming the 
tokenization converts that source code consistently (since it’s done by a 
computer), you could store another level of diffs: From the source code you 
pushed, against the output of pg_get_functiondef.

Assuming that changes to the tokenised code, when converted back to text, only 
involve renames of database objects, the result of a reversely applied diff 
could very well be comparable to the original source code.

I suspect that would be sufficient for telling whether a developer is 
responsible for the changes, or that they were caused by renaming of database 
artefacts.

You would need to wrap the function creation calls into some automation to 
generate and store those diffs, comparing it back, etc, but that may be doable. 
I would also generate new diffs right after major version updates of the 
database (a before and after of the output of pg_get_functiondef, applied to 
the stored diff?).

I’m not so sure that would work for auditing, but that seems to have been 
tackled down-thread.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

2022-12-04 Thread Alban Hertroys


> On 3 Dec 2022, at 20:55, Karsten Hilbert  wrote:
> 
>> You would need to wrap the function creation calls into some automation to 
>> generate and store those diffs, comparing it back, etc, but that may be 
>> doable. I would also generate new diffs right after major version updates of 
>> the database (a before and after of the output of pg_get_functiondef, 
>> applied to the stored diff?).
> 
> I wonder whether that would tie the sanity check to a particular PG version.
> 
> I mean, pg_get_functiondef output being a server runtime artifact it might
> well change between server versions, no ?

I meant to write: “I would also generate new diffs right _before and_ after…”, 
precisely for that reason. The before patch should get you the last ’sane’ 
situation to get back to the source code. Next, you can diff that to the newly 
tokenised version after the upgrade.

It is a bit of a hassle, as you need to remember to do that before an upgrade, 
but at least you’d have something…

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Regular expression for lower case to upper case.

2022-12-11 Thread Alban Hertroys


> On 10 Dec 2022, at 12:00, Eagna  wrote:
> 
> 
> Hi, and thanks for your input.
> 
> 
>> RegExp by itself cannot do this. You have to match all parts of the input 
>> into different capturing groups, then use lower() combined with format() to 
>> build a new string. Putting the capturing groups into an array is the most 
>> useful option.
> 
> 
> OK - I *_kind_* of see what you're saying. 
> 
> There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd care to 
> give an outline of the solution that you propose.

If you put all the regexes and their replacements into a table[1], you could 
use an aggregate over them to combine all the replacements into the final 
string. It would need some aggregate like regex_replace_agg, which would 
probably be a custom aggregate.

[1]: If you stick to ASCII, you could just calculate them and even omit storing 
them in a physical table.
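
As a minimal sketch of such an aggregate (untested; the replacements table and 
its columns are made up):

create function regexp_replace_step(state text, subject text, pattern text, replacement text)
returns text language sql immutable as
$$ select regexp_replace(coalesce(state, subject), pattern, replacement, 'g') $$;

create aggregate regexp_replace_agg(subject text, pattern text, replacement text) (
    sfunc = regexp_replace_step,
    stype = text
);

-- apply all replacements, in a defined order, to one input string:
select regexp_replace_agg('someInputString', r.pattern, r.replacement order by r.prio)
  from replacements r;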

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools?

2023-01-17 Thread Alban Hertroys


> On 16 Jan 2023, at 15:37, HECTOR INGERTO  wrote:
> 
> > The database relies on the data being consistent when it performs crash 
> > recovery.
> > Imagine that a checkpoint is running while you take your snapshot.  The 
> > checkpoint
> > syncs a data file with a new row to disk.  Then it writes a WAL record and 
> > updates
> > the control file.  Now imagine that the table with the new row is on a 
> > different
> > file system, and your snapshot captures the WAL and the control file, but 
> > not
> > the new row (it was still sitting in the kernel page cache when the 
> > snapshot was taken).
> > You end up with a lost row.
> > 
> > That is only one scenario.  Many other ways of corruption can happen.
>  
> Can we say then that the risk comes only from the possibility of a checkpoint 
> running inside the time gap between the non-simultaneous snapshots?

I recently followed a course on distributed algorithms and recognised one of 
the patterns here.

The problem boils down to a distributed snapshotting algorithm, where both ZFS 
filesystem processes each initiate their own snapshot independently.

Without communicating with each other and with the database about which messages 
(in this case, the traffic between the database and each FS) are part of their 
snapshots (sent or received), there is a chance of lost messages: either none of 
the process snapshots knows that a 'message' was sent, or none of them recorded 
receiving it.

Algorithms like Tarry, Lai-Yang or the Echo algorithm solve this by adding 
communication between those processes about messages in transit.

Alban Hertroys
--
There is always an exception to always.








Re: WHERE col = ANY($1) extended to 2 or more columns?

2023-02-09 Thread Alban Hertroys


> On 9 Feb 2023, at 16:41, Dominique Devienne  wrote:
> 
> Hi. We are implementing an API which takes a list of row keys, and must 
> return info about those rows. To implement that efficiently, in as few 
> round-trips as possible, we bind a (binary) array of keys (ints, uuids, or 
> strings) and that works great, but only if the key is a scalar one.
> 
> Now we'd like to do the same for composite keys, and I don't know how to do 
> that.
> Is it possible? Could someone please help out or demo such a thing?
> We are doing it in C++ using libpq, but a pure SQL or PL/pgSQL demo would 
> still help (I think).

This works:

=> select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text), (1, 
'one'::text), (2, 'two'::text));
 ?column? 
--
 t
(1 row)

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: WHERE col = ANY($1) extended to 2 or more columns?

2023-02-10 Thread Alban Hertroys


> On 9 Feb 2023, at 18:35, Dominique Devienne  wrote:
> 
> On Thu, Feb 9, 2023 at 5:37 PM David G. Johnston  
> wrote:
> On Thu, Feb 9, 2023 at 9:28 AM Alban Hertroys  wrote:
> > On 9 Feb 2023, at 16:41, Dominique Devienne  wrote:
> > Now we'd like to do the same for composite keys, and I don't know how to do 
> > that.
> 
> This works:
> => select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text), (1, 
> 'one'::text), (2, 'two'::text));
> But you cannot write the right-side of the IN as a single parameter which 
> seems to be the primary constraint trying to be conformed to.
> 
> Right. The goal is to (re)use a prepared statement (i.e. plan once), and bind 
> the RHS (binary) array
> and do a single exec (single round-trip) to get the matching rows. AFAIK, 
> this is the fastest way.
> If there's a better/faster way, I'm interested. --DD

How would an ORM like that push a list of tuples into a single query parameter 
though? Is that feasible?

Perhaps this is easier to use with an ORM then? It would need a list of 
placeholders for each item, but I suspect you would need that anyway…

=> with v(col1, col2) as (
values (1, 'two'::text), (2, 'one'::text), (1, 'one'::text), (2, 
'two'::text)
)
select * from v where (col1, col2) = (1, 'one'::text);
 col1 | col2 
--+--
1 | one
(1 row)

This could be written as a join to a table with fixed values in the OP’s case.
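
Another thought, since binding arrays already works for you with scalar keys: 
you could bind one array per key column and unnest them in parallel (a sketch; 
the table and column names are made up):

prepare find_rows(int[], text[]) as
select t.*
  from mytable t
  join unnest($1, $2) as k(id, name)
    on (t.id, t.name) = (k.id, k.name);

execute find_rows(array[1, 2], array['one', 'two']);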


Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y

2023-02-28 Thread Alban Hertroys



> On 28 Feb 2023, at 3:54, Thorsten Glaser  wrote:

(…)

>> Well, that may be what you want, but it's not what you wrote in
>> the query.  Follow David's advice and do
> […]
>> I'm pretty sure that this will only incur one evaluation of the
>> common subexpression, so even though it's tedious to type it's not
>> inefficient.
> 
> Thanks. But I fear it’s not as simple as you wrote. More like:
> 
> jsonb_build_object('opening_times',
>   jsonb_agg(DISTINCT jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)
>   ORDER BY
>   jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)->>'weekday',
>   jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)->>'from_hour',
>   jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)->>'to_hour')
> )
> 
> Isn’t that more like it?


Perhaps you can use a lateral cross join to get the result of 
jsonb_build_object as a jsonb value to pass around?


Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y

2023-03-03 Thread Alban Hertroys



> On 3 Mar 2023, at 0:02, Thorsten Glaser  wrote:
> 
> On Tue, 28 Feb 2023, Alban Hertroys wrote:
> 
>> Perhaps you can use a lateral cross join to get the result of
>> jsonb_build_object as a jsonb value to pass around?
> 
> I don’t see how. (But then I’ve not yet worked with lateral JOINs.)

You posted this bit:

> jsonb_build_object('opening_times',
>   jsonb_agg(DISTINCT jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)
>   ORDER BY
>   jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)->>'weekday',
>   jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)->>'from_hour',
>   jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)->>'to_hour')
> )


You can rewrite that into something like this:

select jsonb_build_object('opening_times’,
obj
ORDER BY
obj->>'weekday’,
obj->>'from_hour’,
obj->>'to_hour')
)
from cot
cross join lateral jsonb_agg(jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour) obj

That’s off the top of my head and I did leave out the DISTINCT. Just to show 
the concept here. A bit of experimenting and reading should get you there, I’m 
keeping $work waiting :P

(…)

> WITH
>cgwaj AS (
>   SELECT cgwa.id AS id, jsonb_build_object(
>   'weekday', cgwa.weekday,
>   'forenoon', cgwa.forenoon,
>   'afternoon', cgwa.afternoon,
>   'evening', cgwa.evening) AS obj
>   FROM core_generalworkavailability cgwa
>),
>-- … same for opening times
> SELECT cp.email, …,
>   -- …
>   jsonb_build_object('possible_work_times', COALESCE(
>   jsonb_agg(DISTINCT cgwaj.obj ORDER BY cgwaj.obj->>'weekday',
>   cgwaj.obj->>'forenoon', cgwaj.obj->>'afternoon',
>   cgwaj.obj->>'evening')
>   FILTER (WHERE cgwaj.id IS NOT NULL))) ||
>   -- …
>FROM core_person cp
>   -- …
>   LEFT JOIN core_person_possible_work_times cppwt ON cppwt.person_id=cp.id
>   LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id
>   -- …
> 
> That is, add a CTE for each m:n-attached table whose “value” is
> an object, not a single field, keep the id field; LEFT JOIN that
> (instead of the original table), then we have a field to use in
> ORDER BY.
> 
> I think. I’ve not yet tried it (I don’t have access to that DB
> normally, I was just helping out).
> 
> This avoids sub-SELECTs in the sense of needing to run one for
> each user row, because the innermost JSON object building needs
> to be done for each (connected (if the query is not filtering on
> specific users)) row of the “property table”, anyway. (And even
> if filtered, that can be passed down.)
> 
> bye,
> //mirabilos
> -- 
> Solange man keine schmutzigen Tricks macht, und ich meine *wirklich*
> schmutzige Tricks, wie bei einer doppelt verketteten Liste beide
> Pointer XORen und in nur einem Word speichern, funktioniert Boehm ganz
> hervorragend. -- Andreas Bogk über boehm-gc in d.a.s.r
> 

Alban Hertroys
--
There is always an exception to always.








Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y

2023-03-04 Thread Alban Hertroys


> On 3 Mar 2023, at 20:32, Thorsten Glaser  wrote:
> 
> On Fri, 3 Mar 2023, Alban Hertroys wrote:
> 
>> You can rewrite that into something like this:
>> 
>> select jsonb_build_object('opening_times’,
>>  obj
>>  ORDER BY
>>  obj->>'weekday’,
>>  obj->>'from_hour’,
>>  obj->>'to_hour')
>> )
>> from cot
>> cross join lateral jsonb_agg(jsonb_build_object(
>>  'weekday', cot.weekday,
>>  'from_hour', cot.from_hour,
>>  'to_hour', cot.to_hour) obj
> 
> But isn’t that the same as with a regular LEFT JOIN?

Similar, but not the same, I’d say.

I do now notice that I made some copying errors there, I was a bit nauseous at 
that time.
That should have read:

>> select jsonb_build_object('opening_times’,
>> jsonb_agg(obj
>>  ORDER BY
>>  obj->>'weekday’,
>>  obj->>'from_hour’,
>>  obj->>'to_hour')
>> )
>> from cot
>> cross join lateral jsonb_build_object(
>>  'weekday', cot.weekday,
>>  'from_hour', cot.from_hour,
>>  'to_hour', cot.to_hour) obj


The lateral join applies the function to each row returned from the left side 
of the join and enriches that row with the function result.
I used a cross join because there is no join condition to apply to the lateral, 
otherwise you could also use an inner join on true.

I think you could also have used an implicit Cartesian product (using ‘,’ for 
the join), and that in that case the lateral would be implied. I prefer 
explicit notation though.

A left join wouldn’t make much sense here, unless the function could return 
NULL - for example if it were a function marked as STRICT and some of the input 
parameter values (from the table) could be NULL.


>>>   cgwaj AS (
>>> SELECT cgwa.id AS id, jsonb_build_object(
>>> 'weekday', cgwa.weekday,
>>> 'forenoon', cgwa.forenoon,
>>> 'afternoon', cgwa.afternoon,
>>> 'evening', cgwa.evening) AS obj
>>> FROM core_generalworkavailability cgwa
> 
> plus

There are some differences.

You need a sub-select, which in turn creates its own result set. It’s up to the 
planner whether the left or the right side gets executed first, after which the 
results of the other side of the join get merged to this, or whether this can 
all be collected in one go. That’s up to the query planner to decide though, 
and it could be right.

>>> LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id
> 
> With the addition that I can aggregate…

You can do so in both situations, but I guess that confusion stems from my 
copy/paste mistake.

In my experience, lateral joins go well with the jsonb functions. They tend to 
reduce code repetition when referencing object members, such as in your case.

Regards,

Alban Hertroys
--
There is always an exception to always.








Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause

2023-03-14 Thread Alban Hertroys


> On 7 Mar 2023, at 4:11, David G. Johnston  wrote:
> 
> On Mon, Mar 6, 2023 at 7:51 PM David Rowley  wrote:
> On Tue, 7 Mar 2023 at 12:40, Tom Lane  wrote:
> >
> > Ben Clements  writes:
> > > As shown above, the following calculated column can bring in the city 
> > > name,
> > > even though the city name isn't in the GROUP BY:
> > >max(city) keep (dense_rank first order by population desc)
> >
> > You haven't really explained what this does, let alone why it can't
> > be implemented with existing features such as FILTER and ORDER BY.
> 
> (It wasn't clear to me until I watched the youtube video.) 
> 
> Likely KEEP is more flexible than just the given example but I think
> that something similar to the example given could be done by inventing
> a TOP() and BOTTOM() aggregate. Then you could write something like:
> 
> select
>country,
>count(*),
>max(population),
>bottom(city, population)
> from
>cities
> group by
>country
> having
>count(*) > 1
> 
> the transfn for bottom() would need to remember the city and the
> population for the highest yet seen value of the 2nd arg.
> 
> BOTTOM() remembers the highest value?
>  
> Where this wouldn't work would be if multiple columns were
> required to tiebreak the sort.
> 
> TOP(city, ROW(population, land_area)) ?

What should be the expected behaviour on a tie though?

Say that we count the number of districts or airfields or train stations per 
city and query for the one(s) with the most or least of them? There could well 
be multiple cities with the same max number, and there will be many cities with 
the same minimum number (namely 0).

Should the result be just the first of the maximums (or minimums) through some 
selection criterium (such as their alphabetical order), should that give each 
of the tied results, or should there be a means to define that behaviour?

I suppose a combination with FIRST and LAST could solve that issue?

Regards,
Alban Hertroys
--
There is always an exception to always.








Re: Using CTID system column as a "temporary" primary key

2023-03-29 Thread Alban Hertroys


> On 29 Mar 2023, at 21:11, Sebastien Flaesch  wrote:
> 
> Oh the use of default keyword is new to me, thanks for that.
> 
> But to make PostgreSQL more Informix-compatible, zero should have been 
> considered as well.

…No, I’m not going to be humble about this opinion… Postgres does a sane thing 
here.
It’s Informix that you should be complaining about. Zero is not a sane value to 
specify special behaviour, it could mean zero and be just as valid. By 
consequence, Informix probably forbids zero as a sequence value, but that is an 
artefact (and a limitation) of its implementation, not a feature.

The core of your problem however, is that you’re trying to get 
database-agnostic behaviour by relying on database-specific features. That is 
not going to work, you’ve just been lucky enough to get away with it until now.

There’s really only one realistic answer here: Fix your design.

Regards,
Alban Hertroys
--
There is always an exception to always.








Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-09 Thread Alban Hertroys



> On 7 Apr 2023, at 2:49, Louis Tian  wrote:

(…)

> I am not expecting an error here. The problem is with no conflict it always 
> go down the insert path first and results in a not null constraint error.
> While I am expecting the insert is never executed in the first place when 
> that row already exist (as identified by it primary key). So the update 
> execute without error.
> I hope the pesudo code above is enough to clarify the difference? 

Your assumption on what the problem is, is not correct. The problem is not with 
the conflict resolution, it is with your statement violating a not null 
constraint.

It doesn’t matter whether you insert first or update first, either operation is 
going to violate that constraint. You’re specifying a NULL value for a column 
that doesn’t accept that because it has a NOT NULL constraint. That is your 
problem.

Alban Hertroys
--
There is always an exception to always.








Re: "PANIC: could not open critical system index 2662" - twice

2023-04-14 Thread Alban Hertroys


> On 14 Apr 2023, at 9:38, Evgeny Morozov  wrote:

(…)

> I don't know whether ZFS zero-fills blocks on disk errors. As I
> understood, ZFS should have been able to recover from disk errors (that
> were "unrecoverable" at the hardware level) using the data on the other
> two disks (which did not report any errors). Thus, PG should not have
> seen any corrupted data (if ZFS was working correctly).
> https://unix.stackexchange.com/questions/341614/understanding-the-error-reporting-of-zfs-on-linux
> seems to confirm this. Am I misunderstanding something?

Your problem coincides with a thread at freebsd-current with very similar data 
corruption after a recent OpenZFS import: blocks of all zeroes, but also 
missing files. So, perhaps these problems are related?

Apparently, there was a recent fix for a data corruption issue with the 
block_cloning feature enabled, but people are still seeing corruption even when 
they never enabled that feature.

I couldn’t really find the start of the thread in the archives, so this one 
kind of jumps into the middle of the thread at a relevant-looking point:

https://lists.freebsd.org/archives/freebsd-current/2023-April/003446.html

Regards,

Alban Hertroys
--
There is always an exception to always.








Re: Ltree: set of allowed charcters is limited to [A-Za-z0-9_]. Could the dash "-" be included?

2018-10-27 Thread Alban Hertroys


> On 27 Oct 2018, at 18:14, David G. Johnston  
> wrote:
> 
>> On Saturday, October 27, 2018, joernbs  wrote:
>> Dear friends,
>> 
>> I would like to use ltree for search paths in a warehouse application, 
>> something like "Material-Entry-01.Main-Aisle.Shelf-Aisle-R07/R08.R07-12-03"
>> Unfortunately I can not use common separators like dash (-) or slash(/) 
>> 
>> Documentation states only thes characters [A-Za-z0-9_] are 
>> allowed.https://www.postgresql.org/docs/10/static/ltree.html
>> 
> I don’t see how this would be possible to do with the existing type - too 
> much potential breakage of existing data.  Your example itself shows why 
> using dash as a separator is a bad idea.

David,

I don't think he's talking about separators ;)

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Empty Range Bound Specified for Partition

2018-11-22 Thread Alban Hertroys


> On 22 Nov 2018, at 7:21, ramsiddu007  wrote:
> 
> pgsql>  create table test_upto_100 partition of test_parent_partition
>  for values from ('51') to ('100');
> 
>It was showing error like this..
> 
> ERROR:  empty range bound specified for partition "test_upto_100"
> DETAIL:  Specified lower bound ('51') is greater than or equal to upper bound 
> ('100').
> SQL state: 42P17
>  
> 
> I am not getting, please give me cause of this and solution.

Alphabetically, '1' comes before '5', so '100' comes before '51'.
Numerically, you would get what you intended, but you're not partitioning on 
numbers but on strings instead and those sort alphabetically: '1', '10', '100', 
'11', '2', '3', '51', etc.
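
You can see the difference directly:

select '100' < '51' as as_text, 100 < 51 as as_number;
-- as_text is true, as_number is false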

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: using a function in where

2018-12-03 Thread Alban Hertroys


> On 3 Dec 2018, at 8:06, Glenn Schultz  wrote:
> 
> All,
> I am using the function below to convert a continuous variable to a binned 
> value.  Sometimes a value other than zero is passed through the query. For 
> example -.5 result value is passed to the query result.  The basic of the 
> query is below.  
> 
> select
> incentivebin(wac, rate, .25)
> from
> my_table
> where incentivebin(was, rate, .25) = 0

> CREATE or REPLACE FUNCTION embs_owner.IncentiveBin(IN "Gwac" double 
> precision, 

> AS 'select ceiling(($1 - $2)/$3) *$3';

If the difference of (wac - rate) = -0.5, and you multiply that by 4 (divide by 
.25), ceil will round that correctly to -2. If you then divide by 4 again, you 
get -0.5 back (which is twice your bin size).

The result matches your function definition with the given parameters.
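
A worked example with (wac - rate) = -0.5 and a bin size of 0.25:

select ceiling((4.00 - 4.50) / 0.25) * 0.25;   -- yields -0.50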


Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: IF NOT EXIST

2018-12-18 Thread Alban Hertroys


> On 18 Dec 2018, at 7:10, Igor Korot  wrote:
> 
> Hi, ALL,
> I have a following statement:
> 
> IF NOT EXIST( SELECT 1 SELECT 1 FROM pg_proc AS proc, pg_namespace AS
> ns ) CREATE FUNCTION();
> 
> Unfortunately trying to execute it thru the ODBC interface with:
> 
> ret = SQLExecDirect( m_hstmt, query, SQL_NTS );
> 
> gives syntax error near IF.
> 
> What is the proper way to do that?

It looks like you’re trying to create a function unless it already exists, but 
you’re missing several important parts.

Firstly, the statement you’re looking for is CREATE OR REPLACE FUNCTION. Look 
it up in the docs for the various syntax options and for how to use it, as…

Secondly, your function doesn’t have a name. A function requires a name, or you 
wouldn’t ever be able to call it.
You will also have to specify a return type (functions return values) and the 
language the function is implemented in. The documentation will show you that 
there are several options you can provide too.

And lastly, a function requires an implementation.
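
Putting that together, the minimal shape of such a statement looks something 
like this (the name, return type and body are just placeholders):

CREATE OR REPLACE FUNCTION my_function()
RETURNS integer
LANGUAGE sql
AS $$ SELECT 1 $$;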

Regards,

Alban Hertroys
--
There is always an exception to always.



Reclaiming space for dropped database

2019-01-23 Thread Alban Hertroys
Hi all,

Our current development database server is running a bit low on diskspace, 
so I dropped an old but rather large database with the intention of 
claiming back some space. However, the space remains claimed.

This server was upgraded from PG10 to PG11 using pg_upgrade's --link 
option.

I see this after having dropped the (300+GB) database:
-bash-4.2$ du -d 1 -h
4.0K./.cache
0   ./.config
4.0K./9.6
376G./10
72G ./11
447G.

The files of that database apparently still exist within the ./10 
directory:
-bash-4.2$ du -d 1 -h ./10/data/
4.1G    ./10/data/pg_wal
816K    ./10/data/global
0       ./10/data/pg_commit_ts
0       ./10/data/pg_dynshmem
8.0K    ./10/data/pg_notify
0       ./10/data/pg_serial
0       ./10/data/pg_snapshots
208K    ./10/data/pg_subtrans
0       ./10/data/pg_twophase
16K     ./10/data/pg_multixact
372G    ./10/data/base
0       ./10/data/pg_replslot
0       ./10/data/pg_tblspc
3.5M    ./10/data/pg_stat
0       ./10/data/pg_stat_tmp
4.0K    ./10/data/pg_logical
96K     ./10/data/log
688K    ./10/data/pg_xact
376G    ./10/data/


How do I reclaim that space?

Regards,
Alban Hertroys

Alban  Hertroys 
D: +31 (0)53 4 888 888  | T: +31 (0)53 4888 888 | E: 
alban.hertr...@apollovredestein.com
Apollo Vredestein B.V.| Ir. E.L.C. Schiffstraat 370, 7547 RD Enschede, The 
Netherlands
Chamber of Commerce number: 34223268



 

 




Betr: Re: Reclaiming space for dropped database

2019-01-23 Thread Alban Hertroys
"Tom Lane"  wrote on 2019-01-23 16:02:01:

> Alban Hertroys  writes:
> > Our current development database server is running a bit low on diskspace, 
> > so I dropped an old but rather large database with the intention of 
> > claiming back some space. However, the space remains claimed.
> > This server was upgraded from PG10 to PG11 using pg_upgrade's --link 
> > option.
> 
> If you used --link, then all the files would remain hard-linked from both
> the old and new database directories.  You've got to remove them from the
> old DB directory as well.
> 
> There's not really any point in keeping around the source DB directory
> once you've completed a --link migration.  Starting the postmaster in
> the old DB directory would be disastrous because the files are
> inconsistent from its standpoint once the new postmaster has modified
> them at all.  (In fact, I think pg_upgrade intentionally makes the old
> directory non-runnable to prevent that error.)  So you might as well
> just "rm -rf ./10", not only its biggest subdirectory.

That explains what I'm seeing. After creating a dump (better safe than 
sorry), I'll remove that directory.

Thanks!

Regards,
Alban.

Alban  Hertroys 
D: +31 (0)53 4 888 888  | T: +31 (0)53 4888 888 | E: 
alban.hertr...@apollovredestein.com
Apollo Vredestein B.V.| Ir. E.L.C. Schiffstraat 370, 7547 RD Enschede, The 
Netherlands
Chamber of Commerce number: 34223268



 

 




Re: Hot to model data in DB (PostgreSQL) for SNMP-like multiple configurations

2019-03-08 Thread Alban Hertroys
Is there a reason not to use a relational model instead of json(b) here? I
think that is in fact considered best practice.
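
For example, a minimal relational sketch of the same data (the names are 
illustrative and the OID below is a made-up placeholder):

CREATE TABLE device (
    id   integer PRIMARY KEY,
    type text NOT NULL
);

CREATE TABLE device_setting (
    device_id integer NOT NULL REFERENCES device (id),
    oid       text    NOT NULL,
    instance  text    NOT NULL,
    value     text,
    PRIMARY KEY (device_id, oid, instance)
);

-- Reading or bulk-updating one parameter is then an indexed lookup or a
-- plain UPDATE, instead of a loop over JSON documents:
SELECT value
  FROM device_setting
 WHERE device_id = 20818132
   AND oid = '1.3.6.1.4.1.1234.1'
   AND instance = '0';

UPDATE device_setting
   SET value = '563'
 WHERE oid = '1.3.6.1.4.1.1234.1'
   AND instance = '0';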

On Fri, 8 Mar 2019 at 15:40, Alexandru Lazarev 
wrote:

> I am working on product managing and monitoring Network (NMS-like
> products).
>
> Product manages configuration of network devices, for now each device has
> stored its configuration in simple table - this was the original design.
>
> CREATE TABLE public.configuration(
>   id integer NOT NULL,
>   config json NOT NULL,
>   CONSTRAINT configuration_pkey PRIMARY KEY (id),)
>
> A config looks like:
>
> {
> "_id": 20818132,
> "type": "Modem",
> "data": [{
> "oid": "1.3.6.1.4.1..3.10.2.2.25.4.1.40",
> "instance": "24",
> "value": "null"
> },
> {
> "oid": "1.3.6.1.4.1..3.5.10.1.86",
> "instance": "0",
> "value": "562"
> },
> {
> "oid": "1.3.6.1.4.1..3.5.10.3.92.4.1",
> "instance": "0",
> "value": "0"
> },
> {
> "oid": "1.3.6.1.4.1..3.10.2.2.25.4.1.43",
> "instance": "24",
> "value": "vlan24"
> },
> {
> "oid": "1.3.6.1.4.1..3.10.2.2.25.4.1.43",
> "instance": "25",
> "value": "vlan25"
> }
> ]}
>
> And there are many plv8 (java script procedural language extension for
> PostgreSQL) stored procedures working on bulks of such config, reading some
> OIDs, changing them conditionally, removing some of them and adding others,
> especially in use cases like: There are some upper-level META-configuration
> of different level, which during change have to update all their updated
> parameters to all affected leaves configs. An simple test-example (but
> without touching 'data' node)
>
> CREATE OR REPLACE FUNCTION public.process_jsonb_plv8()
>   RETURNS void AS
> $BODY$
> var CFG_TABLE_NAME = "configurations";
> var selPlan = plv8.prepare( "select c.config from " + CFG_TABLE_NAME +
>     " c where c.id = $1", ['int'] );
> var updPlan = plv8.prepare( 'update ' + CFG_TABLE_NAME +
>     ' set config = $1 where id = $2', ['jsonb','int'] );
>
> try {
>     var ids = plv8.execute('select id from devices');
>
>     for (var i = 0; i < ids.length; i++) {
>         var db_cfg = selPlan.execute(ids[i].id); // get current json config from DB
>         var cfg = db_cfg[0].config;
>         cfg["key0"] = 'plv8_json'; // add some dummy key
>         updPlan.execute(cfg, ids[i].id); // put updated JSON config in DB
>         plv8.elog(NOTICE, "UPDATED = " + ids[i].id);
>     }
> } finally {
>     selPlan.free();
>     updPlan.free();
> }
> return;
> $BODY$
>   LANGUAGE plv8 VOLATILE
>   COST 100;
>
> For real use-cases plv8 SPs are more complicated, doing FOR-LOOP through
> ALL OIDs object of 'data' array, checking if it is looking for and update
> value an/or remove it and/or add newer if necessary.
>
> Since number of devices in DB increased from several hundreds to 40K or
> even 70K, and number of OID+Instance combinations also increased from
> several hundred to ~1K and sometimes up to 10K within a config, we start
> facing slowness in bulk (especially global -> update to ALL Devices)
> updates/searches.
>
> In order to get rid off FOR LOOP step for each configuration I've
> converted data-node from array to object (key-value model), something like
> :
>
> {
> "_id": 20818132,
> "type": "Modem",
> "data": {
> "1.3.6.1.4.1..3.10.2.2.25.4.1.40": {
> "24": "null"
> },
> "1.3.6.1.4.1..3.5.10.1.86": {
> "0": "562"
> },
> "1.3.6.1.4.1..3.5.10.3.92.4.1": {
> "0": "0"
> },
> "1.3.6.1.4.1..3.10.2.2.25.4.1.43": {
> "24": "vlan24",
> "25": "vlan25"
> }
> }}
>
> Now in order to get a concrete OID (e.g.
> "1.3.6.1.4.1..3.10.2.2.25.4.1.43") and/or its instance I do 1-2 *O(1)*
> operations instead *O(n)*. And it become a bit faster. After I've changed
> column type from json to jsonb - I've got a lot of memory issues with
> plv8 stored procedures, so now ideas is:
>
> *What are the best practices to store such data and use cases in DB?*
> taking in considerations following: - Bulk and global updates are often
> enough (user-done operation) - several times per week and it takes long
> time - several minutes, annoying user experience. - Consulting some OIDs
> only from concrete config is medium frequency use case - Consulting ALL
> devices have some specific OID (SNMP Parameter) settled to a specific value
> - medium frequency cases. - Consult (read) a configuration for a specific
> device as a whole document - often use case (it is send to device as json
> or as converted CSV, it is send in modified json format to other utilities,
> etc)
>
> One of suggestion from other oppinions is to move ALL configurations to
> simple plain relational tabl

Re: WAL Archive Cleanup?

2019-04-02 Thread Alban Hertroys
That seems to be a misconfigured client application that explicitly tries
to connect to a non-existent database 'admin' (via db=admin).
Instead of adding that database, it seems more logical to fix the client
configuration.

On Tue, 2 Apr 2019 at 09:53, Foo Bar  wrote:

> Hello All,
>
> Ok, so maybe something helpful?  On my master node I am seeing a bunch of:
>
> 2019-03-28 23:54:44 GMT [2611]: [1-1]
> user=admin,db=admin,client=172.10.10.37 172.10.10.37(57552) (0:3D000)FATAL:
> database "admin" does not exist
> 2019-03-28 23:54:46 GMT [2613]: [1-1]
> user=admin,db=admin,client=172.10.10.18 172.10.10.18(56970)
> (0:3D000)FATAL:  database "admin" does not exist
>
> Which is accurate, as there is no admin database...  I usually connect
> with
>
> psql -h localhost -U admin postgres
>
> Should there be?  Will this fix my issue with pgsql filling up the disk?
>
> Thanks,
> - QBR
>
> On Mon, Mar 25, 2019 at 10:21 AM Foo Bar  wrote:
>
>> Hello All,
>>
>> Wow!  Lots of awesome replies, Went away for the weekend thinking my
>> email had been rejected and come back to a full inbox.  Thanks for all the
>> help!
>>
>> >> Postgres version?
>>
>> 9.6.11
>>
>> Doh.  Fairly important detail there.  :)
>>
>> >> FYI, psql is the Postgres client program, Postgres(ql) is the server.
>> >> "psql" is the name of a specific command line tool used to connect to
>> a PostgreSQL database server, it is not the name of the database itself.
>> The database is usually abbreviated "pgsql".
>>
>> Duly noted, thanks for the correction.
>>
>> >> It's the standby that has not seen any traffic?
>>
>> There's really no traffic.  I built three nodes, connected them, created
>> a test table and inserted some values, then left the cluster be for a
>> couple weeks.
>>
>> >>  And "restartpoint" is usually spelled as one word in technical
>> discussions of it.  Or at least, searching for it that way avoids finding
>> things which mention each word separately in different senses.
>>
>> Ah ha.  I had seen it that way but thought it was a typo.  Thanks for the
>> clarification!
>>
>> >> Are you sure it is the archive
>> directory (/hab/svc/postgresql/data/archive) which is filling up, and not
>> the live directory (pg_wal or pg_xlog)?  This is often a point of confusion.
>>
>> Right before I sent the mail last week I deleted everything in
>> /hab/svc/postgresql/data/archive, this morning I'm seeing:
>>
>> # du -h --max=1 /hab/svc/postgresql/data/
>> 198M /hab/svc/postgresql/data/pgdata
>> 8.9G /hab/svc/postgresql/data/archive
>> 9.1G /hab/svc/postgresql/data/
>> # du -hs /hab/svc/postgresql/data/pgdata/pg_xlog/
>> 177M /hab/svc/postgresql/data/pgdata/pg_xlog/
>> # ls -lah /hab/svc/postgresql/data/archive/ | wc -l
>> 571
>>
>> There is no pg_wal directory though (should there be?)
>>
>> # find /hab/svc/postgresql/ -name '*pg*wal*'
>> #
>>
>> >>  If the only reason you want an archive is for replication, then use
>> streaming replication and do away with the archive completely
>>
>> To be honest, I thought it was required for streaming replication based
>> on the guides linked above.
>>
>> >> There are reasons other than replication that one might want to keep a
>> WAL archive, but those reasons don't seem to apply to you
>>
>> Like backup maybe?  A wholly other topic, we recently had a power outage
>> and I lost a pgsql node... having an archive would allow me to "replay" any
>> transactions?
>>
>> >> What needs to be determined here is why the standby never consumed
>> the WAL's from the master?
>> Ok, so it the standby that's the problem.
>>
>> >> Do you still have the logs from the standby and do they show anything
>> relevant?
>>
>> Sure, what am I looking for?  I see a bunch of entries like:
>>
>> 2019-03-08 17:06:11 GMT [1813]: [815-1] user=,db=,client=  (0:0)LOG:
>> restartpoint complete: wrote 22 buffers (0.0%); 0 transaction log file(s)
>> added, 0 removed, 1 recycled; write=2.211 s, sync=0.062 s, total=2.281 s;
>> sync files=18, longest=0.062 s, average=0.003 s; distance=16383 kB,
>> estimate=16383 kB
>> 2019-03-08 17:06:11 GMT [1813]: [816-1] user=,db=,client=  (0:0)LOG:
>> recovery restart point at 0/8D28
>>
>> On the 15th, around when I think I filled the disk, I see a bunch of:
>>
>> cp: cannot stat '/0002.history': No such file or directory
>> cp: cannot stat '/000100040049': No such file or directory
>> 2019-03-15 23:59:49 GMT [16691]: [1-1] user=,db=,client=
>> (0:XX000)FATAL:  could not connect to the primary server: could not connect
>> to server: Connection refused
>> Is the server running on host "172.16.10.23" and accepting
>> TCP/IP connections on port 5432?
>>
>> Which makes sense since the pgsql service was down.
>>
>> This appears to be when I recovered the master on Thursday:
>>
>> cp: cannot stat '/0002.history': No such file or directory
>> cp: cannot stat '/00010004004D': No such file or directory
>> 2019-03-21 17:37:31 GMT [31338]: [1-1] user=,db

Re: random generated string matching index in inexplicable ways

2019-05-07 Thread Alban Hertroys


> On 7 May 2019, at 13:53, Myles Miller  wrote:
> 
> PROBLEM:
> Strings or characters generated by any random function (including pg_crypto
> gen_random_bytes) are matching a string/char index in surprising ways.

I fail to see anything surprising in your examples. Since random() is volatile, 
the expression in the WHERE clause is evaluated again for every row, so each row 
is compared against its own freshly generated random letter.

> Reduced down to its simplest example:

(…)

> -- if we use random-generated 'A' or 'B', things get inexplicable
> 
> # SELECT y FROM x WHERE y = chr(round(random())::int + 65);
> y 
> ---
> A
> B
> (2 rows)

Here you got a random value in the lower range of 0..1 for the record with 
value ‘A’, so that’s a match, and one in the higher range for value ‘B’, a 
match again, so you get 2 rows.

> 
> # SELECT y FROM x WHERE y = chr(round(random())::int + 65);
> y 
> ---
> (0 rows)

Here you got a random value in the higher range for the record with value ‘A’, 
so no match, and one in the lower range for value ‘B’, no match again, so you 
get 0 rows.


> # SELECT y FROM x WHERE y = chr(round(random())::int + 65);
> y 
> ---
> B
> (1 row)

Here you got two random values in the higher range, so only the row with ‘B’ 
matches.

You could also get two random values in the lower range and only get a match 
against ‘A’.
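
If the intention was to pick one random letter and then match all rows having 
that letter, you can force a single evaluation, for instance with a scalar 
subquery (a sketch, using your table x):

SELECT y FROM x WHERE y = (SELECT chr(round(random())::int + 65));

The subquery is evaluated once per query execution, so every row is compared 
against the same letter.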

Regards,
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: How To: A large [2D] matrix, 100,000+ rows/columns

2023-06-09 Thread Alban Hertroys


> On 9 Jun 2023, at 04:17, Pat Trainor  wrote:

(…)

> Imagine something akin to stocks, where you have a row for every stock, and a 
> column for every stock. Except where the same stock is the row & col, a 
> number is at each X-Y (row/column), and that is the big picture. I need to 
> have a very large matrix to maintain & query, and if not (1,600 column 
> limit), then how could such data be broken down to work?

If your matrix contains values that are all of the same type, as matrices 
usually do, then a matrix can be described as the Cartesian product of rows 
and columns, with values connecting those.

For rows and columns you could enumerate them using generate_series() or a pair 
of recursive CTEs, or you could put them into their own table.
For the values (or cells), a tuple of (row, column, value) would be sufficient.

Then in the end, the matrix would be a presentation of the left joins of the 
Cartesian product of rows and columns with your cell values. The left joins 
are to account for missing cell values (empty cells), or you could explicitly 
add tuples for those with an ‘empty’ value.
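
A minimal sketch of that model (the names are invented for the example):

CREATE TABLE matrix_row (row_id integer PRIMARY KEY);
CREATE TABLE matrix_col (col_id integer PRIMARY KEY);

CREATE TABLE matrix_cell (
    row_id integer NOT NULL REFERENCES matrix_row (row_id),
    col_id integer NOT NULL REFERENCES matrix_col (col_id),
    value  double precision NOT NULL,
    PRIMARY KEY (row_id, col_id)
);

-- The matrix as the Cartesian product of rows and columns, empty cells as NULL:
SELECT r.row_id, c.col_id, m.value
  FROM matrix_row r
 CROSS JOIN matrix_col c
  LEFT JOIN matrix_cell m USING (row_id, col_id)
 ORDER BY r.row_id, c.col_id;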

For presentation, I would use something like Python Pandas and the xlsxwriter.

Data-entry is going to be a bit of a pain if you cannot automate it, and it’s 
not going to be very space-efficient, but it does fit the relational model this 
way and it would be easy to expand the matrix in either direction.

> By wanting postgresql as a solution, am I round-hole, square-pegging myself?

I expect that there are solutions that were explicitly designed for handling 
(large) matrices and that those would perhaps perform better.

> I don't mind keeping, say, a 1,500 column max per table, but then adding new 
> items (stocks in the above analogy) might make it difficult to keep track of 
> things... 

That’s also a possibility, but that sort of pushes the column lookups down to 
the catalog level and induces overhead on all other catalog lookups as well. 
It’s not a huge amount though.

An alternative approach would be a table of rows (or columns) with each an 
array of values, especially since you don’t expect many updates. That would be 
a first attempt at optimisation if the pure relational model doesn’t work out.
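
That alternative could look something like this (again just a sketch):

CREATE TABLE matrix_rows (
    row_id integer PRIMARY KEY,
    cells  double precision[] NOT NULL  -- one array element per column
);

-- Cell (row 42, column 7) is then:
SELECT cells[7] FROM matrix_rows WHERE row_id = 42;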

> Hoping someone has tackled this before, and performance isn't too big a 
> concern, as the data changes seldom.

Not in practice, AFAIR, but I was thinking of a solution like this for small 
matrices (Sudoku’s, I hate the things, but I need some solutions to a few to 
help me with a number of test scripts).

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Nu-B here

2023-07-19 Thread Alban Hertroys


> On 20 Jul 2023, at 02:36, Amn Ojee Uw  wrote:
> 
> After this command 'sudo -u postgres psql'
> I get this message : 
> could not change directory to "/home/my_account": Permission denied

What’s the reason that you’re using the OS user postgres?

If you’re simply trying to connect to the database named postgres as database 
user postgres, you can instead use the command 'psql -U postgres postgres'.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: How to improve the performance of my SQL query?

2023-07-29 Thread Alban Hertroys


> On 29 Jul 2023, at 10:59, Peter J. Holzer  wrote:
> 
> On 2023-07-26 15:46:16 +0800, gzh wrote:
>> SET enable_seqscan TO off;
> [...]
>>->  Parallel Bitmap Heap Scan on tbl_sha  
>> (cost=92112.45..2663789.14 rows=800650 width=18) (actual 
>> time=260.540..21442.169 rows=804500 loops=3)
>>  Recheck Cond: (ms_cd = 'MLD009'::bpchar)
>>  Rows Removed by Index Recheck: 49
>>  Filter: (etrys = '0001'::bpchar)
>>  Rows Removed by Filter: 295500
>>  Heap Blocks: exact=13788 lossy=10565
>>  ->  Bitmap Index Scan on index_search_04_mscd_cdate 
>>  (cost=0.00..91632.06 rows=3402599 width=0) (actual time=249.718..249.718 
>> rows=330 loops=1)
>>Index Cond: (ms_cd = 'MLD009'::bpchar)
> 
> So now it's using index_search_04_mscd_cdate which contains only ms_cd
> (and - judging from the name, other fields not relevant to this query),
> but it still doesn't use index_search_01 which would fit the query
> exactly. I can understand that Postgres prefers a sequential scan over
> an index scan (the number of matching rows is about 10% of the total
> table size which is a lot), but why would it prefer a less specific
> index to a more specific one?
> 
> Can you get Postgres to use that index at all?
> 
> Find a combination of ms_cd and etrys which doesn't cover millions of
> rows and try that.
> 
> Also try lowering random_page_cost.

Wasn’t this an RDS server with just 4GB of memory?

How large are those multi-column indices? Perhaps they don’t (all) fit into 
available cache memory and the server decided to use the one that it had cached?

I’m frankly not at all certain how the server would behave around such resource 
shortage situations, but I suppose loading an uncached index into cache could 
get a higher cost than using a less optimal (costlier) index that’s already 
cached.


Regarding lowering random_page_cost: if your index files are on SSD storage, 
lowering it to a realistic value reduces the estimated cost of loading that 
uncached index into memory, which could be enough for the planner to prefer it 
and evict the index it was using in the above plan to make room (unless other 
active sessions are using it).
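
You could try that out for a single session before changing anything 
permanently. A sketch (1.1 is a value commonly used for SSD storage, but that 
is an assumption here, not something measured on your system; the SELECT is a 
stand-in for your original query):

SET random_page_cost = 1.1;   -- affects only this session

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tbl_sha WHERE ms_cd = 'MLD009' AND etrys = '0001';

RESET random_page_cost;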

Alban Hertroys
--
There is always an exception to always.








Re: ident auth does not works as usual

2023-08-26 Thread Alban Hertroys


> On 26 Aug 2023, at 11:31, pan snowave  wrote:

(…)

> pg_indent.conf
> 
> test   rootcce

If that is indeed the name of the file, that would explain your problem. No 
doubt it should be named pg_ident.conf instead, without the 'n' after the 'i'.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Even if I have a lot of free space PgSQL returns "ERROR: could not extend file - No space left on device"

2023-09-09 Thread Alban Hertroys



> On 8 Sep 2023, at 13:25, Nature Conservation Geovista Space 
>  wrote:
> 
> Dear Pg-users, 
> I am coming back to Postgres/PostGIS after a few years. I am dealing with a 
> big database with a lot of geometries and too many vertices.

So a lot of geometry going on then… That seems important in this case.

> After hours running a query to Subdivide, I get this Postgres error
> 2023-09-08 02:11:23.745 BST [328594] postgres@database ERROR:  could not 
> extend file "base/16388/7985375.1020": No space left on device
> 2023-09-08 02:11:23.745 BST [328594] postgres@database HINT:  Check free disk 
> space.
> 2023-09-08 02:11:23.745 BST [328594] postgres@database  STATEMENT:  CREATE 
> TABLE _gaul_administrative_subdivided100 AS (
> SELECT *, st_subdivide(geom,100) AS geom_subdivided100 
> FROM gaul_administrative 
> );

That _looks_ like a query that could blow through space rather quickly.

How large is gaul_administrative in GB? In rows? How many subdivisions does 
this generate per row on average? How many bytes are those subdivisions on 
average?

Multiply those numbers, and you get a fair indication of how much space that 
table requires. Does that fit in 1.1TB?


Frankly, I don’t see the point of repeating the geom column in that table after 
dividing it up, aren’t you just wasting space there? The original is still 
available in the source table, after all. And what about the other columns that 
you clone into this new table?

My suggestion would be to minimise that with an operation like the above. This 
would be one of those situations where I’d introduce a (bigint) surrogate key 
and use that to link the two tables together, even though I’m a natural key 
person.

Something like this (I had to make a few assumptions about your source table):

CREATE TABLE _gaul_administrative_subdivided100 (
    id bigint NOT NULL,
    geomid bigserial NOT NULL,
    geom_subdivided100 geometry NOT NULL
);

-- If you care about referential integrity here
ALTER TABLE _gaul_administrative_subdivided100
ADD FOREIGN KEY (id) REFERENCES gaul_administrative (id)
 ON DELETE CASCADE
 ON UPDATE RESTRICT;

INSERT INTO _gaul_administrative_subdivided100 (id, geom_subdivided100)
SELECT id, st_subdivide(geom,100)
  FROM gaul_administrative;


(…)

> It seems that it is not a problem of space.

I wouldn’t be too sure of that, but I don’t have the numbers.

> Command df -h returns:
> Filesystem  Size  Used Avail Use% Mounted on
> tmpfs   6.3G  1.1M  6.3G   1% /run
> /dev/sda1.3T  164G  1.1T  14% /
> tmpfs32G  3.2M   32G   1% /dev/shm
> tmpfs   5.0M 0  5.0M   0% /run/lock
> tmpfs   6.3G  4.0K  6.3G   1% /run/user/1000

Is this after the error and after PG finished rolling back?
What does this show while that query is going on?

If gaul_administrative takes up a large part of those 164G, then you probably 
don’t have enough space for a 10x multiplication in size from the original 
table to the new table. And that happening looks entirely possible from the 
information you provided.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Making Sure Primary and Secondary Keys Alligns

2023-09-11 Thread Alban Hertroys


> On 11 Sep 2023, at 16:09, Anthony Apollis  wrote:
> 
> Fact Table:
> CREATE TABLE IF NOT EXISTS system."IMETA_ZTRB_MP$F_ZTBR_TA_BW"
> (

(…)

> )

> and Dimension:CREATE TABLE IF NOT EXISTS dim."IMETA_Region_Mapping"
> (

(…)

> )

> How do i get that all these columns that are joined are aligned, meaning if 
> it starts with 1 in one column it must be 1 in the other columns. Or how 
> would you assign unique keys in Postgres?

Are you perhaps asking how to define FOREIGN KEY CONSTRAINTs?

https://www.postgresql.org/docs/15/ddl-constraints.html#DDL-CONSTRAINTS-FK
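
As a sketch, assuming the fact table carries a region key that should always 
point at a row in the dimension table (the "Region_ID" column name is invented 
here, since the column lists were trimmed from your message):

ALTER TABLE dim."IMETA_Region_Mapping"
  ADD CONSTRAINT "IMETA_Region_Mapping_pkey" PRIMARY KEY ("Region_ID");

ALTER TABLE system."IMETA_ZTRB_MP$F_ZTBR_TA_BW"
  ADD CONSTRAINT "ZTBR_Region_fkey" FOREIGN KEY ("Region_ID")
  REFERENCES dim."IMETA_Region_Mapping" ("Region_ID");

With that in place, the database refuses fact rows whose "Region_ID" has no 
matching row in the dimension, so the keys in both tables stay aligned.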


Regards,
Alban Hertroys
--
There is always an exception to always.








Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Alban Hertroys


> On 4 Oct 2023, at 17:58, Lincoln Swaine-Moore  wrote:
> 
> > SELECT
> > sub.gs AS ts_in_utc
> > ,sub.gs AT TIME ZONE 'America/New_York' AS ts_in_local_tz
> > ,date_bin('1 days', sub.gs AT TIME ZONE 'America/New_York', 
> > '2023-01-01')
> > FROM (SELECT generate_series('2023-11-03 00:00:00Z'::timestamptz, 
> > '2023-11-07 00:00:00Z'::timestamptz, '5 minutes'::interval) AS gs) AS sub
> > WHERE
> > sub.gs >= '2023-11-05 00:00:00 America/New_York'::timestamptz AND 
> > sub.gs < '2023-11-06 00:00:00 America/New_York'::timestamptz
> 
> I believe this query will be funky around DST borders, because `sub.gs AT 
> TIME ZONE 'America/New_York'` will be localized in a way that erases the 
> difference between hours with different offsets, which are genuinely 
> different. For instance, I ran this and there are two rows within it that 
> look like: 
> 
> ` 2023-11-05 05:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00`
> and 
> ` 2023-11-05 06:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00`
> 
> I think that the non-unique second column will pose an issue for the date 
> binning at a resolution finer than 1 day.

What I do in such cases is to add an extra column with the UTC timestamp to 
serve as a linear scale to the local timestamps. That also helps with ordering 
buckets in reports and such during DST changes (especially the ones where an 
hour repeats).

Filtering in the queries occurs on the UTC scale, with the local timestamps 
calculated back to UTC, so that it doesn’t matter whether the local time has 
23, 24, 25 or 24.5 or 23.5 or whatever number of hours on a date-range - it all 
maps back because UTC always has 24 hours.

Something that I also do is to create calendar tables and views for the 
buckets, with 2 timestamps per bucket: the start of the bucket and the start of 
the next bucket. That gives you a range to put actual timestamps between (not 
BETWEEN between, because that’s inclusive). You can store and index that, as 
opposed to generated results using generate_series - basically I materialise 
those.
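
A sketch of such a bucket table, with the UTC timestamps serving as the linear 
scale (the names, the hourly resolution and the measurement table are my 
assumptions):

CREATE TABLE calendar_hour (
    bucket_start_utc timestamptz NOT NULL PRIMARY KEY,
    next_bucket_utc  timestamptz NOT NULL,
    bucket_local     timestamp   NOT NULL  -- local-time label for reporting
);

-- Binning then becomes a range condition on the UTC scale:
SELECT c.bucket_local, count(*), avg(m.value)
  FROM calendar_hour c
  JOIN measurement m
    ON m.ts_utc >= c.bucket_start_utc
   AND m.ts_utc <  c.next_bucket_utc
 GROUP BY c.bucket_start_utc, c.bucket_local
 ORDER BY c.bucket_start_utc;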

For hours and quarter hours I found it to be fairly convenient to base a view 
on a join between a date calendar and an (quarter of an) hour per UTC day 
table, but materialising that with some indexes may perform better (at the cost 
of disk space). I do materialise that currently, but our database server 
doesn’t have a lot of memory so I’m often not hitting the cache and performance 
suffers a bit (infrastructure is about to change for the better though).

Regards,

Alban Hertroys
--
There is always an exception to always.








Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Alban Hertroys


> On 4 Oct 2023, at 21:30, Lincoln Swaine-Moore  wrote:
> 
> > What I do in such cases is to add an extra column with the UTC timestamp to 
> > serve as a linear scale to the local timestamps. That also helps with 
> > ordering buckets in reports and such during DST changes (especially the 
> > ones where an hour repeats).
> 
> > For hours and quarter hours I found it to be fairly convenient to base a 
> > view on a join between a date calendar and an (quarter of an) hour per UTC 
> > day table, but materialising that with some indexes may perform better (at 
> > the cost of disk space). I do materialise that currently, but our database 
> > server doesn’t have a lot of memory so I’m often not hitting the cache and 
> > performance suffers a bit (infrastructure is about to change for the better 
> > though).
> 
> That's an interesting idea, but I'm not sure I fully understand. Assuming 
> you're aggregating data: what do you group by? For instance, at an hourly 
> resolution, if you group by both the UTC timestamp and the local one, you 
> might end up, say, dividing an hour-long bucket in two for time zones with 
> half-hour-based offsets, no? 
> 
> Thanks for the detailed writeup! Definitely helpful to learn more about what 
> people are using in production to handle this sort of thing.

Frankly, I haven’t had to deal with half-hour-based offsets since I got this 
idea. I’m using it with whole-hour-offsets, where it doesn’t affect bin 
boundaries.

I suppose you could enrich your data in a similar fashion by adding a (virtual) 
column with the (client) time zone offset, so you could group by local 
timestamp + offset. That’s not going to match index expressions though, I fear…

For sorting, UTC timestamps would probably still be a useful addition, but 
they’re simple to add by either converting back from the local timestamps or by 
taking the min and max of the UTC-based column on the above grouping. Both 
solutions require that offset, obviously.

Now of course there are only 2 hours a year where this happens. Our data 
scientists chose to ignore the problem for simplicity’s sake and be slightly 
off with their numbers on those dates.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Issue in compiling postgres on latest macOS 14.1.1

2023-11-13 Thread Alban Hertroys


> On 13 Nov 2023, at 18:00, Tom Lane  wrote:

(…)

> * If you use MacPorts or Homebrew, maybe that's out of date?
> Try removing the associated directories from your PATH to see
> if it works better.

Perhaps even worse; you had old binaries from an Intel architecture that were 
migrated onto a new ARM-based architecture? In that case the Homebrew uninstall 
scripts won’t even work anymore - at least not w/o Rosetta 2 - as they’re 
Intel-based too.

A migration assistant can also work too well, I found.

Alban Hertroys
--
There is always an exception to always.








Re: Time zone offset in to_char()

2024-01-11 Thread Alban Hertroys


> On 11 Jan 2024, at 17:43, Adrian Klaver  wrote:
> 
> On 1/11/24 07:06, Alban Hertroijs wrote:
>> Hi all,
> 
>> In the above, I worked around the issue using a couple of user-defined 
>> functions in PG. That should give a reasonable idea of the desired 
>> functionality, but it's not an ideal solution to my problem:
>> 1). The first function has as a drawback that it changes the time zone for 
>> the entire transaction (not sufficiently isolated to my tastes), while
>> 2). The second function has the benefit that it doesn't leak the time zone 
>> change, but has as drawback that the time zone is now hardcoded into the 
>> function definition, while
> 
> I don't think the set_config and SET are acting the way you think they are:
> 
> set_config(https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-SET)
> 
> "
> set_config ( setting_name text, new_value text, is_local boolean ) → text
> 
> Sets the parameter setting_name to new_value, and returns that value. If 
> is_local is true, the new value will only apply during the current 
> transaction. If you want the new value to apply for the rest of the current 
> session, use false instead. This function corresponds to the SQL command SET.
> 
> set_config('log_statement_stats', 'off', false) → off"
> "

I tried this like so:

select ToDatetimeOffset(current_timestamp, 'Europe/Amsterdam'), 
to_char(current_timestamp, 'YYYY-MM-DD HH24:MI:SS.SU0 TZH:TZM').

The result of the second call was based on time zone ‘Europe/Amsterdam’, whereas 
it wasn’t when called outside the transaction (there it was based on UTC, 
matching the server time zone).
So the time zone set with set_config(…, …, true) appeared to leak out of 
function scope and apply to transaction scope (as described in the quoted 
text).
For brevity I could run that query tomorrow when I’m back at work.

> SET(https://www.postgresql.org/docs/current/sql-set.html)
> 
> "If SET (or equivalently SET SESSION) is issued within a transaction that is 
> later aborted, the effects of the SET command disappear when the transaction 
> is rolled back. Once the surrounding transaction is committed, the effects 
> will persist until the end of the session, unless overridden by another SET.
> 
> The effects of SET LOCAL last only till the end of the current transaction, 
> whether committed or not. A special case is SET followed by SET LOCAL within 
> a single transaction: the SET LOCAL value will be seen until the end of the 
> transaction, but afterwards (if the transaction is committed) the SET value 
> will take effect.

It says transaction again here.

> The effects of SET or SET LOCAL are also canceled by rolling back to a 
> savepoint that is earlier than the command.
> 
> If SET LOCAL is used within a function that has a SET option for the same 
> variable (see CREATE FUNCTION), the effects of the SET LOCAL command 
> disappear at function exit; that is, the value in effect when the function 
> was called is restored anyway. This allows SET LOCAL to be used for dynamic 
> or repeated changes of a parameter within a function, while still having the 
> convenience of using the SET option to save and restore the caller's value. 
> However, a regular SET command overrides any surrounding function's SET 
> option; its effects will persist unless rolled back.
> "

I didn’t succeed in calling SET LOCAL TIMEZONE from within the function. It 
could be that I missed something; Google (Stack Overflow) then pointed me to 
set_config().

I did manage to apply it to the second function header, which I think behaves 
such that the time zone change stays within function scope. Right now I’m not 
100% sure that I verified that. More to check tomorrow.
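
For reference, a trimmed-down sketch of what I mean with that second function; 
the body and the format string are simplified placeholders, not our actual 
implementation:

CREATE OR REPLACE FUNCTION ciscache.ToDatetimeOffsetNL(ts timestamptz)
RETURNS text
LANGUAGE sql
SET TimeZone = 'Europe/Amsterdam'
AS $$
    SELECT to_char(ts, 'YYYY-MM-DD HH24:MI:SS.US TZH:TZM');
$$;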

Frankly, I do hope that you’re right here, that would make my work easier.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Time zone offset in to_char()

2024-01-11 Thread Alban Hertroys


> On 11 Jan 2024, at 18:27, Adrian Klaver  wrote:
> 
> On 1/11/24 08:48, Adrian Klaver wrote:
>> On 1/11/24 08:04, Alban Hertroijs wrote:
> 
>>> The drawback, as mentioned, being that we need to maintain those functions 
>>> in each deployment, which is a bit of a hassle (albeit a minor one) because 
>>> we need to customise both the TDV side and the PostgreSQL side in that 
>>> case. Our preferred solution would be to just add a few entries to the TDV 
>>> database-specific capabilities file (as described in my initial message)
>> Are you referring to?:
>> "It currently have this:
>> ToDatetimeOffsetNL(~any) : ciscache.ToDatetimeOffsetNL($1)
>> ToDatetimeOffset(~any,~any) : ciscache.ToDatetimeOffset($1, $2)
>> "
> 
> It finally dawned on me, you want to replace the user defined functions above 
> with Postgres builtins only. Try as I might, I could not come up with that 
> solution.

Exactly. I was having the same problem of finding a solution, quite to my 
surprise.

>> I thought the issue there was maintaining the two Postgres functions?

Yup, those two functions in fact.

There will be at least 3 separate deployments, while maintenance of the 
database(-schema) contents is the responsibility of the 3rd party application 
(TDV). PG is used as a caching DB here, we therefore intend to treat the data 
in it as volatile; it shouldn’t hurt if we decide to recreate the caches from 
scratch from source data. Having custom code in there not under control of the 
3rd party application breaks that guideline.

If they’re necessary, then so be it, but I can’t shake the feeling that we can 
achieve this without custom code in the database.

Regards,

Alban Hertroys
--
There is always an exception to always.







