Re: system catalog permissions

2018-02-26 Thread Paul Jungwirth

On 02/26/2018 03:47 PM, Tom Lane wrote:

PropAAS DBA  writes:

We have a client which is segmenting their multi-tenant cluster
(PostgreSQL 9.6) by schema, however if one of their clients connects via
pgadmin they see ALL schemas, even the ones they don't have access to
read.

PG generally doesn't assume that anything in the system catalogs is
sensitive.  If you don't want user A looking at user B's catalog
entries, give them separate databases, not just separate schemas.


I'm sure this is what you meant, but you need to give them separate 
*clusters*, right? Even with separate databases you can still get a list 
of the other databases and other roles in the cluster. I would actually 
love to be mistaken but when I looked at it a year or two ago I couldn't 
find a way to lock that down (without breaking a lot of tools anyway).
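
For example, even connected only to their own database as an unprivileged
role, a client can still run queries like these and see everything in the
cluster:

SELECT datname FROM pg_database;
SELECT rolname FROM pg_roles;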


Thanks!

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Primary key gist index?

2018-03-14 Thread Paul Jungwirth

On 03/14/2018 06:19 AM, Jeremy Finzel wrote:
Hello! From all that I can tell, it is not possible to use a btree_gist 
index as a primary key. If so, why not? I have a table with this gist 
index which truly ought to be its primary key. as_of_date is a date range 
type:


EXCLUDE USING gist (id WITH =, as_of_date WITH &&)


I'm curious why you need a primary key on this table, especially if the 
exclusion constraint is already preventing duplicate/overlapping records?


Technically I think an exclusion constraint (or at least this one) 
fulfills the formal requirements of a primary key (is unique, isn't 
null), but maybe there are other primary-key duties it doesn't meet, 
like defining foreign keys that reference it. I've been on-and-off 
building an extension for temporal foreign keys at [1]. That is pretty 
new, but perhaps it will be useful/interesting to you. And if you have 
any feedback, I'd love to hear it!


But anyway, maybe if you shared why the table needs a real PRIMARY KEY, 
people here can suggest something.


[1] https://github.com/pjungwir/time_for_keys

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Primary key gist index?

2018-03-14 Thread Paul Jungwirth
On 03/14/2018 11:44 AM, Tom Lane wrote:

Also, as you mention, extrapolating behavior that's not really equality
to situations like foreign keys gets pretty interesting pretty fast.
An exclusion constraint using && might ensure that no two values in the
column are identical, but it would not be enough to ensure that a proposed
FK row can't && with more than one PK row.  So what then?


This is (perhaps) a digression from the OP's question, but in temporal 
databases it is fine if the child record's as_of_time overlaps with 
multiple records from the parent table. In fact that would be pretty 
normal. What's required is that the child's as_of_time is completely 
covered by the "sum" of the as_of_times of the parent records with a 
matching ID. For example:


houses
id  as_of_timeappraisal
1   [2016-01-01, 2017-01-01)  $200k
1   [2017-01-01, 2018-01-01)  $230k

rooms
id  as_of_timehouse_id
1   [2016-01-01, 2018-01-01)  1

If you had a "temporal foreign key" from rooms to houses, that example 
should be valid. I understand that's not possible today with normal 
foreign keys, and maybe it's too specific to be desirable as a 
general-purpose feature, but that's what the github extension I linked 
to tries to offer.
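
Just to make the rule concrete, here is a rough sketch of the check as a 
query. It assumes a hypothetical range_agg aggregate that unions a set of 
ranges into one "covered" value supporting @> (nothing like that is built 
in today, so treat this as pseudo-SQL):

SELECT  r.*
FROM    rooms r
WHERE   NOT EXISTS (
  SELECT  1
  FROM    houses h
  WHERE   h.id = r.house_id
  HAVING  range_agg(h.as_of_time) @> r.as_of_time
);

Any rows that query returns would be violations of the temporal foreign key.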


--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Check constraints.

2018-03-27 Thread Paul Jungwirth

On 03/27/2018 04:23 AM, Steve Rogerson wrote:

I am trying to add/change a constraint programmatically, but not if it
already exists, and is the same as before.
...
Is there a way of getting the "normalised" version of a constraint, to decide if
I need to update the constraint if one already exists?


Hi Steve,

I wrote a Ruby gem to do this some years ago. Here is the SQL I used:

  SELECT  c.conname,
  t.relname,
  pg_get_expr(c.conbin, c.conrelid)
  FROMpg_catalog.pg_constraint c,
  pg_catalog.pg_class t,
  pg_catalog.pg_namespace n
  WHERE   c.contype = 'c'
  AND c.conrelid = t.oid
  AND t.relkind = 'r'
  AND n.oid = t.relnamespace
  AND n.nspname NOT IN ('pg_catalog', 'pg_toast')
  AND pg_catalog.pg_table_is_visible(t.oid)

https://github.com/pjungwir/db_leftovers/blob/master/lib/db_leftovers/postgres_database_interface.rb#L117-L137

I haven't used it against the last few Postgres versions, but it 
probably still works or needs only minor adjustments.
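
The key piece for your question is pg_get_expr(), which returns 
Postgres's normalized ("parsed and deparsed") version of the expression. 
For example, with a made-up table (the exact output can vary a little by 
version):

  CREATE TABLE items (price numeric CHECK (price > 0));

  SELECT conname, pg_get_expr(conbin, conrelid)
  FROM pg_catalog.pg_constraint
  WHERE conrelid = 'items'::regclass AND contype = 'c';
  -- items_price_check | (price > (0)::numeric)

That normalized string is what you'd compare against before deciding to 
drop and recreate the constraint.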


--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Autonomous transaction, background worker

2018-03-27 Thread Paul Jungwirth
I try to setup a logging facility and want it to do its work 
asynchronously in autonomous transactions. I have read 
http://blog.dalibo.com/2016/08/19/Autonoumous_transactions_support_in_PostgreSQL.html 
and chapter 47 of the documentation believing those articles are about 
the same. However, pg_background_launch mentioned in the blog is not 
available on my vanilla installation of Debian 10.3-2 from the 
PostgreSQL repositories.
Chapter 47 makes me believe I need to code some C functions to use 
background workers.


I've experimented with background workers a bit. Yes, normally you'd 
write some C, but you could also install this (mentioned in the blog 
post and the source of pg_background_launch):


https://github.com/vibhorkum/pg_background

Note that it is not part of contrib and is not on pgxn, so to install it 
you need to clone the repo and run `make && sudo make install`. Also I'm 
not sure it is really production-ready. In particular this is a serious 
issue:


https://github.com/vibhorkum/pg_background/issues/1

(If the launching process exits, the child process won't be able to map 
the shared memory segment.)
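
If you do try it, usage looks something like this (from memory, so check 
the README for the exact function signatures; app_log is just a made-up 
table here):

-- run a statement in a background worker; returns the worker's pid:
SELECT pg_background_launch('INSERT INTO app_log (msg) VALUES (''hello'')');

-- later, collect the result using the pid you got back:
SELECT * FROM pg_background_result(12345) AS (result text);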


This is another example of background workers that might interest you, 
just to see how they're doing it:


https://github.com/citusdata/pg_cron

In that case they start the worker upon database startup, so it doesn't 
have problems with shared memory going away.


I took a stab recently at using background workers to run ad hoc async 
SQL, but eventually I wound up doing something simpler.


I think background workers is a pretty interesting feature though, so if 
you get something working I'm sure others would be happy to hear about 
it. Good luck!


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Using Lateral

2018-03-27 Thread Paul Jungwirth

On 03/27/2018 03:22 AM, Johann Spies wrote:

In the past I could use this in a query:

 SELECT
 DISTINCT ut,
 CASE
 WHEN xpath_exists ('//t:address_spec/t:country/text()',
 q.address_spec,
 p.ns)
 THEN unnest (xpath ('//t:address_spec/t:country/text()',
 q.address_spec,
 p.ns))::citext
 ELSE NULL
 END country,

No longer.  The error message suggests I should use a lateral query.
But I could not figure out in the documentation how to get the same
result using a "lateral" construct.

Just selecting "unnest(...)" gives the wrong result because if the
xpath does not exist all the other parts of the query (like 'ut' in
this case) also are empty.


It is hard to suggest something without seeing your whole query (e.g. 
how are you joining q & p?). But it sounds like you basically want a 
left join to the unnested xpath result. It could be a lateral join or not.


It is common to use UNNEST with an implicit lateral join, like this:

SELECT  ...
FROMq, UNNEST(xpath('...', q.address_spec))

But that gives you an inner join. To get an outer join you need to be 
more explicit. Something like this:


SELECT  ...
FROM    q, p
LEFT OUTER JOIN LATERAL (
  SELECT  unnest(xpath('//t:address_spec/t:country/text()',
                       q.address_spec, p.ns))::citext
) x(country)
ON true

(Presumably you would do something more restrictive to connect q & p 
though.)


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Proposition for better performance

2018-03-27 Thread Paul Jungwirth

On 03/27/2018 07:42 AM, hmidi slim wrote:

Hi,
I'm trying to create an availability table for existing products. I'm 
looking for a suitable schema design so that I get good performance 
when I fetch products from a table containing millions of rows.

I'm thinking of a schema like this:

create table availability (product_id integer, product_name varchar(255), start_date date, end_date date).


I would use a tstzrange (or daterange) instead of separate start_date 
and end_date columns. Then you can create an exclusion constraint that 
has `EXCLUDE USING gist (id WITH =, available_during WITH &&)`. That 
will automatically add a GiST index on those columns that should help 
with faster lookups. (It will also prevent contradictions where a 
product has two overlapping rows.)
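
Roughly like this (a sketch, assuming a range column named 
available_during; the btree_gist extension is needed so the constraint 
can use = on a plain integer column):

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE availability (
  product_id       integer NOT NULL,
  product_name     varchar(255),
  available_during daterange NOT NULL,
  EXCLUDE USING gist (product_id WITH =, available_during WITH &&)
);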


You didn't mention the queries you want to be fast, but that index 
should cover the reasonable ones I think.


> Is there any solution to use such as range types or anything which
> mentions the unavailable dates.

For any product, there should be the same number of unavailable periods 
as available, right---or often one more? So I don't see any performance 
benefit in doing it that way, and it certainly seems less intuitive to 
store when something is *not* available.


--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Proposition for better performance

2018-03-27 Thread Paul Jungwirth

On 03/27/2018 10:04 AM, hmidi slim wrote:

the query that I used to fetch products was:
select * from availability
where ('27-03-2018' between start_date and end_date)
and ('31-03-2018' between start_date and end_date);

I added another column named during of type daterange and I created a 
gist index :

create index idx on availability(during);


That creates a b-tree index.
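
For a range column like that you want a GiST index instead, e.g.:

create index idx on availability using gist (during);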

Also, what did you set `during` to?


select * from availability
where during @> daterange('27-03-2018', '31-03-2018');

With a table of 15M rows I got an execution time of 1 minute.


What was the query plan? Did you confirm that it used the index?

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Proposition for better performance

2018-03-27 Thread Paul Jungwirth

(Including the list)

On 03/27/2018 10:49 AM, hmidi slim wrote:

Sorry I didn't copy it very well:
create index idx on availability using gist(during);

and during = daterange(start_date,end_date)

And the query plan used was seq scan not index scan.


It sounds like there must be some important information missing. Here is 
my attempt to mock up some fake data:


insert into availability
select  p.id,
        concat('prod ', p.id::text),
        daterange((now() + concat(r.v, ' days')::interval)::date,
                  (now() + concat((r.v + 1 + random() * 21)::int, ' days')::interval)::date)
from (select * from generate_series(1, 100)) p(id)
cross join (select * from generate_series(1, 15)) n(i)
join lateral (
  select p2.id, n2.i, (random() * 600 - 300)::int AS v
  from generate_series(1, 100) p2(id),
       generate_series(1, 15) n2(i)
) r
on r.id = p.id and r.i = n.i
;

Then

explain select * from availability
where during @> daterange('2018-03-27', '2018-03-31');
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using idx on availability  (cost=0.39..1644.41 rows=1 width=552)
   Index Cond: (during @> '[2018-03-27,2018-03-31)'::daterange)
(2 rows)

Running that query for real I get:

...
(215044 rows)

Time: 1450.099 ms (00:01.450)

So if the index isn't being used you'll have to do some digging to find 
out why.


--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Proposition for better performance

2018-03-27 Thread Paul Jungwirth

On 03/27/2018 11:14 AM, hmidi slim wrote:

Query plan:

Bitmap Heap Scan on product_availabilities  (cost=33728.79..236086.04 rows=878500 width=26) (actual time=2775.058..5792.842 rows=1500 loops=1)
  Recheck Cond: (during @> '[2018-02-01,2018-04-01)'::daterange)
  Heap Blocks: exact=31040 lossy=79255
  ->  Bitmap Index Scan on idx_time  (cost=0.00..33509.17 rows=878500 width=0) (actual time=2767.262..2767.262 rows=1500 loops=1)
        Index Cond: (during @> '[2018-02-01,2018-04-01)'::daterange)
Planning time: 0.063 ms
Execution time: 6408.456 ms

SELECT product_id, start_date, end_date, during
FROM product_availabilities
WHERE during @> daterange('2018-02-01', '2018-04-01')

returns query runtime: 2 min


Is it 6 seconds or 2 minutes? Where is the 2 minutes number coming from? 
Are you sure that's all Postgres? With 878k rows even sending them over 
the network is going to take a while, and then more time to JSONify them 
or whatever else you need to do.


--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Proposition for better performance

2018-03-27 Thread Paul Jungwirth

On 03/27/2018 11:42 AM, hmidi slim wrote:

This is the message that I got:

Successfully run. Total query runtime: 2 min.
1500 rows affected.


Sorry, I don't think this is enough information to suggest anything.

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Using Lateral

2018-03-28 Thread Paul Jungwirth

On 03/28/2018 05:22 AM, Johann Spies wrote:

Thanks Paul.

I was hesitant to post my full query.  It is a long and complicated
query.


Ha ha, you aren't joking. :-)

With something that long I don't think I'd want to split every xpath 
call into a new join. I guess you could try it and see though.


Stepping back, your original query is prepared for xpath to give 0 
results or 1 result, but not 2+, and the problem is that you're getting 
a 2+ result from the address_spec element. What is the right behavior 
there? To get two rows in the end result? Just use the first/last 
address_spec?


If you want to force 0/1 results, you can keep your structure and do this:

SELECT DISTINCT ut,
    (xpath('//t:address_spec/t:country/text()', q.address_spec, p.ns))[1]::citext AS country,
    (xpath('//t:address_spec/t:city/text()', q.address_spec, p.ns))[1]::citext AS city,
    (xpath('//t:organizations/t:organization/text()', q.address_spec, p.ns))[1]::citext AS organisation,
    (xpath('//t:organizations/t:organization[@pref="Y"]/text()', q.address_spec, p.ns))[1]::citext AS prefname,
    (xpath('//t:suborganizations/t:suborganization/text()', q.address_spec, p.ns))[1]::citext AS suborgname,
    (xpath('/t:address_spec/@addr_no', q.address_spec, p.ns))[1]::text::integer AS addr_no


(Actually I see you are already doing that for addr_no. And an aside: 
did you intend `/t:address_spec` instead of `//t:address_spec` there?)


If you would prefer to get multiple rows back, then you'll probably need 
a subquery to give one row per `xpath('//t:address_spec')`, so that you 
can keep the cities/countries/addr_nos together.


Anyway, I hope that's enough to get you started on the right path! The 
root cause is that UNNEST is giving you two rows where it only makes 
sense to have one, so you need to restrict that or pull it into a 
context where several rows are allowed (like a join).


--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: How to get an inclusive interval when using daterange

2018-04-03 Thread Paul Jungwirth

On 04/03/2018 09:40 AM, hmidi slim wrote:
I tried insert into availability values ('product x', 
'[2018-02-02,2018-03-01]'::daterange); and I got the same result as with 
insert into availability values ('product x', daterange('2018-02-02', 
'2018-03-01', '[]')).


Yes, those are equivalent ways of constructing the same daterange.

If you really want a closed/closed daterange, you'll need to create your 
own type. I don't really recommend that, but you can do it. (Using 
close/open is the standard because it's so convenient for 
combining/comparing ranges.)


It's easy to create a type without a canonical function, e.g.:

CREATE FUNCTION date_minus(date1 date, date2 date)
RETURNS float AS $$
SELECT cast(date1 - date2 as float);
$$ LANGUAGE sql immutable;

CREATE TYPE daterange2 AS range
(subtype = date, subtype_diff = date_minus);

Then you can say:

SELECT daterange2('2018-01-01', '2018-03-01', '[]');
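
That works, but compare how equality behaves for the built-in type versus 
the new one:

-- the built-in daterange canonicalizes [x,y] to [x,y+1), so these are equal:
SELECT daterange('2018-01-01', '2018-03-01', '[]') = daterange('2018-01-01', '2018-03-02');
-- true

-- daterange2 has no canonical function, so the same comparison is false:
SELECT daterange2('2018-01-01', '2018-03-01', '[]') = daterange2('2018-01-01', '2018-03-02');
-- false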

This is not great though, because without a canonical function Postgres 
doesn't know that [x,y] is equal to [x,y+1). If you prefer to have a 
canonical function, you'll need to write one in C. (It's easy but you 
won't be able to install it on a managed service like AWS RDS.) It might 
help to read these and the code they link to (The second one is by me.):


https://stackoverflow.com/questions/29895077/how-to-create-a-new-date-range-type-with-included-upper-bound-in-postgres

https://illuminatedcomputing.com/posts/2016/06/inet-range/

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Extension make installcheck: include update/insert feedback?

2018-04-04 Thread Paul Jungwirth

Hello,

I have a custom extension that uses the usual REGRESS Makefile variable 
to indicate files in {sql,expected} that should be used when you say 
`make installcheck`.


I've noticed that if my test code does an INSERT, UPDATE, or DELETE, the usual 
`INSERT 0 1` and `UPDATE 2` messages don't appear in the *.out files, 
even though those otherwise mirror psql. I thought maybe there was some 
psql switch that turns those on/off, but I couldn't find one. I'd like 
to include those messages in my expected/*.out files though, so that my 
tests verify I'm adding/updating as many rows as I intend. Is there any 
way to do that?


Thanks,

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Recursive CTE for building menus

2018-04-13 Thread Paul Jungwirth

On 04/13/2018 02:09 AM, Bob Jones wrote:

The adaptions I am trying to make are as follows:
- Higher priority moves the item higher up the menu (i.e. adapting
from the original "votes" concept).
- Default alphabetical ordering of titles
- Use of alphanumeric IDs instead of numeric


Hi, I wrote that blog post! :-)

This works for me:

WITH RECURSIVE cte (menu_item_id, menu_title, path, menu_parent, depth, menu_priority) AS (

  SELECT  menu_item_id,
  menu_title,
  ARRAY[(-menu_priority, menu_title, menu_item_id)] AS path,
  menu_parent,
  1 AS depth,
  menu_priority
  FROMtest_table
  WHERE   menu_parent IS NULL
  UNION ALL
  SELECT  m.menu_item_id,
  m.menu_title,
  cte.path || (-m.menu_priority, m.menu_title, m.menu_item_id),
  m.menu_parent,
  cte.depth + 1,
  m.menu_priority
  FROMtest_table m
  JOIN cte ON m.menu_parent = cte.menu_item_id
)
SELECT  menu_item_id, menu_title, path, depth, menu_priority
FROMcte
ORDER BY path
;
 menu_item_id | menu_title         | path                                                                                         | depth | menu_priority
--------------+--------------------+----------------------------------------------------------------------------------------------+-------+---------------
 H            | Home               | {"(-1000,Home,H)"}                                                                           |     1 |          1000
 A            | About              | {"(-900,About,A)"}                                                                           |     1 |           900
 B            | Background         | {"(-900,About,A)","(,Background,B)"}                                                         |     2 |          NULL
 R            | Resources          | {"(-900,About,A)","(,Resources,R)"}                                                          |     2 |          NULL
 F            | FOOBAR             | {"(-800,FOOBAR,F)"}                                                                          |     1 |           800
 Fb           | Bar                | {"(-800,FOOBAR,F)","(,Bar,Fb)"}                                                              |     2 |          NULL
 Fba          | About Bar          | {"(-800,FOOBAR,F)","(,Bar,Fb)","(,\"About Bar\",Fba)"}                                       |     3 |          NULL
 Fbt          | Team Bar           | {"(-800,FOOBAR,F)","(,Bar,Fb)","(,\"Team Bar\",Fbt)"}                                        |     3 |          NULL
 Ff           | Foo                | {"(-800,FOOBAR,F)","(,Foo,Ff)"}                                                              |     2 |          NULL
 Ffw          | Foo World          | {"(-800,FOOBAR,F)","(,Foo,Ff)","(,\"Foo World\",Ffw)"}                                       |     3 |          NULL
 FFwi         | World Introduction | {"(-800,FOOBAR,F)","(,Foo,Ff)","(,\"Foo World\",Ffw)","(-1000,\"World Introduction\",FFwi)"} |     4 |          1000
 FFwa         | About World        | {"(-800,FOOBAR,F)","(,Foo,Ff)","(,\"Foo World\",Ffw)","(,\"About World\",FFwa)"}             |     4 |          NULL

(12 rows)

So basically the sort is by menu_priority, breaking ties with 
menu_title, then breaking ties with menu_item_id. I think that's what 
you want, right?


The hard part was dealing with mixed types (integer for priority, text 
for the others), because an array has to be all one type. Fortunately 
you can build an array of tuples and the sorting will work as you expect.


I was a little worried to see those tuples appearing like strings in the 
output, but then I remembered that in Postgres ' is a string and " is 
not. Or to prove it:


select * from unnest( array[(1, 'a'::text), (2, 'b'::text)] ) x(a int, b text);

 a | b
---+---
 1 | a
 2 | b

Anyway, I hope that gets you what you need!

Yours,


--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: array UNNESTed to rows stable with respect to order?

2018-04-17 Thread Paul Jungwirth

On 04/17/2018 01:20 PM, Thiemo Kellner wrote:

 I use UNNEST to get rows from array. This works fine but I am
not sure whether the ordering remains in the later use.


I think you are looking for `WITH ORDINALITY` (in pg 9.4+). For instance 
you could rewrite your first CTE like so:


SELECT  oid as PROOID,
PRONAME,
t as PROARGTYPE,
i,
PRONAMESPACE,
PROOWNER
FROMPG_CATALOG.PG_PROC,
UNNEST(PROARGTYPES) WITH ORDINALITY AS proargtypes(t, i)
;

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Performance opportunities for write-only audit tables?

2018-05-24 Thread Paul Jungwirth

Hi,

A project of mine uses a trigger-based approach to record changes to an 
audit table. The audit table is partitioned by month (pg 9.5, so 
old-fashioned partitioning). These tables are write-heavy but 
append-only and practically write-only: we never UPDATE or DELETE, and 
we seem to consult them only a few times a year. But they are enormous: 
bigger than the rest of the database in fact. They slow down our 
backups, they increase WAL volume and streaming-replication traffic, they add to 
recovery time, they make upgrades more time-consuming, and I suppose 
they compete for RAM.


This is all on an AWS EC2 instance with EBS storage. We also run a warm 
standby with streaming replication.


Since these tables are so different from everything else, I'm wondering 
what opportunities we have to reduce their performance cost. I'm 
interested both in practical high-bang-for-buck changes, but also in 
harder just-interesting-to-think-about last-resort approaches. Here are 
a few ideas of my own, but I'm curious what others think:


We already have no indexes or foreign keys on these tables, so at least 
there's no cost there.


Since they are already partitioned, we could move old data to offline 
storage and drop those tables. This feels like the biggest, easiest win, 
and something we should have done a long time ago. Probably it's all we 
need.


Put them on a different tablespace. This one is also pretty obvious, but 
aside from using a separate disk, I'm curious what other crazy things we 
could do. Is there any per-tablespace tuning possible? (I think the 
answer within Postgres is no, but I wish we could change the settings 
for wal_level, or exclude them from replication, or something, so I'm 
wondering if we could achieve the same effect by exploiting being on a 
separate filesystem.) Maybe put the tablespace on some FUSE filesystem 
to get async writes? Or just pick different mount options, e.g. on ext4 
lazytime,delalloc,data=writeback? I don't know.

Or at a different level: change the triggers so they call a custom 
function that uses a new thread to store the audit records elsewhere. 
Maybe these ideas are all too risky, but I think the organization is 
fine with slightly relaxed durability guarantees for this data, and 
anyway I'm just curious to have a list of possibilities before I 
categorize anything as too crazy or not. :-)


If we upgraded to pg 10 we could use logical replication and leave out 
the audit tables. That is appealing. Even without upgrading, I guess we 
could replace those tables with postgres_fdw ones, so that they are not 
replicated? Has anyone else used that trick?


Thanks!

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Is postorder tree traversal possible with recursive CTE's?

2018-06-19 Thread Paul Jungwirth

On 06/19/2018 02:05 PM, Alban Hertroys wrote:

On the more theoretical front: The question remains whether it is possible to 
calculate fields in post-order tree traversal. I think that would be a 
semantically proper way to express this type of problem and it wouldn't need 
the kinds of pre/post-processing that after-the-fact aggregation (like in above 
solution) requires. So, leaner, and probably faster.
That implies that the SQL committee thought of the possibility in the first 
place though, which I'm beginning to doubt...


If this interests you, you might enjoy this StackOverflow question:

https://stackoverflow.com/questions/35956486/generate-nested-json-with-couting-in-postgresql

Briefly, how do you construct a nested JSON structure from a recursive 
CTE? The only answers at that link rely on plpgsql, but of course that 
is cheating. :-) I took a stab at it a couple years ago but couldn't 
figure it out, and it seemed like post-order processing was exactly the 
missing piece.


If anyone has any ideas I'd be intrigued to hear them!

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: When exactly is a TIMESTAMPTZ converted to the sessions time zone?

2018-08-09 Thread Paul Jungwirth

On 08/09/2018 01:49 PM, Bruce Momjian wrote:

On Tue, Jul  3, 2018 at 05:00:17PM -0400, Tom Lane wrote:

Thomas Kellerer  writes:

A recent discussion around timestamptz behaviour has lead me to question my own 
understanding on how a TIMESTAMPTZ is converted to the session's time zone.
I assumed this conversion happens *on the server* before the value is sent to 
the client.


It's done in the datatype's output conversion function.


A co-worker of mine claims that this is purely a client side thing, and that the server 
will always send the "plain" UTC value that is stored in a timestamptz column.


Your co-worker is certainly wrong so far as text output is concerned.
If you retrieve data in binary format, though, it looks like you get
the raw (un-rotated) timestamp value, so that any conversion would have
to be done on the client side.


Wow, I am kind of surprised by that.  Do any other data types have this
behavior?


This isn't related to binary-vs-string format, but I think it's often 
overlooked that timestamptz considers your timezone not just to 
stringify the value, but also to truncate it:


db=> create table t (ts timestamp, tstz timestamptz);
CREATE TABLE
Time: 3.154 ms
db=> set timezone='America/Los_Angeles';
SET
Time: 0.303 ms
db=> insert into t (ts, tstz) values ('2018-06-09 19:00:00', '2018-06-09 19:00:00');
INSERT 0 1
Time: 2.653 ms
db=> select ts, date_trunc('day', ts), tstz, date_trunc('day', tstz) from t;
         ts          |     date_trunc      |          tstz          |       date_trunc
---------------------+---------------------+------------------------+------------------------
 2018-06-09 19:00:00 | 2018-06-09 00:00:00 | 2018-06-09 19:00:00-07 | 2018-06-09 00:00:00-07
(1 row)

Time: 0.438 ms
db=> set timezone='UTC';
SET
Time: 0.227 ms
db=> select ts, date_trunc('day', ts), tstz, date_trunc('day', tstz) from t;
         ts          |     date_trunc      |          tstz          |       date_trunc
---------------------+---------------------+------------------------+------------------------
 2018-06-09 19:00:00 | 2018-06-09 00:00:00 | 2018-06-10 02:00:00+00 | 2018-06-10 00:00:00+00
(1 row)

Are there any other places where timestamptz consults your timezone?

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Full table lock dropping a foreign key

2018-09-05 Thread Paul Jungwirth

Hello,

I noticed that Postgres takes an AccessExclusiveLock (a lock on the 
whole table) against the *referenced* table when dropping a foreign key. 
I wasn't expecting that, and some experimentation showed it does *not* 
take one when creating the FK. For example:


pjtest=# create table parent (id integer primary key);
CREATE TABLE
pjtest=# create table child (id integer primary key, parent_id integer);
CREATE TABLE
pjtest=# begin;
BEGIN
pjtest=# alter table child add constraint pfk foreign key (parent_id) 
references parent (id);

ALTER TABLE

And now pg_locks has this:

pjtest=# select locktype, relation::regclass, mode from pg_locks;
   locktype    |  relation   |         mode
---------------+-------------+-----------------------
 relation      | parent_pkey | AccessShareLock
 relation      | child_pkey  | AccessShareLock
 virtualxid    |             | ExclusiveLock
 relation      | pg_locks    | AccessShareLock
 virtualxid    |             | ExclusiveLock
 relation      | parent      | AccessShareLock
 relation      | parent      | RowShareLock
 relation      | parent      | ShareRowExclusiveLock
 transactionid |             | ExclusiveLock
 relation      | child       | AccessShareLock
 relation      | child       | ShareRowExclusiveLock
(11 rows)

But after dropping it:

pjtest=# commit;
COMMIT
pjtest=# begin;
BEGIN
pjtest=# alter table child drop constraint pfk;
ALTER TABLE

Now my locks are:

pjtest=# select locktype, relation::regclass, mode from pg_locks;
   locktype    | relation |        mode
---------------+----------+---------------------
 virtualxid    |          | ExclusiveLock
 relation      | pg_locks | AccessShareLock
 virtualxid    |          | ExclusiveLock
 relation      | parent   | AccessExclusiveLock
 relation      | child    | AccessExclusiveLock
 transactionid |          | ExclusiveLock
 object        |          | AccessExclusiveLock
 object        |          | AccessExclusiveLock
 object        |          | AccessExclusiveLock
 object        |          | AccessExclusiveLock
 object        |          | AccessExclusiveLock
(11 rows)

I noticed this on 9.5 but confirmed it on 10.5.

I was surprised because the docs give a pretty short list of things that 
take AccessExclusiveLocks 
(https://www.postgresql.org/docs/current/static/explicit-locking.html). 
It mentions ALTER TABLE, and it makes sense when I recall that foreign 
keys are implemented by putting triggers on *both* referencing & 
referenced tables, but still it caught me off guard. Also I don't 
understand why the lock is not necessary when adding a foreign key?


Anyway I don't have much of a question, although I wouldn't mind adding 
a note to the docs that dropping an FK takes this lock on both tables, 
if others agree that is a good idea.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: nested query problem

2018-09-06 Thread Paul Jungwirth

On 09/06/2018 01:59 PM, David Gauthier wrote:

I'm having trouble with this query...

select sr.sqf_id, sr.status, fse.perl_sub_name, fse.end_datetime
   from
     sqf_runs sr,
     (select perl_sub_name, end_datetime from flow_step_events_view 
where sqf_id = sr.sqf_id order by 2 limit 1) fse

   where sr.userid='foo';

ERROR:  invalid reference to FROM-clause entry for table "sr"
LINE 4: ...atetime from flow_step_events_view where sqf_id = sr.sqf_id ...
                                                              ^
HINT:  There is an entry for table "sr", but it cannot be referenced 
from this part of the query.


This calls for a lateral join:

SELECT  sr.sqf_id, sr.status, fse.perl_sub_name, fse.end_datetime
FROMsqf_runs sr
LEFT OUTER JOIN LATERAL (
  SELECT  perl_sub_name, end_datetime
  FROMflow_step_events_view fsev
  WHERE   fsev.sqf_id = sr.sqf_id
  ORDER BY 2
  LIMIT 1
) fse
ON  true
WHERE   sr.userid = 'foo'
;

It's nearly what you had already, but `LATERAL` lets the subquery 
reference columns in the other tables.


A lateral join is conceptually a lot like running your subquery in a for 
loop, looping over all the rows produced by the rest of the query. It 
doesn't have to produce 1 row for each iteration, but saying `LIMIT 1` 
ensures that here.


The `ON true` is just pro forma because you can't have a join without an 
`ON` clause.


You might prefer an INNER JOIN LATERAL, depending on your needs.

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: How to handle things that change over time?

2019-09-13 Thread Paul Jungwirth

On 9/13/19 2:57 AM, stan wrote:


I am working on a system that will support internal business work for a
company. Periodically things will change in their "world".


Oh this is my favorite topic. :-) It's a common problem, although 
solutions don't seem to be well-known in the programming community. 
Typically you'd use start/end times on your table, as you suggested. 
Postgres is a great RDBMS for this since it has range types and 
exclusion constraints. The SQL:2011 standard also has temporal primary 
keys, foreign keys, SELECTs, and UPDATE/DELETEs, and we're working on 
adding those too. But your use case sounds easy to build even without 
those extra features. Here is a talk I gave this summer about temporal 
features in Postgres:


https://github.com/pjungwir/postgres-temporal-talk

And here is an annotated bibliography to the main writings & tools out 
there, including some projects that might help you implement what you 
want in Postgres:


https://illuminatedcomputing.com/posts/2017/12/temporal-databases-bibliography/

> In another scenario, a column of the employee table is the pay rate.
> Obviously this will change over time, also. It does not appear to me
> that this lends itself to this same treatment, as most of the data
> associated with a given employee, is fairly static, and if I add an
> entire row, just because the pay rate changed, this looks overly
> complex.

Normally people would indeed just add another row. I wouldn't call it 
complex (especially compared to the alternatives), but maybe a little 
wasteful. The Date/Darwen/Lorentzos book gives a lot of attention to 
avoiding the problem though. Essentially you'd adopt a sixth-normal form 
where each attribute (column) gets its own table.
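
To make the add-another-row approach concrete, here is a minimal sketch 
(the table and column names are just illustrative):

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE employee_pay_rates (
  employee_id  integer NOT NULL,
  valid_during daterange NOT NULL,
  pay_rate     numeric(10,2) NOT NULL,
  -- at most one pay rate per employee at any given time:
  EXCLUDE USING gist (employee_id WITH =, valid_during WITH &&)
);

-- "change" a pay rate by closing the current row and adding a new one:
UPDATE employee_pay_rates
   SET valid_during = daterange(lower(valid_during), '2019-10-01')
 WHERE employee_id = 42 AND upper_inf(valid_during);

INSERT INTO employee_pay_rates
VALUES (42, daterange('2019-10-01', NULL), 95000.00);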


Good luck. I'm always happy to talk about temporal databases if you 
like. :-)


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: When does Postgres use binary I/O?

2019-09-18 Thread Paul Jungwirth

On 9/18/19 7:26 AM, Tom Lane wrote:

The core regression tests don't systematically exercise binary I/O,
and they certainly wouldn't magically cover a new type they didn't
use to.  You'd need to add test case(s).


Thanks! I thought psql might use binary IO internally when available, or 
at least pg_dump with its custom format, but it looks like neither of 
those are true---and now that I think about it I suppose the code in adt 
isn't something they would be able to use, so it makes sense.


So I guess my choices are to use COPY WITH (FORMAT BINARY) or write 
something that calls PQexecParams. Does anything else use binary I/O?



Likely it'd be good to have some more consistent approach to
testing that ... right now it's not even very obvious where
is a good place to add such tests.  I do see a small test in
src/test/regress/input/misc.source (COPY BINARY stud_emp),
but that solution doesn't scale easily because of its
dependence on absolute file pathnames.


I'm not sure how hard this would be, but we could add a new schema to 
the regression db and then do something like this:


psql -d regression -c 'COPY public.t TO STDOUT WITH (FORMAT BINARY)' \
  | psql -d regression -c 'COPY binaryio.t FROM STDIN WITH (FORMAT BINARY)'

and then verify that `SELECT *` looks the same for the two tables. I can 
think of several complications already though. :-)


But even if that's not an automated test at least it gives me an easy 
way to manually exercise my own multirange_{send,recv} functions.


Thanks!

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Pg_auto_failover

2019-09-25 Thread Paul Jungwirth

On 9/25/19 8:55 AM, Sonam Sharma wrote:

My user has permission to touch inside the Fs.
I am not getting why it is checking for /backup.

Any suggestions on this , what else I can check


In addition to the code already linked to, I would look at 
https://github.com/citusdata/pg_auto_failover/blob/1290edd0bab54c627f577cf4462bd16a56b20a1a/src/bin/pg_autoctl/pgctl.c#L429-L437


You see that pg_auto_failover is trying to `mkdir -p $pgdata/../backup` 
(where pgdata is a C variable, not a shell/environment variable). You 
might want to read through the rest of that code to see where pgdata is 
coming from, and see if it's being set correctly. Do you really want 
/backup at the root level?


Clearly whatever user is running pg_basebackup doesn't have permission 
to make that directory. But maybe that's okay and the directory should 
be somewhere else.


--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Issue upgrading from V11 to V12 on Debian

2019-12-03 Thread Paul Jungwirth

On 12/3/19 7:53 AM, stan wrote:

I am working on upgrading from V11 to V12 on Debian.

My first attempt failed, and I have figured out that this is because I have
added extensions to the V11 DB, at least one of which was not installed
using  the Debian packages.

So, it looks like i need to install these before doing the upgrade, correct?

This brings up a chicken or egg issue however. We are creating the
extensions, like this:

CREATE EXTENSION emailaddr with schema ica

So, to accomplish this, I would need to create the ice schema, and the user
it "belongs" to before I run pg_upgradecluster. This raises 2 questions


There are two parts to adding extensions. First you install the *files* 
the extension needs. If you build from source this happens when you say 
`make install`. Or if you use pgxn, when you say `pgxn install foo`. On 
Ubuntu this will add the *.sql and *.control files to 
/usr/share/postgresql/12/extension, and the *.so files to 
/usr/lib/postgresql/12/lib (so I assume Debian is similar). You can do 
that before running pg_upgradecluster.
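
One Debian/Ubuntu gotcha: if the extension uses the standard PGXS 
Makefile, `make install` asks pg_config where to put the files, so make 
sure pg_config points at v12. If it doesn't, you can usually override it, 
e.g. `make PG_CONFIG=/usr/lib/postgresql/12/bin/pg_config install` 
(assuming the usual Debian paths).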


The second part is saying CREATE EXTENSION, which adds the extension to 
the current database. pg_upgradecluster should handle that for you; it 
shouldn't require any extra steps.


I hope this helps!

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Issue upgrading from V11 to V12 on Debian

2019-12-03 Thread Paul Jungwirth
On 12/3/19 8:46 AM, stan wrote:

So, I have V12 running as the default on the machine I am testing this on now:

Ver Cluster Port Status OwnerData directory  Log file
11  main5433 down   postgres /var/lib/postgresql/11/main
/var/log/postgresql/postgresql-11-main.log
12  main5432 online postgres /var/lib/postgresql/12/main
/var/log/postgresql/postgresql-12-main.log

BUT, I went to the directory where I have the extension's source, did a
make clean ; make ; make install, and the files were still installed in the
V11 tree. How can I instruct the system to put these in the V12 tree?

The  extension in question is pgemailaddr if this matters, BTW.


That extension (and every extension I've seen) uses Postgres's normal 
extension-building infrastructure, so it runs pg_config to learn where 
to put files. If you run pg_config on your system it will probably 
report directories belonging to v11. (That's a little surprising because 
on Ubuntu systems I've always had it report the latest version.)


Many other Ubuntu Postgres commands accept a PGCLUSTER envvar to 
specific which cluster to use. If you want to add that to your pg_config 
you could do it like this: https://stackoverflow.com/a/43403193/122087 
Then just set PGCLUSTER before building. (Make sure you `make clean` first.)


--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Issue upgrading from V11 to V12 on Debian

2019-12-03 Thread Paul Jungwirth

On 12/3/19 9:41 AM, stan wrote:

Well, this gets more interesting. I decided to try the reinstall thing
again, so I did an apt-get purge on the V12 server, started the V11 server
(admittedly I did not check to see what port it started on), reinstalled
the V12 package, and now BOTH are running, with the V12 server on 5432 and
the V11 on 5433.


Yes, Ubuntu/Debian make it easy to run multiple versions/clusters at 
once. I don't think they even use the alternatives system (but I could 
be wrong). For building/installing the extension, the only important 
part AIUI is getting pg_config to point at the right files. My own 
approach has always been to hack the bash code to accept a PGCLUSTER 
envvar, as in that Stack Overflow answer. Someone here might have a 
better approach though.


Regards,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: a proposal for a new functionality: "SELECT * [EXCEPT col1 [,col2]]

2020-02-25 Thread Paul Jungwirth

On 2/25/20 11:46 AM, Stanislav Motycka wrote:



On 2/25/20 8:28 PM, Paul A Jungwirth wrote:

I take the proposal to mean this:

SELECT listOfColumns [EXCEPT listOfColumns] FROM ...
Exactly, simply exclude unneeded columns from the base clause "SELECT", 
nothing more ..


Not that this is necessarily fatal, but you'd need to avoid parsing 
trouble with the other EXCEPT, e.g.


SELECT 1 EXCEPT SELECT 1;

Believe it or not these are valid SQL:

SELECT;
SELECT EXCEPT SELECT;

This fails today but only because of the different number of columns:

SELECT 1 AS SELECT EXCEPT SELECT;

So the parser understands it as selectQuery EXCEPT selectQuery, but you 
can see how it could also be parsable as this new structure. So the 
parser would have to decide which is meant (if that's even possible at 
that early stage).


I guess as soon as you exclude two columns it is unambiguous though, 
because of the comma after the first excluded column. And anyway I think 
for such a pathological case you could just tell people to add double 
quotes.


Google Big Query was mentioned upthread. I see they require parens, e.g. 
SELECT ... EXCEPT (...). I don't think that actually fixes the ambiguity 
though. Also a few other notes (after very brief testing):


SELECT * EXCEPT (foo) FROM t;   -- works
SELECT * EXCEPT (foo, bar) FROM t;  -- works
SELECT t.* EXCEPT (foo) FROM t; -- works
SELECT * EXCEPT foo FROM t; -- fails
SELECT foo, bar EXCEPT (foo) FROM t;-- fails
SELECT t1.foo, t2.* EXCEPT (foo) FROM t1 JOIN t2 ON ...; -- works
SELECT t2.*, t1.foo EXCEPT (foo) FROM t1 JOIN t2 ON ...; -- fails!

So it seems they require at least one `*` in the SELECT target list. In 
fact the `*` must be the very last thing. Personally I think it should 
be as general as possible and work even without a `*` (let alone caring 
about its position).


Regards,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Reg: Help to understand the source code

2020-04-23 Thread Paul Jungwirth

On 4/23/20 8:44 AM, Preethi S wrote:
I am fairly new to postgres and I am trying to understand how the data 
is processed during the insert from buffer to the disk. Can someone help 
me with that? Also, I would like to see source code workflow. Can 
someone help me with finding the source code for the data 
insertion/modification workflow.


I'm also a Postgres hacker newbie, but I've spent some time adding 
SQL:2011 FOR PORTION OF support to UPDATE/DELETE, so I've gone through 
that learning process. (I should say "going through". :-)


I'd say be prepared to spend a *lot* of time reading the code. 
Personally I use `grep -r` a lot and just read and read. For specifics 
you can use a debugger or insert `ereport(NOTICE, (errmsg("something 
%s", foo)))` and run queries (or the test suite). Also many subfolders 
have an extensive README that will guide you. Some of the READMEs may 
take an hour or more to get through and understand, but reading them is 
worth it.


It helped me a lot to spend several years writing occasional Postgres C 
extensions before really doing anything in the core codebase. There are 
lots of basics you learn that way. There are a bunch of articles and 
presentations out there about that you might find helpful.


Postgres processes queries in several steps:

- parse
- analyze
- rewrite
- plan
- optimize
- execute

The parse step is a bison grammar (look for gram.y). Basically it fills 
in structs cutting up what the user typed.


The analyze step starts to make sense of the parse results. Look at 
parser/analyze.c. It maps input strings to database objects---for 
example looking up table/column names (and making sure they really 
exist). Here you're sort of just copying things from the parse structs 
to different structs. You're building up Node trees that later steps can 
use. I think the analyze step is often considered to be still part of 
the parse phase.


It seems like each SQL "clause" has its own transformFoo function, so 
probably you'll want to add your own (transformMyAwesomeFeatureClause) 
and then call it from its "parent" (e.g. transformUpdateStmt).


If you add new Node types you'll need to edit nodes/*funcs.c and also 
probably teach some switch statements how to handle them. If you are 
filling in a struct but then later in the pipeline find that what you 
wrote isn't there anymore, you probably forgot to implement a copy function.


The rewrite/plan/optimize steps aren't things you need to worry about 
too much if you're interested in DML, but you can read more about them 
in the source code. Especially rewrite is pretty niche (views and RULEs).


The execute step is the most challenging I think. It has its own Node 
trees and also keeps an execution state. Probably you'll need to look at 
src/backend/executor/nodeModifyTable.c among others. You'll also need to 
learn about TupleTableSlots. (If anyone here has a good learning 
resource for TTS I would also be glad to read it.)


I'm afraid this description is comically dumbed down, but hopefully it 
can be something like a map. I'd probably just take an UPDATE statement 
and try to trace it through the pipeline, and maybe experiment with 
small changes along the way. You can add things to src/test/regress as 
you go.


And the mailing list is a very friendly place to ask questions.

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Enforcing uniqueness on [real estate/postal] addresses

2020-05-11 Thread Paul Jungwirth

On 5/11/20 9:55 AM, Peter Devoy wrote:

Of course, if any of the fields are NULL (which they often are) I end
up with duplicates.

One solution may be to add NOT NULL constraints and use empty strings
instead of NULL values but, until asking around today, I thought this was
generally considered bad practice.


If you don't want to store empty strings (which I agree is a little 
yucky), you could replace NULLs with an empty string *only when checking 
for uniqueness*. To do this, first replace your unique constraint with a 
unique index, which gives you some additional features (e.g. indexing 
expressions, indexing only part of the table with a WHERE clause, 
building it concurrently, etc.). In this case we only care about 
indexing expressions.


So you can say:

CREATE UNIQUE INDEX is_unique_address ON properties (
  COALESCE(description, ''),
  COALESCE(address_identifier_general, ''),
  COALESCE(street, ''),
  COALESCE(postcode, ''));

Another approach, which I don't think is really a serious suggestion but 
is sort of interesting to think about: you could define an operator, say 
===, that does the same thing as `IS NOT DISTINCT FROM`. Then you could 
create an exclusion constraint using that operator on all four columns. 
I've never tried that before but it seems like it would work.


Maybe that's too much effort for something like this. I just think it's 
interesting because it feels like a use case for exclusion constraints 
that goes in the "opposite direction" of how they are usually used: 
instead of being less restrictive than =, it is more restrictive.


Regards,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




psql -f and PAGER

2022-03-29 Thread Paul Jungwirth

Hello,

I noticed that this runs your pager:

psql -f <(echo 'select * from pg_class;')

but not this:

echo 'select * from pg_class;' | psql

A client encountered this when the psql command run from their deb's 
.postinst file started to hang.


We can prevent it with PSQL_PAGER='', but it almost seems like a bug to 
me, since I expect -f to be for non-interactive use. I'd at least call 
it a footgun.


Indeed running the pager for -f but not stdin seems opposite of the last 
line of these docs (https://www.postgresql.org/docs/14/app-psql.html):


> Using this option is subtly different from writing psql < filename. In 
> general, both will do what you expect, but using -f enables some nice 
> features such as error messages with line numbers. There is also a 
> slight chance that using this option will reduce the start-up overhead. 
> On the other hand, the variant using the shell's input redirection is 
> (in theory) guaranteed to yield exactly the same output you would have 
> received had you entered everything by hand.


Does it seem wrong to anyone else to run the pager from -f? Is it 
something the community would accept patches to change?


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Shared hosting with FDW on AWS RDS

2019-02-10 Thread Paul Jungwirth

On 2/10/19 2:57 PM, auxsvr wrote:

We'd like to configure an RDS server for shared hosting. The idea is that every 
customer will be using a different database and FDW will be configured, so that 
the remote tables have access to the full data


I've set up something like this before (but on EC2), and the only 
problem I couldn't solve was that any user can see your full customer 
list by typing `\l` or `\du`. They can't see other customers' stuff, but 
they can see how many customers you have and their database/login names. 
The only way around it that I know of is to run separate "clusters", i.e. 
separate RDS instances.


You can try to lock this down somewhat by revoking access to various 
system tables, but it starts breaking a lot of tools (e.g. some GUI 
tools don't know what to do if they get an error just listing the 
databases). Also it is so piecemeal I wouldn't trust that I'd blocked 
off all avenues of getting the information.


I'd love to be corrected on this btw if anyone has better information! :-)

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Conditional INSERT

2019-03-15 Thread Paul Jungwirth

On 3/15/19 10:55 AM, basti wrote:

I want to insert data into table only if condition is true.
For example:

INSERT into  mytable (domainid, hostname, txtdata)
   VALUES (100,'_acme.challenge.example', 'somedata');

The insert should only be done if Hostname like %_acme.challenge%.


I would use `INSERT INTO ... SELECT` for this, instead of `INSERT INTO 
... VALUES`. For example:


INSERT INTO mytable (domainid, hostname, txtdata)
SELECT 100, '_acme.challenge.example', 'somedata'
WHERE '_acme.challenge.example' LIKE '%_acme.challenge%'
;

(Presumably in the real code the hostname is parameterized so this isn't 
quite as pointless as it looks. :-)


If you are inserting a lot of rows at once you could also SELECT from a 
VALUES list:


INSERT INTO mytable (domainid, hostname, txtdata)
SELECT d, h, t
FROM (VALUES
  (100, '_acme.challenge.example', 'somedata'),
  (200, 'bar.example.com', 'somedata'),
  (300, 'foo.example.com', 'somedata'),
  (400, '_acme.challenge.example', 'somedata')
) x(d, h, t)
WHERE h LIKE '%_acme.challenge%'
;

I hope that helps!

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: schema change tracking

2019-05-16 Thread Paul Jungwirth

On 5/16/19 9:41 AM, Benedict Holland wrote:
I need a tool that can track schema changes in a postgesql database, 
write scripts to alter the tables, and store those changes in git. Are 
there tools that exist that can do this?


BTW, I know if I use a framework with some form of model tracking 
(laravel and Django), those can do it. The application I am writing does 
not track model files and I really cannot compare sql schema dumps 
across versions. It would be far too time-consuming.


For Python database migrations when I'm not using Django or SQLAlchemy 
I've enjoyed yoyo migrations:


https://ollycope.com/software/yoyo/latest/

If you are using SQLAlchemy then I'd go with Alembic:

https://pypi.org/project/alembic/

Btw I don't *think* this is what you're looking for, but on the Postgres 
side you can also set `log_statement='ddl'` to audit the changes:


https://blog.hagander.net/why-are-you-not-logging-your-ddl-142/
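
A minimal way to turn that on cluster-wide (assuming you have superuser 
access):

ALTER SYSTEM SET log_statement = 'ddl';
SELECT pg_reload_conf();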

Or for something more elaborate you could use Event Triggers. The docs 
have an example you could start from:


https://www.postgresql.org/docs/current/event-trigger-example.html

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Need efficient way to remove (almost) duplicate rows from a table

2023-10-17 Thread Paul Jungwirth

On 10/17/23 15:48, David Gauthier wrote:
I have a table with millions of records and there are a lot of "almost" 
duplicates that I want to get rid of in an efficient way.  Best to 
illustrate with a simple example...


This sounds like a classic use-case for a "correlated subquery". 
Assuming you want to keep the lowest twig_id, you could do this:


```
paul=# select * from twigs;
 limb_id | branch_id | twig_id | surfarea
-+---+-+--
   1 | 1 |   1 |  100
   1 | 1 |   2 |103.7
   1 | 1 |   3 |103.7
   1 | 1 |   4 |110.4
   1 | 2 |   1 |  120
   1 | 2 |   2 |123.6
   1 | 2 |   3 |128.1
   1 | 2 |   4 |128.1
   1 | 2 |   5 |128.1
   2 | 1 |   1 |  100
   2 | 1 |   3 |104.4
   2 | 1 |   4 |131.9
(12 rows)

Time: 0.218 ms
paul=# delete from twigs as t1
where exists (
  select 1 from twigs as t2
  where (t1.limb_id, t1.branch_id, t1.surfarea) = (t2.limb_id, t2.branch_id, t2.surfarea)
    and t1.twig_id > t2.twig_id
);

DELETE 3
Time: 8.555 ms
paul=# select * from twigs;
 limb_id | branch_id | twig_id | surfarea
-+---+-+--
   1 | 1 |   1 |  100
   1 | 1 |   2 |103.7
   1 | 1 |   4 |110.4
   1 | 2 |   1 |  120
   1 | 2 |   2 |123.6
   1 | 2 |   3 |128.1
   2 | 1 |   1 |  100
   2 | 1 |   3 |104.4
   2 | 1 |   4 |131.9
(9 rows)

Time: 0.231 ms
```

That says "delete from t1 where there is a matching t2 with a lower 
twig_id."


If you want to keep the highest-numbered twig_id, the sql is easy to adjust.

Regards,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Content for talk on Postgres Type System at PostgresConf

2024-02-29 Thread Paul Jungwirth

On 2/29/24 14:47, Guyren Howe wrote:
What is worth saying about custom range types? What even would be a good use case? A range of 
strings or ip address ranges, something like that, I suppose?


I wrote a blog post and an extension for an inetrange type. It's from 2016 so it might need some 
updates, but you are welcome to use anything you like for your talk:


https://illuminatedcomputing.com/posts/2016/06/inet-range/

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: On exclusion constraints and validity dates

2024-08-26 Thread Paul Jungwirth

On 8/22/24 11:13, Justin Giacobbi wrote:

I have an issue that on the surface seems orthogonal to existing functionality. I’m trying to 
dynamically update validity ranges as new rows replace old ones.

In a nutshell the problem looks like this:

psqlprompt=# select * from rangetest;
 id |                       rangecol
----+------------------------------------------------------
  0 | empty
  0 | ["2024-05-05 00:00:00+00","2024-05-06 00:00:00+00")
  0 | ["2024-05-06 00:00:00+00","-03-31 00:00:00+00")
  1 | ["2024-05-06 00:00:00+00",)

psqlprompt=# insert into rangetest values (1, '["2024-06-07 00:00:00+0",)')
  on conflict on constraint rangetest_id_rangecol_excl
  do update rangecol = concat('[', lower(rangetest.rangecol), ',', lower(excluded.rangecol), ')')::tstzrange;


ERROR:  ON CONFLICT DO UPDATE not supported with exclusion constraints

So I’m not sure if I’m after a feature request, a workaround or contribution advice. Maybe someone 
can point me in the right direction.


 1. A ‘currently valid’ item that becomes invalid and is replaced by a new ‘currently valid’ item
    seems like such a real-world use case that there should be explicit support for it.
    (Unfortunately, the temporal tables extensions seem too immature for my needs currently.)
 2. Barring that, an exclusion constraint arbiter would be a lovely solution.
 3. Barring either of those, at least a ‘select all conflicts’ type feature that makes it easy
    to pair the offending rows.

Currently I’m looking at working around this in the application or in a stored procedure/insert 
trigger that is essentially the same logic. Whichever seems easier to maintain.


Advice on how to submit a feature request, or maybe a better workaround that I haven’t discovered 
would be most welcome. What would be even more welcome is someone with insight into these pieces of 
the program that can tell me if I’d be biting off more than I can chew (or violating a principle) 
trying to submit one of the three options above as a feature.


Your example looks a bit like UPDATE FOR PORTION OF (from SQL:2011). That would 
give you this result:

1 | ["2024-05-06 00:00:00+00","2024-06-07 00:00:00+00")
1 | ["2024-06-07 00:00:00+00",)

We update the range you targeted, and we preserve untargeted range(s) 
before/after that.

I have submitted a patch for that,[0] but if you need it today you might be able to use the periods 
extension (although that is built on start/end columns, not ranges).[1]


An example that better fits ON CONFLICT DO UPDATE would be inserting 
`(1, '["2024-01-01 00:00:00+0",)')`. That range is *wider* than what you 
already have in your table. Probably you want to get this:


1 | ["2024-01-01 00:00:00+00","2024-05-06 00:00:00+00")
1 | ["2024-05-06 00:00:00+00",)

In other words: fill in the empty gaps and update what is already there.

This is similar to what Tom Johnston calls "temporal merge" in *Bitemporal Data: Theory and 
Practice* (building on his ideas for "whenever insert" and "whenever update"), pages 179-184.


I'm not sure we have enough information to do the right thing for an arbitrary exclusion constraint, 
but once we have primary keys and unique constraints with WITHOUT OVERLAPS (also in my submitted 
patches), the semantics should be specific enough that Postgres could implement it. It's on my list 
of things to do once we finish supporting SQL:2011. Maybe it would use the ON CONFLICT DO UPDATE 
syntax, or perhaps MERGE---I'm not sure yet.


[0] https://commitfest.postgresql.org/49/4308/
[1] https://github.com/xocolatl/periods

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Strange behaviors with ranges

2024-08-27 Thread Paul Jungwirth

On 8/27/24 10:29, Jean-Christophe Boggio wrote:

I have 2 very confusing behaviors when using ranges.

It all started with this query:

     WITH rangespaliers AS (
     SELECT numrange( qtep1+1   , qtep2,  '[]') AS rangep FROM paliers JOIN tmp_limitcontrats USING(idcontrat)
--   SELECT numrange( qtep1+1   , qtep2,  '[]') AS rangep FROM paliers WHERE idcontrat=1003
     )
     ,rangespaliers2 AS (
     select *
     FROM rangespaliers
     WHERE rangep <> NUMRANGE(NULL, NULL) -- ERROR IS HERE
     )
     select * from rangespaliers2;

When I run this query, I get the error "Range lower bound must be less than or equal to range upper 
bound".


(a) If I comment out the line marked "ERROR IS HERE", I don't have an error (but I'm missing the 
filter of course).


(b) Also, if I uncomment line 3 and comment out line 2, I get the correct behavior. Very strange 
thing is that tmp_limitcontrats has only one row which contains "idcontrat=1003".


The issue is the order-of-operations used by the planner. If I put EXPLAIN on 
your last query, I see:

 Hash Join  (cost=16.64..109.90 rows=2410 width=64)
   Hash Cond: (tmp_limitcontrats.idcontrat = paliers.idcontrat)
   ->  Seq Scan on tmp_limitcontrats  (cost=0.00..35.50 rows=2550 width=4)
   ->  Hash  (cost=14.27..14.27 rows=189 width=12)
         ->  Seq Scan on paliers  (cost=0.00..14.27 rows=189 width=12)
               Filter: (numrange(((qtep1 + 1))::numeric, (qtep2)::numeric) <> '(,)'::numrange)

So we are applying that filter to every row in paliers, not just the one with 
idcontrat = 1003.

Indeed this simplified version also fails:

SELECT numrange(qtep1+1,qtep2) as rangep FROM paliers;
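
If you want the query to be safe no matter which plan the optimizer picks, one option (untested 
against your real schema) is to guard the range construction so it never sees reversed bounds:

WITH rangespaliers AS (
  SELECT CASE WHEN qtep1 + 1 <= qtep2
              THEN numrange(qtep1 + 1, qtep2, '[]')
         END AS rangep
  FROM paliers
  JOIN tmp_limitcontrats USING (idcontrat)
)
SELECT *
FROM rangespaliers
WHERE rangep <> numrange(NULL, NULL);

The CASE returns NULL instead of raising an error for the rows with reversed bounds, and those 
rows are then dropped by the <> test (they would not have survived the join anyway).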

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Slick way to update multiple tables.

2021-04-01 Thread Paul Jungwirth

On 4/1/21 11:54 AM, Michael Lewis wrote:
postgresql.org/docs/current/sql-createview.html 



My apologies. It seems INSTEAD OF triggers are required to implement 
updates across multiple tables. I thought not if all were simple joins. 
My mistake.


Even with INSTEAD OF triggers, if you use a view then I suppose you 
would be forced to update some of the records more often than necessary? 
(Unless your tables are 1-to-1-to-1 of course.) Or if there is some 
trick to avoid that I'd be curious to know about it.


Here is something I've done in the past:

WITH
update1(ok) AS (
  UPDATE foo SET ... WHERE ...
  RETURNING 'ok'
),
update2(ok) AS (
  UPDATE bar SET ... WHERE ...
  RETURNING 'ok'
),
update3(ok) AS (
  UPDATE baz SET ... WHERE ...
  RETURNING 'ok'
)
SELECT ok FROM update1
UNION ALL
SELECT ok FROM update2
UNION ALL
SELECT ok FROM update3
;

You could even select different messages from each update if you want to 
know how many rows you touched in each table.
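
For example, something like:

WITH
update1 AS (
  UPDATE foo SET ... WHERE ...
  RETURNING 1
),
update2 AS (
  UPDATE bar SET ... WHERE ...
  RETURNING 1
)
SELECT 'foo' AS updated_table, count(*) AS rows_touched FROM update1
UNION ALL
SELECT 'bar', count(*) FROM update2
;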


--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Treating float arrays as vectors?

2021-06-17 Thread Paul Jungwirth

On 6/17/21 7:13 AM, Celia McInnis wrote:
I would love it if there was a vector data type in postgresql along with 
such vector operations as addition, subtraction, scalar multiplication, 
cross product, dot product, normalization, length and various sorts of 
vector distances.


I wrote an extension to define vector-based functions that take & return 
numeric arrays:


https://github.com/pjungwir/floatvec

It only has basic arithmetic, but matrix math functions could be added.

If you want aggregate functions instead I wrote another extension for that:

https://github.com/pjungwir/aggs_for_vecs

I haven't touched either in a while, but if you find they have problems 
on modern versions of Postgres, let me know and I should be able to get 
them updated quickly.


I experimented a bit with an AVX implementation, and it showed a 
worthwhile performance improvement, but I never got around to adding it 
to all functions, so it was just a proof-of-concept. I could dust that 
off if you're interested. But since the extension is C things are 
already pretty fast.
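
And if you only need the occasional operation and would rather not install anything, plain SQL 
with unnest can get you a fair way (much slower than the C versions, of course). For example, a 
dot product:

SELECT sum(a.x * b.x) AS dot_product
FROM unnest(ARRAY[1.0, 2.0, 3.0]) WITH ORDINALITY AS a(x, i)
JOIN unnest(ARRAY[4.0, 5.0, 6.0]) WITH ORDINALITY AS b(x, i) USING (i);
-- 1*4 + 2*5 + 3*6 = 32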


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Temporal tables as part of main release

2021-06-18 Thread Paul Jungwirth

On 6/17/21 11:10 PM, David G. Johnston wrote:
And, since it's not in the commitfest app, and the last emails I found 
are over a year old, it doesn’t seem to be a priority for anyone at this 
time.


I'm working on the application-time (aka valid-time) side of this, and 
there is another patch for system-time. I can't speak to that patch, but 
my patch is almost ready to share on the mailing list. I gave a progress 
update at PGCon last month, but here is a summary of what it includes:


- You can define PERIODs (this is based heavily on Vik Fearing's patch 
from a few years ago).

- You can define temporal PRIMARY KEYs and UNIQUE constraints.
- You can use FOR PORTION OF syntax to do temporal UPDATEs and DELETEs.
- You can define temporal FOREIGN KEYs including with all the 
CASCADE/SET NULL/SET DEFAULT options.
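
For anyone curious, the SQL:2011 syntax those features follow looks roughly like this (details 
could still change before anything is committed; table and column names below are just for 
illustration):

CREATE TABLE products (
  id int,
  price numeric,
  valid_from date,
  valid_til date,
  PERIOD FOR valid_at (valid_from, valid_til),
  CONSTRAINT products_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);

UPDATE products
  FOR PORTION OF valid_at FROM DATE '2021-01-01' TO DATE '2021-07-01'
  SET price = 42
  WHERE id = 1;

-- and on a referencing table:
--   FOREIGN KEY (product_id, PERIOD valid_at)
--     REFERENCES products (id, PERIOD valid_at)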


I noticed a problem last week with the FOR PORTION OF code, so as soon 
as I fix that I'm going to organize my changes into four patch files 
(one for each of those bullet points) and send that to the list.


So far I haven't tried to support partitioned tables yet, so certainly 
that will remain to be done. But that's the only missing piece I'm aware 
of. (I'd be grateful for any help adding that btw. And Corey has already 
offered to give me some help in general.)


I'm sorry it's been slow progress the last year. I do hope I can get 
this landed for v15!


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Incremental backup

2021-10-28 Thread Paul Jungwirth

On 10/28/21 5:23 AM, Edu Gargiulo wrote:

Is there any way to make incremental backups in postgres-12?


wal-e and wal-g are useful tools for this and easy to set up. They take 
periodic full backups and then also continuously archive WAL files. To 
restore they apply the last full backup and then replay the WAL since 
then. They have nice support for putting the files on S3 or similar 
places, but really they can drop the files anywhere.
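
The moving parts look roughly like this for wal-g (paths and the bucket are made up; check the 
wal-g docs for your version for the exact options):

# storage location, e.g. WALG_S3_PREFIX=s3://my-bucket/my-cluster

# postgresql.conf: ship each WAL segment as it is finished
archive_mode = on
archive_command = 'wal-g wal-push %p'

# periodic full ("base") backup, e.g. from cron:
wal-g backup-push /var/lib/postgresql/12/main

# to restore: fetch the latest base backup, then replay WAL via
#   restore_command = 'wal-g wal-fetch %f %p'
wal-g backup-fetch /var/lib/postgresql/12/main LATEST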


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: a very naive question about table names in Postgres

2021-12-31 Thread Paul Jungwirth

On 12/31/21 12:27 PM, Martin Mueller wrote:
I know that on my Mac the tables are  kept in the  data directory 
/Users/martinmueller/Library/Application Support/Postgres/var-13.  If I 
go there I see that internally the tables have numbers.  Somewhere in 
the system there must be a table that maps the name I gave to table X  
to the numerical identifier inside the database.


Where is that mapping and how can I query it?  On what page of the 
documentation do I find the answer?


Dear Prof. Mueller,

This question strikes Postgres people a bit like your teenager asking 
where you keep the tallest ladder. There aren't many practical uses for 
accessing these files directly, and if you do it's likely to break your 
database. Could you speak a bit more about your goal here? Perhaps 
someone can suggest a good way to achieve it.
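
(For what it's worth, the name-to-number mapping you noticed is visible from SQL, which is a much 
safer way to look at it than opening the files; the table name below is just an example.)

SELECT relname, relfilenode, pg_relation_filepath(oid) AS file_path
FROM pg_class
WHERE relname = 'your_table_name';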


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Use of Port 5433 with Postgresql 9.6

2018-01-01 Thread Paul Jungwirth

On 01/01/2018 10:40 AM, Jeff Janes wrote:

The Ubuntu packages use 5433 if you already have something (either a
different packaged version, or an unpackaged system) running on 5432.


Also on Ubuntu lots of the Postgres utilities will obey an envvar like 
`PGCLUSTER=9.5/main` to let you choose which version of 
psql/pg_dump/etc. I don't know about pg_isready but it's worth a try. 
Even if pg_isready doesn't follow that pattern, it is probably easy to 
"teach" it. Here is how I did a similar thing for pg_config:


https://stackoverflow.com/questions/30143046/pg-config-shows-9-4-instead-of-9-3/43403193#43403193

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Trigger (or something similar) on table rename?

2018-02-15 Thread Paul Jungwirth

On 02/15/2018 10:52 AM, Ken Tanzer wrote:
Hi.  I'm wondering about possibilities for taking action when a table is 
renamed.


I've looked into this a bit. Here is what I understand:

Since 9.3 Postgres has had "event triggers" which can run code on DDL 
events 
(https://www.postgresql.org/docs/current/static/event-triggers.html). 
There are events like `ddl_command_start` and tags like `ALTER TABLE`. 
So you could write a trigger that fires `ON ddl_command_start WHEN TAG 
IN ('ALTER TABLE')`.
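
For example, something along these lines (a minimal sketch):

CREATE FUNCTION notice_alter_table() RETURNS event_trigger AS $$
BEGIN
  -- TG_TAG is available here, e.g. 'ALTER TABLE'
  RAISE NOTICE 'caught %', tg_tag;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER watch_alter_table
  ON ddl_command_start
  WHEN TAG IN ('ALTER TABLE')
  EXECUTE PROCEDURE notice_alter_table();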


Unfortunately I don't think you can get the old/new table name from 
inside the trigger function. If you need that, you might take a look at 
this extension which adds some custom event triggers with ways of 
getting that information:


https://github.com/CartoDB/pg_schema_triggers

If you are really adventurous you could even look at using the 
ProcessUtility hook directly to do what you need.


I'm looking forward to seeing what others say here because I'd like to 
know more myself!


Good luck!

--
Paul  ~{:-)
p...@illuminatedcomputing.com



Re: Given a set of daterange, finding the continuous range that includes a particular date

2018-02-22 Thread Paul Jungwirth

On 02/22/2018 04:44 PM, Ken Tanzer wrote:
Hi, hoping to get some help with this.  I'm needing to take a series of dateranges and, given a 
specific date, return a single continuous daterange that includes that date.


The part about joining multiple touching dateranges to give a single 
continuous daterange is what Richard Snodgrass calls "coalescing" in 
*Developing Time-Oriented Database Applications in SQL*, pages 159 - 
169, available printed or as a free PDF at 
http://www2.cs.arizona.edu/~rts/publications.html (His approach also 
supports overlapping ranges, but it sounds like you don't need that.) If 
you had a coalesced view (or maybe a set-returning function), you could 
do this:


SELECT  term
FROM    coalesced_staff_assign
WHERE   client_id = 5
AND     term @> DATE '2018-01-15'
;

I can't think of any way to avoid scanning all of a given client's 
records, but hopefully client_id alone would be selective enough to 
still give you good performance.


Oh also: in reading Snodgrass's SQL, note that he assumes closed-open 
ranges (i.e. '[)'), so you'll need to adjust some things to fit with 
your closed-closed ranges (or always use `staff_assign_date_end - 
INTERVAL '1 day'` if every assignment is at least 1 day long). On the 
other hand with built-in range types you might be able to simplify his 
pure-SQL solutions.
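
In case it's useful, here is a rough sketch (untested, with guessed table/column names) of a 
coalesced view built with window functions, assuming a daterange column named term with '[)' 
bounds:

CREATE VIEW coalesced_staff_assign AS
SELECT client_id,
       daterange(min(lower(term)), max(upper(term))) AS term
FROM (
  SELECT client_id, term,
         sum(new_group) OVER (PARTITION BY client_id ORDER BY lower(term)) AS grp
  FROM (
    SELECT client_id, term,
           -- start a new group when this range neither overlaps nor touches
           -- anything already seen for this client
           CASE WHEN lower(term) <= max(upper(term)) OVER (
                       PARTITION BY client_id ORDER BY lower(term)
                       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS new_group
    FROM staff_assign
  ) marked
) grouped
GROUP BY client_id, grp;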


--
Paul  ~{:-)
p...@illuminatedcomputing.com