PQresultMemorySize of ResultSet larger than expected
Hi. I've instrumented my libpq code to show the byte size of each result-set, and I'm surprised that it's never smaller than 2,264 bytes; the next size I see is 3,288, etc., even when receiving no rows at all. I've trimmed out the SQL, but these are all SELECT statements. The output below is from Windows, if that matters. The two larger ones at the bottom are fetching bytea blobs.

So, is this expected? I'm surprised 2KB+ is used to return basically no rows. Where's the space going? Is the reported space the same as (or similar to) the actual bytes returned on the wire? Or does it somehow add up local allocations client-side? Is there any way to know how many bytes were actually returned on the wire? I'm basically curious, and would appreciate pointers to better understand this. Thanks, --DD

Running 1 test case...
... `-> got 1 rows x 16 cols, 3,288 bytes (2.7 MB/s) in 1.159ms
... `-> got 1 rows x 18 cols, 3,288 bytes (2.2 MB/s) in 1.417ms
... `-> got 1 rows x 18 cols, 3,288 bytes (4.1 MB/s) in 0.771ms
... `-> got 0 rows x 15 cols, 2,264 bytes (2.1 MB/s) in 1.041ms
... `-> got 0 rows x 3 cols, 2,264 bytes (2.2 MB/s) in 0.991ms
... `-> got 0 rows x 15 cols, 2,264 bytes (3.3 MB/s) in 0.662ms
... `-> got 0 rows x 3 cols, 2,264 bytes (4.3 MB/s) in 0.503ms
... `-> got 1 rows x 14 cols, 3,288 bytes (3.6 MB/s) in 0.863ms
... `-> got 1 rows x 16 cols, 3,288 bytes (5.0 MB/s) in 0.623ms
... `-> got 0 rows x 15 cols, 2,264 bytes (3.6 MB/s) in 0.602ms
... `-> got 0 rows x 3 cols, 2,264 bytes (4.4 MB/s) in 0.494ms
... `-> got 0 rows x 15 cols, 2,264 bytes (3.7 MB/s) in 0.591ms
... `-> got 0 rows x 3 cols, 2,264 bytes (3.9 MB/s) in 0.548ms
... `-> got 1 rows x 1 cols, 818,441 bytes (78.1 MB/s) in 9.996ms
... `-> got 1 rows x 1 cols, 3,288 bytes (7.9 MB/s) in 0.396ms
... `-> got 1 rows x 1 cols, 36,345 bytes (33.4 MB/s) in 1.038ms
*** No errors detected
*** tests_acme_foo in 2.926s (user: 0.219s) 38 MB
Press any key to continue . . .
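For reference, a simplified sketch of how such a number can be obtained (this is not my actual instrumentation, and the reporting wrapper in the log above is custom). Note that PQresultMemorySize() is documented to report the memory allocated client-side for the PGresult object, which is part of what I'm asking about, versus the wire size.

```
// Minimal sketch: report rows, columns and PQresultMemorySize() for a query.
#include <libpq-fe.h>
#include <cstdio>

void report_result_size(PGconn* conn, const char* sql) {
    PGresult* res = PQexecParams(conn, sql, 0, nullptr, nullptr,
                                 nullptr, nullptr, /*binary results*/1);
    if (PQresultStatus(res) == PGRES_TUPLES_OK) {
        std::printf("got %d rows x %d cols, %zu bytes\n",
                    PQntuples(res), PQnfields(res), PQresultMemorySize(res));
    }
    PQclear(res);
}
```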
COPY TO STDOUT WITH (FORMAT CSV, HEADER), and embedded newlines
Hi. It's my first time using COPY TO, and my first time using the built-in CSV support. It performs well. BUT...

The code below (real code, but using a custom libpq wrapper lib) is run on a few tables, with unit tests that verify the number of lines of the output file. For a few of those tables there's a mismatch: the output from PostgreSQL "has too many lines". I've tracked these down to text values in the DB with embedded newlines. These values are 'normal'. I'm not used to CSV, but I suppose such newlines must be encoded, perhaps as \n, since AFAIK CSV needs to be 1 line per row, no? So how is one supposed to configure the CSV output for a DB with embedded newlines? Thanks, --DD

auto rset = my_exec(*conn_, "COPY MY_TAB TO STDOUT WITH (FORMAT CSV, HEADER)");
if (rset.status() != PGRES_COPY_OUT) {
    raise("CSV Export via SQL COPY error: ", rset.error_msg());
}
std::ofstream os(file_name);
bool done = false;
while (!done) {
    auto buf = pq::CopyOutBuffer::get(*conn_);
    switch (buf.status()) {
    case pq::CopyOutStatus::eWait: assert(false); continue;
    case pq::CopyOutStatus::eDone: done = true; continue;
    case pq::CopyOutStatus::eError: raise("PQgetCopyData: {}", conn_->error_msg());
    case pq::CopyOutStatus::eData: break; // process it below
    }
    // Each buffer seems to be a single line of output,
    // with a Unix newline at the end, on all platforms.
    os.write(buf.data(), buf.size());
}
os.close();
Re: COPY TO STDOUT WITH (FORMAT CSV, HEADER), and embedded newlines
On Fri, Mar 11, 2022 at 7:38 PM Daniel Verite wrote:
> > Dominique Devienne wrote:
> > These values are 'normal'. I'm not used to CSV, but I suppose such newlines
> > must be encoded, perhaps as \n, since AFAIK CSV needs to be 1 line per row, no?
>
> No, but such fields must be enclosed by double quotes, as documented
> in RFC 4180 https://datatracker.ietf.org/doc/html/rfc4180

Hi Daniel. OK, good to know, thanks.

> Consider this output:
> psql> COPY (values (1, E'ab\ncd'), (2,'efgh')) TO STDOUT CSV;
> 1,"ab
> cd"
> 2,efgh

Yes, this is consistent with what I am seeing in the debugger. The value is indeed double-quoted, and definitely contains a _raw_ '\n' newline, not an encoded "backslash then n", as David was writing.

> That's 2 records on 3 lines.
> If you feed this to a parser and it chokes on it, it means that it's not a
> valid CSV parser.

I never claimed that parser was a CSV parser :). It's a naive "wc -l" equivalent written by someone else. And it's comparing PostgreSQL's CSV output to some old legacy "CSV" output that's home-grown, which does encode newlines as '\\' and 'n', and which most likely is not RFC compliant (I'm not surprised! :))). In my case, backward compatibility is more important than "compliancy" (if that's a word!), so I can easily do that "\n" encoding myself, as post-processing on the buffer I get back. Thank you for the help. --DD

PS: And David, no, it's not my wrapper that's at fault here :). It's a thin wrapper that's just easier, terser, and safer (RAII) to use compared to naked libpq.
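For the record, a rough sketch of the post-processing I have in mind (a hypothetical helper, untested): walk each COPY buffer, track whether we're inside a double-quoted CSV field, and rewrite raw newlines inside quoted fields as a backslash followed by 'n', the way the legacy output does.

```
// Hypothetical sketch: encode raw newlines found inside double-quoted CSV
// fields as the two characters '\' and 'n'. Assumes RFC-4180-style quoting,
// where "" inside a quoted field is an escaped quote (toggling twice is harmless).
#include <string>
#include <string_view>

std::string encode_embedded_newlines(std::string_view csv) {
    std::string out;
    out.reserve(csv.size());
    bool in_quotes = false;
    for (char c : csv) {
        if (c == '"') {
            in_quotes = !in_quotes;
            out += c;
        } else if (c == '\n' && in_quotes) {
            out += "\\n";   // embedded newline -> backslash + n
        } else {
            out += c;
        }
    }
    return out;
}
```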
Re: COPY TO STDOUT WITH (FORMAT CSV, HEADER), and embedded newlines
On Fri, Mar 11, 2022 at 8:32 PM Daniel Verite wrote:
> Dominique Devienne wrote:
> > so I can easily do that "\n" encoding myself, as a post-processing on
> > the buffer I get back.
>
> Alternatively, it might be easier to use the default TEXT format of
> COPY rather than CSV, as the TEXT format already produces \n for
> line feeds, along with half a dozen other special backslash sequences.
> See https://www.postgresql.org/docs/current/sql-copy.html

I actually just submitted CSV+PostProcessing :) But sure, if TEXT does the kind of pseudo-CSV I need, I'd change it to use that. I'll look into it next week. Thanks again Daniel. --DD
PSQL bug?
Made a typo, using } instead of ) to test a weird table name, and got disconnected. Normal? --DD

ddevienne=> create table "t |" ( id int};
ddevienne(> );
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
ddevienne=> create table "t |" ( id int);
ddevienne=> drop table "t |";
ddevienne=> \! psql -V
psql (PostgreSQL) 12.1
ddevienne=> select version();
                                                version
--------------------------------------------------------------------------------------------------------
 PostgreSQL 12.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)
Re: PSQL bug?
On Thu, Mar 17, 2022 at 11:38 AM Pavel Stehule wrote:
> > Made a typo, [...], and got disconnected. Normal? --DD
>
> I don't see any problem on master

OK, good. Must have been fixed already then. Or it was a fluke. Thanks for checking. --DD
Re: Major release of advanced PostgreSQL driver written in C++
On Tue, Apr 12, 2022 at 12:17 PM Dmitry Igrishin wrote:
> I'm excited to announce Pgfe 2.0.0, an advanced and feature rich
> PostgreSQL driver written in C++.

Congrats on the release. I've been following your progress, and saw good stuff in your commits.

Could you please state more clearly in the project's README.md the minimal version of C++ required, and whether the interface or implementation changes with higher versions of the standard? Your hello-world example (extracted below) uses lambda-auto (C++14), probably variadic templates (C++11 and up, C++17 for fold-expressions), etc. So if you require C++20, that's important to know (since that would be out of bounds for me, for example).

In the same vein, which platforms / compilers / versions it's built and tested on is also useful to know upfront (again, as an example, I need to be cross-platform and cross-compiler, including Windows and MS VS).

My $0.02. And thanks for sharing your work. --DD

// Executing statement with positional parameters.
conn.execute([](auto&& r) {
    std::printf("Number %i\n", to(r.data()));
}, "select generate_series($1::int, $2::int)", 1, 3);
Who am I? Where am I connected?
LibPQ has various defaults for the host, user, and DB name. There's also the password file, the service file and service name. In the example below, I can connect with a "naked" psql invocation.

Once connected, can I find out all aspects of the connection string? Or where they came from, like a pgpass.conf or service file?

How do I get the host, port, and DB name once connected? SHOW and pg_settings do not appear to be it, at first glance. Thanks, --DD

c:\Users\ddevienne>psql
psql (12.1, server 14.2)
WARNING: psql major version 12, server major version 14.
         Some psql features might not work.
WARNING: Console code page (437) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
Type "help" for help.

ddevienne=>
Re: Who am I? Where am I connected?
On Wed, May 18, 2022 at 12:07 PM Dominique Devienne wrote:
> LibPQ has various defaults for the host, user, and DB name.
> There's also the password file, the service file and service name.
> In the example below, I can connect with a "naked" psql invocation.
>
> Once connected, can I find out all aspects of the connection string?

Thank you all for \conninfo. I was more thinking at the time about the SQL way to get that info, not the psql way. But thanks to https://www.postgresql.org/docs/current/functions-info.html I managed to emulate it, modulo resolving the server's IP into a hostname.

ddevienne=> \conninfo
You are connected to database "ddevienne" as user "ddevienne" on host "localhost" (address "::1") at port "5432".
ddevienne=> select current_database() || ' ' || session_user || ' ' || inet_server_addr() || ':' || inet_server_port();
            ?column?
---------------------------------
 ddevienne ddevienne ::1/128:5432
(1 row)

> Or where they came from, like a pgpass.conf or service file?

OTOH, no one replied to that part of the question. How do I know if the user or database name was defaulted? Or came from a service file, using a given service name? Is there no way, except by reverse-engineering the logic of the env-vars and built-in defaults?
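For completeness, the client-side (libpq) way to read the effective values looks like the sketch below; it answers the "what are they" half, but not the "where did they come from" half.

```
// Sketch: reading the effective connection parameters from libpq itself.
// These report the values in use, not whether they came from defaults,
// environment variables, pgpass.conf or a service file.
#include <libpq-fe.h>
#include <cstdio>

void print_conninfo(PGconn* conn) {
    std::printf("db=%s user=%s host=%s port=%s\n",
                PQdb(conn), PQuser(conn), PQhost(conn), PQport(conn));
}
```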
Re: Who am I? Where am I connected?
On Wed, May 18, 2022 at 5:43 PM David G. Johnston wrote:
> On Wed, May 18, 2022 at 3:08 AM Dominique Devienne wrote:
> > Once connected, can I find out all aspects of the connection string?
> > Or where they came from, like a pgpass.conf or service file?
> >
> > How to get the host, port, db name once connected?
> > SHOW and pg_settings do not appear to be it, at first glance.
>
> The server has no clue how the values sent to it came into existence - nor
> should it.
>
> Whether and how any particular client might expose this kind of debugging
> information (or upgrade it to proper state info) is up to the client. I do
> not know what options psql offers.

AFAIK, it's not psql that does this though, it's libpq, the official client API. And the fact libpq has no way to surface that info seems like an important oversight.
No default for (user-specific) service file location on Windows?
Hi. The doc is explicit about defaults for the password file.

From https://www.postgresql.org/docs/current/libpq-pgpass.html
  Linux: ~/.pgpass
  Windows: %APPDATA%\postgresql\pgpass.conf

But for the service file, OTOH, only the Linux default is documented.

From https://www.postgresql.org/docs/current/libpq-pgservice.html
  Linux: ~/.pg_service.conf
  Windows: N/A

I tried the obvious %APPDATA%\postgresql\pg_service.conf, with no luck.

So is this a lack in the documentation for the service file? Or is it instead a lack of symmetry in the implementations of these two mechanisms? If the latter, any chance the symmetry would be restored (with a corresponding doc update)?

I guess the only Windows-specific work-around at this point, if the latter, would be defining the PGSERVICEFILE env-var explicitly? --DD
Re: No default for (user-specific) service file location on Windows?
On Thu, May 19, 2022 at 10:17 AM Julien Rouhaud wrote:
> > Or is it instead a lack of symmetry in the implementations of these
> > two mechanisms?
>
> As far as I can see from the code it should be
> %APPDATA%\postgresql\.pg_service.conf
>
> Does this one work?

Yes it does! Thanks.

> There is a discrepancy (apart from the documentation) as the default
> file name is the same on Linux and Windows.

The doc is explicit about Windows for the password file; it probably should be for the service file as well, IMHO. It's unfortunate that the naming conventions are different. That might need to be warned about in the doc too. My $0.02.

It could also be changed to try both on Windows, with or w/o the leading dot, but I suspect this kind of change won't be considered worth implementing :). Thanks again Julien, --DD
Re: No default for (user-specific) service file location on Windows?
On Fri, May 20, 2022 at 5:31 AM Julien Rouhaud wrote:
> On Thu, May 19, 2022 at 06:38:46PM -0400, Tom Lane wrote:
> > Julien Rouhaud writes:
> > Pushed, with some minor twiddling to make ... descriptions more alike.
>
> Thanks a lot!

And thank you both! --DD
Re: accessing postgres from c++
On Tue, Jun 21, 2022 at 7:59 AM Laurenz Albe wrote:
> I am surprised that you choose to be fettered by the constraints of a generic
> API like ODBC.
> For me, that only makes sense if you want to be portable to different
> databases.

Ditto.

> I would use ... C API of libpq, or, .. C++ boilerplate around it, use libpqxx.

There's also this modern C++ libpq wrapper, announced on this list earlier this year: https://github.com/dmitigr/pgfe

PS: Haven't used it, but I followed its development (to inform my own non-OSS C++ wrapper dev), and it looked solid from a distance.

PPS: I think we tried libpqxx in the past, and I kinda remember it forced you into non-binary mode, which was a non-starter for performance.
About revoking large number of privileges; And the PUBLIC role.
Hi. I'm struggling to delete databases because of grants to roles on objects of those DBs.

These DBs can have a large-ish number of schemas (100-300 is typical), and define a bunch of ROLEs "specific" to those schemas. Normally "login user" ROLEs are never granted explicit access to objects; instead only the "db specific" ROLEs get those grants, and regular users are granted some of those ROLEs.

So my goal is to delete all those "db specific" ROLEs, then the DB with all its schemas. Which implies REVOKE'ing grants on all those "db specific" ROLEs first. (And "login users" just implicitly lose membership in "db specific" roles when the latter are dropped.)

OK, so to help me achieve that "mass revoking", I thought I'd use pg_shdepend, but it turns out to not be that easy to figure this shared catalog out... I've made some progress on that, but then, trying various manual REVOKEs, I often don't see any changes in pg_shdepend, and I'm not sure whether it's because of PUBLIC, or because of DEFAULT PRIVILEGES, or because I'm not always revoking using the same ROLE as the one that did the grants, or something else I don't yet know about... So I thought I'd seek clarifications here, and go back to asking a few basic first-principles questions.

The way the GRANTs are made right now is via:
  ALTER DEFAULT PRIVILEGES IN SCHEMA $schema GRANT $priv TO $role
with (so far) $priv taking all these values:
- "SELECT ON TABLES",
- "USAGE, SELECT ON SEQUENCES",
- "EXECUTE ON ROUTINES",
- "USAGE ON TYPES".

The default privileges are updated *before* schema objects are created, so all those "db specific" (and "schema specific" too) roles are getting their object grants via DEFAULT PRIVILEGES. So my first question is whether revoking from the DEFAULT PRIVILEGES is enough to "ungrant" all those object privileges? My reading of the doc seems to imply it does, but after, for example,
  ALTER DEFAULT PRIVILEGES IN SCHEMA $schema REVOKE SELECT ON TABLES FROM $role
nothing in pg_shdepend changed. Could it be related to the PUBLIC role?

Which brings me to questions on PUBLIC. I suspect there's something I don't understand here. From my reading, all ROLEs are implicitly members of PUBLIC, and you cannot avoid that? And it seems many GRANTs are implicitly made to PUBLIC, without my realizing it? Just recently, I discovered any user could connect to new databases I created, when I didn't want that. I've started to explicitly `revoke all on database {} from public` every time I create a new DB. But I just don't quite understand why I can't seem to avoid PUBLIC having implicit access. Am I missing something?

So similarly, is PUBLIC getting implicit access via my DEFAULT PRIVILEGES too? My pg_shdepend results seem to indicate so, although I'm not 100% sure (which is why I'm here now). What steps do I need to take to ensure PUBLIC gets "nothing" on the DBs and SCHEMAs I create? I always want grants to be made explicitly, on specific ROLEs I control, never implicitly.

Right now, I'm doing most of my experimentation as a SUPERUSER, which could be skewing my (non-conclusive) results. That's unlikely to be the case in the "real world". I suppose it matters who revokes privileges? How can I translate rows in pg_shdepend into the proper REVOKE calls?
```
select c.relnamespace::regnamespace::text, r.rolname, count(*)
  from pg_database db
  join pg_shdepend dep on db.oid = dep.dbid
  join pg_roles r on dep.refobjid = r.oid
  join pg_class c on dep.objid = c.oid
 where datname = current_database()
   and refclassid::regclass::text = 'pg_authid'
   and classid::regclass::text = 'pg_class'
 group by 1, 2
```

So far, my (naive?) attempt above tells me how many dependencies I have between ROLEs and SCHEMAs (I think, at least!), but I don't know how to turn what the aggregation is hiding into the proper REVOKE calls, from what's inside pg_shdepend. Can someone help with that? Should it be a separate thread, just on that?

Finally, a tangentially related question, to end this first email, about what is transactional and what isn't. The doc doesn't seem to say much on this subject.
- Are all GRANT/REVOKE statements transactional?
- Is DB create/drop transactional?

I'm trying to get to grips with dropping DBs and ROLEs, and would appreciate this list's expert opinions. I suspect this is only the first message of a thread; I tried to put in just enough info I thought relevant to my first questions. Thanks, --DD
Re: About revoking large number of privileges; And the PUBLIC role.
On Thu, Jul 7, 2022 at 3:52 PM Tom Lane wrote:
> Dominique Devienne writes:
> > Hi. I'm struggling to delete databases because of grants to roles on
> > objects of those DBs.
> >
> > These DBs can have a large'ish number of schemas, 100-300 is typical,
> > and define a bunch of ROLEs "specific" to those schemas. Normally "login user"
> > ROLEs are never granted explicit access to objects, instead only the
> > "db specific" ROLEs get those grants, and regular users are granted some of those ROLEs.
> >
> > So my goal is to delete all those "db specific" ROLEs, then the DB
> > with all its schemas.
> > Which implies REVOKE'ing grants on all those "db specific" ROLEs first.
>
> You should not really have to revoke those manually.
> The normal process for that is to use DROP OWNED BY.

Except we already went through that: DROP OWNED BY acquires too many locks. Increasing max_locks_per_transaction when it fails is just not an option IMHO. One user had to raise it to 32K for his particular DB, which is not even that large. Or are you saying setting it to 1M or 1B is "safe", and should be required setup for users?

Is revoking privileges taking locks? Is dropping a DB taking locks? If neither are, then I can work around the limitations of DROP OWNED BY. So will the community help me figure this out?

BTW, I'm also hoping revoking privs, and dropping roles and DBs, will be faster than DROP OWNED BY. That DB took 30 min to 1 h to get rid of via DROP OWNED BY, which seems too long to delete a bunch of files, no?
Number of updated rows with LibPQ
Hi. Is there a way to programmatically know how many rows an UPDATE did update? I've read about [PQcmdTuples][1], but surely I shouldn't have to parse that string, no? For selects, I have [PQntuples][2], but what to do on INSERT, UPDATE, DELETE? Parse the result of PQcmdTuples myself??? If so, what's the 0 in the INSERT below? Is the output of PQcmdTuples "stable", i.e. "official"? Not locale-dependent? Thanks, --DD

[1]: https://www.postgresql.org/docs/14/libpq-exec.html#LIBPQ-PQCMDTUPLES
[2]: https://www.postgresql.org/docs/14/libpq-exec.html#LIBPQ-PQNTUPLES

postgres=# create table foo (v int);
CREATE TABLE
postgres=# insert into foo values (1), (2), (3);
INSERT 0 3
postgres=# update foo set v = 2*v where v = 2;
UPDATE 1
postgres=# delete from foo where v = 3;
DELETE 1
postgres=# select v from foo where v > 1;
...
(1 row)
postgres=# delete from foo;
DELETE 2
postgres=# drop table foo;
DROP TABLE
postgres=#
[libpq] OIDs of extension types? Of custom types?
Hi. I'm using binary binds and results for DMLs, as well as for COPY. So far, I've stayed within built-in scalar and array types, so I could hardcode the OIDs of values "in the type-system" (in C++, via traits). But I'd like to venture into extension (e.g. hstore) and custom (enums, domains, etc.) types. Thus I'm wondering:

1) whether "official" extensions have fixed/stable OIDs, as in my hstore example. If so, where are they defined?
2) how should I be looking up OIDs for custom (or extension?) types with libpq? Any specific APIs? Or do I need to do it in SQL instead?
3) If I duplicate custom types per-schema, to keep them standalone, they'll get different OIDs, right?

Thanks for any insights on the above. --DD
Re: Number of updated rows with LibPQ
On Wed, Oct 5, 2022 at 8:17 PM Tom Lane wrote:
> Laurenz Albe writes:
> > On Wed, 2022-10-05 at 16:38 +0200, Dominique Devienne wrote:
> > Yes, you have to use PQcmdTuples(), and you have to convert the string to
> > an integer.
> > But don't worry: the result will *not* be "INSERT 0 5", it will be just "5"

Thanks. What's the leading 0 though, then? I guessed it might be the number of rows returned, but it isn't, see below:

postgres=# create table foo (id serial primary key, v int);
CREATE TABLE
postgres=# insert into foo (v) values (1), (2) returning id;
 id
----
  1
  2
(2 rows)

INSERT 0 2

> Yeah, just applying atoi() or atol() to the result should be enough.

Thanks too. Since I'm in C++, I used something else instead, and discovered it can be empty, not 0. I guess atoi() would have hidden that distinction, and worked anyway (returning 0).

On the same topic, I've noticed an INSERT returns PGRES_COMMAND_OK, while an INSERT ... RETURNING returns PGRES_TUPLES_OK. So there's no way to use the status to distinguish a SELECT from anything else? A RETURNING clause makes any statement supporting it a hybrid of a command and a query, but then how does one know the exact "kind" of the statement? E.g. how does psql show INSERT in either case? By parsing the SQL itself, client-side? Or is there a libpq API on PGresult that would allow getting the type of statement the result is from? Thanks, --DD
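For the archives, a sketch along those lines (not necessarily the exact code I used), treating the empty string that PQcmdTuples() returns for commands without a row count as "no count" rather than 0:

```
// Sketch: affected-row count from PQcmdTuples(), distinguishing
// "no count available" (empty string) from an actual 0.
#include <libpq-fe.h>
#include <charconv>
#include <cstring>
#include <optional>

std::optional<long long> affected_rows(PGresult* res) {
    const char* s = PQcmdTuples(res);
    if (s == nullptr || *s == '\0') {
        return std::nullopt;              // e.g. CREATE TABLE, SET, ...
    }
    long long n = 0;
    auto [ptr, ec] = std::from_chars(s, s + std::strlen(s), n);
    return ec == std::errc{} ? std::optional<long long>{n} : std::nullopt;
}
```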
Re: [libpq] OIDs of extension types? Of custom types?
On Fri, Oct 14, 2022 at 2:31 PM Laurenz Albe wrote:
> You use the #defines like TEXTOID for the built-in Oids, right?

I don't. I used https://github.com/postgres/postgres/blob/master/src/include/catalog/pg_type.dat as a reference. I suspect that should be fairly stable, right? I have at least 2 or 3 dozen OID pairs (scalar + array) of primitives and other types (Oid, Name, Text, Bytea, Uuid, etc.). Are there #defines for all of those? Where?

template<> struct OidTraits {
    // boolean, true/false
    static constexpr Type type{ "bool", 1, Oid{ 16 }, Oid{ 1000 } };
};
template<> struct OidTraits {
    // variable-length string, binary values escaped
    static constexpr Type type{ "bytea", -1, Oid{ 17 }, Oid{ 1001 } };
};
etc...

> For types from an extension, you would run a query on "pg_type".

OK, thanks.
Re: [libpq] OIDs of extension types? Of custom types?
On Fri, Oct 14, 2022 at 4:35 PM Tom Lane wrote:
> > I don't. I used
> > https://github.com/postgres/postgres/blob/master/src/include/catalog/pg_type.dat
>
> They're stable

Good to know, thanks.

> but writing magic numbers leads to unreadable code.
> Use the macros from catalog/pg_type_d.h.

OK. I see that header. But it seems to be a "server" header, not a client one, i.e. I'm not sure it's a good idea to depend on such a header for pure client-side libpq code.

> >> For types from an extension, you would run a query on "pg_type".
> > OK, thanks.
>
> In SQL queries, you can avoid hard-wiring anything by writing
> things like "'hstore'::regtype". It may or may not be possible
> to avoid fetching the OID altogether that way.

Something like below, you mean? Thanks for the tip. I do need the OIDs, on binds and results (defines, in OCI speak), in my C++ code, because I try to enforce strong type safety between the C++ world and the libpq "bytes". I don't check OIDs of the values on binds, I just give them to libpq on execute(). But for results, I compare the actual OID (from PGresult) against the expected OID for the C++ type (via the traits).

ddevienne=> select 'hstore'::regtype;
 regtype
---------
 hstore
(1 row)

ddevienne=> select 'hstore'::regtype::oid;
    oid
-----------
 207025799
(1 row)

ddevienne=> select 'uuid'::regtype::oid;
 oid
------
 2950
(1 row)

ddevienne=> select 'uuid[]'::regtype::oid;
 oid
------
 2951
(1 row)
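And for the archives, a sketch of the kind of one-time lookup I'd do for extension or custom types (the function below and any caching around it are mine, hypothetical, not libpq API):

```
// Sketch: resolve a type name to its scalar and array OIDs once, at startup,
// instead of hard-coding OIDs that are only stable for built-in types.
#include <libpq-fe.h>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <utility>

std::pair<Oid, Oid> lookup_type_oids(PGconn* conn, const std::string& type_name) {
    const char* params[1] = { type_name.c_str() };
    PGresult* res = PQexecParams(conn,
        "select t.oid, t.typarray from pg_type t where t.oid = to_regtype($1::text)::oid",
        1, nullptr, params, nullptr, nullptr, /*text results*/0);
    if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) != 1) {
        PQclear(res);
        throw std::runtime_error("unknown type: " + type_name);
    }
    Oid scalar = static_cast<Oid>(std::strtoul(PQgetvalue(res, 0, 0), nullptr, 10));
    Oid array  = static_cast<Oid>(std::strtoul(PQgetvalue(res, 0, 1), nullptr, 10));
    PQclear(res);
    return { scalar, array };
}
```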
Re: Number of updated rows with LibPQ
On Fri, Oct 14, 2022 at 4:39 PM Daniel Verite wrote:
> Or a SELECT, or an UPDATE RETURNING, or a DELETE RETURNING.
> The code still needs to look at PQcmdStatus() to learn which kind it is,

Thanks Daniel. This is exactly what I needed. --DD

PS: Now I wonder whether PQcmdTuples() points to the same buffer as PQcmdStatus(), except with an offset...
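For future readers, a sketch of what "look at PQcmdStatus()" amounts to in code: the kind of statement is the first word of the command tag.

```
// Sketch: derive the statement "kind" from the command tag, since
// PGRES_TUPLES_OK alone can't tell SELECT from INSERT ... RETURNING.
#include <libpq-fe.h>
#include <string_view>

std::string_view command_tag(PGresult* res) {
    std::string_view status = PQcmdStatus(res);  // e.g. "INSERT 0 2", "UPDATE 1", "SELECT 3"
    auto space = status.find(' ');
    return space == std::string_view::npos ? status : status.substr(0, space);
}

bool is_insert(PGresult* res) { return command_tag(res) == "INSERT"; }
```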
byte-size of column values
Hi. I'm surprised by the result for bit(3) and char when calling pg_column_size(). Why 6, instead of 1? The doc does mention 5-8 bytes of overhead, but I expected those for varying bit, not fixed-sized bit-typed values. How come?

Similarly, why 2 for char? Is it linked to Unicode? 1 byte for the varying UTF-8 encoded length, then the (potentially) multi-byte encoding? (The very bit pattern of UTF-8 allows inferring the encoded length, so storing the length explicitly is theoretically not even necessary.)

Similarly, enums are always 4 bytes, I read, despite rarely exceeding cardinalities beyond what a single byte can store. How does one store several small enums as compactly as possible, on millions of rows? And is the minimum column size always 2?

I'm sure many will call this "premature optimization", but isn't using 32 bits instead of 2, 3 (or 8, to round to a byte) wasteful, in disk space and thus ultimately energy? (OK, that last one is pushing it :) ). I'm sure there are reasons for the above. And I guess I'm curious about them. Thanks, --DD

ddevienne=> create table foo (b3 bit(3), i2 int2, i4 int4, i8 int8, c char);
CREATE TABLE
ddevienne=> insert into foo values (b'101', 1002, 1004, 1008, 'C');
INSERT 0 1
ddevienne=> select pg_column_size(b3), pg_column_size(i2), pg_column_size(i4), pg_column_size(i8), pg_column_size(c) from foo;
 pg_column_size | pg_column_size | pg_column_size | pg_column_size | pg_column_size
----------------+----------------+----------------+----------------+----------------
              6 |              2 |              4 |              8 |              2
(1 row)
Re: byte-size of column values
On Tue, Oct 18, 2022 at 6:04 PM David G. Johnston wrote:
> On Tue, Oct 18, 2022 at 8:53 AM Dominique Devienne wrote:
> > I'm surprised by the result for bit(3) and char, when calling
> > pg_column_size().
>
> The base type is what matters; if the length of the actual type is a parameter
> (the (n) part) the underlying type must be variable.

Thanks. Interesting. Didn't know (n)-suffixed "fixed-length" types were always based on variable-size ones.

> > How does one store as compactly as possible several small enums
>
> int2

OK, I see. Thanks again.

> p.s., pretend char doesn't even exist.

I realize that now. Wasn't obvious to me, despite the warning in the doc.
Re: byte-size of column values
On Tue, Oct 18, 2022 at 6:04 PM Tom Lane wrote:
> Dominique Devienne writes:
> > I'm surprised by the result for bit(3) and char
> > The doc does mention 5-8 bytes overhead, but I expected
> > those for varying bit, not fixed-sized bit typed values.
>
> Your expectation is incorrect. Postgres always treats these types
> as variable-length, whether or not the column has a length constraint.

OK. Still, it wasn't such a stretch to assume that, no? Now I know better, thanks to you and David. I'm not sure the doc on types talks about that either. Didn't see it, for sure (but it could still be there and I missed it).

> Thus, there's always a header to store the actual length. That can
> be either 1 or 4 bytes (I think the doc you are looking at might be
> a little out of date on that point).

Even the doc on v15 (or devel) still says 5-to-8: https://www.postgresql.org/docs/15/datatype-bit.html
And on my v12, that's borne out by my experimentation. Being used to SQLite's varints, I'd have expected fewer overhead bytes for the size, like your 1-to-4.

> Because of the popularity of variable-width character encodings,
> a column declared as N characters wide isn't necessarily a fixed
> number of bytes wide, making it a lot less useful than you might
> think to have optimizations for fixed-width storage. Between that
> and the fact that most Postgres developers regard CHAR(N) as an
> obsolete hangover from the days of punched cards, no such
> optimizations have been attempted.

Thanks for the background. I definitely appreciate PostgreSQL's large 1GB limit on text and bytea columns, coming from Oracle's tiny 4K one, which created all kinds of headaches for us.

For kicks, I looked at bpchar, blank-padded char, and its extra byte, which I assume is again some kind of length; there at least the overhead is small compared to bit(n). 1 byte versus 5 bytes is no small difference.

ddevienne=> create table bar (bpc bpchar(16));
CREATE TABLE
ddevienne=> insert into bar values ('foo'), ('bar baz');
INSERT 0 2
ddevienne=> select length(bpc), pg_column_size(bpc), '<'||bpc||'>' from bar;
 length | pg_column_size | ?column?
--------+----------------+----------
      3 |             17 |
      7 |             17 |
(2 rows)
How to store "blobs" efficiently for small and large sizes, with random access
Hi. I'd like some advice on storing blobs (millions of them), ranging from very small to large (> 1GB). I know about bytea versus lo, and I have probably read most of what's out there about them :)

Upfront, I have to state that I'm not keen on lo, because of security considerations. We store blobs in many different schemas, and users can access some schemas and not others. So the fact that the lo table is unique for the whole database would allow users to see blobs from any schema, as I understand it. Right? OTOH, lo has random access, which I also need... (I'm also not a fan of lo needing triggers for lifetime management, nor of the requirement to have an explicit transaction to use lo.)

Here are the main requirements I need to fulfil:
1) store literally millions of rows, 1 "blob" per row (scientific data).
2) most "blobs" are smallish, below a few KBs. bytea's perfect for that.
3) yet many blobs are in the dozens of MBs. bytea's still OK.
4) but some blobs exceed the 1GB bytea limit. Found at least a dozen just in our test data, so clients will have them.
5) for accessing larger blobs, the API I must implement accesses contiguous chunks of the blobs. Thus I need random access.

I'm porting the backend of that API from Oracle to PostgreSQL. In Oracle we used RAW for smaller values, and SecureFile blobs for larger ones. Oracle LOBs are similar to lo, I guess, providing random access, with higher performance for large sizes, but slower than bytea for smaller ones. And a PITA to deal with, with locators, breaking pre-fetching, etc. PostgreSQL bytea is much better and simpler, except limited to 1GB...

bytea also has no direct random access, except via substr[ing], but how efficient and "random access" is that? For example, SQLite stores large values in overflow pages, and has to traverse all those pages to pointer-chase the chain of them for large blobs, so its substr() "random access" is NOT O(1) and involves way too much IO (i.e. O(N)). Do TOAST'ed large values (text or bytea) fare better in terms of substr() efficiency, close to O(log N) and only the IO necessary for what's actually requested by the substr() (modulo the ~2K chunks of TOAST)?

Some posts on the internet also peg lo to be faster than bytea above 20MB, for example. Is that even true? One post I read (from Daniel Verite, I think) kind of hinted the lo table is not that different from TOAST ones, with 2K chunks, so I don't see why lo would be any faster than a TOAST'ed bytea, for example. Any insights someone with knowledge of the internals can share on this?

At this point, I have not yet redone the extensive benchmarking we did a few years ago between Oracle and PostgreSQL, which compared lo versus bytea then. And benchmarks are hard to do, not having easy access to different kinds of servers with different storage backends, or cloud-hosted PG. It's too easy to get bias from a local setup, leading to a design that would not perform optimally in a different one. That's why I'm asking a more open-ended question to the experts on this list.

Because of #4 above, I need to either use lo (but see above, notably the security concern), or roll my own TOAST^2 (squared), as I call it, where I manually "shard" / chunk large blobs into an auxiliary table, itself TOAST'ed of course, with some threshold for chunks (e.g. 1 or 4 or 16 MB perhaps). The latter keeps the blobs in the schema (good for security), lifetime is managed by FKs (as usual), and the sharding limits the ill effects of "emulating" random access with substr() if necessary.
I've already done things like this in SQLite land (which also has the 1GB limit for its text and blob types). So is this a terrible idea? What alternatives do I have? I'd really appreciate some expert advice on the above, before I go too far down the rabbit hole. Thanks, --DD

PS: Another drawback of lo is that, because it's a single table, it is still subject to the 32 TB limit on a relation. The 4TB limit per lo is way more than we need, but the 32 TB limit may actually be more of an issue for our larger clients, which have thousands of projects, each with upwards of a few million of those blobs. bytea values being stored in different schemas (per-project, a design constraint) puts that limit per-project, which will be more than enough. For the sum of all projects, maybe not... I.e. with a real client case of 3K projects, that puts an average of only 10GB of lo's per project (i.e. schema), which could very well be problematic...
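To make the "TOAST squared" idea above more concrete, here's a rough sketch of the kind of auxiliary chunk table and write loop I have in mind (the table layout, names, and 4 MB chunk size are illustrative only; nothing is settled):

```
// Sketch of manual sharding of large blobs into a per-schema chunk table.
//
//   create table blob_chunk (
//     blob_id bigint not null,   -- FK to the owning row
//     seq     int    not null,   -- chunk index, 0-based
//     data    bytea  not null,
//     primary key (blob_id, seq)
//   );
#include <libpq-fe.h>
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

constexpr size_t kChunkSize = 4 * 1024 * 1024;

void write_blob_chunks(PGconn* conn, int64_t blob_id, const char* bytes, size_t size) {
    for (size_t off = 0, seq = 0; off < size || seq == 0; off += kChunkSize, ++seq) {
        size_t len = std::min(kChunkSize, size - off);
        std::string id_s = std::to_string(blob_id), seq_s = std::to_string(seq);
        const char* values[3] = { id_s.c_str(), seq_s.c_str(), bytes + off };
        const int lengths[3]  = { 0, 0, static_cast<int>(len) };
        const int formats[3]  = { 0, 0, 1 };   // id/seq as text, data as binary
        PGresult* res = PQexecParams(conn,
            "insert into blob_chunk(blob_id, seq, data) values ($1, $2, $3)",
            3, nullptr, values, lengths, formats, 0);
        bool ok = PQresultStatus(res) == PGRES_COMMAND_OK;
        PQclear(res);
        if (!ok) throw std::runtime_error(PQerrorMessage(conn));
    }
}
```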
Re: How to store "blobs" efficiently for small and large sizes, with random access
On Wed, Oct 19, 2022 at 12:17 PM Andreas Joseph Krogh wrote:
> First advice, don't do it. We started off storing blobs in DB for “TX safety”

Not really an option, I'm afraid.

> , but backup/restore quickly became too cumbersome so we ended up moving
> all blobs out and only store reference in DB.
> This required us to make a “vacuum system” that cleans up the blob-storage
> regularly as ROLLBACK/crash can make it out of sync.

Note the fact the data is spread across many mostly independent schemas. Might that ease the backup/restore? I'm not much of a DBA though...

> We chose storing as LO because with it, streaming large blobs (not using
> much memory) actually worked, with JDBC at least.

I'm in C++, with (I believe) efficient use of binary binds and results, and use of COPY as much as possible, so as good as it gets, I think (that I know of, with libpq), in terms of performance. Haven't looked at libpq's new-in-v14 pipeline mode yet, though. Investigated Cursor vs Statement too, and it's a tradeoff between latency and throughput. Not sure how good the JDBC drivers for PostgreSQL are. I'm doing my best to go fast with libpq.

In any case, thanks for your input. But it's not really a question of "if". But of "how". Putting thousands of large blobs in the file system is a no-go, assuming the clients can even see the file system the server sees. This is a 2-tier system; there's no mid-tier that would somehow magically handle proper security and lifetime management of these blobs. Thanks, --DD
Re: How to store "blobs" efficiently for small and large sizes, with random access
On Wed, Oct 19, 2022 at 1:00 PM Andreas Joseph Krogh wrote:
> Ok, just something to think about;

Thank you. I do appreciate the feedback.

> Will your database grow beyond 10TB with blobs?

The largest internal store I've seen (for the subset of data that goes in the DB) is shy of 3TB. But we are an ISV, not one of our clients, which have truly massive scale for data. And they don't share the exact scale of their proprietary data with me...

> If so try to calculate how long it takes to restore, and comply with SLA,
> and how long it would have taken to restore without the blobs.

Something I don't quite get is why backup is somehow no longer needed if the large blobs are external. I.e. are you saying backups are so much worse in PostgreSQL than with the FS? I'm curious now. Also, managing the PostgreSQL server will mostly be the client's own concern. We are not into SaaS here.

As hinted above, the truly massive data is already not in the DB, is used by different systems, and is processed down to the GB-sized inputs that all the data put in the DB is generated from. It's a scientific-data-heavy environment. And one where security of the data is paramount, for contractual and legal reasons. Files make that harder, IMHO. Anyway, this is straying from the main theme of this post, I'm afraid. Hopefully we can come back to the main one too. --DD
Re: How to store "blobs" efficiently for small and large sizes, with random access
On Wed, Oct 19, 2022 at 1:38 PM Andreas Joseph Krogh wrote:
> There's a reason “everybody” advices to move blobs out of DB, I've learned.

I get that. I really do. But the alternative has some real downsides too, especially around security, as I already mentioned. That's why I'd like, if possible, to get input on the technical questions of my initial post.

That's not to say we wouldn't ultimately move the big blobs outside the DB. But given how much that would complicate the project, I do believe it is better to do it as a second step, once the full system is up and running and testing at scale has actually been performed.

We've already moved another kind of data to PostgreSQL, from SQLite DBs (thousands of them) that time, and ported "as-is" the sharding done on the SQLite side to PostgreSQL (despite TOAST). And so far, so good. With good ingestion rates. And decent runtime access to data too, in the albeit limited testing we've had so far.

Now we need to move this other kind of data, from proprietary DB-like files this time (thousands too), to finish our system and be able to finally test the whole system in earnest, and at (our limited internal) scale.

So you see, I'm not completely ignoring your advice. But for now, I'm inquiring as to the *best* way to put that data *in* PostgreSQL, with the requirements / constraints I've listed in the first post. It may indeed be a bad idea long term. But let's make the most of it for now. Makes sense? Am I being unreasonable here? --DD
Re: How to store "blobs" efficiently for small and large sizes, with random access
On Wed, Oct 19, 2022 at 3:01 PM Daniel Verite wrote:
> Dominique Devienne wrote:
> > the fact the lo table is unique for the whole database would allow
> > users to see blobs from any schema, as I understand it.
>
> Each large object has its own set of permissions. This is a significant
> difference with bytea, since every creation of a new large object
> may need to be followed by GRANT statements.
> Also if the roles and the access policies are changed in the
> lifetime of the app, that might imply massive REVOKE/GRANT
> statements to apply to existing objects.

Thank you Daniel. Very interesting, and something I definitely didn't know. I believe that's doable, given our design on ROLEs, but it would for sure be both a PITA and additional management / code to deal with. At least GRANTs are transactional, like the new LO oids themselves, I think, so now I know it would be possible to properly secure the LOs. This insight is greatly appreciated. --DD
Re: How to store "blobs" efficiently for small and large sizes, with random access
On Wed, Oct 19, 2022 at 3:05 PM Alvaro Herrera wrote:
> On 2022-Oct-19, Dominique Devienne wrote:
> > OTOH, lo has random access, which I also need...
>
> Generally speaking, bytea sucks for random access, because if a TOAST
> item is compressed, it has to be always read from the beginning in order
> to decompress correctly. However, if you set
> ALTER TABLE tab ALTER COLUMN byt SET STORAGE EXTERNAL
> then compression is not used, and random access becomes fast.

Thank you Álvaro. Quite insightful as well. I was aware of TOAST compression, but didn't make the connection to the side-effects on random access. But now that TOAST has LZ4 support, which decompresses extremely fast compared to zlib (I have experience with LZ4 for a WebSocket-based server's messages), and choosing an appropriately small shard/chunk size, that might be mitigated somewhat. Would need testing / benchmarking to compare uncompressed vs LZ4, at various chunk and subset/offset sizes, of course.

Does anybody have an answer to my question regarding how substr() works on bytea values? I.e. is it "pushed down" / optimized enough that it avoids reading the whole N-byte value, only to then pass it to substr(), which then returns an M-byte value (where M < N)? If TOAST stores 2,000-byte chunks, and those chunks' PKs are numbers, simple arithmetic should be able to select only the chunks of interest, thus incurring only the necessary IO for the selected range, no? Or does the fact that subsetting a bytea currently requires substr() prevent such a scenario? And if so, why not support a native subsetting notation that did support that scenario, like the obvious bytea_col[offset, count] or bytea_col[start:end]? It seems to me that efficient native subsetting of varlength values would be quite valuable.
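For context, the access pattern I'm after looks roughly like the sketch below (table and column names are made up; whether the substring gets pushed down efficiently is exactly my question):

```
// Sketch: random access into a bytea column via substring(), in binary mode.
// Offsets are 0-based on the C++ side and converted to 1-based for SQL.
#include <libpq-fe.h>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<char> read_blob_range(PGconn* conn, int64_t blob_id, int32_t offset, int32_t count) {
    std::string id_s = std::to_string(blob_id);
    std::string from_s = std::to_string(offset + 1);   // SQL substring() is 1-based
    std::string for_s = std::to_string(count);
    const char* values[3] = { id_s.c_str(), from_s.c_str(), for_s.c_str() };
    PGresult* res = PQexecParams(conn,
        "select substring(data from $2::int4 for $3::int4) from my_blobs where id = $1::int8",
        3, nullptr, values, nullptr, nullptr, /*binary result*/1);
    if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) != 1) {
        PQclear(res);
        throw std::runtime_error(PQerrorMessage(conn));
    }
    const char* p = PQgetvalue(res, 0, 0);
    std::vector<char> out(p, p + PQgetlength(res, 0, 0));
    PQclear(res);
    return out;
}
```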
Re: How to store "blobs" efficiently for small and large sizes, with random access
On Wed, Oct 19, 2022 at 4:29 PM Daniel Verite wrote:
> In your case I would probably opt for bytea (as opposed to large
> objects), and slicing the blobs in the application in chunks

Thanks for the advice, and the valuable info on LO permissions. --DD
Re: How to store "blobs" efficiently for small and large sizes, with random access
On Wed, Oct 19, 2022 at 5:30 PM Alvaro Herrera wrote:
> That's exactly what I was trying to say.
> If there's no compression, we don't read prior chunks.

Great to read that. I'll probably try to benchmark with and without compression eventually. I need to deal with other issues first, so it will take a while to report back on this.

> This can be seen in detoast_attr_slice()

Thank you for the pointer. I'll be sure to have a look at that code. And thanks again for chiming in with very useful info, Alvaro.
Re: How to store "blobs" efficiently for small and large sizes, with random access
On Wed, Oct 19, 2022 at 5:05 PM Laurenz Albe wrote:
> On Wed, 2022-10-19 at 12:48 +0200, Dominique Devienne wrote:
> > On Wed, Oct 19, 2022 at 12:17 PM Andreas Joseph Krogh wrote:
> > > First advice, don't do it. We started off storing blobs in DB for “TX safety”
> > Not really an option, I'm afraid.
>
> You should reconsider. Ruling out that option now might get you into trouble
> later. Large Objects mean trouble.

Andreas, Ericson, Laurenz, thanks for the advice. I'll be sure to discuss these concerns with the team. We have other (bigger) data in the file system, albeit of a more read-only nature perhaps. And this is an area where I'm not familiar with how security is handled, so I'll investigate whether a path forward to externalize the largish blobs (currently destined to live in the DB) is possible. So I hope you can see I'm not dismissing what you guys are saying.

But before I finish with this thread for now, I'd like to add that I consider it unfortunate that NOT putting the data in the DB is the mostly agreed-upon advice. It IMHO points to a weak point of PostgreSQL, which does not invest in those use cases with large data, perhaps with more file-system-like techniques. Probably because most of the large users of PostgreSQL are more on the "business" side (lots of rows, but on the smaller side) than the "scientific" side, which (too often) uses files and file-in-a-file formats like HDF5.

FWIW, when Oracle introduced SecureFile blobs years ago in v11, they represented a leap forward in performance, and back then we were seeing them being 3x faster than LO at GB sizes, if I recall correctly, with throughput that challenged regular networked file systems like NFS. That was over 10 years ago, so who knows where we are now. And from the posts here, the issues with large blobs may be more related to backup/restore, perhaps, than runtime performance.

Having all the data in the DB, under a single security model, is a big win for consistency and simplicity. And the fact that it's not really possible now is a pity, in my mind. My (probably uninformed) opinion is that large blobs are handled just like other relational data, in paged storage designed for smaller data. I.e. file-like blobs are shoehorned into structures which are inappropriate for them, and a rethink and redesign is necessary specifically for them, similar to the Oracle SecureFile one of old.

I have similar gripes with SQLite, which is otherwise a fantastic embedded DB. Just see how the SQLite-based Fossil-SCM fails to scale for very large repos with big (e.g. game) assets, and how it similarly failed to scale in SVN a long time ago, to be replaced by a forest-of-files (which Git also uses). DBs like PostgreSQL and SQLite should be better at this. And I hope they get there eventually.

Sorry to turn a bit philosophical here. It's not a critique per se, more the personal musings of a dev in this space for a long time. FWIW. Thanks, --DD
Re: How to store "blobs" efficiently for small and large sizes, with random access
On Thu, Oct 20, 2022 at 12:21 PM Laurenz Albe wrote:
> On Thu, 2022-10-20 at 10:32 +0200, Dominique Devienne wrote:
> > FWIW, when Oracle introduced SecureFile blobs years ago in v11, it
> > represented a leap forward in performance, and back then we were seeing
> > them being 3x faster than LO at GB sizes, if I recall correctly,
>
> I don't know what exactly they are, but I suspect that they are just
> files (segments?) in Oracle's "file system" (tablespaces/datafiles).
> So pretty much what we recommend.

I don't think so :) Pretty much the opposite, in fact. Don't let the term SecureFile fool you. From the DB client's perspective, they are a server-side value like any other, living in a row/col (relation/tuple), pretty much like a BYTEA value. But unlike BYTEA, and like LO, what you SELECT or UPDATE is a LOB locator, not the content itself, which is then used in separate APIs, so very much like LO. So you get the best of both BYTEA (acts like a value, in a tab/col) and LO (random access). Fully transactional.

In fact, the LOB locator is a handle into the MVCC machinery, so you can get the locator(s) within a transaction, commit that transaction, and later, if you read from the locator, you are still AS OF that transaction (i.e. read-consistent). That's super handy for lazy-loading the data in the app on demand. You of course expose yourself to "snapshot too old". We missed that dearly in PostgreSQL. We could emulate it with an explicit SNAPSHOT left open, but that's not nearly as convenient.

And there are specialized APIs that allow operating on *multiple* LOBs in a single server round-trip, which was essential for performance with smaller ones. In Oracle, anything above 32KB had to be a LOB (at the time), yet you don't want to do a round-trip for each and every 32KB chunk of data when you have thousands to load (unless you shard yourself "manually", but good luck matching the perf of SecureFile LOBs).

I'm not privy to how they were implemented internally. But I do believe the old blobs they were replacing (at the time) were more like LO is, i.e. handled (mostly) like the other datatypes, in the table infrastructure; while these new (in v11) blobs were handled internally completely differently, more in a file-system manner. Thus the large performance gap between the OLD and NEW Oracle LOBs. But from the outside, that's an "implementation detail". They were in the DB, transactional, value-like (modulo the level of indirection for random access), and importantly, efficient. I really wish PostgreSQL had an equivalent.

There's apparently an Oracle ACE on this list, so they can fill in the gaps above, or correct any falsehoods I wrote. This is over 10 years old for me; back then I was a big fan of another ACE, Tom Kyte, whose books helped me a lot, and I was neck-deep in OCI for a few years, but I was just a mostly self-taught Oracle developer, so definitely not an expert like an ACE. FWIW, --DD
Re: Getting PSQL in Windows to support TAB/Autocomplete via modified readline...
On Mon, Nov 21, 2022 at 4:58 PM Tom Lane wrote:
> Kirk Wolak writes:
> > We have our team of Windows developers, leveraging PSQL. But honestly,
> > it feels crippled after using PSQL in Linux for any length of time. Losing
> > auto-complete sucks (multi-line query recall/editing is lost as well).
> >
> > In researching this problem, it appears that the decision was made like
> > 17yrs ago, when windows did not have a realistic "terminal" type
> > interface. Assuming we target Windows 8.1 or higher, I believe this goes
> > away.
>
> It'd certainly be nice if we could use Readline on Windows. I do not
> think we want to buy into maintaining our own fork of Readline, if that's
> what you're trying to suggest. If it "just works" now, that'd be great.
> Otherwise, maybe you should be speaking to the Readline maintainers
> about what changes are needed in it?

FWIW, I've been using https://github.com/arangodb/linenoise-ng for Linux and Windows, but that's C++ and no longer supported (it worked well enough for me, though). The main https://github.com/antirez/linenoise itself does not build as-is on Windows still, I believe. --DD
Re: Getting PSQL in Windows to support TAB/Autocomplete via modified readline...
On Mon, Nov 21, 2022 at 6:12 PM Kirk Wolak wrote:
> On Mon, Nov 21, 2022 at 11:01 AM Dominique Devienne wrote:
> > FWIW, I've been using https://github.com/arangodb/linenoise-ng for Linux
> > and Windows,
>
> I've looked at linenoise and without tab/autocomplete it's not worth the
> effort, IMO.

Not sure I understand... Linenoise* has completion hooks. Completion is something done in the app, not in readline/linenoise; only the app has the context to know what to complete the text with. But perhaps I'm missing something? --DD

https://github.com/arangodb/linenoise-ng/blob/master/include/linenoise.h
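What I mean by completion hooks, roughly (this is the upstream linenoise C API as I remember it; linenoise-ng kept it largely source-compatible, but double-check against the header linked above):

```
// Sketch of app-provided completion with the linenoise API:
// the library only calls back into the application, which decides what to offer.
#include "linenoise.h"
#include <cstdlib>
#include <cstring>

static void psql_like_completion(const char* buf, linenoiseCompletions* lc) {
    if (std::strncmp(buf, "se", 2) == 0) {
        linenoiseAddCompletion(lc, "select ");
    }
    // ... a real client would consult its own keyword/catalog knowledge here
}

int main() {
    linenoiseSetCompletionCallback(psql_like_completion);
    while (char* line = linenoise("db> ")) {
        // handle the line ...
        std::free(line);
    }
}
```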
Re: how to implement add using upsert and trigger?
On Mon, Nov 28, 2022 at 1:37 PM yin@163.com wrote:
> on conflict(itemid) do update
> set value = excluded.value + new.value, cnt = excluded.cnt + 1
> where excluded.itemid = new.itemid;

OT, but isn't `where excluded.itemid = new.itemid` redundant, given `on conflict(itemid)`? I'm asking more because I'm not sure, for my own education. Thanks, --DD
Re: Stored procedure code no longer stored in v14 and v15, changed behaviour
On Thu, Dec 1, 2022 at 4:23 PM Tom Lane wrote:
> "David G. Johnston" writes:
> > On Thu, Dec 1, 2022 at 7:59 AM Martijn Tonies (Upscene Productions) wrote:
> >> Can you modify the server code to store the original body in proc.prosrc
> >> again? It would be very helpful.
> >
> > I seem to recall that this option had been discussed and rejected when this
> > feature went in. The parsed body is a feature because its contents can be
> > updated due to, e.g., renaming of objects. The text source of the original
> > command would be unable to be updated in the same way and so it is possible
> > the text and the parsed contents would diverge over time, which is a
> > situation we do not want to have.
>
> Indeed. We used to have a similar situation with respect to column
> default expressions and CHECK constraint expressions. Eventually we got
> rid of the textual storage of both, because it couldn't be maintained
> in a reasonable way.
>
> I think the answer here is "don't use the new syntax if you want the
> function body stored textually". You can have one set of benefits,
> or the other set, but not both at once.

FWIW, we have a custom ad-hoc schema introspection and diff'ing framework, and the fact that the original SQL is not preserved as-is has also created issues for us. On Oracle, our SQL was preserved as-is, so it could be compared reliably. While on PostgreSQL, some names-in-SQL are rewritten, the text is reformatted, etc., so this byte-perfect comparison is no longer possible, and we must rely on heuristics (a.k.a. hacks...) to do a fuzzy compare (since we lack a real PG-compatible SQL parser to do an infoset comparison instead, at the AST level for example).

So it's not just a matter of browsing the schema. For us, it's a *functional* issue. --DD
Re: Stored procedure code no longer stored in v14 and v15, changed behaviour
On Thu, Dec 1, 2022 at 5:10 PM Laurenz Albe wrote:
> On Thu, 2022-12-01 at 16:38 +0100, Dominique Devienne wrote:
> > FWIW, we have a custom ad-hoc schema introspection and diff'ing framework,
>
> This is arguable, but my opinion is that this is not a robust way to
> do development. You should use a schema versioning tool like Liquibase,
> develop schema migration scripts and maintain the SQL code in a source
> repository like other software.

We don't maintain SQL. We maintain a *logical* model, and generate the physical model from it. FKs, NKs, enums, CHECK constraints, indexes, etc.: all those "implementation details" are programmatically generated, and always consistent, from a much higher-level and simpler model. And you also get auto-upgrade most of the time, greatly increasing development velocity too. I would argue that NOT doing it this way is the non-robust way, myself :)

We've been refactoring a large data model maintained manually, as you advocate, and I can't tell you how many anomalies we've discovered and had to fix, using the more robust formalism of a high-level logical model and (DDL) code generation.

I guess it's a DBA-versus-developer point-of-view difference. --DD
Re: Stored procedure code no longer stored in v14 and v15, changed behaviour
On Thu, Dec 1, 2022 at 6:41 PM Adrian Klaver wrote:
> On 12/1/22 09:24, Dominique Devienne wrote:
> > I guess it's a DBA-versus-developer point-of-view difference. --DD
>
> What this points to is that there are multiple ways to handle this, many
> external to the server itself. My take is that the system catalogs are
> there for the proper operation of the server and that is their task,
> first and foremost. If you can piggyback off that then great, but with
> the knowledge that the information may change to meet the needs of the
> server, not external users.

I see. Still, Oracle preserves SQL as-is. SQLite preserves SQL as-is. It would be nice if PostgreSQL did too. That's all I'm saying.

Having functionality in libpq that allows doing the same kind of SQL normalization / rewriting done server-side would help. Then I could use that to generate the DDL "just right" the first time.

For now, our work-around is a two-step process. We first generate, at build time, the DDL using "our syntax" and instantiate the schema. Then we introspect that and re-generate code with the "rewritten syntax". Subsequent generation (it's dynamic, at runtime) uses the re-generated code that matches the syntax rewrite, so the introspection and diff'ing now match the in-memory DDL. Still, that's a PITA.

I of course don't contest that PostgreSQL maintains what it needs. But ALSO maintaining the original, at least until a rewrite is necessary on renames, would go A LONG WAY toward satisfying the OP and myself in our use cases. FWIW. --DD
Re: Stored procedure code no longer stored in v14 and v15, changed behaviour
On Thu, Dec 1, 2022 at 8:09 PM Christophe Pettus wrote:
> > On Dec 1, 2022, at 11:05, Dominique Devienne wrote:
> > I see. Still, Oracle preserves SQL as-is. SQLite preserves SQL as-is.
> > It would be nice if PostgreSQL did too. That's all I'm saying.
>
> Since this is a custom-built system, there is nothing keeping you from
> creating your own table in the database that stores the original text of the
> function.

That's not the point. If a DBA updates one of our triggers or procs or whatever else, the recorded info in a custom table won't be affected. We are diff'ing the server-side schema against the expected in-memory model of the physical model. Thus the dictionaries are the only source of truth we can trust for the current state of the schema. And besides minor syntactic differences, and some more troublesome object-name rewrites, this is exactly what we want. The system itself needs to preserve the original DDL, IMHO. --DD
Re: Stored procedure code no longer stored in v14 and v15, changed behaviour
On Fri, Dec 2, 2022 at 1:37 PM Laurenz Albe wrote: > Great; then go ahead and use those databases, if it is important for you. Now come on. We all love PostgreSQL. But that doesn't mean we can't disagree on some decisions. Especially when you are a USER of PostgreSQL, not a DEV of it, and it's too late by the time you are even aware of the changes. Not everyone can chime in on the dev-list when those are discussed. From a user point of view, it can also be seen as a "regression", to some extent, when an observable property of the system changes in a different / incompatible way. I'm not saying it is, but still, it is a change one discovers too late, that creates pain for some, and that is worth both reporting and discussing. Given the tone of the discussion here though, I don't think there's much to hope for a middle ground... > In the same vein, I don't think any of those databases have ... bloom indexes. SQLite has bloom filters now :) (not persistent indexes)
Re: Stored procedure code no longer stored in v14 and v15, changed behaviour
On Thu, Dec 1, 2022 at 8:51 PM Tom Lane wrote: > Do you really fail to see the contradictions in this? You want the > database to preserve the original DDL, but you also want it to update > in response to subsequent alterations. You can't have both those Hi. I probably didn't express myself correctly. I don't think there's a contradiction. I originally wrote: "maintaining the original, at least until a re-write is necessary on renames". By that I meant that the SQL would be preserved as-is, *initially*. But that if/when a rename affecting that SQL happens, then it's fair game to re-write it. Because then the diff between my in-memory code-generated DDL and the server-side DDL is no longer a false positive, as it is now from the "pre-emptive" re-write. What is causing me pain is the fact that the re-write of the SQL is *eager* instead of *lazy*. I.e. I'm paying for the rewrite even when it's not strictly necessary (from my POV at least). I hope that makes more sense. Thanks, --DD
Re: Tools for moving normalized data around
On Wed, Jan 18, 2023 at 9:03 PM Jeremy Smith wrote: >> How to do this with two tables connected via one foreign key, that is >> explained a dozen times in Stackoverflow. But what if the tree is 50 >> tables and 120 foreign key columns? >> It can be done. But probably not manually. >> >> So, as this seems a very usual use-case for normalized data, is there >> any tooling available? Anywhere? (I searched, I didn't find.) > > I haven't used it, but this seems like the problem that Jailer is trying to > solve: https://github.com/Wisser/Jailer Seems to do the subsetting alright, but it doesn't mention conflict-resolution for inserting into an already populated target schema. We've done something similar, merging many same-schema DBs into a single local SQLite DB (with conflicts, thus no constraints), iteratively resolving the conflicts (SK/PK and NK) per-table in topological order, propagating PK changes to FKs in child tables later. Then load the result into a fully-constrained PostgreSQL DB. Clever'er minds can probably do it all in SQL, but we did it with a combination of imperative code and SQL. Not exactly your use case Peter, but close enough I think. I don't think a tool could have done what we did; it's too ad-hoc and specific to our use case. Took a while, and required lots of testing (unit tests, and functional QA tests). FWIW. --DD
Re: Sequence vs UUID
On Mon, Jan 30, 2023 at 5:11 PM veem v wrote: > CREATE TABLE test1_UUID ( id bigint,source_id varchar(36) PRIMARY KEY, Name > varchar(20) ); Maybe if you used a "native" 16-byte uuid, instead of its textual serialization with dashes (36 bytes + length overhead), the gap would narrow.
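For concreteness, a minimal sketch of what I mean (mirroring the quoted example; the table name is hypothetical):

```
CREATE TABLE test1_uuid_native (
    id        bigint,
    source_id uuid PRIMARY KEY,  -- 16 bytes, vs. varchar(36) + length overhead
    name      varchar(20)
);
```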
Fwd: Sequence vs UUID
Copying the list... -- Forwarded message - From: Dominique Devienne Date: Fri, Feb 3, 2023 at 4:57 PM Subject: Re: Sequence vs UUID To: veem v On Thu, Feb 2, 2023 at 8:47 PM veem v wrote: > Tested the UUIDv7 generator for postgres as below. > With regards to performance , It's still way behind the sequence. [...] > explain analyze select count(nextval('myseq') ) from > generate_series(1,100000); > Execution Time: 59.687 ms > > explain analyze select count(gen_random_uuid()) from > generate_series(1,100000); > Execution Time: 904.868 ms > > explain analyze select count(uuid_generate_v7()) from > generate_series(1,100000); > Execution Time: 1711.187 ms > Something's off regarding Guid generation IMHO... You generate 100K Guids in ~1s. While we generate (in C++, Windows Release, using Boost) 16M of them in +/- the same time:

Enabling Performance tests
generate 16'000'000 guids in 0.980s (user: 0.984s) 12 MB
generate 16'000'000 guids in parallel on 4 CPUs in 0.309s (user: 1.188s) 12 MB

That's 2 orders of magnitude faster. Sure there's some overhead from the SQL, but still. Something seems fishy. And that's on a 2.5y old desktop. --DD
Re: Sequence vs UUID
On Fri, Feb 3, 2023 at 5:48 PM veem v wrote: > Actually I did the testing by connecting to "https://dbfiddle.uk/", > postgres version 15. > > PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 > 20210514 (Red Hat 8.5.0-10), 64-bit > > Am I doing it wrong, please confirm? > > No clue. But even SQLite can generate 1M uuids (16-byte random blobs being equivalent) in 1/3 of the time, so 30x faster. So your timings of generating 100K uuids and counting them seem way too slow to me. --DD

sqlite> select count(randomblob(16)) from generate_series(1,1000*1000);
QUERY PLAN
`--SCAN generate_series VIRTUAL TABLE INDEX 3:
[VDBE bytecode listing elided]
┌───────────────────────┐
│ count(randomblob(16)) │
├───────────────────────┤
│ 1000000               │
└───────────────────────┘
Run Time: real 0.278 user 0.250000 sys 0.00

> On Fri, 3 Feb 2023 at 21:28, Dominique Devienne > wrote: > >> Something's off regarding Guid generation IMHO... >> You generate 100K Guids in ~1s. While we generate (in C++, Windows >> Release, using Boost) 16M of them in +/- the same time: >> >
Re: Sequence vs UUID
On Tue, Feb 7, 2023 at 3:47 PM Merlin Moncure wrote: > On Mon, Feb 6, 2023 at 1:22 PM Peter J. Holzer wrote: > >> On 2023-02-06 20:04:39 +0100, Julian Backes wrote: >> But UUIDs are random and that plays havoc with locality. > > > This is really key. [...] the databases I've seen that are written with > the > UUID pattern appear to be written by developers oblivious to this fact. > Well, perhaps these developers are not dealing with temporally clustered data like a commerce-related DB, but rather with more scientific data? In any case, this developer will definitely investigate ULIDs vs UUIDs vs Ints (sequences), based on the info from this thread. I'm aware of fragmentation issues, and cache hit/miss issues etc, in general; but was probably not sufficiently making the mental connection with UUIDs and PostgreSQL. So thanks everyone.
Using PostgreSQL for service discovery and health-check
Hi. I'm requesting advice on something I'm about to start on. In a normally 2-tier system, where "desktop" native apps connect directly to PostgreSQL to access data, some things must be mediated via a middle-tier service (i.e. 3-tier architecture). That service is HTTP based thus clients (the "desktop" native apps) must know the URL (host+port+path) of a server providing the requested service. Since clients must already have access to PostgreSQL to operate, I wanted to add a table in PostgreSQL for services, and server(s) on startup (probably using a random port) would register in that table, and deregister on shutdown. Also, since crashes are a fact of life, the server would regularly update the DB with a "heartbeat" on some frequency (e.g. 1s?), so clients would select (or see) only registered services with a "fresh enough" heartbeat timestamp. That's basically my plan. Now come the questions: 1) will updating a row every second (for example) create issues? 2) if yes to #1 above, what would be good mitigation tactics? Use different table for service vs heartbeat? Special kind of table? or configured in a particular way? 3) if a service crashes, it won't remove its row(s), obviously. What kind of mechanism exists to "reap" "zombie" services? 4) Related to #3 above, I think built-in "cron"-like services are only available via extensions, not in PostgreSQL proper. Why? Seems like such an essential service. 5) Which cron-like extension to use? Especially since we run both on-prem but also in managed-PostgreSQL on the cloud? I'd appreciate community input. Thanks, --DD PS: Note that there could be more than 1 server registered, for the same service, possibly on the same machine, for redundancy. But I think that's mostly orthogonal to my questions above.
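For concreteness, a minimal sketch of the registration table and queries I have in mind (all names, the heartbeat frequency, and the freshness threshold are hypothetical):

```
CREATE TABLE service_registry (
    service_name text NOT NULL,
    host         text NOT NULL,
    port         int  NOT NULL,
    started_at   timestamptz NOT NULL DEFAULT now(),
    heartbeat_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (service_name, host, port)
);

-- Heartbeat, run by each server every second or so:
UPDATE service_registry
   SET heartbeat_at = now()
 WHERE service_name = 'acme-svc' AND host = 'srv1' AND port = 54321;

-- Client-side lookup, considering only "fresh enough" registrations:
SELECT host, port
  FROM service_registry
 WHERE service_name = 'acme-svc'
   AND heartbeat_at > now() - interval '5 seconds';
```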
WHERE col = ANY($1) extended to 2 or more columns?
Hi. We are implementing an API which takes a list of row keys, and must return info about those rows. To implement that efficiently, in as few round-trips as possible, we bind a (binary) array of keys (ints, uuids, or strings) and that works great, but only if the key is a scalar one. Now we'd like to do the same for composite keys, and I don't know how to do that. Is it possible? Could someone please help out or demo such a thing? We are doing it in C++ using libpq, but a pure SQL or PL/pgSQL demo would still help (I think). Thanks, --DD
Re: Using PostgreSQL for service discovery and health-check
On Thu, Feb 9, 2023 at 4:46 PM David G. Johnston wrote: > On Thu, Feb 9, 2023 at 8:30 AM Dominique Devienne > wrote: > >> That's basically my plan. Now come the questions: >> 1) will updating a row every second (for example) create issues? >> 2) if yes to #1 above, what would be good mitigation tactics? Use >> different table for service vs heartbeat? Special kind of table? or >> configured in a particular way? >> 3) if a service crashes, it won't remove its row(s), obviously. What kind >> of mechanism exists to "reap" "zombie" services? >> 4) Related to #3 above, I think built-in "cron"-like services are only >> available via extensions, not in PostgreSQL proper. Why? Seems like such an >> essential service. >> 5) Which cron-like extension to use? Especially since we run both on-prem >> but also in managed-PostgreSQL on the cloud? >> >> > You can probably get good mileage from CREATE UNLOGGED TABLE. > Thanks. Although I guess the fact it's not replicated to standby servers could be a problem? That's not something we test now, and also something the on-prem DBA (i.e. a client of ours) might want to set up on his/her own, on top of our client/server arch I guess. I have no experience with standbys (replication) and HA in PostgreSQL. Would having the main service table be a regular one, and the service_heartbeat be an unlogged one, be replication friendly? I.e. if failover to the standby happens, the service table is still there and populated, but the service_heartbeat is empty, but then the services would start populating it "transparently", no? I.e. would using 2 tables instead of 1 be a better design? A sketch of that split is below.
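A hedged sketch of that two-table split (names hypothetical): the durable registration is a regular, replicated table, while the high-churn heartbeat is unlogged. Note that an unlogged table may reference a permanent one, but not the other way around.

```
CREATE TABLE service (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL,
    host text NOT NULL,
    port int  NOT NULL,
    UNIQUE (name, host, port)
);

CREATE UNLOGGED TABLE service_heartbeat (
    service_id   bigint PRIMARY KEY REFERENCES service(id) ON DELETE CASCADE,
    heartbeat_at timestamptz NOT NULL DEFAULT now()
);
```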
Re: WHERE col = ANY($1) extended to 2 or more columns?
On Thu, Feb 9, 2023 at 4:51 PM David G. Johnston wrote: > On Thu, Feb 9, 2023 at 8:41 AM Dominique Devienne > wrote: > >> Now we'd like to do the same for composite keys, and I don't know how to >> do that. >> > > An array-of-composites is simply: > SELECT ARRAY[ ROW(1, 'one'), ROW(2, 'two') ]::composite_type[]; > Thanks. I don't consider that "simple" myself :). But I'm definitely not an advanced PostgreSQL user! Would still appreciate a more fleshed out demo, if anyone is kind enough to provide it. Thanks, --DD
Re: WHERE col = ANY($1) extended to 2 or more columns?
On Thu, Feb 9, 2023 at 5:03 PM Dominique Devienne wrote: > On Thu, Feb 9, 2023 at 4:51 PM David G. Johnston < > david.g.johns...@gmail.com> wrote: > >> On Thu, Feb 9, 2023 at 8:41 AM Dominique Devienne >> wrote: > >>> Now we'd like to do the same for composite keys, and I don't know how to >>> do that. >>> >> >> An array-of-composites is simply: >> SELECT ARRAY[ ROW(1, 'one'), ROW(2, 'two') ]::composite_type[]; >> > > Thanks. I don't consider that "simple" myself :). But I'm definitely not > an advanced PostgreSQL user! > Would still appreciate a more fleshed out demo, if anyone is kind enough > to provide it. Thanks, --DD > Also, I'm still not sure how to write that WHERE clause, with the $1 being an array of a composite type. And since this is a binary bind, what kind of OIDs to use. Your example seems to generate a type on-the-fly, David. Will we need to create custom types just so we have OIDs we can use to assemble the bytes of the array-of-composite bind? I believe there's an implicit ROW type created per table. Are there also implicit types for composite PKs and/or UNIQUE constraints? Lots of questions...
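As a hedged sketch of what I think such a query could look like, using an explicitly created composite type so the array has a concrete OID to bind against (table, type, and column names are hypothetical; AFAIK there is only an implicit row type per table, not one per PK or UNIQUE constraint):

```
CREATE TABLE tab (k1 int, k2 text, info text, PRIMARY KEY (k1, k2));
CREATE TYPE tab_key AS (k1 int, k2 text);

-- $1 would be bound (in binary) as a tab_key[]; spelled out literally here:
SELECT t.*
  FROM tab t
 WHERE ROW(t.k1, t.k2)::tab_key = ANY (ARRAY[ROW(1, 'one'), ROW(2, 'two')]::tab_key[]);
```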
Re: Using PostgreSQL for service discovery and health-check
On Thu, Feb 9, 2023 at 5:05 PM Adrian Klaver wrote: > On 2/9/23 07:30, Dominique Devienne wrote: > > In a normally 2-tier system, where "desktop" native apps connect > > directly to PostgreSQL to access data, some things must be mediated via > > a middle-tier service (i.e. 3-tier architecture). That service is HTTP > > based thus clients (the "desktop" native apps) must know the URL > > (host+port+path) of a server providing the requested service. > > Is there more then one server providing the same service? > Yes. That was my PS: basically. The client can connect to any one, randomly. We need at least one of course. But there could be more than 1, yes. > > Since clients must already have access to PostgreSQL to operate, I > > wanted to add a table in PostgreSQL for services, and server(s) on > > startup (probably using a random port) would register in that table, and > > deregister on shutdown. Also, since crashes are a fact of life, the > > server would regularly update the DB with a "heartbeat" on some > > frequency (e.g. 1s?), so clients would select (or see) only registered > > services with a "fresh enough" heartbeat timestamp. > > Would it no be easier to not have random ports and just attempt > connections to the servers either: > 1) In the client with reattempt to different port on failure. > 2) From Postgres server and update table to have current up servers. > I'm sorry, but I'm not following. Can you perhaps rephrase? Regarding ports, once you have registration of services, it just seems easier to me to NOT have a fixed port, and let the host assign any port to the HTTP server. Those servers are not user-facing directly; from the client side, it calls an API, and the lookup of the service and the connection to the HTTP server are transparent implementation details, so the port used doesn't matter. In-DB registration of (HTTP) servers makes the whole URL an implementation detail.
Re: WHERE col = ANY($1) extended to 2 or more columns?
On Thu, Feb 9, 2023 at 5:37 PM David G. Johnston wrote: > On Thu, Feb 9, 2023 at 9:28 AM Alban Hertroys wrote: > >> > On 9 Feb 2023, at 16:41, Dominique Devienne >> wrote: >> > Now we'd like to do the same for composite keys, and I don't know how >> to do that. >> >> This works: >> => select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text), (1, >> 'one'::text), (2, 'two'::text)); >> > But you cannot write the right-side of the IN as a single parameter which > seems to be the primary constraint trying to be conformed to. > Right. The goal is to (re)use a prepared statement (i.e. plan once), and bind the RHS (binary) array and do a single exec (single round-trip) to get the matching rows. AFAIK, this is the fastest way. If there's a better/faster way, I'm interested. --DD
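For the scalar-key case, the SQL-level shape of that pattern is something like this hedged sketch (table and column names hypothetical); in libpq it maps to PQprepare plus PQexecPrepared with a binary array parameter:

```
CREATE TABLE tab (id int PRIMARY KEY, info text);

PREPARE fetch_by_ids(int[]) AS
    SELECT * FROM tab WHERE id = ANY ($1);

EXECUTE fetch_by_ids(ARRAY[1, 2, 3]);
```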
Re: Using PostgreSQL for service discovery and health-check
On Thu, Feb 9, 2023 at 5:51 PM Adrian Klaver wrote: > On 2/9/23 08:16, Dominique Devienne wrote: > > On Thu, Feb 9, 2023 at 5:05 PM Adrian Klaver wrote: > The flip side of that is that with known ports it would be easier to > have a process on the Postgres machine or in the database that checks > the ports on a regular basis. And as part of that process mark any non > responding ports as inactive. That would solve the zombie problem. > That's one possibility. But the "reaper" process could just as well scan the service table, and probe those too. So again, I'm not sure what the fixed-port approach gains me, besides perhaps the reaper not having to connect to PostgreSQL itself. I'm OK with connecting. Thanks for your input. Always good to have one's arguments challenged by experts.
Losing my latin on Ordering...
Hi. Porting a unit test to PostgreSQL, we got a failure related to ordering. We've distilled it to the below. The DB is en_US.UTF-8, and the sorting we get does not make sense to me. The same prefix can be sorted differently based on the suffix apparently, which doesn't make any sense to me. Surely sorting should be "constant left-to-right", no? What are we missing? I'm already surprised (star) comes before (space), when the latter "comes before" the former in both ASCII and UTF-8, but that the two "Foo*" and "Foo " prefixed pairs are not clustered after sorting is just mystifying to me. So how come? For now we can work around this by explicitly adding the `collate "C"` on the queries underlying that particular test, but that would be wrong in the general case of international strings to sort, so I'd really like to understand what's going on. Thanks, --DD PS: if I try "en_US.UTF-8" or "en_US"."UTF-8" for the collate, it fails. How come what pg_database.datcollate displays is not a valid value for collate? PPS: We tried on v12 and v13 I believe. Somehow my v14.2 on Windows doesn't have en_US as a collation...

ddevienne=> with t(v) as (values ('Foo All'), ('Foo*'), ('Foo Brief'), ('Foo*All'))
            select '<'||v||'>', v::bytea from t order by v collate "en_US";
  ?column?   |          v
-------------+----------------------
 <Foo*>      | \x466f6f2a
 <Foo All>   | \x466f6f20416c6c
 <Foo*All>   | \x466f6f2a416c6c
 <Foo Brief> | \x466f6f204272696566
(4 rows)

ddevienne=> with t(v) as (values ('Foo All'), ('Foo*'), ('Foo Brief'), ('Foo*All'))
            select '<'||v||'>', v::bytea from t order by v collate "C";
  ?column?   |          v
-------------+----------------------
 <Foo All>   | \x466f6f20416c6c
 <Foo Brief> | \x466f6f204272696566
 <Foo*>      | \x466f6f2a
 <Foo*All>   | \x466f6f2a416c6c
(4 rows)
Re: Losing my latin on Ordering...
On Tue, Feb 14, 2023 at 11:23 AM Laurenz Albe wrote: > On Tue, 2023-02-14 at 10:31 +0100, Dominique Devienne wrote: > > Hi. Porting a unit test to PostgreSQL, we got a failure related to > ordering. > > > > We've distilled it to the below. The DB is en_US.UTF-8, and the sorting > we get > > does not make sense to me. The same prefix can be sorted differently > based on > > the suffix apparently, which doesn't make any sense to me. > > > > Surely sorting should be "constant left-to-right", no? What are we > missing? > > No, it isn't. That's not how natural language collations work. > Honestly, who expects the same prefix to sort differently based on what comes after, in left-to-right languages? How does one even find out what the (capricious?) rules for sorting in a given collation are? I'm aware of natural sort orders where numbers embedded in the text influence the sort, so "Foo10" comes after "Foo9", but that's not what we are discussing here. "Foo*" and "Foo " have no logical relationship, like 9 and 10 do. > > I'm already surprised (star) comes before (space), when the latter "comes > > before" the former in both ASCII and UTF-8, but that the two "Foo*" and > "Foo " > > prefixed pairs are not clustered after sorting is just mystifying to me. > So how come? > > Because they compare identical on the first three levels. Any difference > in > letters, accents or case weighs stronger, even if it occurs to the right > of these substrings. > That's completely unintuitive... > > For now we can work around this by explicitly adding the `collate "C"` on > > the queries underlying that particular test, but that would be wrong in > the > > general case of international strings to sort, so I'd really like to > understand > > what's going on. > > Yes, it soulds like the "C" collation may be best for you. That is, if > you don't > mind that "Z" < "a". > I would mind if I asked for case-insensitive comparisons. So the "C" collation is fine with general UTF-8 encoding? I.e. it will be codepoint-ordered, OK?
Re: Losing my latin on Ordering...
On Tue, Feb 14, 2023 at 12:35 PM Alvaro Herrera wrote: > On 2023-Feb-14, Dominique Devienne wrote: > > Honestly, who expects the same prefix to sort differently based on what > > comes after, in left-to-right languages? > Look, we don't define the collation rules. > Ok, ok, sorry. To you, Laurenz, and everyone. I obviously disagree with these rules, but I'm a nobody, so who cares :) > > So the "C" collation is fine with general UTF-8 encoding? > > I.e. it will be codepoint ordered OK? > > Sure, just make sure to use the definition of C that uses UTF-8 encoding > (I think it's typically called C.UTF-8). > OK, so for new DBs, sounds like we need to CREATE DATABASE ... WITH LOCALE 'C.UTF-8' ENCODING UTF8 Correct? But what about existing DBs? Can the collation be changed a posteriori? ALTER DATABASE does not seem to support the same options. We don't want to have to sprinkle COLLATE "C" all over the place in the code. And there are quite a few DBs out there already. What to do about them?
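For the record, a hedged sketch of the CREATE DATABASE incantation I have in mind (database name hypothetical); note that specifying a locale different from template1's typically requires basing the new DB on template0, and that C.UTF-8 must be provided by the OS:

```
CREATE DATABASE mydb
    TEMPLATE template0
    ENCODING 'UTF8'
    LC_COLLATE 'C.UTF-8'
    LC_CTYPE   'C.UTF-8';
```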
DELETE trigger, direct or indirect?
Hi. This is a bit unusual. We have a foreign key between two tables, with ON DELETE CASCADE, to preserve referential integrity. But we apparently also need to preserve the severed reference (by natural key, i.e. its name), to later on reconnect the two entities after-the-fact, should the parent row re-appear later on (in the same transaction or not it still unclear). To achieve this weird requirement, I'd like to know if it is possible in an ON DELETE trigger to know whether the deletion is coming from a direct-DELETE in the "child table", or whether the deletion is coming from the "parent table" CASCADEd to the child table. Thanks, --DD
Re: DELETE trigger, direct or indirect?
On Thu, Feb 16, 2023 at 5:59 PM Adrian Klaver wrote: > On 2/16/23 08:55, David G. Johnston wrote: > > On Thu, Feb 16, 2023 at 9:46 AM Adrian Klaver > <mailto:adrian.kla...@aklaver.com>> wrote: > > > > On 2/16/23 05:23, Dominique Devienne wrote: > > > Hi. This is a bit unusual. We have a foreign key between two > tables, > > > with ON DELETE CASCADE, to preserve referential integrity. But we > > > apparently also need to preserve the severed reference (by > > natural key, > > > i.e. its name), to later on reconnect the two entities > > after-the-fact, > > > should the parent row re-appear later on (in the same transaction > > or not > > > it still unclear). > > > > This is going to need a more detailed description of the relationship > > between the two tables: > > > > 1) The actual FK relationship. > > > > 2) What "...preserve the severed reference (by natural key, i.e. its > > name)" means? > > > > 3) What information will be used to reconnect the child rows to the > > parent rows? > > > > > > Maybe the OP should be using ON DELETE SET NULL instead of CASCADE? > > That is where I am headed, however it will need more information to > determine whether that makes sense or not. > OK, I started writing SET NULL won't help, but I'll back up and try to give more info, as requested. Pseudo SQL at this point.

create table entity (name text primary key, ...);
create table enity_list (name text primary key, ...);
create table entity_list_member(
    list_name text not null references entity_list(name) on delete cascade on update cascade,
    entity_name text not null references entity(name) on delete cascade on update cascade,
    primary key (list_name, entity_name)
);

Above is the current situation. When the entity is deleted, it's implicitly deleted from all lists that mention it. Referential Integrity 101 I guess. But apparently, it's common enough for an entity to be deleted and reloaded, not necessarily in the same transaction, that losing the list(s) membership on delete is considered "a bug". One solution is to not do any referential integrity in the lists. But that opens the door to garbage-in a little too wide I think. So on second thought, maybe the SET NULL could be of use. I'd add a second non-FK column on the member assoc-table, transferring the old entity name to it thanks to an UPDATE on entity_name, thus preserving the old name. Then an INSERT trigger on entity could locate any (indexed) "stashed" entity names in that extra non-FK column in entity_list_member, to retransfer the name back to the primary FK column. I'd need to adjust the PK to a coalesce(), and ensure the two columns are mutually exclusive. Sounds like that might work, no?
How to avoid Trigger ping/pong / infinite loop
Hi. I have a large "legacy" code base that writes information necessary for Row-Level-Security in a highly denormalized custom ad-hoc text format for values, in key-value pairs in a table, which would be either impossible or too slow to base any RLS policy on. The values are basically lists or maps, depending on the key. I also have new code that will read and write that same information, but would much prefer to use a normalized data model, the same one that would be appropriate for efficient RLS. So I'm wondering if I can have my cake and eat it too, by synchronizing the normalized and denormalized information (necessarily duplicated then), server-side via triggers, w/o running into infinite loops. Are there techniques for situations like this? This question is not too far from my earlier question, in the sense that a trigger would need to know the context in which it was triggered, i.e. directly (then update the other model), or indirectly (don't update, the change is boomerang'ing around from our own change). Any thoughts? --DD PS: At this point, changing the legacy code base is not really an option...
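One guard I'm aware of against such trigger ping-pong is pg_trigger_depth(): only react to "outside" writes, and skip writes made by the companion trigger on the other model. A hedged, heavily simplified sketch (names hypothetical, the actual translation logic elided):

```
CREATE TABLE denormalized_kv (key text PRIMARY KEY, value text NOT NULL);

CREATE FUNCTION sync_to_normalized() RETURNS trigger AS $$
BEGIN
    -- ... translate NEW.key / NEW.value into the normalized tables here ...
    RETURN NULL;  -- return value is ignored for AFTER triggers
END $$ LANGUAGE plpgsql;

-- Fire only for top-level writes, not for writes coming from the
-- companion trigger on the normalized side (which runs at depth >= 1).
CREATE TRIGGER denorm_sync
    AFTER INSERT OR UPDATE ON denormalized_kv
    FOR EACH ROW
    WHEN (pg_trigger_depth() < 1)
    EXECUTE FUNCTION sync_to_normalized();
```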
Re: How to avoid Trigger ping/pong / infinite loop
On Thu, Feb 16, 2023 at 6:58 PM Adrian Klaver wrote: > > "... which would be either impossible or too slow to base any RLS policy > on." > > and > > "At this point, changing the legacy code base is not really an option..." > > seem to be at odds. > I don't see why you say that. > So is the current system working or not? > The legacy system is 3-tier, so uses the denormalized info in C++ in the mid-tier, while the new system uses PostgreSQL and is 2-tier, so the same denormalized info must drive both modes of execution. The same code-base is used for both, but with different backends. Making the code base work under two back ends is already hard enough, w/o changing it more extensively to use a new normalized model even in the legacy case. So is that clearer? But that's a bit orthogonal to my question too. > > I also have new code that will read and write that same information, but > > would much prefer to use a normalized data model, the same one that > > would be appropriate for efficient RLS. > > > > So I'm wondering if I can have my cake and eat it too, by synchronizing > > the normalized and denormalized information (necessarily duplicated > > then), server-side via triggers, w/o running into infinite loops. > > A VIEW over both sets of data? > I'm not following. How is that related to mutual synchronization via triggers? Keeping two copies of the data is of course denormalization, but cannot be avoided. One copy is basically an optimization for RLS, so could be read-only I guess, making the sync one-way and simpler, but then that would force any new code to also use the old denormalized way to update the info. Thus I'd prefer the new model to be read-write, but then that requires two-way sync. Thus my question.
Re: DELETE trigger, direct or indirect?
On Thu, Feb 16, 2023 at 6:51 PM Adrian Klaver wrote: > You have two tables with list in their name, so are rows deleted from > both. Just to be clear enity_list should actually be entity_list? > > Also how are entity and enity_list related? > I have to confess that your questions surprise me a bit. I thought the model was pretty clear from the SQL. We have two entities, Foo (my entity table), and another Bar (my entity_list table), with Bar referencing 0 or more Foos, recording which ones it references in an association table. > Are the values for the name field in entity and enity(entity)_list the > same for a given entity? > The names of Foo and Bar are completely independent. > > transferring the old entity name to it thanks to an UPDATE on > > entity_name, thus preserving the old name. > > How? > Again how would you determine where the action started? > Deleting a Foo cascades to the _member assoc-table. If the cascade is SET NULL, then I can have an UPDATE TRIGGER on _member. I no longer care where the action started. > > Then an INSERT trigger on entity could locate any (indexed) "stashed" > > entity names in that extra non-FK column in entity_list_member, > > How would it locate it if the name that defined the FK(entity(name)) was > NULL? > In the extra non-FK column I mentioned explicitly, in the _member assoc-table. A sketch of the whole idea is below.
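A hedged sketch of that design (names hypothetical, simplified: a unique expression index replaces the original PK on the member table):

```
CREATE TABLE entity      (name text PRIMARY KEY);
CREATE TABLE entity_list (name text PRIMARY KEY);

CREATE TABLE entity_list_member (
    list_name    text NOT NULL REFERENCES entity_list(name)
                     ON DELETE CASCADE ON UPDATE CASCADE,
    entity_name  text REFERENCES entity(name)
                     ON DELETE SET NULL ON UPDATE CASCADE,
    severed_name text,  -- stash of the name of a deleted entity
    CHECK (entity_name IS NOT NULL OR severed_name IS NOT NULL)
);

CREATE UNIQUE INDEX entity_list_member_key
    ON entity_list_member (list_name, coalesce(entity_name, severed_name));

-- When the SET NULL cascade clears the FK, remember the old name.
CREATE FUNCTION stash_severed_name() RETURNS trigger AS $$
BEGIN
    IF NEW.entity_name IS NULL AND OLD.entity_name IS NOT NULL THEN
        NEW.severed_name := OLD.entity_name;
    END IF;
    RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER member_stash BEFORE UPDATE OF entity_name ON entity_list_member
    FOR EACH ROW EXECUTE FUNCTION stash_severed_name();

-- When an entity (re)appears, re-attach any stashed references to it.
CREATE FUNCTION reattach_members() RETURNS trigger AS $$
BEGIN
    UPDATE entity_list_member
       SET entity_name = NEW.name, severed_name = NULL
     WHERE severed_name = NEW.name;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END $$ LANGUAGE plpgsql;

CREATE TRIGGER entity_reattach AFTER INSERT ON entity
    FOR EACH ROW EXECUTE FUNCTION reattach_members();
```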
Re: Postgres Index and Updates
On Thu, Mar 2, 2023 at 10:08 AM Laurenz Albe wrote: > On Thu, 2023-03-02 at 15:53 +0800, Navindren Baskaran wrote: > If the other column is updated, it depends. If the updated column is not > indexed and there is enough room for the new row version in the same > table block, the index doesn't have to be modified. Otherwise it is. > > See > https://www.cybertec-postgresql.com/en/hot-updates-in-postgresql-for-better-performance/ Very interesting article, thanks Laurenz. I wasn't aware of HOT and its relation to fillfactor. At the risk of highjacking this thread a bit, although still on topic I think. What's the 2023 status of that zheap table storage work? And more specifically about a particular data-model of mine. I have a 3-level hierarchical parents-children-grandchildren table structure: 1) a "root" table, with tens to thousands (~200K max) of (small) rows. 2) a "folder" table, with 20 to 50 (small) rows *per* "root" row. 3) several "value" tables, with again 20 to 100 (large to very large) rows per "folder" row. The root and folder tables must maintain a "last modified" timestamp for their respective subtrees, which must be maintained via triggers (how else?). That makes those tables update-heavy, no? So from your article, those two tables, with smaller rows (and fewer rows total in general), should have a lower fillfactor (leaving free space in each block) to increase the chances of a HOT update? Am I interpreting your article (and its linked articles) correctly for this situation? TIA, --DD
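Concretely, the kind of tuning I'm thinking of is something like this hedged sketch (the table names are from my description above, and 70 is an arbitrary starting value to be validated by measurement):

```
ALTER TABLE root   SET (fillfactor = 70);
ALTER TABLE folder SET (fillfactor = 70);
-- Only newly written pages honor the new setting; a rewrite applies it everywhere:
VACUUM FULL root;
VACUUM FULL folder;
```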
CREATE/DROP ROLE transactional? GRANT/REVOKE?
Hi. Perhaps I missed it in the doc (e.g. [1]), but are DDLs around ROLEs and GRANTs transactional? Since I'm creating many ROLEs and making many GRANTs, based on info I read from PostgreSQL itself (in pg_catalog and elsewhere), should everything be in a single transaction? FWIW, I come from Oracle (where DDL is non-transactional and an implicit COMMIT), and SQLite (where ROLEs and GRANTs don't apply), thus this perhaps silly question. I'm already aware that SCHEMAs, TABLEs, etc... are transactionally created. But given that ROLEs are cluster-wide, and the doc on DDLs says nothing, I prefer to ask. Thanks, --DD [1]: https://www.postgresql.org/docs/current/sql-droprole.html
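For concreteness, the kind of script in question would be along these lines (a hedged sketch; role, database, and password values are hypothetical):

```
BEGIN;
CREATE ROLE app_owner NOLOGIN;
CREATE ROLE app_user LOGIN PASSWORD 'secret';
GRANT app_owner TO app_user;
GRANT CONNECT ON DATABASE mydb TO app_user;
COMMIT;  -- everything above becomes visible atomically, or nothing does on ROLLBACK
```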
Re: CREATE/DROP ROLE transactional? GRANT/REVOKE?
On Mon, Mar 6, 2023 at 4:06 PM Tom Lane wrote: > Dominique Devienne writes: > > Hi. Perhaps I missed it in the doc (e.g. [1]), but are DDLs around ROLEs > > and GRANTs transactional? > > Your expectation is set wrongly. Thanks for the precision Tom. Although I'm not sure where you read that my expectation was wrong. But nevermind. > DDL commands in Postgres are > transactional unless their man page explicitly says they're not. > OK, didn't know that. Glad to read it. Could you point to where in the doc this is stated? It's hard to be sure about something the doc does *not* mention, when what is *implied* is hard to find, or at a distance. Especially since at least one other well-known RDBMS differs in that department. I much prefer the way PostgreSQL handles DDL, still it's not obvious. My $0.02. Thanks, --DD
Re: CREATE/DROP ROLE transactional? GRANT/REVOKE?
On Mon, Mar 6, 2023 at 3:02 PM Erik Wienhold wrote: > > On 06/03/2023 14:19 CET Dominique Devienne wrote: > > Perhaps I missed it in the doc (e.g. [1]), but are DDLs around ROLEs and > > GRANTs transactional? > > Have you tried? Nope. I thought about it, but then I could have convinced myself with an invalid demo. So I preferred to ask the experts. > DDL is transactional unless stated otherwise (cf. CREATE DATABASE, > CREATE INDEX CONCURRENTLY, CREATE TABLESPACE). > I now see (and fully appreciate) the Note below: - CREATE DATABASE cannot be executed inside a transaction block. I didn't realize the fact CREATE ROLE didn't have that note was significant. > Run the following psql script: [...] > Thanks for the demo. Appreciated. > > Since I'm creating many ROLEs and making many GRANTs, based on info I read from > > PostgreSQL itself (in pg_catalog and elsewhere), should everything be in a > > single transaction? > > If it should be atomic and the commands are allowed in transactions, then > yes, > use transactions. > Thanks again. --DD
Re: CREATE/DROP ROLE transactional? GRANT/REVOKE?
On Mon, Mar 6, 2023 at 4:31 PM Tom Lane wrote: > Dominique Devienne writes: > > On Mon, Mar 6, 2023 at 4:06 PM Tom Lane wrote: > >> DDL commands in Postgres are > >> transactional unless their man page explicitly says they're not. > > > Could you point to where in the doc this is stated? > > For example, for CREATE DATABASE the first "Note" para in [1] is > > CREATE DATABASE cannot be executed inside a transaction block. > > I don't think we are entirely consistent about whether this is > mentioned in "Description" or "Notes", but it's there somewhere. > Yes, I noticed that, once I read Erik's email (it went to GMail's SPAM folder...). Still, my little brain didn't leap to the conclusion that other DDLs were implicitly transactional, because of the absence of a note that another DDL's doc has. All I'm saying is that it's not as obvious as you experienced PostgreSQL folks seem to think it is. FWIW. Thanks again, --DD
public schema grants to PUBLIC role
Hi. I've recently realized via a post (or article?) from Laurenz that the PUBLIC role has CREATE privilege on the 'public' schema by default (see query below). I guess it can't be avoided? OK, then I'll REVOKE that privilege when creating a new DB. Like I already revoked the default CONNECT to PUBLIC on the DB. But I'm wondering about unexpected side-effects. In particular, we need extensions, which are loaded in public by default. Will USAGE on public be enough for LOGIN users having access to the DB to use extensions? More broadly, we want to secure the DB so that all DB access and schema access are explicit. Anything else to be aware of please, beside the two mentioned above? Thanks, --DD
```
=> select grantor::regrole::text,
          case grantee when 0 then 'PUBLIC' else grantee::regrole::text end,
          privilege_type as priv, is_grantable as adm
   from pg_namespace, lateral aclexplode(nspacl)
   where nspname = 'public';
 grantor  | grantee  |  priv  | adm
----------+----------+--------+-----
 postgres | postgres | USAGE  | f
 postgres | postgres | CREATE | f
 postgres | PUBLIC   | USAGE  | f
 postgres | PUBLIC   | CREATE | f
(4 rows)
```
Re: public schema grants to PUBLIC role
On Thu, Mar 9, 2023 at 2:13 PM Laurenz Albe wrote: > Hi, > On Thu, 2023-03-09 at 10:34 +0100, Dominique Devienne wrote: > > Hi. I've recently realized via a post (or article?) from Laurenz that > the PUBLIC > > role has CREATE privilege on the 'public' schema by default (see query > below). > > I guess it can't be avoided? > > It can be avoided if you connect to "template1" and > > REVOKE CREATE ON SCHEMA public FROM PUBLIC; > > there *before* you create a new database. > Right. Didn't think of that. Thanks. > Or, as Christoph said, if you use v15 or better. > Because Managed Azure is still stuck at 14.2, that's currently not an option. We need both on-prem and managed Azure. > > More broadly, we want to secure the DB so that all DB access and schema > access are explicit. > > Anything else to be aware of please, beside the two mentioned above? > > Avoid SECURITY DEFINER functions with no "search_path" set: > https://www.cybertec-postgresql.com/en/abusing-security-definer-functions/ Thanks for the reminder. We already set "search_path" on our functions, because the client code does not always set the search_path at all, which was resulting in errors. No DEFINER functions either, yet. I also plan to look at the new function syntax, that eagerly resolves references at DDL time, rather than runtime, to avoid the search_path dependency at runtime completely. Although I'm worried about the introspection rewriting already discussed recently... --DD
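For reference, a hedged sketch of what "set search_path on our functions" looks like in practice (schema, table, and function names are hypothetical):

```
CREATE SCHEMA IF NOT EXISTS app;
CREATE TABLE IF NOT EXISTS app.widget (id int PRIMARY KEY);

CREATE OR REPLACE FUNCTION app.widget_count() RETURNS bigint
LANGUAGE sql STABLE
SET search_path = app, pg_temp
AS $$ SELECT count(*) FROM app.widget $$;
```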
Seq Scan because of stats or because of cast?
I'm interested in the members of specific roles, providing the roles of interest to the query via an array of integers (binary bind in code, not a textual array literal like I had to use to have the EXPLAIN work, see below). I figured that query would use the "pg_auth_members_role_member_index" index, but instead it's using a sequential scan. And I'm wondering if this is because the cardinality of that catalog is small (172), which is just an artifact of my dev-testing, or whether that's because I cast roleid to an int4, preventing the use of the index? In production, the cardinality will be much greater, which is why I worry a bit. Also, I don't really need the grantor and admin_option columns for now, thus it could even be an index-only scan, IF the index was used by the plan. I tried changing the cast around, or allowing an index-only scan, but it's still a Seq Scan on the table (see below). Is there a way to know why the index is not used, in any of my attempts? My code currently does not support (binary) binding of Oids, thus the ::int4 casts. Would supporting binding actual Oid arrays instead of Int4 arrays help in this case? I'd appreciate some insights here. Thanks, --DD PS: libpq uses unsigned int for Oid. So do OIDs go above 2GB in practice? I'm asking, since I'm casting to ::int4, thus if they do, then that cast might overflow. PPS: Are OIDs recycled / reused? Or are they monotonically increasing? What happens when the Cluster runs out of OIDs? Are they Cluster-wide unique or does it depend on the OID type?

dd_pns2=> explain SELECT roleid::int4, member::int4, grantor::int4, admin_option
dd_pns2->   FROM pg_auth_members
dd_pns2->  WHERE roleid::int4 = ANY($1);
ERROR:  there is no parameter $1
LINE 3:  WHERE roleid::int4 = ANY($1);
                                  ^

dd_pns2=> explain SELECT roleid::int4, member::int4, grantor::int4, admin_option
dd_pns2->   FROM pg_auth_members
dd_pns2->  WHERE roleid::int4 = ANY(array[1,2,3]);
                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on pg_auth_members  (cost=0.00..5.33 rows=3 width=13)
   Filter: ((roleid)::integer = ANY ('{1,2,3}'::integer[]))
(2 rows)

dd_pns2=> explain SELECT roleid::int4, member::int4, grantor::int4, admin_option
dd_pns2->   FROM pg_auth_members
dd_pns2->  WHERE roleid = ANY(array[1::oid,2::oid,3::oid]);
                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on pg_auth_members  (cost=0.00..5.33 rows=3 width=13)
   Filter: (roleid = ANY ('{1,2,3}'::oid[]))
(2 rows)

dd_pns2=> explain SELECT roleid::int4, member::int4
dd_pns2->   FROM pg_auth_members
dd_pns2->  WHERE roleid = ANY(array[1::oid,2::oid,3::oid]);
                          QUERY PLAN
----------------------------------------------------------------
 Seq Scan on pg_auth_members  (cost=0.00..5.33 rows=3 width=8)
   Filter: (roleid = ANY ('{1,2,3}'::oid[]))
(2 rows)

dd_pns2=> \d pg_auth_members
          Table "pg_catalog.pg_auth_members"
    Column    |  Type   | Collation | Nullable | Default
--------------+---------+-----------+----------+---------
 roleid       | oid     |           | not null |
 member       | oid     |           | not null |
 grantor      | oid     |           | not null |
 admin_option | boolean |           | not null |
Indexes:
    "pg_auth_members_member_role_index" UNIQUE, btree (member, roleid), tablespace "pg_global"
    "pg_auth_members_role_member_index" UNIQUE, btree (roleid, member), tablespace "pg_global"
Tablespace: "pg_global"

dd_pns2=> select count(*) from pg_auth_members;
 count
-------
   172
(1 row)
Re: Seq Scan because of stats or because of cast?
On Mon, Mar 13, 2023 at 2:53 PM Tom Lane wrote: > regards, tom lane > Thank you very much Tom. Very informative.
Re: Seq Scan because of stats or because of cast?
On Mon, Mar 13, 2023 at 2:53 PM Tom Lane wrote: > > PS: libpq uses unsigned int for Oid. So do OIDs go above 2GB in practice? > > Yes, eventually. > OK, I've added support for native OID values (i.e. unsigned int, 4 bytes) in my libpq wrapper. Tested with binary binding and getting of scalar and array values. But to truly test this is working OK, I'd need OIDs in the range [2^31, 2^32), while the OIDs in my DB only reach the 200M range. So, any way to force the DB to create OIDs in that special range? Without hosing my DB / Cluster that is... This is not a throw-away DB / Cluster. Thanks, --DD
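In the meantime, one hedged workaround I can think of, to exercise the same binary send/receive path without making the cluster actually consume ~2 billion OIDs, is to round-trip literal values in the high range:

```
SELECT '4000000000'::oid                AS big_oid,
       '{4000000000,4294967294}'::oid[] AS big_oid_array;
```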
Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?
On Thu, Mar 16, 2023 at 6:48 AM Laurenz Albe wrote: > As we wrote, some of us think that cursors are useful, and we tried to > explain why we think that. If you don't think that cursors are useful, > don't use them. We are not out to convince you otherwise. > Perhaps OT (I only skimmed this thread) but when I compared Cursors to regular Statements / Queries from a LIBPQ client application perspective, on the same "streamable" queries (i.e. w/o a sort), Cursors shined in terms of time-to-first-row, compared to waiting for the whole ResultSet, but getting the full result OTOH took 2x as long with a Cursor, compared to the regular SELECT Statement. Thus in my mind, it really depends on what you value in a particular situation, latency or throughput. --DD PS: In my testing, I used forward-only cursors PPS: I don't recall the ResultSet cardinality or byte size, nor the batching used with the Cursor.
Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?
On Thu, Mar 16, 2023 at 9:23 AM Pavel Stehule wrote: > čt 16. 3. 2023 v 9:18 odesílatel Dominique Devienne > napsal: > >> [...] depends on what you value in a particular situation, latency or >> throughput. --DD >> > > cursors are optimized for minimal cost of first row, queries are optimized > for minimal cost of last row > That's a nice way to put it Pavel. And to have it both ways, use COPY in binary protocol? That way the rows are streamed to you in arbitrary chunks as soon as available (I hope), and the burden is on you the client to decode and use those rows in parallel as they are "streamed" to you. I've yet to test that (thus the 'i hope' above). I used COPY binary for INSERTs, and COPY text/json for SELECTs, not yet COPY binary for SELECTs. I'm hoping the latency of COPY will be small compared to a regular SELECT where I have to wait for LIBPQ to assemble the whole ResultSet. Are my hopes unfounded? --DD
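For reference, the statement shape I have in mind is something like this hedged sketch (table and columns hypothetical); on the libpq side the rows then arrive via PQgetCopyData:

```
COPY (SELECT id, name, payload
        FROM big_table
       WHERE kind = 42)
  TO STDOUT WITH (FORMAT binary);
```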
Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?
On Thu, Mar 16, 2023 at 11:15 AM Pavel Stehule wrote: > čt 16. 3. 2023 v 9:55 odesílatel Dominique Devienne > napsal: > >> On Thu, Mar 16, 2023 at 9:23 AM Pavel Stehule >> wrote: >> >>> čt 16. 3. 2023 v 9:18 odesílatel Dominique Devienne >>> napsal: >>> >>>> [...] depends on what you value in a particular situation, latency or >>>> throughput. --DD >>>> >>> >>> cursors are optimized for minimal cost of first row, queries are >>> optimized for minimal cost of last row >>> >> >> That's a nice way to put it Pavel. >> >> And to have it both ways, use COPY in binary protocol? >> > > COPY is a different creature - it has no execution plan, and it is not > interpreted by the executor. > OK. Not sure what that means exactly. There's still a SELECT, with possibly WHERE clauses and/or JOINs, no? Doesn't that imply an execution plan? I'm a bit confused. > Using COPY SELECT instead SELECT looks like premature optimization. > Possible. But this is not an e-commerce web-site with a PostgreSQL backend here. This is classical client-server with heavy weight desktop apps loading heavy weight data (in number and size) from PostgreSQL. So performance (throughput) does matter a lot to us. And I measure that performance in both rows/sec and MB/sec, not (itsy bitsy) transactions / sec. > The performance benefit will be minimal ([...]). > COPY matters on INSERT for sure performance-wise. So why wouldn't COPY matter for SELECTs too? > Cursors, queries can use binary protocol, if the client can support it. > I already do. But we need all the speed we can get. In any case, I'll have to try and see/test for myself eventually. We cannot afford to leave any performance gains on the table.
NULL pg_database.datacl
Hi. I'm surprised, I thought ACLs would never be empty for a database. Does that mean nobody can connect to this database? I guess SUPERUSER and/or its datDBA can? What does a NULL AclItem[] mean exactly? (BTW, datallowconn is true) Thanks, --DD

ddb=> select datname, datdba::regrole::text from pg_database where datacl is null;
 datname | datdba
---------+--------
 qadb    | qauser
(1 row)

PS: Was also surprised with #42883: ERROR: no binary output function available for type aclitem Thankfully datacl::text[] works fine in binary mode.
Re: NULL pg_database.datacl
On Mon, Mar 20, 2023 at 1:18 PM Erik Wienhold wrote: > > On 20/03/2023 11:52 CET Dominique Devienne wrote: > > What does a NULL AclItem[] mean exactly? > > It means that the object has default privileges (before any GRANT or REVOKE > is executed). For databases this means full privileges for the database > owner > and the CONNECT and TEMPORARY privileges for PUBLIC. So any user can > connect > if allowed by pg_hba.conf. > > https://www.postgresql.org/docs/current/ddl-priv.html Thanks Erik. But then, how come aclexplode() is not showing these default/implicit privileges? Is there a SQL function returning those per-type default privileges? That I could then coalesce() datacl with? Or some other means to list actual / effective privileges, even those implicit ones? Thanks, --DD

ddb=> select datname, grantor::regrole::text, grantee::regrole::text, privilege_type, is_grantable
ddb->   from pg_database
ddb->   left join lateral aclexplode(datacl) on true
ddb->  where datacl is null;
 datname | grantor | grantee | privilege_type | is_grantable
---------+---------+---------+----------------+--------------
 qadb    |         |         |                |
(1 row)
Re: NULL pg_database.datacl
On Mon, Mar 20, 2023 at 2:18 PM Erik Wienhold wrote: > > On 20/03/2023 13:50 CET Dominique Devienne wrote: > > > > On Mon, Mar 20, 2023 at 1:18 PM Erik Wienhold wrote: > > > > On 20/03/2023 11:52 CET Dominique Devienne > wrote: > > > > What does a NULL AclItem[] mean exactly? > > > > > > It means that the object has default privileges [...] > > > https://www.postgresql.org/docs/current/ddl-priv.html > > > > Is there a SQL function returning those per-type default privileges? > > Use acldefault. Pass in ownerId=0 to get the privileges for PUBLIC. > > > https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-ACLITEM-FN-TABLE Thanks again. After experimenting a little, sounds like acldefault('d', datdba) is what I want, since it seems to give something closer to the AclItem[] actually instantiated on the first GRANT or REVOKE. This has been an education for me, once again. I appreciate the help. Thanks, --DD

ddb=> select datname, coalesce(datacl, acldefault('d', 0)) from pg_database where datacl is null;
 datname |    coalesce
---------+----------------
 qadb    | {=Tc/0,=CTc/0}
(1 row)

ddb=> select datname, coalesce(datacl, acldefault('d', datdba)) from pg_database where datacl is null;
 datname |            coalesce
---------+--------------------------------
 qadb    | {=Tc/qauser,qauser=CTc/qauser}
(1 row)

ddb=# revoke all on database qadb from public;
REVOKE
ddb=# select datname, datacl from pg_database where datname = 'qadb';
 datname |       datacl
---------+---------------------
 qadb    | {qauser=CTc/qauser}
(1 row)

ddb=# grant connect, temporary on database qadb to public;
GRANT
ddb=# select datname, datacl from pg_database where datname = 'qadb';
 datname |             datacl
---------+--------------------------------
 qadb    | {qauser=CTc/qauser,=Tc/qauser}
(1 row)

ddb=# select count(*) from pg_database where datacl is null;
 count
-------
     0
(1 row)
Convert pg_constraint.conkey array to same-order array of column names
We have a query returning 1 row per constraint column, which until recently we didn't realize wasn't preserving order of the columns. A colleague fixed that, with something like below: SELECT ... FROM pg_catalog.pg_constraint cnstr ... CROSS JOIN LATERAL UNNEST(cnstr.conkey) WITH ORDINALITY AS cols(value, rank) ORDER BY cols.rank But I'm wondering about getting 1 row per constraint instead, and fetching an array of column names. So is there a way to "convert" int2[] conkey array into a text[] of those column names?
Re: Convert pg_constraint.conkey array to same-order array of column names
On Thu, Mar 23, 2023 at 4:20 PM Adrian Klaver wrote: > On 3/23/23 04:12, Dominique Devienne wrote: > > CROSS JOIN LATERAL UNNEST(cnstr.conkey) WITH ORDINALITY AS cols(value, > rank) > > ORDER BY cols.rank > A before coffee solution: > Thanks for answering Adrian. And sorry for the delay in responding. > WITH ck AS ( > SELECT > conrelid, > unnest(conkey) AS ky > FROM > pg_constraint > WHERE > conrelid = 'cell_per'::regclass > ) > This part surprised me. I didn't know a table-valued function could be used like this in the select-clause. Both queries below yield the same rows for me, in the same order: => select conname, unnest(conkey), conrelid::regclass::text from pg_constraint where conrelid::regclass::text like ... and cardinality(conkey) = 8; => select conname, key.value, conrelid::regclass::text from pg_constraint cross join lateral unnest(conkey) as key(value) where conrelid::regclass::text like ... and cardinality(conkey) = 8; So your compact form is equivalent to the second form? What about the order? Is it guaranteed? I was "raised" on the "order is unspecified w/o an order-by-clause". Why would it be different here? In our case, the query is more complex, with joins on pg_namespace, pg_class, and pg_attribute, on all constraints of a schema, and the order came out wrong w/o adding WITH ORDINALITY and ordering on it. Thus I worry the order is plan-dependent, and not guaranteed. Am I wrong to worry? The form you provide seems no different from our old form, to my non-expert eye. --DD
Re: Convert pg_constraint.conkey array to same-order array of column names
On Fri, Mar 24, 2023 at 11:51 AM Alvaro Herrera wrote: > On 2023-Mar-23, Dominique Devienne wrote: > > But I'm wondering about getting 1 row per constraint instead, > > and fetching an array of column names. > > IIRC you can use array_agg(unnest ORDER BY ordinality), grouping by > constraint OID or name. > Thanks. This lead me to a better search, yielding two informative results: https://stackoverflow.com/questions/60877627/perform-unnest-inner-join-and-then-array-agg-as-part-of-an-update-query http://amyszczepanski.com/2019/03/25/unnest-and-array-agg-respecting-order.html
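Based on that hint, a hedged sketch of the conversion (assuming a table named some_table; a starting point to adapt, not production code):

```
SELECT c.conname,
       array_agg(a.attname ORDER BY k.ord) AS key_columns
  FROM pg_constraint c
  CROSS JOIN LATERAL unnest(c.conkey) WITH ORDINALITY AS k(attnum, ord)
  JOIN pg_attribute a
    ON a.attrelid = c.conrelid AND a.attnum = k.attnum
 WHERE c.conrelid = 'some_table'::regclass
 GROUP BY c.oid, c.conname;
```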
Re: PostgreSQL vs MariaDB
On Fri, Mar 24, 2023 at 3:50 PM Jeffrey Walton wrote: > On Fri, Mar 24, 2023 at 7:07 AM Inzamam Shafiq > > Can someone please list pros and cons of MariaDB vs PostgreSQL [...] > > > > [...] MySQL has over [1700 CVEs going back to 1999.][1] > In contrast, PostgreSQL has about [240 CVEs going back to 1999.][2] > > [1]: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=MySQL > [2]: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=PostgreSQL OTOH, I checked a few, and they are not about PostgreSQL per-se, but projects using PostgreSQL or extensions to PostgreSQL. Thus I assume the same happens for MySQL. Of course, I didn't check that many, just a small sample. Also, there seems to be a low signal-to-noise ratio on CVEs. See what [Richard Hipp of SQLite fame says about CVEs][3]. This (low quality of CVEs) comes up often on the SQLite Forum. FWIW. Still, your input is insightful, qualitatively. Thanks, --DD [3]: https://sqlite.org/cves.html
Re: Convert pg_constraint.conkey array to same-order array of column names
On Fri, Mar 24, 2023 at 5:40 PM Adrian Klaver wrote: > On 3/24/23 03:28, Dominique Devienne wrote: > > On Thu, Mar 23, 2023 at 4:20 PM Adrian Klaver > <mailto:adrian.kla...@aklaver.com>> wrote: > Your original question was: > > "But I'm wondering about getting 1 row per constraint instead, > and fetching an array of column names. > > So is there a way to "convert" int2[] conkey array into a text[] of those > column names?" > > That is what I showed as a simple example. > Sorry, didn't mean to offend you. My post started with "preserving order of the columns", but I probably didn't make it clear enough preserving the source array's order was also a requirement.
Plans for ON DELETE CASCADE? Which index is used, if at all?
Hi. Given the classical parent/child schema below, with an ON DELETE CASCADE FK constraint, I'd like to know which index is used (if any) to delete child rows affected by the CASCADE. But explain (analyze) does not give me that. The reason I'm asking, is because we currently (automatically) add indexes on FK columns, to avoid full-scan of child tables when a parent row is deleted (i.e. un-indexed FKs), but I realized that index we create explicitly is also a *prefix* of the natural-key constraint, thus I suspect redundant with the index supporting that NK constraint. But before getting rid of that index, which I suspect is redundant, I wanted to verify. And I happen to discover I don't know how to verify it. My PostgreSQL Fu is not great... So, could someone please: 1) teach me how to know which index is used when cascading FKs in child tables? 2) confirm my suspicion that the child_parent_idx index below is redundant, given the child_parent_name_key one? Thanks, --DD
```
dd=> create table parent (id int generated always as identity primary key, name text not null unique);
CREATE TABLE
dd=> create table child (id int generated always as identity primary key, parent int not null references parent(id) on delete cascade, name text not null, unique(parent, name));
CREATE TABLE
dd=> create index child_parent_idx on child(parent);
CREATE INDEX
dd=> insert into parent(name) values ('foo');
INSERT 0 1
dd=> insert into child(parent, name) values (1, 'c1');
INSERT 0 1
dd=> insert into child(parent, name) values (1, 'c2');
INSERT 0 1
dd=> insert into parent(name) values ('bar');
INSERT 0 1
dd=> insert into child(parent, name) values (2, 'c3');
INSERT 0 1
dd=> explain (analyze) delete from parent where id = 2;
                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Delete on parent  (cost=0.15..8.17 rows=1 width=6) (actual time=0.052..0.053 rows=0 loops=1)
   ->  Index Scan using parent_pkey on parent  (cost=0.15..8.17 rows=1 width=6) (actual time=0.021..0.024 rows=1 loops=1)
         Index Cond: (id = 2)
 Planning Time: 0.090 ms
 Trigger for constraint child_parent_fkey: time=0.242 calls=1
 Execution Time: 0.329 ms
(6 rows)

dd=> \d+ child
                                                  Table "public.child"
 Column |  Type   | Collation | Nullable |            Default            | Storage  | Stats target | Description
--------+---------+-----------+----------+-------------------------------+----------+--------------+-------------
 id     | integer |           | not null | generated always as identity  | plain    |              |
 parent | integer |           | not null |                               | plain    |              |
 name   | text    |           | not null |                               | extended |              |
Indexes:
    "child_pkey" PRIMARY KEY, btree (id)
    "child_parent_name_key" UNIQUE CONSTRAINT, btree (parent, name)
    "child_parent_idx" btree (parent)
Foreign-key constraints:
    "child_parent_fkey" FOREIGN KEY (parent) REFERENCES parent(id) ON DELETE CASCADE
Access method: heap
```
Cluster table based on grand parent?
Hi again, I just sent a question regarding parent/child and cascading FKs. But in reality, our schema has not 2 but 3 "layers", with an additional grandchild "leaf" table (see below). Given that many access patterns are parent-based, i.e. get all children of a given parent, or all grandchildren of a given child, I can use [CLUSTER][1] leveraging the natural-key (parent, name) constraint/index. But for grandchild rows, doesn't that mean the rows for a given (grand)parent won't be fully clustered? Yes, our software often accesses rows in child and grandchild for a given parent row. So can grandchild table(s) be "fully" clustered per-(grand)parent? Would that require denormalizing, and adding an extra grandparent column FK in grandchild, to achieve that? And if that's the case, then there are two "paths" to CASCADE a delete from parent; Would that be a problem? (w.r.t. performance or otherwise?) Finally, does cluster affect associated toast tables too? (the doc doesn't say) Thanks for any insights. --DD PS: At this point, I don't even know how much cluster affects performance. But because it can affect the schema structure (by denormalizing), I'd rather know early. [1]: https://www.postgresql.org/docs/current/sql-cluster.html
```
dd=> create table parent (id int generated always as identity primary key, name text not null unique);
CREATE TABLE
dd=> create table child (id int generated always as identity primary key, parent int not null references parent(id) on delete cascade, name text not null, unique(parent, name));
CREATE TABLE
dd=> create table grandchild (id int generated always as identity primary key, parent int not null references child(id) on delete cascade, name text not null, unique(parent, name));
CREATE TABLE
```
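For concreteness, a hedged sketch of the denormalized variant I'm asking about (shown under a different table name to avoid clashing with the definition above):

```
create table grandchild_denorm (
    id          int generated always as identity primary key,
    grandparent int not null references parent(id) on delete cascade,
    parent      int not null references child(id)  on delete cascade,
    name        text not null,
    constraint grandchild_denorm_nk unique (grandparent, parent, name)
);

-- then clustering per (grand)parent would presumably be:
cluster grandchild_denorm using grandchild_denorm_nk;
```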
Re: Plans for ON DELETE CASCADE? Which index is used, if at all?
On Tue, Mar 28, 2023 at 3:23 PM Tom Lane wrote:
> Dominique Devienne writes:
> > Hi. Given the classical parent/child schema below, with an ON DELETE
> > CASCADE FK constraint, I'd like to know which index is used (if any) to
> > delete child rows affected by the CASCADE.
>
> > But explain (analyze) does not give me that.
>
> Yeah, it will just report the time spent in the FK triggers,
> not what they were doing exactly.
>
> IIRC, you can see the CASCADE operations with contrib/auto_explain,
> if you enable auto_explain.log_nested_statements.

Thanks. Looks like this won't be easily available to me :(. --DD

dd=> select * from pg_available_extensions where name like '%auto%';
   name   | default_version | installed_version |                comment
----------+-----------------+-------------------+----------------------------------------
 autoinc  | 1.0             |                   | functions for autoincrementing fields
(1 row)

dd=> select * from pg_available_extensions where name like '%explain%';
 name | default_version | installed_version | comment
------+-----------------+-------------------+---------
(0 rows)
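For the record, auto_explain is a loadable module rather than an extension, so it never shows up in pg_available_extensions; it only needs the contrib files installed on the server, plus either superuser rights or a server-config change. A sketch of how it would be enabled, assuming that access is available:

```
-- Requires superuser (or session_preload_libraries/shared_preload_libraries
-- set by an admin); the contrib package must be installed on the server.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log plans for every statement
SET auto_explain.log_nested_statements = on;  -- include the FK-trigger statements
SET auto_explain.log_analyze = on;            -- EXPLAIN ANALYZE style output

-- The server log now shows the plan of the cascaded DELETE on child,
-- including which index (if any) it used.
DELETE FROM parent WHERE id = 2;
```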
Re: Cluster table based on grand parent?
On Tue, Mar 28, 2023 at 5:08 PM Peter J. Holzer wrote:
> On 2023-03-28 07:26:47 -0700, Adrian Klaver wrote:
> > On 3/28/23 06:17, Dominique Devienne wrote:
> > > PS: At this point, I don't even know how much cluster affects performance.
>
> I think that this depends a lot on your access patterns

As I wrote, per-parent access to child and grandchild rows is typical. So w/o parent-based clustering of grandchild table(s), accessing those rows could potentially seek to several (~50, see below) smaller clusters with arbitrary gaps. Cardinality is a few to ~20K on parent, x10-x50 on child, x20-x100 on grandchild. So the total row count rarely exceeds the 1M - 10M range. But there are LOBs/BYTEAs...

> (especially on how much you update the grandchild table and whether those
> updates can be HOT), so you will probably have to measure it yourself with
> a realistic work load.

In this particular case, there aren't many UPDATEs, because of a deficiency of the client applications, which mostly do DELETE+INSERT instead of UPDATEs. Although we do have to cascade modified dates up the parent hierarchy, so some UPDATEs occur, but mostly on the less numerous child and parent tables.

> (Personally I doubt the impact is large, but I don't know your data or
> your access patterns.)

OK.

> > > But because it can affect the schema structure (by denormalizing),
> > > I'd rather know early.
> >
> > You will need to explain to me how it denormalizes? It reorders rows by
> > index definition and does not maintain that order over updates and inserts.
>
> I think he means that in order to cluster the grandchild table by the parent.id
> he would have to denormalize the table.

Exactly. Thanks for your input.
Re: Cluster table based on grand parent?
On Tue, Mar 28, 2023 at 6:06 PM Ron wrote:
> You can only get from parent to grandchild via child.id to
> grandchild.parent, so why not cluster grandchild on grandchild.parent?

Hi. I don't understand your question. Yes, of course, if I want all grand-children of a given parent, I'll do for example:

select p.id, c.id, c.name, gc.*
from grandchild gc
join child c on gc.parent = c.id
join parent p on c.parent = p.id
where p.name = $1

But w/o clustering on a denormalized grandchild.grandparent FK column, as Peter showed, and clustering only on grandchild.parent, that's not going to access a mostly contiguous range of pages to fetch all the grandchild rows for that one parent, but probably 10 to 50 "row clusters", given the fan-out I mentioned earlier at the child-table level. Or am I missing something?
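In case it's useful to sketch what the denormalization would look like: one variant keeps the extra grandparent column consistent via a composite FK through child, so there is still a single cascade path (parent -> child -> grandchild) instead of two independent ones. Table and index names below are illustrative, not from the real schema:

```
-- Needed so the composite FK below has a unique target to reference.
alter table child add constraint child_parent_id_key unique (parent, id);

create table grandchild (
  id          int  generated always as identity primary key,
  grandparent int  not null,  -- denormalized copy of child.parent
  parent      int  not null,
  name        text not null,
  unique (parent, name),
  foreign key (grandparent, parent)
    references child (parent, id) on delete cascade
);

-- Non-unique btree ordered by top-level parent, usable by CLUSTER.
create index grandchild_grandparent_idx on grandchild (grandparent, parent, name);
cluster grandchild using grandchild_grandparent_idx;
```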
libpq: empty arrays have rank 0 in binary results? whatever the type's rank?
Hi. Selecting/fetching an empty 1D array using a binary-mode PGresult gives me back 12 bytes, and the first 4, the rank, is 0, something I was not expecting. I was expecting dims_rank = 1, then first_dim = 0. Normal? The next two ints are kinda useless given the 0 rank? It's easy to fix, to support rank=0 for empty arrays, I just want to confirm. Thanks, --DD

PS: On a side note, where can I find the sending code for arrays? I didn't find the usual _send() and _recv() functions (but looking on github online...)

PPS: The relevant part of my code is below, for context:

uint32_t dims_rank = read();
if (dims_rank != 1) {
    throw std::runtime_error(fmt::format(
        "Unsupported array rank: Got {}; Want 1", dims_rank
    ));
}
uint32_t has_nulls = read();
uint32_t actual_elem_oid = read();
uint32_t first_dim = read();
uint32_t lower_idx = read();
ensure_oid(Oid{ actual_elem_oid }, expected_elem_oid);
UNUSED(has_nulls);
assert(lower_idx == 1);
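For what it's worth, the "no dimensions at all" representation of empty arrays is visible from SQL too; a quick check (assuming any empty array literal):

```
-- Empty arrays carry no dimension info, so the dimension functions
-- return NULL rather than 0 or '[1:0]'.
select array_ndims('{}'::int[]),  -- NULL
       array_dims('{}'::int[]),   -- NULL
       cardinality('{}'::int[]);  -- 0
```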
Re: libpq: empty arrays have rank 0 in binary results? whatever the type's rank?
On Wed, Mar 29, 2023 at 3:45 PM Tom Lane wrote:
> Dominique Devienne writes:
> > Hi. Selecting/fetching an empty 1D array using a binary-mode PGresult,
> > gives me back 12 bytes, and the first 4, the rank, is 0, something I was
> > not expecting.
>
> Yeah, empty arrays have zero dimensions.
>
> > PS: On a side note; where can I find the sending code for arrays?
> > I didn't find the usual _send() and _recv() functions (but looking on
> > github online...)
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/arrayfuncs.c;h=87c987fb2704761c59333bf8c1fee47e5c14c598;hb=HEAD#l1583

Great. Thanks Tom. --DD
Re: Using CTID system column as a "temporary" primary key
On Wed, Mar 29, 2023 at 9:23 PM Adrian Klaver wrote:
> On 3/29/23 12:11, Sebastien Flaesch wrote:
> > Oh the use of default keyword is new to me, thanks for that.
> >
> > But to make PostgreSQL more Informix-compatible,
> > zero should have been considered as well. Perhaps.
>
> 1) Why? Down the road to compatibility with some undetermined group of
> databases lies mayhem.

Sure. Unless it's opt-in, see below.

> 2) 0 can be a valid sequence value:

Of course. Yet, as above, if that is opt-in as specified in the `create table` DDL somehow, then why not?

BTW, DEFAULT and 0 are not the same thing. You cannot bind "default" in place of an integer-valued prepared-statement placeholder in a binary-mode insert. So it is definitely not the same thing.

So while I can accept that not implementing that particular Informix-compatibility wart is a perfectly valid position, given the implementation and maintenance cost, the arguments I've read so far can be "easily" side-stepped from a technical perspective, I suspect. FWIW.
Re: Using CTID system column as a "temporary" primary key
On Thu, Mar 30, 2023 at 11:42 AM Francisco Olarte wrote:
> On Thu, 30 Mar 2023 at 10:01, Dominique Devienne wrote:
> > BTW, default and 0 are not the same thing. You cannot bind "default" in place of
> > an integer-valued prepared-statement placeholder, in a binary mode insert. So it is
> > definitely not the same thing.
>
> IMNSHO if you need to select between default and explicit in an insert
> via binding you have a design problem, and down this path lies madness.

I fail to see that myself, sorry. You can bind NULL, you can bind values, so why wouldn't you be able to bind DEFAULT too? I see that more as a failing of the binding API myself :)

But I guess it can be worked around with something like `... values(coalesce($1, default), ...)`, abusing NULL to mean DEFAULT on a case-by-case basis. Assuming DEFAULT can be used in this way (didn't try), of course.
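A note for readers tempted by the coalesce work-around: DEFAULT is only accepted as a bare item in an INSERT ... VALUES list, not inside an expression, so `coalesce($1, default)` does not parse. In practice that means one statement text per case, e.g. (table and statement names are illustrative):

```
create table t (id int generated by default as identity, name text);

-- DEFAULT must appear literally in the SQL text; it cannot be bound.
prepare ins_default (text)       as insert into t (id, name) values (default, $1);
prepare ins_explicit (int, text) as insert into t (id, name) values ($1, $2);

execute ins_default ('server-generated id');
execute ins_explicit (42, 'client-supplied id');
```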
libpq: COPY FROM STDIN BINARY of arrays
Hi, I already use arrays fine (e.g. text[]) in BINARY-mode binds (with prepared statements or not). I already use COPY FROM STDIN BINARY fine, with scalar values. But for the 1st time, I'm combining COPY with arrays, and it is NOT fine yet :(

#08P01: ERROR: insufficient data left in message
CONTEXT: COPY UT_LibPQ_copy_arrays, line 1, column ptys

Does the wire format of arrays differ between binary binds and binary COPY? Any pointers to what I could look at, to resolve this? Doc? Code? Example? Thanks, --DD

PS: If that helps, I'm willing to detail the break-down of all values / bytes I'm sending
Re: libpq: COPY FROM STDIN BINARY of arrays
On Thu, Mar 30, 2023 at 6:00 PM Tom Lane wrote:
> Dominique Devienne writes:
> > Does the wire-format of arrays differ between binary-binds and binary-copy?
>
> No.

An update on this: I think I figured it out, by comparing with COPY TO STDOUT WITH (FORMAT BINARY). I was missing the byte-count "header" for the array. Since I'm reusing my binary-bind code for COPY, I was missing the 4 bytes for the array's "total encoded byte length" before the array data proper. Now that I understand the issue, it should be a simple matter to fix. Thanks for chiming in Tom. --DD
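For anyone hitting the same mismatch, an easy way to obtain reference bytes to compare against is to let the server encode a known value and hex-dump the result. A sketch (the query and file name are illustrative; \copy is psql's client-side variant):

```
-- Each field in a binary COPY row is preceded by its 4-byte length,
-- and for an array that length covers the whole array payload.
\copy (select array['a','b','c']::text[] as ptys) to 'ptys.bin' with (format binary)
```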
DEFINER / INVOKER conundrum
My goal is to have clients connect to PostgreSQL and call a function that returns a JWT. The JWT is supposed to capture the user (login role) and the current_role (which has meaning in our app), and to be signed using a secret corresponding to a mid-tier service the client will connect to later. I've found https://github.com/michelp/pgjwt which seems perfect for my use case, but I'm struggling with something.

On the one hand, I want an INVOKER-security function, to be able to capture the login and current ROLEs. On the other hand, I want a DEFINER-security function, to be able to access the secret to sign the JWT with. That secret will be in a table that regular users of our DB do NOT have access to, of course. But that the function doing the JWT signing does need access to, of course (again).

I thought I'd have two layers of functions: an INVOKER one that captures the ROLEs, which then calls the DEFINER one, passing the ROLEs captured. But since the INVOKER function must also be able to call the DEFINER function, what prevents the client from calling it directly, with different (spoofed) ROLEs? Is there a way out of that conundrum?

I also thought about pg_stat_activity.usename with pg_backend_pid(), but there's no current_role in there, and I'm also not sure how that would play with connection pooling!?

So is there a way to somehow mix INVOKER and DEFINER to achieve the stated goal? Or more broadly, to mix information from the session and "private" information (the JWT secret, part of the "app") in a server-side SQL function/procedure? Thanks, --DD
Re: DEFINER / INVOKER conundrum
First, let me say I was holding off replying/thanking everyone, to have the time to properly test this. Erik's quasi-question makes me break that silence.

On Tue, Apr 4, 2023 at 10:46 AM Erik Wienhold wrote:
> > On 04/04/2023 07:55 CEST walt...@technowledgy.de wrote:
> > For me, checking whether session_user has the privilege for claimed_role
> > is not enough, so I add a DOMAIN to the mix:
> > CREATE DOMAIN current_user_only AS NAME CHECK (VALUE = CURRENT_USER);
> > CREATE FUNCTION f(calling_user current_user_only DEFAULT CURRENT_USER)
>
> Nice. It's equivalent to my version without the domain if the client can
> execute SET ROLE before calling f, thereby injecting any role for which
> pg_has_role(session_user, calling_user, 'MEMBER') returns true.

First, thanks to Christoph, Erik, and Walther for helping out. Second, very nice indeed. I'll test it out soon.

> Dominique did not say whether he controls the clients or not.

I would say I don't. The decision was made to go with a 2-tier architecture, so the desktop apps connect to PostgreSQL using a typically per-OS-user PostgreSQL LOGIN user. So even though it's our app's code that accesses the DB, and we control that part, nothing prevents those users from connecting directly via psql or pgAdmin or any libpq or protocol-talking client, and trying to do mischief or worse. Thus the server-side permission model must be as tight as it can be.

Which means "regular" users don't have DDL privs, of course; thus the need for a mid-tier "more privileged" service to "mediate" the DDLs we sometimes need to do on behalf of "regular" users; thus the need to properly authenticate those users with the mid-tier services, based on the fact they can connect to the DB and its schema(s) and access/execute that JWT-returning function. Hopefully that's clear :)
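To make the quoted pattern concrete, here is a rough sketch of how the domain and a SECURITY DEFINER function could fit together. Table and function names are illustrative, and sign() is assumed to come from the pgjwt extension; the key point is that the default argument and the domain check are evaluated in the caller's context, before the definer switch, while session_user is unaffected by SECURITY DEFINER:

```
create domain current_user_only as name check (value = current_user);

-- Only the function owner (definer) can read this table.
create table jwt_secret (secret text not null);
revoke all on jwt_secret from public;

create function issue_jwt(calling_user current_user_only default current_user)
returns text
language sql
security definer
set search_path = public, pg_temp  -- avoid search_path hijacking in a DEFINER function
as $$
  select sign(
           json_build_object('login', session_user, 'role', calling_user),
           (select secret from jwt_secret))
$$;
```

A caller passing a spoofed role fails the domain check, since the value must equal their own current_user at call time (modulo SET ROLE to a role they genuinely belong to, as Erik points out).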
COPY RETURNING?
Hi. We are switching a schema type's surrogate/primary key from `uuid` to `int`. That schema has parent-child relationships enforced with foreign keys. Our ingestion/schema-loading code uses COPY FROM STDIN BINARY. Before, the SK/PK was generated client-side, as a random uuid. The ingestion code maintained client-side maps from NKs to uuids, to be used for FK columns in child tables COPY'd later. But now that the SK/PK is an integer identity column generated server-side, thanks to an implicitly created sequence, we need that generated per-row `int` PK.

With a normal prepared statement, we'd use a RETURNING clause, to avoid a separate round-trip to the server. Can the same somehow be achieved with COPY? I'm afraid the answer is no, but I want to ask anyway; maybe there's a way or a work-around?

At the end of the COPY, we do get a ResultSet, so API-wise it would be possible for it to contain some rows, I guess, but I'm not sure the COPY protocol supports returning rows, nor what the syntax would be for a COPY RETURNING form of COPY. Thanks for any insights. --DD
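In case it helps, one common work-around (a sketch only, reusing the parent table from the earlier threads) is to COPY into a temporary staging table and then INSERT ... SELECT with RETURNING, which does hand back the server-generated keys in a single extra statement:

```
-- Stream the natural keys into a staging table first.
create temp table staging_parent (name text not null);
copy staging_parent (name) from stdin with (format binary);
-- ... client streams the binary rows here, then ends the COPY ...

-- Move them into the real table and collect the generated ids,
-- to rebuild the client-side NK -> PK map.
insert into parent (name)
select name from staging_parent
returning id, name;
```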