Re: Is this error expected ?

2022-10-19 Thread Laurenz Albe
On Wed, 2022-10-19 at 09:50 +0300, Yavuz TANRIVERDİ wrote:
> i have an 
> "ERROR:  UNION types "char" and text cannot be matched CASE WHEN indisprimary 
> THEN"
> error from 
> https://github.com/yiisoft/yii/blob/e7c298343bf1f76186d443b62ff853d2d36e19f0/framework/db/schema/pgsql/CPgsqlSchema.php#L233
> I read release notes, but can't find any related change,
> It works for PostgreSQL 14.5 but fails with PostgreSQL 15.0, a test sample 
> below.
> Is this error expected ?

Yes, it is.  See this paragraph from the release notes
(https://www.postgresql.org/docs/current/release-15.html#id-1.11.6.5.5.13):

- Create a new pg_type.typcategory value for "char" (Tom Lane)
  Some other internal-use-only types have also been assigned to this category.

Perhaps that should have been listed as a potential compatibility break, but
the documentation explicitly says that "char" is not intended for use by the
end user (https://www.postgresql.org/docs/current/datatype-character.html):

> These are not intended for general-purpose use, only for use in the internal 
> system catalogs.

You will have to add an explicit type cast.
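
For example, a minimal sketch against the simplified test case from your
mail (which column to cast depends on the query Yii actually generates):

-- casting the "char" column to text makes both UNION branches resolve to text
select a::text from tbla
union all
select case when true then 'p' else 'u' end from tblb;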

Yours,
Laurenz Albe




Re: Is this error expected ?

2022-10-19 Thread Yavuz TANRIVERDİ
Ok, thank you very much,

On Wed, Oct 19, 2022 at 10:15 AM Laurenz Albe wrote:

> On Wed, 2022-10-19 at 09:50 +0300, Yavuz TANRIVERDİ wrote:
> > i have an
> > "ERROR:  UNION types "char" and text cannot be matched CASE WHEN
> indisprimary THEN"
> > error from
> >
> https://github.com/yiisoft/yii/blob/e7c298343bf1f76186d443b62ff853d2d36e19f0/framework/db/schema/pgsql/CPgsqlSchema.php#L233
> > I read release notes, but can't find any related change,
> > It works for PostgreSQL 14.5 but fails with PostgreSQL 15.0, a test
> sample below.
> > Is this error expected ?
>
> Yes, it is.  See this paragraph from the release notes
> (https://www.postgresql.org/docs/current/release-15.html#id-1.11.6.5.5.13
> ):
>
> - Create a new pg_type.typcategory value for "char" (Tom Lane)
>   Some other internal-use-only types have also been assigned to this
> category.
>
> Perhaps that should have been listed as a potential compatibility break,
> but
> the documentation explicitly says that "char" is not intended for use by the
> end user (https://www.postgresql.org/docs/current/datatype-character.html
> ):
>
> > These are not intended for general-purpose use, only for use in the
> internal system catalogs.
>
> You will have to add an explicit type cast.
>
> Yours,
> Laurenz Albe
>


Re: byte-size of column values

2022-10-19 Thread Dominique Devienne
On Tue, Oct 18, 2022 at 6:04 PM David G. Johnston wrote:
> On Tue, Oct 18, 2022 at 8:53 AM Dominique Devienne  
> wrote:
>> I'm surprised by the result for bit(3) and char, when calling 
>> pg_column_size().

> The base type is what matters, if the length of the actual type is a parameter
> (the (n) part) the underlying type must be variable.

Thanks. Interesting. Didn't know (n)-suffixed "fixed-length" types
were always based on variable-size ones.

>> How does one store as compactly as possible several small enums
> int2

OK, I see. Thanks again.

> p.s., pretend char doesn't even exist.

I realize that now. Wasn't obvious to me, despite the warning in the doc.
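
For the record, a quick way to compare the two candidates discussed above:

select pg_column_size(1::int2)     as int2_bytes,   -- 2 bytes: fixed width, the suggested choice
       pg_column_size('a'::"char") as char_bytes;   -- 1 byte: but "char" is internal-use only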




Re: byte-size of column values

2022-10-19 Thread Dominique Devienne
On Tue, Oct 18, 2022 at 6:04 PM Tom Lane  wrote:
> Dominique Devienne  writes:
> > I'm surprised by the result for bit(3) and char
> > The doc does mention 5-8 bytes overhead, but I expected
> > those for varying bit, not fixed-sized bit typed values.
>
> Your expectation is incorrect.  Postgres always treats these types
> as variable-length, whether or not the column has a length constraint.

OK. Still, wasn't such a stretch to assume that, no?
Now I know better, thanks to you and David.
I'm not sure the doc on types talks about that either.
Didn't see it for sure (but could still be there and I missed it).

> Thus, there's always a header to store the actual length.  That can
> be either 1 or 4 bytes (I think the doc you are looking at might be
> a little out of date on that point).

Even the doc on v15 (or devel) still says 5-to-8.
https://www.postgresql.org/docs/15/datatype-bit.html

And on my v12, that's borne out by my experimentation.
Being used to SQLite's varints,
I'd have expected fewer overhead bytes for the size, like your 1-to-4.

> Because of the popularity of variable-width character encodings,
> a column declared as N characters wide isn't necessarily a fixed
> number of bytes wide, making it a lot less useful than you might
> think to have optimizations for fixed-width storage.  Between that
> and the fact that most Postgres developers regard CHAR(N) as an
> obsolete hangover from the days of punched cards, no such
> optimizations have been attempted.

Thanks for the background. I definitely appreciate PostgreSQL's large 1GB
limit on text and bytea columns, coming from Oracle's tiny 4K one, which
created all kinds of headaches for us.

For kicks, I looked at bpchar, blank-padded char, and its extra byte, which
I assume is again some kind of length; there at least the overhead is small
compared to bit(n). 1 byte versus 5 bytes is no small difference.

ddevienne=> create table bar (bpc bpchar(16));
CREATE TABLE
ddevienne=> insert into bar values ('foo'), ('bar baz');
INSERT 0 2
ddevienne=> select length(bpc), pg_column_size(bpc), '<'||bpc||'>' from bar;
 length | pg_column_size | ?column?
--------+----------------+-----------
      3 |             17 | <foo>
      7 |             17 | <bar baz>
(2 rows)




How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Dominique Devienne
Hi. I'd like some advice storing blobs (millions of them), ranging
from very small, to large > 1GB.

I know about bytea versus lo, and I have probably read most of what's
out there about them :)

Upfront, I have to state that I'm not keen on lo, because of security
considerations. We store
blobs in many different schemas, and users can access some schemas,
and not others. So
the fact the lo table is unique for the whole database would allow
users to see blobs from any
schema, as I understand it. Right? OTOH, lo has random access, which I
also need...
(I'm also not a fan of lo needing triggers for lifetime management)
(nor of the requirement to have an explicit transaction to use lo).

Here are the main requirements I need to fulfil:
1) store literally millions of rows, 1 "blob" per row. (scientific data).
2) most "blobs" are smallish, below a few KBs. bytea's perfect for that.
3) yet many blobs are on the dozens of MBs. bytea's still OK.
4) but some blobs exceed the 1GB bytea limit. Found at least a dozen
just in our test data, so clients will have them.
5) for accessing larger blobs, the API I must implement accesses
contiguous chunks of the blobs. Thus I need random access.

I'm porting the backend of that API from Oracle to PostgreSQL.
In Oracle we used RAW for smaller values, and SecureFile blobs for larger ones.
Oracle LOBs are similar to lo I guess, providing random access, except
higher performance for large sizes,
but slower than bytea for smaller ones. But a PITA to deal with, with
locators, breaking pre-fetching, etc...

PostgreSQL bytea is much better and simpler, except limited to 1GB...
Bytea also has no direct random access, except via substr[ing], but
how efficient and "random access" is that?
For example, SQLite stores large values in overflow pages, and has to
pointer-chase the chain of those pages for large blobs, so its substr()
"random access" is NOT O(1) and involves way too much IO (i.e. O(N)).
Do TOAST'ed large values (text or bytea) fare better in terms of
substr() efficiency, close to O(logN) and only
the necessary IO for what's actually requested by the substr() (modulo
the ~2KB chunk granularity of TOAST)?
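
To make "random access" concrete, this is the kind of query I mean
(illustrative only; assume a table like blobs(id bigint, data bytea)):

-- read 64 KB starting at byte offset 1,048,576 (substring offsets are 1-based)
select substring(data from 1048577 for 65536)
from blobs
where id = 42;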

Some posts on the internet also peg lo to be faster above 20MB
compared to bytea, for example. Is that even true?
One post I read (from Daniel Verite I think) kinda hinted the lo table
is not that different from TOAST ones,
with 2K chunks, thus I don't see why lo would be any faster than a
TOAST'ed bytea for example.
Any insights someone with knowledge of the internals can share on this?

At this point, I have not yet redone the extensive benchmarking we did
a few years ago between
Oracle and PostgreSQL, which went into lo versus bytea then. And
benchmarks are hard to do, not
having easy access to different kinds of servers with different storage
backends, or cloud-hosted PG.
It's too easy to get bias from a local setup, leading to a design
that'd not perform optimally in a different one.
That's why I'm asking a more open-ended question to experts on this list.

Because of #4 above, I need to either use lo (but see above, notably
the security concern),
or roll my own TOAST^2 (squared), as I call it, where I manually
"shard" / chunk large blobs
in an auxiliary table, itself TOAST'ed of course, with some threshold
for chunks (e.g. 1 or 4 or 16 MB perhaps).
The latter keeps the blobs in the schema (good for security), lifetime
is managed by FKs (as usual),
and the sharding limits the ill effects of "emulating" random access
with substr() if necessary.
I've already done things like this in SQLite land (also has the 1GB
limit for its text and blob types).
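
For reference, a minimal sketch of what I mean by that auxiliary chunk table
(names and details are illustrative, not a final design):

create table blob_meta (
  blob_id    bigint generated always as identity primary key,
  total_size bigint not null
);

create table blob_chunk (
  blob_id  bigint  not null references blob_meta(blob_id) on delete cascade,
  chunk_no integer not null,   -- 0-based; every chunk the same fixed size except the last
  chunk    bytea   not null,
  primary key (blob_id, chunk_no)
);

-- a byte range [off, off+len) then only touches chunks numbered
-- off / chunk_size through (off + len - 1) / chunk_size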

So is this a terrible idea? What alternatives do I have?
I'd really appreciate some expert advice on the above, before I go too
far down the rabbit hole.

Thanks, --DD

PS: Another drawback of lo is that because it's a single table, it's
still subject to the 32 TB limit on a relation.
The 4TB limit per lo is way more than we need, but the 32 TB limit
may actually be more of an issue for our larger clients, which have
thousands of projects, each with upwards of a few million of those blobs.
bytea values being stored in different schemas (per-project, a design
constraint) puts that limit per-project, which will be more than enough.
For the sum of all projects, maybe not... I.e. with a real client case
of 3K projects, that puts an average of only 10GB of lo's per project
(i.e. schema), which could very well be problematic...




Sv: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Andreas Joseph Krogh


First advice: don't do it. We started off storing blobs in the DB for “TX safety”,
but backup/restore quickly became too cumbersome, so we ended up moving all
blobs out and only storing a reference in the DB. This required us to make a “vacuum
system” that cleans up the blob storage regularly, as a ROLLBACK/crash can make it
go out of sync.

We chose storing as LO because with it, streaming large blobs (not using much 
memory) actually worked, with JDBC at least.

On Wednesday, 19 October 2022 at 11:47:59, Dominique Devienne <ddevie...@gmail.com> wrote:
Hi. I'd like some advice storing blobs (millions of them), ranging
from very small, to large > 1GB.

I know about bytea versus lo, and I have probably read most of what's
out there about them :)

Upfront, I have to state that I'm not keen on lo, because of security
considerations. We store
blobs in many different schemas, and users can access some schemas,
and not others. So
the fact the lo table is unique for the whole database would allow
users to see blobs from any
schema, as I understand it. Right? OTOH, lo has random access, which I
also need...
(I'm also not a fan of lo needing triggers for lifetime management)
(nor of the requirement to have an explicit transaction to use lo).

Here are the main requirement I need to fulfil:
1) store literally millions of rows, 1 "blob" per row. (scientific data).
2) most "blobs" are smallish, below a few KBs. bytea's perfect for that.
3) yet many blobs are on the dozens of MBs. bytea's still OK.
4) but some blobs exceed the 1GB byte limit. Found at least a dozen
just in our test data, so clients will have them.
5) for accessing larger blobs, the API I must implement accesses
contiguous chunks of the blobs. Thus I need random access.

I'm porting the backend of that API from Oracle to PostgreSQL.
In Oracle we used RAW for smaller values, and SecureFile blobs for larger ones.
Oracle LOBs are similar to lo I guess, providing random access, except
higher performance for large sizes,
but slower than bytea for smaller one. But a PITA to deal with, with
locators, breaking pre-fetching, etc...

PostgreSQL bytea is much better and simpler, except limited to 1GB...
Bytea also has no direct random access, except via substr[ing], but
how efficient and "random access" is that?
For example, SQLite stores large values in overflow pages, and has to
traverse all those pages to pointer-chase
the chain of those for large blobs, so the substr() "random access" is
NOT O(1) and involves way too much IO (i.e. O(N)).
Do TOAST'ed large values (text or bytea) fare better in term of
substr() efficiency, close to O(logN) and only
the necessary IO for what's actually requested by the substr() (modulo
the extrema 2K chunks of TOAST)?

Some posts on the internet also peg lo to be faster above 20MB
compared to bytea, for example. Is that even true?
One post I read (from Daniel Verite I think) kinda hinted the lo table
is not that different from TOAST ones,
with 2K chunks, thus I don't see why lo would be any faster than a
TOAST'ed bytea for example.
Any insights from someone with knowledge of the internals can share on this?

At this point, I have not yet redone the extensive benchmarking we did
a few years ago between
Oracle and PostgreSQL, which went into lo versus bytea then. And
benchmark are hard to do, not
having easy access to different kind of servers with different storage
backends, or cloud-hosted PG.
It's too easy to get biais from a local setup, leading to a design
that'd not perform optimally in a different one.
That's why I'm asking a more open-ended question to experts on this list.

Because of #4 above, I need to either use lo (but see above, notably
the security concern),
or roll-up my own TOAST^2 (squared), as I call it, where I manually
"shard" / chunk large blobs
in an auxiliary table, itself TOAST'ed of course, with some threshold
for chunks (e.g. 1 or 4 or 16 MB perhaps).
The latter keeps the blobs in the schema (good for security), lifetime
is managed by FKs (as usual),
and the sharding limits the ill-effects of "emulating" random-access
with substr() if necessary.
I've already done things like this in SQLite land (also has the 1GB
limit for its text and blob types).

So is this a terrible idea? What alternatives do I have?
I'd really appreciate some expert advice on the above, before I go too
far down the rabbit hole.

Thanks, --DD

PS: Another drawback of lo is that because it's a single table, it
still subject to the 32 TB limit on a relation.
 The 4TB limit per lo is way more than we need, but the 32 TB limit
may actually be more of an issue for
 our larger clients, which have thousands of projects, each with
upwards of a few millions of those blobs.
 bytea values being stored in different schemas (per-project, a
design constraint), puts that limit per-project
 which will be more than enough. For the sum of all projects, maybe
not... I.e. with real client-case of 3K projects,
 that puts an average of only 

Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Dominique Devienne
On Wed, Oct 19, 2022 at 12:17 PM Andreas Joseph Krogh wrote:

> First advice, don't do it. We started off storing blobs in DB for “TX
> safety”
>

Not really an option, I'm afraid.


> , but backup/restore quickly became too cumbersome so we ended up moving
> all blobs out and only store reference in DB.
>
This required us to make a “vacuum system” that cleans up the blob-storage
> regularly as ROLLBACK/crash can make it out of sync.
>

Note that the data is spread across many mostly independent schemas.
Might that ease the backup/restore? I'm not much of a DBA though...


> We chose storing as LO because with it, streaming large blobs (not using
> much memory) actually worked, with JDBC at least.
>

I'm in C++, with, I believe, efficient use of binary binds and results, and
use of COPY as much as possible,
so about as good as it gets, I think (that I know of, with libpq), in terms of
performance. I haven't looked at libpq's
new-in-v14 pipeline mode yet though.

Investigated Cursor vs Statement too, and it's a tradeoff between latency
and throughput.
Not sure how good the JDBC drivers for PostgreSQL are. I'm doing my best to go
fast with libpq.

In any case, thanks for your input. But it's not really a question of "if".
But of "how".

Putting thousands of large blobs in the file system is a no-go, and that's
assuming the clients
can even see the file system the server sees. This is a 2-tier system;
there's no mid-tier
that would somehow magically handle proper security and lifetime management
of these blobs.

Thanks, --DD


Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Andreas Joseph Krogh


On Wednesday, 19 October 2022 at 12:48:24, Dominique Devienne <ddevie...@gmail.com> wrote:

On Wed, Oct 19, 2022 at 12:17 PM Andreas Joseph Krogh <andr...@visena.com> wrote:



First advice, don't do it. We started off storing blobs in DB for “TX safety”

Not really an option, I'm afraid.


, but backup/restore quickly became too cumbersome so we ended up moving all 
blobs out and only store reference in DB.

This required us to make a “vacuum system” that cleans up the blob-storage 
regularly as ROLLBACK/crash can make it out of sync.

Note the fact the data is spread in many mostly independent schemas.
Might ease the backup/restore? I'm not much of a DBA though...


We chose storing as LO because with it, streaming large blobs (not using much 
memory) actually worked, with JDBC at least.


I'm in C++, with I believe efficient use of binary binds and results, and use 
of COPY as much as possible,
so as good as it gets I think (that I know of, with libpq), in terms of 
performance. Haven't looked at libpq's
new in v14 pipeline mode yet though.

Investigated Cursor vs Statement too, and it's a tradeoff between latency and 
throughput.
Not sure how good JDBC drivers for PostgreSQL are. I'm doing my best to go 
fast with libpq.

In any case, thanks for your input. But it's not really a question of "if". 
But of "how".

Putting thousands of large blobs in the file system is a no go. Assuming the 
clients
can even see the file system the server sees. This is a 2-tier system, there's 
no mid-tier
that would somehow magically handle proper security and lifetime management of 
these blobs.

Thanks, --DD
Ok, just something to think about: will your database grow beyond 10TB with
blobs? If so, try to calculate how long it takes to restore, and comply with
the SLA, and how long it would have taken to restore without the blobs.



PS: Our blobstore is not “the file system”, but SeaweedFS.






--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 
 


Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Dominique Devienne
On Wed, Oct 19, 2022 at 1:00 PM Andreas Joseph Krogh  wrote:
> Ok, just something to think about;

Thank you. I do appreciate the feedback.

> Will your database grow beyond 10TB with blobs?

The largest internal store I've seen (for the subset of data that goes
in the DB) is shy of 3TB.
But we are an ISV, not one of our clients, which have truly massive
scale for data.
And they don't share the exact scale of their proprietary data with me...

> If so try to calculate how long it takes to restore, and comply with SLA,
> and how long it would have taken to restore without the blobs.

Something I don't quite get is why somehow backup is no longer needed
if the large blobs are external?
i.e. are you saying backups are so much worse in PostgreSQL than
with the FS? I'm curious now.

Also, managing the PostgreSQL server will be the client's own concern
mostly. We are not into SaaS here.
As hinted above, the truly massive data is already not in the DB; it's used
by different systems, and processed
down to the GB-sized inputs that all the data put in the DB is generated
from. It's a scientific-data-heavy environment.
And one where security of the data is paramount, for contractual and
legal reasons. Files make that harder IMHO.

Anyways, this is straying from the main theme of this post I'm afraid.
Hopefully we can come back on the main one too. --DD




Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Andreas Joseph Krogh


On Wednesday, 19 October 2022 at 13:21:38, Dominique Devienne <ddevie...@gmail.com> wrote:
On Wed, Oct 19, 2022 at 1:00 PM Andreas Joseph Krogh  
wrote:
> Ok, just something to think about;

Thank you. I do appreciate the feedback.

> Will your database grow beyond 10TB with blobs?

The largest internal store I've seen (for the subset of data that goes
in the DB) is shy of 3TB.
But we are an ISV, not one of our clients, which have truly massive
scale for data.
And they don't share the exact scale of their proprietary data with me...

> If so try to calculate how long it takes to restore, and comply with SLA,
> and how long it would have taken to restore without the blobs.

Something I don't quite get is why somehow backup is no longer needed
if the large blobs are external?
i.e. are you saying backups are so much more worse in PostgreSQL than
with the FS? I'm curious now.
I'm not saying you don't need backup (or redundancy) of the other systems holding
blobs, but moving them out of the RDBMS lets you restore the DB to a consistent
state, and be able to serve clients, faster. In my experience it's quite unlikely
that your (redundant) blob-store needs crash-recovery at the same time your DB
does. The same goes for PITR, needed because of some logical error (like a
client deleting some data they shouldn't have), which is much faster without
blobs in the DB and doesn't affect the blobstore at all (if you have a smart
insert/update/delete policy there).



Also, managing the PostgreSQL server will be the client's own concern
mostly. We are not into Saas here.
As hinted above, the truly massive data is already not in the DB, used
by different systems, and processed
down to the GB sized inputs all the data put in the DB is generated
from. It's a scientific data heavy environment.
And one where security of the data is paramount, for contractual and
legal reasons. Files make that harder IMHO.

Anyways, this is straying from the main theme of this post I'm afraid.
Hopefully we can come back on the main one too. --DD

There's a reason “everybody” advises to move blobs out of the DB, I've learned.






--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 
 


Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Dominique Devienne
On Wed, Oct 19, 2022 at 1:38 PM Andreas Joseph Krogh wrote:

> There's a reason “everybody” advices to move blobs out of DB, I've learned.
>

I get that. I really do. But the alternative has some real downsides too.
Especially around security, as I already mentioned. That's why I'd like,
if possible, to get input on the technical questions of my initial post.

That's not to say we wouldn't ultimately move the big blobs outside the
DB.
But given how much that would complexify the project, I do believe it is
better
to do it as a second step, once the full system is up-and-running and
testing at
scale has actually been performed.

We've already moved other kinds of data to PostgreSQL, from SQLite DBs
(thousands) this time,
and ported "as-is" the sharding done on the SQLite side to PostgreSQL
(despite TOAST).
And so far, so good. With good ingestion rates. And decent runtime access
to data too,
in the albeit limited testing we've had so far.

Now we need to move this other kind of data, from proprietary DB-like files
this time (thousands too),
to finish our system, and be able to finally test the whole system in
earnest, and at (our limited internal) scale.

So you see, I'm not completely ignoring your advice.

But for now, I'm inquiring as to the *best* way to put that data *in*
PostgreSQL,
with the requirements / constraints I've listed in the first post.
It may indeed be a bad idea long term. But let's make the most of it for
now.
Makes sense? Am I being unreasonable here? --DD


Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread esconsult1
We had the same thought of storing the blobs inside LO’s as well many years ago.

But ultimately chose cloud storage and stored a pointer in the database instead.

Now that we are approaching a terabyte of just normal data I don’t regret this 
decision one bit. Just handling backups and storage is already a chore. 

Data in S3 compatible storage is very easy to protect in numerous ways.

We have one set of code responsible for uploading, downloading and deleting the 
files themselves.

One downside? Occasionally an S3 delete fails and now and again a file or two 
gets orphaned. But we’ve never not found a file pointed to from our attachments 
table in 11 years.

We also only store pathnames/base names so we can easily move storage providers
if we decide to go on-prem.

There is absolutely no upside to storing files in the db if you anticipate any 
kind of growth or significant volume.

Ericson Smith
CTO
Travel Agency Tribes

Sent from my iPhone

> On 19 Oct 2022, at 7:01 PM, Dominique Devienne  wrote:
> 
> 
>> On Wed, Oct 19, 2022 at 1:38 PM Andreas Joseph Krogh  
>> wrote:
> 
>> There's a reason “everybody” advices to move blobs out of DB, I've learned.
> 
> I get that. I really do. But the alternative has some real downsides too.
> Especially around security, as I already mentioned. That's why I'd like if 
> possible
> to get input on the technical questions of my initial post.
> 
> That's not to say we wouldn't ultimately move out the big blobs outside the 
> DB.
> But given how much that would complexify the project, I do believe it is 
> better
> to do it as a second step, once the full system is up-and-running and testing 
> at
> scale has actually been performed.
> 
> We've already moved other kind of data to PostgreSQL, from SQLite DBs 
> (thousands) this time,
> and ported "as-is" the sharding done on the SQLite side to PostgreSQL 
> (despite TOAST).
> And so far, so good. With good ingestion rates. And decent runtime access to 
> data too,
> in the albeit limited testing we've had so far.
> 
> Now we need to move this other kind of data, from proprietary DB-like files 
> this times (thousands too),
> to finish our system, and be able to finally test the whole system in 
> earnest, and at (our limited internal) scale.
> 
> So you see, I'm not completely ignoring your advise.
> 
> But for now, I'm inquiring as to the *best* way to put that data *in* 
> PostgreSQL,
> with the requirements / constraints I've listed in the first post.
> It may indeed be a bad idea long term. But let's make the most of it for now.
> Makes sense? Am I being unreasonable here? --DD


Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Daniel Verite
Dominique Devienne wrote:

> the fact the lo table is unique for the whole database would allow
> users to see blobs from any schema, as I understand it.

Direct access to pg_largeobject is only possible for superusers.
If lo_compat_privileges is on, any user can read any large
object with the lo* functions.
If it's off, they can read a large object only if they're the owner
or they have been granted permissions with

GRANT { { SELECT | UPDATE } [, ...] | ALL [ PRIVILEGES ] }
ON LARGE OBJECT loid [, ...]
TO role_specification [, ...] [ WITH GRANT OPTION ]
[ GRANTED BY role_specification ]
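
For example (the OID and role name here are just for illustration):

GRANT SELECT ON LARGE OBJECT 16411 TO project_a_readers;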

Each large object has its own set of permissions. This is a significant
difference with bytea, since every creation of a new large object
may need to be followed by GRANT statements.
Also if the roles and the access policies are changed in the
lifetime of the app, that might imply massive REVOKE/GRANT
statements to apply to existing objects.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Alvaro Herrera
On 2022-Oct-19, Dominique Devienne wrote:

> Upfront, I have to state that I'm not keen on lo, because of security
> considerations. We store blobs in many different schemas, and users
> can access some schemas, and not others. So the fact the lo table is
> unique for the whole database would allow users to see blobs from any
> schema, as I understand it. Right? OTOH, lo has random access, which I
> also need...

Generally speaking, bytea sucks for random access, because if a TOAST
item is compressed, it has to be always read from the beginning in order
to decompress correctly.  However, if you set
ALTER TABLE tab ALTER COLUMN byt SET STORAGE EXTERNAL
then compression is not used, and random access becomes fast.

https://www.postgresql.org/docs/15/sql-altertable.html

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"




Re: COMMIT IN STORED PROCEDURE WHILE IN A LOOP

2022-10-19 Thread Mladen Gogala
Comments in-line.

On Tue, 2022-10-18 at 21:02 -0700, Christophe Pettus wrote:
> 
> 
> > On Oct 18, 2022, at 19:18, gogala.mla...@gmail.com wrote:
> > 
> > Commit within a loop is an extremely bad idea.
> 
> This is an over-generalization.  There are many use-cases for this
> (if there were not, procedures wouldn't have been nearly as important
> a feature).
> 
> For example, if you are processing a large update (in the hundreds of
> thousands or more of rows), you often want to commit regularly so
> that other processes don't have to wait for the whole thing to finish
> due to row-level locks, and to give vacuum a chance to deal with the
> dead tuples.  Similarly, while inserting one row at a time and
> committing is usually not a great idea, it can make sense to do large
> inserts in batches.

That depends. Multiple commits will slow down the processing. If the
goal of the exercise is to let the update complete as quickly as possible,
then it will be a single commit. If the goal is to enable normal
processing and let the enormous update complete in its due time, then
you'll do what you describe.
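
For what it's worth, the batched variant you describe typically looks
something like this in PL/pgSQL (table and column names are made up;
needs PG 11+, no surrounding exception block, and a plain CALL outside
an explicit transaction block):

create procedure purge_old_events(batch_size int default 10000)
language plpgsql
as $$
declare
  n bigint;
begin
  loop
    delete from events
    where ctid in (select ctid from events
                   where created_at < now() - interval '90 days'
                   limit batch_size);
    get diagnostics n = row_count;
    exit when n = 0;
    commit;  -- releases row locks and lets vacuum reclaim dead tuples between batches
  end loop;
end;
$$;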


> 
> Applications do this kind of thing all the time, very successfully;
> it was just that the loop was in the application rather than in the
> procedure.
> 
> High commit rates happen all the time, and they don't break
> PostgreSQL.  For example, an IoT application collecting sensor data
> and doing many inserts per second is also doing many commits per
> second, since each bare INSERT is in its own transaction.  PostgreSQL
> handles it just fine.

The point of my post is that the business logic, in your case it's IoT
sensors, determines what a transaction is and when to commit. Advice like
"commit often and commit early", to paraphrase the famous Chicago
mayor, is easy to find, but I would take it with a grain of salt.
Regards
-- 
Mladen Gogala
Database Consultant
https://dbwhisperer.wordpress.com



Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Dominique Devienne
On Wed, Oct 19, 2022 at 3:01 PM Daniel Verite  wrote:
> Dominique Devienne wrote:
> > the fact the lo table is unique for the whole database would allow
> > users to see blobs from any schema, as I understand it.

> Each large object has its own set of permissions. This is a significant
> difference with bytea, since every creation of a new large object
> may need to be followed by GRANT statements.
> Also if the roles and the access policies are changed in the
> lifetime of the app, that might imply massive REVOKE/GRANT
> statements to apply to existing objects.

Thank you Daniel. Very interesting, and something I definitely didn't know.

I believe that's doable, given our design on ROLEs, but would for sure
be both a PITA, and
additional management / code to deal with. At least GRANTs are
transactional like the new
LO oids themselves, I think, so now I know it would be possible to
properly secure the LOs.

This insight is greatly appreciated. --DD




Speeding up adding fky on a very large table

2022-10-19 Thread Ravi Krishna
AWS Aurora based on PG 13.

Large partitioned table of 5+ billion rows and 7TB in size.
ALTER TABLE abc ADD CONSTRAINT fk_123 FOREIGN KEY (a,b,c) REFERENCES xyz(1,2,3);
It seems this is not parallelized. Is there a way? Or is going directly into each
partition the only way (not even sure that is possible)?



Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Dominique Devienne
On Wed, Oct 19, 2022 at 3:05 PM Alvaro Herrera  wrote:
> On 2022-Oct-19, Dominique Devienne wrote:
> > OTOH, lo has random access, which I also need...
>
> Generally speaking, bytea sucks for random access, because if a TOAST
> item is compressed, it has to be always read from the beginning in order
> to decompress correctly.  However, if you set
> ALTER TABLE tab ALTER COLUMN byt SET STORAGE EXTERNAL
> then compression is not used, and random access becomes fast.

Thank you Álvaro. Quite insightful as well. I was aware of TOAST compression,
but didn't make the connection to the side-effects on random-access.

But now that TOAST has LZ4 support, which decompresses extremely fast
compared to ZLib (I have experience with LZ4 for a WebSocket-based
server's messages), and choosing an appropriately small shard/chunk size,
that might be mitigated somewhat.
Would need testing / benchmarking to compare uncompressed vs LZ4, at
various chunk and subset/offset sizes, of course.
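
Concretely, I guess the per-column knobs to compare would be along these
lines (illustrative table/column names; SET COMPRESSION needs PG 14+, and
the two options are alternatives, since EXTERNAL disables compression):

-- option A: no compression at all, cheapest possible substr() slicing
alter table blob_chunk alter column chunk set storage external;
-- option B: keep compression, but use LZ4 for fast decompression
alter table blob_chunk alter column chunk set compression lz4;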

Does anybody have an answer to my question regarding how substr() works on
bytea values?
I.e. is it "pushed down" / optimized enough that it avoids reading the
whole N-byte value,
to then pass it to substr(), which then returns an M-byte value (where M < N)?

If TOAST stores 2,000-byte chunks, and those chunks' PKs are numbers,
simple arithmetic
should be able to select only the chunks of interest, thus incurring
only the necessary IO
for the selected range, no?

Or does the fact that subsetting a bytea currently requires substr() prevent
using such a scenario?
And if so, why not support a native subsetting notation that did
support that scenario,
like the obvious bytea_col[offset, count] or bytea_col[start:end]?

Seems to me efficient native subsetting of varlength values would
be quite valuable.




Re: byte-size of column values

2022-10-19 Thread Tom Lane
Dominique Devienne  writes:
> On Tue, Oct 18, 2022 at 6:04 PM Tom Lane  wrote:
>> Thus, there's always a header to store the actual length.  That can
>> be either 1 or 4 bytes (I think the doc you are looking at might be
>> a little out of date on that point).

> Even the doc on v15 (or devel) still says 5-to-8.
> https://www.postgresql.org/docs/15/datatype-bit.html

Oh, if you're thinking about bit rather than string types,
that's correct: in addition to the overall-field-width
header (1 or 4 bytes) there's a length-in-bits field
(always 4 bytes), then the bits themselves.  The overall
width is clearly insufficient to know how many bits are
valid in the last byte, so some additional storage is
necessary.  This could have been done in a more compact
way no doubt, but we aren't likely to change it now.
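
A quick way to see those pieces add up, on an in-memory datum where the
varlena header takes its full 4-byte form:

select pg_column_size(B'101');  -- 4 (varlena header) + 4 (bit count) + 1 (data byte) = 9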

regards, tom lane




Re: Is this error expected ?

2022-10-19 Thread Ron

On 10/19/22 01:50, Yavuz TANRIVERDİ wrote:

Hi,
i have an
"ERROR: UNION types "char" and text cannot be matched CASE WHEN
indisprimary THEN"
error from
https://github.com/yiisoft/yii/blob/e7c298343bf1f76186d443b62ff853d2d36e19f0/framework/db/schema/pgsql/CPgsqlSchema.php#L233
I read release notes, but can't find any related change,
It works for PostgreSQL 14.5 but fails with PostgreSQL 15.0, a test sample
below.
Is this error expected ?
Thanks,

create table tbla ( a "char" );


Try CHAR(1) instead of just CHAR.

create table tblb ( b char(1) );
insert into tbla values ('a');
insert into tbla values ('b');
insert into tblb values ('c');
insert into tblb values ('d');
-- works with 14.5 fails on 15.0
select a from tbla union all select case when true then 'p' else 'u' end
from tblb;


--
Angular momentum makes the world go 'round.

Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Daniel Verite
Dominique Devienne wrote:

> PostgreSQL bytea is much better and simpler, except limited to 1GB...
> Bytea also has no direct random access, except via substr[ing], but
> how efficient and "random access" is that?

Bytea contents are compressed before being sliced (in chunks of
TOAST_MAX_CHUNK_SIZE bytes, typically it's 2000 IIRC), so it's not
possible to access a piece of data without decompressing the contents
before it.

By contrast large objects are sliced before compression, so the
performance of random access is likely to be completely different.

> Here are the main requirement I need to fulfil:
> 1) store literally millions of rows, 1 "blob" per row. (scientific data).
> 2) most "blobs" are smallish, below a few KBs. bytea's perfect for that.
> 3) yet many blobs are on the dozens of MBs. bytea's still OK.
> 4) but some blobs exceed the 1GB byte limit. Found at least a dozen
> just in our test data, so clients will have them.
> 5) for accessing larger blobs, the API I must implement accesses
> contiguous chunks of the blobs. Thus I need random access.

In your case I would probably opt for bytea (as opposed to large
objects), and slicing the blobs in the application in chunks of a
fixed size much larger than what TOAST does (for instance, somewhere
between 128 kB and 32 MB).

That is, having a table like:

create table blobs (
  blob_id some_type,
  chunk_no int,  /* 0->N */
  chunk bytea
);

It's not as simple as using a single bytea field or large objects,
but overall it avoids the management difficulties of both large
objects and very large contents in bytea columns.
Random access is achieved by skipping the chunks before the
requested piece of data.
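
To illustrate, with a fixed chunk size of 1 MB, fetching 64 kB starting at
byte offset 5,200,000 of one blob (blob_id 42 is just an example) only
touches chunks 4 and 5; the client then trims the first and last chunk to
the exact byte range:

select chunk_no, chunk
from blobs
where blob_id = 42
  and chunk_no between 5200000 / 1048576 and (5200000 + 65536 - 1) / 1048576
order by chunk_no;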

If the app is able to read/write the chunks in binary mode, its queries
should perform as well as the large objects functions.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: COMMIT IN STORED PROCEDURE WHILE IN A LOOP

2022-10-19 Thread Ron

On 10/19/22 08:06, Mladen Gogala wrote:
[snip]




Applications do this kind of thing all the time, very successfully; it 
was just that the loop was in the application rather than in the procedure.


High commit rates happen all the time, and they don't break PostgreSQL.  
For example, an IoT application collecting sensor data and doing many 
inserts per second is also doing many commits per second, since each bare 
INSERT is in its own transaction.  PostgreSQL handles it just fine.


Point of my post is that the business logic, in your case it's IoT 
sensors, determines what is transaction and when to commit. Advice like 
"commit often and commit early", to paraphrase the famous Chicago mayor, 
is easy to find but I would take it with grain of salt.


In the normal course of operation (i.e., not when bulk loading), you /should/
commit at the end of every "business transaction". We've committed after X
business transactions when running stovepipe "batch" jobs processing input files.  In
those cases, though, we had to track progress through the file; in the case
of a rollback, the application had to go back to the last input file "save
point" and start over.


--
Angular momentum makes the world go 'round.

Re: Speeding up adding fky on a very large table

2022-10-19 Thread Ron

On 10/19/22 08:31, Ravi Krishna wrote:

AWS Aurora based on PG 13.


Large partitioned table of 5+ billion rows and 7TB in size.

ALTER TABLE abc ADD CONSTRAINT fk_123 FOREIGN KEY (a,b,c) REFERENCES 
xyz(1,2,3);


It seems this is not parallelized. Is there a way. Or directly going into 
each partition is the only way ( not even sure it is possible).


In vanilla Postgresql (including RDS Postgresql) you can add the FK to each 
partition and then ALTER TABLE ONLY the parent table.


Don't know about Aurora, though.

--
Angular momentum makes the world go 'round.

Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Ron

On 10/19/22 04:47, Dominique Devienne wrote:
[snip]

PS: Another drawback of lo is that because it's a single table, it
still subject to the 32 TB limit on a relation.
   The 4TB limit per lo is way more than we need, but the 32 TB limit
may actually be more of an issue for
   our larger clients, which have thousands of projects, each with
upwards of a few millions of those blobs.


Partition the table on the first segment of the Primary Key.  From 
experience, anything else can lead to serious query degradation.


--
Angular momentum makes the world go 'round.




Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Ron

On 10/19/22 06:38, Andreas Joseph Krogh wrote:
On Wednesday, 19 October 2022 at 13:21:38, Dominique Devienne wrote:


On Wed, Oct 19, 2022 at 1:00 PM Andreas Joseph Krogh
 wrote:
> Ok, just something to think about;

Thank you. I do appreciate the feedback.

> Will your database grow beyond 10TB with blobs?

The largest internal store I've seen (for the subset of data that goes
in the DB) is shy of 3TB.
But we are an ISV, not one of our clients, which have truly massive
scale for data.
And they don't share the exact scale of their proprietary data with me...

> If so try to calculate how long it takes to restore, and comply with
SLA,
> and how long it would have taken to restore without the blobs.

Something I don't quite get is why somehow backup is no longer needed
if the large blobs are external?
i.e. are you saying backups are so much more worse in PostgreSQL than
with the FS? I'm curious now.

I'm not saying you don't need backup (or redundancy) of other systems 
holding blobs, but moving them out of RDBMS makes you restore the DB to a 
consistent state, and able to serve clients, faster. In my experience It's 
quite unlikely that your (redundant) blob-store needs crash-recovery at 
the same time you DB does. The same goes with PITR, needed because of some 
logical error (like client deleted some data they shouldn't have), which 
is much faster without blobs in DB and doesn't affect the blobstore at all 
(if you have a smart insert/update/delete-policy there).




This is nothing to sneeze at.  Backing up a 30TB database takes a *long* time.


Also, managing the PostgreSQL server will be the client's own concern
mostly. We are not into Saas here.
As hinted above, the truly massive data is already not in the DB, used
by different systems, and processed
down to the GB sized inputs all the data put in the DB is generated
from. It's a scientific data heavy environment.
And one where security of the data is paramount, for contractual and
legal reasons. Files make that harder IMHO.

Anyways, this is straying from the main theme of this post I'm afraid.
Hopefully we can come back on the main one too. --DD

There's a reason “everybody” advices to move blobs out of DB, I've learned.



We deal with an ISV maintaining a banking application.  It stores scanned 
images of checks as bytea fields in a Postgresql 9.6 database.  The next 
version will store the images outside of the database.



--
*Andreas Joseph Krogh*
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com 



--
Angular momentum makes the world go 'round.

Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Laurenz Albe
On Wed, 2022-10-19 at 12:48 +0200, Dominique Devienne wrote:
> On Wed, Oct 19, 2022 at 12:17 PM Andreas Joseph Krogh  
> wrote:
> > First advice, don't do it. We started off storing blobs in DB for “TX 
> > safety”
> 
> Not really an option, I'm afraid.

You should reconsider.  Ruling out that option now might get you into trouble
later.  Large Objects mean trouble.

Yours,
Laurenz Albe




Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Alvaro Herrera
On 2022-Oct-19, Dominique Devienne wrote:

> Anybody has an answer to my question regarding how substr() works on
> bytea values?  I.e. is it "pushed down" / optimized enough that it
> avoids reading the whole N-byte value, to then pass it to substr(),
> which then returns an M-byte value (where M < N)?
>
> If TOAST stores 2,000 chunks, and those chunks' PKs are numbers,
> simple arithmetic should be able to select only the chunks of
> interest, those incurring only the necessary IO for the selected
> range, no?

That's exactly what I was trying to say.  If there's no compression, we
don't read prior chunks.  (This is valid for bytea, at least; for
textual types we have to worry about multibyte characters, which are
again a potential source of confusion regarding the exact location you
want to seek.)

This can be seen in detoast_attr_slice() in
src/backend/access/common/detoast.c, though there are way too many^W^W^W
multiple layers of indirection if you start from bytea_substr() in
varlena.c.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Cannot restore windows dump on linux

2022-10-19 Thread ertan.kucukoglu
Hello,

I am using PostgreSQL 14.5

I tried to move a Linux database to Windows. Both identical version.

Linux dump successfully loaded on Windows system.

Reason for changing system didn't work out and now I am trying to move it
back because it has some modifications.

I just dumped a database backup on windows to a dump file. I see dump file
contains "\r\n" as new line identifier (hence windows dump). Just to
clarify, this is not \N character I am mixing.

When I try to restore windows dump on regular Linux system, I get a lot of
errors and it stops at this one below where this is a pure data load
position.

ERROR:  syntax error at or near "43589"
LINE 1: 43589 7102e523-f401-4cce-852d-e537f863886f.

I also tried to stop at first error when restoring, in this case it stops at
below error

root@app:/home/ek# psql -v ON_ERROR_STOP=1 -U myuser -d mydb <
last_backup.bak
SET
SET
SET
SET
SET
set_config

(1 satır)

SET
SET
SET
SET
ERROR:  index "ix_xrates_date" does not exist

I searched the internet and I could not solve my problem.

How can I successfully dump on Windows and restore on Linux?

Any help is appreciated.

Thanks & Regards,

Ertan



RE: Cannot restore windows dump on linux

2022-10-19 Thread ertan.kucukoglu
Hello again,

While I was searching for a solution, I saw an example of manually running a
dump file within psql. I tried this and it worked just fine.

\connect mydb
\i last_backup.bak

The above two commands completed without any loading error at all.

Now I wonder why the command-line restore did not work.
Do let me know if there is a known reason, please.

Thanks & Regards,
Ertan


-
From: ertan.kucuko...@1nar.com.tr  
Sent: Wednesday, October 19, 2022 18:43
To: pgsql-general@lists.postgresql.org
Subject: Cannot restore windows dump on linux

Hello,

I am using PostgreSQL 14.5

I tried to move a Linux database to Windows. Both identical version.
Linux dump successfully loaded on Windows system.
Reason for changing system didn't work out and now I am trying to move it
back because it has some modifications.
I just dumped a database backup on windows to a dump file. I see dump file
contains "\r\n" as new line identifier (hence windows dump). Just to
clarify, this is not \N character I am mixing.
When I try to restore windows dump on regular Linux system, I get a lot of
errors and it stops at this one below where this is a pure data load
position.

ERROR:  syntax error at or near "43589"
LINE 1: 43589 7102e523-f401-4cce-852d-e537f863886f.

I also tried to stop at first error when restoring, in this case it stops at
below error

root@app:/home/ek# psql -v ON_ERROR_STOP=1 -U myuser -d mydb <
last_backup.bak
SET
SET
SET
SET
SET
set_config


(1 satır)

SET
SET
SET
SET
ERROR:  index "ix_xrates_date" does not exist

I searched the internet and I could not solve my problem.

How can I successfully dump on Windows and restore on Linux?

Any help is appreciated.

Thanks & Regards,
Ertan






Column value derived from generated column in INSERT?

2022-10-19 Thread Mark Raynsford
Hello!

I'd like to be able to do something sort of analogous to this:

  create table t (
x integer not null generated always as identity, 
y integer not null
  );

  insert into t (y) values (t.x * 2);

In the real project, the "t.x * 2" expression is obviously something
a lot more complex, but I think it illustrates the point: I'd like to
be able to refer to the generated value of a column within the INSERT
statement that's going to cause it to be generated. Is there a way to
do this with a single statement right now?

I can think of various ways to do it with multiple statements, but a
single statement would be preferable.

-- 
Mark Raynsford | https://www.io7m.com





Re: Column value derived from generated column in INSERT?

2022-10-19 Thread David G. Johnston
On Wed, Oct 19, 2022 at 10:36 AM Mark Raynsford wrote:

>   insert into t (y) values (t.x * 2);
>
> I can think of various ways to do it with multiple statements, but a
> single statement would be preferable.
>
>
No, by extension of the documented constraint: "The generation expression
can refer to other columns in the table, but not other generated columns."

David J.


Custom function ROWS hint ignored due to inlining?

2022-10-19 Thread Philip Semanchuk
Hi,
I have a custom function where the ROWS hint is getting ignored. I think it’s 
because the function is getting inlined, but I’d like a second opinion.

Here’s my working (contrived) example.

CREATE TABLE my_table (
id int primary key GENERATED ALWAYS AS IDENTITY,
base_value int NOT NULL
);

INSERT INTO my_table (base_value) VALUES (42);

CREATE OR REPLACE FUNCTION fn_get_deltas(base_value int)
RETURNS TABLE (delta int, total int) AS $$
SELECT 
generate_series(1, CASE base_value WHEN 42 THEN 4 ELSE 20 END), 
base_value + generate_series(1, CASE base_value WHEN 42 THEN 4 ELSE 20 
END)
$$ LANGUAGE sql IMMUTABLE PARALLEL SAFE ROWS 10;

EXPLAIN
SELECT base_value, delta, total
FROM my_table
CROSS JOIN LATERAL (SELECT delta, total FROM fn_get_deltas(base_value)) AS foo
+--+
| QUERY PLAN   |
|--|
| Nested Loop  (cost=0.00..107427.80 rows=226 width=12)|
|   ->  Seq Scan on my_table  (cost=0.00..32.60 rows=2260 width=4) |
|   ->  Result  (cost=0.00..27.52 rows=1000 width=8)   |
| ->  ProjectSet  (cost=0.00..5.02 rows=1000 width=4)  |
|   ->  Result  (cost=0.00..0.01 rows=1 width=0)   |
+--+

The plan estimates 1000 rows from the CROSS JOIN despite the “ROWS 10” hint on 
my function. I think this is because the planner never sees fn_get_deltas() — 
it has been inlined by the query preprocessor because fn_get_deltas() meets the 
criteria for inlining 
(https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions). Instead of 10 
rows, the planner uses its default assumption of 1000 rows.

If I change the function to VOLATILE to prevent inlining, I get this plan.
+-+
| QUERY PLAN  |
|-|
| Nested Loop  (cost=0.25..484.85 rows=22600 width=12)|
|   ->  Seq Scan on my_table  (cost=0.00..32.60 rows=2260 width=4)|
|   ->  Function Scan on fn_get_deltas  (cost=0.25..0.35 rows=10 width=8) |
+-+

I would prefer to have the function inlined for better performance, but I can 
declare it VOLATILE if that’s necessary to give decent estimates to the 
planner. Am I correctly reading the situation? If so, is there another solution 
that allows inlining *and* making the ROWS hint visible to the planner?

Thanks a bunch
Philip



Re: Column value derived from generated column in INSERT?

2022-10-19 Thread Mark Raynsford
On 2022-10-19T11:58:07 -0700
"David G. Johnston"  wrote:

> On Wed, Oct 19, 2022 at 10:36 AM Mark Raynsford 
> wrote:
> 
> >   insert into t (y) values (t.x * 2);
> >
> > I can think of various ways to do it with multiple statements, but a
> > single statement would be preferable.
> >
> >  
> No, by extension of the documented constraint: "The generation expression
> can refer to other columns in the table, but not other generated columns."
> 

Hello!

Just want to confirm that I wasn't misunderstood. The documentation in
CREATE TABLE has the sentence you quoted above, and unless I'm
misunderstanding that's saying that the expression used to generate
values in GENERATED (ALWAYS AS) columns can't refer to other GENERATED
columns. That's fine, but that's not what I was asking. In the table
above, `x` is generated without references to other columns, but for
the non-GENERATED `y` value, I want to refer to the value that `x` will
have when I calculate a value for the `y` column in the INSERT
statement.

If that's not doable, that's fine, I just want to be sure. :)

-- 
Mark Raynsford | https://www.io7m.com





Re: Column value derived from generated column in INSERT?

2022-10-19 Thread Adrian Klaver

On 10/19/22 12:30, Mark Raynsford wrote:

On 2022-10-19T11:58:07 -0700
"David G. Johnston"  wrote:


On Wed, Oct 19, 2022 at 10:36 AM Mark Raynsford 
wrote:


   insert into t (y) values (t.x * 2);

I can think of various ways to do it with multiple statements, but a
single statement would be preferable.

  

No, by extension of the documented constraint: "The generation expression
can refer to other columns in the table, but not other generated columns."



Hello!

Just want to confirm that I wasn't misunderstood. The documentation in
CREATE TABLE has the sentence you quoted above, and unless I'm
misunderstanding that's saying that the expression used to generate
values in GENERATED (ALWAYS AS) columns can't refer to other GENERATED
columns. That's fine, but that's not what I was asking. In the table
above, `x` is generated without references to other columns, but for
the non-GENERATED `y` value, I want to refer to the value that `x` will
have when I calculate a value for the `y` column in the INSERT
statement.

If that's not doable, that's fine, I just want to be sure. :)


create table t (
x integer not null generated always as identity,
y integer not null
  );
insert into t (y) values (t.x * 2);

ERROR:  invalid reference to FROM-clause entry for table "t"
LINE 1: insert into t (y) values (t.x * 2);
  ^
HINT:  There is an entry for table "t", but it cannot be referenced from 
this part of the query.


insert into t (y) values (x * 2);
ERROR:  column "x" does not exist
LINE 1: insert into t (y) values (x * 2);
  ^
HINT:  There is a column named "x" in table "t", but it cannot be 
referenced from this part of the query.




--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Column value derived from generated column in INSERT?

2022-10-19 Thread Mark Raynsford
On 2022-10-19T12:43:31 -0700
Adrian Klaver  wrote:
>
> HINT:  There is an entry for table "t", but it cannot be referenced from 
> this part of the query.
>
> HINT:  There is a column named "x" in table "t", but it cannot be 
> referenced from this part of the query.

Yes, I saw those, hence asking on the list if there was a way to do it.

I'll handle it with multiple statements.

-- 
Mark Raynsford | https://www.io7m.com





Re: Column value derived from generated column in INSERT?

2022-10-19 Thread Adrian Klaver

On 10/19/22 12:48, Mark Raynsford wrote:

On 2022-10-19T12:43:31 -0700
Adrian Klaver  wrote:


HINT:  There is an entry for table "t", but it cannot be referenced from
this part of the query.

HINT:  There is a column named "x" in table "t", but it cannot be
referenced from this part of the query.


Yes, I saw those, hence asking on the list if there was a way to do it.


Using a trigger.
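
Something along these lines (a sketch only; it relies on the identity value
already being assigned when BEFORE ROW triggers run, which is the case for
identity/serial defaults, unlike GENERATED ... STORED columns):

create function t_fill_y() returns trigger
language plpgsql
as $$
begin
  new.y := new.x * 2;   -- new.x already holds the generated identity value here
  return new;
end;
$$;

create trigger t_fill_y
before insert on t
for each row execute function t_fill_y();

insert into t default values;  -- y is filled in by the trigger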



I'll handle it with multiple statements.



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: COMMIT IN STORED PROCEDURE WHILE IN A LOOP

2022-10-19 Thread Bryn Llewellyn
It seems that I made a thread-discipline error when I asked a question that had 
nothing to do with the frequency, or the cost, of committing when I saw this 
reply (paraphrased for brevity here) from Christophe:

> You [cannot] commit in [a] BEGIN / END [block statement] that has an 
> exception handler [because] that creates a subtransaction for the duration of 
> the [block statement].
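
A minimal illustration of that restriction (assuming it is run as a
top-level DO block, outside an explicit transaction block):

do $$
begin
  begin
    commit;   -- rejected: the exception clause below turns this block into a subtransaction
  exception when others then
    raise notice 'commit refused: %', sqlerrm;
  end;
end;
$$;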

I asked this in response (again, paraphrased for brevity):

> Could the limitation be lifted...? [Or is the limitation] rooted in 
> profoundly deep features of the architecture?

Sorry that I caused some distraction. Anyway, Tom replied immediately. He said:

> BEGIN with an exception block is a subtransaction because it's defined to 
> roll back to the database state as of the start of the block if an exception 
> occurs. COMMIT in the middle fundamentally conflicts with that, I should 
> think.


Thanks, Tom. It's clear to me now that the present PG paradigm will never, ever 
change.
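
For concreteness, a minimal sketch of the limitation (the procedure name is made
up; the point is only that the COMMIT sits inside a block with an EXCEPTION
clause):

CREATE PROCEDURE commit_inside_handler()
LANGUAGE plpgsql
AS $$
BEGIN
  BEGIN
    -- Rejected: this inner block has an exception handler, so it runs as a
    -- subtransaction, and PostgreSQL will not allow a COMMIT here.
    COMMIT;
  EXCEPTION
    WHEN others THEN
      RAISE NOTICE 'caught: %', SQLERRM;
  END;
END;
$$;

CALL commit_inside_handler();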

So my conclusion stands for this use case: I'm using "serializable" isolation 
(which luxury ORCL doesn't afford me); I know that I can get a "cannot 
serialize" error at "commit" time. Here, I cannot take appropriate action 
within my PL/pgSQL code and hide the whole story of what this is about from 
client code. Rather, I must explain the business to the authors of the next 
tier, and teach them when, and how, retry is appropriate.

Tom asked, too, if ORCL has a different paradigm... Briefly, yes—and radically 
so. But (all of you) do please feel free to skip over my sketch here if it 
doesn't interest you.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — 
«
The real account of which this is a sketch is of no interest at all if you work 
only with PG and have never used ORCL. But if you need to switch, 
intellectually, from one to the other—and maybe need to do this all the time 
because your job duties span both systems—then it *is* interesting.

ORCL's PL/SQL is compiled (and optimized) at "create procedure" time. The new 
source code replaces the old in the catalog. And if it failed to compile, then 
you're left with an invalid unit that you cannot execute. Compilation errors 
are recorded in the catalog too. Further, static dependencies (proc upon proc, 
proc upon table, etc) are also recorded in the catalog. This is intimately 
connected with the hard distinction between static and dynamic SQL. The latter 
simply passes the text on "as is" into the byte code to be dealt with at 
run-time. Only statements like "select", "insert", "update", "delete" and a few 
others can be static SQL. Table creation and the like must be dynamic SQL. This 
probably shocks those who move from PG to ORCL because you cannot, for example, 
create a table and then operate on it with static SQL in the same procedure.

In particular, for the present discussion, the PL/SQL block statement is a pure 
lexical device. (This is the case in PL/SQL's progenitor, ADA. And that's where 
all that stuff about DIANA, that the PL/SQL programmer eventually comes to hear 
about, comes from.) All memory that you had a block statement in the source is 
lost in the compiled so-called byte code that gets interpreted at run time. On 
the other hand, every call from PL/SQL to SQL is done in its own 
subtransaction—and if it fails, then that single statement is atomically rolled 
back. The effect of all the SQLs to date, at this moment, remains intact—but 
uncommitted. (Of course, you might have issued "commit"(s) programmatically. So 
I'm talking about SQLs that were done since the most recent "commit".) 

Significantly, the failure of a call from PL/SQL to SQL raises an exception—so 
(as well as the single-statement rollback) you now have an in-flight exception 
that flies up through successive scopes in search of a matching handler. If it 
remains unhandled at the last moment before the top-level PL/SQL "call" is due 
to finish, then a "rollback" is automatically issued. But if a handler *is* 
found, well... the exception is dead and you can carry on. Like everything else 
in programming, the code author must work out what "safe" is. (It could be to 
turn an insert that fails 'cos a unique key is violated into an update.) In 
ORCL, just as in PG, writing "when others then null" is held to be stupid. And 
code examples that do this are deemed to be not worthy of discussion.

Though the paradigms are different, each allows you properly to implement 
mission-critical applications. It's rather like English and Chinese. 
Astonishingly different. But each supports all that you need to let people 
communicate about mundane daily business, science, philosophy, epistemology, 
and so on.
»








Re: Speeding up adding fky on a very large table

2022-10-19 Thread Ravi Krishna
Our issue is that it takes 20 hours to index the full table. Hopefully we can add
the FK to multiple child partitions concurrently, otherwise doing it per partition
offers no advantage from a performance point of view.
Need to test. Hopefully PG will not lock the referred table during the first
build, which would stop concurrent execution on the other partitions.

In vanilla PostgreSQL (including RDS PostgreSQL) you can add the FK to each
partition and then ALTER TABLE ONLY the parent table.

Don't know about Aurora, though.

  


Postgres 15 upgrades and template1 public schema

2022-10-19 Thread Bruno Wolff III
I noticed when I did an upgrade from Postgres 14 to 15 that the public 
schema in template1 was still owned by postgres instead of pg_database_owner. 
I was expecting it to change because the release notes said that new 
database clusters would have that. But shouldn't new clusters use what 
is set in template1?





Re: Cannot restore windows dump on linux

2022-10-19 Thread Erik Wienhold
> On 19/10/2022 17:43 CEST ertan.kucuko...@1nar.com.tr wrote:
>
> I am using PostgreSQL 14.5
>
> I tried to move a Linux database to Windows. Both identical version.
> Linux dump successfully loaded on Windows system.
> Reason for changing system didn’t work out and now I am trying to move it back
> because it has some modifications.
> I just dumped a database backup on windows to a dump file. I see dump file
> contains “\r\n” as new line identifier (hence windows dump). Just to clarify,
> this is not \N character I am mixing.
> When I try to restore windows dump on regular Linux system, I get a lot of
> errors and it stops at this one below where this is a pure data load position.
>
> ERROR: syntax error at or near "43589"
> LINE 1: 43589 7102e523-f401-4cce-852d-e537f863886f…
>
> I also tried to stop at first error when restoring, in this case it stops at
> below error
>
> root@app:/home/ek# psql -v ON_ERROR_STOP=1 -U myuser -d mydb < last_backup.bak
> SET
> SET
> SET
> SET
> SET
> set_config
> 
>
> (1 satır)
>
> SET
> SET
> SET
> SET
> ERROR: index "ix_xrates_date" does not exist
>
> I searched the internet and I could not solve my problem.
>
> How can I successfully dump on Windows and restore on Linux?

On Windows, did you run pg_dump with --file option or redirect the output to
a file?

pg_dump --file last_backup.bak -U myuser mydb

vs.

pg_dump -U myuser mydb > last_backup.bak

The redirect variant tripped me up before because Windows 10 writes the file as
UTF-16 which is not expected by psql.  I don't know if psql can be configured
to handle UTF-16.  The redirect is also the cause for CRLF line endings.
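
If the file has already been written that way, a hedged workaround is to convert
it before replaying it (file names taken from the example above; check the actual
encoding first, e.g. with the file utility, and note that a leading byte-order
mark may still need stripping):

iconv -f UTF-16 -t UTF-8 last_backup.bak > last_backup_utf8.bak
psql -v ON_ERROR_STOP=1 -U myuser -d mydb -f last_backup_utf8.bak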

--
Erik




Re: Postgres 15 upgrades and template1 public schema

2022-10-19 Thread Thomas Kellerer



Bruno Wolff III wrote on 19.10.2022 at 22:36:

I noticed when I did an upgrade from Postgres 14 to 15 that the
public schema in template1 was still owned by postgres instead of
pg_database_owner. I was expecting it to change because the release
notes said that new database clusters would have that. But shouldn't
new clusters use what is set in template1?


This is explained in the release notes:

  The change applies to new database clusters and to newly-created
  databases in existing clusters.
  Upgrading a cluster or restoring a database dump will preserve
  public's existing permissions.





Re: COMMIT IN STORED PROCEDURE WHILE IN A LOOP

2022-10-19 Thread Thomas Kellerer

gogala.mla...@gmail.com wrote on 19.10.2022 at 01:46:

Amazon, led by Kevin Closson, the guy who has famously designed
Oracle Exadata among other things, even came up with the recipe how
to migrate it to Postgres:

https://aws.amazon.com/blogs/database/migrating-oracle-autonomous-transactions-to-postgresql/



The workaround to use dblink to simulate autonomous transactions is nothing new,
and has been floating around for quite a while.

Here is a blog post from 2012 (6 years before the Amazon recipe was published)

https://raghavt.blog/autonomous-transaction-in-postgresql-9-1/

and another one from 2016

https://blog.dalibo.com/2016/09/08/Pragma_Autonoumous_transactions.html
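
For reference, a minimal sketch of that dblink workaround, assuming the dblink
extension is installed and a loopback connection to the current database works;
the audit_log table and the function name are only illustrative:

CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION log_autonomously(p_msg text)
RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
  -- The INSERT runs over a separate connection, so it commits on its own,
  -- even if the caller's transaction later rolls back.
  PERFORM dblink_exec(
    'dbname=' || current_database(),
    format('INSERT INTO audit_log (msg) VALUES (%L)', p_msg)
  );
END;
$$;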





Re: Column value derived from generated column in INSERT?

2022-10-19 Thread Adrian Klaver

On 10/19/22 12:58 PM, Adrian Klaver wrote:

On 10/19/22 12:48, Mark Raynsford wrote:

On 2022-10-19T12:43:31 -0700
Adrian Klaver  wrote:


HINT:  There is an entry for table "t", but it cannot be referenced from
this part of the query.

HINT:  There is a column named "x" in table "t", but it cannot be
referenced from this part of the query.


Yes, I saw those, hence asking on the list if there was a way to do it.


Using a trigger.


To expand:

create table t (
x integer not null generated always as identity,
y integer not null
  );
insert into t(y) values (1);

select * from t;

x | y
---+---
 1 | 1
(1 row)



CREATE FUNCTION identity_test( )
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
NEW.y = NEW.x * 2;
RETURN NEW;
END;

$function$
;

create trigger identity_trg before insert on t for each row execute 
function identity_test();


insert into t(y) values (0);

select * from t;

 x | y
---+---
 1 | 1
 2 | 4
(2 rows)





I'll handle it with multiple statements.






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Column value derived from generated column in INSERT?

2022-10-19 Thread Erik Wienhold
> On 19/10/2022 23:51 CEST Adrian Klaver  wrote:
>
> On 10/19/22 12:58 PM, Adrian Klaver wrote:
> > On 10/19/22 12:48, Mark Raynsford wrote:
> >> On 2022-10-19T12:43:31 -0700
> >> Adrian Klaver  wrote:
> >>>
> >>> HINT:  There is an entry for table "t", but it cannot be referenced from
> >>> this part of the query.
> >>>
> >>> HINT:  There is a column named "x" in table "t", but it cannot be
> >>> referenced from this part of the query.
> >>
> >> Yes, I saw those, hence asking on the list if there was a way to do it.
> >
> > Using a trigger.
>
> To expand:
>
> create table t (
>  x integer not null generated always as identity,
>  y integer not null
>);
> insert into t(y) values (1);
>
> select * from t;
>
> x | y
> ---+---
>   1 | 1
> (1 row)
>
>
>
> CREATE FUNCTION identity_test( )
>   RETURNS trigger
>   LANGUAGE plpgsql
> AS $function$
> BEGIN
>  NEW.y = NEW.x * 2;
>  RETURN NEW;
> END;
>
> $function$
> ;
>
> create trigger identity_trg before insert on t for each row execute
> function identity_test();
>
> insert into t(y) values (0);
>
> select * from t;
>
>   x | y
> ---+---
>   1 | 1
>   2 | 4
> (2 rows)

Make t.y a generated column and avoid the trigger:

create table t (
  x int not null generated always as identity,
  y int not null generated always as (x * 2) stored
);

insert into t (x) values (default), (default);

select * from t;

 x | y
---+---
 1 | 2
 2 | 4
(2 rows)

But I think Mark wants to specify the expression in the INSERT and not define
it as part of the database schema, if I understand it correctly.

--
Erik




Re: Column value derived from generated column in INSERT?

2022-10-19 Thread David G. Johnston
On Wed, Oct 19, 2022 at 12:34 PM Mark Raynsford 
wrote:

> On 2022-10-19T11:58:07 -0700
> "David G. Johnston"  wrote:
>
> > On Wed, Oct 19, 2022 at 10:36 AM Mark Raynsford <
> co+org.postgre...@io7m.com>
> > wrote:
> >
> > >   insert into t (y) values (t.x * 2);
> > >
> > > I can think of various ways to do it with multiple statements, but a
> > > single statement would be preferable.
> > >
> > >
> > No, by extension of the documented constraint: "The generation expression
> > can refer to other columns in the table, but not other generated
> columns."
> >
>
>

> That's fine, but that's not what I was asking. In the table
> above, `x` is generated without references to other columns, but for
> the non-GENERATED `y` value, I want to refer to the value that `x` will
> have when I calculate a value for the `y` column in the INSERT
> statement.
>

Yes, I got that, but if one generated column cannot reference the computed
result of another generated column, then any value you supply, which is by
definition computed before the generated column, is not going to be able to
see that result either.  There was at least some hope for one generated
column depending upon another, so that the expressions could be evaluated in
sequence.

Thus, while the docs do not explicitly state this limitation, it can be
inferred from the other limitation that is stated.

David J.


Re: COMMIT IN STORED PROCEDURE WHILE IN A LOOP

2022-10-19 Thread Benedict Holland
This seems like a bad use of a stored procedure. Why wouldn't you spin up
40 clients with a table name and run it across 40 connections? But also, I
don't like loops in stored procedures. Working with loops in a set-based
system hurts me, but that's personal preference.

Like, I could write this in Python in less than 4 hours with an extremely
simple sproc. How often are you running table deletions like this to
require a sproc?

Thanks,
Ben

On Wed, Oct 19, 2022, 5:39 PM Thomas Kellerer  wrote:

> gogala.mla...@gmail.com wrote on 19.10.2022 at 01:46:
> > Amazon, led by Kevin Closson, the guy who has famously designed
> > Oracle Exadata among other things, even came up with the recipe how
> > to migrate it to Postgres:
> >
> >
> https://aws.amazon.com/blogs/database/migrating-oracle-autonomous-transactions-to-postgresql/
>
>
> The workaround to use dblink to simulate autonomous transactions is
> nothing new,
> and has been floating around for quite a while.
>
> Here is a blog post from 2012 (6 years before the Amazon recipe was
> published)
>
> https://raghavt.blog/autonomous-transaction-in-postgresql-9-1/
>
> and another one from 2016
>
> https://blog.dalibo.com/2016/09/08/Pragma_Autonoumous_transactions.html
>
>
>
>


Re: Custom function ROWS hint ignored due to inlining?

2022-10-19 Thread Tom Lane
Philip Semanchuk  writes:
> I have a custom function where the ROWS hint is getting ignored. I think it’s 
> because the function is getting inlined, but I’d like a second opinion.

Yeah, I believe you're right about that.

> I would prefer to have the function inlined for better performance, but
> I can declare it VOLATILE if that’s necessary to give decent estimates
> to the planner. Am I correctly reading the situation? If so, is there
> another solution that allows inlining *and* making the ROWS hint visible
> to the planner?

No, but you could experiment with changing the function to plpgsql
instead of SQL.  Not sure about the relative performance of those
cases, but it's worth trying it both ways.
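
For illustration, a hedged sketch of the two variants to compare; the bodies and
names are made up, and only the plpgsql version is guaranteed not to be inlined,
so its ROWS 10 estimate reaches the planner:

-- SQL version: a candidate for inlining, in which case ROWS 10 is ignored.
CREATE FUNCTION ten_rows_sql(p integer)
RETURNS SETOF integer
LANGUAGE sql STABLE ROWS 10
AS $$ SELECT g FROM generate_series(1, p) AS g $$;

-- plpgsql version: never inlined, so the planner sees the ROWS 10 estimate.
CREATE FUNCTION ten_rows_plpgsql(p integer)
RETURNS SETOF integer
LANGUAGE plpgsql STABLE ROWS 10
AS $$
BEGIN
  RETURN QUERY SELECT g FROM generate_series(1, p) AS g;
END;
$$;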

regards, tom lane




Re: Postgres 15 upgrades and template1 public schema

2022-10-19 Thread Bruno Wolff III

On Wed, Oct 19, 2022 at 23:30:58 +0200,
 Thomas Kellerer  wrote:


Bruno Wolff III wrote on 19.10.2022 at 22:36:

I noticed when I did an upgrade from Postgres 14 to 15 that the
public schema in template1 was still owned by postgres instead of
pg_database_owner. I was expecting it to change because the release
notes said that new database clusters would have that. But shouldn't
new clusters use what is set in template1?


This is explained in the release notes:

 The change applies to new database clusters and to newly-created
 databases in existing clusters.
 Upgrading a cluster or restoring a database dump will preserve
 public's existing permissions.


How do new databases in pre-existing clusters get the new public schema 
security if it doesn't come from template1?





Re: Postgres 15 upgrades and template1 public schema

2022-10-19 Thread Tom Lane
Bruno Wolff III  writes:
> On Wed, Oct 19, 2022 at 23:30:58 +0200,
>   Thomas Kellerer  wrote:
>> This is explained in the release notes:
>> 
>> The change applies to new database clusters and to newly-created
>> databases in existing clusters.
>> Upgrading a cluster or restoring a database dump will preserve
>> public's existing permissions.

> How do new databases in pre-existing clusters get the new public schema 
> security if it doesn't come from template1?

The release notes could probably use some tweaking here.  It looks to
me like pg_dumpall (and hence pg_upgrade) will adjust the ownership and
permissions of template1's public schema to match what was in the old
installation, but it doesn't touch template0.  Hence, whether a
"newly-created database in an existing cluster" has the old or new
properties of the public schema will depend on whether you clone it
from template1 or template0.  That definitely needs to be explained, and
maybe we should recommend that DBAs consider manually changing
what's in template1.
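
For anyone who wants to do that by hand, a hedged sketch of lining template1 up
with the new 15 defaults (run as a superuser, and check it against your own
security policy first):

\connect template1
ALTER SCHEMA public OWNER TO pg_database_owner;
REVOKE CREATE ON SCHEMA public FROM PUBLIC;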

regards, tom lane




pg_restore 12 "permission denied for schema" errors

2022-10-19 Thread Ron

RDS Postgresql 12.11

Backed up a database from one RDS instance, and now am trying to restore it 
to a new instance.  (If you're wondering why, it's to restore in an instance 
with less disk space.)


Here are the commands:

export PGHOST=${RDSENV}..us-east-1.rds.amazonaws.com
cd /migrate/TASK001793786/2022-10-19b
DB=sides
pg_restore -v --create --clean --jobs=`nproc` -Fd --dbname=template1 $DB

Attached are log file extracts showing the errors.  The errors are only on 
partitioned child tables, the list of which is also attached.  However, some 
(but not many)


Some of the erroring tables are owned by sides_owner and some are owned by 
"stransuser".


"\z" on a child with errors looks exactly like a child without errors.

What could be the problem?

--
Angular momentum makes the world go 'round.

[a467197@uisawsclitstlinrds TASK001793786]$ grep -B3 ERROR: TASK001793786_restore.out | more
pg_restore: processing data for table "strans.response_xml_p2019_10"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 7699; 0 3661008 TABLE DATA response_xml_p2019_10 sides_owner
pg_restore: error: COPY failed for table "response_xml_p2019_10": ERROR:  permission denied for schema strans
--
QUERY:  SELECT 1 FROM ONLY "strans"."employer_response" x WHERE "employer_response_id" OPERATOR(pg_catalog.=) $1 AND "part_date" OPERATOR(pg_catalog.=) $2 FOR KEY SHARE OF x
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 7931; 0 3847065 TABLE DATA response_xml_p2022_01 stransuser
pg_restore: error: COPY failed for table "response_xml_p2022_01": ERROR:  permission denied for schema strans
--
pg_restore: launching item 8045 TABLE DATA dd_request_attachment
pg_restore: processing data for table "strans.dd_request_attachment"
pg_restore: from TOC entry 7715; 0 3661104 TABLE DATA response_xml_p2021_02 sides_owner
pg_restore: error: COPY failed for table "response_xml_p2021_02": ERROR:  permission denied for schema strans
--
pg_restore: launching item 7692 TABLE DATA response_xml_p2019_03
pg_restore: processing data for table "strans.response_xml_p2019_03"
pg_restore: from TOC entry 7694; 0 3660978 TABLE DATA response_xml_p2019_05 sides_owner
pg_restore: error: COPY failed for table "response_xml_p2019_05": ERROR:  permission denied for schema strans
--
pg_restore: launching item 7687 TABLE DATA response_xml_p2018_10
pg_restore: processing data for table "strans.response_xml_p2018_10"
pg_restore: from TOC entry 7692; 0 3660966 TABLE DATA response_xml_p2019_03 sides_owner
pg_restore: error: COPY failed for table "response_xml_p2019_03": ERROR:  permission denied for schema strans
--
pg_restore: launching item 7998 TABLE DATA ev_request_xml
pg_restore: processing data for table "strans.ev_request_xml"
pg_restore: from TOC entry 7687; 0 3660936 TABLE DATA response_xml_p2018_10 sides_owner
pg_restore: error: COPY failed for table "response_xml_p2018_10": ERROR:  permission denied for schema strans
--
pg_restore: launching item 7688 TABLE DATA response_xml_p2018_11
pg_restore: processing data for table "strans.response_xml_p2018_11"
pg_restore: from TOC entry 7688; 0 3660942 TABLE DATA response_xml_p2018_11 sides_owner
pg_restore: error: COPY failed for table "response_xml_p2018_11": ERROR:  permission denied for schema strans
--
pg_restore: launching item 7429 FK CONSTRAINT employer_response response_to_request
pg_restore: creating FK CONSTRAINT "strans.employer_response response_to_request"
pg_restore: from TOC entry 7691; 0 3660960 TABLE DATA response_xml_p2019_02 sides_owner
pg_restore: error: COPY failed for table "response_xml_p2019_02": ERROR:  permission denied for schema strans
--
pg_restore: launching item 7725 TABLE DATA response_xml_p2021_12
pg_restore: processing data for table "strans.response_xml_p2021_12"
pg_restore: from TOC entry 7683; 0 3660543 TABLE DATA request_xml_p2021_10 sides_owner
pg_restore: error: COPY failed for table "request_xml_p2021_10": ERROR:  permission denied for schema strans
--
pg_restore: launching item 7670 TABLE DATA request_xml_p2020_09
pg_restore: processing data for table "strans.request_xml_p2020_09"
pg_restore: from TOC entry 7725; 0 3661164 TABLE DATA response_xml_p2021_12 sides_owner
pg_restore: error: COPY failed for table "response_xml_p2021_12": ERROR:  permission denied for schema strans
--
pg_restore: launching item 7719 TABLE DATA response_xml_p2021_06
pg_restore: processing data for table "strans.response_xml_p2021_06"
pg_restore: from TOC entry 7670; 0 3660465 TABLE DATA request_xml_p2020_09 sides_owner
pg_restore: error: COPY failed for table "request_xml_p2020_09": ERROR:  permission denied for schema strans
--
pg_restore: launching item 8031 TABLE DATA dd_report_extract
pg_restore: processing data for table "strans.dd_report_extract"
pg_restore: from TOC entry 7719; 0 3661128 TABLE DATA response_xml_p2021_06 sides_owner
pg_restore: error: COPY failed for table "response_xml_p2021_06": ERROR:  permission denied for schema strans
--
pg_restore: launc

PG RPMS for RHEL 9 on aarch64?

2022-10-19 Thread Lists
I'm trying to get PostGIS installed for PostgreSQL 13 on an Oracle Linux 9 / aarch64
box. PostgreSQL-server installs OK from the RPMs provided by RHEL/Oracle. However,
the default repos don't include PostGIS.

I usually install the RPMs from download.postgresql.org[1] anyway, but there
are only RPMs for x86_64 EL9, no aarch64 at all.

Is there something useful I can do to either facilitate getting these built, or
at least find a workaround to get PostgreSQL/PostGIS 13+ working on my PG box?

Background info:

I've successfully used EL8/aarch64 rpms:
https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-8-aarch64/

and also EL9/x86_64 rpms:
https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-9-x86_64/[2]

But there don't yet seem to be EL9/aarch64 rpms!
https://download.postgresql.org/pub/repos/yum/13/redhat/[3]

And the same is true with later releases:
https://download.postgresql.org/pub/repos/yum/14/redhat/[4]

15 has folders for aarch64, but they are empty:
https://download.postgresql.org/pub/repos/yum/15/redhat/[5]


Ben S


[1] https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-8-aarch64/
[2] https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-9-x86_64/
[3] https://download.postgresql.org/pub/repos/yum/13/redhat/
[4] https://download.postgresql.org/pub/repos/yum/14/redhat/
[5] https://download.postgresql.org/pub/repos/yum/15/redhat/




Re: pg_restore 12 "permission denied for schema" errors

2022-10-19 Thread Frank Gard

Hi Ron,

On 20.10.22 at 02:58, Ron wrote:

RDS Postgresql 12.11

Backed up a database from one RDS instance, and now am trying to restore it to 
a new instance.  (If you're wondering why, it's to restore in an instance with 
less disk space.)

Here are the commands:

export PGHOST=${RDSENV}..us-east-1.rds.amazonaws.com
cd /migrate/TASK001793786/2022-10-19b
DB=sides
pg_restore -v --create --clean --jobs=`nproc` -Fd --dbname=template1 $DB


What database do you want to restore your data into? As far as I know your 
pg_restore command would import the data into template1, right? pg_restore's 
manpage says:


[…]

> -d dbname
> --dbname=dbname
>     Connect to database dbname and restore directly into the database.
>     The dbname can be a connection string. If so, connection string parameters
>     will override any conflicting command line options.
> […]

And I'm wondering if that's what you want. Possibly your errors could come from
there…

Cheers,
Frank.


Re: pg_restore 12 "permission denied for schema" errors

2022-10-19 Thread Tom Lane
Frank Gard  writes:
> On 20.10.22 at 02:58, Ron wrote:
>> pg_restore -v --create --clean --jobs=`nproc` -Fd --dbname=template1 $DB

> What database do you want to restore your data into? As far as I know your 
> pg_restore command would import the data into template1, right?

No --- my ears went up about that too, but he's using --create.
So template1 is just the springboard for the initial connection
in which CREATE DATABASE will be issued.

My guess is that there's something inconsistent between source
and target installations about roles and role-privilege grants.
We don't have enough details to do more than guess, though.
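
A hedged starting point for that comparison, using the schema and role names that
appear in the attached log (the GRANT is only an example of the kind of privilege
that may be missing on the target):

\dn+ strans        -- schema owner and access privileges, on both instances
\du                -- role attributes and memberships, on both instances
GRANT USAGE, CREATE ON SCHEMA strans TO sides_owner, stransuser;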

regards, tom lane




Re: Postgres 15 upgrades and template1 public schema

2022-10-19 Thread Bruno Wolff III

On Wed, Oct 19, 2022 at 19:59:52 -0400,
 Tom Lane  wrote:


The release notes could probably use some tweaking here.  It looks to
me like pg_dumpall (and hence pg_upgrade) will adjust the ownership and
permissions of template1's public schema to match what was in the old
installation, but it doesn't touch template0.  Hence, whether a
"newly-created database in an existing cluster" has the old or new
properties of the public schema will depend on whether you clone it
from template1 or template0.  That definitely needs explained, and
maybe we should recommend that DBAs consider manually changing
what's in template1.


This answers my question about what is actually happening.

I think expanding the release notes section on this a bit could be 
helpful for other people.





Re: PG RPMS for RHEL 9 on aarch64?

2022-10-19 Thread Johannes Münch

Hi Ben,


I faced the same issue a few weeks back and opened a ticket for this;
please see below.


https://redmine.postgresql.org/issues/7724

It seems like they now have a RHEL 9 aarch64 build machine but I was 
told it would take some time for the packages to be built.



Joe


On 10/20/22 5:13 AM, Lists wrote:


I'm trying to get PostGIS installed for Postgresql 13 on an Oracle 9 / 
aarch64 box. PostgreSQL-server installs OK from the RPMS provided by 
RHEL/Oracle. However, the default repos don't include PostGIS.


I usually install the RPMs from download.postgresql.org anyway,
but there are only RPMs for x86_64 EL9, no aarch64 at all.


Is there something useful I can do to either facilitate getting these 
built, or at least a workaround to get PostgreSQL/PostGIS 13+ working 
on my PG box?



Background info:

I've successfully used EL8/aarch64 rpms:

https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-8-aarch64/

and also EL9/x86_64 rpms:

https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-9-x86_64/

But there don't yet seem to be EL9/aarch64 rpms! 
https://download.postgresql.org/pub/repos/yum/13/redhat/


And the same is true with later 
releases:https://download.postgresql.org/pub/repos/yum/14/redhat/


15 has folders for aarch64, but they are empty: 
https://download.postgresql.org/pub/repos/yum/15/redhat/



Ben S