Re: How to show current schema of running queries in postgresql 13

2023-06-26 Thread Pavel Stehule
Hi

On Mon 26 Jun 2023 at 8:39, 陈锡汉 wrote:

> Hello, I use multiple schemas in one database in Postgres, such as:
>
> ```
> Postgres(instance)
>  MyDB
>public
>MySchema1
>  table1
>  table2
>MySchema2
>  table1
>  table2
>MySchema3
>  table1
>  table2
> ```
>
> And it's open to my users, who will run queries such as:
> User1:
> ```
> set search_path=MySchema1;
> select * from table1,table2;
> ```
>
> User2:
> ```
> set search_path=MySchema2;
> select * from table1,table2;
> ```
>
> User3:
> ```
> set search_path=MySchema3;
> insert into table3 select * from MySchema1.table1,MySchema2.table2;
> select * from table3;
> ```
>
> I want to show the current schema of running queries, but pg_stat_activity
> can only show the database name, not the schema name.
>
> I want the current schema (search_path) shown, like this:
>
> | datname | username | schema    | query                                                                |
> |---------|----------|-----------|----------------------------------------------------------------------|
> | MyDB    | User1    | MySchema1 | select * from table1,table2;                                         |
> | MyDB    | User2    | MySchema2 | select * from table1,table2;                                         |
> | MyDB    | User3    | MySchema3 | insert into table3 select * from MySchema1.table1,MySchema2.table2;  |
>
> Is there any system view that can do this?
>

No, there is nothing for this purpose.

You can use application_name, so the user can do:

SET search_path=MySchema;
SET application_name = 'MySchema';
SELECT * FROM ...

> Thank you!
> Best regards,
> CavonChen


A question about the postgres's website

2023-06-26 Thread wen-yi
Hi community,
When I check the Postgres website, I find something confusing:



https://www.postgresql.org/community/ (Quick Links: Don't see the slack item)

https://www.postgresql.org/list/ (Quick Links: See the slack item)

Can someone give me some advice?
Thanks in advance!


Yours,
Wen Yi

Re: How to show current schema of running queries in postgresql 13

2023-06-26 Thread Pavel Stehule
On Mon 26 Jun 2023 at 9:19, Pavel Stehule wrote:

> Hi
>
> No, there is nothing for this purpose.
>
> You can use application_name, so the user can do:
>
> SET search_path=MySchema;
> SET application_name = 'MySchema';
> SELECT * FROM ...
>
And the application name is visible in pg_stat_activity.
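
For example, a minimal monitoring query (the state filter here is just one reasonable choice, not part of the suggestion above):

```
SELECT datname, usename, application_name, query
FROM pg_stat_activity
WHERE state = 'active';
```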

regards

Pavel Stehule




Re: How to show current schema of running queries in postgresql 13

2023-06-26 Thread 陈锡汉
I have no way to force users to do this…



---- Original message ----
| From | Pavel Stehule |
| Date | 2023-06-26 17:51 |
| To | 陈锡汉 |
| Cc | pgsql-general@lists.postgresql.org |
| Subject | Re: How to show current schema of running queries in postgresql 13 |




On Mon 26 Jun 2023 at 9:19, Pavel Stehule wrote:

Hi

No, there is nothing for this purpose.

You can use application_name, so the user can do:

SET search_path=MySchema;
SET application_name = 'MySchema';
SELECT * FROM ...

And the application name is visible in pg_stat_activity.

regards

Pavel Stehule

Re: How to show current schema of running queries in postgresql 13

2023-06-26 Thread Pavel Stehule
On Mon 26 Jun 2023 at 12:23, 陈锡汉 wrote:

> I have no way to force users to do this…
>

Then there is only one possible way: via a custom extension you can catch
the SET search_path statement and handle it there. If you cannot use your
own extension, then there is no way.






Re: PostgreSQL Commercial Support

2023-06-26 Thread Joe Conway

On 6/26/23 05:56, Oduba, Ernest wrote:
We want to start using PostgreSQL and we intend to have commercial
support. Kindly advise who can assist with this request for further
engagement.


See:

https://www.postgresql.org/support/professional_support/

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: bug or lacking doc hint

2023-06-26 Thread Marc Millas
On Mon, Jun 26, 2023 at 5:47 AM Avin Kavish  wrote:

> Sounds like the problem you are having is that the server is running out of
> temporary resources for the operation that users are trying to do. So
> according to Tom, on the Postgres side the operation cannot be optimized
> further.
>
> I think you have a few choices here:
>   - See if increasing the resources of the server will allow them to run
> the operation
>   - Ask users not to do that operation
>   - Use an extension like Citus to scale horizontally
>
> But I'm wondering why a massively inefficient join is needed in the first
> place. Shouldn't joins follow keyed relationships? So ideally a
> unique indexed column, but at the very least an indexed column. Why is a
> join required on a dynamically calculated substring? Can it be made into a
> static computed value and indexed? Substring sounds like an op that should
> be in the filter stage.
>
> Can you describe your data model? Maybe we can give some specific advice.
>

There is a set of big tables containing fine-grained health data. The DB is
accessed by people doing research on various kinds of sickness. So, by
nature, all columns (I mean ALL) can be used in every kind of SQL,
including, obviously, lots of intricate joins.
From time to time, people write queries that may take more than 2 days to
complete.
So the idea of indexing the 'appropriate' columns translates into indexing
all columns of all tables, including the big ones with 30+ columns.
With only the main key indexes, the DB is already 15 TB+,
and my own experience of putting 30 indexes on one table is not very
positive.
so...
BTW, rewriting the original request using a CTE and UNION does complete in
134 seconds, doing 3 merge joins.
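
The rewritten query itself isn't shown in the thread; purely as a hedged sketch of the shape such a rewrite can take (all table and column names hypothetical):

```
-- Original shape: an OR in the join condition forces a nested loop.
-- SELECT a.id, b.k FROM a JOIN b ON a.k = b.k OR substr(a.v, 1, 4) = b.k;

-- Rewritten shape: one branch per OR arm, each branch free to use a merge
-- or hash join. Note Tom's caveat below: plain UNION also removes
-- identical rows that should legitimately appear more than once.
WITH matches AS (
    SELECT a.id, b.k FROM a JOIN b ON a.k = b.k
    UNION
    SELECT a.id, b.k FROM a JOIN b ON substr(a.v, 1, 4) = b.k
)
SELECT * FROM matches;
```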

And I have one more question: the EXPLAIN ANALYZE plan shows that Postgres
decided to do external sorts using around 2 GB of disk space.
I did a SET work_mem TO '4GB' to try to have those sorts done in memory. No
effect. How can I tell the planner to do those sorts in memory? Thanks.
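
As a hedged aside: work_mem is applied per sort node and only takes effect for queries run after the SET in the same session. A minimal way to check that it took hold, using a hypothetical query:

```
SET work_mem = '4GB';
SHOW work_mem;  -- confirm the session actually picked the value up

-- In the EXPLAIN ANALYZE output, look for "Sort Method: quicksort"
-- (in memory) versus "Sort Method: external merge  Disk: ..." (spilled).
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM big_table ORDER BY some_col;
```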


> Regards,
> Avin
>
> On Mon, Jun 26, 2023 at 3:57 AM Marc Millas 
> wrote:
>
>>
>> On Sun, Jun 25, 2023 at 11:48 PM Tom Lane  wrote:
>>
>>> David Rowley  writes:
>>> > The problem is that out of the 3 methods PostgreSQL uses to join
>>> > tables, only 1 of them supports join conditions with an OR clause.
>>> > Merge Join cannot do this because results can only be ordered one way
>>> > at a time.  Hash Join technically could do this, but it would require
>>> > that it built multiple hash tables. Currently, it only builds one
>>> > table.  That leaves Nested Loop as the join method to implement joins
>>> > with OR clauses. Unfortunately, nested loops are quadratic and the
>>> > join condition must be evaluated once per each cartesian product row.
>>>
>>> We can do better than that if the OR'd conditions are each amenable
>>> to an index scan on one of the tables: then it can be a nestloop with
>>> a bitmap-OR'd inner index scan.  I thought the upthread advice to
>>> convert the substr() condition into something that could be indexed
>>> was on-point.
>>>
>> ok. but one of the tables within the join is 10 billion rows,
>> split into 120 partitions. Creating something like 20 more indexes to
>> fulfill that condition has its own problems.
>>
>>>
>>> > Tom Lane did start some work [1] to allow the planner to convert some
>>> > queries to use UNION instead of evaluating OR clauses, but, if I
>>> > remember correctly, it didn't handle ORs in join conditions, though
>>> > perhaps having it do that would be a natural phase 2. I don't recall
>>> > why the work stopped.
>>>
>>> As I recall, I was having difficulty convincing myself that
>>> de-duplication of results (for cases where the same row satisfies
>>> more than one of the OR'd conditions) would work correctly.
>>> You can't just blindly make it a UNION because that might remove
>>> identical rows that *should* appear more than once in the result.
>>>
>>
>> I did rewrite the query using a CTE and union(s). For that query, no
>> dedup problem.
>> But my problem is that that DB will be used by a bunch of people writing
>> raw SQL queries, and I cannot let them write queries that are going to go
>> on for ages, and eventually crash against temp_file_limit after hours,
>> every now and then.
>> So, my understanding of the above is that I must inform the users NOT to
>> use OR clauses in joins,
>> which may be a problem by itself.
>> regards
>> Marc
>>
>>
>>> regards, tom lane
>>>
>>
>> Marc MILLAS
>>
>


Re: A question about the postgres's website

2023-06-26 Thread Adrian Klaver

On 6/24/23 06:36, wen-yi wrote:

Hi community,
When I check the Postgres website, I find something confusing:

https://www.postgresql.org/community/ (Quick Links: Don't see the slack item)

https://www.postgresql.org/list/ (Quick Links: See the slack item)

Can someone give me some advice?


Read this thread on the -www list, beginning here:

https://www.postgresql.org/message-id/CACoCc8ALTqkSwFr%2BSkynQB-4NX75ddVL%3DhopRKyzqzXsu0QQvQ%40mail.gmail.com


Thanks in advance!

Yours,
Wen Yi


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: bug or lacking doc hint

2023-06-26 Thread Ron

On 6/26/23 07:22, Marc Millas wrote:




There is a set of big tables containing fine-grained health data. The DB is
accessed by people doing research on various kinds of sickness. So, by
nature, all columns (I mean ALL) can be used in every kind of SQL,
including, obviously, lots of intricate joins.


This sounds like the kind of problem normally solved by data warehouses.  Is 
your schema designed like a DW, or is it in 3NF?


--
Born in Arizona, moved to Babylonia.

Re: A question about the postgres's website

2023-06-26 Thread Daniel Gustafsson
> On 24 Jun 2023, at 15:36, wen-yi  wrote:
> 
> Hi community,
> When I check the Postgres website, I find something confusing:
> 
> https://www.postgresql.org/community/ (Quick Links: Don't see the slack item)
> 
> https://www.postgresql.org/list/ (Quick Links: See the slack item)
> 
> Can someone give me some advice?

The latter page is actually hosted by a separate system, and the change in
the menu content hasn't been synchronized there, which is why it's still
showing up.
We'll get it fixed, thanks for the report!

--
Daniel Gustafsson





Large pkey index on insert-only table

2023-06-26 Thread Devin Ivy
Hi all,
I have a suspiciously large index, and I could use a hand finding a root
cause for its size.  This index supports the primary key for a closure
table that models threaded comments with columns `(id, ancestor_id,
depth)`.  The primary key is composite: `(id, ancestor_id)`.  The id
columns are varchars which are a bit long for identifiers, around 70
bytes.  This table is insert-only: the application never performs updates
or deletes.

The table has grown to 200GB, and the unique index supporting the primary
key is nearly double that at around 360GB, which stood out to me as rather
large compared to the table itself.  The index uses the default fillfactor
of 90.  I would not anticipate very much bloat since updates and deletes
never occur on this table, and according to pg_stat_all_tables autovacuum
has been running regularly.  I've used the btree bloat estimator from
https://github.com/ioguix/pgsql-bloat-estimation, and it estimates the
bloat percentage at 47%.

Any thoughts on why this may be, or where to go next to continue tracking
this down?  Also, could the primary key column order `(id, ancestor_id)` vs
`(ancestor_id, id)` significantly affect the index size depending on the
column cardinalities?  I appreciate your time and input, thanks!

--
Devin Ivy


Re: Large pkey index on insert-only table

2023-06-26 Thread Peter Geoghegan
On Mon, Jun 26, 2023 at 8:50 AM Devin Ivy  wrote:
> Any thoughts on why this may be, or where to go next to continue tracking 
> this down?  Also, could the primary key column order `(id, ancestor_id)` vs 
> `(ancestor_id, id)` significantly affect the index size depending on the 
> column cardinalities?

That is certainly possible, yes. I describe one particular pathology
that causes composite B-Tree indexes to only have about 50% space
utilization here:

https://youtu.be/p5RaATILoiE?t=2079

Theoretically this shouldn't be a problem anymore, because (as the
talk describes) Postgres 12 added heuristics that avoid the problem.
But those heuristics are kind of conservative; they only kick in when
it's fairly clearly the right thing to do. I don't think that they'll
work very reliably for varchar columns.

Note that 90% space utilization isn't really the standard case for
B-Trees in general. Even absent updates and deletes, an index with
completely random insertions (e.g., a UUID index) is expected to have
about 70% space utilization. You can only really expect ~90% space
utilization with monotonically increasing insertions.

On the other hand having less than 50% space utilization is pretty
poor, so (assuming that that's what "bloat percentage 47%" means) then
I'd say that you're right to suspect that something is a bit off here.
This isn't necessarily a big deal, but I tend to agree that what
you're seeing is something that theoretically can be avoided by the
implementation (if there were deletes involved then that wouldn't
apply, but there aren't).
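
For a direct measurement rather than an estimate, the contrib extension pgstattuple can report the actual leaf-page density; a minimal sketch, with a hypothetical index name:

```
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- avg_leaf_density is the measured fill of leaf pages in percent:
-- ~90 matches monotonic inserts, ~70 random inserts, below 50 is poor.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('comments_pkey');
```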

--
Peter Geoghegan




Re: bug or lacking doc hint

2023-06-26 Thread Marc Millas
On Mon, Jun 26, 2023 at 4:05 PM Ron  wrote:

> On 6/26/23 07:22, Marc Millas wrote:
>
>> There is a set of big tables containing fine-grained health data. The DB
>> is accessed by people doing research on various kinds of sickness. So, by
>> nature, all columns (I mean ALL) can be used in every kind of SQL,
>> including, obviously, lots of intricate joins.
>
> This sounds like the kind of problem normally solved by data warehouses.
> Is your schema designed like a DW, or is it in 3NF?

It is, indeed, some kind of DWH,
but it's neither a star nor a snowflake, at least not used like those
standard schemas.
In one of the big tables (10 billion+ rows) there are around 60 columns
describing one event: some person had a given sickness, got a given
medication, etc.
The problem is that it's not one simple event with a set of dimensions:
the people using that DB are NOT looking for an event according to
various criteria, they are looking for correlations between each of the
60+ columns.
As a consequence very few indexes are used, as most requests end in some
kind of huge sequential read.
The machine was built for this and performs well, but some queries are
posing problems and we must find solutions/workarounds.
One of the users did rewrite the query using a SELECT DISTINCT matched
with LEFT JOIN(s) and a set of table.a IS NOT NULL conditions.
Looks crazy, but it does work. I'll get the query tomorrow.
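
The actual query isn't in the thread yet; purely as a hedged sketch of the pattern being described (all names hypothetical): each OR arm becomes a LEFT JOIN, the IS NOT NULL filters keep rows matched by at least one arm, and DISTINCT collapses rows matched by several:

```
SELECT DISTINCT e.id, e.col1, e.col2
FROM events e
LEFT JOIN dim1 d1 ON e.col1 = d1.k
LEFT JOIN dim2 d2 ON substr(e.col2, 1, 4) = d2.k
WHERE d1.k IS NOT NULL
   OR d2.k IS NOT NULL;
```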



> --
> Born in Arizona, moved to Babylonia.
>

Marc MILLAS


Large scale reliable software system

2023-06-26 Thread B M
Dear all,

Greetings,

I taught myself PostgreSQL and developed a small-scale experimental
software system using PostgreSQL in the back-end.

I would like to know your advice on developing a large-scale, reliable
software system using PostgreSQL in the back-end, through which I can share
the storage among the different system users, where they log in to the
system through the web application front-end with different usernames and
passwords; preserve the privacy of each user's data; improve the overall
system security and performance; achieve fast response; make backups; and
protect the stored data from loss. The system will be hosted on a cloud.
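
The question is broad, but for the per-user privacy part there is a PostgreSQL-side pattern worth knowing: per-user roles plus row-level security. A minimal hedged sketch, assuming the web application connects as an individual database role per user (many web stacks instead use one shared role and enforce privacy in application code); all names here are hypothetical:

```
CREATE ROLE alice LOGIN PASSWORD 'secret';  -- one role per application user

CREATE TABLE user_data (
    owner   name NOT NULL DEFAULT current_user,
    payload text
);

-- With row-level security enabled, each role sees and writes only
-- rows whose owner column matches its own name.
ALTER TABLE user_data ENABLE ROW LEVEL SECURITY;
CREATE POLICY per_owner ON user_data
    USING (owner = current_user);

GRANT SELECT, INSERT, UPDATE, DELETE ON user_data TO alice;
```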

Thank you in advance.


Re: Large scale reliable software system

2023-06-26 Thread Adrian Klaver

On 6/26/23 16:48, B M wrote:

Dear all,

Greetings,

I taught myself PostgreSQL and developed a small-scale experimental
software system using PostgreSQL in the back-end.

I would like to know your advice on developing a large-scale, reliable
software system using PostgreSQL in the back-end, through which I can
share the storage among the different system users, where they log in to
the system through the web application front-end with different usernames
and passwords; preserve the privacy of each user's data; improve the
overall system security and performance; achieve fast response; make
backups; and protect the stored data from loss. The system will be hosted
on a cloud.


https://www.djangoproject.com/



Thank you in advance.



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Large scale reliable software system

2023-06-26 Thread Guyren Howe
This is a reasonable answer, but I want to offer a caveat.

Likely because of the influence of the originator of Ruby on Rails, it is close 
to holy writ in the web development community that the database must be treated 
as a dumb data bucket and all business logic must be implemented in the Ruby or 
Python or whatever back end code.

This heuristic is nearly always mostly wrong.

Guyren G Howe
On Jun 26, 2023 at 17:48 -0700, Adrian Klaver wrote:
> On 6/26/23 16:48, B M wrote:
>
> > I would like to know your advice on developing a large-scale, reliable
> > software system using PostgreSQL in the back-end [...]
>
> https://www.djangoproject.com/


Re: Large scale reliable software system

2023-06-26 Thread Michael Nolan
It's not just Ruby, dumb databases are preferred in projects like
WordPress, Drupal and Joomla, too.

Now, if it's because they're used to using MySQL, well maybe that's
not so hard to understand.  :-)

On Mon, Jun 26, 2023 at 8:05 PM Guyren Howe  wrote:
>
> This is a reasonable answer, but I want to offer a caveat.
>
> Likely because of the influence of the originator of Ruby on Rails, it is 
> close to holy writ in the web development community that the database must be 
> treated as a dumb data bucket and all business logic must be implemented in 
> the Ruby or Python or whatever back end code.
>
> This heuristic is nearly always mostly wrong.
>
> Guyren G Howe




Re: Large scale reliable software system

2023-06-26 Thread Guyren Howe
Correct. It’s a tragically wrong piece of folk wisdom that’s pretty general 
across web development communities.

> On Jun 26, 2023, at 21:32, Michael Nolan  wrote:
> 
> It's not just Ruby, dumb databases are preferred in projects like
> WordPress, Drupal and Joomla, too.
> 
> Now, if it's because they're used to using MySQL, well maybe that's
> not so hard to understand.  :-)