RE: Configure autovacuum

2024-07-04 Thread Shenavai, Manuel
We see that our DB keeps growing under high load (many updates). Our DB has a
size of 200 GB, of which 199 GB is bloat, with 0 dead tuples. When the DB is put
under high load (many updates), we still see the DB size grow. We are trying to
find parameters that avoid this growth.

I think we need to tweak the autovacuum settings and maybe limit the volume of 
data that can be written to the DB.

Is there any setting in Postgres that would allow writing only a certain volume?
For example, limiting the amount of data that can be written to a table to
100 MB/minute.

Best regards,
Manuel

-Original Message-
From: Adrian Klaver  
Sent: 14 June 2024 16:32
To: Shenavai, Manuel ; pgsql-general 

Subject: Re: Configure autovacuum

On 6/13/24 23:20, Shenavai, Manuel wrote:
> Hi everyone,
> 
> I would like to configure the autovacuum in a way that it runs very 
> frequently (i.e. after each update-statement). I tried the following 

Why?

What is the problem you are trying to solve?

> settings on my table:
> 
> alter table mytable set (autovacuum_vacuum_scale_factor  = 0.0);
> 
> alter table mytable set (autovacuum_vacuum_cost_delay  = 0.0);
> 
> alter table mytable set (autovacuum_vacuum_cost_limit  = 1);
> 
> alter table mytable set (autovacuum_vacuum_threshold  = 1);
> 
> I do a lot of updates on a single tuple and I would expect that the 
> autovacuum would start basically after each update (due to 
> autovacuum_vacuum_threshold=1). But the autovacuum is not running.
> 
> Is it possible to configure postgres to autovacuum very aggressively 
> (i.e. after each update-statement)?
> 
> Thanks in advance &
> 
> Best regards,
> 
> Manuel
> 

-- 
Adrian Klaver
adrian.kla...@aklaver.com



Re: Configure autovacuum

2024-07-04 Thread Adrian Klaver

On 7/4/24 08:16, Shenavai, Manuel wrote:

We see that our DB keeps increasing under high load (many updates). We see that 
our DB has a size of 200GB and we got 199GB bloat, 0 dead tuple. And when the 
DB is put on high load (many updates), we still see that the DB size grows. We 
try to find parameters to avoid DB growth.


Show your work:

1) How did you determine the bloat number?

2) How did you determine there are 0 dead tuples?

3) Define high load.

4) Postgres version?

5) What are your autovacuum settings?




I think we need to tweak the autovacuum settings and maybe limit the volume of 
data that can be written to the DB.


That will need to happen on the client end.



Is there any setting in postgres that would allow to write only certain volume? 
For example, limit the amount of data that can be written to a table to 
100MB/minute.

Best regards,
Manuel

-Original Message-
From: Adrian Klaver 
Sent: 14 June 2024 16:32
To: Shenavai, Manuel ; pgsql-general 

Subject: Re: Configure autovacuum

On 6/13/24 23:20, Shenavai, Manuel wrote:

Hi everyone,

I would like to configure the autovacuum in a way that it runs very
frequently (i.e. after each update-statement). I tried the following


Why?

What is the problem you are trying to solve?


settings on my table:

alter table mytable set (autovacuum_vacuum_scale_factor  = 0.0);

alter table mytable set (autovacuum_vacuum_cost_delay  = 0.0);

alter table mytable set (autovacuum_vacuum_cost_limit  = 1);

alter table mytable set (autovacuum_vacuum_threshold  = 1);

I do a lot of updates on a single tuple and I would expect that the
autovacuum would start basically after each update (due to
autovacuum_vacuum_threshold=1). But the autovacuum is not running.

Is it possible to configure postgres to autovacuum very aggressively
(i.e. after each update-statement)?

Thanks in advance &

Best regards,

Manuel





--
Adrian Klaver
adrian.kla...@aklaver.com





RE: Configure autovacuum

2024-07-04 Thread Shenavai, Manuel
Thanks for the questions.

Here are some details:
1) we use this query to get the bloat: 
https://github.com/ioguix/pgsql-bloat-estimation/blob/master/table/table_bloat.sql
But in our load test we use an empty database with 1 record that gets updated
very frequently. Here we just measure the DB size to see how big the DB grows:
SELECT  pg_total_relation_size('my-table') / 1024/1014;

2) Dead tuples: select n_dead_tup,n_live_tup,  n_tup_del, relname,* from 
pg_stat_all_tables where relname= (select REPLACE((SELECT cast 
(reltoastrelid::regclass as text) FROM pg_class WHERE relkind = 'r' AND 
reltoastrelid <> 0 and relname = 'my-table'),'pg_toast.',''));
We are only updating the blob, so we are mostly interested in the TOAST table.

3) In our load test, high load means constantly updating a single record with a
predefined payload (i.e. a random byte array of x MB) for x minutes. We update up
to 60 MB per second.
4) Postgres Version: 14.12-2
5) We are using the default autovacuum settings.
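
As a starting point for checking whether autovacuum has touched the table and its
TOAST table at all under those default settings, a minimal sketch (reusing the
'my-table' placeholder from above; the column list is illustrative):

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, autovacuum_count
FROM pg_stat_all_tables
WHERE relname = 'my-table'
   OR relname IN (SELECT REPLACE(cast(reltoastrelid::regclass as text), 'pg_toast.', '')
                  FROM pg_class
                  WHERE relkind = 'r' AND reltoastrelid <> 0 AND relname = 'my-table');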

Best regards,
Manuel

-Original Message-
From: Adrian Klaver  
Sent: 04 July 2024 17:43
To: Shenavai, Manuel ; pgsql-general 

Subject: Re: Configure autovacuum

On 7/4/24 08:16, Shenavai, Manuel wrote:
> We see that our DB keeps increasing under high load (many updates). We see 
> that our DB has a size of 200GB and we got 199GB bloat, 0 dead tuple. And 
> when the DB is put on high load (many updates), we still see that the DB size 
> grows. We try to find parameters to avoid DB growth.

Show your work:

1) How did you determine the bloat number?

2) How did you determine there are 0 dead tuples?

3) Define high load.

4) Postgres version?

5) What are your autovacuum settings?


> 
> I think we need to tweak the autovacuum settings and maybe limit the volume 
> of data that can be written to the DB.

That will need to happen on client end.

> 
> Is there any setting in postgres that would allow to write only certain 
> volume? For example, limit the amount of data that can be written to a table 
> to 100MB/minute.
> 
> Best regards,
> Manuel
> 
> -Original Message-
> From: Adrian Klaver 
> Sent: 14 June 2024 16:32
> To: Shenavai, Manuel ; pgsql-general 
> 
> Subject: Re: Configure autovacuum
> 
> On 6/13/24 23:20, Shenavai, Manuel wrote:
>> Hi everyone,
>>
>> I would like to configure the autovacuum in a way that it runs very
>> frequently (i.e. after each update-statement). I tried the following
> 
> Why?
> 
> What is the problem you are trying to solve?
> 
>> settings on my table:
>>
>> alter table mytable set (autovacuum_vacuum_scale_factor  = 0.0);
>>
>> alter table mytable set (autovacuum_vacuum_cost_delay  = 0.0);
>>
>> alter table mytable set (autovacuum_vacuum_cost_limit  = 1);
>>
>> alter table mytable set (autovacuum_vacuum_threshold  = 1);
>>
>> I do a lot of updates on a single tuple and I would expect that the
>> autovacuum would start basically after each update (due to
>> autovacuum_vacuum_threshold=1). But the autovacuum is not running.
>>
>> Is it possible to configure postgres to autovacuum very aggressively
>> (i.e. after each update-statement)?
>>
>> Thanks in advance &
>>
>> Best regards,
>>
>> Manuel
>>
> 

-- 
Adrian Klaver
adrian.kla...@aklaver.com



Design strategy for table with many attributes

2024-07-04 Thread Lok P
Hello,
In one of our applications we receive transactions as messages/events and also as
files; they are parsed and stored in a relational database. The number of
attributes/columns each transaction has is ~900+. Logically they are part of one
single transaction and should be stored in one table as one single row. There will
be ~500 million such transactions coming into the system each day, with
approximately 10K peak write TPS and 5K read TPS in the target state. This system
has a Postgres database as the "source of truth" / OLTP store, and the data then
moves to Snowflake for the OLAP store.

Initially, when the system was designed, the number of attributes per transaction
was <100, but the business slowly wants to keep/persist other attributes too in
the current system, and the number of columns keeps growing.

However, having worked with some database systems, we get a few suggestions from
DBAs not to have many columns in a single table. For example, in Oracle they say
not to go beyond ~255 columns, as row chaining and row migration are then going to
haunt us. We are also afraid that concurrent DML on the table may turn it into a
contention point. So I wanted to understand: in such a situation, what would be
the best design approach, irrespective of database? Or, to what maximum number of
columns per table should we restrict ourselves? Should we break the single
transaction into multiple tables, like one main table and other addenda tables
with the same primary key, to join and fetch the results wherever necessary?

Regards
Lok


Re: Design strategy for table with many attributes

2024-07-04 Thread David G. Johnston
On Thu, Jul 4, 2024 at 12:38 PM Lok P  wrote:

>
> Should we break the single transaction into multiple tables like one main
> table and other addenda tables with the same primary key to join and fetch
> the results wherever necessary?
>
>
I would say yes.  Find a way to logically group sets of columns together
and place those groups into separate tables.  I'd also be looking for cases
where multiple columns really should be multiple rows.  This is not
uncommon.

David J.


David J.


Re: Configure autovacuum

2024-07-04 Thread Adrian Klaver

On 7/4/24 10:24, Shenavai, Manuel wrote:

Thanks for the questions.

Here are some details:
1) we use this query to get the bloat: 
https://github.com/ioguix/pgsql-bloat-estimation/blob/master/table/table_bloat.sql
But in our load test, we got a empty database with 1 record that gets updated 
very frequently. Here we just meature the DB size to see how big the DB grows:
SELECT  pg_total_relation_size('my-table') / 1024/1014;


That really does not clear things up:

1) pg_total_relation_size measures the size of a relation (table), not the
database; see the size-comparison sketch below these points.


2) The database is not empty if it has a relation of size 200 GB.

3) Just because a database grows big does not mean it is bloated. 
Include the output of the bloat query.
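
Following on from point 1, a minimal sketch comparing the two measurements (the
table name 'my-table' is just the placeholder used earlier in the thread):

SELECT pg_size_pretty(pg_database_size(current_database())) AS database_size,
       pg_size_pretty(pg_total_relation_size('my-table'))   AS table_plus_toast_and_indexes;

pg_total_relation_size() includes the table's TOAST table and indexes, so if the
growth shows up there, it is in that one relation rather than spread across the
database.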




2) Dead tuples: select n_dead_tup,n_live_tup,  n_tup_del, relname,* from 
pg_stat_all_tables where relname= (select REPLACE((SELECT cast 
(reltoastrelid::regclass as text) FROM pg_class WHERE relkind = 'r' AND reltoastrelid 
<> 0 and relname = 'my-table'),'pg_toast.',''));
We are only updating the blob so we are mostly interested in the toast


By blob do you mean bytea or large objects?



3) In our load test, High Load means constantly updating a single record with a 
predefined payload  (i.e. random bytearray of x MB) for x minutes. We update up 
to 60MB per second


Do you do this all in one transaction?


4) Postgres Version: 14.12-2
5) We are using default autovacuum-settings

Best regards,
Manuel

-Original Message-
From: Adrian Klaver 
Sent: 04 July 2024 17:43
To: Shenavai, Manuel ; pgsql-general 

Subject: Re: Configure autovacuum

On 7/4/24 08:16, Shenavai, Manuel wrote:

We see that our DB keeps increasing under high load (many updates). We see that 
our DB has a size of 200GB and we got 199GB bloat, 0 dead tuple. And when the 
DB is put on high load (many updates), we still see that the DB size grows. We 
try to find parameters to avoid DB growth.


Show your work:

1) How did you determine the bloat number?

2) How did you determine there are 0 dead tuples?

3) Define high load.

4) Postgres version?

5) What are your autovacuum settings?




I think we need to tweak the autovacuum settings and maybe limit the volume of 
data that can be written to the DB.


That will need to happen on client end.



Is there any setting in postgres that would allow to write only certain volume? 
For example, limit the amount of data that can be written to a table to 
100MB/minute.

Best regards,
Manuel

-Original Message-
From: Adrian Klaver 
Sent: 14 June 2024 16:32
To: Shenavai, Manuel ; pgsql-general 

Subject: Re: Configure autovacuum

On 6/13/24 23:20, Shenavai, Manuel wrote:

Hi everyone,

I would like to configure the autovacuum in a way that it runs very
frequently (i.e. after each update-statement). I tried the following


Why?

What is the problem you are trying to solve?


settings on my table:

alter table mytable set (autovacuum_vacuum_scale_factor  = 0.0);

alter table mytable set (autovacuum_vacuum_cost_delay  = 0.0);

alter table mytable set (autovacuum_vacuum_cost_limit  = 1);

alter table mytable set (autovacuum_vacuum_threshold  = 1);

I do a lot of updates on a single tuple and I would expect that the
autovacuum would start basically after each update (due to
autovacuum_vacuum_threshold=1). But the autovacuum is not running.

Is it possible to configure postgres to autovacuum very aggressively
(i.e. after each update-statement)?

Thanks in advance &

Best regards,

Manuel







--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Design strategy for table with many attributes

2024-07-04 Thread Kent Dorfman

On 7/4/24 15:37, Lok P wrote:
Or say, what is the maximum number of columns per table we should 
restrict? Should we break the single transaction into multiple tables 
like one main table and other addenda tables with the same primary key 
to join and fetch the results wherever necessary?


900 columns makes my head hurt badly...

The neophyte will design a table with a separate field for each type of 
phone number that may be encountered.  The experienced designer will 
move all phone numbers to their own table, where each entry/row contains a 
reference link, a "number_type" field, and a field with the actual 
number in it... three fields in a table that is manageable and can be 
queried/modified without stressing the database server.
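
A minimal sketch of that shape (table and column names here are purely
illustrative, not from the original post):

CREATE TABLE person (
    person_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name      text NOT NULL
);

CREATE TABLE phone_number (
    person_id   bigint NOT NULL REFERENCES person,  -- the "reference link"
    number_type text   NOT NULL,                    -- e.g. 'home', 'work', 'mobile'
    number      text   NOT NULL,
    PRIMARY KEY (person_id, number_type, number)
);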


Re: Design strategy for table with many attributes

2024-07-04 Thread Ron Johnson
On Thu, Jul 4, 2024 at 3:38 PM Lok P  wrote:

> Hello,
> In one of the applications we are getting transactions in messages/events
> format and also in files and then they are getting parsed and stored into
> the relational database. The number of attributes/columns each transaction
> has is ~900+. Logically they are part of one single transaction
>

Nothing out of the ordinary.


> and should be stored in one table as one single row.
>

Says who?


> There will be ~500million such transactions each day coming into the
> system. And there will be approx ~10K peak write TPS and 5K read TPS in
> target state. This system has a postgres database as a "source of truth" or
> OLTP store. And then data moves to snowflakes for the olap store.
>
> Initially when the system was designed the number of attributes per
> transaction was <100 but slowly the business wants to keep/persist other
> attributes too in the current system and the number of columns keep growing.
>
> However, as worked with some database systems , we get few suggestions
> from DBA's to not have many columns in a single table. For example in
> oracle they say not to go beyond ~255 columns as then row chaining and row
> migration type of things are going to hunt us. Also we are afraid
> concurrent DMLS on the table may cause this as a contention point. So I
> wanted to understand , in such a situation what would be the best design
> approach we should use irrespective of databases? Or say, what is the
> maximum number of columns per table we should restrict? Should we break the
> single transaction into multiple tables like one main table and other
> addenda tables with the same primary key to join and fetch the results
> wherever necessary?
>

You need database normalization.  It's a big topic.  Here's a good simple
explanation:
https://popsql.com/blog/normalization-in-sql


JSONPath operator and escaping values in query

2024-07-04 Thread Vasu Nagendra
Good afternoon,
I am running into the following issue with a JSONPath exists query.

This is a valid query

SELECT '{"n": {"a-b": 1, "@ab": 2, "ab": 3}}'::jsonb @? '$ ? (@.n.ab >= 3)';

This is an invalid query (syntax error)

SELECT '{"n": {"a-b": 1, "@ab": 2, "ab": 3}}'::jsonb @? '$ ? (@.n.a-b >= 3)';

The thing that makes it invalid is the key “a-b”. The same error occurs for the
key “@ab”. Looking at the following link
https://github.com/postgres/postgres/blob/master/src/include/utils/jsonpath.h#L62,
it looks like anything that appears in the enum JsonPathItemType will cause a
syntax error if present in the query, and must be escaped like so:


SELECT '{"n": {"a-b": 1, "@ab": 2, "ab": 3}}'::jsonb @? '$ ? (@.n.a\-b >= 3)';

I also looked at section 4.1.4
(https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-SPECIAL-CHARS),
but that only covers the special characters allowed/disallowed in SQL – it is not
specific to the JSONPath query.

Looking at the source code here for function printJsonPathItem 
https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/jsonpath.c#L685,
 I think this is just processing characters in the path one by one, which would 
explain why there is no special syntax for how to escape the string.

Question: Is this a valid assumption? If I have a Python program (for example)
that is formatting the string for the query '$ ? (@.n.a\-b >= 3)’, is it correct
to escape anything that is present in the JsonPathItemType enum documentation? Of
course this assumes all the standard security things about sanitizing user input
and handling the path conversion for arrays correctly – meaning “a.*.b” must be
replaced with “a[*].b”…

If this is documentation I should contribute to, I am happy to – I’d imagine it 
belongs in section 9.16.1 
https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-JSON-PROCESSING
 as a footnote to Table 9.46. Additional jsonb Operators

Thanks!
--Vasu



Re: Design strategy for table with many attributes

2024-07-04 Thread Guyren Howe
Ultimately, the way you should store the data depends on how you will use it. 
When you retrieve these values, what are the different ways you’ll be using 
them?

Normalised representations are more flexible, and the pragmatic, 
performance-based consideration is that all the values in a row are always 
effectively retrieved together. So if you reasonably often have a query that 
only accesses the creation date and transaction id, then it will be pretty slow 
if you are also always retrieving 500 other columns you don’t need.

So: you might often pull all the values *other* than the attributes (creation 
date, creator, job information, whatever) together. This argues that those 
values should be in one table, and the attributes in another.

Will you usually be using *all* of the attributes for a particular transaction 
together in the same operation? It might make sense to store them in eg an 
array in that case. But this significantly reduces the speed of accessing 
particular attributes separately.

It is likely that you will want to slice things by particular named attribute 
across many transactions. This argues for the more normalised form, as does the 
general principle of doing things in the way that is most general and flexible.

When considering how the data will be used, please consider not only the 
developers of your current application(s), but also eg data analysts, managers, 
future applications etc.

The less specific you can be about how you want to use the data, the more you 
should lean toward fully normalising.

Note also that you can store your data in a normalised and maximally flexible 
form, but also use triggers or batch jobs to gather various permutations of the 
data for specific purposes. If you really do, say, both have some major part of 
your project that uses all the attributes on a given transaction together, but 
you also have other uses, you may want to store both the normalised/attribute 
table and the “all the values together” version.

Even if you want to store “all the values together”, it may well be better to 
use an array, JSON or HStore, rather than having a super-wide table. JSON would 
eg let you enumerate all the column names (for example) and employ Postgres’s 
really nice JSON query features.
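
For instance, a rough sketch of that jsonb variant (table and column names are
illustrative only):

CREATE TABLE transaction_attrs (
    transaction_id bigint PRIMARY KEY,
    attrs          jsonb NOT NULL
);

-- enumerate the attribute names stored for one transaction
SELECT jsonb_object_keys(attrs) FROM transaction_attrs WHERE transaction_id = 42;

-- fetch a single attribute by name
SELECT attrs ->> 'creation_date' FROM transaction_attrs WHERE transaction_id = 42;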


> On Jul 4, 2024, at 12:37, Lok P  wrote:
> 
> Hello,
> In one of the applications we are getting transactions in messages/events 
> format and also in files and then they are getting parsed and stored into the 
> relational database. The number of attributes/columns each transaction has is 
> ~900+. Logically they are part of one single transaction and should be stored 
> in one table as one single row. There will be ~500million such transactions 
> each day coming into the system. And there will be approx ~10K peak write TPS 
> and 5K read TPS in target state. This system has a postgres database as a 
> "source of truth" or OLTP store. And then data moves to snowflakes for the 
> olap store.
> 
> Initially when the system was designed the number of attributes per 
> transaction was <100 but slowly the business wants to keep/persist other 
> attributes too in the current system and the number of columns keep growing.
> 
> However, as worked with some database systems , we get few suggestions from 
> DBA's to not have many columns in a single table. For example in oracle they 
> say not to go beyond ~255 columns as then row chaining and row migration type 
> of things are going to hunt us. Also we are afraid concurrent DMLS on the 
> table may cause this as a contention point. So I wanted to understand , in 
> such a situation what would be the best design approach we should use 
> irrespective of databases? Or say, what is the maximum number of columns per 
> table we should restrict? Should we break the single transaction into 
> multiple tables like one main table and other addenda tables with the same 
> primary key to join and fetch the results wherever necessary?
> 
> Regards
> Lok





psql help

2024-07-04 Thread Murthy Nunna
Hello:

Following works-

SELECT pid, pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid IN (select unnest(pg_blocking_pids(pid)) from pg_stat_activity
              where cardinality(pg_blocking_pids(pid)) > 0)
  and usename = 'DBUSER_10'
  and now() - state_change >= interval $a'${TIMEOUT_MINS}'$a
order by now() - state_change >= interval $a'${TIMEOUT_MINS}'$a desc
limit 1;

How can I rewrite the above in psql so that it pg_terminate_backend()s all pids
that meet the above criteria (essentially, remove LIMIT 1)?

Thanks!

Note:
I run this in Linux. TIMEOUT_MINS is env variable.





Re: JSONPath operator and escaping values in query

2024-07-04 Thread David G. Johnston
On Thu, Jul 4, 2024 at 1:57 PM Vasu Nagendra  wrote:

>
>
> SELECT '{"n": {"a-b": 1, "@ab": 2, "ab": 3}}'::jsonb @? '$ ? (@.n.a\-b >= 3)';
>
>
Which is better written as:

select '{"n": {"a-b": 1, "@ab": 2, "ab": 3}}'::jsonb @? '$ ? (@.n."a-b" >=
3)';

Using the same double-quotes you defined the key with originally.

The relevant documentation for jsonpath syntax is here:

https://www.postgresql.org/docs/current/datatype-json.html#DATATYPE-JSONPATH
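
For instance, both of the special-character keys from the example can be
addressed that way (just a sketch restating the sample data):

SELECT '{"n": {"a-b": 1, "@ab": 2, "ab": 3}}'::jsonb @? '$ ? (@.n."a-b" >= 1)';
SELECT '{"n": {"a-b": 1, "@ab": 2, "ab": 3}}'::jsonb @? '$ ? (@.n."@ab" >= 2)';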

David J.

>


Re: psql help

2024-07-04 Thread David G. Johnston
On Thu, Jul 4, 2024 at 4:56 PM Murthy Nunna  wrote:

>
>
> How can I rewrite the above in psql
>

The only real trick is using a psql variable instead of the shell-injection of
the environment variable.  Use the --set CLI argument to assign the environment
variable to a psql variable, then refer to it in the query using
:'timeout_mins'.
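
A minimal sketch of that, assuming TIMEOUT_MINS holds an interval string such as
'10 minutes' (the blocking-pid conditions from the original query are trimmed for
brevity):

psql --set=timeout_mins="$TIMEOUT_MINS" <<'SQL'
SELECT pid, pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE usename = 'DBUSER_10'
  AND now() - state_change >= :'timeout_mins'::interval;
SQL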

Removing the limit 1 should be as simple as not typing limit 1 when you
bring the query into the psql script.

David J.


RE: psql help

2024-07-04 Thread Murthy Nunna
Sorry, there is no problem with the following statement and the environment
variable. It works fine. But it terminates only one PID due to LIMIT 1. I want to
terminate all pids that meet these criteria. If I remove LIMIT 1,
pg_terminate_backend(pid) will not work, as it expects only one pid at a time.
So, the question is how to rewrite this psql script so it loops through all pids,
one pid at a time? Thanks in advance for your help.

SELECT pid, pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid IN (select unnest(pg_blocking_pids(pid)) from pg_stat_activity
              where cardinality(pg_blocking_pids(pid)) > 0)
  and usename = 'DBUSER_10'
  and now() - state_change >= interval $a'${TIMEOUT_MINS}'$a
order by now() - state_change >= interval $a'${TIMEOUT_MINS}'$a desc
limit 1;





From: David G. Johnston 
Sent: Thursday, July 4, 2024 8:17 PM
To: Murthy Nunna 
Cc: pgsql-general@lists.postgresql.org
Subject: Re: psql help


On Thu, Jul 4, 2024 at 4:56 PM Murthy Nunna <mnu...@fnal.gov> wrote:

How can I rewrite the above in psql

The only real trick is using a psql variable instead of the shell-injection of 
the environment variable.  Use the --set CLI argument to assign the environment 
variable to a psql variable then refer to it in the query using :'timout_mins'

Removing the limit 1 should be as simple as not typing limit 1 when you bring 
the query into the psql script.

David J.



Re: psql help

2024-07-04 Thread Adrian Klaver

On 7/4/24 20:54, Murthy Nunna wrote:
Sorry, there is no problem with the following statement and the 
environment variable. It works fine. But it terminates only one PID due 
to LIMIT 1. I want to terminate all pids that meet this criteria. If I 
remove LIMIT 1, pg_terminate_backend(pid) will not work as it expects 
only one pid at a time. So, the question is how to rewrite this psql so 
it loops through all pids one pid at a time? Thanks in advance for your 
help.


From here:

https://www.postgresql.org/docs/current/app-psql.html

See:

\gexec
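
For example, a sketch of the \gexec approach applied to the original query (LIMIT
1 removed; this assumes the timeout has been passed in as the psql variable
timeout_mins suggested earlier in the thread):

SELECT format('SELECT pg_terminate_backend(%s)', pid)
FROM pg_stat_activity
WHERE pid IN (SELECT unnest(pg_blocking_pids(pid))
              FROM pg_stat_activity
              WHERE cardinality(pg_blocking_pids(pid)) > 0)
  AND usename = 'DBUSER_10'
  AND now() - state_change >= :'timeout_mins'::interval
\gexec

Each row of the result is itself a SQL statement, and \gexec runs them one by one.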

--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Design strategy for table with many attributes

2024-07-04 Thread Lok P
On Fri, Jul 5, 2024 at 1:26 AM David G. Johnston 
wrote:

> On Thu, Jul 4, 2024 at 12:38 PM Lok P  wrote:
>
>>
>> Should we break the single transaction into multiple tables like one main
>> table and other addenda tables with the same primary key to join and fetch
>> the results wherever necessary?
>>
>>
> I would say yes.  Find a way to logically group sets of columns together
> and place those groups into separate tables.  I'd also be looking for cases
> where multiple columns really should be multiple rows.  This is not
> uncommon.
>
> David J.
>
>
Thank you David.

As you said, we should logically break this into multiple tables. I believe that
means the grouping should be such that, most of the time, there is no need to
query multiple tables and join them to fetch the results; a query should be able
to fetch its results from a single table at any point in time.

But do you also suggest keeping those table pieces related to each other through
the same primary key? And won't there be a problem when we load the data? For
example, in the normal scenario the data load goes to one table, but when we break
it into multiple tables the load happens to all the individual pieces; won't that
put an additional burden on the data load?

Also, I understand the technical limit on the number of columns per table is
~1600. But would you advise restricting ourselves to some lower number, well
before reaching that limit, so that we will not face any anomalies as we grow in
the future? And should we maintain any specific order of the columns, from the
first to the last column, in a given table?


Re: psql help

2024-07-04 Thread David G. Johnston
The convention here is to in-line replies, or bottom-post.  Top-posting
makes the archives more difficult to read.

On Thursday, July 4, 2024, Murthy Nunna  wrote:

> pg_terminate_backend(pid) will not work as it expects only one pid at a
> time.

Interesting…I wouldn’t expect the function calls to interact that
way…maybe try putting the select query into a plpgsql DO command loop and
then call pg_terminate_backend once per loop iteration.
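
A rough sketch of that DO-loop idea (the timeout is hard-coded here purely for
illustration):

DO $$
DECLARE
    r record;
BEGIN
    FOR r IN
        SELECT pid
        FROM pg_stat_activity
        WHERE pid IN (SELECT unnest(pg_blocking_pids(pid))
                      FROM pg_stat_activity
                      WHERE cardinality(pg_blocking_pids(pid)) > 0)
          AND usename = 'DBUSER_10'
          AND now() - state_change >= interval '10 minutes'
    LOOP
        PERFORM pg_terminate_backend(r.pid);
    END LOOP;
END
$$;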


> So, the question is how to rewrite this psql so it loops through all pids
> one pid at a time?

I can’t tell if you really mean the psql program or are misusing the term
to mean something different…the code you wrote doesn’t seem like it would
execute in psql.

David J.


Re: Design strategy for table with many attributes

2024-07-04 Thread David Rowley
On Fri, 5 Jul 2024 at 17:07, Lok P  wrote:
> Also I understand the technical limitation of the max number of columns per 
> table is ~1600. But should you advise to restrict/stop us to some low number 
> long before reaching that limit , such that we will not face any anomalies 
> when we grow in future. And if we should maintain any specific order in the 
> columns from start to end column in the specific table?

Something else you may wish to consider, depending on the column types of your
900+ columns, is the possibility that some INSERTs may fail due to row length
while others with shorter variable-length values may be OK.

Here's a quick example with psql:

select 'create table a (' || string_agg('a'||x||' text not null
default $$$$',',') || ')' from generate_series(1,1000)x;
\gexec
insert into a default values;
INSERT 0 1

again but with a larger DEFAULT to make the tuple larger.

select 'create table b (' || string_agg('a'||x||' text not null
default $$hello world$$',',') || ')' from generate_series(1,1000)x;
\gexec
insert into b default values;
ERROR:  row is too big: size 12024, maximum size 8160

There is a paragraph at the bottom of [1] with some warnings about
things relating to this.

The tuple length would be fixed for fixed-length types defined as NOT
NULL. So, if you have that, there should be no such surprises.

David

[1] https://www.postgresql.org/docs/current/limits.html




Re: psql help

2024-07-04 Thread Tom Lane
"David G. Johnston"  writes:
> On Thursday, July 4, 2024, Murthy Nunna  wrote:
>> pg_terminate_backend(pid) will not work as it expects only one pid at a
>> time.

> Interesting…I wouldn’t expect the function calls to interact that
> way

TBH, my reaction to that was that the OP doesn't understand SQL
semantics.  As you previously said, simply removing the LIMIT clause
should work fine.  (The ORDER BY looks kinda pointless, too, unless
there are operational constraints we weren't told about.)

There is a question of exactly what "$a'${TIMEOUT_MINS}'$a"
is supposed to mean, but that's independent of the LIMIT issue.

regards, tom lane




Re: Design strategy for table with many attributes

2024-07-04 Thread David G. Johnston
On Thursday, July 4, 2024, Lok P  wrote:

>
> But do you also suggest keeping those table pieces related to each other
> through the same primary key ?
>
>
Yes, every row produced from the input data “row” should get the same ID
associated with it, either as the entire PK or as a component of a multi-column
PK/unique index.
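
A small sketch of that layout (table and column names are illustrative, not from
the original post; the shared key doubles as the foreign key):

CREATE TABLE txn_main (
    txn_id     bigint PRIMARY KEY,
    created_at timestamptz NOT NULL
    -- ... the core columns ...
);

CREATE TABLE txn_addenda_a (
    txn_id bigint PRIMARY KEY REFERENCES txn_main,  -- same ID, PK is also the FK
    attr_1 text,
    attr_2 numeric
    -- ... one logical group of the remaining columns ...
);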

>
>
>  Won't there be a problem when we load the data like say for example , in
> normal scenario the data load will be to one table but when we break it to
> multiple tables it will happen to all the individual pieces, won't that
> cause additional burden to the data load?
>

Yes, doing this requires additional CPU time to perform the work.  I’d say
IO is hopefully a wash.


>
> Also I understand the technical limitation of the max number of
> columns per table is ~1600. But should you advise to restrict/stop us to
> some low number long before reaching that limit , such that we will not
> face any anomalies when we grow in future.
>

In a row-oriented system, wider is worse.  I start evaluating table design with
that in mind at the fourth column (including the surrogate key that is usually
present and the natural key, and ignoring auditing columns).


>
>  And if we should maintain any specific order in the columns from start to
> end column in the specific table?
>

There is material out there on micro-optimizing column ordering to match up with
alignment boundaries.  I'd say the benefit is meaningful, but there is a
non-trivial cost to actually set up the testing to verify that what you've figured
out is working.  Never actually done it myself.  Though it actually seems like
something someone could/may have written an algorithm for (though I do not recall
ever seeing mention of one.)
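
As a rough illustration of the alignment effect (pg_column_size() on a row
expression just makes the padding visible; exact numbers vary):

-- int8 needs 8-byte alignment, so an int4 placed before it forces 4 bytes of padding
SELECT pg_column_size(ROW(1::int4, 1::int8, 1::int4)) AS padded,
       pg_column_size(ROW(1::int8, 1::int4, 1::int4)) AS packed;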

David J.


Re: Design strategy for table with many attributes

2024-07-04 Thread Lok P
On Fri, Jul 5, 2024 at 10:45 AM Guyren Howe  wrote:

> On Jul 4, 2024, at 22:07, Lok P  wrote:
>
> If you stick to the principle of grouping columns in a table when you use
> those columns together, you should be good.
>
> Note that you might want to split up the “parent” table if that naturally
> groups its columns together for certain uses. In that case, you could have
> the same pk on all the 1:1 tables you then have. In that case, the pk for
> each of those tables is also the fk.
>


Thank you.

When you said below,

*"Note that you might want to split up the “parent” table if that naturally
groups its columns together for certain uses. In that case, you could have
the same pk on all the 1:1 tables you then have. In that case, the pk for
each of those tables is also the fk."*
Do you mean having a real FK created through DDL and maintained, or just assuming
it without defining it for all the pieces/tables, and only keeping the same PK
across all the pieces, since we know these are related to the same transaction and
are logically related?