RE: Configure autovacuum
We see that our DB keeps growing under high load (many updates). The DB has a size of 200 GB, of which 199 GB is bloat, with 0 dead tuples. And when the DB is put under high load (many updates), we still see the DB size grow. We are trying to find parameters to avoid DB growth.

I think we need to tweak the autovacuum settings and maybe limit the volume of data that can be written to the DB. Is there any setting in Postgres that would allow writing only a certain volume? For example, limiting the amount of data that can be written to a table to 100 MB/minute.

Best regards,
Manuel

-----Original Message-----
From: Adrian Klaver
Sent: 14 June 2024 16:32
To: Shenavai, Manuel; pgsql-general
Subject: Re: Configure autovacuum

On 6/13/24 23:20, Shenavai, Manuel wrote:
> Hi everyone,
>
> I would like to configure the autovacuum in a way that it runs very
> frequently (i.e. after each update statement). I tried the following

Why?

What is the problem you are trying to solve?

> settings on my table:
>
> alter table mytable set (autovacuum_vacuum_scale_factor = 0.0);
> alter table mytable set (autovacuum_vacuum_cost_delay = 0.0);
> alter table mytable set (autovacuum_vacuum_cost_limit = 1);
> alter table mytable set (autovacuum_vacuum_threshold = 1);
>
> I do a lot of updates on a single tuple and I would expect that
> autovacuum would start basically after each update (due to
> autovacuum_vacuum_threshold = 1). But autovacuum is not running.
>
> Is it possible to configure Postgres to autovacuum very aggressively
> (i.e. after each update statement)?
>
> Thanks in advance &
> Best regards,
> Manuel

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: Configure autovacuum
On 7/4/24 08:16, Shenavai, Manuel wrote:
> We see that our DB keeps growing under high load (many updates). The DB has
> a size of 200 GB, of which 199 GB is bloat, with 0 dead tuples. And when the
> DB is put under high load (many updates), we still see the DB size grow. We
> are trying to find parameters to avoid DB growth.

Show your work:

1) How did you determine the bloat number?
2) How did you determine there are 0 dead tuples?
3) Define high load.
4) Postgres version?
5) What are your autovacuum settings?

> I think we need to tweak the autovacuum settings and maybe limit the volume
> of data that can be written to the DB.

That will need to happen on the client end.

> Is there any setting in Postgres that would allow writing only a certain
> volume? For example, limiting the amount of data that can be written to a
> table to 100 MB/minute.
>
> Best regards,
> Manuel

--
Adrian Klaver
adrian.kla...@aklaver.com
RE: Configure autovacuum
Thanks for the questions. Here are some details:

1) We use this query to get the bloat:
https://github.com/ioguix/pgsql-bloat-estimation/blob/master/table/table_bloat.sql

But in our load test we have an empty database with 1 record that gets updated very frequently. Here we just measure the DB size to see how big it grows:

SELECT pg_total_relation_size('my-table') / 1024/1014;

2) Dead tuples:

select n_dead_tup, n_live_tup, n_tup_del, relname, *
from pg_stat_all_tables
where relname = (select REPLACE((SELECT cast(reltoastrelid::regclass as text)
                                 FROM pg_class
                                 WHERE relkind = 'r'
                                   AND reltoastrelid <> 0
                                   and relname = 'my-table'), 'pg_toast.', ''));

We are only updating the blob, so we are mostly interested in the TOAST table.

3) In our load test, high load means constantly updating a single record with a predefined payload (i.e. a random byte array of x MB) for x minutes. We update up to 60 MB per second.

4) Postgres version: 14.12-2

5) We are using the default autovacuum settings.

Best regards,
Manuel
Design strategy for table with many attributes
Hello,

In one of our applications we receive transactions in message/event format and also in files; they are then parsed and stored in the relational database. The number of attributes/columns each transaction has is ~900+. Logically they are part of one single transaction and should be stored in one table as one single row. There will be ~500 million such transactions coming into the system each day, with approximately 10K peak write TPS and 5K read TPS in the target state. This system has a Postgres database as the "source of truth" or OLTP store, and the data then moves to Snowflake for the OLAP store.

Initially, when the system was designed, the number of attributes per transaction was <100, but slowly the business wants to keep/persist other attributes too in the current system, and the number of columns keeps growing.

However, having worked with some other database systems, we have gotten a few suggestions from DBAs not to have many columns in a single table. For example, in Oracle they say not to go beyond ~255 columns, as row chaining and row migration type of things are going to haunt us. We are also afraid that concurrent DML on the table may make this a contention point. So I wanted to understand: in such a situation, what would be the best design approach to use, irrespective of database? Or say, what is the maximum number of columns per table we should restrict ourselves to? Should we break the single transaction into multiple tables, like one main table and other addenda tables with the same primary key, to join and fetch the results wherever necessary?

Regards
Lok
Re: Design strategy for table with many attributes
On Thu, Jul 4, 2024 at 12:38 PM Lok P wrote:
> Should we break the single transaction into multiple tables like one main
> table and other addenda tables with the same primary key to join and fetch
> the results wherever necessary?

I would say yes. Find a way to logically group sets of columns together and place those groups into separate tables. I'd also be looking for cases where multiple columns really should be multiple rows. This is not uncommon.

David J.
Re: Configure autovacuum
On 7/4/24 10:24, Shenavai, Manuel wrote:
> Thanks for the questions. Here are some details:
>
> 1) We use this query to get the bloat:
> https://github.com/ioguix/pgsql-bloat-estimation/blob/master/table/table_bloat.sql
>
> But in our load test we have an empty database with 1 record that gets
> updated very frequently. Here we just measure the DB size to see how big it
> grows:
>
> SELECT pg_total_relation_size('my-table') / 1024/1014;

That really does not clear things up:

1) pg_total_relation_size measures the size of a relation (table), not the database.

2) The database is not empty if it has a relation of size 200GB.

3) Just because a database grows big does not mean it is bloated. Include the output of the bloat query.

> 2) Dead tuples:
>
> select n_dead_tup, n_live_tup, n_tup_del, relname, *
> from pg_stat_all_tables
> where relname = (select REPLACE((SELECT cast(reltoastrelid::regclass as text)
>                                  FROM pg_class
>                                  WHERE relkind = 'r'
>                                    AND reltoastrelid <> 0
>                                    and relname = 'my-table'), 'pg_toast.', ''));
>
> We are only updating the blob, so we are mostly interested in the TOAST table.

By blob do you mean bytea or large objects?

> 3) In our load test, high load means constantly updating a single record
> with a predefined payload (i.e. a random byte array of x MB) for x minutes.
> We update up to 60 MB per second.

Do you do this all in one transaction?

> 4) Postgres version: 14.12-2
>
> 5) We are using the default autovacuum settings.

--
Adrian Klaver
adrian.kla...@aklaver.com
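For reference, a minimal sketch of the distinction drawn in point 1 above, i.e. table-level versus database-level size measurements. 'my-table' is the placeholder name used in this thread; treat the snippet as illustrative only:

    -- size of the table plus its indexes and TOAST data
    SELECT pg_size_pretty(pg_total_relation_size('my-table'));

    -- size of the table's main fork only (no indexes, no TOAST)
    SELECT pg_size_pretty(pg_relation_size('my-table'));

    -- size of the whole database the session is connected to
    SELECT pg_size_pretty(pg_database_size(current_database()));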
Re: Design strategy for table with many attributes
On 7/4/24 15:37, Lok P wrote:
> Or say, what is the maximum number of columns per table we should restrict
> ourselves to? Should we break the single transaction into multiple tables,
> like one main table and other addenda tables with the same primary key, to
> join and fetch the results wherever necessary?

900 columns makes my head hurt badly...

The neophyte will design a table with a separate field for each type of phone number that may be encountered. The experienced designer will move all phone numbers to its own table, where each entry/row contains a reference link, a "number_type" field, and a field with the actual number in it... Three fields in a table that is manageable and can be queried/modified without stressing the database server.
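For illustration, a minimal sketch of the phone-number layout described above; the table and column names (person, phone_number, number_type) are invented for the example, not taken from the thread:

    -- assumes a "person" table with a bigint primary key "id"
    create table phone_number (
        person_id   bigint not null references person (id),
        number_type text   not null,  -- e.g. 'home', 'work', 'mobile'
        number      text   not null
    );

    -- all numbers of one kind for one person, without touching a 900-column row
    select number from phone_number where person_id = 42 and number_type = 'mobile';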
Re: Design strategy for table with many attributes
On Thu, Jul 4, 2024 at 3:38 PM Lok P wrote:
> Hello,
> In one of our applications we receive transactions in message/event format
> and also in files; they are then parsed and stored in the relational
> database. The number of attributes/columns each transaction has is ~900+.
> Logically they are part of one single transaction

Nothing out of the ordinary.

> and should be stored in one table as one single row.

Says who?

> There will be ~500 million such transactions coming into the system each
> day, with approximately 10K peak write TPS and 5K read TPS in the target
> state. This system has a Postgres database as the "source of truth" or OLTP
> store, and the data then moves to Snowflake for the OLAP store.
>
> Initially, when the system was designed, the number of attributes per
> transaction was <100, but slowly the business wants to keep/persist other
> attributes too in the current system, and the number of columns keeps
> growing.
>
> However, having worked with some other database systems, we have gotten a
> few suggestions from DBAs not to have many columns in a single table. For
> example, in Oracle they say not to go beyond ~255 columns, as row chaining
> and row migration type of things are going to haunt us. We are also afraid
> that concurrent DML on the table may make this a contention point. So I
> wanted to understand: in such a situation, what would be the best design
> approach to use, irrespective of database? Or say, what is the maximum
> number of columns per table we should restrict ourselves to? Should we break
> the single transaction into multiple tables, like one main table and other
> addenda tables with the same primary key, to join and fetch the results
> wherever necessary?

You need database normalization. It's a big topic. Here's a good simple explanation:

https://popsql.com/blog/normalization-in-sql
JSONPath operator and escaping values in query
Good afternoon,

I am running into the following issue with a JSONPath exists query.

This is a valid query:

SELECT '{"n": {"a-b": 1, "@ab": 2, "ab": 3}}'::jsonb @? '$ ? (@.n.ab >= 3)';

This is an invalid query (syntax error):

SELECT '{"n": {"a-b": 1, "@ab": 2, "ab": 3}}'::jsonb @? '$ ? (@.n.a-b >= 3)';

The thing that makes it invalid is the key "a-b". The same error occurs for the key "@ab". Looking at https://github.com/postgres/postgres/blob/master/src/include/utils/jsonpath.h#L62, it appears that anything matching an entry in the enum JsonPathItemType will cause a syntax error if present in the query and must be escaped, like so:

SELECT '{"n": {"a-b": 1, "@ab": 2, "ab": 3}}'::jsonb @? '$ ? (@.n.a\-b >= 3)';

I also looked at section 4.1.4 (https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-SPECIAL-CHARS), but this only covers the special characters that SQL allows/disallows; it is not specific to the JSONPath query. Looking at the source code for the function printJsonPathItem (https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/jsonpath.c#L685), I think this just processes characters in the path one by one, which would explain why there is no special syntax documented for how to escape the string.

Question: is this a valid assumption? If I have a Python program (for example) that formats the string for the query '$ ? (@.n.a\-b >= 3)', is it correct to escape anything that appears in the JsonPathItemType enum? Of course this assumes all the standard security practices around sanitizing user input and handling the path conversion for arrays correctly (meaning "a.*.b" must be replaced with "a[*].b")...

If this is documentation I should contribute to, I am happy to; I'd imagine it belongs in section 9.16.1 (https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-JSON-PROCESSING) as a footnote to Table 9.46, "Additional jsonb Operators".

Thanks!
--Vasu
Re: Design strategy for table with many attributes
Ultimately, the way you should store the data depends on how you will use it. When you retrieve these values, what are the different ways you'll be using them?

Normalised representations are more flexible, and the pragmatic, performance-based consideration is that all the values in a row are always effectively retrieved together. So if you reasonably often have a query that only accesses the creation date and transaction id, it will be pretty slow if you are also always retrieving 500 other columns you don't need.

So: you might often pull all the values *other* than the attributes (creation date, creator, job information, whatever) together. This argues that those values should be in one table, and the attributes in another.

Will you usually be using *all* of the attributes for a particular transaction together in the same operation? It might make sense to store them in, e.g., an array in that case. But this significantly reduces the speed of accessing particular attributes separately.

It is likely that you will want to slice things by particular named attribute across many transactions. This argues for the more normalised form, as does the general principle of doing things in the way that is most general and flexible.

When considering how the data will be used, please consider not only the developers of your current application(s), but also, e.g., data analysts, managers, future applications, etc. The less specific you can be about how you want to use the data, the more you should lean toward fully normalising.

Note also that you can store your data in a normalised and maximally flexible form, but also use triggers or batch jobs to gather various permutations of the data for specific purposes. If you really do, say, have some major part of your project that uses all the attributes on a given transaction together, but you also have other uses, you may want to store both the normalised/attribute table and the "all the values together" version.

Even if you want to store "all the values together", it may well be better to use an array, JSON or hstore rather than having a super-wide table. JSON would, e.g., let you enumerate all the column names and employ Postgres's really nice JSON query features.

> On Jul 4, 2024, at 12:37, Lok P wrote:
>
> Hello,
> In one of our applications we receive transactions in message/event format
> and also in files; they are then parsed and stored in the relational
> database. The number of attributes/columns each transaction has is ~900+.
> Logically they are part of one single transaction and should be stored in
> one table as one single row. There will be ~500 million such transactions
> coming into the system each day, with approximately 10K peak write TPS and
> 5K read TPS in the target state. This system has a Postgres database as the
> "source of truth" or OLTP store, and the data then moves to Snowflake for
> the OLAP store.
>
> Initially, when the system was designed, the number of attributes per
> transaction was <100, but slowly the business wants to keep/persist other
> attributes too in the current system, and the number of columns keeps
> growing.
>
> However, having worked with some other database systems, we have gotten a
> few suggestions from DBAs not to have many columns in a single table. For
> example, in Oracle they say not to go beyond ~255 columns, as row chaining
> and row migration type of things are going to haunt us. We are also afraid
> that concurrent DML on the table may make this a contention point. So I
> wanted to understand: in such a situation, what would be the best design
> approach to use, irrespective of database? Or say, what is the maximum
> number of columns per table we should restrict ourselves to? Should we break
> the single transaction into multiple tables, like one main table and other
> addenda tables with the same primary key, to join and fetch the results
> wherever necessary?
>
> Regards
> Lok
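For reference, a minimal sketch of the two shapes discussed in the reply above: a normalised one-row-per-attribute table versus a single JSONB document per transaction. All table and column names here are invented for the example:

    -- normalised: one row per (transaction, attribute)
    create table transaction_attribute (
        transaction_id bigint not null,
        attr_name      text   not null,
        attr_value     text,
        primary key (transaction_id, attr_name)
    );

    -- "all the values together": one JSONB document per transaction
    create table transaction_doc (
        transaction_id bigint primary key,
        attrs          jsonb  not null
    );

    -- e.g. enumerate the attribute names present for one transaction
    select jsonb_object_keys(attrs)
    from transaction_doc
    where transaction_id = 42;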
psql help
Hello:

The following works:

SELECT pid, pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid IN (select unnest(pg_blocking_pids(pid))
              from pg_stat_activity
              where cardinality(pg_blocking_pids(pid)) > 0)
  and usename = 'DBUSER_10'
  and now() - state_change >= interval $a'${TIMEOUT_MINS}'$a
order by now() - state_change >= interval $a'${TIMEOUT_MINS}'$a desc
limit 1;

How can I rewrite the above in psql and pg_terminate_backend all pids that meet the above criteria (essentially remove limit 1)?

Thanks!

Note: I run this in Linux. TIMEOUT_MINS is an environment variable.
Re: JSONPath operator and escaping values in query
On Thu, Jul 4, 2024 at 1:57 PM Vasu Nagendra wrote:
> SELECT '{"n": {"a-b": 1, "@ab": 2, "ab": 3}}'::jsonb @? '$ ? (@.n.a\-b >= 3)';

Which is better written as:

select '{"n": {"a-b": 1, "@ab": 2, "ab": 3}}'::jsonb @? '$ ? (@.n."a-b" >= 3)';

Using the same double-quotes you defined the key with originally.

The relevant documentation for jsonpath syntax is here:
https://www.postgresql.org/docs/current/datatype-json.html#DATATYPE-JSONPATH

David J.
Re: psql help
On Thu, Jul 4, 2024 at 4:56 PM Murthy Nunna wrote:
> How can I rewrite the above in psql

The only real trick is using a psql variable instead of the shell injection of the environment variable. Use the --set CLI argument to assign the environment variable to a psql variable, then refer to it in the query using :'timeout_mins'.

Removing the limit 1 should be as simple as not typing limit 1 when you bring the query into the psql script.

David J.
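For reference, a minimal sketch of what that could look like; the database name, file name and variable name are invented for the example, and TIMEOUT_MINS is assumed to already hold an interval-like value such as '10 minutes' (a bare number of minutes would need the cast adjusted):

    -- invoked as:  psql -d mydb --set=timeout_mins="$TIMEOUT_MINS" -f terminate_blockers.sql
    -- terminate_blockers.sql
    SELECT pid, pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE pid IN (SELECT unnest(pg_blocking_pids(pid))
                  FROM pg_stat_activity
                  WHERE cardinality(pg_blocking_pids(pid)) > 0)
      AND usename = 'DBUSER_10'
      AND now() - state_change >= (:'timeout_mins')::interval;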
RE: psql help
Sorry, there is no problem with the following statement and the environment variable. It works fine. But it terminates only one PID due to LIMIT 1. I want to terminate all pids that meet this criteria. If I remove LIMIT 1, pg_terminate_backend(pid) will not work as it expects only one pid at a time.

So, the question is how to rewrite this psql so it loops through all pids, one pid at a time? Thanks in advance for your help.

SELECT pid, pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid IN (select unnest(pg_blocking_pids(pid))
              from pg_stat_activity
              where cardinality(pg_blocking_pids(pid)) > 0)
  and usename = 'DBUSER_10'
  and now() - state_change >= interval $a'${TIMEOUT_MINS}'$a
order by now() - state_change >= interval $a'${TIMEOUT_MINS}'$a desc
limit 1;
Re: psql help
On 7/4/24 20:54, Murthy Nunna wrote:
> Sorry, there is no problem with the following statement and the environment
> variable. It works fine. But it terminates only one PID due to LIMIT 1. I
> want to terminate all pids that meet this criteria. If I remove LIMIT 1,
> pg_terminate_backend(pid) will not work as it expects only one pid at a
> time.
>
> So, the question is how to rewrite this psql so it loops through all pids,
> one pid at a time? Thanks in advance for your help.

From here:

https://www.postgresql.org/docs/current/app-psql.html

See:

\gexec

--
Adrian Klaver
adrian.kla...@aklaver.com
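For reference, a minimal sketch of the \gexec approach: the query produces one "SELECT pg_terminate_backend(...)" statement per matching pid, and \gexec then executes each generated statement. The filter is carried over from the query earlier in the thread, and the timeout handling (psql variable timeout_mins) is illustrative only:

    SELECT format('SELECT pg_terminate_backend(%s)', pid)
    FROM pg_stat_activity
    WHERE pid IN (SELECT unnest(pg_blocking_pids(pid))
                  FROM pg_stat_activity
                  WHERE cardinality(pg_blocking_pids(pid)) > 0)
      AND usename = 'DBUSER_10'
      AND now() - state_change >= (:'timeout_mins')::interval
    \gexec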
Re: Design strategy for table with many attributes
On Fri, Jul 5, 2024 at 1:26 AM David G. Johnston wrote:
> On Thu, Jul 4, 2024 at 12:38 PM Lok P wrote:
>> Should we break the single transaction into multiple tables like one main
>> table and other addenda tables with the same primary key to join and fetch
>> the results wherever necessary?
>
> I would say yes. Find a way to logically group sets of columns together and
> place those groups into separate tables. I'd also be looking for cases where
> multiple columns really should be multiple rows. This is not uncommon.
>
> David J.

Thank you David.

As you said, to logically break this into multiple tables: I believe that means it should be such that there is no need to query multiple tables and join them most of the time to fetch the results. Any point-in-time query should fetch the results from just one table.

But do you also suggest keeping those table pieces related to each other through the same primary key? Won't there be a problem when we load the data? For example, in the normal scenario the data load goes to one table, but when we break it into multiple tables the load happens to all the individual pieces; won't that cause an additional burden on the data load?

Also, I understand the technical limitation that the maximum number of columns per table is ~1600. But would you advise restricting ourselves to some lower number, well before reaching that limit, so that we will not face any anomalies when we grow in the future? And should we maintain any specific order of the columns, from the start column to the end column, in the table?
Re: psql help
The convention here is to in-line replies, or bottom-post. Top-posting makes the archives more difficult to read.

On Thursday, July 4, 2024, Murthy Nunna wrote:
> pg_terminate_backend(pid) will not work as it expects only one pid at a
> time.

Interesting…I wouldn't expect the function calls to interact that way…maybe try putting the select query into a plpgsql DO command loop and then call pg_terminate_backend once per loop iteration.

> So, the question is how to rewrite this psql so it loops through all pids
> one pid at a time?

I can't tell if you really mean the psql program or are misusing the term to mean something different…the code you wrote doesn't seem like it would execute in psql.

David J.
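For reference, a minimal sketch of the DO-loop variant suggested above, reusing the filter from the original query. The interval is hard-coded as a placeholder because psql variables are not expanded inside dollar-quoted strings:

    DO $$
    DECLARE
        r record;
    BEGIN
        FOR r IN
            SELECT pid
            FROM pg_stat_activity
            WHERE pid IN (SELECT unnest(pg_blocking_pids(pid))
                          FROM pg_stat_activity
                          WHERE cardinality(pg_blocking_pids(pid)) > 0)
              AND usename = 'DBUSER_10'
              AND now() - state_change >= interval '10 minutes'  -- placeholder timeout
        LOOP
            PERFORM pg_terminate_backend(r.pid);
        END LOOP;
    END;
    $$;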
Re: Design strategy for table with many attributes
On Fri, 5 Jul 2024 at 17:07, Lok P wrote:
> Also, I understand the technical limitation that the maximum number of
> columns per table is ~1600. But would you advise restricting ourselves to
> some lower number, well before reaching that limit, so that we will not face
> any anomalies when we grow in the future? And should we maintain any
> specific order of the columns, from the start column to the end column, in
> the table?

Something else you may wish to consider, depending on the column types of your 900+ columns, is the possibility that some INSERTs may fail due to row length while others with shorter variable-length values may be ok.

Here's a quick example with psql:

select 'create table a (' || string_agg('a'||x||' text not null default ''''', ',') || ')'
from generate_series(1,1000) x;
\gexec

insert into a default values;
INSERT 0 1

Again, but with a larger DEFAULT to make the tuple larger:

select 'create table b (' || string_agg('a'||x||' text not null default $$hello world$$', ',') || ')'
from generate_series(1,1000) x;
\gexec

insert into b default values;
ERROR:  row is too big: size 12024, maximum size 8160

There is a paragraph at the bottom of [1] with some warnings about things relating to this.

The tuple length would be fixed for fixed-length types defined as NOT NULL. So, if you have that, there should be no such surprises.

David

[1] https://www.postgresql.org/docs/current/limits.html
Re: psql help
"David G. Johnston" writes: > On Thursday, July 4, 2024, Murthy Nunna wrote: >> pg_terminate_backend(pid) will not work as it expects only one pid at a >> time. > Interesting…I wouldn’t expect the function calls to interact that > way TBH, my reaction to that was that the OP doesn't understand SQL semantics. As you previously said, simply removing the LIMIT clause should work fine. (The ORDER BY looks kinda pointless, too, unless there are operational constraints we weren't told about.) There is a question of exactly what "$a'${TIMEOUT_MINS}'$a" is supposed to mean, but that's independent of the LIMIT issue. regards, tom lane
Re: Design strategy for table with many attributes
On Thursday, July 4, 2024, Lok P wrote:
> But do you also suggest keeping those table pieces related to each other
> through the same primary key?

Yes, every row produced from the input data "row" should get the same ID associated with it, either as an entire PK or as a component of a multi-column PK/unique index.

> Won't there be a problem when we load the data? For example, in the normal
> scenario the data load goes to one table, but when we break it into multiple
> tables the load happens to all the individual pieces; won't that cause an
> additional burden on the data load?

Yes, doing this requires additional CPU time to perform the work. I'd say IO is hopefully a wash.

> Also, I understand the technical limitation that the maximum number of
> columns per table is ~1600. But would you advise restricting ourselves to
> some lower number, well before reaching that limit, so that we will not face
> any anomalies when we grow in the future?

In a row-oriented system, wider is worse. I start evaluating a table design with that in mind at the fourth column (including the surrogate key that is usually present, and the natural key; ignoring auditing columns).

> And should we maintain any specific order of the columns, from the start
> column to the end column, in the table?

There is material out there on micro-optimizing column ordering to match alignment boundaries. I'd say the benefit is meaningful, but there is a non-trivial cost to actually set up the testing to verify that what you've figured out is working. I've never actually done it myself. Though it actually seems like something someone could/may have written an algorithm for (though I do not recall ever seeing mention of one).

David J.
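For reference, a minimal sketch of how one might inspect the length/alignment characteristics behind that column-ordering discussion for an existing table; 'mytable' is a placeholder, and sorting fixed-width columns by descending typlen (with variable-length columns, typlen = -1, placed last) is only a common rule of thumb, not a guaranteed saving:

    SELECT a.attname, t.typname, t.typlen, t.typalign
    FROM pg_attribute a
    JOIN pg_type t ON t.oid = a.atttypid
    WHERE a.attrelid = 'mytable'::regclass
      AND a.attnum > 0
      AND NOT a.attisdropped
    ORDER BY t.typlen DESC;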
Re: Design strategy for table with many attributes
On Fri, Jul 5, 2024 at 10:45 AM Guyren Howe wrote:
> On Jul 4, 2024, at 22:07, Lok P wrote:
>
> If you stick to the principle of grouping columns in a table when you use
> those columns together, you should be good.
>
> Note that you might want to split up the "parent" table if that naturally
> groups its columns together for certain uses. In that case, you could have
> the same pk on all the 1:1 tables you then have. In that case, the pk for
> each of those tables is also the fk.

Thank you.

When you said,

"Note that you might want to split up the "parent" table if that naturally groups its columns together for certain uses. In that case, you could have the same pk on all the 1:1 tables you then have. In that case, the pk for each of those tables is also the fk."

do you mean having a real FK created through DDL and maintained, or just assuming it, with no need to define it for all the pieces/tables? That is, only keeping the same PK across all the pieces, given that we know they relate to the same transaction and are logically related?
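For illustration only, a minimal sketch of the "same PK on all the 1:1 tables, PK doubling as FK" layout being discussed; every name here is invented, and whether to declare the FK in DDL at all is exactly the open question above:

    create table txn_main (
        txn_id     bigint primary key,
        created_at timestamptz not null
        -- ... the frequently used columns ...
    );

    create table txn_addendum_1 (
        -- same PK as txn_main; the REFERENCES clause is the "real FK in DDL" option
        txn_id bigint primary key references txn_main (txn_id),
        attr_a text,
        attr_b numeric
        -- ... one logical group of the remaining attributes ...
    );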