Performance
Hi,

We want to work with PostgreSQL in our new project, and I need your opinion on the best way to structure the database.

Description of our project: it will use a client/server architecture. Windows application users will access the server as clients, and they are all in different locations. It will be a simple ERP system that performs CRUD operations and reports on them. We are considering connecting with Embarcadero FireDAC datasets; we could also connect the clients through a PostgreSQL REST API. The number of clients can be between 5,000 and 20,000. We have a maximum of 200 tables holding invoice, order, customer, bank, and stock information. I can set up a second PostgreSQL instance for reporting if necessary.

Question 1: Should we install PostgreSQL on a Windows Server operating system or on a Linux operating system?
2: Is it correct to add a field named client_id to each table (for example the customer table) and use this field in CRUD operations, so that a single customer table serves all users?
3: Create a separate set of tables for each user? (Result: 5,000 users x 200 tables = 1,000,000 tables.)
4: Create a database per user? (Result: 5,000 databases.)
5: Give each user a separate schema? (Result: 5,000 schemas.)

Can you share your ideas with me?
Thank you.
Re: Performance
Hi Mehmet,

On Wed, 31 Jan 2024 at 13:33, Mehmet COKCEVIK wrote:
> Hi,
> We want to work with PostgreSQL in our new project. I need your opinion on
> the best way to create a database.

First of all, congratulations on your decision to use PostgreSQL for your new project. :)

> Description of our Project:
> It will be in Client/Server Architecture. Windows Application users will
> access the server as clients and they are all in different locations. There
> will be a simple ERP system that will perform CRUD transactions and report
> them.

I hope you are not planning to keep the business logic on the application side and query the database directly from those different locations. If you treat the database as an ordinary local application database and run multiple DML statements per request over the internet, the application's performance will be horrible because of the latency between the application and the database. If you do go with such a model, the best approach is to reduce the number of round trips as much as possible and accomplish multiple operations with a single request, instead of reading from several tables, doing some calculations, writing something back to the database, and so on. I would move as much of the logic as possible into the database and expose it through function/procedure calls, or run an application server near the database and have the clients' applications talk to it. That way the business logic still lives in an application, but it stays close to the database.

> We are considering connecting to the Embarcadero Firedac dataset. We can
> also connect clients with PosgreRestAPI.
> Our number of clients can be between 5k-20K.
> We have a maximum of 200 tables consisting of invoice, order, customer,
> bank and stock information. I can create a second Postgre SQL for reporting
> if necessary.

This is an interesting point, because if you plan to have 20k clients, you should also be planning for high availability, backups, replication, etc. Serving 20k clients with a standalone server is not something I would want to be involved in. :)

> Question 1 :
> Should we install PostgreSQL on Windows server operating system or Linux
> operating system?

In my personal opinion this is not even a question; the answer is, and always will be, Linux for me. :D The real question, however, is what managing a Linux server would cost you. If you are not familiar with Linux, have no experience administering it, and have no budget or partner company with a professional Linux or PostgreSQL administrator to work with you on this, then going with Windows is a much more sensible option for you, even though it is not the best OS, nor the best-performing platform, for PostgreSQL.

> 2:
> Is it correct to open a field named client_id for each table, for example
> the customer table, and use this field in CRUD operations to host the same
> single customer table for all users?

It depends on the data size and on your project's isolation/security requirements. You may also want to consider PostgreSQL's partitioning and row-level security features (a brief sketch of the row-level-security option follows below). There is no single recipe that is good for all multi-tenancy needs. :)

> 3:
> Create a separate table for each User? (result: 5000 users x 200 Tables =
> 1,000,000 tables)
> 4:
> Create a database per user? (result: 5000 databases)
> 5:
> Is each user a separate schema? (result: 5000 schemas)
>
> Can you share your ideas with me?
> Thank you.

Best regards.
Samed YILDIRIM
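To make the client_id / row-level-security option above concrete, here is a minimal sketch. The customer table layout and the app.client_id session setting are illustrative assumptions for the example, not something from the thread:

    CREATE TABLE customer (
        customer_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        client_id   integer NOT NULL,
        name        text    NOT NULL
    );

    -- Enforce per-tenant visibility. Note that RLS is bypassed by superusers
    -- and by the table owner unless FORCE ROW LEVEL SECURITY is used, so the
    -- application should connect as a separate, non-owning role.
    ALTER TABLE customer ENABLE ROW LEVEL SECURITY;

    CREATE POLICY customer_isolation ON customer
        USING (client_id = current_setting('app.client_id')::integer);

    -- Each client session identifies its tenant once; after that, all CRUD
    -- statements are automatically restricted to that tenant's rows:
    SET app.client_id = '42';
    SELECT * FROM customer;   -- only rows with client_id = 42 are visible

If individual tenants grow large, partitioning the shared tables by client_id can be layered on top of this so that partition pruning keeps per-tenant queries small.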
Re: Slow query in table where many rows were deleted. VACUUM FULL fixes it
Hi Pavlos,

This is my understanding of why the query was still slow after the VACUUM ANALYZE. This is possibly what happened:

1. The relation has 5 million expired URLs and 5 thousand non-expired URLs.
2. Assuming the table only has those 5 million and 5 thousand tuples, deleting the expired ones will trigger an autovacuum: "If the number of tuples obsoleted since the last VACUUM exceeds the 'vacuum threshold', the table is vacuumed" - https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM. As the analyze threshold would also have been exceeded, autovacuum would have run an analyze alongside it.
3. The status of this autovacuum (whether it is running or blocked) can be checked in pg_stat_activity.
4. Note that autovacuum does not trigger to clean up dead tuples if it is disabled for the relation (or in postgresql.conf). However, if transaction IDs had reached the autovacuum_freeze_max_age threshold, autovacuum would still have run to freeze transaction IDs even while disabled.
5. As you stated it is a t3.micro instance, which has limited resources, the autovacuum may simply have been running slowly (again, this can be checked in pg_stat_activity).
6. You manually ran a VACUUM ANALYZE and it did not make the query faster, which could be due to internal fragmentation. You are right that VACUUM does not release the space back to the operating system in most cases. This part of the documentation clarifies it: "The standard form of VACUUM removes dead row versions in tables and indexes and marks the space available for future reuse. However, it will not return the space to the operating system, except in the special case where one or more pages at the end of a table become entirely free and an exclusive table lock can be easily obtained. In contrast, VACUUM FULL actively compacts tables by writing a complete new version of the table file with no dead space. This minimizes the size of the table, but can take a long time. It also requires extra disk space for the new copy of the table, until the operation completes." https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-SPACE-RECOVERY
7. This basically means that the VACUUM FULL probably shrank the table quite significantly, which made the query much faster.
8. You could compare the size of the table before and after the VACUUM FULL to confirm this.

A few suggestions for doing bulk removal of data (sketches of the CTAS and partitioning options follow below):

- It is worth looking into pg_repack for such bulk deletes rather than VACUUM FULL, as the former does not hold an exclusive lock for the entire duration of the operation - https://reorg.github.io/pg_repack/. You will, however, still need double the space of the table, as it also recreates the table.
- Another way to bulk-remove data is a CTAS (https://www.postgresql.org/docs/14/sql-createtableas.html) into a new table containing only the live data (in your case the 5 thousand tuples), followed by dropping the old table, which leaves no dead tuples behind. You may need a trigger in between to make sure rows that change during the copy are carried over to the new table.
- You might also want to look into partitioning, so you can simply drop a partition once the URLs in it are no longer needed (for example, URLs older than 6 months).

Kind Regards
Divya Sharma
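For illustration, a minimal sketch of the last two suggestions, assuming a hypothetical urls table with an expires_at column (all object names here are illustrative, not taken from the thread):

    -- CTAS-and-swap: copy only the live rows, then replace the old table.
    -- Taking an exclusive lock for the duration avoids the catch-up trigger
    -- mentioned above, at the cost of blocking writers while the copy runs.
    BEGIN;
    LOCK TABLE urls IN ACCESS EXCLUSIVE MODE;
    CREATE TABLE urls_live AS
        SELECT * FROM urls WHERE expires_at > now();
    DROP TABLE urls;
    ALTER TABLE urls_live RENAME TO urls;
    COMMIT;
    -- Recreate indexes, constraints, and privileges afterwards;
    -- CREATE TABLE AS does not copy them.

    -- Or range-partition by expiry time, so removing old data becomes a
    -- cheap DROP of a whole partition instead of a bulk DELETE:
    CREATE TABLE urls_part (
        url        text        NOT NULL,
        expires_at timestamptz NOT NULL
    ) PARTITION BY RANGE (expires_at);

    CREATE TABLE urls_part_2024_01 PARTITION OF urls_part
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

    -- Later, when that month's URLs are no longer needed:
    DROP TABLE urls_part_2024_01;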
On Tue, Jan 30, 2024 at 8:38 PM David Rowley wrote:
> On Wed, 31 Jan 2024 at 09:09, Philip Semanchuk wrote:
> > So in your case those 5m rows that you deleted were probably still
> > clogging up your table until you ran VACUUM FULL.
>
> It seems more likely to me that the VACUUM removed the rows and just
> left empty pages in the table. Since there's no index on expires_at,
> the only way to answer that query is to Seq Scan and Seq Scan will
> need to process those empty pages. While that processing is very fast
> if the page's item pointers array is empty, it could still be slow if
> the page needs to be read from disk. Laurenz's request for the explain
> (analyze, buffers) output with track_io_timing on will help confirm
> this.
>
> If it is just reading empty pages that's causing this issue then
> adding that missing index would improve the situation after running
> just plain VACUUM each time there's a bulk delete.
>
> David
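A minimal sketch of what David suggests, again against a hypothetical urls table; the expires_at column name comes from the thread, everything else (index name, query shape) is an assumption for the example:

    -- The missing index on expires_at; CONCURRENTLY avoids blocking writes
    -- while it is being built (it cannot run inside a transaction block).
    CREATE INDEX CONCURRENTLY urls_expires_at_idx ON urls (expires_at);

    -- After each bulk delete, a plain VACUUM marks the dead space reusable
    -- without the exclusive lock that VACUUM FULL requires.
    VACUUM (ANALYZE) urls;

    -- To confirm whether reading (mostly empty) pages from disk is the real
    -- cost; track_io_timing needs superuser or SET privilege on the setting.
    SET track_io_timing = on;
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM urls WHERE expires_at >= now();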
