Dynamic creation of list partitions in highly concurrent write environment
Hello everyone. We are evaluating list partitioning as an alternative to the hash partitioning we currently use, to make it easier to drop the tables that distinctly model devices we wish to purge. We don't want to use a DML statement for cleanup, since these tables can otherwise contain billions of rows. Using PG15, we've hit a snag: when we use declarative partitioning syntax in a function run as an AFTER trigger, we hit deadlocks. When we try the older inheritance alternative, we still hit deadlocks. I appreciate this is generally because creating tables isn't a concurrency-safe operation. Here's the general approach we have, with a hypothetical model since I'm writing this hastily:

table devices;              <-- AFTER trigger on here
table routes;               <-- To create physical partitions for this logical table
table route_for_device_N;   <-- The dynamically created partition

We don't know the device names, and thus the partitions, up front, hence the need for dynamic creation. How do you go about doing this, or is it folly!?
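In rough terms, the trigger does something like this (a simplified sketch with hypothetical column names, not our actual code):

-- routes is list-partitioned on the device id (hypothetical columns)
CREATE TABLE routes (
    device_id bigint NOT NULL,
    route     text
) PARTITION BY LIST (device_id);

CREATE OR REPLACE FUNCTION create_device_partition() RETURNS trigger AS $$
BEGIN
    -- this CREATE TABLE takes a strong lock on the parent table routes;
    -- concurrent invocations are where we see the deadlocks
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF routes FOR VALUES IN (%L)',
        'route_for_device_' || NEW.id, NEW.id);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER devices_create_partition
    AFTER INSERT ON devices
    FOR EACH ROW EXECUTE FUNCTION create_device_partition();

Cheers

Jim

--
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London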
Re: Hash Index on Partitioned Table
Hi Peter,

In Postgres 13, CREATE INDEX is parallelized by default, so depending on the parallelism parameters in postgresql.conf, your machine should use more than one core while creating the indexes. You can also raise the maintenance_work_mem parameter to the maximum for such a job, since you have plenty of RAM.

In my own experience indexing big partitioned tables, I wrote a few scripts to create each index separately (there is one index for each partition) and, when those finished, created the 'global' index, which is fast because the work is already done (check the ONLY keyword in the CREATE INDEX documentation). Done this way, it was easy to optimize the process around the number of available cores, the RAM, and the storage.

hf

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
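For example, the pattern looks roughly like this (hypothetical table and index names):

-- 1. Create the parent index without touching the partitions; it starts out invalid.
CREATE INDEX parent_col_idx ON ONLY parent_table (col);

-- 2. Index each partition separately; these can run in parallel sessions.
CREATE INDEX CONCURRENTLY part_1_col_idx ON part_1 (col);

-- 3. Attach each partition index; once all are attached, the parent index becomes valid.
ALTER INDEX parent_col_idx ATTACH PARTITION part_1_col_idx;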
On Wed, May 31, 2023 at 7:53 PM peter.boris...@kartographia.com <peter.boris...@kartographia.com> wrote:

> Hi Tom,
>
> Thanks so much for your quick response. As luck would have it, the index
> FINALLY finished about an hour ago. For a size comparison:
>
> BTree: 6,433 GB
> Hash:  8,647 GB
>
> Although I don't have a proper benchmark to compare performance, I can say
> the hash is working as well as, if not faster than, the BTree for my use
> case (web application).
>
> I guess I was getting a little nervous waiting for the index to complete
> and seeing such a huge difference in file size, but I'm OK now :-)
>
> Thanks again,
> Peter
>
> -----Original Message-----
> From: "Tom Lane"
> Sent: Wednesday, May 31, 2023 10:07am
> To: "peter.boris...@kartographia.com"
> Cc: pgsql-general@lists.postgresql.org
> Subject: Re: Hash Index on Partitioned Table
>
> "peter.boris...@kartographia.com" writes:
> > I have a rather large database with ~250 billion records in a
> > partitioned table. The database has been running and ingesting data
> > continuously for about 3 years.
>
> > I had a "regular" BTree index on one of the fields (a unique bigint
> > column) but it was getting too big for the disk it was on. The index
> > was consuming 6.4 TB of disk space.
>
> That's ... really about par for the course. Each row requires an 8-byte
> index entry, plus 12 bytes of index overhead. If I'm doing the math right,
> then the index is physically about 78% full, which is typical to good for
> a b-tree. Reindexing would remove most of the extra space, but only
> temporarily.
>
> > After doing some research I decided to try to create a hash index
> > instead of a BTree. For my purposes, the index is only used to find
> > specific numbers ("=" and "IN" queries). From what I read, the hash
> > index should run a little faster than btree for my use case and should
> > use less disk space.
>
> I'm skeptical. The thing to bear in mind is that btree is the mainstream
> use-case and has been refined and optimized far more than the hash index
> logic.
>
> > (1) Why is the hash index consuming more disk space than the btree
> > index? Is it because the hash of the bigint values is larger than
> > storing the bigints in the btree?
>
> From memory, the index entries will be the same size in this case,
> but hash might have more wasted space.
>
> > (4) Is there any way to estimate when the index process will complete?
>
> An index on a partitioned table isn't a single object, it's one index per
> partition. So you should be able to look at how many partitions have
> indexes so far. You might have to drill down to the point of counting how
> many files in the database's directory, if the individual indexes aren't
> showing up as committed catalog entries yet.
>
> regards, tom lane
RE: EXTERNAL: Re: Question - Does PostgreSQL have an Evaluation Assurance Level?
Hi Laurenz,

Thanks for your reply, but we are actually all set with this. We found out that while PostgreSQL itself does not have an EAL, 'Crunchy Data' does have an EAL of 2. Please feel free to close/discontinue this question and discussion if you like.

Thanks,
Nick

-----Original Message-----
From: Laurenz Albe
Sent: Wednesday, May 31, 2023 3:08 PM
To: Mayer, Nicholas J (US) ; pgsql-general@lists.postgresql.org
Subject: EXTERNAL: Re: Question - Does PostgreSQL have an Evaluation Assurance Level?

On Tue, 2023-05-30 at 13:48 +0000, Mayer, Nicholas J wrote:
> My name is Nick Mayer, and I had a question concerning PostgreSQL's
> EAL. Has PostgreSQL been put through any audit/security testing, and
> does it have an EAL? If so, would I be able to get this information?
> I would appreciate any assistance you are able to provide for this.

I have never heard of that, but I'll reply on the -general list, where the
question is more likely to reach the people who know.

Yours,
Laurenz Albe
RE: EXTERNAL: Re: Question - Does PostgreSQL have an Evaluation Assurance Level?
Hi Laurenz,

Thanks for this information. That is correct, we are just ticking off the checkboxes at the moment, but I appreciate your feedback.

Thanks again,
Nick

-----Original Message-----
From: Laurenz Albe
Sent: Wednesday, May 31, 2023 4:31 PM
To: Mayer, Nicholas J (US) ; pgsql-general@lists.postgresql.org
Subject: EXTERNAL: Re: EXTERNAL: Re: Question - Does PostgreSQL have an Evaluation Assurance Level?

On Wed, 2023-05-31 at 19:51 +0000, Mayer, Nicholas J wrote:
> We found out that while PostgreSQL does not have an EAL, 'Crunchy Data'
> does have an EAL of 2.

I see. I guess you are aware that a closed-source fork of PostgreSQL is
probably no more secure than the original. But this is more about ticking
off checkboxes, right?

Yours,
Laurenz Albe
Re: Help needed to understand query planner regression with incremental sort
Thank you for taking the time to look through this. After reading your answer it obviously makes perfect sense; I was just thrown off by the fact that when it started happening, it happened on every execution of the same test query I used, and I failed to think of trying different search keywords to see whether that made any difference in the query plan and estimations. I'll investigate this further, try to clean up the indexes, and make sure the filter has a correct index to use.

And yes, the schema attachment does seem a bit off: I think the DDL copy didn't include the ordering of the fields inside the composite indexes, which is why they appear the same.

Thank you very much for your time.
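Concretely, the first thing I plan to check is something along these lines (simplified, with a hypothetical index name):

-- make sure there is a GIN index the @@ filter can actually use
CREATE INDEX products_desc_tsv_gin ON products USING gin (desc_tsv);

-- then see whether the planner picks it for the problem query
EXPLAIN ANALYZE
SELECT *
FROM products
WHERE desc_tsv @@ to_tsquery('nike:*')
  AND status = 'published'
ORDER BY pinned_at DESC
LIMIT 20;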
With best,
Henrik.

On Wed, May 31, 2023 at 6:31 PM Tom Lane wrote:

> "Henrik Peinar (nodeSWAT.com)" writes:
> > Quick description: After upgrading our Aurora PostgreSQL engine from v11
> > to v15, one of our often-run SQLs started taking tens of seconds instead
> > of running sub 100ms. Explain analyze showed that the query planner had
> > switched to using incremental sort. Running ANALYZE on the table fixed
> > the issue temporarily, but it came back after a few days.
>
> Hmm. I think it's quite accidental that you get one plan over the other,
> because it looks like the key difference is something the planner doesn't
> account for. In the fast case you have
>
>   -> Index Scan using products_pinned_at_listed_at_ix on products
>      (cost=0.43..3929423.12 rows=26523 width=1463)
>      (actual time=2.325..32.872 rows=61 loops=1)
>        Filter: ((desc_tsv @@ '''nike'':*'::tsquery) AND (status = 'published'::prod_status))
>        Rows Removed by Filter: 3376
>
> The index is only being used to produce ordered output here: the filter
> condition isn't related to the index. And what we see is that the
> query is fast because the desired rows are found in the first 3376+61
> rows visited in this direction. Meanwhile in the slow case you have
>
>   -> Index Scan Backward using products_pinned_at_ix on products
>      (cost=0.43..1172249.47 rows=26139 width=1460)
>      (actual time=5.263..5203.180 rows=18411 loops=1)
>        Filter: ((desc_tsv @@ '''nike'':*'::tsquery) AND (status = 'published'::prod_status))
>        Rows Removed by Filter: 5415895
>
> Again, the filter condition isn't exploiting the index; we're just using
> the index to (partially) satisfy the ORDER BY. This one takes a long time
> because it has to trawl through 5415895+61 rows before reaching the LIMIT.
>
> So AFAICS, the runtime differential is mostly/entirely because the rows
> satisfying the filter condition are located near one end of the range of
> pinned_at. That is not a correlation that the planner knows anything
> about, so it's unable to see that these two ways of scanning the table
> will have significantly different costs. Moreover, I think you'd be
> mistaken to draw any great conclusions from this specific example,
> because with some other search term(s) the results might be totally
> different due to the required rows not falling in the same place.
>
> What I'd think about if this type of query is important is to set up
> an index that can actually be used to satisfy the filter condition,
> so that you're not forcing it into "scan the whole table till you
> find the rows you want". It looks like you already have such an
> index, i.e. a GIN index on the desc_tsv column, although I don't trust
> that your schema attachment is actually accurate, because if it is
> then you have a bunch of duplicative indexes. You might try
> dropping the other indexes to see if you can coerce the planner
> into using that one, and then seeing what the cost estimate is.
>
> regards, tom lane
Re: Dynamic creation of list partitions in highly concurrent write environment
Jim Vanns writes:

> Here's the general approach we have, with a hypothetical model, since
> I'm writing this hastily:
>
> table devices;              <-- AFTER trigger on here
> table routes;               <-- To create physical partitions for this logical table
> table route_for_device_N;   <-- The dynamically created partition
>
> We don't know the device names, and thus the partitions, up front, hence
> the need for dynamic creation. How do you go about doing this, or is it
> folly!?

Sounds like folly, TBH. You're going to end up with a ridiculous number
of partitions of the routes table, which will not be good for performance.

regards, tom lane
Re: Dynamic creation of list partitions in highly concurrent write environment
Yeah, that's a known concern - hence the evaluation as part of the list vs. hash comparison ;)

Jim

On Thu, 1 Jun 2023 at 14:32, Tom Lane wrote:

> Jim Vanns writes:
> > Here's the general approach we have, with a hypothetical model, since
> > I'm writing this hastily:
> >
> > table devices;              <-- AFTER trigger on here
> > table routes;               <-- To create physical partitions for this logical table
> > table route_for_device_N;   <-- The dynamically created partition
> >
> > We don't know the device names, and thus the partitions, up front, hence
> > the need for dynamic creation. How do you go about doing this, or is it
> > folly!?
>
> Sounds like folly, TBH. You're going to end up with a ridiculous number
> of partitions of the routes table, which will not be good for performance.
>
> regards, tom lane

--
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London
Re: speed up full table scan using psql
On 5/31/23 22:51, Lian Jiang wrote:
> The whole command is:
>
> psql %(pg_uri)s -c %(sql)s | %(sed)s | %(pv)s | %(split)s) 2>&1 | %(tr)s
>
> where:
> sql is "copy (select row_to_json(x_tmp_uniq) from public.mytable x_tmp_uniq) to stdout"
> sed, pv, split, tr together format and split the stdout into jsonl files.

Well, that is quite the pipeline. At this point I think you need to do some
testing on your end. First create a table that is a subset of the original
data, to make testing a little quicker. Then break the process down into
smaller actions: start with just doing a COPY directly to CSV, and one with
the row_to_json, to see if that makes a difference. Then COPY directly to a
file before applying the above pipeline. There are more ways you can slice
this, depending on what the preceding shows you.
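For instance, something along these lines (table and file names are hypothetical):

-- a subset of the original data, to make testing quicker
CREATE TABLE mytable_test AS SELECT * FROM public.mytable LIMIT 100000;

-- in psql, time a plain CSV COPY against the row_to_json version
\timing on
\copy (SELECT * FROM mytable_test) TO 'test.csv' CSV
\copy (SELECT row_to_json(t) FROM mytable_test t) TO 'test.jsonl'

Hope this helps.

--
Adrian Klaver
adrian.kla...@aklaver.com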
[Beginner Question] A question about yacc & lex
Hi team, I'm now learning yacc & lex to understand the principles of Postgres's parser, and I wrote a test program like this:

/*
    array.l
    Array program
    Wen Yi
*/
%option noyywrap
%option noinput
%{
#include