Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.
On Sun, Feb 24, 2019 at 2:04 AM Justin Pryzby wrote:

> Some ideas:
>
> You could ALTER TABLE SET (fillfactor=50) to try to maximize use of HOT
> indices during UPDATEs (check pg_stat_user_indexes).
>
> You could also ALTER TABLE SET autovacuum parameters for more aggressive
> vacuuming.
>
> You could recreate indices using the CONCURRENTLY trick
> (CREATE INDEX CONCURRENTLY new; DROP old; ALTER .. RENAME;)

I have basically the same issue with a table. Each new row enters the table with an active=true kind of flag. The row is updated a lot until a business condition expires, at which point it is updated to active=false, and after that the row is almost never updated again.

We also used a partial index, to good effect, but also had/have an issue where the index bloats and performs worse rather quickly, only to recover a bit after an autovacuum pass completes.

Lowering the fillfactor isn't a good solution because 99%+ of the table is "cold".

One manual VACUUM FREEZE, coupled with lowering the vacuum sensitivity on that one table, helps quite a bit by increasing the frequency and shortening the runtimes of autovacuums, but it's not a total solution.

My next step is to partition the table on the "active" boolean flag, which eliminates the need for the partial indexes and allows a different fillfactor for each partition (50 for true, 100 for false). This should also aid vacuum speed and make re-indexing the hot partition much faster. However, we have to upgrade to v11 first to enable row migration, so I can't yet report on how much of a solution that is.
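For reference, a minimal SQL sketch of the suggestions quoted above. The table name "queue", index name "queue_active_idx", and the autovacuum values are all invented for illustration, not recommendations:

    -- Lower fillfactor so updates are more likely to stay on the same page (HOT).
    -- Affects only newly written pages; existing pages keep their current fill.
    ALTER TABLE queue SET (fillfactor = 50);

    -- More aggressive per-table autovacuum (thresholds are illustrative only).
    ALTER TABLE queue SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_vacuum_cost_delay   = 0
    );

    -- Rebuild a bloated index without blocking writes (pre-v12 workaround).
    CREATE INDEX CONCURRENTLY queue_active_idx_new
        ON queue (id) WHERE active;   -- same column list and predicate as the old index
    DROP INDEX CONCURRENTLY queue_active_idx;
    ALTER INDEX queue_active_idx_new RENAME TO queue_active_idx;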
Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.
> Also, the REINDEX command always fails with a deadlock because there is a
> row lock and a complete table lock involved.
>
> I consider this ultimately a bug, or at the very least there is room for
> improvement. And I am on version 11.1.
>
> regards,
> -Gunther

REINDEX doesn't work concurrently yet (slated for v12).

I think your solution may be something like this (see the sketch after this list):

1. Create a new table, same columns, partitioned on the pending column.
2. Rename your existing queue table to old_queue; it will later be attached to the partitioned table as the default partition.
3. Rename the new table to queue.
4. Add old_queue as the default partition of queue.
5. Add a new partition for pending = true rows; set the fillfactor kind of low, maybe 50, you can always change it. Now your pending = true rows can be in one of two places, but your pending = false rows are all in the default partition.
6. Add all existing old_queue indexes (except those that are partial indexes on pending) to queue. These will be created on the new (empty) partition and just matched to the existing indexes on old_queue.
7. If pending = true records all ultimately become pending = false, wait for normal attrition to reach a state where all rows in the default partition are pending = false. If that won't happen, you may need to manually migrate some with a DELETE-INSERT.
8. At this point, you can transactionally remove old_queue as a partition of queue, and then immediately re-add it as the pending = false partition. There won't need to be a default partition.
9. Drop all remaining partial indexes on pending; they're no longer useful.

That's roughly my plan for my own hotspot table when we can upgrade to 11.
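A rough SQL sketch of steps 1 through 5, assuming pending is a boolean used for list partitioning (the names queue_new and queue_pending are made up):

    -- 1. New partitioned table with the same columns as the existing queue.
    CREATE TABLE queue_new (LIKE queue INCLUDING DEFAULTS)
        PARTITION BY LIST (pending);

    -- 2. and 3. Swap the names.
    ALTER TABLE queue RENAME TO old_queue;
    ALTER TABLE queue_new RENAME TO queue;

    -- 4. The old heap becomes the default partition.
    ALTER TABLE queue ATTACH PARTITION old_queue DEFAULT;

    -- 5. Small, low-fillfactor partition for the hot pending = true rows.
    -- NB: as Justin points out downthread, creating this partition will fail
    -- if the default partition already contains pending = true rows.
    CREATE TABLE queue_pending PARTITION OF queue
        FOR VALUES IN (true)
        WITH (fillfactor = 50);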
Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.
On Sun, Feb 24, 2019 at 5:43 PM Justin Pryzby wrote:

> On Sun, Feb 24, 2019 at 04:34:34PM -0500, Corey Huinker wrote:
> > I think your solution may be something like this:
> > 1. Create a new table, same columns, partitioned on the pending column.
> > 2. Rename your existing queue table to old_queue; it will later be attached to the partitioned table as the default partition.
> > 3. Rename the new table to queue.
> > 4. Add old_queue as the default partition of queue.
> > 5. Add a new partition for pending = true rows; set the fillfactor kind of low...
>
> FYI, the "default partition" isn't just for various and sundry uncategorized
> tuples (like a relkind='r' inheritance child without any constraint). It's for
> "tuples which are excluded by every other partition". And "row migration"
> doesn't happen during "ALTER..ATTACH", only UPDATE. So you'll be unable to
> attach a partition for pending=true if the default partition includes any such
> rows:
>
> |ERROR: updated partition constraint for default partition "t0" would be
> violated by some row
>
> I think you'll need to schedule a maintenance window, create a new partitioned
> hierarchy, and INSERT INTO queue SELECT * FROM old_queue, or similar.
>
> Justin

Good point, I forgot about that.

I had also considered making a partitioned table, adding a "true" partition to it, and then making the partitioned table an *inheritance* child of the existing table; you would then siphon off rows from the original table until it has no more pending rows, and finally, in one transaction, de-inherit the partitioned table and attach the original table as the false partition.

It's all a lot of acrobatics to try to minimize downtime, and it could be done better by having a longer maintenance window, but I got the impression from the OP that big windows were not to be had.
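A sketch of the maintenance-window approach Justin describes, with made-up table names, assuming the application is paused while it runs and that pending is never NULL:

    BEGIN;

    -- Old heap keeps its data under a new name.
    ALTER TABLE queue RENAME TO old_queue;

    CREATE TABLE queue (LIKE old_queue INCLUDING DEFAULTS)
        PARTITION BY LIST (pending);

    CREATE TABLE queue_pending PARTITION OF queue
        FOR VALUES IN (true)
        WITH (fillfactor = 50);

    CREATE TABLE queue_done PARTITION OF queue
        FOR VALUES IN (false);

    -- Rows are routed to the right partition on the way in.
    INSERT INTO queue SELECT * FROM old_queue;

    COMMIT;

    -- Recreate indexes, constraints, etc. on the new hierarchy afterwards.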
Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.
> Anyway, I think the partitioned table is the right and brilliant solution,
> because an index really isn't required. The actual pending partition will
> always remain quite small, and being a queue, it doesn't even matter how
> big it might grow, as long as new rows are inserted at the end and not in
> the middle of the data file, and there is some way to quickly skip over the
> dead rows at the beginning that have already been processed and moved away.
>
> Good thing is, I don't worry about a maintenance window. I have the leisure
> to simply tear down my design now and make a better design. What's 2
> million transactions if I can re-process them at a rate of 80/s? 7 hours
> max. I am still in development. So, no need to worry about migration /
> transition acrobatics. So if I take Corey's steps and envision the final
> result, not worrying about the transition steps, then I understand this:
>
> 1. Create the Queue table partitioned on the pending column; this creates
> the partition with the pending jobs (on which I set the fillfactor kind of
> low, maybe 50) and the default partition with all the rest. Of course that
> allows people with a constant transaction volume to also partition on jobId
> or completionTime and move chunks out to cold archive storage. But that's
> beside the current point.

I'm guessing there's a fairly insignificant difference in performance between one true partition and one false partition vs one true partition and a default partition, but I don't have insight into which one is better.

> One question I have though: I imagine our pending partition heap file to
> now be essentially sequentially organized as a queue. New jobs are appended
> at the end, old jobs are at the beginning. As pending jobs become completed
> (pending = false) these initial rows will be marked as dead. So, while the
> number of live rows will remain small in that pending partition, sequential
> scans will have to skip over the dead rows at the beginning.

That's basically true, but vacuum erases deleted rows, and that space gets re-used. So the table builds up to a working-set size, and I envision it looking like a clock sweep, where your existing rows are at 11pm to 7pm, your new rows are inserting into space at 8pm that was vacuumed clean a while ago, and 9pm and 10pm have deleted rows that haven't been vacuumed yet. The empty spot just keeps cycling through the table.

> Of course vacuum eliminates dead rows, but unless I do VACUUM FULL, it will
> not re-pack the live rows, and that requires an exclusive table lock. So,
> what is the benefit of vacuuming that pending partition? What I *don't*
> want is insertion of new jobs to go into open slots at the beginning of the
> file. I want them to be appended (in Oracle there is an INSERT /*+APPEND*/
> hint for that). How does that work in PostgreSQL?

See above: the db (tries to) reuse the freed space before new space is allocated. I don't know of an append equivalent for pgsql. If memory serves, the big win of /*+ APPEND */ was that raw data blocks were assembled out-of-band and then just written to disk.

> Ultimately that partition will amass too many dead rows, then what do I do?
> I don't think the OS has a way to truncate files physically from the head,
> does it? I guess it could set the file pointer from the first block to a
> later block, but I don't know of an IOCTL/FCNTL command for that. On some
> OSes there is a way of making blocks sparse again; is that how PostgreSQL
> might do it? Just knock out blocks as sparse from the front of the file?

See the clock sweep analogy above.
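For what it's worth, here is a small sketch of how you could watch whether vacuum is keeping up with dead rows in the hot partition. The partition name is invented; pgstattuple is a contrib extension that must be installed:

    -- Dead vs. live tuple counts and last (auto)vacuum times for the partition.
    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'queue_pending';

    -- Physical free space and dead-tuple percentage inside the heap file.
    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    SELECT * FROM pgstattuple('queue_pending');

If dead_tuple_percent stays low and free_percent stabilizes, the "clock sweep" reuse described above is working and the file should stop growing.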
Planner not choosing GIN index
A client had an issue with a query that had a where clause something like this:

    WHERE 123456 = ANY(integer_array_column)

I was surprised that this didn't use the pre-existing GIN index on integer_array_column, whereas recoding it as

    WHERE ARRAY[123456] <@ integer_array_column

did cause the GIN index to be used. Is this a known/expected behavior? If so, is there any logical reason why we couldn't have the planner pick up on that?
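A minimal, self-contained sketch for comparing the two plans with EXPLAIN (the table, column, and index names are invented for illustration):

    CREATE TABLE docs (id bigint, tags integer[]);
    CREATE INDEX docs_tags_gin ON docs USING gin (tags);

    -- The form that was not using the GIN index:
    EXPLAIN SELECT * FROM docs WHERE 123456 = ANY (tags);

    -- The equivalent rewrite that can use it (<@ / @> are GIN-indexable array operators):
    EXPLAIN SELECT * FROM docs WHERE ARRAY[123456] <@ tags;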
Re: Planner not choosing GIN index
On Wed, Mar 13, 2019 at 5:11 AM Flo Rance wrote:

> It is an expected behavior. You can see the list of array operators with
> which a GIN index can be used in the doc:
>
> https://www.postgresql.org/docs/current/indexes-types.html
>
> And a very good and detailed explanation about the ANY operator here:
>
> https://stackoverflow.com/questions/4058731/can-postgresql-index-array-columns/29245753#29245753
>
> Regards,
> Flo
>
> On Wed, Mar 13, 2019 at 2:44 AM Corey Huinker wrote:
>
>> A client had an issue with a query that had a where clause something like this:
>>
>> WHERE 123456 = ANY(integer_array_column)
>>
>> I was surprised that this didn't use the pre-existing GIN index on
>> integer_array_column, whereas recoding it as
>>
>> WHERE ARRAY[123456] <@ integer_array_column
>>
>> did cause the GIN index to be used. Is this a known/expected behavior? If
>> so, is there any logical reason why we couldn't have the planner pick up
>> on that?

Thanks. I'll bring the question of why-cant-we over to the hackers list.
Re: [EXT] Re: Troubleshooting a long running delete statement
> This won't be immediately useful, but there's been a patch proposed for
> Postgres 15 to allow logging the plan of a running query [1]. Progress
> seems to have stalled a bit, but it seems like there was a fair amount of
> interest, so I wouldn't count it out yet. If you have thoughts on the
> proposed functionality, I'm sure thoughtful feedback would be appreciated.

Didn't something get into v14 about doing ctid range scans, which would allow you to break up a large update/delete into chunks? You wouldn't do a full seq scan, but you also would avoid needing an index as a proxy for batching records.
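A rough sketch of what that batching might look like (table name, column, and block numbers are invented; PostgreSQL 14 added ordering operators on tid and a TID Range Scan node, but verify with EXPLAIN that your delete actually uses it):

    -- Delete matching rows that live in the first 10,000 heap blocks only.
    DELETE FROM big_table
    WHERE ctid >= '(0,0)'::tid
      AND ctid <  '(10000,0)'::tid
      AND created_at < now() - interval '90 days';  -- hypothetical business predicate

    -- Repeat with the next block range: '(10000,0)' to '(20000,0)', and so on.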
