Hi Peter,

Thanks! I have implemented this and indeed the maximum delays are lower, but a delete batch now takes roughly 3-4 times longer to complete (for batchSize=500 and about 10,000 deleted items in total). The problem is likely that I have VARCHAR for the ID column.
If I increase the frequency of issuing the original DELETE statement:

DELETE FROM mytable WHERE created_at < ?

won't it have a similar effect due to the smaller batches?

Regards
Peter

On 07.10.19 16:31, Peter Ondruška wrote:
> In my case I have two separate steps. First I SELECT the primary keys of
> the records to be deleted (in your case: SELECT id FROM mytable WHERE
> created_at < some_fixed_millis). Then I issue DELETEs for those primary
> keys in batches of N statements (N being a configuration parameter).
> You could create a stored procedure for this with two parameters
> (some_fixed_millis, batch_size).
> Your idea of DELETE WHERE SELECT with a row limit needs the SELECT to
> run for every DELETE step, causing unnecessary read I/O.
>
>
> On Mon, 7 Oct 2019 at 14:10, Peter <[email protected]
> <mailto:[email protected]>> wrote:
>
> Hi Peter,
>
> Thanks a lot for the suggestion. This would be nice if it performs
> better.
>
> Is the idea to split one request into smaller parts, or will
> "SELECT + DELETE by IDs" just perform better?
>
> And regarding the latter option: is this possible in one SQL request?
> Something like
>
> DELETE FROM mytable WHERE id IN
> ( SELECT id FROM mytable WHERE created_at < some_fixed_millis OFFSET 0
> ROWS FETCH NEXT 1000 ROWS ONLY )
>
> And then loop through the results by changing OFFSET and ROWS?
> (Btw: the column created_at is indexed.)
>
> Or would you recommend doing this as 2 separate statements in
> Java/JDBC? Or maybe even just issuing the original DELETE request
> more frequently?
>
> Regards
> Peter
>
> On 06.10.19 03:50, Peter Ondruška wrote:
>> Peter, try this if it makes a difference:
>>
>> 1. Select the entries to be deleted, note their primary keys.
>> 2. Issue the delete using the keys from (1.) in short transaction
>> batches.
>>
>> On Sun, 6 Oct 2019, 01:33 Peter, <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Hi,
>>
>> I have a table "mytable" with columns "id", "created_at" and "json"
>> (VARCHAR, BIGINT, LONG VARCHAR), where data is coming in at about
>> 200k new entries every hour, and I would like to keep only the
>> entries of the last 1 or 2 hours. It is expected behaviour for the
>> user that too-old entries get lost, as it is some kind of an LRU
>> cache.
>>
>> The current solution is to delete entries older than 4 hours every
>> 30 minutes:
>>
>> DELETE FROM mytable WHERE created_at < ?
>>
>> I'm using this in a prepared statement where ? is "4 hours ago" in
>> milliseconds (new DateTime().getMillis()).
>>
>> This works, but some (not all) INSERT statements get a delay of the
>> same order (2-5 seconds) as the DELETE takes, which is ugly. These
>> INSERT statements are executed independently of the DELETE (using
>> different threads).
>>
>> Is there a better way? Can I somehow avoid locking the unrelated
>> INSERT operations?
>>
>> What helps a bit: when I make those deletes more frequent, the
>> delays get smaller, but the number of delayed requests increases.
>>
>> What also helps a bit (so far without any visible negative impact)
>> is increasing the page size for the Derby Network Server:
>> -Dderby.storage.pageSize=32768
>>
>> Regards
>> Peter
>
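[Editor's note: for readers of the archive, the two-step approach described in the thread (SELECT the primary keys, then DELETE them in short transaction batches of N) might look roughly like the sketch below in JDBC. Table and column names come from the thread; the `partition` helper, connection handling, and method names are illustrative assumptions, not code from the thread.]

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class BatchedDelete {

    // Split the collected keys into chunks of at most batchSize.
    // (Helper added for illustration; not from the thread.)
    static List<List<String>> partition(List<String> ids, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            batches.add(ids.subList(i, Math.min(i + batchSize, ids.size())));
        }
        return batches;
    }

    // Step 1: SELECT the primary keys of rows older than the cutoff.
    // Step 2: DELETE them by key, committing after every batch so each
    // transaction stays short and holds locks only briefly.
    static void deleteOld(Connection conn, long someFixedMillis, int batchSize)
            throws SQLException {
        List<String> ids = new ArrayList<>();
        try (PreparedStatement sel = conn.prepareStatement(
                "SELECT id FROM mytable WHERE created_at < ?")) {
            sel.setLong(1, someFixedMillis);
            try (ResultSet rs = sel.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getString(1));
                }
            }
        }
        conn.setAutoCommit(false);
        try (PreparedStatement del = conn.prepareStatement(
                "DELETE FROM mytable WHERE id = ?")) {
            for (List<String> batch : partition(ids, batchSize)) {
                for (String id : batch) {
                    del.setString(1, id);
                    del.addBatch();
                }
                del.executeBatch();
                conn.commit(); // short transactions keep lock times low
            }
        }
    }
}
```

This could also be wrapped in a Derby stored procedure taking (some_fixed_millis, batch_size), as suggested in the thread.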

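[Editor's note: on the single-statement variant asked about in the thread: since each pass deletes the rows its subquery selected, the OFFSET can stay at 0 (or be dropped) and the statement simply repeated until no rows are affected. A hedged sketch follows, assuming your Derby version accepts FETCH FIRST inside the subquery; the `cutoffMillis` helper mirrors the "4 hours ago in milliseconds" computation described in the thread, and the method names are invented for illustration. Note the earlier reply's caveat still applies: each pass re-runs the subquery, costing extra read I/O compared to the two-step approach.]

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class LoopedDelete {

    // "N hours ago" in milliseconds, as described in the thread.
    static long cutoffMillis(long nowMillis, long hoursToKeep) {
        return nowMillis - hoursToKeep * 3_600_000L;
    }

    // Repeat a row-limited DELETE until nothing old remains. No OFFSET is
    // needed: each pass removes the rows its subquery found, so the next
    // pass automatically sees the next chunk. (Assumption: Derby accepts
    // FETCH FIRST in this subquery; verify against your Derby version.)
    static void deleteOldInChunks(Connection conn, long cutoff, int chunkSize)
            throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement del = conn.prepareStatement(
                "DELETE FROM mytable WHERE id IN "
              + "(SELECT id FROM mytable WHERE created_at < ? "
              + "FETCH FIRST " + chunkSize + " ROWS ONLY)")) {
            del.setLong(1, cutoff);
            int affected;
            do {
                affected = del.executeUpdate();
                conn.commit(); // keep each transaction short
            } while (affected > 0);
        }
    }
}
```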