Hi Peter,

Thanks.
This procedure with autocommit disabled is indeed simpler than what I
had before via "DELETE FROM mytable WHERE id IN (...)", but the delete
itself takes longer (5-6 times), and I do not see differences with
different batch sizes. I've also benchmarked this process against
PostgreSQL, and Derby seems to be much slower here. Will investigate
more, as migrating is also a risk. (For the archives, I've put a JDBC
sketch of the batched delete, and of the stored-procedure variant,
below the quoted thread.)

Regards
Peter

On 16.10.19 14:03, Peter Ondruška wrote:
> You would need to test various scenarios. First I would propose a
> larger batch size (N thousands of rows). Are you sure you execute the
> deletes in batches? You should have autocommit off, execute the
> delete statement N times, commit, repeat. Pseudo code (I am on a
> mobile phone):
>
> 1. Acquire a connection
> 2. Set the connection's autocommit to false
> 3. Create a prepared statement with the delete: DELETE FROM <table>
>    WHERE <primary key> = ?
> 4. Create the set of primary keys to be deleted
> 5. Iterate over the set (4.), adding those keys to the delete
>    statement (3.) as a batch
> 6. When you reach the batch size or the end of the key set, execute
>    the batch and commit; continue with (5.)
>
> In my case, with a slow disk, this really performs better and should
> avoid your issue as well.
>
> Peter
>
> On Mon, 7 Oct 2019, 22:11 Peter, <[email protected]> wrote:
>> Hi Peter,
>>
>> Thanks! I have implemented this, and indeed the maximum delays are
>> lower, but a delete batch now takes longer to complete (roughly 3-4
>> times; for batchSize=500, around 10,000 deleted items in total). The
>> problem is likely that I have VARCHAR for the ID column.
>>
>> If I increase the frequency of issuing the original DELETE statement:
>>
>> DELETE FROM mytable WHERE created_at < ?
>>
>> won't it have a similar effect due to the smaller batches?
>>
>> Regards
>> Peter
>>
>> On 07.10.19 16:31, Peter Ondruška wrote:
>>> In my case I have two separate steps. First I SELECT the primary
>>> keys of the records to be deleted (in your case: SELECT id FROM
>>> mytable WHERE created_at < some_fixed_millis). Then I issue DELETEs
>>> for those primary keys in batches of N statements (N being a
>>> configuration parameter). You could create a stored procedure for
>>> this with two parameters (some_fixed_millis, batch_size).
>>> Your DELETE WHERE SELECT idea with row limiting needs to run the
>>> SELECT for every DELETE step, causing unnecessary read I/O.
>>>
>>> On Mon, 7 Oct 2019 at 14:10, Peter <[email protected]> wrote:
>>>> Hi Peter,
>>>>
>>>> Thanks a lot for the suggestion. This would be nice if it performs
>>>> better.
>>>>
>>>> Is the idea to split one request into smaller parts, or will
>>>> "Select + Delete IDs" just perform better?
>>>>
>>>> And regarding the latter option - is this possible in one SQL
>>>> request? So something like
>>>>
>>>> DELETE FROM mytable WHERE id IN
>>>> ( SELECT id FROM mytable WHERE created_at < some_fixed_millis
>>>>   OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY )
>>>>
>>>> and then loop through the results by changing OFFSET and ROWS?
>>>> (Btw: the column created_at is indexed.)
>>>>
>>>> Or would you recommend doing this as two separate statements in
>>>> Java/JDBC? Or maybe even just issuing the original DELETE request
>>>> more frequently?
>>>>
>>>> Regards
>>>> Peter
>>>>
>>>> On 06.10.19 03:50, Peter Ondruška wrote:
>>>>> Peter, try this and see if it makes a difference:
>>>>>
>>>>> 1. Select the entries to be deleted, note their primary keys.
>>>>> 2. Issue deletes using the keys to be deleted (1.), in short
>>>>>    transaction batches.
>>>>>
>>>>> On Sun, 6 Oct 2019, 01:33 Peter, <[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have a table "mytable" with columns "id", "created_at" and
>>>>>> "json" (VARCHAR, BIGINT, LONG VARCHAR), where new data is coming
>>>>>> in at around 200k entries every hour, and I would like to keep
>>>>>> only the entries of the last 1 or 2 hours. It is expected
>>>>>> behaviour for the user if entries that are too old get lost, as
>>>>>> it is some kind of an LRU cache.
>>>>>>
>>>>>> The current solution is to delete entries older than 4 hours
>>>>>> every 30 minutes:
>>>>>>
>>>>>> DELETE FROM mytable WHERE created_at < ?
>>>>>>
>>>>>> I'm using this in a prepared statement, where ? is "4 hours ago"
>>>>>> in milliseconds (new DateTime().getMillis()).
>>>>>>
>>>>>> This works, but some (not all) INSERT statements get a delay of
>>>>>> the same order (2-5 seconds) as this DELETE takes, which is
>>>>>> ugly. These INSERT statements are executed independently of the
>>>>>> DELETE (using different threads).
>>>>>>
>>>>>> Is there a better way? Can I somehow avoid locking the unrelated
>>>>>> INSERT operations?
>>>>>>
>>>>>> What helps a bit: when I make those deletes more frequent, the
>>>>>> delays get smaller, but then the number of delayed requests
>>>>>> increases.
>>>>>>
>>>>>> What also helps a bit (so far without a visible negative impact)
>>>>>> is increasing the page size for the Derby Network Server:
>>>>>> -Dderby.storage.pageSize=32768
>>>>>>
>>>>>> Regards
>>>>>> Peter
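P.S. For the archives, here is a minimal JDBC sketch of the batched
delete described above: select the keys first, then delete them with
autocommit off, committing every batchSize statements. Table and column
names are the ones from this thread; the class name and batch size are
just placeholders to tune:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    public class PurgeOldEntries {

        public static void purge(Connection conn, long maxCreatedAt,
                                 int batchSize) throws SQLException {
            // 1. Collect the primary keys of the rows to be deleted.
            List<String> ids = new ArrayList<>();
            try (PreparedStatement select = conn.prepareStatement(
                    "SELECT id FROM mytable WHERE created_at < ?")) {
                select.setLong(1, maxCreatedAt);
                try (ResultSet rs = select.executeQuery()) {
                    while (rs.next()) {
                        ids.add(rs.getString(1));
                    }
                }
            }

            // 2. Delete by key in short transaction batches.
            boolean oldAutoCommit = conn.getAutoCommit();
            conn.setAutoCommit(false);
            try (PreparedStatement delete = conn.prepareStatement(
                    "DELETE FROM mytable WHERE id = ?")) {
                int inBatch = 0;
                for (String id : ids) {
                    delete.setString(1, id);
                    delete.addBatch();
                    if (++inBatch == batchSize) {
                        delete.executeBatch();
                        conn.commit(); // short transaction, small lock window
                        inBatch = 0;
                    }
                }
                if (inBatch > 0) {     // flush the remainder
                    delete.executeBatch();
                    conn.commit();
                }
            } finally {
                conn.setAutoCommit(oldAutoCommit);
            }
        }
    }

To mirror the "older than 4 hours" cutoff this would be called as, e.g.,
purge(conn, System.currentTimeMillis() - 4L * 3600 * 1000, 1000).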

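P.P.S. And a rough sketch of the stored-procedure variant mentioned
above, with the two parameters (cutoff millis, batch size). A Derby
stored procedure is a public static Java method registered via CREATE
PROCEDURE; the procedure and class names here are invented, and since
Derby restricts transaction control on nested connections, the
intermediate commits would need testing in this setup:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class PurgeProcs {

        // Invoked by Derby; runs on the calling session's connection.
        public static void purgeMytable(long maxCreatedAt, int batchSize)
                throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:default:connection")) {
                PurgeOldEntries.purge(conn, maxCreatedAt, batchSize);
            }
        }

        // One-time registration; the class must be on Derby's classpath.
        public static void register(Connection conn) throws SQLException {
            try (Statement st = conn.createStatement()) {
                st.execute(
                    "CREATE PROCEDURE PURGE_MYTABLE"
                    + " (IN MAX_CREATED_AT BIGINT, IN BATCH_SIZE INTEGER)"
                    + " LANGUAGE JAVA PARAMETER STYLE JAVA"
                    + " MODIFIES SQL DATA"
                    + " EXTERNAL NAME 'PurgeProcs.purgeMytable'");
            }
        }
    }

Once registered it can be run as plain SQL, e.g.
CALL PURGE_MYTABLE(<4 hours ago in millis>, 1000).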