Hi Peter,

Thanks.

This procedure with autocommit disabled is indeed simpler than what I
had before via "DELETE FROM mytable WHERE id IN (...)", but the delete
itself takes 5-6 times longer, and I see no difference between
different batch sizes.
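For reference, the batched delete I tested can be sketched in JDBC roughly
like this (a minimal sketch following the pseudocode quoted below; the class
and method names are mine, and connection setup and error handling are
omitted):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class BatchDelete {

    /**
     * Deletes the given ids from mytable in batches of batchSize,
     * committing after each executed batch (autocommit is turned off).
     * Returns the number of rows the driver reported as deleted.
     */
    static long deleteInBatches(Connection con, List<String> ids, int batchSize)
            throws SQLException {
        con.setAutoCommit(false);                           // step 2: autocommit off
        long deleted = 0;
        try (PreparedStatement ps =
                 con.prepareStatement("DELETE FROM mytable WHERE id = ?")) { // step 3
            int inBatch = 0;
            for (String id : ids) {                          // step 5: iterate key set
                ps.setString(1, id);
                ps.addBatch();
                inBatch++;
                if (inBatch == batchSize) {                  // step 6: flush full batch
                    for (int n : ps.executeBatch()) {
                        // SUCCESS_NO_INFO (-2) means "deleted, count unknown"
                        deleted += (n == Statement.SUCCESS_NO_INFO) ? 0 : n;
                    }
                    con.commit();
                    inBatch = 0;
                }
            }
            if (inBatch > 0) {                               // flush the final partial batch
                for (int n : ps.executeBatch()) {
                    deleted += (n == Statement.SUCCESS_NO_INFO) ? 0 : n;
                }
                con.commit();
            }
        }
        return deleted;
    }

    /** Number of batches (and commits) needed for n keys at the given batch size. */
    static int batchCount(int n, int batchSize) {
        return (n + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        // ~10000 keys at batchSize=500, as in this thread, means 20 commits:
        System.out.println(batchCount(10000, 500));
    }
}
```

One commit per batch keeps the individual transactions (and their lock
lifetimes) short, which is the point of the exercise.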

I've also benchmarked this process against PostgreSQL, and Derby seems to
be much slower here. I will investigate further, as migrating is also a risk.

Regards
Peter

On 16.10.19 14:03, Peter Ondruška wrote:
> You would need to test various scenarios. First I would propose a larger
> batch size (N thousand rows). Are you sure you execute the deletes in
> batches? You should have autocommit off, execute the delete statement N
> times, commit, repeat. Pseudocode (I am on a mobile phone):
>
> 1. Acquire connection
> 2. Set connection autocommit to false
> 3. Create a prepared statement with the delete: DELETE FROM mytable
> WHERE id = ?
> 4. Create set of primary keys to be deleted
> 5. Iterate the set (4.), adding those keys to the delete statement (3.)
> as a batch
> 6. When you reach the batch size or the end of the key set, execute the
> batch and commit, then continue with (5.)
>
> In my case, with a slow disk, this really performs better, and it should
> avoid your issue as well.
>
> Peter
>
> On Mon, 7 Oct 2019, 22:11 Peter, <[email protected]
> <mailto:[email protected]>> wrote:
>
>     Hi Peter,
>
>     Thanks! I have implemented this and indeed the maximum delays are
>     lower, but a delete batch now takes roughly 3-4 times longer to
>     complete (for batchSize=500, around ~10000 total deleted items). The
>     problem is likely that the ID column is a VARCHAR.
>
>     If I increase the frequency of issuing the original DELETE statement:
>
>     DELETE FROM mytable WHERE created_at < ?
>
>     Won't it have a similar effect, due to the smaller batches?
>
>     Regards
>     Peter
>
>     On 07.10.19 16:31, Peter Ondruška wrote:
>>     In my case I have two separate steps. First I SELECT the primary
>>     keys of the records to be deleted (in your case: SELECT id FROM
>>     mytable WHERE created_at < some_fixed_millis). Then I issue the
>>     DELETE for those primary keys in batches of N statements (N being a
>>     configuration parameter). You could create a stored procedure for
>>     this with two parameters (some_fixed_millis, batch_size).
>>     Your idea of DELETE WHERE SELECT with a row limit needs to run the
>>     SELECT for every DELETE step, causing unnecessary read I/O.
>>
>>
>>     On Mon, 7 Oct 2019 at 14:10, Peter <[email protected]
>>     <mailto:[email protected]>> wrote:
>>
>>         Hi Peter,
>>
>>         Thanks a lot for the suggestion. This would be nice if it
>>         performs better.
>>
>>         Is the idea to split one request into smaller parts or will
>>         "Select+Delete IDs" just perform better?
>>
>>         And regarding the latter option - is this possible in one SQL
>>         request? So something like
>>
>>         DELETE FROM mytable WHERE id IN
>>         ( SELECT id FROM mytable WHERE created_at < some_fixed_millis
>>           OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY )
>>
>>
>>         And then loop through the results by changing OFFSET and the
>>         row count? (By the way, the column created_at is indexed.)
>>
>>         Or would you recommend doing this as two separate statements in
>>         Java/JDBC? Or maybe even just issuing the original DELETE
>>         request more frequently?
>>
>>         Regards
>>         Peter
>>
>>         On 06.10.19 03:50, Peter Ondruška wrote:
>>>         Peter, try this if it makes a difference:
>>>
>>>         1. Select entries to be deleted, note their primary keys.
>>>         2. Issue delete using keys to be deleted (1.) and use short
>>>         transaction batches.
>>>
>>>         On Sun, 6 Oct 2019, 01:33 Peter, <[email protected]
>>>         <mailto:[email protected]>> wrote:
>>>
>>>             Hi,
>>>
>>>             I have a table "mytable" with columns "id", "created_at"
>>>             and "json" (VARCHAR, BIGINT, LONG VARCHAR), where data
>>>             comes in at roughly 200k new entries every hour, and I
>>>             would like to keep only the entries of the last 1 or 2
>>>             hours. It is expected behaviour for the user that too-old
>>>             entries get lost, as it is some kind of an LRU cache.
>>>
>>>             The current solution is to delete entries older than 4
>>>             hours every 30
>>>             minutes:
>>>
>>>             DELETE FROM mytable WHERE created_at < ?
>>>
>>>             I'm using this in a prepared statement where ? is "4
>>>             hours ago" in
>>>             milliseconds (new DateTime().getMillis()).
>>>
>>>             This works, but some (not all) INSERT statements get a
>>>             delay of the same order (2-5 seconds) as this DELETE
>>>             takes, which is ugly. These INSERT statements are executed
>>>             independently of the DELETE (using different threads).
>>>
>>>             Is there a better way? Can I somehow avoid locking the
>>>             unrelated INSERT
>>>             operations?
>>>
>>>             What helps a bit: when I run those deletes more
>>>             frequently, the delays get smaller, but then the number of
>>>             delayed requests increases.
>>>
>>>             What also helps a bit (I have currently not seen a
>>>             negative impact) is
>>>             increasing the page size for the Derby Network Server:
>>>             -Dderby.storage.pageSize=32768
>>>
>>>             Regards
>>>             Peter
>>>
>>
>
