Hi Peter,

Thanks! I have implemented this, and indeed the maximum delays are
lower, but a delete batch now takes longer to complete (roughly 3-4
times; for batchSize=500, around ~10000 deleted items in total). The
problem is likely that I use VARCHAR for the ID column.
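For context, the batched delete I implemented follows roughly this
shape (a simplified sketch, not the exact production code; `conn` is
assumed to be an open java.sql.Connection, and the IDs come from a
prior SELECT id FROM mytable WHERE created_at < ?):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class BatchDelete {

    // Split the previously selected primary keys into batches of batchSize.
    static List<List<String>> partition(List<String> ids, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            batches.add(ids.subList(i, Math.min(i + batchSize, ids.size())));
        }
        return batches;
    }

    // Build "DELETE FROM <table> WHERE <idColumn> IN (?,?,...)" with n placeholders.
    static String buildDeleteSql(String table, String idColumn, int n) {
        StringBuilder sql = new StringBuilder(
                "DELETE FROM " + table + " WHERE " + idColumn + " IN (");
        for (int i = 0; i < n; i++) {
            sql.append(i == 0 ? "?" : ",?");
        }
        return sql.append(')').toString();
    }

    // One short transaction per batch, so concurrent INSERTs that hit a
    // lock are blocked only briefly instead of for the whole delete.
    static void deleteInBatches(Connection conn, List<String> ids, int batchSize)
            throws SQLException {
        conn.setAutoCommit(false);
        for (List<String> batch : partition(ids, batchSize)) {
            try (PreparedStatement ps =
                    conn.prepareStatement(buildDeleteSql("mytable", "id", batch.size()))) {
                for (int i = 0; i < batch.size(); i++) {
                    ps.setString(i + 1, batch.get(i));
                }
                ps.executeUpdate();
            }
            conn.commit(); // short transaction per batch
        }
    }
}
```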

If I increase the frequency of issuing the original DELETE statement:

DELETE FROM mytable WHERE created_at < ?

Won't that have a similar effect, since each run would delete fewer rows?
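To spell out my reasoning with the numbers from the original mail
(~200k new entries per hour), the amount of work per DELETE run
shrinks proportionally with the interval:

```java
public class DeleteRate {
    public static void main(String[] args) {
        int rowsPerHour = 200_000; // approximate insert rate from the original mail
        for (int intervalMinutes : new int[] {30, 10, 5}) {
            // rows accumulated (and thus deleted) per run at this interval
            int rowsPerRun = rowsPerHour * intervalMinutes / 60;
            System.out.println("every " + intervalMinutes
                    + " min -> ~" + rowsPerRun + " rows per DELETE run");
        }
    }
}
```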

Regards
Peter

On 07.10.19 16:31, Peter Ondruška wrote:
> In my case I have two separate steps. First SELECT primary keys of
> those records to be deleted (in your case SELECT id FROM mytable WHERE
> created_at < some_fixed_millis). And then I issue DELETE for those
> primary keys in batches of N statements (N being configuration
> parameter). You could create a stored procedure for this with two
> parameters (some_fixed_millis, batch_size).
> Your idea of DELETE WHERE SELECT with a row limit would have to run
> the SELECT for every DELETE step, causing unnecessary read I/O.
>
>
> On Mon, 7 Oct 2019 at 14:10, Peter <[email protected]
> <mailto:[email protected]>> wrote:
>
>     Hi Peter,
>
>     Thanks a lot for the suggestion. This would be nice if it performs
>     better.
>
>     Is the idea to split one request into smaller parts or will
>     "Select+Delete IDs" just perform better?
>
>     And regarding the latter option - is this possible in one SQL
>     request? So something like
>
>     DELETE FROM mytable WHERE id IN
>       ( SELECT id FROM mytable WHERE created_at < some_fixed_millis
>         OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY )
>
>
>     And then loop through the results by changing OFFSET?
>     (Btw: the column created_at is indexed)
>
>     Or would you recommend doing this as two separate statements in
>     Java/JDBC? Or maybe even just issuing the original DELETE
>     statement more frequently?
>
>     Regards
>     Peter
>
>     On 06.10.19 03:50, Peter Ondruška wrote:
>>     Peter, try this if it makes a difference:
>>
>>     1. Select entries to be deleted, note their primary keys.
>>     2. Issue delete using keys to be deleted (1.) and use short
>>     transaction batches.
>>
>>     On Sun, 6 Oct 2019, 01:33 Peter, <[email protected]
>>     <mailto:[email protected]>> wrote:
>>
>>         Hi,
>>
>>         I have a table "mytable" with columns "id", "created_at" and
>>         "json"
>>         (VARCHAR, BIGINT, LONG VARCHAR), where data is coming in like
>>         new 200k
>>         entries every hour and I would like to keep only entries of
>>         the last 1
>>         or 2 hours. It is expected behaviour for the user that entries
>>         that are too old get lost, as it is some kind of LRU cache.
>>
>>         The current solution is to delete entries older than 4 hours
>>         every 30
>>         minutes:
>>
>>         DELETE FROM mytable WHERE created_at < ?
>>
>>         I'm using this in a prepared statement where ? is "4 hours
>>         ago" in
>>         milliseconds (new DateTime().getMillis()).
>>
>>         This works, but some (not all) INSERT statements get a delay
>>         of the same order (2-5 seconds) as this DELETE takes, which
>>         is ugly. These INSERT statements are executed independently
>>         of the DELETE (on different threads).
>>
>>         Is there a better way? Can I somehow avoid locking the
>>         unrelated INSERT
>>         operations?
>>
>>         What helps a bit: when I run those deletes more frequently,
>>         the delays get smaller, but the number of delayed requests
>>         increases.
>>
>>         What also helps a bit (I have not seen a negative impact so
>>         far) is
>>         increasing the page size for the Derby Network Server:
>>         -Dderby.storage.pageSize=32768
>>
>>         Regards
>>         Peter
>>
>
