[ 
https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509171#comment-17509171
 ] 

kkewwei commented on LUCENE-10448:
----------------------------------

[~vigyas] I test again, and want to find any no-pause bytes, which 1.5 times 
bigger than minPauseCheckBytes,  but find nothing. if as you say, we should 
include the pause time, MergeRateLimiter is doing just that. maybe we should 
write it in chunks.

[~jpountz], please help confirm:
RateLimitedIndexOutput.writeBytes():
{code:java}
  @Override
  public void writeBytes(byte[] b, int offset, int length) throws IOException {
    bytesSinceLastPause += length;
    checkRate();
    delegate.writeBytes(b, offset, length);
  }
{code}

We should execute delegate.writeBytes(b, offset, length) first, and execute 
checkRate() later, because the bytes are not written, but are counted into the 
written indicator, here exists a little logical problem. we should change the 
order as follow:
{code:java}
@Override
  public void writeBytes(byte[] b, int offset, int length) throws IOException {
    bytesSinceLastPause += length;
    delegate.writeBytes(b, offset, length);
    checkRate();
  }
{code}




> MergeRateLimiter doesn't always limit instant rate.
> ---------------------------------------------------
>
>                 Key: LUCENE-10448
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10448
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 8.11.1
>            Reporter: kkewwei
>            Priority: Major
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We can see the code in *MergeRateLimiter*:
> {code:java}
> private long maybePause(long bytes, long curNS) throws 
> MergePolicy.MergeAbortedException {
>    
>     double rate = mbPerSec; 
>     double secondsToPause = (bytes / 1024. / 1024.) / rate;
>     long targetNS = lastNS + (long) (1000000000 * secondsToPause);
>     long curPauseNS = targetNS - curNS;
>     // We don't bother with thread pausing if the pause is smaller than 2 
> msec.
>     if (curPauseNS <= MIN_PAUSE_NS) {
>       // Set to curNS, not targetNS, to enforce the instant rate, not
>       // the "averaged over all history" rate:
>       lastNS = curNS;
>       return -1;
>     }
>    ......
>   }
> {code}
> If a Segment is been merged, *maybePause* is called in 7:00, lastNS=7:00, 
> then the *maybePause* is called in 7:05 again,  so the value of 
> *targetNS=lastNS + (long) (1000000000 * secondsToPause)* must be smaller than 
> *curNS*, no matter how big the bytes is, we will return -1 and ignore to 
> pause. 
> I count the total times(callTimes) calling *maybePause* and ignored pause 
> times(ignorePauseTimes) and detail ignored bytes(detailBytes):
> {code:java}
> [2022-03-02T15:16:51,972][DEBUG][o.e.i.e.I.EngineMergeScheduler] [node1] 
> [index1][21] merge segment [_4h] done: took [26.8s], [123.6 MB], [61,219 
> docs], [0s stopped], [24.4s throttled], [242.5 MB written], [11.2 MB/sec 
> throttle], [callTimes=857], [ignorePauseTimes=25],  [detailBytes(mb) = 
> [0.28899956, 0.28140354, 0.28015518, 0.27990818, 0.2801447, 0.27991104, 
> 0.27990723, 0.27990913, 0.2799101, 0.28010082, 0.2799921, 0.2799673, 
> 0.28144264, 0.27991295, 0.27990818, 0.27993107, 0.2799387, 0.27998447, 
> 0.28002167, 0.27992058, 0.27998066, 0.28098202, 0.28125, 0.28125, 0.28125]]
> {code}
> There are 857 times calling *maybePause*, including 25 times which is ignored 
> to pause, we can see that the ignored detail bytes (such as 0.28125mb) are 
> not small.
> As long as the interval between two *maybePause* calls is relatively long, 
> the pause action that should be executed will not be executed.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to