[ https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502076#comment-17502076 ]
kkewwei edited comment on LUCENE-10448 at 3/8/22, 12:43 AM: ------------------------------------------------------------ [~vigyas], [~jpountz] I count the burst write rate of no-pause bytes with high pressure of writing: {code:java} [2022-03-07T08:23:05,864][DEBUG][o.e.i.e.I.EngineMergeScheduler] [node1] [index1][26] merge segment [_43a] done: took [25.2s], [317.8 MB], [176,633 docs], [0s stopped], [19.8s throttled], [625.8 MB written], [29.2 MB/sec throttle], [callTimes=852],[ignorePauseTimes=49], [detailBytes(mb) = [0.7422037, 0.73268795, 0.7350941, 0.730608, 0.7306595, 0.7303219, 0.7305584, 0.73028755, 0.7304802, 0.73048687, 0.7303219, 0.73038864, 0.73030186, 0.7305927, 0.7303219, 0.73028755, 0.73043823, 0.7314129, 0.73110104, 0.7306318, 0.7303457, 0.7315569, 0.731061, 0.73035717, 0.73029804, 0.73031235, 0.7302904, 0.7303295, 0.73033714, 0.7304115, 0.7304363, 0.73035145, 0.7303152, 0.7303295, 0.7309208, 0.73061085, 0.7315531, 0.7372618, 0.734375, 0.734375, 0.734375, 0.73813057, 0.734375, 0.7342024, 0.734375, 0.734375, 0.734375, 0.734375, 0.734375]], [detailRate(mb/s) = [2.75478E-8, 2.3733156, 3.8783703, 2.7117402E-8, 2.7119311E-8, 2.710678E-8, 2.711556E-8, 2.7105507E-8, 4.484641, 3.1927187, 2.4854445, 2.0364974, 1.722687, 1.4875951, 1.2919687, 2.7105507E-8, 0.98034406, 0.9306443, 0.8845948, 0.8425208, 0.80407506, 0.77062935, 0.7450095, 2.7108092E-8, 4.1841693, 1.4971824, 16.385893, 4.7222166, 1.9614059, 7.041826, 8.008545, 2.534279, 17.670755, 6.7497787, 2.7129008E-8, 19.349627, 26.39924, 16.710173, 5.9312387, 12.802376, 10.644308, 4.5160117, 14.152909, 2.8590457, 460.67938, 432.62634, 92.555466, 290.15073, 12.046124]], [biggerThanLimitedRate(mb/s) = [460.67938, 432.62634, 92.555466, 290.15073]] {code} *callTimes=852* means that MergeRateLimiter.pause is called 852 times. *ignorePauseTimes=49* means that there are 49 no-pause times in 852 times. *detailBytes(mb)* means the detail no-pause bytes, total count is 49. *detailRate(mb/s)* means the detail instant rate of the 49 *detailBytes*. *biggerThanLimitedRate(mb/s)*: means the instant rate bigger than limited rate(29.2 MB/sec throttle), we can see that the max instant rate is 460.67938mb/s, which is 10 times the limited rate. The rate of the no-pause frequency is about 0-10%, It depends on the writing pressure, In my test, the write thread is relatively busy. This is how I count the statistics. {code:java} @Override public void writeBytes(byte[] b, int offset, int length) throws IOException { if (bytesSinceLastPause == 0) { // writing time start at writing startTime = System.nanoTime(); } bytesSinceLastPause += length; delegate.writeBytes(b, offset, length); checkRate(); } private void checkRate() throws IOException { if (bytesSinceLastPause > currentMinPauseCheckBytes) { // count the lasted time. rateLimiter.pause(bytesSinceLastPause, (System.nanoTime())- startTime); bytesSinceLastPause = 0; currentMinPauseCheckBytes = rateLimiter.getMinPauseCheckBytes(); } } {code} with the lasted time and writing bytes, It's easy to compute the instant rate. was (Author: kkewwei): [~vigyas], [~jpountz] I count the burst write rate of no-pause bytes with high pressure of writing: {code:java} [2022-03-07T08:23:05,864][DEBUG][o.e.i.e.I.EngineMergeScheduler] [node1] [index1][26] merge segment [_43a] done: took [25.2s], [317.8 MB], [176,633 docs], [0s stopped], [19.8s throttled], [625.8 MB written], [29.2 MB/sec throttle], [callTimes=852],[ignorePauseTimes=49], [detailBytes(mb) = [0.7422037, 0.73268795, 0.7350941, 0.730608, 0.7306595, 0.7303219, 0.7305584, 0.73028755, 0.7304802, 0.73048687, 0.7303219, 0.73038864, 0.73030186, 0.7305927, 0.7303219, 0.73028755, 0.73043823, 0.7314129, 0.73110104, 0.7306318, 0.7303457, 0.7315569, 0.731061, 0.73035717, 0.73029804, 0.73031235, 0.7302904, 0.7303295, 0.73033714, 0.7304115, 0.7304363, 0.73035145, 0.7303152, 0.7303295, 0.7309208, 0.73061085, 0.7315531, 0.7372618, 0.734375, 0.734375, 0.734375, 0.73813057, 0.734375, 0.7342024, 0.734375, 0.734375, 0.734375, 0.734375, 0.734375]], [detailRate(mb/s) = [2.75478E-8, 2.3733156, 3.8783703, 2.7117402E-8, 2.7119311E-8, 2.710678E-8, 2.711556E-8, 2.7105507E-8, 4.484641, 3.1927187, 2.4854445, 2.0364974, 1.722687, 1.4875951, 1.2919687, 2.7105507E-8, 0.98034406, 0.9306443, 0.8845948, 0.8425208, 0.80407506, 0.77062935, 0.7450095, 2.7108092E-8, 4.1841693, 1.4971824, 16.385893, 4.7222166, 1.9614059, 7.041826, 8.008545, 2.534279, 17.670755, 6.7497787, 2.7129008E-8, 19.349627, 26.39924, 16.710173, 5.9312387, 12.802376, 10.644308, 4.5160117, 14.152909, 2.8590457, 460.67938, 432.62634, 92.555466, 290.15073, 12.046124]], [biggerThanLimitedRate(mb/s) = [460.67938, 432.62634, 92.555466, 290.15073]] {code} *callTimes=852* means that MergeRateLimiter.pause is called 852 times. *ignorePauseTimes=49* means that there are 49 no-pause times in 852 times. *detailBytes(mb)* means the detail no-pause bytes, total count is 49. *detailRate(mb/s)* means the detail instant rate of the 49 *detailBytes*. *biggerThanLimitedRate(mb/s)*: means the instant rate bigger than limited rate(29.2 MB/sec throttle), we can see that the max instant rate is 460.67938mb/s, which is 10 times the limited rate. The burst write rate of the no-pause-write frequency is about 0-10%, It depends on the writing pressure, In my test, the write thread is relatively busy. This is how I count the statistics. {code:java} @Override public void writeBytes(byte[] b, int offset, int length) throws IOException { if (bytesSinceLastPause == 0) { // writing time start at writing startTime = System.nanoTime(); } bytesSinceLastPause += length; delegate.writeBytes(b, offset, length); checkRate(); } private void checkRate() throws IOException { if (bytesSinceLastPause > currentMinPauseCheckBytes) { // count the lasted time. rateLimiter.pause(bytesSinceLastPause, (System.nanoTime())- startTime); bytesSinceLastPause = 0; currentMinPauseCheckBytes = rateLimiter.getMinPauseCheckBytes(); } } {code} with the lasted time and writing bytes, It's easy to compute the instant rate. > MergeRateLimiter doesn't always limit instant rate. > --------------------------------------------------- > > Key: LUCENE-10448 > URL: https://issues.apache.org/jira/browse/LUCENE-10448 > Project: Lucene - Core > Issue Type: Bug > Components: core/other > Affects Versions: 8.11.1 > Reporter: kkewwei > Priority: Major > > We can see the code in *MergeRateLimiter*: > {code:java} > private long maybePause(long bytes, long curNS) throws > MergePolicy.MergeAbortedException { > > double rate = mbPerSec; > double secondsToPause = (bytes / 1024. / 1024.) / rate; > long targetNS = lastNS + (long) (1000000000 * secondsToPause); > long curPauseNS = targetNS - curNS; > // We don't bother with thread pausing if the pause is smaller than 2 > msec. > if (curPauseNS <= MIN_PAUSE_NS) { > // Set to curNS, not targetNS, to enforce the instant rate, not > // the "averaged over all history" rate: > lastNS = curNS; > return -1; > } > ...... > } > {code} > If a Segment is been merged, *maybePause* is called in 7:00, lastNS=7:00, > then the *maybePause* is called in 7:05 again, so the value of > *targetNS=lastNS + (long) (1000000000 * secondsToPause)* must be smaller than > *curNS*, no matter how big the bytes is, we will return -1 and ignore to > pause. > I count the total times(callTimes) calling *maybePause* and ignored pause > times(ignorePauseTimes) and detail ignored bytes(detailBytes): > {code:java} > [2022-03-02T15:16:51,972][DEBUG][o.e.i.e.I.EngineMergeScheduler] [node1] > [index1][21] merge segment [_4h] done: took [26.8s], [123.6 MB], [61,219 > docs], [0s stopped], [24.4s throttled], [242.5 MB written], [11.2 MB/sec > throttle], [callTimes=857], [ignorePauseTimes=25], [detailBytes(mb) = > [0.28899956, 0.28140354, 0.28015518, 0.27990818, 0.2801447, 0.27991104, > 0.27990723, 0.27990913, 0.2799101, 0.28010082, 0.2799921, 0.2799673, > 0.28144264, 0.27991295, 0.27990818, 0.27993107, 0.2799387, 0.27998447, > 0.28002167, 0.27992058, 0.27998066, 0.28098202, 0.28125, 0.28125, 0.28125]] > {code} > There are 857 times calling *maybePause*, including 25 times which is ignored > to pause, we can see that the ignored detail bytes (such as 0.28125mb) are > not small. > As long as the interval between two *maybePause* calls is relatively long, > the pause action that should be executed will not be executed. > -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org