mikemccand commented on code in PR #14964:
URL: https://github.com/apache/lucene/pull/14964#discussion_r2256819456


##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.

Review Comment:
   Maybe state this more succinctly: `A {@link MergeScheduler} that caps total IO write bandwidth across all running merges to a specified max MB/sec.`?



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {

Review Comment:
   Maybe move this to Lucene's sandbox module?  It's where we land more 
experimental thingies and give them a chance to be used / thrive / iterate and 
eventually graduate to core.



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();
+  }
+
+  /** Set the global bandwidth rate bucket in MB/s (default 1000 MB/s) */

Review Comment:
   Could you add to the javadoc that this setting is "live", meaning any merges already running will be updated to the new cap too?
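
A minimal sketch of what such a "live" javadoc and setter could look like. The names `LiveCapSketch`, `Limiter`, `startMerge` and `setMaxMbPerSec` are hypothetical stand-ins for illustration, not the PR's classes or Lucene APIs, and the even split is only an assumption carried over from the quoted `updateMergeThreads()`:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the scheduler; not the PR's actual class. */
final class LiveCapSketch {

  /** Hypothetical per-merge limiter that only records its current share. */
  static final class Limiter {
    public volatile double mbPerSec = Double.POSITIVE_INFINITY;
  }

  private final List<Limiter> running = new ArrayList<>();
  private double maxMbPerSec = 1000.0;

  /** Registers a new merge and rebalances every running merge. */
  public synchronized Limiter startMerge() {
    Limiter limiter = new Limiter();
    running.add(limiter);
    rebalance();
    return limiter;
  }

  /**
   * Sets the total IO write cap, in MB/sec, shared by all running merges.
   *
   * <p>This setting is live: merges that are already running are re-throttled
   * to their share of the new cap before this method returns.
   */
  public synchronized void setMaxMbPerSec(double mbPerSec) {
    if (!(mbPerSec > 0.0)) {
      throw new IllegalArgumentException("mbPerSec must be > 0 (got " + mbPerSec + ")");
    }
    this.maxMbPerSec = mbPerSec;
    rebalance();
  }

  /** Splits the cap evenly across the merges that are currently registered. */
  private void rebalance() {
    for (Limiter limiter : running) {
      limiter.mbPerSec = maxMbPerSec / running.size();
    }
  }
}
```

The point of the sketch is only that the setter reapplies the cap to merges in flight, which is the behavior the javadoc sentence would promise.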



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();
+  }
+
+  /** Set the global bandwidth rate bucket in MB/s (default 1000 MB/s) */
+  public void setBandwidthRateBucket(double mbPerSec) {
+    if (mbPerSec < MIN_MERGE_MB_PER_SEC || mbPerSec > MAX_MERGE_MB_PER_SEC) {
+      throw new IllegalArgumentException(
+          "Bandwidth rate must be between "
+              + MIN_MERGE_MB_PER_SEC
+              + " and "
+              + MAX_MERGE_MB_PER_SEC
+              + " MB/s");
+    }
+    this.bandwidthRateBucket = mbPerSec;
+    updateMergeThreads();
+  }
+
+  /** Get the global bandwidth rate bucket in MB/s */
+  public double getBandwidthRateBucket() {
+    return bandwidthRateBucket;
+  }
+
+  /** Distributes the global bandwidth rate bucket evenly among all active 
merge threads. */
+  @Override
+  protected synchronized void updateMergeThreads() {
+    super.updateMergeThreads();
+    int activeMerges = 0;
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        activeMerges++;
+      }
+    }
+    double perMergeRate =

Review Comment:
   Could we expand this into a true `if` statement (instead of the ternary operator)?
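
One possible expansion, sketched as a standalone helper so it can be run on its own. `PerMergeRateSketch` and its parameter names are hypothetical; the logic is meant to mirror the quoted `Math.max(MIN, Math.min(MAX, bucket / activeMerges))` expression:

```java
/** Hypothetical helper mirroring the quoted ternary with plain if statements. */
final class PerMergeRateSketch {

  /**
   * Returns the per-merge rate in MB/sec: unlimited when no merge is active,
   * otherwise an even share of the total, clamped to [minMbPerSec, maxMbPerSec].
   */
  public static double perMergeRate(
      double totalMbPerSec, int activeMerges, double minMbPerSec, double maxMbPerSec) {
    if (activeMerges <= 0) {
      // Nothing is running, so there is nothing to throttle.
      return Double.POSITIVE_INFINITY;
    }
    double share = totalMbPerSec / activeMerges;
    if (share < minMbPerSec) {
      return minMbPerSec;
    }
    if (share > maxMbPerSec) {
      return maxMbPerSec;
    }
    return share;
  }
}
```

Written this way, one side effect of the floor becomes easier to see: with the quoted constants, a 10 MB/sec total split across 4 merges still gives each merge 5 MB/sec, so the sum can exceed the configured cap.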



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();
+  }
+
+  /** Set the global bandwidth rate bucket in MB/s (default 1000 MB/s) */
+  public void setBandwidthRateBucket(double mbPerSec) {
+    if (mbPerSec < MIN_MERGE_MB_PER_SEC || mbPerSec > MAX_MERGE_MB_PER_SEC) {
+      throw new IllegalArgumentException(
+          "Bandwidth rate must be between "
+              + MIN_MERGE_MB_PER_SEC
+              + " and "
+              + MAX_MERGE_MB_PER_SEC
+              + " MB/s");
+    }
+    this.bandwidthRateBucket = mbPerSec;
+    updateMergeThreads();
+  }
+
+  /** Get the global bandwidth rate bucket in MB/s */
+  public double getBandwidthRateBucket() {
+    return bandwidthRateBucket;
+  }
+
+  /** Distributes the global bandwidth rate bucket evenly among all active 
merge threads. */
+  @Override
+  protected synchronized void updateMergeThreads() {
+    super.updateMergeThreads();
+    int activeMerges = 0;
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        activeMerges++;
+      }
+    }
+    double perMergeRate =
+        activeMerges > 0
+            ? Math.max(
+                MIN_MERGE_MB_PER_SEC,
+                Math.min(MAX_MERGE_MB_PER_SEC, bandwidthRateBucket / 
activeMerges))
+            : Double.POSITIVE_INFINITY;
+
+    // Apply the calculated rate limit to each active merge thread
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        mergeThread.rateLimiter.setMBPerSec(perMergeRate);
+      }
+    }
+  }
+
+  /** Creates a custom merge thread with bandwidth tracking capabilities. */
+  @Override
+  protected synchronized MergeThread getMergeThread(MergeSource mergeSource, 
OneMerge merge)
+      throws IOException {
+    return new BandwidthTrackingMergeThread(mergeSource, merge);
+  }
+
+  /** Returns a string representation including the current bandwidth rate 
bucket setting. */
+  @Override
+  public String toString() {
+    return getClass().getSimpleName()
+        + ": "
+        + super.toString()
+        + ", bandwidthRateBucket="
+        + bandwidthRateBucket
+        + " MB/s";
+  }
+
+  /** Merge thread that logs the rate limiter value after merge completes. */
+  protected class BandwidthTrackingMergeThread extends MergeThread {
+    private final double mergeBandwidthMB;
+
+    /** Creates a new BandwidthTrackingMergeThread for the given merge. */
+    public BandwidthTrackingMergeThread(MergeSource mergeSource, OneMerge 
merge) {
+      super(mergeSource, merge);
+      this.mergeBandwidthMB = merge.estimatedMergeBytes / (1024.0 * 1024.0);

Review Comment:
   Oh, this is not actually a bandwidth, right?  It's just the estimated final segment size, in MB?
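
To make the distinction concrete, a small sketch that separates the size from the rate. The names `MergeSizeSketch`, `estimatedMergeMB` and `observedMbPerSec` are hypothetical; the one-millisecond floor on elapsed time copies the `Math.max(duration / 1000.0, 0.001)` guard in the quoted `run()` method:

```java
/** Hypothetical helper that separates a merge's size (MB) from its rate (MB/sec). */
final class MergeSizeSketch {

  private static final double BYTES_PER_MB = 1024.0 * 1024.0;

  /** Estimated size of the merged segment in MB. This is a size, not a bandwidth. */
  public static double estimatedMergeMB(long estimatedMergeBytes) {
    return estimatedMergeBytes / BYTES_PER_MB;
  }

  /**
   * Average throughput in MB/sec over the whole merge. Elapsed time is floored
   * at one millisecond to avoid dividing by zero for very short merges.
   */
  public static double observedMbPerSec(double sizeMB, long elapsedMillis) {
    double elapsedSec = Math.max(elapsedMillis / 1000.0, 0.001);
    return sizeMB / elapsedSec;
  }
}
```

Only the second value is a bandwidth, so a field name along the lines of `estimatedMergeMB` would probably describe the first one better.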



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();

Review Comment:
   I think this is implicitly done in Java?  Even if we add the starting `mbPerSec` to the ctor?
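
A small self-contained illustration of that Java rule, using hypothetical `Base` and `Capped` classes rather than the Lucene ones. When a constructor body does not start with an explicit `this(...)` or `super(...)` call, the compiler inserts a call to the superclass no-arg constructor, and this holds for constructors that take arguments too:

```java
/** Hypothetical classes showing the implicit super() call. */
final class ImplicitSuperSketch {

  static class Base {
    public final String tag;

    Base() {
      tag = "base-initialized";
    }
  }

  static class Capped extends Base {
    public final double mbPerSec;

    Capped(double mbPerSec) {
      // No explicit super() here: the compiler inserts the call to Base().
      this.mbPerSec = mbPerSec;
    }
  }
}
```

This only works when the superclass has an accessible no-arg constructor, which the quoted `super();` call already relies on.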



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();
+  }
+
+  /** Set the global bandwidth rate bucket in MB/s (default 1000 MB/s) */
+  public void setBandwidthRateBucket(double mbPerSec) {
+    if (mbPerSec < MIN_MERGE_MB_PER_SEC || mbPerSec > MAX_MERGE_MB_PER_SEC) {
+      throw new IllegalArgumentException(
+          "Bandwidth rate must be between "
+              + MIN_MERGE_MB_PER_SEC
+              + " and "
+              + MAX_MERGE_MB_PER_SEC
+              + " MB/s");
+    }
+    this.bandwidthRateBucket = mbPerSec;
+    updateMergeThreads();
+  }
+
+  /** Get the global bandwidth rate bucket in MB/s */
+  public double getBandwidthRateBucket() {
+    return bandwidthRateBucket;
+  }
+
+  /** Distributes the global bandwidth rate bucket evenly among all active 
merge threads. */
+  @Override
+  protected synchronized void updateMergeThreads() {
+    super.updateMergeThreads();
+    int activeMerges = 0;
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        activeMerges++;
+      }
+    }
+    double perMergeRate =
+        activeMerges > 0
+            ? Math.max(
+                MIN_MERGE_MB_PER_SEC,
+                Math.min(MAX_MERGE_MB_PER_SEC, bandwidthRateBucket / 
activeMerges))
+            : Double.POSITIVE_INFINITY;
+
+    // Apply the calculated rate limit to each active merge thread
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        mergeThread.rateLimiter.setMBPerSec(perMergeRate);
+      }
+    }
+  }
+
+  /** Creates a custom merge thread with bandwidth tracking capabilities. */
+  @Override
+  protected synchronized MergeThread getMergeThread(MergeSource mergeSource, 
OneMerge merge)
+      throws IOException {
+    return new BandwidthTrackingMergeThread(mergeSource, merge);
+  }
+
+  /** Returns a string representation including the current bandwidth rate 
bucket setting. */
+  @Override
+  public String toString() {
+    return getClass().getSimpleName()
+        + ": "
+        + super.toString()
+        + ", bandwidthRateBucket="
+        + bandwidthRateBucket
+        + " MB/s";
+  }
+
+  /** Merge thread that logs the rate limiter value after merge completes. */
+  protected class BandwidthTrackingMergeThread extends MergeThread {
+    private final double mergeBandwidthMB;
+
+    /** Creates a new BandwidthTrackingMergeThread for the given merge. */
+    public BandwidthTrackingMergeThread(MergeSource mergeSource, OneMerge 
merge) {
+      super(mergeSource, merge);
+      this.mergeBandwidthMB = merge.estimatedMergeBytes / (1024.0 * 1024.0);
+    }
+
+    @Override
+    public void run() {
+      long startTime = System.currentTimeMillis();
+      try {
+        if (verbose()) {
+          message(
+              "Starting bandwidth-capped merge: "
+                  + getSegmentName(merge)
+                  + " (estimated="
+                  + mergeBandwidthMB
+                  + " MB)");
+        }
+        super.run(); // IO throttling is handled by the RateLimiter
+      } finally {
+        long duration = System.currentTimeMillis() - startTime;
+        if (verbose()) {
+          double mbPerSec = mergeBandwidthMB / Math.max(duration / 1000.0, 
0.001);
+          message(
+              "Merge completed: "
+                  + getSegmentName(merge)
+                  + " "
+                  + mergeBandwidthMB
+                  + " MB in "
+                  + duration
+                  + "ms ("
+                  + String.format(java.util.Locale.US, "%.2f", mbPerSec)
+                  + " MB/s)");
+        }
+      }
+    }
+  }
+
+  private static String getSegmentName(OneMerge merge) {
+    return merge.info != null ? merge.info.info.name : "_na_";

Review Comment:
   Hmm, is `merge.info` sometimes `null`?  It shouldn't ever be, I think?



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge

Review Comment:
   Maybe don't talk about buckets?  I'm not sure what they are in this context.



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;

Review Comment:
   Do we really need the `min` and `max`?
   
   I.e. the min could be 1 
[ULP](https://en.wikipedia.org/wiki/Unit_in_the_last_place) above 0.0, and the 
max is unbounded (`Double.MAX_VALUE`).  In practice very large values of the 
cap won't matter because merges will become CPU bound at that point and cannot 
use so much bandwidth.
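
A sketch of validation without the fixed floor and ceiling, under the assumption that any strictly positive value should be accepted. `CapValidationSketch` and `checkMaxMbPerSec` are hypothetical names; `Math.ulp(0.0)` is the smallest positive double and equals `Double.MIN_VALUE`:

```java
/** Hypothetical validation that accepts any strictly positive cap. */
final class CapValidationSketch {

  /** Returns mbPerSec unchanged if it is greater than zero, otherwise throws. */
  public static double checkMaxMbPerSec(double mbPerSec) {
    // NaN fails every comparison, so it is rejected explicitly.
    if (Double.isNaN(mbPerSec) || mbPerSec <= 0.0) {
      throw new IllegalArgumentException("maxMbPerSec must be > 0 (got " + mbPerSec + ")");
    }
    return mbPerSec;
  }
}
```

With this check both `Math.ulp(0.0)` and `Double.MAX_VALUE` pass, while `0.0`, negative values and `NaN` are rejected.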



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit

Review Comment:
   Aha, thank you!  This is because the PR is still a draft, right?



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();
+  }
+
+  /** Set the global bandwidth rate bucket in MB/s (default 1000 MB/s) */
+  public void setBandwidthRateBucket(double mbPerSec) {
+    if (mbPerSec < MIN_MERGE_MB_PER_SEC || mbPerSec > MAX_MERGE_MB_PER_SEC) {
+      throw new IllegalArgumentException(
+          "Bandwidth rate must be between "
+              + MIN_MERGE_MB_PER_SEC
+              + " and "
+              + MAX_MERGE_MB_PER_SEC
+              + " MB/s");
+    }
+    this.bandwidthRateBucket = mbPerSec;
+    updateMergeThreads();
+  }
+
+  /** Get the global bandwidth rate bucket in MB/s */
+  public double getBandwidthRateBucket() {

Review Comment:
   `getMaxMbPerSec`?



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();
+  }
+
+  /** Set the global bandwidth rate bucket in MB/s (default 1000 MB/s) */
+  public void setBandwidthRateBucket(double mbPerSec) {
+    if (mbPerSec < MIN_MERGE_MB_PER_SEC || mbPerSec > MAX_MERGE_MB_PER_SEC) {
+      throw new IllegalArgumentException(
+          "Bandwidth rate must be between "
+              + MIN_MERGE_MB_PER_SEC
+              + " and "
+              + MAX_MERGE_MB_PER_SEC
+              + " MB/s");
+    }
+    this.bandwidthRateBucket = mbPerSec;
+    updateMergeThreads();
+  }
+
+  /** Get the global bandwidth rate bucket in MB/s */
+  public double getBandwidthRateBucket() {
+    return bandwidthRateBucket;
+  }
+
+  /** Distributes the global bandwidth rate bucket evenly among all active 
merge threads. */
+  @Override
+  protected synchronized void updateMergeThreads() {
+    super.updateMergeThreads();
+    int activeMerges = 0;
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        activeMerges++;
+      }
+    }
+    double perMergeRate =
+        activeMerges > 0
+            ? Math.max(
+                MIN_MERGE_MB_PER_SEC,
+                Math.min(MAX_MERGE_MB_PER_SEC, bandwidthRateBucket / 
activeMerges))
+            : Double.POSITIVE_INFINITY;
+
+    // Apply the calculated rate limit to each active merge thread
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        mergeThread.rateLimiter.setMBPerSec(perMergeRate);
+      }
+    }
+  }
+
+  /** Creates a custom merge thread with bandwidth tracking capabilities. */
+  @Override
+  protected synchronized MergeThread getMergeThread(MergeSource mergeSource, 
OneMerge merge)
+      throws IOException {
+    return new BandwidthTrackingMergeThread(mergeSource, merge);
+  }
+
+  /** Returns a string representation including the current bandwidth rate 
bucket setting. */
+  @Override
+  public String toString() {
+    return getClass().getSimpleName()
+        + ": "
+        + super.toString()
+        + ", bandwidthRateBucket="
+        + bandwidthRateBucket
+        + " MB/s";
+  }
+
+  /** Merge thread that logs the rate limiter value after merge completes. */
+  protected class BandwidthTrackingMergeThread extends MergeThread {
+    private final double mergeBandwidthMB;
+
+    /** Creates a new BandwidthTrackingMergeThread for the given merge. */
+    public BandwidthTrackingMergeThread(MergeSource mergeSource, OneMerge 
merge) {
+      super(mergeSource, merge);
+      this.mergeBandwidthMB = merge.estimatedMergeBytes / (1024.0 * 1024.0);
+    }
+
+    @Override
+    public void run() {
+      long startTime = System.currentTimeMillis();

Review Comment:
   Let's use `System.nanoTime()` instead?  It is meant for measuring elapsed 
time and is unaffected by wall-clock adjustments, which is all we need since we 
are just computing a duration.
   
   Also let's add the units in the variable name (`startTimeNS`).
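
   A minimal sketch of the pattern, not the PR's code: the class and helper 
names (`ElapsedTimeSketch`, `timeNS`, `nsToSeconds`) are hypothetical and only 
illustrate timing with `System.nanoTime()` and unit-suffixed variable names.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper showing duration measurement with System.nanoTime(). */
public class ElapsedTimeSketch {

  /** Converts a nanosecond duration to fractional seconds. */
  static double nsToSeconds(long durationNS) {
    return durationNS / (double) TimeUnit.SECONDS.toNanos(1);
  }

  /** Runs the task and returns how long it took, in nanoseconds. */
  static long timeNS(Runnable task) {
    long startTimeNS = System.nanoTime();
    task.run();
    // Differences of nanoTime() values are meaningful; absolute values are not.
    return System.nanoTime() - startTimeNS;
  }

  public static void main(String[] args) {
    long durationNS = timeNS(() -> Math.sqrt(42.0));
    System.out.println("took " + nsToSeconds(durationNS) + " sec");
  }
}
```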



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();
+  }
+
+  /** Set the global bandwidth rate bucket in MB/s (default 1000 MB/s) */
+  public void setBandwidthRateBucket(double mbPerSec) {
+    if (mbPerSec < MIN_MERGE_MB_PER_SEC || mbPerSec > MAX_MERGE_MB_PER_SEC) {
+      throw new IllegalArgumentException(
+          "Bandwidth rate must be between "
+              + MIN_MERGE_MB_PER_SEC
+              + " and "
+              + MAX_MERGE_MB_PER_SEC
+              + " MB/s");
+    }
+    this.bandwidthRateBucket = mbPerSec;
+    updateMergeThreads();
+  }
+
+  /** Get the global bandwidth rate bucket in MB/s */
+  public double getBandwidthRateBucket() {
+    return bandwidthRateBucket;
+  }
+
+  /** Distributes the global bandwidth rate bucket evenly among all active 
merge threads. */
+  @Override
+  protected synchronized void updateMergeThreads() {
+    super.updateMergeThreads();

Review Comment:
   Let's detect if `doAutoIOThrottle` is enabled and throw some exception 
somewhere?  That CMS feature is incompatible with this class, I think, so it 
should be clear to the user.  Oh, maybe simply override 
`enableAutoIOThrottle`/`disableAutoIOThrottle`?
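
   One way the override idea could look, as a self-contained sketch.  
`BaseScheduler` here is a hypothetical stand-in that models only the 
auto-throttle switch of `ConcurrentMergeScheduler`; `CappedScheduler` and the 
choice of `UnsupportedOperationException` are assumptions, not what the PR does.

```java
/** Sketch: rejecting auto IO throttling in a bandwidth-capped subclass. */
public class AutoThrottleGuardSketch {

  /** Hypothetical stand-in for the parent scheduler's auto-throttle switch. */
  static class BaseScheduler {
    private boolean doAutoIOThrottle = false;

    public synchronized void enableAutoIOThrottle() {
      doAutoIOThrottle = true;
    }

    public synchronized void disableAutoIOThrottle() {
      doAutoIOThrottle = false;
    }

    public synchronized boolean getAutoIOThrottle() {
      return doAutoIOThrottle;
    }
  }

  /** Hypothetical subclass that owns the rate limit and refuses auto throttling. */
  static class CappedScheduler extends BaseScheduler {
    @Override
    public synchronized void enableAutoIOThrottle() {
      // Fail loudly so the incompatibility is obvious to the caller.
      throw new UnsupportedOperationException(
          "auto IO throttle is not supported; set a max MB/sec cap instead");
    }
    // disableAutoIOThrottle() is inherited unchanged: disabling is always safe.
  }

  public static void main(String[] args) {
    CappedScheduler scheduler = new CappedScheduler();
    scheduler.disableAutoIOThrottle();
    System.out.println("autoIOThrottle=" + scheduler.getAutoIOThrottle());
  }
}
```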



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();
+  }
+
+  /** Set the global bandwidth rate bucket in MB/s (default 1000 MB/s) */
+  public void setBandwidthRateBucket(double mbPerSec) {
+    if (mbPerSec < MIN_MERGE_MB_PER_SEC || mbPerSec > MAX_MERGE_MB_PER_SEC) {
+      throw new IllegalArgumentException(
+          "Bandwidth rate must be between "
+              + MIN_MERGE_MB_PER_SEC
+              + " and "
+              + MAX_MERGE_MB_PER_SEC
+              + " MB/s");
+    }
+    this.bandwidthRateBucket = mbPerSec;
+    updateMergeThreads();
+  }
+
+  /** Get the global bandwidth rate bucket in MB/s */
+  public double getBandwidthRateBucket() {
+    return bandwidthRateBucket;
+  }
+
+  /** Distributes the global bandwidth rate bucket evenly among all active 
merge threads. */
+  @Override
+  protected synchronized void updateMergeThreads() {
+    super.updateMergeThreads();
+    int activeMerges = 0;
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        activeMerges++;
+      }
+    }
+    double perMergeRate =
+        activeMerges > 0
+            ? Math.max(
+                MIN_MERGE_MB_PER_SEC,
+                Math.min(MAX_MERGE_MB_PER_SEC, bandwidthRateBucket / 
activeMerges))
+            : Double.POSITIVE_INFINITY;
+
+    // Apply the calculated rate limit to each active merge thread
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        mergeThread.rateLimiter.setMBPerSec(perMergeRate);
+      }
+    }
+  }
+
+  /** Creates a custom merge thread with bandwidth tracking capabilities. */
+  @Override
+  protected synchronized MergeThread getMergeThread(MergeSource mergeSource, 
OneMerge merge)
+      throws IOException {
+    return new BandwidthTrackingMergeThread(mergeSource, merge);
+  }
+
+  /** Returns a string representation including the current bandwidth rate 
bucket setting. */
+  @Override
+  public String toString() {
+    return getClass().getSimpleName()
+        + ": "
+        + super.toString()
+        + ", bandwidthRateBucket="
+        + bandwidthRateBucket
+        + " MB/s";
+  }
+
+  /** Merge thread that logs the rate limiter value after merge completes. */
+  protected class BandwidthTrackingMergeThread extends MergeThread {
+    private final double mergeBandwidthMB;
+
+    /** Creates a new BandwidthTrackingMergeThread for the given merge. */
+    public BandwidthTrackingMergeThread(MergeSource mergeSource, OneMerge 
merge) {
+      super(mergeSource, merge);
+      this.mergeBandwidthMB = merge.estimatedMergeBytes / (1024.0 * 1024.0);
+    }
+
+    @Override
+    public void run() {
+      long startTime = System.currentTimeMillis();
+      try {
+        if (verbose()) {
+          message(
+              "Starting bandwidth-capped merge: "
+                  + getSegmentName(merge)
+                  + " (estimated="

Review Comment:
   `(estimatedMbPerSec=` instead?
   
   Also, this is just the value when this merge kicks off?  So maybe change to 
`startingMbPerSec=`?
   
   Edit: oh this is the estimated size of the final merged segment in MB?  
Maybe just change to `merge.estimatedMergeBytes/1024./1024.` and no need for 
member variable?



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();
+  }
+
+  /** Set the global bandwidth rate bucket in MB/s (default 1000 MB/s) */
+  public void setBandwidthRateBucket(double mbPerSec) {

Review Comment:
   Let's include units in the method name, e.g. `setMaxMbPerSec`?



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();
+  }
+
+  /** Set the global bandwidth rate bucket in MB/s (default 1000 MB/s) */
+  public void setBandwidthRateBucket(double mbPerSec) {
+    if (mbPerSec < MIN_MERGE_MB_PER_SEC || mbPerSec > MAX_MERGE_MB_PER_SEC) {
+      throw new IllegalArgumentException(
+          "Bandwidth rate must be between "
+              + MIN_MERGE_MB_PER_SEC
+              + " and "
+              + MAX_MERGE_MB_PER_SEC
+              + " MB/s");
+    }
+    this.bandwidthRateBucket = mbPerSec;
+    updateMergeThreads();
+  }
+
+  /** Get the global bandwidth rate bucket in MB/s */
+  public double getBandwidthRateBucket() {
+    return bandwidthRateBucket;
+  }
+
+  /** Distributes the global bandwidth rate bucket evenly among all active 
merge threads. */
+  @Override
+  protected synchronized void updateMergeThreads() {
+    super.updateMergeThreads();
+    int activeMerges = 0;
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        activeMerges++;
+      }
+    }
+    double perMergeRate =
+        activeMerges > 0
+            ? Math.max(
+                MIN_MERGE_MB_PER_SEC,
+                Math.min(MAX_MERGE_MB_PER_SEC, bandwidthRateBucket / 
activeMerges))

Review Comment:
   `activeMerges` isn't right because it can include merge threads that are 
created and alive (in the JVM sense) but are paused by the CMS parent (in the 
"soft merge limit" case).  I think it suffices to take `min(maxThreadCount, 
activeMerges)` instead.  You'll need to make the `maxThreadCount` member 
protected, or use `getMaxThreadCount()`, and also guard against the 
`AUTO_DETECT_MERGES_AND_THREADS` sentinel in case the default has not been 
resolved yet.
   
   CMS has a soft limit (`maxThreadCount`) and a hard limit (`maxMergeCount`) 
on the number of merges requested by the `MergePolicy` that it will tolerate.  
Below or equal to the soft limit, everything is normal.  When more merges are 
needed than the soft limit, the largest N merges are paused so that the 
remaining soft limit merges can run.  If more merges than the hard limit are 
requested, CMS will apply backpressure by hijacking the incoming indexing 
threads (those threads creating new segments and requesting new merges), 
pausing them, until the merging can catch up and the number of merges requested 
is again <= the hard limit.
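
   A sketch of the suggested calculation in isolation.  The class and method 
names are hypothetical, the floor and ceiling constants are copied from the 
diff, and rejecting `maxThreadCount < 1` is one assumed way to guard against an 
unresolved auto-detect sentinel.

```java
/** Sketch: splitting a global MB/sec cap across the merges that actually run. */
public class PerMergeRateSketch {

  static final double MIN_MERGE_MB_PER_SEC = 5.0;
  static final double MAX_MERGE_MB_PER_SEC = 10240.0;

  /**
   * @param maxMbPerSec global cap shared by all running merges
   * @param aliveMergeThreads threads alive in the JVM sense (may include paused ones)
   * @param maxThreadCount soft limit on concurrently running merges
   */
  static double perMergeMbPerSec(double maxMbPerSec, int aliveMergeThreads, int maxThreadCount) {
    if (maxThreadCount < 1) {
      // Assumed guard: an unresolved sentinel would otherwise corrupt the division.
      throw new IllegalArgumentException("maxThreadCount must be resolved, got " + maxThreadCount);
    }
    // Threads beyond the soft limit are paused, so they should not dilute the share.
    int runningMerges = Math.min(maxThreadCount, aliveMergeThreads);
    if (runningMerges <= 0) {
      return Double.POSITIVE_INFINITY;
    }
    double share = maxMbPerSec / runningMerges;
    return Math.max(MIN_MERGE_MB_PER_SEC, Math.min(MAX_MERGE_MB_PER_SEC, share));
  }

  public static void main(String[] args) {
    // 8 alive threads but only 4 may run: each running merge gets a quarter of the cap.
    System.out.println(perMergeMbPerSec(1000.0, 8, 4));
  }
}
```

   Note that the floor can still push the total above the cap when the share 
falls below `MIN_MERGE_MB_PER_SEC`.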



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();
+  }
+
+  /** Set the global bandwidth rate bucket in MB/s (default 1000 MB/s) */
+  public void setBandwidthRateBucket(double mbPerSec) {
+    if (mbPerSec < MIN_MERGE_MB_PER_SEC || mbPerSec > MAX_MERGE_MB_PER_SEC) {
+      throw new IllegalArgumentException(
+          "Bandwidth rate must be between "
+              + MIN_MERGE_MB_PER_SEC
+              + " and "
+              + MAX_MERGE_MB_PER_SEC
+              + " MB/s");
+    }
+    this.bandwidthRateBucket = mbPerSec;
+    updateMergeThreads();
+  }
+
+  /** Get the global bandwidth rate bucket in MB/s */
+  public double getBandwidthRateBucket() {
+    return bandwidthRateBucket;
+  }
+
+  /** Distributes the global bandwidth rate bucket evenly among all active 
merge threads. */
+  @Override
+  protected synchronized void updateMergeThreads() {
+    super.updateMergeThreads();
+    int activeMerges = 0;
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        activeMerges++;
+      }
+    }
+    double perMergeRate =
+        activeMerges > 0
+            ? Math.max(
+                MIN_MERGE_MB_PER_SEC,
+                Math.min(MAX_MERGE_MB_PER_SEC, bandwidthRateBucket / 
activeMerges))
+            : Double.POSITIVE_INFINITY;
+
+    // Apply the calculated rate limit to each active merge thread
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        mergeThread.rateLimiter.setMBPerSec(perMergeRate);
+      }
+    }
+  }
+
+  /** Creates a custom merge thread with bandwidth tracking capabilities. */
+  @Override
+  protected synchronized MergeThread getMergeThread(MergeSource mergeSource, 
OneMerge merge)
+      throws IOException {
+    return new BandwidthTrackingMergeThread(mergeSource, merge);
+  }
+
+  /** Returns a string representation including the current bandwidth rate 
bucket setting. */
+  @Override
+  public String toString() {
+    return getClass().getSimpleName()
+        + ": "
+        + super.toString()
+        + ", bandwidthRateBucket="
+        + bandwidthRateBucket
+        + " MB/s";
+  }
+
+  /** Merge thread that logs the rate limiter value after merge completes. */
+  protected class BandwidthTrackingMergeThread extends MergeThread {
+    private final double mergeBandwidthMB;
+
+    /** Creates a new BandwidthTrackingMergeThread for the given merge. */
+    public BandwidthTrackingMergeThread(MergeSource mergeSource, OneMerge 
merge) {
+      super(mergeSource, merge);
+      this.mergeBandwidthMB = merge.estimatedMergeBytes / (1024.0 * 1024.0);
+    }
+
+    @Override
+    public void run() {
+      long startTime = System.currentTimeMillis();
+      try {
+        if (verbose()) {
+          message(
+              "Starting bandwidth-capped merge: "
+                  + getSegmentName(merge)
+                  + " (estimated="
+                  + mergeBandwidthMB
+                  + " MB)");
+        }
+        super.run(); // IO throttling is handled by the RateLimiter
+      } finally {
+        long duration = System.currentTimeMillis() - startTime;

Review Comment:
   `durationNS`?



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with 
bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket 
that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve 
system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, 
dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) 
*/
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();
+  }
+
+  /** Set the global bandwidth rate bucket in MB/s (default 1000 MB/s) */
+  public void setBandwidthRateBucket(double mbPerSec) {
+    if (mbPerSec < MIN_MERGE_MB_PER_SEC || mbPerSec > MAX_MERGE_MB_PER_SEC) {
+      throw new IllegalArgumentException(
+          "Bandwidth rate must be between "
+              + MIN_MERGE_MB_PER_SEC
+              + " and "
+              + MAX_MERGE_MB_PER_SEC
+              + " MB/s");
+    }
+    this.bandwidthRateBucket = mbPerSec;
+    updateMergeThreads();
+  }
+
+  /** Get the global bandwidth rate bucket in MB/s */
+  public double getBandwidthRateBucket() {
+    return bandwidthRateBucket;
+  }
+
+  /** Distributes the global bandwidth rate bucket evenly among all active 
merge threads. */
+  @Override
+  protected synchronized void updateMergeThreads() {
+    super.updateMergeThreads();
+    int activeMerges = 0;
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        activeMerges++;
+      }
+    }
+    double perMergeRate =
+        activeMerges > 0
+            ? Math.max(
+                MIN_MERGE_MB_PER_SEC,
+                Math.min(MAX_MERGE_MB_PER_SEC, bandwidthRateBucket / 
activeMerges))
+            : Double.POSITIVE_INFINITY;
+
+    // Apply the calculated rate limit to each active merge thread
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        mergeThread.rateLimiter.setMBPerSec(perMergeRate);
+      }
+    }
+  }
+
+  /** Creates a custom merge thread with bandwidth tracking capabilities. */
+  @Override
+  protected synchronized MergeThread getMergeThread(MergeSource mergeSource, 
OneMerge merge)
+      throws IOException {
+    return new BandwidthTrackingMergeThread(mergeSource, merge);
+  }
+
+  /** Returns a string representation including the current bandwidth rate 
bucket setting. */
+  @Override
+  public String toString() {
+    return getClass().getSimpleName()
+        + ": "
+        + super.toString()
+        + ", bandwidthRateBucket="
+        + bandwidthRateBucket
+        + " MB/s";
+  }
+
+  /** Merge thread that logs the rate limiter value after merge completes. */
+  protected class BandwidthTrackingMergeThread extends MergeThread {
+    private final double mergeBandwidthMB;
+
+    /** Creates a new BandwidthTrackingMergeThread for the given merge. */
+    public BandwidthTrackingMergeThread(MergeSource mergeSource, OneMerge 
merge) {
+      super(mergeSource, merge);
+      this.mergeBandwidthMB = merge.estimatedMergeBytes / (1024.0 * 1024.0);
+    }
+
+    @Override
+    public void run() {
+      long startTime = System.currentTimeMillis();
+      try {
+        if (verbose()) {
+          message(
+              "Starting bandwidth-capped merge: "
+                  + getSegmentName(merge)
+                  + " (estimated="
+                  + mergeBandwidthMB
+                  + " MB)");
+        }
+        super.run(); // IO throttling is handled by the RateLimiter
+      } finally {
+        long duration = System.currentTimeMillis() - startTime;
+        if (verbose()) {
+          double mbPerSec = mergeBandwidthMB / Math.max(duration / 1000.0, 
0.001);

Review Comment:
   It's OK to just let the math produce `Double.POSITIVE_INFINITY`?  Only 
integer (`int`/`long`) division throws on divide-by-zero; `double` division 
does not.
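
   A small illustration of the difference, with hypothetical names.  One 
caveat worth keeping in mind: a zero numerator over a zero duration yields 
`NaN` rather than infinity.

```java
/** Sketch: floating-point vs. integer division by zero in Java. */
public class DivideByZeroSketch {

  /** Throughput in MB/sec; a zero duration yields infinity or NaN, never an exception. */
  static double mbPerSec(double megabytes, long durationNS) {
    return megabytes / (durationNS / 1_000_000_000.0);
  }

  /** Integer division; throws ArithmeticException when the divisor is zero. */
  static long integerDivide(long numerator, long divisor) {
    return numerator / divisor;
  }

  public static void main(String[] args) {
    System.out.println(mbPerSec(10.0, 0L)); // positive infinity, no exception
    System.out.println(mbPerSec(0.0, 0L)); // NaN
    try {
      integerDivide(10L, 0L);
    } catch (ArithmeticException e) {
      System.out.println("integer division threw: " + e.getMessage());
    }
  }
}
```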



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) */
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;
+
+  /** Default constructor with 1000 MB/s bandwidth rate bucket */
+  public BandwidthCappedMergeScheduler() {
+    super();
+  }
+
+  /** Set the global bandwidth rate bucket in MB/s (default 1000 MB/s) */
+  public void setBandwidthRateBucket(double mbPerSec) {
+    if (mbPerSec < MIN_MERGE_MB_PER_SEC || mbPerSec > MAX_MERGE_MB_PER_SEC) {
+      throw new IllegalArgumentException(
+          "Bandwidth rate must be between "
+              + MIN_MERGE_MB_PER_SEC
+              + " and "
+              + MAX_MERGE_MB_PER_SEC
+              + " MB/s");
+    }
+    this.bandwidthRateBucket = mbPerSec;
+    updateMergeThreads();
+  }
+
+  /** Get the global bandwidth rate bucket in MB/s */
+  public double getBandwidthRateBucket() {
+    return bandwidthRateBucket;
+  }
+
+  /** Distributes the global bandwidth rate bucket evenly among all active merge threads. */
+  @Override
+  protected synchronized void updateMergeThreads() {
+    super.updateMergeThreads();
+    int activeMerges = 0;
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        activeMerges++;
+      }
+    }
+    double perMergeRate =
+        activeMerges > 0
+            ? Math.max(
+                MIN_MERGE_MB_PER_SEC,
+                Math.min(MAX_MERGE_MB_PER_SEC, bandwidthRateBucket / activeMerges))
+            : Double.POSITIVE_INFINITY;
+
+    // Apply the calculated rate limit to each active merge thread
+    for (MergeThread mergeThread : mergeThreads) {
+      if (mergeThread.isAlive()) {
+        mergeThread.rateLimiter.setMBPerSec(perMergeRate);

Review Comment:
   This may indeed prove too simplistic (statically dividing the bandwidth among
the N running merges), but it's good for starters / MLP ("minimum lovable PR") to
unlock fun testing.

   Actually, instead of statically dividing the cap across N rate limiters ...
couldn't we create a single `MergeRateLimiter` shared across all merge threads,
set to the cap?  This way, if one merge is doing the CPU-heavy part of merging
and is not able to use much IO bandwidth, other merges could surge over their
static 1/Nth share.  But this is kind of a big change... CMS seems to quietly
assume/expect that each merge thread has its own rate limiter.  It might freak
out if you use the same rate limiter across all threads.
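
   To make the static split concrete, here is a small standalone sketch of the
per-merge arithmetic. It copies the clamp expression and the two constants from
the diff above into a hypothetical `StaticSplitSketch` class; it does not touch
CMS or any real rate limiter. One thing it surfaces: because each share is
clamped up to the 5 MB/s floor, the sum of the per-merge limits can exceed the
configured cap when many merges run against a small bucket.

```java
public class StaticSplitSketch {
  // Values copied from the diff under review
  static final double MIN_MERGE_MB_PER_SEC = 5.0;
  static final double MAX_MERGE_MB_PER_SEC = 10240.0;

  /** Per-merge limit: the bucket divided evenly, then clamped to [MIN, MAX]. */
  static double perMergeRate(double bucketMBPerSec, int activeMerges) {
    if (activeMerges <= 0) {
      return Double.POSITIVE_INFINITY;
    }
    return Math.max(
        MIN_MERGE_MB_PER_SEC, Math.min(MAX_MERGE_MB_PER_SEC, bucketMBPerSec / activeMerges));
  }

  /** Sum of all per-merge limits, i.e. the worst-case total write rate. */
  static double aggregateRate(double bucketMBPerSec, int activeMerges) {
    return activeMerges <= 0 ? 0.0 : perMergeRate(bucketMBPerSec, activeMerges) * activeMerges;
  }

  public static void main(String[] args) {
    // 1000 MB/s bucket, 4 merges: 250 MB/s each, 1000 MB/s in total
    System.out.println(perMergeRate(1000.0, 4) + " / " + aggregateRate(1000.0, 4));
    // 10 MB/s bucket, 4 merges: the floor lifts each share to 5 MB/s, 20 MB/s in total
    System.out.println(perMergeRate(10.0, 4) + " / " + aggregateRate(10.0, 4));
  }
}
```

   In the first case, a merge that is busy with CPU work leaves its 250 MB/s
share unused, which is the inefficiency a shared limiter would avoid.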



##########
lucene/core/src/java/org/apache/lucene/index/BandwidthCappedMergeScheduler.java:
##########
@@ -0,0 +1,159 @@
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.index.MergePolicy.OneMerge;
+
+/**
+ * A {@link MergeScheduler} that extends {@link ConcurrentMergeScheduler} with bandwidth tracking
+ * and limiting capabilities. This scheduler maintains a bandwidth rate bucket that is divided among
+ * active merges. When bandwidth is limited, merges are throttled to preserve system resources.
+ *
+ * <p>Key features: Global bandwidth rate bucket with configurable capacity, dynamic per-merge
+ * throttling.
+ *
+ * @lucene.experimental
+ */
+// nocommit
+public class BandwidthCappedMergeScheduler extends ConcurrentMergeScheduler {
+
+  /** Floor for IO write rate limit (we will never go any lower than this) */
+  private static final double MIN_MERGE_MB_PER_SEC = 5.0;
+
+  /** Ceiling for IO write rate limit (we will never go any higher than this) */
+  private static final double MAX_MERGE_MB_PER_SEC = 10240.0;
+
+  /** Initial value for IO write rate limit */
+  private static final double START_MB_PER_SEC = 1000.0;
+
+  /** Global bandwidth rate bucket in MB/s */
+  private double bandwidthRateBucket = START_MB_PER_SEC;

Review Comment:
   Hmm, why do we even have `START_MB_PER_SEC`?  Could we instead make the ctor
take a required `mbPerSec` and use that as the starting value?

   Does this class support updating the cap even while merges are running?  It
looks like it does, yay!  (Maybe apps have time-dependent bandwidth
availability.)
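
   A rough sketch of what a required-cap constructor could look like. It uses a
standalone, hypothetical `RequiredCapSketch` class instead of extending
`ConcurrentMergeScheduler`, so it only shows the validation and state handling.
Two details are assumptions that are not in the PR: the field is `volatile` so a
cap changed at runtime is visible to other threads, and the range check is
written so that `NaN` is rejected as well.

```java
public class RequiredCapSketch {
  // Bounds copied from the diff under review
  static final double MIN_MERGE_MB_PER_SEC = 5.0;
  static final double MAX_MERGE_MB_PER_SEC = 10240.0;

  // volatile is an assumption of this sketch, not something the PR does
  private volatile double bandwidthRateBucket;

  /** The cap is required up front, so no START_MB_PER_SEC default is needed. */
  public RequiredCapSketch(double mbPerSec) {
    this.bandwidthRateBucket = checkRate(mbPerSec);
  }

  /** The cap can still be changed later, e.g. for time-dependent bandwidth. */
  public void setBandwidthRateBucket(double mbPerSec) {
    this.bandwidthRateBucket = checkRate(mbPerSec);
  }

  public double getBandwidthRateBucket() {
    return bandwidthRateBucket;
  }

  private static double checkRate(double mbPerSec) {
    // Written as a negated conjunction so that NaN fails the check too
    if (!(mbPerSec >= MIN_MERGE_MB_PER_SEC && mbPerSec <= MAX_MERGE_MB_PER_SEC)) {
      throw new IllegalArgumentException(
          "Bandwidth rate must be between "
              + MIN_MERGE_MB_PER_SEC
              + " and "
              + MAX_MERGE_MB_PER_SEC
              + " MB/s; got "
              + mbPerSec);
    }
    return mbPerSec;
  }
}
```

   In the real class the setter would presumably still call
`updateMergeThreads()` after storing the new value, as the PR does today.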



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

