gsmiller commented on code in PR #13559:
URL: https://github.com/apache/lucene/pull/13559#discussion_r1697210371
##########
lucene/core/src/java/org/apache/lucene/util/BitSet.java:
##########
@@ -92,6 +92,12 @@ public void clear() {
*/
public abstract int nextSetBit(int index);
+ /**
+ * Returns the index of the first set bit from start (inclusive) until end
(inclusive). {@link
Review Comment:
I would tend to expect `end` to be exclusive in an API like this. (e.g.,
think of something like `String#substring(from, to)` in Java). WDYT about
changing this?
##########
lucene/core/src/java/org/apache/lucene/util/FixedBitSet.java:
##########
@@ -291,6 +291,32 @@ public int nextSetBit(int index) {
return DocIdSetIterator.NO_MORE_DOCS;
}
+ @Override
+ public int firstSetBitInRange(int start, int upperBound) {
+ // Depends on the ghost bits being clear!
+ assert start >= 0 && start < numBits : "index=" + start + ", numBits=" +
numBits;
+ assert start <= upperBound : "index=" + start + ", upperBound=" +
upperBound;
+ int i = start >> 6;
+ long word = bits[i] >> start; // skip all the bits to the right of index
+
+ if (word != 0) {
+ int res = start + Long.numberOfTrailingZeros(word);
+ return res > upperBound ? DocIdSetIterator.NO_MORE_DOCS : res;
+ }
+
+ int maxWord = Math.min((upperBound >> 6) + 1, numWords);
Review Comment:
nit: `maxWord` is slightly confusing naming IMO. I would have expected it to
be an _inclusive_ max index but it's _exclusive_. The logic all makes sense to
me, but maybe rename to something like `limit` (or change it to inclusive and
keep the name)?
##########
lucene/core/src/java/org/apache/lucene/util/SparseFixedBitSet.java:
##########
@@ -353,14 +355,47 @@ public int nextSetBit(int i) {
final long indexBits = index >>> i64 >>> 1;
if (indexBits == 0) {
// no more bits are set in the current block of 4096 bits, go to the
next one
- return firstDoc(i4096 + 1);
+ return firstDoc(i4096 + 1, indices.length);
}
// there are still set bits
i64 += 1 + Long.numberOfTrailingZeros(indexBits);
final long bits = bitArray[o];
return (i64 << 6) | Long.numberOfTrailingZeros(bits);
}
+ @Override
+ public int firstSetBitInRange(int start, int upperBound) {
+ assert start < length;
+ assert upperBound >= start;
+ final int i4096 = start >>> 12;
+ final long index = indices[i4096];
+ final long[] bitArray = this.bits[i4096];
+ int i64 = start >>> 6;
+ int o = Long.bitCount(index & ((1L << i64) - 1));
Review Comment:
nit: since you repeat `1L << i64` a couple times here, maybe introduce
`final long i64bit = 1L << i64;` (like we do in `#get`)
##########
lucene/join/src/java/org/apache/lucene/search/join/BlockJoinSelector.java:
##########
@@ -64,14 +64,19 @@ public boolean get(int docID) {
return false;
}
- final int firstChild = parents.prevSetBit(docID - 1) + 1;
- for (int child = children.nextSetBit(firstChild);
- child < docID;
- child = children.nextSetBit(child + 1)) {
- if (docsWithValue.get(child)) {
- return true;
+ final int lastPotentialChild = docID - 1;
+
+ final int firstPotentialChild = parents.prevSetBit(lastPotentialChild)
+ 1;
+ if (firstPotentialChild < docID) {
+ for (int child = children.firstSetBitInRange(firstPotentialChild,
lastPotentialChild);
+ child != DocIdSetIterator.NO_MORE_DOCS;
+ child = children.firstSetBitInRange(child + 1,
lastPotentialChild)) {
+ if (docsWithValue.get(child)) {
+ return true;
+ }
Review Comment:
nit: I would find this just slightly more readable, but I don't have a
strong preference. Just a small suggestion.
```suggestion
assert firstPotentialChild <= docID;
if (firstPotentialChild == docID) {
// no children
return false;
}
for (int child = children.firstSetBitInRange(firstPotentialChild,
lastPotentialChild);
child != DocIdSetIterator.NO_MORE_DOCS;
child = children.firstSetBitInRange(child + 1,
lastPotentialChild)) {
if (docsWithValue.get(child)) {
return true;
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]