[GitHub] [lucene] dweiss commented on a diff in pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader
dweiss commented on code in PR #11998: URL: https://github.com/apache/lucene/pull/11998#discussion_r1045027949 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/AssertingLeafReader.java: ## @@ -113,34 +116,84 @@ public Fields getTermVectors(int docID) throws IOException { return fields == null ? null : new AssertingFields(fields); } + @Override + public TermVectors termVectors() throws IOException { +return new AssertingTermVectors(super.termVectors()); + } + + @Override + public StoredFields storedFields() throws IOException { +return new AssertingStoredFields(super.storedFields()); + } + + /** Wraps a StoredFields but with additional asserts */ + public static class AssertingStoredFields extends StoredFields { +private final StoredFields in; +private final Thread creationThread = Thread.currentThread(); + +public AssertingStoredFields(StoredFields in) { + this.in = in; +} + +@Override +public void document(int docID, StoredFieldVisitor visitor) throws IOException { + assertThread("StoredFields", creationThread); + in.document(docID, visitor); +} + } + + /** Wraps a TermVectors but with additional asserts */ + public static class AssertingTermVectors extends TermVectors { +private final TermVectors in; +private final Thread creationThread = Thread.currentThread(); + +public AssertingTermVectors(TermVectors in) { + this.in = in; +} + +@Override +public Fields get(int doc) throws IOException { + assertThread("TermVectors", creationThread); + Fields fields = in.get(doc); + return fields == null ? null : new AssertingFields(fields); +} + } + /** Wraps a Fields but with additional asserts */ public static class AssertingFields extends FilterFields { +private final Thread creationThread = Thread.currentThread(); Review Comment: +1. It's not just thread-safety but strict thread-confinement. I assume we're fine with this stricter rule within tests/codebase. I don't see any reason why these objects should be passed around (?). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on a diff in pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader
rmuir commented on code in PR #11998: URL: https://github.com/apache/lucene/pull/11998#discussion_r1045078398 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/AssertingLeafReader.java: ## @@ -113,34 +116,84 @@ public Fields getTermVectors(int docID) throws IOException { return fields == null ? null : new AssertingFields(fields); } + @Override + public TermVectors termVectors() throws IOException { +return new AssertingTermVectors(super.termVectors()); + } + + @Override + public StoredFields storedFields() throws IOException { +return new AssertingStoredFields(super.storedFields()); + } + + /** Wraps a StoredFields but with additional asserts */ + public static class AssertingStoredFields extends StoredFields { +private final StoredFields in; +private final Thread creationThread = Thread.currentThread(); + +public AssertingStoredFields(StoredFields in) { + this.in = in; +} + +@Override +public void document(int docID, StoredFieldVisitor visitor) throws IOException { + assertThread("StoredFields", creationThread); + in.document(docID, visitor); +} + } + + /** Wraps a TermVectors but with additional asserts */ + public static class AssertingTermVectors extends TermVectors { +private final TermVectors in; +private final Thread creationThread = Thread.currentThread(); + +public AssertingTermVectors(TermVectors in) { + this.in = in; +} + +@Override +public Fields get(int doc) throws IOException { + assertThread("TermVectors", creationThread); + Fields fields = in.get(doc); + return fields == null ? null : new AssertingFields(fields); +} + } + /** Wraps a Fields but with additional asserts */ public static class AssertingFields extends FilterFields { +private final Thread creationThread = Thread.currentThread(); Review Comment: I had the idea to add these checks for stored/fields vectors just to fail a test clearly rather than with some "crazy" behavior that would be difficult to debug. Looks like i wasn't the first person to have this idea, as @jpountz already added most of them 8 years ago: https://github.com/apache/lucene/commit/fc94b0b4d9e0 For the postings API, previously we had existing checks on `TermsEnum`, `PostingsEnum`. But `Terms` and `Fields` were not checked. Seems like an oversight to me, they are all in the same codec postings api (also used by term vectors). So, depending on codec's implementation, codec might do e.g. any number of disk reads or whatever it wants and these should be checked too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a diff in pull request #12003: Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery
jpountz commented on code in PR #12003: URL: https://github.com/apache/lucene/pull/12003#discussion_r1045084049 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java: ## @@ -212,9 +212,9 @@ public boolean isCacheable(LeafReaderContext ctx) { @Override public int count(LeafReaderContext context) throws IOException { if (context.reader().hasDeletions() == false) { - BoundedDocIdSetIterator disi = getDocIdSetIteratorOrNull(context); - if (disi != null && disi.delegate == null) { -return disi.lastDoc - disi.firstDoc; + DocIdSetIterator disi = getDocIdSetIteratorOrNull(context); + if (disi != null && disi instanceof BoundedDocIdSetIterator == false) { +return Math.toIntExact(disi.cost()); Review Comment: I worry that this might be a bit fragile since cost() has no guarantee to be accurate. I wonder if we could make `getDocIdSetIteratorOrNull()` return both a `DocIdSetIterator` and a number of matches (possibly -1 when unknown) to make this less trappy. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a diff in pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader
jpountz commented on code in PR #11998: URL: https://github.com/apache/lucene/pull/11998#discussion_r1045086896 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/AssertingLeafReader.java: ## @@ -113,34 +116,84 @@ public Fields getTermVectors(int docID) throws IOException { return fields == null ? null : new AssertingFields(fields); } + @Override + public TermVectors termVectors() throws IOException { +return new AssertingTermVectors(super.termVectors()); + } + + @Override + public StoredFields storedFields() throws IOException { +return new AssertingStoredFields(super.storedFields()); + } + + /** Wraps a StoredFields but with additional asserts */ + public static class AssertingStoredFields extends StoredFields { +private final StoredFields in; +private final Thread creationThread = Thread.currentThread(); + +public AssertingStoredFields(StoredFields in) { + this.in = in; +} + +@Override +public void document(int docID, StoredFieldVisitor visitor) throws IOException { + assertThread("StoredFields", creationThread); + in.document(docID, visitor); +} + } + + /** Wraps a TermVectors but with additional asserts */ + public static class AssertingTermVectors extends TermVectors { +private final TermVectors in; +private final Thread creationThread = Thread.currentThread(); + +public AssertingTermVectors(TermVectors in) { + this.in = in; +} + +@Override +public Fields get(int doc) throws IOException { + assertThread("TermVectors", creationThread); + Fields fields = in.get(doc); + return fields == null ? null : new AssertingFields(fields); +} + } + /** Wraps a Fields but with additional asserts */ public static class AssertingFields extends FilterFields { +private final Thread creationThread = Thread.currentThread(); Review Comment: +1 to check Terms and Fields too. Most instances returned by codecs should be thread-safe but it's still good to make sure consumers don't share them across threads for consistency with other codec APIs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz opened a new issue, #12005: Should we fail retrieving doc values on the wrong type?
jpountz opened a new issue, #12005: URL: https://github.com/apache/lucene/issues/12005 ### Description Lucene currently treats the different doc-value types as if they were independent features and `LeafReader#getXXXDocValues()` returns `null` in case of mismatch, e.g. retrieving binary doc values on a field that indexed numeric doc values. I recently learned that this behavior is considered trappy enough that some code bases [forbid these methods](https://github.com/apache/lucene/pull/11950#discussion_r1029470218), which makes me wonder if we should change our stance and fail `LeafReader#getXXXDocValues()` in case of mismatch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a diff in pull request #12004: Move byte vector queries into new KnnByteVectorQuery
jpountz commented on code in PR #12004: URL: https://github.com/apache/lucene/pull/12004#discussion_r1045087956 ## lucene/core/src/java/org/apache/lucene/search/KnnByteVectorQuery.java: ## @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.search; + +import java.io.IOException; +import java.util.Objects; +import org.apache.lucene.codecs.KnnVectorsReader; +import org.apache.lucene.document.KnnVectorField; +import org.apache.lucene.index.FieldInfo; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.index.VectorEncoding; +import org.apache.lucene.util.Bits; +import org.apache.lucene.util.BytesRef; + +/** + * Uses {@link KnnVectorsReader#search(String, BytesRef, int, Bits, int)} to perform nearest + * neighbour search. + * + * This query also allows for performing a kNN search subject to a filter. In this case, it first + * executes the filter for each leaf, then chooses a strategy dynamically: + * + * + * If the filter cost is less than k, just execute an exact search + * Otherwise run a kNN search subject to the filter + * If the kNN search visits too many vectors without completing, stop and run an exact search + * + */ +public class KnnByteVectorQuery extends AbstractKnnVectorQuery { + + private static final TopDocs NO_RESULTS = TopDocsCollector.EMPTY_TOPDOCS; + + private final BytesRef target; + + /** + * Find the k nearest documents to the target vector according to the vectors in the + * given field. target vector. + * + * @param field a field that has been indexed as a {@link KnnVectorField}. + * @param target the target of the search + * @param k the number of documents to find + * @throws IllegalArgumentException if k is less than 1 + */ + public KnnByteVectorQuery(String field, BytesRef target, int k) { +this(field, target, k, null); + } + + /** + * Find the k nearest documents to the target vector according to the vectors in the + * given field. target vector. + * + * @param field a field that has been indexed as a {@link KnnVectorField}. + * @param target the target of the search + * @param k the number of documents to find + * @param filter a filter applied before the vector search + * @throws IllegalArgumentException if k is less than 1 + */ + public KnnByteVectorQuery(String field, BytesRef target, int k, Query filter) { +super(field, k, filter); +this.target = target; + } + + @Override + protected TopDocs approximateSearch(LeafReaderContext context, Bits acceptDocs, int visitedLimit) + throws IOException { +TopDocs results = +context.reader().searchNearestVectors(field, target, k, acceptDocs, visitedLimit); +return results != null ? results : NO_RESULTS; + } + + @Override + VectorScorer createVectorScorer(LeafReaderContext context, FieldInfo fi) throws IOException { +if (fi.getVectorEncoding() != VectorEncoding.BYTE) { + return null; +} +return VectorScorer.create(context, fi, target); + } + + @Override + public String toString(String field) { +return getClass().getSimpleName() ++ ":" ++ this.field ++ "[" ++ target.bytes[target.offset] ++ ",...][" ++ k ++ "]"; + } + + @Override + public void visit(QueryVisitor visitor) { +if (visitor.acceptField(field)) { + visitor.visitLeaf(this); +} + } Review Comment: It looks the same in the parent class, maybe it doesn't need to be overridden? ## lucene/core/src/java/org/apache/lucene/search/KnnByteVectorQuery.java: ## @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed
[GitHub] [lucene] jpountz commented on pull request #11997: Add IntField, LongField, FloatField and DoubleField
jpountz commented on PR #11997: URL: https://github.com/apache/lucene/pull/11997#issuecomment-1345272269 > What if we always set the docvalue type to SORTED_NUMERIC, even if the user is just storing single values? Lucene will optimize the storage if all docs are single-valued, and it's already a common pattern to look for this optimization on read This sounds good to me. The main issue I see with this is that our default `SortField`s fail if the field is indexed with multi-valued doc values. But maybe we could make it more lenient and sort on the min value for ascending sorts and max values for descending sorts. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gf2121 opened a new pull request, #12006: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries
gf2121 opened a new pull request, #12006: URL: https://github.com/apache/lucene/pull/12006 In LatLonPointQueries we encode query ints to bytes, and compare bytes by decode bytes back to int in `ArrayUtil#compareUnsigned4`. We can directly compare ints instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12006: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries
rmuir commented on PR #12006: URL: https://github.com/apache/lucene/pull/12006#issuecomment-1345293418 @gf2121 does this execute faster? I think originally the reason `byte[]` was compared, was that it avoided having to "decode" values with `NumericUtils.sortableBytesToInt`. But, maybe this `NumericUtils.sortableBytesToInt` method has become faster now that it is implemented with varhandle. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gf2121 commented on pull request #12006: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries
gf2121 commented on PR #12006: URL: https://github.com/apache/lucene/pull/12006#issuecomment-1345297296 Thanks @rmuir for feedback! This is the current implementation of `ArrayUtil#compareUnsigned4`: ``` public static int compareUnsigned4(byte[] a, int aOffset, byte[] b, int bOffset) { return Integer.compareUnsigned( (int) BitUtil.VH_BE_INT.get(a, aOffset), (int) BitUtil.VH_BE_INT.get(b, bOffset)); } ``` I think it's equivalent to ``` public static int compareUnsigned4(byte[] a, int aOffset, byte[] b, int bOffset) { return Integer.compare(NumericUtils.sortableBytesToInt(a, aOffset), NumericUtils.sortableBytesToInt(b, bOffset)); } ``` Because they are both doing a varhandle get and compare unsigned. The point of this PR is to avoid the decoding of one side as the query values are consistent during the intersect. I'll do some JMH benchmark to confirm if there is any speed up. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on a diff in pull request #12003: Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery
gsmiller commented on code in PR #12003: URL: https://github.com/apache/lucene/pull/12003#discussion_r1045104269 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java: ## @@ -212,9 +212,9 @@ public boolean isCacheable(LeafReaderContext ctx) { @Override public int count(LeafReaderContext context) throws IOException { if (context.reader().hasDeletions() == false) { - BoundedDocIdSetIterator disi = getDocIdSetIteratorOrNull(context); - if (disi != null && disi.delegate == null) { -return disi.lastDoc - disi.firstDoc; + DocIdSetIterator disi = getDocIdSetIteratorOrNull(context); + if (disi != null && disi instanceof BoundedDocIdSetIterator == false) { +return Math.toIntExact(disi.cost()); Review Comment: That's a great point. I'll work on making this less fragile. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gf2121 commented on pull request #12006: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries
gf2121 commented on PR #12006: URL: https://github.com/apache/lucene/pull/12006#issuecomment-1345303453 Here is the JMH result: ``` Benchmark Mode Cnt Score Error Units ReadInts24Benchmark.compareUnsigned4 thrpt 10 0.224 ± 0.003 ops/us ReadInts24Benchmark.numericUtilDecode thrpt 10 0.224 ± 0.006 ops/us ReadInts24Benchmark.numericUtilDecodeOneSide thrpt 10 0.995 ± 0.006 ops/us ``` * compareUnsigned4 is using `Integer.compareUnsigned((int) BitUtil.VH_BE_INT.get(a, aOffset), (int) BitUtil.VH_BE_INT.get(b, bOffset));` * numericUtilDecode is using `Integer.compare(NumericUtils.sortableBytesToInt(a, aOffset), NumericUtils.sortableBytesToInt(b, bOffset));` * numericUtilDecodeOneSide is using `aValue > NumericUtils.sortableBytesToInt(b, bOffset)` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12006: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries
rmuir commented on PR #12006: URL: https://github.com/apache/lucene/pull/12006#issuecomment-1345341579 makes sense to me, as numericutils is basically just varhandle fetch with an xor. thanks for testing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on a diff in pull request #11901: Github#11869: Add RangeOnRangeFacetCounts
gsmiller commented on code in PR #11901: URL: https://github.com/apache/lucene/pull/11901#discussion_r1045109686 ## lucene/facet/src/java/org/apache/lucene/facet/rangeonrange/RangeOnRangeFacetCounts.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.rangeonrange; + +import java.io.IOException; +import java.util.Collections; +import java.util.List; +import org.apache.lucene.document.BinaryRangeDocValues; +import org.apache.lucene.document.RangeFieldQuery; +import org.apache.lucene.facet.FacetCountsWithFilterQuery; +import org.apache.lucene.facet.FacetResult; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.LabelAndValue; +import org.apache.lucene.index.DocValues; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.search.Query; +import org.apache.lucene.util.ArrayUtil; +import org.apache.lucene.util.PriorityQueue; + +abstract class RangeOnRangeFacetCounts extends FacetCountsWithFilterQuery { + + private final byte[][] encodedRanges; + private final String[] labels; + private final int numEncodedValueBytes; + private final int dims; + + /** Counts, initialized in by subclass. */ + protected final int[] counts; + + /** Our field name. */ + protected final String field; + + /** Total number of hits. */ + protected int totCount; + + private final ArrayUtil.ByteArrayComparator comparator; + + /** Type of "range overlap" we want to count. */ + RangeFieldQuery.QueryType queryType; + + protected RangeOnRangeFacetCounts( + String field, + FacetsCollector hits, + RangeFieldQuery.QueryType queryType, + Query fastMatchQuery, + int numEncodedValueBytes, + byte[][] encodedRanges, + String[] labels) + throws IOException { +super(fastMatchQuery); + +assert encodedRanges.length == labels.length; +assert encodedRanges[0].length % (2 * numEncodedValueBytes) == 0; + +this.encodedRanges = encodedRanges; +this.field = field; +this.labels = labels; +this.numEncodedValueBytes = numEncodedValueBytes; +this.dims = encodedRanges[0].length / (2 * this.numEncodedValueBytes); +this.queryType = queryType; +this.comparator = ArrayUtil.getUnsignedComparator(this.numEncodedValueBytes); +this.counts = new int[encodedRanges.length]; +count(field, hits.getMatchingDocs()); + } + + /** Counts from the provided field. */ + protected void count(String field, List matchingDocs) + throws IOException { +// TODO: We currently just exhaustively check the ranges in each document with every range in +// the ranges array. +// We might be able to do something more efficient here by grouping the ranges array into a +// space partitioning +// data structure of some sort. + +int missingCount = 0; + +for (FacetsCollector.MatchingDocs hits : matchingDocs) { + + BinaryRangeDocValues binaryRangeDocValues = + new BinaryRangeDocValues( + DocValues.getBinary(hits.context.reader(), field), dims, numEncodedValueBytes); + + final DocIdSetIterator it = createIterator(hits); Review Comment: Forgot to mention earlier, but I love that you've added support for fast-match queries. And that it was so straight-forward to do so after that functionality was factored out into its own class. Nice! ## lucene/facet/src/java/org/apache/lucene/facet/rangeonrange/LongRangeOnRangeFacetCounts.java: ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or imp
[GitHub] [lucene] gf2121 merged pull request #12006: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries
gf2121 merged PR #12006: URL: https://github.com/apache/lucene/pull/12006 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gf2121 opened a new pull request, #12007: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries (Backport 9x)
gf2121 opened a new pull request, #12007: URL: https://github.com/apache/lucene/pull/12007 Backport of https://github.com/apache/lucene/pull/12006 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gf2121 merged pull request #12007: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries (Backport 9x)
gf2121 merged PR #12007: URL: https://github.com/apache/lucene/pull/12007 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller opened a new pull request, #12008: Remove unnecessary NaN checks from LongRange#verifyAndEncode
gsmiller opened a new pull request, #12008: URL: https://github.com/apache/lucene/pull/12008 ### Description Minor tweak to logic that appears to have been copy/pasted from `DoubleRange`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller merged pull request #12003: Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery
gsmiller merged PR #12003: URL: https://github.com/apache/lucene/pull/12003 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12008: Remove unnecessary NaN checks from LongRange#verifyAndEncode
rmuir commented on PR #12008: URL: https://github.com/apache/lucene/pull/12008#issuecomment-1345382186 I think we should open a followup issue to look into this check. I enabled the check (without your patch) to make sure it finds this bug, and it did, and it also finds other stuff: [log.txt](https://github.com/apache/lucene/files/10200907/log.txt) I think if we really intend to cast a long to a double, we can do it explicitly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on pull request #12003: Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery
gsmiller commented on PR #12003: URL: https://github.com/apache/lucene/pull/12003#issuecomment-1345428478 Thanks @jpountz. Merged/backported. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on issue #12005: Should we fail retrieving doc values on the wrong type?
rmuir commented on issue #12005: URL: https://github.com/apache/lucene/issues/12005#issuecomment-1345435529 Don't we have the `DocValues` class already for this? If you are writing an algorithm on string docvalues, you just call `DocValues.getSortedSet`. * If the user indexed it as single-valued `SORTED`, it gift-wraps it as a singleton SortedSet instance * If the user indexed it as multi-valued `SORTED_SET`, it returns it * If the user indexed it as something else (e.g. number), it throws error. Then, if you wish, you can specialize the single-valued case by checking `DocValues.unwrapSingleton`. * If the user indexed it as single-valued `SORTED`, you can process it with specialized algorithm. * If the user indexed it as multi-valued `SORTED` but no documents actually have multiple values, you can process it with specialized algorithm (since codecs use this same "gift-wrapping"). I think for stuff like queries this is always the way to go. Maybe just javadoc-link it from LeafReader.java? But IMO leafreader should stay simple and not do such "shenanigans" as things like CheckIndex, merge, etc need to work low-level. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org