[GitHub] [lucene] jpountz commented on pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.
jpountz commented on PR #12055: URL: https://github.com/apache/lucene/pull/12055#issuecomment-1438199152 Thanks Greg for sharing more info about how it helped on Amazon Product search. Do your queries early terminate somehow (in which case I'd expect this change to help the most since it can skip evaluating the tail of long postings)? I like the idea of having multiple rewrite methods and possibly an `auto` method that tries to guess a sensible rewrite method given index statistics. It helps keep things simple without having a single rewrite method that needs to be heroic. Reuse of postings enums looks ok to me, we could improve naming and add more comments to make it more obviously ok, but we only create up to 16 postings enums from scratch, reuse otherwise, and make sure to never reuse a postings enum that is in the priority queue. The threshold of 16 looks conservative to me so I wouldn't worry about NIOFSDirectory, if we have a problem with NIOFSDirectory and this threshold of 16 then many simple boolean queries have problems too, which I don't think is the case in practice? The threshold on the minimum document frequency should also help here, e.g. a near-PK field would only accumulate hits into a DocIdSetBuilder and not pull postings enums? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] henryrneh opened a new issue, #12165: Integrating Apache Lucene into OSS-Fuzz
henryrneh opened a new issue, #12165: URL: https://github.com/apache/lucene/issues/12165 ### Description Hi Apache Lucene developers, We have prepared the [initial integration](https://github.com/google/oss-fuzz/pull/9772) of Apache Lucene into [Google OSS-Fuzz](https://github.com/google/oss-fuzz) which will provide more security for your project. **Why do you need Fuzzing?** The Code Intelligence JVM fuzzer [Jazzer](https://github.com/CodeIntelligenceTesting/jazzer) has already found [hundreds of bugs](https://github.com/CodeIntelligenceTesting/jazzer#findings) in open source projects including for example [OpenJDK](https://nvd.nist.gov/vuln/detail/CVE-2022-21360), [Protobuf](https://nvd.nist.gov/vuln/detail/CVE-2021-22569) or [jsoup](https://github.com/jhy/jsoup/security/advisories/GHSA-m72m-mhq2-9p6c). Fuzzing proved to be very effective having no false positives. It provides a crashing input which helps you to reproduce and debug any finding easily. The integration of your project into the OSS-Fuzz platform will enable continuous fuzzing of your project by [Jazzer](https://github.com/CodeIntelligenceTesting/jazzer). **What do you need to do?** The integration requires the maintainer or one established project committer to deal with the bug reports. You need to create or provide one email address that is associated with a google account as per [here](https://google.github.io/oss-fuzz/getting-started/accepting-new-projects/). When a bug is found, you will receive an email that will provide you with access to ClusterFuzz, crash reports, code coverage reports and fuzzer statistics. More than 1 person can be included. **How can Code Intelligence support you?** We will continue to add more fuzz targets to improve code coverage over time. Furthermore, we are permanently enhancing fuzzing technologies by developing new fuzzers and bug detectors. Please let me know if you have any questions regarding fuzzing or the OSS-Fuzz integration. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz
dweiss commented on issue #12165: URL: https://github.com/apache/lucene/issues/12165#issuecomment-1438317336 Thank you. Your contribution is appreciated but Lucene already uses what you call a "fuzzer" - a reproducible, pseudo-random component assembly for tests... In fact, we have used it for many years now and indeed it's been successful at finding bugs in both Lucene and the JVM runtime. I'm not sure if there's any added value in what this patch provides. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] henryrneh commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz
henryrneh commented on issue #12165: URL: https://github.com/apache/lucene/issues/12165#issuecomment-1438392493 Hello @dweiss, great to hear that Apache Lucene is already using fuzzing! The big value is that computation power is sponsored by Google and that it is fuzzed by [Jazzer](https://github.com/CodeIntelligenceTesting/jazzer) the most modern state of the art fuzzer for JVM languages which maximise code coverage automatically based on feedback and which is enhanced regularly with new bug detectors. Please let me know if you have further questions regarding OSS-Fuzz and Jazzer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.
jpountz commented on PR #12055: URL: https://github.com/apache/lucene/pull/12055#issuecomment-1438404538 OK I think I better understand the concern around the slowness with NIOFSDirectory now. With a single PostingsEnum getting reused, a single BufferedIndexInput refill would buffer postings lists for multiple terms at once, so next calls to `Terms#postings` and `PostingsEnum#nextDoc` would read from the buffer. We're losing this property by reusing across up to 16+1 PostingsEnums, though the fact that the top 16 postings enums that have the highest doc freq should get relatively stable after some time, ie. most terms would have a lower doc freq, plus the threshold on docFreq=512 should help too, as we're always reusing the last postings enum in that case, and postings lists that have more matches would likely need to refill their buffer multiple times to read all matching docs anyway. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12162: Add LatLonField class to index both LatLonPoint and LatLonDocValues
rmuir commented on PR #12162: URL: https://github.com/apache/lucene/pull/12162#issuecomment-1438418170 major problem with newGeometryQuery is that, people think it is some universal language to speak across points and shapes. its not. points are infinitely small and have no "mass": the relationships here make no sense for them. the API is bad and we must stop proliferation of it for points. I don't care about use of it for shapes since only 0.001% of users give a crap about shapes, but we can't let points get corrupted by these horrible apis. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent merged pull request #12152: Minor vector search matching doc optimizations
benwtrent merged PR #12152: URL: https://github.com/apache/lucene/pull/12152 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12162: Add LatLonField class to index both LatLonPoint and LatLonDocValues
rmuir commented on PR #12162: URL: https://github.com/apache/lucene/pull/12162#issuecomment-1438428404 just look at the code to see what i mean: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/document/LatLonPoint.java#L325-L341 The worst part is, the user who just wants to find points in their bounding box or simple polygon, doesn't think that they should be using `INTERSECTS` because the verb is senseless when thinking of something that is infinitely small. This disastrous api should be completely removed from LatLonPoint, too. Keep it for shapes if you want. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a diff in pull request #12162: Add LatLonField class to index both LatLonPoint and LatLonDocValues
jpountz commented on code in PR #12162: URL: https://github.com/apache/lucene/pull/12162#discussion_r1112997446 ## lucene/CHANGES.txt: ## @@ -112,6 +112,9 @@ API Changes * GITHUB#12129: Move DocValuesTermsQuery from sandbox to SortedDocValuesField#newSlowSetQuery and SortedSetDocValuesField#newSlowSetQuery. (Robert Muir) +* GITHUB#12161: Introduce LatLonField which ndex both LatLonField and LatLonDDocValues. Review Comment: ```suggestion * GITHUB#12161: Introduce LatLonField which indexes both LatLonPoint and LatLonDocValuesField. ``` ## lucene/core/src/java/org/apache/lucene/document/LatLonField.java: ## @@ -0,0 +1,297 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.document; + +import static org.apache.lucene.geo.GeoEncodingUtils.decodeLatitude; +import static org.apache.lucene.geo.GeoEncodingUtils.decodeLongitude; +import static org.apache.lucene.geo.GeoEncodingUtils.encodeLatitude; +import static org.apache.lucene.geo.GeoEncodingUtils.encodeLongitude; + +import java.io.IOException; +import org.apache.lucene.geo.LatLonGeometry; +import org.apache.lucene.geo.Polygon; +import org.apache.lucene.index.DocValuesType; +import org.apache.lucene.search.BooleanClause.Occur; +import org.apache.lucene.search.BooleanQuery; +import org.apache.lucene.search.FieldDoc; +import org.apache.lucene.search.IndexOrDocValuesQuery; +import org.apache.lucene.search.IndexSearcher; +import org.apache.lucene.search.MatchAllDocsQuery; +import org.apache.lucene.search.Query; +import org.apache.lucene.search.SortField; +import org.apache.lucene.search.TopFieldDocs; +import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.NumericUtils; + +/** + * An indexed location field for querying and sorting. If you need more fine-grained control you can + * use {@link LatLonPoint} and {@link LatLonDocValuesField}. + * + * Finding all documents within a range at search time is efficient. Multiple values for the same + * field in one document is allowed. + * + * This field defines static factory methods for common operations: + * + * + * {@link #newGeometryQuery newGeometryQuery()} for matching points complying with a spatial + * relationship with an arbitrary geometry. + * {@link #newDistanceFeatureQuery newDistanceFeatureQuery()} for returning points scored by + * distance to a specified location. + * {@link #nearest nearest()} for returning the nearest points from a specified location. + * {@link #newDistanceSort newDistanceSort()} for ordering documents by distance from a + * specified location. Review Comment: Mention the box and polygon queries too? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #12162: Add LatLonField class to index both LatLonPoint and LatLonDocValues
jpountz commented on PR #12162: URL: https://github.com/apache/lucene/pull/12162#issuecomment-1438438628 For `KeywordField` and `LongField`/`DoubleField` we ended up adding an option to the ctor to store the field. This PR doesn't have this, but I'm unsure what should be the canonical representation of a geo point in stored fields. So maybe it's best to leave it to the app depending on whether they'd rather store it as a string, two doubles, or something else? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #12139: Skip the TokenStream overhead when indexing simple keywords.
jpountz merged PR #12139: URL: https://github.com/apache/lucene/pull/12139 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12162: Add LatLonField class to index both LatLonPoint and LatLonDocValues
rmuir commented on PR #12162: URL: https://github.com/apache/lucene/pull/12162#issuecomment-1438469151 Perhaps we need to walk thru an example. A user wants to only documents within 25km of their current location, very typical. This signature is pretty intuitive for that use-case: ``` Query newDistanceQuery(String field, double latitude, double longitude, double radiusMeters); ``` This one is not: ``` Query newGeometryQuery(String field, ShapeField.QueryRelation queryRelation, LatLonGeometry... latLonGeometries); ``` * ShapeField.QueryRelation: Which one should i use? The javadocs provides **zero help** on what the relationships mean: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/document/ShapeField.java#L119-L128. Furthermore, many of these relationships simply **don't make sense** for points which are infinitely small. "oh but it works for both shapes and points" is like saying "oh but i made an abstraction that works for both quicksorts and hashtables", it doesnt make sense why you did that in the first place, you just made shit confusing for no good reason. * LatLonGeometry...: this isn't too helpful, again we have a class has **zero javadocs** https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/geo/LatLonGeometry.java#L20. So at the end of the day, as a simple user trying to do a distance query, i can dig thru all these crazy relationships and abstractions, and ultimately determine that in order to find documents within 25km i need to create a `Circle` object and then ask for documents that `INTERSECTS` it with my point? Or i could just use a non-fucking-insane api that just takes my point and distance in kilometers as parameters. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] Tjianke commented on issue #11707: Re-evaluate different ways to encode postings [LUCENE-10672]
Tjianke commented on issue #11707: URL: https://github.com/apache/lucene/issues/11707#issuecomment-1438514073 Any progress on this issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase commented on pull request #12162: Add LatLonField class to index both LatLonPoint and LatLonDocValues
iverase commented on PR #12162: URL: https://github.com/apache/lucene/pull/12162#issuecomment-1438631454 > For KeywordField and LongField/DoubleField we ended up adding an option to the ctor to store the field. This PR doesn't have this, but I'm unsure what should be the canonical representation of a geo point in stored fields. So maybe it's best to leave it to the app depending on whether they'd rather store it as a string, two doubles, or something else? I noticed that what I decided not to add it because as you said, there is no canonical representation of a point. I added javadocs to LatLonGeometry and the #newGeometryQuery function. In addition I tagged the method as expert so hopefully that overcomes the issues raised here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] nknize commented on issue #11829: Reproducible TestShapeDocValues failure
nknize commented on issue #11829: URL: https://github.com/apache/lucene/issues/11829#issuecomment-1438669547 I wasn't getting notifications on this for some reason, so thanks for stepping in @ioanatia! Closing this issue for now and will re-open if we see another failure. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] nknize closed issue #11829: Reproducible TestShapeDocValues failure
nknize closed issue #11829: Reproducible TestShapeDocValues failure URL: https://github.com/apache/lucene/issues/11829 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz
uschindler commented on issue #12165: URL: https://github.com/apache/lucene/issues/12165#issuecomment-1438783686 I checked the patch in the related issue. It fuzzes analyzer creation with some data and also has some fuzzing for IndexSearcher. That's nothing new, e.g., we have our own Analysis Fuzzer already: https://github.com/apache/lucene/blob/main/lucene/analysis.tests/src/test/org/apache/lucene/analysis/tests/TestRandomChains.java So I disagree with adding another fuzzing engine into Lucene. We have a library called "randomized-testing" which provides everything needed. Almost every test in Lucene has fuzzing included, the example above is just a very special one with a very wide range of components tested. Background: Lucene is using randomized testing since around 2012. Here is a talk from 2014 by @dweiss about it: https://2019.berlinbuzzwords.de/14/session/randomize-your-tests-and-it-will-blow-your-socks.html At moment we have fuzzing not only in our own tests, the so called "Policeman Jenkins" Server (https://jenkine.thetaphi.de) runs our test suite in an endless loop and on top of that each run is using a different Java version and different settings for garbage collection and java's pointer size. If you provide computing power to Lucene and ASF we are happy to use it, but it is enough to run Lucene's test suite in a loop. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz
rmuir commented on issue #12165: URL: https://github.com/apache/lucene/issues/12165#issuecomment-1438859463 In the analyzers example given there, it is a good one to see the differences. Both approaches (OSS Fuzz and existing TestRandomChains) test "random analysis chains", but the current TestRandomChains also tests all possible ctors of these analysis components (not just the default constructor), and injects random stuff into them. Default constructor is usually tested anyway in the component's own unit tests with fuzzed data (see testRandomData() methods everywhere). fuzzing the ctors in this way, finds problems e.g. if a component is e.g. missing a sanity or range check on an integer parameter. This might even be more productive overall than actually feeding "fuzzed data". The current TestRandomChains also doesn't require us to "register" any new components, it just discovers all Tokenizers, CharFilters, TokenFilters, etc that are available. This ensures we catch problems in new analyzers that get added. As far as actual data fuzzing, it is more than just randomized data, have a look at our base analyzers test class, it is a "torture chamber" for analyzers and will find things such as thread-safety/race issues as well: https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/tests/analysis/BaseTokenStreamTestCase.java All analyzers use this class for "fuzzing" in their own unit test: TestRandomChains is just an "integration test" that then combines them together. It is also important to think about how much time it takes to debug a failure, too. The current setup across both unit and integration tests makes it pretty easy to spot when the problem is a specific analyzer component, vs some crazy "interaction" between more than one of them. Nobody wants to debug a integration test if they can debug a unit test. We did a lot of work with BaseTokenStreamTestCase/TestRandomChains such as adding special logging of the analysis chain, adding "ValidatingTokenFilter" at every step,etc. It still sucks to debug this stuff when it fails, we have a lot of analyzers :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih commented on a diff in pull request #12160: Concurrent rewrite for KnnVectorQuery
zhaih commented on code in PR #12160: URL: https://github.com/apache/lucene/pull/12160#discussion_r1113630957 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -73,17 +76,41 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException { .build(); Query rewritten = indexSearcher.rewrite(booleanQuery); filterWeight = indexSearcher.createWeight(rewritten, ScoreMode.COMPLETE_NO_SCORES, 1f); +} else { + filterWeight = null; } -for (LeafReaderContext ctx : reader.leaves()) { - TopDocs results = searchLeaf(ctx, filterWeight); - if (ctx.docBase > 0) { -for (ScoreDoc scoreDoc : results.scoreDocs) { - scoreDoc.doc += ctx.docBase; -} - } - perLeafResults[ctx.ord] = results; -} +TopDocs[] perLeafResults = +reader.leaves().stream() +.map( +ctx -> { + Supplier supplier = + () -> { +try { + TopDocs results = searchLeaf(ctx, filterWeight); + if (ctx.docBase > 0) { +for (ScoreDoc scoreDoc : results.scoreDocs) { + scoreDoc.doc += ctx.docBase; +} + } + return results; +} catch (Exception e) { + throw new CompletionException(e); +} + }; + + Executor executor = indexSearcher.getExecutor(); + if (executor == null) { +return CompletableFuture.completedFuture(supplier.get()); + } else { +return CompletableFuture.supplyAsync(supplier, executor); Review Comment: In `IndexSearcher` we're using [`SliceExecutor`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/SliceExecutor.java#L67) to make sure the main thread is also doing some work but not only wait for joining. I think we can replicate the same logic here? (Since KNN search is likely to be slow so probably the main thread should do some work as well?) Maybe we can just use the `SliceExecutor` from `IndexSearcher` so that it might also kind of solving the load balancing problem? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih commented on a diff in pull request #12158: Clone the BytesRef[] values in KeywordField#newSetQuery
zhaih commented on code in PR #12158: URL: https://github.com/apache/lucene/pull/12158#discussion_r1113646765 ## lucene/core/src/java/org/apache/lucene/document/KeywordField.java: ## @@ -169,7 +169,7 @@ public static Query newSetQuery(String field, BytesRef... values) { Objects.requireNonNull(field, "field must not be null"); Objects.requireNonNull(values, "values must not be null"); return new IndexOrDocValuesQuery( -new TermInSetQuery(field, values), new SortedSetDocValuesSetQuery(field, values)); +new TermInSetQuery(field, values), new SortedSetDocValuesSetQuery(field, values.clone())); Review Comment: Should we just move the clone into the constructor to prevent future mistakes? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org