[GitHub] [lucene] jpountz merged pull request #12064: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String)
jpountz merged PR #12064: URL: https://github.com/apache/lucene/pull/12064 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #12064: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String)
jpountz commented on PR #12064: URL: https://github.com/apache/lucene/pull/12064#issuecomment-1378387639 @benwtrent Would you mind working on a backport PR, since there are a few conflicts that need resolving? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] LuXugang opened a new issue, #12074: Enhance XXXField#newRangeQuery
LuXugang opened a new issue, #12074: URL: https://github.com/apache/lucene/issues/12074 ### Description Since dim of XXXField's point value is always `1`, should we introduce `IndexSortSortedNumericDocValuesRangeQuery` to `IntFiled#newRangeQuery` and `LongField#newRangeQuery` ? This idea is also mentioned by @jpountz in [LUCENE-10446](https://issues.apache.org/jira/browse/LUCENE-10446). like : ```java public static Query newRangeQuery(String field, int lowerValue, int upperValue) { PointRangeQuery.checkArgs(field, lowerValue, upperValue); Query fallbackQuery = IntPoint.newRangeQuery(field, lowerValue, upperValue); return new IndexOrDocValuesQuery( new IndexSortSortedNumericDocValuesRangeQuery(field, lowerValue, upperValue, fallbackQuery), SortedNumericDocValuesField.newSlowRangeQuery(field, lowerValue, upperValue)); } ``` If above is possible, we should also need to move `IndexSortSortedNumericDocValuesRangeQuery` from sandbox to core module? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #12052: Cut over Lucene Demo from LongPoint to LongField.
jpountz merged PR #12052: URL: https://github.com/apache/lucene/pull/12052 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #12070: Never throttle creation of compound files.
jpountz merged PR #12070: URL: https://github.com/apache/lucene/pull/12070 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz closed issue #12068: Is it right to throttle the creation of compound files?
jpountz closed issue #12068: Is it right to throttle the creation of compound files? URL: https://github.com/apache/lucene/issues/12068 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #12053: Allow reusing indexed binary fields.
jpountz commented on PR #12053: URL: https://github.com/apache/lucene/pull/12053#issuecomment-1378448862 @rmuir Do you have thoughts on this change? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12053: Allow reusing indexed binary fields.
rmuir commented on PR #12053: URL: https://github.com/apache/lucene/pull/12053#issuecomment-1378455423 @jpountz I will try to review this today. Sorry for the delay. I haven't written java code in years, i'm crazy busy at work, and i try to give more time to `.document` api all contributing to my slowness. I had in mind that i wanted to checkout this branch and "poke" too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #12053: Allow reusing indexed binary fields.
jpountz commented on PR #12053: URL: https://github.com/apache/lucene/pull/12053#issuecomment-1378457531 Thank you, and no worries at all about the delay, I just wanted to check if it was still on your mind since you said you were interested in looking into it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent opened a new pull request, #12075: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String) (#12064)
benwtrent opened a new pull request, #12075: URL: https://github.com/apache/lucene/pull/12075 Backport of #12064 This completes the refactoring as described in: https://github.com/apache/lucene/issues/11963 This commit: - splits out `ByteVectorValues` from `VectorValues`. - Adds `getByteVectorValues(String field)` to `KnnVectorsReader` - Adds a new `KnnByteVectorField` and disallows `BytesRef` values in the `KnnVectorField` - No longer allows `ByteVectorValues` to be read from a `KnnVectorField`. These refactors are difficult to split up any further. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on pull request #12075: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String) (#12064)
benwtrent commented on PR #12075: URL: https://github.com/apache/lucene/pull/12075#issuecomment-1378802761 @jpountz backport :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] sherman opened a new issue, #12076: The question about MultiRangeQuery.
sherman opened a new issue, #12076: URL: https://github.com/apache/lucene/issues/12076 Hi! AFAICS, the family of multiple ranges queries (MultiRangeQuery) were added to the sandbox module recently. In our index we have a few indexed fields with type long (aren't ranges, it's just long ids, dimension = 1). We have about 16% of search queries which include a predicate like: where field in (id1, id2, id3), and the length of argument list is over 40 elements. Currently, our parser maps such predicates as a list of separate PointRangeQuery. When I found MultiRangeQuery, I immediately started the tests to see the way how to optimize such queries. The first results were impressive, because I saw the difference in the amount of data were read from mapped files. The list of PointRangeQuery ``` 16:39:23.587 [ProfilingSearchThread] INFO com.ozon.grpc.BlockingRequestExecutor - Query statistics [null]. Seeks: [2363], Bytes were read: [78198], Shorts were read: [1266], Ints were read: [48], Longs were read: [9821], Floats were read: [0], Total bytes were read: [159490] 16:39:27.030 [ProfilingSearchThread] INFO com.ozon.grpc.BlockingRequestExecutor - Query statistics [null]. Seeks: [2363], Bytes were read: [78198], Shorts were read: [1266], Ints were read: [48], Longs were read: [9821], Floats were read: [0], Total bytes were read: [159490] 16:39:35.890 [ProfilingSearchThread] INFO com.ozon.grpc.BlockingRequestExecutor - Query statistics [null]. Seeks: [2363], Bytes were read: [78198], Shorts were read: [1266], Ints were read: [48], Longs were read: [9821], Floats were read: [0], Total bytes were read: [159490] ``` A single MultiRangeQuery ``` 16:49:02.372 [ProfilingSearchThread] INFO com.ozon.grpc.BlockingRequestExecutor - Query statistics [null]. Seeks: [505], Bytes were read: [61917], Shorts were read: [1266], Ints were read: [48], Longs were read: [9053], Floats were read: [0], Total bytes were read: [137065] 16:49:03.794 [ProfilingSearchThread] INFO com.ozon.grpc.BlockingRequestExecutor - Query statistics [null]. Seeks: [505], Bytes were read: [61917], Shorts were read: [1266], Ints were read: [48], Longs were read: [9053], Floats were read: [0], Total bytes were read: [137065] 16:49:05.582 [ProfilingSearchThread] INFO com.ozon.grpc.BlockingRequestExecutor - Query statistics [null]. Seeks: [505], Bytes were read: [61917], Shorts were read: [1266], Ints were read: [48], Longs were read: [9053], Floats were read: [0], Total bytes were read: [137065] ``` The numbers above said in case of MultiRangeQuery we have a reduction of seek and read long operations, as expected. Surprisingly, MultiRangeQuery is slower in terms of wall clock time. I compared the timings for the same queries and I see the significant slowdown in the build score stage. The list of PointRangeQuery ``` "timings": { "score": "50297", "build_scorer_count": "2", "match_count": "29", "create_weight": "562343", "next_doc": "252004", "match": "12136", "create_weight_count": "1", "next_doc_count": "30", "score_count": "29", "build_scorer": "2884027", "advance": "0", "advance_count": "0" }, "statistics": { "match_avg": "418", "next_doc_avg": "8400", "build_scorer_avg": "1442013", "score_avg": "1734" }, "total": "3760898" } ``` A single MultiRangeQuery ``` timings": { "score": "7793", "build_scorer_count": "2", "match_count": "0", "create_weight": "7762", "next_doc": "12713", "match": "0", "create_weight_count": "1", "next_doc_count": "30", "score_count": "29", "build_scorer": "5283654", "advance": "0", "advance_count": "0" }, "statistics": { "next_doc_avg": "423", "build_scorer_avg": "2641827", "score_avg": "268" }, "total": "5311984" } ``` As you can see, build_scorer is pretty "fat" in MultiRangeQuery. I guess, the most of time was spent for a building an interval tree in the method: `MultiRangeQuery.scorerSupplier()` Is it expected behaviour? And what the case of usage for MultiRangeQuery (a number of ranges reduction through the merges is probably one of, but it's not mine)? Is any known problems in the interval tree implementation? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-
[GitHub] [lucene] jpountz merged pull request #12075: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String) (#12064)
jpountz merged PR #12075: URL: https://github.com/apache/lucene/pull/12075 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #12075: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String) (#12064)
jpountz commented on PR #12075: URL: https://github.com/apache/lucene/pull/12075#issuecomment-1378896877 Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check
jpountz commented on PR #12073: URL: https://github.com/apache/lucene/pull/12073#issuecomment-1378939790 Do these queries actually use `ReqExclScorer`? I would have expected them to use `ReqExclBulkScorer`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check
rmuir commented on PR #12073: URL: https://github.com/apache/lucene/pull/12073#issuecomment-1378947355 I cant keep up with the conditions where bulk is unsuitable. But just generally, my concern is to implement a best-case 1-2% speedup and create a large worst-case slowdown somewhere else. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check
jpountz commented on PR #12073: URL: https://github.com/apache/lucene/pull/12073#issuecomment-1378951278 Sorry my comment was not targeted at your comment but at understanding why we're seeing a speedup with this change, since I wouldn't expect this scorer to be used (though my understanding could be outdated). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067144911 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assertEquals(new MatchNoDocsQuery(), searcher.rewrite(bq2)); } + public void testDeeplyNestedBooleanRewrite() throws IOException { +IndexSearcher searcher = newSearcher(new MultiReader()); + +Directory dir = newDirectory(); +(new RandomIndexWriter(random(), dir)).close(); +IndexReader r = DirectoryReader.open(dir); +Function termQueryFunction = +(i) -> new TermQuery(new Term("layer[" + i + "]", "foo")); +int depth = TestUtil.nextInt(random(), 10, 30); +TermQuery tq = termQueryFunction.apply(depth); +Query expectedQuery = new BooleanQuery.Builder().add(tq, Occur.FILTER).build(); Review Comment: Would it make sense to write a similar test for boolean queries that do need scoring? I am worried that the wrapping we do in the main rewrite has some impact too but we don't exercise it as much with this test. ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,17 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOException { for (BooleanClause clause : clauses) { Query query = clause.getQuery(); - Query rewritten = new ConstantScoreQuery(query).rewrite(indexSearcher); - if (rewritten instanceof ConstantScoreQuery) { -rewritten = ((ConstantScoreQuery) rewritten).getQuery(); + Query rewritten; + // If the clause query is a boolean query, we shouldn't call ConstantScoreQuery#rewrite as + // this causes + // exponential growth of runtime. + if (query instanceof BooleanQuery booleanQuery) { +rewritten = booleanQuery.rewriteNoScoring(indexSearcher); Review Comment: I would make the same change to the main rewrite method. That will reduce further the amount of time we spend on needlessly rewriting boolean queries. ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assertEquals(new MatchNoDocsQuery(), searcher.rewrite(bq2)); } + public void testDeeplyNestedBooleanRewrite() throws IOException { +IndexSearcher searcher = newSearcher(new MultiReader()); + +Directory dir = newDirectory(); +(new RandomIndexWriter(random(), dir)).close(); +IndexReader r = DirectoryReader.open(dir); +Function termQueryFunction = +(i) -> new TermQuery(new Term("layer[" + i + "]", "foo")); +int depth = TestUtil.nextInt(random(), 10, 30); +TermQuery tq = termQueryFunction.apply(depth); +Query expectedQuery = new BooleanQuery.Builder().add(tq, Occur.FILTER).build(); +Query deepBuilder = new BooleanQuery.Builder().add(tq, Occur.MUST).build(); +for (int i = depth; i > 0; i--) { + tq = termQueryFunction.apply(i); + // Do this to accurately set setMinimumNumberShouldMatch to the number of should clauses. + // This makes setting expectation for rewrite much easier. + boolean useShoulds = random().nextBoolean(); Review Comment: I wonder if we should have a separate test with should clauses, just to simplify the test a bit. ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assertEquals(new MatchNoDocsQuery(), searcher.rewrite(bq2)); } + public void testDeeplyNestedBooleanRewrite() throws IOException { +IndexSearcher searcher = newSearcher(new MultiReader()); + +Directory dir = newDirectory(); +(new RandomIndexWriter(random(), dir)).close(); +IndexReader r = DirectoryReader.open(dir); +Function termQueryFunction = +(i) -> new TermQuery(new Term("layer[" + i + "]", "foo")); +int depth = TestUtil.nextInt(random(), 10, 30); +TermQuery tq = termQueryFunction.apply(depth); +Query expectedQuery = new BooleanQuery.Builder().add(tq, Occur.FILTER).build(); +Query deepBuilder = new BooleanQuery.Builder().add(tq, Occur.MUST).build(); +for (int i = depth; i > 0; i--) { + tq = termQueryFunction.apply(i); + // Do this to accurately set setMinimumNumberShouldMatch to the number of should clauses. + // This makes setting expectation for rewrite much easier. + boolean useShoulds = random().nextBoolean(); + BooleanQuery.Builder bq = + new BooleanQuery.Builder() + .setMinimumNumberShouldMatch(useShoulds ? 2 : 0) + .add(tq, useShoulds ? Occur.SHOULD : Occur.MUST) + .add(deepBuilder, useShoulds ? Occur.SHOULD : Occur.MUST); + deepBuilder = bq.build(); + BooleanQuery.Builder expectedBq = new BooleanQuery.Builder().add(tq, Occur.FILT
[GitHub] [lucene] rmuir commented on issue #12076: The question about MultiRangeQuery.
rmuir commented on issue #12076: URL: https://github.com/apache/lucene/issues/12076#issuecomment-1378993896 > In our index we have a few indexed fields with type long (aren't ranges, it's just long ids, dimension = 1). > > We have about 16% of search queries which include a predicate like: where field in (id1, id2, id3), and the length of argument list is over 40 elements. This sounds like you want to use `LongPoint.newSetQuery()` (e.g. PointInSetQuery) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
benwtrent commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067160561 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assertEquals(new MatchNoDocsQuery(), searcher.rewrite(bq2)); } + public void testDeeplyNestedBooleanRewrite() throws IOException { +IndexSearcher searcher = newSearcher(new MultiReader()); + +Directory dir = newDirectory(); +(new RandomIndexWriter(random(), dir)).close(); +IndexReader r = DirectoryReader.open(dir); +Function termQueryFunction = +(i) -> new TermQuery(new Term("layer[" + i + "]", "foo")); +int depth = TestUtil.nextInt(random(), 10, 30); +TermQuery tq = termQueryFunction.apply(depth); +Query expectedQuery = new BooleanQuery.Builder().add(tq, Occur.FILTER).build(); Review Comment: The key issue for the bug was when there was no scoring, and thus fell into the `rewriteNoScoring` case from the CSQ. All my testing indicates that runtime doesn't exponentially grow when we need scoring as we don't use the CSQ redirect. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
benwtrent commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067162137 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assertEquals(new MatchNoDocsQuery(), searcher.rewrite(bq2)); } + public void testDeeplyNestedBooleanRewrite() throws IOException { +IndexSearcher searcher = newSearcher(new MultiReader()); + +Directory dir = newDirectory(); +(new RandomIndexWriter(random(), dir)).close(); +IndexReader r = DirectoryReader.open(dir); +Function termQueryFunction = +(i) -> new TermQuery(new Term("layer[" + i + "]", "foo")); +int depth = TestUtil.nextInt(random(), 10, 30); +TermQuery tq = termQueryFunction.apply(depth); +Query expectedQuery = new BooleanQuery.Builder().add(tq, Occur.FILTER).build(); +Query deepBuilder = new BooleanQuery.Builder().add(tq, Occur.MUST).build(); +for (int i = depth; i > 0; i--) { + tq = termQueryFunction.apply(i); + // Do this to accurately set setMinimumNumberShouldMatch to the number of should clauses. + // This makes setting expectation for rewrite much easier. + boolean useShoulds = random().nextBoolean(); + BooleanQuery.Builder bq = + new BooleanQuery.Builder() + .setMinimumNumberShouldMatch(useShoulds ? 2 : 0) + .add(tq, useShoulds ? Occur.SHOULD : Occur.MUST) + .add(deepBuilder, useShoulds ? Occur.SHOULD : Occur.MUST); + deepBuilder = bq.build(); + BooleanQuery.Builder expectedBq = new BooleanQuery.Builder().add(tq, Occur.FILTER); + if (i == depth - 1) { +expectedBq.add(termQueryFunction.apply(depth), Occur.FILTER); + } else { +expectedBq.add(expectedQuery, Occur.FILTER); + } + expectedQuery = expectedBq.build(); +} +BooleanQuery bq = new BooleanQuery.Builder().add(deepBuilder, Occur.FILTER).build(); +expectedQuery = new BoostQuery(new ConstantScoreQuery(expectedQuery), 0.0f); +Query rewritten = searcher.rewrite(bq); +r.close(); +dir.close(); +assertEquals(expectedQuery, rewritten); Review Comment: I wrote this test originally to compare my change with the original implementation. So, I wrote the test to pass the original optimization. Then I verified my change didn't break anything. I did use it for checking runtime as well, but that was secondary to making sure I didn't break anything. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna commented on pull request #12072: Fix exponential runtime for Boolean#rewrite
javanna commented on PR #12072: URL: https://github.com/apache/lucene/pull/12072#issuecomment-1379007346 I did some additional testing to understand the impact of the regression and in light of that I view this as a bug rather than a performance regression, because we end up performing many needless rewrite steps that waste resources and end up making rewrite take tens of seconds depending on the depth of a boolean query. I added some logging to the rewrite methods involved to more clearly show the issue. The following is the output from a boolean query with depth 3. ``` IndexSearcher searcher = newSearcher(new MultiReader()); Directory dir = newDirectory(); (new RandomIndexWriter(random(), dir)).close(); IndexReader r = DirectoryReader.open(dir); int depth = 3; BooleanQuery.Builder bq = new BooleanQuery.Builder().add(new TermQuery(new Term("field", "value")), Occur.MUST); for (int i = 0; i < depth; i++) { bq = new BooleanQuery.Builder().add(bq.build(), Occur.MUST).add(new TermQuery(new Term("depth" + (depth - i), "value")), Occur.MUST); } BooleanQuery booleanQuery = new BooleanQuery.Builder().add(bq.build(), Occur.FILTER).build(); Query rewritten = searcher.rewrite(booleanQuery); System.out.println("final rewritten query: " + rewritten); r.close(); dir.close(); ``` Output before the fix (87 lines): ``` boolean query rewrite: #(+(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value) -> (ConstantScore(+(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value))^0.0 -- top level rewrite round completed + constant score query rewriting inner query: +(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value boolean query rewrite: +field:value -> field:value boolean query rewrite: +(+field:value) +depth3:value -> +field:value +depth3:value boolean query rewrite: +(+(+field:value) +depth3:value) +depth2:value -> +(+field:value +depth3:value) +depth2:value boolean query rewrite: +(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value -> +(+(+field:value +depth3:value) +depth2:value) +depth1:value + constant score query rewriting inner boolean query no scoring: +(+(+field:value +depth3:value) +depth2:value) +depth1:value + Artificially wrapping into constant score query to rewrite + constant score query rewriting inner query: +(+field:value +depth3:value) +depth2:value boolean query rewrite: +field:value +depth3:value -> +field:value +depth3:value boolean query rewrite: +(+field:value +depth3:value) +depth2:value -> +(+field:value +depth3:value) +depth2:value + constant score query rewriting inner boolean query no scoring: +(+field:value +depth3:value) +depth2:value + Artificially wrapping into constant score query to rewrite + constant score query rewriting inner query: +field:value +depth3:value boolean query rewrite: +field:value +depth3:value -> +field:value +depth3:value + constant score query rewriting inner boolean query no scoring: +field:value +depth3:value + Artificially wrapping into constant score query to rewrite + constant score query rewriting inner query: field:value + Artificially wrapping into constant score query to rewrite + constant score query rewriting inner query: depth3:value boolean query rewrite no scoring: +field:value +depth3:value -> #field:value #depth3:value + Artificially wrapping into constant score query to rewrite + constant score query rewriting inner query: depth2:value boolean query rewrite no scoring: +(+field:value +depth3:value) +depth2:value -> #(#field:value #depth3:value) #depth2:value + Artificially wrapping into constant score query to rewrite + constant score query rewriting inner query: depth1:value boolean query rewrite no scoring: +(+(+field:value +depth3:value) +depth2:value) +depth1:value -> #(#(#field:value #depth3:value) #depth2:value) #depth1:value -- top level rewrite round completed + constant score query rewriting inner query: #(#(#field:value #depth3:value) #depth2:value) #depth1:value + constant score query rewriting inner query: #(#field:value #depth3:value) #depth2:value + constant score query rewriting inner query: #field:value #depth3:value + constant score query rewriting inner query: field:value + constant score query rewriting inner query: depth3:value boolean query rewrite: #field:value #depth3:value -> #field:value #depth3:value + constant score query rewriting inner boolean query no scoring: #field:value #depth3:value + Artificially wrapping into constant score query to rewrite + constant score query rewriting inner query: field:value + Artificially wrapping into constant score query to rewrite + constant score query rewriting inner query: depth3:v
[GitHub] [lucene] benwtrent commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
benwtrent commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067175218 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assertEquals(new MatchNoDocsQuery(), searcher.rewrite(bq2)); } + public void testDeeplyNestedBooleanRewrite() throws IOException { +IndexSearcher searcher = newSearcher(new MultiReader()); + +Directory dir = newDirectory(); +(new RandomIndexWriter(random(), dir)).close(); +IndexReader r = DirectoryReader.open(dir); +Function termQueryFunction = +(i) -> new TermQuery(new Term("layer[" + i + "]", "foo")); +int depth = TestUtil.nextInt(random(), 10, 30); +TermQuery tq = termQueryFunction.apply(depth); +Query expectedQuery = new BooleanQuery.Builder().add(tq, Occur.FILTER).build(); Review Comment: Also, the wrapping (I think you mean in the CSQ) we do in the main rewrite is ONLY in the non-scoring case, when `BooleanClause.Occur#FILTER` or `BooleanClause.Occur#MUST_NOT`. So, only `BooleanQuery` objects that are under those non-scoring clauses would be wrapped. I am not sure what we would test in this case... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] sherman closed issue #12076: The question about MultiRangeQuery.
sherman closed issue #12076: The question about MultiRangeQuery. URL: https://github.com/apache/lucene/issues/12076 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] sherman commented on issue #12076: The question about MultiRangeQuery.
sherman commented on issue #12076: URL: https://github.com/apache/lucene/issues/12076#issuecomment-1379062301 @rmuir Awesome! That's what I really need. Thanks a lot! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on pull request #12050: Reuse HNSW graph for intialization during merge
msokolov commented on PR #12050: URL: https://github.com/apache/lucene/pull/12050#issuecomment-1379112305 > To support this functionality, a couple of changes to current graph construction process needed to be made. OnHeapHnswGraph had to support out of order insertion. This is because the mapped ordinals of the nodes in the graph used for initialization are not necessarily the first X ordinals in the new graph. I'm having trouble wrapping my head around this. When we start merging some field, each segment seg has a graph with ordinals in [0,seg.size]. Why can't we preserve the ordinals from the largest segment, and then let the others fall where they may? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on issue #12071: Can we better take advantage of compact strings?
msokolov commented on issue #12071: URL: https://github.com/apache/lucene/issues/12071#issuecomment-1379144309 I wonder if we could update `UnicodeUtil` to use `getBytes` internally? It could check the size of the byte array, and if it is equal to the length of the string, then just return it. Otherwise it could scan the string for the replacement char and update it in place before returning, in order to maintain backward compatibility. Not sure if it would give back the gains in those cases though? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
jpountz commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067223779 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,17 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOException { for (BooleanClause clause : clauses) { Query query = clause.getQuery(); - Query rewritten = new ConstantScoreQuery(query).rewrite(indexSearcher); - if (rewritten instanceof ConstantScoreQuery) { -rewritten = ((ConstantScoreQuery) rewritten).getQuery(); + Query rewritten; + // If the clause query is a boolean query, we shouldn't call ConstantScoreQuery#rewrite as + // this causes + // exponential growth of runtime. + if (query instanceof BooleanQuery booleanQuery) { +rewritten = booleanQuery.rewriteNoScoring(indexSearcher); + } else { +rewritten = new ConstantScoreQuery(query).rewrite(indexSearcher); +if (rewritten instanceof ConstantScoreQuery constantScoreQuery) { + rewritten = constantScoreQuery.getQuery(); +} Review Comment: I'm contemplating replacing your fix with this one which is similar: ```patch diff --git a/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java b/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java index 0354280eb01..be323bf0e4c 100644 --- a/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java +++ b/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java @@ -203,7 +203,13 @@ public class BooleanQuery extends Query implements Iterable { for (BooleanClause clause : clauses) { Query query = clause.getQuery(); - Query rewritten = new ConstantScoreQuery(query).rewrite(indexSearcher); + // NOTE: rewritingNoScoring() should not call rewrite(), otherwise this + // method could run in exponential time with the depth of the query as + // every new level would rewrite 2x more than its parent level. + Query rewritten = query; + if (rewritten instanceof BoostQuery) { +rewritten = ((BoostQuery) query).getQuery(); + } if (rewritten instanceof ConstantScoreQuery) { rewritten = ((ConstantScoreQuery) rewritten).getQuery(); } ``` I think it would be a bit more robust, as it would keep working if there are `ConstantScoreQuery` wrappers between the inner levels of `BooleanQuery`s. ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assertEquals(new MatchNoDocsQuery(), searcher.rewrite(bq2)); } + public void testDeeplyNestedBooleanRewrite() throws IOException { +IndexSearcher searcher = newSearcher(new MultiReader()); + +Directory dir = newDirectory(); +(new RandomIndexWriter(random(), dir)).close(); +IndexReader r = DirectoryReader.open(dir); +Function termQueryFunction = +(i) -> new TermQuery(new Term("layer[" + i + "]", "foo")); +int depth = TestUtil.nextInt(random(), 10, 30); +TermQuery tq = termQueryFunction.apply(depth); +Query expectedQuery = new BooleanQuery.Builder().add(tq, Occur.FILTER).build(); +Query deepBuilder = new BooleanQuery.Builder().add(tq, Occur.MUST).build(); +for (int i = depth; i > 0; i--) { + tq = termQueryFunction.apply(i); + // Do this to accurately set setMinimumNumberShouldMatch to the number of should clauses. + // This makes setting expectation for rewrite much easier. + boolean useShoulds = random().nextBoolean(); + BooleanQuery.Builder bq = + new BooleanQuery.Builder() + .setMinimumNumberShouldMatch(useShoulds ? 2 : 0) + .add(tq, useShoulds ? Occur.SHOULD : Occur.MUST) + .add(deepBuilder, useShoulds ? Occur.SHOULD : Occur.MUST); + deepBuilder = bq.build(); + BooleanQuery.Builder expectedBq = new BooleanQuery.Builder().add(tq, Occur.FILTER); + if (i == depth - 1) { +expectedBq.add(termQueryFunction.apply(depth), Occur.FILTER); + } else { +expectedBq.add(expectedQuery, Occur.FILTER); + } + expectedQuery = expectedBq.build(); +} +BooleanQuery bq = new BooleanQuery.Builder().add(deepBuilder, Occur.FILTER).build(); +expectedQuery = new BoostQuery(new ConstantScoreQuery(expectedQuery), 0.0f); +Query rewritten = searcher.rewrite(bq); +r.close(); +dir.close(); +assertEquals(expectedQuery, rewritten); Review Comment: I wonder if we could improve the test by e.g. having a special query implementation that would count the number of times that rewrite() gets called on the innermost query of this tree of queries, and assert that rewrite() has only been called once. -- This is an automated message from the Apache Git Service. To respond to th
[GitHub] [lucene] jpountz commented on issue #12074: Enhance XXXField#newRangeQuery
jpountz commented on issue #12074: URL: https://github.com/apache/lucene/issues/12074#issuecomment-1379151763 +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on issue #11902: Customization of Edit distance costs for different operations
tang-hi commented on issue #11902: URL: https://github.com/apache/lucene/issues/11902#issuecomment-1379206167 Lucene does not calculate the Levenshtein distance one by one. Instead, it precompiles the Levenshtein automaton based on your output, and then finds terms that meet the distance requirements. The state transitions of the Levenshtein automaton are also already hard-coded in the code.This is also why the maximum edit distance supported by Lucene is 2. I think it would be difficult to support custom distances with different operation costs, because this would involve reworking the code related to the Levenshtein automaton. If you're interested, you can check out the following resources. [Blog](https://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jmazanec15 commented on pull request #12050: Reuse HNSW graph for intialization during merge
jmazanec15 commented on PR #12050: URL: https://github.com/apache/lucene/pull/12050#issuecomment-1379247267 @msokolov The main reason I did not do this was to avoid having to modify the ordering of the vectors from the MergedVectorValues. I believe that the ordinals in the graph map to the positioning in the the vector values, so they need to be synchronized. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on pull request #12072: Fix exponential runtime for Boolean#rewrite
benwtrent commented on PR #12072: URL: https://github.com/apache/lucene/pull/12072#issuecomment-1379274249 @jpountz applied your suggestions @javanna I added a rewrite count check and split the should & must test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
benwtrent commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067299935 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assertEquals(new MatchNoDocsQuery(), searcher.rewrite(bq2)); } + public void testDeeplyNestedBooleanRewrite() throws IOException { +IndexSearcher searcher = newSearcher(new MultiReader()); + +Directory dir = newDirectory(); +(new RandomIndexWriter(random(), dir)).close(); +IndexReader r = DirectoryReader.open(dir); +Function termQueryFunction = +(i) -> new TermQuery(new Term("layer[" + i + "]", "foo")); +int depth = TestUtil.nextInt(random(), 10, 30); +TermQuery tq = termQueryFunction.apply(depth); +Query expectedQuery = new BooleanQuery.Builder().add(tq, Occur.FILTER).build(); +Query deepBuilder = new BooleanQuery.Builder().add(tq, Occur.MUST).build(); +for (int i = depth; i > 0; i--) { + tq = termQueryFunction.apply(i); + // Do this to accurately set setMinimumNumberShouldMatch to the number of should clauses. + // This makes setting expectation for rewrite much easier. + boolean useShoulds = random().nextBoolean(); + BooleanQuery.Builder bq = + new BooleanQuery.Builder() + .setMinimumNumberShouldMatch(useShoulds ? 2 : 0) + .add(tq, useShoulds ? Occur.SHOULD : Occur.MUST) + .add(deepBuilder, useShoulds ? Occur.SHOULD : Occur.MUST); + deepBuilder = bq.build(); + BooleanQuery.Builder expectedBq = new BooleanQuery.Builder().add(tq, Occur.FILTER); + if (i == depth - 1) { +expectedBq.add(termQueryFunction.apply(depth), Occur.FILTER); + } else { +expectedBq.add(expectedQuery, Occur.FILTER); + } + expectedQuery = expectedBq.build(); +} +BooleanQuery bq = new BooleanQuery.Builder().add(deepBuilder, Occur.FILTER).build(); +expectedQuery = new BoostQuery(new ConstantScoreQuery(expectedQuery), 0.0f); +Query rewritten = searcher.rewrite(bq); +r.close(); +dir.close(); +assertEquals(expectedQuery, rewritten); Review Comment: added verification. It only rewrites once with `MUST`, but `SHOULD` adds some because of how `SHOULD`s change on rewrites. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on issue #12071: Can we better take advantage of compact strings?
rmuir commented on issue #12071: URL: https://github.com/apache/lucene/issues/12071#issuecomment-1379311821 there seems to be some confusion, the purpose of unicodeutil is not to allocate String in the first place. It doesnt use String hence getBytes is not really relevant. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on issue #12071: Can we better take advantage of compact strings?
rmuir commented on issue #12071: URL: https://github.com/apache/lucene/issues/12071#issuecomment-1379313710 Nor does it allocate stuff. that's a problem with String.getBytes is that it forces allocation too. Sorry, I don't see anything here. If you want to speed up UnicodeUtil conversions, pressure openjdk to release their vector api -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih merged pull request #12051: Fix wrong assertion in TestBooleanQuery.testQueryMatchesCount
zhaih merged PR #12051: URL: https://github.com/apache/lucene/pull/12051 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check
gsmiller commented on PR #12073: URL: https://github.com/apache/lucene/pull/12073#issuecomment-1379399579 @jpountz: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check
gsmiller commented on PR #12073: URL: https://github.com/apache/lucene/pull/12073#issuecomment-1379404470 @jpountz: > understanding why we're seeing a speedup with this change Good question. I was not familiar with `ReqExclBulkScorer`, but after taking a bit of a look, I have the same question now. I'll have to spend more time looking at these benchmark tasks to understand what might be going on here. Thank you for pointing this out. @rmuir: Thanks for the feedback! You raise some good questions here. My intuition was that reducing the size of the first phase match set would be beneficial in general. The reasoning being that, 1) it reduces the number of "candidates" that other potential conjunction clauses need to evaluate, and 2) it reduces the number of candidates that need to go to a second-phase (particularly if there are 2nd phase checks occurring _before_ the exclusion check). But I hear your point about term queries being a best-case scenario here. If the exclusion clauses are more complex, it's more difficult to reason about. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] Brain2000 commented on issue #10458: Ordered intervals can give inaccurate hits on interleaved terms [LUCENE-9418]
Brain2000 commented on issue #10458: URL: https://github.com/apache/lucene/issues/10458#issuecomment-1379477576 @romseygeek Two years later, I pinpointed the source of the issue above. It was getting worse where records were just being omitted without any rhyme or reason. It looks like there's a problem in the TopFieldCollector.java where it calls "compareBottom" without calling "setBottom" first. I believe this is an issue if getLeafComparer( ) is called more than once. I found a workaround is to rescope the variable storing the bottom slot set from calling "setBottom( )", to outside of the LeafFieldComparator class to make it persistent (just like the setTopValue( ) is called outside of the LeafFieldComparator class). Making the bottom slot number persistent *should* also be safe, since the documentation states that setBottom will always be called before compareBottom. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067459267 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assertEquals(new MatchNoDocsQuery(), searcher.rewrite(bq2)); } + public void testDeeplyNestedBooleanRewrite() throws IOException { +IndexSearcher searcher = newSearcher(new MultiReader()); + +Directory dir = newDirectory(); +(new RandomIndexWriter(random(), dir)).close(); +IndexReader r = DirectoryReader.open(dir); +Function termQueryFunction = +(i) -> new TermQuery(new Term("layer[" + i + "]", "foo")); +int depth = TestUtil.nextInt(random(), 10, 30); +TermQuery tq = termQueryFunction.apply(depth); +Query expectedQuery = new BooleanQuery.Builder().add(tq, Occur.FILTER).build(); +Query deepBuilder = new BooleanQuery.Builder().add(tq, Occur.MUST).build(); +for (int i = depth; i > 0; i--) { + tq = termQueryFunction.apply(i); + // Do this to accurately set setMinimumNumberShouldMatch to the number of should clauses. + // This makes setting expectation for rewrite much easier. + boolean useShoulds = random().nextBoolean(); + BooleanQuery.Builder bq = + new BooleanQuery.Builder() + .setMinimumNumberShouldMatch(useShoulds ? 2 : 0) + .add(tq, useShoulds ? Occur.SHOULD : Occur.MUST) + .add(deepBuilder, useShoulds ? Occur.SHOULD : Occur.MUST); + deepBuilder = bq.build(); + BooleanQuery.Builder expectedBq = new BooleanQuery.Builder().add(tq, Occur.FILTER); + if (i == depth - 1) { +expectedBq.add(termQueryFunction.apply(depth), Occur.FILTER); + } else { +expectedBq.add(expectedQuery, Occur.FILTER); + } + expectedQuery = expectedBq.build(); +} +BooleanQuery bq = new BooleanQuery.Builder().add(deepBuilder, Occur.FILTER).build(); +expectedQuery = new BoostQuery(new ConstantScoreQuery(expectedQuery), 0.0f); +Query rewritten = searcher.rewrite(bq); +r.close(); +dir.close(); +assertEquals(expectedQuery, rewritten); Review Comment: Thanks, this is what I was looking for but I was not sure how to achieve :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067468924 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,18 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOException { for (BooleanClause clause : clauses) { Query query = clause.getQuery(); - Query rewritten = new ConstantScoreQuery(query).rewrite(indexSearcher); - if (rewritten instanceof ConstantScoreQuery) { -rewritten = ((ConstantScoreQuery) rewritten).getQuery(); + // NOTE: rewritingNoScoring() should not call rewrite(), otherwise this + // method could run in exponential time with the depth of the query as + // every new level would rewrite 2x more than its parent level. + Query rewritten = query; + if (query instanceof BoostQuery) { +rewritten = ((BoostQuery) query).getQuery(); + } + if (query instanceof ConstantScoreQuery) { +rewritten = ((ConstantScoreQuery) query).getQuery(); + } + if (query instanceof BooleanQuery) { +rewritten = ((BooleanQuery) query).rewriteNoScoring(indexSearcher); Review Comment: I believe that we can completely remove the check for boolean query (like Adrien suggested). If we unwrap what's in the boost query the rewriteNoScoring will be taken care of at the next round of rewrite if the inner query is a constant score query. That allows us to remove the IndexSearcher argument from this method. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067469765 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,18 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOException { for (BooleanClause clause : clauses) { Query query = clause.getQuery(); - Query rewritten = new ConstantScoreQuery(query).rewrite(indexSearcher); - if (rewritten instanceof ConstantScoreQuery) { -rewritten = ((ConstantScoreQuery) rewritten).getQuery(); + // NOTE: rewritingNoScoring() should not call rewrite(), otherwise this + // method could run in exponential time with the depth of the query as + // every new level would rewrite 2x more than its parent level. + Query rewritten = query; + if (query instanceof BoostQuery) { +rewritten = ((BoostQuery) query).getQuery(); + } + if (query instanceof ConstantScoreQuery) { +rewritten = ((ConstantScoreQuery) query).getQuery(); + } + if (query instanceof BooleanQuery) { +rewritten = ((BooleanQuery) query).rewriteNoScoring(indexSearcher); Review Comment: I would make the same change to the main rewrite method. it simplifies things further quite a bit, even if fixing rewriteNoScoring is already fixing the exponential runtime issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067470681 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,17 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOException { for (BooleanClause clause : clauses) { Query query = clause.getQuery(); - Query rewritten = new ConstantScoreQuery(query).rewrite(indexSearcher); - if (rewritten instanceof ConstantScoreQuery) { -rewritten = ((ConstantScoreQuery) rewritten).getQuery(); + Query rewritten; + // If the clause query is a boolean query, we shouldn't call ConstantScoreQuery#rewrite as + // this causes + // exponential growth of runtime. + if (query instanceof BooleanQuery booleanQuery) { +rewritten = booleanQuery.rewriteNoScoring(indexSearcher); + } else { +rewritten = new ConstantScoreQuery(query).rewrite(indexSearcher); +if (rewritten instanceof ConstantScoreQuery constantScoreQuery) { + rewritten = constantScoreQuery.getQuery(); +} Review Comment: It seems like what Adrien is suggested removes the need to artificially wrap which makes me happy :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
benwtrent commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067472865 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,18 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOException { for (BooleanClause clause : clauses) { Query query = clause.getQuery(); - Query rewritten = new ConstantScoreQuery(query).rewrite(indexSearcher); - if (rewritten instanceof ConstantScoreQuery) { -rewritten = ((ConstantScoreQuery) rewritten).getQuery(); + // NOTE: rewritingNoScoring() should not call rewrite(), otherwise this + // method could run in exponential time with the depth of the query as + // every new level would rewrite 2x more than its parent level. + Query rewritten = query; + if (query instanceof BoostQuery) { +rewritten = ((BoostQuery) query).getQuery(); + } + if (query instanceof ConstantScoreQuery) { +rewritten = ((ConstantScoreQuery) query).getQuery(); + } + if (query instanceof BooleanQuery) { +rewritten = ((BooleanQuery) query).rewriteNoScoring(indexSearcher); Review Comment: Without checking for `BooleanQuery`, the number of rewrites on the innermost query are no longer `1` in the `MUST` clause case. Removing the recursive `rewriteNoScoring` here, makes the total number of rewrites on the innermost query to equal the depth in the `MUST` clause, and 2*depth in the `SHOULD` clause. @jpountz what say you? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067473812 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assertEquals(new MatchNoDocsQuery(), searcher.rewrite(bq2)); } + public void testDeeplyNestedBooleanRewrite() throws IOException { +IndexSearcher searcher = newSearcher(new MultiReader()); + +Directory dir = newDirectory(); +(new RandomIndexWriter(random(), dir)).close(); +IndexReader r = DirectoryReader.open(dir); +Function termQueryFunction = +(i) -> new TermQuery(new Term("layer[" + i + "]", "foo")); +int depth = TestUtil.nextInt(random(), 10, 30); +TermQuery tq = termQueryFunction.apply(depth); +Query expectedQuery = new BooleanQuery.Builder().add(tq, Occur.FILTER).build(); Review Comment: right, I see what you mean. we are good then. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
benwtrent commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067491908 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,18 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOException { for (BooleanClause clause : clauses) { Query query = clause.getQuery(); - Query rewritten = new ConstantScoreQuery(query).rewrite(indexSearcher); - if (rewritten instanceof ConstantScoreQuery) { -rewritten = ((ConstantScoreQuery) rewritten).getQuery(); + // NOTE: rewritingNoScoring() should not call rewrite(), otherwise this + // method could run in exponential time with the depth of the query as + // every new level would rewrite 2x more than its parent level. + Query rewritten = query; + if (query instanceof BoostQuery) { +rewritten = ((BoostQuery) query).getQuery(); + } + if (query instanceof ConstantScoreQuery) { +rewritten = ((ConstantScoreQuery) query).getQuery(); + } + if (query instanceof BooleanQuery) { +rewritten = ((BooleanQuery) query).rewriteNoScoring(indexSearcher); Review Comment: > I would make the same change to the main rewrite method. it simplifies things further quite a bit, even if fixing rewriteNoScoring is already fixing the exponential runtime issue. I am not sure where you would want to make this change? I suppose I am dense... I am guessing right here: https://github.com/apache/lucene/blob/7e0b142afeed676c4fc6dc40bbc7fcd277df5c34/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java#L287 The whole point is to wrap that query as a ConstantScore query. I guess you don't want it wrapped if its boolean and short circuit that to be a rewriteNoScoring. I was trying something along these lines, but then the inner query never got rewritten in the `MUST` case...which was troubling. But, we can already remove the IndexSearch from the method, which I am doing now in a following commit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067505060 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,18 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOException { for (BooleanClause clause : clauses) { Query query = clause.getQuery(); - Query rewritten = new ConstantScoreQuery(query).rewrite(indexSearcher); - if (rewritten instanceof ConstantScoreQuery) { -rewritten = ((ConstantScoreQuery) rewritten).getQuery(); + // NOTE: rewritingNoScoring() should not call rewrite(), otherwise this + // method could run in exponential time with the depth of the query as + // every new level would rewrite 2x more than its parent level. + Query rewritten = query; + if (query instanceof BoostQuery) { +rewritten = ((BoostQuery) query).getQuery(); + } + if (query instanceof ConstantScoreQuery) { +rewritten = ((ConstantScoreQuery) query).getQuery(); + } + if (query instanceof BooleanQuery) { +rewritten = ((BooleanQuery) query).rewriteNoScoring(indexSearcher); Review Comment: > I am guessing right here: yes that is what I had in mind. I was thinking the logic is exactly the same as what we have in rewriteNoScoring and we should be able to align the two. I have observed even less intermediate rewriting in my tests with that additional change. If it complicates things it's not required, I was assuming it does not. > But, we can already remove the IndexSearch from the method That's great. It basically enforces that we don't call rewrite (the note you added) cause we no longer have a searcher to provide to it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] hossman opened a new issue, #12077: WordBreakSpellChecker.maxEvaluations usage in generateBreakUpSuggestions() makes no sense
hossman opened a new issue, #12077: URL: https://github.com/apache/lucene/issues/12077 ### Description `WordBreakSpellChecker` has a `maxEvaluations` config option (default: 1000) which is suppose to be the "maximum number of word combinations to evaluate" but the way this setting is used in `generateBreakUpSuggestions()` makes no sense. The crux of this method is to iteratively loop over each character in the input to consider if splitting at that point in the string produces valid suggestion words on the left & right side of the split. If the left side is a valid suggestion, then regardless of whether the right side was, recursively process the right side. As it's looping, it compares `maxEvaluations` to a `totalEvaluations` counter which is incremented on each iteration, as well as being passed down in each recursive call, breaking out of the loop if `totalEvaluations >= maxEvaluations`. But it *also* maintains a `thisTimeEvaluations` counter, which is only incremented based on the loop iterations, and this is what the method returns -- meaning that even if a nested recursive call exceeds `maxEvaluations`, the return value it gives back to it's caller is a much smaller number (bounded by the length of the string), and the caller will proceed to do more iterations (and more recursive calls) It's such a weird implementation choice (to explicitly maintain these two variables) that I'm sure my explanation above is hard to make sense of (it seems like it must have been intentional -- but i can't understand why?) and I confused myself 3 times trying to write it. The only way i found to really wrap my head around how "wrong" this is was by adding some sysout messages to the methods showing just how much iterating & recursing it does even when `maxEvaluations`. As a result of this behavior, it's actually possible for `WordBreakSpellChecker.suggestWordBreaks(...)` to return significantly more suggestions then the setting of `maxEvaluations` --- I'm attaching: * [WordBreakSpellChecker.nocommit.sysout.patch.txt](https://github.com/apache/lucene/files/10396330/WordBreakSpellChecker.nocommit.sysout.patch.txt) with the sysout logging i mentioned tp help visualize the recursion, and a trivial test case showing that even with `maxEvaluations=100` calling `suggestWordBreaks(...)` (on a 20 character input string) can generate 500+ suggested parsings. * [OUTPUT-org.apache.lucene.search.spell.TestWordBreakSpellChecker.txt](https://github.com/apache/lucene/files/10396366/OUTPUT-org.apache.lucene.search.spell.TestWordBreakSpellChecker.txt) with the output of the nocommit sysout println's from the above mentioned test -- which shows `generateSuggestWord()` (the method that looks up the `docFreq` to see if a "word" is valid) was called 35013 times! * [WordBreakSpellChecker.fix.patch.txt](https://github.com/apache/lucene/files/10396636/WordBreakSpellChecker.fix.patch.txt) with my suggested fix (and the same test which now passes) ### Version and environment details _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] hossman commented on issue #12077: WordBreakSpellChecker.maxEvaluations usage in generateBreakUpSuggestions() makes no sense
hossman commented on issue #12077: URL: https://github.com/apache/lucene/issues/12077#issuecomment-1379633112 FWIW: It also seems strange to me that this method is essentially doing a "depth first" walk of the possible splits, given that it's working a character at a time and the only possible `BreakSuggestionSortMethod` values start with `NUM_CHANGES_THEN_...`. it seems like we could get "better" results, with lower values of `maxEvaluations`, and less recursion (even if `maxChanges` is very large) if the logic was something like: * init a BitSet the same length as our input * loop over each character postion (`i`) * break if `totalEvaluations >= maxEvaluations` otherwise increment `totalEvaluations` * if `leftWord` is "valid" suggestion, record `i` in our bitset * if `rightWord` is also a "valid" suggestion, offer this left+right combo to our `suggestions` queue * if `numberBreaks` has not yet exceeded `maxChanges`: * loop over each set bit (`i`) in our BitSet: * break if `totalEvaluations >= maxEvaluations` otherwise increment `totalEvaluations` * recursively parse the portion of our input to the "right" of `i` (in the context of a new prefix using the portion to the "left" of `i` and an incremented `numberBreaks`) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] LuXugang opened a new pull request, #12078: Enhance XXXField#newRangeQuery
LuXugang opened a new pull request, #12078: URL: https://github.com/apache/lucene/pull/12078 Introduce `IndexSortSortedNumericDocValuesRangeQuery` to `IntFiled#newRangeQuery` and `LongField#newRangeQuery`. See more discussion https://github.com/apache/lucene/issues/12074 . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge
jmazanec15 commented on code in PR #12050: URL: https://github.com/apache/lucene/pull/12050#discussion_r1067742826 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -94,36 +93,83 @@ public int size() { } /** - * Add node on the given level + * Add node on the given level. Nodes can be inserted out of order, but it requires that the nodes Review Comment: That makes sense. This shouldnt be a big problem. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org