[GitHub] [lucene] jpountz merged pull request #12064: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String)

2023-01-11 Thread GitBox
jpountz merged PR #12064: URL: https://github.com/apache/lucene/pull/12064 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

[GitHub] [lucene] jpountz commented on pull request #12064: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String)

2023-01-11 Thread GitBox
jpountz commented on PR #12064: URL: https://github.com/apache/lucene/pull/12064#issuecomment-1378387639 @benwtrent Would you mind working on a backport PR, since there are a few conflicts that need resolving?

[GitHub] [lucene] LuXugang opened a new issue, #12074: Enhance XXXField#newRangeQuery

2023-01-11 Thread GitBox
LuXugang opened a new issue, #12074: URL: https://github.com/apache/lucene/issues/12074 ### Description Since the dimension of XXXField's point value is always `1`, should we introduce `IndexSortSortedNumericDocValuesRangeQuery` to `IntField#newRangeQuery` and `LongField#newRangeQuery`? This

[GitHub] [lucene] jpountz merged pull request #12052: Cut over Lucene Demo from LongPoint to LongField.

2023-01-11 Thread GitBox
jpountz merged PR #12052: URL: https://github.com/apache/lucene/pull/12052

[GitHub] [lucene] jpountz merged pull request #12070: Never throttle creation of compound files.

2023-01-11 Thread GitBox
jpountz merged PR #12070: URL: https://github.com/apache/lucene/pull/12070

[GitHub] [lucene] jpountz closed issue #12068: Is it right to throttle the creation of compound files?

2023-01-11 Thread GitBox
jpountz closed issue #12068: Is it right to throttle the creation of compound files? URL: https://github.com/apache/lucene/issues/12068

[GitHub] [lucene] jpountz commented on pull request #12053: Allow reusing indexed binary fields.

2023-01-11 Thread GitBox
jpountz commented on PR #12053: URL: https://github.com/apache/lucene/pull/12053#issuecomment-1378448862 @rmuir Do you have thoughts on this change?

[GitHub] [lucene] rmuir commented on pull request #12053: Allow reusing indexed binary fields.

2023-01-11 Thread GitBox
rmuir commented on PR #12053: URL: https://github.com/apache/lucene/pull/12053#issuecomment-1378455423 @jpountz I will try to review this today. Sorry for the delay. I haven't written Java code in years, I'm crazy busy at work, and I try to give more time to `.document` api all contribu

[GitHub] [lucene] jpountz commented on pull request #12053: Allow reusing indexed binary fields.

2023-01-11 Thread GitBox
jpountz commented on PR #12053: URL: https://github.com/apache/lucene/pull/12053#issuecomment-1378457531 Thank you, and no worries at all about the delay, I just wanted to check if it was still on your mind since you said you were interested in looking into it.

[GitHub] [lucene] benwtrent opened a new pull request, #12075: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String) (#12064)

2023-01-11 Thread GitBox
benwtrent opened a new pull request, #12075: URL: https://github.com/apache/lucene/pull/12075 Backport of #12064 This completes the refactoring as described in: https://github.com/apache/lucene/issues/11963 This commit: - splits out `ByteVectorValues` from `VectorValues`.

[GitHub] [lucene] benwtrent commented on pull request #12075: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String) (#12064)

2023-01-11 Thread GitBox
benwtrent commented on PR #12075: URL: https://github.com/apache/lucene/pull/12075#issuecomment-1378802761 @jpountz backport :)

[GitHub] [lucene] sherman opened a new issue, #12076: The question about MultiRangeQuery.

2023-01-11 Thread GitBox
sherman opened a new issue, #12076: URL: https://github.com/apache/lucene/issues/12076 Hi! AFAICS, the family of multi-range queries (MultiRangeQuery) was added to the sandbox module recently. In our index we have a few indexed fields with type long (they aren't ranges, it's j

[GitHub] [lucene] jpountz merged pull request #12075: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String) (#12064)

2023-01-11 Thread GitBox
jpountz merged PR #12075: URL: https://github.com/apache/lucene/pull/12075

[GitHub] [lucene] jpountz commented on pull request #12075: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String) (#12064)

2023-01-11 Thread GitBox
jpountz commented on PR #12075: URL: https://github.com/apache/lucene/pull/12075#issuecomment-1378896877 Thank you!

[GitHub] [lucene] jpountz commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check

2023-01-11 Thread GitBox
jpountz commented on PR #12073: URL: https://github.com/apache/lucene/pull/12073#issuecomment-1378939790 Do these queries actually use `ReqExclScorer`? I would have expected them to use `ReqExclBulkScorer`?

[GitHub] [lucene] rmuir commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check

2023-01-11 Thread GitBox
rmuir commented on PR #12073: URL: https://github.com/apache/lucene/pull/12073#issuecomment-1378947355 I can't keep up with the conditions where bulk is unsuitable. But just generally, my concern is implementing a best-case 1-2% speedup while creating a large worst-case slowdown somewhere else.

[GitHub] [lucene] jpountz commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check

2023-01-11 Thread GitBox
jpountz commented on PR #12073: URL: https://github.com/apache/lucene/pull/12073#issuecomment-1378951278 Sorry, my comment was not targeted at your comment but at understanding why we're seeing a speedup with this change, since I wouldn't expect this scorer to be used (though my understandin

[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067144911 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assertEq

[GitHub] [lucene] rmuir commented on issue #12076: The question about MultiRangeQuery.

2023-01-11 Thread GitBox
rmuir commented on issue #12076: URL: https://github.com/apache/lucene/issues/12076#issuecomment-1378993896 > In our index we have a few indexed fields with type long (aren't ranges, it's just long ids, dimension = 1). > > We have about 16% of search queries which include a predicate

[GitHub] [lucene] benwtrent commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
benwtrent commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067160561 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assert

[GitHub] [lucene] benwtrent commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
benwtrent commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067162137 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assert

[GitHub] [lucene] javanna commented on pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
javanna commented on PR #12072: URL: https://github.com/apache/lucene/pull/12072#issuecomment-1379007346 I did some additional testing to understand the impact of the regression and in light of that I view this as a bug rather than a performance regression, because we end up performing many

[GitHub] [lucene] benwtrent commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
benwtrent commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067175218 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assert

[GitHub] [lucene] sherman closed issue #12076: The question about MultiRangeQuery.

2023-01-11 Thread GitBox
sherman closed issue #12076: The question about MultiRangeQuery. URL: https://github.com/apache/lucene/issues/12076

[GitHub] [lucene] sherman commented on issue #12076: The question about MultiRangeQuery.

2023-01-11 Thread GitBox
sherman commented on issue #12076: URL: https://github.com/apache/lucene/issues/12076#issuecomment-1379062301 @rmuir Awesome! That's what I really need. Thanks a lot!

[GitHub] [lucene] msokolov commented on pull request #12050: Reuse HNSW graph for initialization during merge

2023-01-11 Thread GitBox
msokolov commented on PR #12050: URL: https://github.com/apache/lucene/pull/12050#issuecomment-1379112305 > To support this functionality, a couple of changes to the current graph construction process needed to be made. OnHeapHnswGraph had to support out-of-order insertion. This is because the

[GitHub] [lucene] msokolov commented on issue #12071: Can we better take advantage of compact strings?

2023-01-11 Thread GitBox
msokolov commented on issue #12071: URL: https://github.com/apache/lucene/issues/12071#issuecomment-1379144309 I wonder if we could update `UnicodeUtil` to use `getBytes` internally? It could check the size of the byte array, and if it is equal to the length of the string, then just return

[GitHub] [lucene] jpountz commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
jpountz commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067223779 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,17 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOExcept

[GitHub] [lucene] jpountz commented on issue #12074: Enhance XXXField#newRangeQuery

2023-01-11 Thread GitBox
jpountz commented on issue #12074: URL: https://github.com/apache/lucene/issues/12074#issuecomment-1379151763 +1

[GitHub] [lucene] tang-hi commented on issue #11902: Customization of Edit distance costs for different operations

2023-01-11 Thread GitBox
tang-hi commented on issue #11902: URL: https://github.com/apache/lucene/issues/11902#issuecomment-1379206167 Lucene does not calculate the Levenshtein distance one by one. Instead, it precompiles the Levenshtein automaton based on your output, and then finds terms that meet the distance re

[GitHub] [lucene] jmazanec15 commented on pull request #12050: Reuse HNSW graph for initialization during merge

2023-01-11 Thread GitBox
jmazanec15 commented on PR #12050: URL: https://github.com/apache/lucene/pull/12050#issuecomment-1379247267 @msokolov The main reason I did not do this was to avoid having to modify the ordering of the vectors from the MergedVectorValues. I believe that the ordinals in the graph map to the

[GitHub] [lucene] benwtrent commented on pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
benwtrent commented on PR #12072: URL: https://github.com/apache/lucene/pull/12072#issuecomment-1379274249 @jpountz applied your suggestions @javanna I added a rewrite count check and split the should & must test.

[GitHub] [lucene] benwtrent commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
benwtrent commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067299935 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assert

[GitHub] [lucene] rmuir commented on issue #12071: Can we better take advantage of compact strings?

2023-01-11 Thread GitBox
rmuir commented on issue #12071: URL: https://github.com/apache/lucene/issues/12071#issuecomment-1379311821 There seems to be some confusion: the purpose of `UnicodeUtil` is not to allocate a String in the first place. It doesn't use String, hence `getBytes` is not really relevant.

[GitHub] [lucene] rmuir commented on issue #12071: Can we better take advantage of compact strings?

2023-01-11 Thread GitBox
rmuir commented on issue #12071: URL: https://github.com/apache/lucene/issues/12071#issuecomment-1379313710 Nor does it allocate anything. A problem with `String.getBytes` is that it forces allocation too. Sorry, I don't see anything here. If you want to speed up UnicodeUtil conver

[GitHub] [lucene] zhaih merged pull request #12051: Fix wrong assertion in TestBooleanQuery.testQueryMatchesCount

2023-01-11 Thread GitBox
zhaih merged PR #12051: URL: https://github.com/apache/lucene/pull/12051

[GitHub] [lucene] gsmiller commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check

2023-01-11 Thread GitBox
gsmiller commented on PR #12073: URL: https://github.com/apache/lucene/pull/12073#issuecomment-1379399579 @jpountz:

[GitHub] [lucene] gsmiller commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check

2023-01-11 Thread GitBox
gsmiller commented on PR #12073: URL: https://github.com/apache/lucene/pull/12073#issuecomment-1379404470 @jpountz: > understanding why we're seeing a speedup with this change Good question. I was not familiar with `ReqExclBulkScorer`, but after taking a bit of a look, I have the same

[GitHub] [lucene] Brain2000 commented on issue #10458: Ordered intervals can give inaccurate hits on interleaved terms [LUCENE-9418]

2023-01-11 Thread GitBox
Brain2000 commented on issue #10458: URL: https://github.com/apache/lucene/issues/10458#issuecomment-1379477576 @romseygeek Two years later, I pinpointed the source of the issue above. It was getting worse where records were just being omitted without any rhyme or reason. It looks li

[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067459267 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assertEq

[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067468924 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,18 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOExcept

[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067469765 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,18 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOExcept

[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067470681 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,17 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOExcept

[GitHub] [lucene] benwtrent commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
benwtrent commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067472865 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,18 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOExce

[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067473812 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanRewrites.java: ## @@ -322,6 +323,45 @@ public void testMatchAllMustNot() throws IOException { assertEq

[GitHub] [lucene] benwtrent commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
benwtrent commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067491908 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,18 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOExce

[GitHub] [lucene] javanna commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-11 Thread GitBox
javanna commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067505060 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,18 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOExcept

[GitHub] [lucene] hossman opened a new issue, #12077: WordBreakSpellChecker.maxEvaluations usage in generateBreakUpSuggestions() makes no sense

2023-01-11 Thread GitBox
hossman opened a new issue, #12077: URL: https://github.com/apache/lucene/issues/12077 ### Description `WordBreakSpellChecker` has a `maxEvaluations` config option (default: 1000) which is supposed to be the "maximum number of word combinations to evaluate" but the way this setting is

[GitHub] [lucene] hossman commented on issue #12077: WordBreakSpellChecker.maxEvaluations usage in generateBreakUpSuggestions() makes no sense

2023-01-11 Thread GitBox
hossman commented on issue #12077: URL: https://github.com/apache/lucene/issues/12077#issuecomment-1379633112 FWIW: It also seems strange to me that this method is essentially doing a "depth first" walk of the possible splits, given that it's working a character at a time and the only possi

[GitHub] [lucene] LuXugang opened a new pull request, #12078: Enhance XXXField#newRangeQuery

2023-01-11 Thread GitBox
LuXugang opened a new pull request, #12078: URL: https://github.com/apache/lucene/pull/12078 Introduce `IndexSortSortedNumericDocValuesRangeQuery` to `IntField#newRangeQuery` and `LongField#newRangeQuery`. See the discussion in https://github.com/apache/lucene/issues/12074

[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for initialization during merge

2023-01-11 Thread GitBox
jmazanec15 commented on code in PR #12050: URL: https://github.com/apache/lucene/pull/12050#discussion_r1067742826 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -94,36 +93,83 @@ public int size() { } /** - * Add node on the given level