Re: [PR] Remove usage of IndexSearcher#search(Query, Collector) from join package [lucene]

2024-09-19 Thread via GitHub
msfroh commented on PR #13747: URL: https://github.com/apache/lucene/pull/13747#issuecomment-2362571471 Okay -- I wrapped all of the `Collector`s in `CollectorManagers`, and managed to remove all uses of `CollectorManager.forSequentialExecution`. I also went ahead and added the remaining `C

Re: [I] Should EdgeNGramTokenizer's DEFAULT_MAX_GRAM_SIZE be ONE? [lucene]

2024-09-19 Thread via GitHub
YeonghyeonKO commented on issue #13802: URL: https://github.com/apache/lucene/issues/13802#issuecomment-2362509242 In actual Elasticsearch settings, it is common to use values of 8 or 10 as @jpountz said. Of course, people may have different preferences, but I think it is not a good idea

Re: [PR] Reduce number of calculations in FSTCompiler [lucene]

2024-09-19 Thread via GitHub
mrhbj commented on PR #13788: URL: https://github.com/apache/lucene/pull/13788#issuecomment-2362382794 I have used Elasticsearch in our software. Elasticsearch is very fast at searching data. I know Elasticsearch uses Lucene to make it faster. I am very interested in this and enjoy reading Lucene so

Re: [PR] Reduce number of calculations in FSTCompiler [lucene]

2024-09-19 Thread via GitHub
mikemccand commented on PR #13788: URL: https://github.com/apache/lucene/pull/13788#issuecomment-2362368497 > @mikemccand You are welcome. I am a real human, not an AI. -_- Ahh thanks for the reply. Could you describe how you got involved / interested in open source development? How

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-19 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2362354868 I think the idea w/Dictionary is that callers, instead of calling `copy().vectorValue(int ord)` would call `dictionary().vectorValue(int ord)`. So then the scratch vector storage (if ne

Re: [PR] Improve testing of mismatched field numbers. [lucene]

2024-09-19 Thread via GitHub
rmuir commented on PR #13812: URL: https://github.com/apache/lucene/pull/13812#issuecomment-2361889388 Seems the test/functionality uses points and docvalues. Original failing test that inspired this PR `testSparseDocValuesVsStoredFields`. Maybe we need a similar one to cover DocValuesVsPo

Re: [PR] Improve testing of mismatched field numbers. [lucene]

2024-09-19 Thread via GitHub
jpountz commented on PR #13812: URL: https://github.com/apache/lucene/pull/13812#issuecomment-2361711688 We haven't had failures in this test case for a very long time, so I suspect it's a genuine failure indeed, which got triggered by the additional testing of mismatched field numbers? -

Re: [PR] Improve testing of mismatched field numbers. [lucene]

2024-09-19 Thread via GitHub
iverase commented on PR #13812: URL: https://github.com/apache/lucene/pull/13812#issuecomment-2361514453 The failure seems legit. It happens because of this method in MismatchedCodecReader: ``` @Override public FieldInfos getFieldInfos() { return shuffled; } `

Re: [PR] Improve testing of mismatched field numbers. [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty commented on PR #13812: URL: https://github.com/apache/lucene/pull/13812#issuecomment-2361413309 I can confirm that the previously failing tests seen in the 9.x branch pass successfully with the one-liner fix in this PR. 👍 -- This is an automated message from the Apache Git

Re: [PR] Improve testing of mismatched field numbers. [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty commented on code in PR #13812: URL: https://github.com/apache/lucene/pull/13812#discussion_r1766990067 ## lucene/test-framework/src/java/org/apache/lucene/tests/codecs/asserting/AssertingDocValuesFormat.java: ## @@ -229,6 +234,7 @@ static class AssertingDocValuesPr

Re: [PR] Improve testing of mismatched field numbers. [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty commented on code in PR #13812: URL: https://github.com/apache/lucene/pull/13812#discussion_r1766988440 ## lucene/core/src/java/org/apache/lucene/codecs/DocValuesConsumer.java: ## @@ -613,7 +613,7 @@ public void mergeSortedField(FieldInfo fieldInfo, final MergeStat

[PR] Improve testing of mismatched field numbers. [lucene]

2024-09-19 Thread via GitHub
jpountz opened a new pull request, #13812: URL: https://github.com/apache/lucene/pull/13812 This improves testing of mismatched field numbers by - improving `AssertingDocValuesProducer` to detect mismatched field numbers, - introducing a `MismatchedCodecReader` to actually test mismat

Re: [PR] Copy stored fields during flush with index sort [lucene]

2024-09-19 Thread via GitHub
dnhatn commented on PR #13803: URL: https://github.com/apache/lucene/pull/13803#issuecomment-2361154788 @jpountz I tried using merges for this, but it didn't work because merges expect the documents to be already sorted by index sorts.

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
jpountz commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2361101846 I have a fix and tests that would have found the bug at #13812.

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
bugmakerr commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360998233 > Separately, we may want to consider changing the DocValuesProducer API to take a String rather than a FieldInfo, like e.g. points, so that it is not tempted to trust the cal

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360936290 Let's postpone the 9_12 branch cut until tomorrow, pending the outcome of this.

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
jpountz commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360928625 Give me some time to see how the fix and tests look, and let's think about whether/what to revert later on? I expect to have something by end of day. @ChrisHegarty Feel free to cut

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360875134 Ok, reverts are prepared. @jpountz you wanna fix (and not revert), or revert for now?

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
jpountz commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360854476 I found the actual root cause, it's here: https://github.com/apache/lucene/blob/e4ac57746eb86846b3a53944c14e09873f793ff1/lucene/core/src/java/org/apache/lucene/codecs/DocValuesConsum

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
jpountz commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360784108 I found the bug, it's the slow composite reader wrapper which is at fault here. I'll look into improving tests to detect such issues. Separately, we may want to consider chang

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
rmuir commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360778828 yeah it would be best to improve the tests: it is not good that it took this test, run many many times, to find it.

[PR] [9.x] Revert "Replace Map with IntObjectHashMap for DV producer (#13686)" [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty opened a new pull request, #13811: URL: https://github.com/apache/lucene/pull/13811 9.x Revert "Replace Map with IntObjectHashMap for DV producer (#13686)"

Re: [I] Backout changes messing around with fieldinfos on merge [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty commented on issue #13809: URL: https://github.com/apache/lucene/issues/13809#issuecomment-2360775204 @benwtrent @rmuir @jpountz if you are aware of any other changes that need to be backed out, can you please add them to the above description? Thank you.

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
jpountz commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360763893 I don't mind reverting but I would also like to fix the root cause as this change only exposed an existing bug: someone is calling a doc-values producer with the wrong FieldInfo obj

[PR] Revert "Replace Map with IntObjectHashMap for DV producer (#13686) [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty opened a new pull request, #13810: URL: https://github.com/apache/lucene/pull/13810 Reverts "Replace Map with IntObjectHashMap for DV producer (#13686)

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360745810 I filed a meta issue to better track the reverts, #13809

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
rmuir commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360732114 we didn't even have bulk merge at all in lucene for a couple years because of field-number bugs like this. got bit too many times.

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
rmuir commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360728834 sounds like the safe bet to backout any changes messing around with fieldinfos on merge. Sorry for the short explanation, there is a long history of super-sneaky corruption bug

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360725418 The revert fixes the failures we see here and the other related test failures, seen in #13807 #13808.

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-09-19 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360645812 ## Approach 2: Using a physical directory for each group ![approach2](https://github.com/user-attachments/assets/223686c4-5c0c-49c1-b54c-1aee22a2d1bf) To segregate s

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-19 Thread via GitHub
benwtrent commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2360711342 The dictionary idea is OK, but I still don't see how it removes `copy()`. Besides the caching of values, copy gives us multi-threaded safety by copying the underlying index readers. Ot

[I] TestBestCompressionLucene80DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty opened a new issue, #13807: URL: https://github.com/apache/lucene/issues/13807 This is likely a duplicate of #13805, but I'm filing it separately for now to capture the large stack trace and reproduce commands. ``` ERROR: The following test(s) have failed: - org.

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
benwtrent commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360701787 @ChrisHegarty this makes me worried about all the other field number switch with field name things as well. I am wondering if we should revert all of them, there are multip

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360692292 I'm going to try reverting https://github.com/apache/lucene/commit/6634b41f42f4e2802048d1e4750e1ce1202652c5.

Re: [PR] Add BytesRefIterator to TermInSetQuery [lucene]

2024-09-19 Thread via GitHub
javanna commented on PR #13806: URL: https://github.com/apache/lucene/pull/13806#issuecomment-2360422815 @cbuescher could you add an entry to CHANGES.txt, under 9.12 please? I am thinking that this should be backported so it provides a replacement for the deprecated method before it gets re

Re: [I] TermInSetQuery to expose its terms [lucene]

2024-09-19 Thread via GitHub
javanna closed issue #13804: TermInSetQuery to expose its terms URL: https://github.com/apache/lucene/issues/13804

Re: [I] TestBestSpeedLucene80DocValuesFormat fails dv for field: sorted_set has ords out of order [lucene]

2024-09-19 Thread via GitHub
benwtrent commented on issue #13808: URL: https://github.com/apache/lucene/issues/13808#issuecomment-2360681381 Since this has to do with merging sorted things, the field number change is a likely cause here as well.

Re: [I] TestBestSpeedLucene80DocValuesFormat fails dv for field: sorted_set has ords out of order [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty commented on issue #13808: URL: https://github.com/apache/lucene/issues/13808#issuecomment-2360673889 Hmm.. that commit was reverted. Maybe that's confusing git bisect! ?

Re: [PR] Add BytesRefIterator to TermInSetQuery [lucene]

2024-09-19 Thread via GitHub
cbuescher commented on code in PR #13806: URL: https://github.com/apache/lucene/pull/13806#discussion_r1766408644 ## lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java: ## @@ -141,6 +135,11 @@ public long getTermsCount() { return termData.size(); } + pu

Re: [I] TestBestSpeedLucene80DocValuesFormat fails dv for field: sorted_set has ords out of order [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty commented on issue #13808: URL: https://github.com/apache/lucene/issues/13808#issuecomment-2360658830 git bisect shows this: ``` ad09777867d10cbaa2a9582bb49a2de5ad7748ba is the first bad commit commit ad09777867d10cbaa2a9582bb49a2de5ad7748ba Author: Benjamin Trent

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-09-19 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360651201 ## Summary In summary the problem can be broken down into three sub problems. 1. Having abstraction to write the data into different groups (Multiple Writers) 2.

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-09-19 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360649893 ## Approach 3: Combining group level IndexWriter with addIndexes ![approach3](https://github.com/user-attachments/assets/32ea3baa-0ae6-4a60-84e9-352a0e1e6a5e) In thi

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-09-19 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360641099 Thanks [mikemccand](https://github.com/mikemccand) and [vigyasharma](https://github.com/vigyasharma) for suggestions. Evaluated different approaches to use different IndexWriter

[I] TestBestSpeedLucene80DocValuesFormat fails dv for field: sorted_set has ords out of order [lucene]

2024-09-19 Thread via GitHub
ChrisHegarty opened a new issue, #13808: URL: https://github.com/apache/lucene/issues/13808 Fails on the 9x branch, but not reproducible on _main_. ``` ERROR: The following test(s) have failed: - org.apache.lucene.backward_codecs.lucene80.TestBestSpeedLucene80DocValuesFormat.

Re: [PR] Add BytesRefIterator to TermInSetQuery [lucene]

2024-09-19 Thread via GitHub
javanna merged PR #13806: URL: https://github.com/apache/lucene/pull/13806

Re: [PR] Add BytesRefIterator to TermInSetQuery [lucene]

2024-09-19 Thread via GitHub
cbuescher commented on PR #13806: URL: https://github.com/apache/lucene/pull/13806#issuecomment-2360483799 @javanna done, thanks.

Re: [PR] Add BytesRefIterator to TermInSetQuery [lucene]

2024-09-19 Thread via GitHub
javanna commented on PR #13806: URL: https://github.com/apache/lucene/pull/13806#issuecomment-2360398761 Thanks for taking a look @rmuir ! I have been digging a bit through history, it seems like it used to be possible to get all the terms via `QueryVisitor#consumeTerms`, but that cha

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-19 Thread via GitHub
jpountz commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2360238996 Argh, I remember carefully checking whether this PR could cause issues due to mismatched field infos, but apparently I missed something.