Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-20 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2364725871 I do not understand what happened here - somehow github is checking out different code than I pushed?? Will try pushing again?? -- This is an automated message from the Apache Git Ser

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-09-20 Thread via GitHub
vigyasharma commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2364348826 > 3. Does require a new merge policy to merge the segments belonging to the same group. How do background index merges work with the original, separate DWPT based approa

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-20 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2364343082 OK, I found an off-by-one error plus a problem with lazy iterator creation that slipped in when we got rid of createIterator(). It makes me a little nervous these didn't show up in earl

Re: [PR] Bump the codec version to 10.0. [lucene]

2024-09-20 Thread via GitHub
benwtrent commented on PR #13815: URL: https://github.com/apache/lucene/pull/13815#issuecomment-2363755327 Codec 💯 ! 😂

[PR] Bump the codec version to 10.0. [lucene]

2024-09-20 Thread via GitHub
jpountz opened a new pull request, #13815: URL: https://github.com/apache/lucene/pull/13815 Lucene100Codec is the exact same file format as Lucene912Codec. This codec dance just makes things slightly easier to reason about since our backward compatibility guarantees are aligned with major v

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-20 Thread via GitHub
jpountz closed issue #13805: TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException URL: https://github.com/apache/lucene/issues/13805

Re: [PR] Improve testing of mismatched field numbers. [lucene]

2024-09-20 Thread via GitHub
jpountz merged PR #13812: URL: https://github.com/apache/lucene/pull/13812

Re: [PR] update EdgeNGramTokenizer.DEFAULT_MAX_NGRAM_SIZE to be practical [lucene]

2024-09-20 Thread via GitHub
YeonghyeonKO closed pull request #13813: update EdgeNGramTokenizer.DEFAULT_MAX_NGRAM_SIZE to be practical URL: https://github.com/apache/lucene/pull/13813

[PR] Update EdgeNGramTokenizer.DEFAULT_MAX_NGRAM_SIZE to be practical [lucene]

2024-09-20 Thread via GitHub
YeonghyeonKO opened a new pull request, #13814: URL: https://github.com/apache/lucene/pull/13814 issue : https://github.com/apache/lucene/issues/13802 - Many libraries(git code: [Elasticsearch](https://github.com/elastic/elasticsearch/blob/main/modules/analysis-common/src/main/java/or

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-20 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2363571626 hm interesting there was an EOFException in there - I'll dig

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-20 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2363503882 OK I think we've addressed the blocking concerns that have been raised here and I plan to push later today if nothing else comes up. Regarding removing copy() in favor of dictionary()

Re: [PR] [9.x] Revert "Replace Map with IntObjectHashMap for DV producer (#13686)" [lucene]

2024-09-20 Thread via GitHub
ChrisHegarty merged PR #13811: URL: https://github.com/apache/lucene/pull/13811

Re: [PR] Revert "Replace Map with IntObjectHashMap for DV producer (#13686)" [lucene]

2024-09-20 Thread via GitHub
ChrisHegarty merged PR #13810: URL: https://github.com/apache/lucene/pull/13810

Re: [PR] Copy stored fields during flush with index sort [lucene]

2024-09-20 Thread via GitHub
jpountz commented on PR #13803: URL: https://github.com/apache/lucene/pull/13803#issuecomment-2363335011 Oh, interesting. I'm curious if you are able to quantify some speedups, e.g. by modifying luceneutil's StoredFieldsBenchmark (https://github.com/mikemccand/luceneutil/blob/main/src/main/

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-20 Thread via GitHub
jpountz commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2363294876 Exactly. I tried to model it similarly to what doc values do, where `SortedDocValues#termsEnum()` returns a dictionary with a different backing IndexInput clone on every call.

Re: [I] Backout changes messing around with fieldinfos on merge [lucene]

2024-09-20 Thread via GitHub
jpountz commented on issue #13809: URL: https://github.com/apache/lucene/issues/13809#issuecomment-2363285262 OK. Let's revert now, and try to add them back after branch_10_0 is cut, so that we can have a long baking period with the new tests before these optimizations get released.

Re: [PR] Improve testing of mismatched field numbers. [lucene]

2024-09-20 Thread via GitHub
jpountz commented on PR #13812: URL: https://github.com/apache/lucene/pull/13812#issuecomment-2363255349 I found the bug, it's in the points merging logic, which assumes that it can look up the internal map based on the field infos that are returned by the reader. This is incorrect when the