Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2024-01-11 Thread via GitHub
kuramitsu commented on PR #12885: URL: https://github.com/apache/lucene/pull/12885#issuecomment-1886834358 @zhaih > could you please add a CHANGES.txt entry under Lucene 9.10? Thank you. I added it. -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] Clean up unused code & variables [lucene]

2024-01-11 Thread via GitHub
dungba88 commented on PR #12994: URL: https://github.com/apache/lucene/pull/12994#issuecomment-1887052636 Thanks @dweiss for approving! Can you help me to merge if it looks okay? Thank you

Re: [PR] Lazily write the FST padding byte [lucene]

2024-01-11 Thread via GitHub
mikemccand merged PR #12981: URL: https://github.com/apache/lucene/pull/12981

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2024-01-11 Thread via GitHub
jpountz commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1887173459 FYI I pushed an annotation to nightly benchmarks, it should show up tomorrow.

Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]

2024-01-11 Thread via GitHub
dungba88 commented on PR #12980: URL: https://github.com/apache/lucene/pull/12980#issuecomment-1887211104 I'm not sure why FSTPostingsFormat is different from the rest, in that it writes both the metadata and data to the same file. I think writing to separate files would be cleaner and more con

Re: [PR] Make Lucene90 postings format to write FST off heap [lucene]

2024-01-11 Thread via GitHub
dungba88 commented on code in PR #12985: URL: https://github.com/apache/lucene/pull/12985#discussion_r1448927527 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java: ## @@ -795,7 +783,12 @@ void writeBlocks(int prefixLength, int

Re: [PR] Avoid reset BlockDocsEnum#freqBuffer when indexHasFreq is false [lucene]

2024-01-11 Thread via GitHub
jpountz merged PR #12997: URL: https://github.com/apache/lucene/pull/12997

Re: [PR] Output well-formed UTF-8 bytes in SimpleTextCodec's segmentinfos [lucene]

2024-01-11 Thread via GitHub
jpountz merged PR #12897: URL: https://github.com/apache/lucene/pull/12897

Re: [PR] Output binary doc values as hex array in SimpleTextCodec [lucene]

2024-01-11 Thread via GitHub
jpountz commented on code in PR #12987: URL: https://github.com/apache/lucene/pull/12987#discussion_r1448975940 ## lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextDocValuesReader.java: ## @@ -329,9 +330,15 @@ public BytesRef apply(int docID) {

Re: [PR] Add support for index sorting with document blocks [lucene]

2024-01-11 Thread via GitHub
s1monw merged PR #12829: URL: https://github.com/apache/lucene/pull/12829

Re: [PR] Make sure `ConcurrentApproximatePriorityQueue#poll` never returns `null` on a non-empty queue. [lucene]

2024-01-11 Thread via GitHub
jpountz commented on PR #12959: URL: https://github.com/apache/lucene/pull/12959#issuecomment-1887441748 If there are no concerns, I plan on merging this PR soon.

Re: [PR] Make sure `ConcurrentApproximatePriorityQueue#poll` never returns `null` on a non-empty queue. [lucene]

2024-01-11 Thread via GitHub
uschindler commented on PR #12959: URL: https://github.com/apache/lucene/pull/12959#issuecomment-1887464862 Hi, sorry I missed testing this. I started my beasting and am trying to reproduce it. Will report back.

Re: [PR] Clean up unused code & variables [lucene]

2024-01-11 Thread via GitHub
dweiss merged PR #12994: URL: https://github.com/apache/lucene/pull/12994

Re: [PR] Clean up unused code & variables [lucene]

2024-01-11 Thread via GitHub
dweiss commented on PR #12994: URL: https://github.com/apache/lucene/pull/12994#issuecomment-1887664512 Thank you and apologies for the delay!

Re: [PR] Make sure `ConcurrentApproximatePriorityQueue#poll` never returns `null` on a non-empty queue. [lucene]

2024-01-11 Thread via GitHub
uschindler commented on PR #12959: URL: https://github.com/apache/lucene/pull/12959#issuecomment-1887730699 FYI, I did not try to run the beasting for hours because previously it was failing very fast on a heavily loaded Ryzen CPU with both Hotspot and OpenJ9. Now it's stable, so I trust the r

Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2024-01-11 Thread via GitHub
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1887733274 The latest PR seems to fix the issue.

Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2024-01-11 Thread via GitHub
zhaih merged PR #12885: URL: https://github.com/apache/lucene/pull/12885

Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2024-01-11 Thread via GitHub
zhaih commented on PR #12885: URL: https://github.com/apache/lucene/pull/12885#issuecomment-1888066363 Merged and backported, thanks for the contribution @kuramitsu !

Re: [PR] Output binary doc values as hex array in SimpleTextCodec [lucene]

2024-01-11 Thread via GitHub
msfroh commented on code in PR #12987: URL: https://github.com/apache/lucene/pull/12987#discussion_r1449507398 ## lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextDocValuesReader.java: ## @@ -329,9 +330,15 @@ public BytesRef apply(int docID) {

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2024-01-11 Thread via GitHub
kevindrosendahl commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1888208867 > How is segment merging implemented by Lucene Vamana? I didn't do anything special for Vamana in these experiments, the index construction and merging are practically

[I] org.apache.lucene.search.TestByteVectorSimilarityQuery.testApproximate failing intermittently [lucene]

2024-01-11 Thread via GitHub
msfroh opened a new issue, #13009: URL: https://github.com/apache/lucene/issues/13009 ### Description ### Failure ``` org.apache.lucene.search.TestByteVectorSimilarityQuery > testApproximate FAILED java.lang.UnsupportedOperationException at __randomizedtesting

Re: [PR] Output binary doc values as hex array in SimpleTextCodec [lucene]

2024-01-11 Thread via GitHub
msfroh commented on PR #12987: URL: https://github.com/apache/lucene/pull/12987#issuecomment-1888270133 Test failure is reproducible on main: https://github.com/apache/lucene/issues/13009

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2024-01-11 Thread via GitHub
rmuir commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1888419082 I would be extremely careful around io_uring; it is disabled in many environments (e.g. by default in container environments) for security reasons: * https://security.googleblog.c