Re: [PR] LUCENE-4056: Japanese Tokenizer (Kuromoji) cannot build UniDic dictionary [lucene]

2024-01-12 Thread via GitHub
azagniotov commented on PR #12517: URL: https://github.com/apache/lucene/pull/12517#issuecomment-1890360185 > Please don't add the application plugin. Instead just add a plain java runner task. The result of the project is a library jar, so please don't change this as it could have effects

[PR] Suppress SimpleTextCodec for VectorSimilarityQueryTestCase [lucene]

2024-01-12 Thread via GitHub
zhaih opened a new pull request, #13010: URL: https://github.com/apache/lucene/pull/13010 ### Description See comments in #13009 Actually I suspect the test case will fail on some extreme case as well (like the HNSW graph somehow does not skip any vector, which is quite unlikely

Re: [I] org.apache.lucene.search.TestByteVectorSimilarityQuery.testApproximate failing intermittently [lucene]

2024-01-12 Thread via GitHub
zhaih commented on issue #13009: URL: https://github.com/apache/lucene/issues/13009#issuecomment-1890324110 I think that test case doesn't work well with simple text codec as that codec will always visit documents up to the limit (which is the cardinality of the acceptDoc), however the test will

Re: [PR] Random access term dictionary [lucene]

2024-01-12 Thread via GitHub
github-actions[bot] commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1890169118 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-12 Thread via GitHub
mayya-sharipova commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1889934399 I have also done experiments using the Cohere dataset. As seen below, for the 10M docs dataset the speedups with the proposed approach are 1.7-2.5x. ## Cohere/wikipedia-22

Re: [PR] Make FSTCompiler.compile() to only return the FSTMetadata [lucene]

2024-01-12 Thread via GitHub
mikemccand commented on PR #12831: URL: https://github.com/apache/lucene/pull/12831#issuecomment-1889722033 Thank you stale bot! @dungba88 -- what is the status of this change? I think it makes sense to have two FST compile+consume paths -- one on heap, that you can (efficientl

Re: [I] TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom fails randomly [lucene]

2024-01-12 Thread via GitHub
jpountz closed issue #12649: TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom fails randomly URL: https://github.com/apache/lucene/issues/12649 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [PR] Make sure `DocumentsWriterPerThread#getAndLock` never returns `null` on a non-empty queue. [lucene]

2024-01-12 Thread via GitHub
jpountz merged PR #12959: URL: https://github.com/apache/lucene/pull/12959

Re: [PR] Make sure `DocumentsWriterPerThread#getAndLock` never returns `null` on a non-empty queue. [lucene]

2024-01-12 Thread via GitHub
jpountz commented on PR #12959: URL: https://github.com/apache/lucene/pull/12959#issuecomment-1889495623 Thanks a lot @uschindler!

Re: [PR] Output binary doc values as hex array in SimpleTextCodec [lucene]

2024-01-12 Thread via GitHub
jpountz merged PR #12987: URL: https://github.com/apache/lucene/pull/12987