Re: [PR] Output binary doc values as hex array in SimpleTextCodec [lucene]
jpountz merged PR #12987: URL: https://github.com/apache/lucene/pull/12987 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Make sure `DocumentsWriterPerThread#getAndLock` never returns `null` on a non-empty queue. [lucene]
jpountz commented on PR #12959: URL: https://github.com/apache/lucene/pull/12959#issuecomment-1889495623 Thanks a lot @uschindler !
Re: [I] TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom fails randomly [lucene]
jpountz closed issue #12649: TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom fails randomly URL: https://github.com/apache/lucene/issues/12649
Re: [PR] Make sure `DocumentsWriterPerThread#getAndLock` never returns `null` on a non-empty queue. [lucene]
jpountz merged PR #12959: URL: https://github.com/apache/lucene/pull/12959
Re: [PR] Make FSTCompiler.compile() to only return the FSTMetadata [lucene]
mikemccand commented on PR #12831: URL: https://github.com/apache/lucene/pull/12831#issuecomment-1889722033 Thank you stale bot! @dungba88 -- what is the status of this change? I think it makes sense to have two FST compile+consume paths -- one on heap, that you can (efficiently) consume (read) right away without writing FST to stable storage, another that writes and then reads from stable storage.
Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]
mayya-sharipova commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1889934399

I have also run experiments on the Cohere dataset. As seen below, for the 10M-doc dataset the speedups with the proposed approach are 1.7-2.5x.

## Cohere/wikipedia-22-12-en-embeddings

- [Cohere/wikipedia-22-12-en-embeddings](https://huggingface.co/datasets/Cohere/wikipedia-22-12-en-embeddings) dataset
- 768 dims

### 1M vectors

k=10, fanout=90

| |Avg visited nodes | QPS| Recall|
| :--- |---: | ---: |---: |
| Baseline Single segment | 804| 3225|0.454|
| Baseline 8 segments concurrent | 1807| 1831|0.887|
| Candidate2_with_queue | 1807| 1872|0.887|

k=100, fanout=900

| |Avg visited nodes | QPS| Recall|
| :--- |---: | ---: |---: |
| Baseline Single segment | 4555| 527|0.477|
| Baseline 8 segments concurrent | 9119| 261|0.923|
| Candidate2_with_queue | 9119| 265|0.923|

### 10M vectors

k=10, fanout=90

| |Avg visited nodes | QPS| Recall|
| :--- |---: | ---: |---: |
| Baseline Single segment | | | |
| Baseline 19 segments concurrent | 37726| 293|0.971|
| Candidate2_with_queue | 20199| 501|0.960|

k=100, fanout=900

| |Avg visited nodes | QPS| Recall|
| :--- |---: | ---: |---: |
| Baseline Single segment | | | |
| Baseline 19 segments concurrent | 234047| 47|0.992|
| Candidate2_with_queue | 74995| 118|0.979|
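As a quick sanity check on the quoted 1.7-2.5x range, the speedup can be recomputed as candidate QPS divided by baseline QPS, using only the 10M-vector figures reported above. This is a minimal sketch; the names `qps_10m` and `speedup` are illustrative and are not part of the PR.

```python
# QPS figures copied from the 10M-vector results above:
# (Baseline 19 segments concurrent, Candidate2_with_queue)
qps_10m = {
    "k=10, fanout=90": (293, 501),
    "k=100, fanout=900": (47, 118),
}


def speedup(baseline_qps, candidate_qps):
    """Return how many times faster the candidate is than the baseline."""
    return candidate_qps / baseline_qps


# Round to two decimals for display only.
speedups = {cfg: round(speedup(b, c), 2) for cfg, (b, c) in qps_10m.items()}

for cfg, s in speedups.items():
    print(f"{cfg}: {s}x")
```

Both ratios fall at the ends of the stated range, which suggests the 1.7-2.5x claim refers to the QPS columns of the two 10M-vector runs.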
Re: [PR] Random access term dictionary [lucene]
github-actions[bot] commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1890169118 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [I] org.apache.lucene.search.TestByteVectorSimilarityQuery.testApproximate failing intermittently [lucene]
zhaih commented on issue #13009: URL: https://github.com/apache/lucene/issues/13009#issuecomment-1890324110 I think that test case doesn't work well with the simple text codec, because that codec always visits documents up to the limit (which is the cardinality of the acceptDoc), whereas the test fails with the exception above once it has visited up to the limit. I'll open a PR to simply not use the SimpleTextCodec if it is randomly chosen.
[PR] Suppress SimpleTextCodec for VectorSimilarityQueryTestCase [lucene]
zhaih opened a new pull request, #13010: URL: https://github.com/apache/lucene/pull/13010

### Description

See comments in #13009.

Actually, I suspect the test case will also fail in some extreme cases (for example, if the HNSW graph somehow does not skip any vector, which is quite unlikely but not impossible), but that should be extremely rare.
Re: [PR] LUCENE-4056: Japanese Tokenizer (Kuromoji) cannot build UniDic dictionary [lucene]
azagniotov commented on PR #12517: URL: https://github.com/apache/lucene/pull/12517#issuecomment-1890360185

> Please don't add the application plugin. Instead just add a plain java runner task. The result of the project is a library jar, so please don't change this as it could have effects on the resulting maven pom.

Hi @uschindler, I ended up simply reverting the 8d52f66 commit. The current `gradle/generation/kuromoji.gradle` already contains `def recompileDictionary(...)`, which is used by all the `task compile(..)` tasks, so the task and the application plugin that I added were not necessary after all.