Re: [PR] Output binary doc values as hex array in SimpleTextCodec [lucene]

2024-01-12 Thread via GitHub


jpountz merged PR #12987:
URL: https://github.com/apache/lucene/pull/12987


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make sure `DocumentsWriterPerThread#getAndLock` never returns `null` on a non-empty queue. [lucene]

2024-01-12 Thread via GitHub


jpountz commented on PR #12959:
URL: https://github.com/apache/lucene/pull/12959#issuecomment-1889495623

   Thanks a lot @uschindler !





Re: [I] TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom fails randomly [lucene]

2024-01-12 Thread via GitHub


jpountz closed issue #12649: 
TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom fails randomly
URL: https://github.com/apache/lucene/issues/12649





Re: [PR] Make sure `DocumentsWriterPerThread#getAndLock` never returns `null` on a non-empty queue. [lucene]

2024-01-12 Thread via GitHub


jpountz merged PR #12959:
URL: https://github.com/apache/lucene/pull/12959





Re: [PR] Make FSTCompiler.compile() to only return the FSTMetadata [lucene]

2024-01-12 Thread via GitHub


mikemccand commented on PR #12831:
URL: https://github.com/apache/lucene/pull/12831#issuecomment-1889722033

   Thank you stale bot!
   
   @dungba88 -- what is the status of this change?
   
   I think it makes sense to have two FST compile+consume paths -- one on heap, 
that you can (efficiently) consume (read) right away without writing FST to 
stable storage, another that writes and then reads from stable storage.





Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-12 Thread via GitHub


mayya-sharipova commented on PR #12962:
URL: https://github.com/apache/lucene/pull/12962#issuecomment-1889934399

   I have also run experiments on the Cohere dataset. As seen below, for the 
10M docs dataset the speedups with the proposed approach are 1.7-2.5x.
   
   ## Cohere/wikipedia-22-12-en-embeddings
   
   - [Cohere/wikipedia-22-12-en-embeddings](https://huggingface.co/datasets/Cohere/wikipedia-22-12-en-embeddings) dataset
   - 768 dims
   
   ### 1M vectors 
   k=10, fanout=90
   
   | | Avg visited nodes | QPS | Recall |
   | :--- | ---: | ---: | ---: |
   | Baseline Single segment | 804 | 3225 | 0.454 |
   | Baseline 8 segments concurrent | 1807 | 1831 | 0.887 |
   | Candidate2_with_queue | 1807 | 1872 | 0.887 |
   
   k=100, fanout=900
   
   | | Avg visited nodes | QPS | Recall |
   | :--- | ---: | ---: | ---: |
   | Baseline Single segment | 4555 | 527 | 0.477 |
   | Baseline 8 segments concurrent | 9119 | 261 | 0.923 |
   | Candidate2_with_queue | 9119 | 265 | 0.923 |
   
   ### 10M vectors 
   k=10, fanout=90
   
   | | Avg visited nodes | QPS | Recall |
   | :--- | ---: | ---: | ---: |
   | Baseline Single segment | | | |
   | Baseline 19 segments concurrent | 37726 | 293 | 0.971 |
   | Candidate2_with_queue | 20199 | 501 | 0.960 |
   
   
   k=100, fanout=900
   
   | | Avg visited nodes | QPS | Recall |
   | :--- | ---: | ---: | ---: |
   | Baseline Single segment | | | |
   | Baseline 19 segments concurrent | 234047 | 47 | 0.992 |
   | Candidate2_with_queue | 74995 | 118 | 0.979 |
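
As a quick sanity check on the 1.7-2.5x figure, the QPS ratios can be recomputed from the 10M-vector rows. This is only an illustrative sketch: the numbers are copied from the tables in this comment, and the `speedup` helper and dictionary layout are hypothetical names, not part of the benchmark tooling.

```python
# QPS values copied from the 10M-vector tables in this comment.
# Keys and structure are illustrative, not taken from any benchmark script.
qps = {
    "k=10, fanout=90": {"baseline": 293, "candidate": 501},
    "k=100, fanout=900": {"baseline": 47, "candidate": 118},
}


def speedup(baseline_qps, candidate_qps):
    """Return how many times faster the candidate is than the baseline."""
    return candidate_qps / baseline_qps


# Ratio of Candidate2_with_queue QPS to the 19-segment concurrent baseline.
speedups = {
    config: round(speedup(v["baseline"], v["candidate"]), 2)
    for config, v in qps.items()
}

for config, ratio in speedups.items():
    print(f"{config}: {ratio}x")
# prints:
# k=10, fanout=90: 1.71x
# k=100, fanout=900: 2.51x
```

The same ratios on the 1M-vector tables come out close to 1x (1872/1831 and 265/261), which matches the tables showing identical visited-node counts there.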





Re: [PR] Random access term dictionary [lucene]

2024-01-12 Thread via GitHub


github-actions[bot] commented on PR #12688:
URL: https://github.com/apache/lucene/pull/12688#issuecomment-1890169118

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!





Re: [I] org.apache.lucene.search.TestByteVectorSimilarityQuery.testApproximate failing intermittently [lucene]

2024-01-12 Thread via GitHub


zhaih commented on issue #13009:
URL: https://github.com/apache/lucene/issues/13009#issuecomment-1890324110

   I think that test case doesn't work well with the SimpleTextCodec, because 
that codec always visits documents up to the limit (which is the cardinality of 
the acceptDoc), while the test fails with the exception above once the limit is 
reached. I'll open a PR to avoid using the SimpleTextCodec when it is randomly 
chosen.





[PR] Suppress SimpleTextCodec for VectorSimilarityQueryTestCase [lucene]

2024-01-12 Thread via GitHub


zhaih opened a new pull request, #13010:
URL: https://github.com/apache/lucene/pull/13010

   ### Description
   See comments in #13009 
   
   Actually, I suspect the test case can also fail in some extreme cases 
(for example, if the HNSW graph somehow does not skip any vector, which is quite 
unlikely but not impossible), but that should be extremely rare in practice.
   
   





Re: [PR] LUCENE-4056: Japanese Tokenizer (Kuromoji) cannot build UniDic dictionary [lucene]

2024-01-12 Thread via GitHub


azagniotov commented on PR #12517:
URL: https://github.com/apache/lucene/pull/12517#issuecomment-1890360185

   > Please don't add the application plugin. Instead just add a plain java 
runner task. The result of the project is a library jar, so please don't change 
this as it could have effects on the resulting maven pom.
   
   Hi @uschindler, I ended up simply reverting the 8d52f66 commit. The current 
`gradle/generation/kuromoji.gradle` already contains `def 
recompileDictionary(...)`, which is used by all the `task compile(..)` 
definitions, so the task and the application plugin that I added were not 
necessary after all.
   

