[GitHub] [lucene] msokolov commented on pull request #11946: add similarity threshold for hnsw

GitBox Fri, 18 Nov 2022 11:24:28 -0800


msokolov commented on PR #11946:
URL: https://github.com/apache/lucene/pull/11946#issuecomment-1320438152

   OK, can we start by providing a post-filter? I think this will be the more
   common use case: I want to find the best docs and ensure that none of them
   are terrible. It is less disruptive and doesn't require changes to the
   codec. Can you explain why you want "find all docs with score > T"? That
   is going to be a scary thing. What if someone asks for T == 0? Then the
   computation and memory requirements are unbounded. I don't think this is a
   search use case; it's some kind of analytics thing that you should do in
   Spark or some other offline computation system.
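
   For illustration only, here is a minimal sketch of the post-filter idea:
   run a bounded top-K search first, then drop any hit below a minimum
   score. The names PostFilterSketch, Hit and postFilter are hypothetical
   stand-ins, not Lucene APIs, and the scores are made-up sample values.

```java
import java.util.ArrayList;
import java.util.List;

public class PostFilterSketch {

    /** Hypothetical stand-in for a scored hit: a doc id plus its similarity score. */
    public static final class Hit {
        public final int doc;
        public final float score;

        public Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Keeps only the hits whose score is at least minScore.
     * The input is assumed to be the already-bounded top-K result of a KNN
     * search, so this does at most K comparisons whatever the threshold is.
     */
    public static List<Hit> postFilter(List<Hit> topK, float minScore) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : topK) {
            if (hit.score >= minScore) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up top-3 result, ordered by descending score.
        List<Hit> topK = List.of(new Hit(7, 0.91f), new Hit(3, 0.80f), new Hit(12, 0.40f));
        for (Hit hit : postFilter(topK, 0.76f)) {
            System.out.println("doc=" + hit.doc + " score=" + hit.score);
        }
    }
}
```

   Because the filter only ever sees K hits, a threshold of 0 simply returns
   all K of them rather than every document in the index.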

   On Fri, Nov 18, 2022 at 2:01 PM Alexey Gorlenko ***@***.***>
   wrote:

   > But we don't know K - that's the problem. The task which I want to solve
   > sounds like this: find documents with similarity >= 0.76 (for example). We
   > don't have the number of such documents in advance.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

