[GitHub] [lucene] javanna merged pull request #12270: Don't generate stacktrace in CollectionTerminatedException

2023-05-09 Thread via GitHub


javanna merged PR #12270:
URL: https://github.com/apache/lucene/pull/12270


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna commented on pull request #12270: Don't generate stacktrace in CollectionTerminatedException

2023-05-09 Thread via GitHub


javanna commented on PR #12270:
URL: https://github.com/apache/lucene/pull/12270#issuecomment-1539665901

   Thanks @original-brownbear !





[GitHub] [lucene] javanna closed pull request #769: LUCENE-10486: Avoid unnecessary overhead in TopScoreDoc and TopField collector manager

2023-05-09 Thread via GitHub


javanna closed pull request #769: LUCENE-10486: Avoid unnecessary overhead in 
TopScoreDoc and TopField collector manager
URL: https://github.com/apache/lucene/pull/769





[GitHub] [lucene] javanna commented on pull request #12274: Make query timeout members final in ExitableDirectoryReader

2023-05-09 Thread via GitHub


javanna commented on PR #12274:
URL: https://github.com/apache/lucene/pull/12274#issuecomment-1539773484

   Thanks @iverase !





[GitHub] [lucene] javanna merged pull request #12274: Make query timeout members final in ExitableDirectoryReader

2023-05-09 Thread via GitHub


javanna merged PR #12274:
URL: https://github.com/apache/lucene/pull/12274





[GitHub] [lucene] javanna merged pull request #12272: Update javadocs for QueryTimeout

2023-05-09 Thread via GitHub


javanna merged PR #12272:
URL: https://github.com/apache/lucene/pull/12272





[GitHub] [lucene] javanna commented on pull request #12272: Update javadocs for QueryTimeout

2023-05-09 Thread via GitHub


javanna commented on PR #12272:
URL: https://github.com/apache/lucene/pull/12272#issuecomment-1539774583

   Thanks @mkhludnev & @iverase !





[GitHub] [lucene] javanna merged pull request #12271: Make TimeExceededException members final

2023-05-09 Thread via GitHub


javanna merged PR #12271:
URL: https://github.com/apache/lucene/pull/12271





[GitHub] [lucene] javanna commented on pull request #12271: Make TimeExceededException members final

2023-05-09 Thread via GitHub


javanna commented on PR #12271:
URL: https://github.com/apache/lucene/pull/12271#issuecomment-1539775229

   Thanks @iverase !





[GitHub] [lucene] rmuir commented on pull request #12280: Expose iterator over query terms in TermInSetQuery

2023-05-09 Thread via GitHub


rmuir commented on PR #12280:
URL: https://github.com/apache/lucene/pull/12280#issuecomment-1539972391

   Please let's not go down this path. It is a MultiTermQuery; if you want to change 
how it works behind the scenes, you "plug in" with RewriteMethod.





[GitHub] [lucene] mikemccand commented on issue #12276: Maybe rename `DaciukMihovAutomatonBuilder`?

2023-05-09 Thread via GitHub


mikemccand commented on issue #12276:
URL: https://github.com/apache/lucene/issues/12276#issuecomment-1540061997

   > StringSetAutomatonBuilder?
   
   Hmm can we remove `Set`?  (It could be a list or array of String too).  
Maybe `StringsToAutomaton`?





[GitHub] [lucene] dweiss commented on issue #12276: Maybe rename `DaciukMihovAutomatonBuilder`?

2023-05-09 Thread via GitHub


dweiss commented on issue #12276:
URL: https://github.com/apache/lucene/issues/12276#issuecomment-1540137383

   A list or an array - doesn't matter, conceptually it's a set in the end (in 
the automaton). But I don't mind any version.





[GitHub] [lucene] mikemccand commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub


mikemccand commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540238456

   Hello, is it expected that this change alters the KNN hits returned?  That's 
fine (if it was expected) ... Lucene's nightly benchmarks are angry about it 
though, so I'll just regold if this is OK/expected.





[GitHub] [lucene] benwtrent commented on pull request #12197: [Backport] GITHUB-11838 Add api to allow concurrent query rewrite

2023-05-09 Thread via GitHub


benwtrent commented on PR #12197:
URL: https://github.com/apache/lucene/pull/12197#issuecomment-1540274950

   Thanks for the review @uschindler! I added a test that verifies override 
defaults for both methods.





[GitHub] [lucene] jbellis commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub


jbellis commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540274801

   Not expected.  How can I run the test locally?





[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub


msokolov commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540435331

   Those tests are run using a separate package called `luceneutil` -- see 
https://github.com/mikemccand/luceneutil





[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub


msokolov commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540438392

   I wonder if it could have been https://github.com/apache/lucene/pull/12248 
that caused the difference?





[GitHub] [lucene] gsmiller commented on pull request #12280: Expose iterator over query terms in TermInSetQuery

2023-05-09 Thread via GitHub


gsmiller commented on PR #12280:
URL: https://github.com/apache/lucene/pull/12280#issuecomment-1540466104

   Thanks @rmuir. It would be ideal if we could do this through RewriteMethod, 
but I'm not sure how we can actually accomplish that. The problem is in the 
implementation of `getTerms`, as defined in `TermInSetQuery`. We can plug in 
through `RewriteMethod#getTerms`, but we still need access to an iterator of 
the query terms. I'll draft up a sandbox query that might help illustrate the 
issue and we can discuss further. If there's a better way to go about this, 
happy to explore it. Thanks again!





[GitHub] [lucene] alessandrobenedetti commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]

2023-05-09 Thread via GitHub


alessandrobenedetti commented on issue #11507:
URL: https://github.com/apache/lucene/issues/11507#issuecomment-1540470731

   Hi @dsmiley  I updated the dev discussion on the mailing list:
   [Proposal] Remove max number of dimensions for KNN vectors
   
   And proceeded with a pragmatic new mail thread, where we just collect 
proposals with a motivation (no discussion there):
   Dimensions Limit for KNN vectors - Next Steps
   
   Feel free to participate!
   My intention is to act relatively fast (and then also operate Solr side).
   It's a train we don't need/want to miss!





[GitHub] [lucene] jbellis commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub


jbellis commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540480071

   Downloading now.  In the meantime, can you give more details on the failure? 
 Some variance is expected just by the nature of hnsw randomness, but it 
shouldn't go from e.g. 90% recall to 80%.





[GitHub] [lucene] zhaih commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-09 Thread via GitHub


zhaih commented on PR #12257:
URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540499897

   Thanks @alessandrobenedetti , I'll wait a day to give @msokolov and @jimczi 
a chance to review before merging it!





[GitHub] [lucene] gsmiller commented on a diff in pull request #12280: Expose iterator over query terms in TermInSetQuery

2023-05-09 Thread via GitHub


gsmiller commented on code in PR #12280:
URL: https://github.com/apache/lucene/pull/12280#discussion_r1188857287


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/PKTermInSetQuery.java:
##
@@ -0,0 +1,117 @@
+package org.apache.lucene.sandbox.search;
+
+import java.io.IOException;
+import java.util.Collection;
+import org.apache.lucene.index.ImpactsEnum;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.index.TermState;
+import org.apache.lucene.index.Terms;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.search.TermInSetQuery;
+import org.apache.lucene.util.AttributeSource;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.BytesRefIterator;
+
+/**
+ * {@link TermInSetQuery} optimized for a primary key-like field.
+ *
+ * Relies on {@link TermsEnum#seekExact(BytesRef)} instead of {@link
+ * TermsEnum#seekCeil(BytesRef)} to produce a terms iterator, which is compatible with {@code
+ * BloomFilteringPostingsFormat}.
+ */
+public class PKTermInSetQuery extends TermInSetQuery {

Review Comment:
   This class is for demo purposes only. I'm not suggesting we merge it as part 
of this PR. I only want to demonstrate how a class might leverage 
`getQueryTerms`.






[GitHub] [lucene] rmuir commented on pull request #12280: Expose iterator over query terms in TermInSetQuery

2023-05-09 Thread via GitHub


rmuir commented on PR #12280:
URL: https://github.com/apache/lucene/pull/12280#issuecomment-1540521962

   I still think it doesn't make sense to expose this. As I said on the 
dev list, your problem is that you use a custom postings format and you want it 
to accelerate the intersection. 
   
   The cleanest way to do this is to hand off the intersection to the 
postingsformat directly, rather than worry about seekCeil/seekExact and 
subclassing queries or exposing stuff. It should give a performance improvement 
using the default postings format as well (at least it did for other queries 
when mikemccand added it)
   
   So, IMO we should try to fix this query to use Terms.intersect() [see 
#12176], then override Terms.intersect for the BloomPostingsFormat to make use 
of the bloom filters to speed up intersection. 
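
The general idea of letting a bloom filter skip definite misses before the terms dictionary is touched can be sketched roughly as follows. This is only an illustration under stated assumptions: `BloomGuardedIntersect`, its two hash functions, and the `TreeSet` standing in for the terms dictionary are all hypothetical, and none of it is Lucene's `Terms.intersect` or its bloom-filtering postings format.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch: none of these names are Lucene APIs. */
class BloomGuardedIntersect {
  private static final int BITS = 1 << 12;

  private final BitSet bloom = new BitSet(BITS);
  // Stands in for the (comparatively expensive) terms dictionary.
  private final TreeSet<String> dictionary = new TreeSet<>();
  // Counts how often the dictionary was actually consulted.
  int dictionaryLookups = 0;

  BloomGuardedIntersect(List<String> indexedTerms) {
    for (String term : indexedTerms) {
      dictionary.add(term);
      bloom.set(hash1(term));
      bloom.set(hash2(term));
    }
  }

  private static int hash1(String term) {
    return Math.floorMod(term.hashCode(), BITS);
  }

  private static int hash2(String term) {
    return Math.floorMod(Integer.rotateLeft(term.hashCode() * 0x9E3779B1, 15), BITS);
  }

  /** False means "definitely absent"; true means "possibly present". */
  boolean mightContain(String term) {
    return bloom.get(hash1(term)) && bloom.get(hash2(term));
  }

  /** Returns the query terms that exist in the dictionary. */
  List<String> intersect(List<String> queryTerms) {
    List<String> matched = new ArrayList<>();
    for (String term : queryTerms) {
      if (!mightContain(term)) {
        continue; // definite miss: skip the dictionary entirely
      }
      dictionaryLookups++;
      if (dictionary.contains(term)) {
        matched.add(term);
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    BloomGuardedIntersect index = new BloomGuardedIntersect(List.of("id-1", "id-2", "id-3"));
    System.out.println(index.intersect(List.of("id-2", "id-9", "zzz"))); // prints [id-2]
  }
}
```

Because the filter can only answer exact membership questions, the result stays correct even when it reports a false positive; the exact dictionary check still runs for every term the filter lets through.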
   
   





[GitHub] [lucene] jimczi commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-09 Thread via GitHub


jimczi commented on PR #12257:
URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540524526

   Sorry, but it doesn't make sense to me to work on/review this PR while 
https://github.com/apache/lucene/pull/12254 is under review. I also disagree 
with the goal here: the mutable hnsw graph is used internally by the indexer, so 
making it multi-threaded just for search is an external requirement. The other 
PR started with the goal of speeding up the building by using multiple threads. 
That seems reasonable to me. Using multiple threads just for the sake of this 
token filter is a non-goal imo. 
   Can you follow the progress of https://github.com/apache/lucene/pull/12254 
and plug your changes there? 





[GitHub] [lucene] stefanvodita commented on issue #11547: IntersectIterators is not necessary under matchAll case in Facet [LUCENE-10511]

2023-05-09 Thread via GitHub


stefanvodita commented on issue #11547:
URL: https://github.com/apache/lucene/issues/11547#issuecomment-1540554539

   I’m curious about this suggestion and I’d like to understand it better. How 
could `ConjunctionUtils` tell if an iterator was going to be an “all” iterator? 
It couldn’t use cost in the general case because cost is not exact in the 
general case. We also couldn’t label “all” iterators on creation to have them 
handled separately because they could be advanced before getting passed to 
`intersectIterators` and then they would not act as an “all” iterator, right?
   
   If instead we consider changing the callers of `intersectIterators` to be 
smarter, I looked through places in the code where the method is called. I 
didn’t find the particular instance that @LuXugang was 
[referring](https://github.com/apache/lucene/issues/11547#issue-1348299158) to, 
but there are plenty of instances where at least one of the iterators is a 
DocValues. In that case, how does Lucene guarantee that it will have exact 
cost? Could we have a DocValues implementation that doesn’t?





[GitHub] [lucene] zhaih commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-09 Thread via GitHub


zhaih commented on PR #12257:
URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540613109

   Sorry @jimczi I am not sure I get your idea.
   Did you mean:
   1. We shouldn't make improvements to this OnHeapHnswGraph right now because 
the other PR is ongoing and will likely replace this one, so what we're doing 
here will be deprecated eventually.
   Or:
   2. We shouldn't make improvements to this OnHeapHnswGraph for a reason that is 
not our main use case, which is indexing.
   
   I actually don't think this PR is that related to the other one; in fact, 
this one can benefit the other one, as that one is currently using 
`ThreadLocal` internally to solve the problem this PR is solving. If this PR is 
checked in, I believe the other one might be able to abandon the `ThreadLocal`.
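
For readers unfamiliar with the mechanism mentioned here, a minimal sketch of how `ThreadLocal` gives each thread its own private state might look like this. The class name `PerThreadScratch` and the `int[16]` buffer are hypothetical; the thread does not show how either PR actually uses `ThreadLocal`.

```java
/** Hypothetical sketch of per-thread state; not code from either PR. */
class PerThreadScratch {
  // Each thread lazily gets its own buffer; nothing is shared between threads.
  private static final ThreadLocal<int[]> SCRATCH = ThreadLocal.withInitial(() -> new int[16]);

  static int[] scratch() {
    return SCRATCH.get();
  }

  /** Fetches the buffer as seen from a freshly started thread. */
  static int[] scratchFromNewThread() {
    int[][] holder = new int[1][];
    Thread worker = new Thread(() -> holder[0] = scratch());
    worker.start();
    try {
      worker.join(); // join() makes the worker's write visible to this thread
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return holder[0];
  }

  public static void main(String[] args) {
    int[] mine = scratch();
    System.out.println(mine == scratch());              // prints true: same thread, same buffer
    System.out.println(mine == scratchFromNewThread()); // prints false: other thread, other buffer
  }
}
```

The trade-off is that the per-thread copies live as long as their threads do, which is one reason a design that passes the state explicitly can be preferable.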
   
   





[GitHub] [lucene] jimczi commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-09 Thread via GitHub


jimczi commented on PR #12257:
URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540666019

   I think both 1 and 2 yes. If you think this PR will benefit the other PR 
then it should be discussed in https://github.com/apache/lucene/pull/12254. We 
already have issues with indexing so adding new requirements outside of this 
use case should be avoided imo.
   The thing I don't like about this change alone is that it makes it look as if 
using the HNSW builder for a search use case is ok. 
   That's not right, nobody should use the HNSW builder as a static searcher. 
If we make it multi-threaded it should be to make the build faster. Search is 
exposed in the builder simply because it is used by the algorithm to create the 
graph. 
   





[GitHub] [lucene] mikemccand commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub


mikemccand commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540672865

   Thanks @jbellis -- I'm not sure this change caused any difference.  
Something in the past couple days tweaked the KNN results and this one jumped 
out at me as a possibility.
   
   This is the only detail the nightly benchmark produced:
   
   ```
   Traceback (most recent call last):
     File "/l/util.nightly/src/python/nightlyBench.py", line 1818, in <module>
       run()
     File "/l/util.nightly/src/python/nightlyBench.py", line 701, in run
       raise RuntimeError('search result differences: %s' % str(errors))
   RuntimeError: search result differences: 
["query=KnnFloatVectorQuery:vector[0.028473025,...][100] filter=None sort=None 
groupField=None hitCount=100: hit 15 has wrong field/score value ([17135768], 
'0.9621086') vs ([26065483], '0.9620853')", 
"query=KnnFloatVectorQuery:vector[-0.047548626,...][100] filter=None sort=None 
groupField=None hitCount=100: hit 0 has wrong field/score value ([20712471], 
'0.8335463') vs ([15605918], '0.8440397')", 
"query=KnnFloatVectorQuery:vector[0.02625591,...][100] filter=None sort=None 
groupField=None hitCount=100: hit 7 has wrong field/score value ([23761647], 
'0.8285247') vs ([25459412], '0.8309758')"]
   ```
   
   Unfortunately it is not so simple to reproduce these nightly benchmarks.  
Note that they only check for exact hit/score differences and not any 
precision/recall tradeoff.
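
To make the distinction concrete, here is a rough sketch contrasting an exact hit-by-hit comparison with a looser overlap measure. It is not the `luceneutil` nightly benchmark code: the class `HitComparison`, both method names, and the sample doc ids and scores are hypothetical, and "overlap with the previous results" is only a stand-in for a real recall measurement against known-correct neighbors.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: not the luceneutil nightly benchmark code. */
class HitComparison {
  /** Index of the first hit whose doc id or score differs, or -1 if identical. */
  static int firstDifference(int[] oldDocs, float[] oldScores, int[] newDocs, float[] newScores) {
    int n = Math.min(oldDocs.length, newDocs.length);
    for (int i = 0; i < n; i++) {
      if (oldDocs[i] != newDocs[i] || Float.compare(oldScores[i], newScores[i]) != 0) {
        return i;
      }
    }
    return oldDocs.length == newDocs.length ? -1 : n;
  }

  /** Fraction of the old doc ids that still appear anywhere in the new hits. */
  static double overlap(int[] oldDocs, int[] newDocs) {
    Set<Integer> seen = new HashSet<>();
    for (int doc : newDocs) {
      seen.add(doc);
    }
    int kept = 0;
    for (int doc : oldDocs) {
      if (seen.contains(doc)) {
        kept++;
      }
    }
    return oldDocs.length == 0 ? 1.0 : (double) kept / oldDocs.length;
  }

  public static void main(String[] args) {
    // Made-up data: one hit out of four changed between the two runs.
    int[] oldDocs = {10, 20, 30, 40};
    float[] oldScores = {0.9f, 0.8f, 0.7f, 0.6f};
    int[] newDocs = {10, 20, 35, 40};
    float[] newScores = {0.9f, 0.8f, 0.71f, 0.6f};

    System.out.println(firstDifference(oldDocs, oldScores, newDocs, newScores)); // prints 2
    System.out.println(overlap(oldDocs, newDocs)); // prints 0.75
  }
}
```

An exact check flags the run as soon as any single hit moves, while an overlap-style metric can stay high, so the two can legitimately disagree about whether a change matters.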





[GitHub] [lucene] gsmiller closed pull request #12280: Expose iterator over query terms in TermInSetQuery

2023-05-09 Thread via GitHub


gsmiller closed pull request #12280: Expose iterator over query terms in 
TermInSetQuery
URL: https://github.com/apache/lucene/pull/12280





[GitHub] [lucene] gsmiller commented on pull request #12280: Expose iterator over query terms in TermInSetQuery

2023-05-09 Thread via GitHub


gsmiller commented on PR #12280:
URL: https://github.com/apache/lucene/pull/12280#issuecomment-1540768011

   Got it, thanks @rmuir. I hadn't seen your dev list reply yet. This all makes 
sense. I'll close this out and have a look at leveraging intersect. Seems like 
a better path forward. Thanks!





[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub


msokolov commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540770665

   The idea is that the hnsw randomness should be predictable based on its 
fixed random seed (42 IIRC). It isn't really a problem if we changed that as 
long as we have some idea how / why, and as you say the recall remains 
unchanged or improved. Also we'd want to make sure that the new normal is stable





[GitHub] [lucene] zhaih commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-09 Thread via GitHub


zhaih commented on PR #12257:
URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540937393

   I'm ok to just pause it here or eventually drop this PR, as I'm definitely 
not the one who needs this feature (at least for now). But I guess, @jimczi, if 
you want to stop people from using the `OnHeapHnswGraph` for anything besides 
indexing, the better place to stop that (since it's merged, probably by 
reverting it) is this one: https://github.com/apache/lucene/pull/12169 





[GitHub] [lucene] jbellis commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub


jbellis commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1541171234

   If the algorithm is implemented correctly, and I think that it is, then in 
theory the order of neighbor traversal should not matter.
   
   But we are seeing a difference here, so I *think* what causes that is the 
limited precision that you get in practice when computing vector similarities.  
If you have enough vectors, and enough dimensions, then the round off error can 
accumulate enough to make a difference.  That is why the test suite does not 
surface this difference.
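
A small sketch of the underlying floating-point effect: summing the same products in a different order can give a different `float` result. The class `FloatOrderDemo` and the deliberately extreme inputs are hypothetical, and whether accumulation order is the exact mechanism behind the changed KNN hits is an assumption, not something the thread establishes.

```java
/** Hypothetical sketch: float accumulation is order-dependent. */
class FloatOrderDemo {
  /** Dot product accumulated from the first component to the last. */
  static float dotForward(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** The same dot product accumulated from the last component to the first. */
  static float dotBackward(float[] a, float[] b) {
    float sum = 0f;
    for (int i = a.length - 1; i >= 0; i--) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    // Near 1e8 adjacent floats are 8 apart, so adding 1 to -1e8 is lost to rounding.
    float[] a = {1e8f, -1e8f, 1f};
    float[] b = {1f, 1f, 1f};
    System.out.println(dotForward(a, b));  // prints 1.0
    System.out.println(dotBackward(a, b)); // prints 0.0
  }
}
```

Real embedding vectors are far less extreme than this, so the discrepancies would be tiny, but with many vectors and many dimensions a tiny difference can be enough to swap two near-tied neighbors.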
   
   I performed 38 runs of the Texmex SIFT benchmark with known-correct KNN.  
This resulted in the new code having a very tiny bit better recall on average, 
with p-value 0.16.  My statistics is a bit rusty (it's very rusty) but I 
believe we're justified in concluding that recall is no worse than before, at 
least on this test.
   
   Google sheet is [here](https://docs.google.com/spreadsheets/d/1Xcx43x30AmTpm-7GH_SJwkwGQ93P4wYPrClGomebtsk/edit) and raw data is attached as csv.
   
   The first column is the new code, and the second is the old (git sha 
1fa2be9).
   
   [combined.csv](https://github.com/apache/lucene/files/11437240/combined.csv)
   





[GitHub] [lucene] ryantbrown commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]

2023-05-09 Thread via GitHub


ryantbrown commented on issue #11507:
URL: https://github.com/apache/lucene/issues/11507#issuecomment-1541351180

   The rabbit hole that is trying to store Open AI embeddings in Elasticsearch 
eventually leads here.  I read the entire thread and unless I am missing 
something, the obvious move is to make the limit configurable (up to a point) or, 
at a minimum, increase the limit to 1536 to support the 
`text-embedding-ada-002` model. In other words, there should be a compelling 
reason _not_ to increase the limit beyond the fact that it will be hard to reduce 
in the future. 

