[GitHub] [lucene] javanna merged pull request #12270: Don't generate stacktrace in CollectionTerminatedException
javanna merged PR #12270: URL: https://github.com/apache/lucene/pull/12270 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
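For readers unfamiliar with the technique named in the PR title: an exception that serves only as a control-flow signal can skip the comparatively expensive stack capture. The following is a minimal sketch under stated assumptions, not the code from PR #12270. `StacklessSignal` is a hypothetical class, and the sketch relies on the standard four-argument `RuntimeException` constructor with `writableStackTrace` set to `false`.

```java
/** Hypothetical control-flow exception that never captures a stack trace. */
final class StacklessSignal extends RuntimeException {
  StacklessSignal() {
    // message = null, cause = null, enableSuppression = false,
    // writableStackTrace = false: no stack frames are recorded on construction.
    super(null, null, false, false);
  }

  public static void main(String[] args) {
    try {
      throw new StacklessSignal();
    } catch (StacklessSignal e) {
      // Report how many frames the stackless exception carries.
      System.out.println("frames captured: " + e.getStackTrace().length);
    }
  }
}
```

Overriding `fillInStackTrace()` to return `this` is another common way to get a similar effect; which approach the PR took is not stated in this thread.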
[GitHub] [lucene] javanna commented on pull request #12270: Don't generate stacktrace in CollectionTerminatedException
javanna commented on PR #12270: URL: https://github.com/apache/lucene/pull/12270#issuecomment-1539665901 Thanks @original-brownbear !
[GitHub] [lucene] javanna closed pull request #769: LUCENE-10486: Avoid unnecessary overhead in TopScoreDoc and TopField collector manager
javanna closed pull request #769: LUCENE-10486: Avoid unnecessary overhead in TopScoreDoc and TopField collector manager URL: https://github.com/apache/lucene/pull/769
[GitHub] [lucene] javanna commented on pull request #12274: Make query timeout members final in ExitableDirectoryReader
javanna commented on PR #12274: URL: https://github.com/apache/lucene/pull/12274#issuecomment-1539773484 Thanks @iverase !
[GitHub] [lucene] javanna merged pull request #12274: Make query timeout members final in ExitableDirectoryReader
javanna merged PR #12274: URL: https://github.com/apache/lucene/pull/12274
[GitHub] [lucene] javanna merged pull request #12272: Update javadocs for QueryTimeout
javanna merged PR #12272: URL: https://github.com/apache/lucene/pull/12272
[GitHub] [lucene] javanna commented on pull request #12272: Update javadocs for QueryTimeout
javanna commented on PR #12272: URL: https://github.com/apache/lucene/pull/12272#issuecomment-1539774583 Thanks @mkhludnev & @iverase !
[GitHub] [lucene] javanna merged pull request #12271: Make TimeExceededException members final
javanna merged PR #12271: URL: https://github.com/apache/lucene/pull/12271
[GitHub] [lucene] javanna commented on pull request #12271: Make TimeExceededException members final
javanna commented on PR #12271: URL: https://github.com/apache/lucene/pull/12271#issuecomment-1539775229 Thanks @iverase !
[GitHub] [lucene] rmuir commented on pull request #12280: Expose iterator over query terms in TermInSetQuery
rmuir commented on PR #12280: URL: https://github.com/apache/lucene/pull/12280#issuecomment-1539972391 Please let's not go down this path. It is a MultiTermQuery; if you want to change how it works behind the scenes, you "plug in" with RewriteMethod.
[GitHub] [lucene] mikemccand commented on issue #12276: Maybe rename `DaciukMihovAutomatonBuilder`?
mikemccand commented on issue #12276: URL: https://github.com/apache/lucene/issues/12276#issuecomment-1540061997

> StringSetAutomatonBuilder?

Hmm can we remove `Set`? (It could be a list or array of String too). Maybe `StringsToAutomaton`?
[GitHub] [lucene] dweiss commented on issue #12276: Maybe rename `DaciukMihovAutomatonBuilder`?
dweiss commented on issue #12276: URL: https://github.com/apache/lucene/issues/12276#issuecomment-1540137383 A list or an array - doesn't matter, conceptually it's a set in the end (in the automaton). But I don't mind any version.
[GitHub] [lucene] mikemccand commented on pull request #12255: allocate one NeighborQueue per search for results
mikemccand commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540238456 Hello, is it expected that this change alters the KNN hits returned? That's fine (if it was expected) ... Lucene's nightly benchmarks are angry about it though, so I'll just regold if this is OK/expected.
[GitHub] [lucene] benwtrent commented on pull request #12197: [Backport] GITHUB-11838 Add api to allow concurrent query rewrite
benwtrent commented on PR #12197: URL: https://github.com/apache/lucene/pull/12197#issuecomment-1540274950 Thanks for the review @uschindler! I added a test that verifies override defaults for both methods.
[GitHub] [lucene] jbellis commented on pull request #12255: allocate one NeighborQueue per search for results
jbellis commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540274801 Not expected. How can I run the test locally?
[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results
msokolov commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540435331 Those tests are run using a separate package called `luceneutil` -- see https://github.com/mikemccand/luceneutil
[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results
msokolov commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540438392 I wonder if it could have been https://github.com/apache/lucene/pull/12248 that caused the difference?
[GitHub] [lucene] gsmiller commented on pull request #12280: Expose iterator over query terms in TermInSetQuery
gsmiller commented on PR #12280: URL: https://github.com/apache/lucene/pull/12280#issuecomment-1540466104 Thanks @rmuir. It would be ideal if we could do this through RewriteMethod, but I'm not sure how we can actually accomplish that. The problem is in the implementation of `getTerms`, as defined in `TermInSetQuery`. We can plug in through `RewriteMethod#getTerms`, but we still need access to an iterator of the query terms. I'll draft up a sandbox query that might help illustrate the issue and we can discuss further. If there's a better way to go about this, happy to explore it. Thanks again!
[GitHub] [lucene] alessandrobenedetti commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]
alessandrobenedetti commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1540470731 Hi @dsmiley, I updated the dev discussion on the mailing list:
[Proposal] Remove max number of dimensions for KNN vectors

And proceeded with a pragmatic new mail thread, where we just collect proposals with a motivation (no discussion there):
Dimensions Limit for KNN vectors - Next Steps

Feel free to participate! My intention is to act relatively fast (and then also operate Solr side). It's a train we don't need/want to miss!
[GitHub] [lucene] jbellis commented on pull request #12255: allocate one NeighborQueue per search for results
jbellis commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540480071 Downloading now. In the meantime, can you give more details on the failure? Some variance is expected just by the nature of hnsw randomness, but it shouldn't go from e.g. 90% recall to 80%.
[GitHub] [lucene] zhaih commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph
zhaih commented on PR #12257: URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540499897 Thanks @alessandrobenedetti , I'll wait a day to give @msokolov and @jimczi a chance to review before merging it!
[GitHub] [lucene] gsmiller commented on a diff in pull request #12280: Expose iterator over query terms in TermInSetQuery
gsmiller commented on code in PR #12280: URL: https://github.com/apache/lucene/pull/12280#discussion_r1188857287

## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/PKTermInSetQuery.java: ##
@@ -0,0 +1,117 @@
+package org.apache.lucene.sandbox.search;
+
+import java.io.IOException;
+import java.util.Collection;
+import org.apache.lucene.index.ImpactsEnum;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.index.TermState;
+import org.apache.lucene.index.Terms;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.search.TermInSetQuery;
+import org.apache.lucene.util.AttributeSource;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.BytesRefIterator;
+
+/**
+ * {@link TermInSetQuery} optimized for a primary key-like field.
+ *
+ * Relies on {@link TermsEnum#seekExact(BytesRef)} instead of {@link
+ * TermsEnum#seekCeil(BytesRef)} to produce a terms iterator, which is compatible with {@code
+ * BloomFilteringPostingsFormat}.
+ */
+public class PKTermInSetQuery extends TermInSetQuery {

Review Comment: This class is for demo purposes only. I'm not suggesting we merge it as part of this PR. I only want to demonstrate how a class might leverage `getQueryTerms`.
[GitHub] [lucene] rmuir commented on pull request #12280: Expose iterator over query terms in TermInSetQuery
rmuir commented on PR #12280: URL: https://github.com/apache/lucene/pull/12280#issuecomment-1540521962 It still doesn't make sense to me to expose this. As I said on the dev list, your problem is that you use a custom postings format and you want it to accelerate the intersection. The cleanest way to do this is to hand off the intersection to the postings format directly, rather than worry about seekCeil/seekExact and subclassing queries or exposing stuff. It should give a performance improvement using the default postings format as well (at least it did for other queries when mikemccand added it). So, IMO we should try to fix this query to use Terms.intersect() [see #12176], then override Terms.intersect for the BloomPostingsFormat to make use of the bloom filters to speed up intersection.
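The distinction in the comment above can be illustrated without any Lucene classes. The sketch below is only an analogy under stated assumptions: `BloomGuardedTerms` and all of its methods are hypothetical stand-ins, a `TreeSet` plays the sorted term dictionary, and a two-hash `BitSet` plays the Bloom filter. It shows why a Bloom filter can short-circuit an exact lookup but cannot answer a ceiling lookup, and how an intersection owned by the dictionary side is free to use the filter internally.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeSet;

/** Toy stand-in for a sorted term dictionary guarded by a Bloom filter (not Lucene code). */
final class BloomGuardedTerms {
  private static final int BITS = 1 << 10;
  private final TreeSet<String> dictionary = new TreeSet<>();
  private final BitSet bloom = new BitSet(BITS);

  BloomGuardedTerms(List<String> terms) {
    for (String term : terms) {
      dictionary.add(term);
      bloom.set(slot(term, 0));
      bloom.set(slot(term, 1));
    }
  }

  /** Derives the i-th filter position for a term from its hash code. */
  private static int slot(String term, int i) {
    int h1 = term.hashCode();
    int h2 = (h1 >>> 16) | 1;
    return Math.floorMod(h1 + i * h2, BITS);
  }

  /** False means definitely absent; true only means "maybe present". */
  boolean mightContain(String term) {
    return bloom.get(slot(term, 0)) && bloom.get(slot(term, 1));
  }

  /** Exact lookup: the filter can reject most misses before touching the dictionary. */
  boolean seekExact(String term) {
    return mightContain(term) && dictionary.contains(term);
  }

  /** Ceiling lookup: needs the sorted dictionary; the filter cannot say what the next term is. */
  String seekCeil(String term) {
    return dictionary.ceiling(term);
  }

  /** Intersection performed on the dictionary side, using exact lookups internally. */
  List<String> intersect(List<String> queryTerms) {
    List<String> hits = new ArrayList<>();
    for (String query : new TreeSet<>(queryTerms)) { // visit query terms in sorted order
      if (seekExact(query)) {
        hits.add(query);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    BloomGuardedTerms terms = new BloomGuardedTerms(List.of("apple", "banana", "cherry"));
    System.out.println(terms.intersect(List.of("banana", "durian", "apple")));
    System.out.println(terms.seekCeil("b"));
  }
}
```

In this toy version the caller never chooses between exact and ceiling seeks; it hands over the whole set of query terms, which is the shape of the hand-off the comment argues for.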
[GitHub] [lucene] jimczi commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph
jimczi commented on PR #12257: URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540524526 Sorry, but it doesn't make sense to me to work on/review this PR while https://github.com/apache/lucene/pull/12254 is under review. I also disagree with the goal here: the mutable hnsw graph is used internally by the indexer, and making it multi-threaded just for search is an external requirement. The other PR started with the goal of speeding up the building by using multiple threads. That seems reasonable to me. Using multiple threads just for the sake of this token filter is a non-goal imo. Can you follow the progress of https://github.com/apache/lucene/pull/12254 and plug your changes there?
[GitHub] [lucene] stefanvodita commented on issue #11547: IntersectIterators is not necessary under matchAll case in Facet [LUCENE-10511]
stefanvodita commented on issue #11547: URL: https://github.com/apache/lucene/issues/11547#issuecomment-1540554539 I’m curious about this suggestion and I’d like to understand it better. How could `ConjunctionUtils` tell if an iterator was going to be an “all” iterator? It couldn’t use cost in the general case because cost is not exact in the general case. We also couldn’t label “all” iterators on creation to have them handled separately because they could be advanced before getting passed to `intersectIterators` and then they would not act as an “all” iterator, right? If instead we consider changing the callers of `intersectIterators` to be smarter, I looked through places in the code where the method is called. I didn’t find the particular instance that @LuXugang was [referring](https://github.com/apache/lucene/issues/11547#issue-1348299158) to, but there are plenty of instances where at least one of the iterators is a DocValues. In that case, how does Lucene guarantee that it will have exact cost? Could we have a DocValues implementation that doesn’t?
[GitHub] [lucene] zhaih commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph
zhaih commented on PR #12257: URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540613109 Sorry @jimczi, I am not sure I get your idea. Did you mean:
1. We shouldn't make improvements to OnHeapHnswGraph right now because the other PR is ongoing and will likely replace this one, so what we're doing here will eventually be deprecated. Or:
2. We shouldn't make improvements to OnHeapHnswGraph for a reason that is outside our main use case, which is indexing.

I actually don't think this PR is that closely related to the other one. In fact, this one can benefit the other one, as that one currently uses `ThreadLocal` internally to solve the problem this PR is solving. If this PR is checked in, I believe the other one might be able to abandon the `ThreadLocal`.
[GitHub] [lucene] jimczi commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph
jimczi commented on PR #12257: URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540666019 I think both 1 and 2, yes. If you think this PR will benefit the other PR then it should be discussed in https://github.com/apache/lucene/pull/12254. We already have issues with indexing so adding new requirements outside of this use case should be avoided imo. What I don't like about this change on its own is that it makes it look as if using the HNSW builder for a search use case is OK. That's not right, nobody should use the HNSW builder as a static searcher. If we make it multi-threaded it should be to make the build faster. Search is exposed in the builder simply because it is used by the algorithm to create the graph.
[GitHub] [lucene] mikemccand commented on pull request #12255: allocate one NeighborQueue per search for results
mikemccand commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540672865 Thanks @jbellis -- I'm not sure this change caused any difference. Something in the past couple days tweaked the KNN results and this one jumped out at me as a possibility. This is the only detail the nightly benchmark produced:

```
Traceback (most recent call last):
  File "/l/util.nightly/src/python/nightlyBench.py", line 1818, in run()
  File "/l/util.nightly/src/python/nightlyBench.py", line 701, in run
    raise RuntimeError('search result differences: %s' % str(errors))
RuntimeError: search result differences: ["query=KnnFloatVectorQuery:vector[0.028473025,...][100] filter=None sort=None groupField=None hitCount=100: hit 15 has wrong field/score value ([17135768], '0.9621086') vs ([26065483], '0.9620853')", "query=KnnFloatVectorQuery:vector[-0.047548626,...][100] filter=None sort=None groupField=None hitCount=100: hit 0 has wrong field/score value ([20712471], '0.8335463') vs ([15605918], '0.8440397')", "query=KnnFloatVectorQuery:vector[0.02625591,...][100] filter=None sort=None groupField=None hitCount=100: hit 7 has wrong field/score value ([23761647], '0.8285247') vs ([25459412], '0.8309758')"]
```

Unfortunately it is not so simple to reproduce these nightly benchmarks. Note that they only check for exact hit/score differences and not any precision/recall tradeoff.
[GitHub] [lucene] gsmiller closed pull request #12280: Expose iterator over query terms in TermInSetQuery
gsmiller closed pull request #12280: Expose iterator over query terms in TermInSetQuery URL: https://github.com/apache/lucene/pull/12280
[GitHub] [lucene] gsmiller commented on pull request #12280: Expose iterator over query terms in TermInSetQuery
gsmiller commented on PR #12280: URL: https://github.com/apache/lucene/pull/12280#issuecomment-1540768011 Got it, thanks @rmuir. I hadn't seen your dev list reply yet. This all makes sense. I'll close this out and have a look at leveraging intersect. Seems like a better path forward. Thanks!
[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results
msokolov commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540770665 The idea is that the hnsw randomness should be predictable based on its fixed random seed (42 IIRC). It isn't really a problem if we changed that as long as we have some idea how / why, and as you say the recall remains unchanged or improved. Also we'd want to make sure that the new normal is stable
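As a rough illustration of why a fixed seed makes graph construction repeatable, the sketch below draws HNSW-style node levels from a seeded generator twice and compares the results. It is a sketch under stated assumptions rather than Lucene's builder code: `SeededLevels` is a hypothetical class, the seed 42 is taken from the comment above (which itself says "IIRC"), and both the use of `SplittableRandom` and the `ml` value of `1 / ln(16)` are assumed for the example.

```java
import java.util.Arrays;
import java.util.SplittableRandom;

/** Sketch: seeded randomness makes HNSW-style level assignment repeatable. */
final class SeededLevels {
  /** Draws a level as floor(-ln(u) * ml) for a uniform u in (0, 1). */
  static int randomLevel(SplittableRandom random, double ml) {
    double u;
    do {
      u = random.nextDouble(); // uniform in [0, 1); retry on 0 so the log stays finite
    } while (u == 0.0);
    return (int) (-Math.log(u) * ml);
  }

  /** Assigns levels to {@code count} nodes using a generator created from {@code seed}. */
  static int[] levels(long seed, int count, double ml) {
    SplittableRandom random = new SplittableRandom(seed);
    int[] out = new int[count];
    for (int i = 0; i < count; i++) {
      out[i] = randomLevel(random, ml);
    }
    return out;
  }

  public static void main(String[] args) {
    double ml = 1.0 / Math.log(16); // assumed normalization factor for this example
    int[] first = levels(42L, 1000, ml);
    int[] second = levels(42L, 1000, ml);
    System.out.println("same levels on both runs: " + Arrays.equals(first, second));
  }
}
```

Repeatable levels alone do not guarantee identical hits: anything else that changes the order or precision of the work done during search or construction can still shift results.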
[GitHub] [lucene] zhaih commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph
zhaih commented on PR #12257: URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540937393 I'm OK with just pausing it here or eventually dropping this PR, as I'm definitely not the one who needs this feature (at least for now). But I guess, @jimczi, if you want to stop people from using the `OnHeapHnswGraph` for anything besides indexing, the better place to stop it is this one (since it's merged, that probably means reverting it): https://github.com/apache/lucene/pull/12169
[GitHub] [lucene] jbellis commented on pull request #12255: allocate one NeighborQueue per search for results
jbellis commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1541171234 If the algorithm is implemented correctly, and I think that it is, then in theory the order of neighbor traversal should not matter. But we are seeing a difference here, so I *think* what causes that is the limited precision that you get in practice when computing vector similarities. If you have enough vectors, and enough dimensions, then the round-off error can accumulate enough to make a difference. That is why the test suite does not surface this difference.

I performed 38 runs of the Texmex SIFT benchmark with known-correct KNN. This resulted in the new code having a very tiny bit better recall on average, with p-value 0.16. My statistics is a bit rusty (it's very rusty) but I believe we're justified in concluding that recall is no worse than before, at least on this test.

Google sheet is [here](https://docs.google.com/spreadsheets/d/1Xcx43x30AmTpm-7GH_SJwkwGQ93P4wYPrClGomebtsk/edit) and raw data is attached as csv. The first column is the new code, and the second is the old (git sha 1fa2be9). [combined.csv](https://github.com/apache/lucene/files/11437240/combined.csv)
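The round-off point in the comment above rests on a general property of floating-point arithmetic: addition is not associative, so the same terms accumulated in a different order can yield a different total. The sketch below is a deliberately extreme toy case, not a reproduction of the benchmark difference; `SummationOrder` is a hypothetical class and the three values are chosen only to make the effect visible.

```java
/** Sketch: single-precision sums depend on the order of accumulation. */
final class SummationOrder {
  /** Accumulates the terms left to right in single precision. */
  static float sum(float[] terms) {
    float acc = 0f;
    for (float term : terms) {
      acc += term;
    }
    return acc;
  }

  public static void main(String[] args) {
    // The same three terms in two different orders.
    float[] largeFirst = {1e8f, 1f, -1e8f};
    float[] cancelFirst = {1e8f, -1e8f, 1f};

    // Near 1e8 adjacent floats are 8 apart, so adding 1 is lost to rounding.
    System.out.println(sum(largeFirst));  // prints 0.0
    System.out.println(sum(cancelFirst)); // prints 1.0
  }
}
```

Real similarity computations lose far less per operation than this example, but over many dimensions and many comparisons small differences of this kind can be enough to swap two near-tied neighbors.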
[GitHub] [lucene] ryantbrown commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]
ryantbrown commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1541351180 The rabbit hole that is trying to store OpenAI embeddings in Elasticsearch eventually leads here. I read the entire thread and, unless I am missing something, the obvious move is to make the limit configurable (up to a point) or, at a minimum, increase the limit to 1536 to support the `text-embedding-ada-002` model. In other words, there should be a compelling reason _not_ to increase the limit beyond the fact that it will be hard to reduce in the future.