[GitHub] [lucene] javanna merged pull request #12270: Don't generate stacktrace in CollectionTerminatedException

2023-05-09 Thread via GitHub
javanna merged PR #12270: URL: https://github.com/apache/lucene/pull/12270 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

[GitHub] [lucene] javanna commented on pull request #12270: Don't generate stacktrace in CollectionTerminatedException

2023-05-09 Thread via GitHub
javanna commented on PR #12270: URL: https://github.com/apache/lucene/pull/12270#issuecomment-1539665901 Thanks @original-brownbear !

[GitHub] [lucene] javanna closed pull request #769: LUCENE-10486: Avoid unnecessary overhead in TopScoreDoc and TopField collector manager

2023-05-09 Thread via GitHub
javanna closed pull request #769: LUCENE-10486: Avoid unnecessary overhead in TopScoreDoc and TopField collector manager URL: https://github.com/apache/lucene/pull/769

[GitHub] [lucene] javanna commented on pull request #12274: Make query timeout members final in ExitableDirectoryReader

2023-05-09 Thread via GitHub
javanna commented on PR #12274: URL: https://github.com/apache/lucene/pull/12274#issuecomment-1539773484 Thanks @iverase !

[GitHub] [lucene] javanna merged pull request #12274: Make query timeout members final in ExitableDirectoryReader

2023-05-09 Thread via GitHub
javanna merged PR #12274: URL: https://github.com/apache/lucene/pull/12274

[GitHub] [lucene] javanna merged pull request #12272: Update javadocs for QueryTimeout

2023-05-09 Thread via GitHub
javanna merged PR #12272: URL: https://github.com/apache/lucene/pull/12272

[GitHub] [lucene] javanna commented on pull request #12272: Update javadocs for QueryTimeout

2023-05-09 Thread via GitHub
javanna commented on PR #12272: URL: https://github.com/apache/lucene/pull/12272#issuecomment-1539774583 Thanks @mkhludnev & @iverase !

[GitHub] [lucene] javanna merged pull request #12271: Make TimeExceededException members final

2023-05-09 Thread via GitHub
javanna merged PR #12271: URL: https://github.com/apache/lucene/pull/12271

[GitHub] [lucene] javanna commented on pull request #12271: Make TimeExceededException members final

2023-05-09 Thread via GitHub
javanna commented on PR #12271: URL: https://github.com/apache/lucene/pull/12271#issuecomment-1539775229 Thanks @iverase !

[GitHub] [lucene] rmuir commented on pull request #12280: Expose iterator over query terms in TermInSetQuery

2023-05-09 Thread via GitHub
rmuir commented on PR #12280: URL: https://github.com/apache/lucene/pull/12280#issuecomment-1539972391 Please let's not go down this path. It is a MultiTermQuery; if you want to change how it works behind the scenes, you "plug in" with RewriteMethod

[GitHub] [lucene] mikemccand commented on issue #12276: Maybe rename `DaciukMihovAutomatonBuilder`?

2023-05-09 Thread via GitHub
mikemccand commented on issue #12276: URL: https://github.com/apache/lucene/issues/12276#issuecomment-1540061997 > StringSetAutomatonBuilder? Hmm can we remove `Set`? (It could be a list or array of String too). Maybe `StringsToAutomaton`?

[GitHub] [lucene] dweiss commented on issue #12276: Maybe rename `DaciukMihovAutomatonBuilder`?

2023-05-09 Thread via GitHub
dweiss commented on issue #12276: URL: https://github.com/apache/lucene/issues/12276#issuecomment-1540137383 A list or an array - doesn't matter, conceptually it's a set in the end (in the automaton). But I don't mind any version.

[GitHub] [lucene] mikemccand commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub
mikemccand commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540238456 Hello, is it expected that this change alters the KNN hits returned? That's fine (if it was expected) ... Lucene's nightly benchmarks are angry about it though, so I'll just regold i

[GitHub] [lucene] benwtrent commented on pull request #12197: [Backport] GITHUB-11838 Add api to allow concurrent query rewrite

2023-05-09 Thread via GitHub
benwtrent commented on PR #12197: URL: https://github.com/apache/lucene/pull/12197#issuecomment-1540274950 Thanks for the review @uschindler! I added a test that verifies override defaults for both methods.

[GitHub] [lucene] jbellis commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub
jbellis commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540274801 Not expected. How can I run the test locally?

[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub
msokolov commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540435331 Those tests are run using a separate package called `luceneutil` -- see https://github.com/mikemccand/luceneutil

[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub
msokolov commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540438392 I wonder if it could have been https://github.com/apache/lucene/pull/12248 that caused the difference?

[GitHub] [lucene] gsmiller commented on pull request #12280: Expose iterator over query terms in TermInSetQuery

2023-05-09 Thread via GitHub
gsmiller commented on PR #12280: URL: https://github.com/apache/lucene/pull/12280#issuecomment-1540466104 Thanks @rmuir. It would be ideal if we could do this through RewriteMethod, but I'm not sure how we can actually accomplish that. The problem is in the implementation of `getTerms`, as

[GitHub] [lucene] alessandrobenedetti commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]

2023-05-09 Thread via GitHub
alessandrobenedetti commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1540470731 Hi @dsmiley I updated the dev discussion on the mailing list: [Proposal] Remove max number of dimensions for KNN vectors And proceeded with a pragmatic new mail

[GitHub] [lucene] jbellis commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub
jbellis commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540480071 Downloading now. In the meantime, can you give more details on the failure? Some variance is expected just by the nature of hnsw randomness, but it shouldn't go from e.g. 90% recall to

[GitHub] [lucene] zhaih commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-09 Thread via GitHub
zhaih commented on PR #12257: URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540499897 Thanks @alessandrobenedetti , I'll wait a day to give @msokolov and @jimczi a chance to review before merging it!

[GitHub] [lucene] gsmiller commented on a diff in pull request #12280: Expose iterator over query terms in TermInSetQuery

2023-05-09 Thread via GitHub
gsmiller commented on code in PR #12280: URL: https://github.com/apache/lucene/pull/12280#discussion_r1188857287 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/PKTermInSetQuery.java: ## @@ -0,0 +1,117 @@ +package org.apache.lucene.sandbox.search; + +import java.io.I

[GitHub] [lucene] rmuir commented on pull request #12280: Expose iterator over query terms in TermInSetQuery

2023-05-09 Thread via GitHub
rmuir commented on PR #12280: URL: https://github.com/apache/lucene/pull/12280#issuecomment-1540521962 I still think it doesn't make sense to me to expose this. As I said on the dev list, your problem is that you use a custom postings format and you want it to accelerate the intersection.

[GitHub] [lucene] jimczi commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-09 Thread via GitHub
jimczi commented on PR #12257: URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540524526 Sorry but that doesn't make sense to me to work/review this PR while https://github.com/apache/lucene/pull/12254 is under review. I also disagree with the goal here, the mutable hnsw grap

[GitHub] [lucene] stefanvodita commented on issue #11547: IntersectIterators is not necessary under matchAll case in Facet [LUCENE-10511]

2023-05-09 Thread via GitHub
stefanvodita commented on issue #11547: URL: https://github.com/apache/lucene/issues/11547#issuecomment-1540554539 I’m curious about this suggestion and I’d like to understand it better. How could `ConjunctionUtils` tell if an iterator was going to be an “all” iterator? It couldn’t use cost

[GitHub] [lucene] zhaih commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-09 Thread via GitHub
zhaih commented on PR #12257: URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540613109 Sorry @jimczi I am not sure I get your idea. Did you mean: 1. We shouldn't make improvement to this OnHeapHnswGraph right now because the other PR is ongoing and it will likely replac

[GitHub] [lucene] jimczi commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-09 Thread via GitHub
jimczi commented on PR #12257: URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540666019 I think both 1 and 2 yes. If you think this PR will benefit the other PR then it should be discussed in https://github.com/apache/lucene/pull/12254. We already have issues with indexing s

[GitHub] [lucene] mikemccand commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub
mikemccand commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540672865 Thanks @jbellis -- I'm not sure this change caused any difference. Something in the past couple days tweaked the KNN results and this one jumped out at me as a possibility. Th

[GitHub] [lucene] gsmiller closed pull request #12280: Expose iterator over query terms in TermInSetQuery

2023-05-09 Thread via GitHub
gsmiller closed pull request #12280: Expose iterator over query terms in TermInSetQuery URL: https://github.com/apache/lucene/pull/12280

[GitHub] [lucene] gsmiller commented on pull request #12280: Expose iterator over query terms in TermInSetQuery

2023-05-09 Thread via GitHub
gsmiller commented on PR #12280: URL: https://github.com/apache/lucene/pull/12280#issuecomment-1540768011 Got it, thanks @rmuir. I hadn't seen your dev list reply yet. This all makes sense. I'll close this out and have a look at leveraging intersect. Seems like a better path forward. Thanks

[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub
msokolov commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1540770665 The idea is that the hnsw randomness should be predictable based on its fixed random seed (42 IIRC). It isn't really a problem if we changed that as long as we have some idea how / why,

[GitHub] [lucene] zhaih commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-09 Thread via GitHub
zhaih commented on PR #12257: URL: https://github.com/apache/lucene/pull/12257#issuecomment-1540937393 I'm ok to just pause it here or eventually drop this PR as I'm definitely not the one who needs this feature (at least for now). But I guess if @jimczi you want to stop people from using th

[GitHub] [lucene] jbellis commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-09 Thread via GitHub
jbellis commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1541171234 If the algorithm is implemented correctly, and I think that it is, then in theory the order of neighbor traversal should not matter. But we are seeing a difference here, so I *thin

[GitHub] [lucene] ryantbrown commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]

2023-05-09 Thread via GitHub
ryantbrown commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1541351180 The rabbit hole that is trying to store Open AI embeddings in Elasticsearch eventually leads here. I read the entire thread and unless I am missing something, the obvious move t