Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]

2024-06-18 Thread via GitHub
msfroh commented on PR #13201: URL: https://github.com/apache/lucene/pull/13201#issuecomment-2176943173 @gsmiller, @mayya-sharipova -- What do you both think of this approach? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]

2024-06-18 Thread via GitHub
msfroh commented on code in PR #13201: URL: https://github.com/apache/lucene/pull/13201#discussion_r1645031925 ## lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java: ## @@ -154,21 +154,24 @@ protected abstract WeightOrDocIdSetIterator r

Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]

2024-06-18 Thread via GitHub
msfroh commented on PR #13201: URL: https://github.com/apache/lucene/pull/13201#issuecomment-2176805195 There we go -- I reworked the PR to reuse the terms collected during the call to `rewrite` in https://github.com/apache/lucene/pull/13454. Rather than modifying `estimateCost`, we c

Re: [PR] Change to versions.toml after running ./gradlew tidy [lucene]

2024-06-18 Thread via GitHub
dweiss commented on PR #13502: URL: https://github.com/apache/lucene/pull/13502#issuecomment-2176730295 Thank you for reporting, @slow-J !

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-06-18 Thread via GitHub
benwtrent commented on PR #13401: URL: https://github.com/apache/lucene/pull/13401#issuecomment-2176636564 Wanted to touch base on this PR as it seems to have been stalled, mainly by me. The only format that would support pluggable similarities would be `Lucene99HnswVectorsFormat`.

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-06-18 Thread via GitHub
jpountz commented on PR #13359: URL: https://github.com/apache/lucene/pull/13359#issuecomment-2176590274 I pushed a new approach. Instead of `prepareSeekExact` returning `void`, it now returns a `Supplier` and forbids calling any other method on `TermsEnum` until the `Supplier` has been con

Re: [PR] add similarity threshold for hnsw [lucene]

2024-06-18 Thread via GitHub
benwtrent commented on PR #11946: URL: https://github.com/apache/lucene/pull/11946#issuecomment-2176580643 There is the new vector similarity query that handles this overall request. @agorlenko https://github.com/apache/lucene/pull/12679 do you think this covers your use case?

Re: [PR] Feature/scalar quantized off heap scoring [lucene]

2024-06-18 Thread via GitHub
benwtrent commented on PR #13497: URL: https://github.com/apache/lucene/pull/13497#issuecomment-2176549163 > are you reporting indexing times? query times? Query times, single segment, 10k docs of 1024 dims.

Re: [PR] gh-13340: Allow adding a parent field to an index with no fields (#13341) (#13483) [lucene]

2024-06-18 Thread via GitHub
benwtrent merged PR #13504: URL: https://github.com/apache/lucene/pull/13504

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-06-18 Thread via GitHub
original-brownbear commented on PR #13472: URL: https://github.com/apache/lucene/pull/13472#issuecomment-2176505152 I reran the benchmarks of no concurrency vs 4 threads and constrained the page cache a lot by setting -Xmx to almost all of the machine's memory (page cache size goes to about

Re: [PR] Remove intra-merge parallelism, relates to #13478 [lucene]

2024-06-18 Thread via GitHub
benwtrent merged PR #13501: URL: https://github.com/apache/lucene/pull/13501

Re: [PR] Fix global score update bug in MultiLeafKnnCollector [lucene]

2024-06-18 Thread via GitHub
gsmiller commented on PR #13463: URL: https://github.com/apache/lucene/pull/13463#issuecomment-2176482403 Thanks @benwtrent for the continued testing (just now saw this... was away for a few days). I'll work on getting this merged here in a little bit. (and thanks @mayya-sharipova for the r

Re: [PR] Feature/scalar quantized off heap scoring [lucene]

2024-06-18 Thread via GitHub
msokolov commented on PR #13497: URL: https://github.com/apache/lucene/pull/13497#issuecomment-2176421274 are you reporting indexing times? query times?

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-06-18 Thread via GitHub
original-brownbear commented on PR #13472: URL: https://github.com/apache/lucene/pull/13472#issuecomment-2176357069 > So for these results, I would expect to see ~4X QPS gain (ish) simply because wall-clock elapsed time for the query got ~4X faster That's why I added the perf numbers

[PR] gh-13340: Allow adding a parent field to an index with no fields (#13341) (#13483) [lucene]

2024-06-18 Thread via GitHub
benwtrent opened a new pull request, #13504: URL: https://github.com/apache/lucene/pull/13504 This backports @msokolov 's change to 9.11.1 and adds a CHANGES entry for the bugfix.

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-06-18 Thread via GitHub
stefanvodita commented on PR #13472: URL: https://github.com/apache/lucene/pull/13472#issuecomment-2176177541 > I'll open a spinoff issue here that Lucene's facet counting should also maybe tap into this executor for concurrent counting #12474 sounds related.

[I] Lucene's facets should tap into `IndexSearcher`'s `TaskExecutor` too? [lucene]

2024-06-18 Thread via GitHub
mikemccand opened a new issue, #13503: URL: https://github.com/apache/lucene/issues/13503 ### Description Spinoff from the exciting discussion on https://github.com/apache/lucene/pull/13472: Lucene has made great gains recently on intra-query concurrency: using multiple thread

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-06-18 Thread via GitHub
mikemccand commented on PR #13472: URL: https://github.com/apache/lucene/pull/13472#issuecomment-2176161076 We need to be careful interpreting the QPS results from `luceneutil`: These are not actual red-line (capacity) QPS numbers (CPU is not normally saturated during these runs), but

Re: [PR] Change to versions.toml after running ./gradlew tidy [lucene]

2024-06-18 Thread via GitHub
slow-J closed pull request #13502: Change to versions.toml after running ./gradlew tidy URL: https://github.com/apache/lucene/pull/13502

Re: [PR] Change to versions.toml after running ./gradlew tidy [lucene]

2024-06-18 Thread via GitHub
slow-J commented on PR #13502: URL: https://github.com/apache/lucene/pull/13502#issuecomment-2175983257 Thanks for the fix. Will close this PR then.

Re: [PR] Change to versions.toml after running ./gradlew tidy [lucene]

2024-06-18 Thread via GitHub
dweiss commented on PR #13502: URL: https://github.com/apache/lucene/pull/13502#issuecomment-2175977707 Please pull the changes on main - I've added line ending normalization there, I missed it in #13484.

Re: [PR] Change to versions.toml after running ./gradlew tidy [lucene]

2024-06-18 Thread via GitHub
dweiss commented on PR #13502: URL: https://github.com/apache/lucene/pull/13502#issuecomment-2175973157 I think it's just end-of-line markers?

[PR] Change to versions.toml after running ./gradlew tidy [lucene]

2024-06-18 Thread via GitHub
slow-J opened a new pull request, #13502: URL: https://github.com/apache/lucene/pull/13502 I ran ./gradlew tidy and I noticed that it reformatted versions.toml. There was a change yesterday https://github.com/apache/lucene/pull/13484 so it is possibly related.

[PR] Removing usage of TopScoreDocCollector + TopFieldCollector deprecated methods (#create, #createSharedManager) [lucene]

2024-06-18 Thread via GitHub
slow-J opened a new pull request, #13500: URL: https://github.com/apache/lucene/pull/13500 Removing usage of TopScoreDocCollector + TopFieldCollector deprecated methods (#create, #createSharedManager) Closes #13499 Copying description from issue: These methods were depre

Re: [PR] This commit adds a new test CMS that always provides intra-merge parallelism [lucene]

2024-06-18 Thread via GitHub
benwtrent commented on PR #13475: URL: https://github.com/apache/lucene/pull/13475#issuecomment-2175809823 @jpountz sounds good

[I] Remove internal uses of @Deprecated methods from TopScoreDocCollector and TopFieldCollector [lucene]

2024-06-18 Thread via GitHub
slow-J opened a new issue, #13499: URL: https://github.com/apache/lucene/issues/13499 ### Description These methods were deprecated in https://github.com/apache/lucene/pull/240 which is part of Lucene 10.0. Since they are not marked for deprecation in Lucene 9.x, they will not

Re: [PR] Avoid performance regression by constructing lazily the PointTree in NumericComparator [lucene]

2024-06-18 Thread via GitHub
iverase merged PR #13498: URL: https://github.com/apache/lucene/pull/13498

Re: [PR] Sparse index: optional skip list on top of doc values [lucene]

2024-06-18 Thread via GitHub
jpountz commented on PR #13449: URL: https://github.com/apache/lucene/pull/13449#issuecomment-2175358948 FWIW I'm trying to use #11432 as a meta issue for sparse indexing and started listing tasks that I think we (ideally) need to complete to be in a good state for 9.0.