Re: [PR] Upgrade spotless to 6.9.1, google java format to 1.23.0. [lucene]

2024-08-16 Thread via GitHub
dweiss commented on PR #13661: URL: https://github.com/apache/lucene/pull/13661#issuecomment-2292980814 Thanks, @rmuir -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] Upgrade spotless to 6.9.1, google java format to 1.23.0. [lucene]

2024-08-16 Thread via GitHub
dweiss merged PR #13661: URL: https://github.com/apache/lucene/pull/13661

[PR] Simplify code and reduce the number of lines of code [lucene]

2024-08-16 Thread via GitHub
mrhbj opened a new pull request, #13662: URL: https://github.com/apache/lucene/pull/13662 ### Description

Re: [PR] Simplify code and reduce the number of lines of code [lucene]

2024-08-16 Thread via GitHub
mrhbj closed pull request #13662: Simplify code and reduce the number of lines of code URL: https://github.com/apache/lucene/pull/13662

[PR] Simplify code and reduce the number of lines of code [lucene]

2024-08-16 Thread via GitHub
mrhbj opened a new pull request, #13663: URL: https://github.com/apache/lucene/pull/13663 ### Description

Re: [PR] Optimize decoding blocks of postings using the vector API. (#13636) [lucene]

2024-08-16 Thread via GitHub
jpountz merged PR #13652: URL: https://github.com/apache/lucene/pull/13652

Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-16 Thread via GitHub
epotyom commented on PR #13657: URL: https://github.com/apache/lucene/pull/13657#issuecomment-2293467656 I've made some temporary changes in luceneutil to be able to only run a couple of tasks that show regression and have meaningful profiler results - profiler results that we get for all t

Re: [I] Try applying bipartite graph reordering to KNN graph node ids [lucene]

2024-08-16 Thread via GitHub
msokolov commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2293683592 I made this tool; while testing it I ran into some unexpected wrinkles relating to our vector format. I created a new index from an existing one, with a new docid order by:

Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-16 Thread via GitHub
gsmiller commented on PR #13657: URL: https://github.com/apache/lucene/pull/13657#issuecomment-2293708857 Found one more interesting nuance this morning. If the `collectors` ArrayList is created with no initial size, it has no impact on performance no matter where it gets created (before-or

Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-16 Thread via GitHub
msokolov commented on PR #13657: URL: https://github.com/apache/lucene/pull/13657#issuecomment-2293718689 Note that setting the initial size requires calling `getSlices()`, which might perhaps be put off by the compiler if its return value is not used. In turn, `getSlices()` does some complica

Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-16 Thread via GitHub
gsmiller commented on PR #13657: URL: https://github.com/apache/lucene/pull/13657#issuecomment-2293806502 Ah let me clarify a bit. I actually moved the call to getSlices back into the called private method to eliminate it from complicating things and used a constant value (randomly chose 8)

Re: [PR] Expose FlatVectorsFormat [lucene]

2024-08-16 Thread via GitHub
msokolov commented on PR #13469: URL: https://github.com/apache/lucene/pull/13469#issuecomment-2293814000 > Would like to know how you are doing it if there is some reference. Because I am also using a custom codec. The one way I figured out to use this is I create my own KNNVectorsFormat a

Re: [PR] Inline skip data into postings lists [lucene]

2024-08-16 Thread via GitHub
HoustonPutman commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2293832375 It looks like grouping queries are really affected by this change. The throughput of each of them was halved: [100 groups](https://home.apache.org/~mikemccand/lucenebench/TermG

Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-16 Thread via GitHub
gsmiller commented on PR #13657: URL: https://github.com/apache/lucene/pull/13657#issuecomment-2293872374 OK, I inlined the `IndexSearcher#createWeight` code to further isolate things. The ordering that matters is whether or not the `collections` list gets created before the call to `Query#

[I] Speed up GroupingSelectors when using a descending sort on a high cardinality field [lucene]

2024-08-16 Thread via GitHub
HoustonPutman opened a new issue, #13664: URL: https://github.com/apache/lucene/issues/13664 ### Description I've run a benchmark (using Solr admittedly, not Lucene) that compares the speed of various sorted queries. The fields mentioned in the benchmark are the fields that were sor

Re: [PR] Simplify code and reduce the number of lines of code [lucene]

2024-08-16 Thread via GitHub
gsmiller merged PR #13663: URL: https://github.com/apache/lucene/pull/13663

Re: [PR] Speed up prefix sums when decoding doc IDs. [lucene]

2024-08-16 Thread via GitHub
jpountz merged PR #13658: URL: https://github.com/apache/lucene/pull/13658

Re: [PR] Inline skip data into postings lists [lucene]

2024-08-16 Thread via GitHub
jpountz commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2294115438 @HoustonPutman This is the same issue as reported above: the logic for lazily decoding blocks of freqs was broken and would decompress whole blocks of freqs on every doc ID. It is now fi

Re: [PR] Inline skip data into postings lists [lucene]

2024-08-16 Thread via GitHub
HoustonPutman commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2294155051 Thanks for the correction, sorry for the noise!

Re: [PR] Add reopen method in PerThreadPKLookup [lucene]

2024-08-16 Thread via GitHub
github-actions[bot] commented on PR #13596: URL: https://github.com/apache/lucene/pull/13596#issuecomment-2294481436 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Optimize binary search call [lucene]

2024-08-16 Thread via GitHub
github-actions[bot] commented on PR #13595: URL: https://github.com/apache/lucene/pull/13595#issuecomment-2294481483 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Move synonym map off-heap for SynonymGraphFilter [lucene]

2024-08-16 Thread via GitHub
github-actions[bot] commented on PR #13054: URL: https://github.com/apache/lucene/pull/13054#issuecomment-2294482584 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-08-16 Thread via GitHub
goankur commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2294502514 > also, would be good to compare apples-to-apples here. currently from what i see, benchmark compares `dot8s(MemorySegment..)` vs `BinaryDotProduct(byte[])`. To me this mixes up concerns

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-08-16 Thread via GitHub
goankur commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1720493763 ## lucene/core/src/c/dotProduct.c: ## @@ -0,0 +1,143 @@ +// dotProduct.c + +#include +#include + +#ifdef __ARM_ACLE +#include +#endif + +#if (defined(__ARM_FEATURE_