Re: [PR] Upgrade spotless to 6.9.1, google java format to 1.23.0. [lucene]

2024-08-16 Thread via GitHub


dweiss commented on PR #13661:
URL: https://github.com/apache/lucene/pull/13661#issuecomment-2292980814

   Thanks, @rmuir 





Re: [PR] Upgrade spotless to 6.9.1, google java format to 1.23.0. [lucene]

2024-08-16 Thread via GitHub


dweiss merged PR #13661:
URL: https://github.com/apache/lucene/pull/13661





[PR] Simplify code and reduce the number of lines of code [lucene]

2024-08-16 Thread via GitHub


mrhbj opened a new pull request, #13662:
URL: https://github.com/apache/lucene/pull/13662

   ### Description
   
   
   





Re: [PR] Simplify code and reduce the number of lines of code [lucene]

2024-08-16 Thread via GitHub


mrhbj closed pull request #13662: Simplify code and reduce the number of lines 
of code
URL: https://github.com/apache/lucene/pull/13662





[PR] Simplify code and reduce the number of lines of code [lucene]

2024-08-16 Thread via GitHub


mrhbj opened a new pull request, #13663:
URL: https://github.com/apache/lucene/pull/13663

   ### Description
   
   
   





Re: [PR] Optimize decoding blocks of postings using the vector API. (#13636) [lucene]

2024-08-16 Thread via GitHub


jpountz merged PR #13652:
URL: https://github.com/apache/lucene/pull/13652





Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-16 Thread via GitHub


epotyom commented on PR #13657:
URL: https://github.com/apache/lucene/pull/13657#issuecomment-2293467656

   I've made some temporary changes in luceneutil so I can run only a couple of tasks that show the regression and still get meaningful profiler results - the profiler results we get when running all tasks seem to contain too many samples from other tasks, e.g. faceting.
   
   Results after 20 runs:
   
   ```
                 Task    QPS baseline    StdDev    QPS my_modified_version    StdDev                  Pct diff    p-value
              MedTerm          581.44    (3.8%)                     505.29    (3.5%)    -13.1% ( -19% -   -5%)      0.000
             HighTerm          559.77    (3.5%)                     501.17    (3.5%)    -10.5% ( -16% -   -3%)      0.000
   ```
   
   The biggest difference in the profiler seems to be that we spend more time in `org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer.score(float, long)` now?
   
   Diff image:
   
   
![flamegraph_two_tasks_only](https://github.com/user-attachments/assets/847290e6-fb6a-440d-8131-a50759351234)
   
   
   JFR files: 
   [Archive.zip](https://github.com/user-attachments/files/16637306/Archive.zip)
   
   The code that was tested is slightly different from this PR, sharing 
branches just in case:
   
   - candidate: Branch with regression: 
https://github.com/epotyom/lucene/tree/IndexSearcher-search-regression
   - baseline: Branch with NO regression: 
https://github.com/epotyom/lucene/tree/IndexSearcher-search-NO-regression
   - Their diff: https://github.com/epotyom/lucene/pull/1/files
   
   luceneutil branch to reproduce: 
https://github.com/mikemccand/luceneutil/compare/main...epotyom:luceneutil:tasks_with_regression
   
   You'd need to generate the task file manually, as it seems to be too large for GitHub:
   
   ```
   rm tasks/wikimedium.10M.regressed.tasks
   cat tasks/wikimedium.10M.nostopwords.tasks | egrep '^(MedTerm|HighTerm):' > tasks/wikimedium.10M.regressed.tasks.1
   for n in {1..1}; do cat tasks/wikimedium.10M.regressed.tasks.1 >> tasks/wikimedium.10M.regressed.tasks; done
   ```





Re: [I] Try applying bipartite graph reordering to KNN graph node ids [lucene]

2024-08-16 Thread via GitHub


msokolov commented on issue #13565:
URL: https://github.com/apache/lucene/issues/13565#issuecomment-2293683592

I made this tool; while testing it I ran into some unexpected wrinkles relating to our vector format. I created a new index from an existing one, with a new docid order, by:
   
   writer.addIndexes(SortingCodecReader.wrap((CodecReader) ctx.reader(), docMap, null));
   
   But this recreates the HNSW graph, doing unnecessary work, when all we really want to do is to *renumber* it. And the new graph it creates ignores the parameters that were used to create the original graph, substituting defaults such as M=16. It makes me wonder if we ought to write M to the index as per-field metadata so it can be reused by tools such as this that may not have access to a schema, or in general when merging the field in the future.
   
   I guess for my preliminary tests I can simply hack the `DEFAULT_MAX_CONNECTIONS` to fit my test data, but I'd like to hear folks' opinions on how we should address this.
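   
   For reference, a rough sketch of the rewrite step described above (not the tool itself): it assumes the code lives somewhere with access to `Sorter.DocMap`, which may not be a public extension point in every version, and `computePermutation` is a hypothetical stand-in for whatever ordering the reordering step produces.
   
   ```java
   import java.io.IOException;
   import org.apache.lucene.index.CodecReader;
   import org.apache.lucene.index.DirectoryReader;
   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.index.IndexWriterConfig;
   import org.apache.lucene.index.LeafReader;
   import org.apache.lucene.index.LeafReaderContext;
   import org.apache.lucene.index.Sorter;
   import org.apache.lucene.index.SortingCodecReader;
   import org.apache.lucene.store.Directory;

   class RenumberSketch {
     static void rewriteWithNewDocOrder(Directory srcDir, Directory dstDir) throws IOException {
       try (DirectoryReader reader = DirectoryReader.open(srcDir);
           IndexWriter writer = new IndexWriter(dstDir, new IndexWriterConfig())) {
         for (LeafReaderContext ctx : reader.leaves()) {
           int maxDoc = ctx.reader().maxDoc();
           // Hypothetical: the new-to-old permutation of [0, maxDoc) computed by the reordering tool.
           int[] newToOld = computePermutation(ctx.reader());
           int[] oldToNew = new int[maxDoc];
           for (int newDoc = 0; newDoc < maxDoc; newDoc++) {
             oldToNew[newToOld[newDoc]] = newDoc;
           }
           Sorter.DocMap docMap =
               new Sorter.DocMap() {
                 @Override public int oldToNew(int docID) { return oldToNew[docID]; }
                 @Override public int newToOld(int docID) { return newToOld[docID]; }
                 @Override public int size() { return maxDoc; }
               };
           // This is the step discussed above: it rewrites every format of the leaf in the
           // new order, including rebuilding the HNSW graph with default parameters.
           writer.addIndexes(SortingCodecReader.wrap((CodecReader) ctx.reader(), docMap, null));
         }
         writer.commit();
       }
     }

     // Placeholder only, so the sketch is self-contained; identity order here.
     static int[] computePermutation(LeafReader leafReader) {
       int[] identity = new int[leafReader.maxDoc()];
       for (int i = 0; i < identity.length; i++) identity[i] = i;
       return identity;
     }
   }
   ```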





Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-16 Thread via GitHub


gsmiller commented on PR #13657:
URL: https://github.com/apache/lucene/pull/13657#issuecomment-2293708857

   Found one more interesting nuance this morning. If the `collectors` ArrayList is created with no initial size, it has no impact on performance no matter where it gets created (before or after createWeight). But if there's an initial size, then the impact remains (it regresses performance if done before createWeight and has no impact if done after). So I wonder if this has something to do with allocating memory on the heap before calling createWeight, and less to do with this particular collection?





Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-16 Thread via GitHub


msokolov commented on PR #13657:
URL: https://github.com/apache/lucene/pull/13657#issuecomment-2293718689

   Note that setting the initial size requires calling `getSlices()`, which might be deferred by the compiler if its return value is not used. In turn, `getSlices()` does some complicated things, including creating a `TaskExecutor` -- it seems to me this could cause some mischief?





Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-16 Thread via GitHub


gsmiller commented on PR #13657:
URL: https://github.com/apache/lucene/pull/13657#issuecomment-2293806502

   Ah, let me clarify a bit. I actually moved the call to getSlices back into the called private method, to keep it from complicating things, and used a constant value (arbitrarily chose 8) to specify the initial capacity. So I think we can rule out interactions with getSlices.





Re: [PR] Expose FlatVectorsFormat [lucene]

2024-08-16 Thread via GitHub


msokolov commented on PR #13469:
URL: https://github.com/apache/lucene/pull/13469#issuecomment-2293814000

   > Would like to know how you are doing it if there is some reference, because I am also using a custom codec. The one way I figured out to use this is to create my own KNNVectorsFormat and then use the FlatWriters and Readers directly. Let me know if there is any better way, but this is the best I can think of.
   
   I think we need to add the format to SPI. That way we will be able to evolve 
it going forward and let the index reading code pick the proper implementation 
corresponding to what was written to the index. Making the Codec figure this 
out is probably just a short-term hack that won't scale well.
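   
   For context, a minimal sketch of the SPI lookup being proposed: this is how per-field KNN formats are already resolved by name today, and the idea is to give the flat format the same treatment. `Lucene99HnswVectorsFormat` is just an example of an existing registered name; the flat format is not registered this way yet.
   
   ```java
   import org.apache.lucene.codecs.KnnVectorsFormat;

   public class KnnFormatSpiLookup {
     public static void main(String[] args) {
       // At read time the index stores only a format name per field; SPI maps that name
       // back to an implementation found on the classpath via a
       // META-INF/services/org.apache.lucene.codecs.KnnVectorsFormat entry.
       KnnVectorsFormat format = KnnVectorsFormat.forName("Lucene99HnswVectorsFormat");
       System.out.println(format.getName());
     }
   }
   ```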





Re: [PR] Inline skip data into postings lists [lucene]

2024-08-16 Thread via GitHub


HoustonPutman commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2293832375

   It looks like grouping queries are really affected by this change; the throughput of each of them was halved:
   [100 groups](https://home.apache.org/~mikemccand/lucenebench/TermGroup100.html), [10K groups](https://home.apache.org/~mikemccand/lucenebench/TermGroup10K.html), [1M groups](https://home.apache.org/~mikemccand/lucenebench/TermGroup1M.html), [1M groups (two pass block grouping)](https://home.apache.org/~mikemccand/lucenebench/TermBGroup1M.html), [1M groups (single pass block grouping)](https://home.apache.org/~mikemccand/lucenebench/TermBGroup1M1P.html)





Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-16 Thread via GitHub


gsmiller commented on PR #13657:
URL: https://github.com/apache/lucene/pull/13657#issuecomment-2293872374

   OK, I inlined the `IndexSearcher#createWeight` code to further isolate things. The ordering that matters is whether or not the `collectors` list gets created before the call to `Query#createWeight` (so it rules out anything to do with the query cache stuff). I updated the diff in this PR to show this, along with comments explaining.
   
   And, as mentioned before, it matters whether or not an initial capacity is provided. If no initial capacity is provided, then the ordering doesn't matter and no regressions are seen (so I'm standing by my guess that this has something to do with allocating heap before the call to `Query#createWeight` and not so much with this `collectors` list).





[I] Speed up GroupingSelectors when using a descending sort on a high cardinality field [lucene]

2024-08-16 Thread via GitHub


HoustonPutman opened a new issue, #13664:
URL: https://github.com/apache/lucene/issues/13664

   ### Description
   
   I've run a benchmark (using Solr, admittedly, not Lucene) that compares the speed of various sorted queries. The fields mentioned in the benchmark are the fields that were sorted on.
   
   Benchmark parameters:
   - 1M documents
     - High cardinality fields have a unique value per document (an incrementing counter)
     - Low cardinality fields had 500 unique values
     - All fields were single-valued
   - Three different field types were tested: `LongPointField`, `StrField` and `TrieLongField` (no longer in Lucene)
     - All sort fields were tested using both DocValues and Uninversion
   - Queries
     - 10 results were requested, for both grouped and non-grouped
     - Each field type and docValues/Uninversion combination was run in 8 configurations, covering a matrix of:
       - Grouping and non-grouping
       - Sort `asc` and `desc`
       - High cardinality values and low cardinality values
     - The grouping:
       - All queries were grouped by the same string field (250,000 unique values, docValues enabled)
       - The sort field was used for both the group sorting and the document sorting within the groups
   
   ![Sorted query benchmark results](https://github.com/user-attachments/assets/75868683-3fe8-4745-b0a9-1501e7af8f28)
   
   Overall there are some interesting findings:
   
   - **For un-grouped queries (sorting by a high cardinality field), ascending sorts were 5x faster than descending sorts. For grouped queries, they were 40x faster.** That is ultimately what this issue is meant to address, so I highlighted it in the chart above.
     - Note: low cardinality fields showed no difference in speed
   - DocValues and Uninverted fields have similar sorting performance for grouped queries. (This is likely related to https://github.com/apache/lucene/issues/10368)
   
   I know that since this benchmark is using Solr, it's only so useful here, so I can utilize lucenebench to try to recreate this as well, if that would help.
   
   I also have the flame graphs for these benchmarks, which aren't as useful as a screenshot, but I will provide one anyway (LongPointField, docValues, highCardinality, grouped, sorted descending):
   ![Flame graph: LongPointField, docValues, high cardinality, grouped, sorted descending](https://github.com/user-attachments/assets/0bba8cff-1a47-436d-9a53-3e52d31376ed)
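   
   For reference, a rough Lucene-side sketch of the grouped, descending-sort query shape being benchmarked, using the grouping module's `GroupingSearch`; the field names are placeholders, not the Solr schema:
   
   ```java
   import java.io.IOException;
   import org.apache.lucene.search.IndexSearcher;
   import org.apache.lucene.search.MatchAllDocsQuery;
   import org.apache.lucene.search.Sort;
   import org.apache.lucene.search.SortField;
   import org.apache.lucene.search.grouping.GroupingSearch;
   import org.apache.lucene.search.grouping.TopGroups;
   import org.apache.lucene.util.BytesRef;

   class GroupedDescSortSketch {
     static TopGroups<BytesRef> search(IndexSearcher searcher) throws IOException {
       // Descending sort on a high-cardinality long field (the incrementing counter).
       Sort descByCounter = new Sort(new SortField("counter", SortField.Type.LONG, /* reverse= */ true));
       GroupingSearch grouping = new GroupingSearch("groupField"); // SortedDocValues string field
       grouping.setGroupSort(descByCounter);       // sort the groups themselves
       grouping.setSortWithinGroup(descByCounter); // and the documents within each group
       grouping.setGroupDocsLimit(10);
       return grouping.search(searcher, new MatchAllDocsQuery(), 0, 10);
     }
   }
   ```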
   
   





Re: [PR] Simplify code and reduce the number of lines of code [lucene]

2024-08-16 Thread via GitHub


gsmiller merged PR #13663:
URL: https://github.com/apache/lucene/pull/13663





Re: [PR] Speed up prefix sums when decoding doc IDs. [lucene]

2024-08-16 Thread via GitHub


jpountz merged PR #13658:
URL: https://github.com/apache/lucene/pull/13658





Re: [PR] Inline skip data into postings lists [lucene]

2024-08-16 Thread via GitHub


jpountz commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2294115438

   @HoustonPutman This is the same issue as reported above: the logic for 
lazily decoding blocks of freqs was broken and would decompress whole blocks of 
freqs on every doc ID. It is now fixed as the benchmarks that you linked 
confirm (requires zooming in).





Re: [PR] Inline skip data into postings lists [lucene]

2024-08-16 Thread via GitHub


HoustonPutman commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2294155051

   Thanks for the correction, sorry for the noise!





Re: [PR] Add reopen method in PerThreadPKLookup [lucene]

2024-08-16 Thread via GitHub


github-actions[bot] commented on PR #13596:
URL: https://github.com/apache/lucene/pull/13596#issuecomment-2294481436

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!





Re: [PR] Optimize binary search call [lucene]

2024-08-16 Thread via GitHub


github-actions[bot] commented on PR #13595:
URL: https://github.com/apache/lucene/pull/13595#issuecomment-2294481483

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!





Re: [PR] Move synonym map off-heap for SynonymGraphFilter [lucene]

2024-08-16 Thread via GitHub


github-actions[bot] commented on PR #13054:
URL: https://github.com/apache/lucene/pull/13054#issuecomment-2294482584

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!





Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-08-16 Thread via GitHub


goankur commented on PR #13572:
URL: https://github.com/apache/lucene/pull/13572#issuecomment-2294502514

   > also, would be good to compare apples-to-apples here. currently from what 
i see, benchmark compares `dot8s(MemorySegment..)` vs 
`BinaryDotProduct(byte[])`. To me this mixes up concerns about memorysegment vs 
java heap with C vs vector API, so it would be good to resolve for 
understanding purposes.
   
   As I understand it, here is the call stack:
   
   `PanamaVectorUtilSupport.dotProduct(MemorySegment, MemorySegment)`
   `PanamaVectorUtilSupport.dotProduct(byte[], byte[])`
   `VectorUtil.dotProduct(byte[], byte[])`
   `VectorUtilBenchmark.binaryDotProductVector()` 
   
   So the heap-allocated `byte[]` gets wrapped into a MemorySegment before the dotProduct computation that uses the Panama APIs.
   
   For a strict apples-to-apples comparison, one would wrap the heap-allocated `byte[]` into a MemorySegment and pass it to the native code -- `dot8s(MemorySegment..)` -- but this does not work and throws the following exception:
   
   ```
   Caused by: java.lang.IllegalArgumentException: Heap segment not allowed: MemorySegment{ heapBase: Optional[[B@53a133c5] address:0 limit: 768 }
   at java.base/jdk.internal.foreign.abi.SharedUtils.unboxSegment(SharedUtils.java:331)
   at org.apache.lucene.util.VectorUtil.dot8s(VectorUtil.java:217)
   ... 13 more
   ```
   
   So I can't think of a good way to make the comparison fairer. Do you have any recommendations?
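   
   For what it's worth, here is a minimal standalone sketch of the failure mode, assuming the java.lang.foreign API as finalized in JDK 22 (this is not benchmark code from the PR): a heap-backed segment is rejected by downcalls, and copying into an arena-allocated segment stages the data off-heap, at the cost of an extra copy that would itself distort the comparison unless it is hoisted out of the measured loop.
   
   ```java
   import java.lang.foreign.Arena;
   import java.lang.foreign.MemorySegment;

   class OffHeapCopySketch {
     // MemorySegment.ofArray(...) produces a heap-backed segment, which the FFM linker
     // rejects for native downcalls ("Heap segment not allowed"). Copying into an
     // arena-allocated segment produces an off-heap segment that can be passed natively.
     static MemorySegment toNativeSegment(Arena arena, byte[] bytes) {
       MemorySegment heapSegment = MemorySegment.ofArray(bytes);   // heap-backed
       MemorySegment nativeSegment = arena.allocate(bytes.length); // off-heap
       nativeSegment.copyFrom(heapSegment);                        // extra copy
       return nativeSegment;
     }
   }
   ```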





Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-08-16 Thread via GitHub


goankur commented on code in PR #13572:
URL: https://github.com/apache/lucene/pull/13572#discussion_r1720493763


##
lucene/core/src/c/dotProduct.c:
##
@@ -0,0 +1,143 @@
+// dotProduct.c
+
+#include <stdint.h>
+#include 
+
+#ifdef __ARM_ACLE
+#include <arm_acle.h>
+#endif
+
+#if (defined(__ARM_FEATURE_SVE) && !defined(__APPLE__))
+#include <arm_sve.h>
+/*
+ * Unrolled and vectorized int8 dotProduct implementation using SVE instructions.
+ * NOTE: the Clang 15.0 compiler on Apple M3 Max compiles the code below successfully
+ * with the '-march=native+sve' option but throws an "Illegal Hardware Instruction" error.
+ * It looks like the Apple M3 does not implement SVE, and Apple's official documentation
+ * is not explicit about this, or at least I could not find it.
+ * 
+ */
+int32_t vdot8s_sve(int8_t *vec1, int8_t *vec2, int32_t limit) {
+int32_t result = 0;
+int32_t i = 0;
+// Vectors of 8-bit signed integers
+svint8_t va1, va2, va3, va4;
+svint8_t vb1, vb2, vb3, vb4;
+// Init accumulators
+svint32_t acc1 = svdup_n_s32(0);
+svint32_t acc2 = svdup_n_s32(0);
+svint32_t acc3 = svdup_n_s32(0);
+svint32_t acc4 = svdup_n_s32(0);
+
+// Number of 8-bit elements in the SVE vector
+int32_t vec_length = svcntb();
+
+// Manually unroll the loop
+for (i = 0; i + 4 * vec_length <= limit; i += 4 * vec_length) {
+   // Load vectors into the Z registers, which can range from 128-bit to 2048-bit wide.
+   // The predicate register P determines which bytes are active.
+   // svptrue_b8() returns a predicate in which every element is true.

Review Comment:
   @rmuir -- I tried using the `svwhilelt_b8_u32(i, limit)` intrinsic for generating the predicate in both the unrolled loop and the vector tail, but the performance was actually worse :-(.
   
   To give you an idea of what I did, here is a link to the ARM documentation with a code sample:
   
   https://developer.arm.com/documentation/102699/0100/Optimizing-with-intrinsics
   
   


