[GitHub] [lucene] jbellis closed pull request #12303: Address HNSW Searcher performance regression

2023-05-21 Thread via GitHub
jbellis closed pull request #12303: Address HNSW Searcher performance regression URL: https://github.com/apache/lucene/pull/12303 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [lucene] gsmiller commented on issue #12317: Option for disabling term dictionary compression

2023-05-21 Thread via GitHub
gsmiller commented on issue #12317: URL: https://github.com/apache/lucene/issues/12317#issuecomment-1556202535 I'm no expert in this area of our codec, but I'm curious to understand the issue a bit better. From the flame chart you provide, it looks like you're primarily looking at an indexi

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-21 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556224675 i made the benchmarks easier to run with something like this: ``` git clone https://github.com/rmuir/vectorbench cd vectorbench mvn verify java -jar target/vectorbench.jar

[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-21 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556278132 I didn't get anywhere with Luceneutil yet! :-( (I haven't been able to run it successfully; I'm getting OOM errors.)

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-21 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556284488 thanks for sanity checking! i'm still working on the repo and making improvements. would be super-curious if you could 'git pull' and try -psize=1024 on your avx512 machine. hopefully it l

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-21 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556285923 With the latest commits to that vectorbench I see this on my M1: ``` Benchmark (size) Mode Cnt Score Error Units DotProductBenchmark.dotProductNew

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-21 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556287294 thanks, glad it fixes the problem. i am running it across all the sizes we test and seeing how it looks on both my machines.

[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-21 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556288076 > we were being inefficient. If I understand this correctly, the inefficiency was too many reduceLanes calls, right? You replaced it with addition of the accumulators before reduc

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-21 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556288827 yes, i think we have to imagine it as a scalar operation that gets slower as vector size increases. i looked into it and read this answer and changed the code: https://stackoverflow.com/q
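The fix discussed here is to add the vector accumulators together lane-wise and pay for the expensive horizontal reduction once, rather than reducing each accumulator. A minimal sketch of that structure, under stated assumptions: it simulates each vector register with a plain `float[]` instead of the incubating `jdk.incubator.vector` API (which needs `--add-modules jdk.incubator.vector` to compile), and `LANES = 16`, the four-way unroll, and the class and method names are hypothetical, not the code in PR #12311.

```java
/**
 * Hypothetical scalar simulation of a SIMD dot product; not Lucene's code.
 * Each float[LANES] array stands in for one vector register.
 */
final class AccumulatorSketch {
  // Assumed lane count: 16 floats fill one 512-bit (AVX-512) register.
  static final int LANES = 16;

  // Lane-wise acc += a * b over one "vector" starting at offset.
  private static void mulAdd(float[] a, float[] b, int offset, float[] acc) {
    for (int l = 0; l < LANES; l++) {
      acc[l] += a[offset + l] * b[offset + l];
    }
  }

  // Horizontal sum across lanes: the analogue of reduceLanes(ADD), the costly step.
  private static float reduceLanes(float[] v) {
    float sum = 0f;
    for (float x : v) {
      sum += x;
    }
    return sum;
  }

  // Requires equal lengths that are a multiple of 4 * LANES, to keep the sketch short.
  static float dot(float[] a, float[] b) {
    if (a.length != b.length || a.length % (4 * LANES) != 0) {
      throw new IllegalArgumentException("length must be a multiple of " + 4 * LANES);
    }
    float[] acc1 = new float[LANES], acc2 = new float[LANES];
    float[] acc3 = new float[LANES], acc4 = new float[LANES];
    for (int i = 0; i < a.length; i += 4 * LANES) {
      // Four independent accumulators, so consecutive multiply-adds do not depend on each other.
      mulAdd(a, b, i, acc1);
      mulAdd(a, b, i + LANES, acc2);
      mulAdd(a, b, i + 2 * LANES, acc3);
      mulAdd(a, b, i + 3 * LANES, acc4);
    }
    // Combine the accumulators lane-wise (cheap vertical adds)...
    for (int l = 0; l < LANES; l++) {
      acc1[l] += acc2[l] + acc3[l] + acc4[l];
    }
    // ...then do exactly one horizontal reduction instead of four.
    return reduceLanes(acc1);
  }
}
```

The design point is that vertical adds cost one instruction per vector, while a horizontal reduction grows with the lane count, so it belongs outside the loop and should happen once.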

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-21 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556297297 i pushed one more commit to improve for "unaligned" vectors. the way to think about it, with unrolling, we do 64-at-a-time on avx512. So it isn't good to do worst-case 63 scalar com
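The "unaligned" point can be made concrete: if the only vector loop consumes 64 elements per iteration, up to 63 elements fall through to scalar code. One common way to shrink that tail, shown here as a hedged sketch rather than the actual commit, is a one-vector-at-a-time loop between the unrolled loop and the scalar tail. Vectors are again simulated with `float[]`, and `LANES = 16`, `UNROLL = 4`, and all names are hypothetical.

```java
/** Hypothetical scalar simulation of loop-tail handling; LANES and UNROLL are assumptions. */
final class TailSketch {
  static final int LANES = 16;             // floats per 512-bit vector (assumed)
  static final int UNROLL = 4;             // 4 x 16 = 64 elements per unrolled step
  static final int STEP = LANES * UNROLL;

  /** Returns {unrolled iterations, single-vector iterations, scalar tail elements}. */
  static int[] plan(int length) {
    int unrolled = length / STEP;
    int rest = length % STEP;
    return new int[] {unrolled, rest / LANES, rest % LANES};
  }

  static float dot(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("length mismatch");
    }
    float[] acc = new float[LANES];
    int i = 0;
    // Stage 1: 64 elements per iteration (stands in for the unrolled main loop).
    int upperUnrolled = a.length - a.length % STEP;
    for (; i < upperUnrolled; i += STEP) {
      for (int k = 0; k < STEP; k++) {
        acc[k % LANES] += a[i + k] * b[i + k];
      }
    }
    // Stage 2: one vector (16 elements) at a time, so at most 15 elements remain.
    int upperSingle = a.length - a.length % LANES;
    for (; i < upperSingle; i += LANES) {
      for (int l = 0; l < LANES; l++) {
        acc[l] += a[i + l] * b[i + l];
      }
    }
    float sum = 0f;
    for (float x : acc) {
      sum += x; // single horizontal reduction
    }
    // Stage 3: scalar tail of at most LANES - 1 elements.
    for (; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

For example, `plan(127)` gives `{1, 3, 15}`: one unrolled step, three single-vector steps, and 15 scalar elements, where a loop without the middle stage would leave 63 elements for scalar code.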

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-21 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556297355 aarch64: ``` Benchmark (size) Mode Cnt Score Error Units DotProductBenchmark.dotProductNew 1 thrpt 5 322.255 ± 0.496 ops/us

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-21 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556297561 skylake: ``` Benchmark (size) Mode Cnt Score Error Units DotProductBenchmark.dotProductNew 1 thrpt 5 153.702 ± 2.576 ops/us

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-21 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556403183 I pushed a new benchmark to https://github.com/rmuir/vectorbench for the binary dot product. Basically this has to act like: ``` int sum = 0; for (...) { short product

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-21 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556405150 and here are the results on my aarch64 mac, which has only 128-bit vectors and gets that disappointing generic impl: ``` Benchmark (size) Mode Cnt S
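The description of the binary dot product benchmark is cut off after `short product`, so the following is not the benchmark's code. It is a plain-Java reference, assuming signed `byte[]` inputs and a hypothetical class name, showing why such a loop can hold each product in a `short` but must accumulate in an `int`: a product of two bytes is at most 128 * 128 = 16384 in magnitude, while a sum of several such products can exceed the 16-bit range.

```java
/** Hypothetical plain-Java reference for a signed byte dot product; not Lucene's API. */
final class ByteDotSketch {
  static int dotProduct(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("length mismatch");
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      // byte * byte is at most 16384 in magnitude, so the product fits in a short...
      short product = (short) (a[i] * b[i]);
      // ...but the running total needs 32 bits: four products of (-128 * -128) already reach 65536.
      sum += product;
    }
    return sum;
  }
}
```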

[GitHub] [lucene] tang-hi opened a new pull request, #12322: NeighborQueue set incomplemete false when call clear

2023-05-21 Thread via GitHub
tang-hi opened a new pull request, #12322: URL: https://github.com/apache/lucene/pull/12322 ### Description solve the bug that @msokolov mentioned in [PR](https://github.com/apache/lucene/pull/12255#issuecomment-1553088549)
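PR #12322's title says `clear()` should also set the incomplete flag back to false. A minimal sketch of that reset pattern, using a hypothetical `ResettableQueue` (the class and its `markIncomplete()`/`incomplete()` methods are illustrative stand-ins, not a claim about Lucene's `NeighborQueue` API): if `clear()` empties the heap but leaves the flag set, a reused queue keeps reporting a stale "incomplete" state.

```java
import java.util.PriorityQueue;

/** Hypothetical stand-in for a reusable result queue; not Lucene's NeighborQueue. */
final class ResettableQueue {
  private final PriorityQueue<Float> scores = new PriorityQueue<>();
  private boolean incomplete;

  void add(float score) {
    scores.add(score);
  }

  // Records that the search producing these results stopped early.
  void markIncomplete() {
    incomplete = true;
  }

  boolean incomplete() {
    return incomplete;
  }

  int size() {
    return scores.size();
  }

  // Resets all state, including the flag, so the next use starts clean.
  void clear() {
    scores.clear();
    incomplete = false; // the reset the PR title describes
  }
}
```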

[GitHub] [lucene] zhaih merged pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-21 Thread via GitHub
zhaih merged PR #12257: URL: https://github.com/apache/lucene/pull/12257

[GitHub] [lucene] zhaih commented on a diff in pull request #12246: Set word2vec getSynonyms method synchronized

2023-05-21 Thread via GitHub
zhaih commented on code in PR #12246: URL: https://github.com/apache/lucene/pull/12246#discussion_r1199956128 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java: ## @@ -42,6 +42,7 @@ public class Word2VecSynonymProvider {