[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558674081 Latest benchmark results. intel rocketlake (512-bit vectors): ``` Benchmark(size) Mode Cnt Score Error Units BinaryCosine

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201701447 ## lucene/core/src/java20/org/apache/lucene/util/VectorUtilPanamaProvider.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201706152 ## lucene/core/src/java20/org/apache/lucene/util/VectorUtilPanamaProvider.java: ## @@ -0,0 +1,455 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558725059 Apologies, I'm don't have much time early this week, but hope to be more active later in the week. I know that there are still outstanding questions around testing, luceneutil, etc,

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201852487 ## gradle/testing/defaults-tests.gradle: ## @@ -119,11 +119,16 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvm

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201860585 ## gradle/testing/defaults-tests.gradle: ## @@ -119,11 +119,16 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvm

[GitHub] [lucene] rmuir commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201912353 ## gradle/testing/defaults-tests.gradle: ## @@ -119,11 +119,16 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvmArgs '-

[GitHub] [lucene] rmuir commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201914722 ## lucene/core/src/java20/org/apache/lucene/util/VectorUtilPanamaProvider.java: ## @@ -0,0 +1,455 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201916237 ## gradle/testing/defaults-tests.gradle: ## @@ -119,11 +119,16 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvm

[GitHub] [lucene] alessandrobenedetti commented on pull request #12314: Multi-value support for KnnVectorField

2023-05-23 Thread via GitHub
alessandrobenedetti commented on PR #12314: URL: https://github.com/apache/lucene/pull/12314#issuecomment-1558921019 Thanks @benwtrent and no rush, I am not planning to commit to this work anytime soon. I'll need various validations from other committers on the various areas of the code

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201931143 ## gradle/testing/defaults-tests.gradle: ## @@ -119,11 +119,16 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvmAr

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201931143 ## gradle/testing/defaults-tests.gradle: ## @@ -119,11 +119,16 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvmAr

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201931143 ## gradle/testing/defaults-tests.gradle: ## @@ -119,11 +119,16 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvmAr

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201940187 ## lucene/core/src/java20/org/apache/lucene/util/VectorUtilPanamaProvider.java: ## @@ -0,0 +1,455 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201940561 ## gradle/testing/defaults-tests.gradle: ## @@ -119,11 +119,16 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvmAr

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201948213 ## gradle/testing/defaults-tests.gradle: ## @@ -119,11 +119,16 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvm

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201916237 ## gradle/testing/defaults-tests.gradle: ## @@ -119,11 +119,16 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvm

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558965655 I renamed the constants a bit (e.g. SPECIES at top was inconsistent with the rest) -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558966617 I will now write a test that compares, iff the panama provider is used. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [lucene] rmuir commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201991364 ## gradle/testing/defaults-tests.gradle: ## @@ -119,11 +119,16 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvmArgs '-

[GitHub] [lucene] javanna merged pull request #12233: [MINOR] Update javadoc in Query class

2023-05-23 Thread via GitHub
javanna merged PR #12233: URL: https://github.com/apache/lucene/pull/12233 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

[GitHub] [lucene] javanna commented on pull request #12233: [MINOR] Update javadoc in Query class

2023-05-23 Thread via GitHub
javanna commented on PR #12233: URL: https://github.com/apache/lucene/pull/12233#issuecomment-1558987882 thanks @AndreyBozhko ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559022634 > Apologies, I'm don't have much time early this week, but hope to be more active later in the week. I know that there are still outstanding questions around testing, luceneutil, etc, but

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559035945 couple of random TODOs i had to look into, we can think about: * disable panama provider unless user has at least 128-bit vectors? Nobody has benched the case where SPECIES_PREFERRED is

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559067381 I added a test comparing both providers (executes if the VectorUtil's provider is different from Lucene's. -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [lucene] mikemccand commented on issue #12321: Can we make `DaciukMihovAutomatonBuilder` pkg-private?

2023-05-23 Thread via GitHub
mikemccand commented on issue #12321: URL: https://github.com/apache/lucene/issues/12321#issuecomment-1559085083 Oooh I love that idea! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [lucene] mikemccand commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
mikemccand commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559159143 I ran this on the `luceneutil` nightly benchmark box (`beast3`): This is an AMD Ryzen Threadripper 3990X 64-Core Processor, with this long list of CPU flags AND bugs: ```

[GitHub] [lucene] mikemccand commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
mikemccand commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559223951 I also ran the benchmark on a Skylake Linux box I have: ``` processor : 7 vendor_id : GenuineIntel cpu family : 6 model : 94 model name

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559226152 > I ran this on the `luceneutil` nightly benchmark box (`beast3`): Anybody with a real Luceneutil vector-benchmark of the whole luceneutil testsuite? -- This is an automa

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559308415 > Anybody with a real Luceneutil vector-benchmark of the whole luceneutil testsuite? I think the benefits won't look very impressive because it only uses 100 dims. This seems to be

[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559489000 > disable panama provider unless user has at least 128-bit vectors? Nobody has benched the case where SPECIES_PREFERRED is 64-bits: that's the case where vectorization isnt enabled

[GitHub] [lucene] msokolov commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
msokolov commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559562832 > you can see this stuff in benchmarks above where e.g. 128 dims is faster than 100, etc. should we zero-pad before computing the dot-products? It wouldn't affect the result and

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559566170 > Agreed. I don't think we care enough to try this - what's the point. Pushed [74a9782](https://github.com/apache/lucene/pull/12311/commits/74a9782240dc3b0eb99f19c324be304117371e46)

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559571108 > > you can see this stuff in benchmarks above where e.g. 128 dims is faster than 100, etc. > > should we zero-pad before computing the dot-products? It wouldn't affect the res

[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559578873 @uschindler > Cool thanks. I was about to do the same, my idea was a bit different: Add a normal virtual method "isSupported()" to the interface and implement it returning t

[GitHub] [lucene] msokolov commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
msokolov commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559582073 I'm trying to run luceneutil benchmarks but there are confounding factors - we spend a *lot* of time parsing the vector dictionary to create query-vectors. I think we need to fix this b

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559669255 > @uschindler > > > Cool thanks. I was about to do the same, my idea was a bit different: Add a normal virtual method "isSupported()" to the interface and implement it returnin

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559765890 > should we zero-pad before computing the dot-products? It wouldn't affect the result and sounds like it would be faster I don't think so. the code wouldnt even know what size to pad

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559831627 Sorry for again changing. I was not happy with the bitsize code in the lookup function. I used a very simple approach: The constructor of the provider throws UnsupportedOperationExcep

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559834404 I tested this with changing if statement to 1280 :-) Message looks fine. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559863229 > > should we zero-pad before computing the dot-products? It wouldn't affect the result and sounds like it would be faster > > I don't think so. the code wouldnt even know what

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559867708 All bitsizes in the lengthy flame war on mailinglist was mostly powers of two (except maybe 1536). But we could maybe split into different chunk sizes? So to multiply size 768 vectors

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559872383 Sorry ignore my last comment. The power of 2 was wrong. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559929746 > Sorry for again changing. I was not happy with the bitsize code in the lookup function. I used a very simple approach: The constructor of the provider throws UnsupportedOperationExceptio

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-156194 Assert is fine. Now that the check is in ctor we know for sure it will be at least 128. -- This is an automated message from the Apache Git Service. To respond to the message, pleas

[GitHub] [lucene] RS146BIJAY commented on issue #12228: IndexWriter should clean up unreferenced files when segment merge fails due to disk full

2023-05-23 Thread via GitHub
RS146BIJAY commented on issue #12228: URL: https://github.com/apache/lucene/issues/12228#issuecomment-1560009080 @rmuir As of now is there is any way User can delete these unreferenced files on their end using Lucene without closing IndexWriter? -- This is an automated message from the A

[GitHub] [lucene] javanna commented on pull request #12160: Concurrent rewrite for KnnVectorQuery

2023-05-23 Thread via GitHub
javanna commented on PR #12160: URL: https://github.com/apache/lucene/pull/12160#issuecomment-1560097486 Question on this change: index searcher parallelizes search over leaf slices, while the concurrent rewrite for knn vector query will request one thread per segment. Is this on purpose or

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1560143502 my issues with maxvectorsize/preferred was a jshell luser issue. it really works, so its easy for me to simulate 64/128 bit vectors on my 256-bit machine. i am still fine with disabl

[GitHub] [lucene] zhaih commented on pull request #12160: Concurrent rewrite for KnnVectorQuery

2023-05-23 Thread via GitHub
zhaih commented on PR #12160: URL: https://github.com/apache/lucene/pull/12160#issuecomment-1560145097 I think the concurrent rewrite should act the same way(one thread per slice) ideally. Do you want to open a PR on that? Or open an issue and I can probably work on it recently O

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1560162059 if you want to see what i mean, try running `java -XX:UseAVX=0 -jar target/vectorbench.jar -p size=1024`: ``` Benchmark(size) Mode Cnt ScoreErr

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1560166430 You could use the RAMUsageEstimator code to look into VM flags. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1560167503 my concern is that if i spin up a QEMU that doesn't pass thru AVX flags in CPUID, that the same thing will happen without any java VM flags. -- This is an automated message from the Apac

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1560173828 and yeah, given the perf traps here, lets keep the 128-bit check. I am afraid of the unknown (e.g. something other than x86/arm) and i think its better to err on the side of caution and on

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1560207056 This isn't a theoretical issue, I really think its a openjdk problem? To reproduce it, i simply emulate a Nehalem cpu (the one right before AVX): ``` processor: 3 vendor_

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1560213214 As a potential workaround I tried Uwe's suggestion to look at RAMUsageEstimator code and ran it inside this VM, it works and seems to detect the situation? ``` jshell> final Object v

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-23 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1560217999 And my concern is not "oh but nehalem or others without AVX are ancient ancient hardware". It is that if you run qemu with all its defaults and don't specify `-cpu` flag, you get a VM with