Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-30 Thread via GitHub
zhaih commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1786511544 FYI I pushed a commit https://github.com/apache/lucene/commit/4576ae09e8885f40cc27424fa8d529aa5c172422 to fix the behavior that vector writer closes the executor passed in -- This is an

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786498331 Also it was just previously confusing to see stuff like vector benchmark results with 128-bit arm vectors going 8x faster than 32-bit floats which makes no logical sense. e.g. with this ch

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786433389 I tried naively writing the logic like this with a couple N (8, 16, 32, etc.) with FMA both off and on to see if I can baby this compiler to vectorize, nope, nothing. I don't think autovect

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-30 Thread via GitHub
rmuir commented on code in PR #12737: URL: https://github.com/apache/lucene/pull/12737#discussion_r1377027128 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultVectorUtilSupport.java: ## @@ -17,72 +17,46 @@ package org.apache.lucene.internal.vectorizatio

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786395055 e.g. for dotproduct case, with this patch, despite there being no data dependencies, compiler literally does 4 `VFMADD*SS` in the loop with different xmm registers. Instead of just doing 1
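
For readers skimming this thread, the kind of loop under discussion can be sketched roughly as below: a scalar dot product unrolled by four, with four independent accumulators each updated through `Math.fma`. This is only an illustration and not the code from PR #12737; the class name `FmaDotProductSketch` and the unroll factor of four are assumptions made for the example. Whether a JIT compiler turns such a loop into scalar or packed instructions is not something the sketch establishes.

```java
public final class FmaDotProductSketch {
  private FmaDotProductSketch() {}

  /** Scalar dot product with four independent accumulators (hypothetical helper). */
  public static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    // Four accumulators, so no iteration depends on the result of the previous one.
    float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
    int i = 0;
    int upper = a.length & ~3; // largest multiple of 4 that is <= length
    for (; i < upper; i += 4) {
      acc0 = Math.fma(a[i], b[i], acc0);
      acc1 = Math.fma(a[i + 1], b[i + 1], acc1);
      acc2 = Math.fma(a[i + 2], b[i + 2], acc2);
      acc3 = Math.fma(a[i + 3], b[i + 3], acc3);
    }
    float res = (acc0 + acc1) + (acc2 + acc3);
    // Tail loop for the remaining 0..3 elements.
    for (; i < a.length; i++) {
      res = Math.fma(a[i], b[i], res);
    }
    return res;
  }
}
```

`Math.fma(float, float, float)` has been part of the JDK since Java 9 and rounds once instead of twice, which is the "reduce error" aspect mentioned in the related thread titles.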

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786392160 with all the data dependencies removed, i also gave at least one stab trying to see if i could trick the compiler into using packed instructions instead of single floats... would be awesom

Re: [PR] Fix NullPointerException in Monitor.getQuery when query is not present [lucene]

2023-10-30 Thread via GitHub
romseygeek commented on PR #12736: URL: https://github.com/apache/lucene/pull/12736#issuecomment-1786088337 This looks great, thank you @daviscook477! Would you be able to add an entry to CHANGES.txt under the 9.9.0 release?

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1786059581 @jpountz updated. Flat is no longer pluggable, two HNSW formats are exposed.

[PR] Fix NullPointerException in Monitor.getQuery when query is not present [lucene]

2023-10-30 Thread via GitHub
daviscook477 opened a new pull request, #12736: URL: https://github.com/apache/lucene/pull/12736 ### Description The [javadoc for Monitor.getQuery](https://github.com/apache/lucene/blob/a0887c7d26df6c9f32afcf8e9f0ff66275115f92/lucene/monitor/src/java/org/apache/lucene/monitor/Monitor.java

Re: [PR] fix explicit type declaration [lucene-solr]

2023-10-30 Thread via GitHub
nvnmandadhi closed pull request #399: fix explicit type declaration URL: https://github.com/apache/lucene-solr/pull/399

Re: [I] DaciukMihovAutomatonBuilder#build should probably take a List instead of a Collection [lucene]

2023-10-30 Thread via GitHub
gsmiller closed issue #12319: DaciukMihovAutomatonBuilder#build should probably take a List instead of a Collection URL: https://github.com/apache/lucene/issues/12319

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-10-30 Thread via GitHub
gsmiller merged PR #12427: URL: https://github.com/apache/lucene/pull/12427

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-10-30 Thread via GitHub
gsmiller commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1785924607 +1 looks good to me as well. I like that this small change, 1) makes the API a little more general, allowing users to provide any Iterable instead of Collection, and 2) adds an explicit

Re: [PR] Upgrade dependencies to address more CVEs [lucene-solr]

2023-10-30 Thread via GitHub
risdenk merged PR #2681: URL: https://github.com/apache/lucene-solr/pull/2681

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
jpountz commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1785712360 Thanks, splitting the way you describe would make me happy. I had not understood that the flat codec was a goal. Now that I think more about it, I wonder if we should better separa

Re: [I] FSTCompiler's NodeHash should fully duplicate `byte[]` slices from the growing FST [lucene]

2023-10-30 Thread via GitHub
mikemccand commented on issue #12714: URL: https://github.com/apache/lucene/issues/12714#issuecomment-1785695227 > Perhaps instead of UnCompiledNode, we could encode it as byte-array (could take the same format as the FST-encoded binary, but the FST operation works on absolute address value

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-30 Thread via GitHub
asfgit merged PR #12731: URL: https://github.com/apache/lucene/pull/12731

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1785563758 OK, @jpountz thinking about it more. To do what you are suggesting, I think the following would work: - Force Lucene99HnswVectorsReader & Lucene99HnswVectorsWriter to take a `F

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12731: URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785549757 > I think the Panama API should allow the user to figure out how many parallel units are available to somehow dynamically split work correctly. I'm not even sure openjdk/hotspot know

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-30 Thread via GitHub
uschindler commented on PR #12731: URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785453474 > Last time i tried to figure out WTF was happening here, I think i determined that floating point reproducibility was still preventing this from happening? That there isn't like a "b

Re: [PR] Add a specialized bulk scorer for regular conjunctions. [lucene]

2023-10-30 Thread via GitHub
jpountz merged PR #12719: URL: https://github.com/apache/lucene/pull/12719

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1785376897 @jpountz the goal of this change is not just making code reusable. But: - Allowing folks who don't want HNSW to take advantage of the per-segment quantization and logic. Paging

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
jpountz commented on code in PR #12729: URL: https://github.com/apache/lucene/pull/12729#discussion_r1376303898 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throws

Re: [I] Should we handle negative scores due to floating point arithmetic errors? [lucene]

2023-10-30 Thread via GitHub
benwtrent closed issue #12700: Should we handle negative scores due to floating point arithmetic errors? URL: https://github.com/apache/lucene/issues/12700

Re: [PR] Ensure negative scores are not returned by vector similarity functions [lucene]

2023-10-30 Thread via GitHub
benwtrent merged PR #12727: URL: https://github.com/apache/lucene/pull/12727

Re: [PR] Return the same input vector if its a unit vector in VectorUtil#l2normalize [lucene]

2023-10-30 Thread via GitHub
benwtrent merged PR #12726: URL: https://github.com/apache/lucene/pull/12726
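
The behavior named in this thread's subject can be pictured with a minimal sketch: normalize in place, but return the input untouched when its squared length is already close to 1. This is not the merged Lucene code; the class name `L2NormalizeSketch` and the tolerance `EPSILON` are assumptions for illustration, and only the zero-vector `IllegalArgumentException` is suggested by the `testNormalizeZeroThrows` test mentioned elsewhere in this listing.

```java
public final class L2NormalizeSketch {
  // Assumed tolerance for "already a unit vector"; not taken from the PR.
  private static final double EPSILON = 1e-4;

  private L2NormalizeSketch() {}

  /** Normalizes v in place to unit length; returns v unchanged if it is already unit length. */
  public static float[] l2normalize(float[] v) {
    double squareSum = 0.0;
    for (float x : v) {
      squareSum += (double) x * x;
    }
    if (squareSum == 0.0) {
      throw new IllegalArgumentException("Cannot normalize a zero-length vector");
    }
    if (Math.abs(squareSum - 1.0) <= EPSILON) {
      return v; // already (approximately) a unit vector: skip the divisions
    }
    double length = Math.sqrt(squareSum);
    for (int i = 0; i < v.length; i++) {
      v[i] = (float) (v[i] / length);
    }
    return v;
  }
}
```

The early return avoids both the square root and the per-element divisions, and it leaves an already-normalized vector bit-for-bit identical.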

Re: [PR] Ensure negative scores are not returned by vector similarity functions [lucene]

2023-10-30 Thread via GitHub
jpountz commented on code in PR #12727: URL: https://github.com/apache/lucene/pull/12727#discussion_r1376230730 ## lucene/core/src/test/org/apache/lucene/util/TestVectorUtil.java: ## @@ -115,6 +116,21 @@ public void testNormalizeZeroThrows() { expectThrows(IllegalArgumentEx

Re: [PR] Return the same input vector if its a unit vector in VectorUtil#l2normalize [lucene]

2023-10-30 Thread via GitHub
benwtrent commented on PR #12726: URL: https://github.com/apache/lucene/pull/12726#issuecomment-1785207766 @shubhamvishu I will merge and backport today

Re: [PR] Add back maxConn & beamWidth HNSW codec ctor [lucene]

2023-10-30 Thread via GitHub
benwtrent merged PR #12728: URL: https://github.com/apache/lucene/pull/12728

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12731: URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785178856 Last time i tried to figure out WTF was happening here, I think i determined that floating point reproducibility was still preventing this from happening? That there isn't like a "bail out

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12731: URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785163931 > .. and yes (I've not forgotten), we need something like a `java.lang.Architecture/Platform`, that is queryable for such low-level support (rather than resorting to beans - which actually

Re: [PR] Ensure negative scores are not returned by vector similarity functions [lucene]

2023-10-30 Thread via GitHub
jpountz commented on PR #12727: URL: https://github.com/apache/lucene/pull/12727#issuecomment-1785162374 I had a suspicion that the double promotion is not buying us anything in that case, so I ran a quick test that seems to confirm it: ```java long equals = 0; long notEquals =

Re: [I] FSTCompiler's NodeHash should fully duplicate `byte[]` slices from the growing FST [lucene]

2023-10-30 Thread via GitHub
dungba88 commented on issue #12714: URL: https://github.com/apache/lucene/issues/12714#issuecomment-1785150257 I ran a small test to see what RAM would be needed for some sample dictionary using a simple `LinkedHashMap`: 6MB Cache size 62457 items 977KB FST size The repor

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12731: URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785145823 > ha! So just removing the overly aggressive unrolling in cosine improves things. well, only in combination with switch to FMA. seems then it's able to keep cpu busy multiplying.

Re: [PR] Ensure negative scores are not returned by vector similarity functions [lucene]

2023-10-30 Thread via GitHub
benwtrent commented on PR #12727: URL: https://github.com/apache/lucene/pull/12727#issuecomment-1785060096 @ChrisHegarty added a test for verifying VectorSimilarityFunction returns scores `>= 0`.

Re: [I] FSTCompiler's NodeHash should fully duplicate `byte[]` slices from the growing FST [lucene]

2023-10-30 Thread via GitHub
dweiss commented on issue #12714: URL: https://github.com/apache/lucene/issues/12714#issuecomment-1784860286 I'd check whether there's much gain from the switch first. Fill-up-then-discard caches often perform quite well and allow for much easier/faster implementation (both in terms of GC o

[PR] Clean up inputCount [lucene]

2023-10-30 Thread via GitHub
dungba88 opened a new pull request, #12735: URL: https://github.com/apache/lucene/pull/12735 ### Description Clean-up inputCount as it no longer has an active use