[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

2023-05-19 Thread via GitHub
ChrisHegarty commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1554223493 ++ to all of what @uschindler and @rmuir said relating to which JDK versions we add support for. Specifically, let's start with JDK 20 **only**. After which we can prepare for,

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198703579 ## gradle/testing/defaults-tests.gradle: ## @@ -122,7 +122,7 @@ allprojects { // Lucene needs to optional modules at runtime, which we want to enfo

[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1554440332 I refactored the provider and impl's: 1. So as to separate them out from VectorUtil - this should improve readability, etc, as we move beyond dotProduct. 2. I also moved them
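For readers following the refactoring described in this comment, here is a minimal sketch of what separating a provider from its implementations might look like. Every name below (`VectorProviderSketch`, `DotProductProvider`, `ScalarDotProduct`, `lookup`, `vectorModulePresent`) is hypothetical and not taken from the PR. Only the idea of a `dotProduct` operation living outside `VectorUtil` and a check for the `jdk.incubator.vector` module comes from the thread.

```java
final class VectorProviderSketch {

  /** Hypothetical provider interface; callers would depend only on this type. */
  interface DotProductProvider {
    float dotProduct(float[] a, float[] b);
  }

  /** Plain scalar fallback that works on any JDK. */
  static final class ScalarDotProduct implements DotProductProvider {
    @Override
    public float dotProduct(float[] a, float[] b) {
      if (a.length != b.length) {
        throw new IllegalArgumentException(
            "vector dimensions differ: " + a.length + " != " + b.length);
      }
      float sum = 0f;
      for (int i = 0; i < a.length; i++) {
        sum += a[i] * b[i];
      }
      return sum;
    }
  }

  /** Returns true if the incubating vector module is resolved in the boot layer. */
  static boolean vectorModulePresent() {
    return ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
  }

  /**
   * Assumption: a real integration would return a vectorized implementation
   * when the module is present. This sketch always returns the scalar one.
   */
  static DotProductProvider lookup() {
    return new ScalarDotProduct();
  }

  public static void main(String[] args) {
    DotProductProvider provider = lookup();
    System.out.println("vector module present: " + vectorModulePresent());
    System.out.println(provider.dotProduct(new float[] {1, 2, 3}, new float[] {4, 5, 6}));
  }
}
```

Keeping callers on the interface means further operations beyond `dotProduct` could be added to the provider without touching call sites.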

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198869676 ## gradle/testing/defaults-tests.gradle: ## @@ -119,10 +119,13 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvm

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198879166 ## gradle/testing/defaults-tests.gradle: ## @@ -119,10 +119,13 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvm

[GitHub] [lucene] contrebande-labs commented on issue #12302: vector API integration, plan B

2023-05-19 Thread via GitHub
contrebande-labs commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1554466920 Guys, relax. My intention is to _**HELP**_. I was trying to find something that can be done in parallel to what @ChrisHegarty is doing so I don't step on his toes. If you w

[GitHub] [lucene] rmuir commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
rmuir commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198991426 ## gradle/testing/defaults-tests.gradle: ## @@ -119,10 +119,13 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvmArgs '-

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1199003878 ## gradle/testing/defaults-tests.gradle: ## @@ -119,10 +119,13 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvmAr

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1199008182 ## gradle/testing/defaults-tests.gradle: ## @@ -122,7 +122,7 @@ allprojects { // Lucene needs to optional modules at runtime, which we want to enforc

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1199035703 ## gradle/testing/defaults-tests.gradle: ## @@ -119,10 +119,13 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvm

[GitHub] [lucene] uschindler merged pull request #12308: Wrap Query rewrite backwards layer with AccessController

2023-05-19 Thread via GitHub
uschindler merged PR #12308: URL: https://github.com/apache/lucene/pull/12308 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1199048935 ## lucene/core/src/java/org/apache/lucene/internal/vector/DefaultVectorUtilProvider.java: ## @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1199053205 ## lucene/core/src/java/org/apache/lucene/internal/vector/VectorUtilProvider.java: ## @@ -76,4 +77,10 @@ static boolean vectorModulePresentAndReadable() { }

[GitHub] [lucene] tang-hi commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-19 Thread via GitHub
tang-hi commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1554708934 @msokolov, thank you! I have successfully run the test and it confirms what I mentioned earlier. I believe that #12303 by @jbellis could resolve this issue. **baseline** (this pull

[GitHub] [lucene] uschindler commented on a diff in pull request #12290: Make memory fence in `ByteBufferGuard` explicit

2023-05-19 Thread via GitHub
uschindler commented on code in PR #12290: URL: https://github.com/apache/lucene/pull/12290#discussion_r1199073297 ## lucene/core/src/java/org/apache/lucene/store/ByteBufferGuard.java: ## @@ -65,14 +62,8 @@ public ByteBufferGuard(String resourceDescription, BufferCleaner cleane

[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-19 Thread via GitHub
uschindler commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1554737939 I noticed that the apijar files are quite large, because the extraction code can't remove package private superclasses. Therefore all package private classes stay alive as "empty" fra

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1554738661 I noticed that the apijar files are quite large, because the extraction code can't remove package private superclasses. Therefore all package private classes stay alive as "empty" fra

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1554742703 In addition, as we do not implement Java 19 vector support yet, I would add some code to skip extracting it depending on the Java version. So we can control separately which of the 2 api a

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1554743815 In addition, at some point we should rename the files, but that's not urgent because naming is not so important. We should then also rename the extraction gradle script, as it will be u

[GitHub] [lucene] alessandrobenedetti opened a new issue, #12313: Multi-value Support for KnnVectorField

2023-05-19 Thread via GitHub
alessandrobenedetti opened a new issue, #12313: URL: https://github.com/apache/lucene/issues/12313 ### Description It would be nice to support multiple values in a Knn vector field. This must be compatible with both the Exact and Approximate Nearest Neighbor search. There ar

[GitHub] [lucene] alessandrobenedetti opened a new pull request, #12314: Multi-value support for KnnVectorField

2023-05-19 Thread via GitHub
alessandrobenedetti opened a new pull request, #12314: URL: https://github.com/apache/lucene/pull/12314 ### Description This pull request aims to introduce support for multiple values in a single Knn vector field. The adopted solution relies on: **Index time** Sparse vector value

[GitHub] [lucene] tang-hi commented on pull request #12314: Multi-value support for KnnVectorField

2023-05-19 Thread via GitHub
tang-hi commented on PR #12314: URL: https://github.com/apache/lucene/pull/12314#issuecomment-1554789228 This concept is interesting, but I am curious about its practical uses.

[GitHub] [lucene] uschindler closed issue #12304: VirtualMethod does unprivileged reflection access

2023-05-19 Thread via GitHub
uschindler closed issue #12304: VirtualMethod does unprivileged reflection access URL: https://github.com/apache/lucene/issues/12304

[GitHub] [lucene] rmuir commented on a diff in pull request #12312: [DRAFT] GH#12176: TermInSetQuery extends AutomatonQuery

2023-05-19 Thread via GitHub
rmuir commented on code in PR #12312: URL: https://github.com/apache/lucene/pull/12312#discussion_r1199137356 ## lucene/core/src/java/org/apache/lucene/util/automaton/DaciukMihovAutomatonBuilder.java: ## @@ -308,17 +290,83 @@ private void replaceOrRegister(State state) { }

[GitHub] [lucene] rmuir commented on a diff in pull request #12312: [DRAFT] GH#12176: TermInSetQuery extends AutomatonQuery

2023-05-19 Thread via GitHub
rmuir commented on code in PR #12312: URL: https://github.com/apache/lucene/pull/12312#discussion_r1199139773 ## lucene/core/src/java/org/apache/lucene/util/automaton/DaciukMihovAutomatonBuilder.java: ## @@ -308,17 +290,83 @@ private void replaceOrRegister(State state) { }

[GitHub] [lucene] rmuir commented on pull request #12312: [DRAFT] GH#12176: TermInSetQuery extends AutomatonQuery

2023-05-19 Thread via GitHub
rmuir commented on PR #12312: URL: https://github.com/apache/lucene/pull/12312#issuecomment-1554807614 thanks for getting this started! Will be interested to see how the use of `Terms.intersect` impacts the performance.

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1199144041 ## lucene/core/src/java/org/apache/lucene/internal/vector/DefaultVectorUtilProvider.java: ## @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [lucene] alessandrobenedetti commented on pull request #12314: Multi-value support for KnnVectorField

2023-05-19 Thread via GitHub
alessandrobenedetti commented on PR #12314: URL: https://github.com/apache/lucene/pull/12314#issuecomment-1554825253 I'll follow up with more cleanup and tidying on my own in the next few weeks. I should have a bit of bandwidth from now until Berlin Buzzwords (mid June). Any feedback is

[GitHub] [lucene] uschindler opened a new pull request, #12315: Make sure APIJAR reproduces with different timezone

2023-05-19 Thread via GitHub
uschindler opened a new pull request, #12315: URL: https://github.com/apache/lucene/pull/12315 this PR just fixes the apijar generator to reproduce the file exactly (unfortunately java encodes the date using local timezone).
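The timezone problem mentioned here can be illustrated with the JDK's own zip classes. `ZipEntry.setTime(long)` is documented to convert the epoch time to MS-DOS date/time using the default `TimeZone`, so the same epoch value can produce different archive bytes on machines in different zones. The sketch below sidesteps that by setting the local date/time fields directly with `ZipEntry.setTimeLocal(LocalDateTime)` (Java 9+). This is an illustration of the general issue only; the class name, entry name, and fixed timestamp are assumptions, and it is not necessarily how PR #12315 fixes the apijar generator.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.TimeZone;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ReproducibleZipSketch {

  // Arbitrary fixed timestamp chosen for this sketch.
  static final LocalDateTime FIXED = LocalDateTime.of(2023, 1, 1, 0, 0, 0);

  /** Writes a one-entry archive whose timestamp does not depend on the default zone. */
  static byte[] writeArchive() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      ZipEntry entry = new ZipEntry("example.txt");
      // Stores the local date/time fields as given, with no zone conversion.
      entry.setTimeLocal(FIXED);
      zip.putNextEntry(entry);
      zip.write("hello".getBytes(StandardCharsets.UTF_8));
      zip.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Runs writeArchive() with a temporarily changed default time zone. */
  static byte[] writeArchiveIn(String zoneId) {
    TimeZone saved = TimeZone.getDefault();
    TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
    try {
      return writeArchive();
    } finally {
      TimeZone.setDefault(saved);
    }
  }

  public static void main(String[] args) {
    byte[] utc = writeArchiveIn("UTC");
    byte[] tokyo = writeArchiveIn("Asia/Tokyo");
    System.out.println("identical across zones: " + Arrays.equals(utc, tokyo));
  }
}
```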

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1554866016 ... I had to first find the reason why my computer produced a different APIJAR from the beginning. I HATE DEFAULT TIMEZONE, DIE; DIE; DIE

[GitHub] [lucene] uschindler merged pull request #12315: Make sure APIJAR reproduces with different timezone

2023-05-19 Thread via GitHub
uschindler merged PR #12315: URL: https://github.com/apache/lucene/pull/12315

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1554917991 I fixed it and merged the main branch into this one. Proceeding with fixing API generator to exclude unreferenced, private classes

[GitHub] [lucene] donnerpeter opened a new pull request, #12316: hunspell (minor): reduce allocations when processing compound rules

2023-05-19 Thread via GitHub
donnerpeter opened a new pull request, #12316: URL: https://github.com/apache/lucene/pull/12316 IntelliJ's allocation profiler shows some non-zero numbers from stream allocation and traversal. While JIT might be able to eliminate this sometimes, I'd prefer to avoid the doubt completely. -

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1199198237 ## lucene/core/src/java/org/apache/lucene/internal/vector/VectorUtilProvider.java: ## @@ -76,4 +77,10 @@ static boolean vectorModulePresentAndReadable() { }

[GitHub] [lucene] benwtrent commented on pull request #12314: Multi-value support for KnnVectorField

2023-05-19 Thread via GitHub
benwtrent commented on PR #12314: URL: https://github.com/apache/lucene/pull/12314#issuecomment-1554999300 @alessandrobenedetti thank you for kick starting this! You are absolutely correct, this is a large, but pivotal and necessary change for vector search in Lucene. I have not yet r

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1199048935 ## lucene/core/src/java/org/apache/lucene/internal/vector/DefaultVectorUtilProvider.java: ## @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1199236929 ## lucene/core/src/java/org/apache/lucene/internal/vector/VectorUtilProvider.java: ## @@ -76,4 +77,10 @@ static boolean vectorModulePresentAndReadable() { }

[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1555062236 > In addition, at some point we should rename the files, but that's not urgent because naming is not so important. We should then also rename the extraction gradle script, as it will

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1199250652 ## lucene/core/src/java/org/apache/lucene/internal/vector/DefaultVectorUtilProvider.java: ## @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [lucene] dweiss commented on a diff in pull request #12316: hunspell (minor): reduce allocations when processing compound rules

2023-05-19 Thread via GitHub
dweiss commented on code in PR #12316: URL: https://github.com/apache/lucene/pull/12316#discussion_r1199252099 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java: ## @@ -155,7 +155,7 @@ public class Dictionary { boolean checkCompoundCase, c

[GitHub] [lucene] donnerpeter commented on a diff in pull request #12316: hunspell (minor): reduce allocations when processing compound rules

2023-05-19 Thread via GitHub
donnerpeter commented on code in PR #12316: URL: https://github.com/apache/lucene/pull/12316#discussion_r1199286526 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java: ## @@ -155,7 +155,7 @@ public class Dictionary { boolean checkCompoundCa

[GitHub] [lucene] dweiss commented on a diff in pull request #12316: hunspell (minor): reduce allocations when processing compound rules

2023-05-19 Thread via GitHub
dweiss commented on code in PR #12316: URL: https://github.com/apache/lucene/pull/12316#discussion_r1199299390 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java: ## @@ -155,7 +155,7 @@ public class Dictionary { boolean checkCompoundCase, c

[GitHub] [lucene] donnerpeter merged pull request #12316: hunspell (minor): reduce allocations when processing compound rules

2023-05-19 Thread via GitHub
donnerpeter merged PR #12316: URL: https://github.com/apache/lucene/pull/12316

[GitHub] [lucene] RS146BIJAY commented on issue #12228: IndexWriter should clean up unreferenced files when segment merge fails due to disk full

2023-05-19 Thread via GitHub
RS146BIJAY commented on issue #12228: URL: https://github.com/apache/lucene/issues/12228#issuecomment-1555145366 As of now, is there any way a user can delete these unreferenced files on their end?

[GitHub] [lucene] RS146BIJAY commented on issue #12228: IndexWriter should clean up unreferenced files when segment merge fails due to disk full

2023-05-19 Thread via GitHub
RS146BIJAY commented on issue #12228: URL: https://github.com/apache/lucene/issues/12228#issuecomment-1555155442 Also, Lucene provides a way to roll back to the previous commit using the ```rollback``` function call. But as of now, it also closes the IndexWriter. I think ```close``` function

[GitHub] [lucene] gsmiller commented on pull request #12312: [DRAFT] GH#12176: TermInSetQuery extends AutomatonQuery

2023-05-19 Thread via GitHub
gsmiller commented on PR #12312: URL: https://github.com/apache/lucene/pull/12312#issuecomment-1555164152 Here's what I'm seeing so far in benchmarking... I took a custom benchmarking approach for this, similar to #12151 and other related issues. I did this because, 1) we don't really

[GitHub] [lucene] rmuir commented on pull request #12312: [DRAFT] GH#12176: TermInSetQuery extends AutomatonQuery

2023-05-19 Thread via GitHub
rmuir commented on PR #12312: URL: https://github.com/apache/lucene/pull/12312#issuecomment-1555172675 hmm, disappointing. Was hoping to see gains on the terms dictionary since it optimizes `intersect`. wonder what is going on. Of course docvalues impl doesn't optimize `intersect` in

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-19 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1555188725 > > In addition, at some point we should rename the files, but that's not urgent because naming is not so important. We should then also rename the extraction gradle script, as it will

[GitHub] [lucene] gsmiller commented on pull request #12312: [DRAFT] GH#12176: TermInSetQuery extends AutomatonQuery

2023-05-19 Thread via GitHub
gsmiller commented on PR #12312: URL: https://github.com/apache/lucene/pull/12312#issuecomment-1555244204 OK, here's a method profiler diff for the "High Cardinality PK" task, comparing two postings approaches—one that's using the current MultiTermQuery version, and one using AutomatonQuery

[GitHub] [lucene] gsmiller commented on pull request #12312: [DRAFT] GH#12176: TermInSetQuery extends AutomatonQuery

2023-05-19 Thread via GitHub
gsmiller commented on PR #12312: URL: https://github.com/apache/lucene/pull/12312#issuecomment-1555337627 Hmm... not sure if I've got something setup incorrectly with my JFR settings, but trying to dig into the other tasks, I can't even get the relevant methods to show up in the profiled ca

[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-19 Thread via GitHub
msokolov commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1555366279 Thanks everyone for testing and fixing. I had reverted this yesterday and I believe what we have on main now has recovered the performance we had before. I also ran luceneutil a few tim

[GitHub] [lucene] jbellis commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-19 Thread via GitHub
jbellis commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1555369584 The performance impact to building is more meaningful because that is where you are allocating large queues for multiple levels

[GitHub] [lucene] jainankitk opened a new issue, #12317: Option for disabling term dictionary compression

2023-05-19 Thread via GitHub
jainankitk opened a new issue, #12317: URL: https://github.com/apache/lucene/issues/12317 ### Description While working on a customer issue, I noticed that memory allocations for recently added [term dictionary compression](https://github.com/apache/lucene-solr/commit/33a7af9cbfb9f66

[GitHub] [lucene] mikemccand commented on pull request #12310: #12276: rename DaciukMihovAutomatonBuilder to StringsToAutomaton

2023-05-19 Thread via GitHub
mikemccand commented on PR #12310: URL: https://github.com/apache/lucene/pull/12310#issuecomment-1555414840 > LGTM. Apologies for the merge conflict I created for you (but thanks for the review on that PR!). No worries! I'll resolve and merge soon! Thanks for the review @gsmiller.

[GitHub] [lucene] mikemccand commented on pull request #12310: #12276: rename DaciukMihovAutomatonBuilder to StringsToAutomaton

2023-05-19 Thread via GitHub
mikemccand commented on PR #12310: URL: https://github.com/apache/lucene/pull/12310#issuecomment-1555415525 > Yeah we should explore a binary version. Even if it doesn't speedup TermInSetQuery. +1 > Pretty sure I added a comment along the lines of "we should not do this"

[GitHub] [lucene] mikemccand commented on pull request #12310: #12276: rename DaciukMihovAutomatonBuilder to StringsToAutomaton

2023-05-19 Thread via GitHub
mikemccand commented on PR #12310: URL: https://github.com/apache/lucene/pull/12310#issuecomment-1555416664 Actually, the terms must be sorted in Unicode code point order, and we do have the builder for `BytesRef` already: `public static Automaton build(Collection<BytesRef> input) {`. So I think we
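The "Unicode code point order" requirement is easy to get wrong in Java, because `String.compareTo` compares UTF-16 code units, which disagrees with code point order once supplementary characters are involved. Unsigned byte-wise comparison of UTF-8 bytes does agree with code point order. The sketch below shows the difference using only JDK classes; `CodePointOrderSketch` and its methods are hypothetical names for illustration and are not part of Lucene.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CodePointOrderSketch {

  /** Compares two strings by Unicode code point rather than by UTF-16 code unit. */
  static int compareByCodePoint(String a, String b) {
    int i = 0;
    int j = 0;
    while (i < a.length() && j < b.length()) {
      int ca = a.codePointAt(i);
      int cb = b.codePointAt(j);
      if (ca != cb) {
        return Integer.compare(ca, cb);
      }
      i += Character.charCount(ca);
      j += Character.charCount(cb);
    }
    // One string is a prefix of the other: the shorter one sorts first.
    return Integer.compare(a.length() - i, b.length() - j);
  }

  /** Compares the UTF-8 encodings as unsigned bytes. */
  static int compareUtf8(String a, String b) {
    return Arrays.compareUnsigned(
        a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8));
  }

  /** Returns a copy of the input sorted in code point order. */
  static List<String> sortedByCodePoint(List<String> terms) {
    List<String> copy = new ArrayList<>(terms);
    copy.sort(CodePointOrderSketch::compareByCodePoint);
    return copy;
  }

  public static void main(String[] args) {
    String bmp = "\uFF5E"; // U+FF5E, a single UTF-16 code unit
    String supplementary = "\uD83D\uDE00"; // U+1F600, a surrogate pair
    System.out.println("UTF-16 order:     " + Integer.signum(bmp.compareTo(supplementary)));
    System.out.println("code point order: " + Integer.signum(compareByCodePoint(bmp, supplementary)));
    System.out.println("UTF-8 byte order: " + Integer.signum(compareUtf8(bmp, supplementary)));
  }
}
```

Sorting terms as UTF-8 bytes, or with a code point comparator like the one above, avoids the mismatch that `String.compareTo` would introduce.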

[GitHub] [lucene] hydrogen666 commented on pull request #816: LUCENE-10519: Improvement for CloseableThreadLocal

2023-05-19 Thread via GitHub
hydrogen666 commented on PR #816: URL: https://github.com/apache/lucene/pull/816#issuecomment-1555456315 Any progress on this PR? @uschindler