[GitHub] [lucene] uschindler commented on pull request #12299: GITHUB-12291: Skip blank lines from stopwords list.

2023-05-18 Thread via GitHub
uschindler commented on PR #12299: URL: https://github.com/apache/lucene/pull/12299#issuecomment-1552603008 Isn't it a Bugfix? Because originally we had an empty Stopword in the set. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

[GitHub] [lucene] uschindler commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
uschindler commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1552692627 > The only argument you've given against a system property specifically is: > > > system property is not used appropriately (e.g. proper security checks). If there is a security

[GitHub] [lucene] uschindler commented on issue #12307: Multiple ClassNotFoundExceptions in IntelliJ Fat Jar on ARM64 Java 20

2023-05-18 Thread via GitHub
uschindler commented on issue #12307: URL: https://github.com/apache/lucene/issues/12307#issuecomment-1552710207 The problem is that your FAT JAR file does not have all MR-JAR classes. `META-INF/versions/*` is missing (this causes the ClassNotFoundExceptions). To load codecs also the Servi
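
One way to check the diagnosis in the comment above is to list the archive's entries and look for versioned classes. The sketch below is illustrative only: the class name `FatJarCheck` and its helper method are hypothetical, and it relies solely on `java.util.jar.JarFile` from the JDK (Java 9 or later for `isMultiRelease()`).

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Lucene: reports whether a JAR still
// carries the multi-release (MR-JAR) classes under META-INF/versions/.
class FatJarCheck {

  // True if at least one class file lives under META-INF/versions/.
  static boolean hasVersionedClasses(List<String> entryNames) {
    return entryNames.stream()
        .anyMatch(n -> n.startsWith("META-INF/versions/") && n.endsWith(".class"));
  }

  public static void main(String[] args) throws IOException {
    if (args.length == 0) {
      System.out.println("usage: java FatJarCheck <path-to-jar>");
      return;
    }
    try (JarFile jar = new JarFile(args[0])) {
      List<String> names =
          jar.stream().map(JarEntry::getName).collect(Collectors.toList());
      // The manifest must also declare "Multi-Release: true" for the
      // versioned classes to be picked up at runtime.
      System.out.println("Multi-Release manifest attribute: " + jar.isMultiRelease());
      System.out.println("versioned classes present: " + hasVersionedClasses(names));
    }
  }
}
```

If both lines print `false` for a fat JAR built from Lucene artifacts, the packaging step most likely dropped the versioned entries.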

[GitHub] [lucene] uschindler closed issue #12307: Multiple ClassNotFoundExceptions in IntelliJ Fat Jar on ARM64 Java 20

2023-05-18 Thread via GitHub
uschindler closed issue #12307: Multiple ClassNotFoundExceptions in IntelliJ Fat Jar on ARM64 Java 20 URL: https://github.com/apache/lucene/issues/12307

[GitHub] [lucene] uschindler commented on issue #12304: VirtualMethod does unprivileged reflection access

2023-05-18 Thread via GitHub
uschindler commented on issue #12304: URL: https://github.com/apache/lucene/issues/12304#issuecomment-1552722778 Hi, I did some investigation. Actually it can't be done better with `MethodHandles.Lookup` (which inherits the actual visibility privileges by the caller; my idea was to requi

[GitHub] [lucene] mikemccand commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
mikemccand commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1552837338 > Just to make sure I understand the -1 correctly: The concern is that if a user can set the dimension by himself, because the Lucene version works well with this dimension at that ti

[GitHub] [lucene] uschindler opened a new pull request, #12308: Wrap Query rewrite backwards layer with AccessController

2023-05-18 Thread via GitHub
uschindler opened a new pull request, #12308: URL: https://github.com/apache/lucene/pull/12308 This fixes #12304 for query. It also adds a note to changes and puts more documentation into `VirtualMethod`. The documentation changes will be forward-ported to main branch.

[GitHub] [lucene] uschindler commented on issue #12304: VirtualMethod does unprivileged reflection access

2023-05-18 Thread via GitHub
uschindler commented on issue #12304: URL: https://github.com/apache/lucene/issues/12304#issuecomment-1552851410 See this PR: #12308 Actually to reduce impact of this more we could add an if statement in the Query constructor to not do the override/distance check for Lucene's own cla

[GitHub] [lucene] uschindler commented on pull request #12308: Wrap Query rewrite backwards layer with AccessController

2023-05-18 Thread via GitHub
uschindler commented on PR #12308: URL: https://github.com/apache/lucene/pull/12308#issuecomment-1552863855 Actually to reduce impact of this more we could add an if statement in the Query constructor to not do the override/distance check for Lucene's own classes (as we have all ported to u

[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
rmuir commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1552872873 yes, as nothing has been done yet :) I personally plan to start a branch if nobody beats me to it, but i'm stuck in a conference and without bandwidth this week. You can get an

[GitHub] [lucene] mikemccand opened a new issue, #12309: Move aKNN limits enforcement into the default Codec's KnnVectorsFormat implementation

2023-05-18 Thread via GitHub
mikemccand opened a new issue, #12309: URL: https://github.com/apache/lucene/issues/12309 ### Description [Spinoff from #12306] There have been many discussions and polls about what to do about the existing (weakly enforced) limit of aKNN vector dimensionality in Lucene.

[GitHub] [lucene] alessandrobenedetti commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
alessandrobenedetti commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1552901044 As expressed in the poll, I like this idea, thanks @dsmiley for opening a draft pull request, while we collect more opinions in the poll. My main question for everyone is:

[GitHub] [lucene] mikemccand commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-18 Thread via GitHub
mikemccand commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1552945236 > @mikemccand how can I determine which parameters the vector search task used when querying? Searching in Lucene Util for `-concurrentSearches` and `-searchThreadCount` yields few re

[GitHub] [lucene] mikemccand commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-18 Thread via GitHub
mikemccand commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1552945912 Note that you can click & drag to zoom into the nightly chart! Very helpful when trying to isolate specific nights' builds!

[GitHub] [lucene] mikemccand commented on issue #12284: input automaton is too large: 1001 in Operations.topoSortStatesRecurse(Operations.java:1357)

2023-05-18 Thread via GitHub
mikemccand commented on issue #12284: URL: https://github.com/apache/lucene/issues/12284#issuecomment-1552948759 Can we close this issue now? The fix will be released in Lucene 9.7.0.

[GitHub] [lucene] mikemccand opened a new pull request, #12310: #12276: rename DaciukMihovAutomatonBuilder to StringsToAutomaton

2023-05-18 Thread via GitHub
mikemccand opened a new pull request, #12310: URL: https://github.com/apache/lucene/pull/12310 This is just a rote rename of this helpful class. I plan 10.0 only since it is technically a public API break. I added a line to MIGRATE.txt too. Closes #12276

[GitHub] [lucene] michaelwechner commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
michaelwechner commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1552960080 > > Just to make sure I understand the -1 correctly: The concern is that if a user can set the dimension by himself, because the Lucene version works well with this dimension at t

[GitHub] [lucene] mikemccand commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-18 Thread via GitHub
mikemccand commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1552973523 > > so I think the change to slice executor is a red herring and we are just missing some datapoints on the graph > > Eek -- I'll dig into why the nightly chart is misleading us

[GitHub] [lucene] mikemccand commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-18 Thread via GitHub
mikemccand commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1552974021 > I don't understand why, but it seems to me we ought to revert this +1 -- let's revert for now and then try to understand the performance regression offline.

[GitHub] [lucene] ChrisHegarty opened a new pull request, #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
ChrisHegarty opened a new pull request, #12311: URL: https://github.com/apache/lucene/pull/12311 Trivially create the stubs for the Panama Vector API in the generated 19/20 api jars. This will allow the exact JDK version of the incubating Vector API to be compiled against in JDK-version

[GitHub] [lucene] mikemccand commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]

2023-05-18 Thread via GitHub
mikemccand commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1552995243 > Nice little rewrite ChatGPT did there. ❗ ❗ ❗

[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
ChrisHegarty commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1552998982 To better understand the existing "non-final JDK API stub" mechanism, I quickly put together the small set of changes that we need to get started - that generates the Vector AP

[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
ChrisHegarty commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553009202 Next I might refactor VectorUtil to be an interface, and have a different implementation for JDK 20, using a similar guard mechanism as is done for mmap.

[GitHub] [lucene] mcimadamore commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-18 Thread via GitHub
mcimadamore commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1553027840 Looks good!

[GitHub] [lucene] dsmiley commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
dsmiley commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1553035652 There are non-mutually exclusive things that could happen to help a user wanting to work with higher dimensions; this PR focuses on **configurability** of an implementation that **alread

[GitHub] [lucene] nknize commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
nknize commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1553082121 -1 for even introducing the system property option only because I think it's too trappy. We only just released Lucene 9.6 seven days ago. Why do we need to rush to this in a PR when we ha

[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-18 Thread via GitHub
msokolov commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1553082966 ok, I reverted. Maybe we can scratch our heads and learn something by understanding what the difference was

[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-18 Thread via GitHub
msokolov commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1553088549 One thing I noticed is that `NeighborQueue.clear()` does not reset `incomplete`. I don't think that is causing an issue here, but we ought to fix it.

[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-18 Thread via GitHub
msokolov commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1553104616 OK I think I see the problem -- when we search the upper levels of the graph we do so using topK=1. This initializes the NeighborQueue to have "initialSize=1" and therefore its LongHeap

[GitHub] [lucene] gsmiller merged pull request #12305: Minor cleanup and improvements to DaciukMihovAutomatonBuilder

2023-05-18 Thread via GitHub
gsmiller merged PR #12305: URL: https://github.com/apache/lucene/pull/12305

[GitHub] [lucene] dsmiley commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
dsmiley commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1553126457 Vetoes are serious business, _balancing_ harm to the project & users against the benefit the change brings to both. Thus the change needn't be perfect in every possible way but needs to

[GitHub] [lucene] nknize commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
nknize commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1553154696 Given how simple this change is, is there harm in putting it in our back pocket as an alternative if no other option pans out closer to the 9.7 release? I don't see a justifiable reason t

[GitHub] [lucene] benwtrent commented on issue #12304: VirtualMethod does unprivileged reflection access

2023-05-18 Thread via GitHub
benwtrent commented on issue #12304: URL: https://github.com/apache/lucene/issues/12304#issuecomment-1553171092 @reta thank you for the early testing and finding this bug! Definitely my fault 🤦

[GitHub] [lucene] benwtrent closed issue #12224: Find forever home for `KnnGraphTester`

2023-05-18 Thread via GitHub
benwtrent closed issue #12224: Find forever home for `KnnGraphTester` URL: https://github.com/apache/lucene/issues/12224

[GitHub] [lucene] benwtrent commented on issue #12224: Find forever home for `KnnGraphTester`

2023-05-18 Thread via GitHub
benwtrent commented on issue #12224: URL: https://github.com/apache/lucene/issues/12224#issuecomment-1553172357 This has been moved and completed by @msokolov's work. Closing this issue.

[GitHub] [lucene] reta commented on issue #12304: VirtualMethod does unprivileged reflection access

2023-05-18 Thread via GitHub
reta commented on issue #12304: URL: https://github.com/apache/lucene/issues/12304#issuecomment-1553175397 > @reta thank you for the early testing and finding this bug! Definitely my fault facepalm Not a problem at all, thanks for quick fix folks!

[GitHub] [lucene] reta commented on pull request #12308: Wrap Query rewrite backwards layer with AccessController

2023-05-18 Thread via GitHub
reta commented on PR #12308: URL: https://github.com/apache/lucene/pull/12308#issuecomment-1553186693 Tested locally, the issue is gone, tests pass normally, thanks @uschindler !

[GitHub] [lucene] uschindler merged pull request #12299: GITHUB-12291: Skip blank lines from stopwords list.

2023-05-18 Thread via GitHub
uschindler merged PR #12299: URL: https://github.com/apache/lucene/pull/12299

[GitHub] [lucene] uschindler closed issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-18 Thread via GitHub
uschindler closed issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer URL: https://github.com/apache/lucene/issues/12291

[GitHub] [lucene] uschindler commented on pull request #12299: GITHUB-12291: Skip blank lines from stopwords list.

2023-05-18 Thread via GitHub
uschindler commented on PR #12299: URL: https://github.com/apache/lucene/pull/12299#issuecomment-1553192538 I will merge this to 9.x when back at home.

[GitHub] [lucene] alessandrobenedetti commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
alessandrobenedetti commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1553233035 > > Thus the change needn't be perfect in every possible way but needs to be a net positive. > > 💯 So let's "progress not perfection" on #12309 ? I still haven't heard a

[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
ChrisHegarty commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553232796 I updated https://github.com/apache/lucene/pull/12311, with a basic `dotProduct(float[], float[])`. Maybe this is a reasonable place to start.
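
For orientation, a plain scalar dot product is the baseline that any vectorized `dotProduct(float[], float[])` has to match. The sketch below is not the code from the PR: the class name `DotProductSketch` is hypothetical, and it deliberately avoids the incubating Vector API so that it runs on any JDK without extra flags.

```java
// Hypothetical scalar reference, not Lucene's implementation.
final class DotProductSketch {

  // Sums the element-wise products of two equally sized vectors.
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    // 1*4 + 2*5 + 3*6 = 32
    System.out.println(dotProduct(new float[] {1f, 2f, 3f}, new float[] {4f, 5f, 6f}));
  }
}
```

A vectorized variant may accumulate in a different order, so its results are best compared against a scalar reference like this one with a small tolerance instead of exact equality.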

[GitHub] [lucene] tang-hi commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-18 Thread via GitHub
tang-hi commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1553237625 I've noticed that we create a NeighborQueue with an initialSize set to topK. For instance, if topK is 100, the maximum size of LongHeap is also 100. However, when we execute the searchLa

[GitHub] [lucene] tang-hi commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-18 Thread via GitHub
tang-hi commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1553239095 > I've noticed that we create a NeighborQueue with an initialSize set to topK. For instance, if topK is 100, the maximum size of LongHeap is also 100. However, when we execute the search

[GitHub] [lucene] nknize commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
nknize commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1553249034 > With no configurability we currently leave only option 3 to our users anyway because we decided that 1024 is a golden limit for performance to be acceptable. But this isn't a per

[GitHub] [lucene] tang-hi commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-18 Thread via GitHub
tang-hi commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1553253776 > I tried running luceneutil before/after this change using this command: > > ``` > comp = competition.Competition() > > index = comp.newIndex('baseline', sourceData,

[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1553298071 Thanks @rmuir - I just merged in your implementation. I think that it's a much much better starting (if not the final) place. This might be a reasonable minimal point to start from.

[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
ChrisHegarty commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553314511 BTW, and not even remotely suggested - I'm not trying to take over this work, just sketch out a few concrete things to help get it started. Whatever collaboration mechanism is

[GitHub] [lucene] dsmiley commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
dsmiley commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1553317548 > A user configures this system property, indexes a 10Billion dimension vector. We can block that with a maximum of 2048, the tested and thus supported maximum limit. I'm presen

[GitHub] [lucene] jbellis commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-18 Thread via GitHub
jbellis commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1553325809 @tang-hi you're right, that explains the discrepancy. The change at #12303 should fix that

[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
rmuir commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553342721 thank you for getting it started: it is great. Lets just iterate forwards with your branch? personally ive never had an issue collaborating on a fork like this (such as [ChrisH

[GitHub] [lucene] tang-hi commented on a diff in pull request #12303: Address HNSW Searcher performance regression

2023-05-18 Thread via GitHub
tang-hi commented on code in PR #12303: URL: https://github.com/apache/lucene/pull/12303#discussion_r1198060903 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -179,7 +182,7 @@ public static NeighborQueue search( } eps[0] = results

[GitHub] [lucene] tang-hi commented on pull request #12303: Address HNSW Searcher performance regression

2023-05-18 Thread via GitHub
tang-hi commented on PR #12303: URL: https://github.com/apache/lucene/pull/12303#issuecomment-1553347419 Maybe we could solve the bug that @msokolov found in this PR? > One thing I noticed is that NeighborQueue.clear() does not reset incomplete. I don't think that is causing an issu

[GitHub] [lucene] uschindler commented on pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1553366480 I'd prefer to have separate apijars, because the current code compiles with patching base module. I'd like to separate this. But as a start it is ok.

[GitHub] [lucene] uschindler commented on pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1553373978 On the other hand: it just works! 😉

[GitHub] [lucene] uschindler commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
uschindler commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553375573 By default all Lucene committers can commit to your branch. This is enabled by default and you had to agree when creating the PR.

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198085290 ## gradle/generation/panama-foreign.gradle: ## @@ -45,13 +45,14 @@ configure(project(":lucene:core")) { javaLauncher.get() return true

[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
ChrisHegarty commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553402537 Ok, cool. Committers please commit directly. Consider the branch in my personal fork as our shared place for collaboration. Let me know if you encounter any issues.

[GitHub] [lucene] uschindler commented on pull request #12308: Wrap Query rewrite backwards layer with AccessController

2023-05-18 Thread via GitHub
uschindler commented on PR #12308: URL: https://github.com/apache/lucene/pull/12308#issuecomment-1553429260 Thanks for the feedback. In the meantime I have tried out some code to "optimize" the case of Lucene's own classes. If the classname starts with "org.apache.lucene.", it will as

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198137116 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -270,4 +216,134 @@ public static float dotProductScore(byte[] a, byte[] b) { float denom

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198138249 ## lucene/core/src/java20/org/apache/lucene/util/JDKVectorUtilProvider.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198140102 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -270,4 +216,134 @@ public static float dotProductScore(byte[] a, byte[] b) { float denom

[GitHub] [lucene] msokolov commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-18 Thread via GitHub
msokolov commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1553458070 @tang-hi you need to switch the sourceData to use `wikivector1m` - not sure why the script got left that way. The sourceData defines not only the data (documents) but also the tasks tha

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198170915 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -270,4 +216,134 @@ public static float dotProductScore(byte[] a, byte[] b) { float denom

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198188544 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -270,4 +216,134 @@ public static float dotProductScore(byte[] a, byte[] b) { float deno

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198189573 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -270,4 +216,134 @@ public static float dotProductScore(byte[] a, byte[] b) { float deno

[GitHub] [lucene] uschindler commented on pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1553497986 Is vector's FMA also always slow (does it use BigDecimal, too?).

[GitHub] [lucene] reta commented on a diff in pull request #12308: Wrap Query rewrite backwards layer with AccessController

2023-05-18 Thread via GitHub
reta commented on code in PR #12308: URL: https://github.com/apache/lucene/pull/12308#discussion_r1198196299 ## lucene/core/src/java/org/apache/lucene/search/Query.java: ## @@ -50,8 +52,30 @@ public abstract class Query { new VirtualMethod<>(Query.class, "rewrite", IndexR

[GitHub] [lucene] uschindler commented on a diff in pull request #12308: Wrap Query rewrite backwards layer with AccessController

2023-05-18 Thread via GitHub
uschindler commented on code in PR #12308: URL: https://github.com/apache/lucene/pull/12308#discussion_r1198197916 ## lucene/core/src/java/org/apache/lucene/search/Query.java: ## @@ -50,8 +52,30 @@ public abstract class Query { new VirtualMethod<>(Query.class, "rewrite",

[GitHub] [lucene] contrebande-labs commented on issue #12307: Multiple ClassNotFoundExceptions in IntelliJ Fat Jar on ARM64 Java 20

2023-05-18 Thread via GitHub
contrebande-labs commented on issue #12307: URL: https://github.com/apache/lucene/issues/12307#issuecomment-1553509065 I just hoped I could make a very common use case work with Lucene...

[GitHub] [lucene] contrebande-labs commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
contrebande-labs commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553571328 > @ChrisHegarty [said](https://github.com/apache/lucene/issues/12302#issuecomment-1553402537) Ok, cool. Committers please commit directly. Consider the branch in my persona

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198252072 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -270,4 +216,134 @@ public static float dotProductScore(byte[] a, byte[] b) { float deno

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198253248 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -270,4 +216,134 @@ public static float dotProductScore(byte[] a, byte[] b) { float deno

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198253504 ## gradle/generation/panama-foreign.gradle: ## @@ -45,13 +45,14 @@ configure(project(":lucene:core")) { javaLauncher.get() return true

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
ChrisHegarty commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1198252297 ## lucene/core/src/java20/org/apache/lucene/util/JDKVectorUtilProvider.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1553583461 > I'd prefer to have separate apijars, because the current code compiles with patching base module. > On the other hand: it just works! 😉 Yeah, this is a bit of a hack! It

[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Include the Panama Vector API stubs in the generated 19/20 api jars

2023-05-18 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1553586455 > Is vector's FMA also always slow (does it use BigDecimal, too?). I dunno what it does - I haven't looked - but I doubt it falls back to BD. I'll take a look and do some expe

[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
rmuir commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553605796 I don't think there is any official plan or anything here. Sorry I don't really have any answers. I don't think there is any rule on which JDK versions are supported. For mmap,

[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
rmuir commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553609151 > I'm not a committer. Should I fork your fork and do PRs on it? And is [this the branch](https://github.com/ChrisHegarty/lucene/tree/panama_vector) we should base my work on? I

[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
rmuir commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553609725 Sorry for the chaotic typos/answers, on a phone.

[GitHub] [lucene] alessandrobenedetti commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
alessandrobenedetti commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1553613823 Mmm where is 2048 coming from? I am generally not a big fan of magic numbers, but happy to change my mind if there's any rationale behind it

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-18 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1553618970 I really wish Math.fma fell back to sane behavior such as */+ and only StrictMath.fma did the slow BigDecimal stuff! Not good decision-making here on these APIs.
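
To make the trade-off in this comment concrete, here is a minimal sketch that computes the same dot product with plain `*`/`+` and with `Math.fma`. It is an illustration only, not code from the PR: the class and method names (`FmaDotProduct`, `dotPlain`, `dotFma`) are hypothetical, and the only JDK call assumed is `Math.fma(float, float, float)`, which exists since Java 9.

```java
public final class FmaDotProduct {

    // Plain multiply then add: each step rounds the product and then the sum.
    static float dotPlain(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Fused multiply-add: each step computes a[i] * b[i] + sum with a single rounding.
    // Whether this is fast depends on the JIT mapping it to a hardware FMA instruction;
    // the thread above reports a slow software fallback when that is not possible.
    static float dotFma(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum = Math.fma(a[i], b[i], sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, chosen so every intermediate value is exactly representable.
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {0.5f, 0.25f, 2f, 1f};
        System.out.println("plain: " + dotPlain(a, b));
        System.out.println("fma:   " + dotFma(a, b));
    }
}
```

The two methods can differ in the last bits on general inputs because of the different rounding, so any comparison between them should use a tolerance, and any performance claim should be measured on the target hardware.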

[GitHub] [lucene] dsmiley commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
dsmiley commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1553633051 All limits are magic numbers :-). Maybe you mean, let's add a bit of docs in the code to reflect why it is what it is? From some of the conversation threads, I recall 2048 is the highes

[GitHub] [lucene] rmuir commented on pull request #12310: #12276: rename DaciukMihovAutomatonBuilder to StringsToAutomaton

2023-05-18 Thread via GitHub
rmuir commented on PR #12310: URL: https://github.com/apache/lucene/pull/12310#issuecomment-1553633433 Yeah we should explore a binary version. Even if it doesn't speed up TermInSetQuery. TermInSetQuery has/had a super trappy visitor method that builds an automaton from its sorted te

[GitHub] [lucene] alessandrobenedetti commented on pull request #12306: Make MAX_DIMENSIONS configurable via a system property.

2023-05-18 Thread via GitHub
alessandrobenedetti commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1553641166 I mean, yeah, I hate limits that are not related to typing constraint, and also in that case, I agree they are magic numbers behind the scenes :) Still, I would questi
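
For readers following the PR #12306 discussion, the idea of a limit that can be overridden through a system property can be sketched as below. This is an assumption-laden illustration, not the PR's code: the class name `ConfigurableLimit`, the property name `example.maxDimensions`, and the default of 1024 are all hypothetical. The only JDK call relied on is `Integer.getInteger(String, int)`, which returns the default when the property is absent or not a valid integer.

```java
public final class ConfigurableLimit {

    // Hypothetical property name and default, chosen for illustration only.
    static final String PROPERTY = "example.maxDimensions";
    static final int DEFAULT_MAX_DIMENSIONS = 1024;

    // Reads the limit from the system property, falling back to the default
    // when the property is unset or cannot be parsed as an integer.
    static int resolveMaxDimensions() {
        int value = Integer.getInteger(PROPERTY, DEFAULT_MAX_DIMENSIONS);
        if (value <= 0) {
            throw new IllegalArgumentException(PROPERTY + " must be positive, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        // Run with -Dexample.maxDimensions=2048 to override the default.
        System.out.println("max dimensions: " + resolveMaxDimensions());
    }
}
```

Reading the property in a method rather than a `static final` field keeps the sketch easy to test. A real implementation would also need to address the security-check concern raised earlier in the thread, which this sketch does not attempt.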

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-18 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1553665376 > > I'd prefer to have separate apijars, because the current code compiles with patching base module. > > On the other hand: it just works! 😉 > > Yeah, this is a bit of a hac

[GitHub] [lucene] uschindler commented on pull request #12308: Wrap Query rewrite backwards layer with AccessController

2023-05-18 Thread via GitHub
uschindler commented on PR #12308: URL: https://github.com/apache/lucene/pull/12308#issuecomment-1553715929 Reverted it. I left the fixes of a test calling the deprecated rewrite. I will merge this tomorrow and forward-port the documentation changes to main.

[GitHub] [lucene] uschindler commented on issue #12304: VirtualMethod does unprivileged reflection access

2023-05-18 Thread via GitHub
uschindler commented on issue #12304: URL: https://github.com/apache/lucene/issues/12304#issuecomment-1553717297 @benwtrent I don't think it is your fault. The documentation did not even suggest using AccessController.

[GitHub] [lucene] gsmiller opened a new pull request, #12312: [DRAFT] GH#12176: TermInSetQuery extends AutomatonQuery

2023-05-18 Thread via GitHub
gsmiller opened a new pull request, #12312: URL: https://github.com/apache/lucene/pull/12312 ### Description I started experimenting with #12176 to see if we can get any benefits out of having `TermInSetQuery` extend `AutomatonQuery` instead of `MultiTermQuery`. I'm opening this only

[GitHub] [lucene] contrebande-labs commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
contrebande-labs commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-155382 From what I can see, the classes/files modified so far are: * [VectorUtil](https://github.com/ChrisHegarty/lucene/blob/panama_vector/lucene/core/src/java/org/apache/l

[GitHub] [lucene] uschindler commented on issue #12302: vector API integration, plan B

2023-05-18 Thread via GitHub
uschindler commented on issue #12302: URL: https://github.com/apache/lucene/issues/12302#issuecomment-1554052266 I don't understand what your intention is. The general setup is there. No infrastructure work needed anymore. The decision which variants of the code live in parallel is unrela