Re: [PR] build: replace six simple error prone checks [lucene]

2025-06-21 Thread via GitHub


github-actions[bot] commented on PR #14831:
URL: https://github.com/apache/lucene/pull/14831#issuecomment-2993943716

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] build: replace six simple error prone checks [lucene]

2025-06-21 Thread via GitHub


rmuir opened a new pull request, #14831:
URL: https://github.com/apache/lucene/pull/14831

   Error-prone has many checks, but the tool doesn't scale well as the number of enabled checks grows: replace six (arbitrarily chosen) easy ones.
   
   ast-grep:
   * https://errorprone.info/bugpattern/SubstringOfZero
   * https://errorprone.info/bugpattern/JUnit4ClassAnnotationNonStatic
   * https://errorprone.info/bugpattern/JUnit4EmptyMethods
   * https://errorprone.info/bugpattern/PackageInfo
   * https://errorprone.info/bugpattern/IncorrectMainMethod
   
   forbidden-apis:
   * https://errorprone.info/bugpattern/ICCProfileGetInstance
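
   For readers unfamiliar with these bug patterns, here is a minimal illustration of the first one, `SubstringOfZero` (the class and method names are hypothetical). `String.substring(0)` returns a string equal to the receiver, so the call is a no-op and usually signals that a different index was intended.

```java
// Hypothetical example of the code shape the SubstringOfZero check targets.
final class SubstringOfZeroExample {
  // Flagged: substring(0) returns a string equal to s, so the call does nothing useful.
  static String flagged(String s) {
    return s.substring(0);
  }

  // Preferred: use the string directly (or fix the index if another one was intended).
  static String preferred(String s) {
    return s;
  }
}
```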
   
   





Re: [PR] build: replace six simple error prone checks [lucene]

2025-06-21 Thread via GitHub


rmuir commented on code in PR #14831:
URL: https://github.com/apache/lucene/pull/14831#discussion_r2160226517


##
lucene/test-framework/src/test/org/apache/lucene/tests/util/TestBeforeAfterOverrides.java:
##
@@ -30,7 +30,9 @@ public TestBeforeAfterOverrides() {
 
   public static class Before1 extends WithNestedTests.AbstractNestedTest {
     @Before
-    public void before() {}
+    public void before() {
+      /* intentionally left blank */
+    }

Review Comment:
   Their checker is kinda wimpy and wasn't catching these. I added a comment inside each method so they are no longer empty; hope this is OK. 
   
   Alternatively, we could ignore the entire file or try to use an `ast-grep-ignore:` comment.






Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-21 Thread via GitHub


dweiss commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2993509963

   > whoa, `precommit` time is ~6 minutes, and test time is ~1.2 minutes
   
   I just realized that this is also an unfair comparison, because that initial `check -x test` compiles all the sources, which is costly. So the tests piggyback on some of the work already done.





Re: [PR] Specialize `filterCompetitiveHits` when have exact 2 clauses [lucene]

2025-06-21 Thread via GitHub


HUSTERGS commented on PR #14827:
URL: https://github.com/apache/lucene/pull/14827#issuecomment-2993494222

   For what it's worth, the reason for this PR is that I found `filterCompetitiveHits` occupied about 13% of the flame graph on the OrHighHigh query:
   
![image](https://github.com/user-attachments/assets/b45154f9-a837-40e9-b8a3-49aae5d31e82)
   
   Also, `filterCompetitiveHits` calls `MathUtil.sumUpperBound` in a loop, which seems to repeatedly compute `MathUtil.sumRelativeErrorBound(numValues)` even though `numValues` is constant within the loop. I tried to optimize this, but it showed no performance difference; maybe `filterCompetitiveHits` is no longer the bottleneck when `numValues` > 2.
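
   The loop-invariant hoisting described above can be sketched as follows. This is a hypothetical standalone illustration, not Lucene's `MathUtil` code: it assumes the bound has the shape `(1 + 2 * (n - 1) * 2^-52) * sum` for more than two addends and is just `sum` for at most two. Because the hoisted factor is computed with exactly the same expression, both loops make identical comparisons.

```java
// Hypothetical sketch, not Lucene's MathUtil: the bound formula below is an assumption.
final class HoistedBound {
  // Assumed relative error bound for summing numValues doubles: (n - 1) * 2^-52.
  static double relativeErrorBound(int numValues) {
    if (numValues <= 1) {
      return 0;
    }
    return (numValues - 1) * Math.scalb(1.0, -52);
  }

  // Per-call version: recomputes the error bound every time it is invoked.
  static double upperBound(double sum, int numValues) {
    if (numValues <= 2) {
      return sum; // with at most two addends the sum does not depend on summation order
    }
    return (1.0 + 2 * relativeErrorBound(numValues)) * sum;
  }

  // Loop that calls upperBound for every candidate.
  static int countCompetitiveNaive(double[] sums, int numValues, double minScore) {
    int count = 0;
    for (double sum : sums) {
      if (upperBound(sum, numValues) >= minScore) {
        count++;
      }
    }
    return count;
  }

  // Same loop with the loop-invariant factor hoisted out; numValues never changes inside.
  static int countCompetitiveHoisted(double[] sums, int numValues, double minScore) {
    final double factor = numValues <= 2 ? 1.0 : 1.0 + 2 * relativeErrorBound(numValues);
    int count = 0;
    for (double sum : sums) {
      if (factor * sum >= minScore) {
        count++;
      }
    }
    return count;
  }
}
```

   As the comment reports, removing this recomputation did not measurably help, which is consistent with the per-call cost being small compared to the rest of the loop.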





Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


dweiss commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2993523186

   Fun. https://github.com/google/google-java-format/issues/1260





Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


breskeby commented on code in PR #14824:
URL: https://github.com/apache/lucene/pull/14824#discussion_r2159972822


##
build-tools/build-infra/src/main/java/org/apache/lucene/gradle/plugins/spotless/ParentGoogleJavaFormatTask.java:
##
@@ -0,0 +1,69 @@
+package org.apache.lucene.gradle.plugins.spotless;
+
+import com.google.googlejavaformat.java.Formatter;
+import com.google.googlejavaformat.java.JavaFormatterOptions;
+import org.gradle.api.DefaultTask;
+import org.gradle.api.file.ConfigurableFileCollection;
+import org.gradle.api.file.FileType;
+import org.gradle.api.file.ProjectLayout;
+import org.gradle.api.file.RegularFileProperty;
+import org.gradle.api.tasks.InputFiles;
+import org.gradle.api.tasks.Internal;
+import org.gradle.api.tasks.OutputFile;
+import org.gradle.api.tasks.PathSensitive;
+import org.gradle.api.tasks.PathSensitivity;
+import org.gradle.work.ChangeType;
+import org.gradle.work.FileChange;
+import org.gradle.work.Incremental;
+import org.gradle.work.InputChanges;
+import org.gradle.workers.WorkQueue;
+import org.gradle.workers.WorkerExecutor;
+
+import javax.inject.Inject;
+import java.io.File;
+import java.util.List;
+import java.util.stream.StreamSupport;
+
+abstract class ParentGoogleJavaFormatTask extends DefaultTask {
+  @Incremental
+  @InputFiles
+  @PathSensitive(PathSensitivity.RELATIVE)
+  public abstract ConfigurableFileCollection getSourceFiles();
+
+  @OutputFile
+  public abstract RegularFileProperty getOutputChangeListFile();
+
+  @Inject
+  protected abstract WorkerExecutor getWorkerExecutor();
+
+  public ParentGoogleJavaFormatTask(ProjectLayout layout, String gjfTask) {
+    getOutputChangeListFile().convention(layout.getBuildDirectory().file("gjf-" + gjfTask + ".txt"));
+  }
+
+  protected static Formatter getFormatter() {
+    JavaFormatterOptions options =
+        JavaFormatterOptions.builder()
+            .style(JavaFormatterOptions.Style.GOOGLE)
+            .formatJavadoc(true)
+            .reorderModifiers(true)
+            .build();
+    return new Formatter(options);
+  }
+
+  protected List<File> getIncrementalBatch(InputChanges inputChanges) {
+    return StreamSupport.stream(inputChanges.getFileChanges(getSourceFiles()).spliterator(), false)
+        .filter(fileChange -> {
+          return fileChange.getFileType() == FileType.FILE &&
+                 (fileChange.getChangeType() == ChangeType.ADDED ||
+                  fileChange.getChangeType() == ChangeType.MODIFIED);
+        })
+        .map(FileChange::getFile)
+        .toList();
+  }
+
+  @Internal
+  protected WorkQueue getWorkQueue() {
+    // TODO: maybe fork a separate jvm so that we can pass open-module settings there and fine-tune the jvm for the task?
+    return getWorkerExecutor().noIsolation();
+  }

Review Comment:
   Thinking about this more, I wonder if you can restrict the parallelization by using a BuildService. Again, this is not exactly a simple API for restricting parallelization, but it should work.
   Here's an example of how to use it with the Worker API: 
https://docs.gradle.org/current/userguide/build_services.html#using_a_build_service_with_the_worker_api
   
   Then you would register the service at the project level and set `maxParallelUsages`. Here's an example of how we throttle things this way in the Elasticsearch build: 
https://github.com/elastic/elasticsearch/blob/7b6bdfa323fcc7460dc252b3dbc84c6b1314fb66/build-tools/src/main/java/org/elasticsearch/gradle/testclusters/TestClustersPlugin.java#L126-L133
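
   A minimal sketch of that suggestion, assuming it lives in the same package as the PR's `ParentGoogleJavaFormatTask`; the plugin class, the service class, the service name `gjfThrottle`, and the limit of 4 are all hypothetical. It relies on `registerIfAbsent`, `BuildServiceSpec.getMaxParallelUsages()` and `Task.usesService(...)`. The limit applies to tasks that declare usage of the service; whether it also caps the individual work items such a task submits through the Worker API has not been verified here.

```java
package org.apache.lucene.gradle.plugins.spotless;

import org.gradle.api.Plugin;
import org.gradle.api.Project;
import org.gradle.api.provider.Provider;
import org.gradle.api.services.BuildService;
import org.gradle.api.services.BuildServiceParameters;

// Hypothetical plugin: throttles how many google-java-format tasks run at once
// through a shared build service with a fixed number of parallel usages.
public class GjfThrottlePlugin implements Plugin<Project> {
  // Marker service with no parameters; it only exists to carry the usage limit.
  public abstract static class GjfThrottleService
      implements BuildService<BuildServiceParameters.None> {}

  @Override
  public void apply(Project project) {
    Provider<GjfThrottleService> throttle =
        project
            .getGradle()
            .getSharedServices()
            .registerIfAbsent(
                "gjfThrottle", // hypothetical service name, shared across projects
                GjfThrottleService.class,
                spec -> spec.getMaxParallelUsages().set(4)); // assumed limit

    // Every task that declares usage of the service counts against the limit.
    project
        .getTasks()
        .withType(ParentGoogleJavaFormatTask.class)
        .configureEach(task -> task.usesService(throttle));
  }
}
```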






Re: [I] Integrate a JVector codec for KNN searches [lucene]

2025-06-21 Thread via GitHub


sam-herman commented on issue #14681:
URL: https://github.com/apache/lucene/issues/14681#issuecomment-2993767101

   > Things like Vamana or product quantization can definitely be useful, but 
from what I can tell, they offer similar properties to what we already have 
with HNSW + BBQ.
   
   jVector actually combines a Vamana-style graph and an HNSW hierarchy in the same graph, with the option of using PQ or BQ for construction/search.
   In terms of properties, there are some unique features, to name a few:
   1. **NVQ (Non-Linear Vector Quantization)**: an on-disk vector format. It already exists in jVector but is not yet integrated into the plugin; we will add it pretty soon.
   2. **Inline vectors**: both separate (NVQ or FP) and inline storage formats for vectors, which improve IO patterns.
   3. **Concurrency**: jVector supports building the graph index concurrently.
   
   _Note: there are additional differentiators that will be released soon; I can update this once they are out._
   
   > Also, calling this a “disk-based” solution seems a bit misleading if the 
graph still has to be built fully in memory. That’s often the core problem 
people are trying to get around.
   
   If the FP vectors are not stored in memory, we have noticed that the graph structure is overall pretty lean and fits on the JVM heap easily, even with small heaps.
   Another important factor is efficient IO access when scoring FP vectors, even for non-quantized use cases.
   A few months back I noticed significantly more IO access with the Lucene HNSW format than with the jVector version at the time.
   
   > Instead, I think it might be more valuable to dig into the specific 
improvements JVector brings and see if any of those could make our current HNSW 
implementation better. I thought that was already explored in 
https://github.com/apache/lucene/issues/12615, and as I recall, there wasn’t a 
strong enough differentiator to justify pulling in the full JVector approach. 
Revisiting that might be a better path forward.
   
   I think this is not much different from the FAISS integration, in the sense that one could try to copy all the code over into the Lucene HNSW implementation. However, that is not always possible, and even when it is, it can create maintenance difficulties.
   Consider DataStax, which uses jVector in a number of projects such as C* and OpenSearch.
   Ideally, we would not maintain copies of the same code in various projects, but rather have an easy path to leverage the innovation across projects.
   Integrating the jVector codec into Lucene helps with portability, as it removes the overhead of maintaining an OpenSearch plugin and allows for a more seamless integration. 
   At the same time, it allows shifting more resources to innovation without blocking or hindering other approaches in other codecs.
   





Re: [PR] Specialize `filterCompetitiveHits` when have exact 2 clauses [lucene]

2025-06-21 Thread via GitHub


github-actions[bot] commented on PR #14827:
URL: https://github.com/apache/lucene/pull/14827#issuecomment-2993476199

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.





Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


breskeby commented on code in PR #14824:
URL: https://github.com/apache/lucene/pull/14824#discussion_r2159937021


##
build-tools/build-infra/src/main/java/org/apache/lucene/gradle/plugins/spotless/ParentGoogleJavaFormatTask.java:
##
+  @Internal
+  protected WorkQueue getWorkQueue() {
+    // TODO: maybe fork a separate jvm so that we can pass open-module settings there and fine-tune the jvm for the task?
+    return getWorkerExecutor().noIsolation();
+  }

Review Comment:
   Forking the JVM is still expensive to some degree. I usually try to get away with classloader isolation first if I need separate classpaths. 
   
   You should be able to configure the JVM options like this:
   
   ```
   return getWorkerExecutor().processIsolation(spec -> {
     spec.forkOptions(javaForkOptions -> {
       javaForkOptions.jvmArgs("-Xmx2g", "-Xms2g");
     });
   });
   ```
   
   There is no built-in mechanism to restrict the parallelization for a specific task implementation. Gradle just has a global property, `org.gradle.workers.max`, which defaults to the number of CPUs and is used when forking JVMs for workers, test tasks, etc.
   Is the point here to parallelize the handling of sources in a single project and speed up format checks that way?
   
   If forking a process is needed, I would actually rework the Worker API usage to not submit one file after another but to operate on the list of sources within the worker. This would basically mean having one worker per task. It would still improve parallelism, since different source sets per project can run in parallel. 







Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


breskeby commented on code in PR #14824:
URL: https://github.com/apache/lucene/pull/14824#discussion_r2159937021


##
build-tools/build-infra/src/main/java/org/apache/lucene/gradle/plugins/spotless/ParentGoogleJavaFormatTask.java:
##
+  @Internal
+  protected WorkQueue getWorkQueue() {
+    // TODO: maybe fork a separate jvm so that we can pass open-module settings there and fine-tune the jvm for the task?
+    return getWorkerExecutor().noIsolation();
+  }

Review Comment:
   You should be able to configure the JVM options like this:
   
   ```
   return getWorkerExecutor().processIsolation(spec -> {
     spec.forkOptions(javaForkOptions -> {
       javaForkOptions.jvmArgs("-Xmx2g", "-Xms2g");
     });
   });
   ```
   
   There is no built-in mechanism to restrict the parallelization for a specific task implementation. Gradle just has a global property, `org.gradle.workers.max`, which defaults to the number of CPUs and is used when forking JVMs for workers, test tasks, etc.
   Is the point here to parallelize the handling of sources in a single project and speed up format checks that way?
   
   I would actually rework the Worker API usage to not submit one file after another but to operate on the list of sources within the worker. This would basically mean having one worker per task. It would still improve parallelism, since different source sets per project can run in parallel. 







Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


breskeby commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2993428499

   This is pretty cool. In Elasticsearch we're also struggling with Spotless, and this looks like a good way forward; it also reduces third-party plugin dependencies, which have proven to be brittle in the past.





Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


dweiss commented on code in PR #14824:
URL: https://github.com/apache/lucene/pull/14824#discussion_r2159970826


##
build-tools/build-infra/src/main/java/org/apache/lucene/gradle/plugins/spotless/ParentGoogleJavaFormatTask.java:
##
+  @Internal
+  protected WorkQueue getWorkQueue() {
+    // TODO: maybe fork a separate jvm so that we can pass open-module settings there and fine-tune the jvm for the task?
+    return getWorkerExecutor().noIsolation();
+  }

Review Comment:
   I have already experimented with this (multiple files per worker). The problem is that the batches are not balanced: lucene/core has a lot more files than anything else, so it becomes the long tail.
   
   I also tried process isolation, but this won't fly: my machine runs 24 workers, which results in dozens of forked JVMs, and that costs memory plus the overhead of booting up those additional JVMs. I'll experiment some more though. Thanks for your feedback.
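
   One possible middle ground between one work item per file and one work item per task is to cut each task's file list into fixed-size batches, so that a large project such as lucene/core still spreads across many work items while small projects submit only one or two. The helper below is a hypothetical sketch, not code from the PR; the name `Batches.partition` and any batch size are assumptions, and whether it actually evens out the long tail would need measuring.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list into consecutive batches of at most batchSize items,
// e.g. to submit each batch of changed source files as a single Worker API work item.
final class Batches {
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      // Copy the subList view so each batch is independent of the source list.
      batches.add(new ArrayList<>(items.subList(start, end)));
    }
    return batches;
  }
}
```

   Inside a task, each batch from `Batches.partition(getIncrementalBatch(inputChanges), batchSize)` could then be submitted as one work item, with `batchSize` tuned so that per-item overhead stays small relative to the formatting work.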






[I] Remove -XX:ActiveProcessorCount=1 from template.gradle.properties [lucene]

2025-06-21 Thread via GitHub


dweiss opened a new issue, #14829:
URL: https://github.com/apache/lucene/issues/14829

   ### Description
   
   I think this one does more harm than good, based on my observations here:
   https://github.com/apache/lucene/pull/14824#issuecomment-2993560243
   
   I suggest we update the template and remove it.





Re: [PR] Custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


github-actions[bot] commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2993562529

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.





Re: [PR] docs: fix invalid html [lucene]

2025-06-21 Thread via GitHub


rmuir merged PR #14818:
URL: https://github.com/apache/lucene/pull/14818





Re: [PR] [Build] Fix deprecation warning about automatic loading of test framework [lucene]

2025-06-21 Thread via GitHub


github-actions[bot] commented on PR #14828:
URL: https://github.com/apache/lucene/pull/14828#issuecomment-2993519544

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.





Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


dweiss commented on code in PR #14824:
URL: https://github.com/apache/lucene/pull/14824#discussion_r216564


##
build-tools/build-infra/src/main/java/org/apache/lucene/gradle/plugins/spotless/ParentGoogleJavaFormatTask.java:
##
@@ -0,0 +1,69 @@
+package org.apache.lucene.gradle.plugins.spotless;
+
+import com.google.googlejavaformat.java.Formatter;
+import com.google.googlejavaformat.java.JavaFormatterOptions;
+import org.gradle.api.DefaultTask;
+import org.gradle.api.file.ConfigurableFileCollection;
+import org.gradle.api.file.FileType;
+import org.gradle.api.file.ProjectLayout;
+import org.gradle.api.file.RegularFileProperty;
+import org.gradle.api.tasks.InputFiles;
+import org.gradle.api.tasks.Internal;
+import org.gradle.api.tasks.OutputFile;
+import org.gradle.api.tasks.PathSensitive;
+import org.gradle.api.tasks.PathSensitivity;
+import org.gradle.work.ChangeType;
+import org.gradle.work.FileChange;
+import org.gradle.work.Incremental;
+import org.gradle.work.InputChanges;
+import org.gradle.workers.WorkQueue;
+import org.gradle.workers.WorkerExecutor;
+
+import javax.inject.Inject;
+import java.io.File;
+import java.util.List;
+import java.util.stream.StreamSupport;
+
+abstract class ParentGoogleJavaFormatTask extends DefaultTask {
+    @Incremental
+    @InputFiles
+    @PathSensitive(PathSensitivity.RELATIVE)
+    public abstract ConfigurableFileCollection getSourceFiles();
+
+    @OutputFile
+    public abstract RegularFileProperty getOutputChangeListFile();
+
+    @Inject
+    protected abstract WorkerExecutor getWorkerExecutor();
+
+    public ParentGoogleJavaFormatTask(ProjectLayout layout, String gjfTask) {
+        getOutputChangeListFile().convention(layout.getBuildDirectory().file("gjf-" + gjfTask + ".txt"));
+    }
+
+    protected static Formatter getFormatter() {
+        JavaFormatterOptions options =
+                JavaFormatterOptions.builder()
+                        .style(JavaFormatterOptions.Style.GOOGLE)
+                        .formatJavadoc(true)
+                        .reorderModifiers(true)
+                        .build();
+        return new Formatter(options);
+    }
+
+    protected List<File> getIncrementalBatch(InputChanges inputChanges) {
+        return StreamSupport.stream(inputChanges.getFileChanges(getSourceFiles()).spliterator(), false)
+                .filter(fileChange -> {
+                    return fileChange.getFileType() == FileType.FILE &&
+                           (fileChange.getChangeType() == ChangeType.ADDED ||
+                            fileChange.getChangeType() == ChangeType.MODIFIED);
+                })
+                .map(FileChange::getFile)
+                .toList();
+    }
+
+    @Internal
+    protected WorkQueue getWorkQueue() {
+        // TODO: maybe fork a separate jvm so that we can pass open-module settings there and fine-tune the jvm for the task?
+        return getWorkerExecutor().noIsolation();
+    }

Review Comment:
   Ah, that's an idea! This is pretty complex. Let's leave it for a follow-up, if it's needed. I'm quite happy with the current state: no isolation it is, but it seems a lot faster than going through spotless. I should finish this today.






Re: [I] Integrate a JVector codec for KNN searches [lucene]

2025-06-21 Thread via GitHub


sam-herman commented on issue #14681:
URL: https://github.com/apache/lucene/issues/14681#issuecomment-2993725548

   > I have a working JVector implementation for Lucene, ported over from the OpenSearch implementation, and I have benchmarks for that version. There are issues like pre-cached exact vector mismatches between runs, etc., that have been addressed in a newer update. I'm working on incorporating the new changes, but the codebase is moving very quickly. What features are ready to be incorporated into Lucene JVector on opensearch-jvector's side? And aside from the feature requests on OpenSearch, is it safe to consider this latest version as a checkpoint for implementing the JVector codec in Lucene?
   > 
   > UPDATE:
   > 
   > The Lucene JVector codec is now up to date with the June 13 build of the OpenSearch-JVector codec. Currently benchmarking.
   
   @RKSPD thank you very much for taking the initiative on this one!
   I believe it should be a lot more stable now, especially since the latest plugin release, which fixed some of the core integration issues mentioned earlier (sequential vs. random writes).
   That said, it is a fair question to ask about the roadmap and timeline of changes. There is a collection of issues right now [here](https://github.com/opensearch-project/opensearch-jvector/issues), but I agree that there is room for improvement around the predictability of the roadmap.
   I'll try to aggregate those under a meta issue as a first step to help with that. I hope that answers your question and helps unblock you on this initiative as much as possible.





Re: [PR] Custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


rmuir commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2993725482

   The runners in use might not be beefy enough for larger sizing (GC threads, thread pools, etc.) to help?





Re: [PR] [Build] Fix deprecation warning about automatic loading of test framework [lucene]

2025-06-21 Thread via GitHub


dweiss merged PR #14828:
URL: https://github.com/apache/lucene/pull/14828





[PR] Specialize `filterCompetitiveHits` when have exact 2 clauses [lucene]

2025-06-21 Thread via GitHub


HUSTERGS opened a new pull request, #14827:
URL: https://github.com/apache/lucene/pull/14827

   ### Description
   This PR proposes to specialize `filterCompetitiveHits` for the case of exactly 2 scorers, in order to reduce floating-point computation and potential function calls.
   
   Luceneutil results on `wikimediumall` with `searchConcurrency=0`, `taskCountPerCat=5`, `taskRepeatCount=50`, after 20 iterations:
   
   ```
   Task         QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff             p-value
   OrHighRare         116.59  (3.1%)                   116.96  (2.7%)  0.3% ( -5% -  6%)    0.734
   OrHighMed           87.75  (2.4%)                    88.83  (2.6%)  1.2% ( -3% -  6%)    0.116
   AndHighMed          67.91  (2.3%)                    69.17  (2.2%)  1.9% ( -2% -  6%)    0.009
   AndHighHigh         27.96  (1.4%)                    28.63  (2.0%)  2.4% ( -1% -  5%)    0.000
   OrHighHigh          26.16  (1.6%)                    26.97  (1.6%)  3.1% (  0% -  6%)    0.000
   ```
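   To make the idea concrete, here is a minimal, hypothetical Java sketch of specializing a "filter competitive hits" loop for exactly two clauses. It is not Lucene's `filterCompetitiveHits`: the method names, the array-based layout and the use of per-clause max scores are assumptions made for illustration. The generic version sums the max scores of the other clauses in a loop; with two clauses there is only one other clause, so its max score can be used directly.

```java
import java.util.Arrays;

// Hypothetical sketch; not Lucene's actual filterCompetitiveHits implementation.
public class TwoClauseFilterSketch {
  /** Generic: keep hits whose partial score plus the summed max scores of the other clauses can still compete. */
  static int filterGeneric(int[] docs, double[] scores, int count,
      float[] otherMaxScores, float minCompetitiveScore) {
    double maxRemaining = 0;
    for (float maxScore : otherMaxScores) { // one addition per remaining clause
      maxRemaining += maxScore;
    }
    int kept = 0;
    for (int i = 0; i < count; i++) {
      if (scores[i] + maxRemaining >= minCompetitiveScore) {
        docs[kept] = docs[i]; // compact surviving hits to the front
        scores[kept] = scores[i];
        kept++;
      }
    }
    return kept;
  }

  /** Two clauses: exactly one other clause, so its max score is used directly (no summing loop). */
  static int filterTwoClauses(int[] docs, double[] scores, int count,
      float otherMaxScore, float minCompetitiveScore) {
    int kept = 0;
    for (int i = 0; i < count; i++) {
      if (scores[i] + otherMaxScore >= minCompetitiveScore) {
        docs[kept] = docs[i];
        scores[kept] = scores[i];
        kept++;
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    int[] docsA = {1, 2, 3, 4};
    double[] scoresA = {0.5, 2.0, 1.2, 3.0};
    int[] docsB = docsA.clone();
    double[] scoresB = scoresA.clone();
    int keptA = filterGeneric(docsA, scoresA, 4, new float[] {1.0f}, 2.5f);
    int keptB = filterTwoClauses(docsB, scoresB, 4, 1.0f, 2.5f);
    // Both variants keep the same hits: [2, 4] [2, 4]
    System.out.println(Arrays.toString(Arrays.copyOf(docsA, keptA))
        + " " + Arrays.toString(Arrays.copyOf(docsB, keptB)));
  }
}
```

   Both methods return identical results; the specialized one simply skips the per-call summation, which is the kind of saving the PR description points at.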
   
   
   
   
   





[PR] Optimize FieldExistsQuery to leverage index statistic in DocValuesSkipper [lucene]

2025-06-21 Thread via GitHub


bugmakerr opened a new pull request, #14830:
URL: https://github.com/apache/lucene/pull/14830

   ### Description
   
   
   





Re: [PR] Optimize FieldExistsQuery to leverage index statistic in DocValuesSkipper [lucene]

2025-06-21 Thread via GitHub


github-actions[bot] commented on PR #14830:
URL: https://github.com/apache/lucene/pull/14830#issuecomment-2993596164

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.





Re: [PR] [Build] Fix deprecation warning about automatic loading of test framework [lucene]

2025-06-21 Thread via GitHub


dweiss commented on PR #14828:
URL: https://github.com/apache/lucene/pull/14828#issuecomment-2993693465

   > But I'm afraid they keep the check and just fail. 
   
   I think they shouldn't mess with extra arguments; maybe you can convince them of that. It's similar to this issue:
   https://github.com/gradle/gradle/issues/11898
   Messing with arguments behind the developer's back is crazy and not intuitive at all. The intentions may be good, but it's an evil feature.





Re: [PR] Custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


dweiss commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2993698382

   With:
   ```
   -Xmx1g -XX:ReservedCodeCacheSize=256m  -XX:TieredStopAtLevel=1
   ```
   I get:
   ```
   max-workers: 8
   real 0m9.768s
   real 0m9.486s
   real 0m9.501s
   max-workers: 16
   real 0m6.540s
   real 0m6.374s
   real 0m6.367s
   max-workers: 32
   real 0m6.166s
   real 0m6.379s
   real 0m6.477s
   ```
   
   With:
   ```
   -Xmx1g -XX:ReservedCodeCacheSize=256m
   ```
   I get:
   ```
   max-workers: 8
   real 0m6.173s
   real 0m6.335s
   real 0m6.074s
   max-workers: 16
   real 0m4.750s
   real 0m4.610s
   real 0m4.498s
   max-workers: 32
   real 0m4.917s
   real 0m5.099s
   real 0m5.702s
   ```
   
   With:
   ```
   -Xmx1g -XX:ReservedCodeCacheSize=256m -XX:+UseParallelGC
   ```
   I get:
   ```
   max-workers: 8
   real 0m6.042s
   real 0m6.052s
   real 0m5.873s
   max-workers: 16
   real 0m4.899s
   real 0m4.879s
   real 0m4.735s
   max-workers: 32
   real 0m5.137s
   real 0m5.070s
   real 0m5.250s
   ```
   
   So not that much of a difference.





Re: [PR] Custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


github-actions[bot] commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2993699244

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.





Re: [PR] Custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


dweiss commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2993704851

   On GitHub runners, changing these options seems to have no effect.
   * before removing `-XX:TieredStopAtLevel=1 -XX:ActiveProcessorCount=1`:
   
![image](https://github.com/user-attachments/assets/48dd4c1e-d114-460d-8d1b-3951d1ee3afd)
   * after removing `-XX:TieredStopAtLevel=1 -XX:ActiveProcessorCount=1`:
   
![image](https://github.com/user-attachments/assets/46900b5f-4ebf-4744-ada0-57ade312be9c)
   
   * It is faster than on main, though, so that's at least something.
   
![image](https://github.com/user-attachments/assets/4937c541-8181-4c28-ae72-956a31e54bb8)
   





Re: [PR] Custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


github-actions[bot] commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2993705268

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.





Re: [PR] Custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


rmuir commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2993683230

   @dweiss maybe remove `-XX:+UseParallelGC` along with `-XX:ActiveProcessorCount=1` if you are testing. It's only needed because with 1 CPU, JVM ergonomics default to SerialGC, which is too slow.
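   A small Java check, run once with `java -XX:ActiveProcessorCount=1 GcErgonomicsCheck.java` and once without the flag (single-file launch needs JDK 11+), should show which collector ergonomics picked. As far as I know, recent HotSpot JVMs only treat a machine as "server class" (and default to G1) with at least 2 CPUs and about 1792 MB of memory, falling back to SerialGC otherwise. The class name is made up for this sketch, and the bean names vary between JDK versions: Serial typically reports "Copy"/"MarkSweepCompact", Parallel "PS Scavenge"/"PS MarkSweep", and G1 "G1 Young Generation"/"G1 Old Generation".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for checking GC ergonomics; not part of the Lucene build.
public class GcErgonomicsCheck {
  /** Names of the garbage collector MX beans registered in this JVM. */
  static List<String> collectorNames() {
    return ManagementFactory.getGarbageCollectorMXBeans().stream()
        .map(GarbageCollectorMXBean::getName)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Honors -XX:ActiveProcessorCount, so this is the CPU count the JVM believes it has.
    System.out.println("availableProcessors = " + Runtime.getRuntime().availableProcessors());
    // The collectors actually selected (by ergonomics or by an explicit -XX:+Use...GC flag).
    System.out.println("collectors = " + collectorNames());
  }
}
```

   Comparing the two outputs is a cheap way to confirm what a flag combination actually does on a given runner before spending time on full benchmark runs.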


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-21 Thread via GitHub


dweiss commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2993560243

   Here are some benchmarks from my Linux system.
   
   This runs the patch with a varying number of workers and the default `-XX:ActiveProcessorCount=1` in gradle.properties. It is also the worst-case scenario: reformatting all files from scratch, with no incremental information. My system is an AMD Ryzen Threadripper 3970X (32 cores) running Ubuntu.
   
   ```
   echo "With bg daemon: './gradlew clean checkGoogleJavaFormat'."
   ./gradlew -q clean
   ./gradlew -q --stop
   for workers in 1 2 4 8 16 32; do
     echo "max-workers: $workers"
     (for i in `seq 1 3`; do time ./gradlew clean checkGoogleJavaFormat --max-workers $workers ; done ) 2>&1 | grep "real"
   done
   ```
   results:
   ```
   max-workers: 1
   real 0m51.099s
   real 0m42.128s
   real 0m41.845s
   max-workers: 2
   real 0m23.382s
   real 0m23.619s
   real 0m23.563s
   max-workers: 4
   real 0m15.160s
   real 0m15.877s
   real 0m15.169s
   max-workers: 8
   real 0m13.651s
   real 0m13.534s
   real 0m13.468s
   max-workers: 16
   real 0m18.783s
   real 0m18.273s
   real 0m18.813s
   max-workers: 32
   real 0m26.930s
   real 0m26.884s
   real 0m27.259s
   ```
   
   The CPU is mostly idle during these runs. Weird. If I remove `-XX:ActiveProcessorCount=1` from gradle.properties, I get these results:
   ```
   max-workers: 1
   real 0m43.419s
   real 0m41.613s
   real 0m41.829s
   max-workers: 2
   real 0m22.878s
   real 0m22.745s
   real 0m22.604s
   max-workers: 4
   real 0m13.625s
   real 0m13.452s
   real 0m13.608s
   max-workers: 8
   real 0m8.392s
   real 0m8.490s
   real 0m8.319s
   max-workers: 16
   real 0m6.225s
   real 0m6.929s
   real 0m6.356s
   max-workers: 32
   real 0m6.052s
   real 0m6.863s
   real 0m6.204s
   ```
   So it is clearly a benefit if you have a higher core count. It's also close to the lower bound of running google-java-format manually on all source files (it does have concurrent processing internally).
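   As a quick worked check of the scaling claim, this Java snippet takes the median of the three quoted runs without `-XX:ActiveProcessorCount=1` for 1 and 32 workers and divides them: 41.829 s / 6.204 s is roughly a 6.7x speedup. The class and method names are only for illustration.

```java
import java.util.Arrays;

// Illustrative arithmetic over the run times quoted above; names are hypothetical.
public class SpeedupFromRuns {
  /** Median of the given run times (mean of the two middle values for an even count). */
  static double median(double... runs) {
    double[] sorted = runs.clone();
    Arrays.sort(sorted);
    int n = sorted.length;
    return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  }

  public static void main(String[] args) {
    // "real" times in seconds, without -XX:ActiveProcessorCount=1.
    double oneWorker = median(43.419, 41.613, 41.829); // max-workers: 1 -> 41.829
    double thirtyTwoWorkers = median(6.052, 6.863, 6.204); // max-workers: 32 -> 6.204
    // The ratio is about 6.74.
    System.out.printf("speedup 1 -> 32 workers: %.2fx%n", oneWorker / thirtyTwoWorkers);
  }
}
```

   Using the median rather than the first run keeps the daemon warm-up outlier (43.419 s) out of the comparison.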
   
   For the incremental case, it's fast enough (well, it doesn't do anything), even from a "cold" start without any daemon in the background (the first call shows the configuration time):
   ```
   echo "With bg daemon, from cold-start, incremental: './gradlew checkGoogleJavaFormat'."
   ./gradlew checkGoogleJavaFormat
   for workers in 2 4 8; do
     echo "max-workers: $workers"
     ./gradlew --stop -q
     (for i in `seq 1 3`; do time ./gradlew checkGoogleJavaFormat --max-workers $workers ; done ) 2>&1 | grep "real"
   done
   ```
   results:
   ```
   max-workers: 2
   real 0m8.953s
   real 0m2.105s
   real 0m2.002s
   max-workers: 4
   real 0m8.746s
   real 0m2.061s
   real 0m2.037s
   max-workers: 8
   real 0m8.856s
   real 0m2.054s
   real 0m1.978s
   ```
   
   This is a rather internal detail, but here are different batch sizes of input files for a constant number of workers (the default batch size is 5):
   ```
   for batchSize in 1 2 4 8 16 32 64; do
     echo "batch size: $batchSize"
     (for i in `seq 1 3`; do time ./gradlew clean checkGoogleJavaFormat -Plucene.gjf.batchSize=$batchSize --max-workers 8 ; done ) 2>&1 | grep "real"
   done
   ```
   results:
   ```
   batch size: 1
   real 0m8.958s
   real 0m8.760s
   real 0m8.585s
   batch size: 2
   real 0m8.542s
   real 0m8.401s
   real 0m8.481s
   batch size: 4
   real 0m8.341s
   real 0m8.378s
   real 0m8.469s
   batch size: 8
   real 0m8.671s
   real 0m8.494s
   real 0m8.458s
   batch size: 16
   real 0m8.363s
   real 0m8.408s
   real 0m8.395s
   batch size: 32
   real 0m8.384s
   real 0m8.320s
   real 0m8.423s
   batch size: 64
   real 0m8.578s
   real 0m8.622s
   real 0m8.699s
   ```
   
   Finally, the same check for the previous, spotless-based implementation (main branch):
   ```
   ./gradlew --stop
   ./gradlew clean
   for workers in 1 2 4 8 16 32; do
     echo "max-workers: $workers"
     (for i in `seq 1 3`; do time ./gradlew clean spotlessJavaCheck --max-workers $workers ; done ) 2>&1 | grep "real"
   done
   ```
   results:
   ```
   max-workers: 1
   real 0m49.843s
   real 0m47.934s
   real 0m48.256s
   max-workers: 2
   real 0m28.170s
   real 0m27.980s
   real 0m27.851s
   max-workers: 4
   real 0m21.620s
   real 0m21.475s
   real 0m21.503s
   max-workers: 8
   real 0m21.192s
   real 0m20.962s
   real 0m20.895s
   max-workers: 16
   real 0m20.984s
   real 0m20.783s
   real 0m20.773s
   max-workers: 32
   real 0m21.290s
   real 0m21.077s
   real 0m21.037s
   ```
   
   Faster. Note that I didn't change the formatting itself: all the heavy lifting is done by the same implementation in google-java-format. The difference is in the long tail of the longest operation (formatting lucene/core), which is now parallel.
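   The long-tail effect can be sketched in plain Java: splitting one large module's file list into fixed-size batches and submitting each batch as a separate task lets all workers share the biggest module instead of one worker formatting it alone. The patch itself uses Gradle's `WorkQueue` (via `getWorkQueue()` in the diff) and a `lucene.gjf.batchSize` property; this sketch swaps in a plain `ExecutorService`, and `needsFormatting` is a hypothetical stand-in for calling google-java-format, not its API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of batch-parallel formatting; not the PR's Gradle implementation.
public class BatchedFormatting {
  /** Splits the input into consecutive batches of at most batchSize elements. */
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be >= 1");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < items.size(); i += batchSize) {
      batches.add(List.copyOf(items.subList(i, Math.min(i + batchSize, items.size()))));
    }
    return batches;
  }

  /** Hypothetical stand-in for running the formatter on one file; true if it would change. */
  static boolean needsFormatting(String path) {
    return path.contains("Unformatted");
  }

  /** Submits one task per batch, so the files of one large module spread over all workers. */
  static int countFilesNeedingFormatting(List<String> files, int batchSize, int workers)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    try {
      List<Future<Integer>> pending = new ArrayList<>();
      for (List<String> batch : partition(files, batchSize)) {
        pending.add(pool.submit(
            () -> (int) batch.stream().filter(BatchedFormatting::needsFormatting).count()));
      }
      int total = 0;
      for (Future<Integer> result : pending) {
        total += result.get(); // waits for each batch; failures surface as ExecutionException
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> files = new ArrayList<>();
    for (int i = 0; i < 12; i++) {
      files.add(i % 4 == 0 ? "core/Unformatted" + i + ".java" : "core/Clean" + i + ".java");
    }
    // 12 files with batch size 5 -> 3 batches; files 0, 4 and 8 "need formatting".
    System.out.println(partition(files, 5).size() + " batches, "
        + countFilesNeedingFormatting(files, 5, 8) + " files need formatting");
  }
}
```

   The flat batch-size results above fit this picture: once every batch is small relative to the largest module, the exact size matters far less than the number of workers.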
   
   I also toyed with removing `-XX:TieredStopAtLevel=1` from gradle.properties, then re-ran the benchmark:
   ```
   ./gradlew -q clean 
   ./gradlew -q --stop
   for workers in 8 16 32; do
 echo 

Re: [PR] [Build] Fix deprecation warning about automatic loading of test framework [lucene]

2025-06-21 Thread via GitHub


breskeby commented on PR #14828:
URL: https://github.com/apache/lucene/pull/14828#issuecomment-2993580446

   It will actually be interesting to see how this behaves in Gradle 9.0. Ideally it would just work, as JUnit would be found on the module path in this case. But I'm afraid they keep the check and just fail. 🤞 I will do some testing against 9.0-rc-1 at some point.





[PR] [Build] Fix deprecation warning about automatic loading of test framework [lucene]

2025-06-21 Thread via GitHub


breskeby opened a new pull request, #14828:
URL: https://github.com/apache/lucene/pull/14828

   ### Description
   
   This fixes the deprecation warning "The automatic loading of test framework implementation dependencies has been deprecated." Gradle test tasks rely on having the test framework on their classpath, as the TestWorker depends on it.
   
   In the existing setup, we only declared module classpaths in a JVM argument provider, which Gradle does not take into account when setting up the test worker JVM.

