Re: [PR] style: fix sources to conform to .editorconfig (basic fixes only) [lucene]

2025-06-20 Thread via GitHub


dweiss commented on PR #14820:
URL: https://github.com/apache/lucene/pull/14820#issuecomment-2990157752

   If we want to run it on a CI, I think the gh action seems to be the easiest. 
Alternatively, running docker directly but it requires setting up git 
safe.directory because eclint uses git to list the files to validate.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-20 Thread via GitHub


mikemccand commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2990893252

   Thanks @dweiss -- I think `precommit` used to do all checks but not run 
tests?  The [nightly benchy currently separately measures test time, from the 
rest of `check` time](https://benchmarks.mikemccandless.com/antcleantest.html) 
(whoa, `precommit` time is ~6 minutes, and test time is ~1.2 minutes).
   
   Oh it looks like I can maybe do `./gradlew check -x test`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-20 Thread via GitHub


uschindler commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2990935968

   I also liked "precommit" as a task name. So maybe we can keep that for 
convenience.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] style: fix sources to conform to .editorconfig (basic fixes only) [lucene]

2025-06-20 Thread via GitHub


rmuir commented on PR #14820:
URL: https://github.com/apache/lucene/pull/14820#issuecomment-2990972979

   I'm guessing its not on the approved list at apache though. The action just 
uses their docker image.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] style: fix sources to conform to .editorconfig (basic fixes only) [lucene]

2025-06-20 Thread via GitHub


rmuir commented on code in PR #14820:
URL: https://github.com/apache/lucene/pull/14820#discussion_r2158683763


##
build-tools/build-infra/src/main/groovy/lucene.regenerate.gradle:
##
@@ -219,7 +219,8 @@ configure([
   // Recompute checksums after the task has completed and write them.
   def updatedChecksums = computeChecksummedEntries(sourceTask)
   checksumsFile.setText(
-  JsonOutput.prettyPrint(JsonOutput.toJson(new TreeMap(updatedChecksums))), "UTF-8")
+  JsonOutput.prettyPrint(JsonOutput.toJson(new TreeMap(updatedChecksums))) + '\n', "UTF-8")
 

Review Comment:
   This is why I think it is bad to have inconsistent max_line_length set in 
the editorconfig. Any time I edit gradle you are going to see a change like 
this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] spotlessGradleScripts doesn't work with whitespace-paths on Windows [lucene]

2025-06-20 Thread via GitHub


rmuir commented on issue #14787:
URL: https://github.com/apache/lucene/issues/14787#issuecomment-2990990467

   Otherwise we make rust/treesitter-based one with 
https://github.com/tweag/topiary
   The groovy parser isn't the greatest, but it is doable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] spotlessGradleScripts doesn't work with whitespace-paths on Windows [lucene]

2025-06-20 Thread via GitHub


dweiss commented on issue #14787:
URL: https://github.com/apache/lucene/issues/14787#issuecomment-2990439462

   Nice. Not too fast but works quite well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] style: fix sources to conform to .editorconfig (basic fixes only) [lucene]

2025-06-20 Thread via GitHub


rmuir commented on PR #14820:
URL: https://github.com/apache/lucene/pull/14820#issuecomment-2991004830

   I'm gonna land this one (it was extremely annoying to get to this stage with 
the large DFA regeneration etc), and followup with tooling on a separate PR. If 
some problems sneak in between those times, I'll fix them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] style: fix sources to conform to .editorconfig (basic fixes only) [lucene]

2025-06-20 Thread via GitHub


rmuir merged PR #14820:
URL: https://github.com/apache/lucene/pull/14820


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] style: fix sources to conform to .editorconfig (basic fixes only) [lucene]

2025-06-20 Thread via GitHub


dweiss commented on code in PR #14820:
URL: https://github.com/apache/lucene/pull/14820#discussion_r2158228546


##
build-tools/build-infra/src/main/groovy/lucene.regenerate.gradle:
##
@@ -219,7 +219,8 @@ configure([
   // Recompute checksums after the task has completed and write them.
   def updatedChecksums = computeChecksummedEntries(sourceTask)
   checksumsFile.setText(
-  JsonOutput.prettyPrint(JsonOutput.toJson(new TreeMap(updatedChecksums))), "UTF-8")
+  JsonOutput.prettyPrint(JsonOutput.toJson(new TreeMap(updatedChecksums))) + '\n', "UTF-8")
 

Review Comment:
   This is mildly surprising. Passes with greclipse and both versions are "ok" 
for it but I wonder if conflicts between the rules for different systems 
(editorconfig, spotless) won't cause headaches.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-20 Thread via GitHub


rmuir commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2991257003

   I switched to the `check -x test` long ago because I found `precommit` 
confusing: these days this term indicates fast checks that run (typically only 
on changed files) as part of a `.git/hooks/pre-commit` hook. They aren't 
supposed to take minutes. I don't use them personally but manage them for 
others.
   
   https://pre-commit.com/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Decrease TieredMergePolicy's default number of segments per tier to 8. [lucene]

2025-06-20 Thread via GitHub


jpountz opened a new pull request, #14823:
URL: https://github.com/apache/lucene/pull/14823

   `TieredMergePolicy` currently allows 10 segments per tier. With Lucene being 
increasingly deployed with separate indexing and search tiers that get updated 
via segment-based replication, I believe that it would make sense for Lucene to 
have more aggressive merging defaults, a price that is only paid once on the 
indexing tier, but that benefits all search nodes that serve queries for this 
index.
   
   Note that this is still a somewhat conservative default, applications with 
low latency requirements and low update rates will likely want to go even 
further, with 4 segments per tier, or even 2.
   
   `BaseMergePolicyTestCase#testSimulateAppendOnly` reports a write 
amplification increase from 3.4 to 3.8, while 
`BaseMergePolicyTestCase#testSimulateUpdates` reports a write amplification 
increase from 4.5 to 4.9. In exchange, the number of segments between the floor 
and max segment sizes decreases by about 20%.
   
   This should especially help queries that have a high per-segment overhead: 
PK lookups, point queries, multi-term queries and vector searches.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Decrease TieredMergePolicy's default number of segments per tier to 8. [lucene]

2025-06-20 Thread via GitHub


github-actions[bot] commented on PR #14823:
URL: https://github.com/apache/lucene/pull/14823#issuecomment-2991280830

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-20 Thread via GitHub


uschindler commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2991110032

   Is it possible in Gradle to define a task that does "check -test"?
   
   I agree that hard coding precommit as clone of as ll check tasks without 
running tests is not good.
   
   But the precommit was always an easy fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] style: disable max-line-length in editorconfig [lucene]

2025-06-20 Thread via GitHub


rmuir opened a new pull request, #14822:
URL: https://github.com/apache/lucene/pull/14822

   This is set for all files and causes issues where editors will wrap 
inconsistently with the formatter. It will happen even with java code: because 
spotless ignores this restriction for comments, as an example.
   
   Best to leave such settings to tools like spotless which are AST-aware.
   
   Relates: #14819
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] style: disable max-line-length in editorconfig [lucene]

2025-06-20 Thread via GitHub


github-actions[bot] commented on PR #14822:
URL: https://github.com/apache/lucene/pull/14822#issuecomment-2991155854

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-20 Thread via GitHub


dweiss commented on code in PR #14824:
URL: https://github.com/apache/lucene/pull/14824#discussion_r2159163732


##
build-tools/build-infra/src/main/java/org/apache/lucene/gradle/plugins/spotless/ParentGoogleJavaFormatTask.java:
##
@@ -0,0 +1,69 @@
+package org.apache.lucene.gradle.plugins.spotless;
+
+import com.google.googlejavaformat.java.Formatter;
+import com.google.googlejavaformat.java.JavaFormatterOptions;
+import org.gradle.api.DefaultTask;
+import org.gradle.api.file.ConfigurableFileCollection;
+import org.gradle.api.file.FileType;
+import org.gradle.api.file.ProjectLayout;
+import org.gradle.api.file.RegularFileProperty;
+import org.gradle.api.tasks.InputFiles;
+import org.gradle.api.tasks.Internal;
+import org.gradle.api.tasks.OutputFile;
+import org.gradle.api.tasks.PathSensitive;
+import org.gradle.api.tasks.PathSensitivity;
+import org.gradle.work.ChangeType;
+import org.gradle.work.FileChange;
+import org.gradle.work.Incremental;
+import org.gradle.work.InputChanges;
+import org.gradle.workers.WorkQueue;
+import org.gradle.workers.WorkerExecutor;
+
+import javax.inject.Inject;
+import java.io.File;
+import java.util.List;
+import java.util.stream.StreamSupport;
+
+abstract class ParentGoogleJavaFormatTask extends DefaultTask {
+@Incremental
+@InputFiles
+@PathSensitive(PathSensitivity.RELATIVE)
+public abstract ConfigurableFileCollection getSourceFiles();
+
+@OutputFile
+public abstract RegularFileProperty getOutputChangeListFile();
+
+@Inject
+protected abstract WorkerExecutor getWorkerExecutor();
+
+public ParentGoogleJavaFormatTask(ProjectLayout layout, String gjfTask) {
+
getOutputChangeListFile().convention(layout.getBuildDirectory().file("gjf-" + 
gjfTask + ".txt"));
+}
+
+protected static Formatter getFormatter() {
+JavaFormatterOptions options =
+JavaFormatterOptions.builder()
+.style(JavaFormatterOptions.Style.GOOGLE)
+.formatJavadoc(true)
+.reorderModifiers(true)
+.build();
+return new Formatter(options);
+}
+
+protected List getIncrementalBatch(InputChanges inputChanges) {
+return 
StreamSupport.stream(inputChanges.getFileChanges(getSourceFiles()).spliterator(),
 false)
+.filter(fileChange -> {
+return fileChange.getFileType() == FileType.FILE &&
+   (fileChange.getChangeType() == ChangeType.ADDED ||
+fileChange.getChangeType() == ChangeType.MODIFIED);
+})
+.map(FileChange::getFile)
+.toList();
+}
+
+@Internal
+protected WorkQueue getWorkQueue() {
+// TODO: maybe fork a separate jvm so that we can pass open-module 
settings there and fine-tune the jvm for the task?
+return getWorkerExecutor().noIsolation();
+}

Review Comment:
   I'd love to fork a separate jvm here - where we can set some options that 
open up modules, etc. The problem is - there is no way to control how many 
isolated processes this forks. On high-end multi-core machines I see dozens of 
forked java processes - this doesn't make sense and is super heavy. Is there a 
way to control the concurrency of the isolated worker executor, @breskeby ?



##
build-tools/build-infra/src/main/java/org/apache/lucene/gradle/plugins/spotless/CheckGoogleJavaFormatTask.java:
##
@@ -0,0 +1,195 @@
+package org.apache.lucene.gradle.plugins.spotless;
+
+import com.github.difflib.text.DiffRow;
+import com.github.difflib.text.DiffRowGenerator;
+import com.google.googlejavaformat.java.FormatterException;
+import com.google.googlejavaformat.java.ImportOrderer;
+import com.google.googlejavaformat.java.JavaFormatterOptions;
+import com.google.googlejavaformat.java.RemoveUnusedImports;
+import java.io.File;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.io.Writer;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import javax.inject.Inject;
+import org.gradle.api.GradleException;
+import org.gradle.api.file.ConfigurableFileCollection;
+import org.gradle.api.file.FileSystemOperations;
+import org.gradle.api.file.ProjectLayout;
+import org.gradle.api.file.RegularFileProperty;
+import org.gradle.api.tasks.TaskAction;
+import org.gradle.work.DisableCachingByDefault;
+import org.gradle.work.InputChanges;
+import org.gradle.workers.WorkAction;
+import org.gradle.workers.WorkParameters;
+import org.gradle.workers.WorkQueue;
+
+@DisableCachingByDefault
+public abstract class CheckGoogleJavaFormatTask extends 
ParentGoogleJavaFormatTask {
+
+  @Inject
+  protected abstract FileSystemOperations getFilesystemOperations();
+
+  @Inject
+  public CheckGoogleJavaF

Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-20 Thread via GitHub


dweiss commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2991939497

   ```
   > time ./gradlew clean checkGoogleJavaFormat  --no-daemon
   real0m13.884s
   user1m14.370s
   sys 0m8.233s
   
   > time ./gradlew clean spotlessJavaCheck --no-daemon
   real0m29.668s
   user1m27.121s
   sys 0m8.733s
   ```
   
   I had to remove ```-XX:ActiveProcessorCount=1``` from gradle.properties for 
the above. I think this setting for some reason makes gradle sluggish and 
unpredictable (the timings vary a lot).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-20 Thread via GitHub


mikemccand commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2991458163

   Thank you @dweiss for working so hard to improve our build tooling!
   
   It is indeed "cold" (tests and check -x tests is nearly the first thing 
nightly benchy does), but I dunno maybe a gradle daemon or 7 is/are still 
running from last time so it's warm ish?
   
   I think I got nightly benchy started again (one off) just now after 
switching to `./gradlew check -x test` -- let's see how the timings look in ~14 
hours or so.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-20 Thread via GitHub


dweiss commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2991738574

   I don't know what the cryptic step was needed for. Remove it. If it doesn't 
work, it needs to be fixed.
   
   git clean -xfd is brutal - it'll recompile everything from scratch. I think 
it has benefits for the benchmark though, so I'd leave it in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] .editorconfig [lucene]

2025-06-20 Thread via GitHub


msokolov commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2991834140

   > I would appreciate it if you enable emacs's support for editorconfig and 
let us know if it was terrible/bad; you seem worried about it for some reason.
   
   I would rather do my manual formatting while I am working and let tidy take 
care of the rest. Or is this covering something tidy does not? If it is, could 
tidy be made to handle?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-20 Thread via GitHub


dweiss commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2991372910

   > Is it possible in Gradle to define a task that does "check -test"?
   
   This is exactly the problem - no, not really. You could hack around it by 
programmatically turning off test if 'precommit' is on start args... but it is 
really an ugly workaround for something that isn't a problem to me. You should 
run your tests before you commit so 'gradlew check' is shorter, really. If you 
insist on running without tests, exclude the ones you don't want. Also, this is 
per-project so 
   ```
   ./gradlew -p lucene/core check -x test
   ```
   
   will run all checks only for the lucene core - precommit was global 
(involved every project).
   
   > But the precommit was always an easy shortcut.
   
   len("precommit") > len("check") ? :)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-20 Thread via GitHub


mikemccand commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2991624769

   > > but I dunno maybe a gradle daemon or 7 is/are still running from last 
time so it's warm ish
   > 
   > Good one. I wouldn't run with the daemon at all. By "cold" startup I mean 
setting things up - downloading deps, compiling the buid-infra scripts, etc. 
These are currently slow like hell.
   
   Oh hmm ... no, actually.  It reuses the same checkout in the local (NVMe 
SSD) filesystem, does a `git pull`, a `git clean -xfd`, hmm then this cryptic 
step (maybe not needed anymore!!):
   
   ```
   # There is some weird gradle bootstrapping bug: if we do not run this 
"help" first, then the test run fails w/ cryptic error:
   os.system("./gradlew help >> %s.tmp 2>&1" % logFile)
   ```
   
   Then runs tests with `./gradlew --no-daemon -Ptask.times=true -p lucene test 
>> %s.tmp 2>&1` -- OK, no daemon!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make spotless gradle formatting an opt-in rather than default. [lucene]

2025-06-20 Thread via GitHub


dweiss merged PR #14821:
URL: https://github.com/apache/lucene/pull/14821


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make spotless gradle formatting an opt-in rather than default. [lucene]

2025-06-20 Thread via GitHub


github-actions[bot] commented on PR #14821:
URL: https://github.com/apache/lucene/pull/14821#issuecomment-2990536605

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-20 Thread via GitHub


dweiss opened a new pull request, #14824:
URL: https://github.com/apache/lucene/pull/14824

   Spotless's support for google java format is kind of slow - I don't think it 
implements gradle's workers api so concurrency is limited to different 
projects. 
   
   This issue aims to provide two custom tasks for applying and checking google 
java format, then use them as a substitute for those found in spotless.
   
   This is in draft state and not fully working but it seems faster for me 
already (although there are open questions I've been struggling with).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-20 Thread via GitHub


rmuir commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2991645292

   > OK, no daemon!
   
   I think it still makes a daemon anyway, just complains, and then kills it at 
the end of the run :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-20 Thread via GitHub


rmuir commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2992549424

   Sorry for the noise, just my lack of gradle knowledge there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Decrease TieredMergePolicy's default number of segments per tier to 8. [lucene]

2025-06-20 Thread via GitHub


mikemccand commented on PR #14823:
URL: https://github.com/apache/lucene/pull/14823#issuecomment-2991356685

   +1, this is a great idea -- more aggessive merging by default makes sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] style: disable max-line-length in editorconfig [lucene]

2025-06-20 Thread via GitHub


rmuir commented on PR #14822:
URL: https://github.com/apache/lucene/pull/14822#issuecomment-2991194959

   Full list of 
[violations.txt](https://github.com/user-attachments/files/20835991/violations.txt)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-20 Thread via GitHub


rmuir commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2992145681

   You could also consider cheating for the local developer. For a formatting 
task, it is safe to only format changed files (eg based on git status or gradle 
cache or whatever). For formatting there aren't dependencies between files. As 
long as user can do a `CI=true ./gradlew tidy` to force it happening the slow 
way, corner cases such as formatter upgrades can all be handled. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-20 Thread via GitHub


dweiss commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2991575632

   > but I dunno maybe a gradle daemon or 7 is/are still running from last time 
so it's warm ish
   
   Good one. I wouldn't run with the daemon at all. By "cold" startup I mean 
setting things up - downloading deps, compiling the buid-infra scripts, etc. 
These are currently slow like hell. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] fix sources to conform to .editorconfig [lucene]

2025-06-20 Thread via GitHub


rmuir commented on issue #14819:
URL: https://github.com/apache/lucene/issues/14819#issuecomment-2991947545

   next step up, is the CI linter task. From the gradle side I will look to do 
it consistent with the ast-grep. 
   
   because elements in .editorconfig, configure the editor, I think its ok to 
just invoke it in CI (or if the user opts-in via a property). Newly introduced 
problems going forwards should be rare.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-20 Thread via GitHub


dweiss commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2992515815

   This is already part of this patch, Robert - it is incremental so it should 
only apply to files that have changed since the last run. I don't think it's 
smart enough to do ratchet-style updates based on git history or anything like 
that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] .editorconfig [lucene]

2025-06-20 Thread via GitHub


rmuir commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2991984654

   Personally I think there's just too much going on here. 
   The file was rushed in too quickly and is far too aggressive (e.g. applying 
too many aggressive settings to ALL files).
   
   It is better to proceed with caution around this file. An example would be 
the python file deleted here: 
https://github.com/apache/lucene/blob/f83587416ecaa7b5fb201dd3938fd06e2af95aba/dev-tools/scripts/.editorconfig
   
   Because there was inconsistent formatting of the python files in that 
folder, it was introduced JUST for very specific filetype in JUST that specific 
folder. 
   
   Trying to apply such settings to ALL files of ALL types in repo is too much, 
you need to deal with wacky file types (e.g. make with tabs), windows file 
types that should have CRLF, generated files, test files, files that aren't 
valid utf-8, and the list goes on and on.
   
   I'm trying to cope regardless, but it isn't easy and I still think there's 
too much going on here. See #14819


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] fix sources to conform to .editorconfig [lucene]

2025-06-20 Thread via GitHub


dweiss commented on issue #14819:
URL: https://github.com/apache/lucene/issues/14819#issuecomment-2990073330

   > It causes the issue that when editing any impacted file, random unrelated 
formatting changes will appear.
   
   This is what I was worried about. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] docs: fix invalid html [lucene]

2025-06-20 Thread via GitHub


github-actions[bot] commented on PR #14818:
URL: https://github.com/apache/lucene/pull/14818#issuecomment-2991037224

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-20 Thread via GitHub


dweiss commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2991087808

   Precommit was manually configured to include some checks. I don't know, 
haven't touched it for years... For consistency, I'd move on from ant days and 
just use check or check -x test if you want to run without tests. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-06-20 Thread via GitHub


mikemccand merged PR #14178:
URL: https://github.com/apache/lucene/pull/14178


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use PriorityQueue instead of TreeMap in FirstPassGroupingCollector. [lucene]

2025-06-20 Thread via GitHub


vsop-479 commented on PR #14813:
URL: https://github.com/apache/lucene/pull/14813#issuecomment-2990790159

   I added a new benchmark (`FirstPassGroupingBenchmark`) to simulate 
`FirstPassGroupingCollector` 's implementation.
   It seems use heap can gain a better performance:
   ```
   Benchmark   Mode  Cnt  Score   Error   
Units
   FirstPassGroupingBenchmark.getTopNWithHeap thrpt   15  0.010 ± 0.002  
ops/us
   FirstPassGroupingBenchmark.getTopNWithTreeMap  thrpt   15  0.005 ± 0.001  
ops/us
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Make spotless gradle formatting an opt-in rather than default. [lucene]

2025-06-20 Thread via GitHub


dweiss opened a new pull request, #14821:
URL: https://github.com/apache/lucene/pull/14821

   This is a tweak of groovy/gradle formatting check for spotless so that it 
does _not_ run by default. This avoids the heavy greclipse download and makes 
tidy run much faster. Most devs won't even touch build files - I don't think we 
all need to suffer here.
   
   I'll just add a gh workflow to check the formatting periodically and I'll 
apply the formatting if it fails. Anybody willing to suffer (heh) can turn it 
on locally by setting the corresponding build option 
```lucene.spotlessGradleScripts``` to true in
   gradle.properties or their local build option file.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-06-20 Thread via GitHub


mikemccand commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2990864157

   Thank you @kaivalnp!  I just merged this ... I'll try to watch builds.
   
   Let's let this bake for a week or so on `main` branch and then backport for 
10.x?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-20 Thread via GitHub


dweiss commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2991385071

   > Thanks @dweiss -- I think `precommit` used to do all checks but not run 
tests? The [nightly benchy currently separately measures test time, from the 
rest of `check` time](https://benchmarks.mikemccandless.com/antcleantest.html) 
(whoa, `precommit` time is ~6 minutes, and test time is ~1.2 minutes).
   
   Yes, this is an abomination. This benchmark is from a cold start, I think? 
I've been working to speed things up. We'll see how it goes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] .editorconfig [lucene]

2025-06-20 Thread via GitHub


dweiss commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2991954656

   > I would rather do my manual formatting while I am working and let tidy 
take care of the rest.
   
   I'm from the same camp. This is one of the reasons I like gjf - I can move 
things around and they half-magically revert back to the previous state. No 
need to think about formatting that much.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-20 Thread via GitHub


dweiss commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2992036802

   Right - it's gradle.properties. I'll be experimenting and will try to push 
this forward - it's a promising direction to cut some time from one of the most 
expensive tasks at the moment.
   
   
![image](https://github.com/user-attachments/assets/29c1498a-6ba6-4d69-8c0f-aebc59ae2372)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-06-20 Thread via GitHub


kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2992093576

   Thank you @mikemccand!
   
   > I'll try to watch builds
   
   Here's a link for the GH action that tests this codec on a new commit / PR: 
https://github.com/apache/lucene/actions/workflows/run-special-checks-sandbox.yml
   
   > backport for 10.x?
   
   Exciting! I can try to create a backport PR..
   
   [10.x](https://github.com/apache/lucene/tree/branch_10x) seems to be on 
JDK21 where the FFM API was in preview (https://openjdk.org/jeps/442) and was 
subsequently finalized in JDK22 (https://openjdk.org/jeps/454), I hope we don't 
need too many changes :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] style: disable max-line-length in editorconfig [lucene]

2025-06-20 Thread via GitHub


rmuir merged PR #14822:
URL: https://github.com/apache/lucene/pull/14822


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-20 Thread via GitHub


rmuir commented on PR #14824:
URL: https://github.com/apache/lucene/pull/14824#issuecomment-2992023661

   > I had to remove `-XX:ActiveProcessorCount=1` from gradle.properties for 
the above. I think this setting for some reason makes gradle sluggish and 
unpredictable (the timings vary a lot).
   
   For just the `org.gradle.jvmargs`, correct? I think it makes sense to nuke 
it here. It makes better sense for forked processes (e.g. `tests.jvmargs`), 
where, on an 8 processor machine, you want to avoid fork()'ing 8 JVMS, each of 
which is sized for 8 CPU machine (8 * 8 = 64).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Initial prototype of custom google java format tasks to replace spotless [lucene]

2025-06-20 Thread via GitHub


dweiss commented on code in PR #14824:
URL: https://github.com/apache/lucene/pull/14824#discussion_r2159174610


##
build-tools/build-infra/src/main/java/org/apache/lucene/gradle/plugins/spotless/ApplyGoogleJavaFormatTask.java:
##
@@ -0,0 +1,77 @@
+package org.apache.lucene.gradle.plugins.spotless;
+
+import com.google.googlejavaformat.java.FormatterException;
+import org.gradle.api.file.ConfigurableFileCollection;
+import org.gradle.api.file.ProjectLayout;
+import org.gradle.api.tasks.TaskAction;
+import org.gradle.work.DisableCachingByDefault;
+import org.gradle.work.InputChanges;
+import org.gradle.workers.WorkAction;
+import org.gradle.workers.WorkParameters;
+import org.gradle.workers.WorkQueue;
+
+import javax.inject.Inject;
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.List;
+import java.util.stream.Collectors;
+
+@DisableCachingByDefault
+public abstract class ApplyGoogleJavaFormatTask extends 
ParentGoogleJavaFormatTask {
+@Inject
+public ApplyGoogleJavaFormatTask(ProjectLayout layout) {
+super(layout, "apply");
+}
+
+@TaskAction
+public void formatSources(InputChanges inputChanges) throws IOException {
+WorkQueue workQueue = getWorkQueue();
+List sourceFiles = getIncrementalBatch(inputChanges);
+if (sourceFiles.isEmpty()) {
+return;
+}
+
+getLogger().info("Will apply formatting to {} source {} in this run.",
+sourceFiles.size(), sourceFiles.size() == 1 ? "file" : 
"files");
+
+for (var sourceFile : sourceFiles) {
+workQueue.submit(ApplyGoogleJavaFormatAction.class, params -> {
+params.getTargetFiles().setFrom(List.of(sourceFile));
+});
+}
+
+Path taskOutput = getOutputChangeListFile().get().getAsFile().toPath();
+Files.writeString(
+taskOutput,
+
sourceFiles.stream().map(File::getPath).collect(Collectors.joining("\n")));
+
+workQueue.await();
+}
+
+public static abstract class ApplyGoogleJavaFormatAction implements 
WorkAction {
+public interface Parameters extends WorkParameters {
+ConfigurableFileCollection getTargetFiles();
+}
+
+@Override
+public void execute() {
+var formatter = getFormatter();
+
+for (File inputFile : getParameters().getTargetFiles().getFiles()) 
{
+var inputPath = inputFile.toPath();
+try {
+String input = Files.readString(inputPath);
+String output = formatter.formatSourceAndFixImports(input);
+
+if (!input.equals(output)) {
+Files.writeString(inputPath, output);
+}

Review Comment:
   @breskeby this is one example where I think gradle build model doesn't shine 
- for this task, the outputs and inputs overlap. It takes input files, formats 
them and writes them back to the same location. I know Ned Twigg (spotless) had 
similar problems. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] .editorconfig [lucene]

2025-06-20 Thread via GitHub


rmuir commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2993007936

   If you don't mind, i'd like to get the CI piece working before making any 
big changes to this file. Then we can tone it down, fix it, etc, but changes to 
it will be validated rather than eyeballed. Just gimme a bit to get a PR up


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-20 Thread via GitHub


vigyasharma commented on PR #14009:
URL: https://github.com/apache/lucene/pull/14009#issuecomment-2992621644

   Thanks for the explanations, @dungba88. I suppose the scenario you're trying 
to solve for, is when users want to change the matchset of a KnnVectorQuery 
using full-precision or other reranking. I'm open to this change if it's a 
valid requirement. 
   
   My perception is that it should be solvable by plumbing vector query results 
into a rescorer, then combining top-K hits with other (lexical) hits. For e.g. 
is this also a problem for hybrid search in OpenSearch/Elasticsearch ? I 
suspect they might have independent queries for both with some way to combine 
results.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add the ability to inverse a Sort [lucene]

2025-06-20 Thread via GitHub


msokolov commented on PR #14775:
URL: https://github.com/apache/lucene/pull/14775#issuecomment-2992625771

   Should we refuse to allow inverse of Sort.SCORE? or RELEVANCE?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] build: add eclint to verify editorconfig in CI [lucene]

2025-06-20 Thread via GitHub


rmuir commented on code in PR #14825:
URL: https://github.com/apache/lucene/pull/14825#discussion_r2159848746


##
.github/actions/eclint/action.yml:
##
@@ -0,0 +1,33 @@
+name: Install eclint
+description: Installs eclint from cache, or builds it
+
+inputs:
+  eclint-version:
+required: false
+description: "eclint version to use"
+default: "v0.5.1"
+
+runs:
+ using: "composite"
+ steps:
+  - name: Cache eclint binary
+id: cache-eclint
+uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
+with:
+  path: ~/.local/bin/eclint
+  key: eclint-${{ inputs.eclint-version }}-${{ runner.os }}-${{ 
runner.arch }}

Review Comment:
   May seem overkill to only save single-digit seconds, but the purpose is to 
avoid the network requests (similar to gradle-wrapper).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] build: fix incorrect gradle dependency [lucene]

2025-06-20 Thread via GitHub


dweiss merged PR #14826:
URL: https://github.com/apache/lucene/pull/14826


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] build: add eclint to verify editorconfig in CI [lucene]

2025-06-20 Thread via GitHub


rmuir opened a new pull request, #14825:
URL: https://github.com/apache/lucene/pull/14825

   You generally shouldn't need the check, as your editor should follow the 
editorconfig and not introduce the errors checked-for here.
   
   But it helps maintain the .editorconfig itself and ensure that files in the 
repository actually match whatever is configured there.
   
   If you want to enable the check locally for debugging:
   ```
   GOBIN=${HOME}/.local/bin go install gitlab.com/greut/eclint/cmd/eclint@latest
   echo "lucene.tool.eclint=eclint" >> ~/.gradle/gradle.properties
   ```
   
   The eclint action is not used, as it is not on the apache approved list. 
Getting the 3MB binary in CI takes 2s (cached) and 9s (uncached, using 
setup-go).
   
   The check itself runs fast:
   ```
   real 0m3.647s
   user 0m6.702s
   sys  0m0.474s
   ```
   
   @dweiss, I imagine you want to keep things a certain way, please feel free 
to push commits here. I just hack when it comes to the gradle side of things, 
but the actions part is decent I think.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] .editorconfig [lucene]

2025-06-20 Thread via GitHub


dsmiley commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2992922097

   I definitely disagree on this PR being rushed.  You aren't a fan of it but 
that doesn't make it rushed.
   
   > Trying to apply such settings to ALL files of ALL types in repo is too 
much, you need to deal with wacky file types (e.g. make with tabs), windows 
file types that should have CRLF, generated files, test files, files that 
aren't valid utf-8, and the list goes on and on.
   
   I mostly sympathize.
   
   As you & Dawid suggested, I can remove the initial everything glob, and 
instead copy the rules there now to the files types it lists.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] .editorconfig [lucene]

2025-06-20 Thread via GitHub


rmuir commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2992994166

   I'm actually a huge fan of having editorconfig, 
   
   I just know it can be tricky if you want to get it right (e.g. not have 
random style changes in PRs). Similar process to when we moved to spotless, 
fixing all sources to conform, marking the git commits special so they don't 
mess up `git blame`, dealing with the long tail issues described above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Decrease TieredMergePolicy's default number of segments per tier to 8. [lucene]

2025-06-20 Thread via GitHub


jpountz commented on PR #14823:
URL: https://github.com/apache/lucene/pull/14823#issuecomment-2993010598

   Thanks @mikemccand ! I'll wait a few days before merging to give others a 
chance to take a look. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] build: add eclint to verify editorconfig in CI [lucene]

2025-06-20 Thread via GitHub


github-actions[bot] commented on PR #14825:
URL: https://github.com/apache/lucene/pull/14825#issuecomment-2993289814

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] build: add eclint to verify editorconfig in CI [lucene]

2025-06-20 Thread via GitHub


rmuir commented on code in PR #14825:
URL: https://github.com/apache/lucene/pull/14825#discussion_r2159850350


##
.github/actions/eclint/action.yml:
##
@@ -0,0 +1,33 @@
+name: Install eclint
+description: Installs eclint from cache, or builds it
+
+inputs:
+  eclint-version:
+required: false
+description: "eclint version to use"
+default: "v0.5.1"

Review Comment:
   we could alternatively do go.mod/go.sum to let dependabot take care, but its 
a bit more awkward and I'm afraid of tons of noise from the indirect deps. Plus 
those files use tab indentation :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] build: fix incorrect gradle dependency [lucene]

2025-06-20 Thread via GitHub


github-actions[bot] commented on PR #14826:
URL: https://github.com/apache/lucene/pull/14826#issuecomment-2993299954

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] build: fix incorrect gradle dependency [lucene]

2025-06-20 Thread via GitHub


rmuir opened a new pull request, #14826:
URL: https://github.com/apache/lucene/pull/14826

   applyAstGrep doesn't run in CI, because it is set to depend on "tidy" 
instead of "check"
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-20 Thread via GitHub


dungba88 commented on PR #14009:
URL: https://github.com/apache/lucene/pull/14009#issuecomment-2993030559

   > is when users want to change the matchset of a KnnVectorQuery using 
full-precision or other reranking
   
   Yes that's correct, @vigyasharma. We are using a hybrid search where 
KnnFloatVectorQuery and TermQuery (amongst others) are combined into a single 
BooleanQuery. Thus it is important to change the matchset of 
KnnFloatVectorQuery individually.
   
   For hybrid search in OpenSearch/Elastic Search, I'm wondering if @jmazanec15 
and @benwtrent have any input. I'm having a feeling that it's quite common to 
combine lexical + KNN matching into a single BooleanQuery.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org