Re: [I] spotlessGradleScripts doesn't work with whitespace-paths on Windows [lucene]

2025-06-19 Thread via GitHub


rmuir commented on issue #14787:
URL: https://github.com/apache/lucene/issues/14787#issuecomment-2989181352

   @dweiss another one to investigate as it advertises linting capabilities 
too: https://github.com/nvuillam/npm-groovy-lint


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Compression cache of numeric docvalues [lucene]

2025-06-19 Thread via GitHub


rmuir commented on issue #14803:
URL: https://github.com/apache/lucene/issues/14803#issuecomment-2989679301

   @easyice Something like DELTA+FOR shouldn't require any cache, right? To me 
that is a different problem with other challenges: index would need to be e.g. 
sorted on timestamp field for the delta-compression to work effectively (i 
think?)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Compression cache of numeric docvalues [lucene]

2025-06-19 Thread via GitHub


easyice commented on issue #14803:
URL: https://github.com/apache/lucene/issues/14803#issuecomment-2989753149

   @rmuir You are right, it needs to be sorted on the timestamp field. In 
addition to enabling delta-compression on the timestamp field, index sorting 
brings another benefit: when sorting by timestamp and the query includes a 
condition on this field, the doc IDs after query filtering tend to be more 
contiguous (i.e., not completely randomly distributed). As a result, this 
mitigates the performance impact of random access when bulk decoding doc values.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] fix sources to conform to .editorconfig [lucene]

2025-06-19 Thread via GitHub


rmuir commented on issue #14819:
URL: https://github.com/apache/lucene/issues/14819#issuecomment-2989770075

   I'm having decent luck with the fast https://gitlab.com/greut/eclint, which 
allows disabling some of the problems in the .editorconfig with `eclint_` 
prefix. There are many categories of problems, e.g. invalid encoding: as we now 
say all non-binary files are UTF-8. I'm iterating with it, I've had to 
regenerate that huge DFA several times.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] feat: add automated backport workflow [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14738:
URL: https://github.com/apache/lucene/pull/14738#issuecomment-2989473038

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Compression cache of numeric docvalues [lucene]

2025-06-19 Thread via GitHub


easyice commented on issue #14803:
URL: https://github.com/apache/lucene/issues/14803#issuecomment-2989482091

   Yeah, I’ve been thinking about this. Elasticsearch now supports a 
time_series index mode with DELTA + FOR encoding on doc values. In time series 
or logging scenarios, storage cost usually matters more than query performance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add isEmpty in PriorityQueue. [lucene]

2025-06-19 Thread via GitHub


vsop-479 commented on code in PR #14814:
URL: https://github.com/apache/lucene/pull/14814#discussion_r2157867142


##
lucene/core/src/java/org/apache/lucene/util/PriorityQueue.java:
##
@@ -342,4 +342,13 @@ public T next() {
   }
 };
   }
+
+  /**
+   * Returns {@code true} if this list contains no elements.

Review Comment:
   Yes, I removed it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add isEmpty in PriorityQueue. [lucene]

2025-06-19 Thread via GitHub


vsop-479 commented on code in PR #14814:
URL: https://github.com/apache/lucene/pull/14814#discussion_r2157890563


##
lucene/core/src/java/org/apache/lucene/util/PriorityQueue.java:
##
@@ -342,4 +342,13 @@ public T next() {
   }
 };
   }
+
+  /**
+   * Returns {@code true} if this list contains no elements.

Review Comment:
   > e.g. use this somewhere in Lucene or explain how you would like to use it 
in your application?
   
   I think it is useful when calling `PriorityQueue#pop` to fetch topN, just 
like a `Collection`, e.g. `Stack`, `ArrayList`, etc. 
   Although `PriorityQueue` is not a `Collection`, and we can do this by 
judging `PriorityQueue#size`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] fix sources to conform to .editorconfig [lucene]

2025-06-19 Thread via GitHub


rmuir opened a new issue, #14819:
URL: https://github.com/apache/lucene/issues/14819

   ### Description
   
   editorconfig with aggressive settings was applied to all files, but without 
fixing any existing problems or preventing new ones.
   
   It causes the issue that when editing any impacted file, random unrelated 
formatting changes will appear. For example trailing whitespace is disabled, so 
if you edit any files that have it, your editor will correct that, leading to 
unrelated changes.
   
   validation/autofix is tough: especially since indentation and line-length 
settings are applied to ALL files: these need ast-awareness to validating and 
fix correctly (e.g. spotless, prettier, etc). On the other hand, editors tend 
to not aggressively autofix these for the same reason.
   
   We can start by fixing the most basic problems: e.g. trailing whitespace, 
missing newlines, etc in one fell swoop with low-tech tools, and add the commit 
to `.git-blame-ignore-revs`. These are the corrections that editors tend to do 
automatically across the entire individual file when saving it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] style: fix sources to conform to .editorconfig (basic fixes only) [lucene]

2025-06-19 Thread via GitHub


rmuir opened a new pull request, #14820:
URL: https://github.com/apache/lucene/pull/14820

   Rather than have editors introduce these changes incrementally in unrelated 
PRs over time, fix them in one commit which may be added to 
.git-blame-ignore-revs as a followup commit.
   
   Steps performed:
   1. comment out indentation and line-length settings from .editorconfig 
(fixes are not "basic")
   2. ran eclint -fix
   3. ran gradlew regenerate
   
   Relates: #14819
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] docs: fix invalid html [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14818:
URL: https://github.com/apache/lucene/pull/14818#issuecomment-2989632943

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] fix sources to conform to .editorconfig [lucene]

2025-06-19 Thread via GitHub


rmuir commented on issue #14819:
URL: https://github.com/apache/lucene/issues/14819#issuecomment-2989648867

   It looks to me like https://github.com/ec4j/editorconfig-gradle-plugin might 
be exactly what is wanted here for the build to keep this tidy.
   
   its TextLinter will address the common ones that cause this kind of noise in 
PRs:
   end_of_line, trim_trailing_whitespace, insert_final_newline. We can disable 
its XMLLinter which will do indentation as well for .xml and .xsl files, or 
just keep it, doesn't matter to me. I'm mostly concerned about the issues 
detected/fixed by TextLinter because editors will automatically introduce this 
noise into files otherwise.
   
   cc: @dweiss


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] docs: fix invalid html [lucene]

2025-06-19 Thread via GitHub


rmuir commented on code in PR #14818:
URL: https://github.com/apache/lucene/pull/14818#discussion_r2157934763


##
lucene/queryparser/docs/xml/LuceneContribQuery.dtd.html:
##
@@ -27,7 +27,7 @@
 
 BooleanQuerys implement Boolean logic which controls how multiple Clauses 
should be interpreted.
 Some clauses may represent optional Query criteria while others represent 
mandatory criteria.Example: Find 
articles about banks, preferably talking about mergers but nothing to do with 
"sumitomo"
-

Review Comment:
   I opened #14819 to address all these unrelated changes. if we can merge that 
one first, the diff here will become readable for review.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] .editorconfig [lucene]

2025-06-19 Thread via GitHub


rmuir commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2988878571

   > The existing editorconfig was redundant with information in this 
editorconfig, so I simply removed it. I'll merge this tonight.
   
   That's not true, now a maximum line length is set, inconsistent with the 
linter. I will fix it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] .editorconfig [lucene]

2025-06-19 Thread via GitHub


rmuir commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2988890817

   > I think you're missing a point of the value of this. It's _complementary_ 
with Spotless (or similar). It helps us write code formatted according to the 
project's standards during the writing/editing process, especially when 
utilizing IDE features that manipulate code (e.g. refactorings, perhaps code 
suggestion completions too). Consequently, forgetting to run tidy/Spotless can 
be less of an annoyance... pushing more formatting earlier.
   
   @dsmiley .editorconfig has nothing to do with spotless or IDEs or anything 
like that. It configures editors... all editors.
   
   If affects us all, for all files: even those people not using an IDE. When 
this PR was merged my editor (vim) was throwing errors left and right!
   
   So I'd ask if we want to tweak intellij formatting, that if possible, we 
configure that intellij-specific stuff elsewhere in the project.
   
   This file is very important, will be interpreted by all kinds of editors 
(not just IDE) so its best to be on the minimal side and be extremely careful 
around it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] spotlessGradleScripts doesn't work with whitespace-paths on Windows [lucene]

2025-06-19 Thread via GitHub


dweiss commented on issue #14787:
URL: https://github.com/apache/lucene/issues/14787#issuecomment-2988833608

   I know. I wish there was a simpler alternative. I am surprised groovy 
doesn't include a formatter (since it has an ast parser api).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-19 Thread via GitHub


dweiss commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2988835226

   The standard: ```./gradlew check``` will do the job. You can use it on 
branch_10x as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-19 Thread via GitHub


dweiss commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2988835937

   https://github.com/apache/lucene/blob/main/help/workflow.txt#L15-L19


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] spotlessGradleScripts doesn't work with whitespace-paths on Windows [lucene]

2025-06-19 Thread via GitHub


rmuir commented on issue #14787:
URL: https://github.com/apache/lucene/issues/14787#issuecomment-2988998672

   prettier supports many languages (including java) and there are plugins for 
groovy. Because of its popularity esp. on the front-end side, its much more 
widely used, and has good editor and IDE support, etc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] spotlessGradleScripts doesn't work with whitespace-paths on Windows [lucene]

2025-06-19 Thread via GitHub


rmuir commented on issue #14787:
URL: https://github.com/apache/lucene/issues/14787#issuecomment-2989031043

   That's not an argument for it either, I haven't tried it out on our 
codebase. Just a guess as to why you aren't finding lots of alternatives.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] spotlessGradleScripts doesn't work with whitespace-paths on Windows [lucene]

2025-06-19 Thread via GitHub


dweiss commented on issue #14787:
URL: https://github.com/apache/lucene/issues/14787#issuecomment-2989037837

   I just went with the simplest route which allowed me to reformat existing 
code automatically and keep it sane. If it feels too heavy - we can turn it off 
by default (and periodically reformat) or even remove it entirely. No other 
formatter comes close to the one spotless uses (eclipse's).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] docs: fix invalid html [lucene]

2025-06-19 Thread via GitHub


rmuir opened a new pull request, #14818:
URL: https://github.com/apache/lucene/pull/14818

   These html files (whether manually or automatically generated) have 
inconsistent usage of 'gt' and 'amp' html entities: sometimes they do it, 
sometimes they neglect to use it.
   
   I added rules to try to keep these kinds of problems at bay: not really 
ideal but it works in lieu of some better linter for the long tail of file 
types, looks like this:
   
   ```
   error[json-syntax]: Parse Error
 ┌─ 
lucene/test-framework/src/generated/checksums/generateEmojiTokenizationTest.json:4:38
 │
   4 │ "property:unicodeVersion": "12.1",
 │  ^
 │
 = Treesitter parse error: editors and tools may treat it incorrectly.
   Check for trailing commas and comments, which JSON forbids.
   
   error[css-syntax]: Parse Error
   ┌─ 
lucene/analysis/common/src/test/org/apache/lucene/analysis/charfilter/MS-Word 
14 generated.htm:333:1
   │
   333 │ 

Re: [PR] docs: fix invalid html [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14818:
URL: https://github.com/apache/lucene/pull/14818#issuecomment-2989516049

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] build: replace invalidJavaOnlyPatterns source-patterns with ast-grep [lucene]

2025-06-19 Thread via GitHub


rmuir merged PR #14812:
URL: https://github.com/apache/lucene/pull/14812


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] .editorconfig [lucene]

2025-06-19 Thread via GitHub


dsmiley commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2989124763

   I'm confused how the presence of an `.editorconfig` resulted in errors; can 
you elaborate how that could happen?  Judging from your commit here just 
recently, was this a minor glitch due to where a comment was in this file?  
Thanks for fixing that, and hopefully it's a non-issue now.  @msokolov I would 
appreciate it if you enable emacs's support for editorconfig and let us know if 
it was terrible/bad; you seem worried about it for some reason.
   
   And how could the presence of IntelliJ specific matters interfere with 
someone not using IntelliJ such as yourself Rob?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] .editorconfig [lucene]

2025-06-19 Thread via GitHub


rmuir commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2989130686

   .editorconfig is not an intellij specific file. It is parsed and respected 
by many editors, even ones you may not consider, such as github repository 
editor in the browser UI. So the file should really be simple and minimal: it 
is not the place to put a bunch of intellij-specific stuff. This is not the 
file to have complexity.
   
   my editor complained loudly (rightfully) because the file wasn't 
syntactically valid. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] .editorconfig [lucene]

2025-06-19 Thread via GitHub


msokolov commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2988904498

   
   
   This got me worried so I checked emacs editorconfig support 
https://www.gnu.org/software/emacs/manual/html_node/emacs/EditorConfig-support.html
 and it turns out it is off by default, phew.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] .editorconfig [lucene]

2025-06-19 Thread via GitHub


msokolov commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2988905484

   This got me worried so I checked emacs editorconfig support 
https://www.gnu.org/software/emacs/manual/html_node/emacs/EditorConfig-support.html
 and it turns out it is off by default, phew.
   On Thursday, June 19th, 2025 at 2:47 PM, Robert Muir ***@***.***> wrote:
   
   > rmuir left a comment 
[(apache/lucene#14740)](https://github.com/apache/lucene/pull/14740#issuecomment-2988890817)
   >
   >> I think you're missing a point of the value of this. It's complementary 
with Spotless (or similar). It helps us write code formatted according to the 
project's standards during the writing/editing process, especially when 
utilizing IDE features that manipulate code (e.g. refactorings, perhaps code 
suggestion completions too). Consequently, forgetting to run tidy/Spotless can 
be less of an annoyance... pushing more formatting earlier.
   >
   > ***@***.***(https://github.com/dsmiley) .editorconfig has nothing to do 
with spotless or IDEs or anything like that. It configures editors... all 
editors.
   >
   > If affects us all, for all files: even those people not using an IDE. When 
this PR was merged my editor (vim) was throwing errors left and right!
   >
   > So I'd ask if we want to tweak intellij formatting, that if possible, we 
configure that intellij-specific stuff elsewhere in the project.
   >
   > This file is very important, will be interpreted by all kinds of editors 
(not just IDE) so its best to be on the minimal side and be extremely careful 
around it.
   >
   > —
   > Reply to this email directly, [view it on 
GitHub](https://github.com/apache/lucene/pull/14740#issuecomment-2988890817), 
or 
[unsubscribe](https://github.com/notifications/unsubscribe-auth/AAHHUQNZFARPDXJTCLOPTKT3EMAULAVCNFSM6AB6KDRV72VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDSOBYHA4TAOBRG4).
   > You are receiving this because you are subscribed to this thread.Message 
ID: ***@***.***>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] style: fix sources to conform to .editorconfig (basic fixes only) [lucene]

2025-06-19 Thread via GitHub


rmuir commented on PR #14820:
URL: https://github.com/apache/lucene/pull/14820#issuecomment-2989794455

   all done, took a few iterations due to that gnarly encoding error and 
dealing with a long-tail of weird stuff.
   
   Checker passes now:
   ```
   $ time eclint
   
   real 0m3.708s
   user 0m6.757s
   sys  0m0.464s
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] style: fix sources to conform to .editorconfig (basic fixes only) [lucene]

2025-06-19 Thread via GitHub


rmuir commented on PR #14820:
URL: https://github.com/apache/lucene/pull/14820#issuecomment-2989816496

   Several options if we want to use this linter in CI (i'm concerned the 
java-gradle based one may be slow):
   * get binaries from releases: https://gitlab.com/greut/eclint/-/releases
   * build+cache with `go install gitlab.com/greut/eclint/cmd/eclint`
   * use their container/action
   
   Avoid the javascript based tool with the same name which can be confusing...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert some complex PriorityQueue implementations to comparators [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14817:
URL: https://github.com/apache/lucene/pull/14817#issuecomment-2988077064

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert some complex PriorityQueue implementations to comparators [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14817:
URL: https://github.com/apache/lucene/pull/14817#issuecomment-2988081467

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Expose search strategy in KNN query [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14816:
URL: https://github.com/apache/lucene/pull/14816#issuecomment-2988023256

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert some complex PriorityQueue implementations to comparators [lucene]

2025-06-19 Thread via GitHub


thecoop commented on PR #14817:
URL: https://github.com/apache/lucene/pull/14817#issuecomment-298814

   The subclasses that are remaining after this PR are:
   
   * `TermsMergeQueue`. This has some additional state in the `stack` variable
   * `FieldValueHitQueue`. This is further subclassed, and is public (although 
experimental)
   * `HitQueue`. Public class (although internal). Used in several difference 
places, so this class represents some common functionality.
   * `TopDocs.ScoreMergeSortQueue`, `TopDocs.MergeSortQueue`. Has some 
significant state fields in them
   * `TopOrdAndNumberQueue` and subclasses. Part of the public API, and has 
some abstract behavious set by subclasses
   * `SuggestWordQueue`. Public API


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] .editorconfig [lucene]

2025-06-19 Thread via GitHub


dsmiley merged PR #14740:
URL: https://github.com/apache/lucene/pull/14740


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-19 Thread via GitHub


mikemccand commented on PR #14764:
URL: https://github.com/apache/lucene/pull/14764#issuecomment-2988527500

   Thank you @dweiss for all the awesome attention to Lucene's build infra!
   
   I used to run `./gradlew precommit` -- is there an equivalent in the new 
shiny gradle plugin based build?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] build: replace invalidJavaOnlyPatterns source-patterns with ast-grep [lucene]

2025-06-19 Thread via GitHub


rmuir commented on code in PR #14812:
URL: https://github.com/apache/lucene/pull/14812#discussion_r2156593895


##
gradle/validation/ast-grep/rules/java-patterns.yml:
##
@@ -0,0 +1,24 @@
+# Banned Lucene source patterns
+# Historically implemented as regexes which are more difficult
+---
+# yaml-language-server: 
$schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json
+id: java-lang-import
+language: java
+rule:
+  pattern: import java.lang.$REF
+  kind: import_declaration
+fix: ""
+severity: error
+message: unnecessary import of `$REF` from java.lang
+note: classes in java.lang are implicitly imported
+---
+# yaml-language-server: 
$schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json
+id: impossible-type-inference
+language: java
+# does the javac compiler allow this? did it ever?
+rule:
+  pattern: var $$$ = new $$$<>($$$)
+  kind: local_variable_declaration

Review Comment:
   @dweiss there's not a problem here: the first line is valid. the second line 
is invalid. same as with the source-pattern today.
   
   i'm not arguing for what the rule is trying to achieve, just trying to 
replace the regex.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] build: replace invalidJavaOnlyPatterns source-patterns with ast-grep [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14812:
URL: https://github.com/apache/lucene/pull/14812#issuecomment-2987450127

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] build: replace invalidJavaOnlyPatterns source-patterns with ast-grep [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14812:
URL: https://github.com/apache/lucene/pull/14812#issuecomment-2987487274

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] build: replace invalidJavaOnlyPatterns source-patterns with ast-grep [lucene]

2025-06-19 Thread via GitHub


rmuir commented on code in PR #14812:
URL: https://github.com/apache/lucene/pull/14812#discussion_r2156621252


##
gradle/validation/ast-grep/rules/java-patterns.yml:
##
@@ -0,0 +1,24 @@
+# Banned Lucene source patterns
+# Historically implemented as regexes which are more difficult
+---
+# yaml-language-server: 
$schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json
+id: java-lang-import
+language: java
+rule:
+  pattern: import java.lang.$REF
+  kind: import_declaration
+fix: ""
+severity: error
+message: unnecessary import of `$REF` from java.lang
+note: classes in java.lang are implicitly imported
+---
+# yaml-language-server: 
$schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json
+id: impossible-type-inference
+language: java
+# does the javac compiler allow this? did it ever?
+rule:
+  pattern: var $$$ = new $$$<>($$$)
+  kind: local_variable_declaration

Review Comment:
   I understand your explanation: I removed the TODO and renamed rule from 
"impossible" to "confusing" :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add isEmpty in PriorityQueue. [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14814:
URL: https://github.com/apache/lucene/pull/14814#issuecomment-2987489716

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] detect and ban wildcard imports in Java [lucene]

2025-06-19 Thread via GitHub


dweiss commented on PR #14804:
URL: https://github.com/apache/lucene/pull/14804#issuecomment-2987375548

   FYI - spotless runs google java format in a single-thread, that's part of 
the reason it's sluggish. But it's still an order of magnitude slower than the 
c parser. I ran this, for the fun of it (this uses multiple threads for 
parsing):
   ```
   > find . -name "*.java" > files.in
   > wc -l files.in
   5973 files.in
   > time java -jar ~/tmp/google-java-format-1.27.0-all-deps.jar  -n 
--skip-reflowing-long-strings --skip-javadoc-formatting 
--skip-removing-unused-imports @files.in
   real 0m5.338s
   user 2m7.927s
   sys  0m6.549s
   ```
   Compare to tree-sitter's 7 seconds, it's rather pale.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] build: replace invalidJavaOnlyPatterns source-patterns with ast-grep [lucene]

2025-06-19 Thread via GitHub


rmuir commented on code in PR #14812:
URL: https://github.com/apache/lucene/pull/14812#discussion_r2156626300


##
gradle/validation/ast-grep/rules/java-patterns.yml:
##
@@ -0,0 +1,24 @@
+# Banned Lucene source patterns
+# Historically implemented as regexes which are more difficult
+---
+# yaml-language-server: 
$schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json
+id: java-lang-import
+language: java
+rule:
+  pattern: import java.lang.$REF
+  kind: import_declaration
+fix: ""

Review Comment:
   just nvim + lang server. the ast-grep lang-server will provide diagnostics 
and code action fixes.
   
   you can get similar setup easily / without hassle with just vscode + lang 
server. the help/IDEs has the setup procedure. In this case you would also want 
to install and configure ast-grep extension.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Fix 'private' method declared 'final' in PriorityQueue. [lucene]

2025-06-19 Thread via GitHub


vsop-479 opened a new pull request, #14815:
URL: https://github.com/apache/lucene/pull/14815

   ### Description
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


rmuir commented on code in PR #14811:
URL: https://github.com/apache/lucene/pull/14811#discussion_r2156628908


##
gradle/validation/forbidden-apis/defaults.all.txt:
##
@@ -76,3 +76,15 @@ java.lang.Math#fma(float,float,float)
 java.lang.Math#fma(double,double,double)
 
 java.lang.Thread#sleep(**) @ Thread.sleep makes inefficient use of resources, 
introduces weird race conditions and slows down the code/tests. Not a scalable 
and good practice so we should prevent it creeping into lucene code
+
+@defaultMessage Using dangerous ClassLoader APIs may deserialize untrusted 
user input into bytecode, leading to remote code execution vulnerabilities
+java.lang.ClassLoader#defineClass(**)
+jdk.internal.misc.Unsafe#defineClass(**)
+jdk.internal.access.JavaLangAccess#defineClass(**)
+# forbidden complains it can't find this enclosed class
+# java.lang.invoke.MethodHandles.Lookup#defineClass(**)

Review Comment:
   thanks for the explanation, I figured there was a way to catch it, the `$` 
is what I was missing, too rusty with java :)
   
   I will remove some of the redundancies, too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Integrate a JVector codec for KNN searches [lucene]

2025-06-19 Thread via GitHub


jimczi commented on issue #14681:
URL: https://github.com/apache/lucene/issues/14681#issuecomment-2987729521

   Thanks for opening this and for all the work going into expanding vector 
search in Lucene.
   
   That said, I’m a bit concerned about the growing number of options being 
proposed that seem to have similar trade-offs, just implemented differently. 
We’ve added Faiss recently, and now there’s talk of bringing in another 
graph-based method that doesn’t really change the fundamental limitations we 
already have with the HNSW codec.
   
   Things like Vamana or product quantization can definitely be useful, but 
from what I can tell, they offer similar properties to what we already have 
with HNSW + BBQ. It feels like we’re piling on more alternatives without really 
addressing the pain points some users face with graph-based indexing in general.
   
   Also, calling this a “disk-based” solution seems a bit misleading if the 
graph still has to be built fully in memory. That’s often the core problem 
people are trying to get around.
   
   Instead, I think it might be more valuable to dig into the specific 
improvements JVector brings and see if any of those could make our current HNSW 
implementation better. I thought that was already explored in 
[#12615](https://github.com/apache/lucene/issues/12615), and as I recall, there 
wasn’t a strong enough differentiator to justify pulling in the full JVector 
approach. Revisiting that might be a better path forward.
   
   I totally get that OpenSearch wants to contribute their work upstream, and 
that’s great to see. It just feels important that we avoid overwhelming users 
with too many similar graph-based options, especially when they don’t clearly 
solve new problems.
   
   Appreciate the ongoing discussion and all the effort here!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


rmuir merged PR #14811:
URL: https://github.com/apache/lucene/pull/14811


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Convert some complex PriorityQueue implementations to comparators [lucene]

2025-06-19 Thread via GitHub


thecoop opened a new pull request, #14817:
URL: https://github.com/apache/lucene/pull/14817

   A 3rd set of `PriorityQueue` conversions. These ones are more controversial, 
and so need some specific review and discussion


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert some complex PriorityQueue implementations to comparators [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14817:
URL: https://github.com/apache/lucene/pull/14817#issuecomment-2988043480

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-06-19 Thread via GitHub


mikemccand commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2988466651

   Thanks @kaivalnp -- I think this is ready -- I will try to merge later today 
or early tomorrow!  Sandboxy fun experimental Codec...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Expose search strategy in KNN query [lucene]

2025-06-19 Thread via GitHub


tteofili merged PR #14816:
URL: https://github.com/apache/lucene/pull/14816


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Expose search strategy in KNN query [lucene]

2025-06-19 Thread via GitHub


tteofili opened a new pull request, #14816:
URL: https://github.com/apache/lucene/pull/14816

   This simply exposes `AbstractKnnQuery#searchStrategy`. 
   This might be useful to debug/inspect and perform optimizations based on 
different such strategies for consumers of such queries.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] spotlessGradleScripts doesn't work with whitespace-paths on Windows [lucene]

2025-06-19 Thread via GitHub


uschindler commented on issue #14787:
URL: https://github.com/apache/lucene/issues/14787#issuecomment-2988224422

   💩


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove unused delGen() method from FieldTermIterator [lucene]

2025-06-19 Thread via GitHub


romseygeek commented on PR #14807:
URL: https://github.com/apache/lucene/pull/14807#issuecomment-2988226115

   It's a package-private class so not part of the public API - I don't think I 
would normally add a CHANGES entry here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove unused delGen() method from FieldTermIterator [lucene]

2025-06-19 Thread via GitHub


stefanvodita commented on PR #14807:
URL: https://github.com/apache/lucene/pull/14807#issuecomment-2988281014

   Sorry, my bad, you're right!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove unused delGen() method from FieldTermIterator [lucene]

2025-06-19 Thread via GitHub


romseygeek merged PR #14807:
URL: https://github.com/apache/lucene/pull/14807


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


uschindler commented on PR #14811:
URL: https://github.com/apache/lucene/pull/14811#issuecomment-2987071310

   In addition the list on errorprone is incomplete. Since Java 16 there's 
`Lookup#defineHiddenClass(**) & others (see expressions module).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


uschindler commented on PR #14811:
URL: https://github.com/apache/lucene/pull/14811#issuecomment-2987067126

   > Source of the check: 
https://github.com/google/error-prone/blob/master/core/src/main/java/com/google/errorprone/bugpatterns/BanClassLoader.java
   > 
   > It also checks for explicit `extends URLClassLoader`, which isn't handled 
here. I'm not sure what special powers subclassing gives, maybe more methods 
need to be banned.
   > 
   > Also maybe some of these don't need to be banned because they are 
impossible due to java module system? I think google is still on java 8 :)
   
   Exactly, see above. Most of the signatures here can be removed. jdk.internal 
is unreachable unless you pass open-modules parameters to jvaca or jdk. 
Nevertheless to SUpport Java 8, the forbiddenapis has a dynamic 
`jdk-non-portable` bundled signature which disallows all non-documented classes 
like sun.misc or those in `jdk.*` java modules.
   
   About subclassing: A constructor is like a static methos. To forbid a 
constructur you need to give exact class names and ctors, there's no 
inheritance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


uschindler commented on code in PR #14811:
URL: https://github.com/apache/lucene/pull/14811#discussion_r2156332399


##
gradle/validation/forbidden-apis/defaults.all.txt:
##
@@ -76,3 +76,15 @@ java.lang.Math#fma(float,float,float)
 java.lang.Math#fma(double,double,double)
 
 java.lang.Thread#sleep(**) @ Thread.sleep makes inefficient use of resources, 
introduces weird race conditions and slows down the code/tests. Not a scalable 
and good practice so we should prevent it creeping into lucene code
+
+@defaultMessage Using dangerous ClassLoader APIs may deserialize untrusted 
user input into bytecode, leading to remote code execution vulnerabilities
+java.lang.ClassLoader#defineClass(**)
+jdk.internal.misc.Unsafe#defineClass(**)
+jdk.internal.access.JavaLangAccess#defineClass(**)
+# forbidden complains it can't find this enclosed class
+# java.lang.invoke.MethodHandles.Lookup#defineClass(**)

Review Comment:
   In reality most of the signatures here are automatically enabled by default 
(in a more dynamic way).
   
   - `jdk.internal.*` is forbidden by default by disallowing all calls to non 
"public access" modules. See the "jdk-non-portable" bundled signatures: 
https://github.com/policeman-tools/forbidden-apis/wiki/BundledSignatures
   - `sun.misc.Unsafe` is also forbidden by default (same as before: 
https://github.com/policeman-tools/forbidden-apis/wiki/BundledSignatures). 
Adding it here brings more problems than it solves. Because `sun.misc.Unsafe` 
get removed around Java 25 (and especailly that method mentioned here), this 
will cause issues in later Java versions.
   
   Because of that all "internal" and "sun.misc" stuff should be removed here. 
The other ones are fine,



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


uschindler commented on PR #14811:
URL: https://github.com/apache/lucene/pull/14811#issuecomment-2987082206

   Final note abozut subclasses: You need to add all ctors of sublasses of 
Classloader which reside in JDK. Now you might argue that one could define a 
subclass in user code (extending URLClassLoader) to circumvent the check. But 
this is differeent because when you declare that "user defined" classloader you 
have to add a ctor which calls super -> then it detect calling the call to 
superclass' ctor which was forbidden.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] detect and ban wildcard imports in Java [lucene]

2025-06-19 Thread via GitHub


rmuir commented on PR #14804:
URL: https://github.com/apache/lucene/pull/14804#issuecomment-2987658166

   @dweiss yeah, the parsers are generally fast. and the java one in particular 
is very nice. Of course they are doing a lot less, too. 
   
   parser test framework tracks the performance in terms of MB/s and warns if 
the speed gets too low:
   ```
 custom_tag:
 5. ✓ custom tag
 6. ✓ custom tag: inline
 7. ✓ custom tag: missing description
 8. ✓ custom tag: inline, missing description -- Warning: Slow parse 
rate (1671.346 bytes/ms)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Add isEmpty in PriorityQueue. [lucene]

2025-06-19 Thread via GitHub


vsop-479 opened a new pull request, #14814:
URL: https://github.com/apache/lucene/pull/14814

   ### Description
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use PriorityQueue instead of TreeMap in FirstPassGroupingCollector. [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14813:
URL: https://github.com/apache/lucene/pull/14813#issuecomment-2987467232

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Use PriorityQueue instead of TreeMap in FirstPassGroupingCollector. [lucene]

2025-06-19 Thread via GitHub


vsop-479 opened a new pull request, #14813:
URL: https://github.com/apache/lucene/pull/14813

   ### Description
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: replace ComparingThisWithNull with ast-grep rule [lucene]

2025-06-19 Thread via GitHub


rmuir merged PR #14810:
URL: https://github.com/apache/lucene/pull/14810


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] spotlessGradleScripts doesn't work with whitespace-paths on Windows [lucene]

2025-06-19 Thread via GitHub


dweiss closed issue #14787: spotlessGradleScripts doesn't work with 
whitespace-paths on Windows
URL: https://github.com/apache/lucene/issues/14787


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] spotlessGradleScripts doesn't work with whitespace-paths on Windows [lucene]

2025-06-19 Thread via GitHub


dweiss commented on issue #14787:
URL: https://github.com/apache/lucene/issues/14787#issuecomment-2987473386

   It's in a plugin that coordinates eclipse-gradle interaction. The bug is 
simple but I don't think I want to go down and try to fix that particular code, 
sorry - too hairy.
   
   
https://github.com/equodev/equo-ide/blob/main/solstice/src/main/java/dev/equo/solstice/ShimBundle.java#L238-L239
   
   I'll close the issue if it works (even with a nasty stack trace). Maybe we 
can come up with some other alternatives for formatting groovy/gradle code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add isEmpty in PriorityQueue. [lucene]

2025-06-19 Thread via GitHub


stefanvodita commented on code in PR #14814:
URL: https://github.com/apache/lucene/pull/14814#discussion_r2156665168


##
lucene/core/src/java/org/apache/lucene/util/PriorityQueue.java:
##
@@ -342,4 +342,13 @@ public T next() {
   }
 };
   }
+
+  /**
+   * Returns {@code true} if this list contains no elements.

Review Comment:
   Should we just keep the other line in the comment?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix 'private' method declared 'final' in PriorityQueue. [lucene]

2025-06-19 Thread via GitHub


stefanvodita merged PR #14815:
URL: https://github.com/apache/lucene/pull/14815


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Compression cache of numeric docvalues [lucene]

2025-06-19 Thread via GitHub


gf2121 commented on issue #14803:
URL: https://github.com/apache/lucene/issues/14803#issuecomment-2987035325

   OLAP engines splits format to `codec` and `compression`, both configurable. 
For example, you can:
   
   * Use `ForUtil` codec and `LZ4` compression in normal filesystem, cache 
managed by engine.
   * Use `ForUtil` codec and `None` compression in compression filesystem, 
cache managed by operating system.
   
   I guess the problem here is Lucene is wrapping every thing in its `Codec`, 
so some structure is compressed twice in a compression filesystem while others 
expand much in a normal filesystem. And it seems difficult for users to realize 
how much a compressed filesystem affects Lucene numeric docvalue. This is, 
perhaps, leaving too much responsibility on the user side. 
   
   My feeling is still we have something to do here, but i don't really know 
how to do it correctly. Let's keep it :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


uschindler commented on code in PR #14811:
URL: https://github.com/apache/lucene/pull/14811#discussion_r2156332399


##
gradle/validation/forbidden-apis/defaults.all.txt:
##
@@ -76,3 +76,15 @@ java.lang.Math#fma(float,float,float)
 java.lang.Math#fma(double,double,double)
 
 java.lang.Thread#sleep(**) @ Thread.sleep makes inefficient use of resources, 
introduces weird race conditions and slows down the code/tests. Not a scalable 
and good practice so we should prevent it creeping into lucene code
+
+@defaultMessage Using dangerous ClassLoader APIs may deserialize untrusted 
user input into bytecode, leading to remote code execution vulnerabilities
+java.lang.ClassLoader#defineClass(**)
+jdk.internal.misc.Unsafe#defineClass(**)
+jdk.internal.access.JavaLangAccess#defineClass(**)
+# forbidden complains it can't find this enclosed class
+# java.lang.invoke.MethodHandles.Lookup#defineClass(**)

Review Comment:
   In reality most of the signatures here are automatically enabled by default 
(in a more dynamic way).
   
   - `jdk.internal.*` is forbidden by default by disallowing all calls to non 
"public access" modules. See the "jdk-non-portable" bundled signatures: 
https://github.com/policeman-tools/forbidden-apis/wiki/BundledSignatures
   - `sun.misc.Unsafe` is also forbidden by default (same as before: 
https://github.com/policeman-tools/forbidden-apis/wiki/BundledSignatures). 
Adding it here brings more problems than it solves. Because `sun.misc.Unsafe` 
get removed around Java 25 (and especailly that method mentioned here), this 
will cause issues in later Java versions.
   
   Because of that all "internal" and "sun.misc" stuff should be removed here. 
The other ones are fine, 
`java.lang.invoke.MethodHandles.Lookup#defineClass(**)` needs to be 
`java.lang.invoke.MethodHandles$Lookup#defineClass(**)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix 'private' method declared 'final' in PriorityQueue. [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14815:
URL: https://github.com/apache/lucene/pull/14815#issuecomment-2987503989

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14811:
URL: https://github.com/apache/lucene/pull/14811#issuecomment-2987524629

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


uschindler commented on code in PR #14811:
URL: https://github.com/apache/lucene/pull/14811#discussion_r2156654935


##
gradle/validation/forbidden-apis/defaults.all.txt:
##
@@ -76,3 +76,15 @@ java.lang.Math#fma(float,float,float)
 java.lang.Math#fma(double,double,double)
 
 java.lang.Thread#sleep(**) @ Thread.sleep makes inefficient use of resources, 
introduces weird race conditions and slows down the code/tests. Not a scalable 
and good practice so we should prevent it creeping into lucene code
+
+@defaultMessage Using dangerous ClassLoader APIs may deserialize untrusted 
user input into bytecode, leading to remote code execution vulnerabilities
+java.lang.ClassLoader#defineClass(**)
+jdk.internal.misc.Unsafe#defineClass(**)
+jdk.internal.access.JavaLangAccess#defineClass(**)
+# forbidden complains it can't find this enclosed class
+# java.lang.invoke.MethodHandles.Lookup#defineClass(**)

Review Comment:
   All fine. I think thats documented somewhere. But basically the signatures 
files need the "binary name" of all classes.
   I will add the 
[wiki](https://github.com/policeman-tools/forbidden-apis/wiki/SignaturesSyntax) 
page to explicitely give an example (in addition to `java.lang.String`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use PriorityQueue instead of TreeMap in FirstPassGroupingCollector. [lucene]

2025-06-19 Thread via GitHub


vsop-479 commented on PR #14813:
URL: https://github.com/apache/lucene/pull/14813#issuecomment-2987535916

   Benchmark: 
   ```
   Benchmark   Mode  Cnt   ScoreError   
Units
   PriorityQueueBenchmark.getTopNWithHeap thrpt   15  ≈ 10⁻³   
ops/us
   PriorityQueueBenchmark.getTopNWithTreeMap  thrpt   15  ≈ 10⁻⁴   
ops/us
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #14811:
URL: https://github.com/apache/lucene/pull/14811#issuecomment-2987598490

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog label to 
it and you will stop receiving this reminder on future updates to the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


uschindler commented on code in PR #14811:
URL: https://github.com/apache/lucene/pull/14811#discussion_r2156684715


##
gradle/validation/forbidden-apis/defaults.all.txt:
##
@@ -76,3 +76,9 @@ java.lang.Math#fma(float,float,float)
 java.lang.Math#fma(double,double,double)
 
 java.lang.Thread#sleep(**) @ Thread.sleep makes inefficient use of resources, 
introduces weird race conditions and slows down the code/tests. Not a scalable 
and good practice so we should prevent it creeping into lucene code
+
+@defaultMessage Using dangerous ClassLoader APIs may deserialize untrusted 
user input into bytecode, leading to remote code execution vulnerabilities
+java.lang.ClassLoader#defineClass(**)
+java.lang.invoke.MethodHandles$Lookup#defineClass(**)
+java.rmi.server.RMIClassLoader
+java.net.URLClassLoader#(**)

Review Comment:
   I think this can be removed, because when subclassing you need to call 
`defineClass()` anyways.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


rmuir commented on PR #14811:
URL: https://github.com/apache/lucene/pull/14811#issuecomment-2987609996

   With help of @uschindler, this check is a bit different than how the 
error-prone rules implement it, but more thorough. 
   
   The stated goal of the error-prone check is to prevent turning input into 
code, but it did not really work for us: it only introduced a false positive in 
Luke but did not find any issue with expressions!
   
   So these rules instead target methods that define new code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


uschindler commented on PR #14811:
URL: https://github.com/apache/lucene/pull/14811#issuecomment-2987855642

   Thanks Robert! As always: nice discussions. I will improve the signatures 
file documentation to give a hint how the inner class binary names look like 
(give an example).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]

2025-06-19 Thread via GitHub


uschindler commented on code in PR #14811:
URL: https://github.com/apache/lucene/pull/14811#discussion_r2156288518


##
gradle/validation/forbidden-apis/defaults.all.txt:
##
@@ -76,3 +76,15 @@ java.lang.Math#fma(float,float,float)
 java.lang.Math#fma(double,double,double)
 
 java.lang.Thread#sleep(**) @ Thread.sleep makes inefficient use of resources, 
introduces weird race conditions and slows down the code/tests. Not a scalable 
and good practice so we should prevent it creeping into lucene code
+
+@defaultMessage Using dangerous ClassLoader APIs may deserialize untrusted 
user input into bytecode, leading to remote code execution vulnerabilities
+java.lang.ClassLoader#defineClass(**)
+jdk.internal.misc.Unsafe#defineClass(**)
+jdk.internal.access.JavaLangAccess#defineClass(**)
+# forbidden complains it can't find this enclosed class
+# java.lang.invoke.MethodHandles.Lookup#defineClass(**)

Review Comment:
   Needs to be MethodHandles$Lookup, this is how the real class name looks like.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] build: replace invalidJavaOnlyPatterns source-patterns with ast-grep [lucene]

2025-06-19 Thread via GitHub


dweiss commented on code in PR #14812:
URL: https://github.com/apache/lucene/pull/14812#discussion_r2156296401


##
gradle/validation/ast-grep/rules/java-patterns.yml:
##
@@ -0,0 +1,24 @@
+# Banned Lucene source patterns
+# Historically implemented as regexes which are more difficult
+---
+# yaml-language-server: 
$schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json
+id: java-lang-import
+language: java
+rule:
+  pattern: import java.lang.$REF
+  kind: import_declaration
+fix: ""
+severity: error
+message: unnecessary import of `$REF` from java.lang
+note: classes in java.lang are implicitly imported
+---
+# yaml-language-server: 
$schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json
+id: impossible-type-inference
+language: java
+# does the javac compiler allow this? did it ever?
+rule:
+  pattern: var $$$ = new $$$<>($$$)
+  kind: local_variable_declaration

Review Comment:
   I think this can actually be valid code, as in:
   ```
   var list = new ArrayList();
   var foo = new ArrayList<>(list);
   ```
   which compiles just fine but breaks the rule. This pattern doesn't exist 
anywhere in Lucene - I'm actually surprised it doesn't - I use this quite a lot 
elsewhere.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] build: replace invalidJavaOnlyPatterns source-patterns with ast-grep [lucene]

2025-06-19 Thread via GitHub


dweiss commented on code in PR #14812:
URL: https://github.com/apache/lucene/pull/14812#discussion_r2156302413


##
gradle/validation/ast-grep/rules/java-patterns.yml:
##
@@ -0,0 +1,24 @@
+# Banned Lucene source patterns
+# Historically implemented as regexes which are more difficult
+---
+# yaml-language-server: 
$schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json
+id: java-lang-import
+language: java
+rule:
+  pattern: import java.lang.$REF
+  kind: import_declaration
+fix: ""

Review Comment:
   > in editor it will show up as light-bulb, user can choose fix as a code 
action.
   
   Can you elaborate on what's your toolstack? Is this the lang server in the 
background? What editor is it? Just curious - it looks very interesting.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org