[PR] Utilize `docIdRunEnd` on `ReqExclBulkScorer` [lucene]
HUSTERGS opened a new pull request, #14806: URL: https://github.com/apache/lucene/pull/14806 ### Description This PR propose to utilize `docIdRunEnd` on `ReqExclBulkScorer`, so we can jump faster on `MUST_NOT` clause -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Utilize `docIdRunEnd` on `ReqExclBulkScorer` [lucene]
github-actions[bot] commented on PR #14806: URL: https://github.com/apache/lucene/pull/14806#issuecomment-2982992815 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2983643568 The reason it's never up to date with a custom formatting step: https://github.com/diffplug/spotless/issues/2516 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2983964771 It is wicked fast indeed (uses all cores). ``` dweiss@dweiss-beast:~/work/apache/lucene$ time ast-grep scan . -r /home/dweiss/tmp/no-wildcard-imports.sg.yaml ... real 0m0.738s user 0m6.995s sys 0m0.269s ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Utilize `docIdRunEnd` on `ReqExclBulkScorer` [lucene]
gf2121 commented on code in PR #14806: URL: https://github.com/apache/lucene/pull/14806#discussion_r2154439718 ## lucene/core/src/java/org/apache/lucene/search/ReqExclBulkScorer.java: ## @@ -57,7 +57,10 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max) thr exclDoc = exclApproximation.advance(upTo); } if (exclDoc == upTo) { -if (exclTwoPhase == null || exclTwoPhase.matches()) { +if (exclTwoPhase == null) { + // from upTo to docIdRunEnd() are excluded, so we scored up to docIdRunEnd() + upTo = exclApproximation.docIDRunEnd(); +} else if (exclTwoPhase.matches()) { Review Comment: Is `upTo = Math.max(upTo + 1, exclTwoPhase.docIdRunEnd())` correct here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Utilize `docIdRunEnd` on `ReqExclBulkScorer` [lucene]
gf2121 commented on code in PR #14806: URL: https://github.com/apache/lucene/pull/14806#discussion_r2154454156 ## lucene/core/src/java/org/apache/lucene/search/ReqExclBulkScorer.java: ## @@ -57,7 +57,10 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max) thr exclDoc = exclApproximation.advance(upTo); } if (exclDoc == upTo) { -if (exclTwoPhase == null || exclTwoPhase.matches()) { +if (exclTwoPhase == null) { + // from upTo to docIdRunEnd() are excluded, so we scored up to docIdRunEnd() + upTo = exclApproximation.docIDRunEnd(); Review Comment: Do we need to check `exclApproximation.docID() != DocIdSetIterator#NO_MORE_DOCS`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Make `pack` methods public for `BigIntegerPoint` and `HalfFloatPoint` [lucene]
jpountz merged PR #14784: URL: https://github.com/apache/lucene/pull/14784 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add prefetching for terms dict in doc values [lucene]
jpountz commented on PR #14773: URL: https://github.com/apache/lucene/pull/14773#issuecomment-2983993101 Doing this is a bit less obvious in my mind since terms dictionaris are allowed to have a random access pattern, when doc-value iterators are required to be consumed in order? Somewhat separately, I've been wondering if we need to APIs, one for saying "I'm going to use this field" and pass the full range of data, and another one to say "I'm going to use this specific range of bytes", which would be what `IndexInput#prefetch` does today. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add prefetching for terms dict in doc values [lucene]
easyice closed pull request #14773: Add prefetching for terms dict in doc values URL: https://github.com/apache/lucene/pull/14773 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add prefetching for terms dict in doc values [lucene]
easyice commented on PR #14773: URL: https://github.com/apache/lucene/pull/14773#issuecomment-2984815277 Thanks for the explanation, You're right, I'll close this PR. I feel an API to prefetch full range of data could be useful, e.g. for files like `.tip`, which are relatively small but have a significant impact on read performance? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Utilize `docIdRunEnd` on `ReqExclBulkScorer` [lucene]
github-actions[bot] commented on PR #14806: URL: https://github.com/apache/lucene/pull/14806#issuecomment-2983425250 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Utilize `docIdRunEnd` on `ReqExclBulkScorer` [lucene]
github-actions[bot] commented on PR #14806: URL: https://github.com/apache/lucene/pull/14806#issuecomment-2983430347 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Utilize `docIdRunEnd` on `ReqExclBulkScorer` [lucene]
HUSTERGS commented on code in PR #14806: URL: https://github.com/apache/lucene/pull/14806#discussion_r2154727636 ## lucene/core/src/java/org/apache/lucene/search/ReqExclBulkScorer.java: ## @@ -57,7 +57,10 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max) thr exclDoc = exclApproximation.advance(upTo); } if (exclDoc == upTo) { -if (exclTwoPhase == null || exclTwoPhase.matches()) { +if (exclTwoPhase == null) { + // from upTo to docIdRunEnd() are excluded, so we scored up to docIdRunEnd() + upTo = exclApproximation.docIDRunEnd(); +} else if (exclTwoPhase.matches()) { Review Comment: I think it's correct here, and I run lucene test locally, everything looks fine. What I'm concerned here is that only `DocValuesRangeIterator` implement it's own `docIdRunEnd` method for now, I'm not sure whether adding this will hurt the performance or not -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2983757840 Error prone has a rule for this, actually - ``` diff --git a/build-tools/build-infra/src/main/groovy/lucene.validation.error-prone.gradle b/build-tools/build-infra/src/main/groovy/lucene.validation.error-prone.gradle index 49a4a067af6..1070346a19f 100644 --- a/build-tools/build-infra/src/main/groovy/lucene.validation.error-prone.gradle +++ b/build-tools/build-infra/src/main/groovy/lucene.validation.error-prone.gradle @@ -331,6 +331,7 @@ allprojects { prj -> '-Xep:UnsafeWildcard:ERROR', '-Xep:UnusedAnonymousClass:ERROR', '-Xep:UnusedCollectionModifiedInPlace:ERROR', + "-Xep:WildcardImport:ERROR", '-Xep:VarTypeName:ERROR', // '-Xep:WrongOneof:OFF', // we don't use protobuf '-Xep:XorPower:ERROR', ``` I know we only run it on a CI but ast-grep requires an external tool, so maybe it's a simpler/ better alternative? This is what the output looks like: ``` > Task :lucene:core:compileJava /home/dweiss/mounts/ocean/work/apache/lucene/lucene/core/src/java/org/apache/lucene/internal/hppc/IntIntHashMap.java:20: error: [WildcardImport] Wildcard imports, static or otherwise, should not be used import static org.apache.lucene.internal.hppc.HashContainers.*; ^ (see https://google.github.io/styleguide/javaguide.html?cl=head#s3.3.1-wildcard-imports) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Introduce getQuantizedVectorValues method in LeafReader to access QuantizedByteVectorValues [lucene]
benwtrent commented on PR #14792: URL: https://github.com/apache/lucene/pull/14792#issuecomment-2983852051 @Pulkitg64 I think the format has access to the quantized vectors. I am saying we shouldn't add a new `getQuantizedVEctors` to the leaf or kNN APIs. When the `vec` file isn't present, the format should still return a `FloatVectorValues`, and when folks call `float[] getVector(int ord)`, they get the de-quantized representation of the vector. This should require no new public facing APIs anywhere. Here is the idea. ``` public FloatVectorValues getFloatVectorValues(String field) throws IOException { return OffHeapQuantizedFloatVectorValues.load(...); } ... static class OffHeapQuantizedFloatVectorValues extends FloatVectorValues { final float[] scratch; int curOrd = -1; ... public float[] vectorValue(int ord) throws IOException { if (ord == curOrd) return scratch; dequantize(ord); curOrd = ord; return scratch; } private void dequantize(int ord) throws IOException { //read quantized values and parameters // dequantize into scratch } } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
rmuir commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2983891154 yeah, that's fine. we can discuss it separately maybe. Elsewhere I use the tool a lot, replace entire slow linters with it! It is wicked fast... there's a strange cultural thing at play here, with all the other programming languages, "external tools" written in rust are heavily adopted, makes everything so fast on the dev side. It isn't just python here but e.g. microsoft rewriting typescript in golang, bun, etc. with java, and only java, there's this strange NIH cultural thing where all tooling must itself also be written in java. I don't understand it, but it makes everything really slow :) The recent "git" is a great example, insanely fast tool designed by literally, the best, yet there's pressure to use a crappy java substitute. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2983916752 I have noticed this as well - in the js world most (fast) things use a rust backend. Anyway, I'll fiddle with this a bit, it's not a high priority thing. Maybe make it optionally available, if the tool is present. I'll see. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2982832608 I'll take a look, sure. I'll also try to figure out why the heck spotless isn't behaving the way it should. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Introduce getQuantizedVectorValues method in LeafReader to access QuantizedByteVectorValues [lucene]
Pulkitg64 commented on PR #14792: URL: https://github.com/apache/lucene/pull/14792#issuecomment-2984235394 Oh I understand now what you meant in your comments. This approach is much more cleaner and doesn't require any new API addition. Will raise a new revision in some time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] .editorconfig [lucene]
dsmiley commented on PR #14740: URL: https://github.com/apache/lucene/pull/14740#issuecomment-2984277753 The existing editorconfig was redundant with information in this editorconfig, so I simply removed it. I'll merge this tonight. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Remove unused delGen() method from FieldTermIterator [lucene]
romseygeek opened a new pull request, #14807: URL: https://github.com/apache/lucene/pull/14807 This was used in the past to correctly order terms in MergedPrefixCodedTermsIterator, but that class was removed long ago so the method is no longer needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Remove unused delGen() method from FieldTermIterator [lucene]
github-actions[bot] commented on PR #14807: URL: https://github.com/apache/lucene/pull/14807#issuecomment-2984306433 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2984368443 I've added a task that runs ast-grep on the CI and locally, if you define ```lucene.tool.ast-grep``` property (for example, ```lucene.tool.ast-grep=ast-grep``` or permanently, in local build options or gradle properties).  Opens up the possibility to add new rules too (gradle/validation/ast-grep). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2984370238 I'll try to finalize this patch later today, have to leave now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Utilize `docIdRunEnd` on `ReqExclBulkScorer` [lucene]
HUSTERGS commented on code in PR #14806: URL: https://github.com/apache/lucene/pull/14806#discussion_r2154711602 ## lucene/core/src/java/org/apache/lucene/search/ReqExclBulkScorer.java: ## @@ -57,7 +57,10 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max) thr exclDoc = exclApproximation.advance(upTo); } if (exclDoc == upTo) { -if (exclTwoPhase == null || exclTwoPhase.matches()) { +if (exclTwoPhase == null) { + // from upTo to docIdRunEnd() are excluded, so we scored up to docIdRunEnd() + upTo = exclApproximation.docIDRunEnd(); Review Comment: If I understand correctly, `exclApproximation.docID()` should equals `exclDoc`, which is equal to `upTo` (under the `if` clause ), and `upTo` is less than `max`, so `exclApproximation.docID()` should never be `DocIdSetIterator#NO_MORE_DOCS`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Introduce getQuantizedVectorValues method in LeafReader to access QuantizedByteVectorValues [lucene]
Pulkitg64 commented on PR #14792: URL: https://github.com/apache/lucene/pull/14792#issuecomment-2983706927 Thanks @benwtrent @msokolov > Again, I think we should do the nice thing, de-quantize the vectors as the user asks for them. Sorry, I am surely missing something. But if we can't access the quantized vector then how can we do this? We still need byte vector to rehydrate float vectors, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
rmuir commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2984212372 In my builds, I install such tools via `pip` or `npm` package managers. So it's no different than downloading jar and using that, just faster. Just food for thought. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
rmuir commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2984219215 The python linting, formatting, type checking in dev tools is an example. None of those are written in the python programming language. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Compression cache of numeric docvalues [lucene]
rmuir commented on issue #14803: URL: https://github.com/apache/lucene/issues/14803#issuecomment-2985347971 The advantage of letting a filesystem such as zfs (which was designed to do exactly this), is that it is integrated in the correct place and operating system caches work as expected. It is best to let the OS handle the caching, it will do a better job. Lucene caching the docvalues to me is just a step backwards to "fieldcache" which did not work well and caused a lot of operational pain. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Add an optional bandwidth cap to `TieredMergePolicy`? [lucene]
nipunbatra8 commented on issue #14148: URL: https://github.com/apache/lucene/issues/14148#issuecomment-2985361444 Hello, I will be diving into this issue for my summer internship. Looking forward to discussing more in the near future and contributing to this project! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
rmuir commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2985443802 Another related idea here is to prevent any IDE from automatically adding the *'s we don't want. Eclipse's default limit is 99 imports before it starts doing this... of course user can change, and some rare files might exceed this. there's a way to bump it to huge value (e.g. Integer.MAX_VALUE) in our eclipse settings, i think, but i don't have a concrete solution: https://github.com/eclipse-jdtls/eclipse.jdt.ls/blob/0b3651f030e2e68e43019268f35880a07aa7a8c5/org.eclipse.jdt.ls.core/src/org/eclipse/jdt/ls/core/internal/preferences/Preferences.java#L345-L367 I don't know how intellij behaves here or if we can prevent similar behavior. All this stuff can be followups, really, I'm just brainstorming. Thanks for the ast-grep though!!! Gives me a way to help speed up the checks :) I will leave a few comments, not review ones intended to block the change. I can help out with followup PRs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
rmuir commented on code in PR #14804: URL: https://github.com/apache/lucene/pull/14804#discussion_r2155360626 ## gradle/validation/ast-grep/sgconfig.yml: ## @@ -0,0 +1,2 @@ +ruleDirs: + - ./rules Review Comment: I've found the built-in self-test to be essential for having and keeping correct rules. It takes milliseconds and they are easy to write. ```suggestion - ./rules testConfigs: - testDir: ./tests ``` I run these with `ast-grep test --skip-snapshot-tests` before the `ast-grep scan`. For a rule like this one, I would add tests (untested!) like this: `tests/errorprone.yml` (which would test rules in `rules/errorprone.yml`. I organize them this way with `---` yaml doc separator between the rules in both files. For multiline tests, I use yaml literal style (`|`): https://yaml.org/spec/1.2-old/spec.html#id2795688 ```yaml --- # test wildcard import detector id: wildcard-import-not-allowed valid: - import foo.bar.Baz; - import static foo.bar.Baz; invalid: - import foo.bar.*; - import static foo.bar.*; --- # test for another error prone rule id: some-other-errorprone-rule invalid: - | if (x.equals(null) { } - | foo = x.equals(null) ? 1 : 2; # ... ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Add support for running ast-grep rules [lucene]
dweiss opened a new pull request, #14808: URL: https://github.com/apache/lucene/pull/14808 This is a spinoff from #14804. We should allow running ```ast-grep``` (https://ast-grep.github.io/) rules as validation checks. This is a powerful tool (and fast). For now, running ast-grep is optional. You need to install ast-grep manually and then set this build option to the location (or name) of of the ast-grep tool: ``` lucene.tool.ast-grep=ast-grep ``` I recommend doing this in ```build-options.local.properties```. If this build option is _not set_, the entire task (ruleset) is ignored and a warning is printed. I've set up CI jobs on github to install and run ast-grep as part of pull requests and on-commit checks for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for running ast-grep rules [lucene]
github-actions[bot] commented on PR #14808: URL: https://github.com/apache/lucene/pull/14808#issuecomment-2985485459 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2985485780 I've forked the ast-grep thing into a separate PR - https://github.com/apache/lucene/pull/14808. I think we can merge that one in, then I can apply a smaller patch for wildcards, using your rule. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on code in PR #14804: URL: https://github.com/apache/lucene/pull/14804#discussion_r2155370086 ## gradle/validation/ast-grep/sgconfig.yml: ## @@ -0,0 +1,2 @@ +ruleDirs: + - ./rules Review Comment: > I run these with ast-grep test --skip-snapshot-tests before the ast-grep scan. We can integrate this check into gradle as well. These are nice indeed - I've seen the docs on the tool. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] build: replace invalidJavaOnlyPatterns source-patterns with ast-grep [lucene]
rmuir commented on PR #14812: URL: https://github.com/apache/lucene/pull/14812#issuecomment-2986074641 This is just starting small. I really want to next clean up some of the rat interaction in this file, especially given the changing APIs and thread-unsafeness that makes the gradle code complex. A lot of the logic around license/headers can be done here instead, especially since it isn't a "legal" check (rat-sources does that). I already use queries to match these headers in my editor and hide, e.g.: ```scheme ;; fold and initially close license headers (program (block_comment) @fold @foldclose (#lua-match? @foldclose "^/[*%s]*Licensed to the Apache.*")) ``` We can do similar stuff with yaml rules and ensure licenses aren't javadoc, come before package declaration, whatever all is going on here, and simplify the build. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on code in PR #14804: URL: https://github.com/apache/lucene/pull/14804#discussion_r2155429690 ## gradle/validation/ast-grep/sgconfig.yml: ## @@ -0,0 +1,2 @@ +ruleDirs: + - ./rules Review Comment: I've added it, nice. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for running ast-grep rules [lucene]
rmuir commented on PR #14808: URL: https://github.com/apache/lucene/pull/14808#issuecomment-2985574748 Thank you @dweiss, I was just slow, was testing it out. I set the property locally, and added rule and test: ```yaml --- # yaml-language-server: $schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/rule.json id: java-syntax language: java rule: kind: ERROR message: Syntax Error severity: error ``` ```yaml --- # test the parser is working id: java-syntax valid: - x = 1; ``` I introduced bogus syntax to a file and the build then failed quickly on the problem (it is just an example and we probably should not do this): ``` > Task :testAstGrepRules Running 1 tests --- Case Details --- PASS java-syntax . test result: ok. 1 passed; 0 failed; > Task :applyAstGrepRules error[java-syntax]: Syntax Error ┌─ lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:145:50 │ 145 │ public List getLeafContexts([) { │ ^ Error: 1 error(s) found in code. Help: Scan succeeded and found error level diagnostics in the codebase. > Task :applyAstGrepRules FAILED ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
rmuir commented on code in PR #14804: URL: https://github.com/apache/lucene/pull/14804#discussion_r2155451618 ## gradle/validation/ast-grep/rules/no-wildcard-imports.sg.yaml: ## @@ -0,0 +1,16 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json +id: wildcard-import-not-allowed +language: java +rule: + kind: asterisk + inside: +kind: import_declaration +severity: error +message: don't use wildcard imports +note: please use full imports instead +url: https://google.github.io/styleguide/javaguide.html?cl=head#s3.3.1-wildcard-imports Review Comment: I can see pointing at the style guide, but as far as being practical, i wonder if we can name the file errorprone.yml and name the checks accordingly (e.g. `WildcardImport` and link to their https://errorprone.info/bugpattern/WildcardImport url. It is just how i've been organizing them, after seeing others use similar approach. I also think the errorprone URL is more helpful to end-user in fixing such issues (specs are more difficult to read), but that's just me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2985617584 Ok. I think this one is ready. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] error-prone: replace ComparingThisWithNull with ast-grep rule [lucene]
github-actions[bot] commented on PR #14810: URL: https://github.com/apache/lucene/pull/14810#issuecomment-2985745189 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] error-prone: replace ComparingThisWithNull with ast-grep rule [lucene]
rmuir opened a new pull request, #14810: URL: https://github.com/apache/lucene/pull/14810 This is just an example of a dead-simple check to cross off the list. organize the error-prone rules into an errorprone.yml file with matching linter IDs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Correct rat source patterns to include build-infra sources [lucene]
dweiss merged PR #14809: URL: https://github.com/apache/lucene/pull/14809 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
rmuir commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2985386981 Thanks for playing with it. I would be happy to try to write ast-grep rules to try to replace some of the slower error-prone checks, or whatever else (nocommit checks etc). We could install the package via pip or npm in the GH actions to support the checker. At just a glance, many of the rules are basically trivial e.g. https://errorprone.info/bugpattern/EqualsNull. From what I remember, each check has a high overhead for errorprone which I don't believe is the case for ast-grep. I know I'm oversimplifying it, but I assume each file is parsed only once and then all rules run similar to https://tree-sitter.github.io/tree-sitter/using-parsers/queries/1-syntax.html, but without the s-expressions :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]
rmuir opened a new pull request, #14811: URL: https://github.com/apache/lucene/pull/14811 This rule is attempting to ban dangerous usages of ClassLoader which could be security hazard (IMO a good idea). forbidden APIs is a good tool for banning, just like we use forbidden-apis to ban JNDI and other things. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]
github-actions[bot] commented on PR #14811: URL: https://github.com/apache/lucene/pull/14811#issuecomment-2985810339 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]
rmuir commented on code in PR #14811: URL: https://github.com/apache/lucene/pull/14811#discussion_r2155576445 ## gradle/validation/forbidden-apis/defaults.all.txt: ## @@ -76,3 +76,15 @@ java.lang.Math#fma(float,float,float) java.lang.Math#fma(double,double,double) java.lang.Thread#sleep(**) @ Thread.sleep makes inefficient use of resources, introduces weird race conditions and slows down the code/tests. Not a scalable and good practice so we should prevent it creeping into lucene code + +@defaultMessage Using dangerous ClassLoader APIs may deserialize untrusted user input into bytecode, leading to remote code execution vulnerabilities +java.lang.ClassLoader#defineClass(**) +jdk.internal.misc.Unsafe#defineClass(**) +jdk.internal.access.JavaLangAccess#defineClass(**) +# forbidden complains it can't find this enclosed class +# java.lang.invoke.MethodHandles.Lookup#defineClass(**) Review Comment: @uschindler if you have any ideas, i'd love to fix this. This is just using the same banned list that error-prone uses, except as forbidden signatures. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]
github-actions[bot] commented on PR #14811: URL: https://github.com/apache/lucene/pull/14811#issuecomment-2985834553 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] error-prone: implement BanClassLoader with forbidden-apis instead [lucene]
rmuir commented on PR #14811: URL: https://github.com/apache/lucene/pull/14811#issuecomment-2985845882 Source of the check: https://github.com/google/error-prone/blob/master/core/src/main/java/com/google/errorprone/bugpatterns/BanClassLoader.java It also checks for explicit `extends URLClassLoader`, which isn't handled here. I'm not sure what special powers subclassing gives, maybe more methods need to be banned. Also maybe some of these don't need to be banned because they are impossible due to java module system? I think google is still on java 8 :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for running ast-grep rules [lucene]
dweiss merged PR #14808: URL: https://github.com/apache/lucene/pull/14808 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for running ast-grep rules [lucene]
dweiss commented on PR #14808: URL: https://github.com/apache/lucene/pull/14808#issuecomment-2985638437 Sorry for not waiting, I wanted to proceed. Please open a new issue with those rules. This one looks very neat indeed - we can even force it to run prior to javac. It's a C based parser for java... remember jikes? :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Fail spotless check for wildcard imports [lucene]
dweiss closed issue #14553: Fail spotless check for wildcard imports URL: https://github.com/apache/lucene/issues/14553 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss merged PR #14804: URL: https://github.com/apache/lucene/pull/14804 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on code in PR #14804: URL: https://github.com/apache/lucene/pull/14804#discussion_r2155467101 ## gradle/validation/ast-grep/rules/no-wildcard-imports.sg.yaml: ## @@ -0,0 +1,16 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json +id: wildcard-import-not-allowed +language: java +rule: + kind: asterisk + inside: +kind: import_declaration +severity: error +message: don't use wildcard imports +note: please use full imports instead +url: https://google.github.io/styleguide/javaguide.html?cl=head#s3.3.1-wildcard-imports Review Comment: Thanks for coming up with this great idea - it seems like it opens up a lot of new possibilities. Definitely better than the custom formatter step I initially added. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
rmuir commented on code in PR #14804: URL: https://github.com/apache/lucene/pull/14804#discussion_r2155464257 ## gradle/validation/ast-grep/rules/no-wildcard-imports.sg.yaml: ## @@ -0,0 +1,16 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json +id: wildcard-import-not-allowed +language: java +rule: + kind: asterisk + inside: +kind: import_declaration +severity: error +message: don't use wildcard imports +note: please use full imports instead +url: https://google.github.io/styleguide/javaguide.html?cl=head#s3.3.1-wildcard-imports Review Comment: let's just commit yours, i'll try to see if i can find some easy-wins in the errorprone rule list and organize while I'm there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Correct rat source patterns to include build-infra sources [lucene]
dweiss opened a new pull request, #14809: URL: https://github.com/apache/lucene/pull/14809 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2985545202 > We should be able to make it easier for IntelliJ to do the right thing without too much trouble. One option is to use `gradle-idea-ext`, I've never used this plugin - have it on my "to-check" list (!). It would be awesome to provide more reasonable IntelliJ defaults. My experience with using google-java-format plugin is that once you enable it, the import section formatting settings are completely ignored... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for running ast-grep rules [lucene]
dweiss commented on PR #14808: URL: https://github.com/apache/lucene/pull/14808#issuecomment-2985547456 I'll allow myself to merge this in without a review. I don't think anybody will object (but if anybody does, shout out). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
rmuir commented on code in PR #14804: URL: https://github.com/apache/lucene/pull/14804#discussion_r2155375027 ## gradle/validation/ast-grep/rules/no-wildcard-imports.sg.yaml: ## @@ -0,0 +1,15 @@ +id: wildcard-import-not-allowed Review Comment: ```suggestion # yaml-language-server: $schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json id: wildcard-import-not-allowed ``` I add the comment above each rule, users with https://github.com/redhat-developer/yaml-language-server get autocomplete / validation / hover documentation in editors. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
msfroh commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2985537756 > I don't know how intellij behaves here or if we can prevent similar behavior. All this stuff can be followups, really, I'm just brainstorming. We should be able to make it easier for IntelliJ to do the right thing without too much trouble. One option is to use `gradle-idea-ext`, which is what we do on OpenSearch ([ref](https://github.com/opensearch-project/OpenSearch/blob/6ff44d9b5e1f1f6f22b9935d21e408957383e940/gradle/ide.gradle#L67)). Another option is to use a `.editorconfig` file, where there are some IntelliJ-specific extensions (like `ij_java_names_count_to_use_import_on_demand`) that can crank up the threshold for wildcard imports. I can take a shot at a separate PR to help out there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on PR #14804: URL: https://github.com/apache/lucene/pull/14804#issuecomment-2985418128 Well, if this patch goes in, all it takes to add a rule is to create a new file in gradle/validation/ast-grep - it should be picked up automatically. We can also make ast-grep a requirement (much like python, perl and git) in the long term if it proves to be beneficial. I've taken a look at the tree sitter too, very interesting stuff. Gotta love how with all the cool languages out there, it's still C that prevails. :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] build: replace invalidJavaOnlyPatterns source-patterns with ast-grep [lucene]
rmuir opened a new pull request, #14812: URL: https://github.com/apache/lucene/pull/14812 The validation.source-patterns.gradle is intimidating, there's a lot going on here. Many of the rules are doing regex matches, which is tricky to maintain and can't give good error messages, autofixes in IDE, relevant explanations/URLs, etc. Start with the 'invalidJavaOnlyPatterns' because it is one of many special cases, and better to just implement as ast-grep java rules. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] build: replace invalidJavaOnlyPatterns source-patterns with ast-grep [lucene]
github-actions[bot] commented on PR #14812: URL: https://github.com/apache/lucene/pull/14812#issuecomment-2986028296 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] build: replace invalidJavaOnlyPatterns source-patterns with ast-grep [lucene]
rmuir commented on code in PR #14812: URL: https://github.com/apache/lucene/pull/14812#discussion_r2155783331 ## gradle/validation/ast-grep/rules/java-patterns.yml: ## @@ -0,0 +1,24 @@ +# Banned Lucene source patterns +# Historically implemented as regexes which are more difficult +--- +# yaml-language-server: $schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json +id: java-lang-import +language: java +rule: + pattern: import java.lang.$REF + kind: import_declaration +fix: "" Review Comment: example of rule with a "fix". If it is safe and easy, it can make things nice, especially if rule is on the nitpicky side. * in editor it will show up as light-bulb, user can choose fix as a code action. * `scan -U` could be integrated if we wanted for some reason. though i'd recommend "tidy" after any such thing. * error messages improve, the problem+fix displays as diff, which helps clarify the issue: ``` > Task :applyAstGrepRules lucene/core/src/java/org/apache/lucene/index/IndexWriter.java error[java-lang-import]: unnecessary import of `String` from java.lang @@ -20,7 +20,7 @@ 21 21│ 22 22│ import java.io.Closeable; 23 23│ import java.io.IOException; 24 │-import java.lang.String; 24│+ 25 25│ import java.time.Instant; 26 26│ import java.util.ArrayDeque; 27 27│ import java.util.ArrayList; Note: classes in java.lang are implicitly imported Error: 1 error(s) found in code. Help: Scan succeeded and found error level diagnostics in the codebase. > Task :applyAstGrepRules FAILED ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on code in PR #14804: URL: https://github.com/apache/lucene/pull/14804#discussion_r2155455149 ## gradle/validation/ast-grep/rules/no-wildcard-imports.sg.yaml: ## @@ -0,0 +1,16 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json +id: wildcard-import-not-allowed +language: java +rule: + kind: asterisk + inside: +kind: import_declaration +severity: error +message: don't use wildcard imports +note: please use full imports instead +url: https://google.github.io/styleguide/javaguide.html?cl=head#s3.3.1-wildcard-imports Review Comment: Up to you, entirely. I just pasted... something. :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on code in PR #14804: URL: https://github.com/apache/lucene/pull/14804#discussion_r2155456818 ## gradle/validation/ast-grep/rules/no-wildcard-imports.sg.yaml: ## @@ -0,0 +1,16 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json +id: wildcard-import-not-allowed +language: java +rule: + kind: asterisk + inside: +kind: import_declaration +severity: error +message: don't use wildcard imports +note: please use full imports instead +url: https://google.github.io/styleguide/javaguide.html?cl=head#s3.3.1-wildcard-imports Review Comment: Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
dweiss commented on code in PR #14804: URL: https://github.com/apache/lucene/pull/14804#discussion_r2155458113 ## gradle/validation/ast-grep/rules/no-wildcard-imports.sg.yaml: ## @@ -0,0 +1,16 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json +id: wildcard-import-not-allowed +language: java +rule: + kind: asterisk + inside: +kind: import_declaration +severity: error +message: don't use wildcard imports +note: please use full imports instead +url: https://google.github.io/styleguide/javaguide.html?cl=head#s3.3.1-wildcard-imports Review Comment: Oh, if you'd like to rename the rules - feel free to just commit whatever it is your gut tells you. I think I'm done here - once you think it's ready, just squash to main, please. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] detect and ban wildcard imports in Java [lucene]
rmuir commented on code in PR #14804: URL: https://github.com/apache/lucene/pull/14804#discussion_r2155458055 ## gradle/validation/ast-grep/rules/no-wildcard-imports.sg.yaml: ## @@ -0,0 +1,16 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/ast-grep/ast-grep/refs/heads/main/schemas/java_rule.json +id: wildcard-import-not-allowed +language: java +rule: + kind: asterisk + inside: +kind: import_declaration +severity: error +message: don't use wildcard imports +note: please use full imports instead +url: https://google.github.io/styleguide/javaguide.html?cl=head#s3.3.1-wildcard-imports Review Comment: we can always change it. that particular description is really cool too, because of how meta this PR is. By banning these wildcards, we can do more with these simple rules :) Just as an example, you could write unused import rule which would work correctly (if you wanted to), whereas with wildcards its not reasonable. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Utilize `docIdRunEnd` on `ReqExclBulkScorer` [lucene]
gf2121 commented on code in PR #14806: URL: https://github.com/apache/lucene/pull/14806#discussion_r215686 ## lucene/core/src/java/org/apache/lucene/search/ReqExclBulkScorer.java: ## @@ -57,7 +57,10 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max) thr exclDoc = exclApproximation.advance(upTo); } if (exclDoc == upTo) { -if (exclTwoPhase == null || exclTwoPhase.matches()) { +if (exclTwoPhase == null) { + // from upTo to docIdRunEnd() are excluded, so we scored up to docIdRunEnd() + upTo = exclApproximation.docIDRunEnd(); Review Comment: Oh yes, You are right. Thanks for explanation! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org