[PR] skip keyword in German Normalization Filter [lucene]

2025-03-26 Thread via GitHub
xzhang9292 opened a new pull request, #14416: URL: https://github.com/apache/lucene/pull/14416 Current GermanNormalizationFilter tries to normalize special German characters like ä to a, ü to u. For some words it makes sense to do so, äpfel -> apfel is like apples -> apple. But for some wo

Re: [PR] skip keyword for GermanNormalizationFilter [lucene]

2025-03-26 Thread via GitHub
xzhang9292 closed pull request #14414: skip keyword for GermanNormalizationFilter URL: https://github.com/apache/lucene/pull/14414 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

[PR] skip keyword for GermanNormalizationFilter [lucene]

2025-03-26 Thread via GitHub
xzhang9292 opened a new pull request, #14415: URL: https://github.com/apache/lucene/pull/14415 Current GermanNormalizationFilter tries to normalize special German characters like ä to a, ü to u. For some words it makes sense to do so, äpfel -> apfel is like apples -> apple. But for some wo

Re: [PR] skip keyword for GermanNormalizationFilter [lucene]

2025-03-26 Thread via GitHub
xzhang9292 closed pull request #14415: skip keyword for GermanNormalizationFilter URL: https://github.com/apache/lucene/pull/14415

Re: [PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]

2025-03-26 Thread via GitHub
benwtrent merged PR #14304: URL: https://github.com/apache/lucene/pull/14304

[I] Leverage sparse doc value indexes for range and value facet collection [lucene]

2025-03-26 Thread via GitHub
gsmiller opened a new issue, #14406: URL: https://github.com/apache/lucene/issues/14406 ### Description Spinning off an issue from the discussion in #14273. There are a few ways we can probably leverage sparse doc value indexes for numeric range/value faceting. 1. Use a simi

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-26 Thread via GitHub
gsmiller commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2754588145 > I like the idea! Looks like we can do similar trick for range facets and long values facets? I _think_ we could optimize these use-cases even further by potentially skipping ov

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-26 Thread via GitHub
dweiss commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2754343679 > I just think autoformat the code in a consistent way, call it a day. I agree, it does not matter which one you pick if it's an automated process. > I don't understand

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-26 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2014552652 ## lucene/core/src/java/org/apache/lucene/search/DocIdStream.java: ## @@ -34,12 +33,35 @@ protected DocIdStream() {} * Iterate over doc IDs contained in this strea

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-26 Thread via GitHub
dweiss commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2754350593 We'd probably have to apply reformatting to 10x and main to keep cherry picking easier. Other than that - it's a simple thing to do.

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-26 Thread via GitHub
rmuir commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2754358922 I will play with the "don't reformat javadoc option". Maybe it's an easier solution to these problems? If we can coerce Google formatter to treat `///` as javadoc then problem solved.

[PR] MultiRange query for SortedNumeric DocValues [lucene]

2025-03-26 Thread via GitHub
mkhludnev opened a new pull request, #14404: URL: https://github.com/apache/lucene/pull/14404 ### Description Extending #13974 idea to SortedNumerics DVs.

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-26 Thread via GitHub
rmuir commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2754226935 I played with this a bit and reduced noise in two ways: Original file: 113 files changed, 3656 insertions(+), 5216 deletions(-) 1. Disable reformatting of Apache

[I] New merging hnsw failures with BP policy [lucene]

2025-03-26 Thread via GitHub
benwtrent opened a new issue, #14407: URL: https://github.com/apache/lucene/issues/14407 ### Description With the new HNSW merger logic, it seems we have some test failures with how it interacts with BP reordering, etc. ``` java.lang.IllegalStateException: The heap i

Re: [I] New merging hnsw failures with BP policy [lucene]

2025-03-26 Thread via GitHub
benwtrent commented on issue #14407: URL: https://github.com/apache/lucene/issues/14407#issuecomment-2754946165 @mayya-sharipova you might find this interesting.

[I] Examine the affects of MADV_RANDOM when MGLRU is enabled in Linux kernel [lucene]

2025-03-26 Thread via GitHub
ChrisHegarty opened a new issue, #14408: URL: https://github.com/apache/lucene/issues/14408 With the relatively recent capability to call `madvise` in Lucene, we've started to use `MADV_RANDOM` in several places where it makes conceptual sense, e.g. for accessing vector data when navigating

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-26 Thread via GitHub
jpountz commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2754999433 > If we have a skipper, I think we ought to also be able to use competitive iterators to jump over blocks of docs we know we won't collect based on their values? This is correct.

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-26 Thread via GitHub
rmuir commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755105202 @dweiss I'm wondering if we could send them a PR such that any `///` line comment respects the `--skip-javadoc-formatting` flag (or some other flag to say "dont mess around"). it woul

Re: [I] build support: java 24 [lucene]

2025-03-26 Thread via GitHub
ChrisHegarty commented on issue #14379: URL: https://github.com/apache/lucene/issues/14379#issuecomment-2755090319 Argh! sorry, I caused this issue by upgrading to JDK 23. Maybe that was a mistake, for this reason (a non-LTS can disappear before the tools catch up with the newly released ma

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-26 Thread via GitHub
dweiss commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755090672 https://github.com/google/google-java-format/blob/master/core/src/main/java/com/google/googlejavaformat/java/JavaCommentsHelper.java#L46-L60 All it takes would be to preserve a

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-26 Thread via GitHub
rmuir commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755132797 @dweiss I think that is because google-java-format uses internal JDK compiler apis to parse it. just like error prone. it is why you have to add all the opens?

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-26 Thread via GitHub
dweiss commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755128619 Yeah. I'll take a look at that, interesting. Part of the problem is that different Java versions seem to be returning a different tokenization of those comment strings. Seems like so

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-26 Thread via GitHub
dweiss commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755137112 Yes, that's correct - https://github.com/google/google-java-format/issues/1153#issuecomment-2344790653

Re: [I] New testMinMaxScalarQuantize tests failing repeatably [lucene]

2025-03-26 Thread via GitHub
benwtrent closed issue #14402: New testMinMaxScalarQuantize tests failing repeatably URL: https://github.com/apache/lucene/issues/14402

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-26 Thread via GitHub
jpountz commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2755985826 It should be ready for review now. Now that `DocIdStream` has become more sophisticated, I extracted impls to proper classes that could be better tested. This causes some diffs in our bo

[PR] Revert "gh-12627: HnswGraphBuilder connects disconnected HNSW graph components (#13566)" [lucene]

2025-03-26 Thread via GitHub
txwei opened a new pull request, #14411: URL: https://github.com/apache/lucene/pull/14411 This reverts commit 217828736c41bfc68065ceb3d5b37c47116ea947. ### Description

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-26 Thread via GitHub
jpountz commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2755991200 I'll try to run some simple benchmarks next.

Re: [I] Leverage sparse doc value indexes for range and value facet collection [lucene]

2025-03-26 Thread via GitHub
jpountz commented on issue #14406: URL: https://github.com/apache/lucene/issues/14406#issuecomment-2755995103 > Leverage competitive iteration to skip over blocks of docs that are known not to fall into any of the ranges we are faceting on. Out of curiosity, is it common for the union

[PR] Preparing existing profiler for adding concurrent profiling [lucene]

2025-03-26 Thread via GitHub
jainankitk opened a new pull request, #14413: URL: https://github.com/apache/lucene/pull/14413 ### Description This code change introduces `AbstractQueryProfilerBreakdown` that can be extended by `ConcurrentQueryProfilerBreakdown` to show query profiling information for concurrent se

[PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-26 Thread via GitHub
sgup432 opened a new pull request, #14412: URL: https://github.com/apache/lucene/pull/14412 ### Description Related issue - https://github.com/apache/lucene/issues/14183 This change allows skip cache factor to be updated dynamically within LRU query cache. This can be done by passi

Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-26 Thread via GitHub
sgup432 commented on PR #14412: URL: https://github.com/apache/lucene/pull/14412#issuecomment-2756202209 @jpountz Might need your review as discussed in https://github.com/apache/lucene/issues/14183

Re: [PR] Use read advice consistently in the knn vector formats [lucene]

2025-03-26 Thread via GitHub
github-actions[bot] commented on PR #14076: URL: https://github.com/apache/lucene/pull/14076#issuecomment-2756055344 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-03-26 Thread via GitHub
github-actions[bot] commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2756055188 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] Opening of vector files with ReadAdvice.RANDOM_PRELOAD [lucene]

2025-03-26 Thread via GitHub
viliam-durina commented on issue #14348: URL: https://github.com/apache/lucene/issues/14348#issuecomment-2755668181 I've ran into issue with this setting now. If the file doesn't actually fit into memory, this read advice hurts the performance significantly. With it, `madvise` is called wit

Re: [I] Examine the affects of MADV_RANDOM when MGLRU is enabled in Linux kernel [lucene]

2025-03-26 Thread via GitHub
jimczi commented on issue #14408: URL: https://github.com/apache/lucene/issues/14408#issuecomment-2755462640 > Let the defaults be as smart as they need. Maybe check /sys/kernel/mm/lru_gen/enabled as part of the decision-making! But IMO let the user have the final say, in an easy way.
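
The quoted suggestion to consult `/sys/kernel/mm/lru_gen/enabled` could be sketched roughly as below. This is only an illustration, not code from the issue or from Lucene: the class and method names are hypothetical, and it assumes the file holds a hexadecimal bitmask (such as `0x0007`) whose lowest bit is the main MGLRU switch, as described in the Linux kernel admin guide. Any read or parse failure is treated as "not enabled".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of Lucene: probes whether MGLRU appears to be enabled. */
final class MglruCheckSketch {
  // Path named in the quoted comment.
  static final Path LRU_GEN_ENABLED = Path.of("/sys/kernel/mm/lru_gen/enabled");

  /** Assumption: the value is a bitmask like "0x0007" and bit 0x0001 is the main switch. */
  static boolean isEnabledValue(String raw) {
    try {
      // Integer.decode understands the "0x" prefix.
      return (Integer.decode(raw.trim()) & 0x1) != 0;
    } catch (NumberFormatException e) {
      return false; // unexpected format: report "not enabled"
    }
  }

  /** Returns false when the file is missing or unreadable (e.g. non-Linux, older kernel). */
  static boolean mglruEnabled() {
    try {
      return isEnabledValue(Files.readString(LRU_GEN_ENABLED));
    } catch (IOException | RuntimeException e) {
      return false;
    }
  }
}
```

A default chosen this way would still need the user-facing override the quoted comment asks for, since the probe says nothing about the actual workload.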

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-26 Thread via GitHub
rmuir commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755707979 @dweiss very nice. the `///` can have leading whitespace in front of it which is preserved too. I dont know how their parser works but you can simulate the leading-case by adding a me

Re: [I] Examine the affects of MADV_RANDOM when MGLRU is enabled in Linux kernel [lucene]

2025-03-26 Thread via GitHub
rmuir commented on issue #14408: URL: https://github.com/apache/lucene/issues/14408#issuecomment-2755662166 > The Linux change targets both MGLRU and normal LRU. The impact is more pronounced in MGLRU, as page reclamation is more aggressive there. However, the semantic change for this advic

[PR] Fix test delta in minMaxScalarQuantize [lucene]

2025-03-26 Thread via GitHub
thecoop opened a new pull request, #14403: URL: https://github.com/apache/lucene/pull/14403 Delta was a bit too small. Resolves #14402

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-26 Thread via GitHub
rmuir commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2754235469 For the record those diffstats were based on `./gradlew -p lucene/suggest spotlessApply` and include the changes of the patch/formatter XML itself

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-26 Thread via GitHub
gsmiller commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2014273684 ## lucene/core/src/java/org/apache/lucene/search/DocIdStream.java: ## @@ -34,12 +33,35 @@ protected DocIdStream() {} * Iterate over doc IDs contained in this stre

Re: [PR] Preparing existing profiler for adding concurrent profiling [lucene]

2025-03-26 Thread via GitHub
jpountz commented on PR #14413: URL: https://github.com/apache/lucene/pull/14413#issuecomment-2756886264 Can you explain why we need two impls? I would have assumed that the `ConcurrentQueryProfilerBreakdown` could also be used for searches that are not concurrent?

Re: [PR] skip keyword in German Normalization Filter [lucene]

2025-03-26 Thread via GitHub
rmuir commented on PR #14416: URL: https://github.com/apache/lucene/pull/14416#issuecomment-2756917145 This keyword is legacy, for stemmers not normalizers. Just use ProtectedTermFilter which works with any tokenfilter without requiring modification to its code?
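
The idea behind that suggestion, wrapping an unmodified filter so that a protected set of terms bypasses it, can be illustrated with a small plain-Java sketch. This is not Lucene's `ProtectedTermFilter` API: the class, the method names, and the protected word `schwül` are hypothetical, and the folding covers only the two mappings mentioned in the PR description (ä to a, ü to u).

```java
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical illustration of "protect some terms from a filter"; not Lucene code. */
final class ProtectedNormalizerSketch {
  /** Simplified stand-in for the normalizer: folds only \u00e4 (ä) to a and \u00fc (ü) to u. */
  static String foldUmlauts(String term) {
    return term.replace('\u00e4', 'a').replace('\u00fc', 'u');
  }

  /** Applies the wrapped filter unless the term is in the protected set. */
  static String applyUnlessProtected(
      String term, Set<String> protectedTerms, UnaryOperator<String> filter) {
    return protectedTerms.contains(term) ? term : filter.apply(term);
  }

  /** Convenience wrapper: umlaut folding with a caller-supplied protected set. */
  static String normalize(String term, Set<String> protectedTerms) {
    return applyUnlessProtected(term, protectedTerms, ProtectedNormalizerSketch::foldUmlauts);
  }
}
```

With a protected set containing `schwül`, `normalize` leaves that word untouched while `äpfel` still becomes `apfel`. The wrapped function needs no knowledge of the protected set, which is the point of the comment above.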

[PR] skip keyword for GermanNormalizationFilter [lucene]

2025-03-26 Thread via GitHub
xzhang9292 opened a new pull request, #14414: URL: https://github.com/apache/lucene/pull/14414 Current GermanNormalizationFilter tries to normalize special German characters like ä to a, ü to u. For some words it makes sense to do so, äpfel -> apfel is like apples -> apple. But for some

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-26 Thread via GitHub
dweiss commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755678098 Here is what I did. * added a brute-force non-formatting to any /// line comments in my fork of google-java-format [1] * added a local, precompiled binary of the above to my for