xzhang9292 opened a new pull request, #14416:
URL: https://github.com/apache/lucene/pull/14416
The current GermanNormalizationFilter tries to normalize special German
characters, e.g. ä to a and ü to u. For some words this makes sense: äpfel ->
apfel is like apples -> apple. But for some wo
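For readers unfamiliar with the filter, this is roughly how it behaves in an
analysis chain. A minimal sketch: the analyzer, field name, and sample text are
illustrative, not from the PR.
```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.de.GermanNormalizationFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class GermanNormalizationDemo {
  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        StandardTokenizer source = new StandardTokenizer();
        // GermanNormalizationFilter folds ä/ö/ü to a/o/u and ß to ss
        return new TokenStreamComponents(source, new GermanNormalizationFilter(source));
      }
    };
    try (TokenStream ts = analyzer.tokenStream("body", "äpfel")) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        System.out.println(term); // prints "apfel"
      }
      ts.end();
    }
  }
}
```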
xzhang9292 closed pull request #14414: skip keyword for
GermanNormalizationFilter
URL: https://github.com/apache/lucene/pull/14414
xzhang9292 opened a new pull request, #14415:
URL: https://github.com/apache/lucene/pull/14415
The current GermanNormalizationFilter tries to normalize special German
characters, e.g. ä to a and ü to u. For some words this makes sense: äpfel ->
apfel is like apples -> apple. But for some wo
xzhang9292 closed pull request #14415: skip keyword for
GermanNormalizationFilter
URL: https://github.com/apache/lucene/pull/14415
benwtrent merged PR #14304:
URL: https://github.com/apache/lucene/pull/14304
gsmiller opened a new issue, #14406:
URL: https://github.com/apache/lucene/issues/14406
### Description
Spinning off an issue from the discussion in #14273.
There are a few ways we can probably leverage sparse doc value indexes for
numeric range/value faceting.
1. Use a simi
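To make the idea in this issue concrete, here is a rough sketch of how a sparse
doc-value index could let range/value faceting hop over blocks whose value range
cannot match. It assumes the Lucene 10 `DocValuesSkipper` API (`advance`,
per-level `minValue`/`maxValue`/`maxDocID`); the helper method, field handling,
and single-level check are illustrative only, not code from the issue.
```java
import java.io.IOException;
import org.apache.lucene.index.DocValuesSkipper;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.DocIdSetIterator;

public final class RangeFacetSkipSketch {
  // Hypothetical helper: count docs in one segment whose value falls in [min, max],
  // consulting the skipper to jump over blocks that are disjoint with the range.
  static long countInRange(LeafReader reader, String field, long min, long max)
      throws IOException {
    NumericDocValues values = reader.getNumericDocValues(field);
    if (values == null) {
      return 0;
    }
    DocValuesSkipper skipper = reader.getDocValuesSkipper(field);
    long count = 0;
    int doc = values.nextDoc();
    while (doc != DocIdSetIterator.NO_MORE_DOCS) {
      if (skipper != null) {
        skipper.advance(doc);
        if (skipper.minValue(0) > max || skipper.maxValue(0) < min) {
          // the whole level-0 block is outside the range: skip past it
          doc = values.advance(skipper.maxDocID(0) + 1);
          continue;
        }
      }
      long value = values.longValue();
      if (value >= min && value <= max) {
        count++;
      }
      doc = values.nextDoc();
    }
    return count;
  }
}
```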
gsmiller commented on PR #14273:
URL: https://github.com/apache/lucene/pull/14273#issuecomment-2754588145
> I like the idea! Looks like we can do similar trick for range facets and
long values facets?
I _think_ we could optimize these use-cases even further by potentially
skipping ov
dweiss commented on issue #14257:
URL: https://github.com/apache/lucene/issues/14257#issuecomment-2754343679
> I just think autoformat the code in a consistent way, call it a day.
I agree, it does not matter which one you pick if it's an automated process.
> I don't understand
jpountz commented on code in PR #14273:
URL: https://github.com/apache/lucene/pull/14273#discussion_r2014552652
##
lucene/core/src/java/org/apache/lucene/search/DocIdStream.java:
##
@@ -34,12 +33,35 @@ protected DocIdStream() {}
* Iterate over doc IDs contained in this strea
dweiss commented on issue #14257:
URL: https://github.com/apache/lucene/issues/14257#issuecomment-2754350593
We'd probably have to apply reformatting to both 10x and main to keep
cherry-picking easy. Other than that, it's a simple thing to do.
rmuir commented on issue #14257:
URL: https://github.com/apache/lucene/issues/14257#issuecomment-2754358922
I will play with the "don't reformat javadoc" option. Maybe it's an easier
solution to these problems? If we can coerce the Google formatter to treat `///`
as javadoc, then problem solved.
mkhludnev opened a new pull request, #14404:
URL: https://github.com/apache/lucene/pull/14404
### Description
Extending #13974 idea to SortedNumerics DVs.
rmuir commented on issue #14257:
URL: https://github.com/apache/lucene/issues/14257#issuecomment-2754226935
I played with this a bit and reduced noise in two ways:
Original file:
113 files changed, 3656 insertions(+), 5216 deletions(-)
1. Disable reformatting of Apache
benwtrent opened a new issue, #14407:
URL: https://github.com/apache/lucene/issues/14407
### Description
With the new HNSW merger logic, it seems we have some test failures with how
it interacts with BP reordering, etc.
```
java.lang.IllegalStateException: The heap i
benwtrent commented on issue #14407:
URL: https://github.com/apache/lucene/issues/14407#issuecomment-2754946165
@mayya-sharipova you might find this interesting.
ChrisHegarty opened a new issue, #14408:
URL: https://github.com/apache/lucene/issues/14408
With the relatively recent capability to call `madvise` in Lucene, we've
started to use `MADV_RANDOM` in several places where it makes conceptual sense,
e.g. for accessing vector data when navigating
jpountz commented on PR #14273:
URL: https://github.com/apache/lucene/pull/14273#issuecomment-2754999433
> If we have a skipper, I think we ought to also be able to use competitive
iterators to jump over blocks of docs we know we won't collect based on their
values?
This is correct.
rmuir commented on issue #14257:
URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755105202
@dweiss I'm wondering if we could send them a PR such that any `///` line
comment respects the `--skip-javadoc-formatting` flag (or some other flag that
says "don't mess around"). It woul
ChrisHegarty commented on issue #14379:
URL: https://github.com/apache/lucene/issues/14379#issuecomment-2755090319
Argh! sorry, I caused this issue by upgrading to JDK 23. Maybe that was a
mistake, for this reason (a non-LTS can disappear before the tools catch up
with the newly released ma
dweiss commented on issue #14257:
URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755090672
https://github.com/google/google-java-format/blob/master/core/src/main/java/com/google/googlejavaformat/java/JavaCommentsHelper.java#L46-L60
All it takes would be to preserve a
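For anyone following along, the change being discussed would look something
like this. A purely hypothetical sketch of the idea, not google-java-format's
actual JavaCommentsHelper code:
```java
public final class LineCommentSketch {
  // Rewrite an ordinary "//" line comment, but leave JEP 467 Markdown doc
  // comments ("///") exactly as the author wrote them.
  static String rewriteLineComment(String original) {
    String trimmed = original.trim();
    if (trimmed.startsWith("///")) {
      return original; // preserve `///` comments verbatim, indentation included
    }
    // ordinary "//" comments: normalize to "// " followed by the comment text
    String body = trimmed.length() > 2 ? trimmed.substring(2).trim() : "";
    return body.isEmpty() ? "//" : "// " + body;
  }
}
```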
rmuir commented on issue #14257:
URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755132797
@dweiss I think that is because google-java-format uses internal JDK compiler
APIs to parse it, just like Error Prone. That is why you have to add all the
opens?
dweiss commented on issue #14257:
URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755128619
Yeah. I'll take a look at that, interesting. Part of the problem is that
different Java versions seem to be returning a different tokenization of those
comment strings. Seems like so
dweiss commented on issue #14257:
URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755137112
Yes, that's correct -
https://github.com/google/google-java-format/issues/1153#issuecomment-2344790653
benwtrent closed issue #14402: New testMinMaxScalarQuantize tests failing
repeatably
URL: https://github.com/apache/lucene/issues/14402
jpountz commented on PR #14273:
URL: https://github.com/apache/lucene/pull/14273#issuecomment-2755985826
It should be ready for review now. Now that `DocIdStream` has become more
sophisticated, I extracted impls to proper classes that could be better tested.
This causes some diffs in our bo
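For context on what consumers of `DocIdStream` look like, here is a hedged
sketch of a counting `LeafCollector` that takes the bulk path via
`DocIdStream#count()`. The class name is illustrative and the code assumes the
`collect(DocIdStream)` hook that `LeafCollector` exposes; it is not code from
the PR.
```java
import java.io.IOException;
import org.apache.lucene.search.DocIdStream;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.search.Scorable;

final class CountingLeafCollector implements LeafCollector {
  long count;

  @Override
  public void setScorer(Scorable scorer) {
    // scores are not needed for counting
  }

  @Override
  public void collect(int doc) {
    count++; // fallback: one doc at a time
  }

  @Override
  public void collect(DocIdStream stream) throws IOException {
    count += stream.count(); // bulk path: no need to materialize individual doc IDs
  }
}
```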
txwei opened a new pull request, #14411:
URL: https://github.com/apache/lucene/pull/14411
This reverts commit 217828736c41bfc68065ceb3d5b37c47116ea947.
### Description
jpountz commented on PR #14273:
URL: https://github.com/apache/lucene/pull/14273#issuecomment-2755991200
I'll try to run some simple benchmarks next.
jpountz commented on issue #14406:
URL: https://github.com/apache/lucene/issues/14406#issuecomment-2755995103
> Leverage competitive iteration to skip over blocks of docs that are known
not to fall into any of the ranges we are faceting on.
Out of curiosity, is it common for the union
jainankitk opened a new pull request, #14413:
URL: https://github.com/apache/lucene/pull/14413
### Description
This code change introduces `AbstractQueryProfilerBreakdown` that can be
extended by `ConcurrentQueryProfilerBreakdown` to show query profiling
information for concurrent se
sgup432 opened a new pull request, #14412:
URL: https://github.com/apache/lucene/pull/14412
### Description
Related issue - https://github.com/apache/lucene/issues/14183
This change allows the skip cache factor to be updated dynamically within the
LRU query cache. This can be done by passi
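As background, the skip-cache factor is currently fixed at construction time. A
minimal sketch of today's setup; the sizes, predicate, and factor below are
illustrative values, and the four-argument constructor is quoted from memory,
so treat the exact signature as an assumption.
```java
import java.util.function.Predicate;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.LRUQueryCache;

public final class QueryCacheSetupSketch {
  public static void main(String[] args) {
    // Only cache on reasonably large segments; skip caching clauses whose cost
    // is more than 10x the cost of the overall query.
    Predicate<LeafReaderContext> leavesToCache = ctx -> ctx.reader().maxDoc() >= 10_000;
    LRUQueryCache cache = new LRUQueryCache(1_000, 32 * 1024 * 1024, leavesToCache, 10f);
    IndexSearcher.setDefaultQueryCache(cache);
    // PR #14412 proposes letting this skip-cache factor be adjusted on a live
    // cache instead of requiring a new cache (and thus losing cached entries).
  }
}
```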
sgup432 commented on PR #14412:
URL: https://github.com/apache/lucene/pull/14412#issuecomment-2756202209
@jpountz Might need your review as discussed in
https://github.com/apache/lucene/issues/14183
github-actions[bot] commented on PR #14076:
URL: https://github.com/apache/lucene/pull/14076#issuecomment-2756055344
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
github-actions[bot] commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2756055188
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
viliam-durina commented on issue #14348:
URL: https://github.com/apache/lucene/issues/14348#issuecomment-2755668181
I've run into an issue with this setting now. If the file doesn't actually fit
into memory, this read advice hurts performance significantly. With it,
`madvise` is called wit
jimczi commented on issue #14408:
URL: https://github.com/apache/lucene/issues/14408#issuecomment-2755462640
> Let the defaults be as smart as they need. Maybe check
/sys/kernel/mm/lru_gen/enabled as part of the decision-making! But IMO let the
user have the final say, in an easy way.
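The check quoted above is cheap to do from Java. A hypothetical sketch: Lucene
does not currently do this, the sysfs path comes from the comment itself, and
the parsing of the file contents is illustrative (it varies by kernel).
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public final class MglruCheckSketch {
  // Returns true if the kernel reports multi-generational LRU (MGLRU) as enabled.
  static boolean mglruEnabled() {
    Path path = Path.of("/sys/kernel/mm/lru_gen/enabled");
    try {
      String value = Files.readString(path).trim();
      // the file holds a bitmask; "0x0000" (or "0") means MGLRU is disabled
      return !value.isEmpty() && !value.equals("0x0000") && !value.equals("0");
    } catch (IOException | RuntimeException e) {
      return false; // file missing: non-Linux, or a kernel without MGLRU
    }
  }
}
```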
rmuir commented on issue #14257:
URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755707979
@dweiss very nice. The `///` can have leading whitespace in front of it, which
is preserved too. I don't know how their parser works, but you can simulate the
leading case by adding a me
rmuir commented on issue #14408:
URL: https://github.com/apache/lucene/issues/14408#issuecomment-2755662166
> The Linux change targets both MGLRU and normal LRU. The impact is more
pronounced in MGLRU, as page reclamation is more aggressive there. However, the
semantic change for this advic
thecoop opened a new pull request, #14403:
URL: https://github.com/apache/lucene/pull/14403
Delta was a bit too small. Resolves #14402
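The fix is presumably of this shape; a hypothetical illustration, not the actual
test code: the tolerance passed to the float assertion is loosened so
legitimately rounded quantized values no longer trip it.
```java
import static org.junit.Assert.assertEquals;

public class DeltaSketch {
  void compareQuantized(float expected, float actual) {
    // was: assertEquals(expected, actual, 1e-5f);  // too tight for quantized values
    assertEquals(expected, actual, 1e-3f); // illustrative, looser delta
  }
}
```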
rmuir commented on issue #14257:
URL: https://github.com/apache/lucene/issues/14257#issuecomment-2754235469
For the record, those diffstats were based on `./gradlew -p lucene/suggest
spotlessApply` and include the changes to the patch/formatter XML itself.
gsmiller commented on code in PR #14273:
URL: https://github.com/apache/lucene/pull/14273#discussion_r2014273684
##
lucene/core/src/java/org/apache/lucene/search/DocIdStream.java:
##
@@ -34,12 +33,35 @@ protected DocIdStream() {}
* Iterate over doc IDs contained in this stre
jpountz commented on PR #14413:
URL: https://github.com/apache/lucene/pull/14413#issuecomment-2756886264
Can you explain why we need two impls? I would have assumed that the
`ConcurrentQueryProfilerBreakdown` could also be used for searches that are not
concurrent?
rmuir commented on PR #14416:
URL: https://github.com/apache/lucene/pull/14416#issuecomment-2756917145
This keyword is legacy, for stemmers, not normalizers. Just use
ProtectedTermFilter, which works with any TokenFilter without requiring
modification to its code?
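A hedged sketch of the approach rmuir describes. If I recall its constructor
correctly, ProtectedTermFilter takes the protected-term set, the input stream,
and a factory for the conditionally applied filter; the protected terms listed
here are just an example.
```java
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.de.GermanNormalizationFilter;
import org.apache.lucene.analysis.miscellaneous.ProtectedTermFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public final class ProtectedNormalizationSketch {
  public static Analyzer analyzer() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        StandardTokenizer source = new StandardTokenizer();
        // terms listed here pass through untouched; everything else is normalized
        CharArraySet protectedTerms = new CharArraySet(List.of("müller"), true);
        TokenStream sink =
            new ProtectedTermFilter(protectedTerms, source, GermanNormalizationFilter::new);
        return new TokenStreamComponents(source, sink);
      }
    };
  }
}
```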
xzhang9292 opened a new pull request, #14414:
URL: https://github.com/apache/lucene/pull/14414
The current GermanNormalizationFilter tries to normalize special German
characters, e.g. ä to a and ü to u. For some words this makes sense: äpfel ->
apfel is like apples -> apple. But for some
dweiss commented on issue #14257:
URL: https://github.com/apache/lucene/issues/14257#issuecomment-2755678098
Here is what I did:
* added brute-force non-formatting of any `///` line comments in my fork of
google-java-format [1]
* added a local, precompiled binary of the above to my for