Re: [PR] removing constructor with deprecated attribute 'onlyLongestMatch' [lucene]

2025-03-15 Thread via GitHub
renatoh commented on code in PR #14356: URL: https://github.com/apache/lucene/pull/14356#discussion_r1995903621 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.java: ## @@ -62,7 +65,12 @@ public DictionaryCompoundWo

Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]

2025-03-15 Thread via GitHub
rmuir commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2726984173 +1 to start simple with Character.toLowerCase, that's the best you can get in Java. The problem is Java not having a Character.foldCase. A proper function would look like ICU's `UCha
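A minimal sketch of the "start simple" approach, assuming strings are matched code point by code point. `equalsLowerCase` is a hypothetical helper, not code from the PR; it compares `Character.toLowerCase` of each code point. It also shows the gap the comment describes: without a case-folding function, one-to-many mappings (ß vs "SS") and pairs such as sigma vs final sigma do not match.

```java
public class LowerCaseMatch {
    // Hypothetical helper: code-point-wise comparison after Character.toLowerCase.
    // This is simple lowercasing, not full Unicode case folding.
    static boolean equalsLowerCase(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (Character.toLowerCase(ca) != Character.toLowerCase(cb)) {
                return false;
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // Both strings must be fully consumed.
        return i == a.length() && j == b.length();
    }

    public static void main(String[] args) {
        System.out.println(equalsLowerCase("Lucene", "LUCENE"));        // true
        System.out.println(equalsLowerCase("\u212A", "k"));             // true: Kelvin sign lowercases to k
        System.out.println(equalsLowerCase("STRASSE", "stra\u00DFe"));  // false: sharp s is one-to-many
        System.out.println(equalsLowerCase("\u03C3", "\u03C2"));        // false: sigma vs final sigma
    }
}
```

The sigma pair is a case where simple lowercasing leaves two letters distinct even though Unicode case folding treats them as equivalent, which is the kind of gap a dedicated fold function would close.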

Re: [I] Create a bot to check if there is a CHANGES entry for new PRs [lucene]

2025-03-15 Thread via GitHub
pseudo-nymous commented on issue #13898: URL: https://github.com/apache/lucene/issues/13898#issuecomment-2726787800 Thanks @stefanvodita! I will take a look at the fix and propose changes accordingly. -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-15 Thread via GitHub
gf2121 commented on PR #14333: URL: https://github.com/apache/lucene/pull/14333#issuecomment-2724132677 Thanks for looking :) > I started looking at the code but you would know better: does this new encoding make it easier to know the length of leaf blocks while traversing the terms

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-03-15 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2726739699 On the AVX-512 machine: * Specialized read does not vectorize the remainder loop, it seems the compiler failed to inline it. * Specialized decode vectorizes the remainder loop.
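The comment distinguishes the main loop from the "remainder loop". A hedged scalar illustration of that split, assuming a hypothetical layout of two 16-bit offsets per `int` added to a `base`; `RemainderLoopSketch`, `decode` and `STEP` are made-up names, and this is neither Lucene's BKD docId encoding nor the Vector API code from the PR.

```java
import java.util.Arrays;

public class RemainderLoopSketch {
    // Packed ints handled per main-loop iteration; an arbitrary choice for this sketch.
    static final int STEP = 4;

    // Hypothetical layout: each int packs two 16-bit offsets from base, high half first.
    // Decodes 2 * packed.length docIds into out.
    static void decode(int[] packed, int base, int[] out) {
        int n = packed.length;
        int i = 0;
        // Main loop: whole steps only, the part a vectorized implementation would target.
        for (; i + STEP <= n; i += STEP) {
            for (int k = 0; k < STEP; k++) {
                int v = packed[i + k];
                out[2 * (i + k)] = base + (v >>> 16);
                out[2 * (i + k) + 1] = base + (v & 0xFFFF);
            }
        }
        // Remainder loop: the last n % STEP ints that do not fill a whole step.
        for (; i < n; i++) {
            int v = packed[i];
            out[2 * i] = base + (v >>> 16);
            out[2 * i + 1] = base + (v & 0xFFFF);
        }
    }

    public static void main(String[] args) {
        // Five packed ints: four go through the main loop, one through the remainder loop.
        int[] packed = {(1 << 16) | 2, (3 << 16) | 4, (5 << 16) | 6, (7 << 16) | 8, (9 << 16) | 10};
        int[] out = new int[2 * packed.length];
        decode(packed, 100, out);
        System.out.println(Arrays.toString(out)); // [101, 102, 103, 104, 105, 106, 107, 108, 109, 110]
    }
}
```

Both loops compute the same values, so the split only matters for performance: whether the JIT also inlines and vectorizes the tail is what the AVX-512 observation is about.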

[PR] BooleanScorer doesn't optimize for TwoPhaseIterator [lucene]

2025-03-15 Thread via GitHub
dsmiley opened a new pull request, #14357: URL: https://github.com/apache/lucene/pull/14357 Showing a performance problem here in BooleanScorer (used for disjunctions -- "OR"). BS will score all its clauses independently, overlapping the same documents, some of which might be expensive wit
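A simplified sketch of the two-phase idea behind `TwoPhaseIterator`: a cheap approximation proposes candidate docs and an expensive check confirms them. The `Clause` and `Result` records and `countDisjunction` are hypothetical and use only the JDK; this is not Lucene's `BooleanScorer` or the fix in this PR. It only shows why a disjunction that needs to know whether a doc matches (for example, when counting) can skip further confirmations once one clause confirms; a scoring disjunction generally still has to confirm every clause that contributes a score.

```java
import java.util.List;
import java.util.function.IntPredicate;

public class TwoPhaseSketch {
    /** Hypothetical clause: sorted candidate docs plus an expensive confirmation check. */
    record Clause(int[] approximation, IntPredicate matches) {}

    /** Number of matching docs and number of expensive checks performed. */
    record Result(int hits, int expensiveChecks) {}

    static Result countDisjunction(List<Clause> clauses, int maxDoc) {
        int[] pos = new int[clauses.size()]; // cursor into each clause's approximation
        int hits = 0;
        int checks = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            boolean matched = false;
            for (int c = 0; c < clauses.size(); c++) {
                Clause clause = clauses.get(c);
                int[] approx = clause.approximation();
                // Cheap phase: advance this clause's approximation to the current doc.
                while (pos[c] < approx.length && approx[pos[c]] < doc) {
                    pos[c]++;
                }
                boolean candidate = pos[c] < approx.length && approx[pos[c]] == doc;
                // Expensive phase: run it only until one clause confirms the doc.
                if (candidate && !matched) {
                    checks++;
                    matched = clause.matches().test(doc);
                }
            }
            if (matched) {
                hits++;
            }
        }
        return new Result(hits, checks);
    }

    public static void main(String[] args) {
        Result r = countDisjunction(List.of(
                new Clause(new int[] {1, 3, 5}, doc -> true),
                new Clause(new int[] {1, 2, 3}, doc -> doc % 2 == 0)), 6);
        System.out.println(r); // Result[hits=4, expensiveChecks=4]
    }
}
```

In the demo, docs 1, 2, 3 and 5 match after 4 expensive checks; confirming every candidate of every clause independently would take 6.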

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-03-15 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2723963174 @jpountz Hi, do you have any idea how we should move forward on this optimization? Several thoughts: * We can add another step32 for the hybrid-step decoding, which makes the code

Re: [PR] Extract leaf-slice calculation path from IndexSearch#slices [lucene]

2025-03-15 Thread via GitHub
original-brownbear commented on PR #14336: URL: https://github.com/apache/lucene/pull/14336#issuecomment-2711046458 Thanks Luca!

Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]

2025-03-15 Thread via GitHub
DivyanshIITB commented on PR #14335: URL: https://github.com/apache/lucene/pull/14335#issuecomment-2726271504 Thanks for the detailed clarification, @jpountz! I have made the necessary changes to the implementation: - Implemented a shared global thread pool with a fixed size of
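A minimal sketch of the "shared global thread pool" idea using only `java.util.concurrent`, assuming each tenant (index writer) submits merges as plain `Runnable`s. `SharedMergePoolSketch`, `Tenant` and their methods are hypothetical, and the thread count passed in (2 in the demo) is arbitrary because the size used in the PR is cut off above; the sketch does not implement Lucene's `MergeScheduler` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedMergePoolSketch implements AutoCloseable {
    // One fixed-size pool shared by every tenant created from this object.
    private final ExecutorService pool;

    public SharedMergePoolSketch(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    Tenant newTenant() {
        return new Tenant();
    }

    /** Hypothetical per-tenant handle that tracks only the merges it submitted. */
    final class Tenant {
        private final List<Future<?>> pending = new ArrayList<>();

        synchronized void submitMerge(Runnable merge) {
            pending.add(pool.submit(merge));
        }

        /** Waits for this tenant's merges only, not for other tenants' work. */
        void awaitMerges() {
            List<Future<?>> snapshot;
            synchronized (this) {
                snapshot = new ArrayList<>(pending);
                pending.clear();
            }
            for (Future<?> f : snapshot) {
                try {
                    f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                } catch (ExecutionException e) {
                    throw new RuntimeException(e.getCause());
                }
            }
        }
    }

    /** Stops the shared pool after already-submitted merges finish. */
    @Override
    public void close() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        try (SharedMergePoolSketch shared = new SharedMergePoolSketch(2)) {
            Tenant a = shared.newTenant();
            Tenant b = shared.newTenant();
            for (int i = 0; i < 3; i++) {
                a.submitMerge(done::incrementAndGet);
                b.submitMerge(done::incrementAndGet);
            }
            a.awaitMerges();
            b.awaitMerges();
        }
        System.out.println(done.get()); // 6
    }
}
```

Because the pool is shared, one tenant with many large merges can still occupy every thread; how threads are divided fairly between tenants is a policy a dedicated scheduler would have to decide.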

Re: [PR] Support load per-iteration replacement of NamedSPI [lucene]

2025-03-15 Thread via GitHub
jpountz commented on PR #14275: URL: https://github.com/apache/lucene/pull/14275#issuecomment-2710617442 > @jpountz given the connection of this PR with completion FST, do you have opinions here? Sorry for the late reply. If we want to allow configuring how a codec gets loaded

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-03-15 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2727214320 > Sorry for making it hard for you to move this PR forward, I was a bit annoyed that we needed something complicated to speed things up, I like the simplicity of specializedDecodeMaskInRe

Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]

2025-03-15 Thread via GitHub
DivyanshIITB commented on PR #14335: URL: https://github.com/apache/lucene/pull/14335#issuecomment-2727212266 Thanks for your feedback, @jpountz! I have created a new MultiTenantMergeScheduler as suggested, instead of modifying ConcurrentMergeScheduler. I have restored ConcurrentMe

Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]

2025-03-15 Thread via GitHub
rmuir commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2726990337 Separately, it would be nice to add a boolean flag (for Turkish/Azeri) to that CaseFolding class, and fix it to do the right thing, so it doesn't match unrelated characters in Turkish. ultim
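A hedged sketch of what such a flag could look like, using only JDK case mappings. `TurkicFoldSketch.fold(int, boolean)` is hypothetical and is not the `CaseFolding` class from the PR; it approximates simple case folding with an upper-then-lower mapping and special-cases dotted and dotless I when `turkic` is true.

```java
public class TurkicFoldSketch {
    private static final int DOTLESS_SMALL_I = 0x0131;  // ı
    private static final int DOTTED_CAPITAL_I = 0x0130; // İ

    // Hypothetical helper: approximate simple case folding with JDK methods only.
    static int fold(int cp, boolean turkic) {
        if (turkic) {
            // Turkish/Azeri: I pairs with dotless ı, and İ pairs with i.
            if (cp == 'I' || cp == DOTLESS_SMALL_I) {
                return DOTLESS_SMALL_I;
            }
            if (cp == DOTTED_CAPITAL_I) {
                return 'i';
            }
        }
        // Upper-then-lower unifies pairs like sigma (U+03C3) and final sigma (U+03C2).
        // Caveat: it also sends dotless ı to i, which Unicode's default folding does not.
        return Character.toLowerCase(Character.toUpperCase(cp));
    }

    public static void main(String[] args) {
        System.out.println(fold('I', false) == 'i');                          // true
        System.out.println(fold('I', true) == DOTLESS_SMALL_I);               // true
        System.out.println(fold(DOTTED_CAPITAL_I, true) == fold('i', true));  // true
        System.out.println(fold(0x03C2, false) == fold(0x03C3, false));       // true
    }
}
```

With `turkic = false`, `I` folds to `i` as in English; with `turkic = true`, `I` and `ı` form one class and `İ` and `i` form another, so neither pair matches the other's letters.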

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-03-15 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2726996663 Again, thanks a lot for running benchmarks. > I can refactor the code to the specialized decoding if it makes sense to you That would be great, thank you. Sorry for making i

Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]

2025-03-15 Thread via GitHub
jpountz commented on PR #14335: URL: https://github.com/apache/lucene/pull/14335#issuecomment-2727001008 I think it'll be simpler to create a new merge scheduler rather than modify ConcurrentMergeScheduler. Also we'll need tests.

Re: [PR] Avoid unnecessary evaluations and skipping documents [lucene]

2025-03-15 Thread via GitHub
jpountz merged PR #14301: URL: https://github.com/apache/lucene/pull/14301

Re: [I] Create a bot to check if there is a CHANGES entry for new PRs [lucene]

2025-03-15 Thread via GitHub
dweiss commented on issue #13898: URL: https://github.com/apache/lucene/issues/13898#issuecomment-2726850239 > dangoslen/changelog-enforcer@v3 is not allowed to be used in apache/lucene Apache projects are not allowed to use arbitrary actions for security reasons, please take a look a