benwtrent commented on issue #14327:
URL: https://github.com/apache/lucene/issues/14327#issuecomment-2730201462
Git bisect blames: a6a96cde1c65fddb65363f0090a0202fd6db329c
Which, if the scores are the same between docs, makes sense to me.
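To illustrate why ties matter here: when two docs have the same score, their relative order depends entirely on whatever tie-breaker the collector applies, so a change in that area can legitimately reorder them. The sketch below is not the failing test and not Lucene's collector code. `Hit` and `rank` are hypothetical names, and breaking ties by ascending doc ID is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class TieBreakSketch {
    /** Hypothetical stand-in for a scored hit; not a Lucene class. */
    public record Hit(int doc, float score) {}

    /**
     * Sorts by descending score and, as an assumed tie-breaker,
     * by ascending doc ID so that equal scores order deterministically.
     */
    public static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(
            Comparator.comparingDouble((Hit h) -> h.score())
                .reversed()
                .thenComparingInt(Hit::doc));
        return sorted;
    }
}
```

Without the `thenComparingInt` step, the two docs that share a score could come back in either order, depending on the input order.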
--
This is an automated message from the Apa
jpountz commented on code in PR #14365:
URL: https://github.com/apache/lucene/pull/14365#discussion_r1999318407
##
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java:
##
@@ -251,6 +252,30 @@ public void visit(int docID, byte[] packedValue) {
msfroh commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2730390727
Instead of a boolean flag, what if we define an interface that specifies the
folding rules?
It could have two methods: one that folds input characters to a canonical
representatio
rmuir commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2730397885
I think my ask is misunderstood, it is just to follow the Unicode standard.
There are two mappings for simple case folding:
* Default
* Alternate (Turkish/Azeri)
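A minimal sketch of the two mappings, for readers unfamiliar with the distinction. This is not the code from the PR. `SimpleCaseFoldSketch`, `Mode`, `fold`, and `foldString` are hypothetical names, and `Character.toLowerCase` is used only as a stand-in for the default mapping: it is not identical to Unicode's simple case folding table. The two Turkic overrides (U+0049 to U+0131 and U+0130 to U+0069) are the `T` entries in Unicode's CaseFolding.txt.

```java
public final class SimpleCaseFoldSketch {
    /** Which of the two simple case folding mappings to apply. */
    public enum Mode { DEFAULT, TURKIC }

    private static final int DOTLESS_SMALL_I = 0x0131;  // ı
    private static final int DOTTED_CAPITAL_I = 0x0130; // İ

    /** Folds a single code point under the chosen mapping. */
    public static int fold(int codePoint, Mode mode) {
        if (mode == Mode.TURKIC) {
            // The only two code points where the alternate mapping differs.
            if (codePoint == 'I') {
                return DOTLESS_SMALL_I;
            }
            if (codePoint == DOTTED_CAPITAL_I) {
                return 'i';
            }
        }
        // Assumption: lowercase approximates the default mapping for this demo.
        return Character.toLowerCase(codePoint);
    }

    /** Folds every code point of a string under the chosen mapping. */
    public static String foldString(String s, Mode mode) {
        return s.codePoints()
            .map(cp -> fold(cp, mode))
            .collect(StringBuilder::new,
                     StringBuilder::appendCodePoint,
                     StringBuilder::append)
            .toString();
    }
}
```

With this sketch, `foldString("TITLE", Mode.DEFAULT)` gives `title`, while `Mode.TURKIC` gives `tıtle` with a dotless i.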
rmuir commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2730410784
If you want to do fancy Romanian accent removal, use an analyzer and
normalize your data. That's what a search engine is all about.
But if we want to provide some limited runtime ex
rmuir commented on PR #14360:
URL: https://github.com/apache/lucene/pull/14360#issuecomment-2730119438
I'll keep the PR up here. Actually, as a first step, I'd rather improve
existing toDot() and regex toString(). It would help the logic here, too.
There's no need to escape codepoints
gf2121 commented on PR #14203:
URL: https://github.com/apache/lucene/pull/14203#issuecomment-2729902081
I raised a PR for the annotation:
https://github.com/mikemccand/luceneutil/pull/354.
gf2121 commented on PR #14361:
URL: https://github.com/apache/lucene/pull/14361#issuecomment-2729280384
OK, I get the expected result that a multiple of 16 is faster than a multiple
of 8 when I force `-XX:UseAVX=3`. It appears AVX3 is slower on this chip, which
may be why Java disabled it by defaul
gf2121 commented on PR #14203:
URL: https://github.com/apache/lucene/pull/14203#issuecomment-2729806648
The nightly benchmark confirmed the speedup:
https://benchmarks.mikemccandless.com/2025.03.16.18.04.58.html.
Thanks again for the profiling guidance and for helping figure out simpler and faster
co
tteofili commented on issue #14180:
URL: https://github.com/apache/lucene/issues/14180#issuecomment-2730194375
> Could you please let me know which future version of Elasticsearch will
resolve the vector search consistency problem?
We are investigating a proper solution to this iss
javanna commented on code in PR #14364:
URL: https://github.com/apache/lucene/pull/14364#discussion_r1999422163
##
lucene/suggest/src/java/org/apache/lucene/search/suggest/document/CompletionPostingsFormat.java:
##
@@ -122,11 +122,6 @@ public enum FSTLoadMode {
private fina
mayya-sharipova commented on PR #14331:
URL: https://github.com/apache/lucene/pull/14331#issuecomment-2730506974
@msokolov Thanks for the comment.
I've experimented with setting `beamCandidates0` to `M * 3`, up from the
previous `M * 2`, when building merged graphs.
Graphs look bette
kaivalnp commented on PR #14131:
URL: https://github.com/apache/lucene/pull/14131#issuecomment-2730545360
Exciting change! Since this PR adds a new codec for vector search, I wanted
to point to #14178 along similar lines -- adding a new Faiss-based KNN format
to index and query vectors
msfroh commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2730578138
Okay, got it! That's the piece that I was misunderstanding. I didn't realize
that Turkish/Azeri is the **only** other valid folding. I kept thinking of it
as just an example where the naï
javanna commented on code in PR #14364:
URL: https://github.com/apache/lucene/pull/14364#discussion_r1999421774
##
lucene/suggest/src/java/org/apache/lucene/search/suggest/document/Completion101PostingsFormat.java:
##
@@ -25,17 +25,9 @@
* @lucene.experimental
*/
public clas
jainankitk commented on issue #14347:
URL: https://github.com/apache/lucene/issues/14347#issuecomment-2730860749
I am not sure it is a good idea to have this as a user parameter. But I am
wondering if the default for `BEST_SPEED` should be using a preset dict, as that
compromises speed for co
javanna opened a new pull request, #14364:
URL: https://github.com/apache/lucene/pull/14364
All the existing completion postings formats load their FSTs on-heap. It is
possible to customize that behaviour by maintaining a custom postings format
that overrides the FST load mode.
TestSug
javanna commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2729969364
I opened #14364 to make the suggested change to the completion postings
format, let me know what you think.
javanna commented on PR #14270:
URL: https://github.com/apache/lucene/pull/14270#issuecomment-2729966315
I opened #14364 to still address the testing gap, but also change the
default load mode to off heap.
javanna closed pull request #14270: Address completion fields testing gap and
truly allow loading FST off heap
URL: https://github.com/apache/lucene/pull/14270
dsmiley commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2729965328
An aside: `org.apache.lucene.search.DisjunctionScorer.TwoPhase#matches`
looks kind of sad, in that each matches() call is going to build a priority
queue of "unverified matches" (DisiWr
jpountz commented on PR #14203:
URL: https://github.com/apache/lucene/pull/14203#issuecomment-2729975580
Fantastic speedup. Nice to see tasks like `TermDayOfYearSort` also benefit
from this change.
jpountz commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2729988278
The current approach is probably not the fastest indeed. We should add a
task to nightly benchmarks if we want to optimize this. Something like a
disjunction of phrase queries (possibly
jainankitk commented on issue #14348:
URL: https://github.com/apache/lucene/issues/14348#issuecomment-2730902349
> Lucene currently uses ReadAdvice.RANDOM when opening these files. I think
it would be better to use RANDOM_PRELOAD.
As per the documentation for RANDOM_PRELOAD:
_
dsmiley commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2730909459
BTW I don't have plans to explore this further. Anyone should feel free to
take over. Or abandon if nobody cares -- I admit it's very unusual to even
have a top level disjunction, let
mccullocht commented on code in PR #14335:
URL: https://github.com/apache/lucene/pull/14335#discussion_r1999677796
##
lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:
##
@@ -0,0 +1,72 @@
+package org.apache.lucene.index;
+
+import java.util.concurrent
jpountz merged PR #14363:
URL: https://github.com/apache/lucene/pull/14363
jpountz commented on code in PR #14364:
URL: https://github.com/apache/lucene/pull/14364#discussion_r1999714466
##
lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java:
##
@@ -951,7 +951,16 @@ static IndexWriterConfig iwcWithSuggestField(Analyz
rmuir commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2731329809
It is confusing, because the Unicode case folding algorithm is supposed to work
for everyone. But here's the problem:
for most of the world:
* lowercase i has a dot, uppercase I has n
DivyanshIITB opened a new pull request, #78:
URL: https://github.com/apache/lucene-site/pull/78
This PR adds a direct link to the [Lucene Issue
Tracker](https://issues.apache.org/jira/projects/LUCENE) under the "Editing
Content on the Lucene™ sites" section in site-instructions.md.
C
gf2121 opened a new pull request, #14365:
URL: https://github.com/apache/lucene/pull/14365
(no comment)
jpountz commented on issue #14348:
URL: https://github.com/apache/lucene/issues/14348#issuecomment-2730966937
For what it's worth, it's possible to override the read advice of vectors
with something like this:
```java
Path path = ...;
Directory dir = new FilterDirectory(FSDirect
DivyanshIITB commented on code in PR #14335:
URL: https://github.com/apache/lucene/pull/14335#discussion_r1998645110
##
lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:
##
@@ -0,0 +1,44 @@
+package org.apache.lucene.index;
+
+import java.util.concurre
DivyanshIITB commented on code in PR #14335:
URL: https://github.com/apache/lucene/pull/14335#discussion_r1998646386
##
lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:
##
@@ -0,0 +1,44 @@
+package org.apache.lucene.index;
+
+import java.util.concurre
DivyanshIITB commented on code in PR #14335:
URL: https://github.com/apache/lucene/pull/14335#discussion_r1998645593
##
lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:
##
@@ -0,0 +1,44 @@
+package org.apache.lucene.index;
+
+import java.util.concurre
DivyanshIITB commented on code in PR #14335:
URL: https://github.com/apache/lucene/pull/14335#discussion_r1998646829
##
lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:
##
@@ -0,0 +1,44 @@
+package org.apache.lucene.index;
+
+import java.util.concurre
DivyanshIITB commented on code in PR #14335:
URL: https://github.com/apache/lucene/pull/14335#discussion_r1998648834
##
lucene/core/src/test/org/apache/lucene/index/TestMultiTenantMergeScheduler.java:
##
@@ -0,0 +1,30 @@
+package org.apache.lucene.index;
+
+import org.apache.luc
DivyanshIITB commented on PR #14335:
URL: https://github.com/apache/lucene/pull/14335#issuecomment-2729358357
I have a request: kindly ignore the following two deleted files in
the "Files Changed" section:
"KeepOnlyLastCommitDeletionPolicy.java"
"ConcurrentMergeScheduler.java"
gf2121 commented on PR #14361:
URL: https://github.com/apache/lucene/pull/14361#issuecomment-2729094239
Results on `wikimediumall`:
```
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-value
jpountz commented on PR #14361:
URL: https://github.com/apache/lucene/pull/14361#issuecomment-2729147376
Should we floor to a multiple of 16 instead of 8 so that we have a perfect
second loop with AVX-512 as well? (By the way, which of your machines produced
the above benchmark results?) Oth
jpountz commented on code in PR #14359:
URL: https://github.com/apache/lucene/pull/14359#discussion_r1998546217
##
lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java:
##
@@ -238,9 +296,77 @@ private void scoreWindowUsingBitSet(
windowMatches.clear
jpountz commented on code in PR #14359:
URL: https://github.com/apache/lucene/pull/14359#discussion_r1998546819
##
lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java:
##
@@ -238,9 +296,77 @@ private void scoreWindowUsingBitSet(
windowMatches.clear
gf2121 commented on code in PR #14359:
URL: https://github.com/apache/lucene/pull/14359#discussion_r1998596143
##
lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java:
##
@@ -238,9 +296,77 @@ private void scoreWindowUsingBitSet(
windowMatches.clear(
gf2121 commented on PR #14361:
URL: https://github.com/apache/lucene/pull/14361#issuecomment-2729250628
Thanks for the feedback.
> Should we floor to a multiple of 16 instead of 8 so that we have a perfect
second loop with AVX-512 as well?
That is what I thought initially. But my A
gf2121 opened a new pull request, #14361:
URL: https://github.com/apache/lucene/pull/14361
This PR tries another way to implement the idea of
https://github.com/apache/lucene/pull/13521, taking advantage of an
auto-vectorized loop to decode ints, like we did for bpv24 in
https://github.com/
gf2121 commented on code in PR #14359:
URL: https://github.com/apache/lucene/pull/14359#discussion_r1998258104
##
lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java:
##
@@ -238,9 +296,77 @@ private void scoreWindowUsingBitSet(
windowMatches.clear(
gf2121 commented on issue #14327:
URL: https://github.com/apache/lucene/issues/14327#issuecomment-2729377397
Seeing a similar failure as well:
```
> java.lang.AssertionError: [doc=0 score=0.990099 shardIndex=-1, doc=3
score=0.49751243 shardIndex=-1, doc=5 score=0.21691975 shardIn
dweiss commented on PR #14360:
URL: https://github.com/apache/lucene/pull/14360#issuecomment-2728443809
I've looked at the docs of mermaid and toyed around a bit. I agree that
infinite loops are so ugly that one's eyes start to bleed. Maybe we should
stick to graphviz.
vigyasharma commented on code in PR #14335:
URL: https://github.com/apache/lucene/pull/14335#discussion_r1998041333
##
lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:
##
@@ -0,0 +1,44 @@
+package org.apache.lucene.index;
+
+import java.util.concurren
gf2121 commented on code in PR #14359:
URL: https://github.com/apache/lucene/pull/14359#discussion_r1998219212
##
lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java:
##
@@ -238,9 +296,77 @@ private void scoreWindowUsingBitSet(
windowMatches.clear(
guojialiang92 opened a new issue, #14362:
URL: https://github.com/apache/lucene/issues/14362
### Description
Can we support modifying `segmentInfos.counter` in `IndexWriter`?
This can be used to skip some segment names when writing.
In the scenario of enabling `segment replicatio
iverase closed pull request #14358: Specialise DirectMonotonicReader when it
only contains one block
URL: https://github.com/apache/lucene/pull/14358