atris commented on issue #14341:
URL: https://github.com/apache/lucene/issues/14341#issuecomment-2716520829
Cool. Assigning this to myself.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the spe
jpountz closed issue #11915: Make Lucene smarter about long runs of matches
URL: https://github.com/apache/lucene/issues/11915
jpountz commented on PR #14325:
URL: https://github.com/apache/lucene/pull/14325#issuecomment-2705668964
@DivyanshIITB Deletion policies are configurable via
`IndexWriterConfig#setIndexDeletionPolicy`, see e.g. `SnapshotDeletionPolicy`
which allows for finer-grained maintenance of snapshots
lpld commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2710872661
@benwtrent @mikemccand I really appreciate your help and quick responses.
May I also ask about the selection of datasets being used for the
benchmarks? How do you choose them? Why I'm
javanna commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2711741530
> But I'm still questioning if there's actually a use-case for allowing
something to be loaded either on-heap or off-heap in our codecs. For all
examples that come to mind, I would rathe
rmuir merged PR #14311:
URL: https://github.com/apache/lucene/pull/14311
original-brownbear merged PR #14343:
URL: https://github.com/apache/lucene/pull/14343
original-brownbear commented on PR #14343:
URL: https://github.com/apache/lucene/pull/14343#issuecomment-2716102016
Thanks Michael!
rmuir commented on PR #14311:
URL: https://github.com/apache/lucene/pull/14311#issuecomment-2716086552
I'm just doing final tests. Thanks again @renatoh. I will backport it to
10.2. We can followup to remove the deprecated "sorta-kinda-longest-match" from
lucene's `main` branch, and see if
iverase opened a new pull request, #14338:
URL: https://github.com/apache/lucene/pull/14338
If you have many BKD readers on the same heap, it feels wasteful to
have individual instances of BKDConfig records, as most of the time those
instances correspond to standard Lucene fields. T
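To illustrate the idea in this PR description, here is a minimal sketch in Python (not Lucene's Java code) of sharing one immutable config instance per distinct set of parameters instead of allocating one per reader. The `BKDConfig` dataclass below only mirrors the three fields visible in the truncated diff header, and `shared_config` is a hypothetical helper, not a Lucene API; how PR #14338 actually shares instances is not shown in this excerpt.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class BKDConfig:
    """Immutable stand-in for the Java record (only the fields visible in the diff)."""
    num_dims: int
    num_index_dims: int
    bytes_per_dim: int


@lru_cache(maxsize=None)
def shared_config(num_dims, num_index_dims, bytes_per_dim):
    """Hypothetical factory: returns one cached instance per distinct argument tuple.

    Call it with positional arguments; lru_cache keys keyword calls separately.
    """
    return BKDConfig(num_dims, num_index_dims, bytes_per_dim)


# Two readers over a "standard" field end up holding the same object.
reader_a_config = shared_config(1, 1, 4)
reader_b_config = shared_config(1, 1, 4)
same_instance = reader_a_config is reader_b_config
```

Because the config is immutable, sharing it between readers is safe; the trade-off is that the cache keeps every distinct configuration alive, which is presumably acceptable when most fields use a handful of standard configurations.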
rmuir opened a new pull request, #14346:
URL: https://github.com/apache/lucene/pull/14346
For the same reason the aws-jmh venv is ignored. Current rat version will go
crazy on this.
We should look into the rat version, I think they may have improved
.gitignore support in recent relea
rmuir merged PR #14326:
URL: https://github.com/apache/lucene/pull/14326
jpountz commented on PR #14345:
URL: https://github.com/apache/lucene/pull/14345#issuecomment-2715856731
It gives a few small speedups (low p-value):
```
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-value
jpountz opened a new pull request, #14345:
URL: https://github.com/apache/lucene/pull/14345
We currently push FILTER clauses as constant-scoring MUST clauses with a 0
score to `BlockMaxConjunctionBulkScorer`. This change improves efficiency a bit
by reducing polymorphism (TermScorer
DivyanshIITB commented on PR #14335:
URL: https://github.com/apache/lucene/pull/14335#issuecomment-2710583795
Thank you for the review, @jpountz!
I see your concern regarding equal resource distribution across IndexWriter
instances potentially leading to inefficiencies when some write
jpountz commented on issue #14341:
URL: https://github.com/apache/lucene/issues/14341#issuecomment-2715706359
This idea sounds worth exploring to me too. Intuitively, it may help
pre-filtering too. E.g. if I think of an e-commerce use-case with a filter on
the category field, it is likely t
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2715605453
Thanks for the review @navneet1v!
> lucene util branch
You can find some (very hacky) changes
[here](https://github.com/kaivalnp/luceneutil/tree/faiss). Broad steps to run
dweiss opened a new issue, #14344:
URL: https://github.com/apache/lucene/issues/14344
### Description
This fails for me every time on main:
```
./gradlew -p lucene/backward-codecs -Ptests.seed=CF895D81F5B12730 test --tests TestIndexSortBackwardsCompatibility
...
> j
atris commented on issue #14341:
URL: https://github.com/apache/lucene/issues/14341#issuecomment-2715539509
@navneet1v Skimming through the issue, I think they refer to different
problem statements.
What you primarily want in the referenced GH issue is the ability to filter
on more m
navneet1v commented on issue #14341:
URL: https://github.com/apache/lucene/issues/14341#issuecomment-2715434628
@benwtrent I was also thinking along similar lines, and I created this GH issue
which eventually aims to create more than one graph at the segment level:
https://github.com/apache/luce
mikemccand commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2711223110
> > Net/net I think we ought to be adding some multithreaded test capability
to KnnGraphTester.
>
> Agreed, I think a "num_search_threads" parameter would be beneficial. Then
t
mikemccand commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2710460602
> `fanout` makes the search queue when searching the HNSW graph larger.
However, the searcher will still only return `k` results. So, searching for top
`k=10` with `fanout=20` indicat
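The quoted explanation (a larger search queue, but still only `k` results returned) can be sketched with a toy best-first graph search in Python. This is not Lucene's HNSW implementation: the graph, the distances, and the function names are made up for illustration, and sizing the queue as `k + fanout` is an assumption, since the quoted sentence is cut off before it says how the two values combine.

```python
import heapq


def beam_search(graph, dist, entry, beam):
    """Best-first search that keeps at most `beam` of the closest nodes seen."""
    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap: closest unexplored node first
    results = [(-dist(entry), entry)]     # max-heap via negated distance: worst kept node first
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the queue is full and the best candidate is worse than the worst result.
        if len(results) >= beam and d > -results[0][0]:
            break
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            dn = dist(neighbor)
            if len(results) < beam or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, neighbor))
                heapq.heappush(results, (-dn, neighbor))
                if len(results) > beam:
                    heapq.heappop(results)  # evict the current worst result
    return sorted((-neg_d, n) for neg_d, n in results)


def search(graph, dist, entry, k, fanout):
    """Explore with a queue of k + fanout (assumed), but return only the k closest nodes."""
    return [n for _, n in beam_search(graph, dist, entry, k + fanout)[:k]]


# Toy graph: node 3 is closest to the query but is only reachable through node 2,
# which looks worse than node 1 from the entry point.
toy_graph = {0: [1, 2], 1: [0], 2: [3], 3: [2]}
toy_distances = {0: 5.0, 1: 4.0, 2: 6.0, 3: 1.0}


def toy_dist(node):
    return toy_distances[node]


narrow = search(toy_graph, toy_dist, entry=0, k=1, fanout=0)  # -> [1]
wide = search(toy_graph, toy_dist, entry=0, k=1, fanout=2)    # -> [3]
```

In this toy case the narrow search stops at a local minimum, while the wider queue keeps node 2 around long enough to reach node 3. Both calls still return a single result, which is the point of the quoted remark.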
benwtrent commented on issue #14342:
URL: https://github.com/apache/lucene/issues/14342#issuecomment-2715217336
> FashionMnist784 (60_000 x 784)
That one looks weird to me. The others sort of make sense.
benwtrent commented on issue #14341:
URL: https://github.com/apache/lucene/issues/14341#issuecomment-2715213230
> Are you actively working on this? Or would you like me to explore more?
I am not actively exploring it. A POC is definitely needed to explore if
this is worth it at search
original-brownbear merged PR #14337:
URL: https://github.com/apache/lucene/pull/14337
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1982961727
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java:
##
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation
jpountz merged PR #14312:
URL: https://github.com/apache/lucene/pull/14312
msokolov commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2710427176
right that's what it looked like to me - I was only responding to the
earlier message where you said:
> The "simplified" version now only has a slight more latency compared to
"
original-brownbear merged PR #14336:
URL: https://github.com/apache/lucene/pull/14336
dungba88 commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2710473316
Ah right, sorry it was a typo :)
mikemccand commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2710473036
> > Could you please also share other parameters of your benchmark (ndoc,
maxConn, beamWidthIndex, fanout, etc.)
>
> I have lost my test environment and I regrettably didn't wri
DivyanshIITB commented on PR #14335:
URL: https://github.com/apache/lucene/pull/14335#issuecomment-2715055808
Thank you for the clarification, @jpountz!
I'll drop the merge throttling aspect from the changes since it's disabled
by default.
Regarding the fixed thread pool approach (
jpountz commented on PR #14335:
URL: https://github.com/apache/lucene/pull/14335#issuecomment-2714988316
Merge throttling is now disabled by default, IMO it's fine to ignore merge
throttling for now. Regarding thread creation, I'm thinking of a shared fixed
(e.g. number of processors / 2) t
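As a rough illustration of the shared fixed pool mentioned here, the following Python sketch sizes one pool at half the available processors and lets several writers submit work to it. It is not Lucene code: `shared_pool_size`, `run_on_shared_pool`, and the writer names are hypothetical, and the merge tasks are plain callables standing in for real merges.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def shared_pool_size(processors=None):
    """Half the processors, but never fewer than one thread."""
    n = processors if processors is not None else (os.cpu_count() or 1)
    return max(1, n // 2)


def run_on_shared_pool(tasks_by_writer, processors=None):
    """Run every writer's tasks on one fixed-size pool and collect results per writer."""
    with ThreadPoolExecutor(
        max_workers=shared_pool_size(processors),
        thread_name_prefix="shared-merge",
    ) as pool:
        futures = {
            name: [pool.submit(task) for task in tasks]
            for name, tasks in tasks_by_writer.items()
        }
        # Results are gathered in submission order for each writer.
        return {name: [f.result() for f in fs] for name, fs in futures.items()}


# Two hypothetical writers sharing the same pool.
results = run_on_shared_pool(
    {
        "writer-a": [lambda: "a1", lambda: "a2"],
        "writer-b": [lambda: "b1"],
    }
)
```

With a single shared pool, the total number of merge threads stays bounded no matter how many writers exist, at the cost of writers competing for the same workers.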
navneet1v commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2714965745
> @navneet1v I wonder if either of you were able to replicate benchmarks?
@kaivalnp can you share your lucene util branch so that I can replicate your
results.
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1989679548
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,488 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
jpountz commented on code in PR #14338:
URL: https://github.com/apache/lucene/pull/14338#discussion_r1989674584
##
lucene/core/src/java/org/apache/lucene/util/bkd/BKDConfig.java:
##
@@ -38,6 +39,18 @@ public record BKDConfig(int numDims, int numIndexDims, int
bytesPerDim, int m
iverase commented on code in PR #14338:
URL: https://github.com/apache/lucene/pull/14338#discussion_r1989664702
##
lucene/core/src/java/org/apache/lucene/util/bkd/BKDConfig.java:
##
@@ -38,6 +39,19 @@ public record BKDConfig(int numDims, int numIndexDims, int
bytesPerDim, int m
iverase commented on code in PR #14338:
URL: https://github.com/apache/lucene/pull/14338#discussion_r1989663618
##
lucene/core/src/java/org/apache/lucene/util/bkd/BKDConfig.java:
##
@@ -38,6 +39,19 @@ public record BKDConfig(int numDims, int numIndexDims, int
bytesPerDim, int m
jpountz commented on code in PR #14338:
URL: https://github.com/apache/lucene/pull/14338#discussion_r1989627096
##
lucene/core/src/java/org/apache/lucene/util/bkd/BKDConfig.java:
##
@@ -68,6 +82,16 @@ public record BKDConfig(int numDims, int numIndexDims, int
bytesPerDim, int m
tteofili commented on issue #14341:
URL: https://github.com/apache/lucene/issues/14341#issuecomment-2714880037
> it is conceivable that clusters are of common distributions, consequently
we can quit searching clusters early and only search a couple of the clusters
at a time.
I think
original-brownbear opened a new pull request, #14343:
URL: https://github.com/apache/lucene/pull/14343
It's in the title, some obvious speedups. This is fairly expensive logic for
Elasticsearch when run over a larger number of shards. No need for streams,
creating comparator instances and s
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2714811428
I've added a GH workflow (see [sample
output](https://github.com/apache/lucene/actions/runs/13791742930/job/38573182600?pr=14178))
that builds and adds the C_API of Faiss before running
jpountz commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2714778925
> I do think that this is more generally useful than just the particular
use case of on- or off-heap FST in completion postings.
I'm curious about what other use-cases you have in mi
atris commented on issue #14341:
URL: https://github.com/apache/lucene/issues/14341#issuecomment-2714751735
It's actually crazy - I was thinking of starting a discussion on this today.
One thing that I have been playing with is creating clusters with centroids
that are at a certain ra
benwtrent opened a new issue, #14341:
URL: https://github.com/apache/lucene/issues/14341
### Description
What do we think about clustering or grouping documents by centroids, or
potentially in chunks of filters, and allowing multiple graphs per segment? If
segments are random sub-samples
lpld opened a new issue, #14342:
URL: https://github.com/apache/lucene/issues/14342
Hi Lucene team. Last week I was playing with the [quantization
format](https://github.com/apache/lucene/pull/14078) that was recently added
to Lucene. The main idea was to take the datasets from
[ann-ben
benwtrent commented on PR #14304:
URL: https://github.com/apache/lucene/pull/14304#issuecomment-2713553719
On GCP, there isn't much difference. I wouldn't expect there to be a huge
amount of difference, as the dominant cost is the vector comparisons, not the
quantization.
I haven't tes
iverase opened a new pull request, #14340:
URL: https://github.com/apache/lucene/pull/14340
Lucene90DocValuesProducer holds all the metadata found in the meta file on
heap. At runtime, it processes that metadata to produce the right doc value
flavour, e.g. dense vs. sparse. This is a bit w
benwtrent commented on code in PR #14304:
URL: https://github.com/apache/lucene/pull/14304#discussion_r1988779142
##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##
@@ -907,4 +907,87 @@ public static long int4BitDotProduct128(byte
benwtrent commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2713454213
Hey @lpld
> May I also ask about the selection of datasets being used for the
benchmarks? How do you choose them?
I haven't tested with SIFT, though be sure to use euclid
ChrisHegarty commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2713450274
I do think that this is more generally useful than just the particular use
case of on- or off-heap FST in completion postings.
> If we want to allow configuring how a codec g