dweiss commented on issue #13647:
URL: https://github.com/apache/lucene/issues/13647#issuecomment-2696384821
I can generate this file and make it available as a benchmark dataset. Or
would you rather give me one of your own, for consistency with your previous
results?
msokolov commented on issue #14295:
URL: https://github.com/apache/lucene/issues/14295#issuecomment-2695731447
Thanks for pointing that out; somehow I overlooked it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
msokolov commented on issue #13647:
URL: https://github.com/apache/lucene/issues/13647#issuecomment-2695713269
Yes, I was referring to files that can be generated with
`infer_token_vectors_cohere.py`. Maybe we take the position that users should
regenerate, but it is kind of slow and demand
uschindler commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2695637219
Hi,
> I will throw in a real use case that gives us a bit of a headache: completion
> fields. All the existing codecs load them on heap, and we want to make a switch
> to load them of
dweiss commented on issue #13647:
URL: https://github.com/apache/lucene/issues/13647#issuecomment-2695553260
> [...] but can we attach 3G files here?
I think we can, if it makes sense to do so.
We're not supposed to abuse this service - for example by downloading 3 GB
data file
msokolov commented on issue #13647:
URL: https://github.com/apache/lucene/issues/13647#issuecomment-2695517144
There are other vector data files - I think the key one that has become a
reference point is Cohere 768d trained on Wikipedia-derived docs, but I'm not
sure where nightly benchmark
benwtrent commented on issue #13647:
URL: https://github.com/apache/lucene/issues/13647#issuecomment-2695529583
@msokolov the Python script in luceneutil downloads from Hugging Face, if
that is the data you are talking about:
`infer_token_vectors_cohere.py`
benwtrent closed issue #14266: Flaky
`TestKnnByteVectorQueryMMap.testRandomWithFilter` test failures
URL: https://github.com/apache/lucene/issues/14266
benwtrent merged PR #14329:
URL: https://github.com/apache/lucene/pull/14329
dweiss commented on issue #13647:
URL: https://github.com/apache/lucene/issues/13647#issuecomment-2695433674
We now have an S3 bucket to place those benchmark/reference files on. If
you have any of these files - please let me know and perhaps make them available
to me, somehow -
benwtrent opened a new pull request, #14329:
URL: https://github.com/apache/lucene/pull/14329
I have noticed some rare failures of this test, but every time it failed, it
was due to a valid set of kNN docs being found before the exploration limit was
actually hit. This is due to extremely l
rmuir commented on issue #13647:
URL: https://github.com/apache/lucene/issues/13647#issuecomment-2695438823
@dweiss
https://issues.apache.org/jira/secure/attachment/12429835/top.100k.words.de.en.fr.uk.wikipedia.2009-11.tar.bz2
dweiss merged PR #14328:
URL: https://github.com/apache/lucene/pull/14328
javanna commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2695424434
I will throw in a real use case that gives us a bit of a headache: completion
fields. All the existing codecs load them on heap, and we want to make a switch
to load them off heap in certai
dweiss closed issue #14144: :lucene:benchmark:getGeoNames github job fails
URL: https://github.com/apache/lucene/issues/14144
dweiss opened a new pull request, #14328:
URL: https://github.com/apache/lucene/pull/14328
This causes tests that expect exact outputs (like TestReproduceMessage) to
occasionally fail under JDK25+. I added some filtering to randomTimeZone so
that those warning-emitting time zone codes are n
renatoh commented on PR #14311:
URL: https://github.com/apache/lucene/pull/14311#issuecomment-2695297490
@rmuir
The field is on the superclass, hence we cannot deprecate it. We
could deprecate the current constructor and introduce another constructor
without the onlyLongest
shatejas commented on PR #14076:
URL: https://github.com/apache/lucene/pull/14076#issuecomment-2695220393
> For the exact case that was tested in
> https://github.com/apache/lucene/pull/13985 that might be a regression. I am
> challenging the result a bit here since I don't see how the copy tim
benwtrent commented on PR #14256:
URL: https://github.com/apache/lucene/pull/14256#issuecomment-2694997840
This proved not particularly useful. Maybe it can be a future optimization,
but for now, it seems the added complexity isn't worth its cost.
benwtrent closed pull request #14256: Reuse entry point scores and provide
mechanisms to provide scores for directly entry points
URL: https://github.com/apache/lucene/pull/14256
ChrisHegarty commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2694804577
Yeah, I can see your point. What unsettles me a little about the proposed
change is the "weight" that it imposes on this simple SPI interface for a
somewhat niche issue. That said,
jimczi commented on PR #14076:
URL: https://github.com/apache/lucene/pull/14076#issuecomment-2694779286
> Is that truly the case or did I miss something?
That's probably the opposite. For the exact case that was tested in
https://github.com/apache/lucene/pull/13985 that might be a re
msokolov commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2694771977
see https://github.com/mikemccand/luceneutil/pull/345 for benchmarking
support
msokolov commented on PR #14076:
URL: https://github.com/apache/lucene/pull/14076#issuecomment-2694758015
I briefly skimmed the prior PR, which this effectively undoes, and I did not
see much benefit there in terms of improving merge times. Is that truly the
case or did I miss something? If
uschindler commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2694748030
But basically the whole idea here is to allow replacing codecs, which in
reality is not wanted at all. If you want a different codec, name it differently.
So I am not fully h
uschindler commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2694739987
Now I understand how this is expected to work. Elasticsearch will return
true for the SPI impl.
Maybe we should also try to allow different orders for the active discovery.
May
uschindler commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2694709538
Did you think about this one, too:
https://github.com/apache/lucene/blob/8e68ed22614dc7841ebea94d3e66561ceb74d25e/lucene/core/src/java/org/apache/lucene/analysis/AnalysisSPILoader.java
ChrisHegarty commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2694681847
Anyone else? @uschindler? I think that this is quite solid, and while the
issue we're facing in Elasticsearch is because we deploy as modules, it may not
be that widely encounter
stefanvodita merged PR #14237:
URL: https://github.com/apache/lucene/pull/14237
rmuir commented on PR #14311:
URL: https://github.com/apache/lucene/pull/14311#issuecomment-2694568500
@renatoh oh, sorry for the slow feedback, did not realize you had
deconflicted it.
changes look good to me. I wish there was a way to really use a
`@deprecated/@Deprecated` here, bu
msokolov commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2694557420
> ^ I can also help playing around with the above ideas and benchmark to see
> if it helps (some cases above seem to have high reentries, ~500)
Please feel free to experiment! Not
renatoh commented on PR #14311:
URL: https://github.com/apache/lucene/pull/14311#issuecomment-2694511860
@rmuir any thoughts on my changes?
rmuir commented on PR #14325:
URL: https://github.com/apache/lucene/pull/14325#issuecomment-2694353271
To me, KeepOnlyLastCommit means only the last commit, not the last 5. I
don't think this policy should be modified like this.
ChrisHegarty merged PR #14320:
URL: https://github.com/apache/lucene/pull/14320
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucen
gf2121 commented on code in PR #14312:
URL: https://github.com/apache/lucene/pull/14312#discussion_r1976952412
##
lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java:
##
@@ -128,6 +128,16 @@ private void scoreWindowUsingBitSet(
assert windowMatches
dungba88 commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2694032117
>
- Could we readjust the pro-rata rate, not based on the whole index, but
based on the effective segments?
- What if we just set the per-leaf k to the same as global k in
pseudo-nymous commented on issue #13898:
URL: https://github.com/apache/lucene/issues/13898#issuecomment-2693761988
Thanks for updating the script with the PR number. I'll root cause and fix the
issue.
dungba88 commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2693679959
> I added an additional cap on this, but then realized we are already
> implicitly imposing such a limit here:
@msokolov that checks if the *previous* iteration (kInLoop / 2) has ex