github-actions[bot] commented on PR #13872:
URL: https://github.com/apache/lucene/pull/13872#issuecomment-2495139672
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
vigyasharma merged PR #14015:
URL: https://github.com/apache/lucene/pull/14015
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene
saschaszott opened a new pull request, #14015:
URL: https://github.com/apache/lucene/pull/14015
### Description
Class `Hits` was removed from the Lucene API with Lucene version 3.0. Use
class `TopDocs` instead.
jpountz opened a new pull request, #14014:
URL: https://github.com/apache/lucene/pull/14014
Running filtered disjunctions with a specialized bulk scorer seems to yield
a good speedup. For what it's worth, I also tried to implement a MAXSCORE-based
scorer to see if it had to do with the `Bul
jpountz commented on code in PR #13989:
URL: https://github.com/apache/lucene/pull/13989#discussion_r1854638074
##
lucene/core/src/test/org/apache/lucene/store/TestBufferedChecksum.java:
##
@@ -63,4 +67,127 @@ public void testRandom() {
}
assertEquals(c1.getValue(), c2
msokolov commented on issue #14002:
URL: https://github.com/apache/lucene/issues/14002#issuecomment-2494609732
I do think it's worth improving. Another way could be to measure empirically
the stack depth - maybe it scales in a predictable way with total number of
vectors? And then we can us
benwtrent commented on issue #14007:
URL: https://github.com/apache/lucene/issues/14007#issuecomment-2494527716
> In my opinion, we should not have lossy codecs. This creates weird
> situations where the errors could compound in weird ways over time, e.g. when
> you switch file formats.
mikemccand commented on issue #14007:
URL: https://github.com/apache/lucene/issues/14007#issuecomment-2494487655
Ahh sorry @dungba88 also referenced the issue above!
https://github.com/apache/lucene/issues/13158
msfroh commented on PR #13987:
URL: https://github.com/apache/lucene/pull/13987#issuecomment-2494486990
> @msfroh FWIW I'm happy to merge this PR when we remove the double call to
> LeafCollector#collect on the same doc ID in tests.
In that case, the unit test that I added can be remove
mikemccand commented on issue #14007:
URL: https://github.com/apache/lucene/issues/14007#issuecomment-2494484887
Also, note that, at least for the current scalar quantization (`int7`,
`int4`), those full precision `float[]` vectors remain on disk during
searching. They are only used during
msfroh opened a new pull request, #14012:
URL: https://github.com/apache/lucene/pull/14012
### Description
This is functionally equivalent to the logic that was present, but makes the
behavior clearer.
mikemccand commented on issue #14007:
URL: https://github.com/apache/lucene/issues/14007#issuecomment-2494478945
> I'd rather like it to be done on top of the codec API. E.g. computing a
> good scalar quantization for a given model offline, and then using it in the
> way in to index vectors dir
benwtrent commented on PR #14011:
URL: https://github.com/apache/lucene/pull/14011#issuecomment-2494478036
@msokolov we should have `CHANGES` for this & it could be backported to
10.1. It's a nice optimization that we should track.
mikemccand commented on issue #14004:
URL: https://github.com/apache/lucene/issues/14004#issuecomment-2494470229
> Interestingly, an index that is less than 1GB can still have 10 segments
> with the above merge policy because of the constraint to not run merges where
> the resulting segment is
jpountz commented on issue #14004:
URL: https://github.com/apache/lucene/issues/14004#issuecomment-2494383950
I ran the IndexGeoNames benchmark with 1 indexing thread,
SerialMergeScheduler, 10k buffered docs, 100MB floor segment size, 2 segments
per tier. This made the total indexing time g
jpountz commented on issue #14007:
URL: https://github.com/apache/lucene/issues/14007#issuecomment-2494379468
In my opinion, we should not have lossy codecs. This creates weird
situations where the errors could compound in weird ways over time, e.g. when
you switch file formats.
I'd
jpountz merged PR #13999:
URL: https://github.com/apache/lucene/pull/13999
jpountz merged PR #14003:
URL: https://github.com/apache/lucene/pull/14003
jpountz commented on PR #13987:
URL: https://github.com/apache/lucene/pull/13987#issuecomment-2494272651
@msfroh FWIW I'm happy to merge this PR when we remove the double call to
LeafCollector#collect on the same doc ID in tests.
jpountz commented on PR #13998:
URL: https://github.com/apache/lucene/pull/13998#issuecomment-2494266353
Sorry for derailing the PR, let's not implement it on ByteBuffersIndexInput
then. We can look into it in a separate PR if we want.
msokolov commented on PR #14011:
URL: https://github.com/apache/lucene/pull/14011#issuecomment-2494249748
Thanks @villam-durina! This looks good. I guess we were trying to enforce
access to the rows, requiring that callers acquire a lock to obtain them, but
it was really just a fig leaf any
ChrisHegarty commented on PR #13998:
URL: https://github.com/apache/lucene/pull/13998#issuecomment-2494090457
> > yeah, I think that this prob makes sense. Lemme satisfy myself that it
> > will always be true.
>
> it won't be in core if currently swapped out, no? I don't think a
> hardcode
shatejas commented on PR #13985:
URL: https://github.com/apache/lucene/pull/13985#issuecomment-2494021476
Thanks a lot @ChrisHegarty for adding tight tests and merging this!
rmuir commented on PR #13998:
URL: https://github.com/apache/lucene/pull/13998#issuecomment-2493975727
> yeah, I think that this prob makes sense. Lemme satisfy myself that it
> will always be true.
it won't be in core if currently swapped out, no? I don't think a hardcoded
`true` work
benwtrent commented on issue #14007:
URL: https://github.com/apache/lucene/issues/14007#issuecomment-2493691474
> I think we still need for indexing and merging as vigyasharma@ comment.
I don't know if it's strictly necessary to keep the raw vectors for merging.
Once a certain limit i
ChrisHegarty merged PR #13985:
URL: https://github.com/apache/lucene/pull/13985
ChrisHegarty commented on PR #13985:
URL: https://github.com/apache/lucene/pull/13985#issuecomment-2493554763
> ..
> Took a look, The mismatch between `mergeInstanceCount` and `mergeInstance`
> is because mergeInstanceCount is being updated in parent and mergeInstance is
> updated to true du
stefanvodita commented on PR #14010:
URL: https://github.com/apache/lucene/pull/14010#issuecomment-2493291352
At the same time, I don't think we need to rush bug fix releases, since this
functionality was broken from the time it was released.