dungba88 commented on issue #12355:
URL: https://github.com/apache/lucene/issues/12355#issuecomment-1806668298
I just stumbled on this, and I agree that reading backward is not cache-friendly.
Is there a reason why we write it backward in the first place? We are
specially reversing the byte or
kevindrosendahl commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1806615864
I've got my framework set up for testing larger-than-memory indexes and have
some somewhat interesting first results.
TL;DR:
- the main thing driving jvector's larg
kevindrosendahl commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1806614314
@benwtrent:
> if I am reading the code correctly, it does the following:
> - Write int8 quantized vectors along side the vector ordin
stefanvodita commented on issue #12734:
URL: https://github.com/apache/lucene/issues/12734#issuecomment-1806557718
I spent some more time with the code and I can attempt answering the
questions in the description.
1. Yes. We rely on zeros in slice buffers to tell us where a slice ends
slow-J commented on PR #12741:
URL: https://github.com/apache/lucene/pull/12741#issuecomment-1806515360
I think it's a little hard to tell with one data point because of noise. It
seems to be trending upwards in the `BooleanQuery` graphs, but I agree that
it's not obvious that there is a noti
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1389857835
##
lucene/core/src/java/org/apache/lucene/search/VectorSimilarityCollector.java:
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under on
benwtrent commented on PR #12794:
URL: https://github.com/apache/lucene/pull/12794#issuecomment-1806359735
@mayya-sharipova I am guessing those experiments are over multiple
segments. Could you include that information in the table?
It would also be awesome to see what the
zhaih merged PR #73:
URL: https://github.com/apache/lucene-site/pull/73
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache
zhaih opened a new pull request, #73:
URL: https://github.com/apache/lucene-site/pull/73
(no comment)
zhaih merged PR #12793:
URL: https://github.com/apache/lucene/pull/12793
benwtrent merged PR #12729:
URL: https://github.com/apache/lucene/pull/12729
mayya-sharipova commented on PR #12794:
URL: https://github.com/apache/lucene/pull/12794#issuecomment-1806267939
### Experiments
- [luceneutil](https://github.com/mikemccand/luceneutil) tool
- Apple M1 Max (10 CPU cores)
- **baseline**: Lucene main branch
- **c
mayya-sharipova opened a new pull request, #12794:
URL: https://github.com/apache/lucene/pull/12794
Speed up concurrent multi-segment HNSW graph search by exchanging
the global minimum similarity collected so far across segments. As the global
similarity is used as a minimum threshold t
zhaih opened a new pull request, #12793:
URL: https://github.com/apache/lucene/pull/12793
### Description
I didn't realize that our random searcher randomly uses a thread pool. Fixed it
to use a rewrite method that does not rewrite concurrently.
benwtrent commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1389741654
##
lucene/core/src/java/org/apache/lucene/search/VectorSimilarityCollector.java:
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under o
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1806196196
Summary of new changes:
1. Refactor into a more appropriate query
- Move away from `AbstractKnnVectorQuery` to take advantage of inherent
independence of segment-level results
yugushihuang opened a new issue, #12792:
URL: https://github.com/apache/lucene/issues/12792
### Description
[TieredMergePolicy](https://github.com/apache/lucene/blob/branch_9_8/lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java#L382)
uses `MaxDoc` to calculate the `s
shubhamvishu opened a new pull request, #12791:
URL: https://github.com/apache/lucene/pull/12791
### Description
Adds the doc values query to `IndexOrDocValuesQuery#toString`
rmuir merged PR #12787:
URL: https://github.com/apache/lucene/pull/12787
benwtrent commented on PR #12789:
URL: https://github.com/apache/lucene/pull/12789#issuecomment-1805943735
@jpountz searching scales logarithmically, but we do have to explore more if
there are any pre-filtered nodes.
We can run some experiments to determine the appropriate threshold.
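As a rough illustration of the threshold question above, the sketch below plugs in the `Math.log(graphSize) * vectorsCollected` estimate that is suggested, as a suspicion only, in the comment on issue #12505. The class name `VisitEstimate`, both method names, and the `denseFraction` parameter are hypothetical and are not part of Lucene; the rule for picking a sparse versus dense visited set is an assumption for illustration, not the behaviour of the PR.

```java
/**
 * Hypothetical sketch, not Lucene code: estimates how many graph nodes a
 * search might visit and compares that to a dense/sparse cut-off.
 */
public final class VisitEstimate {
    private VisitEstimate() {}

    /**
     * Estimated visited nodes: ln(graphSize) * vectorsCollected.
     * Math.log is the natural logarithm. The formula is an unverified
     * guess taken from the discussion, not a measured result.
     */
    public static double estimateVisited(int graphSize, int vectorsCollected) {
        if (graphSize < 1 || vectorsCollected < 0) {
            throw new IllegalArgumentException("graphSize must be >= 1 and vectorsCollected >= 0");
        }
        return Math.log(graphSize) * vectorsCollected;
    }

    /**
     * Assumed rule: prefer a sparse visited set when the estimate is below
     * denseFraction * graphSize. The right denseFraction is exactly what
     * the experiments would have to determine.
     */
    public static boolean preferSparse(int graphSize, int vectorsCollected, double denseFraction) {
        return estimateVisited(graphSize, vectorsCollected) < denseFraction * graphSize;
    }

    public static void main(String[] args) {
        int graphSize = 1_000_000;
        int vectorsCollected = 10;
        System.out.println("estimated visited: " + estimateVisited(graphSize, vectorsCollected));
        System.out.println("prefer sparse: " + preferSparse(graphSize, vectorsCollected, 0.01));
    }
}
```

Pre-filtered nodes would push the real visit count above this estimate, so any cut-off derived this way would still need to be checked against benchmarks.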
gashutos commented on issue #12720:
URL: https://github.com/apache/lucene/issues/12720#issuecomment-1805760492
Sure!
jpountz closed issue #12720: [Sort] Numeric field sort query performance
degrades dramatically with more deleted entries in segment
URL: https://github.com/apache/lucene/issues/12720
jpountz commented on issue #12720:
URL: https://github.com/apache/lucene/issues/12720#issuecomment-1805758015
I am closing this because I don't think there is anything that can be done here.
Feel free to reopen if you think otherwise.
jpountz commented on PR #12789:
URL: https://github.com/apache/lucene/pull/12789#issuecomment-1805727513
Thanks, the numbers make more sense to me now.
Intuitively, `FixedBitSet` performs better when a large percentage of nodes
needs to be visited and `SparseFixedBitSet` performs bett
benwtrent commented on issue #12505:
URL: https://github.com/apache/lucene/issues/12505#issuecomment-1805659612
One thing to consider is that we should test various graphs to see how
many vectors we actually visit.
I suspect it's around `Math.log(graphSize) * vectorsCollected`. W
slow-J commented on code in PR #12640:
URL: https://github.com/apache/lucene/pull/12640#discussion_r1389238790
##
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java:
##
@@ -145,22 +144,30 @@ public int score(LeafCollector collector, Bits
acceptDocs, int min,
slow-J commented on PR #12640:
URL: https://github.com/apache/lucene/pull/12640#issuecomment-1805497199
LGTM, I think that this requires a rebase after
https://github.com/apache/lucene/pull/12642/files
cpoerschke commented on PR #448:
URL: https://github.com/apache/lucene/pull/448#issuecomment-1805426718
If there are no objections or concerns, I'll aim to merge this sometime next
week.
(And the upgrade to 2.x can happen as a follow-up pull request.)
martijnvg commented on PR #12711:
URL: https://github.com/apache/lucene/pull/12711#issuecomment-1805422002
> We could get away with not having the check at all and make blocks a first
class citizen by recording the parent document in a docvalues field. Really, if
we'd be implementing the fe
robertvanwinkle1138 commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1805417696
Another notable difference in the Lucene implementation is delta variable
byte encoding of node ids. The increase in disk space requires the user to
purchase more RAM pe
s1monw commented on PR #12711:
URL: https://github.com/apache/lucene/pull/12711#issuecomment-1805414877
> In fact, in order to make use of your doc blocks at search time
(ToParent/ChildBlockJoinQuery), users must already provide a bitset marking
which docs are parents (I think this is typic
gf2121 merged PR #12784:
URL: https://github.com/apache/lucene/pull/12784
gf2121 commented on PR #12775:
URL: https://github.com/apache/lucene/pull/12775#issuecomment-1805262547
Closing this in favor of https://github.com/apache/lucene/pull/12784
gf2121 closed pull request #12775: Speed up BytesRefHash#sort
URL: https://github.com/apache/lucene/pull/12775
gf2121 commented on PR #12784:
URL: https://github.com/apache/lucene/pull/12784#issuecomment-1805261948
Thanks for the review, @jpountz!
I'll merge this and close https://github.com/apache/lucene/pull/12775.