jpountz commented on PR #14359:
URL: https://github.com/apache/lucene/pull/14359#issuecomment-2727649234
This doesn't slow down existing tasks significantly, including
`CountFilteredPhrase` which now runs with `DenseConjunctionBulkScorer` vs. a
`DefaultBulkScorer` on top of a `ConjunctionSc
jpountz opened a new pull request, #14359:
URL: https://github.com/apache/lucene/pull/14359
The main motivation is to efficiently evaluate range queries on fields that
have a doc-value index enabled. These range queries produce two-phase iterators
that should match large contiguous range of
iverase commented on code in PR #14358:
URL: https://github.com/apache/lucene/pull/14358#discussion_r1997607493
##
lucene/core/src/java/org/apache/lucene/util/packed/DirectMonotonicReader.java:
##
@@ -90,102 +140,142 @@ public static DirectMonotonicReader getInstance(Meta meta,
iverase opened a new pull request, #14358:
URL: https://github.com/apache/lucene/pull/14358
While looking into some heap dumps, I noticed that, in the
`DirectMonotonicReader.Meta` objects held by segments, the case of a single
value block is actually common. I wondered if we could specialize that
rmuir commented on PR #14360:
URL: https://github.com/apache/lucene/pull/14360#issuecomment-2727698038
Here's the same Automaton, but via `toDot()` tossed into
https://dreampuf.github.io/GraphvizOnline with all defaults. I guess I'm still
a fan of that output style, I feel it is more readab
rmuir commented on PR #14360:
URL: https://github.com/apache/lucene/pull/14360#issuecomment-2727699099
Mermaid definitely doesn't handle infinite automata very well at all:
```mermaid
stateDiagram
direction LR
classDef accept border-width:5px;stroke-width:5px,s
gf2121 merged PR #14203:
URL: https://github.com/apache/lucene/pull/14203
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apac
jpountz commented on PR #14358:
URL: https://github.com/apache/lucene/pull/14358#issuecomment-2727467298
I'm wary about adding all these micro-optimizations to reduce the
per-segment per-field overhead. They hurt readability and may easily get lost
over time when codecs get replaced with ne
jpountz commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2727461724
+1 let's use `DisjunctionSumScorer` (which already supports two-phase
iteration) when one of the clauses exposes a non-null two-phase iterator?
dsmiley commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2727528269
If one or more DISI has a high cost (irrespective of TPIs), thus matching
many docs, I could see avoiding BS1 as well.
An aside, if we are going to refer to these as BS1 vs BS2, th
jpountz commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2727629320
In case you missed it, `BooleanScorer` recently received optimizations that
make it hard for `DisjunctionScorer` to beat when clauses are `PostingsEnum`s:
- `DocIdSetIterator#intoBitSet` he
jpountz commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2727625240
> If one or more DISI has a high cost (irrespective of TPIs), thus matching
many docs, I could see avoiding BS1 as well.
I imagine that your idea is that if most of the cost comes
navneet1v commented on issue #14341:
URL: https://github.com/apache/lucene/issues/14341#issuecomment-2727640413
> What you primarily want in the referenced GH issue is the ability to
filter on more metadata during traversal vs doing a pre filter on the candidate
documents themselves. As Adr
navneet1v commented on issue #14348:
URL: https://github.com/apache/lucene/issues/14348#issuecomment-2727641512
@viliam-durina if you have benchmarks that show the performance is better,
it would be good to raise a PR. Once the PR is there, maintainers can also do
more tests to see if it is rea
dsmiley commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2727499162
Thanks for your confirmation of the problem. The collect-per-clause is
surprising to me; like what would benefit from that algorithm? Wouldn't that
_only_ be in fact _needed_ if scores
rmuir opened a new pull request, #14360:
URL: https://github.com/apache/lucene/pull/14360
Mermaid is a state chart format that GitHub supports within fenced code
blocks. For some reason GitHub doesn't support dotty but instead the latest JS
tool. I'm sure in 2 months it will be a different tool.
Be
jpountz commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2727502419
BS2 uses a heap to merge multiple `DocIdSetIterator`s. Unfortunately,
reordering this heap on every call to `nextDoc()` or `advance(int)` is not
completely free and BS1's approach of loa
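The heap-based merge mentioned in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not Lucene's actual scorer code: `HeapDisjunctionSketch`, `ArrayIterator`, and `union` are hypothetical names, sorted `int` arrays stand in for `DocIdSetIterator`s, and `java.util.PriorityQueue` stands in for Lucene's own priority queue. It shows that every returned doc ID requires at least one heap reordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class HeapDisjunctionSketch {

  /** Hypothetical stand-in for a doc ID iterator over a sorted int array. */
  static final class ArrayIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int idx = -1;

    ArrayIterator(int... docs) {
      this.docs = docs;
    }

    /** Current doc ID: -1 before the first call to nextDoc(), NO_MORE_DOCS when exhausted. */
    int docID() {
      if (idx < 0) {
        return -1;
      }
      return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
    }

    int nextDoc() {
      idx++;
      return docID();
    }
  }

  /** Returns the sorted, de-duplicated union of the doc IDs of all clauses. */
  static List<Integer> union(List<ArrayIterator> clauses) {
    // Min-heap ordered by the current doc ID of each clause.
    PriorityQueue<ArrayIterator> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.docID(), b.docID()));
    for (ArrayIterator it : clauses) {
      if (it.nextDoc() != ArrayIterator.NO_MORE_DOCS) {
        heap.add(it);
      }
    }

    List<Integer> matches = new ArrayList<>();
    while (!heap.isEmpty()) {
      int doc = heap.peek().docID();
      matches.add(doc);
      // Advance every clause positioned on `doc`. Each clause is removed from
      // the heap before it moves and re-inserted afterwards, so each advance
      // pays for a heap reordering.
      while (!heap.isEmpty() && heap.peek().docID() == doc) {
        ArrayIterator top = heap.poll();
        if (top.nextDoc() != ArrayIterator.NO_MORE_DOCS) {
          heap.add(top);
        }
      }
    }
    return matches;
  }
}
```

With clauses over `{1, 5, 9}` and `{2, 5}`, `union` visits doc 5 once but advances both clauses, which is where the per-document heap cost comes from.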
iverase commented on PR #14358:
URL: https://github.com/apache/lucene/pull/14358#issuecomment-2728327145
I understand what you are saying; I will close this then.
iverase commented on PR #14340:
URL: https://github.com/apache/lucene/pull/14340#issuecomment-2728329058
See here
https://github.com/apache/lucene/pull/14358#issuecomment-2727467298, I will
close this.
dsmiley commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2728182572
I could imagine improving BooleanScorer so that the TPI clauses are
separated and converted to a filter around the collector to try to match docs
*not* collected (i.e. test for docs inbe
Zona-hu commented on issue #14180:
URL: https://github.com/apache/lucene/issues/14180#issuecomment-2728315199
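> ### Description
> Related: [#14167](https://github.com/apache/lucene/pull/14167)
>
> However, beyond multi-leaf collection (e.g. information sharing), multi-threaded search over multiple segments can also get consistent results at low values of `k`.
>
> It may be possible to get more consistent results, possibly by simply collecting more neighbors (`k` in the query, `fan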
iverase closed pull request #14340: Reduce Lucene90DocValuesProducer memory footprint
URL: https://github.com/apache/lucene/pull/14340
vigyasharma commented on PR #14325:
URL: https://github.com/apache/lucene/pull/14325#issuecomment-2728289419
+1 to Adrien's comment, IndexDeletionPolicy can quite easily be implemented
and configured by users in IndexWriterConfig. It is often configured outside of
Lucene too, like the
[Com
jpountz commented on code in PR #14203:
URL: https://github.com/apache/lucene/pull/14203#discussion_r1997535617
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PointsWriter.java:
##
@@ -105,15 +107,22 @@ public Lucene90PointsWriter(
}
}
+ public Luce
gf2121 commented on code in PR #14333:
URL: https://github.com/apache/lucene/pull/14333#discussion_r1997503104
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Trie.java:
##
@@ -0,0 +1,486 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one o
gf2121 commented on code in PR #14203:
URL: https://github.com/apache/lucene/pull/14203#discussion_r1997571999
##
lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java:
##
@@ -248,21 +281,68 @@ private void readBitSet(IndexInput in, int count, int[] docIDs) throws I