HakanBayazitHabes opened a new pull request, #14549:
URL: https://github.com/apache/lucene/pull/14549
### Description
This pull request proposes the addition of several frequently used Turkish
stopwords to the stopwords.txt file. These words are commonly considered
non-informative in
jpountz commented on PR #14532:
URL: https://github.com/apache/lucene/pull/14532#issuecomment-2827499874
Benchmark results with #14550 applied:
```
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-value
```
jpountz commented on issue #14536:
URL: https://github.com/apache/lucene/issues/14536#issuecomment-2827511771
In my opinion, it's fine to require doc values to be indexed for faceting to
work. I don't think we should try to support faceting (or sorting) when the
field has a points index but
jpountz opened a new pull request, #14550:
URL: https://github.com/apache/lucene/pull/14550
I had initially introduced `DISIDocIdStream` to avoid introducing
regressions when `DenseConjunctionBulkScorer` started accepting single clauses.
However, benchmarks on #14532 suggested that going th
gf2121 merged PR #14529:
URL: https://github.com/apache/lucene/pull/14529
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apac
kkewwei opened a new issue, #14551:
URL: https://github.com/apache/lucene/issues/14551
### Description
In #13449, SkipIndex was introduced. It is primarily utilized in range queries
like `SortedSetDocValuesRangeQuery` and `SortedNumericDocValuesRangeQuery`. I'm
wondering if we could in
mikemccand commented on issue #14431:
URL: https://github.com/apache/lucene/issues/14431#issuecomment-2828061328
> > I don't know if we are already doing this -- is this TieredMergePolicy's
default behavior (1 -> 1) for forceMergeDeletes? I don't think so?
>
> It's not the default ind
jpountz commented on code in PR #14529:
URL: https://github.com/apache/lucene/pull/14529#discussion_r2058189258
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -693,6 +723,34 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int
target) thro
jpountz commented on issue #14545:
URL: https://github.com/apache/lucene/issues/14545#issuecomment-2827604411
If you're interested in storage efficiency, you'd probably want to use a
doc-values format like this one: https://github.com/apache/lucene/issues/11072.
It stores data into blocks o
expani commented on code in PR #14511:
URL: https://github.com/apache/lucene/pull/14511#discussion_r2058428825
##
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##
@@ -282,6 +290,10 @@ public PostingsEnum postings(
@Override
public Im
benwtrent commented on PR #14527:
URL: https://github.com/apache/lucene/pull/14527#issuecomment-2827970657
@weizijun I am running some benchmarking as well. But the key thing is to
update the parameters in `knnPerfTest.py` making sure `numMergeWorker` and
`numMergeThread` are greater than `
weizijun commented on PR #14527:
URL: https://github.com/apache/lucene/pull/14527#issuecomment-2827978234
@benwtrent I see the default parameters:
```
"numMergeWorker": (12,),
"numMergeThread": (4,),
```
Is the current merger effective with these parameters?
weizijun commented on PR #14527:
URL: https://github.com/apache/lucene/pull/14527#issuecomment-2827261485
> We need to make sure that there are no significant performance or
concurrency bugs introduced with this. Could you test with
https://github.com/mikemccand/luceneutil to verify recall,
jpountz commented on PR #14524:
URL: https://github.com/apache/lucene/pull/14524#issuecomment-2827538594
IMO Lucene should own how queries execute concurrently instead of making it
pluggable. So I'd rather not allow users to pass a custom `TaskExecutor`.
jpountz commented on code in PR #14511:
URL: https://github.com/apache/lucene/pull/14511#discussion_r2058399082
##
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##
@@ -1286,14 +1298,11 @@ public long cost() {
@Override
gf2121 opened a new pull request, #14552:
URL: https://github.com/apache/lucene/pull/14552
Speed up the flush of soft deletes by using `intoBitset`.
Relates to: https://github.com/apache/lucene/issues/14521
jpountz commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2059102197
##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
public abstract
ChrisHegarty commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2057896977
##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
public abst
thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2057921646
##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
public abstract
gf2121 commented on code in PR #14529:
URL: https://github.com/apache/lucene/pull/14529#discussion_r2057923869
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -693,6 +727,44 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int
target) throw
ChrisHegarty commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2057828737
##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
public abst
jpountz commented on PR #14532:
URL: https://github.com/apache/lucene/pull/14532#issuecomment-2826606964
I could confirm that it's due to `#docIdEnd()` now being implemented on the
competitive iterators. This triggers usage of `DISIDocIdStream`, which calls
`nextDoc()` in a loop, and callin
gf2121 commented on code in PR #14529:
URL: https://github.com/apache/lucene/pull/14529#discussion_r2057706654
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -491,6 +492,14 @@ public int advance(int target) throws IOException {
return doc;
gf2121 commented on code in PR #14529:
URL: https://github.com/apache/lucene/pull/14529#discussion_r2057680596
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -625,6 +634,29 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int
target) throw
gf2121 commented on code in PR #14529:
URL: https://github.com/apache/lucene/pull/14529#discussion_r2057720050
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -625,6 +634,29 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int
target) throw
jpountz commented on code in PR #14529:
URL: https://github.com/apache/lucene/pull/14529#discussion_r2057729121
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -693,6 +727,44 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int
target) thro
ChrisHegarty commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2057993129
##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
public abst
gf2121 commented on code in PR #14529:
URL: https://github.com/apache/lucene/pull/14529#discussion_r2058000145
##
lucene/core/src/test/org/apache/lucene/codecs/lucene90/TestIndexedDISI.java:
##
@@ -555,6 +565,47 @@ private void assertAdvanceExactRandomized(
}
}
+ priv
jpountz commented on PR #14532:
URL: https://github.com/apache/lucene/pull/14532#issuecomment-2826597720
I'm seeing a reproducible slowdown with this change:
```
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-va
```
rmuir commented on issue #14408:
URL: https://github.com/apache/lucene/issues/14408#issuecomment-2828267119
IMO it would be better to make a 10.2.2 if you want to do that? Especially
since I don't think it would be a trivial change: I'm concerned that changing
just the "default" would have
jainankitk commented on code in PR #14439:
URL: https://github.com/apache/lucene/pull/14439#discussion_r2059224354
##
lucene/CHANGES.txt:
##
@@ -78,6 +78,8 @@ Optimizations
-
* GITHUB#14418: Quick exit on filter query matching no docs when rewriting knn
qu
jainankitk commented on PR #14516:
URL: https://github.com/apache/lucene/pull/14516#issuecomment-2828864383
> thanks for adding tests and benchies
Thank you for the pointers on the JMH benchmark. They helped me reason about and
demonstrate the performance improvements for Histogram Collector P
atris commented on PR #14525:
URL: https://github.com/apache/lucene/pull/14525#issuecomment-2828868856
@jpountz thanks for looking!
Just for my understanding, the first PR should contain the index time
binning logic that is currently in this PR, just with a simpler model of rank
on b
weizijun opened a new issue, #14554:
URL: https://github.com/apache/lucene/issues/14554
### Description
When there are many shards to merge, vector data merging can easily lead to
memory overflow and high CPU cost.
The index.merge.scheduler.max_thread_count parameter can't control
gf2121 commented on PR #14552:
URL: https://github.com/apache/lucene/pull/14552#issuecomment-2829443943
I managed to get some numbers on this change:
```
Baseline: Soft delete flush total took: 30311ms, IndexedDISI#writeBitset total took: 8324ms.
Candidate: Soft delete flush tot
```
gf2121 opened a new pull request, #14555:
URL: https://github.com/apache/lucene/pull/14555
`exists` will be set to false if `docID() == NO_MORE_DOCS`, even though that
state is allowed by the contract of `intoBitset`.
HUSTERGS closed issue #14545: Encode numeric docvalue with per block gcd
URL: https://github.com/apache/lucene/issues/14545
HUSTERGS commented on issue #14545:
URL: https://github.com/apache/lucene/issues/14545#issuecomment-2829240138
Got it! Thanks for your reply. Closing the issue for now :)
jpountz commented on PR #14543:
URL: https://github.com/apache/lucene/pull/14543#issuecomment-2827497431
Thanks for catching this! For reference, this was introduced in 10.0 (by
me). The change looks good, let me know if you need help writing a test.
renatoh commented on PR #14356:
URL: https://github.com/apache/lucene/pull/14356#issuecomment-2828336563
@rmuir Could you please have a look at the PR? It has been open for more
than a month. Thanks.
stefanvodita commented on code in PR #14439:
URL: https://github.com/apache/lucene/pull/14439#discussion_r2059125314
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/PointTreeBulkCollector.java:
##
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Softw
stefanvodita commented on code in PR #14439:
URL: https://github.com/apache/lucene/pull/14439#discussion_r2059130450
##
lucene/CHANGES.txt:
##
@@ -78,6 +78,8 @@ Optimizations
-
* GITHUB#14418: Quick exit on filter query matching no docs when rewriting knn
github-actions[bot] commented on PR #14024:
URL: https://github.com/apache/lucene/pull/14024#issuecomment-2829119993
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
github-actions[bot] commented on PR #14262:
URL: https://github.com/apache/lucene/pull/14262#issuecomment-2829119918
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
weizijun commented on PR #14527:
URL: https://github.com/apache/lucene/pull/14527#issuecomment-2828149196
> I left a comment on the underlying structures used.
>
> Please update CHANGES.txt under 10.3 optimizations for this nice
optimization! It will be very nice to have better heap u
benwtrent commented on code in PR #14527:
URL: https://github.com/apache/lucene/pull/14527#discussion_r2058763024
##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -32,13 +33,15 @@
public class NeighborArray {
private final boolean scoresDescOrder
Shibi-bala commented on PR #14524:
URL: https://github.com/apache/lucene/pull/14524#issuecomment-2828849510
@jpountz `IMO Lucene should own how queries execute concurrently instead of
making it pluggable` then why allow an executor service to be passed in?
Previously, lucene didn't have con
msokolov commented on PR #14527:
URL: https://github.com/apache/lucene/pull/14527#issuecomment-2828649522
re: concurrency; in theory it should be safe since we lock a row before
inserting anything into it. Consider that even with fixed-size arrays we need
to track the occupancy (how full t
jpountz commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2059104981
##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
public abstract
kashkambath commented on issue #13753:
URL: https://github.com/apache/lucene/issues/13753#issuecomment-2828705042
Hi @javanna,
Your explanation makes sense that intra-segment concurrency can't be
leveraged by the existing `DrillSidewaysScorer` since it goes through all the
docs in a
rmuir commented on issue #14553:
URL: https://github.com/apache/lucene/issues/14553#issuecomment-2828727305
I can help with eclipse formatter. It has a limit that you set in the
preference files and I think the solution is to set it to an obnoxious value
(999 or something like that)
jpountz commented on PR #14525:
URL: https://github.com/apache/lucene/pull/14525#issuecomment-2828722257
> the implementation is more ambitious
I like ambition, but it also makes this change harder to review/integrate,
especially with the high LOC count. I would suggest splitting this
dweiss commented on issue #14553:
URL: https://github.com/apache/lucene/issues/14553#issuecomment-2828723311
I've adopted and used what opensearch came up with. It would be ideal to
also fix wildcard imports - not just detect them. It should be possible with
intellij or eclipse formatter so
vigyasharma commented on issue #14408:
URL: https://github.com/apache/lucene/issues/14408#issuecomment-2828222009
Should we target a revert of the default ReadAdvice.RANDOM setting, for the
10.2.1 RC that's currently being evaluated?
benwtrent commented on PR #14527:
URL: https://github.com/apache/lucene/pull/14527#issuecomment-2828219136
> OK, and about `OnHeapHnswGraph.ramBytesUsed`: I didn't change the code,
but the logic may need to change.
I think utilizing the underlying array estimations is what we ne
dweiss commented on issue #14553:
URL: https://github.com/apache/lucene/issues/14553#issuecomment-2828789291
> I can help with eclipse formatter. It has a limit that you set in the
preference files and I think the solution is to set it to an obnoxious value
(999 or something like that)