iverase commented on PR #13599:
URL: https://github.com/apache/lucene/pull/13599#issuecomment-2247049079
If we look at the current implementation of matches and relates, they both
iterate over the dimensions and they both check if the dimension is disjoint.
If that is true, then they bail o
iverase commented on PR #13592:
URL: https://github.com/apache/lucene/pull/13592#issuecomment-2247035303
I introduced the method DocValuesSkipper#advance(long,long) to advance the
iterator to a matching range. wdyt?
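As a rough illustration of what an `advance(long, long)` style method could do, the sketch below skips forward over blocks of documents until it reaches one whose value range can intersect the requested range. `RangeSkipper` and `Block` are hypothetical stand-ins invented for this example; they are not Lucene's `DocValuesSkipper`, and the single-level block layout is an assumption.

```java
import java.util.List;

/** Hypothetical, simplified model of a skipper over blocks of doc values (not Lucene's API). */
final class RangeSkipper {

  /** A block of consecutive doc IDs plus the min and max value seen in that block. */
  static final class Block {
    final int minDocId;
    final int maxDocId;
    final long minValue;
    final long maxValue;

    Block(int minDocId, int maxDocId, long minValue, long maxValue) {
      this.minDocId = minDocId;
      this.maxDocId = maxDocId;
      this.minValue = minValue;
      this.maxValue = maxValue;
    }
  }

  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final List<Block> blocks;
  private int current = -1; // index of the current block, -1 before the first advance

  RangeSkipper(List<Block> blocks) {
    this.blocks = blocks;
  }

  /**
   * Moves forward, starting at the current block, to the first block whose value range
   * overlaps [minValue, maxValue]. Returns that block's first doc ID, or NO_MORE_DOCS.
   */
  int advance(long minValue, long maxValue) {
    for (int i = Math.max(current, 0); i < blocks.size(); i++) {
      Block b = blocks.get(i);
      boolean disjoint = b.maxValue < minValue || b.minValue > maxValue;
      if (!disjoint) {
        current = i;
        return b.minDocId;
      }
    }
    current = blocks.size(); // exhausted: later calls return NO_MORE_DOCS immediately
    return NO_MORE_DOCS;
  }
}
```

A caller would still have to check individual documents inside the returned block, since an overlapping value range only means the block may contain matches.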
--
This is an automated message from the Apache Git Service.
To respond t
epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1688962446
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/RangeOrdLabelBiMap.java:
##
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
github-actions[bot] commented on PR #13192:
URL: https://github.com/apache/lucene/pull/13192#issuecomment-2246620817
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1688952591
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/IntervalTracker.java:
##
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) u
epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1688796852
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/LongRangeFacetCutter.java:
##
@@ -0,0 +1,431 @@
+/*
+ * Licensed to the Apache Software Foundation (A
epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1688796043
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/LongRangeFacetCutter.java:
##
@@ -0,0 +1,431 @@
+/*
+ * Licensed to the Apache Software Foundation (A
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1688761798
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsWriter.java:
##
@@ -0,0 +1,597 @@
+/*
+ * Licensed to the Apache Software Foundation (AS
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1688752555
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java:
##
@@ -0,0 +1,1148 @@
+// This file has been automatically generated, DO NOT EDIT
+
+/*
+ *
mikemccand commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246352717
Do you have any measure of how many bytes in a big posting are spent on skip
data vs doc/freq blocks?
The gains on the last benchy look awesome! It's surprising
`CountOrHighHig
mikemccand commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246348961
> Also I noticed we would sometimes decode the same block of positions
multiple times when it's shared by two doc blocks (because when moving to the
next doc block we reset the positi
mayya-sharipova commented on code in PR #13604:
URL: https://github.com/apache/lucene/pull/13604#discussion_r1688624775
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/quantization/KMeans.java:
##
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software Foundation (
jpountz commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246112137
Skip data at level 0 now stores pointers into pos/pay files instead of
incrementing posPendingCount by the total term freq of the block. This seems to
slow down term queries marginally a
jpountz commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1688593960
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java:
##
@@ -0,0 +1,1148 @@
+// This file has been automatically generated, DO NOT EDIT
+
+/*
+ * Li
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1688514148
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java:
##
@@ -0,0 +1,1148 @@
+// This file has been automatically generated, DO NOT EDIT
+
+/*
+ *
benwtrent commented on issue #13281:
URL: https://github.com/apache/lucene/issues/13281#issuecomment-2245953147
@msokolov @jmazanec15
I don't know of many `int8` models/datasets out there that require cosine.
But, I did a benchmark with Cohere's int8 embeddings here:
https://hugging
harshavamsi commented on PR #13599:
URL: https://github.com/apache/lucene/pull/13599#issuecomment-2245815063
> I would expect this change to result in a slow down for this type of
queries.
>
> You are proposing to replace the current implementation with a slower one
that computes the
javanna commented on PR #13542:
URL: https://github.com/apache/lucene/pull/13542#issuecomment-2245595764
Pinging @gsmiller as well around the challenges adjusting the facets code I
mentioned
[above](https://github.com/apache/lucene/pull/13542#issuecomment-2243620253).
I've seen collector m
benwtrent commented on PR #13604:
URL: https://github.com/apache/lucene/pull/13604#issuecomment-2245534266
@mikemccand
The runtime depends on the configured number of iterations, restarts, sample
size, and cluster count. But, it can be very fast. I will leave Mayya to talk
about some s
benwtrent commented on code in PR #13604:
URL: https://github.com/apache/lucene/pull/13604#discussion_r1688239085
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/quantization/KMeans.java:
##
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) u
javanna commented on issue #12892:
URL: https://github.com/apache/lucene/issues/12892#issuecomment-2245377653
I have been looking into this; unfortunately, there are ~85 leftover usages
of this method. Would be great to clean this up for Lucene 10. I opened a few
small PRs around this.
uschindler commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1688093358
##
lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java:
##
@@ -125,4 +135,77 @@ private final MemorySegment[] map(
}
retur
mikemccand commented on PR #13604:
URL: https://github.com/apache/lucene/pull/13604#issuecomment-2245202448
Whoa, cool! What is (roughly) the run-time of KMeans as a function of
number of vectors? Do you tell it how many clusters to create, or, do you ask
it to keep splitting into more cl
mayya-sharipova commented on PR #13604:
URL: https://github.com/apache/lucene/pull/13604#issuecomment-2245155916
I did [benchmarking](https://github.com/mayya-sharipova/kmeans-test) on
the MNIST dataset and compared the accuracy with [KMeans
algorithm](https://github.com/mayya-sharipova/kmeans
mayya-sharipova opened a new pull request, #13604:
URL: https://github.com/apache/lucene/pull/13604
Implement the KMeans clustering algorithm for vectors.
KNN algorithms that further reduce memory usage of vectors (such as
Product Quantization, RaBitQ, etc.) require clustering of vectors
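For readers unfamiliar with the technique, here is a minimal sketch of plain Lloyd's k-means over float vectors. It is not the code from the PR: `KMeansSketch` and its method names are hypothetical, centroids are seeded with the first `k` vectors purely to keep the example deterministic, and the sampling and restart options mentioned elsewhere in the thread are not modeled.

```java
import java.util.Arrays;

/** Minimal Lloyd's k-means over float vectors; an illustrative sketch, not the PR's code. */
final class KMeansSketch {

  /** Squared Euclidean distance between two vectors of equal length. */
  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Index of the centroid closest to the given vector. */
  static int nearest(float[] vector, float[][] centroids) {
    int best = 0;
    float bestDist = Float.MAX_VALUE;
    for (int c = 0; c < centroids.length; c++) {
      float dist = squaredDistance(vector, centroids[c]);
      if (dist < bestDist) {
        bestDist = dist;
        best = c;
      }
    }
    return best;
  }

  /** Runs at most maxIterations rounds, stopping early once assignments no longer change. */
  static float[][] cluster(float[][] vectors, int k, int maxIterations) {
    if (k <= 0 || k > vectors.length) {
      throw new IllegalArgumentException("k must be in [1, number of vectors]");
    }
    int dim = vectors[0].length;
    float[][] centroids = new float[k][];
    for (int c = 0; c < k; c++) {
      centroids[c] = Arrays.copyOf(vectors[c], dim); // assumption: seed with the first k vectors
    }
    int[] assignment = new int[vectors.length];
    Arrays.fill(assignment, -1);
    for (int iter = 0; iter < maxIterations; iter++) {
      boolean changed = false;
      float[][] sums = new float[k][dim];
      int[] counts = new int[k];
      // Assignment step: attach every vector to its nearest centroid.
      for (int v = 0; v < vectors.length; v++) {
        int c = nearest(vectors[v], centroids);
        if (c != assignment[v]) {
          assignment[v] = c;
          changed = true;
        }
        counts[c]++;
        for (int d = 0; d < dim; d++) {
          sums[c][d] += vectors[v][d];
        }
      }
      if (!changed) {
        break; // converged
      }
      // Update step: move each non-empty centroid to the mean of its members.
      for (int c = 0; c < k; c++) {
        if (counts[c] == 0) {
          continue; // an empty cluster keeps its previous centroid
        }
        for (int d = 0; d < dim; d++) {
          centroids[c][d] = sums[c][d] / counts[c];
        }
      }
    }
    return centroids;
  }
}
```

Each round costs roughly (number of vectors) x (number of clusters) x (dimensions) distance operations, which is why the iteration count, sample size, and cluster count dominate the runtime.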
uschindler commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1687970721
##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -83,6 +93,26 @@ public class MMapDirectory extends FSDirectory {
*/
public static f
ChrisHegarty commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1687920309
##
lucene/core/src/java21/org/apache/lucene/store/RefCountedSharedArena.java:
##
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
benwtrent commented on PR #13525:
URL: https://github.com/apache/lucene/pull/13525#issuecomment-2244972179
@vigyasharma
> do we have any existing benchmarks for ParentJoin queries in knn?
No, we do not. I ended up writing a bunch of throwaway code to benchmark
latency and rec
mikemccand commented on code in PR #13054:
URL: https://github.com/apache/lucene/pull/13054#discussion_r1687872625
##
lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymMapDirectory.java:
##
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundat
javanna commented on code in PR #13603:
URL: https://github.com/apache/lucene/pull/13603#discussion_r1687837747
##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -694,40 +695,56 @@ protected void search(List<LeafReaderContext> leaves,
Weight weight, Collector c
// th
javanna opened a new pull request, #13603:
URL: https://github.com/apache/lucene/pull/13603
There are a couple of places in the codebase where we extend IndexSearcher to
customize per leaf behaviour, and in order to do that, we need to override the
entire search method that loops through the
javanna commented on code in PR #13602:
URL: https://github.com/apache/lucene/pull/13602#discussion_r1687812812
##
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/SearchWithCollectorTask.java:
##
@@ -46,17 +46,17 @@ public boolean withCollector() {
}
@
javanna opened a new pull request, #13602:
URL: https://github.com/apache/lucene/pull/13602
This commit modifies ReadTask to no longer call the deprecated search(Query,
Collector). Instead, it creates a collector manager and calls search(Query,
CollectorManager).
The existing protect
mikemccand commented on issue #13565:
URL: https://github.com/apache/lucene/issues/13565#issuecomment-2244785292
> This is what got me to thinking of BP for HNSW search: intuitively, it
could help a lot when the dataset size exceeds the size of the page cache?
I think that gains might
javanna opened a new pull request, #13601:
URL: https://github.com/apache/lucene/pull/13601
IndexSearcher#search(Query, Collector) is deprecated and leftover usages
should be removed. This addresses one usage in TestTopDocsCollector.
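The motivation for moving from a single collector to a collector manager can be sketched as follows: the manager hands out one collector per slice of the index and merges the partial results at the end, so slices can be searched independently. The types below are simplified stand-ins written for this example (they only mirror the `newCollector()` / `reduce(...)` shape); they are not Lucene's `Collector` or `CollectorManager`, and the `search` loop is a hypothetical, sequential substitute for what a searcher would do.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Simplified stand-ins for the collector-manager pattern; these are not Lucene's types. */
final class CollectorManagerSketch {

  /** Receives the matching doc IDs of one slice of the index. */
  interface SimpleCollector {
    void collect(int docId);
  }

  /** Creates one collector per slice and merges their results afterwards. */
  interface SimpleCollectorManager<C extends SimpleCollector, T> {
    C newCollector();

    T reduce(Collection<C> collectors);
  }

  /** Counts hits within a single slice. */
  static final class CountingCollector implements SimpleCollector {
    int count;

    @Override
    public void collect(int docId) {
      count++;
    }
  }

  /** Sums the per-slice counts into a total hit count. */
  static final class CountingManager
      implements SimpleCollectorManager<CountingCollector, Integer> {
    @Override
    public CountingCollector newCollector() {
      return new CountingCollector();
    }

    @Override
    public Integer reduce(Collection<CountingCollector> collectors) {
      int total = 0;
      for (CountingCollector c : collectors) {
        total += c.count;
      }
      return total;
    }
  }

  /** Hypothetical search loop: every slice gets its own collector, then results are reduced. */
  static <C extends SimpleCollector, T> T search(
      int[][] matchingDocsPerSlice, SimpleCollectorManager<C, T> manager) {
    List<C> collectors = new ArrayList<>();
    for (int[] slice : matchingDocsPerSlice) {
      C collector = manager.newCollector();
      for (int docId : slice) {
        collector.collect(docId);
      }
      collectors.add(collector);
    }
    return manager.reduce(collectors);
  }
}
```

Because no collector is shared between slices, none of them needs synchronization; only `reduce` sees all of them, after collection has finished.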
javanna opened a new pull request, #13600:
URL: https://github.com/apache/lucene/pull/13600
IndexSearcher#search(Query, Collector) is deprecated and leftover usages
should be removed. This addresses one usage in TestTopDocsCollector.
epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1687770185
##
lucene/demo/src/java/org/apache/lucene/demo/facet/SandboxFacetsExample.java:
##
@@ -0,0 +1,714 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
iverase commented on PR #13592:
URL: https://github.com/apache/lucene/pull/13592#issuecomment-2244734279
I tried to find a generic method that could help here but I think the logic
relies too much on the fact that the index is sorted.
For example, your mental model somewhat breaks if
tteofili commented on issue #13565:
URL: https://github.com/apache/lucene/issues/13565#issuecomment-2244713847
this [paper](https://dl.acm.org/doi/abs/10.1145/3626772.3657906) from
SIGIR'24 seems to do exactly this as a first step in their Block Max Pruning
technique.
ChrisHegarty commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1687726079
##
lucene/core/src/java21/org/apache/lucene/store/RefCountedSharedArena.java:
##
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
ChrisHegarty commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1687665071
##
lucene/core/src/java21/org/apache/lucene/store/RefCountedSharedArena.java:
##
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
ChrisHegarty commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1687664305
##
lucene/core/src/java21/org/apache/lucene/store/RefCountedSharedArena.java:
##
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under