ChrisHegarty commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1687664305
##
lucene/core/src/java21/org/apache/lucene/store/RefCountedSharedArena.java:
##
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
ChrisHegarty commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1687665071
##
lucene/core/src/java21/org/apache/lucene/store/RefCountedSharedArena.java:
##
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
ChrisHegarty commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1687726079
##
lucene/core/src/java21/org/apache/lucene/store/RefCountedSharedArena.java:
##
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
tteofili commented on issue #13565:
URL: https://github.com/apache/lucene/issues/13565#issuecomment-2244713847
This [paper](https://dl.acm.org/doi/abs/10.1145/3626772.3657906) from
SIGIR'24 seems to do exactly this as a first step in their Block Max Pruning
technique.
--
This is an autom
iverase commented on PR #13592:
URL: https://github.com/apache/lucene/pull/13592#issuecomment-2244734279
I tried to find a generic method that could help here but I think the logic
relies too much on the fact that the index is sorted.
For example, your mental model somewhat breaks if
epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1687770185
##
lucene/demo/src/java/org/apache/lucene/demo/facet/SandboxFacetsExample.java:
##
@@ -0,0 +1,714 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
javanna opened a new pull request, #13600:
URL: https://github.com/apache/lucene/pull/13600
IndexSearcher#search(Query, Collector) is deprecated and leftover usages
should be removed. This addresses one usage in TestTopDocsCollector.
javanna opened a new pull request, #13601:
URL: https://github.com/apache/lucene/pull/13601
IndexSearcher#search(Query, Collector) is deprecated and leftover usages
should be removed. This addresses one usage in TestTopDocsCollector.
mikemccand commented on issue #13565:
URL: https://github.com/apache/lucene/issues/13565#issuecomment-2244785292
> This is what got me to thinking of BP for HNSW search: intuitively, it
could help a lot when the dataset size exceeds the size of the page cache?
I think that gains might
javanna opened a new pull request, #13602:
URL: https://github.com/apache/lucene/pull/13602
This commit modifies ReadTask to no longer call the deprecated search(Query,
Collector). Instead, it creates a collector manager and calls search(Query,
CollectorManager).
The existing protect
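For readers unfamiliar with the pattern this PR moves to, here is a minimal sketch of why a collector manager replaces a single collector: each slice of the index gets its own collector, and the per-slice results are merged once at the end. The interfaces below (`SimpleCollector`, `SimpleCollectorManager`) are simplified stand-ins written for this illustration, not Lucene's classes; they only mirror the two-method shape (`newCollector`, `reduce`) of a collector manager, and the `search` helper here is hypothetical and runs sequentially.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Illustrative sketch only: stand-in types, not Lucene's Collector or CollectorManager. */
final class CollectorManagerSketch {

  /** Stand-in for a per-slice collector: receives matching doc IDs. */
  interface SimpleCollector {
    void collect(int doc);
  }

  /** Stand-in manager: creates one collector per slice and merges their results. */
  interface SimpleCollectorManager<C extends SimpleCollector, T> {
    C newCollector();

    T reduce(Collection<C> collectors);
  }

  /** Counts the hits seen by one slice. */
  static final class CountingCollector implements SimpleCollector {
    int count;

    @Override
    public void collect(int doc) {
      count++;
    }
  }

  /** Hands out counting collectors and sums their counts in reduce. */
  static final class CountingManager
      implements SimpleCollectorManager<CountingCollector, Integer> {
    @Override
    public CountingCollector newCollector() {
      return new CountingCollector();
    }

    @Override
    public Integer reduce(Collection<CountingCollector> collectors) {
      int total = 0;
      for (CountingCollector collector : collectors) {
        total += collector.count;
      }
      return total;
    }
  }

  /**
   * Hypothetical search loop: each int[] holds the matching doc IDs of one slice.
   * Slices are visited sequentially here; no collector is shared between slices,
   * which is what would make running them concurrently safe.
   */
  static <C extends SimpleCollector, T> T search(
      List<int[]> slices, SimpleCollectorManager<C, T> manager) {
    List<C> collectors = new ArrayList<>();
    for (int[] slice : slices) {
      C collector = manager.newCollector();
      collectors.add(collector);
      for (int doc : slice) {
        collector.collect(doc);
      }
    }
    return manager.reduce(collectors);
  }
}
```

Because no state is shared between collectors, the caller never has to synchronize inside `collect`; all merging happens in `reduce`.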
javanna commented on code in PR #13602:
URL: https://github.com/apache/lucene/pull/13602#discussion_r1687812812
##
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/SearchWithCollectorTask.java:
##
@@ -46,17 +46,17 @@ public boolean withCollector() {
}
@
javanna opened a new pull request, #13603:
URL: https://github.com/apache/lucene/pull/13603
There are a couple of places in the codebase where we extend IndexSearcher to
customize per-leaf behaviour, and in order to do that, we need to override the
entire search method that loops through the
javanna commented on code in PR #13603:
URL: https://github.com/apache/lucene/pull/13603#discussion_r1687837747
##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -694,40 +695,56 @@ protected void search(List leaves,
Weight weight, Collector c
// th
mikemccand commented on code in PR #13054:
URL: https://github.com/apache/lucene/pull/13054#discussion_r1687872625
##
lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymMapDirectory.java:
##
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundat
benwtrent commented on PR #13525:
URL: https://github.com/apache/lucene/pull/13525#issuecomment-2244972179
@vigyasharma
> do we have any existing benchmarks for ParentJoin queries in knn?
No, we do not. I ended up writing a bunch of throwaway code to benchmark
latency and rec
ChrisHegarty commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1687920309
##
lucene/core/src/java21/org/apache/lucene/store/RefCountedSharedArena.java:
##
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
uschindler commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1687970721
##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -83,6 +93,26 @@ public class MMapDirectory extends FSDirectory {
*/
public static f
mayya-sharipova opened a new pull request, #13604:
URL: https://github.com/apache/lucene/pull/13604
Implement Kmeans clustering algorithm for vectors.
Knn algorithms that further reduce memory usage of vectors (such as
Product Quantization, RaBitQ etc) require clustering of vectors
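As a rough illustration of the clustering step this PR describes, here is a minimal sketch of plain Lloyd's k-means over float vectors. It is not the PR's `KMeans` class: the class name `KMeansSketch`, its method names, the seeding strategy (k distinct input vectors chosen at random), and the fixed iteration count are all assumptions made for this example.

```java
import java.util.Random;

/** Illustrative sketch of Lloyd's k-means; not the implementation from the PR. */
final class KMeansSketch {

  /** Squared Euclidean distance between two vectors of equal length. */
  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /** Index of the centroid closest to the vector; ties go to the lower index. */
  static int nearest(float[] vector, float[][] centroids) {
    int best = 0;
    float bestDistance = Float.MAX_VALUE;
    for (int c = 0; c < centroids.length; c++) {
      float distance = squaredDistance(vector, centroids[c]);
      if (distance < bestDistance) {
        bestDistance = distance;
        best = c;
      }
    }
    return best;
  }

  /** Runs a fixed number of Lloyd iterations and returns the k centroids. */
  static float[][] cluster(float[][] vectors, int k, int iterations, long seed) {
    if (k <= 0 || k > vectors.length) {
      throw new IllegalArgumentException("k must be in [1, vectors.length]");
    }
    Random random = new Random(seed);
    int dim = vectors[0].length;

    // Seed centroids with k distinct input vectors (partial Fisher-Yates shuffle).
    int[] order = new int[vectors.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = i;
    }
    float[][] centroids = new float[k][];
    for (int c = 0; c < k; c++) {
      int j = c + random.nextInt(order.length - c);
      int tmp = order[c];
      order[c] = order[j];
      order[j] = tmp;
      centroids[c] = vectors[order[c]].clone();
    }

    for (int iter = 0; iter < iterations; iter++) {
      // Assignment step: accumulate each vector into its nearest centroid's sum.
      float[][] sums = new float[k][dim];
      int[] counts = new int[k];
      for (float[] vector : vectors) {
        int c = nearest(vector, centroids);
        counts[c]++;
        for (int d = 0; d < dim; d++) {
          sums[c][d] += vector[d];
        }
      }
      // Update step: move each centroid to the mean of its assigned vectors.
      for (int c = 0; c < k; c++) {
        if (counts[c] == 0) {
          continue; // an empty cluster keeps its previous centroid
        }
        for (int d = 0; d < dim; d++) {
          centroids[c][d] = sums[c][d] / counts[c];
        }
      }
    }
    return centroids;
  }
}
```

A quantizer built on top of this would typically store, for each vector, only the index returned by `nearest` plus whatever residual encoding the scheme needs; that part is outside this sketch.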
mayya-sharipova commented on PR #13604:
URL: https://github.com/apache/lucene/pull/13604#issuecomment-2245155916
I did [benchmarking](https://github.com/mayya-sharipova/kmeans-test) on
MNIST dataset and compared the accuracy with [KMeans
algorithm](https://github.com/mayya-sharipova/kmeans
mikemccand commented on PR #13604:
URL: https://github.com/apache/lucene/pull/13604#issuecomment-2245202448
Whoa, cool! What is (roughly) the run-time of KMeans as a function of
number of vectors? Do you tell it how many clusters to create, or, do you ask
it to keep splitting into more cl
uschindler commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1688093358
##
lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java:
##
@@ -125,4 +135,77 @@ private final MemorySegment[] map(
}
retur
javanna commented on issue #12892:
URL: https://github.com/apache/lucene/issues/12892#issuecomment-2245377653
I have been looking into this; unfortunately, there are ~85 leftover usages
of this method. It would be great to clean this up for Lucene 10. I opened a few
small PRs around this.
benwtrent commented on code in PR #13604:
URL: https://github.com/apache/lucene/pull/13604#discussion_r1688239085
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/quantization/KMeans.java:
##
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) u
benwtrent commented on PR #13604:
URL: https://github.com/apache/lucene/pull/13604#issuecomment-2245534266
@mikemccand
The runtime depends on the configured number of iterations, restarts, sample
size, and cluster count. But, it can be very fast. I will leave Mayya to talk
about some s
javanna commented on PR #13542:
URL: https://github.com/apache/lucene/pull/13542#issuecomment-2245595764
Pinging @gsmiller as well around the challenges adjusting the facets code I
mentioned
[above](https://github.com/apache/lucene/pull/13542#issuecomment-2243620253).
I've seen collector m
harshavamsi commented on PR #13599:
URL: https://github.com/apache/lucene/pull/13599#issuecomment-2245815063
> I would expect this change to result in a slowdown for this type of
query.
>
> You are proposing to replace the current implementation with a slower one
that computes the
benwtrent commented on issue #13281:
URL: https://github.com/apache/lucene/issues/13281#issuecomment-2245953147
@msokolov @jmazanec15
I don't know of many `int8` models/datasets out there that require cosine.
But, I did a benchmark with Cohere's int8 embeddings here:
https://hugging
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1688514148
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java:
##
@@ -0,0 +1,1148 @@
+// This file has been automatically generated, DO NOT EDIT
+
+/*
+ *
jpountz commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1688593960
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java:
##
@@ -0,0 +1,1148 @@
+// This file has been automatically generated, DO NOT EDIT
+
+/*
+ * Li
jpountz commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246112137
Skip data at level 0 now stores pointers into pos/pay files instead of
incrementing posPendingCount by the total term freq of the block. This seems to
slow down term queries marginally a
mayya-sharipova commented on code in PR #13604:
URL: https://github.com/apache/lucene/pull/13604#discussion_r1688624775
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/quantization/KMeans.java:
##
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software Foundation (
mikemccand commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246348961
> Also I noticed we would sometimes decode the same block of positions
multiple times when it's shared by two doc blocks (because when moving to the
next doc block we reset the positi
mikemccand commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246352717
Do you have any measure of how many bytes in a big posting are spent on skip
data vs doc/freq blocks?
The gains on the last benchy look awesome! It's surprising
`CountOrHighHig
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1688752555
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java:
##
@@ -0,0 +1,1148 @@
+// This file has been automatically generated, DO NOT EDIT
+
+/*
+ *
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1688761798
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsWriter.java:
##
@@ -0,0 +1,597 @@
+/*
+ * Licensed to the Apache Software Foundation (AS
epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1688796043
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/LongRangeFacetCutter.java:
##
@@ -0,0 +1,431 @@
+/*
+ * Licensed to the Apache Software Foundation (A
epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1688796852
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/LongRangeFacetCutter.java:
##
@@ -0,0 +1,431 @@
+/*
+ * Licensed to the Apache Software Foundation (A
epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1688952591
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/IntervalTracker.java:
##
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) u
github-actions[bot] commented on PR #13192:
URL: https://github.com/apache/lucene/pull/13192#issuecomment-2246620817
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1688962446
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/RangeOrdLabelBiMap.java:
##
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
iverase commented on PR #13592:
URL: https://github.com/apache/lucene/pull/13592#issuecomment-2247035303
I introduced the method DocValuesSkipper#advance(long,long) to advance the
iterator to a matching range. wdyt?
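To make the idea of advancing to a matching range concrete, here is a minimal sketch under stated assumptions: the index is summarized as a sequence of blocks, each carrying the minimum and maximum value it contains, and `advance(minValue, maxValue)` moves forward to the next block whose value range intersects the requested one. The class `RangeSkipperSketch`, its `Block` type, and the `-1` sentinel are hypothetical and written for this illustration; they are not the actual `DocValuesSkipper` API or its semantics.

```java
/** Illustrative sketch only; not Lucene's DocValuesSkipper. */
final class RangeSkipperSketch {

  /** Returned when no remaining block can match. */
  static final int NO_MORE_BLOCKS = -1;

  /** Hypothetical block summary: first doc ID plus min/max value in the block. */
  static final class Block {
    final int minDocID;
    final long minValue;
    final long maxValue;

    Block(int minDocID, long minValue, long maxValue) {
      this.minDocID = minDocID;
      this.minValue = minValue;
      this.maxValue = maxValue;
    }
  }

  private final Block[] blocks;
  private int current = -1; // index of the block we are positioned on

  RangeSkipperSketch(Block... blocks) {
    this.blocks = blocks;
  }

  /**
   * Moves past the current block to the next one whose [minValue, maxValue]
   * summary intersects the requested range, and returns its first doc ID.
   * Blocks that cannot contain a match are skipped without visiting their docs.
   */
  int advance(long minValue, long maxValue) {
    for (int i = current + 1; i < blocks.length; i++) {
      Block block = blocks[i];
      if (block.maxValue >= minValue && block.minValue <= maxValue) {
        current = i;
        return block.minDocID;
      }
    }
    current = blocks.length;
    return NO_MORE_BLOCKS;
  }
}
```

An intersecting block only says a match is possible; a caller would still check individual values inside it.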
iverase commented on PR #13599:
URL: https://github.com/apache/lucene/pull/13599#issuecomment-2247049079
If we look at the current implementation of matches and relates, they both
iterate over the dimensions and they both check if the dimension is disjoint.
If that is true, then they bail o