kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2621481909
### Description
1. Separate Faiss indexes are maintained per-segment per-field, in line with
Lucene's architecture (and the current vector format)
2. Vectors are buffered in memory
ChrisHegarty opened a new pull request, #14179:
URL: https://github.com/apache/lucene/pull/14179
This commit avoids setting the security manager on JDK 24+, since it is no
longer possible to enable it in JDK 24+.
This is the minimum required to start testing with JDK 24 EA.
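As a rough illustration of the kind of guard this implies (a sketch only, not the code from this PR; the class and method names below are hypothetical), a test harness could check the runtime feature version before attempting to install a security manager:

```java
public class SecurityManagerGate {
  // Assumption: from JDK 24 on, the security manager can no longer be enabled,
  // so the harness should not try to install one.
  static final int FIRST_VERSION_WITHOUT_SECURITY_MANAGER = 24;

  /** Returns true only for JDK feature versions where installing one is still possible. */
  static boolean shouldInstallSecurityManager(int featureVersion) {
    return featureVersion < FIRST_VERSION_WITHOUT_SECURITY_MANAGER;
  }

  public static void main(String[] args) {
    // Runtime.version().feature() is the major JDK version, e.g. 21 or 24.
    int feature = Runtime.version().feature();
    System.out.println(
        "JDK " + feature + ": install security manager = " + shouldInstallSecurityManager(feature));
  }
}
```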
stefanvodita closed issue #1852: SpanWithinQuery - A SpanNotQuery that allows a
specified number of intersections [LUCENE-777]
URL: https://github.com/apache/lucene/issues/1852
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and
stefanvodita commented on issue #1852:
URL: https://github.com/apache/lucene/issues/1852#issuecomment-2621404654
`SpanWithinQuery` got added as part of #7145. Resolving.
cpoerschke commented on code in PR #14170:
URL: https://github.com/apache/lucene/pull/14170#discussion_r1933824084
##
lucene/core/src/java/org/apache/lucene/util/hnsw/SeededHnswGraphSearcher.java:
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
cpoerschke commented on code in PR #14170:
URL: https://github.com/apache/lucene/pull/14170#discussion_r1933828372
##
lucene/join/src/java/org/apache/lucene/search/join/DiversifyingNearestChildrenKnnCollector.java:
##
@@ -42,7 +43,20 @@ class DiversifyingNearestChildrenKnnCollec
kaivalnp opened a new pull request, #14178:
URL: https://github.com/apache/lucene/pull/14178
### Description
Faiss (https://github.com/facebookresearch/faiss) is _"a library for
efficient similarity search and clustering of dense vectors"_
It supports various features like vect
stefanvodita closed issue #1847: Index corruption can cause infinite spin loop
when Lucene attempts to incorrectly uncompress fields [LUCENE-772]
URL: https://github.com/apache/lucene/issues/1847
stefanvodita commented on issue #1847:
URL: https://github.com/apache/lucene/issues/1847#issuecomment-2621354936
Closing based on Uwe's assessment. Doesn't seem like this is getting a fix.
benwtrent opened a new issue, #14180:
URL: https://github.com/apache/lucene/issues/14180
### Description
Related to: https://github.com/apache/lucene/pull/14167
But multi-threaded search over multiple segments in addition to multi-leaf
collection (e.g. information sharing) can
benwtrent commented on PR #14167:
URL: https://github.com/apache/lucene/pull/14167#issuecomment-2621748238
To aid in the conversation, I opened an issue:
https://github.com/apache/lucene/issues/14180
I plan on merging this new test, but with the multi-threaded case muted
until we can
benwtrent commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2621652581
> For parentJoin benchmark run on main, there is a visible drop in recall
when I disable merges (as compared to a main branch run with merges enabled).
Is this expected?
I wonde
benwtrent merged PR #14172:
URL: https://github.com/apache/lucene/pull/14172
benwtrent closed issue #13640: testMergeStability failing for Knn formats
URL: https://github.com/apache/lucene/issues/13640
mikemccand commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2623225740
> > should report total CPU cycles consumed during indexing and searching
(summed across all threads)...
>
> @mikemccand that would help these higher level multithreaded perform
mikemccand commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2623221564
> If I have learned one thing over the years, it's that benchmarking
accurately is very difficult!
Amen to that!!
mikemccand commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622774476
Really, `luceneutil` should report total CPU cycles consumed during indexing
and searching (summed across all threads)... I'll open an issue for this.
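A minimal sketch of one way to approximate this from inside the JVM, assuming a HotSpot-style runtime that exposes `com.sun.management.OperatingSystemMXBean`. It reports process CPU time in nanoseconds rather than cycles, and the class and method names are made up for illustration; this is not how `luceneutil` works today.

```java
import java.lang.management.ManagementFactory;

public class ProcessCpuTimeSketch {

  /** Process-wide CPU time in nanoseconds (all threads), or -1 if unavailable. */
  static long processCpuNanos() {
    Object os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof com.sun.management.OperatingSystemMXBean) {
      return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
    }
    return -1L; // this JVM does not expose the extended bean
  }

  /** Runs the task and returns the CPU nanoseconds the process consumed meanwhile, or -1. */
  static long cpuNanosFor(Runnable task) {
    long before = processCpuNanos();
    task.run();
    long after = processCpuNanos();
    return (before < 0 || after < 0) ? -1L : after - before;
  }

  public static void main(String[] args) {
    long nanos = cpuNanosFor(() -> {
      double sink = 0;
      for (int i = 1; i < 5_000_000; i++) {
        sink += Math.sqrt(i);
      }
      if (sink < 0) {
        throw new AssertionError(); // uses the result so the loop is not optimized away
      }
    });
    System.out.println("CPU time consumed: " + nanos + " ns");
  }
}
```

Because the counter is process-wide, it should also pick up work done on native threads, which matters when comparing against a library that spawns its own. It also includes GC and JIT activity, so it is an upper bound on the task's own cost.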
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622861110
@benwtrent Thanks for the input! I tried what you mentioned above:
> I would reduce the number of indexing threads to 1, faiss threads to 1,
and merge workers to 1
Lucene:
`
benwtrent commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622749459
@kaivalnp the force-merge time indicates that during merge to a single
segment, the index is being rebuilt from various segments. I would think that
the `force-merge` time itself is mo
mikemccand opened a new issue, #14182:
URL: https://github.com/apache/lucene/issues/14182
### Description
When trying to understand why a shard seems to not do a good job merging,
it's surprisingly difficult to gain visibility / understanding. E.g. cases
like https://github.com/apac
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1934791780
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java:
##
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundati
pseudo-nymous commented on code in PR #14101:
URL: https://github.com/apache/lucene/pull/14101#discussion_r1935129051
##
.github/labeler.yml:
##
@@ -0,0 +1,134 @@
+# This file defines module label mappings for the Lucene project.
+# Each module is associated with a set of file g
pseudo-nymous commented on code in PR #14101:
URL: https://github.com/apache/lucene/pull/14101#discussion_r1935130586
##
.github/labeler.yml:
##
@@ -0,0 +1,134 @@
+# This file defines module label mappings for the Lucene project.
+# Each module is associated with a set of file g
pseudo-nymous commented on code in PR #14101:
URL: https://github.com/apache/lucene/pull/14101#discussion_r1935132629
##
.github/workflows/label-pull-request.yml:
##
@@ -0,0 +1,21 @@
+# This file defines the workflow for labeling pull requests with module tags
based on the chan
ChrisHegarty merged PR #14179:
URL: https://github.com/apache/lucene/pull/14179
benwtrent commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622880879
@kaivalnp 😌
I was worried that we had some serious outstanding performance bug that has
been missed in Lucene!
Conceptually, it makes sense that the performance of buildi
benwtrent commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622885089
> number of vector operations that FAISS does during search.
By this, I mean the number of vectors it must visit when searching the graph.
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1934885034
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1935086492
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsFormat.java:
##
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1935099798
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java:
##
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundatio
benwtrent opened a new pull request, #14181:
URL: https://github.com/apache/lucene/pull/14181
As stated by @ChrisHegarty and @msokolov the amount of garbage we create
during vector index creation is pretty astounding.
This adjusts the interface to allow an "Updateable" random vector
benwtrent commented on code in PR #14181:
URL: https://github.com/apache/lucene/pull/14181#discussion_r1934697182
##
lucene/codecs/src/java/org/apache/lucene/codecs/bitvectors/FlatBitVectorsScorer.java:
##
@@ -58,7 +59,7 @@ public RandomVectorScorer getRandomVectorScorer(
t
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622946390
> FAISS with this vector dimension does seem about 20% faster at search
I should add here that Lucene was using vectorized instructions via Panama,
but the C_API of Faiss was not.
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1934818584
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
sgup432 opened a new issue, #14183:
URL: https://github.com/apache/lucene/issues/14183
### Description
I see there have been many discussions around finding the right value for
skip_factor ([here](https://issues.apache.org/jira/browse/LUCENE-9002) and
https://github.com/apache/lucene
stefanvodita commented on code in PR #14101:
URL: https://github.com/apache/lucene/pull/14101#discussion_r1934819611
##
.github/workflows/label-pull-request.yml:
##
@@ -0,0 +1,21 @@
+# This file defines the workflow for labeling pull requests with module tags
based on the chang
benwtrent commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622798641
> should report total CPU cycles consumed during indexing and searching
(summed across all threads)...
@mikemccand that would help these higher level multithreaded performance
stefanvodita commented on issue #2460:
URL: https://github.com/apache/lucene/issues/2460#issuecomment-2621442718
A lot of the documentation (and code!) has changed since 2008. The
assessment here is great, but no longer holds, e.g. Package.html, FieldSelect,
DateTools.Resolution no longer e
stefanvodita closed issue #2460: Document package javadocs needs improving
[LUCENE-1386]
URL: https://github.com/apache/lucene/issues/2460
benwtrent commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2621638248
Some very interesting numbers @kaivalnp
Almost 10x indexing throughput improvement tells me we are doing something
silly in Lucene. Especially since the search time is only about
benwtrent merged PR #14167:
URL: https://github.com/apache/lucene/pull/14167
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2621943260
> Maybe it can be just as fast by not reading the floating point vectors on
to heap and doing memory segment stuff
Interesting, do we have a Lucene PR that explores it?
> D
jimczi commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622455427
> Almost 10x indexing throughput improvement tells me we are doing something
silly in Lucene.
I did not test this specific integration but Faiss is multithreaded on bulk
training,
cpoerschke commented on code in PR #14170:
URL: https://github.com/apache/lucene/pull/14170#discussion_r1933798309
##
lucene/core/src/java/org/apache/lucene/search/SeededKnnVectorQuery.java:
##
@@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2621529365
### Usage
The new format can be used by:
- "Describing" the index you want, see
https://github.com/facebookresearch/faiss/wiki/The-index-factory
- Setting index parameters,
kaivalnp commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-2621532621
> I'm also tinkering with a Faiss
(https://github.com/facebookresearch/faiss) wrapper
Opened #14178, would appreciate feedback :)
cpoerschke commented on code in PR #14170:
URL: https://github.com/apache/lucene/pull/14170#discussion_r1933811338
##
lucene/core/src/java/org/apache/lucene/search/knn/KnnSearchStrategy.java:
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
cpoerschke commented on code in PR #14170:
URL: https://github.com/apache/lucene/pull/14170#discussion_r1933812258
##
lucene/core/src/java/org/apache/lucene/search/knn/KnnSearchStrategy.java:
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
jimczi commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622578687
> Not as high as 10x anymore, but it is still ~3x faster
Not so easy ;) See the force merge time for Faiss (41.44 s). The force merge
is the time it took to merge the created segmen
stefanvodita commented on PR #14101:
URL: https://github.com/apache/lucene/pull/14101#issuecomment-2622114022
@pseudo-nymous, I'm only seeing this now, sorry! At first glance, it matches
what I had in mind - thank you for addressing that issue! I'll do an in-depth
review soon, but I'd appre
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622613947
Ah I see :)
> The force merge is the time it took to merge the created segments into 1
Does it mean that the Faiss benchmark created a larger number of segments
initially,
navneet1v commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-2622419830
> > I'm also tinkering with a Faiss
(https://github.com/facebookresearch/faiss) wrapper
>
> Opened [#14178](https://github.com/apache/lucene/pull/14178), would
appreciate f
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622538569
> Since Faiss uses multithreading by default, we cannot compare with Lucene
Ah nice catch, the number of threads used by both may be different..
I'm not sure how many thread