benwtrent commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2757791987
> Can we expose a graph construction parameter in Lucene99HnswVectorsFormat
to gate the connectComponents() call? This would allow us to mitigate this
issue while a more comprehen
txwei commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2749658069
Can we expose a graph construction parameter in `Lucene99HnswVectorsFormat`
to gate the `connectComponents()` call? This would allow us to mitigate this
issue while a more comprehensi
msokolov commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2698456606
Maybe as a short-term mitigation we should revert or disable the
`connectComponents` impl since its supposed improvements are kind of
theoretical and it comes with a deadly vulnera
benwtrent commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2698475184
The goal of connectComponents is to help graphs that have gaps in their
connectivity. However, when its needed most (e.g. tons of gaps and poor
connectivity), it does more harm th
msokolov commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2698452319
I tried indexing some [NOAA climate
data](https://www.ncei.noaa.gov/products/land-based-station/noaa-global-temp)
that is four-dimensional (temperature over last 150 years for ever
benwtrent commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2675053049
> Sorry I didn't understand this.
@msokolov I mean that during search, we would by default have more neighbors
to look at. Ensuring diversity eagerly means that its possible
msokolov commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2675147250
> I mean that during search, we would by default have more neighbors to look
at.
I see, yes, that's true.
--
This is an automated message from the Apache Git Service.
To
Vikasht34 commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2675065270
@benwtrent As far as I understand from your idea is to use Delaunay
Triangulation and skip connectComponents() ?
--
This is an automated message from the Apache Git Service.
To
msokolov commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2674568528
> I am not sure if its "better" I would assume it makes search slower on
well distributed and distinguished vectors :/
Sorry I didn't understand this. In the normal case, we'
benwtrent commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2674533124
> This means we have to sort at that point IIRC, but it might be a better,
more robust choice?
I am not sure if its "better" I would assume it makes search slower on well
d
msokolov commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2674487026
In the original diversity impl we had allowed the neighbors array to fill
without regard to any diversity criterion, and only started imposing it once
the array was full. This mea
benwtrent commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2671796667
So, these adverse scenarios where connect components has to do a ton of work
all stem from us keeping the graph very sparse (e.g. only connecting diverse
nodes).
I wonder
Vikasht34 commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2657428175
Interesting , let me quickly run those tests my self also to see what would
be impact!! Thanks for logs ..
--
This is an automated message from the Apache Git Service.
To respon
benwtrent commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2654755040
So, verifying the "fewDistinct" slowness, here is how connect components
works in this adverse case:
```
1> HNSW 1 [2025-02-12T20:14:45.641640Z;
TEST-TestKnnFloatVector
Vikasht34 commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2646005402
```
private void connectComponents() {
BitSet visited = new BitSet(graph.size());
List entryPoints = new ArrayList<>();
for (int node = 0; node < gra
tteofili commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2642779168
this can be reproduced with either of the following tests
```java
public void testSameVectorIndexedMultipleTimes() throws IOException {
try (Directory d = newDirecto
16 matches
Mail list logo