[GitHub] [lucene] Pulkitg64 opened a new issue, #12414: Wrap MatchedDocs and LiveDocs in a single Bits instance in createBitSet function of AbstractKnnVectorQuery

2023-07-04 Thread via GitHub
Pulkitg64 opened a new issue, #12414: URL: https://github.com/apache/lucene/issues/12414 ### Description **Context:** In KNN prefiltering we require a `BitSet` of docs that matched the prefilter query. This BitSet is used during HNSW graph search to check whether a node i…
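The wrapping the issue proposes can be sketched roughly as follows. This is an illustrative stand-in, not Lucene's implementation: the `Bits` interface below mirrors the shape of `org.apache.lucene.util.Bits` (`boolean get(int)`, `int length()`), and `IntersectionBits` is a hypothetical name for the combined view.

```java
// Simplified stand-in for org.apache.lucene.util.Bits.
interface Bits {
    boolean get(int index);
    int length();
}

// Hypothetical sketch: expose filter matches and live docs as a single Bits
// view, so callers need only one accept-docs instance.
class IntersectionBits implements Bits {
    private final Bits matched; // docs matching the prefilter query
    private final Bits live;    // live (non-deleted) docs; null if no deletions

    IntersectionBits(Bits matched, Bits live) {
        this.matched = matched;
        this.live = live;
    }

    @Override
    public boolean get(int index) {
        // A doc is accepted only if it matched the filter and is not deleted.
        return matched.get(index) && (live == null || live.get(index));
    }

    @Override
    public int length() {
        return matched.length();
    }
}
```

A consumer then checks a single `Bits` instance per candidate node instead of two.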

[GitHub] [lucene] jpountz commented on issue #12394: Add the ability to compute vector similarity scores with the new ValuesSource API

2023-07-04 Thread via GitHub
jpountz commented on issue #12394: URL: https://github.com/apache/lucene/issues/12394#issuecomment-1620395945 The two values source APIs are very different, so here's a proposal for new method signatures with the new API: `DoubleValuesSource distanceFromQueryVector(float[] queryVector, Stri…`
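For context, the per-document computation behind such a values source is a vector similarity function between the query vector and a stored document vector. A minimal sketch (the class and method names are illustrative, not the proposed Lucene API):

```java
// Plain dot-product similarity: the kind of per-document score a
// DoubleValuesSource like the one proposed above would produce.
class VectorSimilarity {
    static double dotProduct(float[] query, float[] doc) {
        if (query.length != doc.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0;
        for (int i = 0; i < query.length; i++) {
            sum += query[i] * doc[i];
        }
        return sum;
    }
}
```

The actual proposal would wrap such a computation behind `DoubleValuesSource` so it composes with scoring and sorting like any other values source.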

[GitHub] [lucene] jpountz commented on pull request #12413: Fix HNSW graph visitation limit bug

2023-07-04 Thread via GitHub
jpountz commented on PR #12413: URL: https://github.com/apache/lucene/pull/12413#issuecomment-1620500383 Intuitively, it sounds like a good approach to me to not take live docs into account to find good entry points, as there could be nodes that are good entry points even though they might…
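The approach discussed here can be illustrated with a toy candidate check (not Lucene's code): deleted documents are filtered out only when collecting results on level 0, while the greedy descent through upper levels may still route through them.

```java
import java.util.function.IntPredicate;

// Illustrative sketch: above level 0, a deleted node can still serve as a
// useful entry point / routing node, so liveDocs is only consulted on level 0.
class EntryPointPolicy {
    static boolean isCandidate(int node, int level, IntPredicate liveDocs) {
        return level > 0 || liveDocs.test(node);
    }
}
```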

[GitHub] [lucene] benwtrent commented on pull request #12413: Fix HNSW graph visitation limit bug

2023-07-04 Thread via GitHub
benwtrent commented on PR #12413: URL: https://github.com/apache/lucene/pull/12413#issuecomment-1620506450 > Should we consider never exiting before hitting the zero-th level instead? 🤔 The idea is that if we cannot even get to the zeroth level before hitting the visitation lim…
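The trade-off being discussed can be illustrated with a toy visitation budget shared across levels (a simplified model, not Lucene's implementation): the greedy descent through the upper levels consumes part of the budget before the level-0 search even begins.

```java
// Toy model of a visitation limit shared between upper-level descent and the
// level-0 search. All names here are illustrative.
class VisitBudget {
    private final int limit;
    private int visited = 0;

    VisitBudget(int limit) {
        this.limit = limit;
    }

    // Records nodesVisited more visits; returns false once the budget is exhausted.
    boolean consume(int nodesVisited) {
        visited += nodesVisited;
        return visited <= limit;
    }

    int remaining() {
        return Math.max(0, limit - visited);
    }
}
```

With a limit of 25 and three upper levels each visiting 10 nodes, the budget is exhausted before level 0 is ever reached — the scenario this thread is debating how to handle.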

[GitHub] [lucene] jpountz commented on pull request #12413: Fix HNSW graph visitation limit bug

2023-07-04 Thread via GitHub
jpountz commented on PR #12413: URL: https://github.com/apache/lucene/pull/12413#issuecomment-1620516800 Sorry, I commented too quickly, before understanding what your change was doing; at first I thought it was ignoring filtered-out ords on levels > 0. Your change makes sense to me now.

[GitHub] [lucene] ChrisHegarty commented on issue #12396: Make ForUtil Vectorized

2023-07-04 Thread via GitHub
ChrisHegarty commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1620696514 I would like to pause a little, double-check where we're going, and reset if needed. It's become clear to me that we're not quite aligned. The main issue I see is with th…

[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized

2023-07-04 Thread via GitHub
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1620744388 I agree. There are more complications: `DataInput` does not have a read method for `int[]`, only one for `float[]` and `long[]`. So changing this is a bigger task. I tend to think that w…
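One workaround in the direction hinted at here can be sketched as follows: read the data through the existing `long[]`-based read path and split each long into two ints afterwards. This is purely illustrative — whether Lucene takes this route (or adds an `int[]` read method instead) is exactly what the thread leaves open.

```java
// Hypothetical helper: decode an int[] from a long[] that was read via an
// existing long[]-based API, splitting each long into its two 32-bit halves.
class LongToIntDecode {
    static int[] splitLongs(long[] packed) {
        int[] out = new int[packed.length * 2];
        for (int i = 0; i < packed.length; i++) {
            out[2 * i] = (int) (packed[i] >>> 32); // high 32 bits
            out[2 * i + 1] = (int) packed[i];      // low 32 bits
        }
        return out;
    }
}
```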

[GitHub] [lucene] tang-hi commented on issue #12396: Make ForUtil Vectorized

2023-07-04 Thread via GitHub
tang-hi commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1620965832 Switching from LongVector to IntVector is feasible, especially for 128-bit, as there are only a few modifications needed. Special handling may be required for 256-bit and 512-bit, b…

[GitHub] [lucene] jpountz commented on issue #12396: Make ForUtil Vectorized

2023-07-04 Thread via GitHub
jpountz commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1621145444 > I would like to know if we can tolerate some performance loss in the scalar code if we switch the compression format. What is the minimum performance threshold we can accept?
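For reference, the "scalar code" under discussion is fixed-width bit packing of the kind ForUtil performs. A simplified scalar version (illustrative only — Lucene's actual format and code are more involved) packs each value into a fixed number of bits and unpacks it by index:

```java
// Simplified scalar bit packing: each value occupies exactly `bits` bits in a
// long[] buffer, possibly straddling a word boundary. Not Lucene's ForUtil.
class ScalarPack {
    static long[] pack(int[] values, int bits) {
        long[] out = new long[(values.length * bits + 63) / 64];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < values.length; i++) {
            int bitPos = i * bits;
            int word = bitPos / 64;
            int shift = bitPos % 64;
            out[word] |= ((long) values[i] & mask) << shift;
            if (shift + bits > 64) { // value straddles into the next word
                out[word + 1] |= ((long) values[i] & mask) >>> (64 - shift);
            }
        }
        return out;
    }

    static int unpack(long[] packed, int bits, int i) {
        int bitPos = i * bits;
        int word = bitPos / 64;
        int shift = bitPos % 64;
        long mask = (1L << bits) - 1;
        long v = packed[word] >>> shift;
        if (shift + bits > 64) { // pick up the spilled high bits
            v |= packed[word + 1] << (64 - shift);
        }
        return (int) (v & mask);
    }
}
```

The question raised in the quote is how much slower such scalar decode paths may become if the on-disk layout is changed to favor the vectorized implementation.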