Thanks for the update, LGTM
> On May 17, 2023, at 5:35 AM, Jasonstack Zhao Yang wrote:
>
> Hi,
>
> I have updated the CEP with some details about distributed queries in the
> Approach section.
>
> David:
>
> > given results have a real ranking, the current 2i logic may yield incorrect
>
Hi,
I have updated the CEP with some details about distributed queries in the
*Approach* section.
David:
> given results have a real ranking, the current 2i logic may yield
> incorrect results
C* internal iterators are all in primary key order. So we need two
in-memory top-k filters, one at repli
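Because those iterators return rows in primary key order rather than by similarity, a top-k filter has to consume its whole input before it knows the answer. A minimal Python sketch of such an in-memory filter follows, assuming each row arrives as a hypothetical `(primary_key, score)` pair where a higher score means a closer match. It is an illustration of the idea, not the CEP's implementation.

```python
import heapq
from typing import Iterable, List, Tuple

Row = Tuple[str, float]  # hypothetical (primary_key, similarity_score) pair


def top_k(rows: Iterable[Row], k: int) -> List[Row]:
    """Return the k highest-scoring rows from a stream ordered by primary key.

    The input order says nothing about score, so every row must be consumed;
    a min-heap of at most k entries holds the best rows seen so far.
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, str]] = []  # (score, key); heap[0] is the weakest kept row
    for key, score in rows:
        if len(heap) < k:
            heapq.heappush(heap, (score, key))
        elif score > heap[0][0]:
            # Evict the current weakest row in favor of the better one.
            heapq.heapreplace(heap, (score, key))
    # Emit best-first.
    return [(key, score) for score, key in sorted(heap, reverse=True)]
```

With rows `[("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)]` and `k = 2`, this returns `[("b", 0.9), ("d", 0.7)]`. Memory stays bounded at k rows no matter how many rows the iterator yields.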
Just wanted to add that I don't have any special knowledge of CEP-30 beyond
what Jonathan posted; I'm just trying to help clarify and answer questions as I
can, with some knowledge and experience from DSE Search and SAI. Thanks to
Caleb for helping validate some things as well. And to be clear a
I talked to David and some others on Slack to hopefully clarify:
With SAI, can you have partial results? When you have a query that is not
key-based, you need full token range coverage of the results. If that
isn't possible, will Vector Search/SAI return partial results?
Anything can
Anyone on this ML who still remembers DSE Search (or has experience with
Elastic or SolrCloud) probably also knows that there are some significant
pieces of an optimized scatter/gather apparatus for IR (even without
sorting, which also doesn't exist yet) that do not exist in C* or its
range query sy
HNSW can in principle be made into a distributed index. But that would be quite a different paradigm to SAI.
On 9 May 2023, at 19:30, Patrick McFadin wrote:
> Under the goals section, there is this line:
> Scatter/gather across replicas, combining topK from each to get global topK.
> But what I'm hearing i
Under the goals section, there is this line:
1. Scatter/gather across replicas, combining topK from each to get
global topK.
But what I'm hearing is: how exactly will that happen? Maybe this is an SAI
question too. How is that verified in SAI?
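One way the goal "combining topK from each to get global topK" could work at the coordinator is sketched below in Python. It assumes each replica already returns its local top-k as hypothetical `(primary_key, score)` pairs, and that replicas agree on each row's score (reconciling divergent replicas is left out). The underlying observation: with exact scores, a row in the global top-k must also be in the local top-k of any queried replica that stores it, so merging the local lists is sufficient, provided the queried replicas cover every token range.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # hypothetical (primary_key, similarity_score) pair


def merge_top_k(replica_results: Iterable[List[Row]], k: int) -> List[Row]:
    """Combine per-replica top-k lists into a global top-k.

    Replicas overlap when RF > 1, so the same row can come back more than
    once; duplicates are collapsed by primary key before ranking.
    """
    best: Dict[str, float] = {}
    for rows in replica_results:
        for key, score in rows:
            # First copy wins; replicas are assumed to agree on the score.
            best.setdefault(key, score)
    # nlargest ranks by score, breaking ties by primary key.
    ranked = heapq.nlargest(k, ((score, key) for key, score in best.items()))
    return [(key, score) for score, key in ranked]
```

Merging `[("b", 0.9), ("d", 0.7)]`, `[("b", 0.9), ("e", 0.8)]` and `[("f", 0.6), ("d", 0.7)]` with `k = 3` yields `[("b", 0.9), ("e", 0.8), ("d", 0.7)]`; `b` and `d` are counted once even though two replicas returned each of them.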
On Tue, May 9, 2023 at 11:07 AM David Capwel
The Approach section doesn’t go over how this will handle cross-replica search;
this would be good to flesh out… given results have a real ranking, the current
2i logic may yield incorrect results… so I would think we need num_ranges / rf
queries in the best case, with some new capability to sort the
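To make the "num_ranges / rf queries in the best case" estimate concrete, here is a small Python sketch under a deliberately simplified, hypothetical model: one token range per node, no vnodes, and SimpleStrategy-style placement where range i is also stored on the next rf - 1 nodes clockwise. Querying every rf-th node then covers the whole ring with ceil(num_ranges / rf) requests, and the same coverage check shows when a set of responding replicas would only give partial results. Real placement (vnodes, NetworkTopologyStrategy, down nodes) makes replica selection considerably harder.

```python
from typing import List, Set


def ranges_on_node(node: int, num_ranges: int, rf: int) -> Set[int]:
    """Ranges stored on `node` in the hypothetical model above:
    range i lives on nodes i, i+1, ..., i+rf-1 (mod num_ranges)."""
    return {(node - offset) % num_ranges for offset in range(rf)}


def pick_replicas(num_ranges: int, rf: int) -> List[int]:
    """Best case: query every rf-th node, so each range is read once
    (apart from a possible overlap at the wraparound)."""
    return list(range(0, num_ranges, rf))


def is_full_coverage(nodes: List[int], num_ranges: int, rf: int) -> bool:
    """A non-key query is complete only if the queried nodes cover every range."""
    covered: Set[int] = set()
    for node in nodes:
        covered |= ranges_on_node(node, num_ranges, rf)
    return len(covered) == num_ranges
```

For 12 ranges and rf = 3, `pick_replicas` returns `[0, 3, 6, 9]`, four queries instead of twelve, and those nodes cover the ring. Dropping node 9 leaves ranges 7 to 9 uncovered, so `is_full_coverage([0, 3, 6], 12, 3)` is `False`, which is exactly the partial-result case raised earlier in the thread.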
Hi all,
Following the recent discussion threads, I would like to propose CEP-30 to
add Approximate Nearest Neighbor (ANN) Vector Search via Storage-Attached
Indexes (SAI) to Apache Cassandra.
The primary goal of this proposal is to implement ANN vector search
capabilities, making Cassandra more u