Re: [POLL] Vector type for ML

Patrick McFadin Tue, 02 May 2023 12:56:23 -0700

I'll speak up on that one. If you look at my ranked voting, that is where
my head is. I get accused of scope creep (a lot), and looking at the initial
proposal Jonathan put on the mailing list, it was mostly "Developers are
adopting vector search at a furious pace and I think I have a simple way of
adding support to keep Cassandra relevant for these use cases." Instead of
just focusing on this use case, I feel the arguments have bikeshedded into
scope creep, which means it will take forever to get into the project.


My preference is to see one thing validated with an MVP and get it into the
hands of developers sooner so we can continue to iterate based on actual
usage.

That doesn't mean your points are wrong or your opinions are broken; I'm
voting for what I think will be awesome for users sooner.

Patrick

On Tue, May 2, 2023 at 12:29 PM Benedict <[email protected]> wrote:

> Could folk voting against a general purpose type (that could well be
> called a vector) briefly explain their reasoning?
>
> We established in the other thread that it’s technically trivial, meaning
> folk must think it is strictly superior to only support float rather than
> eg all numeric types (note: for the type, not the ANN).
>
> I am surprised, and the blurbs accompanying votes so far don’t seem to
> touch on this, mostly just endorsing the idea of a vector.
>
>
> On 2 May 2023, at 20:20, Patrick McFadin <[email protected]> wrote:
>
> 
> A > B > C on both polls.
>
> Having talked to several users in the community that are highly excited
> about this change, this gets to what developers want to do at Cassandra
> scale: store embeddings and retrieve them.
>
> On Tue, May 2, 2023 at 11:47 AM Andrés de la Peña <[email protected]>
> wrote:
>
>> A > B > C
>>
>> I don't think that ML is such a niche application that it can't have its
>> own CQL data type. Also, vectors are mathematical elements that have more
>> applications than ML.
>>
>> On Tue, 2 May 2023 at 19:15, Mick Semb Wever <[email protected]> wrote:
>>
>>>
>>>
>>> On Tue, 2 May 2023 at 17:14, Jonathan Ellis <[email protected]> wrote:
>>>
>>>> Should we add a vector type to Cassandra designed to meet the needs of
>>>> machine learning use cases, specifically feature and embedding vectors for
>>>> training, inference, and vector search?
>>>>
>>>> ML vectors are fixed-dimension (fixed-length) sequences of numeric
>>>> types, with no nulls allowed, and with no need for random access. The ML
>>>> industry overwhelmingly uses float32 vectors, to the point that the
>>>> industry-leading special-purpose vector database ONLY supports that data
>>>> type.
>>>>
>>>> This poll is to gauge consensus subsequent to the recent discussion
>>>> thread at
>>>> https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>>>>
>>>> Please rank the discussed options from most preferred option to least,
>>>> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
>>>> = A (C is my preference, followed by B or A approximately equally.)
>>>>
>>>> (A) I am in favor of adding a vector type for floats; I do not believe
>>>> we need to tie it to any particular implementation details.
>>>>
>>>> (B) I am okay with adding a vector type but I believe we must add array
>>>> types that compose with all Cassandra types first, and make vectors a
>>>> special case of arrays-without-null-elements.
>>>>
>>>> (C) I am not in favor of adding a built-in vector type.
>>>>
>>>
>>>
>>>
>>> A  > B > C
>>>
>>> B is stated as "must add array types…".  I think this is a bit loaded.
>>> If B were (A + the implementation needs to be a non-null frozen float32
>>> array, with serialisation forward compatible with other frozen arrays
>>> implemented later), I would put it before (A), especially because it has
>>> already been shown that this is easy to implement.
>>>
>>>
>>>
>>
