Re: [POLL] Vector type for ML

David Capwell Tue, 02 May 2023 09:49:45 -0700

> B) Should we introduce a type that is general purpose, and supports all 
> Cassandra types, so that this may be used to support ML (and perhaps other) 
> workloads


I vote B only as well...

> On May 2, 2023, at 9:02 AM, Benedict <bened...@apache.org> wrote:
> 
> This is not the poll I thought we would be conducting, and I don’t really 
> support its framing. There are two parallel questions: what the functionality 
> should be and how they should be exposed. This poll compresses the 
> optionality poorly.
> 
> Whether or not we support a “vector” concept (or something isomorphic with 
> it), the first question this poll wants to answer is:
> 
> A) Should we introduce a new CQL collection type that is unique to ML and 
> *only* supports float32
> B) Should we introduce a type that is general purpose, and supports all 
> Cassandra types, so that this may be used to support ML (and perhaps other) 
> workloads
> C) Should we not introduce new types to CQL at all
> 
> For this question, I vote B only.
> 
> Once this question is answered it makes sense to answer how it will be 
> exposed semantically/syntactically. 
> 
> 
>> On 2 May 2023, at 16:43, Jonathan Ellis <jbel...@gmail.com> wrote:
>> 
>> 
>> My preference: A > B > C.  Vectors are distinct enough from arrays that we 
>> should not make adding the latter a prerequisite for adding the former.
>> 
>> On Tue, May 2, 2023 at 10:13 AM Jonathan Ellis <jbel...@gmail.com 
>> <mailto:jbel...@gmail.com>> wrote:
>>> Should we add a vector type to Cassandra designed to meet the needs of 
>>> machine learning use cases, specifically feature and embedding vectors for 
>>> training, inference, and vector search?  
>>> 
>>> ML vectors are fixed-dimension (fixed-length) sequences of numeric types, 
>>> with no nulls allowed, and with no need for random access. The ML industry 
>>> overwhelmingly uses float32 vectors, to the point that the industry-leading 
>>> special-purpose vector database ONLY supports that data type.
>>> 
>>> This poll is to gauge consensus subsequent to the recent discussion thread 
>>> at https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>>> 
>>> Please rank the discussed options from most preferred option to least, 
>>> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B 
>>> = A (C is my preference, followed by B or A approximately equally.)
>>> 
>>> (A) I am in favor of adding a vector type for floats; I do not believe we 
>>> need to tie it to any particular implementation details.
>>> 
>>> (B) I am okay with adding a vector type but I believe we must add array 
>>> types that compose with all Cassandra types first, and make vectors a 
>>> special case of arrays-without-null-elements.
>>> 
>>> (C) I am not in favor of adding a built-in vector type.
>>> 
>>> -- 
>>> Jonathan Ellis
>>> co-founder, http://www.datastax.com <http://www.datastax.com/>
>>> @spyced
>> 
>> 
>> -- 
>> Jonathan Ellis
>> co-founder, http://www.datastax.com <http://www.datastax.com/>
>> @spyced

Re: [POLL] Vector type for ML

Reply via email to