> B) Should we introduce a type that is general purpose, and supports all
> Cassandra types, so that this may be used to support ML (and perhaps other)
> workloads

I vote B only as well...

> On May 2, 2023, at 9:02 AM, Benedict <bened...@apache.org> wrote:
>
> This is not the poll I thought we would be conducting, and I don’t really
> support its framing. There are two parallel questions: what the functionality
> should be and how they should be exposed. This poll compresses the
> optionality poorly.
>
> Whether or not we support a “vector” concept (or something isomorphic with
> it), the first question this poll wants to answer is:
>
> A) Should we introduce a new CQL collection type that is unique to ML and
> *only* supports float32
> B) Should we introduce a type that is general purpose, and supports all
> Cassandra types, so that this may be used to support ML (and perhaps other)
> workloads
> C) Should we not introduce new types to CQL at all
>
> For this question, I vote B only.
>
> Once this question is answered it makes sense to answer how it will be
> exposed semantically/syntactically.
>
>
>> On 2 May 2023, at 16:43, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>>
>> My preference: A > B > C. Vectors are distinct enough from arrays that we
>> should not make adding the latter a prerequisite for adding the former.
>>
>> On Tue, May 2, 2023 at 10:13 AM Jonathan Ellis <jbel...@gmail.com
>> <mailto:jbel...@gmail.com>> wrote:
>>> Should we add a vector type to Cassandra designed to meet the needs of
>>> machine learning use cases, specifically feature and embedding vectors for
>>> training, inference, and vector search?
>>>
>>> ML vectors are fixed-dimension (fixed-length) sequences of numeric types,
>>> with no nulls allowed, and with no need for random access. The ML industry
>>> overwhelmingly uses float32 vectors, to the point that the industry-leading
>>> special-purpose vector database ONLY supports that data type.
>>>
>>> This poll is to gauge consensus subsequent to the recent discussion thread
>>> at https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>>>
>>> Please rank the discussed options from most preferred option to least,
>>> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
>>> = A (C is my preference, followed by B or A approximately equally.)
>>>
>>> (A) I am in favor of adding a vector type for floats; I do not believe we
>>> need to tie it to any particular implementation details.
>>>
>>> (B) I am okay with adding a vector type but I believe we must add array
>>> types that compose with all Cassandra types first, and make vectors a
>>> special case of arrays-without-null-elements.
>>>
>>> (C) I am not in favor of adding a built-in vector type.
>>>
>>> --
>>> Jonathan Ellis
>>> co-founder, http://www.datastax.com <http://www.datastax.com/>
>>> @spyced
>>
>>
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com <http://www.datastax.com/>
>> @spyced