A > B > C on both polls. I've talked to several users in the community who are highly excited about this change; it gets at what developers want to do at Cassandra scale: store embeddings and retrieve them.
On Tue, May 2, 2023 at 11:47 AM Andrés de la Peña <adelap...@apache.org> wrote:

> A > B > C
>
> I don't think that ML is such a niche application that it can't have its
> own CQL data type. Also, vectors are mathematical elements that have more
> applications than ML.
>
> On Tue, 2 May 2023 at 19:15, Mick Semb Wever <m...@apache.org> wrote:
>
>> On Tue, 2 May 2023 at 17:14, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>>> Should we add a vector type to Cassandra designed to meet the needs of
>>> machine learning use cases, specifically feature and embedding vectors for
>>> training, inference, and vector search?
>>>
>>> ML vectors are fixed-dimension (fixed-length) sequences of numeric
>>> types, with no nulls allowed, and with no need for random access. The ML
>>> industry overwhelmingly uses float32 vectors, to the point that the
>>> industry-leading special-purpose vector database ONLY supports that data
>>> type.
>>>
>>> This poll is to gauge consensus subsequent to the recent discussion
>>> thread at
>>> https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>>>
>>> Please rank the discussed options from most preferred option to least,
>>> e.g., A > B > C (A is my preference, followed by B, followed by C) or
>>> C > B = A (C is my preference, followed by B or A approximately equally).
>>>
>>> (A) I am in favor of adding a vector type for floats; I do not believe
>>> we need to tie it to any particular implementation details.
>>>
>>> (B) I am okay with adding a vector type, but I believe we must add array
>>> types that compose with all Cassandra types first, and make vectors a
>>> special case of arrays-without-null-elements.
>>>
>>> (C) I am not in favor of adding a built-in vector type.
>>
>> A > B > C
>>
>> B is stated as "must add array types…". I think this is a bit loaded.
>> If B were (A + the implementation needs to be a non-null frozen float32
>> array, with serialisation forward compatible with other frozen arrays
>> implemented later), I would put it before (A), especially because it has
>> already been shown that this is easy to implement.