I have no problem with `VECTOR` hanging around forever as an alias for
`NON-NULL FROZEN`. Even without ANN, it makes sense and will stick with
new C* users.
A plug-in system would be great, but it shouldn't hold back this work imho.
On Mon, 1 May 2023 at 22:17, Benedict wrote:
> I have expla
To make sure I understand correctly -- are you saying that you're fine with
a vector type, but you want to see it implemented as a special case of
arrays, or that you are not fine with a vector type because you would
prefer to only add arrays and that should be "good enough" for ML?
On Mon, May 1,
If we agree we’re delivering some general purpose array type, that supports all types as elements (ie, is logically equivalent to a frozen list of fixed length, however it is actually implemented), I think we are in technical agreement and it’s just a matter of presentation. At which point I think we
Should we add a vector type to Cassandra designed to meet the needs of
machine learning use cases, specifically feature and embedding vectors for
training, inference, and vector search?
ML vectors are fixed-dimension (fixed-length) sequences of numeric types,
with no nulls allowed, and with no nee
Hi folks,
Great stuff thanks for sharing.
The performance numbers I've seen so far are for the sidecar streaming
sstables (seems like this is just network bound?). What kind of perf are
you seeing at the Spark executors (at the per task level)?
--Seb
On Mon, May 1, 2023 at 3:50 PM Dinesh Joshi
My preference: A > B > C. Vectors are distinct enough from arrays that we
should not make adding the latter a prerequisite for adding the former.
On Tue, May 2, 2023 at 10:13 AM Jonathan Ellis wrote:
> Should we add a vector type to Cassandra designed to meet the needs of
> machine learning use
This is not the poll I thought we would be conducting, and I don’t really support its framing. There are two parallel questions: what the functionality should be and how it should be exposed. This poll compresses the optionality poorly. Whether or not we support a “vector” concept (or something is
> B) Should we introduce a type that is general purpose, and supports all
> Cassandra types, so that this may be used to support ML (and perhaps other)
> workloads
I vote B only as well...
> On May 2, 2023, at 9:02 AM, Benedict wrote:
>
> This is not the poll I thought we would be conducting,
On Tue, 2 May 2023 at 17:14, Jonathan Ellis wrote:
> Should we add a vector type to Cassandra designed to meet the needs of
> machine learning use cases, specifically feature and embedding vectors for
> training, inference, and vector search?
>
> ML vectors are fixed-dimension (fixed-length) sequ
A > B > C
I don't think that ML is such a niche application that it can't have its
own CQL data type. Also, vectors are mathematical elements that have more
applications than ML.
On Tue, 2 May 2023 at 19:15, Mick Semb Wever wrote:
>
>
> On Tue, 2 May 2023 at 17:14, Jonathan Ellis wrote:
>
>> S
A > B > C on both polls.
Having talked to several users in the community that are highly excited
about this change, this gets to what developers want to do at Cassandra
scale: store embeddings and retrieve them.
On Tue, May 2, 2023 at 11:47 AM Andrés de la Peña
wrote:
> A > B > C
>
> I don't th
Could folk voting against a general purpose type (that could well be called a vector) briefly explain their reasoning? We established in the other thread that it’s technically trivial, meaning folk must think it is strictly superior to only support float rather than eg all numeric types (note: for t
It is line rate / network bound. We have a patch out in vert.x that should use
the zero copy path for it. But it's not a strict prereq for it.
On 2023/05/02 15:39:02 Sebastian Estevez wrote:
> Hi folks,
>
> Great stuff thanks for sharing.
>
> The performance numbers I've seen so far are for the
Hey Dinesh,
Yeah it makes sense that the sstable streaming is network bound since it's
mostly just moving files.
Do you have any performance stats on the sstable parsing side inside spark?
--Seb
On Tue, May 2, 2023 at 3:31 PM Dinesh Joshi wrote:
> It is line rate / network bound. We have a pa
I'll speak up on that one. If you look at my ranked voting, that is where
my head is. I get accused of scope creep (a lot) and looking at the initial
proposal Jonathan put on the ML it was mostly "Developers are adopting
vector search at a furious pace and I think I have a simple way of adding
supp
But it’s so trivial it was already implemented by David in the span of ten minutes? If anything, we’re slowing progress down by refusing to do the extra types, as we’re busy arguing about it rather than delivering a feature? FWIW, my interpretation of the votes today is that we SHOULD NOT (ever) sup
Yeah, it's a bit of a mess but mailing list yo. People reading this would
have no idea we are friends. ;) (Which we are, for anyone reading this
later!)
I must have missed the point of this already being done. How about it,
David? Did you already make this?
"FWIW, my interpretation of the votes t
I'm all for bringing more functionality to the masses sooner, but the original
idea has a very very specific use case. Do we have use cases for a general
purpose Vector/Array data structure? If so, awesome. I just wondered if
generalizing provides value, beyond being straightforward to implem
> How about it, David? Did you already make this?
I checked out the patch, fixed serialize/deserialize, added the constraints,
then added a composeForFloat(ByteBuffer), with this the impact to the POC patch
was the following
1) move away from VectorType.instance.serializer().deserialize(bb) to
We're reusing existing Cassandra code so the performance characteristics for
parsing should be the same as Cassandra. I will need to check if we have
benchmarks. If we do, we'll add it to the CEP wiki page.
On 2023/05/02 19:52:28 Sebastian Estevez wrote:
> Hey Dinesh,
>
> Yeah it makes sense th
I had a call with David. We agreed that we want a "vector" data type with
these properties
- Fixed length
- No nulls
- Random access not supported
Where we disagreed was on my proposal to restrict vectors to only numeric
data. David's points were that
(1) He has a use case today for a data typ
I'm also in favor of having a general data type that is not tied to numeric
data types alone.
On 2023/05/02 22:27:24 Jonathan Ellis wrote:
> I had a call with David. We agreed that we want a "vector" data type with
> these properties
>
> - Fixed length
> - No nulls
> - Random access not support
\o/
Bring it in team. Group hug.
Now if you'll excuse me, I'm going to go build my preso on how Cassandra is
the only distributed database you can do vector search in an ACID
transaction.
Patrick
On Tue, May 2, 2023 at 3:27 PM Jonathan Ellis wrote:
> I had a call with David. We agreed that w