I think we need to step back briefly and consider what the syntax means and how it fits with existing syntax. The dimensionality wording seems to assume that we are logically introducing N vector fields, so that each row supplies a value for all of them or for none. In practice, though, we are introducing a fixed-length frozen list in Cassandra terms, and our API treats it as a per-row array/vector rather than as N column vectors.

My inclination, then, would be to declare an ARRAY<FLOAT, N> (which is syntactic sugar for FROZEN<LIST<FLOAT, N>>). This is very consistent with our existing style. We would then simply permit ANN indexes to be defined on such columns.

Otherwise, I think we should lean into the idea that this is a set of N vectors, since “dimensions” makes limited sense when describing an array length. In that case I would lean towards declaring e.g. 1500 FLOAT VECTORS, maybe. But then I think we should reconsider our presentation a little, and perhaps the result set should treat each vector as a separate field (or something like this).
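
To make the ARRAY<FLOAT, N> option concrete, a declaration might look roughly like the sketch below. This is only an illustration of the proposed syntax: neither ARRAY<FLOAT, N> nor a length-bounded LIST exists in CQL today, and the table and column names (items, embedding) are invented for the example. How an ANN index would be declared on such a column is left open here.

```cql
-- Hypothetical syntax from this proposal; not valid CQL today.
-- Table and column names (items, embedding) are invented for illustration.
CREATE TABLE items (
    id        uuid PRIMARY KEY,
    embedding ARRAY<FLOAT, 1500>   -- fixed-length, per-row array of 1500 floats
);

-- The same column with the sugar expanded, as described above:
--   embedding FROZEN<LIST<FLOAT, 1500>>
```

Read this way, 1500 is simply the array length of a single per-row value, which is why "dimensions" fits awkwardly unless we instead model N separate vectors.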
- [DISCUSS] New data type for vector search Jonathan Ellis