https://issues.apache.org/jira/browse/CASSANDRA-18504

> On May 5, 2023, at 12:27 PM, David Capwell <dcapw...@apple.com> wrote:
> 
> Yep, fair point…. SPARSE VECTOR better maps to NON NULL MAP<int32, type>
> 
>> On May 5, 2023, at 11:58 AM, David Capwell <dcapw...@apple.com> wrote:
>> 
>>> If we ever add sparse vectors, we can assume that DENSE is the default and 
>>> allow users to write DENSE, SPARSE, or nothing.
>> 
>> I have been feeling that sparse is just a fixed size list with nulls… so 
>> array<type, dimension>… if you insert {0: 42, 3: 17} then you get an array of 
>> [42, null, null, 17]?  One downside of doing this is that any operator/function 
>> that needs to reify large vectors (let's say 10k elements) uses a ton of 
>> memory because we materialize it as an array… so a new type could be used to 
>> lower this cost…
>> 
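Here is a minimal Python sketch of the memory concern above. It assumes a sparse vector is held as an index-to-value map (the `NON NULL MAP<int32, type>` shape mentioned at the top of this thread). It also assumes that reifying the vector means materializing a fixed-size list with `None` in the gaps. `densify` and `storage_slots` are hypothetical helpers for illustration, not Cassandra code.

```python
from typing import Dict, List, Optional


def densify(sparse: Dict[int, float], dimension: int) -> List[Optional[float]]:
    """Expand an {index: value} map into a fixed-size list, using None for missing positions."""
    for index in sparse:
        if not 0 <= index < dimension:
            raise IndexError(f"index {index} is outside dimension {dimension}")
    dense: List[Optional[float]] = [None] * dimension
    for index, value in sparse.items():
        dense[index] = value
    return dense


def storage_slots(sparse: Dict[int, float], dimension: int) -> Dict[str, int]:
    """Compare how many entries the map form and the materialized list form hold."""
    return {"map_entries": len(sparse), "list_slots": len(densify(sparse, dimension))}


print(densify({0: 42, 3: 17}, 4))             # [42, None, None, 17]
print(storage_slots({0: 42, 3: 17}, 10_000))  # {'map_entries': 2, 'list_slots': 10000}
```

The map form grows with the number of non-null entries. The list form always grows with the declared dimension, which is the cost a dedicated sparse type could avoid.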
>> With DENSE VECTOR we have the syntax in place that we “could” add SPARSE 
>> later… With VECTOR we will have complications adding a sparse vector after 
>> the fact due to this implying DENSE…
>> 
>> Updated ranking
>> 
>> Syntax                          Score
>> VECTOR<type, dimension>            21
>> DENSE VECTOR<type, dimension>      12
>> type[dimension]                    10
>> NON NULL <type>[dimension]          8
>> VECTOR type[n]                      5
>> DENSE_VECTOR<type, dimension>       4
>> NON-NULL FROZEN<type[n]>            3
>> ARRAY<type, n>                      1
>> 
>> Syntax                          Round 1  Round 2
>> VECTOR<type, dimension>               4        4
>> DENSE VECTOR<type, dimension>         2        3
>> NON NULL <type>[dimension]            2        1
>> VECTOR type[n]                        1        -
>> type[dimension]                       1        -
>> DENSE_VECTOR<type, dimension>         1        -
>> NON-NULL FROZEN<type[n]>              1        -
>> ARRAY<type, n>                        0        -
>> 
>> VECTOR<type, dimension> is still in the lead…
>> 
>>> On May 5, 2023, at 11:40 AM, Andrés de la Peña <adelap...@apache.org> wrote:
>>> 
>>> My vote is:
>>> 
>>> 1. VECTOR<type, dimension>
>>> 2. DENSE VECTOR<type, dimension>
>>> 3. type[dimension]
>>> 
>>> If we ever add sparse vectors, we can assume that DENSE is the default and 
>>> allow users to write DENSE, SPARSE, or nothing.
>>> 
>>> Perhaps the dimension could be separated from the type, such as in 
>>> VECTOR<type>[dimension] or VECTOR<type>(dimension).
>>> 
>>> On Fri, 5 May 2023 at 19:05, David Capwell <dcapw...@apple.com 
>>> <mailto:dcapw...@apple.com>> wrote:
>>>>>> ...where, just to be clear, VECTOR<type, dimension> means a frozen fixed 
>>>>>> size array w/ no null values?
>>>>> Assuming this is the case
>>>> 
>>>> The current agreed requirements are:
>>>> 
>>>> 1) non-null elements
>>>> 2) fixed length
>>>> 3) frozen 
>>>> 
>>>> You pointed out 3 isn’t actually required, but that would be a different 
>>>> conversation to remove =)… maybe defer this to JIRA as long as all parties 
>>>> agree in the ticket?
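To make the three agreed requirements concrete, here is a minimal Python sketch of a value validator. `FixedVectorType` is hypothetical. The element-type check is an extra assumption beyond the three requirements. "Frozen" is only approximated by returning an immutable tuple, standing in for a value that is written and replaced as a whole.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class FixedVectorType:
    """Hypothetical model of the three agreed requirements for the new type."""

    element_type: type
    dimension: int

    def validate(self, values: Sequence[object]) -> Tuple[object, ...]:
        # Requirement 2: fixed length, i.e. exactly `dimension` elements.
        if len(values) != self.dimension:
            raise ValueError(f"expected {self.dimension} elements, got {len(values)}")
        # Requirement 1: no null elements anywhere in the value.
        if any(value is None for value in values):
            raise ValueError("null elements are not allowed")
        # Element type check (an extra assumption, not one of the three requirements).
        if not all(isinstance(value, self.element_type) for value in values):
            raise TypeError(f"every element must be a {self.element_type.__name__}")
        # Requirement 3: frozen; an immutable tuple stands in for a value that is
        # written and replaced as a whole rather than element by element.
        return tuple(values)


embedding = FixedVectorType(float, 3)
print(embedding.validate([0.1, 0.2, 0.3]))  # (0.1, 0.2, 0.3)
```

If requirement 3 were dropped, as suggested above, only the final `tuple(...)` step would change; the length and non-null checks stand on their own.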
>>>> 
>>>> With all votes in, this is what I see
>>>> 
>>>> Voters: JE = Jonathan Ellis, DC = David Capwell, JM = Josh McKenzie,
>>>> CR = Caleb Rackliffe, PM = Patrick McFadin, BW = Brandon Williams,
>>>> MA = Mike Adamson, Be = Benedict, MSW = Mick Semb Wever,
>>>> DCB = Derek Chen-Becker
>>>> 
>>>> Syntax                          JE  DC  JM  CR  PM  BW  MA  Be  MSW DCB
>>>> VECTOR<type, dimension>         1   2   2   -   2   1   1   3   2   -
>>>> DENSE VECTOR<type, dimension>   2   1   -   -   1   -   2   -   -   -
>>>> type[dimension]                 3   3   3   1   -   3   -   2   -   -
>>>> DENSE_VECTOR<type, dimension>   -   -   1   -   -   -   -   -   -   3
>>>> NON NULL <type>[dimension]      -   1   -   -   -   -   -   1   -   2
>>>> VECTOR type[n]                  -   -   -   -   -   2   -   -   1   -
>>>> ARRAY<type, n>                  -   -   -   -   3   -   -   -   -   -
>>>> NON-NULL FROZEN<type[n]>        -   -   -   -   -   -   -   -   -   1
>>>> 
>>>> Rank  Weight
>>>> 1     3
>>>> 2     2
>>>> 3     1
>>>> ?     3
>>>> 
>>>> Syntax                          Score
>>>> VECTOR<type, dimension>            18
>>>> DENSE VECTOR<type, dimension>      10
>>>> type[dimension]                     9
>>>> NON NULL <type>[dimension]          8
>>>> VECTOR type[n]                      5
>>>> DENSE_VECTOR<type, dimension>       4
>>>> NON-NULL FROZEN<type[n]>            3
>>>> ARRAY<type, n>                      1
>>>> 
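The Score column is a weighted sum of each voter's ranks (rank 1 = 3 points, rank 2 = 2, rank 3 = 1). The sketch below recomputes it from ballots transcribed from the per-voter table. The `?` row of the Rank/Weight table is not modelled, because the thread does not say what it denotes. Adding Andrés de la Peña's ballot (1. VECTOR, 2. DENSE VECTOR, 3. type[dimension]) reproduces the later "Updated ranking" scores of 21, 12 and 10.

```python
from collections import Counter
from typing import Dict, Mapping

# Rank -> weight, from the Rank/Weight table (the "?" row is not modelled here).
WEIGHTS = {1: 3, 2: 2, 3: 1}

VEC = "VECTOR<type, dimension>"
DENSE = "DENSE VECTOR<type, dimension>"
BRACKETS = "type[dimension]"
DENSE_US = "DENSE_VECTOR<type, dimension>"
NON_NULL = "NON NULL <type>[dimension]"
VEC_PREFIX = "VECTOR type[n]"
ARRAY = "ARRAY<type, n>"
FROZEN = "NON-NULL FROZEN<type[n]>"

# Each voter's ranks, transcribed from the vote table (David ranked two options first).
BALLOTS: Dict[str, Dict[str, int]] = {
    "Jonathan Ellis": {VEC: 1, DENSE: 2, BRACKETS: 3},
    "David Capwell": {VEC: 2, DENSE: 1, BRACKETS: 3, NON_NULL: 1},
    "Josh McKenzie": {VEC: 2, BRACKETS: 3, DENSE_US: 1},
    "Caleb Rackliffe": {BRACKETS: 1},
    "Patrick McFadin": {VEC: 2, DENSE: 1, ARRAY: 3},
    "Brandon Williams": {VEC: 1, BRACKETS: 3, VEC_PREFIX: 2},
    "Mike Adamson": {VEC: 1, DENSE: 2},
    "Benedict": {VEC: 3, BRACKETS: 2, NON_NULL: 1},
    "Mick Semb Wever": {VEC: 2, VEC_PREFIX: 1},
    "Derek Chen-Becker": {DENSE_US: 3, NON_NULL: 2, FROZEN: 1},
}


def weighted_scores(ballots: Mapping[str, Mapping[str, int]]) -> Counter:
    """Add up the weight of every rank each syntax received."""
    scores: Counter = Counter()
    for ranking in ballots.values():
        for syntax, rank in ranking.items():
            scores[syntax] += WEIGHTS[rank]
    return scores


for syntax, score in weighted_scores(BALLOTS).most_common():
    print(f"{score:>3}  {syntax}")
```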
>>>> 
>>>> Syntax                          Round 1  Round 2
>>>> VECTOR<type, dimension>               3        4
>>>> DENSE VECTOR<type, dimension>         2        2
>>>> NON NULL <type>[dimension]            2        1
>>>> VECTOR type[n]                        1        -
>>>> type[dimension]                       1        -
>>>> DENSE_VECTOR<type, dimension>         1        -
>>>> NON-NULL FROZEN<type[n]>              1        -
>>>> ARRAY<type, n>                        0        -
>>>> 
>>>> 
>>>> Under 2 different voting systems vector<type, dimension> is in the lead 
>>>> and by a good amount… I have updated the patch locally to reflect this 
>>>> change as well.
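The "Round 1" column above matches a plain count of first-choice votes. Under that assumption, this sketch reproduces the column from the options each voter ranked 1 in the vote table. The thread does not say how "Round 2" was computed, so it is not modelled here.

```python
from collections import Counter
from typing import Dict, List

# Options each voter ranked 1 in the vote table; David Capwell tied two options at rank 1.
FIRST_CHOICES: Dict[str, List[str]] = {
    "Jonathan Ellis": ["VECTOR<type, dimension>"],
    "David Capwell": ["DENSE VECTOR<type, dimension>", "NON NULL <type>[dimension]"],
    "Josh McKenzie": ["DENSE_VECTOR<type, dimension>"],
    "Caleb Rackliffe": ["type[dimension]"],
    "Patrick McFadin": ["DENSE VECTOR<type, dimension>"],
    "Brandon Williams": ["VECTOR<type, dimension>"],
    "Mike Adamson": ["VECTOR<type, dimension>"],
    "Benedict": ["NON NULL <type>[dimension]"],
    "Mick Semb Wever": ["VECTOR type[n]"],
    "Derek Chen-Becker": ["NON-NULL FROZEN<type[n]>"],
}


def first_preference_tally(first_choices: Dict[str, List[str]]) -> Counter:
    """Count one vote for every option a voter ranked first."""
    return Counter(option for options in first_choices.values() for option in options)


for option, votes in first_preference_tally(FIRST_CHOICES).most_common():
    print(f"{votes}  {option}")
```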
>>>> 
>>>>> On May 5, 2023, at 10:41 AM, Mike Adamson <madam...@datastax.com 
>>>>> <mailto:madam...@datastax.com>> wrote:
>>>>> 
>>>>>> ...where, just to be clear, VECTOR<type, dimension> means a frozen fixed 
>>>>>> size array w/ no null values?
>>>>> Assuming this is the case, my vote is:
>>>>> 
>>>>> 1. VECTOR<type, dimension>
>>>>> 2. DENSE VECTOR<type, dimension>
>>>>> 
>>>>> I don't really have a 3rd vote because I think that type[dimension] is 
>>>>> too ambiguous. 
>>>>> 
>>>>> 
>>>>> On Fri, 5 May 2023 at 18:32, Derek Chen-Becker <de...@chen-becker.org 
>>>>> <mailto:de...@chen-becker.org>> wrote:
>>>>>> LOL, I'm holding you to that at the summit :) In all seriousness, I'm 
>>>>>> glad to see a robust debate around it. I guess for completeness, my 
>>>>>> order of preference is 
>>>>>> 
>>>>>> 1 - NONNULL FROZEN<TYPE<N>>
>>>>>> 2 - NONNULL TYPE<N> (which part of this implies frozen? The NONNULL or 
>>>>>> the cardinality?)
>>>>>> 3 - DENSE_VECTOR<type, N>
>>>>>> 
>>>>>> I guess my main concern with just "VECTOR" is that it's such an 
>>>>>> overloaded term. Maybe in ML it means something specific, but for anyone 
>>>>>> coming from C++, Rust, Java, etc, a Vector is both mutable and can carry 
>>>>>> null (or equivalent, e.g. None, in Rust). If the argument hadn't also 
>>>>>> been made that we should be working toward something that's not 
>>>>>> ML-specific maybe I would be less concerned.
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Derek
>>>>>> 
>>>>>> 
>>>>>> On Fri, May 5, 2023 at 11:14 AM Patrick McFadin <pmcfa...@gmail.com 
>>>>>> <mailto:pmcfa...@gmail.com>> wrote:
>>>>>>> Derek, despite your preference, I would hang out with you at a party. 
>>>>>>> 
>>>>>>> On Fri, May 5, 2023 at 9:44 AM Derek Chen-Becker <de...@chen-becker.org 
>>>>>>> <mailto:de...@chen-becker.org>> wrote:
>>>>>>>> Speaking as someone who likes Erlang, maybe that's why I also like 
>>>>>>>> NONNULL FROZEN<TYPE<[n]>>. It's unambiguous what Cassandra is going to 
>>>>>>>> do with that type. DENSE VECTOR means I need to go read docs (and then 
>>>>>>>> probably double-check in the source) to be sure what exactly is 
>>>>>>>> going on. 
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> 
>>>>>>>> Derek
>>>>>>>> 
>>>>>>>> On Fri, May 5, 2023 at 9:54 AM Patrick McFadin <pmcfa...@gmail.com 
>>>>>>>> <mailto:pmcfa...@gmail.com>> wrote:
>>>>>>>>> I hope we are willing to consider developers that use our system 
>>>>>>>>> because if I had to teach people to use "NON-NULL FROZEN<TYPE[n]>" 
>>>>>>>>> I'm pretty sure the response would be:
>>>>>>>>> 
>>>>>>>>> Did you tell me to go write a distributed map-reduce job in Erlang? I 
>>>>>>>>> believe I did, Bob.  
>>>>>>>>> 
>>>>>>>>> On Fri, May 5, 2023 at 8:05 AM Josh McKenzie <jmcken...@apache.org 
>>>>>>>>> <mailto:jmcken...@apache.org>> wrote:
>>>>>>>>>> Idiomatically, to my mind, there's a question of "what space are we 
>>>>>>>>>> thinking about this datatype in"?
>>>>>>>>>> 
>>>>>>>>>> - In the context of mathematics, nullability in a vector would be 0
>>>>>>>>>> - In the context of Cassandra, nullability tends to mean a tombstone 
>>>>>>>>>> (or nothing)
>>>>>>>>>> - In the context of programming languages, it's all over the place
>>>>>>>>>> 
>>>>>>>>>> Given that many models are exploring quantizing to int8 and other 
>>>>>>>>>> data types, we definitely need to keep in mind the ability to 
>>>>>>>>>> "support other data types easily in the future".
>>>>>>>>>> 
>>>>>>>>>> So with the above and the "meet the user where they are and don't 
>>>>>>>>>> make them understand more of Cassandra than absolutely critical to 
>>>>>>>>>> use it", I lean:
>>>>>>>>>> 
>>>>>>>>>> 1. DENSE_VECTOR<type, dimension>
>>>>>>>>>> 2. VECTOR<type, dimension>
>>>>>>>>>> 3. type[dimension]
>>>>>>>>>> 
>>>>>>>>>> This leaves the path open for us to expand on it in the future with 
>>>>>>>>>> sparse support and allows us to introduce some semantics that 
>>>>>>>>>> indicate idioms around nullability for the users coming from a 
>>>>>>>>>> different space.
>>>>>>>>>> 
>>>>>>>>>> "NON-NULL FROZEN<TYPE[n]>" is strictly correct, however it requires 
>>>>>>>>>> understanding idioms of how Cassandra thinks about data (nulls mean 
>>>>>>>>>> different things to us, we have differences between frozen and 
>>>>>>>>>> non-frozen due to constraints in our storage engine and 
>>>>>>>>>> materialization of data, etc) that get in the way of users doing 
>>>>>>>>>> things in the pattern they're familiar with without learning more 
>>>>>>>>>> about the DB than they're probably looking to learn. Historically 
>>>>>>>>>> this has been a challenge for us in adoption; the classic "Why can't 
>>>>>>>>>> I just write and delete and write as much as I want? Why are deletes 
>>>>>>>>>> filling up my disk?" problem comes to mind.
>>>>>>>>>> 
>>>>>>>>>> I'd also be happy with us supporting:
>>>>>>>>>> * NON-NULL FROZEN<TYPE[n]>
>>>>>>>>>> * DENSE_VECTOR<type, dimension> as syntactic sugar for the above
>>>>>>>>>> 
>>>>>>>>>> That is, if getting into "built-in syntactic sugar mappings for 
>>>>>>>>>> communities and specific use-cases" is something we're willing to 
>>>>>>>>>> consider.
>>>>>>>>>> 
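The "syntactic sugar" idea amounts to several spellings resolving to one underlying type. Here is a minimal sketch, assuming every syntax floated in this thread is accepted as sugar for the same non-null, frozen, fixed-length type. The regexes and `VectorSpec` are hypothetical illustrations, not Cassandra's actual CQL parser.

```python
import re
from typing import NamedTuple


class VectorSpec(NamedTuple):
    """Hypothetical canonical form: a frozen, fixed-length array with non-null elements."""

    element_type: str
    dimension: int


# Spellings proposed in this thread, all treated as sugar for the same VectorSpec.
_SPELLINGS = [
    # VECTOR<float, 3>, DENSE VECTOR<float, 3>, DENSE_VECTOR<float, 3>, ARRAY<float, 3>
    re.compile(r"^(?:ARRAY|(?:DENSE[ _])?VECTOR)\s*<\s*(\w+)\s*,\s*(\d+)\s*>$", re.IGNORECASE),
    # VECTOR<float>[3]
    re.compile(r"^VECTOR\s*<\s*(\w+)\s*>\s*\[\s*(\d+)\s*\]$", re.IGNORECASE),
    # NON-NULL FROZEN<float[3]>, NONNULL FROZEN<float[3]>
    re.compile(r"^NON[ -]?NULL\s+FROZEN\s*<\s*(\w+)\s*\[\s*(\d+)\s*\]\s*>$", re.IGNORECASE),
    # float[3], VECTOR float[3], NON NULL float[3]
    re.compile(r"^(?:VECTOR\s+|NON[ -]?NULL\s+)?(\w+)\s*\[\s*(\d+)\s*\]$", re.IGNORECASE),
]


def desugar(spelling: str) -> VectorSpec:
    """Map any supported spelling onto the canonical VectorSpec."""
    text = spelling.strip()
    for pattern in _SPELLINGS:
        match = pattern.match(text)
        if match:
            element_type, dimension = match.groups()
            if int(dimension) <= 0:
                raise ValueError("dimension must be positive")
            return VectorSpec(element_type.lower(), int(dimension))
    raise ValueError(f"unrecognised vector syntax: {spelling!r}")


print(desugar("DENSE_VECTOR<float, 768>"))  # VectorSpec(element_type='float', dimension=768)
```

Because every spelling lands on one canonical value, choosing the "official" syntax becomes a separate question from what the type means.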
>>>>>>>>>> On Fri, May 5, 2023, at 7:26 AM, Patrick McFadin wrote:
>>>>>>>>>>> I think we are still discussing implementation here when I'm 
>>>>>>>>>>> talking about developer experience. I want developers to adopt this 
>>>>>>>>>>> quickly, easily and be successful. Vector search is already a 
>>>>>>>>>>> thing. People use it every day. A successful outcome, in my view, 
>>>>>>>>>>> is developers picking up this feature without reading a manual. 
>>>>>>>>>>> (Because they don't anyway and get in trouble) I did some more 
>>>>>>>>>>> extensive research about what other DBs are using for syntax. The 
>>>>>>>>>>> consensus is some variety of 'VECTOR', 'DENSE' and 'SPARSE'.
>>>>>>>>>>> 
>>>>>>>>>>> Pinecone[1]: dense_vector, sparse_vector
>>>>>>>>>>> Elastic[2]: dense_vector
>>>>>>>>>>> Milvus[3]: float_vector, binary_vector
>>>>>>>>>>> pgvector[4]: vector
>>>>>>>>>>> Weaviate[5]: different approach; all typed arrays can be indexed
>>>>>>>>>>> 
>>>>>>>>>>> Based on that I'm advocating a similar syntax:
>>>>>>>>>>> 
>>>>>>>>>>> - DENSE VECTOR
>>>>>>>>>>> or
>>>>>>>>>>> - VECTOR
>>>>>>>>>>> 
>>>>>>>>>>> [1] https://docs.pinecone.io/docs/hybrid-search
>>>>>>>>>>> [2] https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html
>>>>>>>>>>> [3] https://milvus.io/docs/create_collection.md
>>>>>>>>>>> [4] https://github.com/pgvector/pgvector
>>>>>>>>>>> [5] https://weaviate.io/developers/weaviate/config-refs/datatypes
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, May 5, 2023 at 6:07 AM Mike Adamson <madam...@datastax.com 
>>>>>>>>>>> <mailto:madam...@datastax.com>> wrote:
>>>>>>>>>>> Then we can have the indexing apparatus only accept 
>>>>>>>>>>> frozen<float[n]> for the HNSW case.
>>>>>>>>>>> I'm inclined to agree with Benedict that the index will need to be 
>>>>>>>>>>> specifically selected by an option rather than inferred from the type. 
>>>>>>>>>>> As such, there is no real reason for the frozen requirement on the 
>>>>>>>>>>> type. The HNSW index can be built just as easily from a non-frozen 
>>>>>>>>>>> array.
>>>>>>>>>>> 
>>>>>>>>>>> I am in favour of enforcing non-null on the elements of an array by 
>>>>>>>>>>> default. I would prefer that allowing nulls in the array would be a 
>>>>>>>>>>> later addition if and when a use case arose for it.
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, 5 May 2023 at 03:02, Caleb Rackliffe 
>>>>>>>>>>> <calebrackli...@gmail.com <mailto:calebrackli...@gmail.com>> wrote:
>>>>>>>>>>> Even in the ML case, sparse can just mean zeros rather than nulls, 
>>>>>>>>>>> and they should compress similarly anyway.
>>>>>>>>>>> 
>>>>>>>>>>> If we really want null values, I'd rather leave that in collections 
>>>>>>>>>>> space.
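The point that sparse data can use zeros rather than nulls and still compress well can be sanity-checked with a small sketch. It assumes float32 elements and uses Python's zlib purely as a stand-in compressor. It says nothing about Cassandra's actual on-disk format or the compressor a table is configured with.

```python
import struct
import zlib
from typing import Dict, List


def sparse_as_zeros(nonzero: Dict[int, float], dimension: int) -> List[float]:
    """Store absent positions as 0.0 instead of null."""
    dense = [0.0] * dimension
    for index, value in nonzero.items():
        dense[index] = value
    return dense


def float32_bytes(values: List[float]) -> bytes:
    """Serialize as little-endian float32, 4 bytes per element."""
    return struct.pack(f"<{len(values)}f", *values)


raw = float32_bytes(sparse_as_zeros({0: 0.5, 17: -1.25, 512: 3.0}, 1024))
compressed = zlib.compress(raw)
print(len(raw), len(compressed))  # 4096 raw bytes; the long zero runs shrink to a small fraction
```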
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, May 4, 2023 at 8:59 PM Caleb Rackliffe 
>>>>>>>>>>> <calebrackli...@gmail.com <mailto:calebrackli...@gmail.com>> wrote:
>>>>>>>>>>> I actually still prefer type[dimension], because I think I 
>>>>>>>>>>> intuitively read this as a primitive (meaning no null elements) 
>>>>>>>>>>> array. Then we can have the indexing apparatus only accept 
>>>>>>>>>>> frozen<float[n]> for the HNSW case.
>>>>>>>>>>> 
>>>>>>>>>>> If that isn't intuitive to anyone else, I don't really have a 
>>>>>>>>>>> strong opinion...but...conflating "frozen" and "dense" seems like a 
>>>>>>>>>>> bad idea. One should indicate single vs. multi-cell, and the other 
>>>>>>>>>>> the presence or absence of nulls/zeros/whatever.
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, May 4, 2023 at 12:51 PM Patrick McFadin <pmcfa...@gmail.com 
>>>>>>>>>>> <mailto:pmcfa...@gmail.com>> wrote:
>>>>>>>>>>> I agree with David's reasoning and the use of DENSE (and maybe 
>>>>>>>>>>> eventually SPARSE). This is terminology well established in the 
>>>>>>>>>>> data world, and it would lead to much easier adoption from users. 
>>>>>>>>>>> VECTOR is close, but I can see having to create a lot of content 
>>>>>>>>>>> around "How to use it and not get in trouble." (I have a lot of 
>>>>>>>>>>> that content already)
>>>>>>>>>>> 
>>>>>>>>>>>  - We don't have to explain what it is. A lot of prior art out 
>>>>>>>>>>> there already [1][2][3]
>>>>>>>>>>>  - We're matching an established term with what users would expect. 
>>>>>>>>>>> No surprises. 
>>>>>>>>>>>  - Shorter ramp-up time for users. Cassandra is being modernized.
>>>>>>>>>>> 
>>>>>>>>>>> The implementation is flexible, but the interface should empower 
>>>>>>>>>>> our users to be awesome. 
>>>>>>>>>>> 
>>>>>>>>>>> Patrick
>>>>>>>>>>> 
>>>>>>>>>>> [1] https://stats.stackexchange.com/questions/266996/what-do-the-terms-dense-and-sparse-mean-in-the-context-of-neural-networks
>>>>>>>>>>> [2] https://induraj2020.medium.com/what-are-sparse-features-and-dense-features-8d1746a77035
>>>>>>>>>>> [3] https://revware.net/sparse-vs-dense-data-the-power-of-points-and-clouds/
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, May 4, 2023 at 10:25 AM David Capwell <dcapw...@apple.com 
>>>>>>>>>>> <mailto:dcapw...@apple.com>> wrote:
>>>>>>>>>>> My views have changed over time on syntax and I feel 
>>>>>>>>>>> type[dimension] may not be the best, so it has gone lower in my own 
>>>>>>>>>>> personal ranking… this is my current preference
>>>>>>>>>>> 
>>>>>>>>>>> 1) DENSE <type>[dimension] | NON NULL <type>[dimension]
>>>>>>>>>>> 2) VECTOR<type, dimension>
>>>>>>>>>>> 3) type[dimension]
>>>>>>>>>>> 
>>>>>>>>>>> My reasoning for this order
>>>>>>>>>>> 
>>>>>>>>>>> * type[dimension] looks like syntax sugar for array<type, 
>>>>>>>>>>> dimension>, so users may assume list/array semantics, but we limit 
>>>>>>>>>>> it to non-null elements in a frozen array
>>>>>>>>>>> * VECTOR as a prefix feels out of place, but VECTOR as a 
>>>>>>>>>>> direct type makes more sense… this also leads to a possible future 
>>>>>>>>>>> of VECTOR<type>, which is the non-fixed-length version of this type. 
>>>>>>>>>>>  What makes VECTOR different from list/array?  Non-null elements 
>>>>>>>>>>> and being frozen.  I don’t feel that VECTOR really tells users to 
>>>>>>>>>>> expect non-null or frozen semantics, as there exist different 
>>>>>>>>>>> VECTOR types for those reasons (sparse vs dense)… 
>>>>>>>>>>> * DENSE may be confusing for people coming from languages where 
>>>>>>>>>>> it just means “sequential layout”, which is what our frozen 
>>>>>>>>>>> array/list already are… but since the target user is coming from an 
>>>>>>>>>>> ML background, this shouldn’t cause much confusion.  DENSE just 
>>>>>>>>>>> means FROZEN in Cassandra, with NON NULL elements (SPARSE allows 
>>>>>>>>>>> for NULL and isn’t frozen)… So DENSE just acts as syntax sugar for 
>>>>>>>>>>> frozen<non null type[dimension]>
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On May 4, 2023, at 4:13 AM, Brandon Williams <dri...@gmail.com 
>>>>>>>>>>>> <mailto:dri...@gmail.com>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. VECTOR<FLOAT,n>
>>>>>>>>>>>> 2. VECTOR FLOAT[n]
>>>>>>>>>>>> 3. FLOAT[N]   (Non null by default)
>>>>>>>>>>>> 
>>>>>>>>>>>> Redundant or not, I think having the VECTOR keyword helps signify 
>>>>>>>>>>>> what
>>>>>>>>>>>> the app is generally about and helps get buy-in from ML 
>>>>>>>>>>>> stakeholders.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, May 4, 2023 at 3:45 AM Benedict <bened...@apache.org 
>>>>>>>>>>>> <mailto:bened...@apache.org>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hurrah for initial agreement.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For syntax, I think one option was just FLOAT[N]. In VECTOR 
>>>>>>>>>>>>> FLOAT[N], VECTOR is redundant - FLOAT[N] is fully descriptive by 
>>>>>>>>>>>>> itself. I don’t think VECTOR should be used to simply imply 
>>>>>>>>>>>>> non-null, as this would be very unintuitive. More logical would 
>>>>>>>>>>>>> be NONNULL, if this is the only condition being applied. 
>>>>>>>>>>>>> Alternatively for arrays we could default to NONNULL and later 
>>>>>>>>>>>>> introduce NULLABLE if we want to permit nulls.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If the word vector is to be used it makes more sense to make it 
>>>>>>>>>>>>> look like a list, so VECTOR<FLOAT, N> as here the word VECTOR is 
>>>>>>>>>>>>> clearly not redundant.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So, I vote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1) (NON NULL) FLOAT[N]
>>>>>>>>>>>>> 2) FLOAT[N]   (Non null by default)
>>>>>>>>>>>>> 3) VECTOR<FLOAT, N>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 4 May 2023, at 08:52, Mick Semb Wever <m...@apache.org 
>>>>>>>>>>>>> <mailto:m...@apache.org>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Did we agree on a CQL syntax?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I don’t believe there has been a poll on CQL syntax… my 
>>>>>>>>>>>>>> understanding from reading all the threads is that there are ~4-5 
>>>>>>>>>>>>>> options and none are -1ed, so I believe we are waiting for majority 
>>>>>>>>>>>>>> rule on this?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Re-reading that thread, IIUC the valid choices remaining are…
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1. VECTOR FLOAT[n]
>>>>>>>>>>>>> 2. FLOAT VECTOR[n]
>>>>>>>>>>>>> 3. VECTOR<FLOAT,n>
>>>>>>>>>>>>> 4. VECTOR[n]<FLOAT>
>>>>>>>>>>>>> 5. ARRAY<FLOAT, n>
>>>>>>>>>>>>> 6. NON-NULL FROZEN<FLOAT[n]>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, I'm putting my preference (1) first ;) because (banging on) 
>>>>>>>>>>>>> if the future of CQL will have FLOAT[n] and FROZEN<FLOAT[n]>, 
>>>>>>>>>>>>> where for general CQL users the VECTOR keyword just means 
>>>>>>>>>>>>> "non-null and frozen", these gel best together.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Options (5) and (6) are for those that feel we can and should 
>>>>>>>>>>>>> provide this type without introducing the vector keyword.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>>  <https://www.datastax.com/>
>>>>>>>>>>> Mike Adamson
>>>>>>>>>>> Engineering
>>>>>>>>>>> +1 650 389 6000 <tel:16503896000> | datastax.com 
>>>>>>>>>>> <https://www.datastax.com/>
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> +---------------------------------------------------------------+
>>>>>>>> | Derek Chen-Becker                                             |
>>>>>>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>>>>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>>>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>>>>>>>> +---------------------------------------------------------------+
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
> 
