That's fair comment. In this case I would be happy with any of your
suggestions although I would prefer that the datatype did not support
nulls.

On Thu, 4 May 2023 at 11:55, Benedict <bened...@apache.org> wrote:

> I would expect that the type of index would be specified anyway?
>
> I don’t think it’s good API design to have the field define the index you
> create - only to shape what is permitted.
>
> A HNSW index is very specific and should be asked for specifically, not
> implicitly, IMO.
>
> On 4 May 2023, at 11:47, Mike Adamson <madam...@datastax.com> wrote:
>
> 
>
>> For syntax, I think one option was just FLOAT[N]. In VECTOR FLOAT[N],
>> VECTOR is redundant - FLOAT[N] is fully descriptive by itself. I don’t
>> think VECTOR should be used to simply imply non-null, as this would be very
>> unintuitive. More logical would be NONNULL, if this is the only condition
>> being applied. Alternatively for arrays we could default to NONNULL and
>> later introduce NULLABLE if we want to permit nulls.
>>
>
> I have a small issue relating to not having a specific VECTOR tag on the
> data type. The driver behind adding this datatype is the hnsw index that is
> being added to consume this data. If we have a generic array datatype, what
> is the expectation going to be for users who create an index on it? The
> hnsw index will support only floats initially so we would have to reject
> any non-float arrays if an attempt was made to create an hnsw index on it.
> While there is no problem with doing this, there would be a problem if, in
> the future, we allow indexing in arrays in the same way that we index
> collections. In this case we would then need to have the user select what
> type of index they want at creation time.
>
> Can I add another proposal that we allow a VECTOR or DENSE (this is a well
> known term in the ML space) keyword that could be used when the array is
> going to be used for ML workloads. This would be optional and would
> function similarly to FROZEN in that it would limit the functionality of
> the array to ML usage.
>
> On Thu, 4 May 2023 at 09:45, Benedict <bened...@apache.org> wrote:
>
>> Hurrah for initial agreement.
>>
>> For syntax, I think one option was just FLOAT[N]. In VECTOR FLOAT[N],
>> VECTOR is redundant - FLOAT[N] is fully descriptive by itself. I don’t
>> think VECTOR should be used to simply imply non-null, as this would be very
>> unintuitive. More logical would be NONNULL, if this is the only condition
>> being applied. Alternatively for arrays we could default to NONNULL and
>> later introduce NULLABLE if we want to permit nulls.
>>
>> If the word vector is to be used it makes more sense to make it look like
>> a list, so VECTOR<FLOAT, N> as here the word VECTOR is clearly not
>> redundant.
>>
>> So, I vote:
>>
>> 1) (NON NULL) FLOAT[N]
>> 2) FLOAT[N]   (Non null by default)
>> 3) VECTOR<FLOAT, N>
>>
>>
>>
>> On 4 May 2023, at 08:52, Mick Semb Wever <m...@apache.org> wrote:
>>
>> 
>>
>>> Did we agree on a CQL syntax?
>>>
>>> I don’t believe there has been a pool on CQL syntax… my understanding
>>> reading all the threads is that there are ~4-5 options and non are -1ed, so
>>> believe we are waiting for majority rule on this?
>>>
>>
>>
>> Re-reading that thread, IIUC the valid choices remaining are…
>>
>> 1. VECTOR FLOAT[n]
>> 2. FLOAT VECTOR[n]
>> 3. VECTOR<FLOAT,n>
>> 4. VECTOR[n]<FLOAT>
>> 5. ARRAY<FLOAT, n>
>> 6. NON-NULL FROZEN<FLOAT[n]>
>>
>>
>> Yes I'm putting my preference (1) first ;) because (banging on) if the
>> future of CQL will have FLOAT[n] and FROZEN<FLOAT[n]>, where the VECTOR
>> keyword is: for general cql users; just meaning "non-null and frozen",
>> these gel best together.
>>
>> Options (5) and (6) are for those that feel we can and should provide
>> this type without introducing the vector keyword.
>>
>>
>>
>>
>
> --
> [image: DataStax Logo Square] <https://www.datastax.com/> *Mike Adamson*
> Engineering
>
> +1 650 389 6000 <16503896000> | datastax.com <https://www.datastax.com/>
> Find DataStax Online: [image: LinkedIn Logo]
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.linkedin.com_company_datastax&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=uHzE4WhPViSF0rsjSxKhfwGDU1Bo7USObSc_aIcgelo&s=akx0E6l2bnTjOvA-YxtonbW0M4b6bNg4nRwmcHNDo4Q&e=>
>    [image: Facebook Logo]
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_datastax&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=uHzE4WhPViSF0rsjSxKhfwGDU1Bo7USObSc_aIcgelo&s=ncMlB41-6hHuqx-EhnM83-KVtjMegQ9c2l2zDzHAxiU&e=>
>    [image: Twitter Logo] <https://twitter.com/DataStax>   [image: RSS
> Feed] <https://www.datastax.com/blog/rss.xml>   [image: Github Logo]
> <https://github.com/datastax>
>
>

-- 
[image: DataStax Logo Square] <https://www.datastax.com/> *Mike Adamson*
Engineering

+1 650 389 6000 <16503896000> | datastax.com <https://www.datastax.com/>
Find DataStax Online: [image: LinkedIn Logo]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.linkedin.com_company_datastax&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=uHzE4WhPViSF0rsjSxKhfwGDU1Bo7USObSc_aIcgelo&s=akx0E6l2bnTjOvA-YxtonbW0M4b6bNg4nRwmcHNDo4Q&e=>
   [image: Facebook Logo]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_datastax&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=uHzE4WhPViSF0rsjSxKhfwGDU1Bo7USObSc_aIcgelo&s=ncMlB41-6hHuqx-EhnM83-KVtjMegQ9c2l2zDzHAxiU&e=>
   [image: Twitter Logo] <https://twitter.com/DataStax>   [image: RSS Feed]
<https://www.datastax.com/blog/rss.xml>   [image: Github Logo]
<https://github.com/datastax>

Reply via email to