Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Henrik Ingo
By my superficial reading I get the impression that the main distinction is that vectors don't need to support random access into a single element/float. I haven't looked at what Jonathan is doing, but I assume, and it seems Jonathan assumes or knows, that this makes implementation both easier and a

Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Patrick McFadin
> > So is the goal here to provide something specific and idiomatic for the ML community or is the goal to make a primitive that's C*-centric that then another layer can write to? I personally argue for the former; I don't see this specific data type going away any time soon.
+1 on this con

Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Benedict
I and others have claimed that an array concept will work, since it is isomorphic with a vector. I have seen the following counterclaims: 1. Vectors don’t need to support index lookups 2. Vectors don’t need to support ordered indexes 3. Vectors don’t need to support other types besides float None of th

Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Henrik Ingo
Benedict, I don't quite see why that matters? The argument is merely that this kind of vector, for this use case, a) is different from arrays, and b) arrays apparently don't serve the use case well enough (or at all). Now, if from the above it follows a discussion that a vector type cannot be a fi

Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Benedict
pgvector is a plug-in. If you were proposing a plug-in you could ignore these considerations. On 28 Apr 2023, at 16:58, Jonathan Ellis wrote: I'm proposing a vector data type for ML use cases. It's not the same thing as an array or a list and it's not supposed to be. While it's true that it would b

Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Jonathan Ellis
I'm proposing a vector data type for ML use cases. It's not the same thing as an array or a list and it's not supposed to be. While it's true that it would be possible to build a vector type on top of an array type, it's not necessary to do it that way, and given the lack of interest in an array

Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Benedict
But you’re proposing introducing a general purpose type - this isn’t an ML plug-in, it’s modifying the core language in a manner that makes targeting your workload easier. Which is fine, but that means you have to consider its impact on the general language, not just your target use case. On 28 Apr

Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Jonathan Ellis
That's exactly right. In particular it makes no sense at all from an ML perspective to have vector types of anything other than numerics. And as I mentioned in the POC thread (but I did not mention here), float is overwhelmingly the most frequently used vector type, to the point that Pinecone (by

[ANNOUNCE] Apache Cassandra 3.11.15 test artifact available

2023-04-28 Thread Miklosovic, Stefan
The test build of Cassandra 3.11.15 is available. sha1: 6cdcf5e56a77cf40c251125d68856a614eccbc53 Git: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.11.15-tentative Maven Artifacts: https://repository.apache.org/content/repositories/orgapachecassandra-1287/org/apach

Re: [DISCUSS] CEP-29 CQL NOT Operator

2023-04-28 Thread Piotr Kołaczkowski
> It's easy for an inverted index to find matches efficiently, but not so easy for it to find non-matches.
Yes, I agree, it is not easy for an *index* to do that. But I think at least in SAI we could do that by using the index to find the matches, and, because they are always returned in the ro

Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Benedict
This feature may be targeting ML users, but it isn’t part of some “ML plug-in”; it’s a general purpose type available to all users that happens to permit the use of ANN. So it needs to make sense in a general context, not just to ML users. I also doubt users will struggle with understanding an array o