Re: Adding vector search to SAI with hierarchical navigable small world graph index

2023-04-22 Thread Jonathan Ellis
My guess is that I will be able to get this ready to upstream before the
rest of CEP-7 goes in, so it would make sense to me to roll it into that.

On Fri, Apr 21, 2023 at 5:34 PM Dinesh Joshi  wrote:

> Interesting proposal Jonathan. Will grok it over the weekend and play
> around with the branch.
>
> Do you intend to make this part of CEP-7 or as an incremental update to
> SAI once it is committed?
>
> On Apr 21, 2023, at 2:19 PM, Jonathan Ellis  wrote:
>
>
>
> *Happy Friday, everyone! Rich text formatting ahead, I've attached a PDF
> for those who prefer that.*
>
> *I propose adding approximate nearest neighbor (ANN) vector search
> capability to Apache Cassandra via storage-attached indexes (SAI). This is
> a medium-sized effort that will significantly enhance Cassandra’s
> functionality, particularly for AI use cases. This addition will not only
> provide a new and important feature for existing Cassandra users, but also
> attract new users to the community from the AI space, further expanding
> Cassandra’s reach and relevance.*
>
> *Introduction
>
> Vector search is a powerful document search technique that enables
> developers to quickly find relevant content within an extensive collection
> of documents, which is useful as a standalone technique, but it is
> particularly hot now because it significantly enhances the performance of
> LLMs.
>
> Vector search uses ML models to match the semantics of a question rather
> than just the words it contains, avoiding the classic false positives and
> false negatives associated with term-based search. Alessandro Benedetti
> gives some good examples in his excellent talk
> <https://www.youtube.com/watch?v=z-i8mOHAhlU>:
>
> You can search across any set of vectors, which are just ordered sets of
> numbers. In the context of natural language queries and document search, we
> are specifically concerned with a type of vector called an embedding.  An
> embedding is a high-dimensional vector that captures the underlying
> semantic relationships and contextual information of words or phrases.
> Embeddings are generated by ML models trained for this purpose; OpenAI
> provides an API to do this, but open-source and self-hostable models like
> BERT are also popular. Creating more accurate and smaller embeddings are
> active research areas in ML.
>
> Large language models (LLMs) can be described as a mile wide and an inch
> deep. They are not experts on any narrow domain (although they will
> hallucinate that they are, sometimes convincingly). You can remedy this by
> giving the LLM additional context for your query, but the context window is
> small (4k tokens for GPT-3.5, up to 32k for GPT-4), so you want to be very
> selective about giving the LLM the most relevant possible information.
>
> Vector search is red-hot now because it allows us to easily answer the
> question “what are the most relevant documents to provide as context” by
> performing a similarity search between the embeddings vector of the query,
> and those of your document universe. Doing exact search is prohibitively
> expensive, since you necessarily have to compare with each and every
> document; this is intractable when you have billions or trillions of docs.
> However, there are well-understood algorithms for turning this into a
> logarithmic problem if you are willing to accept approximately the most
> similar documents.  This is the “approximate nearest neighbor” problem.
> (You will see these referred to as kNN – k nearest neighbors – or ANN.)
>
> Pinecone DB has a good example of what this looks like in Python code
> <https://docs.pinecone.io/docs/gen-qa-openai>.
>
> Vector search is the foundation underlying effectively all of the AI
> applications that are launching now.  This is particularly relevant to
> Apache Cassandra users, who tend to manage the types of large datasets that
> benefit the most from fast similarity search. Adding vector search to
> Cassandra’s unique strengths of scale, reliability, and low latency, will
> further enhance its appeal and effectiveness for these users while also
> making it more attractive to newcomers looking to harness AI’s potential.
> The faster we deliver vector search, the more valuable it will be for this
> expanding user base.
>
> Requirements
>
> 1. Perform vector search as outlined in the Pinecone example above
>    1. Support Float32 embeddings in the form of a new DENSE FLOAT32 cql type
>       1. This is also useful for “classic” ML applications that derive and
>          serve their own feature vectors
>    2. Add ANN (approximate nearest neighbor) search.
> 2. Work with normal Cassandra data flow
>    1. Inserting one row at
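
For readers who want to see the exact search that the quoted proposal calls prohibitively expensive, here is a minimal Python sketch. It is not taken from the proposal or the branch: the document ids, the tiny 3-dimensional embeddings, and the function names are all made up for illustration, and cosine similarity is assumed as the similarity measure. The point is only that every query touches every document, which is the cost ANN indexes avoid.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, documents, k):
    """Return the ids of the k documents most similar to the query.

    This compares the query with every stored embedding, so the work
    grows linearly with the number of documents.
    """
    scored = (
        (cosine_similarity(query, embedding), doc_id)
        for doc_id, embedding in documents.items()
    )
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Hypothetical embeddings; real ones are much higher-dimensional.
documents = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]

print(exact_top_k(query, documents, k=2))  # ['doc-a', 'doc-b']
```

An approximate index such as HNSW trades a small loss in recall for visiting only a fraction of the stored vectors per query.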

[DISCUSS] New data type for vector search

2023-04-26 Thread Jonathan Ellis
Hi all,

Splitting this out per the suggestion in the initial VS thread so we can
work on driver support in parallel with the server-side changes.

I propose adding a new data type for vector search indexes:

FLOAT VECTOR[N_DIMENSIONS]

In the initial commits and thread, this was DENSE FLOAT32. Nobody really
loved that, so we considered a bunch of alternatives, including

- `FLOAT[N]`: This minimal option resembles C and Java array syntax, which
would make it familiar for many users. However, this syntax raises the
question of why arrays cannot be created for other types.  Additionally,
the expectation for an array is to provide random access to its contents,
which is not supported for vectors.
- `DENSE FLOAT[N]`: This option clarifies that we are supporting dense
vectors, not sparse ones. However, since Lucene had sparse vector support
in the past but removed it for lack of compelling use cases, it is unlikely
that it will be added back, making the "DENSE" qualifier less relevant.
- `DENSE FLOAT VECTOR[N]`: This is the most verbose option and aligns with
the CQL/SQL spirit. However, the "DENSE" qualifier is unnecessary for the
reasons mentioned above.
- `VECTOR FLOAT[N]`: This option omits the "DENSE" qualifier, but has a
less natural word order.
- `VECTOR`: This follows the syntax of our Collections, but again
this would imply that random access is supported, which we want to avoid
doing.
- `VECTOR[N]`: This syntax is not very clear about the vector's contents
and could make it difficult to add other vector types, such as byte vectors
(already supported by Lucene), in the future.

Finally, the original qualifier of 32 in `FLOAT32` was intended to allow
consistency if we add other float types like FLOAT16 or FLOAT64, both of
which are sometimes used in ML. However, we already have a CQL data type
for a 64-bit float (`DOUBLE`), so it would make more sense to add future
variants (which remain hypothetical at this point) along that line instead.

Thus, we believe that `FLOAT VECTOR[N_DIMENSIONS]` provides the best
balance of clarity, conciseness, and extensibility. It is more natural in
its word order than the original proposal and avoids unnecessary
qualifiers, while still being clear about the data type it represents.
Finally, this syntax is straightforwardly extensible should we choose to
support other vector types in the future.

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] New data type for vector search

2023-04-27 Thread Jonathan Ellis
It's been a while, so I may be missing something, but do we already have
fixed-size lists?  If not, I don't see why we'd try to make this fit into a
List-shaped problem.

A tuple would be a better fit from that perspective, but as you point out
it has the problem of allowing nulls.

The key thing about a vector is that unlike lists or tuples you really
don't care about individual elements, you care about doing vector and
matrix multiplications with the thing as a unit.  That's the key reason
that it makes more sense to me as a separate type.

(Maybe this is making the case for VECTOR FLOAT[N] rather than FLOAT
VECTOR[N].)


On Wed, Apr 26, 2023 at 4:31 PM Andrés de la Peña 
wrote:

> If we are going to use FLOAT[N] as sugar for another CQL data type, maybe
> tuples are more convenient than lists. So FLOAT[N] could be equivalent to
> TUPLE.
>
> Unlike collections, tuples have a fixed size, they are always
> frozen and I think they don't support random access. These properties seem
> desirable for vectors.
>
> Tuples, however, support null values, whereas collections don't. I mean,
> you can remove elements from a collection, but I think you are never going
> to see an explicit null in the collection. Tuples don't allow removing a
> value, but the entire tuple can be written with null values, as in INSERT
> INTO t (key, tuple) VALUES (0, (1, null, 3)).
>
> On Wed, 26 Apr 2023 at 21:53, Mick Semb Wever  wrote:
>
>> My inclination then would be to say you declare an ARRAY (which
>>> is semantic sugar for FROZEN>). This is very consistent with
>>> our existing style. We then simply permit such columns to define ANN
>>> indexes.
>>>
>>
>>
>> So long as nulls aren't a problem as David questions, an alternative is:
>>
>>  FLOAT[N] as semantic sugar for LIST
>>
>> And ANN requiring FROZEN
>>
>> Maybe taking a poll in a few days will be positive to keep this
>> moving forward.
>>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Jonathan Ellis
 semantic link between the frozen list and whatever we
> do here, it is fundamentally a frozen list with a restriction on its size.
> What we’re defining here are “statically” sized arrays, whereas a frozen
> list is essentially a dynamically sized array.
>
> I do not think vector is a good name because vector is used in some other
> popular languages to mean a (dynamic) list, which is confusing when we also
> have a list concept.
>
> I’m fine with just using the FLOAT[N] syntax, and drawing no direct link
> with list. Though it is a bit strange that this particular type declaration
> looks so different to other collection types.
>
> On 27 Apr 2023, at 16:48, Jeff Jirsa  wrote:
>
> 
>
>
> On Thu, Apr 27, 2023 at 7:39 AM Jonathan Ellis  wrote:
>
> It's been a while, so I may be missing something, but do we already have
> fixed-size lists?  If not, I don't see why we'd try to make this fit into a
> List-shaped problem.
>
>
> We do not. The proposal got closed as wont-fix
> https://issues.apache.org/jira/browse/CASSANDRA-9110
>
>
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Jonathan Ellis
I'm proposing a vector data type for ML use cases.  It's not the same thing
as an array or a list and it's not supposed to be.

While it's true that it would be possible to build a vector type on top of
an array type, it's not necessary to do it that way, and given the lack of
interest in an array type for its own sake I don't see why we would want to
make that a requirement.

It's relevant that pgvector, which among the systems offering vector search
is based on the most similar system to Cassandra in terms of its query
language, adds a vector data type that only supports floats *even though
postgresql already has an array data type* because the semantics are
different.  Random access doesn't make sense, string and collection and
other datatypes don't make sense, typical ordered indexes don't make sense,
etc.  It's just a different beast from arrays, for a different use case.

On Fri, Apr 28, 2023 at 10:40 AM Benedict  wrote:

> But you’re proposing introducing a general purpose type - this isn’t an ML
> plug-in, it’s modifying the core language in a manner that makes targeting
> your workload easier. Which is fine, but that means you have to consider
> its impact on the general language, not just your target use case.
>
> On 28 Apr 2023, at 16:29, Jonathan Ellis  wrote:
>
> 
> That's exactly right.
>
> In particular it makes no sense at all from an ML perspective to have
> vector types of anything other than numerics.  And as I mentioned in the
> POC thread (but I did not mention here), float is overwhelmingly the most
> frequently used vector type, to the point that Pinecone (by far the most
> popular vector search engine) ONLY supports that type.
>
> Lucene and Elastic also add support for vectors of bytes (8-bit ints),
> which are useful for optimizing models that you have already built with
> floats, but we have no reasonable path towards supporting indexing and
> searches against any other vector type.
>
> So in order of what makes sense to me:
>
> 1. Add a vector type for just floats; consider adding bytes later if
> demand materializes. This gives us 99% of the value and limits the scope so
> we can deliver quickly.
>
> 2. Add a vector type for floats or bytes. This gives us another 1% of
> value in exchange for an extra 20% or so of effort.
>
> 3. Add a vector type for all numeric primitives, but you can only index
> floats and bytes.  I think this is confusing to users and a bad idea.
>
> 4. Add a vector type that composes with all Cassandra types.  I can't see
> a reason to do this, nobody wants it, and we killed the most similar
> proposal in the past as wontfix.
>
> On Thu, Apr 27, 2023 at 7:49 PM Josh McKenzie 
> wrote:
>
>> From a machine learning perspective, vectors are a well-known concept
>> that are effectively immutable fixed-length n-dimensional values that are
>> then later used either as part of a model or in conjunction with a model
>> after the fact.
>>
>> While we could have this be non-frozen and not call it a vector, I'd be
>> inclined to still make the argument for a layer of syntactic sugar on top
>> that met ML users where they were with concepts they understood rather than
>> forcing them through the cognitive lift of figuring out the Cassandra
>> specific contortions to replicate something that's ubiquitous in their
>> space. We did the same "Cassandra-first" approach with our JSON support and
>> that didn't do us any favors in terms of adoption and usage as far as I
>> know.
>>
>> So is the goal here to provide something specific and idiomatic for the
>> ML community or is the goal to make a primitive that's C*-centric that then
>> another layer can write to? I personally argue for the former; I don't see
>> this specific data type going away any time soon.
>>
>> On Thu, Apr 27, 2023, at 12:39 PM, David Capwell wrote:
>>
>> but as you point out it has the problem of allowing nulls.
>>
>>
>> If nulls are not allowed for the elements, then either we need  a) a new
>> type, or b) add some way to say elements may not be null…. As much as I do
>> like b, I am leaning towards new type for this use case.
>>
>> So, to flesh out the type requirements I have seen so far
>>
>> 1) represents a fixed size array of element type
>> * on write path we will need to validate this
>> 2) element may not be null
>> * on write path we will need to validate this
>> 3) “frozen” (is this really a requirement for the type or is this
>> just simpler for the ANN work?  I feel that this shouldn’t be a requirement)
>> 4) works for all types (my requirement; origin

Re: [DISCUSS] New data type for vector search

2023-05-02 Thread Jonathan Ellis
versioning…
>
> Honestly I wanted to better understand the cost to be generic and the
> impact to ANN, so I took
> https://github.com/jbellis/cassandra/blob/vsearch/src/java/org/apache/cassandra/db/marshal/VectorType.java
> and made it handle every requirement I have listed so far (size, null, all
> types)… the current patch has several bugs at the type level that would
> need to be fixed, so had to fix those as well…. Total time to do this was
> 10 minutes… and this includes adding a method "public float[]
> composeAsFloats(ByteBuffer bytes)” which made the change to existing logic
> small (change VectorType.Serializer.instance.deserialize(buffer) to
> type.composeAsFloats(buffer))….
>
> Did this have any impact to the final ByteBuffer?  Nope, it had identical
> layout for the FloatType case, but works for all types…. I didn’t change
> the fact we store the size (felt this could be removed, but then we could
> never support expanding the vector in the future…)
>
> So, given the fact it takes a few minutes to implement all these
> requirements, I do find it very reasonable to push back and say we should
> make sure the new type is not leaking details from a special ANN index…. We
> have spent more time debating this than it takes to support… we also have
> fuzz testing on trunk so just updating
> org.apache.cassandra.utils.AbstractTypeGenerators to know about this new
> type means we get type coverage as well…
>
> I have zero issues helping to review this patch and make sure the testing
> is on-par with existing types (this is a strong requirement for me)
>
>
> > On May 1, 2023, at 10:40 AM, Mick Semb Wever  wrote:
> >
> >
> > > But suggesting that Jonathan should work on implementing general
> purpose arrays seems to fall outside the scope of this discussion, since
> the result of such work wouldn't even fill the need Jonathan is targeting
> for here.
> >
> > Every comment I have made so far I have argued that the v1 work doesn’t
> need to do some things, but that the limitations proposed so far are not
> real requirements; there is a big difference between what “could be
> allowed” and what is implemented day one… I am pushing back on what “could
> be allowed”, so far every justification has been that it slows down the ANN
> work…
> >
> > Simple examples of this already exists in C* (every example could be
> enhanced logically, we just have yet to put in the work)
> >
> > * updating an element of a list is only allowed for multi-cell
> > * appending to a list is only allowed for multi-cell
> > * etc.
> >
> > By saying that the type "shall not support", you actively block future
> work and future possibilities...
> >
> >
> >
> > I am coming around strongly to the `VECTOR FLOAT[n]` option.
> >
> > This gives Jonathan the simplest path right now with the ANN work, while
> also ensuring the CQL API gets the best future potential.
> >
> > With `VECTOR FLOAT[n]` the 'vector' is the ml sugar that means non-null
> and frozen, and that allows both today and in the future, as desired, for
> its implementation to be entirely different to `FLOAT[n]`.  This addresses
> a number of people's concerns that we meet ML's idioms head on.
> >
> > IMHO it feels like it will fit into the ideal future CQL , where all
> `primitive[N]` are implemented, and where we have VECTOR FLOAT[n] (and
> maybe VECTOR BYTE[n]). This will also permit in the future
> `FROZEN` if we wanted nulls in frozen arrays.
> >
> > I think it is totally reasonable that the ANN patch (and Jonathan) is
> not asked to implement on top of, or towards, other array (or other) new
> data types.
> >
> > I also think it is correct that we think about the evolution of CQL's
> API,  and how it might exist in the future when we have both ml vectors and
> general use arrays.
>
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


[POLL] Vector type for ML

2023-05-02 Thread Jonathan Ellis
Should we add a vector type to Cassandra designed to meet the needs of
machine learning use cases, specifically feature and embedding vectors for
training, inference, and vector search?

ML vectors are fixed-dimension (fixed-length) sequences of numeric types,
with no nulls allowed, and with no need for random access. The ML industry
overwhelmingly uses float32 vectors, to the point that the industry-leading
special-purpose vector database ONLY supports that data type.
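
To make the three properties above concrete (fixed dimension, no nulls, float32 elements), here is a minimal Python sketch of the checks a write path would need. The class name `FloatVector` and the byte layout (big-endian float32 values with no length prefix) are assumptions made for illustration only; this is not the Cassandra implementation or its serialization format.

```python
import struct


class FloatVector:
    """Hypothetical fixed-dimension float32 vector, handled as a single value."""

    def __init__(self, dimension, values):
        values = list(values)
        # Fixed dimension: reject anything that is not exactly `dimension` long.
        if len(values) != dimension:
            raise ValueError(f"expected {dimension} elements, got {len(values)}")
        # No nulls: every element must be present.
        if any(v is None for v in values):
            raise ValueError("vector elements may not be null")
        self.dimension = dimension
        # Round-trip through 4-byte floats so the stored values are float32.
        fmt = f">{dimension}f"
        self.values = struct.unpack(fmt, struct.pack(fmt, *values))

    def to_bytes(self):
        """Serialize as big-endian float32 values (an assumed layout)."""
        return struct.pack(f">{self.dimension}f", *self.values)


embedding = FloatVector(3, [0.25, -1.5, 2.0])
print(len(embedding.to_bytes()))  # 12
```

There is deliberately no accessor for individual elements in the sketch, mirroring the point that ML vectors have no need for random access.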

This poll is to gauge consensus subsequent to the recent discussion thread
at https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.

Please rank the discussed options from most preferred option to least,
e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
= A (C is my preference, followed by B or A approximately equally.)

(A) I am in favor of adding a vector type for floats; I do not believe we
need to tie it to any particular implementation details.

(B) I am okay with adding a vector type but I believe we must add array
types that compose with all Cassandra types first, and make vectors a
special case of arrays-without-null-elements.

(C) I am not in favor of adding a built-in vector type.

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [POLL] Vector type for ML

2023-05-02 Thread Jonathan Ellis
My preference: A > B > C.  Vectors are distinct enough from arrays that we
should not make adding the latter a prerequisite for adding the former.

On Tue, May 2, 2023 at 10:13 AM Jonathan Ellis  wrote:

> Should we add a vector type to Cassandra designed to meet the needs of
> machine learning use cases, specifically feature and embedding vectors for
> training, inference, and vector search?
>
> ML vectors are fixed-dimension (fixed-length) sequences of numeric types,
> with no nulls allowed, and with no need for random access. The ML industry
> overwhelmingly uses float32 vectors, to the point that the industry-leading
> special-purpose vector database ONLY supports that data type.
>
> This poll is to gauge consensus subsequent to the recent discussion thread
> at https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>
> Please rank the discussed options from most preferred option to least,
> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
> = A (C is my preference, followed by B or A approximately equally.)
>
> (A) I am in favor of adding a vector type for floats; I do not believe we
> need to tie it to any particular implementation details.
>
> (B) I am okay with adding a vector type but I believe we must add array
> types that compose with all Cassandra types first, and make vectors a
> special case of arrays-without-null-elements.
>
> (C) I am not in favor of adding a built-in vector type.
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [POLL] Vector type for ML

2023-05-02 Thread Jonathan Ellis
>>
>> 
>> A > B > C on both polls.
>>
>> Having talked to several users in the community that are highly excited
>> about this change, this gets to what developers want to do at Cassandra
>> scale: store embeddings and retrieve them.
>>
>> On Tue, May 2, 2023 at 11:47 AM Andrés de la Peña 
>> wrote:
>>
>>> A > B > C
>>>
>>> I don't think that ML is such a niche application that it can't have its
>>> own CQL data type. Also, vectors are mathematical elements that have more
>>> applications than ML.
>>>
>>> On Tue, 2 May 2023 at 19:15, Mick Semb Wever  wrote:
>>>
>>>>
>>>>
>>>> On Tue, 2 May 2023 at 17:14, Jonathan Ellis  wrote:
>>>>
>>>>> Should we add a vector type to Cassandra designed to meet the needs of
>>>>> machine learning use cases, specifically feature and embedding vectors for
>>>>> training, inference, and vector search?
>>>>>
>>>>> ML vectors are fixed-dimension (fixed-length) sequences of numeric
>>>>> types, with no nulls allowed, and with no need for random access. The ML
>>>>> industry overwhelmingly uses float32 vectors, to the point that the
>>>>> industry-leading special-purpose vector database ONLY supports that data
>>>>> type.
>>>>>
>>>>> This poll is to gauge consensus subsequent to the recent discussion
>>>>> thread at
>>>>> https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>>>>>
>>>>> Please rank the discussed options from most preferred option to least,
>>>>> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > 
>>>>> B
>>>>> = A (C is my preference, followed by B or A approximately equally.)
>>>>>
>>>>> (A) I am in favor of adding a vector type for floats; I do not believe
>>>>> we need to tie it to any particular implementation details.
>>>>>
>>>>> (B) I am okay with adding a vector type but I believe we must add
>>>>> array types that compose with all Cassandra types first, and make vectors 
>>>>> a
>>>>> special case of arrays-without-null-elements.
>>>>>
>>>>> (C) I am not in favor of adding a built-in vector type.
>>>>>
>>>>
>>>>
>>>>
>>>> A  > B > C
>>>>
>>>> B is stated as "must add array types…".  I think this is a bit loaded.
>>>> If B was the (A + the implementation needs to be a non-null frozen float32
>>>> array, serialisation forward compatible with other frozen arrays later
>>>> implemented) I would put this before (A).  Especially because it's been
>>>> shown already this is easy to implement.
>>>>
>>>>
>>>>
>>>
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [POLL] Vector type for ML

2023-05-05 Thread Jonathan Ellis
r not, I think having the VECTOR keyword helps signify what
>> the app is generally about and helps get buy-in from ML stakeholders.
>>
>> On Thu, May 4, 2023 at 3:45 AM Benedict  wrote:
>>
>>
>> Hurrah for initial agreement.
>>
>> For syntax, I think one option was just FLOAT[N]. In VECTOR FLOAT[N],
>> VECTOR is redundant - FLOAT[N] is fully descriptive by itself. I don’t
>> think VECTOR should be used to simply imply non-null, as this would be very
>> unintuitive. More logical would be NONNULL, if this is the only condition
>> being applied. Alternatively for arrays we could default to NONNULL and
>> later introduce NULLABLE if we want to permit nulls.
>>
>> If the word vector is to be used it makes more sense to make it look like
>> a list, so VECTOR as here the word VECTOR is clearly not
>> redundant.
>>
>> So, I vote:
>>
>> 1) (NON NULL) FLOAT[N]
>> 2) FLOAT[N]   (Non null by default)
>> 3) VECTOR
>>
>>
>>
>> On 4 May 2023, at 08:52, Mick Semb Wever  wrote:
>>
>> 
>>
>>
>> Did we agree on a CQL syntax?
>>
>> I don’t believe there has been a poll on CQL syntax… my understanding
>> reading all the threads is that there are ~4-5 options and none are -1ed, so
>> I believe we are waiting for majority rule on this?
>>
>>
>>
>>
>> Re-reading that thread, IIUC the valid choices remaining are…
>>
>> 1. VECTOR FLOAT[n]
>> 2. FLOAT VECTOR[n]
>> 3. VECTOR
>> 4. VECTOR[n]
>> 5. ARRAY
>> 6. NON-NULL FROZEN
>>
>>
>> Yes I'm putting my preference (1) first ;) because (banging on) if the
>> future of CQL will have FLOAT[n] and FROZEN, where the VECTOR
>> keyword is: for general cql users; just meaning "non-null and frozen",
>> these gel best together.
>>
>> Options (5) and (6) are for those that feel we can and should provide
>> this type without introducing the vector keyword.
>>
>>
>>
>>
>>
>> --
>> *Mike Adamson*
>> Engineering
>> +1 650 389 6000 <16503896000> | datastax.com <https://www.datastax.com/>
>>
>>
>>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [POLL] Vector type for ML

2023-05-05 Thread Jonathan Ellis
Sparse vector in ML has the semantics that elements not explicitly set are
zero.  I believe most (all?) sparse vector implementations use a map under
the hood; the point is to save a lot of space when you have 10K zeros and
100 that are nonzero.
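
As a rough illustration of the map-under-the-hood representation, here is a minimal Python sketch. The function names are made up for this example, and it reuses the `{0: 42, 3: 17}` value from the quoted message below; the key difference from an array with nulls is that unset elements read as zero.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: value} dicts."""
    # Iterate over the smaller dict; an index missing from either side
    # contributes zero to the sum.
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b.get(index, 0.0) for index, value in a.items())


def to_dense(sparse, dimension):
    """Expand a sparse vector to a dense list, filling unset elements with zero."""
    return [sparse.get(i, 0.0) for i in range(dimension)]


u = {0: 42.0, 3: 17.0}
v = {3: 2.0, 7: 5.0}

print(sparse_dot(u, v))  # 34.0
print(to_dense(u, 4))    # [42.0, 0.0, 0.0, 17.0]
```

Only the nonzero entries are stored, so a vector with 100 nonzero values out of 10K costs 100 map entries rather than 10K array slots.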

On Fri, May 5, 2023 at 2:00 PM David Capwell  wrote:

> If we ever add sparse vectors, we can assume that DENSE is the default and
> allow to use either DENSE, SPARSE or nothing.
>
>
> I have been feeling that sparse is just a fixed size list with nulls… so
> array… if you insert {0: 42, 3: 17} then you get an array
> of [42, null, null, 17]?  One negative of doing this is that any
> operator/function that needs to reify large vectors (let's say 10k
> elements) uses a ton of memory due to us making it an array… so a new type
> could be used to lower this cost…
>
> With DENSE VECTOR we have the syntax in place that we “could” add SPARSE
> later… With VECTOR we will have complications adding a sparse vector after
> the fact due to this implying DENSE…
>
> Updated ranking
>
> Syntax                Score
> VECTOR                21
> DENSE VECTOR          12
> type[dimension]       10
> NON NULL [dimension]  8
> VECTOR type[n]        5
> DENSE_VECTOR          4
> NON-NULL FROZEN       3
> ARRAY                 1
>
> Syntax                Round 1  Round 2
> VECTOR                4        4
> DENSE VECTOR          2        3
> NON NULL [dimension]  2        1
> VECTOR type[n]        1
> type[dimension]       1
> DENSE_VECTOR          1
> NON-NULL FROZEN       1
> ARRAY                 0
>
> VECTOR is still in the lead…
>
> On May 5, 2023, at 11:40 AM, Andrés de la Peña 
> wrote:
>
> My vote is:
>
> 1. VECTOR
> 2. DENSE VECTOR
> 3. type[dimension]
>
> If we ever add sparse vectors, we can assume that DENSE is the default and
> allow to use either DENSE, SPARSE or nothing.
>
> Perhaps the dimension could be separated from the type, such as in
> VECTOR[dimension] or VECTOR(dimension).
>
> On Fri, 5 May 2023 at 19:05, David Capwell  wrote:
>
>> ...where, just to be clear, VECTOR means a frozen fixed
>>> size array w/ no null values?
>>>
>> Assuming this is the case
>>
>>
>> The current agreed requirements are:
>>
>> 1) non-null elements
>> 2) fixed length
>> 3) frozen
>>
>> You pointed out 3 isn’t actually required, but that would be a different
>> conversation to remove =)… maybe defer this to JIRA as long as all parties
>> agree in the ticket?
>>
>> With all votes in, this is what I see
>>
>> Syntax                Jonathan   David      Josh       Caleb      Patrick    Brandon    Mike       Benedict   Mick Semb  Derek
>>                       Ellis      Capwell    McKenzie   Rackliffe  McFadin    Williams   Adamson               Wever      Chen-Becker
>> VECTOR                1          2          2          -          2          1          1          3          2          -
>> DENSE VECTOR          2          1          -          -          1          -          2          -          -          -
>> type[dimension]       3          3          3          1          -          3          -          2          -          -
>> DENSE_VECTOR          -          -          1          -          -          -          -          -          -          3
>> NON NULL [dimension]  -          1          -          -          -          -          -          1          -          2
>> VECTOR type[n]        -          -          -          -          -          2          -          -          1          -
>> ARRAY                 -          -          -          -          3          -          -          -          -          -
>> NON-NULL FROZEN       -          -          -          -          -          -          -          -          -          1
>>
>> Rank  Weight
>> 1     3
>> 2     2
>> 3     1
>> ?     3
>>
>> Syntax                Score
>> VECTOR                18
>> DENSE VECTOR          10
>> type[dimension]       9
>> NON NULL [dimension]  8
>> VECTOR type[n]        5
>> DENSE_VECTOR          4
>> NON-NULL FROZEN       3
>> ARRAY                 1
>>
>>
>> Syntax                Round 1  Round 2
>> VECTOR                3        4
>> DENSE VECTOR          2        2
>> NON NULL [dimension]  2        1
>> VECTOR type[n]        1
>> type[dimension]       1
>> DENSE_VECTOR          1
>> NON-NULL FROZEN       1
>> ARRAY                 0
>>
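
The score columns in the quoted tallies can be reproduced from the ballots and the Rank/Weight table. The Python sketch below is a reader's reconstruction, not something posted on the list: the ballots are as read from the flattened vote table (so the column assignment is an assumption), the "?" rank is left out because no ballot in the table uses it, and the names `WEIGHTS`, `BALLOTS`, and `tally` are made up for this example.

```python
from collections import Counter

# Weight per rank, from the Rank/Weight table in the quoted message.
WEIGHTS = {1: 3, 2: 2, 3: 1}

# Ballots as read from the quoted vote table: syntax -> rank given by that voter.
BALLOTS = {
    "Jonathan Ellis": {"VECTOR": 1, "DENSE VECTOR": 2, "type[dimension]": 3},
    "David Capwell": {"DENSE VECTOR": 1, "NON NULL [dimension]": 1,
                      "VECTOR": 2, "type[dimension]": 3},
    "Josh McKenzie": {"DENSE_VECTOR": 1, "VECTOR": 2, "type[dimension]": 3},
    "Caleb Rackliffe": {"type[dimension]": 1},
    "Patrick McFadin": {"DENSE VECTOR": 1, "VECTOR": 2, "ARRAY": 3},
    "Brandon Williams": {"VECTOR": 1, "VECTOR type[n]": 2, "type[dimension]": 3},
    "Mike Adamson": {"VECTOR": 1, "DENSE VECTOR": 2},
    "Benedict": {"NON NULL [dimension]": 1, "type[dimension]": 2, "VECTOR": 3},
    "Mick Semb Wever": {"VECTOR type[n]": 1, "VECTOR": 2},
    "Derek Chen-Becker": {"NON-NULL FROZEN": 1, "NON NULL [dimension]": 2,
                          "DENSE_VECTOR": 3},
}


def tally(ballots):
    """Sum the rank weights each syntax received across all ballots."""
    scores = Counter()
    for ranking in ballots.values():
        for syntax, rank in ranking.items():
            scores[syntax] += WEIGHTS[rank]
    return scores


scores = tally(BALLOTS)
print(scores["VECTOR"], scores["DENSE VECTOR"], scores["type[dimension]"])  # 18 10 9

# Adding the later vote (VECTOR, DENSE VECTOR, type[dimension]) gives the updated ranking.
updated = dict(BALLOTS)
updated["Andrés de la Peña"] = {"VECTOR": 1, "DENSE VECTOR": 2, "type[dimension]": 3}
updated_scores = tally(updated)
print(updated_scores["VECTOR"], updated_scores["DENSE VECTOR"],
      updated_scores["type[dimension]"])  # 21 12 10
```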

CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-09 Thread Jonathan Ellis
Hi all,

Following the recent discussion threads, I would like to propose CEP-30 to
add Approximate Nearest Neighbor (ANN) Vector Search via Storage-Attached
Indexes (SAI) to Apache Cassandra.

The primary goal of this proposal is to implement ANN vector search
capabilities, making Cassandra more useful to AI developers and
organizations managing large datasets that can benefit from fast similarity
search.

The implementation will leverage Lucene's Hierarchical Navigable Small
World (HNSW) library and introduce a new CQL data type for vector
embeddings, a new SAI index for ANN search functionality, and a new CQL
operator for performing ANN search queries.

We are targeting the 5.0 release for this feature, in conjunction with the
release of SAI. The proposed changes will maintain compatibility with
existing Cassandra functionality and compose well with the already-approved
SAI features.
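For readers new to HNSW, the core idea of searching a neighbor graph can be sketched in a few lines. This is not Lucene's implementation: it is a simplified, single-layer illustration (no hierarchy, no candidate list), the graph is built by brute force, and the names build_neighbor_graph and greedy_search are made up for this example.

```python
import math


def build_neighbor_graph(points, m):
    """Link each point to its m nearest other points (brute force, for illustration)."""
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:m]
    return graph


def greedy_search(points, graph, query, entry=0):
    """Walk the graph, always moving to the neighbor closest to the query.

    Stops when no neighbor is closer than the current node. The result is
    approximate: the walk can stop at a local minimum.
    """
    current = entry
    best = math.dist(query, points[current])
    while True:
        candidate = min(graph[current], key=lambda j: math.dist(query, points[j]))
        distance = math.dist(query, points[candidate])
        if distance >= best:
            return current
        current, best = candidate, distance


if __name__ == "__main__":
    # Ten made-up points on a line; each keeps only its 2 nearest neighbors.
    points = [(float(i), 0.0) for i in range(10)]
    graph = build_neighbor_graph(points, 2)
    print(greedy_search(points, graph, (7.2, 0.0)))
```

Because each node keeps only a few links, a search touches a small part of the data set rather than comparing the query against every vector.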

Please find the full CEP document here:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-09 Thread Jonathan Ellis
+1

On Tue, May 9, 2023 at 12:04 PM Piotr Kołaczkowski 
wrote:

> Let's vote.
>
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator
>
> Piotr Kołaczkowski
> e. pkola...@datastax.com
> w. www.datastax.com
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread Jonathan Ellis
+1 for this, especially in the long term.  CREATE INDEX should do the right
thing for most people without requiring extra ceremony.

On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan 
wrote:

> If the consensus is that SAI is the right default index, then we should
> just change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM INDEX.
>
>
> On May 9, 2023, at 4:44 PM, Caleb Rackliffe 
> wrote:
>
> Earlier today, Mick started a thread on the future of our index creation
> DDL on Slack:
>
> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
>
> At the moment, there are two ways to create a secondary index.
>
> *1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)*
>
> This creates an optionally named legacy 2i on the provided table and
> column.
>
> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>
> *2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
> USING <class> [WITH OPTIONS = <map>]*
>
> This creates a secondary index on the provided table and column using the
> specified 2i implementation class and (optional) parameters.
>
> ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING
> 'StorageAttachedIndex'
>
> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is
> shorthand for the fully-qualified class name, which is also valid.)
>
> So what is there to discuss?
>
> The concern Mick raised is...
>
> "...just folk continuing to use CREATE INDEX  because they think CREATE
> CUSTOM INDEX is advanced (or just don't know of it), and we leave users
> doing 2i (when they think they are, and/or we definitely want them to be,
> using SAI)"
>
> To paraphrase, we want people to use SAI once it's available where
> possible, and the default behavior of CREATE INDEX could be at odds w/
> that.
>
> The proposal we seem to have landed on is something like the following:
>
> For 5.0:
>
> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
>
> (Note: How this would interact w/ the existing secondary_indexes_enabled
> YAML options isn't clear yet.)
>
> Post-5.0:
>
> 1.) Deprecate and eventually remove SASI when SAI hits full feature parity
> w/ it.
> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a
> hybrid between the two. For example, CREATE INDEX...USING...WITH. This
> would both be flexible enough to accommodate index implementation selection
> and prescriptive enough to force the user to make a decision (and wouldn't
> change the legacy behavior of the existing CREATE INDEX). In this world,
> creating a legacy 2i might look something like CREATE INDEX...USING
> `legacy`.
> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
>
> Eventually we would have a single enabled DDL statement for index creation
> that would be minimal but also explicit/able to handle some evolution.
>
> What does everyone think?
>
>
>
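To make the two existing DDL forms concrete, here is a minimal sketch that renders them as strings. The helper names are hypothetical, the code does no identifier validation or escaping, and it only covers the two statements quoted above, not the post-5.0 syntax under discussion.

```python
def create_index(table, column, name=None, if_not_exists=False):
    """Render form 1: CREATE INDEX, which creates a legacy 2i today."""
    parts = ["CREATE INDEX"]
    if if_not_exists:
        parts.append("IF NOT EXISTS")
    if name:
        parts.append(name)
    parts.append(f"ON {table}({column})")
    return " ".join(parts)


def create_custom_index(table, column, using, name=None,
                        if_not_exists=False, options=None):
    """Render form 2: CREATE CUSTOM INDEX ... USING ... [WITH OPTIONS = ...]."""
    parts = ["CREATE CUSTOM INDEX"]
    if if_not_exists:
        parts.append("IF NOT EXISTS")
    if name:
        parts.append(name)
    parts.append(f"ON {table}({column})")
    parts.append(f"USING '{using}'")
    if options:
        rendered = ", ".join(f"'{k}': '{v}'" for k, v in options.items())
        parts.append(f"WITH OPTIONS = {{{rendered}}}")
    return " ".join(parts)


if __name__ == "__main__":
    print(create_index("ks.tbl", "my_text_col", name="my_index"))
    print(create_custom_index("ks.tbl", "my_text_col", "StorageAttachedIndex",
                              name="my_index"))
```

The only difference a user sees between a legacy 2i and SAI here is the CUSTOM keyword plus the USING clause, which is the gap the proposal is trying to close.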

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread Jonathan Ellis
On Fri, May 12, 2023 at 1:39 PM Caleb Rackliffe 
wrote:

> [POLL] Centralize existing syntax or create new syntax?
>

1 (Existing)

> [POLL] Should there be a default? (YES/NO)
>

YES


> [POLL] What to do with the default?
>

1 (Default SAI)


Vector search demo, and query syntax

2023-05-22 Thread Jonathan Ellis
Hi all,

I have a branch of vector search based on cep-7-sai at
https://github.com/datastax/cassandra/tree/cep-vsearch.  Compared to the
original POC branch, this one is based on the SAI code that will be
mainline soon, and handles distributed scatter/gather.  Updates and deletes
to vector values are still not supported.

I also put together a demo that uses this branch to provide context to
OpenAI’s GPT, available here: https://github.com/jbellis/cassgpt.  Here is
the query that gets executed:

SELECT id, start, end, text
FROM {self.keyspace}.{self.table}
WHERE embedding ANN OF %s
LIMIT %s

The more I used the proposed `ANN OF` syntax, the less I liked it.  This is
because we don’t want an actual boolean predicate; we just want to order
results.  Put another way, `ANN OF` will include all rows of the table
given a high enough `LIMIT`, and that makes it a bad fit for expression
processing that expects to be able to filter out rows before it starts
LIMIT-ing.  And in fact the code to support executing the query looks
suspiciously like what you’d want for `ORDER BY`.

I propose that we adopt `ORDER BY` syntax, supporting it for vector indexes
first and eventually for all SAI indexes.  So this query would become

SELECT id, start, end, text
FROM {self.keyspace}.{self.table}
ORDER BY embedding ANN OF %s
LIMIT %s

And it would compose with other SAI indexes with syntax like

SELECT id, start, end, text
FROM {self.keyspace}.{self.table}
WHERE publish_date > %s
ORDER BY embedding ANN OF %s
LIMIT %s

Related work:

This is similar to the approach used by pgvector, except they invented the
symbolic operator `<->` that has the same semantics as `ANN OF`.  I am okay
with adopting their operator, but I think ANN OF is more readable.
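The argument that `ANN OF` orders rather than filters can be illustrated with a small in-memory model. This sketch uses exact cosine similarity over made-up rows instead of an approximate index, and the function names are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def order_by_ann(rows, query, limit, where=None):
    """Filter with an optional predicate, then order by similarity, then limit."""
    candidates = [row for row in rows if where is None or where(row)]
    candidates.sort(key=lambda row: cosine_similarity(row["embedding"], query),
                    reverse=True)
    return candidates[:limit]


# Made-up sample rows.
ROWS = [
    {"id": 1, "publish_date": 2020, "embedding": [1.0, 0.0]},
    {"id": 2, "publish_date": 2022, "embedding": [0.0, 1.0]},
    {"id": 3, "publish_date": 2023, "embedding": [1.0, 1.0]},
]

if __name__ == "__main__":
    query = [1.0, 0.1]
    # A small LIMIT returns the closest rows.
    print([row["id"] for row in order_by_ann(ROWS, query, 2)])
    # A LIMIT at least as large as the table returns every row: nothing is filtered.
    print([row["id"] for row in order_by_ann(ROWS, query, 10)])
    # A real predicate removes rows before ordering.
    print([row["id"] for row in order_by_ann(
        ROWS, query, 10, where=lambda row: row["publish_date"] > 2021)])
```

Only the `where` argument ever removes a row; the similarity term just decides the order, which is the behaviour `ORDER BY` already describes.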
-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Vector search demo, and query syntax

2023-05-23 Thread Jonathan Ellis
Yes, that's totally reasonable syntactically, but I'd prefer not to open
the can of worms of ordering by some functions but not others (and I
definitely don't want to try to tackle ordering by all functions).  "You
can order by expressions involving SAI columns" is a pretty easy rule to
explain.

On Tue, May 23, 2023 at 12:53 PM David Capwell  wrote:

> I am ok with the syntax, but wondering if a function may be better than a
> CQL change?
>
> SELECT id, start, end, text
> FROM {self.keyspace}.{self.table}
> ORDER BY ANN(embedding, ?)
> LIMIT ?
>
> Not really a common syntax, but could be useful down the line
>
> On May 23, 2023, at 12:37 AM, Mick Semb Wever  wrote:
>
>
>> I propose that we adopt `ORDER BY` syntax, supporting it for vector
>> indexes first and eventually for all SAI indexes.  So this query would
>> become
>>
>> SELECT id, start, end, text
>> FROM {self.keyspace}.{self.table}
>> ORDER BY embedding ANN OF %s
>> LIMIT %s
>>
>
>
> LGTM.
>
> I first stumbled a bit with "there's no where clause and no filtering
> allowed…"
>
> But I doubt that reaction from any experienced cql user will last more
> than a moment.
>
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Jonathan Ellis
Hi all,

We've been using agrona for almost a year and it's a huge improvement over
boxing everything.  But it's limited to single thread use cases.

Fastutil is an alternative that has a concurrent wrapper:
https://github.com/vigna/fastutil
https://github.com/trivago/fastutil-concurrent-wrapper

Both fastutil and the concurrent wrapper are actively maintained.  The
authors of the wrapper say they evaluated fastutil vs agrona and built
their wrapper for fastutil because it's faster at reads and writes:
https://tech.trivago.com/post/2022-03-09-why-and-how-we-use-primitive-maps

Any objections to adding the concurrent wrapper and switching out agrona
for fastutil?

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Jonathan Ellis
Try it out and see, the only data point I have is that the company who has
spent more effort here than anyone else I could find likes fastutil better.

On Thu, May 25, 2023 at 10:33 AM Dinesh Joshi  wrote:

> > On May 25, 2023, at 6:14 AM, Jonathan Ellis  wrote:
> >
> > Any objections to adding the concurrent wrapper and switching out agrona
> for fastutil?
>
> How does fastutil compare to agrona in terms of memory profile and runtime
> performance? How invasive would it be to switch?



-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


[VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Jonathan Ellis
Let's make this official.

CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes

POC that demonstrates all the big rocks, including distributed queries:
https://github.com/datastax/cassandra/tree/cep-vsearch

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Jonathan Ellis
There's about a dozen uses of agrona so far, plus a few more in test code,
almost all of which are SAI.  Porting over won't be hard.

On Thu, May 25, 2023 at 10:33 AM Dinesh Joshi  wrote:

> > On May 25, 2023, at 6:14 AM, Jonathan Ellis  wrote:
> >
> > Any objections to adding the concurrent wrapper and switching out agrona
> for fastutil?
>
> How does fastutil compare to agrona in terms of memory profile and runtime
> performance? How invasive would it be to switch?



-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Jonathan Ellis
Let's not fall prey to status quo bias, nobody performed an exhaustive
analysis of agrona in November.  If Branimir had proposed fastutils at the
time that's what we'd be using today.



On Thu, May 25, 2023 at 10:50 AM Benedict  wrote:

> Given they provide no data or explanation, and that benchmarking is hard,
> I’m not inclined to give much weight to their analysis.
>
> Agrona was favoured in large part due to the perceived quality of the
> library. I’m not inclined to swap it out without proper evidence the
> fastutils is both materially faster in a manner we care about and of similar
> quality.
>
> On 25 May 2023, at 16:43, Jonathan Ellis  wrote:
>
>
> Try it out and see, the only data point I have is that the company who has
> spent more effort here than anyone else I could find likes fastutil better.
>
> On Thu, May 25, 2023 at 10:33 AM Dinesh Joshi  wrote:
>
>> > On May 25, 2023, at 6:14 AM, Jonathan Ellis  wrote:
>> >
>> > Any objections to adding the concurrent wrapper and switching out
>> agrona for fastutil?
>>
>> How does fastutil compare to agrona in terms of memory profile and
>> runtime performance? How invasive would it be to switch?
>
>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Jonathan Ellis
Fair enough.  Yes, my thought was if we're going to use fastutils
concurrent we might as well use them for single threaded use cases rather
than having both floating around, but, if we're in love with Agrona buffers
I'm fine with both.

On Thu, May 25, 2023 at 11:29 AM David Capwell  wrote:

> Agrona isn’t going anywhere due to the library being more than basic
> collections.
>
> Now, with regard to single-threaded collections… honestly I dislike Agrona
> as I always fight to avoid boxing; carrot was far better with this regard….
> Didn’t look at the fastutil versions to see if they are better here, but I
> do know I am personally not happy with Agrona primitive collections…
>
> I do believe the main motivator for this is that fastutil has a concurrent
> version of their collections, so you gain access to concurrent primitive
> collections; something we do not have today… Given the desire for
> concurrent primitive collections, I am cool with it.
>
> I’m not inclined to swap it out
>
>
> When it came to random testing libraries, I believe the stance taken
> before was that we should allow multiple versions and the best one will win
> eventually… so I am cool having the same stance for primitive collections...
>
> On May 25, 2023, at 8:50 AM, Benedict  wrote:
>
> Given they provide no data or explanation, and that benchmarking is hard,
> I’m not inclined to give much weight to their analysis.
>
> Agrona was favoured in large part due to the perceived quality of the
> library. I’m not inclined to swap it out without proper evidence the
> fastutils is both materially faster in a manner we care about and of similar
> quality.
>
> On 25 May 2023, at 16:43, Jonathan Ellis  wrote:
>
>
> Try it out and see, the only data point I have is that the company who has
> spent more effort here than anyone else I could find likes fastutil better.
>
> On Thu, May 25, 2023 at 10:33 AM Dinesh Joshi  wrote:
>
>> > On May 25, 2023, at 6:14 AM, Jonathan Ellis  wrote:
>> >
>> > Any objections to adding the concurrent wrapper and switching out
>> agrona for fastutil?
>>
>> How does fastutil compare to agrona in terms of memory profile and
>> runtime performance? How invasive would it be to switch?
>
>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-30 Thread Jonathan Ellis
Thanks, all.  Closing the vote as accepted with 8 binding +1 (including me)
and 11 non-binding votes.

On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis  wrote:

> Let's make this official.
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-30 Thread Jonathan Ellis
Thanks to Benjamin for pointing out to me that committer votes count as
binding for CEPs.

That makes the updated tally 15 binding and 4 non-binding.

On Tue, May 30, 2023 at 8:44 AM Jonathan Ellis  wrote:

> Thanks, all.  Closing the vote as accepted with 8 binding +1 (including
> me) and 11 non-binding votes.
>
> On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis  wrote:
>
>> Let's make this official.
>>
>> CEP:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>>
>> POC that demonstrates all the big rocks, including distributed queries:
>> https://github.com/datastax/cassandra/tree/cep-vsearch
>>
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
>>
>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Is simplenative in cassandra-stress still relevant?

2023-06-09 Thread Jonathan Ellis
>
> On Tue, May 30, 2023 at 7:15 PM Brad <bscho...@gmail.com> wrote:
> > If you're performing stress testing, why would you not want to use the
> official driver?  I've spoken to several people who all have said they've
> never used simplenative mode.
>
> I agree that it shouldn't be used normally, but I'm not sure we should
> remove it, because we can't remove it fully: SimpleClient is still
> used in many tests, and I think that should continue.
>
> If you suspect any kind of native proto or driver issue it may be
> useful to have another implementation easily accessible to aid in
> debugging the problem, and the maintenance cost of keeping it in
> stress is roughly zero in my opinion.  We can make it clear that it's
> not recommended for use and is intended only as a debugging tool,
> though.
>
> Kind Regards,
> Brandon
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] CEP-30 ANN Vector Search

2023-06-16 Thread Jonathan Ellis
Correct.  They will be ordered closest-first.

Unfortunately it's not possible for the near or medium future to do
farthest-first.  HNSW index gets to log(n) time by only keeping a subset of
the closest neighbors for each vector.  So you'd need a separate index with
an inverse-cosine similarity metric, and it's not possible today to use a
custom metric function.

(This has been GA for over a year in Elastic and Solr and so far nobody has
needed farthest-first badly enough to add this as an option to the
underlying Lucene library.)

You can get the distances back today, like this:

SELECT my_text, similarity_cosine(my_embedding, ?)
FROM my_table
ORDER BY my_embedding ANN OF ? LIMIT 2

Then just pass the query vector into both bind variables.
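A minimal sketch of "pass the query vector into both bind variables" follows. The query text is the one from the message; the helper name bind_params is made up, and the commented-out driver calls assume an already connected session from the DataStax Python driver running against a build that includes this feature.

```python
QUERY = """
SELECT my_text, similarity_cosine(my_embedding, ?)
FROM my_table
ORDER BY my_embedding ANN OF ? LIMIT 2
"""


def bind_params(query_vector):
    """Return the positional parameters: the same vector for both markers."""
    vector = list(query_vector)
    return (vector, vector)


if __name__ == "__main__":
    params = bind_params([3.4, 7.8, 9.1])
    # One parameter per "?" marker in the statement.
    assert len(params) == QUERY.count("?")
    # With a live cluster this would be used roughly as follows (assumption):
    #   prepared = session.prepare(QUERY)
    #   rows = session.execute(prepared, params)
    print(params)
```

The first marker feeds the similarity function in the select list and the second feeds the ordering, so the returned score describes the same comparison that produced the order.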

On Fri, Jun 16, 2023 at 7:09 AM Andrew Cobley (Staff) <
a.e.cob...@dundee.ac.uk> wrote:

> Hi,
>
> I’ve got a question and a request about this CEP
>
> In the example:
>
> SELECT * FROM test.foo WHERE j ANN OF [3.4, 7.8, 9.1] limit 1;
>
>
> I presume that limit n will return the n nearest neighbours?
>
> If that’s the case what order will they be in? Is it possible to reverse
> the order?
>
> Secondly, would it be possible to return the calculated distances?  This
> might be particularly important if there are n returned neighbours.
>
> Andy
> --
> *From:* Patrick McFadin 
> *Sent:* 15 June 2023 01:03
> *To:* dev@cassandra.apache.org 
> *Subject:* Re: [VOTE] CEP-30 ANN Vector Search
>
>
>
>
> CAUTION: This email originated from outside the University of Dundee. Do
> not click links or open attachments unless you recognise the sender's email
> address and know the content is safe.
> Andy,
>
> Good to see you on the ML again! CEP-30 is slated for release with 5.0
> later in the year. Until then, you'll need to do a local build or try it
> out in a preview in Astra. A few of us have been talking about creating a
> preview docker image since there is some interest in having it run in
> k8ssandra. In any case, this is very alpha code and should be treated as
> such. Reporting errors or unusual results would be greatly appreciated!
>
> Patrick
>
>
>
> On Wed, Jun 14, 2023 at 7:10 AM Andrew Cobley (Staff) <
> a.e.cob...@dundee.ac.uk> wrote:
>
> Hi All,
>
>
>
> Great news this has gone through. I'm wondering if we have a timescale for
> this making it to Beta or release?  I’m asking because we have a project
> that would benefit from this approach.
>
>
>
> Andy
>
>
>
>
>
> *From: *Jonathan Ellis 
> *Date: *Tuesday, 30 May 2023 at 14:44
> *To: *dev 
> *Subject: *Re: [VOTE] CEP-30 ANN Vector Search
>
>
>
>
> Thanks, all.  Closing the vote as accepted with 8 binding +1 (including
> me) and 11 non-binding votes.
>
>
>
> On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis  wrote:
>
> Let's make this official.
>
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
>
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
>
> --
>
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>
>
> --
>
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
> The University of Dundee is a registered Scottish Charity, No: SC015096
>
>
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Tokenization and SAI query syntax

2023-07-24 Thread Jonathan Ellis
Hi all,

With phase 1 of SAI wrapping up, I’d like to start the ball rolling on
aligning around phase 2 features.

In particular, we need to nail down the syntax for doing non-exact string
matches.  We have a proof of concept that includes full Lucene analyzer and
filter functionality – just the text transformation pieces, none of the
storage parts – which is the gold standard in this space.  For example, the
StandardAnalyzer [1] lowercases all terms and removes stopwords (common
words like “a”, “is”, “the” that are usually not useful to search
against).  Lucene also has classes that offer stemming, special case
handling for email, and many languages besides English [2].

What syntax should we use to express “rows whose analyzed tokens match this
search term?”

The syntax must be clear that we want to look for this term within the
column data using the configured index with corresponding query-time
tokenization and analysis.  This means that the query term is not always a
substring of the original string!  Besides obvious transformations like
lowercasing, you have things like PhoneticFilter available as well.

Here are my thoughts on some of the options:

`column = term`.  This is what the POC does today and it’s super confusing
to overload = to mean something other than exact equality.  I am not a fan.

`column LIKE term` or `column LIKE %term%`. The closest SQL operator, but
neither the wildcarded nor unwildcarded syntax matches the semantics of
term-based search.

`column MATCHES term`. I rather like this one, although Mike points out
that “match” has a meaning in the context of regular expressions that could
cause confusion here.

`column CONTAINS term`. Contains is used by both Java and Python for
substring searches, so at least some users will be surprised by term-based
behavior.

`term_matches(column, term)`. PostgreSQL FTS makes you use functions like
this for everything.  It’s pretty clunky, and we would need to make the
amazingly hairy SelectStatement even hairier to handle “use a function
result in a predicate” like this.

`column : term`. Inspired by Lucene’s syntax.  I don’t actually hate it.

`column LIKE :term:`. Stick with the LIKE operator but add a new symbol to
indicate term matching.  Arguably more SQL-ish than a new bare symbol
operator.
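Whatever syntax is chosen, the semantics being named are "analyze both sides, then compare tokens". The sketch below is a toy stand-in for that pipeline, not Lucene's StandardAnalyzer: the stopword set holds only the three examples from the message, the tokenizer is a simple regular expression, and requiring every analyzed query token to be present is an assumption made for illustration.

```python
import re

# Only the three example stopwords from the message; not Lucene's list.
STOPWORDS = {"a", "is", "the"}


def analyze(text):
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


def term_matches(column_value, term):
    """True if every analyzed query token appears among the column's tokens."""
    column_tokens = set(analyze(column_value))
    query_tokens = analyze(term)
    return bool(query_tokens) and all(t in column_tokens for t in query_tokens)


if __name__ == "__main__":
    text = "The Quick Brown Fox"
    # Matches after analysis even though it is not a substring of the original.
    print(term_matches(text, "QUICK"), "QUICK" in text)
    # A substring of the original that is not a token does not match.
    print(term_matches(text, "row"), "row" in text)
```

The two printed pairs disagree in both directions, which is the reason `LIKE` and `CONTAINS`, with their substring connotations, are an awkward fit.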

[1]
https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html
[2] https://lucene.apache.org/core/9_7_0/analysis/common/index.html

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-24 Thread Jonathan Ellis
On Tue, Oct 24, 2023 at 9:23 AM Josh McKenzie  wrote:

> Maybe it won't be a glamorous release but shipping
> 5.0 mitigates our worst case scenario.
>
> I disagree with this characterization of 5.0 personally. UCS, SAI, Trie
> memtables and sstables, maybe vector ANN if the sub-tasks on C-18715 are
> accurate, all combine to make 5.0 a pretty glamorous release IMO
> independent of TCM and Accord. Accord is a true paradigm-shift game-changer
> so it's easy to think of 5.0 as uneventful in comparison, and TCM helps
> resolve one of the biggest pain-points in our system for over a decade, but
> I think 5.0 is a very meaty release in its own right today.
>

Yes.  SAI will make a huge difference for almost everyone, and ANN will be
even more relevant to a smaller subset.

Let's not get sucked back into "we can't do fast releases because we need
to wait for major features, and we have to wait for major features because
it will be a long time before the next release."

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 5.0-alpha2

2023-11-03 Thread Jonathan Ellis
+1

On Mon, Oct 30, 2023 at 3:47 PM Mick Semb Wever  wrote:

> Proposing the test build of Cassandra 5.0-alpha2 for release.
>
> DISCLAIMER, this alpha release does not contain the features:
> Transactional Cluster Metadata (CEP-21) and Accord Transactions
> (CEP-15).  These features are under discussion to be pushed to a
> 5.1-alpha1 release, with an eta still this year.
>
> This release does contain Vector Similarity Search (CEP-30).
>
> Please also note that this is an alpha release and what that means,
> further info at
> https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle
>
> sha1: ea76d148c374198fede6978422895668857a927f
> Git: https://github.com/apache/cassandra/tree/5.0-alpha2-tentative
> Maven Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1317/org/apache/cassandra/cassandra-all/5.0-alpha2/
>
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/5.0-alpha2/
>
> The vote will be open for 72 hours (longer if needed). Everyone who
> has tested the build is invited to vote. Votes by PMC members are
> considered binding. A vote passes if there are at least three binding
> +1s and no -1's.
>
> [1]: CHANGES.txt:
> https://github.com/apache/cassandra/blob/5.0-alpha2-tentative/CHANGES.txt
> [2]: NEWS.txt:
> https://github.com/apache/cassandra/blob/5.0-alpha2-tentative/NEWS.txt
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Testing 4.0 Post-Freeze

2018-07-03 Thread Jonathan Ellis
Is that worth the risk of demotivating new contributors who might have
other priorities?

On Tue, Jul 3, 2018 at 4:22 PM, Jeff Jirsa  wrote:

> I think there's value in the psychological commitment that if someone has
> time to contribute, their contributions should be focused on validating a
> release, not pushing future features.
>
>
> On Tue, Jul 3, 2018 at 1:03 PM, Jonathan Haddad  wrote:
>
> > I agree with Josh. I don’t see how changing the convention around trunk
> > will improve the process, seems like it’ll only introduce a handful of
> > rollback commits when people forget.
> >
> > Other than that, it all makes sense to me.
> >
> > I’ve been working on a workload centric stress tool on and off for a
> > little bit in an effort to create something that will help with wider
> > adoption in stress testing. It differs from the stress we ship by
> > including fully functional stress workloads as well as a validation
> > process. The idea being to be flexible enough to test both performance
> > and correctness in LWT and MVs as well as other arbitrary workloads.
> > https://github.com/thelastpickle/tlp-stress
> >
> > Jon
> >
> >
> > On Tue, Jul 3, 2018 at 12:28 PM Josh McKenzie 
> > wrote:
> >
> > > Why not just branch a 4.0-rel and bugfix there and merge up while still
> > > accepting new features or improvements on trunk?
> > >
> > > I don't think the potential extra engagement in testing will balance
> > > out the atrophy and discouraging contributions / community engagement
> > > we'd get by deferring all improvements and new features in an
> > > open-ended way.
> > >
> > > On Tue, Jul 3, 2018 at 1:33 PM sankalp kohli 
> > > wrote:
> > >
> > > > Hi cassandra-dev@,
> > > >
> > > > With the goal of making Cassandra's 4.0 the most stable major
> > > > release to date, we would like all committers of the project to
> > > > consider joining us in dedicating their time and attention to
> > > > testing, running, and fixing issues in 4.0 between the September
> > > > freeze and the 4.0 beta release. This would result in a freeze of
> > > > new feature development on trunk or branches during this period,
> > > > instead focusing on writing, improving, and running tests or fixing
> > > > and reviewing bugs or performance regressions found in 4.0 or
> > > > earlier.
> > > > How would this work?
> > > >
> > > > We propose that between the September freeze date and beta, a new
> > > > branch would not be created and trunk would only have bug fixes and
> > > > performance improvements committed to it. At the same time we do not
> > > > want to discourage community contributions. Not all contributors can
> > > > be expected to be aware of such a decision or may be new to the
> > > > project. In cases where new features are contributed during this
> > > > time, the contributor can be informed of the current status of the
> > > > release process, be encouraged to contribute to testing or bug
> > > > fixing, and have their feature reviewed after the beta is reached.
> > > >
> > > > What happens when beta is reached?
> > > >
> > > > Ideally, contributors who have made significant contributions to the
> > > > release will stick around to continue testing between beta and final
> > > > release. Any additional folks who continue this focus would also be
> > > > greatly appreciated.
> > > > What about before the freeze?
> > > >
> > > > Testing new features is of course important. This isn't meant to
> > > > discourage development – only to enable us to focus on testing and
> > > > hardening 4.0 to deliver Cassandra's most stable major release. We
> > > > would like to see adoption of 4.0 happen much more quickly than its
> > > > predecessor.
> > > > Thanks for considering this proposal,
> > > > Sankalp Kohli
> > >
> > --
> > Jon Haddad
> > http://www.rustyrazorblade.com
> > twitter: rustyrazorblade
> >
>



-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [Discuss] Accept GoCQL driver donation

2018-08-31 Thread Jonathan Ellis
On Fri, Aug 31, 2018 at 9:14 AM Nate McCall  wrote:

> Hi folks,
> So I was recently talking with Chris Bannister, the gocql [0]
> maintainer, and he expressed an interest in donating the driver to the
> ASF.
>

Is he looking to continue to maintain it or is he looking to give it a good
home when he moves on?

> We could accept this along the same lines as how we took in the dtest
> donation - going through the incubator IP clearance process [1], but
> in this case it's much simpler as an individual (Chris) owns the
> copyright.
>

Is that actually the case?  Github says 128 contributors, and I don't see
any mention of a CLA in
https://github.com/gocql/gocql/blob/master/CONTRIBUTING.md.

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Development Approach for Apache Cassandra Management process

2018-09-12 Thread Jonathan Ellis
> > > >
> > >
> > > Like releases, I think PMC votes count
> > >
> > >
> > > >
> > > > Anyway, fwiw, my opinion on this vote is not far from the one on the
> > > > golang driver acceptance vote (for which my remark above also
> > > > applies btw): not yet 100% convinced adding more pieces and scope to
> > > > the project is what the project needs just right now, but not
> > > > strongly opposed if people really want this (and this one makes more
> > > > sense to me than the golang driver actually). But if I'm to pick
> > > > between a) and b), I'm leaning b).
> > >
> > > FWIW, two of the main reasons I'm in favor is as a way to lower barrier
> > > to entry to both using the software AND contributing to the project, so
> > > I think your points are valid (both on gocql thread and on this note
> > > above), but I think that's also part of why we should be encouraging
> > > both.
> > >
> > > - Jeff
> > >
> >
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Built in trigger: double-write for app migration

2018-10-18 Thread Jonathan Ellis
Isn't this what CDC was designed for?

https://issues.apache.org/jira/browse/CASSANDRA-8844

On Thu, Oct 18, 2018 at 10:54 AM Carl Mueller
 wrote:

> tl;dr: a generic trigger on TABLES that will mirror all writes to
> facilitate data migrations between clusters or systems. What is necessary
> to ensure full write mirroring/coherency?
>
> When cassandra clusters have several "apps" aka keyspaces serving
> applications colocated on them, but the app/keyspace bandwidth and size
> demands begin impacting other keyspaces/apps, then one strategy is to
> migrate the keyspace to its own dedicated cluster.
>
> With backups/sstableloading, this will entail a delay and therefore a
> "coherency" shortfall between the clusters. So typically one would employ a
> "double write, read once":
>
> - all updates are mirrored to both clusters
> - reads come from the current most coherent.
>
> Often two sstable loads are done:
>
> 1) first load
> 2) turn on double writes/write mirroring
> 3) a second load is done to finalize coherency
> 4) switch the app to point to the new cluster now that it is coherent
>
> The double writes and read are the sticking point. We could do it at the app
> layer, but if the app wasn't written with that, it is a lot of testing and
> customization specific to the framework.
>
> We could theoretically do some sort of proxying of the java-driver somehow,
> but all the async structures and complex interfaces/apis would be difficult
> to proxy. Maybe there is a lower level in the java-driver that is possible.
> This also would only apply to the java-driver, and not
> python/go/javascript/other drivers.
>
> Finally, I suppose we could do a trigger on the tables. It would be really
> nice if we could add to the cassandra toolbox the basics of a write
> mirroring trigger that could be activated "fairly easily"... now I know
> there are the complexities of inter-cluster access, and if we are even
> using cassandra as the target mirror system (for example there is an
> article on triggers write-mirroring to kafka:
> https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).
>
> And this starts to get into the complexities of hinted handoff as well. But
> fundamentally this seems something that would be a very nice feature
> (especially when you NEED it) to have in the core of cassandra.
>
> Finally, is the mutation hook in triggers sufficient to track all incoming
> mutations (outside of "shudder" other triggers generating data)?
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
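
For readers following Carl's description, the app-layer "double write, read
once" pattern can be sketched roughly as below. This is only an illustration
under stated assumptions, not anything proposed in the thread: `DoubleWriter`,
`primary`, `mirror` and `replay_failed` are hypothetical names, and the two
write callables stand in for whatever driver sessions an application would
really use. Mirror failures are only remembered for replay, which loosely
mirrors the role the second sstable load plays in the steps above.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for "execute a write against one cluster".
# A real application would wrap a driver session here.
WriteFn = Callable[[str, tuple], None]


class DoubleWriter:
    """App-layer "double write, read once" sketch (not a Cassandra API)."""

    def __init__(self, primary: WriteFn, mirror: WriteFn) -> None:
        self.primary = primary
        self.mirror = mirror
        self.failed_mirror_writes: List[Tuple[str, tuple]] = []

    def write(self, statement: str, params: tuple = ()) -> None:
        # The primary (currently most coherent) cluster must accept the write;
        # any exception from it propagates to the caller.
        self.primary(statement, params)
        try:
            self.mirror(statement, params)
        except Exception:
            # Remember the mutation so it can be replayed later.
            self.failed_mirror_writes.append((statement, params))

    def replay_failed(self) -> int:
        """Retry mirror writes that failed earlier; return how many succeeded."""
        pending, self.failed_mirror_writes = self.failed_mirror_writes, []
        replayed = 0
        for statement, params in pending:
            try:
                self.mirror(statement, params)
                replayed += 1
            except Exception:
                self.failed_mirror_writes.append((statement, params))
        return replayed
```

Reads are deliberately absent from the class: in this sketch the application
keeps reading from the primary cluster until the switch-over step.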


Re: CASSANDRA-14482

2019-02-15 Thread Jonathan Ellis
IMO "add a new compression class that has demonstrable benefits to Sushma
and Joseph" is sufficiently noninvasive that we should allow it into 4.0.

On Fri, Feb 15, 2019 at 10:48 AM Dinesh Joshi
 wrote:

> Hey folks,
>
> Just wanted to get a pulse on whether we can proceed with ZStd support.
> The consensus on the ticket was that it’s a very valuable addition without
> any risk of destabilizing 4.0. It’s ready to go if there aren’t any
> objections.
>
> Dinesh
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Free pass to DataStax Accelerate

2019-04-30 Thread Jonathan Ellis
Hi Cassandra devs,

Patrick McFadin sent this to the users list, but it was buried at the end
of a long email so I wanted to highlight this here as well.

DataStax is starting a new Apache Cassandra conference, DataStax
Accelerate, May 21-23 in Maryland.  DataStax is pleased to offer a free,
full-conference pass to the Cassandra community: use code DevRelFree when
you register at https://www.datastax.com/accelerate.

Hope to see you there!

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] Moving chats to ASF's Slack instance

2019-05-28 Thread Jonathan Ellis
I agree.  This lowers the barrier to entry for new participants.  Slack is
probably two orders of magnitude more commonly used now than irc for sw
devs and three for everyone else.  And then you have the quality-of-life
features that you get out of the box with Slack and only with difficulty in
irc (history, search, file uploads...)

On Tue, May 28, 2019 at 4:29 PM Nate McCall  wrote:

> Hi Folks,
> While working on ApacheCon last week, I had to get set up on ASF's Slack
> workspace. After poking around a bit, on a whim I created #cassandra and
> #cassandra-dev. I then invited a couple of people to come sign up and test
> it out - primarily to make sure that the process was seamless for non-ASF
> account holders as well as committers, etc (it was).
>
> If you want to jump in, you can sign up here:
> https://s.apache.org/slack-invite
>
> That said, I think it's time we transition from IRC to Slack. Now, I like
> CLI friendly, straightforward tools like IRC as much as anyone, but it's
> been more than once recently where a user I've talked to has said one of
> two things regarding our IRC channels: "What's IRC?" or "Yeah, I don't
> really do that anymore."
>
> In short, I think it's time to migrate. I think this will really just
> consist of some communications to our lists and updating the site (anything
> I'm missing?). The archives of IRC should just kind of persist for
> posterity's sake without any additional effort or maintenance. The
> ASF-requirements are all configured already on the Slack workspace, so I
> think we are good there.
>
> Thanks,
> -Nate
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: server side describe

2020-04-01 Thread Jonathan Ellis
I think we should get serious about the so-called freeze.

On Wed, Apr 1, 2020 at 1:27 PM Jon Haddad  wrote:

> Hey folks,
>
> I was looking through our open JIRAs and realized we hadn't merged in
> server side describe calls yet.  The ticket died off a ways ago, and I
> pinged Chris about it yesterday.  He's got a lot on his plate and won't be
> able to work on it anytime soon.  I still think we should include this in
> 4.0.
>
> From a technical standpoint, it doesn't say much on the ticket after Robert
> tossed an alternative patch out there.  I don't mind reviewing and merging
> either of them, it sounded like both are pretty close to done and I think
> from the perspective of updating drivers for 4.0 this will save quite a bit
> of time since driver maintainers won't have to add new CQL generation for
> the various new options that have recently appeared.
>
> Questions:
>
> * Does anyone have an objection to getting this into 4.0? The patches
> aren't too huge, I think they're low risk, and also fairly high reward.
> * I don't have an opinion (yet) on Robert's patch vs Chris's, with regard
> to which is preferable.
> * Since soon after Robert put up his PR he hasn't been around, at least as
> far as I've seen.  How have we dealt with abandoned patches before?  If
> we're going to add this in the patch will need some cleanup.  Is it
> reasonable to continue someone else's work when they've disappeared?
>
> Jon
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Project governance wiki doc (take 2)

2020-06-20 Thread Jonathan Ellis
+1

On Sat, Jun 20, 2020 at 10:12 AM Joshua McKenzie 
wrote:

> Link to doc:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+Project+Governance
>
> Change since previous cancelled vote:
> "A simple majority of this electorate becomes the low-watermark for votes
> in favour necessary to pass a motion, with new PMC members added to the
> calculation."
>
> This previously read "super majority". We have lowered the low water mark
> to "simple majority" to balance strong consensus against risk of stall due
> to low participation.
>
>
> - Vote will run through 6/24/20
> - pmc votes considered binding
> - simple majority of binding participants passes the vote
> - committer and community votes considered advisory
>
> Lastly, I propose we take the count of pmc votes in this thread as our
> initial roll call count for electorate numbers and low watermark
> calculation on subsequent votes.
>
> Thanks again everyone (and specifically Benedict and Jon) for the time and
> collaboration on this.
>
> ~Josh
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
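
As a rough illustration of the low-watermark arithmetic quoted above, here is
a small sketch. It assumes "simple majority" means strictly more than half of
the electorate, which is one reading of the wording and not something the
thread spells out; `low_watermark` and `motion_passes` are hypothetical names.

```python
def low_watermark(electorate: int) -> int:
    """Fewest votes in favour that form a simple majority of the electorate.

    Assumption: "simple majority" means strictly more than half.
    """
    if electorate < 1:
        raise ValueError("electorate must be at least 1")
    return electorate // 2 + 1


def motion_passes(in_favour: int, electorate: int) -> bool:
    """True when the votes in favour reach the low watermark."""
    return in_favour >= low_watermark(electorate)
```

Under that reading an electorate of 10 and one of 11 both need 6 votes in
favour, and the watermark moves as new PMC members are added to the count.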


Re: [Discussion] Windows support

2020-07-29 Thread Jonathan Ellis
I pushed for Windows support 5+ years ago but I just use WSL now.  (WSL2
makes it easier for Docker but it's not required.)

On Tue, Jul 28, 2020 at 8:41 PM Yuki Morishita  wrote:

> Hi,
>
> I'd like to raise my concern about Windows support, as we are getting
> closer to 4.0 release.
>
> Since the support for JDK11 (CASSANDRA-9608), Windows script to start
> Cassandra is broken.
> The fix for the script is posted to
> https://issues.apache.org/jira/projects/CASSANDRA/issues/CASSANDRA-14608.
>
> Windows scripts have not been maintained recently, and I don't think we have
> any Windows environment in CI for testing.
> I don't think it is a good idea to release Apache Cassandra with
> broken Windows scripts.
>
> With the latest update of Windows 10, even the Windows 10 Home edition
> users can use Docker for Windows if they enable WSL2 in their machine.
> However, the update is not yet available for everyone, and I believe
> many enterprises hold off on upgrading to the latest version. Even if
> they do upgrade, they can disable WSL2. Some companies may not
> allow installing VirtualBox either.
>
> So, what we can do for 4.0 release:
>
> - Stop supporting Windows. Remove all bat/ps1 scripts from the
> source and distribution. Encourage Windows users to use VM/Docker.
> - Continue supporting Windows. Set up Windows test environment. Test
> every Windows script for future releases.
>
> Since I saw enterprises with restricted dev environments (and saw
> people trying to use cassandra on Windows on StackOverflow), I want to
> have Windows scripts ready to be used.
> But I'm also fine if we decide to remove all Windows scripts since I
> use Docker anyway.
>
> Regards,
> Yuki
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [Vote] Remove Windows support from 4.0+

2020-08-09 Thread Jonathan Ellis
+1

On Sun, Aug 9, 2020 at 10:15 PM Yuki Morishita  wrote:

> As per the discussion(*), I propose to remove Windows support from 4.0
> release and onward.
>
> Windows scripts are not maintained and we lack windows test
> environments. Windows users can use Docker or cloud environments to
> set up Cassandra application development.
>
> If the vote passes, I will create the following tickets to officially
> remove Windows support from 4.0:
>
> - Remove Windows scripts and add notice to NEWS.txt
> - Update "Getting Started" documents for Windows users (to direct them
> to use docker or cloud)
>
> Regards,
> Yuki
>
> --
> *:
> https://mail-archives.apache.org/mod_mbox/cassandra-dev/202007.mbox/%3CCAGM0Up_3GoPucCP-U18L1akzBXS1eJoKbui997%3DajcCfKJQdng%40mail.gmail.com%3E
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Welcome Jordan West, David Capwell, Zhao Yang and Ekaterina Dimitrova as Cassandra committers

2020-12-16 Thread Jonathan Ellis
Well-deserved congratulations!

On Wed, Dec 16, 2020 at 10:56 AM Benjamin Lerer 
wrote:

> The PMC's members are pleased to announce that Jordan West, David Capwell,
> Zhao Yang and Ekaterina Dimitrova have accepted the invitations to become
> committers this year.
>
> Jordan West accepted the invitation in April
> David Capwell accepted the invitation in July
> Zhao Yang accepted the invitation in September
> Ekaterina Dimitrova accepted the invitation in December
>
> Thanks a lot for everything you have done.
>
> Congratulations and welcome
>
> The Apache Cassandra PMC members
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] When to stop supporting Python 2

2021-01-27 Thread Jonathan Ellis
Python 2 was EOLed over a year ago.  I think it's fine to (1) require
python 3 to run cqlsh and (2) remove code that supports python 2 whenever
it's convenient.

Angelo has the right idea that rather than trying to finesse a deprecation
cycle into 4.0 at this late date, a better migration path can be provided
by backporting python3 support to 3.11.

On Wed, Jan 27, 2021 at 12:36 PM Brandon Williams  wrote:

> On Wed, Jan 27, 2021 at 12:09 PM Adam Holmberg
>  wrote:
> > I want to emphasize here: to my way of thinking, "dropping support" at
> > this juncture is just a matter of documenting it, and maybe introducing
> > a warning. We don't need to *remove* support for python2. It will
> > continue to work as-is. This would just guide us in deciding whether to
> > work on flaws that are python2-specific, and whether new things are
> > developed with backwards compatibility as a forcing concern.
>
> Actually, I think we have to go a little bit further, and at least as
> far as packaging is concerned, remove support for python2.  Recently
> pip updated to 21.0 and removed python2 support, which broke any
> builds that built artifacts requiring pip.  We now pin pip:
>
> https://github.com/apache/cassandra-builds/commit/54c45a9bcf9b36a3f78b7d773eaf1067483b49b8
> to get around this, but it highlights that we too need to move away from
> anything using python2.  So while we would not modify code to *remove*
> python2 support, you would have to invoke python2 on the code in your
> own way, since the packages would depend on python3.
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] Releases after 4.0

2021-01-28 Thread Jonathan Ellis
cqlsh isn't a new feature.

On Thu, Jan 28, 2021 at 10:32 AM Benedict Elliott Smith 
wrote:

> But, as discussed, we previously agreed to limit features in a minor version,
> as per the release lifecycle (and I continue to endorse this decision)
>
> On 28/01/2021, 16:04, "Mick Semb Wever"  wrote:
>
> > if there's no such features, or anything breaking compatibility
> >
> > What do you envisage being delivered in such a release, besides bug
> > fixes?  Do we have the capacity as a project for releases dedicated
> > to whatever falls between those two gaps?
> >
>
>
> All releases that don't break any compatibilities as our documented
> guidelines dictate (wrt. upgrades, api, cql, native protocol, etc).
> Even new features can be introduced without compatibility breakages
> (and should be as often as possible).
>
> Honouring semver does not imply more releases, to the contrary it is
> just that a number of those existing releases will be minor instead of
> major. That is, it is an opportunity cost to not recognise minor
> releases.
>
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] Releases after 4.0

2021-01-28 Thread Jonathan Ellis
Sorry, I got my threads crossed!

On Thu, Jan 28, 2021 at 10:47 AM Jonathan Ellis  wrote:

> cqlsh isn't a new feature.
>
> On Thu, Jan 28, 2021 at 10:32 AM Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> But, as discussed, we previously agreed to limit features in a minor
>> version, as per the release lifecycle (and I continue to endorse this
>> decision)
>>
>> On 28/01/2021, 16:04, "Mick Semb Wever"  wrote:
>>
>> > if there's no such features, or anything breaking compatibility
>> >
>> > What do you envisage being delivered in such a release, besides bug
>> > fixes?  Do we have the capacity as a project for releases dedicated
>> > to whatever falls between those two gaps?
>> >
>>
>>
>> All releases that don't break any compatibilities as our documented
>> guidelines dictate (wrt. upgrades, api, cql, native protocol, etc).
>> Even new features can be introduced without compatibility breakages
>> (and should be as often as possible).
>>
>> Honouring semver does not imply more releases, to the contrary it is
>> just that a number of those existing releases will be minor instead of
>> major. That is, it is an opportunity cost to not recognise minor
>> releases.
>>
>>
>>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Welcome Paulo Motta as Cassandra PMC member

2021-02-09 Thread Jonathan Ellis
Congratulations, Paulo!  Well deserved.

On Tue, Feb 9, 2021 at 9:54 AM Benjamin Lerer 
wrote:

>  The PMC's members are pleased to announce that Paulo Motta has accepted
> the invitation to become a PMC member yesterday.
>
> Thanks a lot, Paulo, for everything you have done for the project all these
> years.
>
> Congratulations and welcome
>
> The Apache Cassandra PMC members
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: March 2015 QA retrospective

2015-04-09 Thread Jonathan Ellis
NDRA-7910> Tyler Hobbs
>   wildcard prepared statements are incorrect after a column is added to the table
>   Alter table not tested concurrently with ?
>
> CASSANDRA-8018 <https://issues.apache.org/jira/browse/CASSANDRA-8018> Benjamin Lerer
>   Cassandra seems to insert twice in custom PerColumnSecondaryIndex
>   Custom secondary indexes not tested before release?
>
> CASSANDRA-8028 <https://issues.apache.org/jira/browse/CASSANDRA-8028> Carl Yeksigian
>   Unable to compute when histogram overflowed
>   Histogram output not tested with representative data sets, no regression test
>
> CASSANDRA-8122 <https://issues.apache.org/jira/browse/CASSANDRA-8122> Carl Yeksigian
>   Undeclare throwable exception while executing 'nodetool netstats localhost'
>   nodetool not tested against cluster throughout lifecycle, no regression test
>
> CASSANDRA-8211 <https://issues.apache.org/jira/browse/CASSANDRA-8211> Marcus Eriksson
>   Overlapping sstables in L1+
>   Noted hard to reproduce, but still is there a way we could have, no regression test
>
> CASSANDRA-8231 <https://issues.apache.org/jira/browse/CASSANDRA-8231> Benjamin Lerer
>   Wrong size of cached prepared statements
>   Expected cache capacity not validated with actual cache capacity, no regression test
>
> CASSANDRA-8243 <https://issues.apache.org/jira/browse/CASSANDRA-8243> Björn Hegerfors
>   DTCS can leave time-overlaps, limiting ability to expire entire SSTables
>   Performance improving fast path not tested in a representative way
>
> CASSANDRA-8264 <https://issues.apache.org/jira/browse/CASSANDRA-8264> Tyler Hobbs
>   Problems with multicolumn relations and COMPACT STORAGE
>   How can we catch interactions like compact storage not being covered by the test
>
> CASSANDRA-8280 <https://issues.apache.org/jira/browse/CASSANDRA-8280> Sam Tunnicliffe
>   Cassandra crashing on inserting data over 64K into indexed strings
>   Added tests are good example, could focusing on testing all access paths and
>   boundary conditions per access path have prevented this
>
> CASSANDRA-8285 <https://issues.apache.org/jira/browse/CASSANDRA-8285> Aleksey Yeschenko
>   Move all hints related tasks to hints private executor
>   Pierre's reproducer represents something we weren't doing, but that users are.
>   Is that now being tested?
>
> CASSANDRA-8286 <https://issues.apache.org/jira/browse/CASSANDRA-8286> Tyler Hobbs
>   Regression in ORDER BY
>   There were tests that failed in some versions, but not all? Did this not ship?
>
> CASSANDRA-8288 <https://issues.apache.org/jira/browse/CASSANDRA-8288> Tyler Hobbs
>   cqlsh describe needs to show 'sstable_compression': ''
>   Roundtrip test for describe schema?
>
> CASSANDRA-8292 <https://issues.apache.org/jira/browse/CASSANDRA-8292> Joshua McKenzie
>   From Pig: org.apache.cassandra.exceptions.ConfigurationException: Expecting URI
>   in variable: [cassandra.config]. Please prefix the file with file:/// for
>   local files or file:/// for remote files.
>   PIG not tested
>
> CASSANDRA-8302 <https://issues.apache.org/jira/browse/CASSANDRA-8302> Tyler Hobbs
>   Filtering for CONTAINS (KEY) on frozen collection clustering columns within a
>   partition does not work
>   More untested combinations, could we have spotted that there was an interaction
>   and tested it? Or did this not ship?
>
> CASSANDRA-8316 <https://issues.apache.org/jira/browse/CASSANDRA-8316> Marcus Eriksson
>   "Did not get positive replies from all endpoints" error on incremental repair
>   What were users doing differently, is there a reproducer for this running now?
>
> CASSANDRA-8320 <https://issues.apache.org/jira/browse/CASSANDRA-8320> Marcus Eriksson
>   2.1.2: NullPointerException in SSTableWriter
>   What were users doing that caused this, are we doing that?
>
> CASSANDRA-8332 <https://issues.apache.org/jira/browse/CASSANDRA-8332> T Jake Luciani
>   Null pointer after droping keyspace
>   Add/drop keyspace not tested under load, with server logs checked for errors
>
> CASSANDRA-8365 <https://issues.apache.org/jira/browse/CASSANDRA-8365> Benjamin Lerer
>   CamelCase name is used as index name instead of lowercase
>   How can we establish UI consistency?
>
> CASSANDRA-8370 <https://issues.apache.org/jira/browse/CASSANDRA-8370> Sam Tunnicliffe
>   cqlsh doesn't handle LIST statements correctly
>   cqlsh untested functionality, no regression test?
>
> CASSANDRA-8383 <https://issues.apache.org/jira/browse/CASSANDRA-8383> Benedict
>   Memtable flush may expire records from the commit log that are in a later memtable
>   No regression test, no follow up ticket. Could/should this have been
>   reproducible as an actual bu

Re: 3.0 and the Cassandra release process

2015-04-13 Thread Jonathan Ellis
On Tue, Mar 17, 2015 at 4:06 PM, Jonathan Ellis  wrote:

>
> I’m optimistic that as we improve our process this way, our even releases
> will become increasingly stable.  If so, we can skip sub-minor releases
> (3.2.x) entirely, and focus on keeping the release train moving.  In the
> meantime, we will continue delivering 2.1.x stability releases.
>

The weak point of this plan is the transition from the "big release"
development methodology culminating in 3.0, to the monthly tick-tock
releases.  Since 3.0 needs to go through a beta/release candidate phase,
during which we're going to be serious about not adding new features, that
means that 3.1 will come with multiple months worth of features, so right
off the bat we're starting from a disadvantage from a stability standpoint.

Recognizing that it will take several months for the tick-tock releases to
stabilize, I would like to ship 3.0.x stability releases concurrently with
3.y tick-tock releases.  This should stabilize 3.0.x faster than tick-tock,
while at the same time hedging our bets such that if we assess tick-tock in
six months and decide it's not delivering on its goals, we're not six
months behind in having a usable set of features that we shipped in 3.0.

So, to summarize:

- New features will *only* go into tick-tock releases.
- Bug fixes will go into tick-tock releases and a 3.0.x branch, which will
be maintained for at least a year

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: 3.0 and the Cassandra release process

2015-04-15 Thread Jonathan Ellis
Short answer: yes.

Longer answer, pasted from my reply to Jon Haddad elsewhere in the thread:

We are moving away from designating major releases like 3.0 as "special,"
other than as a marker of compatibility.  In fact we are moving away from
major releases entirely, with each release being a much smaller, digestible
unit of change, and the ultimate goal of every even release being
production-quality.

This means that bugs won't pile up and compound each other.  And bugs that
do slip through will affect fewer users.  As 3.x stabilizes, more people
will try out the releases, yielding better quality, yielding even more
people trying them out in a virtuous cycle.

This won't just happen by wishing for it.  I am very serious about
investing the energy we would have spent on backporting fixes to a "stable"
branch, into improving our QA process and test coverage.  After a very
short list of in-progress features that may not make the 3.0 cutoff (#6477,
#6696 come to mind) I'm willing to virtually pause new feature development
entirely to make this happen.


On Tue, Apr 14, 2015 at 11:53 PM, Phil Yang  wrote:

> Hi Jonathan,
>
> How long will tick-tock releases be maintained? Do users have to
> upgrade to a new even release with new features to fix the bugs in an older
> even release?
>
> 2015-04-14 6:28 GMT+08:00 Jonathan Ellis :
>
> > On Tue, Mar 17, 2015 at 4:06 PM, Jonathan Ellis 
> wrote:
> >
> > >
> > > I’m optimistic that as we improve our process this way, our even
> > > releases will become increasingly stable.  If so, we can skip
> > > sub-minor releases (3.2.x) entirely, and focus on keeping the release
> > > train moving.  In the meantime, we will continue delivering 2.1.x
> > > stability releases.
> > >
> >
> > The weak point of this plan is the transition from the "big release"
> > development methodology culminating in 3.0, to the monthly tick-tock
> > releases.  Since 3.0 needs to go through a beta/release candidate phase,
> > during which we're going to be serious about not adding new features,
> > that means that 3.1 will come with multiple months worth of features,
> > so right off the bat we're starting from a disadvantage from a
> > stability standpoint.
> >
> > Recognizing that it will take several months for the tick-tock releases
> > to stabilize, I would like to ship 3.0.x stability releases
> > concurrently with 3.y tick-tock releases.  This should stabilize 3.0.x
> > faster than tick-tock, while at the same time hedging our bets such
> > that if we assess tick-tock in six months and decide it's not
> > delivering on its goals, we're not six months behind in having a usable
> > set of features that we shipped in 3.0.
> >
> > So, to summarize:
> >
> > - New features will *only* go into tick-tock releases.
> > - Bug fixes will go into tick-tock releases and a 3.0.x branch, which
> > will be maintained for at least a year
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder, http://www.datastax.com
> > @spyced
> >
>
>
>
> --
> Thanks,
> Phil Yang
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: A proposal for how we use JIRA in the tick-tock release process

2015-04-22 Thread Jonathan Ellis
SGTM.

On Thu, Apr 23, 2015 at 1:48 AM, Ryan McGuire  wrote:

> In the interests of making the tick tock release process as smooth and
> efficient as possible, I’d like to propose a few procedural JIRA
> rules:
>
>  * Let’s use the In Progress status to indicate when development is
>    actually in progress. This can be a very useful indicator to testers
>    that it’s the right time to engage the developer to discuss testing
>    plans and agree on the Definition of Done for that ticket.
>
>  * Let’s use the Testing status after a patch has been reviewed, and
>    before the patch gets merged, to be an opportunity for people to chime
>    in about whether or not the proposed change has adequate testing and
>    meets the Definition of Done.
>
> It’s not my intention to add needless formalities to the process or to
> slow things down for the developers - test planning and test
> implementation should always be done concurrently while a test is In
> Progress, so the Testing status for a ticket should be short lived.
> What it gives us is a more solid way of knowing that what gets merged
> into trunk is in the best shape it can be, and is always
> deliverable.
>
> I would also note that the Testing phase should not be regarded as
> only for the DataStax test engineering team. It really should be a
> collaborative phase where we all can discuss the tests that everyone
> has contributed. If a developer is confident that all the testing is
> in place (unit tests, dtests, etc.) then they should feel free to skip
> the testing status.
>
>
> --
> Ryan McGuire
> Software Engineering Manager in Test | r...@datastax.com
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.1.5

2015-04-27 Thread Jonathan Ellis
+1

On Mon, Apr 27, 2015 at 11:46 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.1.5.
>
> sha1: 3c0a337ebc90b0d99349d0aa152c92b5b3494d8c
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.5-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1053/org/apache/cassandra/apache-cassandra/2.1.5/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1053/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/Coam8e (CHANGES.txt)
> [2]: http://goo.gl/ClUkPI (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: DateTieredCompactionStrategy and static columns

2015-05-01 Thread Jonathan Ellis
tatic" columns (i.e., if each level of clustering
> component
> > > can have columns associated with it). If we do, we should definitely
> keep
> > > it all inline. If not, it probably permits a lot better behaviour to
> > > separate them, since it's easier to reason about and improve their
> > distinct
> > > characteristics.
> > >
> > >
> > > On Fri, May 1, 2015 at 1:24 AM, graham sanderson 
> > wrote:
> > >
> > >> Well you lose the atomicity and isolation, but in this case that is
> > >> probably fine
> > >>
> > >> That said, in every interaction I’ve had with static columns, they
> seem
> > to
> > >> be an odd duck (e.g. adding or complicating range slices), perhaps
> > worthy
> > >> of their own code path and sstables. Just food for thought.
> > >>
> > >>> On Apr 30, 2015, at 7:13 PM, Jonathan Haddad 
> > wrote:
> > >>>
> > >>> If you want it in a separate sstable, just use a separate table.
> > There's
> > >>> nothing that warrants making the codebase more complex to accomplish
> > >>> something it already does.
> > >>>
> > >>> On Thu, Apr 30, 2015 at 5:07 PM graham sanderson 
> > >> wrote:
> > >>>
> > >>>> Anyone here have an opinion; how realistic would it be to have a
> > >> separate
> > >>>> memtable/sstable for static columns?
> > >>>>
> > >>>> Begin forwarded message:
> > >>>>
> > >>>> *From: *Jonathan Haddad 
> > >>>> *Subject: **Re: DateTieredCompactionStrategy and static columns*
> > >>>> *Date: *April 30, 2015 at 3:55:46 PM CDT
> > >>>> *To: *u...@cassandra.apache.org
> > >>>> *Reply-To: *u...@cassandra.apache.org
> > >>>>
> > >>>>
> > >>>> I suspect this will kill the benefit of DTCS, but haven't tested it
> > >>>> to be 100% here.
> > >>>>
> > >>>> The benefit of DTCS is that sstables are selected for compaction
> > >>>> based on the age of the data, not their size.  When you mix TTL'ed
> > >>>> data and non TTL'ed data, you end up screwing with the "drop the
> > >>>> entire SSTable" optimization.  I don't believe this is any different
> > >>>> just because you're mixing in static columns.  What I think will
> > >>>> happen is you'll end up with an sstable that's almost entirely
> > >>>> TTL'ed with a few static columns that will never get compacted or
> > >>>> dropped.  Pretty much the worst scenario I can think of.
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Thu, Apr 30, 2015 at 11:21 AM graham sanderson 
> > >> wrote:
> > >>>>
> > >>>>> I have a potential use case I haven’t had a chance to prototype
> yet,
> > >>>>> which would normally be a good candidate for DTCS (i.e. data
> > delivered
> > >> in
> > >>>>> order and a fixed TTL), however with every write we’d also be
> > updating
> > >> some
> > >>>>> static cells (namely a few key/values in a static map
> CQL
> > >>>>> column). There could also be explicit deletes of keys in the static
> > >> map,
> > >>>>> though that’s not 100% necessary.
> > >>>>>
> > >>>>> Since those columns don’t have TTL, without reading thru the code
> > >>>>> and/or trying it, I have no idea what effect this has on DTCS
> > >>>>> (perhaps it needs to use separate sstables for static columns). Has
> > >>>>> anyone tried this? If not, I eventually will and will report back.
> > >>>>
> > >>>>
> > >>
> > >>
> >
> >
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Staging Branches

2015-05-07 Thread Jonathan Ellis
On Thu, May 7, 2015 at 7:13 AM, Aleksey Yeschenko 
wrote:

>
> That said, perhaps it’s too much change at once. We still have missing
> pieces of infrastructure, and TE is busy with what’s already back-logged.
> So let’s revisit this proposal in a few months, closer to 3.1 or 3.2, maybe?
>

Agreed.  I would like to wait and see how we do without extra branches for
a release or two.  That will give us a better idea of how much pain the
extra steps will protect us from.


Requiring Java 8 for C* 3.0

2015-05-07 Thread Jonathan Ellis
We discussed requiring Java 8 previously and decided to remain Java
7-compatible, but at the time we were planning to release 3.0 before Java 7
EOL.  Now that 8099 and increased emphasis on QA have delayed us past Java
7 EOL, I think it's worth reopening this discussion.

If we require 8, then we can use lambdas, LongAdder, StampedLock, Streaming
collections, default methods, etc.  Not just in 3.0 but over 3.x for the
next year.
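
For anyone who hasn't touched these yet, here is a rough sketch of what the
features named above look like in practice.  This is illustrative only and is
not code from the Cassandra tree; the class and method names (Java8Sketch,
Named, Counter, sumLengths, upperCased) are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.StampedLock;
import java.util.stream.Collectors;

// Hypothetical demo class; nothing here mirrors actual Cassandra code.
class Java8Sketch {
    // Default method: the interface supplies behaviour, so existing
    // implementors keep compiling when greeting() is added.
    interface Named {
        String name();
        default String greeting() { return "hello " + name(); }
    }

    // StampedLock: try an optimistic read first, and fall back to a real
    // read lock only if a writer intervened.
    static final class Counter {
        private final StampedLock lock = new StampedLock();
        private long value;

        void increment() {
            long stamp = lock.writeLock();
            try { value++; } finally { lock.unlockWrite(stamp); }
        }

        long read() {
            long stamp = lock.tryOptimisticRead();
            long v = value;
            if (!lock.validate(stamp)) {
                stamp = lock.readLock();
                try { v = value; } finally { lock.unlockRead(stamp); }
            }
            return v;
        }
    }

    // LongAdder plus a lambda: a contention-friendly accumulator.
    static long sumLengths(List<String> words) {
        LongAdder total = new LongAdder();
        words.forEach(w -> total.add(w.length()));
        return total.sum();
    }

    // Streams: transform a collection without an explicit loop.
    static List<String> upperCased(List<String> words) {
        return words.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Named n = () -> "cassandra";   // lambda implementing the single abstract method
        System.out.println(n.greeting());
        System.out.println(sumLengths(Arrays.asList("row", "cache")));
        System.out.println(upperCased(Arrays.asList("row", "cache")));
        Counter c = new Counter();
        c.increment();
        System.out.println(c.read());
    }
}
```

None of this compiles on Java 7, which is the practical cost being weighed in
this thread.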

If we don't, then people can choose whether to deploy on 7 or 8 -- but the
vast majority will deploy on 8 simply because 7 is no longer supported
without a premium contract with Oracle.  8 also has a more advanced G1GC
implementation (see CASSANDRA-7486).

I think that gaining access to the new features in 8 as we develop 3.x is
worth losing the ability to run on a platform that will have been EOL for a
couple months by the time we release.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Requiring Java 8 for C* 3.0

2015-05-07 Thread Jonathan Ellis
Yes, it is.

On Thu, May 7, 2015 at 9:43 AM, Nick Bailey  wrote:

> Is running 2.1 with java 8 a supported or recommended way to run at this
> point? If not then we'll be requiring users to upgrade both java and C* at
> the same time when making the jump to 3.0.
>
> On Thu, May 7, 2015 at 11:25 AM, Aleksey Yeschenko 
> wrote:
>
> > The switch will necessarily hurt 3.0 adoption, but I think we’ll live. To
> > me, the benefits (mostly access to lambdas and default methods, tbh)
> > slightly outweigh the downsides.
> >
> > +0.1
> >
> > --
> > AY
> >
> > On May 7, 2015 at 19:22:53, Gary Dusbabek (gdusba...@gmail.com) wrote:
> >
> > +1
> >
> > On Thu, May 7, 2015 at 11:09 AM, Jonathan Ellis 
> wrote:
> >
> > > We discussed requiring Java 8 previously and decided to remain Java
> > > 7-compatible, but at the time we were planning to release 3.0 before
> > Java 7
> > > EOL. Now that 8099 and increased emphasis on QA have delayed us past
> Java
> > > 7 EOL, I think it's worth reopening this discussion.
> > >
> > > If we require 8, then we can use lambdas, LongAdder, StampedLock,
> > Streaming
> > > collections, default methods, etc. Not just in 3.0 but over 3.x for the
> > > next year.
> > >
> > > If we don't, then people can choose whether to deploy on 7 or 8 -- but
> > the
> > > vast majority will deploy on 8 simply because 7 is no longer supported
> > > without a premium contract with Oracle. 8 also has a more advanced G1GC
> > > implementation (see CASSANDRA-7486).
> > >
> > > I think that gaining access to the new features in 8 as we develop 3.x
> is
> > > worth losing the ability to run on a platform that will have been EOL
> > for a
> > > couple months by the time we release.
> > >
> > > --
> > > Jonathan Ellis
> > > Project Chair, Apache Cassandra
> > > co-founder, http://www.datastax.com
> > > @spyced
> > >
> >
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Proposal: release 2.2 (based on current trunk) before 3.0 (based on 8099)

2015-05-09 Thread Jonathan Ellis
With 8099 still weeks from being code complete, and even longer from being
stable, I’m starting to think we should decouple everything that’s already
done in trunk from 8099.  That is, ship 2.2 ASAP with

   - Windows support
   - UDF
   - Role-based permissions
   - JSON
   - Compressed commitlog
   - Off-heap row cache
   - Message coalescing on by default
   - Native protocol v4

and let 3.0 ship with 8099 and a few things that finish by then (vnode
compaction, file-based hints, maybe materialized views).

Remember that we had 7 release candidates for 2.1.  Splitting 2.2 and 3.0 up
this way will reduce the risk in both 2.2 and 3.0 by separating most of the
new features from the big engine change.  We might still have a lot of
stabilization to do for either or both, but at the least this lets us get a
head start on testing the new features in 2.2.

This does introduce a new complication, which is that instead of 3.0 being an
unusually long time after 2.1, it will be an unusually short time after 2.2.
The “default” if we follow established practice would be to

   - EOL 2.1 when 3.0 ships, and maintain 2.2.x and 3.0.x stabilization
     branches

But, this is probably not the best investment we could make for our users
since 2.2 and 3.0 are relatively close in functionality.  I see a couple
other options without jumping to 3 concurrent stabilization series:

   - Extend 2.1.x series and 2.2.x until 4.0, but skip 3.0.x stabilization
     series in favor of tick-tock 3.x
   - Extend 2.1.x series until 4.0, but stop 2.2.x when 3.0 ships in favor of
     developing 3.0.x instead

Thoughts?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Proposal: release 2.2 (based on current trunk) before 3.0 (based on 8099)

2015-05-11 Thread Jonathan Ellis
On Sun, May 10, 2015 at 2:42 PM, Aleksey Yeschenko 
wrote:

> 3.0, however, will require a stabilisation period, just by the nature of
> it. It might seem like 2.2 and 3.0 are closer to each other than 2.1 and
> 2.2 are, if you go purely by the feature list, but in fact the opposite is
> true.
>

You are probably right.  But let me push back on some of the extra work
you're proposing just a little:

> 1) 2.0.x branch goes EOL when 3.0 is out, as planned
>

3.0 was, however unrealistically, planned for April.  And it's moving the
goalposts to say the plan was always to keep 2.0.x for three major
releases; the plan was to EOL with "the next major release after 2.1"
whether that was called 3.0 or not.  So I think EOLing 2.0.x when 2.2 comes
out is reasonable, especially considering that 2.2 is realistically a month
or two away even if we can get a beta out this week.

> 2) 3.0.x LTS branch stays, as planned, and helps us stabilise the new
> storage engine
>

Yes.


> 3) in a few months after 2.2 gets released, we EOL 2.1. Users upgrade to
> 2.2, get the same stability as with 2.1.7, plus a few new features
>

If push comes to shove I'm okay being ambiguous here, but can we just say
"when 3.0 is released we EOL 2.1?"

P.S. The area I'm most concerned about introducing destabilizing changes in
2.2 is commitlog; I will follow up to make sure we have a solid QA plan
there.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Proposal: release 2.2 (based on current trunk) before 3.0 (based on 8099)

2015-05-11 Thread Jonathan Ellis
Sounds good.  I will add the new version to Jira.

Planned tickets to block 2.2 beta for:

#8374
#8984
#9190

Any others?  (If it's not code complete today we should not block for it.)


On Mon, May 11, 2015 at 1:59 PM, Aleksey Yeschenko 
wrote:

> > So I think EOLing 2.0.x when 2.2 comes
> > out is reasonable, especially considering that 2.2 is realistically a
> month
> > or two away even if we can get a beta out this week.
>
> Given how long 2.0.x has been alive now, and the stability of 2.1.x at the
> moment, I’d say it’s fair enough to EOL 2.0 as soon as 2.2 gets out. Can’t
> argue here.
>
> > If push comes to shove I'm okay being ambiguous here, but can we just
> say
> > "when 3.0 is released we EOL 2.1?"
>
> Under our current projections, that’ll be exactly “a few months after 2.2
> is released”, so I’m again fine with it.
>
> > P.S. The area I'm most concerned about introducing destabilizing changes
> in
> > 2.2 is commitlog
>
> So long as you don’t use compressed CL, you should be solid. You are
> probably solid even if you do use compressed CL.
>
> Here are my only concerns:
>
> 1. New authz are not opt-in. If a user implements their own custom
> authenticator or authorizer, they’d have to upgrade them sooner. The test
> coverage for new authnz, however, is better than the coverage we used to
> have before.
>
> 2. CQL2 is gone from 2.2. Might force those who use it to migrate faster. In
> practice, however, I highly doubt that anybody using CQL2 is also someone
> who’d already switched to 2.1.x or 2.2.x.
>
>
> --
> AY
>
> On May 11, 2015 at 21:12:26, Jonathan Ellis (jbel...@gmail.com) wrote:
>
> On Sun, May 10, 2015 at 2:42 PM, Aleksey Yeschenko 
> wrote:
>
> > 3.0, however, will require a stabilisation period, just by the nature of
> > it. It might seem like 2.2 and 3.0 are closer to each other than 2.1 and
> > 2.2 are, if you go purely by the feature list, but in fact the opposite
> is
> > true.
> >
>
> You are probably right. But let me push back on some of the extra work
> you're proposing just a little:
>
> 1) 2.0.x branch goes EOL when 3.0 is out, as planned
> >
>
> 3.0 was, however unrealistically, planned for April. And it's moving the
> goalposts to say the plan was always to keep 2.0.x for three major
> releases; the plan was to EOL with "the next major release after 2.1"
> whether that was called 3.0 or not. So I think EOLing 2.0.x when 2.2 comes
> out is reasonable, especially considering that 2.2 is realistically a month
> or two away even if we can get a beta out this week.
>
> 2) 3.0.x LTS branch stays, as planned, and helps us stabilise the new
> > storage engine
> >
>
> Yes.
>
>
> > 3) in a few months after 2.2 gets released, we EOL 2.1. Users upgrade to
> > 2.2, get the same stability as with 2.1.7, plus a few new features
> >
>
> If push comes to shove I'm okay being ambiguous here, but can we just say
> "when 3.0 is released we EOL 2.1?"
>
> P.S. The area I'm most concerned about introducing destabilizing changes in
> 2.2 is commitlog; I will follow up to make sure we have a solid QA plan
> there.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Proposal: release 2.2 (based on current trunk) before 3.0 (based on 8099)

2015-05-11 Thread Jonathan Ellis
Unresolved issues tagged for 2.2b1:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20fixVersion%20%3D%20%222.2%20beta%201%22%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC

On Mon, May 11, 2015 at 2:42 PM, Jonathan Ellis  wrote:

> Sounds good.  I will add the new version to Jira.
>
> Planned tickets to block 2.2 beta for:
>
> #8374
> #8984
> #9190
>
> Any others?  (If it's not code complete today we should not block for it.)
>
>
> On Mon, May 11, 2015 at 1:59 PM, Aleksey Yeschenko 
> wrote:
>
>> > So I think EOLing 2.0.x when 2.2 comes
>> > out is reasonable, especially considering that 2.2 is realistically a
>> month
>> > or two away even if we can get a beta out this week.
>>
>> Given how long 2.0.x has been alive now, and the stability of 2.1.x at
>> the moment, I’d say it’s fair enough to EOL 2.0 as soon as 2.2 gets out.
>> Can’t argue here.
>>
>> > If push comes to shove I'm okay being ambiguous here, but can we just
>> say
>> > "when 3.0 is released we EOL 2.1?"
>>
>> Under our current projections, that’ll be exactly “a few months after 2.2
>> is released”, so I’m again fine with it.
>>
>> > P.S. The area I'm most concerned about introducing destabilizing
>> changes in
>> > 2.2 is commitlog
>>
>> So long as you don’t use compressed CL, you should be solid. You are
>> probably solid even if you do use compressed CL.
>>
>> Here are my only concerns:
>>
>> 1. New authz are not opt-in. If a user implements their own custom
>> authenticator or authorizer, they’d have to upgrade them sooner. The test
>> coverage for new authnz, however, is better than the coverage we used to
>> have before.
>>
>> 2. CQL2 is gone from 2.2. Might force those who use it to migrate faster. In
>> practice, however, I highly doubt that anybody using CQL2 is also someone
>> who’d already switched to 2.1.x or 2.2.x.
>>
>>
>> --
>> AY
>>
>> On May 11, 2015 at 21:12:26, Jonathan Ellis (jbel...@gmail.com) wrote:
>>
>> On Sun, May 10, 2015 at 2:42 PM, Aleksey Yeschenko 
>> wrote:
>>
>> > 3.0, however, will require a stabilisation period, just by the nature of
>> > it. It might seem like 2.2 and 3.0 are closer to each other than 2.1 and
>> > 2.2 are, if you go purely by the feature list, but in fact the opposite
>> is
>> > true.
>> >
>>
>> You are probably right. But let me push back on some of the extra work
>> you're proposing just a little:
>>
>> 1) 2.0.x branch goes EOL when 3.0 is out, as planned
>> >
>>
>> 3.0 was, however unrealistically, planned for April. And it's moving the
>> goalposts to say the plan was always to keep 2.0.x for three major
>> releases; the plan was to EOL with "the next major release after 2.1"
>> whether that was called 3.0 or not. So I think EOLing 2.0.x when 2.2 comes
>> out is reasonable, especially considering that 2.2 is realistically a
>> month
>> or two away even if we can get a beta out this week.
>>
>> 2) 3.0.x LTS branch stays, as planned, and helps us stabilise the new
>> > storage engine
>> >
>>
>> Yes.
>>
>>
>> > 3) in a few months after 2.2 gets released, we EOL 2.1. Users upgrade to
>> > 2.2, get the same stability as with 2.1.7, plus a few new features
>> >
>>
>> If push comes to shove I'm okay being ambiguous here, but can we just say
>> "when 3.0 is released we EOL 2.1?"
>>
>> P.S. The area I'm most concerned about introducing destabilizing changes
>> in
>> 2.2 is commitlog; I will follow up to make sure we have a solid QA plan
>> there.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder, http://www.datastax.com
>> @spyced
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Proposal: release 2.2 (based on current trunk) before 3.0 (based on 8099)

2015-05-11 Thread Jonathan Ellis
I do like 2.2 and 3.0 over 3.0 and 3.1 because going from 2.x to 3.x
signals that 8099 really is a big change.

On Mon, May 11, 2015 at 3:28 PM, Alex Popescu  wrote:

> On Sun, May 10, 2015 at 2:14 PM, Robert Stupp  wrote:
>
> > Instead of labeling it 2.2, I’d like to propose to label it 3.0 (so
> > basically just move 8099 to 3.1).
> > In the end it’s “only a label”. But there are a lot of new user-facing
> > features in it that justify a major release.
> >
>
> +1 on labeling the proposed 2.2 as 3.0 and moving 8099 to 3.1
>
> 1. Tons of new features that feel more than just a 2.2
> 2. The majority of features planned for 3.0 are actually ready for this
> version
> 3. In order to avoid compatibility questions (and version compatibility
> matrices), the drivers developed by DataStax have
> followed the Cassandra versions so far. The Python and C# drivers are
> already at 2.5 as they added some major features.
>
>    Renaming the proposed 2.2 as 3.0 would allow us to continue to use this
>    versioning policy until all drivers are supporting
>    the latest Cassandra version and continue to not require a user to check
>    a compatibility matrix.
>
>
> --
> Bests,
>
> Alex Popescu | @al3xandru
> Sen. Product Manager @ DataStax
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Proposal: release 2.2 (based on current trunk) before 3.0 (based on 8099)

2015-05-12 Thread Jonathan Ellis
Added those to the list, thanks.

On Tue, May 12, 2015 at 3:30 AM, Robert Stupp  wrote:

> I’ve got one - UDF using ecj instead of javassist (
> https://issues.apache.org/jira/browse/CASSANDRA-8241). Not sure whether
> the licensing thing is fine that way (about what “appropriately labeled”
> really means in https://www.apache.org/legal/resolved.html#category-b).
>
> One thing that may be annoying when using UDFs w/ tuples & UDTs is #9186.
> It’s about "frozen" getting lost in the signature.
>
> Probably also include #9229 (timeuuid to date/time conversion) ?
>
>
> > On 12.05.2015 at 09:05, Marcus Eriksson wrote:
> >
> > We should get https://issues.apache.org/jira/browse/CASSANDRA-8568 in
> 2.2
> > as well (it is patch avail and I'll get it reviewed this week)
> >
> > /Marcus
> >
> > On Mon, May 11, 2015 at 9:42 PM, Jonathan Ellis 
> wrote:
> >
> >> Sounds good.  I will add the new version to Jira.
> >>
> >> Planned tickets to block 2.2 beta for:
> >>
> >> #8374
> >> #8984
> >> #9190
> >>
> >> Any others?  (If it's not code complete today we should not block for
> it.)
> >>
> >>
> >> On Mon, May 11, 2015 at 1:59 PM, Aleksey Yeschenko 
> >> wrote:
> >>
> >>>> So I think EOLing 2.0.x when 2.2 comes
> >>>> out is reasonable, especially considering that 2.2 is realistically a
> >>> month
> >>>> or two away even if we can get a beta out this week.
> >>>
> >>> Given how long 2.0.x has been alive now, and the stability of 2.1.x at
> >> the
> >>> moment, I’d say it’s fair enough to EOL 2.0 as soon as 2.2 gets out.
> >> Can’t
> >>> argue here.
> >>>
> >>>> If push comes to shove I'm okay being ambiguous here, but can we just
> >>> say
> >>>> "when 3.0 is released we EOL 2.1?"
> >>>
> >>> Under our current projections, that’ll be exactly “a few months after
> 2.2
> >>> is released”, so I’m again fine with it.
> >>>
> >>>> P.S. The area I'm most concerned about introducing destabilizing
> >> changes
> >>> in
> >>>> 2.2 is commitlog
> >>>
> >>> So long as you don’t use compressed CL, you should be solid. You are
> >>> probably solid even if you do use compressed CL.
> >>>
> >>> Here are my only concerns:
> >>>
> >>> 1. New authz are not opt-in. If a user implements their own custom
> >>> authenticator or authorizer, they’d have to upgrade them sooner. The
> >>> test coverage for new authnz, however, is better than the coverage we
> >>> used to have before.
> >>>
> >>> 2. CQL2 is gone from 2.2. Might force those who use it to migrate
> >>> faster. In practice, however, I highly doubt that anybody using CQL2 is
> >>> also someone who’d already switched to 2.1.x or 2.2.x.
> >>>
> >>>
> >>> --
> >>> AY
> >>>
> >>> On May 11, 2015 at 21:12:26, Jonathan Ellis (jbel...@gmail.com) wrote:
> >>>
> >>> On Sun, May 10, 2015 at 2:42 PM, Aleksey Yeschenko  >
> >>> wrote:
> >>>
> >>>> 3.0, however, will require a stabilisation period, just by the nature
> >> of
> >>>> it. It might seem like 2.2 and 3.0 are closer to each other than 2.1
> >> and
> >>>> 2.2 are, if you go purely by the feature list, but in fact the
> opposite
> >>> is
> >>>> true.
> >>>>
> >>>
> >>> You are probably right. But let me push back on some of the extra work
> >>> you're proposing just a little:
> >>>
> >>> 1) 2.0.x branch goes EOL when 3.0 is out, as planned
> >>>>
> >>>
> >>> 3.0 was, however unrealistically, planned for April. And it's moving
> the
> >>> goalposts to say the plan was always to keep 2.0.x for three major
> >>> releases; the plan was to EOL with "the next major release after 2.1"
> >>> whether that was called 3.0 or not. So I think EOLing 2.0.x when 2.2
> >> comes
> >>> out is reasonable, especially considering that 2.2 is realistically a
> >> month
> >>> or two away even if we can get a beta out this week.
> >>>
> >>> 2) 3.0.x LTS branch stays, as planned, and helps us stabilise the new
> >>>> storage engine
> >>>>
> >>>
> >>> Yes.
> >>>
> >>>
> >>>> 3) in a few months after 2.2 gets released, we EOL 2.1. Users upgrade
> >> to
> >>>> 2.2, get the same stability as with 2.1.7, plus a few new features
> >>>>
> >>>
> >>> If push comes to shove I'm okay being ambiguous here, but can we just
> say
> >>> "when 3.0 is released we EOL 2.1?"
> >>>
> >>> P.S. The area I'm most concerned about introducing destabilizing
> changes
> >> in
> >>> 2.2 is commitlog; I will follow up to make sure we have a solid QA plan
> >>> there.
> >>>
> >>> --
> >>> Jonathan Ellis
> >>> Project Chair, Apache Cassandra
> >>> co-founder, http://www.datastax.com
> >>> @spyced
> >>>
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder, http://www.datastax.com
> >> @spyced
> >>
>
> —
> Robert Stupp
> @snazy
>
>


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.0.15

2015-05-13 Thread Jonathan Ellis
+1

On Wed, May 13, 2015 at 11:00 AM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.0.15.
>
> sha1: 418deaf6ca1d0ad2e95d13abc7b18dbd51e676e7
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.0.15-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1054/org/apache/cassandra/apache-cassandra/2.0.15/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1054/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/1hZh8a (CHANGES.txt)
> [2]: http://goo.gl/weKkFT (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.2.0-beta1

2015-05-18 Thread Jonathan Ellis
+1

On Sun, May 17, 2015 at 9:34 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.2.0-beta1.
>
> sha1: 1735249ebfdbf139ca95507d591a324dfe81da33
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.0-beta1-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1057/org/apache/cassandra/apache-cassandra/2.2.0-beta1/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1057/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> Since this is beta, the vote will be open for 24 hours.
>
> [1]: http://goo.gl/YsRffc (CHANGES.txt)
> [2]: http://goo.gl/jFb87y (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.1.6

2015-06-05 Thread Jonathan Ellis
+1

On Fri, Jun 5, 2015 at 10:05 AM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.1.6.
>
> sha1: e469f32be180a1e493227111649d067a35201e97
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.6-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1058/org/apache/cassandra/apache-cassandra/2.1.6/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1058/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/lTrkx1 (CHANGES.txt)
> [2]: http://goo.gl/93P7VA (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.2.0-rc1

2015-06-05 Thread Jonathan Ellis
+1

On Fri, Jun 5, 2015 at 10:27 AM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.2.0-rc1.
>
> sha1: b0ae285bdc7377a64ed92f01c67ff46b40ecaac0
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.0-rc1-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1059/org/apache/cassandra/apache-cassandra/2.2.0-rc1/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1059/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/B6LLNm (CHANGES.txt)
> [2]: http://goo.gl/mr638q (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.1.7

2015-06-19 Thread Jonathan Ellis
+1

On Fri, Jun 19, 2015 at 7:59 AM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.1.7.
>
> sha1: 718c144324d170535d4f1a1e79dd9869cce19ed1
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.7-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1061/org/apache/cassandra/apache-cassandra/2.1.7/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1061/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/FuQlgX (CHANGES.txt)
> [2]: http://goo.gl/zeB2Os (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.0.16

2015-06-19 Thread Jonathan Ellis
+1

On Fri, Jun 19, 2015 at 7:56 AM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.0.16.
>
> sha1: 23e66a9d1c50e4331e8c1d212c2eeb940c5471fa
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.0.16-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1060/org/apache/cassandra/apache-cassandra/2.0.16/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1060/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/stVnvu (CHANGES.txt)
> [2]: http://goo.gl/nkYblK (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.1.8

2015-07-06 Thread Jonathan Ellis
+1

On Mon, Jul 6, 2015 at 12:04 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.1.8.
>
> sha1: db39257c34152f6ccf8d53784cea580dbfe1edad
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.8-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1063/org/apache/cassandra/apache-cassandra/2.1.8/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1063/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/BFYiEO (CHANGES.txt)
> [2]: http://goo.gl/24XaPp (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.2.0-rc2

2015-07-06 Thread Jonathan Ellis
+1

On Mon, Jul 6, 2015 at 1:47 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.2.0-rc2.
>
> sha1: ebc50d783505854f04f183297ad3009b9095b07e
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.0-rc2-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1065/org/apache/cassandra/apache-cassandra/2.2.0-rc2/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1065/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/C1QdHh (CHANGES.txt)
> [2]: http://goo.gl/NPABEq (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Wiki

2015-07-15 Thread Jonathan Ellis
Done.

On Wed, Jul 15, 2015 at 7:48 AM, Benjamin Lerer  wrote:

> Hi,
> Could you add me to the *Contributors* with permission to edit the
> Cassandra wiki?
> My username is BenjaminLerer.
>
> Thanks a lot,
>
> Benjamin
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.2.0

2015-07-17 Thread Jonathan Ellis
+1

On Fri, Jul 17, 2015 at 1:06 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.2.0.
>
> sha1: 437bb9de77f54aa5a4a6a634ab3d2c753a17b3fc
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.0-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1067/org/apache/cassandra/apache-cassandra/2.2.0/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1067/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/3FbKhG (CHANGES.txt)
> [2]: http://goo.gl/sMGs53 (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 3.0.0-alpha1

2015-07-31 Thread Jonathan Ellis
+1

On Fri, Jul 31, 2015 at 8:42 AM, Jake Luciani  wrote:

> I propose the following artifacts for release as 3.0.0-alpha1.
>
> sha1: b090ed6938c0fad792e51757384bd5ac7f35a301
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.0-alpha1-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1070/org/apache/cassandra/apache-cassandra/3.0.0-alpha1/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1070/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/oowQCQ (CHANGES.txt)
> [2]: http://goo.gl/s0RHg4 (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [RELEASE] Apache Cassandra 3.0.0-alpha1 released

2015-08-03 Thread Jonathan Ellis
To reduce the temptation to feature creep, I've created a 3.0 branch.
Let's keep new features to trunk where possible to keep the QA burden of
3.0 as low as possible.

On Mon, Aug 3, 2015 at 11:19 AM, Jake Luciani  wrote:

>
> The Cassandra team is pleased to announce the release of Apache Cassandra
> version 3.0.0-alpha1.
>
> This is the first test build of Cassandra 3.0 that includes:
>
>   * New storage engine
>   * New sstable format
>   * Materialized Views
>
> We expect bugs in this release so test and report any issues please!
>
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is an *ALPHA* release[1] on the 3.0 series. As always, please
> pay attention to the release notes[2] and let us know[3] if you encounter
> any problems.
>
> Enjoy!
>
> [1]: http://goo.gl/qTe3Ed (CHANGES.txt)
> [2]: http://goo.gl/eMIDGw (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
>


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: I want to develop transactions for Cassandra and I want your feedback

2015-08-07 Thread Jonathan Ellis
Have you seen RAMP transactions?

I think that's a much better fit for C* than fully linearizable operations
cross-partition.

https://issues.apache.org/jira/browse/CASSANDRA-7056

On Fri, Aug 7, 2015 at 7:56 AM, Marek Lewandowski <
marekmlewandow...@gmail.com> wrote:

> actually I have been also thinking about doing something like redundant
> execution of transaction. So you have this *single active thing* that
> executes transaction, but you can also have redundancy of form of other
> _followers_ that try to execute same transactions (like a dry-run) and upon
> detection of failure of *single active thing* one of them could pick
> transaction execution and finish it. Still it's a little bit vague and
> needs a lot more details, but now system could recover from failure of this
> _single active thing_. What do you think?
>
> 2015-08-07 14:48 GMT+02:00 Robert Stupp :
>
> >
> > > On 07 Aug 2015, at 14:35, Marek Lewandowski <
> marekmlewandow...@gmail.com>
> > wrote:
> > >
> > > In both of my ideas there
> > > is some central piece.
> >
> >
> > That’s the point - a single thing. A single thing IS a
> > single-point-of-failure.
> > Sorry to reply that drastically: that’s an absolute no-go in C*. Every
> > node must be equal - no special “this” or special “that”.
> >
> > —
> > Robert Stupp
> > @snazy
> >
> >
>
>
> --
> Marek Lewandowski
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: I want to develop transactions for Cassandra and I want your feedback

2015-08-07 Thread Jonathan Ellis
There is a lot of interest in ramp, but the dependency on requiring a
unique timestamp id is a bitch.

There is zero interest in committing and maintaining a more heavyweight
framework to get all the way to serializable cross-partition transactions.

On Fri, Aug 7, 2015 at 2:42 PM, Marek Lewandowski <
marekmlewandow...@gmail.com> wrote:

> Hi Jonathan,
>
> I haven’t heard about it before, but now I’ve read it and it indeed offers
> something interesting. I’ve read blog post, paper and comments at Jira so I
> need to digest it a bit and let it sink in. Thanks for letting me know
> about it.
>
> Can you tell me something more about the status of that feature? Would you
> like to have it?
> From what I see, discussion stopped year ago and it has minor priority so
> it doesn’t seem like a hot subject that everyone awaits.
>
> Maybe I can incorporate that as a building block for something more
> functional. While reading I noticed that some concepts resemble what I’ve
> been thinking about, but here it is obviously much more detailed and
> specified. I need to digest it.
>
> > On 07 Aug 2015, at 18:05, Jonathan Ellis  wrote:
> >
> > Have you seen RAMP transactions?
> >
> > I think that's a much better fit for C* than fully linearizable
> operations
> > cross-partition.
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-7056
> >
> > On Fri, Aug 7, 2015 at 7:56 AM, Marek Lewandowski <
> > marekmlewandow...@gmail.com> wrote:
> >
> >> actually I have been also thinking about doing something like redundant
> >> execution of transaction. So you have this *single active thing* that
> >> executes transaction, but you can also have redundancy of form of other
> >> _followers_ that try to execute same transactions (like a dry-run) and
> upon
> >> detection of failure of *single active thing* one of them could pick
> >> transaction execution and finish it. Still it's a little bit vague and
> >> needs a lot more details, but now system could recover from failure of
> this
> >> _single active thing_. What do you think?
> >>
> >> 2015-08-07 14:48 GMT+02:00 Robert Stupp :
> >>
> >>>
> >>>> On 07 Aug 2015, at 14:35, Marek Lewandowski <
> >> marekmlewandow...@gmail.com>
> >>> wrote:
> >>>>
> >>>> In both of my ideas there
> >>>> is some central piece.
> >>>
> >>>
> >>> That’s the point - a single thing. A single thing IS a
> >>> single-point-of-failure.
> >>> Sorry to reply that drastically: that’s an absolute no-go in C*. Every
> >>> node must be equal - no special “this” or special “that”.
> >>>
> >>> —
> >>> Robert Stupp
> >>> @snazy
> >>>
> >>>
> >>
> >>
> >> --
> >> Marek Lewandowski
> >>
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder, http://www.datastax.com
> > @spyced
>
>


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: I want to develop transactions for Cassandra and I want your feedback

2015-08-08 Thread Jonathan Ellis
That's what I'm saying, yes.

On Sat, Aug 8, 2015 at 5:18 AM, Marek Lewandowski <
marekmlewandow...@gmail.com> wrote:

> So basically, you are saying that even if I had developed something to
> provide serializable cross-partition transactions still nobody cares,
> nobody wants it because it would be too complex and for sure not performant
> enough?
>
> I just want to hear it crystal clear, so that I can talk to my supervisor
> and redirect my efforts to something more useful for you guys like this
> ramp for example.
> > On 07 Aug 2015, at 23:32, Jonathan Ellis  wrote:
> >
> > There is a lot of interest in ramp, but the dependency on requiring a
> > unique timestamp id is a bitch.
> >
> > There is zero interest in committing and maintaining a more heavyweight
> > framework to get all the way to serializable cross-partition
> transactions.
> >
> > On Fri, Aug 7, 2015 at 2:42 PM, Marek Lewandowski <
> > marekmlewandow...@gmail.com> wrote:
> >
> >> Hi Jonathan,
> >>
> >> I haven’t heard about it before, but now I’ve read it and it indeed
> offers
> >> something interesting. I’ve read blog post, paper and comments at Jira
> so I
> >> need to digest it a bit and let it sink in. Thanks for letting me know
> >> about it.
> >>
> >> Can you tell me something more about the status of that feature? Would
> you
> >> like to have it?
> >> From what I see, discussion stopped year ago and it has minor priority
> so
> >> it doesn’t seem like a hot subject that everyone awaits.
> >>
> >> Maybe I can incorporate that as a building block for something more
> >> functional. While reading I noticed that some concepts resemble what
> I’ve
> >> been thinking about, but here it is obviously much more detailed and
> >> specified. I need to digest it.
> >>
> >>> On 07 Aug 2015, at 18:05, Jonathan Ellis  wrote:
> >>>
> >>> Have you seen RAMP transactions?
> >>>
> >>> I think that's a much better fit for C* than fully linearizable
> >> operations
> >>> cross-partition.
> >>>
> >>> https://issues.apache.org/jira/browse/CASSANDRA-7056
> >>>
> >>> On Fri, Aug 7, 2015 at 7:56 AM, Marek Lewandowski <
> >>> marekmlewandow...@gmail.com> wrote:
> >>>
> >>>> actually I have been also thinking about doing something like
> redundant
> >>>> execution of transaction. So you have this *single active thing* that
> >>>> executes transaction, but you can also have redundancy of form of
> other
> >>>> _followers_ that try to execute same transactions (like a dry-run) and
> >> upon
> >>>> detection of failure of *single active thing* one of them could pick
> >>>> transaction execution and finish it. Still it's a little bit vague and
> >>>> needs a lot more details, but now system could recover from failure of
> >> this
> >>>> _single active thing_. What do you think?
> >>>>
> >>>> 2015-08-07 14:48 GMT+02:00 Robert Stupp :
> >>>>
> >>>>>
> >>>>>> On 07 Aug 2015, at 14:35, Marek Lewandowski <
> >>>> marekmlewandow...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> In both of my ideas there
> >>>>>> is some central piece.
> >>>>>
> >>>>>
> >>>>> That’s the point - a single thing. A single thing IS a
> >>>>> single-point-of-failure.
> >>>>> Sorry to reply that drastically: that’s an absolute no-go in C*.
> Every
> >>>>> node must be equal - no special “this” or special “that”.
> >>>>>
> >>>>> —
> >>>>> Robert Stupp
> >>>>> @snazy
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Marek Lewandowski
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Jonathan Ellis
> >>> Project Chair, Apache Cassandra
> >>> co-founder, http://www.datastax.com
> >>> @spyced
> >>
> >>
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder, http://www.datastax.com
> > @spyced
>
>


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Automatic scheduling & execution of repair

2015-08-13 Thread Jonathan Ellis
Now that we have LWT I think it could be self-coordinated.

On Thu, Aug 13, 2015 at 6:45 AM, Marcus Olsson 
wrote:

> Hi,
>
> Scheduling and running repairs in a Cassandra cluster is most often a
> required task, but this can both be hard for new users and it also requires
> a bit of manual configuration. There are good tools out there that can be
> used to simplify things, but wouldn't this be a good feature to have inside
> of Cassandra? To automatically schedule and run repairs, so that when you
> start up your cluster it basically maintains itself in terms of normal
> anti-entropy, with the possibility for manual configuration.
>
> BR
> Marcus Olsson
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Should we align JMX Authentication with internal Cassandra authentication?

2015-08-17 Thread Jonathan Ellis
So we would create a new permission class for JMX?

On Mon, Aug 17, 2015 at 8:34 AM, Jan Karlsson 
wrote:

> Hello fellow devs,
>
> Currently JMX authentication is handled by using a password file. I was
> thinking we could make this a little more sophisticated.
> What I propose is, we plug JMX so that it uses the internal authenticator
> Cassandra has to authenticate the user. This will allow us to authenticate
> towards the Cassandra database. (No pesky password file)
> We can make this pluggable so that users could add their own authenticator
> and authorizer.
>
> I have created a JIRA ticket on this as I have been working on a solution
> which would let us do precisely this.
> https://issues.apache.org/jira/browse/CASSANDRA-10091
>
> What are your thoughts on this?
>
> Best Regards
> Jan Karlsson
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 3.0.0-beta1

2015-08-21 Thread Jonathan Ellis
+1

On Fri, Aug 21, 2015 at 10:14 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 3.0.0-beta1.
>
> sha1: 356c755a3b7aa1c71f72cf81fbe810670bd71de7
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.0-beta1-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1072/org/apache/cassandra/apache-cassandra/3.0.0-beta1/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1072/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 48 hours (longer if needed).
>
> [1]: http://goo.gl/fyezu5 (CHANGES.txt)
> [2]: http://goo.gl/iVuCQU (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.1.9

2015-08-25 Thread Jonathan Ellis
+1

On Tue, Aug 25, 2015 at 9:01 AM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.1.9.
>
> sha1: 7d74563a25cb34784ae3dca05fe503bdb60f5fe5
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.9-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1073/org/apache/cassandra/apache-cassandra/2.1.9/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1073/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/FRGAh2 (CHANGES.txt)
> [2]: http://goo.gl/7HLc2N (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.2.1

2015-08-25 Thread Jonathan Ellis
+1

On Tue, Aug 25, 2015 at 12:45 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.2.1.
>
> sha1: 323890647718d6e061349cf8cbe857b95bd02b13
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.1-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1074/org/apache/cassandra/apache-cassandra/2.2.1/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1074/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/3ifmBZ (CHANGES.txt)
> [2]: http://goo.gl/7rQrRR (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.2.1 (Attempt #2)

2015-08-28 Thread Jonathan Ellis
+1

On Fri, Aug 28, 2015 at 9:04 AM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.2.1.
>
> sha1: 01a11fd2626d57bf0c8d0bce1e43060017592896
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.1-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1076/org/apache/cassandra/apache-cassandra/2.2.1/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1076/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/3ifmBZ (CHANGES.txt)
> [2]: http://goo.gl/7rQrRR (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 3.0.0-beta2

2015-09-04 Thread Jonathan Ellis
+1

On Fri, Sep 4, 2015 at 7:57 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 3.0.0-beta2.
>
> sha1: 17528910b82391bd834f1fddce4ff7c9b34ad452
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.0-beta2-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1077/org/apache/cassandra/apache-cassandra/3.0.0-beta2/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1077/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 48 hours (longer if needed).
>
> [1]: http://goo.gl/h3McOM (CHANGES.txt)
> [2]: http://goo.gl/WGb5qo (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Reproduce stale read in Cassandra

2015-09-08 Thread Jonathan Ellis
2 node cluster.  RF=2.  Take one node down.  Do a write.  Bring that node
back up, then take the other node down.  Do a read.  It will be stale since
the node that is up missed the original write, and you took the node that
did see the write down before it could replay the write to the other node.

This is by design though so I'm curious what kind of improvement you are
looking for.
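
A rough way to see the sequence above without a real cluster is a toy
model.  This is only a sketch, not Cassandra code: the Replica and
ToyCluster classes, the hint list, and the explicit replay_hints() call
are all made up for illustration, and it assumes reads and writes are
served by any single live replica (roughly consistency level ONE) and
that the first node is brought back up before the second goes down.

```python
class Replica:
    """Hypothetical stand-in for one node: a liveness flag and a key/value store."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class ToyCluster:
    """Two replicas, RF=2; any single live replica can serve a request (assumption)."""

    def __init__(self):
        self.replicas = [Replica("a"), Replica("b")]
        # Each hint is (holder, target, key, value): a write the target missed,
        # remembered by a replica that was up at the time.
        self.hints = []

    def write(self, key, value):
        live = [r for r in self.replicas if r.up]
        if not live:
            raise RuntimeError("no live replicas")
        for r in self.replicas:
            if r.up:
                r.data[key] = value
            else:
                self.hints.append((live[0], r, key, value))

    def read(self, key):
        for r in self.replicas:
            if r.up:
                return r.data.get(key)
        raise RuntimeError("no live replicas")

    def replay_hints(self):
        # A hint can only be delivered while both holder and target are up.
        remaining = []
        for holder, target, key, value in self.hints:
            if holder.up and target.up:
                target.data[key] = value
            else:
                remaining.append((holder, target, key, value))
        self.hints = remaining


def reproduce_stale_read():
    cluster = ToyCluster()
    a, b = cluster.replicas
    cluster.write("k", "old")   # both replicas see this
    a.up = False                # take one node down
    cluster.write("k", "new")   # only b sees the new value
    a.up = True                 # a returns...
    b.up = False                # ...and b goes down before any replay
    stale = cluster.read("k")   # served by a, which missed the write
    b.up = True
    cluster.replay_hints()      # now the missed write reaches a
    repaired = cluster.read("k")
    return stale, repaired
```

In this toy, reproduce_stale_read() reads the old value from the node
that missed the write, and reads the new value only after both nodes are
up together and the hint has been replayed.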

On Tue, Sep 8, 2015 at 3:57 PM, Cindy Wang 
wrote:

> Hi there,
>
> I am currently researching improvements to Cassandra's consistency levels.
> Since I am new to C*, does anyone know how to reproduce the stale reads on
> the cluster? Is there a transaction lib that can be loaded into it and
> reproduce the stale reads?
>
> ***TEST environment***
>
> AWS 3 instances(nodes) + Ubuntu 14.04 + Cassandra 2.2.1
>
> I plan to reproduce the stale reads first and then take a look at the
> source code trying to find out a way for improvement.
>
> Thanks so much. Any of your help is highly appreciated!!!
>
> Best,
> Xindi
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.0.17

2015-09-16 Thread Jonathan Ellis
+1

On Wed, Sep 16, 2015 at 1:29 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.0.17.
>
> sha1: 3aff44915edbd2bf07955d5b30fd47bf9c4698da
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.0.17-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1078/org/apache/cassandra/apache-cassandra/2.0.17/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1078/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/1g0t58 (CHANGES.txt)
> [2]: http://goo.gl/SHN2y3 (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 3.0.0-rc1

2015-09-19 Thread Jonathan Ellis
+1

On Sat, Sep 19, 2015 at 3:42 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 3.0.0-rc1.
>
> sha1: c95a7098cf77b5b8e96feb7c39aca8fec3a02f9c
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.0-rc1-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1080/org/apache/cassandra/apache-cassandra/3.0.0-rc1/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1080/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 24 hours (longer if needed).
>
> [1]: http://goo.gl/m3OPOV (CHANGES.txt)
> [2]: http://goo.gl/Z0HNhw (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.1.10

2015-10-01 Thread Jonathan Ellis
+1

On Thu, Oct 1, 2015 at 9:17 AM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.1.10.
>
> sha1: 78f2e7aa01d552454fd4270fee8d600c4433df5c
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.10-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1081/org/apache/cassandra/apache-cassandra/2.1.10/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1081/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/GLMVLb (CHANGES.txt)
> [2]: http://goo.gl/uFe1VN (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.2.2

2015-10-01 Thread Jonathan Ellis
+1

On Thu, Oct 1, 2015 at 9:40 AM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.2.2.
>
> sha1: ae9b7e05222b2a25eda5618cf9eb17103e4d6d8b
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.2-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1082/org/apache/cassandra/apache-cassandra/2.2.2/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1082/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/wbbE4n (CHANGES.txt)
> [2]: http://goo.gl/EN7MxO (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.1.11 (Attempt #2)

2015-10-13 Thread Jonathan Ellis
+1

On Tue, Oct 13, 2015 at 12:30 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.1.11.
>
> sha1: a85afbc7a83709da8d96d92fc4154675794ca7fb
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.11-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1086/org/apache/cassandra/apache-cassandra/2.1.11/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1086/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/cfjxJU (CHANGES.txt)
> [2]: http://goo.gl/nOz2X6 (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.2.3 (Attempt #2)

2015-10-13 Thread Jonathan Ellis
+1

On Tue, Oct 13, 2015 at 1:58 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.2.3.
>
> sha1: 89596682e8c495f0c6c76e0f7a21a6db96d00552
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.3-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1087/org/apache/cassandra/apache-cassandra/2.2.3/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1087/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 48 hours (longer if needed).
>
> [1]: http://goo.gl/dmpzjR (CHANGES.txt)
> [2]: http://goo.gl/ACOC01 (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 3.0.0-rc2

2015-10-16 Thread Jonathan Ellis
+1

On Fri, Oct 16, 2015 at 3:33 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 3.0.0-rc2.
>
> sha1: 56a06d78f20237c15a2bc7fb79826818173baead
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.0-rc2-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1088/org/apache/cassandra/apache-cassandra/3.0.0-rc2/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1088/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 48 hours (longer if needed).
>
> [1]: http://goo.gl/aOEO0x (CHANGES.txt)
> [2]: http://goo.gl/Im3ydD (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Modelling Cassandra's Availability

2015-10-20 Thread Jonathan Ellis
Interesting work.  Thanks!

On Mon, Oct 19, 2015 at 2:38 PM, CARLOS PEREZ  wrote:

> Dear Cassandra Developers,
>
> I have recently published a paper about Cassandra's availability made in
> the context of my PhD thesis. In this work I developed two different
> theoretical models of Cassandra's availability. One under persistent
> failures and another under transient failures. Using these models any
> Cassandra user could obtain accurate values of availability under different
> Cassandra configurations and use them to obtain the best configuration for
> any Cassandra system in terms of availability.
>
> The results of this work can be found in the Journal of Parallel and
> Distributed Computing with the title "Modeling the Availability of
> Cassandra". This work has been made under the supervision of Professors
> Jose Miguel-Alonso and Alexander Mendiburu from the University of the
> Basque Country. You can read it in:
>
> http://www.sciencedirect.com/science/article/pii/S074373151500129X
>
> or
>
> http://dx.doi.org/10.1016/j.jpdc.2015.08.001
>
> I hope you'll find it useful. Finally, I would like to thank all of you
> for all your work in Cassandra and for all the information available about
> Cassandra that has made possible this work.
>
> Best regards
>
> Carlos Perez-Miguel
>
>
>


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 3.0.0

2015-11-06 Thread Jonathan Ellis
+1

On Fri, Nov 6, 2015 at 3:34 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 3.0.0.
>
> sha1: 96f407bce56b98cd824d18e32ee012dbb99a0286
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.0-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1089/org/apache/cassandra/apache-cassandra/3.0.0/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1089/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/0vghCL (CHANGES.txt)
> [2]: http://goo.gl/xAElQy (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: cassandra-3.1 branch and new merge order

2015-11-09 Thread Jonathan Ellis
I'm not a huge fan of leaving features to rot unmerged for a couple
months.  What is wrong with "new features go to trunk, stable branches get
forked at release?"

On Mon, Nov 9, 2015 at 10:54 AM, Jake Luciani  wrote:

> Looking back at the tick-tock email chain we never really discussed this.
>
> Rather than having 3.1 and trunk I think we should have just trunk.
>
> I'd rather not let features sit in a branch with bugfixes going on top that
> can decay.
> They should be merged in when it's time to merge features for 3.even, post
> 3.odd.
>
> I know we have features in trunk today that aren't in 3.0 and we probably
> shouldn't have done that.
>
>
>
>
>
>
> On Mon, Nov 9, 2015 at 11:35 AM, Aleksey Yeschenko 
> wrote:
>
> > With 3.0.0 vote to be over soon, tick-tock is officially starting, and we
> > are creating a new branch for cassandra-3.1 release.
> >
> > New merge order: cassandra-2.2 -> cassandra-3.0 -> cassandra-3.1 -> trunk
> >
> > - cassandra-3.0 branch is going to continue representing the 3.0.x series
> > of releases (3.0 bugfixes only, as no new feature are supposed to go into
> > 3.0.x release series)
> > - cassandra-3.1 branch will contain 3.0 bugfixes *only*
> > - trunk represents the upcoming cassandra-3.2 release (fixes from 3.1 and
> > new features)
> >
> > --
> > AY
>
>
>
>
> --
> http://twitter.com/tjake
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE RESULT] Release Apache Cassandra 3.0.0

2015-11-09 Thread Jonathan Ellis
Thanks everyone!

On Mon, Nov 9, 2015 at 5:03 PM, Jake Luciani  wrote:

> With 6 binding +1, 4 non-binding +1 and no -1 the vote passes.  Will
> publish shortly.
>
>
> On Fri, Nov 6, 2015 at 4:34 PM, Jake Luciani  wrote:
>
> > I propose the following artifacts for release as 3.0.0.
> >
> > sha1: 96f407bce56b98cd824d18e32ee012dbb99a0286
> > Git:
> >
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.0-tentative
> > Artifacts:
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1089/org/apache/cassandra/apache-cassandra/3.0.0/
> > Staging repository:
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1089/
> >
> > The artifacts as well as the debian package are also available here:
> > http://people.apache.org/~jake
> >
> > The vote will be open for 72 hours (longer if needed).
> >
> > [1]: http://goo.gl/0vghCL (CHANGES.txt)
> > [2]: http://goo.gl/xAElQy (NEWS.txt)
> >
> >
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Wiki Username: MichaelEdge

2015-11-14 Thread Jonathan Ellis
Added.

On Sat, Nov 14, 2015 at 8:21 PM, Michael Edge 
wrote:

> Hi,
>
> Please add me to wiki.
>
> Wiki Username: MichaelEdge
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.1.12

2015-12-02 Thread Jonathan Ellis
+1

On Wed, Dec 2, 2015 at 10:34 AM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.1.12.
>
> sha1: a6619e56b10580627fbc7863862c5aaebef57518
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.12-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1090/org/apache/cassandra/apache-cassandra/2.1.12/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1090/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/qUf41s (CHANGES.txt)
> [2]: http://goo.gl/sf3fiU (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.2.4

2015-12-02 Thread Jonathan Ellis
+1

On Wed, Dec 2, 2015 at 11:27 AM, Jake Luciani  wrote:

> I propose the following artifacts for release as 2.2.4.
>
> sha1: 16045358a43656a756574cba03f51ab3db6cc2b7
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.4-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1091/org/apache/cassandra/apache-cassandra/2.2.4/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1091/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]:  http://goo.gl/CIJhEb (CHANGES.txt)
> [2]:  http://goo.gl/vNzv9u (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 3.0.1

2015-12-04 Thread Jonathan Ellis
+1

On Fri, Dec 4, 2015 at 3:23 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 3.0.1.
>
> sha1: cf567703db2cc6859731405322f19f55345b5c57
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.1-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1092/org/apache/cassandra/apache-cassandra/3.0.1/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1092/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/fJaAjI (CHANGES.txt)
> [2]: http://goo.gl/DgSi87 (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 3.1

2015-12-04 Thread Jonathan Ellis
+1

On Fri, Dec 4, 2015 at 4:07 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 3.1.
>
> sha1: e092873728dc88aebc6ee10153b9bd3cd90cd858
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.1-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1093/org/apache/cassandra/apache-cassandra/3.1/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1093/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]:  http://goo.gl/HET4Bi (CHANGES.txt)
> [2]:  http://goo.gl/LVqJJo (NEWS.txt)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced

