Re: [VOTE] CEP-21 Transactional Cluster Metadata

2023-02-06 Thread Ariel Weisberg
+1

On Mon, Feb 6, 2023, at 11:15 AM, Sam Tunnicliffe wrote:
> Hi everyone,
> 
> I would like to start a vote on this CEP.
> 
> Proposal:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata
> 
> Discussion:
> https://lists.apache.org/thread/h25skwkbdztz9hj2pxtgh39rnjfzckk7
> 
> The vote will be open for 72 hours.
> A vote passes if there are at least three binding +1s and no binding vetoes.
> 
> Thanks,
> Sam


Re: Welcome Patrick McFadin as Cassandra Committer

2023-02-09 Thread Ariel Weisberg
Welcome Patrick! Thank you for your years of contributions to the community.

On Thu, Feb 2, 2023, at 12:58 PM, Benjamin Lerer wrote:
> The PMC members are pleased to announce that Patrick McFadin has accepted
> the invitation to become committer today.
> 
> Thanks a lot, Patrick, for everything you have done for this project and its 
> community through the years.
> 
> Congratulations and welcome!
> 
> The Apache Cassandra PMC members


Re: [DISCUSS] Lift MessagingService.minimum_version to 40 in trunk

2023-03-21 Thread Ariel Weisberg
Hi,

I am pretty strongly in favor, if only to keep the amount of code we carry for 
serialization/deserialization and for caching serialized sizes across different 
versions under control.

5.0 will have changes that necessitate another messaging version, so it will add 
to the clutter.
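
Concretely, here is a hedged sketch of the version-gated serialization pattern 
this would let us delete (the message type and fields are made up to show the 
shape; this is not actual Cassandra code):

    import java.io.DataOutput;
    import java.io.IOException;

    final class FooSerializer
    {
        static final int VERSION_40 = 40; // mirrors MessagingService.VERSION_40

        // A made-up message with one current field and one legacy-only field
        record Foo(long value, int legacyPayloadSize) {}

        void serialize(Foo foo, DataOutput out, int version) throws IOException
        {
            if (version < VERSION_40)
                out.writeInt(foo.legacyPayloadSize()); // 3.x-only framing, dead once minimum_version=40
            out.writeLong(foo.value());
        }

        long serializedSize(Foo foo, int version)
        {
            long size = Long.BYTES;
            if (version < VERSION_40)
                size += Integer.BYTES; // the same legacy branch duplicated in size computation
            return size;
        }
    }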

Ariel

On Mon, Mar 13, 2023, at 9:05 AM, Mick Semb Wever wrote:
> If we do not recommend and do not test direct upgrades from 3.x to
> 5.x, we have the opportunity to clean up a fair chunk of code by
> making `MessagingService.minimum_version=40`
>
> As Cassandra versions 4.x and 5.0 are all on
> `MessagingService.current_version=40` this would mean lifting
> MessagingService.minimum_version would make it equal to the
> current_version.
>
> Today already we don't allow mixed-version streaming.  The only
> argument I can see for keeping minimum_version=30 is for supporting
> non-streaming messages between 3.x and 5.0 nodes, which I can't find a
> basis for.
>
> An _example_ of the code that can be cleaned up is in the patch
> attached to the ticket:
> CASSANDRA-18314 – Lift MessagingService.minimum_version to 40
>
> What do you think?


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Ariel Weisberg
Hi,

Support for multiple storage backends including remote storage backends is a 
pretty high value piece of functionality. I am happy to see there is interest 
in that.

I think that `ChannelProxyFactory` as an integration point is going to quickly 
turn into a dead end as we get into really using multiple storage backends. We 
need to be able to list files, and really the full range of filesystem 
interactions that Java supports should work with any backend, to make 
development, testing, and reuse of existing code straightforward.

It's a little more work to get C* to create paths for alternate backends where 
appropriate, but that work is probably necessary even with 
`ChannelProxyFactory` and munging UNIX paths (vs supporting multiple 
Filesystems). There will probably also be backend-specific behaviors that show 
up above the `ChannelProxy` layer and depend on the backend.

Ideally there would be some config to specify several backend filesystems and 
their individual configuration that can be used, as well as configuration and 
support for a "backend file router" for file creation (and opening) that can be 
used to route files to the backend most appropriate.
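
As a minimal sketch of what I mean, assuming a hypothetical java.nio 
FileSystemProvider for the backend (the "s3" scheme and paths here are 
illustrative only), existing Files-based code keeps working unchanged:

    import java.net.URI;
    import java.nio.file.DirectoryStream;
    import java.nio.file.FileSystem;
    import java.nio.file.FileSystems;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Map;

    final class BackendRouterSketch
    {
        public static void main(String[] args) throws Exception
        {
            // Assumes a FileSystemProvider for "s3" is on the classpath; purely illustrative
            FileSystem remote = FileSystems.newFileSystem(URI.create("s3://bucket"), Map.of());
            Path table = remote.getPath("/data/ks/tbl");

            // Listing, sizing, and channel I/O via java.nio.file.Files work
            // against any backend once files are created as Paths on it
            try (DirectoryStream<Path> sstables = Files.newDirectoryStream(table, "*-Data.db"))
            {
                for (Path sstable : sstables)
                    System.out.println(sstable + " " + Files.size(sstable));
            }
        }
    }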

Regards,
Ariel

On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of 
> the standard storage space.  
> 
> There are two desires driving this change:
>  1. The ability to temporarily move some keyspaces/tables to storage outside 
> the normal directory tree onto another disk, so that compaction can occur in 
> situations where there is not enough disk space for compaction and the 
> processing of the moved data cannot be suspended.
>  2. The ability to store infrequently used data on slower, cheaper storage 
> layers.
> I have a working POC implementation [2] though there are some issues still to 
> be solved and much logging to be reduced.
> 
> I look forward to productive discussions,
> Claude
> 
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory 
> 
> 


Future direction for the row cache and OHC implementation

2023-12-14 Thread Ariel Weisberg
Hi,

Now seems like a good time to discuss the future direction of the row cache and 
its only implementation OHC (https://github.com/snazy/ohc).

OHC is currently unmaintained and we don’t have the ability to release maven 
artifacts for it or commit to the original repo. I have reached out to the 
original maintainer about it and it seems like if we want to keep using it we 
will need to start releasing it under a new package from a different repo.

I see four directions we could pursue.

1. Fork OHC and start publishing under a new package name and continue to use it
2. Replace OHC with a different cache implementation like Caffeine which would 
move it on heap
3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
later release
4. Do work to make a row cache not necessary and deprecate it later (or maybe 
now)

I would like to find out what people know about row cache usage in the wild so 
we can use that to inform the future direction as well as the general thinking 
about what we should do with it going forward.

Thanks,
Ariel


Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Ariel Weisberg
Hi,

To add some additional context.

The row cache is disabled by default and it is already pluggable, but there 
isn’t a Caffeine implementation present. I think one used to exist and could be 
resurrected.
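
For reference, resurrecting it should not take much code. A minimal sketch of 
an on-heap, size-bounded Caffeine cache (the key/value types stand in for the 
row cache's actual types; this is not the real plugin interface):

    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import com.github.benmanes.caffeine.cache.Weigher;

    final class CaffeineRowCacheSketch<K, V>
    {
        private final Cache<K, V> cache;

        CaffeineRowCacheSketch(long capacityInBytes, Weigher<K, V> weigher)
        {
            // Weigh entries by their size so capacity is expressed in bytes,
            // roughly matching OHC's capacity accounting (but on heap)
            this.cache = Caffeine.newBuilder()
                                 .maximumWeight(capacityInBytes)
                                 .weigher(weigher)
                                 .build();
        }

        V get(K key)             { return cache.getIfPresent(key); }
        void put(K key, V value) { cache.put(key, value); }
        void remove(K key)       { cache.invalidate(key); }
        void clear()             { cache.invalidateAll(); }
    }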

I personally also think that people should be able to scratch their own itch 
row cache wise so removing it entirely just because it isn’t commonly used 
isn’t the right move unless the feature is very far out of scope for Cassandra.

Auto enabling/disabling the cache is a can of worms that could result in 
performance and reliability inconsistency as the DB enables/disables the cache 
based on heuristics when you don’t want it to. It being off by default seems 
good enough to me.

RE forking, we could create a GitHub org for OHC and then add people to it. 
There are some examples of dependencies that haven’t been contributed to the 
project that live outside like CCM and JAMM.

Ariel

On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
> I would avoid taking away a feature even if it works in narrow set of 
> use-cases. I would instead suggest -
> 
> 1. Leave it disabled by default.
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn it 
> off. Cassandra should ideally detect this and do it automatically.
> 3. Move to Caffeine instead of OHC.
> 
> I would suggest having this as the middle ground.
> 
>> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>> 
>>   
>>   
>>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
>>> later release
>> 
>> 
>> 
>> I'm for deprecating and removing it.
>> It constantly trips users up and just causes pain.
>> 
>> Yes it works in some very narrow situations, but those situations often 
>> change over time and again just bites the user.  Without the row-cache I 
>> believe users would quickly find other, more suitable and lasting, solutions.


Re: Future direction for the row cache and OHC implementation

2023-12-15 Thread Ariel Weisberg
Hi,

I did get one response from Robert indicating that he didn’t want to do the 
work to contribute it.

I offered to do the work and asked for permission to contribute it, and got no 
response. I followed up later with a ping and again got no response.

Ariel

On Fri, Dec 15, 2023, at 9:58 PM, Josh McKenzie wrote:
>> I have reached out to the original maintainer about it and it seems like if 
>> we want to keep using it we will need to start releasing it under a new 
>> package from a different repo.
> 
>> the current maintainer is not interested in donating it to the ASF
> Is that the case Ariel or could you just not reach Robert?
> 
> On Fri, Dec 15, 2023, at 11:55 AM, Jeremiah Jordan wrote:
>>> from a maintenance and
>>> integration testing perspective I think it would be better to keep the
>>> ohc in-tree, so we will be aware of any issues immediately after the
>>> full CI run.
>> 
>> From the original email bringing OHC in tree is not an option because the 
>> current maintainer is not interested in donating it to the ASF.  Thus the 
>> option 1 of some set of people forking it to their own github org and 
>> maintaining a version outside of the ASF C* project.
>> 
>> -Jeremiah
>> 
>> On Dec 15, 2023 at 5:57:31 AM, Maxim Muzafarov  wrote:
>>> Ariel,
>>> thank you for bringing this topic to the ML.
>>> 
>>> I may be missing something, so correct me if I'm wrong somewhere in
>>> the management of the Cassandra ecosystem.  As I see it, the problem
>>> right now is that if we fork the ohc and put it under its own root,
>>> the use of that row cache is still not well tested (the same as it is
>>> now). I am particularly emphasising the dependency management side, as
>>> any version change/upgrade in Cassandra and, as a result of that
>>> change a new set of libraries in the classpath should be tested
>>> against this integration.
>>> 
>>> So, unless it is being widely used by someone else outside of the
>>> community (which it doesn't seem to be), from a maintenance and
>>> integration testing perspective I think it would be better to keep the
>>> ohc in-tree, so we will be aware of any issues immediately after the
>>> full CI run.
>>> 
>>> I'm also +1 for not deprecating it, even if it is used in narrow
>>> cases, while the cost of maintaining its source code remains quite low
>>> and it brings some benefits.
>>> 
>>> On Fri, 15 Dec 2023 at 05:39, Ariel Weisberg  wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> To add some additional context.
>>>> 
>>>> The row cache is disabled by default and it is already pluggable, but 
>>>> there isn’t a Caffeine implementation present. I think one used to exist 
>>>> and could be resurrected.
>>>> 
>>>> I personally also think that people should be able to scratch their own 
>>>> itch row cache wise so removing it entirely just because it isn’t commonly 
>>>> used isn’t the right move unless the feature is very far out of scope for 
>>>> Cassandra.
>>>> 
>>>> Auto enabling/disabling the cache is a can of worms that could result in 
>>>> performance and reliability inconsistency as the DB enables/disables the 
>>>> cache based on heuristics when you don’t want it to. It being off by 
>>>> default seems good enough to me.
>>>> 
>>>> RE forking, we could create a GitHub org for OHC and then add people to 
>>>> it. There are some examples of dependencies that haven’t been contributed 
>>>> to the project that live outside like CCM and JAMM.
>>>> 
>>>> Ariel
>>>> 
>>>> On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
>>>> 
>>>> I would avoid taking away a feature even if it works in narrow set of 
>>>> use-cases. I would instead suggest -
>>>> 
>>>> 1. Leave it disabled by default.
>>>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn 
>>>> it off. Cassandra should ideally detect this and do it automatically.
>>>> 3. Move to Caffeine instead of OHC.
>>>> 
>>>> I would suggest having this as the middle ground.
>>>> 
>>>> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in 
>>>> a later release
>>>> 
>>>> 
>>>> 
>>>> 
>>>> I'm for deprecating and removing it.
>>>> It constantly trips users up and just causes pain.
>>>> 
>>>> Yes it works in some very narrow situations, but those situations often 
>>>> change over time and again just bites the user.  Without the row-cache I 
>>>> believe users would quickly find other, more suitable and lasting, 
>>>> solutions.
>>>> 
>>>> 
> 


Re: Future direction for the row cache and OHC implementation

2023-12-18 Thread Ariel Weisberg
Hi,

Thanks for the generous offer. Before you do that, can you give me a chance to 
add back support for Caffeine for the row cache so you can test the option of 
switching back to an on-heap row cache?
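
RE the soft/hard guardrail idea quoted below, a minimal sketch of what that 
could look like (the thresholds and the metrics hook are illustrative 
assumptions, not Cassandra's actual guardrail API):

    final class RowCacheHitRateGuardrail
    {
        static final long   MIN_SAMPLES   = 100_000; // don't judge a cold cache
        static final double WARN_BELOW    = 0.10;    // soft threshold: warn the operator
        static final double DISABLE_BELOW = 0.01;    // hard threshold: disable the cache

        enum Action { NONE, WARN, DISABLE }

        static Action check(long hits, long requests)
        {
            if (requests < MIN_SAMPLES)
                return Action.NONE;
            double hitRate = (double) hits / requests;
            if (hitRate < DISABLE_BELOW)
                return Action.DISABLE;
            return hitRate < WARN_BELOW ? Action.WARN : Action.NONE;
        }
    }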

Ariel

On Thu, Dec 14, 2023, at 9:28 PM, Jon Haddad wrote:
> I think we should probably figure out how much value it actually provides by 
> getting some benchmarks around a few use cases along with some profiling.  
> tlp-stress has a --rowcache flag that I added a while back to be able to do 
> this exact test.  I was looking for a use case to profile and write up so 
> this is actually kind of perfect for me.  I can take a look in January when 
> I'm back from the holidays.
> 
> Jon
> 
> On Thu, Dec 14, 2023 at 5:44 PM Mick Semb Wever  wrote:
>>
>>
>> 
>>> I would avoid taking away a feature even if it works in narrow set of 
>>> use-cases. I would instead suggest -
>>> 
>>> 1. Leave it disabled by default.
>>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn 
>>> it off. Cassandra should ideally detect this and do it automatically.
>>> 3. Move to Caffeine instead of OHC.
>>> 
>>> I would suggest having this as the middle ground.
>> 
>> 
>>  
>> Yes, I'm ok with this. (2) can also be a guardrail: soft value when to warn, 
>> hard value when to disable.


Re: [DISCUSS] CEP-39: Cost Based Optimizer

2024-01-02 Thread Ariel Weisberg
Hi,

I am burying the lede, but it's important to keep an eye on runtime-adaptive vs 
planning-time optimization, as the cost/benefits vary greatly between the two 
and runtime-adaptive can be a game changer. Basically, CBO optimizes for query 
efficiency and startup time at the expense of not handling some queries well, 
while runtime-adaptive execution is cheap/free for expensive queries and can 
handle cases that CBO can't.

Generally speaking I am +1 on the introduction of a CBO, since it seems like 
there exist things that would benefit from it materially (and many of the 
associated refactors/cleanups), and it aligns with my north star that includes 
joins.

Do we all have the same north star that Cassandra should eventually support 
joins? Just curious if that is controversial.

I don't feel like this CEP in particular should need to really nail down 
exactly how distributed estimates work since we can start with using local 
estimates as a proxy for the entire cluster and then improve. If someone has 
bandwidth to do a separate CEP for that then sure that would be great, but this 
seems big enough in scope already.

RE testing, continuity of performance of queries is going to be really 
important. I would really like to see us fuzz the space deterministically, plus 
a collection of hand-rolled cases, and compare performance between versions to 
catch queries that regress. Hopefully we can agree on a baseline for releasing 
where we know what prior release to compare to and what acceptable changes in 
performance are.

RE prepared statements: it feels to me like trying to send the plan blob back 
and forth to get more predictable, but not absolutely predictable, plans is not 
worth it? It feels like a lot for an incremental improvement over a baseline 
that doesn't exist yet; IOW it doesn't feel like something for V1. Maybe it ends 
up in YAGNI territory.

The north star of predictable behavior for queries is a *very* important one 
because it means the world to users, but CBO is going to make mistakes all over 
the place. It's simply unachievable even with accurate statistics because it's 
very hard to tell how predicates will behave on a column.

This segues nicely into the importance of adaptive execution :-) It's how you 
rescue the queries that CBO doesn't handle well for any reason, such as bugs, 
bad statistics, or missing features. Re-ordering predicate evaluation, switching 
indexes, and re-ordering joins can all be done on the fly.
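
To make the predicate re-ordering concrete, here is a hedged sketch of one 
runtime-adaptive technique: track each predicate's observed selectivity and 
periodically evaluate the most selective ones first (illustrative only, and it 
assumes a mutable predicate list; this is not what the CEP proposes verbatim):

    import java.util.Comparator;
    import java.util.List;
    import java.util.function.Predicate;

    final class AdaptiveFilter<T>
    {
        static final class Stats<U>
        {
            final Predicate<U> predicate;
            long evaluated, passed;

            Stats(Predicate<U> predicate) { this.predicate = predicate; }

            // Fraction of evaluated rows that passed; lower filters out more rows
            double selectivity() { return evaluated == 0 ? 0.5 : (double) passed / evaluated; }
        }

        private final List<Stats<T>> predicates; // must be mutable for reorder()

        AdaptiveFilter(List<Stats<T>> predicates) { this.predicates = predicates; }

        boolean matches(T row)
        {
            for (Stats<T> s : predicates)
            {
                s.evaluated++;
                if (!s.predicate.test(row))
                    return false;
                s.passed++;
            }
            return true;
        }

        // Called periodically: evaluate the most selective predicates first,
        // based on observed behavior rather than planner estimates
        void reorder() { predicates.sort(Comparator.comparingDouble(s -> s.selectivity())); }
    }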

CBO is really a performance optimization since adaptive approaches will allow 
any query to complete with some wasted resources.

If my pager were waking me up at night and I wanted to stem the bleeding I 
would reach for runtime adaptive over CBO because I know it will catch more 
cases even if it is slower to execute up front.

What is the nature of the queries we are looking to solve right now? Are they 
long-running heavy hitters, or short queries that explode if run incorrectly, or 
a mix of both?

Ariel

On Tue, Dec 12, 2023, at 8:29 AM, Benjamin Lerer wrote:
> Hi everybody,
> 
> I would like to open the discussion on the introduction of a cost based 
> optimizer to allow Cassandra to pick the best execution plan based on the 
> data distribution, thereby improving the overall query performance.
> 
> This CEP should also lay the groundwork for the future addition of features 
> like joins, subqueries, OR/NOT and index ordering.
> 
> The proposal is here: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
> 
> Thank you in advance for your feedback.


Re: Harry in-tree (Forked from "Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?")

2024-01-02 Thread Ariel Weisberg
🥳🎉

Thanks for your work on this. Excited to have an easier way to write tests that 
leverage generated schema and data, and that cover more of the space.

Ariel
On Sat, Dec 23, 2023, at 9:17 AM, Alex Petrov wrote:
> Thanks everyone, Harry is now in tree! Of course, that's just a small 
> milestone, hope it'll prove as useful as I expect it to be.
> 
> https://github.com/apache/cassandra/commit/439d1b122af334bf68c159b82ef4e4879c210bd5
> 
> Happy holidays!
> --Alex
> 
> On Sat, Dec 23, 2023, at 11:10 AM, Mick Semb Wever wrote:
>>
>>   
>>> I strongly believe that bringing Harry in-tree will help to lower the 
>>> barrier for fuzz test and simplify co-development of Cassandra and Harry. 
>>> Previously, it has been rather difficult to debug edge cases because I had 
>>> to either re-compile an in-jvm dtest jar and bring it to Harry, or 
>>> re-compile a Harry jar and bring it to Cassandra, which is both tedious and 
>>> time consuming. Moreover, I believe we have missed at very least one RT 
>>> regression [2] because Harry was not in-tree, as its tests would've caught 
>>> the issue even with the model that existed.
>>> 
>>> For other recently found issues, I think having Harry in-tree would have 
>>> substantially lowered a turnaround time, and allowed me to share repros 
>>> with developers of corresponding features much quicker.
>> 
>> 
>> Agree, looking forward to getting to know and writing Harry tests.  Thank 
>> you Alex, happy holidays :) 
>> 
> 


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-18 Thread Ariel Weisberg
Hi,

If there is a faster/better way to replace a node, why not have Cassandra 
support it natively, without the sidecar, so people who aren’t running the 
sidecar can benefit?

Copying files over a network shouldn’t be slow in C* and it would also already 
have all the connectivity issues solved.
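
To illustrate why it shouldn't be slow: a native implementation can hand file 
contents straight to the socket with zero-copy I/O. A standalone sketch (not 
Cassandra's actual streaming code):

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    final class ZeroCopySend
    {
        // transferTo lets the kernel move bytes from the page cache to the
        // socket without copying them through user space
        static void send(Path sstable, SocketChannel socket) throws IOException
        {
            try (FileChannel file = FileChannel.open(sstable, StandardOpenOption.READ))
            {
                long position = 0, size = file.size();
                while (position < size)
                    position += file.transferTo(position, size - position, socket);
            }
        }
    }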

Regards,
Ariel

On Fri, Apr 5, 2024, at 6:46 AM, Venkata Hari Krishna Nukala wrote:
> Hi all,
> 
> I have filed CEP-40 [1] for live migrating Cassandra instances using the 
> Cassandra Sidecar.
> 
> When someone needs to move all or a portion of the Cassandra nodes belonging 
> to a cluster to different hosts, the traditional approach of Cassandra node 
> replacement can be time-consuming due to repairs and the bootstrapping of new 
> nodes. Depending on the volume of the storage service load, replacements 
> (repair + bootstrap) may take anywhere from a few hours to days.
> 
> Proposing a Sidecar based solution to address these challenges. This solution 
> proposes transferring data from the old host (source) to the new host 
> (destination) and then bringing up the Cassandra process at the destination, 
> to enable fast instance migration. This approach would help to minimise node 
> downtime, as it is based on a Sidecar solution for data transfer and avoids 
> repairs and bootstrap.
> 
> Looking forward to the discussions.
> 
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
> 
> Thanks!
> Hari


Re: discuss: add to_human_size function

2024-04-18 Thread Ariel Weisberg
Hi,

I think it’s a good quality-of-life improvement, though I am also someone who 
believes a rich set of built-in functions is a good thing.

A format function is a bit more scope and kind of orthogonal. It would still be 
good to have shorthand functions for things like size.

Ariel

On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
> Hi,
> 
> I want to propose CASSANDRA-19546. It would be possible to convert raw 
> numbers to something human-friendly. 
> There are cases when we write just a number of bytes in our system tables but 
> these numbers are just hard to parse visually. Users can indeed use this for 
> their tables too if they find it useful.
> 
> Also, a user can indeed write a UDF for this but I would prefer if we had 
> something baked in.
> 
> Does this make sense to people? Are there any other approaches to do this? 
> 
> https://issues.apache.org/jira/browse/CASSANDRA-19546
> https://github.com/apache/cassandra/pull/3239/files
> 
> Regards


Re: [DISCUSS] ccm as a subproject

2024-05-24 Thread Ariel Weisberg
Hi,

Strong +1 as well. It's a pretty critical dependency in the path of testing and 
committing changes. Supporting integration points for alternative distributions 
of Cassandra is something I think we should generally be in favor of, as it's an 
opportunity to make things more modular and testable anyway.

Ariel

On Wed, May 15, 2024, at 10:23 AM, Josh McKenzie wrote:
> Right now ccm isn't formally a subproject of Cassandra or under governance of 
> the ASF. Given it's an integral component of our CI as well as of local 
> testing for many devs, and we now have more experience w/our muscle on IP 
> clearance and ingesting / absorbing subprojects where we can't track down 
> every single contributor to get an ICLA, seems like it might be worth 
> revisiting the topic of donation of ccm to Apache.
> 
> For what it's worth, Sylvain originally and then DataStax after transfer have 
> both been incredible and receptive stewards of the projects and repos, so 
> this isn't about any response to any behavior on their part. Structurally, 
> however, it'd be better for the health of the project(s) long-term to have 
> ccm promoted in. As far as I know there was strong receptivity to that 
> donation in the past but the IP clearance was the primary hurdle.
> 
> Anyone have any thoughts for or against?
> 
> https://github.com/riptano/ccm
> 


Re: CCM and CASSANDRA_USE_JDK11

2024-05-24 Thread Ariel Weisberg
Hi,

There is definitely a mismatch between how the full range of dtests work and 
the direction CCM is going in, and we have had some difficulty getting the two 
to match. I fully empathize with several of those CI systems not being publicly 
visible/accessible, and with the behavior of upgrade paths being absolutely 
inscrutable relative to the environment variables that are set.

I am happy to volunteer to test things in advance on Apple's CI. I'll also try 
to get on top of responding faster :-)

The window where reverting is useful has mostly passed now that all the issues I 
am aware of have been fixed, but in the future I think the burden for a revert 
might need to be lower. It's tough though, because putting the burden on ASF for 
non-ASF CI is not necessarily a given.

There is a big gap between CI systems, where how they invoke the dtests 
determines the exact set of tests they run and how they invoke CCM (and which 
CCM bugs they expose). I really don't like this approach, including relying on 
environment variables to dictate dtest execution behavior. I hope to have some 
time to spend on this once my live migration work is in a better place.

Right now ASF CI is not running the upgrade paths that trigger JDK version 
switching, which is at the root of our recent problems. Once we close that gap 
we should be in a much better place in terms of divergence.

The scripts that are in cassandra-builds seem like a starting point for 
converging different CI systems, so that they run the same set of tests in as 
similar environments as possible, and harness-specific quirks are pushed into 
specific integration points where things like pointing to private mirrors are 
supported.

Additionally, what I would like to see is that CI harnesses specify the location 
of all JDKs, and then provide flags (not environment variables) to the dtests 
that dictate what should be run. What is currently in PATH or JAVA_HOME 
shouldn't be relevant for any dtests IMO; I would like the dtests (themselves or 
delegating to CCM) to juggle that themselves.

Those flags should also be as declarative as possible and require specifying C* 
versions and JDK versions, so if you want to run the set of tests we require to 
commit you don't need to keep changing how the dtests are invoked.

Ariel

On Thu, May 23, 2024, at 6:22 AM, Mick Semb Wever wrote:
>> When starting Cassandra nodes, CCM uses the current env Java distribution 
>> (defined by the JAVA_HOME env variable). This behavior is overridden in 
>> three cases:
>> 
>> - Java version is not supported by the selected Cassandra distribution - in 
>> which case, CCM looks for supported Java distribution across JAVAx_HOME env 
>> variables
>> 
>> - Java version is specified explicitly (--jvm-version arg or jvm_version 
>> param if used in Python)
>> 
>> - CASSANDRA_USE_JDK11 is defined in env, in which case, for Cassandra 4.x 
>> CCM forces to use only JDK11
>> 
>> 
>> 
>> I want to ask you guys whether you are okay with removing the third 
>> exception. If we remove it, Cassandra 4.x will not be treated in any special 
>> way—CCM will use the current Java version, so if it is Java 11, it will use 
>> Java 11 (and automatically set CASSANDRA_USE_JDK11), and if it is Java 8, it 
>> will use Java 8 (and automatically unset CASSANDRA_USE_JDK11). 
>> 
>> 
>> 
>> I think there is no need for CCM to use CASSANDRA_USE_JDK11 to make a 
>> decision about which Java version to use as it adds more complexity, makes 
>> it work differently for Cassandra 4.x than for other Cassandra versions, and 
>> actually provides no value at all because if we work with Cassandra having 
>> our env configured for Java 11, we have to have CASSANDRA_USE_JDK11 and if 
>> not, we cannot have it. Therefore, CCM can be based solely on the current 
>> Java version and not include the existence of CASSANDRA_USE_JDK11 in the 
>> Java version selection process.
>> 
>> 
>> WDYT? 
> 
>  
> With the recent commits to ccm we have now broken three different CI systems, 
> in numerous different ways.  All remain broken.
> 
> At this point in time, the default behaviour should be to revert those 
> commits.  Not to discuss whether we can further remove existing functionality 
> on the assumption we know all consumers, or that they are all reading this 
> thread and agreeing.
> 
> In ccm, the jdk selection and switching does indeed deserve a clean up.  We 
> have found a number of superfluous ways of achieving the same thing that is 
> leading to unnecessary code complexity.  But we should not be hard breaking 
> things for downstream users and our CI.
> 
> The initial commit to ccm that broke things was to fix ccm running a binary 
> 5.0-beta1 with a particular jdk.  This patch and subsequent fixes have 
> included additional refactoring/cleaning changes that have broken a number of 
> things, like jdk-switching and upgrade_through_versions tests.  We keep 
> trying to fix each breakage, but are also including additional adjustments 
> "t

Re: CCM and CASSANDRA_USE_JDK11

2024-05-29 Thread Ariel Weisberg
Hi,

+1 on CCM not using CASSANDRA_USE_JDK11 to pick a JDK version. I don’t think 
it’s a good interface for expressing what JDK CCM should use. I checked and I 
don’t see it being used in the dtests, so it shouldn’t break anything? I am not 
sure what it was added for.

Cleaning up CCM so it only selects a new JDK for each node when something 
changes that actually impacts what JDK should be used (change in C* version, 
explicit config change) also makes sense.

Ariel

On Sat, May 25, 2024, at 12:09 AM, Jacek Lewandowski wrote:
> Thank you for all the opinions. That is useful for future work on CCM. 
> 
> 
> 
> When I implemented the changes that caused recent headaches, I wasn't aware 
> that the CCM code was so patchworked, which resulted in many surprises. I 
> apologize for that. Anyway, there is no reason to sit and complain instead of 
> fixing what I already found. 
> 
> 
> 
> There are some unit tests attached to CCM, and I wonder whether they are ever 
> run before merging a patch because I'm unaware of any CI. Along with my 
> recent patches, I've implemented quite comprehensive tests verifying whether 
> the env updating function works as "expected", but it turned out the problems 
> were outside that function. I'm making fun of the word "expected" because 
> precisely defined contracts for CCM do not seem to exist. I'd like us to 
> invest in unit tests and make them the contract guards. It is tough to verify 
> whether a patch would cause problems in some environments; we usually have to 
> run all workloads to confirm that, and this would not work in the future if 
> we expect the community to get more involved. This recent incident was an 
> excellent example of such a problem - I'm really thankful to Mick and Ariel 
> for running all the Apache and Apple workloads to verify the patch, which 
> consumed much of their time, and I cannot expect such involvement in the 
> future. 
> 
> 
> 
> To the point, let's pay attention to the unit tests, make sure they are 
> mocking Cassandra instead of running real programs so that they are fast, and 
> add some CI—if they are fast unit tests, I bet we can rely on some free tier 
> of GitHub actions. Adding the expectations we assumed when implementing the 
> CI systems for Cassandra will make it less likely to break something 
> accidentally.
> 
> 
> 
> In the end, I feel we strayed off the topic a bit - my question was quite 
> concrete - I'd like to remove the CASSANDRA_USE_JDK11 knob for CCM - it 
> should set it appropriately for Cassandra 4 so that the CCM user does not 
> have to bother. However, CCM should not use it to decide which Java version 
> to use. I'm unaware of any release cycle of CCM versions anywhere. Perhaps we 
> should do the following - tag a new version before the change and then add 
> the proposed change.
> 
> 
> 
> There are also other problems related to env setup - for example, when a user 
> or the dtest framework wants to force a certain Java version, it is honored 
> only for running a Cassandra node - it is not applied for running nodetool or 
> any other command line tool. Therefore, there is a broader question about 
> when the explicit Java version should be set - it feels like the correct 
> approach would be to set it up when a node is created rather than when it is 
> started so that the selection applies to running the server and all the 
> commands. This would simplify things significantly - instead of resolving env 
> and checking Java distributions each time we are about to run a node or a 
> tool - resolve the required env changes once we create a node or when we 
> update the installation directory, which we do when testing upgrades. Such 
> simplification would also remove some clutter from the logs. Can you remember 
> the whole environment logged frequently and twice when running a node?
> 
> 
> 
> Can we make this discussion conclusive?
> 
> 
> 
> Thanks!
> 
> 
> - - -- --- -  -
> Jacek Lewandowski
> 
> 
> On Fri, 24 May 2024 at 20:55, Josh McKenzie  wrote:
>> __
>>> The scripts that are in cassandra-builds seem like a starting point for 
>>> converging different CI systems so that they run the same set of tests in 
>>> as similar environments as possible
>> Yeah, I took a superset of circle and ASF tests to try and run 
>> :allthethings:. Part of how the checkstyle dependency check got in the way 
>> too, since we weren't running that on ASF CI. :)
>> 
>> Strong +1 to more convergence on what we're running in CI for sure.
>> 
>> On Fri, May 24, 2024, at 11:59 AM, Ariel Weisberg wrote:
>>> Hi,
>>

Re: [DISCUSS] Stream Pipelines on hot paths

2024-05-30 Thread Ariel Weisberg
+1 to not using streams in hot paths.

Regarding string concatenation in logging: for debug and trace it makes sense 
to avoid concatenation. For info and error I don't think it matters, and it can 
be more concise to concatenate. Still, it's not a big deal to standardize on one 
style, because the extra verbosity is not that bad.

Ariel

On Thu, May 30, 2024, at 12:29 PM, Benedict wrote:
> 
> Since it’s related to the logging discussion we’re already having, I have 
> seen stream pipelines showing up in a lot of traces recently. I am surprised; 
> I thought it was understood that they shouldn’t be used on hot paths as they 
> are not typically as efficient as old skool for-each constructions done 
> sensibly, especially for small collections that may normally take zero or one 
> items.
> 
> I would like to propose forbidding the use of streams on hot paths without 
> good justification that the cost:benefit is justified. 
> 
> It looks like it was nominally agreed two years ago that we would include 
> words to this effect in the code style guide, but I forgot to include them 
> when I transferred the new contents from the Google Doc proposal. So we could 
> just include the “Performance” section that was meant to be included at the 
> time.
> 
> lists.apache.org 
> 
> 
> 
> 
>> On 30 May 2024, at 13:33, Štefan Miklošovič  
>> wrote:
>> 
>> I see the feedback is overall positive. I will merge that and I will improve 
>> the documentation on the website along with what Benedict suggested.
>> 
>> On Thu, May 30, 2024 at 10:32 AM Mick Semb Wever  wrote:
>>>   
>>>  
 Based on these findings, I went through the code and I have incorporated 
 these rules and I rewrote it like this:
 
 1) no wrapping in "if" if we are not logging more than 2 parameters.
 2) rewritten log messages to not contain any string concatenation but 
 moving it all to placeholders ({}).
 3) wrap it in "if" if we need to execute a method(s) on parameter(s) which 
 is resource-consuming.
>>> 
>>> 
>>> +1
>>> 
>>> 
>>> It's a shame slf4j botched it with lambdas, their 2.0 fluent api doesn't 
>>> impress me.


Re: [DISCUSS] Increments on non-existent rows in Accord

2024-06-24 Thread Ariel Weisberg
Hi,

I think the current behavior maps to SQL more than CQL. In SQL an update 
doesn't generate an error if the row being updated doesn't exist; it just 
returns 0 rows updated.

If someone wanted an upsert or increment behavior in their transaction could 
they accomplish it with the current transaction CQL at all?

We could support a more optimal syntax later, but I suspect that with our one 
shot behavior it would get mixed up by multiple attempts to insert if not 
exists and then update the same row to achieve upsert.

Ariel
On Thu, Jun 20, 2024, at 4:54 PM, Caleb Rackliffe wrote:
> We had a bug report a while back from Luis E Fernandez and team in 
> CASSANDRA-18988 <https://issues.apache.org/jira/browse/CASSANDRA-18988> 
> around the behavior of increments/decrements on numeric fields for 
> non-existent rows. Consider the following, which can be run on the 
> cep-15-accord branch:
> 
> CREATE KEYSPACE accord WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'} AND durable_writes = true
> 
> CREATE TABLE accord.accounts (
> partition text,
> account_id int,
> balance int,
> PRIMARY KEY (partition, account_id)
> ) WITH CLUSTERING ORDER BY (account_id ASC) AND transactional_mode='full'
> 
> BEGIN TRANSACTION
> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
> ('default', 0, 100);
> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
> ('default', 1, 100);
> COMMIT TRANSACTION
> 
> BEGIN TRANSACTION
> UPDATE accord.accounts SET balance -= 10 WHERE partition = 'default' AND 
> account_id = 1;
> UPDATE accord.accounts SET balance += 10 WHERE partition = 'default' AND 
> account_id = 3;
> COMMIT TRANSACTION
> 
> Reading the 'default' partition will produce the following result.
> 
>  partition | account_id | balance
> -----------+------------+---------
>    default |          0 |     100
>    default |          1 |      90
> 
> As you will notice, we have not implicitly inserted a row for account_id 3, 
> which does not exist when we request that its balance be incremented by 10. 
> This is by design, as null + 10 == null.
> 
> Before I close CASSANDRA-18988 
> <https://issues.apache.org/jira/browse/CASSANDRA-18988>, *I'd like to confirm 
> with everyone reading this that the behavior above is reasonable*. The only 
> other option I've seen proposed that would make sense is perhaps producing a 
> result like:
> 
>  partition | account_id | balance
> -----------+------------+---------
>    default |          0 |     100
>    default |          1 |      90
>    default |          3 |    null
> 
> Note however that this is exactly what we would produce if we had first 
> inserted a row w/ no value for balance:
> 
> INSERT INTO accord.accounts (partition, account_id) VALUES ('default', 3);


Re: [DISCUSS] Increments on non-existent rows in Accord

2024-06-24 Thread Ariel Weisberg
Hi,

SGTM. It's not just what we return though, it's also supporting UPSERT for RMR 
updates. Because our transactions are one-shot, I don't think you could do that: 
the statement that does INSERT ... IF NOT EXISTS would not generate a row that 
is visible to a later UPDATE statement in the same transaction that increments 
the row.

We might also have a restriction somewhere that limits us to one update per 
clustering.

Ariel
On Mon, Jun 24, 2024, at 1:30 PM, Caleb Rackliffe wrote:
> It sounds like the best course of action for now would be to keep the current 
> behavior.
> 
> However, we might want to fold this into CASSANDRA-18107 as a specific 
> concern around what we return when an explicit SELECT isn't present in the 
> transaction.
> 
> i.e. For any update, we'll have something like (courtesy of David) UPDATED, 
> SKIPPED (condition was met but couldn't update a non-existent row), or 
> CONDITION_NOT_MET
> 
> 
> On Mon, Jun 24, 2024 at 11:42 AM Ariel Weisberg  wrote:
>> __
>> Hi,
>> 
>> I think the current behavior maps to SQL more than CQL. In SQL an update 
>> doesn't generate an error if the row being updated doesn't exist; it just 
>> returns 0 rows updated.
>> 
>> If someone wanted an upsert or increment behavior in their transaction could 
>> they accomplish it with the current transaction CQL at all?
>> 
>> We could support a more optimal syntax later, but I suspect that with our 
>> one shot behavior it would get mixed up by multiple attempts to insert if 
>> not exists and then update the same row to achieve upsert.
>> 
>> Ariel
>> On Thu, Jun 20, 2024, at 4:54 PM, Caleb Rackliffe wrote:
>>> We had a bug report a while back from Luis E Fernandez and team in 
>>> CASSANDRA-18988 <https://issues.apache.org/jira/browse/CASSANDRA-18988> 
>>> around the behavior of increments/decrements on numeric fields for 
>>> non-existent rows. Consider the following, which can be run on the 
>>> cep-15-accord branch:
>>> 
>>> CREATE KEYSPACE accord WITH replication = {'class': 'SimpleStrategy', 
>>> 'replication_factor': '1'} AND durable_writes = true
>>> 
>>> CREATE TABLE accord.accounts (
>>> partition text,
>>> account_id int,
>>> balance int,
>>> PRIMARY KEY (partition, account_id)
>>> ) WITH CLUSTERING ORDER BY (account_id ASC) AND transactional_mode='full'
>>> 
>>> BEGIN TRANSACTION
>>> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
>>> ('default', 0, 100);
>>> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
>>> ('default', 1, 100);
>>> COMMIT TRANSACTION
>>> 
>>> BEGIN TRANSACTION
>>> UPDATE accord.accounts SET balance -= 10 WHERE partition = 'default' 
>>> AND account_id = 1;
>>> UPDATE accord.accounts SET balance += 10 WHERE partition = 'default' 
>>> AND account_id = 3;
>>> COMMIT TRANSACTION
>>> 
>>> Reading the 'default' partition will produce the following result.
>>> 
>>>  partition | account_id | balance
>>> -----------+------------+---------
>>>    default |          0 |     100
>>>    default |          1 |      90
>>> 
>>> As you will notice, we have not implicitly inserted a row for account_id 3, 
>>> which does not exist when we request that its balance be incremented by 10. 
>>> This is by design, as null + 10 == null.
>>> 
>>> Before I close CASSANDRA-18988 
>>> <https://issues.apache.org/jira/browse/CASSANDRA-18988>, *I'd like to 
>>> confirm with everyone reading this that the behavior above is reasonable*. 
>>> The only other option I've seen proposed that would make sense is perhaps 
>>> producing a result like:
>>> 
>>>  partition | account_id | balance
>>> -----------+------------+---------
>>>    default |          0 |     100
>>>    default |          1 |      90
>>>    default |          3 |    null
>>> 
>>> Note however that this is exactly what we would produce if we had first 
>>> inserted a row w/ no value for balance:
>>> 
>>> INSERT INTO accord.accounts (partition, account_id) VALUES ('default', 3);
>> 


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-24 Thread Ariel Weisberg
Hi,

I see a vote for this has been called. I should have provided feedback sooner.

I am a strong +1 on column-level constraints being a good thing to add. I'm not 
too concerned about row/partition/table level constraints, but I would like to 
change the syntax before I would be +1 on this CEP.

It would be good to align the syntax as closely as possible to our existing 
syntax, and if not that then MySQL/Postgres. For example it looks like we don't 
have a string length function so maybe add `LENGTH` (consistent with 
MySQL/Postgres) to also use with column level constraints.

It looks like there are generally two forms of constraint syntax, one is 
expressed as part of the column definition, and the other is a named or 
anonymous constraint on the table. https://www.w3schools.com/sql/sql_check.asp

Can we align with having these column level ones as `CHECK` constraints like in 
SQL, and `CONSTRAINT [constraint_name] CHECK` would be used if creating a named 
or multi-column constraint?

Will column level check constraints support `AND` so that you can specify 
multiple constraints on the column? I am not sure if that is supported in other 
databases, but it would be good to align on that as well.

RE some implementation things to keep in mind:

If TCM is in use and the constraints are defined in the schema data structure 
this should work fine with Accord because all coordinators (regular, recovery) 
will deterministically agree on the constraints being enforced BUT... this also 
has to map to how/when constraints are enforced.

Both Accord and Paxos work best when the constraints are enforced when the 
final mutation to be applied is created and not later when it is being applied 
to the CFS. This also reduces duplication of enforcement checking work to just 
the coordinator for the write.
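
A minimal sketch of what enforcement at mutation-creation time means (the names 
and the constraint shape are illustrative, not the CEP's API):

    import java.util.List;
    import java.util.function.Predicate;

    final class CheckConstraintSketch
    {
        record Check(String name, Predicate<Object> test) {}

        // Run on the coordinator while building the final mutation, before it
        // is proposed (Paxos) or agreed (Accord), so replicas never re-check it
        static void enforce(List<Check> checks, Object columnValue)
        {
            for (Check check : checks)
                if (!check.test().test(columnValue))
                    throw new IllegalArgumentException("Constraint violation: " + check.name());
        }

        public static void main(String[] args)
        {
            // e.g. CONSTRAINT CHECK subnet_mask > 0 AND subnet_mask < 32
            List<Check> subnetMask = List.of(
                new Check("subnet_mask > 0",  v -> (int) v > 0),
                new Check("subnet_mask < 32", v -> (int) v < 32));
            enforce(subnetMask, 24); // passes; 33 would throw
        }
    }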

Ariel

On Fri, May 31, 2024, at 5:23 PM, Bernardo Botella wrote:
> Hello everyone,
> 
> I am proposing this CEP:
> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
> 
> 
> And I’m looking for feedback from the community.
> 
> Thanks a lot!
> Bernardo


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Ariel Weisberg
Hi,

I am also +1 on Doug's distinction between things that can be managed by 
operators and things that can be managed by applications.

Some things to note about the syntax: in SQL there are parens around the 
condition. In your example there are multiple anonymous constraints on the same 
column; how are anonymous constraints handled? Does the database automatically 
generate a named constraint for them so they can be referenced later? Do we 
allow multiple constraints on the same column and AND them together?

Ariel



On Mon, Jun 24, 2024, at 6:43 PM, Bernardo Botella wrote:
> Hi Ariel and Jon,
> 
> Let me address your question first. Yes, AND is supported in the proposal. 
> Below you can find some examples of different constraints applied to the same 
> column.
> 
> As for the LENGTH name instead of sizeOf as in the proposal, I am also not 
> opposed to it if it is more consistent with terminology in the database 
> universe.
> 
> So, to recap, there seems to be general agreement on the usefulness of the 
> Constraints Framework.
> Now, from the feedback that has arrived after the voting has been called, I 
> see there are three different proposals for syntax:
> 
> 1.-
> The syntax currently described in the CEP. Example:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_adress inet,
>   subnet_mask int,
>   CONSTRAINT subnet_mask > 0,
>   CONSTRAINT subnet_mask < 32
> )
> 
> 2.-
> As Jon suggested, leaving this definitions to more specific Guardrails at 
> table level. Example, something like:
> column_min_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 0
> column_max_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 32
> 
> 3.-
> As Ariel suggested, having the CHECK keyword added to align consistency with 
> SQL. Example:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_adress inet,
>   subnet_mask int,
>   CONSTRAINT CHECK subnet_mask > 0,
>   CONSTRAINT CHECK subnet_mask < 32
> )
> 
> For the guardrails vs CQL syntax, I think that keeping the conceptual 
> separation that has been explored in this thread, and perfectly recapped by 
> Doug, is closer to what we are trying to achieve with this framework. In my 
> opinion, having them in the CQL schema definition provides those application 
> level constraints that Doug mentions in a more accessible way than having to 
> configure such specific guardrails.
> 
> For the addition of the CHECK keyword, I'm definitely not opposed to it if it 
> helps Cassandra users coming from other databases understand concepts that 
> were already familiar to them.
> 
> I hope this helps move the conversation forward,
> Bernardo
> 
> 
> 
>> On Jun 24, 2024, at 12:17 PM, Ariel Weisberg  wrote:
>> 
>> Hi,
>> 
>> I see a vote for this has been called. I should have provided feedback 
>> sooner.
>> 
>> I am a strong +1 on column-level constraints being a good thing to add. I'm 
>> not too concerned about row/partition/table level constraints, but I would 
>> like to change the syntax before I would be +1 on this CEP.
>> 
>> It would be good to align the syntax as closely as possible to our existing 
>> syntax, and if not that then MySQL/Postgres. For example it looks like we 
>> don't have a string length function so maybe add `LENGTH` (consistent with 
>> MySQL/Postgres) to also use with column level constraints.
>> 
>> It looks like there are generally two forms of constraint syntax, one is 
>> expressed as part of the column definition, and the other is a named or 
>> anonymous constraint on the table. 
>> https://www.w3schools.com/sql/sql_check.asp
>> 
>> Can we align with having these column level ones as `CHECK` constraints like 
>> in SQL, and `CONSTRAINT [constraint_name] CHECK` would be used if creating a 
>> named or multi-column constraint?
>> 
>> Will column level check constraints support `AND` so that you can specify 
>> multiple constraints on the column? I am not sure if that is supported in 
>> other databases, but it would be good to align on that as well.
>> 
>> RE some implementation things to keep in mind:
>> 
>> If TCM is in use and the constraints are defined in the schema data 
>> structure this should work fine with Accord because all coordinators 
>> (regular, recovery) will deterministically agree on the constraints being 
>> enforced BUT... this also has to map to how/when constraints are enforced.
>> 
>> Both Accord and Paxos work best when the constraints are enforced when the 
>> final mutation to be applied is created and not later when it is being 
>> applied 

Re: [VOTE] CEP-42: Constraints Framework

2024-07-01 Thread Ariel Weisberg
Hi,

I am +1 on CEP-42 with the latest updates to the CEP to clarify syntax, error 
messages, constraint naming and generated naming, alter/drop, describe etc.

I think this now tracks very closely to how other SQL databases define 
constraints and the syntax is easily extensible to multi-column and multi-table 
constraints.

Ariel

On Mon, Jul 1, 2024, at 9:48 AM, Bernardo Botella wrote:
> With all the feedback that came in the discussion thread after the call for 
> votes, I’d like to extend the period another 72 hours starting today.
> 
> As before, a vote passes if there are at least 3 binding +1s and no binding 
> vetoes.
> 
> Thanks,
> Bernardo Botella
> 
>> On Jun 24, 2024, at 7:17 AM, Bernardo Botella  
>> wrote:
>> 
>> Hi everyone,
>> 
>> I would like to start the voting for CEP-42.
>> 
>> Proposal: 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
>> Discussion: https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj
>> 
>> The vote will be open for 72 hours. A vote passes if there are at least 3 
>> binding +1s and no binding vetoes.
>> 
>> Thanks,
>> Bernardo Botella


Re: Evolving the client protocol

2018-04-19 Thread Ariel Weisberg
Hi,

So at a technical level I don't understand this yet.

So you have a database consisting of single threaded shards and a socket for 
accept that is generating TCP connections and in advance you don't know which 
connection is going to send messages to which shard.

What is the mechanism by which you get the packets for a given TCP connection 
delivered to a specific core? I know that a given TCP connection will normally 
have all of its packets delivered to the same queue from the NIC because the 
tuple of source address + port and destination address + port is typically 
hashed to pick one of the queues the NIC presents. I might have the contents of 
the tuple slightly wrong, but it always includes a component you don't get to 
control.

Since it's hashing how do you manipulate which queue packets for a TCP 
connection go to and how is it made worse by having an accept socket per shard? 

You also mention 160 ports as bad, but it doesn't sound like a big number 
resource wise. Is it an operational headache?

RE tokens distributed amongst shards: the way that would work right now is that 
each port number appears to be a discrete instance of the server. So you could 
have shards be actual shards that are simply colocated on the same box, run in 
the same process, and share resources. I know this pushes more of the 
complexity into the server vs the driver, as the server expects all shards to 
share some client-visible state like system tables and certain identifiers.
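
A hedged sketch of what the driver side could look like under that 
interpretation, with each shard reachable at its own port (entirely 
hypothetical; no such protocol exists today, and the token-to-shard mapping is 
exactly the part a protocol change would have to expose):

    import java.net.InetSocketAddress;
    import java.util.List;

    final class ShardAwareRoutingSketch
    {
        private final List<InetSocketAddress> shardEndpoints; // one port per shard on the same host

        ShardAwareRoutingSketch(List<InetSocketAddress> shardEndpoints)
        {
            this.shardEndpoints = shardEndpoints;
        }

        // Given the token for a key, pick the endpoint of the shard that owns it
        InetSocketAddress endpointFor(long token)
        {
            int shard = (int) Long.remainderUnsigned(token, shardEndpoints.size());
            return shardEndpoints.get(shard);
        }
    }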

Ariel
On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
> Port-per-shard is likely the easiest option but it's too ugly to 
> contemplate. We run on machines with 160 shards (IBM POWER 2s20c160t 
> IIRC), it will be just horrible to have 160 open ports.
> 
> 
> It also doesn't fit will with the NICs ability to automatically 
> distribute packets among cores using multiple queues, so the kernel 
> would have to shuffle those packets around. Much better to have those 
> packets delivered directly to the core that will service them.
> 
> 
> (also, some protocol changes are needed so the driver knows how tokens 
> are distributed among shards)
> 
> On 2018-04-19 19:46, Ben Bromhead wrote:
> > WRT to #3
> > To fit in the existing protocol, could you have each shard listen on a
> > different port? Drivers are likely going to support this due to
> > https://issues.apache.org/jira/browse/CASSANDRA-7544 (
> > https://issues.apache.org/jira/browse/CASSANDRA-11596).  I'm not super
> > familiar with the ticket so their might be something I'm missing but it
> > sounds like a potential approach.
> >
> > This would give you a path forward at least for the short term.
> >
> >
> > On Thu, Apr 19, 2018 at 12:10 PM Ariel Weisberg  wrote:
> >
> >> Hi,
> >>
> >> I think that updating the protocol spec to Cassandra puts the onus on the
> >> party changing the protocol specification to have an implementation of the
> >> spec in Cassandra as well as the Java and Python driver (those are both
> >> used in the Cassandra repo). Until it's implemented in Cassandra we haven't
> >> fully evaluated the specification change. There is no substitute for trying
> >> to make it work.
> >>
> >> There are also realities to consider as to what the maintainers of the
> >> drivers are willing to commit.
> >>
> >> RE #1,
> >>
> >> I am +1 on the fact that we shouldn't require an extra hop for range scans.
> >>
> >> In JIRA Jeremiah made the point that you can still do this from the client
> >> by breaking up the token ranges, but it's a leaky abstraction to have a
> >> paging interface that isn't a vanilla ResultSet interface. Serial vs.
> >> parallel is kind of orthogonal as the driver can do either.
> >>
> >> I agree it looks like the current specification doesn't make what should
> >> be simple as simple as it could be for driver implementers.
> >>
> >> RE #2,
> >>
> >> +1 on this change assuming an implementation in Cassandra and the Java and
> >> Python drivers.
> >>
> >> RE #3,
> >>
> >> It's hard to be +1 on this because we don't benefit by boxing ourselves in
> >> by defining a spec we haven't implemented, tested, and decided we are
> >> satisfied with. Having it in ScyllaDB de-risks it to a certain extent, but
> >> what if Cassandra decides to go a different direction in some way?
> >>
> >> I don't think there is much discussion to be had without an example of the
> >> the changes to the CQL specification to look at, but eve

Re: Evolving the client protocol

2018-04-22 Thread Ariel Weisberg
Hi,

> This doesn't work without additional changes, for RF>1. The token ring could 
> place two replicas of the same token range on the same physical server, even 
> though those are two separate cores of the same server. You could add another 
> element to the hierarchy (cluster -> datacenter -> rack -> node -> 
> core/shard), but that generates unneeded range movements when a node is added.

I have seen rack awareness used/abused to solve this.

Regards,
Ariel

> On Apr 22, 2018, at 8:26 AM, Avi Kivity  wrote:
> 
> 
> 
>> On 2018-04-19 21:15, Ben Bromhead wrote:
>> Re #3:
>> 
>> Yup I was thinking each shard/port would appear as a discrete server to the
>> client.
> 
> This doesn't work without additional changes, for RF>1. The token ring could 
> place two replicas of the same token range on the same physical server, even 
> though those are two separate cores of the same server. You could add another 
> element to the hierarchy (cluster -> datacenter -> rack -> node -> 
> core/shard), but that generates unneeded range movements when a node is added.
> 
>> If the per port suggestion is unacceptable due to hardware requirements,
>> remembering that Cassandra is built with the concept scaling *commodity*
>> hardware horizontally, you'll have to spend your time and energy convincing
>> the community to support a protocol feature it has no (current) use for or
>> find another interim solution.
> 
> Those servers are commodity servers (not x86, but still commodity). In any 
> case 60+ logical cores are common now (hello AWS i3.16xlarge or even 
> i3.metal), and we can only expect logical core count to continue to increase 
> (there are 48-core ARM processors now).
> 
>> 
>> Another way, would be to build support and consensus around a clear
>> technical need in the Apache Cassandra project as it stands today.
>> 
>> One way to build community support might be to contribute an Apache
>> licensed thread per core implementation in Java that matches the protocol
>> change and shard concept you are looking for ;P
> 
> I doubt I'll survive the egregious top-posting that is going on in this list.
> 
>> 
>> 
>>> On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg  wrote:
>>> 
>>> Hi,
>>> 
>>> So at a technical level I don't understand this yet.
>>> 
>>> So you have a database consisting of single threaded shards and a socket
>>> for accept that is generating TCP connections and in advance you don't know
>>> which connection is going to send messages to which shard.
>>> 
>>> What is the mechanism by which you get the packets for a given TCP
>>> connection delivered to a specific core? I know that a given TCP connection
>>> will normally have all of its packets delivered to the same queue from the
>>> NIC because the tuple of source address + port and destination address +
>>> port is typically hashed to pick one of the queues the NIC presents. I
>>> might have the contents of the tuple slightly wrong, but it always includes
>>> a component you don't get to control.
>>> 
>>> Since it's hashing how do you manipulate which queue packets for a TCP
>>> connection go to and how is it made worse by having an accept socket per
>>> shard?
>>> 
>>> You also mention 160 ports as bad, but it doesn't sound like a big number
>>> resource wise. Is it an operational headache?
>>> 
>>> RE tokens distributed amongst shards. The way that would work right now is
>>> that each port number appears to be a discrete instance of the server. So
>>> you could have shards be actual shards that are simply colocated on the
>>> same box, run in the same process, and share resources. I know this pushes
>>> more of the complexity into the server vs the driver as the server expects
>>> all shards to share some client-visible state like system tables and certain
>>> identifiers.
>>> 
>>> Ariel
>>>> On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
>>>> Port-per-shard is likely the easiest option but it's too ugly to
>>>> contemplate. We run on machines with 160 shards (IBM POWER 2s20c160t
>>>> IIRC), it will be just horrible to have 160 open ports.
>>>> 
>>>> 
>>>> It also doesn't fit will with the NICs ability to automatically
>>>> distribute packets among cores using multiple queues, so the kernel
>>>> would have to shuffle those packets around. Much better to have those
>>&
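
To make the queue-stickiness point in this thread concrete, here is a toy stand-in for the NIC's receive-side-scaling hash (real NICs use a Toeplitz hash, and none of this is Cassandra or Scylla code). The queue is a pure function of the connection 4-tuple, and the destination port is the only element the server controls:

    import java.util.Objects;

    public final class RssQueueDemo {
        // Toy stand-in for the NIC's RSS hash over the connection 4-tuple.
        static int queueFor(String srcIp, int srcPort, String dstIp, int dstPort, int numQueues) {
            return Math.floorMod(Objects.hash(srcIp, srcPort, dstIp, dstPort), numQueues);
        }

        public static void main(String[] args) {
            // A given connection always hashes to the same queue, but the client
            // picks srcPort, so with one accept port the server cannot steer it.
            System.out.println(queueFor("10.0.0.1", 54321, "10.0.0.2", 9042, 8));
            // Port-per-shard varies dstPort, the only input the server controls;
            // Avi's objection is that the hash still scatters connections, so the
            // kernel ends up shuffling packets between cores anyway.
            System.out.println(queueFor("10.0.0.1", 54321, "10.0.0.2", 9043, 8));
        }
    }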

Re: Improve the performance of CAS

2018-05-16 Thread Ariel Weisberg
Hi,

I think you are looking at the right low hanging fruit.  Cassandra deserves a 
better consensus protocol, but it's a very big project.

Regards,
Ariel
On Wed, May 16, 2018, at 5:51 PM, Dikang Gu wrote:
> Cool, create a jira for it,
> https://issues.apache.org/jira/browse/CASSANDRA-14448. I have a draft patch
> working internally, will clean it up.
> 
> The EPaxos is more complicated, could be a long term effort.
> 
> Thanks
> Dikang.
> 
> On Wed, May 16, 2018 at 2:20 PM, sankalp kohli 
> wrote:
> 
> > Hi,
> > The idea of combining read with prepare sounds good. Regarding reducing
> > the commit round trip, it is possible today by giving a lower consistency
> > level for commit I think.
> >
> > Regarding EPaxos, it is a large change and will take longer to land. I
> > think we should do this as it will help lower the latencies a lot.
> >
> > Thanks,
> > Sankalp
> >
> > On Wed, May 16, 2018 at 2:15 PM, Jeremy Hanna 
> > wrote:
> >
> > > Hi Dikang,
> > >
> > > Have you seen Blake’s work on implementing egalitarian paxos or epaxos*?
> > > That might be helpful for the discussion.
> > >
> > > Jeremy
> > >
> > > * https://issues.apache.org/jira/browse/CASSANDRA-6246
> > >
> > > > On May 16, 2018, at 3:37 PM, Dikang Gu  wrote:
> > > >
> > > > Hello C* developers,
> > > >
> > > > I'm working on some performance improvements of the lightweight
> > > transactions
> > > > (compare and set), I'd like to hear your thoughts about it.
> > > >
> > > > As you know, current CAS requires 4 round trips to finish, which is not
> > > > efficient, especially in cross DC case.
> > > > 1) Prepare
> > > > 2) Quorum read current value
> > > > 3) Propose new value
> > > > 4) Commit
> > > >
> > > > I'm proposing the following improvements to reduce it to 2 round trips,
> > > > which is:
> > > > 1) Combine prepare and quorum read together, use only one round trip to
> > > > decide the ballot and also piggyback the current value in response.
> > > > 2) Propose new value, and then send out the commit request
> > > asynchronously,
> > > > so client will not wait for the ack of the commit. In case of commit
> > > > failures, we should still have chance to retry/repair it through hints
> > or
> > > > following read/cas events.
> > > >
> > > > After the improvement, we should be able to finish the CAS operation
> > > using
> > > > 2 rounds trips. There can be following improvements as well, and this
> > can
> > > > be a start point.
> > > >
> > > > What do you think? Did I miss anything?
> > > >
> > > > Thanks
> > > > Dikang
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >
> > >
> >
> 
> 
> 
> -- 
> Dikang

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
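
A self-contained sketch of the two round-trip flow proposed in this thread; every type and method below is a hypothetical stand-in, not Cassandra's actual Paxos code:

    public final class TwoRoundTripCasSketch {
        static final class PromiseAndValue {
            final boolean promised;
            final String currentValue;
            PromiseAndValue(boolean promised, String currentValue) {
                this.promised = promised;
                this.currentValue = currentValue;
            }
        }

        // Round trip 1: prepare combined with the quorum read; the current
        // value piggybacks on the promise (stubbed here).
        static PromiseAndValue prepareAndRead(long ballot, String key) {
            return new PromiseAndValue(true, "v0");
        }

        // Round trip 2: propose the new value to a quorum (stubbed here).
        static boolean propose(long ballot, String key, String newValue) {
            return true;
        }

        // Fire-and-forget: the client does not wait for commit acks; failures
        // get repaired via hints or by later read/CAS events.
        static void commitAsync(long ballot, String key, String newValue) {}

        static boolean cas(String key, String expected, String update) {
            long ballot = System.nanoTime();
            PromiseAndValue p = prepareAndRead(ballot, key);   // round trip 1
            if (!p.promised || !p.currentValue.equals(expected))
                return false;
            if (!propose(ballot, key, update))                 // round trip 2
                return false;
            commitAsync(ballot, key, update);                  // not waited on
            return true;
        }

        public static void main(String[] args) {
            System.out.println(cas("k", "v0", "v1")); // true after two blocking round trips
        }
    }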



Re: GitHub PR ticket spam

2018-07-30 Thread Ariel Weisberg
Hi,

I really like having it mirrored. I would not be in favor of eliminating 
automated mirroring. What we are seeing is that removing the pain of commenting 
in JIRA is encouraging people to converse more in finer detail. That's a good 
thing.

I have also seen the pain of how various github workflows hide PRs. Rebasing, 
squashing, multiple branches, all of these can obfuscate the history of a 
review. So mirroring stuff to JIRA still makes sense to me as it's easier to 
untangle what happened in chronological order.

I think reducing verbosity to not include diffs is good. Especially if it 
contains a link to the comment. I do like being able to see the diff in JIRA 
(context switching bad). I just don't like to see it mixed in with regular 
comments.
comments in JIRA.

Regards,
Ariel

On Mon, Jul 30, 2018, at 1:25 PM, dinesh.jo...@yahoo.com.INVALID wrote:
> It is useful to have a historical record. However, it could definitely 
> be better (huge diffs are pointless).
> Thanks,
> Dinesh 
> 
> On Monday, July 30, 2018, 1:27:26 AM PDT, Stefan Podkowinski 
>  wrote:  
>  
>  Looks like we had some active PRs recently to discuss code changes in
> detail on GitHub, which I think is something we agreed is perfectly
> fine, in addition to the usual Jira ticket.
> 
> What bugs me a bit is that for some reasons any comments on the PR would
> be posted to the Jira ticket as well. I'm not sure what would be the
> exact reason for this, I guess it's because the PR is linked in the
> ticket? I find this a bit annoying while subscribed to commits@,
> especially since we created pr@ for these kind of messages. Also I don't
> really see any value in mirroring all github comments to the ticket.
> #14556 is a good example how you could end up with tons of unformatted
> code in the ticket that will also mess up search in jira. Does anyone
> think this is really useful, or can we stop linking the PR in the future
> (at least for highly active PRs)?
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 
>   

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: GitHub PR ticket spam

2018-08-06 Thread Ariel Weisberg
Hi,

Great idea. +1 to moving it to the work log.

Thanks,
Ariel

On Mon, Aug 6, 2018, at 12:40 PM, Aleksey Yeshchenko wrote:
> Nice indeed. I assume it also doesn’t spam commits@ when done this way, 
> in which case double +1 from me.
> 
> —
> AY
> 
> On 6 August 2018 at 17:18:36, Jeremiah D Jordan 
> (jeremiah.jor...@gmail.com) wrote:
> 
> Oh nice. I like the idea of keeping it but moving it to the worklog tab. 
> +1 on that from me.  
> 
> > On Aug 6, 2018, at 5:34 AM, Stefan Podkowinski  wrote:  
> >  
> > +1 for worklog option  
> >  
> > Here's an example ticket from Arrow, where they seem to be using the  
> > same approach:  
> > https://issues.apache.org/jira/browse/ARROW-2583
> >  
> >  
> > On 05.08.2018 09:56, Mick Semb Wever wrote:  
> >>> I find this a bit annoying while subscribed to commits@,  
> >>> especially since we created pr@ for these kind of messages. Also I don't  
> >>> really see any value in mirroring all github comments to the ticket.  
> >>  
> >>  
> >> I agree with you Stefan. It makes the jira tickets quite painful to read. 
> >> And I tend to make comments on the commits rather than the PRs so to avoid 
> >> spamming back to the jira ticket.  
> >>  
> >> But the linking to the PR is invaluable. And I can see Ariel's point about 
> >> a chronological historical archive.  
> >>  
> >>  
> >>> Ponies would be for this to be mirrored to a tab  
> >>> separate from comments in JIRA.  
> >>  
> >>  
> >> Ariel, that would be the "worklog" option.  
> >> https://reference.apache.org/pmc/github
> >>  
> >> If this works for you, and others, I can open an INFRA ticket to switch to 
> >> worklog.  
> >> wdyt?  
> >>  
> >>  
> >> Mick.  
> >>  
> >> -  
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
> >>   
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org 
> >>   
> >>  
> >  
> > -  
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
> >   
> > For additional commands, e-mail: dev-h...@cassandra.apache.org 
> >   

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: upgrade guava on trunk before 9/1?

2018-08-15 Thread Ariel Weisberg
Hi,

What do we get from Guava in exchange for upgrading?

Ariel

On Wed, Aug 15, 2018, at 10:19 AM, Jason Brown wrote:
> Hey all,
> 
> Does anyone feel strongly about upgrading guava on trunk before the 9/1
> feature freeze for 4.0? We are currently at 23.3 (thanks to
> CASSANDRA-13997), and the current is 26.0.
> 
> I took a quick look, and there's about 17 compilation errors. They fall
> into two categories, both of which appear not too difficult to resolve (I
> didn't look too closely, tbh).
> 
> If anyone wants to tackle this LHF I can rustle up some review time.
> 
> Thanks,
> 
> -Jason

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: upgrade guava on trunk before 9/1?

2018-08-15 Thread Ariel Weisberg
Hi,

They don't even do release notes after 23. Also no API diffs.  I mean I'm fine 
with it, but it's mostly just changing to another arbitrary version that won't 
match what is in apps.

Ariel

On Wed, Aug 15, 2018, at 10:48 AM, Jason Brown wrote:
> Hey Ariel,
> 
> Tbqh, not that much. I was mostly thinking from the "I have conflicts on
> guava versions in my app because I pull in cassandra and XYZ libraries, and
> the transitive dependencies on guava use different versions" POV. Further,
> we'll be on this version of guava for 4.0 for at least two years from now.
> 
> As I asked, "does anybody feeling strongly?". Personally, I'm sorta +0 to
> +0.5, but I was just throwing this out there in case someone does really
> think it best we upgrade (and wants to make a contribution).
> 
> -Jason
> 
> 
> 
> 
> On Wed, Aug 15, 2018 at 7:25 AM, Ariel Weisberg  wrote:
> 
> > Hi,
> >
> > What do we get from Guava in exchange for upgrading?
> >
> > Ariel
> >
> > On Wed, Aug 15, 2018, at 10:19 AM, Jason Brown wrote:
> > > Hey all,
> > >
> > > Does anyone feel strongly about upgrading guava on trunk before the 9/1
> > > feature freeze for 4.0? We are currently at 23.3 (thanks to
> > > CASSANDRA-13997), and the current is 26.0.
> > >
> > > I took a quick look, and there's about 17 compilation errors. They fall
> > > into two categories, both of which appear not too difficult to resolve (I
> > > didn't look too closely, tbh).
> > >
> > > If anyone wants to tackle this LHF I can rustle up some review time.
> > >
> > > Thanks,
> > >
> > > -Jason
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Transient Replication 4.0 status update

2018-08-27 Thread Ariel Weisberg
Hi all,

I wanted to give everyone an update on how development of Transient Replication 
is going and where we are going to be as of 9/1. Blake Eggleston, Alex Petrov, 
Benedict Elliott Smith, and myself have been working to get TR implemented for 
4.0. Up to now we have avoided merging anything related to TR to trunk because 
we weren't 100% sure we were going to make the 9/1 deadline and even minimal TR 
functionality requires significant changes (see 14405).

We focused on getting a minimal set of deployable functionality working, and 
want to avoid overselling what's going to work in the first version. The 
feature is marked explicitly as experimental and has to be enabled via a 
feature flag in cassandra.yaml. The expected audience for TR in 4.0 is more 
experienced users who are ready to tackle deploying experimental functionality. 
As it is deployed by experienced users and we gain more confidence in it and 
remove caveats the # of users it will be appropriate for will expand.

For 4.0 it looks like we will be able to merge TR with support for normal reads 
and writes without monotonic reads. Monotonic reads require blocking read 
repair and blocking read repair with TR requires further changes that aren't 
feasible by 9/1.

Future TR support would look something like

4.0.next:
* vnodes (https://issues.apache.org/jira/browse/CASSANDRA-14404)

4.next:
* Monotonic reads (https://issues.apache.org/jira/browse/CASSANDRA-14665)
* LWT (https://issues.apache.org/jira/browse/CASSANDRA-14547)
* Batch log (https://issues.apache.org/jira/browse/CASSANDRA-14549)
* Counters (https://issues.apache.org/jira/browse/CASSANDRA-14548)

Possibly never:
* Materialized views
 
Probably never:
* Secondary indexes

The most difficult changes to support Transient Replication should be behind 
us. LWT, Batch log, and counters shouldn't be that hard to make transient 
replication aware. Monotonic reads require some changes to the read path, but 
are at least conceptually not that hard to support. I am confident that by 
4.next TR will have fewer tradeoffs.

If you want to take a peek the current feature branch is 
https://github.com/aweisberg/cassandra/tree/14409-7 although we will be moving 
to 14409-8 to rebase on to trunk.

Regards,
Ariel

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Transient Replication 4.0 status update

2018-08-31 Thread Ariel Weisberg
Hi,

There are no transient nodes. All nodes are the same. If you have transient 
replication enabled each node will transiently replicate some ranges instead of 
fully replicating them.

Capacity requirements are reduced evenly across all nodes in the cluster.
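
For concreteness, a sketch of the full/transient split implied by the "n/t" replication factor notation used in the TR work; the parsing below is an illustrative stand-in, not the actual code:

    public final class TransientRfSketch {
        final int allReplicas;
        final int transientReplicas;

        TransientRfSketch(String rf) {           // e.g. "3/1"
            String[] parts = rf.split("/");
            allReplicas = Integer.parseInt(parts[0]);
            transientReplicas = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        }

        int fullReplicas() {
            return allReplicas - transientReplicas;
        }

        public static void main(String[] args) {
            // "3/1": three replicas per range, two full and one transient. The
            // transient replica only has to hold data the full replicas may not
            // have acknowledged yet, which is where the capacity savings come from.
            TransientRfSketch rf = new TransientRfSketch("3/1");
            System.out.println(rf.fullReplicas() + " full + " + rf.transientReplicas + " transient");
        }
    }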

Nodes are not temporarily transient replicas during expansion. They need to 
stream data like a full replica for the transient range before they can serve 
reads. There is a pending state similar to how there is a pending state for 
full replicas. Transient replicas also always receive writes when they are 
pending. There may be some room to relax how that is handled, but for now we 
opt to send pending transient ranges a bit more data and avoid reading from 
them when maybe we could.

This doesn't change how expansion works with vnodes. The same restrictions 
still apply. We won't officially support vnodes until we have done more testing 
and really thought through the corner cases. It's quite possible we will relax 
the restriction on creating transient keyspaces with vnodes in 4.0.x.

Ariel

On Fri, Aug 31, 2018, at 2:07 PM, Carl Mueller wrote:
> I put these questions on the ticket too... Sorry if some of them are
> stupid.
> 
> So are (basically) these transient nodes basically serving as centralized
> hinted handoff caches rather than having the hinted handoffs cluttering up
> full replicas, especially nodes that have no concern for the token range
> involved? I understand that hinted handoffs aren't being replaced by this,
> but is that kind of the idea?
> 
> Are the transient nodes "sitting around"?
> 
> Will the transient nodes have cheaper/lower hardware requirements?
> 
> During cluster expansion, does the newly streaming node acquiring data
> function as a temporary transient node until it becomes a full replica?
> Likewise while shrinking, does a previously full replica function as a
> transient while it streams off data?
> 
> Can this help vnode expansion with multiple concurrent nodes? Admittedly
> I'm not familiar with how much work has gone into fixing cluster expansion
> with vnodes, it is my understanding that you typically expand only one node
> at a time or in multiples of the datacenter size
> 
> On Mon, Aug 27, 2018 at 12:29 PM Ariel Weisberg  wrote:
> 
> > Hi all,
> >
> > I wanted to give everyone an update on how development of Transient
> > Replication is going and where we are going to be as of 9/1. Blake
> > Eggleston, Alex Petrov, Benedict Elliott Smith, and myself have been
> > working to get TR implemented for 4.0. Up to now we have avoided merging
> > anything related to TR to trunk because we weren't 100% sure we were going
> > to make the 9/1 deadline and even minimal TR functionality requires
> > significant changes (see 14405).
> >
> > We focused on getting a minimal set of deployable functionality working,
> > and want to avoid overselling what's going to work in the first version.
> > The feature is marked explicitly as experimental and has to be enabled via
> > a feature flag in cassandra.yaml. The expected audience for TR in 4.0 is
> > more experienced users who are ready to tackle deploying experimental
> > functionality. As it is deployed by experienced users and we gain more
> > confidence in it and remove caveats the # of users it will be appropriate
> > for will expand.
> >
> > For 4.0 it looks like we will be able to merge TR with support for normal
> > reads and writes without monotonic reads. Monotonic reads require blocking
> > read repair and blocking read repair with TR requires further changes that
> > aren't feasible by 9/1.
> >
> > Future TR support would look something like
> >
> > 4.0.next:
> > * vnodes (https://issues.apache.org/jira/browse/CASSANDRA-14404)
> >
> > 4.next:
> > * Monotonic reads (
> > https://issues.apache.org/jira/browse/CASSANDRA-14665)
> > * LWT (https://issues.apache.org/jira/browse/CASSANDRA-14547)
> > * Batch log (https://issues.apache.org/jira/browse/CASSANDRA-14549)
> > * Counters (https://issues.apache.org/jira/browse/CASSANDRA-14548)
> >
> > Possibly never:
> > * Materialized views
> >
> > Probably never:
> > * Secondary indexes
> >
> > The most difficult changes to support Transient Replication should be
> > behind us. LWT, Batch log, and counters shouldn't be that hard to make
> > transient replication aware. Monotonic reads require some changes to the
> > read path, but are at least conceptually not that hard to support. I am
> > confident that by 4.next TR will have fewer tradeoffs.
> >

Re: Transient Replication 4.0 status update

2018-08-31 Thread Ariel Weisberg
Hi,

All nodes being the same (in terms of functionality) is something we wanted to 
stick with at least for now. I think we want a design that changes the 
operational, availability, and consistency story as little as possible when 
it's completed.

Ariel
On Fri, Aug 31, 2018, at 2:27 PM, Carl Mueller wrote:
> SOrry to spam this with two messages...
> 
> This ticket is also interesting because it is very close to what I imagined
> a useful use case of RF4 / RF6: being basically RF3 + hot spare where you
> marked (in the case of RF4) three nodes as primary and the fourth as hot
> standby, which may be equivalent if I understand the paper/protocol to
> RF3+1 transient.
> 
> On Fri, Aug 31, 2018 at 1:07 PM Carl Mueller 
> wrote:
> 
> > I put these questions on the ticket too... Sorry if some of them are
> > stupid.
> >
> > So are (basically) these transient nodes basically serving as centralized
> > hinted handoff caches rather than having the hinted handoffs cluttering up
> > full replicas, especially nodes that have no concern for the token range
> > involved? I understand that hinted handoffs aren't being replaced by this,
> > but is that kind of the idea?
> >
> > Are the transient nodes "sitting around"?
> >
> > Will the transient nodes have cheaper/lower hardware requirements?
> >
> > During cluster expansion, does the newly streaming node acquiring data
> > function as a temporary transient node until it becomes a full replica?
> > Likewise while shrinking, does a previously full replica function as a
> > transient while it streams off data?
> >
> > Can this help vnode expansion with multiple concurrent nodes? Admittedly
> > I'm not familiar with how much work has gone into fixing cluster expansion
> > with vnodes, it is my understanding that you typically expand only one node
> > at a time or in multiples of the datacenter size
> >
> > On Mon, Aug 27, 2018 at 12:29 PM Ariel Weisberg  wrote:
> >
> >> Hi all,
> >>
> >> I wanted to give everyone an update on how development of Transient
> >> Replication is going and where we are going to be as of 9/1. Blake
> >> Eggleston, Alex Petrov, Benedict Elliott Smith, and myself have been
> >> working to get TR implemented for 4.0. Up to now we have avoided merging
> >> anything related to TR to trunk because we weren't 100% sure we were going
> >> to make the 9/1 deadline and even minimal TR functionality requires
> >> significant changes (see 14405).
> >>
> >> We focused on getting a minimal set of deployable functionality working,
> >> and want to avoid overselling what's going to work in the first version.
> >> The feature is marked explicitly as experimental and has to be enabled via
> >> a feature flag in cassandra.yaml. The expected audience for TR in 4.0 is
> >> more experienced users who are ready to tackle deploying experimental
> >> functionality. As it is deployed by experienced users and we gain more
> >> confidence in it and remove caveats the # of users it will be appropriate
> >> for will expand.
> >>
> >> For 4.0 it looks like we will be able to merge TR with support for normal
> >> reads and writes without monotonic reads. Monotonic reads require blocking
> >> read repair and blocking read repair with TR requires further changes that
> >> aren't feasible by 9/1.
> >>
> >> Future TR support would look something like
> >>
> >> 4.0.next:
> >> * vnodes (https://issues.apache.org/jira/browse/CASSANDRA-14404)
> >>
> >> 4.next:
> >> * Monotonic reads (
> >> https://issues.apache.org/jira/browse/CASSANDRA-14665)
> >> * LWT (https://issues.apache.org/jira/browse/CASSANDRA-14547)
> >> * Batch log (https://issues.apache.org/jira/browse/CASSANDRA-14549)
> >> * Counters (https://issues.apache.org/jira/browse/CASSANDRA-14548)
> >>
> >> Possibly never:
> >> * Materialized views
> >>
> >> Probably never:
> >> * Secondary indexes
> >>
> >> The most difficult changes to support Transient Replication should be
> >> behind us. LWT, Batch log, and counters shouldn't be that hard to make
> >> transient replication aware. Monotonic reads require some changes to the
> >> read path, but are at least conceptually not that hard to support. I am
> >> confident that by 4.next TR will have fewer tradeoffs.
> >>
> >> If you want to take a peek the current feature branch is
> >> https://github.com/aweisberg/cassandra/tree/14409-7 although we will be
> >> moving to 14409-8 to rebase on to trunk.
> >>
> >> Regards,
> >> Ariel
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
> >>

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Request for post-freeze merge exception

2018-09-04 Thread Ariel Weisberg
+1. Transient Replication had some rebase pain as well, but we were able to get 
through it at the last minute. The traffic in the last few days was pretty 
heavy, with several substantial commits.

On Tue, Sep 4, 2018, at 2:19 PM, Jeff Jirsa wrote:
> Seems like a reasonable thing to merge to me. Nothing else has been
> committed, it was approved pre-freeze, seems like the rush to merge was
> bound to have some number of rebase casualties.
> 
> On Tue, Sep 4, 2018 at 11:15 AM Sam Tunnicliffe  wrote:
> 
> > Hey all,
> >
> > On 2018-31-08 CASSANDRA-14145 had been +1'd by two reviewers and CI was
> > green, and so it was marked Ready To Commit. This was before the 4.0
> > feature freeze but before it landed, CASSANDRA-14408, which touched a few
> > common areas of the code, was merged. I didn't have chance to finish the
> > rebase over the weekend but in the end it turned out that most of the
> > conflicts were in test code and were straightforward to resolve. I'd like
> > to commit this now; the rebase is done (& has been re-reviewed), and the CI
> > is still green so I suspect most of the community would probably be ok with
> > that. We did vote for a freeze though and I don't want to subvert or
> > undermine that decision, so I wanted to check and give a chance for anyone
> > to raise objections before I did.
> >
> > I'll wait 24 hours, and if nobody objects before then I'll merge to trunk.
> >
> > Thanks,
> > Sam
> >

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Recommended circleci settings for DTest

2018-09-28 Thread Ariel Weisberg
Hi,

Apply the following diff and if you have access to the higher memory containers 
it should run the dtests with whatever you have. You may need to adjust 
parallelism to match whatever you paid for.

diff --git a/.circleci/config.yml b/.circleci/config.yml
index 5a84f724fc..76a2c9f841 100644
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -58,16 +58,16 @@ with_dtest_jobs_only: &with_dtest_jobs_only
   - build
 # Set env_settings, env_vars, and workflows/build_and_run_tests based on environment
 env_settings: &env_settings
-<<: *default_env_settings
-#<<: *high_capacity_env_settings
+#<<: *default_env_settings
+<<: *high_capacity_env_settings
 env_vars: &env_vars
-<<: *resource_constrained_env_vars
-#<<: *high_capacity_env_vars
+#<<: *resource_constrained_env_vars
+<<: *high_capacity_env_vars
 workflows:
 version: 2
-build_and_run_tests: *default_jobs
+#build_and_run_tests: *default_jobs
 #build_and_run_tests: *with_dtest_jobs_only
-#build_and_run_tests: *with_dtest_jobs
+build_and_run_tests: *with_dtest_jobs
 docker_image: &docker_image kjellman/cassandra-test:0.4.3
 version: 2
 jobs:

Ariel

On Fri, Sep 28, 2018, at 5:47 PM, Jay Zhuang wrote:
> Hi,
> 
> Do we have a recommended circleci setup for DTest? For example, what's the
> minimal container number I need to finish the DTest in a reasonable time. I
> know the free account (4 containers) is not good enough for the DTest. But
> if the community member can pay for the cost, what's the recommended
> settings and steps to run that?
> 
> Thanks,
> Jay

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Recommended circleci settings for DTest

2018-09-29 Thread Ariel Weisberg
Hi,

Yes I think it is. I can do it Monday.

Ariel

On Fri, Sep 28, 2018, at 7:09 PM, Jay Zhuang wrote:
> Great, thanks Ariel. I assume it also works for uTest, right? Do you think 
> it's worth updating the doc for that:
> https://github.com/apache/cassandra/blob/trunk/doc/source/development/testing.rst#circleci
> 
> 
> 
> On Fri, Sep 28, 2018 at 2:55 PM Ariel Weisberg  wrote:
> 
> > Hi,
> >
> > Apply the following diff and if you have access to the higher memory
> > containers it should run the dtests with whatever you have. You may need to
> > adjust parallelism to match whatever you paid for.
> >
> > diff --git a/.circleci/config.yml b/.circleci/config.yml
> > index 5a84f724fc..76a2c9f841 100644
> > --- a/.circleci/config.yml
> > +++ b/.circleci/config.yml
> > @@ -58,16 +58,16 @@ with_dtest_jobs_only: &with_dtest_jobs_only
> >- build
> >  # Set env_settings, env_vars, and workflows/build_and_run_tests based on environment
> >  env_settings: &env_settings
> > -<<: *default_env_settings
> > -#<<: *high_capacity_env_settings
> > +#<<: *default_env_settings
> > +<<: *high_capacity_env_settings
> >  env_vars: &env_vars
> > -<<: *resource_constrained_env_vars
> > -#<<: *high_capacity_env_vars
> > +#<<: *resource_constrained_env_vars
> > +<<: *high_capacity_env_vars
> >  workflows:
> >  version: 2
> > -build_and_run_tests: *default_jobs
> > +#build_and_run_tests: *default_jobs
> >  #build_and_run_tests: *with_dtest_jobs_only
> > -#build_and_run_tests: *with_dtest_jobs
> > +build_and_run_tests: *with_dtest_jobs
> >  docker_image: &docker_image kjellman/cassandra-test:0.4.3
> >  version: 2
> >  jobs:
> >
> > Ariel
> >
> > On Fri, Sep 28, 2018, at 5:47 PM, Jay Zhuang wrote:
> > > Hi,
> > >
> > > Do we have a recommended circleci setup for DTest? For example, what's
> > the
> > > minimal container number I need to finish the DTest in a reasonable
> > time. I
> > > know the free account (4 containers) is not good enough for the DTest.
> > But
> > > if the community member can pay for the cost, what's the recommended
> > > settings and steps to run that?
> > >
> > > Thanks,
> > > Jay
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Implicit Casts for Arithmetic Operators

2018-10-02 Thread Ariel Weisberg
Hi,

I think we should decide based on what is least surprising as you mention, but 
isn't overridden by some other concern.

It seems to me the priorities are

* Correctness
* Performance
* User visible complexity
* Developer visible complexity

Defaulting to silent implicit data loss is not ideal from a correctness 
standpoint.

Doing something better like using wider types doesn't seem like a performance 
issue.

From a user standpoint doing something less lossy doesn't look more complex as 
long as it's consistent, documented, and doesn't change from version to version.

There is some developer complexity, but this is a public API and we only get 
one shot at this. 

I wonder about how overflow is handled as well. In VoltDB I think we threw on 
overflow and tended to just do widening conversions to make that less common. 
We didn't imitate another database (as far as I know) we just went with what 
least likely to silently corrupt data.
https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213
https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764
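
By way of illustration, the JDK already exposes all three behaviors; a minimal demo of wrap vs. widen vs. throw (this is plain Java, not a claim about what Cassandra should do):

    public final class OverflowDemo {
        public static void main(String[] args) {
            long wrapped = Long.MAX_VALUE + 1;            // silently wraps to Long.MIN_VALUE
            System.out.println(wrapped);

            long widened = (long) Integer.MAX_VALUE + 1;  // widening first makes overflow rare
            System.out.println(widened);                  // 2147483648, no wrap

            try {
                Math.addExact(Long.MAX_VALUE, 1L);        // throw-on-overflow semantics
            } catch (ArithmeticException e) {
                System.out.println("overflow: " + e.getMessage());
            }
        }
    }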

Ariel

On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
> CASSANDRA-11935 introduced arithmetic operators, and alongside these 
> came implicit casts for their operands.  There is a semantic decision to 
> be made, and I think the project would do well to explicitly raise this 
> kind of question for wider input before release, since the project is 
> bound by them forever more.
> 
> In this case, the choice is between lossy and lossless casts for 
> operations involving integers and floating point numbers.  In essence, 
> should:
> 
> (1) float + int = float, double + bigint = double; or
> (2) float + int = double, double + bigint = decimal; or
> (3) float + int = decimal, double + bigint = decimal
> 
> Option 1 performs a lossy implicit cast from int -> float, or bigint -> 
> double.  Simply casting between these types changes the value.  This is 
> what MS SQL Server does.
> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) 
> is what PostgreSQL does.
> 
> The question I’m interested in is not just which is the right decision, 
> but how the right decision should be arrived at.  My view is that we 
> should primarily aim for least surprise to the user, but I’m keen to 
> hear from others.
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Implicit Casts for Arithmetic Operators

2018-10-02 Thread Ariel Weisberg
Hi,

I think overflow and the role of widening conversions are pretty linked so I'll 
continue to inject that into this discussion. Also overflow is much worse since 
most applications won't be impacted by a loss of precision when an expression 
involves an int and float, but will care quite a bit if they get some nonsense 
wrapped number in an integer only expression.

For VoltDB in practice, thanks to the widening conversions, we didn't run into 
issues with applications failing to make progress due to overflow exceptions on 
real data. The ranges of double and long are pretty big, and that hides wrap 
around/infinity. 

I think the proposal of having all operations return a decimal is attractive in 
that these expressions always result in a consistent type. Two pain points 
might be whether client languages have decimal support and whether there is a 
performance issue? The nice thing about always returning decimal is we can 
sidestep the issue of overflow.

I would start with seeing if that's acceptable, and if it isn't then look at 
other approaches like returning a variety of types such when doing int + int 
return a bigint or int + float return a double.

If we take an approach that allows overflow the ideal end state IMO would be to 
get all users to run Cassandra in way that overflow results in an error even in 
the context of aggregation. The road to get there is tricky, but maybe start by 
having it as an opt in tunable in cassandra.yaml. I don't know how/when we 
could ever change that as a default and it's unfortunate having an option like 
this that 99% won't know they should flip.

It seems like having the default throw on overflow is not as bad as it sounds 
if you do the widening conversions since most people won't run into them. The 
change in the column types of result sets actually sounds worse if we want to 
also improve aggregations. Many applications won't notice if the client 
library abstracts that away, but I think there are still cases where people 
would notice the type changing.

Ariel

On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
> This (overflow) is an excellent point, but this also affects 
> aggregations which were introduced a long time ago.  They already 
> inherit Java semantics for all of the relevant types (silent wrap 
> around).  We probably want to be consistent, meaning either changing 
> aggregations (which incurs a cost for changing API) or continuing the 
> java semantics here.
> 
> This is why having these discussions explicitly in the community before 
> a release is so critical, in my view.  It’s very easy for these semantic 
> changes to go unnoticed on a JIRA, and then ossify.
> 
> 
> > On 2 Oct 2018, at 15:48, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > I think we should decide based on what is least surprising as you mention, 
> > but isn't overridden by some other concern.
> > 
> > It seems to me the priorities are
> > 
> > * Correctness
> > * Performance
> > * User visible complexity
> > * Developer visible complexity
> > 
> > Defaulting to silent implicit data loss is not ideal from a correctness 
> > standpoint.
> > 
> > Doing something better like using wider types doesn't seem like a 
> > performance issue.
> > 
> > From a user standpoint doing something less lossy doesn't look more complex 
> > as long as it's consistent, and documented and doesn't change from version 
> > to version.
> > 
> > There is some developer complexity, but this is a public API and we only 
> > get one shot at this. 
> > 
> > I wonder about how overflow is handled as well. In VoltDB I think we threw 
> > on overflow and tended to just do widening conversions to make that less 
> > common. We didn't imitate another database (as far as I know) we just went 
> > with what least likely to silently corrupt data.
> > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213 
> > <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213>
> > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764 
> > <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764>
> > 
> > Ariel
> > 
> > On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
> >> CASSANDRA-11935 introduced arithmetic operators, and alongside these 
> >> came implicit casts for their operands.  There is a semantic decision to 
> >> be made, and I think the project would do well to explicitly raise this 
> >> kind of question for wider input before release, since the project is 
> >> bound by them forever more.
> >> 
> >

CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-11 Thread Ariel Weisberg
Hi,

This is regarding https://issues.apache.org/jira/browse/CASSANDRA-13241

This ticket has languished for a while. IMO it's too late in 4.0 to implement a 
more memory efficient representation for compressed chunk offsets. However I 
don't think we should put out another release with the current 64k default as 
it's pretty unreasonable.

I propose that we lower the value to 16kb. 4k might never be the correct 
default anyways as there is a cost to compression and 16k will still be a large 
improvement.

Benedict and Jon Haddad are both +1 on making this change for 4.0. In the past 
there has been some consensus about reducing this value although maybe with 
more memory efficiency.

The napkin math for what this costs is:
"If you have 1TB of uncompressed data, with 64k chunks that's 16M chunks at 8 
bytes each (128MB).
With 16k chunks, that's 512MB.
With 4k chunks, it's 2G.
Per terabyte of data (pre-compression)."
https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=15886621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15886621
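
The arithmetic, spelled out (8 bytes of offset per compressed chunk):

    public final class ChunkOffsetMath {
        public static void main(String[] args) {
            long terabyte = 1L << 40;
            for (int chunkKb : new int[] { 4, 16, 64 }) {
                long chunks = terabyte / (chunkKb * 1024L);
                long offsetMb = chunks * 8 >> 20;      // 8 bytes of offset per chunk
                System.out.printf("%2dk chunks: %,d offsets -> %,d MB per TB%n",
                                  chunkKb, chunks, offsetMb);
            }
        }
    }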

By way of comparison memory mapping the files has a similar cost per 4k page of 
8 bytes. Multiple mappings makes this more expensive. With a default of 16kb 
this would be 4x less expensive than memory mapping a file. I only mention this 
to give a sense of the costs we are already paying. I am not saying they are 
directly related.

I'll wait a week for discussion and if there is consensus make the change.

Regards,
Ariel

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Tested to upgrade to 4.0

2018-10-12 Thread Ariel Weisberg
Hi,

Thanks for reporting this. I'll get this fixed today.

Ariel

On Fri, Oct 12, 2018, at 7:21 AM, Tommy Stendahl wrote:
> Hi,
> 
> I tested to upgrade to Cassandra 4.0. I had an existing cluster with 
> 3.0.15 and upgraded the first node but it fails to start due to a 
> NullPointerException.
> 
> The problem is the new table option "speculative_write_threshold": when 
> it doesn’t exist we get a NullPointerException.
> 
> I created a jira for this 
> CASSANDRA-14820.
> 
> Regards,
> Tommy
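
The shape of the failure and the obvious defensive fix, sketched with illustrative names (see CASSANDRA-14820 for the real patch): a table created on 3.0.x has no such key in its schema, so the upgrade path has to default rather than parse null. The default value below is hypothetical.

    import java.util.Map;

    public final class MissingOptionSketch {
        static final String DEFAULT_THRESHOLD = "99p"; // hypothetical default value

        static String speculativeWriteThreshold(Map<String, String> tableParams) {
            // getOrDefault instead of a bare get(...) avoids the NPE for
            // tables whose schema predates the option
            return tableParams.getOrDefault("speculative_write_threshold", DEFAULT_THRESHOLD);
        }

        public static void main(String[] args) {
            System.out.println(speculativeWriteThreshold(Map.of())); // "99p", not an NPE
        }
    }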

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Implicit Casts for Arithmetic Operators

2018-10-12 Thread Ariel Weisberg
Hi,

I agree with what's been said about expectations regarding expressions 
involving floating point numbers. I think that if one of the inputs is 
approximate then the result should be approximate.

One thing we could look at for inspiration is the SQL spec. Not to follow 
dogmatically necessarily.

From the SQL 92 spec regarding assignment, 
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
"
 Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
 FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
 comparable and mutually assignable. If an assignment would result
 in a loss of the most significant digits, an exception condition
 is raised. If least significant digits are lost, implementation-
 defined rounding or truncating occurs with no exception condition
 being raised. The rules for arithmetic are generally governed by
 Subclause 6.12, "<numeric value expression>".
"

Section 6.12 numeric value expressions:
"
 1) If the data type of both operands of a dyadic arithmetic opera-
tor is exact numeric, then the data type of the result is exact
numeric, with precision and scale determined as follows:
...
 2) If the data type of either operand of a dyadic arithmetic op-
erator is approximate numeric, then the data type of the re-
sult is approximate numeric. The precision of the result is
implementation-defined.
"

And this makes sense to me. I think we should only return an exact result if 
both of the inputs are exact.
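
That rule is small enough to state as code; a sketch with CQL's numeric type names (the function is illustrative, not Cassandra's actual type resolver):

    import java.util.Set;

    public final class NumericResultKind {
        static final Set<String> EXACT = Set.of("tinyint", "smallint", "int", "bigint", "varint", "decimal");

        // SQL-92 6.12: the result is exact numeric only if both operands are;
        // if either operand is approximate (float, double), so is the result.
        static boolean resultIsExact(String left, String right) {
            return EXACT.contains(left) && EXACT.contains(right);
        }

        public static void main(String[] args) {
            System.out.println(resultIsExact("bigint", "decimal")); // true
            System.out.println(resultIsExact("bigint", "double"));  // false
        }
    }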

I think we might want to look closely at the SQL spec and especially when the 
spec requires an error to be generated. Those are sometimes in the spec to 
prevent subtle paths to wrong answers. Any time we deviate from the spec we 
should be asking why is it in the spec and why are we deviating.

Another issue besides overflow handling is how we determine precision and scale 
for expressions involving two exact types.

Ariel

On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> Hi,
> 
> I'm not sure if I would prefer the Postgres way of doing things, which is
> returning just about any type depending on the order of operators.
> Considering it actually mentions in the docs that using numeric/decimal is
> slow and also multiple times that floating points are inexact. So doing
> some math with Postgres (9.6.5):
> 
> SELECT 2147483647::bigint*1.0::double precision returns double
> precision 2147483647
> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> SELECT 2147483647::bigint*1.0::real returns double
> SELECT 2147483647::double precision*1::bigint returns double 2147483647
> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
> 
> With + - we can get the same amount of mixture of returned types. There's
> no difference in those calculations, just some casting. To me
> floating-point math indicates inexactness and has errors and whoever mixes
> up two different types should understand that. If one didn't want exact
> numeric type, why would the server return such? The floating point value
> itself could be wrong already before the calculation - trying to say we do
> it lossless is just wrong.
> 
> Fun with 2.65:
> 
> SELECT 2.65::real * 1::int returns double 2.6509536743
> SELECT 2.65::double precision * 1::int returns double 2.65
> 
> SELECT round(2.65) returns numeric 4
> SELECT round(2.65::double precision) returns double 4
> 
> SELECT 2.65 * 1 returns double 2.65
> SELECT 2.65 * 1::bigint returns numeric 2.65
> SELECT 2.65 * 1.0 returns numeric 2.650
> SELECT 2.65 * 1.0::double precision returns double 2.65
> 
> SELECT round(2.65) * 1 returns numeric 3
> SELECT round(2.65) * round(1) returns double 3
> 
> So as we're going to have silly values in any case, why pretend something
> else? Also, exact calculations are slow if we crunch large amount of
> numbers. I guess I slightly deviated towards Postgres' implementation in this
> case, but I wish it wasn't used as a benchmark in this case. And most
> importantly, I would definitely want the exact same type returned each time
> I do a calculation.
> 
>   - Micke
> 
> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith 
> wrote:
> 
> > As far as I can tell we reached a relatively strong consensus that we
> > should implement lossless casts by default?  Does anyone have anything more
> > to add?
> >
> > Looking at the emails, everyone who participated and expressed a
> > preference was in favour of the “Postgres approach” of upcasting to decimal
> > for mixed float/int operands?
> >
> > I’d like to get a clear-cut decision on this, so we know what we’re doing
> > for 4.0.  Then hopefully we can move on to a collective decision on Ariel’s
> > concerns about overflow, which I think are also pressing - particularly for
> > tinyint and smallint.  This does also impact implicit casts for mixed
> > integer type operations, but an approach for thes

Re: Implicit Casts for Arithmetic Operators

2018-10-12 Thread Ariel Weisberg
Hi,

From reading the spec, precision is always implementation defined. The spec 
specifies scale in several cases, but never precision, for any type or 
operation (addition/subtraction, multiplication, division).

So we don't implement anything remotely approaching precision and scale in CQL 
when it comes to numbers I think? So we aren't going to follow the spec for 
scale. We are already pretty far down that road so I would leave it alone. 

I don't think the spec is asking for the most approximate type. It's just 
saying the result is approximate, and the precision is implementation defined. 
We could return either float or double. I think if one of the operands is a 
double we should return a double because clearly the schema thought a double 
was required to represent that number. I would also be in favor of returning a 
double all the time so that people can expect a consistent type from 
expressions involving approximate numbers.

I am a big fan of widening for arithmetic expressions in a database to avoid 
having to error on overflow. You can go to the trouble of only widening the 
minimum amount, but I think it's simpler if we always widen to bigint and 
double. This would be something the spec allows.
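
Spelled out, the widening proposal is just this table; the code is a sketch of the proposal in this email, not current Cassandra behavior (decimal and varint are left aside here):

    public final class WideningSketch {
        enum Operand { EXACT_INTEGER, APPROXIMATE }

        static String resultType(Operand left, Operand right) {
            // Any approximate operand makes the whole expression approximate...
            if (left == Operand.APPROXIMATE || right == Operand.APPROXIMATE)
                return "double";
            // ...and integer-only expressions always widen, so tinyint + tinyint
            // can't silently wrap the way it would at its own width.
            return "bigint";
        }

        public static void main(String[] args) {
            System.out.println(resultType(Operand.EXACT_INTEGER, Operand.EXACT_INTEGER)); // bigint
            System.out.println(resultType(Operand.EXACT_INTEGER, Operand.APPROXIMATE));   // double
        }
    }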

Definitely if we can make overflow not occur we should and the spec allows 
that. We should also not return different types for the same operand types just 
to work around overflow if we detect we need more precision.

Ariel
On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this 
> out (and Mike for getting some empirical examples).
> 
> We still have to decide on the approximate data type to return; right 
> now, we have float+bigint=double, but float+int=float.  I think this is 
> fairly inconsistent, and either the approximate type should always win, 
> or we should always upgrade to double for mixed operands.
> 
> The quoted spec also suggests that decimal+float=float, and decimal
> +double=double, whereas we currently have decimal+float=decimal, and 
> decimal+double=decimal
> 
> If we’re going to go with an approximate operand implying an approximate 
> result, I think we should do it consistently (and consistent with the 
> SQL92 spec), and have the type of the approximate operand always be the 
> return type.
> 
> This would still leave a decision for float+double, though.  The most 
> consistent behaviour with that stated above would be to always take the 
> most approximate type to return (i.e. float), but this would seem to me 
> to be fairly unexpected for the user.
> 
> 
> > On 12 Oct 2018, at 17:23, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > I agree with what's been said about expectations regarding expressions 
> > involving floating point numbers. I think that if one of the inputs is 
> > approximate then the result should be approximate.
> > 
> > One thing we could look at for inspiration is the SQL spec. Not to follow 
> > dogmatically necessarily.
> > 
> > From the SQL 92 spec regarding assignment 
> > http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
> > "
> > Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
> > FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
> > comparable and mutually assignable. If an assignment would result
> > in a loss of the most significant digits, an exception condition
> > is raised. If least significant digits are lost, implementation-
> > defined rounding or truncating occurs with no exception condition
> > being raised. The rules for arithmetic are generally governed by
> > Subclause 6.12, "<numeric value expression>".
> > "
> > 
> > Section 6.12 numeric value expressions:
> > "
> > 1) If the data type of both operands of a dyadic arithmetic opera-
> >tor is exact numeric, then the data type of the result is exact
> >numeric, with precision and scale determined as follows:
> > ...
> > 2) If the data type of either operand of a dyadic arithmetic op-
> >erator is approximate numeric, then the data type of the re-
> >sult is approximate numeric. The precision of the result is
> >implementation-defined.
> > "
> > 
> > And this makes sense to me. I think we should only return an exact result 
> > if both of the inputs are exact.
> > 
> > I think we might want to look closely at the SQL spec and especially when 
> > the spec requires an error to be generated. Those are sometimes in the spec 
> > to prevent subtle pa

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-12 Thread Ariel Weisberg
Hi,

This would only impact new tables; existing tables would get their 
chunk_length_in_kb from the existing schema. It's something we record in a 
system table.

I have an implementation of a compact integer sequence that only requires 37% 
of the memory required today. So we could do this with only slightly more than 
doubling the memory used. I'll post that to the JIRA soon.
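
As a sketch of the general idea only (the actual representation is on the JIRA and does better than this): offsets are nearly linear in the chunk index, so store one full 8-byte offset per block and 4-byte corrections from a linear prediction for the rest. This toy layout already gets to ~56% of a plain long[]:

    public final class CompactOffsetsSketch {
        private final long expectedChunkSize;
        private final long[] bases;   // one full offset per block of 16 chunks
        private final int[] deltas;   // correction from the linear prediction

        CompactOffsetsSketch(long[] offsets, long expectedChunkSize) {
            this.expectedChunkSize = expectedChunkSize;
            this.bases = new long[(offsets.length + 15) / 16];
            this.deltas = new int[offsets.length];
            for (int i = 0; i < offsets.length; i++) {
                if (i % 16 == 0)
                    bases[i / 16] = offsets[i];
                long predicted = bases[i / 16] + (i % 16) * expectedChunkSize;
                deltas[i] = Math.toIntExact(offsets[i] - predicted); // stays small: chunks compress near-uniformly
            }
        }

        long offset(int i) {
            return bases[i / 16] + (i % 16) * expectedChunkSize + deltas[i];
        }

        public static void main(String[] args) {
            long[] offsets = { 0, 3900, 7810, 11700, 15650 }; // ~4k chunks after compression
            CompactOffsetsSketch s = new CompactOffsetsSketch(offsets, 4096);
            System.out.println(s.offset(3)); // 11700
        }
    }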

Ariel

On Fri, Oct 12, 2018, at 1:56 AM, Jeff Jirsa wrote:
> 
> 
> I think 16k is a better default, but it should only affect new tables. 
> Whoever changes it, please make sure you think about the upgrade path. 
> 
> 
> > On Oct 12, 2018, at 2:31 AM, Ben Bromhead  wrote:
> > 
> > This is something that's bugged me for ages, tbh the performance gain for
> > most use cases far outweighs the increase in memory usage and I would even
> > be in favor of changing the default now, optimizing the storage cost later
> > (if it's found to be worth it).
> > 
> > For some anecdotal evidence:
> > 4kb is usually what we end setting it to, 16kb feels more reasonable given
> > the memory impact, but what would be the point if practically, most folks
> > set it to 4kb anyway?
> > 
> > Note that chunk_length will largely be dependent on your read sizes, but 4k
> > is the floor for most physical devices in terms of ones block size.
> > 
> > +1 for making this change in 4.0 given the small size and the large
> > improvement to new users experience (as long as we are explicit in the
> > documentation about memory consumption).
> > 
> > 
> >> On Thu, Oct 11, 2018 at 7:11 PM Ariel Weisberg  wrote:
> >> 
> >> Hi,
> >> 
> >> This is regarding https://issues.apache.org/jira/browse/CASSANDRA-13241
> >> 
> >> This ticket has languished for a while. IMO it's too late in 4.0 to
> >> implement a more memory efficient representation for compressed chunk
> >> offsets. However I don't think we should put out another release with the
> >> current 64k default as it's pretty unreasonable.
> >> 
> >> I propose that we lower the value to 16kb. 4k might never be the correct
> >> default anyways as there is a cost to compression and 16k will still be a
> >> large improvement.
> >> 
> >> Benedict and Jon Haddad are both +1 on making this change for 4.0. In the
> >> past there has been some consensus about reducing this value although maybe
> >> with more memory efficiency.
> >> 
> >> The napkin math for what this costs is:
> >> "If you have 1TB of uncompressed data, with 64k chunks that's 16M chunks
> >> at 8 bytes each (128MB).
> >> With 16k chunks, that's 512MB.
> >> With 4k chunks, it's 2G.
> >> Per terabyte of data (pre-compression)."
> >> 
> >> https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=15886621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15886621
> >> 
> >> By way of comparison memory mapping the files has a similar cost per 4k
> >> page of 8 bytes. Multiple mappings makes this more expensive. With a
> >> default of 16kb this would be 4x less expensive than memory mapping a file.
> >> I only mention this to give a sense of the costs we are already paying. I
> >> am not saying they are directly related.
> >> 
> >> I'll wait a week for discussion and if there is consensus make the change.
> >> 
> >> Regards,
> >> Ariel
> >> 
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >> 
> >> --
> > Ben Bromhead
> > CTO | Instaclustr <https://www.instaclustr.com/>
> > +1 650 284 9692
> > Reliability at Scale
> > Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-17 Thread Ariel Weisberg
Hi,

It's really not appreciably slower compared to the decompression we are going 
to do anyway, which takes several microseconds. Decompression also gets faster 
overall: we do less unnecessary decompression, and the decompression itself may 
speed up because smaller chunks fit higher-level caches better. I ran a 
microbenchmark comparing them.

https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16653988

Fetching a long from memory:   56 nanoseconds
Compact integer sequence   :   80 nanoseconds
Summing integer sequence   :  165 nanoseconds
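
The benchmark itself has the usual JMH shape; the full source is in the pastebin on the JIRA comment above, and the skeleton below is illustrative (it assumes JMH on the classpath and only shows the baseline case):

    import java.util.concurrent.ThreadLocalRandom;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    public class OffsetLookupBench {
        long[] offsets = new long[1 << 20];

        @Setup
        public void setup() {
            long offset = 0;
            for (int i = 0; i < offsets.length; i++) {
                offsets[i] = offset;
                offset += ThreadLocalRandom.current().nextInt(3000, 4096); // ~4k compressed chunks
            }
        }

        @Benchmark
        public long fetchLongFromMemory() {
            // the 56ns baseline: a single dependent array load
            return offsets[ThreadLocalRandom.current().nextInt(offsets.length)];
        }
    }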

Currently we have one +1 from Kurt to change the representation and possibly a 
-0 from Benedict. That's not really enough to make an exception to the code 
freeze. If you want it to happen (or not) you need to speak up otherwise only 
the default will change.

Regards,
Ariel

On Wed, Oct 17, 2018, at 6:40 AM, kurt greaves wrote:
> I think if we're going to drop it to 16k, we should invest in the compact
> sequencing as well. Just lowering it to 16k will have potentially a painful
> impact on anyone running low memory nodes, but if we can do it without the
> memory impact I don't think there's any reason to wait another major
> version to implement it.
> 
> Having said that, we should probably benchmark the two representations
> Ariel has come up with.
> 
> On Wed, 17 Oct 2018 at 20:17, Alain RODRIGUEZ  wrote:
> 
> > +1
> >
> > I would guess a lot of C* clusters/tables have this option set to the
> > default value, and not many of them are having the need for reading so big
> > chunks of data.
> > I believe this will greatly limit disk overreads for a fair amount (a big
> > majority?) of new users. It seems fair enough to change this default value,
> > I also think 4.0 is a nice place to do this.
> >
> > Thanks for taking care of this Ariel and for making sure there is a
> > consensus here as well,
> >
> > C*heers,
> > ---
> > Alain Rodriguez - al...@thelastpickle.com
> > France / Spain
> >
> > The Last Pickle - Apache Cassandra Consulting
> > http://www.thelastpickle.com
> >
> > Le sam. 13 oct. 2018 à 08:52, Ariel Weisberg  a écrit :
> >
> > > Hi,
> > >
> > > This would only impact new tables, existing tables would get their
> > > chunk_length_in_kb from the existing schema. It's something we record in
> > a
> > > system table.
> > >
> > > I have an implementation of a compact integer sequence that only requires
> > > 37% of the memory required today. So we could do this with only slightly
> > > more than doubling the memory used. I'll post that to the JIRA soon.
> > >
> > > Ariel
> > >
> > > On Fri, Oct 12, 2018, at 1:56 AM, Jeff Jirsa wrote:
> > > >
> > > >
> > > > I think 16k is a better default, but it should only affect new tables.
> > > > Whoever changes it, please make sure you think about the upgrade path.
> > > >
> > > >
> > > > > On Oct 12, 2018, at 2:31 AM, Ben Bromhead 
> > wrote:
> > > > >
> > > > > This is something that's bugged me for ages, tbh the performance gain
> > > for
> > > > > most use cases far outweighs the increase in memory usage and I would
> > > even
> > > > > be in favor of changing the default now, optimizing the storage cost
> > > later
> > > > > (if it's found to be worth it).
> > > > >
> > > > > For some anecdotal evidence:
> > > > > 4kb is usually what we end setting it to, 16kb feels more reasonable
> > > given
> > > > > the memory impact, but what would be the point if practically, most
> > > folks
> > > > > set it to 4kb anyway?
> > > > >
> > > > > Note that chunk_length will largely be dependent on your read sizes,
> > > but 4k
> > > > > is the floor for most physical devices in terms of ones block size.
> > > > >
> > > > > +1 for making this change in 4.0 given the small size and the large
> > > > > improvement to new users experience (as long as we are explicit in
> > the
> > > > > documentation about memory consumption).
> > > > >
> > > > >
> > > > >> On Thu, Oct 11, 2018 at 7:11 PM Ariel Weisberg 
> > > wrote:
> > > > >>
> > > > >> Hi,
>

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-18 Thread Ariel Weisberg
Hi,

For those who were asking about the performance impact of block size on 
compression, I wrote a microbenchmark.

https://pastebin.com/RHDNLGdC
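
Roughly, the benchmark has the following JMH shape (a from-memory sketch rather 
than the exact pastebin contents; the one-method-per-chunk-size layout and the 
use of @OperationsPerInvocation to normalize the score per input byte are my 
assumptions):

import java.util.concurrent.ThreadLocalRandom;

import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class CompactIntegerSequenceBench
{
    static final LZ4Compressor COMPRESSOR =
        LZ4Factory.fastestInstance().fastCompressor();

    byte[] src16k;

    @Setup
    public void setup()
    {
        src16k = new byte[16384];
        // Stand-in input for the sketch; the actual input is in the pastebin.
        ThreadLocalRandom.current().nextBytes(src16k);
    }

    @Benchmark
    @OperationsPerInvocation(16384) // report the score per input byte
    public byte[] benchCompressLZ4Fast16k()
    {
        return COMPRESSOR.compress(src16k);
    }
}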

 [java] Benchmark                                               Mode  Cnt          Score         Error  Units
 [java] CompactIntegerSequenceBench.benchCompressLZ4Fast16k    thrpt   15  331190055.685 ±  8079758.044  ops/s
 [java] CompactIntegerSequenceBench.benchCompressLZ4Fast32k    thrpt   15  353024925.655 ±  7980400.003  ops/s
 [java] CompactIntegerSequenceBench.benchCompressLZ4Fast64k    thrpt   15  365664477.654 ± 10083336.038  ops/s
 [java] CompactIntegerSequenceBench.benchCompressLZ4Fast8k     thrpt   15  305518114.172 ± 11043705.883  ops/s
 [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast16k  thrpt   15  688369529.911 ± 25620873.933  ops/s
 [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast32k  thrpt   15  703635848.895 ±  5296941.704  ops/s
 [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast64k  thrpt   15  695537044.676 ± 17400763.731  ops/s
 [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast8k   thrpt   15  727725713.128 ±  4252436.331  ops/s

To summarize, compression is 8.5% slower and decompression is 1% faster. This 
measures only the raw cost of compression/decompression, not the much larger 
win that comes from decompressing data we don't need less often.

I didn't test decompression of Snappy and LZ4 high, but I did test compression.

Snappy:
 [java] CompactIntegerSequenceBench.benchCompressSnappy16k   thrpt    2  196574766.116  ops/s
 [java] CompactIntegerSequenceBench.benchCompressSnappy32k   thrpt    2  198538643.844  ops/s
 [java] CompactIntegerSequenceBench.benchCompressSnappy64k   thrpt    2  194600497.613  ops/s
 [java] CompactIntegerSequenceBench.benchCompressSnappy8k    thrpt    2  186040175.059  ops/s

LZ4 high compressor:
 [java] CompactIntegerSequenceBench.bench16k  thrpt    2  20822947.578  ops/s
 [java] CompactIntegerSequenceBench.bench32k  thrpt    2  12037342.253  ops/s
 [java] CompactIntegerSequenceBench.bench64k  thrpt    2   6782534.469  ops/s
 [java] CompactIntegerSequenceBench.bench8k   thrpt    2  32254619.594  ops/s

LZ4 high is the one instance where block size mattered a lot. It's a bit 
suspicious really when you look at the ratio of performance to block size being 
close to 1:1. I couldn't spot a bug in the benchmark though.

Compression ratios with LZ4 fast for the text of Alice in Wonderland were:

Chunk size 8192, ratio 0.709473
Chunk size 16384, ratio 0.667236
Chunk size 32768, ratio 0.634735
Chunk size 65536, ratio 0.607208

By way of comparison I also ran deflate with maximum compression:

Chunk size 8192, ratio 0.426434
Chunk size 16384, ratio 0.402423
Chunk size 32768, ratio 0.381627
Chunk size 65536, ratio 0.364865
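
Concretely, a ratio for a given chunk size is just total compressed bytes over 
total input bytes, compressing each chunk independently; a simplified sketch of 
that calculation (not the exact harness code):

static double ratio(byte[] text, int chunkSize, LZ4Compressor compressor)
{
    long compressed = 0;
    for (int i = 0; i < text.length; i += chunkSize)
    {
        int len = Math.min(chunkSize, text.length - i);
        compressed += compressor.compress(text, i, len).length;
    }
    return (double) compressed / text.length;
}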

Ariel
 
On Thu, Oct 18, 2018, at 5:32 AM, Benedict Elliott Smith wrote:
> FWIW, I’m not -0, just think that long after the freeze date a change 
> like this needs a strong mandate from the community.  I think the change 
> is a good one.
> 
> 
> 
> 
> 
> > On 17 Oct 2018, at 22:09, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > It's really not appreciably slower compared to the decompression we are 
> > going to do which is going to take several microseconds. Decompression is 
> > also going to be faster because we are going to do less unnecessary 
> > decompression and the decompression itself may be faster since it may fit 
> > in a higher level cache better. I ran a microbenchmark comparing them.
> > 
> > https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16653988
> > 
> > Fetching a long from memory:   56 nanoseconds
> > Compact integer sequence   :   80 nanoseconds
> > Summing integer sequence   :  165 nanoseconds
> > 
> > Currently we have one +1 from Kurt to change the representation and 
> > possibly a -0 from Benedict. That's not really enough to make an exception 
> > to the code freeze. If you want it to happen (or not) you need to speak up 
> > otherwise only the default will change.
> > 
> > Regards,
> > Ariel
> > 
> > On Wed, Oct 17, 2018, at 6:40 AM, kurt greaves wrote:
> >> I think if we're going to drop it to 16k, we should invest in the compact
> >> sequencing as well. Just lowering it to 16k will have potentially a painful
> >> impact on anyone running low memory nodes, but if we can do it without the
> >> memory impact I don't think there's any reason to wait another major
> >> version to implement it.
> >> 
>

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Ariel Weisberg
Hi,

I ran some benchmarks on my laptop
https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16656821&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16656821

For a random read workload, varying chunk size:
Chunk size  Time
       64k  25:20
       64k  25:33
       32k  20:01
       16k  19:19
       16k  19:14
        8k  16:51
        4k  15:39
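
To put those numbers in context, some back-of-the-envelope arithmetic (my own 
illustrative figures): serving a ~1 KB row from a 64k chunk reads and 
decompresses 64 KB, a 64x amplification, while a 4k chunk cuts that to 4 KB, or 
4x. The wall-clock gap above is smaller than that ratio because the workload 
isn't purely bound by over-read.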

Ariel
On Thu, Oct 18, 2018, at 2:55 PM, Ariel Weisberg wrote:
> Hi,
> 
> For those who were asking about the performance impact of block size on 
> compression I wrote a microbenchmark.
> 
> https://pastebin.com/RHDNLGdC
> 
>  [java] Benchmark                                               Mode  Cnt          Score         Error  Units
>  [java] CompactIntegerSequenceBench.benchCompressLZ4Fast16k    thrpt   15  331190055.685 ±  8079758.044  ops/s
>  [java] CompactIntegerSequenceBench.benchCompressLZ4Fast32k    thrpt   15  353024925.655 ±  7980400.003  ops/s
>  [java] CompactIntegerSequenceBench.benchCompressLZ4Fast64k    thrpt   15  365664477.654 ± 10083336.038  ops/s
>  [java] CompactIntegerSequenceBench.benchCompressLZ4Fast8k     thrpt   15  305518114.172 ± 11043705.883  ops/s
>  [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast16k  thrpt   15  688369529.911 ± 25620873.933  ops/s
>  [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast32k  thrpt   15  703635848.895 ±  5296941.704  ops/s
>  [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast64k  thrpt   15  695537044.676 ± 17400763.731  ops/s
>  [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast8k   thrpt   15  727725713.128 ±  4252436.331  ops/s
> 
> To summarize, compression is 8.5% slower and decompression is 1% faster. 
> This is measuring the impact on compression/decompression not the huge 
> impact that would occur if we decompressed data we don't need less 
> often.
> 
> I didn't test decompression of Snappy and LZ4 high, but I did test 
> compression.
> 
> Snappy:
>  [java] CompactIntegerSequenceBench.benchCompressSnappy16k   thrpt    2  196574766.116  ops/s
>  [java] CompactIntegerSequenceBench.benchCompressSnappy32k   thrpt    2  198538643.844  ops/s
>  [java] CompactIntegerSequenceBench.benchCompressSnappy64k   thrpt    2  194600497.613  ops/s
>  [java] CompactIntegerSequenceBench.benchCompressSnappy8k    thrpt    2  186040175.059  ops/s
> 
> LZ4 high compressor:
>  [java] CompactIntegerSequenceBench.bench16k  thrpt    2  20822947.578  ops/s
>  [java] CompactIntegerSequenceBench.bench32k  thrpt    2  12037342.253  ops/s
>  [java] CompactIntegerSequenceBench.bench64k  thrpt    2   6782534.469  ops/s
>  [java] CompactIntegerSequenceBench.bench8k   thrpt    2  32254619.594  ops/s
> 
> LZ4 high is the one instance where block size mattered a lot. It's a bit 
> suspicious really when you look at the ratio of performance to block 
> size being close to 1:1. I couldn't spot a bug in the benchmark though.
> 
> Compression ratios with LZ4 fast for the text of Alice in Wonderland were:
> 
> Chunk size 8192, ratio 0.709473
> Chunk size 16384, ratio 0.667236
> Chunk size 32768, ratio 0.634735
> Chunk size 65536, ratio 0.607208
> 
> By way of comparison I also ran deflate with maximum compression:
> 
> Chunk size 8192, ratio 0.426434
> Chunk size 16384, ratio 0.402423
> Chunk size 32768, ratio 0.381627
> Chunk size 65536, ratio 0.364865
> 
> Ariel
>  
> On Thu, Oct 18, 2018, at 5:32 AM, Benedict Elliott Smith wrote:
> > FWIW, I’m not -0, just think that long after the freeze date a change 
> > like this needs a strong mandate from the community.  I think the change 
> > is a good one.
> > 
> > 
> > 
> > 
> > 
> > > On 17 Oct 2018, at 22:09, Ariel Weisberg  wrote:
> > > 
> > > Hi,
> > > 
> > > It's really not appreciably slower compared to the decompression we are 
> > > going to do which is going to take several microseconds. Decompression is 
> > > also going to be faster because we are going to do less unnecessary 
> > > decompression and the decompression itself may be faster since it may fit 
> > > in a higher level cache better. I ran a microbenchmark comparing them.
> > > 
> > > https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16653988
> > > 
> > > Fetching a long from memory:   56 nanoseconds

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-23 Thread Ariel Weisberg
Hi,

To summarize who we have heard from so far:

WRT changing just the default:

+1:
Jon Haddad
Ben Bromhead
Alain Rodriguez
Sankalp Kohli (not explicit)

-0:
Sylvain Lebresne
Jeff Jirsa

Not sure:
Kurt Greaves
Joshua McKenzie
Benedict Elliott Smith

WRT changing the representation:

+1:
There are only conditional +1s at this point

-0:
Sylvain Lebresne

-.5:
Jeff Jirsa

This 
(https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce)
 is a rough cut of the change for the representation. It needs better naming, 
unit tests, javadoc etc. but it does implement the change.

Ariel
On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
> Sorry, to be clear - I'm +1 on changing the configuration default, but I
> think changing the compression in memory representations warrants further
> discussion and investigation before making a case for or against it yet.
> An optimization that reduces in memory cost by over 50% sounds pretty good
> and we never were really explicit that those sort of optimizations would be
> excluded after our feature freeze.  I don't think they should necessarily
> be excluded at this time, but it depends on the size and risk of the patch.
> 
> On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad  wrote:
> 
> > I think we should try to do the right thing for the most people that we
> > can.  The number of folks impacted by 64KB is huge.  I've worked on a lot
> > of clusters created by a lot of different teams, going from brand new to
> > pretty damn knowledgeable.  I can't think of a single time over the last 2
> > years that I've seen a cluster use non-default settings for compression.
> > With only a handful of exceptions, I've lowered the chunk size considerably
> > (usually to 4 or 8K) and the impact has always been very noticeable,
> > frequently resulting in hardware reduction and cost savings.  Of all the
> > poorly chosen defaults we have, this is one of the biggest offenders that I
> > see.  There's a good reason ScyllaDB  claims they're so much faster than
> > Cassandra - we ship a DB that performs poorly for 90+% of teams because we
> > ship for a specific use case, not a general one (time series on memory
> > constrained boxes being the specific use case)
> >
> > This doesn't impact existing tables, just new ones.  More and more teams
> > are using Cassandra as a general purpose database, we should acknowledge
> > that adjusting our defaults accordingly.  Yes, we use a little bit more
> > memory on new tables if we just change this setting, and what we get out of
> > it is a massive performance win.
> >
> > I'm +1 on the change as well.
> >
> >
> >
> > On Sat, Oct 20, 2018 at 4:21 AM Sankalp Kohli 
> > wrote:
> >
> >> (We should definitely harden the definition for freeze in a separate
> >> thread)
> >>
> >> My thinking is that this is the best time to do this change as we have
> >> not even cut alpha or beta. All the people involved in the test will
> >> definitely be testing it again when we have these releases.
> >>
> >> > On Oct 19, 2018, at 8:00 AM, Michael Shuler 
> >> wrote:
> >> >
> >> >> On 10/19/18 9:16 AM, Joshua McKenzie wrote:
> >> >>
> >> >> At the risk of hijacking this thread, when are we going to transition
> >> from
> >> >> "no new features, change whatever else you want including refactoring
> >> and
> >> >> changing years-old defaults" to "ok, we think we have something that's
> >> >> stable, time to start testing"?
> >> >
> >> > Creating a cassandra-4.0 branch would allow trunk to, for instance, get
> >> > a default config value change commit and get more testing. We might
> >> > forget again, from what I understand of Benedict's last comment :)
> >> >
> >> > --
> >> > Michael
> >> >
> >> > -
> >> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >> >
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
> >>
> >
> > --
> > Jon Haddad
> > http://www.rustyrazorblade.com
> > twitter: rustyrazorblade
> >
> 
> 
> -- 
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-23 Thread Ariel Weisberg
Hi,

I just asked Jeff. He is -0 and -0.5 respectively.

Ariel

On Tue, Oct 23, 2018, at 11:50 AM, Benedict Elliott Smith wrote:
> I’m +1 change of default.  I think Jeff was -1 on that though.
> 
> 
> > On 23 Oct 2018, at 16:46, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > To summarize who we have heard from so far:
> > 
> > WRT changing just the default:
> > 
> > +1:
> > Jon Haddad
> > Ben Bromhead
> > Alain Rodriguez
> > Sankalp Kohli (not explicit)
> > 
> > -0:
> > Sylvain Lebresne
> > Jeff Jirsa
> > 
> > Not sure:
> > Kurt Greaves
> > Joshua McKenzie
> > Benedict Elliott Smith
> > 
> > WRT changing the representation:
> > 
> > +1:
> > There are only conditional +1s at this point
> > 
> > -0:
> > Sylvain Lebresne
> > 
> > -.5:
> > Jeff Jirsa
> > 
> > This 
> > (https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce)
> >  is a rough cut of the change for the representation. It needs better 
> > naming, unit tests, javadoc etc. but it does implement the change.
> > 
> > Ariel
> > On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
> >> Sorry, to be clear - I'm +1 on changing the configuration default, but I
> >> think changing the compression in memory representations warrants further
> >> discussion and investigation before making a case for or against it yet.
> >> An optimization that reduces in memory cost by over 50% sounds pretty good
> >> and we never were really explicit that those sort of optimizations would be
> >> excluded after our feature freeze.  I don't think they should necessarily
> >> be excluded at this time, but it depends on the size and risk of the patch.
> >> 
> >> On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad  wrote:
> >> 
> >>> I think we should try to do the right thing for the most people that we
> >>> can.  The number of folks impacted by 64KB is huge.  I've worked on a lot
> >>> of clusters created by a lot of different teams, going from brand new to
> >>> pretty damn knowledgeable.  I can't think of a single time over the last 2
> >>> years that I've seen a cluster use non-default settings for compression.
> >>> With only a handful of exceptions, I've lowered the chunk size 
> >>> considerably
> >>> (usually to 4 or 8K) and the impact has always been very noticeable,
> >>> frequently resulting in hardware reduction and cost savings.  Of all the
> >>> poorly chosen defaults we have, this is one of the biggest offenders that 
> >>> I
> >>> see.  There's a good reason ScyllaDB  claims they're so much faster than
> >>> Cassandra - we ship a DB that performs poorly for 90+% of teams because we
> >>> ship for a specific use case, not a general one (time series on memory
> >>> constrained boxes being the specific use case)
> >>> 
> >>> This doesn't impact existing tables, just new ones.  More and more teams
> >>> are using Cassandra as a general purpose database, we should acknowledge
> >>> that adjusting our defaults accordingly.  Yes, we use a little bit more
> >>> memory on new tables if we just change this setting, and what we get out 
> >>> of
> >>> it is a massive performance win.
> >>> 
> >>> I'm +1 on the change as well.
> >>> 
> >>> 
> >>> 
> >>> On Sat, Oct 20, 2018 at 4:21 AM Sankalp Kohli 
> >>> wrote:
> >>> 
> >>>> (We should definitely harden the definition for freeze in a separate
> >>>> thread)
> >>>> 
> >>>> My thinking is that this is the best time to do this change as we have
> >>>> not even cut alpha or beta. All the people involved in the test will
> >>>> definitely be testing it again when we have these releases.
> >>>> 
> >>>>> On Oct 19, 2018, at 8:00 AM, Michael Shuler 
> >>>> wrote:
> >>>>> 
> >>>>>> On 10/19/18 9:16 AM, Joshua McKenzie wrote:
> >>>>>> 
> >>>>>> At the risk of hijacking this thread, when are we going to transition
> >>>> from
> >>>>>> "no new features, change whatever else you want including refactoring
> >>>> and
> >>>>>> changing years-old defaults" to "ok, we think we have something that's stable, time to start testing"?

Re: Proposed changes to CircleCI testing workflow

2018-10-26 Thread Ariel Weisberg
Hi,

Thank you for working on this. These all sound like good changes to me.

Ariel

On Fri, Oct 26, 2018, at 10:49 AM, Stefan Podkowinski wrote:
> I'd like to give you a quick update on the work that has been done
> lately on running tests using CircleCI. Please let me know if you have
> any objections or don't think this is going into the right direction, or
> have any other feedback!
> 
> We've been using CircleCI for a while now and results are used on
> constant basis for new patches. Not only by committers, but also by
> casual contributors to run unit tests. Looks like people find the
> service valuable and we should keep using it. Therefore I'd like to make
> some improvements that will make it easier to add new tests and to
> continue making CircleCI an option for all contributors, both on paid
> and free plans.
> 
> The general idea of the changes implemented in #14806, is to consolidate
> the existing config to make it more modular and have smaller jobs that
> can be scheduled ad-hoc by the developer, instead of running a few big
> jobs on every commit. Reorganizing and breaking up the existing config
> was done using the new 2.1 config features. Starting jobs on request,
> instead of automatically, is done using the manual approval feature,
> i.e. you now have to click on that job in the workflow page in order to
> start it. I'd like to see us having smaller, more specialized groups of
> tests that we can run more selectively during development, while still
> being able to run bigger tests before committing, or firing up all of
> them during testing and releasing. Other example of smaller jobs would
> be testing coverage (#14788) or cqlsh tests (#14298). But also
> individual jobs for different ant targets, like burn, stress or benchmarks.
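> 
> For reference, the manual approval feature looks roughly like this in a
> 2.1 workflow definition (a sketch with made-up job names, not the exact
> config from #14806):
> 
> workflows:
>   tests:
>     jobs:
>       - build
>       - unit_tests:
>           requires: [build]
>       - start_dtests:        # placeholder job; click Approve in the UI
>           type: approval
>           requires: [build]
>       - dtests:
>           requires: [start_dtests]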
> 
> We'd now also be able to run tests using different docker images and
> different JDKs. I've already updated the used image to also include Java
> 11 and added unit and dtest jobs to the config for that. It's now really
> easy to run tests on Java 11, although these won't pass yet. It seems to
> be important to me to have this kind of flexibility, given the
> increasingly diverse ecosystem of Java distributions. We can also add
> jobs for packaging and doing smoke tests by installing and starting
> packages on different docker images (Redhat, Debian, Ubuntu,..) at a
> later point.
> 
> As for the paid vs free plans issue, I'd also like us to discuss how we
> can make tests faster and less resource intensive in general. As a
> desired consequence, we'd be able to move away from multi-node dtests,
> to something that can be run using the free plan. I'm looking forward to
> see if #14821 can get us into that direction. Ideally we can add these
> tests into a job that can be completed on the free plan and encourage
> contributors to add new tests there, instead of having to write a dtest,
> which they won't be able to run on CircleCI without a paid plan.
> 
> Whats changing for you as a CircleCI user?
> * All tests, except unit tests, will need to be started manually and
> will not run on every commit (this can be further discussed and changed
> anytime if needed)
> * Updating the config.yml file now requires using the CircleCI cli tool
> and should not be done directly (see #14806 for technical details)
> * High resource settings can be enabled using a script/patch, either run
> manually or as commit hook (again see ticket for details)
> * Both free and paid plan users now have more tests to run
> 
> As already mentioned, please let me know if you have any thoughts on
> this, or if you think this is going into the wrong direction.
> 
> Thanks.
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-29 Thread Ariel Weisberg
Hi,

Seeing too many -'s for changing the representation and essentially no +1s, I 
submitted a patch for just changing the default. I could use a reviewer for 
https://issues.apache.org/jira/browse/CASSANDRA-13241

I created https://issues.apache.org/jira/browse/CASSANDRA-14857  "Use a more 
space efficient representation for compressed chunk offsets" for post 4.0.

Regards,
Ariel

On Tue, Oct 23, 2018, at 11:46 AM, Ariel Weisberg wrote:
> Hi,
> 
> To summarize who we have heard from so far:
> 
> WRT changing just the default:
> 
> +1:
> Jon Haddad
> Ben Bromhead
> Alain Rodriguez
> Sankalp Kohli (not explicit)
> 
> -0:
> Sylvain Lebresne
> Jeff Jirsa
> 
> Not sure:
> Kurt Greaves
> Joshua McKenzie
> Benedict Elliott Smith
> 
> WRT changing the representation:
> 
> +1:
> There are only conditional +1s at this point
> 
> -0:
> Sylvain Lebresne
> 
> -.5:
> Jeff Jirsa
> 
> This 
> (https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce)
>  
> is a rough cut of the change for the representation. It needs better 
> naming, unit tests, javadoc etc. but it does implement the change.
> 
> Ariel
> On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
> > Sorry, to be clear - I'm +1 on changing the configuration default, but I
> > think changing the compression in memory representations warrants further
> > discussion and investigation before making a case for or against it yet.
> > An optimization that reduces in memory cost by over 50% sounds pretty good
> > and we never were really explicit that those sort of optimizations would be
> > excluded after our feature freeze.  I don't think they should necessarily
> > be excluded at this time, but it depends on the size and risk of the patch.
> > 
> > On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad  wrote:
> > 
> > > I think we should try to do the right thing for the most people that we
> > > can.  The number of folks impacted by 64KB is huge.  I've worked on a lot
> > > of clusters created by a lot of different teams, going from brand new to
> > > pretty damn knowledgeable.  I can't think of a single time over the last 2
> > > years that I've seen a cluster use non-default settings for compression.
> > > With only a handful of exceptions, I've lowered the chunk size 
> > > considerably
> > > (usually to 4 or 8K) and the impact has always been very noticeable,
> > > frequently resulting in hardware reduction and cost savings.  Of all the
> > > poorly chosen defaults we have, this is one of the biggest offenders that 
> > > I
> > > see.  There's a good reason ScyllaDB  claims they're so much faster than
> > > Cassandra - we ship a DB that performs poorly for 90+% of teams because we
> > > ship for a specific use case, not a general one (time series on memory
> > > constrained boxes being the specific use case)
> > >
> > > This doesn't impact existing tables, just new ones.  More and more teams
> > > are using Cassandra as a general purpose database, we should acknowledge
> > > that adjusting our defaults accordingly.  Yes, we use a little bit more
> > > memory on new tables if we just change this setting, and what we get out 
> > > of
> > > it is a massive performance win.
> > >
> > > I'm +1 on the change as well.
> > >
> > >
> > >
> > > On Sat, Oct 20, 2018 at 4:21 AM Sankalp Kohli 
> > > wrote:
> > >
> > >> (We should definitely harden the definition for freeze in a separate
> > >> thread)
> > >>
> > >> My thinking is that this is the best time to do this change as we have
> > >> not even cut alpha or beta. All the people involved in the test will
> > >> definitely be testing it again when we have these releases.
> > >>
> > >> > On Oct 19, 2018, at 8:00 AM, Michael Shuler 
> > >> wrote:
> > >> >
> > >> >> On 10/19/18 9:16 AM, Joshua McKenzie wrote:
> > >> >>
> > >> >> At the risk of hijacking this thread, when are we going to transition
> > >> from
> > >> >> "no new features, change whatever else you want including refactoring
> > >> and
> > >> >> changing years-old defaults" to "ok, we think we have something that's
> > >> >> stable, time to start testing"?
> > >> >
> > >> > Creating a cassandra-4.0 branch would allow trunk to, for instance, get a default config value change commit and get more testing.

Re: Implicit Casts for Arithmetic Operators

2018-11-20 Thread Ariel Weisberg
Hi,

+1

This is a public API so we will be much better off if we get it right the first 
time.
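
For concreteness, the flavor of rules under discussion in the quoted thread 
would look something like the following (illustrative examples only, not the 
final semantics):

  int     + int     -> bigint   (widen exact numerics to avoid overflow)
  int     + float   -> double   (any approximate operand gives an approximate result)
  bigint  + double  -> double
  decimal + double  -> double   (per SQL92; today this returns decimal)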

Ariel

> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad  wrote:
> 
> Sounds good to me.
> 
> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith 
> wrote:
> 
>> So, this thread somewhat petered out.
>> 
>> There are still a number of unresolved issues, but to make progress I
>> wonder if it would first be helpful to have a vote on ensuring we are ANSI
>> SQL 92 compliant for our arithmetic?  This seems like a sensible baseline,
>> since we will hopefully minimise surprise to operators this way.
>> 
>> If people largely agree, I will call a vote, and we can pick up a couple
>> of more focused discussions afterwards on how we interpret the leeway it
>> gives.
>> 
>> 
>>> On 12 Oct 2018, at 18:10, Ariel Weisberg  wrote:
>>> 
>>> Hi,
>>> 
>>> From reading the spec. Precision is always implementation defined. The
>> spec specifies scale in several cases, but never precision for any type or
>> operation (addition/subtraction, multiplication, division).
>>> 
>>> So we don't implement anything remotely approaching precision and scale
>> in CQL when it comes to numbers I think? So we aren't going to follow the
>> spec for scale. We are already pretty far down that road so I would leave
>> it alone.
>>> 
>>> I don't think the spec is asking for the most approximate type. It's
>> just saying the result is approximate, and the precision is implementation
>> defined. We could return either float or double. I think if one of the
>> operands is a double we should return a double because clearly the schema
>> thought a double was required to represent that number. I would also be in
>> favor of returning a double all the time so that people can expect a
>> consistent type from expressions involving approximate numbers.
>>> 
>>> I am a big fan of widening for arithmetic expressions in a database to
>> avoid having to error on overflow. You can go to the trouble of only
>> widening the minimum amount, but I think it's simpler if we always widen to
>> bigint and double. This would be something the spec allows.
>>> 
>>> Definitely if we can make overflow not occur we should and the spec
>> allows that. We should also not return different types for the same operand
>> types just to work around overflow if we detect we need more precision.
>>> 
>>> Ariel
>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this
>>>> out (and Mike for getting some empirical examples).
>>>> 
>>>> We still have to decide on the approximate data type to return; right
>>>> now, we have float+bigint=double, but float+int=float.  I think this is
>>>> fairly inconsistent, and either the approximate type should always win,
>>>> or we should always upgrade to double for mixed operands.
>>>> 
>>>> The quoted spec also suggests that decimal+float=float, and decimal
>>>> +double=double, whereas we currently have decimal+float=decimal, and
>>>> decimal+double=decimal
>>>> 
>>>> If we’re going to go with an approximate operand implying an
>> approximate
>>>> result, I think we should do it consistently (and consistent with the
>>>> SQL92 spec), and have the type of the approximate operand always be the
>>>> return type.
>>>> 
>>>> This would still leave a decision for float+double, though.  The most
>>>> consistent behaviour with that stated above would be to always take the
>>>> most approximate type to return (i.e. float), but this would seem to me
>>>> to be fairly unexpected for the user.
>>>> 
>>>> 
>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg  wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I agree with what's been said about expectations regarding expressions
>> involving floating point numbers. I think that if one of the inputs is
>> approximate then the result should be approximate.
>>>>> 
>>>>> One thing we could look at for inspiration is the SQL spec. Not to
>> follow dogmatically necessarily.
>>>>> 
>>>>> From the SQL 92 spec regarding assignment
>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:

Re: Request to review feature-freeze proposed tickets

2018-11-20 Thread Ariel Weisberg
Hi,

I would like to get as many of these as is feasible in. Of the 17 JIRAs that 
were patch available before the feature freeze started, only 1 was reviewed and 
committed.

If you didn’t have access to reviewers and committers, as that one out of the 
17 did, it has been essentially impossible to get your problems with Cassandra 
fixed in 4.0.

This is basically the same as saying that even though Cassandra is open source, 
it does you no good: it will be years before the issues impacting you get 
fixed, even if you contribute the fixes yourself.

Pulling up the ladder after getting “your own” fixes in is a surefire way to 
fracture the community into a collection of private forks containing the fixes 
people can’t live without, and to push people to look at alternatives.

Private forks are a serious threat to the project. The people on them are at 
risk of getting left behind and Cassandra stagnates for them and becomes 
uncompetitive. Those with the resources to maintain a seriously diverged fork 
are also the ones better positioned to be active contributors.

Regards,
Ariel

> On Nov 18, 2018, at 9:18 PM, Vinay Chella  wrote:
> 
> Hi,
> 
> We still have 15 Patch Available/ open tickets which were requested for
> reviews before the Sep 1, 2018 freeze. I am starting this email thread to
> resurface and request a review of community tickets as most of these
> tickets address vital correctness, performance, and usability bugs that
> help avoid critical production issues. I tried to provide context on why we
> feel these tickets are important to get into 4.0. If you would like to
> discuss the technical details of a particular ticket, let's try to do that
> in JIRA.
> 
> CASSANDRA-14525: Cluster enters an inconsistent state after bootstrap
> failures. (Correctness bug, Production impact, Ready to Commit)
> 
> CASSANDRA-14459: DES sends requests to the wrong nodes routinely. (SLA
> breaking latencies, Production impact, Review in progress)
> 
> CASSANDRA-14303 and CASSANDRA-14557: Currently production 3.0+ clusters
> cannot be rebuilt after node failure due to 3.0’s introduction of the
> system_auth keyspace with rf of 1. These tickets both fix the regression
> introduced in 3.0 by letting operators configure rf=3 and prevent future
> outages (Usability bug, Production impact, Patch Available).
> 
> CASSANDRA-14096: Cassandra 3.11.1 Repair Causes Out of Memory. We believe
> this may also impact 3.0 (Title says it all, Production impact, Patch
> Available)
> 
> CASSANDRA-10023: It is impossible to accurately determine local read/write
> calls on C*. This patch allows users to detect when they are choosing
> incorrect coordinators. (Usability bug (troubleshoot), Review in progress)
> 
> CASSANDRA-10789: There is no way to safely stop bad clients bringing down
> C* nodes. This patch would give operators a very important tool to use
> during production incidents to mitigate impact. (Usability bug, Production
> Impact (recovery), Patch Available)
> 
> CASSANDRA-13010: No visibility into which disk is being compacted to.
> (Usability bug, Production Impact (troubleshoot), Review in progress)
> 
> CASSANDRA-12783 - Break up large MV mutations to prevent OOMs (Title says
> it all, Production Impact, Patch InProgress/ Awaiting Feedback)
> 
> CASSANDRA-14319 - nodetool rebuild from DC lets you pass invalid
> datacenters (Usability bug, Production impact, Patch available)
> 
> CASSANDRA-13841 - Smarter nodetool rebuild. Kind of a bug but would be nice
> to get it in 4.0. (Production Impact (recovery), Patch Available)
> 
> CASSANDRA-9452: Cleanup of old configuration, confusing to new C*
> operators. (Cleanup, Patch Available)
> 
> CASSANDRA-14309: Hint window persistence across the record. This way hints
> that are accumulated over a period of time when nodes are creating are less
> likely to take down the entire cluster. (Potential Production Impact, Patch
> Available)
> 
> CASSANDRA-14291: Bug from CASSANDRA-11163? (Usability Bug, Patch Available)
> 
> CASSANDRA-10540: RangeAware compaction. 256 vnode clusters really need this
> to be able to do basic things like repair. The patch needs some rework
> after transient replication (Production impact, needs contributor time)
> 
> URL for all the tickets: JIRA
> 
> 
> 
> Let me know.
> Thanks,
> Vinay Chella


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: JIRA Workflow Proposals

2018-12-07 Thread Ariel Weisberg
Hi,

Late but.

1. A
2. +1
3. +1
4. -1
5. -0
6. +1

RE 4, I think blocker is an important priority. High and urgent mean the same 
thing to me. Wish is fine, but it is too similar to low if you ask me. My ideal 
would be low, medium, high, blocker. Medium feels weird, but it's a real thing: 
it's not high priority, yet we really want it done, and it's not low enough 
that we might skip it or not get to it anytime soon.

RE 5. I don't think I have ever used the environment field or the contents 
populated in it. That doesn't mean someone else hasn't, but in terms of making 
the easy things easy, it seems like making it required isn't high value. I 
don't populate it myself; usually I put it in the description or the subject 
without thinking.

It seems like the purpose of a field is to make it indexable and possibly 
structured. How often do we search or require structure on this field?

Ariel

On Tue, Dec 4, 2018, at 2:12 PM, Benedict Elliott Smith wrote:
> Ok, so after an initial flurry everyone has lost interest :)
> 
> I think we should take a quick poll (not a vote), on people’s positions 
> on the questions raised so far.  If people could try to take the time to 
> stake a +1/-1, or A/B, for each item, that would be really great.  This 
> poll will not be the end of discussions, but will (hopefully) at least 
> draw a line under the current open questions.
> 
> I will start with some verbiage, then summarise with options for 
> everyone to respond to.  You can scroll to the summary immediately if 
> you like.
> 
> - 1. Component: Multi-select or Cascading-select (i.e. only one 
> component possible per ticket, but neater UX)
> - 2. Labels: rather than litigate people’s positions, I propose we do 
> the least controversial thing, which is to simply leave labels intact, 
> and only supplement them with the new schema information.  We can later 
> revisit if we decide it’s getting messy.
> - 3. "First review completed; second review ongoing": I don’t think we 
> need to complicate the process; if there are two reviews in flight, the 
> first reviewer can simply comment that they are done when ready, and the 
> second reviewer can move the status once they are done.  If the first 
> reviewer wants substantive changes, they can move the status to "Change 
> Request” before the other reviewer completes, if they like.  Everyone 
> involved can probably negotiate this fairly well, but we can introduce 
> some specific guidance on how to conduct yourself here in a follow-up.  
> - 4. Priorities: Option A: Wish, Low, Normal, High, Urgent; Option B: 
> Wish, Low, Normal, Urgent
> - 5. Mandatory Platform and Feature. Make mandatory by introducing new 
> “All” and “None” (respectively) options, so always possible to select an 
> option.
> - 6. Environment field: Remove?
> 
> I think this captures everything that has been brought up so far, except 
> for the suggestion to make "Since Version” a “Version” - but that needs 
> more discussion, as I don’t think there’s a clear alternative proposal 
> yet.
> 
> Summary:
> 
> 1: Component. (A) Multi-select; (B) Cascading-select
> 2: Labels: leave alone +1/-1
> 3: No workflow changes for first/second review: +1/-1
> 4: Priorities: Including High +1/-1
> 5: Mandatory Platform and Feature: +1/-1
> 6: Remove Environment field: +1/-1
> 
> I will begin.
> 
> 1: A
> 2: +1
> 3: +1
> 4: +1
> 5: Don’t mind
> 6: +1
> 
> 
> 
> 
> > On 29 Nov 2018, at 22:04, Scott Andreas  wrote:
> > 
> > If I read Josh’s reply right, I think the suggestion is to periodically 
> > review active labels and promote those that are demonstrably useful to 
> > components (cf. folksonomy -> 
> > taxonomy).
> >  I hadn’t read the reply as indicating that labels should be zero’d out 
> > periodically. In any case, I agree that reviewing active labels and 
> > re-evaluating our taxonomy from time to time sounds great; I don’t think 
> > I’d zero them, though.
> > 
> > Responding to a few comments:
> > 
> > –––
> > 
> > – To Joey’s question about issues languishing in Triage: I like the idea of 
> > an SLO for the “triage” state. I am happy to commit time and resources to 
> > triaging newly-reported issues, and to JIRA pruning/gardening in general. I 
> > spent part of the weekend before last adding components to a few hundred 
> > open issues and preparing the Confluence reports mentioned in the other 
> > thread. It was calming. We can also figure out how to rotate / share this 
> > responsibility.
> > 
> > – Labels discussion: If we adopt a more structured component hierarchy to 
> > treat as our primary method of organization, keep labels around for people 
> > to use as they’d like (e.g., for custom JQL queries useful to their 
> > workflows), and periodically promote those that are widely useful, I think 
> > that sounds like a fine outcome.
> > 
> > – On Sankalp’s question of issue reporter / new contributor burden: I 
> > actually think 

Re: JIRA Workflow Proposals

2018-12-07 Thread Ariel Weisberg
Hi,

I think I managed to not get confused; I evaluated the two separately. I don't 
like or use environment, either in terms of populating the field or searching 
on it. That information could go in the description and be just as useful to me 
personally.

I have no problem with an optional platform field that is an improvement on 
environment in that it is more structured and searchable. My bar for optional 
fields is low. I guess I'm not convinced I want either, though. If other people 
find it useful because they search on it, then yes, we should do a better, more 
structured version.

Question 5 groups feature impact and platform; it's platform that I think is 
less useful. I am +1 on feature impacts, as we have impacts on things like CCM 
and the drivers that we need to keep track of, and I do forget them at times.

Ariel

On Fri, Dec 7, 2018, at 1:17 PM, Benedict Elliott Smith wrote:
> 
> 
> > On 7 Dec 2018, at 17:52, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > Late but.
> 
> No harm in them continuing to roll in, I’m just cognisant of needing to 
> annoy everyone with a second poll, so no point perpetuating it past a 
> likely unassailable consensus.
> 
> > 
> > 1. A
> > 2. +1
> > 3. +1
> > 4. -1
> > 5. -0
> > 6. +1
> > 
> > RE 4, I think blocker is an important priority. High and urgent mean the 
> > same thing to me. Wish is fine, but that is too similar to low if you ask 
> > me. My ideal would be low, medium, high, blocker. Medium feels weird, but 
> > it's a real thing, it's not high priority and we really want it done, but 
> > it's not low enough that we might skip it/not get to it anytime soon.
> 
> It seems like people have really strong (and divergent) opinions about 
> Priority!  
> 
> So, to begin: I don’t think Medium is any different to Normal, in the 
> proposal?  Except Normal is, well, more accurate I think?  It is the 
> default priority, and should be used unless strong reasons otherwise.
> 
> As for Blocker vs Urgent, I obviously disagree (but not super strongly):  
> Urgent conveys more information IMO.  Blocker only says we cannot 
> release without this.  Urgent also says we must release with this, and 
> ASAP.  The meaning of a priority is anyway distinct from its name, and 
> the meaning of Urgent is described in the proposal to make this clear.  
> But, it’s easy to add a quick poll item for the top priority name.  Any 
> other suggestions, besides Urgent and Blocker?
> 
> Of course, if we remove Priority from the Bug type, I agree with others 
> that the top level priority ceases to mean anything, and there probably 
> shouldn’t be one.
> 
> Wish will already be included in the next poll.
> 
> > RE 5. I don't think I have ever used the environment field or used the 
> > contents populated in it. Doesn't mean someone else hasn't, but in terms of 
> > making the easy things easy it seems like making it required isn't so high 
> > value? I don't populate it myself usually I put it in the description or 
> > the subject without thinking.
> > It seems like the purpose of a field is to make it indexable and possibly 
> > structured. How often do we search or require structure on this field?
> 
> Are you conflating this with Q6?  The environment field was not 
> discussed, only the potential Platform field, which we _hope_ to make a 
> multi-select list.  This would make the information quite useful for 
> reporting and searching.
> 
> Environment is being removed because it is unstructured and poorly used, 
> and it looks like you have voted in favour of this?
> 
> If Platform cannot be made into an editable multi-select list, we will 
> probably not make it mandatory. Here we’re trying to gauge an ideal end 
> state - some things may need revisiting if JIRA does not play ball, 
> though that should not affect many items.
> 
> > 
> > Ariel
> > 
> > On Tue, Dec 4, 2018, at 2:12 PM, Benedict Elliott Smith wrote:
> >> Ok, so after an initial flurry everyone has lost interest :)
> >> 
> >> I think we should take a quick poll (not a vote), on people’s positions 
> >> on the questions raised so far.  If people could try to take the time to 
> >> stake a +1/-1, or A/B, for each item, that would be really great.  This 
> >> poll will not be the end of discussions, but will (hopefully) at least 
> >> draw a line under the current open questions.
> >> 
> >> I will start with some verbiage, then summarise with options for 
> >> everyone to respond to.  You can scroll to the summary immediately if 
> >> you like.
> >> 
> >> - 1. Component: Mu

Re: JIRA Workflow Proposals

2018-12-10 Thread Ariel Weisberg
Hi,

RE #1, does this mean that if you submit a ticket and you are not a 
contributor, you can't modify any of the fields, including the description, or 
add/remove attachments?

RE #2, while bugs don't necessarily have a priority, it's helpful to have them 
sort logically with other issue types on that field. It seems like what we 
ideally want to preserve is a useful sort order without having to populate the 
field manually.

RE #4, do we need to keep Wish at all?

Not voting yet just because I'm not sure on some.

Ariel

On Mon, Dec 10, 2018, at 7:43 AM, Benedict Elliott Smith wrote:
> New questions.  This is the last round, before I call a proper vote on 
> the modified proposal (so we can take a mandate to Infra to modify our 
> JIRA workflows).  
> 
> Thanks again to everyone following and contributing to this discussion.  
> I’m not sure any of these remaining questions are critical, but for the 
> best democratic outcome it’s probably worth running them through the 
> same process.  I also forgot to include (1) on the prior vote.
> 
> 1. Limit edits to JIRA ‘contributor’ role: +1/-1
> 2. Priority on Bug issue type: (A) remove it; (B) auto-populate it; (C) 
> leave it.  Please rank.
> 3. Top priority: (A) Urgent; (B) Blocker.  See here for my explanation 
> of why I chose Urgent 
> <https://lists.apache.org/thread.html/c7b95b827d8da4efc5c017df80029676a032b150ec00bf11ca9c7fa7@%3Cdev.cassandra.apache.org%3E>.
> 4. Priority keep ‘Wish’ (to replace issue type): +1/-1
> 
> For 2, if we cannot remove it, we can make it non-editable and default 
> to Normal; for auto-populate I propose using Severity (Low->Low, Normal-
> >Normal, Critical->Urgent).  No guarantees entirely on what we can 
> achieve, so a ranked choice would be ideal.
> 
> I have avoided splitting out another vote on the Platform field, since 
> everyone was largely meh on the question of mandatoriness; it won by 
> only a slim margin, because everyone was +/- 0, and nobody responded to 
> back Ariel’s dissenting view.
> 
> My votes are:
> 1: +1
> 2: B,C,A
> 3: A
> 4: +0.5
> 
> 
> For tracking, the new consensus from the prior vote is:
> 1: A (+10)
> 2: +9 -0.1
> 3: +10
> 4: +6 -2 (=+4)
> 5: +2; a lot of meh.
> 6: +9
> 
> 
> 
> > On 7 Dec 2018, at 17:52, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > Late but.
> > 
> > 1. A
> > 2. +1
> > 3. +1
> > 4. -1
> > 5. -0
> > 6. +1
> > 
> > RE 4, I think blocker is an important priority. High and urgent mean the 
> > same thing to me. Wish is fine, but that is too similar to low if you ask 
> > me. My ideal would be low, medium, high, blocker. Medium feels weird, but 
> > it's a real thing, it's not high priority and we really want it done, but 
> > it's not low enough that we might skip it/not get to it anytime soon.
> > 
> > RE 5. I don't think I have ever used the environment field or used the 
> > contents populated in it. Doesn't mean someone else hasn't, but in terms of 
> > making the easy things easy it seems like making it required isn't so high 
> > value? I don't populate it myself usually I put it in the description or 
> > the subject without thinking.
> > 
> > It seems like the purpose of a field is to make it indexable and possibly 
> > structured. How often do we search or require structure on this field?
> > 
> > Ariel
> > 
> > On Tue, Dec 4, 2018, at 2:12 PM, Benedict Elliott Smith wrote:
> >> Ok, so after an initial flurry everyone has lost interest :)
> >> 
> >> I think we should take a quick poll (not a vote), on people’s positions 
> >> on the questions raised so far.  If people could try to take the time to 
> >> stake a +1/-1, or A/B, for each item, that would be really great.  This 
> >> poll will not be the end of discussions, but will (hopefully) at least 
> >> draw a line under the current open questions.
> >> 
> >> I will start with some verbiage, then summarise with options for 
> >> everyone to respond to.  You can scroll to the summary immediately if 
> >> you like.
> >> 
> >> - 1. Component: Multi-select or Cascading-select (i.e. only one 
> >> component possible per ticket, but neater UX)
> >> - 2. Labels: rather than litigate people’s positions, I propose we do 
> >> the least controversial thing, which is to simply leave labels intact, 
> >> and only supplement them with the new schema information.  We can later 
> >> revisit if we decide it’s getting messy.
> >> - 3. "First review completed; second review ongoing&q

Re: JIRA Workflow Proposals

2018-12-11 Thread Ariel Weisberg
Hi,

Sorry, I was just slow on the uptake as to what auto-populate meant RE #2.

1. -1. While restricting editing on certain fields, or on issues that people 
did not submit themselves, is OK, I don't think it's reasonable to block edits 
to the subject or description on issues a user has submitted.

Do we actually have a problem that needs solving with restricting edits? I feel 
like we aren't being harmed right now by the power people currently wield.

2. B, C, A

3. A 

4. -.5, I really don't see Wish as anything other than a synonym for low 
priority. Only -.5 because I don't think it's that harmful either.

Ariel

On Mon, Dec 10, 2018, at 8:51 PM, Benedict Elliott Smith wrote:
> On 10 Dec 2018, at 16:21, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > RE #1, does this mean if you submit a ticket and you are not a contributor 
> > you can't modify any of the fields including description or adding/removing 
> > attachments?
> 
> Attachment operations have their own permissions, like comments.  
> Description would be prohibited though.  I don’t see this as a major 
> problem, really; it is generally much more useful to add comments.  If 
> we particularly want to make a subset of fields editable there is a 
> workaround, though I’m not sure anybody would have the patience to 
> implement it:  
> https://confluence.atlassian.com/jira/how-can-i-control-the-editing-of-issue-fields-via-workflow-149834.html
>  
> <https://confluence.atlassian.com/jira/how-can-i-control-the-editing-of-issue-fields-via-workflow-149834.html>
> 
> > RE #2, while bugs don't necessarily have a priority it's helpful to have it 
> > sort logically with other issue types on that field. Seems like ideally 
> > what we want to preserve is a useful sort order without having to populate 
> > the field manually.
> 
> Do you have a suggestion that achieves this besides auto-populating (if 
> that’s even possible)?  More than happy to add suggestions to the list.
> 
> > RE #4, Do we need to keep wish at all?
> 
> I’m unclear on what you’re asking?  I included exactly this question, 
> directly in response to your opinion that it should not be kept.  If you 
> have more to add to your earlier view, please feel free to share it.
> 
> > Not voting yet just because I'm not sure on some.
> > 
> > Ariel
> > 
> > On Mon, Dec 10, 2018, at 7:43 AM, Benedict Elliott Smith wrote:
> >> New questions.  This is the last round, before I call a proper vote on 
> >> the modified proposal (so we can take a mandate to Infra to modify our 
> >> JIRA workflows).  
> >> 
> >> Thanks again to everyone following and contributing to this discussion.  
> >> I’m not sure any of these remaining questions are critical, but for the 
> >> best democratic outcome it’s probably worth running them through the 
> >> same process.  I also forgot to include (1) on the prior vote.
> >> 
> >> 1. Limit edits to JIRA ‘contributor’ role: +1/-1
> >> 2. Priority on Bug issue type: (A) remove it; (B) auto-populate it; (C) 
> >> leave it.  Please rank.
> >> 3. Top priority: (A) Urgent; (B) Blocker.  See here for my explanation 
> >> of why I chose Urgent 
> >> <https://lists.apache.org/thread.html/c7b95b827d8da4efc5c017df80029676a032b150ec00bf11ca9c7fa7@%3Cdev.cassandra.apache.org%3E
> >>  
> >> <https://lists.apache.org/thread.html/c7b95b827d8da4efc5c017df80029676a032b150ec00bf11ca9c7fa7@%3Cdev.cassandra.apache.org%3E>>.
> >> 4. Priority keep ‘Wish’ (to replace issue type): +1/-1
> >> 
> >> For 2, if we cannot remove it, we can make it non-editable and default 
> >> to Normal; for auto-populate I propose using Severity (Low->Low, Normal-
> >>> Normal, Critical->Urgent).  No guarantees entirely on what we can 
> >> achieve, so a ranked choice would be ideal.
> >> 
> >> I have avoided splitting out another vote on the Platform field, since 
> >> everyone was largely meh on the question of mandatoriness; it won by 
> >> only a slim margin, because everyone was +/- 0, and nobody responded to 
> >> back Ariel’s dissenting view.
> >> 
> >> My votes are:
> >> 1: +1
> >> 2: B,C,A
> >> 3: A
> >> 4: +0.5
> >> 
> >> 
> >> For tracking, the new consensus from the prior vote is:
> >> 1: A (+10)
> >> 2: +9 -0.1
> >> 3: +10
> >> 4: +6 -2 (=+4)
> >> 5: +2; a lot of meh.
> >> 6: +9
> >> 
> >> 
> >> 
> >>> On 7 Dec 2018, at 17:52, Ariel Weisberg  wrote:

Re: JIRA Workflow Proposals

2018-12-12 Thread Ariel Weisberg
Hi,

Updating to reflect the new options for 1. My answers to 2, 3, and 4 remain unchanged.

1. E, D, C, B, A

2. B, C, A

3. A

4. -.5

Ariel
On Tue, Dec 11, 2018, at 10:55 AM, Ariel Weisberg wrote:
> Hi,
> 
> Sorry I was just slow on the uptake as to what auto-populate meant RE #2.
> 
> 1. -1, while restricting editing on certain fields or issues that people 
> did not submit themselves is OK I don't think  it's reasonable to block 
> edits to subject, or description on issues a user has submitted. 
> 
> Do we actually have a problem that needs solving with restricting edits? 
> I feel like we aren't being harmed right now by the current power people 
> are wielding?
> 
> 2. B, C, A
> 
> 3. A 
> 
> 4. -.5, I really don't see Wish as something other than a synonym for
> low priority. Only -.5 because I don't think it's that harmful either.
> 
> Ariel
> 
> On Mon, Dec 10, 2018, at 8:51 PM, Benedict Elliott Smith wrote:
> > On 10 Dec 2018, at 16:21, Ariel Weisberg  wrote:
> > > 
> > > Hi,
> > > 
> > > RE #1, does this mean if you submit a ticket and you are not a 
> > > contributor you can't modify any of the fields including description or 
> > > adding/removing attachments?
> > 
> > Attachment operations have their own permissions, like comments.  
> > Description would be prohibited though.  I don’t see this as a major 
> > problem, really; it is generally much more useful to add comments.  If 
> > we particularly want to make a subset of fields editable there is a 
> > workaround, though I’m not sure anybody would have the patience to 
> > implement it:  
> > https://confluence.atlassian.com/jira/how-can-i-control-the-editing-of-issue-fields-via-workflow-149834.html
> >  
> > <https://confluence.atlassian.com/jira/how-can-i-control-the-editing-of-issue-fields-via-workflow-149834.html>
> > 
> > > RE #2, while bugs don't necessarily have a priority it's helpful to have 
> > > it sort logically with other issue types on that field. Seems like 
> > > ideally what we want to preserve is a useful sort order without having to 
> > > populate the field manually.
> > 
> > Do you have a suggestion that achieves this besides auto-populating (if 
> > that’s even possible)?  More than happy to add suggestions to the list.
> > 
> > > RE #4, Do we need to keep wish at all?
> > 
> > I’m unclear on what you’re asking?  I included exactly this question, 
> > directly in response to your opinion that it should not be kept.  If you 
> > have more to add to your earlier view, please feel free to share it.
> > 
> > > Not voting yet just because I'm not sure on some.
> > > 
> > > Ariel
> > > 
> > > On Mon, Dec 10, 2018, at 7:43 AM, Benedict Elliott Smith wrote:
> > >> New questions.  This is the last round, before I call a proper vote on 
> > >> the modified proposal (so we can take a mandate to Infra to modify our 
> > >> JIRA workflows).  
> > >> 
> > >> Thanks again to everyone following and contributing to this discussion.  
> > >> I’m not sure any of these remaining questions are critical, but for the 
> > >> best democratic outcome it’s probably worth running them through the 
> > >> same process.  I also forgot to include (1) on the prior vote.
> > >> 
> > >> 1. Limit edits to JIRA ‘contributor’ role: +1/-1
> > >> 2. Priority on Bug issue type: (A) remove it; (B) auto-populate it; (C) 
> > >> leave it.  Please rank.
> > >> 3. Top priority: (A) Urgent; (B) Blocker.  See here for my explanation 
> > >> of why I chose Urgent 
> > >> <https://lists.apache.org/thread.html/c7b95b827d8da4efc5c017df80029676a032b150ec00bf11ca9c7fa7@%3Cdev.cassandra.apache.org%3E
> > >>  
> > >> <https://lists.apache.org/thread.html/c7b95b827d8da4efc5c017df80029676a032b150ec00bf11ca9c7fa7@%3Cdev.cassandra.apache.org%3E>>.
> > >> 4. Priority keep ‘Wish’ (to replace issue type): +1/-1
> > >> 
> > >> For 2, if we cannot remove it, we can make it non-editable and default 
> > >> to Normal; for auto-populate I propose using Severity (Low->Low, 
> > >> Normal->Normal, Critical->Urgent).  No guarantees entirely on what we can 
> > >> achieve, so a ranked choice would be ideal.
> > >> 
> > >> I have avoided splitting out another vote on the Platform field, since 
> > >> everyone was largely meh on the question of mandatoriness; it won by 

Re: Revisit the proposal to use github PR

2018-12-13 Thread Ariel Weisberg
Hi,

I'm not clear on what GitHub makes worse. It preserves more history than the 
JIRA approach. When people inevitably force push their branches you can't tell 
from the link to a branch on JIRA. GitHub preserves the comments and the force 
push history, so you know what version of the code each comment applied to. 
GitHub also tracks when requests for changes are acknowledged and resolved. I 
have had to make the same change request many times and keep track 
independently of whether it was resolved. This has also resulted in mistakes 
getting merged when I missed a comment that was ignored.

Now that GitHub can CC JIRA, that also CCs to commits@. It's better than JIRA 
comments because each comment includes a small diff of the code the comment 
applies to. To do that in JIRA I have to manually link the comment to the code 
in the PR, and most people don't do that for every comment, so some of them are 
inscrutable after the fact. Also, manually created links sometimes refer to 
references that disappear or get force pushed. It's a bit tricky to get right.

To me, arguing against leveraging a better code review workflow (whether GitHub 
or some other tool) is like arguing against using source control tools. Sure, 
the filesystem and home-grown scripts can be used to work around a lack of 
source control, but why would you?

I see two complaints so far. One is that GitHub PRs encourage nitpicking; I 
don't see a tool-based solution to that off the cuff. Another is that GitHub 
doesn't CC JIRA by default. Maybe we can just refuse to accept improperly 
formatted PRs and look into auto-rejecting ones that don't refer to a ticket?

Ariel

On Thu, Dec 13, 2018, at 12:20 PM, Aleksey Yeschenko wrote:
> There are some nice benefits to GH PRs, one of them is that we could 
> eventually set up CircleCI hooks that would explicitly prevent commits 
> that don’t pass the tests.
> 
> But handling multiple branches would indeed be annoying. Would have to 
> either submit 1 PR per branch - which is both tedious and non-atomic - 
> or do a mixed approach, with a PR for the oldest branch, then a manual 
> merge upwards. The latter would be kinda meh, especially when commits 
> for different branches diverge.
> 
> For me personally, the current setup works quite well, and I mostly 
> share Sylvain’s opinion above, for the same reasons listed.
> 
> —
> AY
> 
> > On 13 Dec 2018, at 08:15, Sylvain Lebresne  wrote:
> > 
> > Fwiw, I personally find it very useful to have all discussion, review
> > comments included, in the same place (namely JIRA, since for better or
> > worse, that's what we use for tracking tickets). Typically, that means
> > everything gets consistently pushed to the commits@ mailing list, which I
> > find extremely convenient for keeping track of things. I also have a theory
> > that the inline-comments type of review GitHub PRs give you is very
> > convenient for nitpicks and shallow or spur-of-the-moment comments, but
> > doesn't help that much for deeper reviews, and that it thus tends to favor
> > the former kind of review.
> > 
> > Additionally, and to Benedict's point, I happen to have first hand
> > experience with a PR-based process for a multi-branch workflow very similar
> > to the one of this project, and suffice to say that I hate it with a
> > passion.
> > 
> > Anyway, very much personal opinion here.
> > --
> > Sylvain
> > 
> > 
> > On Thu, Dec 13, 2018 at 2:13 AM dinesh.jo...@yahoo.com.INVALID
> >  wrote:
> > 
> >> I've already been using GitHub PRs for some time now. Once you specify the
> >> ticket number, the comments and discussion are persisted in Apache Jira as
> >> a work log so it can be audited if desired. However, committers usually
> >> squash and commit the changes once the PR is approved. We don't use the
> >> merge feature in GitHub. I don't believe GitHub can merge the commit
> >> into multiple branches through the UI. We would need to merge it into one
> >> branch and then manually merge that commit into other branches. The big
> >> upside of using GitHub PRs is that it makes collaborating a lot easier.
> >> The downside is that it makes it very difficult to follow the progress in
> >> Apache Jira. The messages that GitHub posts back include huge diffs and are
> >> awful.
> >> Dinesh
> >> 
> >>On Thursday, December 13, 2018, 1:10:12 AM GMT+5:30, Benedict Elliott
> >> Smith  wrote:
> >> 
> >> Perhaps somebody could summarise the tradeoffs?  I’m a little concerned
> >> about how it would work for our multi-branch workflow.  Would we open
> >> multiple PRs?
> >> 
> >> Could we easily link with external CircleCI?
> >> 
> >> It occurs to me, in JIRA proposal mode, that an extra required field for a
> >> permalink to GitHub for the patch would save a lot of time I spend hunting
> >> for a branch in the comments.
> >> 
> >> 
> >> 
> >> 
> >>> On 12 Dec 2018, at 19:20, jay.zhu...@yahoo.com.INVALID wrote:
> >>> 
> >>> It was discussed 1 year's ago:
> >> https://www.mail-archive.com/dev@cassandra.apache.org/msg11810.html

Re: Revisit the proposal to use github PR

2018-12-13 Thread Ariel Weisberg
Hi,

Sorry, I missed that point. I agree GitHub PRs are not useful for merging.

What I do is force push the feature/bug-fix branches (which is fine; GitHub 
remembers the old versions in the PR) with everything updated and ready to 
merge, and then push those branches from my local repo to the Apache repo with 
--atomic.

Ariel

On Thu, Dec 13, 2018, at 1:00 PM, Jason Brown wrote:
> To clarify my position: Github PRs are great for *reviewing* code, and the
> commentary is much easier to follow imo. But for *merging* code, esp into
> our multi-branch strategy, PRs don't fit well, unless there's some
> technique I and perhaps others are unaware of.
> 
> On Thu, Dec 13, 2018 at 9:47 AM Ariel Weisberg  wrote:
> 
> > Hi,
> >
> > I'm not clear on what GitHub makes worse. It preserves more history than
> > the JIRA approach. When people inevitably force push their branches you
> > can't tell from the link to a branch on JIRA. GitHub preserves the comments
> > and the force push history, so you know what version of the code each
> > comment applied to. GitHub also tracks when requests for changes are
> > acknowledged and resolved. I have had to make the same change request many
> > times and keep track independently of whether it was resolved. This has
> > also resulted in mistakes getting merged when I missed a comment that was
> > ignored.
> >
> > Now that GitHub can CC JIRA, that also CCs to commits@. It's better than
> > JIRA comments because each comment includes a small diff of the code the
> > comment applies to. To do that in JIRA I have to manually link the comment
> > to the code in the PR, and most people don't do that for every comment, so
> > some of them are inscrutable after the fact. Also, manually created links
> > sometimes refer to references that disappear or get force pushed. It's a
> > bit tricky to get right.
> >
> > To me, arguing against leveraging a better code review workflow (whether
> > GitHub or some other tool) is like arguing against using source control
> > tools. Sure, the filesystem and home-grown scripts can be used to work
> > around a lack of source control, but why would you?
> >
> > I see two complaints so far. One is that GitHub PRs encourage nitpicking;
> > I don't see a tool-based solution to that off the cuff. Another is that
> > GitHub doesn't CC JIRA by default. Maybe we can just refuse to accept
> > improperly formatted PRs and look into auto-rejecting ones that don't
> > refer to a ticket?
> >
> > Ariel
> >
> > On Thu, Dec 13, 2018, at 12:20 PM, Aleksey Yeschenko wrote:
> > > There are some nice benefits to GH PRs, one of them is that we could
> > > eventually set up CircleCI hooks that would explicitly prevent commits
> > > that don’t pass the tests.
> > >
> > > But handling multiple branches would indeed be annoying. Would have to
> > > either submit 1 PR per branch - which is both tedious and non-atomic -
> > > or do a mixed approach, with a PR for the oldest branch, then a manual
> > > merge upwards. The latter would be kinda meh, especially when commits
> > > for different branches diverge.
> > >
> > > For me personally, the current setup works quite well, and I mostly
> > > share Sylvain’s opinion above, for the same reasons listed.
> > >
> > > —
> > > AY
> > >
> > > > On 13 Dec 2018, at 08:15, Sylvain Lebresne  wrote:
> > > >
> > > > Fwiw, I personally find it very useful to have all discussion, review
> > > > comments included, in the same place (namely JIRA, since for better or
> > > > worse, that's what we use for tracking tickets). Typically, that means
> > > > everything gets consistently pushed to the commits@ mailing list, which I
> > > > find extremely convenient for keeping track of things. I also have a theory
> > > > that the inline-comments type of review GitHub PRs give you is very
> > > > convenient for nitpicks and shallow or spur-of-the-moment comments, but
> > > > doesn't help that much for deeper reviews, and that it thus tends to favor
> > > > the former kind of review.
> > > >
> > > > Additionally, and to Benedict's point, I happen to have first hand
> > > > experience with a PR-based process for a multi-branch workflow very
> > similar
> > > > to the one of this project, and suffice to say that I hate it with a
> > > > passion.
> > > >
> > > > Anyway, very much personal opinion here.
> > > > --

Re: [VOTE] Change Jira Workflow

2018-12-18 Thread Ariel Weisberg
+1

On Mon, Dec 17, 2018, at 10:19 AM, Benedict Elliott Smith wrote:
> I propose these changes* to the Jira Workflow for the project.  The vote 
> will be open for 72 hours**.
> 
> I am, of course, +1.
> 
> * With the addendum of the mailing list discussion; in case of any conflict 
> arising from a mistake on my part in the wiki, the consensus reached by 
> polling the mailing list will take precedence.
> ** I won’t be around to close the vote, as I will be on vacation.  
> Everyone is welcome to ignore the result until I get back in a couple of 
> weeks, or if anybody is eager feel free to close the vote and take some 
> steps towards implementation.




Re: Git Repo Migration

2019-01-04 Thread Ariel Weisberg
+1

On Fri, Jan 4, 2019, at 5:49 AM, Sam Tunnicliffe wrote:
> As per the announcement on 7th December 2018[1], ASF infra are planning 
> to shut down the service behind git-wip-us.apache.org and migrate all 
> existing repos to gitbox.apache.org 
> 
> There are further details in the original mail, but apparently one of 
> the benefits of the migration is that we'll have full write access via 
> Github, including the ability finally to close PRs. This affects the 
> cassandra, cassandra-dtest and cassandra-build repos (but not the new 
> cassandra-sidecar repo).
> 
> A pre-requisite of the migration is to demonstrate consensus within the 
> community, so to satisfy that formality I'm starting this thread to 
> gather any objections or specific requests regarding the timing of the 
> move.
> 
> I'll collate responses in a week or so and file the necessary INFRA Jira.
> 
> Thanks,
> Sam
> 
> [1] 
> https://lists.apache.org/thread.html/667772efdabf49a0a23d585539c127f335477e033f1f9b6f5079aced@%3Cdev.cassandra.apache.org%3E
> 
> 



Re: Implementing an Abstract Replication Strategy

2019-01-29 Thread Ariel Weisberg
Hi,

Cassandra expects a replication strategy to accept a description of a 
consistent hash ring and then use that description to determine what ranges on 
the consistent hash ring each node replicates.

If you implement the API those operations should all just work. 

I'm not sure what the implicit expectations around rebalancing, node 
add/remove, and so on are. This is despite the fact that I was staring at that 
code for 6 months, 6 months ago. Most of the code basically looks at a before 
picture from the replication strategy and an after picture, and moves data 
around until those two match.

Depending on the changes your replication strategy makes in response to 
changes in the ring, that code might not ship the data around. There are 
assumptions, such as that when you move a node, the data that needs to be 
streamed and fetched can all be handled at that one node. You need to make 
sure that whatever state changes occur on ring changes can actually be 
realized by the add/remove/rebalance code. They also need to be done online, 
in a system that is continuing to accept reads and writes, so things like 
overlapping group memberships need to be taken into account.

It's a hard problem, but easier to talk about once we know what you want the 
replication strategy to do.
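
To make the API shape concrete, here is a minimal sketch of a custom strategy, 
written from memory against the 3.x-era API. The class name, the 
replication_factor option, and the exact signatures are illustrative 
assumptions; SimpleStrategy in the source tree is the authoritative reference.

    import java.net.InetAddress;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;

    import org.apache.cassandra.dht.Token;
    import org.apache.cassandra.exceptions.ConfigurationException;
    import org.apache.cassandra.locator.AbstractReplicationStrategy;
    import org.apache.cassandra.locator.IEndpointSnitch;
    import org.apache.cassandra.locator.TokenMetadata;

    // Hypothetical strategy whose placement is steered by externally
    // supplied options rather than plain ring order.
    public class StaticParameterStrategy extends AbstractReplicationStrategy
    {
        public StaticParameterStrategy(String keyspaceName, TokenMetadata tokenMetadata,
                                       IEndpointSnitch snitch, Map<String, String> configOptions)
        {
            super(keyspaceName, tokenMetadata, snitch, configOptions);
        }

        // Given a position on the consistent hash ring, return the replicas
        // for the range that position falls in. This is the "ring description
        // in, replica sets out" contract described above.
        @Override
        public List<InetAddress> calculateNaturalEndpoints(Token searchToken, TokenMetadata metadata)
        {
            List<InetAddress> replicas = new ArrayList<>();
            Iterator<Token> iter = TokenMetadata.ringIterator(metadata.sortedTokens(), searchToken, false);
            while (iter.hasNext() && replicas.size() < getReplicationFactor())
            {
                // A real implementation would consult the static external
                // parameters here instead of just walking the ring in order.
                InetAddress endpoint = metadata.getEndpoint(iter.next());
                if (!replicas.contains(endpoint))
                    replicas.add(endpoint);
            }
            return replicas;
        }

        @Override
        public int getReplicationFactor()
        {
            return Integer.parseInt(configOptions.get("replication_factor"));
        }

        @Override
        public void validateOptions() throws ConfigurationException
        {
            if (configOptions == null || !configOptions.containsKey("replication_factor"))
                throw new ConfigurationException("replication_factor is required");
        }
    }

Everything downstream (bootstrap, decommission, move, repair) consumes whatever 
calculateNaturalEndpoints returns, which is why the before/after diffing 
described above has to line up with placements your strategy can actually 
produce.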

Ariel


On Tue, Jan 29, 2019, at 3:52 PM, Seyed Hossein Mortazavi wrote:
> I'm working on changing Cassandra for an academic project where the goal is
> to change how the replicas are determined for each partition, using static
> parameters that are set outside of Cassandra. I've read online that this
> can be achieved by extending the AbstractReplicationStrategy class. I have
> the following questions:
> 
> 1- If we add/remove nodes, and Cassandra goes through the process of
> re-balancing, are functions from my class called?
> 2- For Paxos lightweight transactions, are my functions called?
> 3- Can I run into other problems? If yes, where?
> 
> Thank you very much




Re: [VOTE] Release Apache Cassandra 2.2.14

2019-02-05 Thread Ariel Weisberg
Hi,

Can we also run the upgrade tests? We should do that as part of the release 
process. I can do that tomorrow.

Ariel

On Tue, Feb 5, 2019, at 1:11 PM, Joseph Lynch wrote:
> 2.2.14-tentative unit and dtest run:
> https://circleci.com/gh/jolynch/cassandra/tree/2.2.14-tentative
> 
> unit tests: 0 failures
> dtests: 5 failures
> * test_closing_connections - thrift_hsha_test.TestThriftHSHA (
> https://issues.apache.org/jira/browse/CASSANDRA-14595)
> * test_multi_dc_tokens_default - token_generator_test.TestTokenGenerator
> * test_multi_dc_tokens_murmur3 - token_generator_test.TestTokenGenerator
> * test_multi_dc_tokens_random - token_generator_test.TestTokenGenerator
> * test_multiple_repair - repair_tests.incremental_repair_test.TestIncRepair
> (flake?)
> 
> I've cut https://issues.apache.org/jira/browse/CASSANDRA-15012 for fixing
> the TestTokenGenerator tests, it looks straightforward.
> 
> +1 non binding
> 
> -Joey
> 
> On Sat, Feb 2, 2019 at 4:32 PM Michael Shuler 
> wrote:
> 
> > I propose the following artifacts for release as 2.2.14.
> >
> > sha1: af91658353ba601fc8cd08627e8d36bac62e936a
> > Git:
> >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.14-tentative
> > Artifacts:
> >
> > https://repository.apache.org/content/repositories/orgapachecassandra-1172/org/apache/cassandra/apache-cassandra/2.2.14/
> > Staging repository:
> > https://repository.apache.org/content/repositories/orgapachecassandra-1172/
> >
> > The Debian and RPM packages are available here:
> > http://people.apache.org/~mshuler
> >
> > The vote will be open for 72 hours (longer if needed).
> >
> > [1]: CHANGES.txt:
> >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.14-tentative
> > [2]: NEWS.txt:
> >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.14-tentative
> >
> >




Re: [VOTE] Release Apache Cassandra 2.2.14

2019-02-06 Thread Ariel Weisberg
Hi,

+1

Upgrade tests:
https://circleci.com/gh/aweisberg/cassandra/2587

Known issue https://issues.apache.org/jira/browse/CASSANDRA-14155 which is not 
a blocker
There is a test failure on a thrift connection not being open. It might be a 
test bug; if it's a product bug it's probably not that serious.

Ariel
On Wed, Feb 6, 2019, at 5:03 AM, Marcus Eriksson wrote:
> +1
> 
> Den ons 6 feb. 2019 kl 10:53 skrev Benedict Elliott Smith <
> bened...@apache.org>:
> 
> > +1
> >
> > > On 6 Feb 2019, at 05:09, Jeff Jirsa  wrote:
> > >
> > > +1
> > >
> > > On Sat, Feb 2, 2019 at 4:32 PM Michael Shuler 
> > > wrote:
> > >
> > >> I propose the following artifacts for release as 2.2.14.
> > >>
> > >> sha1: af91658353ba601fc8cd08627e8d36bac62e936a
> > >> Git:
> > >>
> > >>
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.14-tentative
> > >> Artifacts:
> > >>
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1172/org/apache/cassandra/apache-cassandra/2.2.14/
> > >> Staging repository:
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1172/
> > >>
> > >> The Debian and RPM packages are available here:
> > >> http://people.apache.org/~mshuler
> > >>
> > >> The vote will be open for 72 hours (longer if needed).
> > >>
> > >> [1]: CHANGES.txt:
> > >>
> > >>
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.14-tentative
> > >> [2]: NEWS.txt:
> > >>
> > >>
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.14-tentative
> > >>
> > >>
> >
> >



Re: [VOTE] Release Apache Cassandra 3.0.18

2019-02-06 Thread Ariel Weisberg
Hi,

+1

Upgrade tests look OK. There are failures I also found against 2.2, plus some 
other test bugs. Nothing that looks like a product bug.
https://circleci.com/gh/aweisberg/cassandra/2589

Ariel

On Wed, Feb 6, 2019, at 5:02 AM, Marcus Eriksson wrote:
> +1
> 
> Den ons 6 feb. 2019 kl 10:53 skrev Benedict Elliott Smith <
> bened...@apache.org>:
> 
> > +1
> >
> > > On 6 Feb 2019, at 08:01, Tommy Stendahl 
> > wrote:
> > >
> > > +1 (non-binding)
> > >
> > > /Tommy
> > >
> > > On lör, 2019-02-02 at 18:32 -0600, Michael Shuler wrote:
> > >
> > > I propose the following artifacts for release as 3.0.18.
> > >
> > > sha1: edd52cef50a6242609a20d0d84c8eb74c580035e
> > > Git:
> > >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.18-tentative
> > > Artifacts:
> > >
> > https://repository.apache.org/content/repositories/orgapachecassandra-1171/org/apache/cassandra/apache-cassandra/3.0.18/
> > > Staging repository:
> > >
> > https://repository.apache.org/content/repositories/orgapachecassandra-1171/
> > >
> > > The Debian and RPM packages are available here:
> > > http://people.apache.org/~mshuler
> > >
> > > The vote will be open for 72 hours (longer if needed).
> > >
> > > [1]: CHANGES.txt:
> > >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.0.18-tentative
> > > [2]: NEWS.txt:
> > >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.0.18-tentative
> > >
> > >
> >
> >



Re: [VOTE] Release Apache Cassandra 3.11.4

2019-02-06 Thread Ariel Weisberg
Hi,

-0

bootstrap_upgrade_test.py test_simple_bootstrap_mixed_versions fails because it 
doesn't see an on disk size within 30% of the expected value. It's 
bootstrapping a new version node and running cleanup on the existing node. If 
the data were evenly distributed the on disk sizes should be similar.

https://circleci.com/gh/aweisberg/cassandra/2591#tests/containers/40

I don't have time to see if this reproduces manually. I kicked off the tests 
again to see if reproduces. https://circleci.com/gh/aweisberg/cassandra/2593

Ariel

On Wed, Feb 6, 2019, at 5:02 AM, Marcus Eriksson wrote:
> +1
> 
> Den ons 6 feb. 2019 kl 10:52 skrev Benedict Elliott Smith <
> bened...@apache.org>:
> 
> > +1
> >
> > > On 6 Feb 2019, at 08:01, Tommy Stendahl 
> > wrote:
> > >
> > > +1 (non-binding)
> > >
> > > /Tommy
> > >
> > > On lör, 2019-02-02 at 18:31 -0600, Michael Shuler wrote:
> > >
> > > I propose the following artifacts for release as 3.11.4.
> > >
> > > sha1: fd47391aae13bcf4ee995abcde1b0e180372d193
> > > Git:
> > >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.11.4-tentative
> > > Artifacts:
> > >
> > https://repository.apache.org/content/repositories/orgapachecassandra-1170/org/apache/cassandra/apache-cassandra/3.11.4/
> > > Staging repository:
> > >
> > https://repository.apache.org/content/repositories/orgapachecassandra-1170/
> > >
> > > The Debian and RPM packages are available here:
> > > http://people.apache.org/~mshuler
> > >
> > > The vote will be open for 72 hours (longer if needed).
> > >
> > > [1]: CHANGES.txt:
> > >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.4-tentative
> > > [2]: NEWS.txt:
> > >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.4-tentative
> > >
> > >
> >
> >



Re: [VOTE] Release Apache Cassandra 3.11.4

2019-02-06 Thread Ariel Weisberg
Hi,

It fails consistently. I don't know why the data is not evenly distributed. Can 
someone volunteer to debug this failing test to make sure there isn't an issue 
with bootstrap in 3.11? 

https://circleci.com/gh/aweisberg/cassandra/2593

Thanks,
Ariel
On Wed, Feb 6, 2019, at 3:11 PM, Ariel Weisberg wrote:
> Hi,
> 
> -0
> 
> bootstrap_upgrade_test.py test_simple_bootstrap_mixed_versions fails 
> because it doesn't see the expected on disk size within 30% of the 
> expected value. It's bootstrapping a new version node and runs cleanup 
> on the existing node. If the data were evenly distributed the on disk 
> size should be similar.
> 
> https://circleci.com/gh/aweisberg/cassandra/2591#tests/containers/40
> 
> I don't have time to see if this reproduces manually. I kicked off the 
> tests again to see if reproduces. 
> https://circleci.com/gh/aweisberg/cassandra/2593
> 
> Ariel
> 
> On Wed, Feb 6, 2019, at 5:02 AM, Marcus Eriksson wrote:
> > +1
> > 
> > Den ons 6 feb. 2019 kl 10:52 skrev Benedict Elliott Smith <
> > bened...@apache.org>:
> > 
> > > +1
> > >
> > > > On 6 Feb 2019, at 08:01, Tommy Stendahl 
> > > wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > /Tommy
> > > >
> > > > On lör, 2019-02-02 at 18:31 -0600, Michael Shuler wrote:
> > > >
> > > > I propose the following artifacts for release as 3.11.4.
> > > >
> > > > sha1: fd47391aae13bcf4ee995abcde1b0e180372d193
> > > > Git:
> > > >
> > > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.11.4-tentative
> > > > Artifacts:
> > > >
> > > https://repository.apache.org/content/repositories/orgapachecassandra-1170/org/apache/cassandra/apache-cassandra/3.11.4/
> > > > Staging repository:
> > > >
> > > https://repository.apache.org/content/repositories/orgapachecassandra-1170/
> > > >
> > > > The Debian and RPM packages are available here:
> > > > http://people.apache.org/~mshuler
> > > >
> > > > The vote will be open for 72 hours (longer if needed).
> > > >
> > > > [1]: CHANGES.txt:
> > > >
> > > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.4-tentative
> > > > [2]: NEWS.txt:
> > > >
> > > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.4-tentative
> > > >
> > > >
> > >
> > >



Re: [VOTE] Release Apache Cassandra 3.11.4

2019-02-07 Thread Ariel Weisberg
Hi,

Vinay, thank you for diagnosing this.

+1 on the release then, since this is a test bug and CASSANDRA-13004 has 
already extensively litigated the UX of this.
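
For anyone following along, the gate Vinay describes below boils down to 
roughly the following. This is paraphrased from memory and the wrapper class 
is hypothetical; the MigrationManager links in his mail are authoritative.

    import java.net.InetAddress;

    import org.apache.cassandra.gms.Gossiper;
    import org.apache.cassandra.net.MessagingService;

    final class SchemaPullGateSketch
    {
        // Schema is only pulled from an endpoint whose advertised messaging
        // version looks 3.0-compatible to the local node. A 3.5 peer fails
        // this check against a 3.11 node unless the 3.11 node was started
        // with -Dcassandra.force_3_0_protocol_version=true.
        static boolean shouldPullSchemaFrom(InetAddress endpoint)
        {
            return MessagingService.instance().knowsVersion(endpoint)
                   && is30Compatible(MessagingService.instance().getRawVersion(endpoint))
                   && !Gossiper.instance.isGossipOnlyMember(endpoint);
        }

        static boolean is30Compatible(int version)
        {
            return version == MessagingService.current_version
                   || version == MessagingService.VERSION_3014;
        }
    }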

Ariel

On Thu, Feb 7, 2019, at 3:49 AM, Vinay Chella wrote:
> Hi Ariel,
> 
> The test_simple_bootstrap_mixed_versions issue is related to CASSANDRA-13004
> <https://issues.apache.org/jira/browse/CASSANDRA-13004>, which introduced
> "cassandra.force_3_0_protocol_version" for schema migrations during
> upgrades from 3.0.14 upwards. This flag is missing in the
> `test_simple_bootstrap_mixed_versions` upgrade test while we are
> adding/bootstrapping a 3.11.4 node to an existing 3.5 C* node. This
> resulted in the `ks` keyspace schema/data not being bootstrapped to the
> new node.
> 
> I debugged and confirmed that MigrationManager::is30Compatible
> <https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/MigrationManager.java#L181-L185>
> is returning false, which forces MigrationManager::shouldPullSchemaFrom
> <https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/MigrationManager.java#L168-L177>
> to return false as well.
> 
> *From debug logs:*
> DEBUG [GossipStage:1] 2019-02-06 23:20:47,392 MigrationManager.java:115 -
> Not pulling schema because versions match or shouldPullSchemaFrom returned
> false
> 
> Here is the updated dtest branch:
> https://github.com/vinaykumarchella/cassandra-dtest/tree/fix_failing_upgradetest
> 
> dtests on CircleCI: https://circleci.com/gh/vinaykumarchella/cassandra/345
> 
> P.S.: While MigrationManager
> <https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/MigrationManager.java#L181-L185>
> confirms that schema migrations from pre-3.11 nodes are not allowed without
> the `cassandra.force_3_0_protocol_version` option, the release notes for
> 3.11 are confusing - docs
> <https://github.com/apache/cassandra/blob/cassandra-3.11/NEWS.txt#L173-L174>
> 
> Let me know if this looks good to you, and I will send a patch to
> cassandra-dtest.
> 
> 
> 
> Thanks,
> Vinay Chella
> 
> 
> On Wed, Feb 6, 2019 at 8:07 PM Vinay Chella  wrote:
> 
> > Hi Ariel,
> >
> > Sure, I am volunteering to debug this. Will update the progress here.
> >
> > Thanks,
> > Vinay
> >
> >
> > On Wed, Feb 6, 2019 at 1:41 PM Ariel Weisberg  wrote:
> >
> >> Hi,
> >>
> >> It fails consistently. I don't know why the data is not evenly
> >> distributed. Can someone volunteer to debug this failing test to make sure
> >> there isn't an issue with bootstrap in 3.11?
> >>
> >> https://circleci.com/gh/aweisberg/cassandra/2593
> >>
> >> Thanks,
> >> Ariel
> >> On Wed, Feb 6, 2019, at 3:11 PM, Ariel Weisberg wrote:
> >> > Hi,
> >> >
> >> > -0
> >> >
> >> > bootstrap_upgrade_test.py test_simple_bootstrap_mixed_versions fails
> >> > because it doesn't see the expected on disk size within 30% of the
> >> > expected value. It's bootstrapping a new version node and runs cleanup
> >> > on the existing node. If the data were evenly distributed the on disk
> >> > size should be similar.
> >> >
> >> > https://circleci.com/gh/aweisberg/cassandra/2591#tests/containers/40
> >> >
> >> > I don't have time to see if this reproduces manually. I kicked off the
> >> > tests again to see if reproduces.
> >> > https://circleci.com/gh/aweisberg/cassandra/2593
> >> >
> >> > Ariel
> >> >
> >> > On Wed, Feb 6, 2019, at 5:02 AM, Marcus Eriksson wrote:
> >> > > +1
> >> > >
> >> > > Den ons 6 feb. 2019 kl 10:52 skrev Benedict Elliott Smith <
> >> > > bened...@apache.org>:
> >> > >
> >> > > > +1
> >> > > >
> >> > > > > On 6 Feb 2019, at 08:01, Tommy Stendahl <
> >> tommy.stend...@ericsson.com>
> >> > > > wrote:
> >> > > > >
> >> > > > > +1 (non-binding)
> >> > > > >
> >> > > > > /Tommy
> >> > > > >
> >> > > > > On lör, 2019-02-02 at 18:31 -0600, Michael Shuler wrote:
> >> > > > >
> >> > > > > I propose the following artifacts for release as 3.11.4.
> >> > > > >
> >> > > > > sha1: fd47391aae13bcf4ee995abcde1b0e180372d193

Re: cqlsh tests and Python 3

2019-02-11 Thread Ariel Weisberg
Hi,

Do you mean Python 2/3 compatibility? 

This has been discussed earlier and I think that being compatible with both is 
an easier sell.

Ariel

> On Feb 11, 2019, at 1:24 PM, dinesh.jo...@yahoo.com.INVALID 
>  wrote:
> 
> Hey all,
> We've gotten the cqlsh tests running in the Cassandra repo (these are 
> distinct from the cqlsh tests in the dtests repo). They're in Python 2.7 and 
> using nosetests. We'd like to make them consistent with the rest of the 
> tests, which means moving them to Python 3 & the pytest framework. However, 
> this would involve migrating cqlsh to Python 3. Does anybody have any 
> concerns if we move cqlsh to Python 3? Please note that Python 2 is EOL'd 
> and will be unsupported in about 10 months.
> So here are the options -
> 1. Leave cqlsh in Python 2.7 & nosetests. Just make sure they're running as 
> part of the build process.
> 2. Move cqlsh to Python 3 & pytest.
> 3. Leave cqlsh in Python 2.7 but move to pytest. This option doesn't really 
> add much value though.
> Thanks,
> Dinesh





Re: CASSANDRA-14482

2019-02-15 Thread Ariel Weisberg
Hi,

I am +1 since it's an additional compressor and not the default.
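
(For context on why an additional codec is low risk: compressors are selected 
per table and plug in behind a small interface. Below is a from-memory sketch 
of that plug-in shape against the zstd-jni bindings; the class is hypothetical 
and is not the patch on the ticket, so see CASSANDRA-14482 for the real 
implementation.)

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.Collections;
    import java.util.Map;
    import java.util.Set;

    import com.github.luben.zstd.Zstd;

    import org.apache.cassandra.io.compress.BufferType;
    import org.apache.cassandra.io.compress.ICompressor;

    // Rough shape of a pluggable codec; tables that keep the default LZ4
    // are unaffected, which is what makes the change additive.
    public class ZstdCompressorSketch implements ICompressor
    {
        // Compressors are instantiated reflectively via a static factory
        // taking the options from the table's compression parameters.
        public static ZstdCompressorSketch create(Map<String, String> options)
        {
            return new ZstdCompressorSketch();
        }

        public int initialCompressedBufferLength(int chunkLength)
        {
            return (int) Zstd.compressBound(chunkLength);
        }

        public void compress(ByteBuffer input, ByteBuffer output) throws IOException
        {
            Zstd.compress(output, input, Zstd.defaultCompressionLevel());
        }

        public void uncompress(ByteBuffer input, ByteBuffer output) throws IOException
        {
            Zstd.decompress(output, input);
        }

        public int uncompress(byte[] input, int inputOffset, int inputLength,
                              byte[] output, int outputOffset) throws IOException
        {
            return (int) Zstd.decompressByteArray(output, outputOffset,
                                                  output.length - outputOffset,
                                                  input, inputOffset, inputLength);
        }

        public BufferType preferredBufferType() { return BufferType.OFF_HEAP; }
        public boolean supports(BufferType bufferType) { return true; }
        public Set<String> supportedOptions() { return Collections.emptySet(); }
    }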

Ariel

On Fri, Feb 15, 2019, at 11:41 AM, Dinesh Joshi wrote:
> Hey folks,
> 
> Just wanted to get a pulse on whether we can proceed with ZStd support. 
> The consensus on the ticket was that it’s a very valuable addition 
> without any risk of destabilizing 4.0. It’s ready to go if there aren’t 
> any objections.
> 
> Dinesh
> 



Re: March 2015 QA retrospective

2015-04-09 Thread Ariel Weisberg
... mixed CQL thrift interactions? Possibly abstract everything to be done 
either by CQL or Thrift and then permute? Seems low value, but necessary if 
both are claimed to be supported.

CASSANDRA-8577 <https://issues.apache.org/jira/browse/CASSANDRA-8577> 
(Artem Aliev) Values of set types not loading correctly into Pig. 
Revisit: full set of interactions with Pig not validated.

CASSANDRA-8579 <https://issues.apache.org/jira/browse/CASSANDRA-8579> 
(Jimmy Mårdell) sstablemetadata can't load 
org.apache.cassandra.tools.SSTableMetadataViewer. 
Revisit: running C* from the source tree is not representative of the 
behavior of deployed builds.

CASSANDRA-8580 <https://issues.apache.org/jira/browse/CASSANDRA-8580> 
(Marcus Eriksson) AssertionErrors after activating 
unchecked_tombstone_compaction with leveled compaction. 
Revisit: how could this have been reproduced before release? No regression 
test.

CASSANDRA-8588 <https://issues.apache.org/jira/browse/CASSANDRA-8588> 
(Dave Brosius) Fix DropTypeStatements isusedBy for maps (typo ignored 
values). 
Revisit: not released, but was it detected before release by an automated 
test?

CASSANDRA-8619 <https://issues.apache.org/jira/browse/CASSANDRA-8619> 
(Benedict) Using CQLSSTableWriter gives ConcurrentModificationException. 
Revisit: what kind of test would have caught this before release?

CASSANDRA-8623 <https://issues.apache.org/jira/browse/CASSANDRA-8623> 
(Marcus Eriksson) sstablesplit fails *randomly* with Data component is 
missing. 
Revisit: feature not tested before release? No regression test.

CASSANDRA-8632 <https://issues.apache.org/jira/browse/CASSANDRA-8632> 
(Benedict) cassandra-stress only generating a single unique row. 
Revisit: we rely on stress for performance testing; that might mean it needs 
real testing that demonstrates it generates load that looks like the load it 
is supposed to be generating.

CASSANDRA-8635 <https://issues.apache.org/jira/browse/CASSANDRA-8635> 
(Marcus Eriksson) STCS cold sstable omission does not handle overwrites 
without reads. 
Revisit: if this workload is a challenge for certain kinds of optimizations, 
we should test it if we think it could happen again.

CASSANDRA-8640 <https://issues.apache.org/jira/browse/CASSANDRA-8640> 
(Anthony Cozzie) Paxos requires all nodes for CAS. 
Revisit: if Paxos is not supposed to require all nodes for CAS, we should be 
able to fail nodes, or a certain number of nodes, and still continue to CAS 
(test availability of CAS under failure conditions). No regression test.

CASSANDRA-8641 <https://issues.apache.org/jira/browse/CASSANDRA-8641> 
(Unassigned) Repair causes a large number of tiny SSTables. 
Revisit: user says something doesn't work for them? Could we have 
anticipated that vnodes would not work as formulated for this case?

CASSANDRA-8652 <https://issues.apache.org/jira/browse/CASSANDRA-8652> 
(Edward Ribeiro) DROP TABLE should also drop BATCH prepared statements 
associated to it. 
Revisit: not sure if this is an optimization or fixes a user-visible issue, 
but could this have been detected by exercising the functionality better 
before release?

CASSANDRA-8668 <https://issues.apache.org/jira/browse/CASSANDRA-8668> 
(Benedict) We don't enforce offheap memory constraints; regression 
introduced by 7882. 
Revisit: memory constraints was a supported feature/UI, but not completely 
tested before release. Could this have been found most effectively by a unit 
test or a blackbox test?

CASSANDRA-8675 <https://issues.apache.org/jira/browse/CASSANDRA-8675> 
(Unassigned) COPY TO/FROM broken for newline characters. 
Revisit: COPY TO/FROM not tested with representative data.

CASSANDRA-8677 <https://issues.apache.org/jira/browse/CASSANDRA-8677> 
(Ariel Weisberg) rpc_interface and listen_interface generate NPE on startup 
when the specified interface doesn't exist. 
Revisit: missing unit tests checking error messages for DatabaseDescriptor.

CASSANDRA-8687 <https://issues.apache.org/jira/browse/CASSANDRA-8687> 
(Jeremiah Jordan) Keyspace should also check Config.isClientMode. 
Revisit: is there a way to test for missing Config.isClientMode checks?

CASSANDRA-8688 <https://issues.apache.org/jira/browse/CASSANDRA-8688> 
(Yuki Morishita) Standalone sstableupgrade tool throws exception. 
Revisit: tool not tested before release, no regression test.

CASSANDRA-8691 <https://issues.apache.org/jira/browse/CASSANDRA-8691> 
(Unassigned) SSTableReader.getPosition() does not correctly filter out 
queries that exceed its bounds. 
Revisit: is there a scenario where this is user visible, and should we test 
for that?

CASSANDRA-8694 <https://issues.apache.org/jira/browse/CASSANDRA-8694> 
(Jeff Jirsa) Repair of empty keyspace hangs rather than ignoring the 
request. 
Revisit: missing boundary condition test, requesting an operation on an 
empty, non-existent, or not applicable entity.

CASSANDRA-8695 <https://issues.apache.org/jira/browse/CASSANDRA-8695> 
(Chris Lockfort) Thrift column definition list sometimes immutable. 
Revisit: what user-visible activities reproduced this, and could we have 
done that before release?

CASSANDRA-8719 <https://issues.apache.org/jira/browse/CASSANDRA

Re: March 2015 QA retrospective

2015-04-09 Thread Ariel Weisberg
Repeated with sort (Key / Assignee / Summary / Revisit reason):

CASSANDRA-8285 <https://issues.apache.org/jira/browse/CASSANDRA-8285> 
(Aleksey Yeschenko) Move all hints related tasks to hints private executor. 
Revisit: Pierre's reproducer represents something we weren't doing, but that 
users are. Is that now being tested?

CASSANDRA-8462 <https://issues.apache.org/jira/browse/CASSANDRA-8462> 
(Aleksey Yeschenko) Upgrading a 2.0 to 2.1 breaks CFMetaData on 2.0 nodes. 
Revisit: have additional dtest coverage; need to do this in kitchen sink 
tests.

CASSANDRA-8640 <https://issues.apache.org/jira/browse/CASSANDRA-8640> 
(Anthony Cozzie) Paxos requires all nodes for CAS. 
Revisit: if Paxos is not supposed to require all nodes for CAS, we should be 
able to fail nodes, or a certain number of nodes, and still continue to CAS 
(test availability of CAS under failure conditions). No regression test.

CASSANDRA-8677 <https://issues.apache.org/jira/browse/CASSANDRA-8677> 
(Ariel Weisberg) rpc_interface and listen_interface generate NPE on startup 
when the specified interface doesn't exist. 
Revisit: missing unit tests checking error messages for DatabaseDescriptor.

CASSANDRA-8577 <https://issues.apache.org/jira/browse/CASSANDRA-8577> 
(Artem Aliev) Values of set types not loading correctly into Pig. 
Revisit: full set of interactions with Pig not validated.

CASSANDRA-7704 <https://issues.apache.org/jira/browse/CASSANDRA-7704> 
(Benedict) FileNotFoundException during STREAM-OUT triggers 100% CPU usage. 
Revisit: streaming testing didn't reproduce this before release.

CASSANDRA-8383 <https://issues.apache.org/jira/browse/CASSANDRA-8383> 
(Benedict) Memtable flush may expire records from the commit log that are in 
a later memtable. 
Revisit: no regression test, no follow-up ticket. Could/should this have 
been reproducible as an actual bug?

CASSANDRA-8429 <https://issues.apache.org/jira/browse/CASSANDRA-8429> 
(Benedict) Some keys unreadable during compaction. 
Revisit: running stress in CI would have caught this, and we're going to do 
that.

CASSANDRA-8459 <https://issues.apache.org/jira/browse/CASSANDRA-8459> 
(Benedict) "autocompaction" on reads can prevent memtable space reclamation. 
Revisit: what would have reproduced this before release?

CASSANDRA-8499 <https://issues.apache.org/jira/browse/CASSANDRA-8499> 
(Benedict) Ensure SSTableWriter cleans up properly after failure. 
Revisit: testing error paths? Any way to test things in a loop to detect 
leaks?

CASSANDRA-8513 <https://issues.apache.org/jira/browse/CASSANDRA-8513> 
(Benedict) SSTableScanner may not acquire reference, but will still release 
it when closed. 
Revisit: this had a user-visible component; what test could have caught it 
before release?

CASSANDRA-8619 <https://issues.apache.org/jira/browse/CASSANDRA-8619> 
(Benedict) Using CQLSSTableWriter gives ConcurrentModificationException. 
Revisit: what kind of test would have caught this before release?

CASSANDRA-8632 <https://issues.apache.org/jira/browse/CASSANDRA-8632> 
(Benedict) cassandra-stress only generating a single unique row. 
Revisit: we rely on stress for performance testing; that might mean it needs 
real testing that demonstrates it generates load that looks like the load it 
is supposed to be generating.

CASSANDRA-8668 <https://issues.apache.org/jira/browse/CASSANDRA-8668> 
(Benedict) We don't enforce offheap memory constraints; regression 
introduced by 7882. 
Revisit: memory constraints was a supported feature/UI, but not completely 
tested before release. Could this have been found most effectively by a unit 
test or a blackbox test?

CASSANDRA-8719 <https://issues.apache.org/jira/browse/CASSANDRA-8719> 
(Benedict) Using thrift HSHA with offheap_objects appears to corrupt data. 
Revisit: untested configuration before release; this would be 
straightforward if we ran with it?

CASSANDRA-8726 <https://issues.apache.org/jira/browse/CASSANDRA-8726> 
(Benedict) Throw OOM in Memory if we fail to allocate. 
Revisit: OOM test Cassandra? Try and validate that it fails cleanly and can 
be restarted on OOM? Same for disk full.

CASSANDRA-8018 <https://issues.apache.org/jira/browse/CASSANDRA-8018> 
(Benjamin Lerer) Cassandra seems to insert twice in custom 
PerColumnSecondaryIndex. 
Revisit: custom secondary indexes not tested before release?

CASSANDRA-8231 <https://issues.apache.org/jira/browse/CASSANDRA-8231> 
(Benjamin Lerer) Wrong size of cached prepared statements. 
Revisit: expected cache capacity not validated against actual cache 
capacity; no regression test.

CASSANDRA-8365 <https://issues.apache.org/jira/browse/CASSANDRA-8365> 
(Benjamin Lerer) CamelCase name is used as index name instead of lowercase. 
Revisit: how can we establish UI consistency?

CASSANDRA-8421 <https://issues.apache.org/jira/browse/CASSANDRA-8421> 
(Benjamin Lerer) Cassandra 2.1.1 & Cassandra 2.1.2 UDT not returning value 
for LIST type as UDT. 
Revisit: is there a test that could have found this condition before 
release?

CASSANDRA-8514 <https://issues.apache.org/jira/browse/CASSANDRA-8514> 
(Benjamin Lerer) ArrayIndexOutOfBoundsException in no

Re: March 2015 QA retrospective

2015-04-13 Thread Ariel Weisberg
Hi Benedict,

This only requires unit testing or dtests to be run this way. However for
> the kitchen sink tests this is just another dimension in the configuration
> state space, which IMO should be addressed as a whole methodically. Perhaps
> we should file a central JIRA, or the Google doc you suggested, for
> tracking all of these data points?


I created a doc that covers requirements, but not implementation. I want to
list things we would like it to test in the general sense, as well as
enumerate specific bugs that it should have been able to catch.

This does raise an interesting, but probably not significant downside to
> the new approach: I fixed this ticket because somebody mentioned to me that
> it was hurting them, and I saw a quick and easy fix. The testing would not
> be quick and easy, so I am unlikely to volunteer to patch quick fixes in
> the new world order. This will certainly lead to higher quality bug fixes,
> but it may lead to fewer of them, and fewer instances of volunteer work to
> help people out, because the overhead eats too much into the work you're
> actually responsible for. This may lead to bug fixing being seen as much
> more of a chore than it already can be. I don't say this to discourage the
> new approach; it is just a thought that occurs to me off the back of this
> specific discussion.


It's a real problem. People doing bug fixes can be stuck spending months
doing nothing but that and writing tests to fill in coverage. Then they get
unhappy and unproductive.

One of the reasons I leave the option of filing a JIRA open, instead of
saying that they have to do something, is that it gives assignees and
reviewers the option to have the work done later or by someone else. The
person who is scheduling releases can see the test issues before the release
(you would set the fix version to the next release). It's still not
done and the release is not done. That puts pressure on the person who
wants to release to make sure it is in someone's queue.

If you are hardcore agile and doing one or two week sprints, what happens is
that there are no tickets left in the sprint other than what was agreed on
at the planning meeting, and people will have no choice but to work on
test tasks. How we manage and prioritize tasks right now is "magic" to me
and maybe not something that scales down to monthly releases.

For monthly releases on at least a weekly basis you need to know what
stands between you and the release being done and you need to have a plan
for who is going to take care of the blockers that crop up.

The testing would not
> be quick and easy, so I am unlikely to volunteer to patch quick fixes in
> the new world order.


I think this gets into how we load balance bug fixes. There is a clear
benefit to routing the bug to the person who will know how to fix and test
it. I have never seen bugs as something you volunteer for. They typically
belong somewhere and if it is with you then so be it.


 because the overhead eats too much into the work you're
> actually responsible for.


We need to make sure that bug fixing isn't seen that way. I think it's
important to make sure bugs find their way home. The work you're actually
responsible for is not done, so you can't claim that bug fixes are eating
into it. It already done been ate.

We shouldn't prioritize new work over past work that was never finished.
With monthly releases and breaking things down into much smaller chunks it
means you have the option to let new work slip to accommodate without
moving tasks between people.

Ariel



On Fri, Apr 10, 2015 at 7:07 PM, Benedict Elliott Smith <
belliottsm...@datastax.com> wrote:

> >
> > CASSANDRA-8459 
> > "autocompaction"
> > on reads can prevent memtable space reclaimation
> >
> > Can you link a ticket to CASSANDRA-9012 and characterize in a way we can
> > try and implement how to make sufficiently large partitions, over
> > sufficiently large periods of time?
>
> Maybe also enumerate the other permutations where this matters like
> > secondary indexes and the access patterns (scans).
> >
>
> Does this really qualify for its own ticket? This should just be one of
> many configurations for stress' part in the new tests. We should perhaps
> have an aggregation ticket where we ensure we enumerate the configuration
> data points we've met that need to be covered. But, IMO at least, a
> methodical exhaustive approach should be undertaken separately, and only be
> corroborated against such a list to ensure it was done sufficiently well.
>
>
> >
> > CASSANDRA-8619  -
> > using
> > CQLSSTableWriter gives ConcurrentModificationException
> >
> > OK. I don't think the original fix meets our new definition of done since
> > there was insufficient coverage, and in this case no regress

Re: Cassandra fixVersion JIRA change

2015-04-29 Thread Ariel Weisberg
Hi,

How are we going to communicate this convention to people that are being
onboarded to Cassandra development?

Ariel

On Wed, Apr 29, 2015 at 12:56 PM, Jake Luciani  wrote:

> Hi,
>
>
> Currently in JIRA we mark an issue with a specific fixVersion upfront.
> When doing releases this causes issues because once the release is cut
> we need to bulk move all the unresolved issues for that release to the
> next version.  This bulk move operation replaces the fixVersion field
> and wipes out any other fixVersions an issue happened to have. like
> 2.1.5 and 2.0.15.
>
> So I propose (actually this is Sylvain's idea) we create series
> placeholder versions like 2.0.x, 2.1.x, 3.x, which we use to mark the
> series of a JIRA upfront.  On commit we then set the specific version
> the ticket was committed for.
>
> I've started migrating our series over to this already.  Please try to
> remember to do this.
>
> -Jake
>


May 2015 Retrospective

2015-05-04 Thread Ariel Weisberg
Hi,

It's time. This month we are going to try and do the retrospective in a Google
doc
<https://docs.google.com/document/d/159yJY2YS5hLTqlU7J2lOYJr5cfhECRGe7k-QwavuiBw/edit?usp=sharing>.
Inside the doc I am guessing we will do a threaded conversation and sign
off contributions in the discussion section.

Last month things didn't thread well and it was hard to track what was
going on.

We released 2.1.5 in April so now is a good time to review anything you
fixed in 2.1.5 with an eye towards things that we would like to have done
better. You don't have to have an answer for how we could do things better.
We don't want to be limited to discussing things we already have answers
for.

If something is already being addressed (or is queued to be addressed by a
JIRA) there is no need to mention it unless you want +1 the issue for some
reason such as it not being prioritized sufficiently.

I am going to look at 2.1.5 the same way I did with 2.1.3, but I am
going to hold off on volunteering stuff until I see what people come with.

Regards,
Ariel


Re: May 2015 Retrospective

2015-05-04 Thread Ariel Weisberg
Hi,

Someone asked if they can add their own items to went well/poorly/changes, and
the answer is yes. We'll iterate on what goes up, but anyone can bring
something up for discussion. I don't think you should post a personal
well/poorly section, but you can post about things that went poorly for
just you. When something doesn't work for you it's not working for any of
us.

Ariel

On Mon, May 4, 2015 at 11:40 AM, Ariel Weisberg  wrote:

> Hi,
>
> It's time. This month we are going to try and do the retrospective in a 
> Google
> doc
> <https://docs.google.com/document/d/159yJY2YS5hLTqlU7J2lOYJr5cfhECRGe7k-QwavuiBw/edit?usp=sharing>.
> Inside the docs I am guessing we will do a threaded conversation and sign
> off contributions in the discussion section.
>
> Last month things didn't thread well and it was hard to track what was
> going on.
>
> We released 2.1.5 in April so now is a good time to review anything you
> fixed in 2.1.5 with an eye towards things that we would like to have done
> better. You don't have to have an answer for how we could do things better.
> We don't want to be limited to discussing things we already have answers
> for.
>
> If something is already being addressed (or is queued to be addressed by a
> JIRA) there is no need to mention it unless you want +1 the issue for some
> reason such as it not being prioritized sufficiently.
>
> I am going to look at 2.1.5 the same way I did with 2.1.3, but I am
> going to hold off on volunteering stuff until I see what people come with.
>
> Regards,
> Ariel
>
>


Re: Staging Branches

2015-05-07 Thread Ariel Weisberg
Hi,

I don't think this is necessary. If you merge with trunk, test, and someone
gets in ahead of you, just merge up and push to trunk anyway. Most of the
time the changes the other person made will be unrelated and they will
compose fine. If you actually conflict then yeah, you test again, but this
doesn't happen often.

The goal isn't to have trunk passing every single time; it's to have it pass
almost all the time, so the test history means something and when it fails,
it fails because it was broken by the latest merge.

At this size I don't see the need for a staging branch to prevent trunk
from ever breaking. There is a size where it would be helpful I just don't
think we are there yet.

Ariel

On Thu, May 7, 2015 at 5:05 AM, Benedict Elliott Smith <
belliottsm...@datastax.com> wrote:

> A good practice as a committer applying a patch is to build and run the
> unit tests before updating the main repository, but to do this for every
> branch is infeasible and impacts local productivity. Alternatively,
> uploading the result to your development tree and waiting a few hours for
> CI to validate it is likely to result in a painful cycle of race-to-merge
> conflicts, rebasing and waiting again for the tests to run.
>
> So I would like to propose a new strategy: staging branches.
>
> Every major branch would have a parallel branch:
>
> cassandra-2.0 <- cassandra-2.0_staging
> cassandra-2.1 <- cassandra-2.1_staging
> trunk <- trunk_staging
>
> On commit, the idea would be to perform the normal merge process on the
> _staging branches only. CI would then run on every single git ref, and as
> these passed we would fast forward the main branch to the latest validated
> staging git ref. If one of them breaks, we go and edit the _staging branch
> in place to correct the problem, and let CI run again.
>
> So, a commit would look something like:
>
> patch -> cassandra-2.0_staging -> cassandra-2.1_staging -> trunk_staging
>
> wait for CI, see 2.0, 2.1 are fine but trunk is failing, so
>
> git rebase -i trunk_staging 
> fix the problem
> git rebase --continue
>
> wait for CI; all clear
>
> git checkout cassandra-2.0; git merge cassandra-2.0_staging
> git checkout cassandra-2.1; git merge cassandra-2.1_staging
> git checkout trunk; git merge trunk_staging
>
> This does introduce some extra steps to the merge process, and we will have
> branches we edit the history of, but the amount of edited history will be
> limited, and this will remain isolated from the main branches. I'm not sure
> how averse to this people are. An alternative policy might be to enforce
> that we merge locally and push to our development branches then await CI
> approval before merging. We might only require this to be repeated if there
> was a new merge conflict on final commit that could not automatically be
> resolved (although auto-merge can break stuff too).
>
> Thoughts? It seems if we want an "always releasable" set of branches, we
> need something along these lines. I certainly break tests by mistake, or
> the build itself, with alarming regularity. Fixing with merges leaves a
> confusing git history, and leaves the build broken for everyone else in the
> meantime, so patches applied after, and development branches based on top,
> aren't sure if they broke anything themselves.
>


Re: Staging Branches

2015-05-07 Thread Ariel Weisberg
Hi,

Whoah. Our process is our own. We don't have to subscribe to any cargo cult
book buying seminar giving process.

And whatever we do we can iterate and change until it works for us and
solves the problems we want solved.

Ariel

On Thu, May 7, 2015 at 10:13 AM, Aleksey Yeschenko 
wrote:

> Strictly speaking, the train schedule does demand that trunk, and all
> other branches, must be releasable at all times, whether you like it or not
> (for the record - I *don’t* like it, but here we are).
>
> This, and other annoying things, is what we subscribed to with the tick-tock
> vs. supported branches experiment.
>
> > We still need to run CI before we release. So what does this buy us?
>
> Ideally (eventually?) we won’t have to run CI, including duration tests,
> before we release, because we’ll never merge anything that hadn’t passed
> the full suite, including duration tests.
>
> That said, perhaps it’s too much change at once. We still have missing
> pieces of infrastructure, and TE is busy with what’s already back-logged.
> So let’s revisit this proposal in a few months, closer to 3.1 or 3.2, maybe?
>
> --
> AY
>
> On May 7, 2015 at 16:56:07, Ariel Weisberg (ariel.weisb...@datastax.com)
> wrote:
>
> Hi,
>
> I don't think this is necessary. If you merge with trunk, test, and someone
> gets in ahead of you, just merge up and push to trunk anyway. Most of the
> time the changes the other person made will be unrelated and they will
> compose fine. If you actually conflict then yeah you test again but this
> doesn't happen often.
>
> The goal isn't to have trunk passing every single time; it's to have it pass
> almost all the time, so the test history means something and when it fails,
> it fails because it was broken by the latest merge.
>
> At this size I don't see the need for a staging branch to prevent trunk
> from ever breaking. There is a size where it would be helpful I just don't
> think we are there yet.
>
> Ariel
>
> On Thu, May 7, 2015 at 5:05 AM, Benedict Elliott Smith <
> belliottsm...@datastax.com> wrote:
>
> > A good practice as a committer applying a patch is to build and run the
> > unit tests before updating the main repository, but to do this for every
> > branch is infeasible and impacts local productivity. Alternatively,
> > uploading the result to your development tree and waiting a few hours for
> > CI to validate it is likely to result in a painful cycle of race-to-merge
> > conflicts, rebasing and waiting again for the tests to run.
> >
> > So I would like to propose a new strategy: staging branches.
> >
> > Every major branch would have a parallel branch:
> >
> > cassandra-2.0 <- cassandra-2.0_staging
> > cassandra-2.1 <- cassandra-2.1_staging
> > trunk <- trunk_staging
> >
> > On commit, the idea would be to perform the normal merge process on the
> > _staging branches only. CI would then run on every single git ref, and as
> > these passed we would fast forward the main branch to the latest
> validated
> > staging git ref. If one of them breaks, we go and edit the _staging
> branch
> > in place to correct the problem, and let CI run again.
> >
> > So, a commit would look something like:
> >
> > patch -> cassandra-2.0_staging -> cassandra-2.1_staging -> trunk_staging
> >
> > wait for CI, see 2.0, 2.1 are fine but trunk is failing, so
> >
> > git rebase -i trunk_staging 
> > fix the problem
> > git rebase --continue
> >
> > wait for CI; all clear
> >
> > git checkout cassandra-2.0; git merge cassandra-2.0_staging
> > git checkout cassandra-2.1; git merge cassandra-2.1_staging
> > git checkout trunk; git merge trunk_staging
> >
> > This does introduce some extra steps to the merge process, and we will
> have
> > branches we edit the history of, but the amount of edited history will be
> > limited, and this will remain isolated from the main branches. I'm not
> sure
> > how averse to this people are. An alternative policy might be to enforce
> > that we merge locally and push to our development branches then await CI
> > approval before merging. We might only require this to be repeated if
> there
> > was a new merge conflict on final commit that could not automatically be
> > resolved (although auto-merge can break stuff too).
> >
> > Thoughts? It seems if we want an "always releasable" set of branches, we
> > need something along these lines. I certainly break tests by mistake, or
> > the build itself, with alarming regularity. Fixing with merges leaves a
> > confusing git history, and leaves the build broken for everyone else in
> the
> > meantime, so patches applied after, and development branches based on
> top,
> > aren't sure if they broke anything themselves.
> >
>


Re: Staging Branches

2015-05-07 Thread Ariel Weisberg
Hi,

Sorry, didn't mean to blame or come off snarky. I just think it is important not
to #include our release process from somewhere else. We don't have to do
anything unless it is necessary to meet some requirement of what we are
trying to do.

So the phrase "Trunk is always releasable" definitely has some wiggle room
because you have to define what your release process is.

If your requirement is that at any time you be able to tag trunk and ship
it within minutes then yes staging branches help solve that problem.

The reality is that the release process always takes low single digit days
because you branch trunk, then wait for longer running automated tests to
run against that branch. If there happens to be a failure you may have to
update the branch, but you have bounded how much brokenness sits between you
and release already. We also don't have a requirement to be able to ship
nigh immediately.

We can balance the cost of extra steps and process against the cost of
having to delay some releases some of the time by a few days and pick
whichever is more important. We are still reducing the amount of time it
takes to get a working release. Reduced enough that we should be able to
ship every month without difficulty. I have been on a team roughly our size
that shipped every three weeks without having staging branches. Trunk broke
infrequently enough it wasn't an issue and when it did break it wasn't hard
to address. The real pain point was flapping tests and the diffusion of
responsibility that prevented them from getting fixed.

If I were trying to sell staging branches I would work the angle that I
want to be able to bisect trunk without coming across broken revisions.
Then balance the value of that with the cost of the process.
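
For concreteness, the payoff is that a bisect can run unattended over an
always-green history -- a sketch, with a hypothetical test command standing
in for whatever reproduces the regression:

git bisect start
git bisect bad trunk            # first ref known to fail
git bisect good cassandra-2.1   # last ref known to pass
git bisect run ant test         # let git binary-search the history

With broken revisions mixed in, "git bisect run" misattributes failures and
the result of the search is useless.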

Ariel

On Thu, May 7, 2015 at 10:41 AM, Benedict Elliott Smith <
belliottsm...@datastax.com> wrote:

> It's a bit unfair to characterize Aleksey as subscribing to a cargo cult.
> *We* agreed to define the new release process as "keeping trunk always
> releasable".
>
> Your own words that catalyzed this: "If we release off trunk it is pretty
> much necessary for trunk to be in a releasable state all the time"
>
> It is possible we have been imprecise in our discussions, and people have
> agreed to different things. But it does seem to me we agreed to the
> position Aleksey is taking, and he is not blindly following some other
> process that is not ours.
>
> On Thu, May 7, 2015 at 3:25 PM, Ariel Weisberg <
> ariel.weisb...@datastax.com>
> wrote:
>
> > Hi,
> >
> > Whoah. Our process is our own. We don't have to subscribe to any cargo
> cult
> > book buying seminar giving process.
> >
> > And whatever we do we can iterate and change until it works for us and
> > solves the problems we want solved.
> >
> > Ariel
> >
> > On Thu, May 7, 2015 at 10:13 AM, Aleksey Yeschenko 
> > wrote:
> >
> > > Strictly speaking, the train schedule does demand that trunk, and all
> > > other branches, must be releasable at all times, whether you like it or
> > not
> > > (for the record - I *don’t* like it, but here we are).
> > >
> > > This, and other annoying things, is what we subscribed to with tick-tock vs.
> > > supported branches experiment.
> > >
> > > > We still need to run CI before we release. So what does this buy us?
> > >
> > > Ideally (eventually?) we won’t have to run CI, including duration
> tests,
> > > before we release, because we’ll never merge anything that hadn’t
> passed
> > > the full suite, including duration tests.
> > >
> > > That said, perhaps it’s too much change at once. We still have missing
> > > pieces of infrastructure, and TE is busy with what’s already
> back-logged.
> > > So let’s revisit this proposal in a few months, closer to 3.1 or 3.2,
> > maybe?
> > >
> > > --
> > > AY
> > >
> > > On May 7, 2015 at 16:56:07, Ariel Weisberg (
> ariel.weisb...@datastax.com)
> > > wrote:
> > >
> > > Hi,
> > >
> > > I don't think this is necessary. If you merge with trunk, test, and
> > someone
> > > gets in ahead of you just merge up and push to trunk anyways. Most of
> > the
> > > time the changes the other person made will be unrelated and they will
> > > compose fine. If you actually conflict then yeah you test again but
> this
> > > doesn't happen often.
> > >
> > > The goal isn't to have trunk passing every single time it's to have it
> > pass
> > > almost all the time so the test history means something and when it
> fa

Re: Staging Branches

2015-05-07 Thread Ariel Weisberg
Hi,

If it were automated I would have no problem with it. That would be less
work for me because the problems detected would occur anyways and have to
be dealt with by me. I just don't want to deal with extra steps and latency
manually.
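
To give a sense of scale, the automation in question is small -- roughly the
following sketch, run by CI once a staging ref goes green (branch names
follow Benedict's example; the ref variable is illustrative):

git fetch origin
git checkout trunk
git merge --ff-only $GREEN_STAGING_REF  # refuses anything but a fast-forward
git push origin trunk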

So who and when is going to implement the automation?

Ariel

On Thu, May 7, 2015 at 11:11 AM, Benedict Elliott Smith <
belliottsm...@datastax.com> wrote:

> It's odd, because I honestly think this release process will be easier,
> since the stricter we make it the smoother it can become. It requires well
> formed commits from everyone, and lets the committers asynchronously
> confirm their work, and for it to never be in question *who* needs to fix
> something, nor what the effect of their fixing it will be. It means we can,
> as Ariel said, perform a bisect and honestly know its result is accurate.
> Small commits don't need to worry about fast-forwarding; in fact, nobody
> does. It can either be automated, or we can fast forward at a time that
> suits us. In which case the process is *the same* as it is currently.
>
> I have no interest in making the commit process harder.
>
>
> On Thu, May 7, 2015 at 3:59 PM, Jake Luciani  wrote:
>
> > Ok let's focus then on the idea that trunk is releasable.  Releasable
> > to me doesn't mean it can't contain a bad merge.
> >
> > It means it doesn't contain some untested and unstable feature.  We
> > can always "release from trunk" and we still have a release process.
> >
> > The idea that trunk must contain, the first time it hits the branch,
> > releasable code is way overboard
> >
> > On Thu, May 7, 2015 at 10:50 AM, Benedict Elliott Smith
> >  wrote:
> > >>
> > >> This breaks your model of applying every commit ref by ref.
> > >
> > >
> > > How? The rebase only affects commits after the "real" branch, so it
> still
> > > cleanly fast forwards?
> > >
> > > Merging is *hard*. Especially 2.1 -> 3.0, with many breaking API
> changes
> > > (this is before 8099, which is going to make a *world* of hurt, and
> will
> > > stick around for a year). It is *very* easy to break things, with even
> > the
> > > utmost care.
> > >
> > > On Thu, May 7, 2015 at 3:46 PM, Jake Luciani  wrote:
> > >
> > >> You then fetch and repair
> > >> your local version and try again.
> > >>
> > >> This breaks your model of applying every commit ref by ref.
> > >>
> > >> I'm all for trying to avoid extra work/stability but we already have
> > >> added a layer of testing every change before commit.  I'm not going to
> > >> accept we need to also add a layer of testing before every merge.
> > >>
> > >>
> > >>
> > >>
> > >> On Thu, May 7, 2015 at 10:36 AM, Benedict Elliott Smith
> > >>  wrote:
> > >> >>
> > >> >> wouldn't you need to force push?
> > >> >
> > >> >
> > >> > git push --force-with-lease
> > >> >
> > >> > This works essentially like CAS; if the remote repositories are not
> > the
> > >> > same as the one you have modified, it will fail. You then fetch and
> > >> repair
> > >> > your local version and try again.
> > >> >
> > >> > So what does this buy us?
> > >> >
> > >> >
> > >> > This buys us a clean development process. We bought into "always
> > >> > releasable". It's already a tall order; if we start weakening the
> > >> > constraints before we even get started, I am unconvinced we will
> > >> > successfully deliver. A monthly release cycle requires *strict*
> > >> processes,
> > >> > not *almost* strict, or strict*ish*.
> > >> >
> > >> > Something that could also help make a more streamlined process: if
> > actual
> > >> > commits were constructed on development branches ready for commit,
> > with a
> > >> > proper commit message and CHANGES.txt updated. Even more ideally:
> with
> > >> git
> > >> > rerere data for merging up to each of the branches. If we had that,
> > and
> > >> > each of the branches had been tested in CI, we would be much closer
> > than
> > >> we
> > >> > are currently, as the risk-at-commit is minimized.
> > >> >
> > >> > On Thu, May 7, 2015 at 2:48 PM, Jake Luciani 
> > wrote:
> > >> >
> > >> >> git rebase -i trunk_staging 
> > >> >> fix the problem
> > >> >> git rebase --continue
> > >> >>
> > >> >> In this situation, if there was an untested follow on commit
> wouldn't
> > >> >> you need to force push?
> > >> >>
> > >> >> On Thu, May 7, 2015 at 9:28 AM, Benedict Elliott Smith
> > >> >>  wrote:
> > >> >> >>
> > >> >> >> If we do it, we'll end up in weird situations which will be
> > annoying
> > >> for
> > >> >> >> everyone
> > >> >> >
> > >> >> >
> > >> >> > Such as? I'm not disputing, but if we're to assess the relative
> > >> >> > strengths/weaknesses, we need to have specifics to discuss.
> > >> >> >
> > >> >> > If we do go with this suggestion, we will most likely want to
> > enable a
> > >> >> > shared git rerere cache, so that rebasing is not painful when
> there
> > >> are
> > >> >> > future commits.
> > >> >> >
> > >> >> > If instead we go with "repairing" commits, we cannot have a
> > "queue" of
> > >> >> > things t
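
Two of the mechanics referenced above, spelled out as a sketch with
illustrative remote and branch names:

git config rerere.enabled true   # record and replay conflict resolutions,
                                 # so repeated rebases stay painless
git fetch origin
git push --force-with-lease origin trunk_staging
# rejected unless origin/trunk_staging still matches the ref recorded by the
# last fetch -- compare-and-swap semantics for the branch pointer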

Re: Staging Branches

2015-05-07 Thread Ariel Weisberg
Hi,

I meant in the hypothetical case that we did this. There is going to be an
interim period where we wouldn't have this. The automation comes at the
expense of something else.

Ariel

On Thu, May 7, 2015 at 11:40 AM, Josh McKenzie 
wrote:

> >
> > So who and when is going to implement the automation?
>
>
> I don't believe we have sufficient consensus that this is necessary to
> start doling out action-items for implementation.
>
> On Thu, May 7, 2015 at 10:16 AM, Ariel Weisberg <
> ariel.weisb...@datastax.com
> > wrote:
>
> > Hi,
> >
> > If it were automated I would have no problem with it. That would be less
> > work for me because the problems detected would occur anyways and have to
> > be dealt with by me. I just don't want to deal with extra steps and
> latency
> > manually.
> >
> > So who and when is going to implement the automation?
> >
> > Ariel
> >
> > On Thu, May 7, 2015 at 11:11 AM, Benedict Elliott Smith <
> > belliottsm...@datastax.com> wrote:
> >
> > > It's odd, because I honestly think this release process will be easier,
> > > since the stricter we make it the smoother it can become. It requires
> > well
> > > formed commits from everyone, and lets the committers asynchronously
> > > confirm their work, and for it to never be in question *who* needs to
> fix
> > > something, nor what the effect of their fixing it will be. It means we
> > can,
> > > as Ariel said, perform a bisect and honestly know its result is
> accurate.
> > > Small commits don't need to worry about fast-forwarding; in fact,
> nobody
> > > does. It can either be automated, or we can fast forward at a time that
> > > suits us. In which case the process is *the same* as it is currently.
> > >
> > > I have no interest in making the commit process harder.
> > >
> > >
> > > On Thu, May 7, 2015 at 3:59 PM, Jake Luciani  wrote:
> > >
> > > > Ok let's focus then on the idea that trunk is releasable.  Releasable
> > > > to me doesn't mean it can't contain a bad merge.
> > > >
> > > > It means it doesn't contain some untested and unstable feature.  We
> > > > can always "release from trunk" and we still have a release process.
> > > >
> > > > The idea that trunk must contain, the first time it hits the branch,
> > > > releasable code is way overboard
> > > >
> > > > On Thu, May 7, 2015 at 10:50 AM, Benedict Elliott Smith
> > > >  wrote:
> > > > >>
> > > > >> This breaks your model of applying every commit ref by ref.
> > > > >
> > > > >
> > > > > How? The rebase only affects commits after the "real" branch, so it
> > > still
> > > > > cleanly fast forwards?
> > > > >
> > > > > Merging is *hard*. Especially 2.1 -> 3.0, with many breaking API
> > > changes
> > > > > (this is before 8099, which is going to make a *world* of hurt, and
> > > will
> > > > > stick around for a year). It is *very* easy to break things, with
> > even
> > > > the
> > > > > utmost care.
> > > > >
> > > > > On Thu, May 7, 2015 at 3:46 PM, Jake Luciani 
> > wrote:
> > > > >
> > > > >> You then fetch and repair
> > > > >> your local version and try again.
> > > > >>
> > > > >> This breaks your model of applying every commit ref by ref.
> > > > >>
> > > > >> I'm all for trying to avoid extra work/stability but we already
> have
> > > > >> added a layer of testing every change before commit.  I'm not
> going
> > to
> > > > >> accept we need to also add a layer of testing before every merge.
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Thu, May 7, 2015 at 10:36 AM, Benedict Elliott Smith
> > > > >>  wrote:
> > > > >> >>
> > > > >> >> wouldn't you need to force push?
> > > > >> >
> > > > >> >
> > > > >> > git push --force-with-lease
> > > > >> >
> > > > >> > This works essentially like CAS; if the remote repositories are
> > not
> > > > the
> > > > &g

Re: Staging Branches

2015-05-07 Thread Ariel Weisberg
Hi,

I agree with release (or even collaboration branches) having the same
approach for only merging code that has run in cassci off of that branch.

Ariel

On Thu, May 7, 2015 at 1:34 PM, Aleksey Yeschenko 
wrote:

> > +100 from me, but why the exception for trunk?
>
> Sorry, just my poor wording again. No exception for trunk. What I meant
> by ‘non-trunk’ patches was patches that originated in 2.0 or 2.1
> branches. ‘trunk’ patches (today) would be 3.0-only features and fixes, and
> these need no upstream merging, b/c trunk is already as upstream as it gets.
>
> Of course you’d still have cassci vet the trunk branch itself - we already
> should be doing that.
>
> --
> AY
>
> On May 7, 2015 at 20:30:07, Ryan McGuire (r...@datastax.com) wrote:
>
> > In the meantime can we agree on having cassci to validate personal merged
> branches before pushing either, in case of non-trunk patches?
>
> +100 from me, but why the exception for trunk? Wouldn't it be easier to
> wait for the dev branch tests to pass and then do all the merging at once
> (2.0, 2.1,3.0, trunk)?
>
> On Thu, May 7, 2015 at 1:21 PM, Aleksey Yeschenko 
> wrote:
>
> > > Agreed. I would like to wait and see how we do without extra branches
> > for
> > > a release or two. That will give us a better idea of how much pain the
> > > extra steps will protect us from.
> >
> > In the meantime can we agree on having cassci to validate personal merged
> > branches before pushing either, in case of non-trunk patches?
> >
> > That doesn’t require any more infra setup, and with the delta between 2.0
> > and 2.1, and 2.1 and 3.0, and, sometimes, 2.0 and 3.0, doing so is
> crucial.
> >
> > I know that at least Sam does that already, but I want it to be an agreed
> > upon procedure.
> >
> > Oh, ans as Benedict said, it’d be nice for all those commits to start
> > including CHANGES.txt and the commit messages, both. With the developers
> > total/committers ratio that we have now, it helps the process scale
> better.
> >
> > --
> > AY
> >
> > On May 7, 2015 at 19:02:36, Benedict Elliott Smith (
> > belliottsm...@datastax.com) wrote:
> >
> > >
> > > I would argue that we must *at least* do the following for now.
> >
> >
> > If we get this right, the extra staging branches can certainly wait to be
> > assessed until later.
> >
> > IMO, any patch should have a branch in CI for each affected mainline
> > branch, and should have the commit completely wired up (CHANGES.txt,
> commit
> > message, the works), so that it can be merged straight in. If it
> conflicts
> > significantly, it can be bumped back to the author/reviewer to refresh.
> >
> > On Thu, May 7, 2015 at 4:16 PM, Aleksey Yeschenko 
> > wrote:
> >
> > > I would argue that we must *at least* do the following for now.
> > >
> > > If your patch is 2.1-based, you need to create a private git branch for
> > > that and a merged trunk branch ( and -trunk). And you don’t
> push
> > > anything until cassci validates all of those three branches, first.
> > >
> > > An issue without a link to cassci for both of those branches passing
> > > doesn’t qualify as done to me.
> > >
> > > That alone will be enough to catch most merge-related regressions.
> > >
> > > Going with staging branches would also prevent any issues from
> concurrent
> > > pushes, but given the opposition, I’m fine with dropping that
> > requirement,
> > > for now.
> > >
> > > --
> > > AY
> > >
> > > On May 7, 2015 at 18:04:20, Josh McKenzie (josh.mcken...@datastax.com)
> > > wrote:
> > >
> > > >
> > > > Merging is *hard*. Especially 2.1 -> 3.0, with many breaking API
> > changes
> > > > (this is before 8099, which is going to make a *world* of hurt, and
> > will
> > > > stick around for a year). It is *very* easy to break things, with
> even
> > > the
> > > > utmost care.
> > >
> > >
> > > While I agree re:merging, I'm not convinced the proportion of commits
> > that
> > > will benefit from a staging branch testing pipeline is high enough to
> > > justify the time and complexity overhead to (what I expect are) the
> vast
> > > majority of commits that are smaller, incremental changes that won't
> > > benefit from this.
> > >
> > > On Thu, May 7, 2015 at 9:56 AM, Ariel Weisberg <
> > > ariel.weisb...@datastax.c

Re: May 2015 Retrospective

2015-05-12 Thread Ariel Weisberg
Hi all,

Thank you for your participation. I think this retrospective went really
well and people are getting the hang of it. If you haven't read it since it
was started take a quick look when you have a chance. In addition to what
participants added I also commented on #9098 and #9036, but didn't have
anything actionable for them.

I reviewed all the fixed bugs in 2.1.5 like I did for 2.1.3 and people are
doing a good job adding unit and dtests to cover what they worked on. This
isn't really a change in behavior, people were doing a good job before
there were just some hard/larger testing elements that went unrepresented
but are now in JIRA.

I created far fewer issues hanging off of CASSANDRA-9012
<https://issues.apache.org/jira/browse/CASSANDRA-9012> as part of this
retrospective. I think this points to there being a bounded scope of things
that are missing and that we are starting to get a picture of what is
painful and what isn't. I regret not linking the issues that trigger the
creation of a test task to the task so we can get visibility into severity
and frequency associated with a test task. I did some of that this time,
but that is missing from the 2.1.3 review I did. This is something anyone
can do so if you are working on a bug that we shipped and you can find a
test task that would have helped, you should link them. This informs
prioritization as well as test design.

Reviewing the tickets in a release is turning out to be really informative
for me. Already some things stand out as repeated pain points across the two
releases. I plan on continuing to do that so with a focus on linking bugs
to test tasks that would address them.

What is hanging off of CASSANDRA-9012 is not small. Many of those tickets
are just stories and will probably transform into several tasks spread out
over longer periods of time. But it is bounded and if we start working
through them we should be able to catch a good chunk of bugs before they
ship. You can look at the linked issues and see that some would not have
been hard to catch.

Philip Thompson has started work on the kitchen sink harness. I put some of
my notes on implementation details from our discussion into the kitchen
sink doc
<https://docs.google.com/document/d/1kccPqxEAoYQpT0gXnp20MYQUDmjOrakAeQhf6vkqjGo/edit?usp=sharing>.
Nothing is set in stone so feel free to comment. Previously the doc just had a
links to bugs we feel should be tested for by the kitchen sink harness.

Regards,
Ariel

On Mon, May 4, 2015 at 11:50 AM, Ariel Weisberg  wrote:

> Hi,
>
> Someone asked if they can add their own to went well/poorly/changes and
> the answer is yes. We'll iterate on what goes up, but anyone can bring
> something up for discussion. I don't think you should post a personal
> well/poorly section, but you can post about things that went poorly for
> just you. When something doesn't work for you it's not working for any of
> us.
>
> Ariel
>
> On Mon, May 4, 2015 at 11:40 AM, Ariel Weisberg <
> ariel.weisb...@datastax.com> wrote:
>
>> Hi,
>>
>> It's time. This month we are going to try and do the retrospective in a 
>> Google
>> doc
>> <https://docs.google.com/document/d/159yJY2YS5hLTqlU7J2lOYJr5cfhECRGe7k-QwavuiBw/edit?usp=sharing>.
>> Inside the docs I am guessing we will do a threaded conversation and sign
>> off contributions in the discussion section.
>>
>> Last month things didn't thread well and it was hard to track what was
>> going on.
>>
>> We released 2.1.5 in April so now is a good time to review anything you
>> fixed in 2.1.5 with an eye towards things that we would like to have done
>> better. You don't have to have an answer for how we could do things better.
>> We don't want to be limited to discussing things we already have answers
>> for.
>>
>> If something is already being addressed (or is queued to be addressed by
>> a JIRA) there is no need to mention it unless you want +1 the issue for
>> some reason such as it not being prioritized sufficiently.
>>
>> I am going to look at 2.1.5 the same way I did with 2.1.3, but I am
>> going to hold off on volunteering stuff until I see what people come with.
>>
>> Regards,
>> Ariel
>>
>>
>


May 2015 retrospective

2015-06-02 Thread Ariel Weisberg
Hi,

Astute observers will note that last month's retrospective email was also
for May. The Google doc was labelled April. The bug is with the subject
(should have been April) so this is retrospective for the events of May
2015.

The retrospective doc is available here.


I think we are in something of a lull right now. We didn't ship a release
so there is no feedback from that process. We consciously haven't started
working on expanding the scope of what and how we test.

What people power is dedicated to improving the testing situation has been
focused on the technical and process debt associated with unit and dtests.

The unit tests are nigh done and started providing usable feedback earlier
in the month. I can tell because it has become obvious when someone has not
run the tests in cassci before merging :-)

The dtests are close and the remaining failures on trunk are flappers
according to Tyler. That might be my next stop.

Regards,
Ariel


Re: May 2015 retrospective

2015-06-22 Thread Ariel Weisberg
Hi,

I added the bugs that could be caught by the kitchen sink harness to
the kitchen
sink requirements doc.
<https://docs.google.com/document/d/1kccPqxEAoYQpT0gXnp20MYQUDmjOrakAeQhf6vkqjGo/edit#>
I
also created a similar document for continuous performance testing
<https://docs.google.com/document/d/1TMdJ7-y-hKQwhPRFYL0VXf0R53MsF4QmhZmwbT8wpE0/edit#>
where
I am listing regressions we would like to have caught via performance tests.

Regards,
Ariel

On Tue, Jun 2, 2015 at 4:53 PM, Ariel Weisberg 
wrote:

> Hi,
>
> Astute observers will note that last month's retrospective email was also
> for May. The Google doc was labelled April. The bug is with the subject
> (should have been April) so this is retrospective for the events of May
> 2015.
>
> The retrospective doc is available here.
> <https://docs.google.com/document/d/1GtuYRocdr9luNdwmm8wE84uC5Wr6TvewFbQtqoAFVeU/edit?usp=sharing>
>
> I think we are in something of a lull right now. We didn't ship a release
> so there is no feedback from that process. We consciously haven't started
> working on expanding the scope of what and how we test.
>
> What people power is dedicated to improving the testing situation has been
> focused on the technical and process debt associated with unit and dtests.
>
> The unit tests are nigh done and started providing usable feedback earlier
> in the month. I can tell because it has become obvious when someone has not
> run the tests in cassci before merging :-)
>
> The dtests are close and the remaining failures on trunk are flappers
> according to Tyler. That might be my next stop.
>
> Regards,
> Ariel
>


June 2015 retrospective

2015-07-06 Thread Ariel Weisberg
Hi all,

June 2015 retrospective doc


It's time for the June retrospective. It looks like there were 3 releases
in June (2.0.16, 2.1.6, 2.1.7).

There are 90 issues listed across those three releases. Hopefully
everything is annotated correctly. I want to take another shot at further
removing myself as a bottleneck for the retrospective process.

The goal at the end of each retrospective is to have transformed every
issue we had into the solution (or next action) for that issue. I think
that boils down to

   - Performance regression: add it to the performance harness doc
   - Correctness regression: add a JIRA or add it to the Cassandra validation
   harness doc
   - Note in the retrospective what went wrong and what needed to be done
   differently at implementation/code review time

For example CASSANDRA-9592 shows a scenario
that could easily be detected by the performance harness or the C*
validation harness and one or the other should be made to produce a
scenario where this can happen and be able to detect that it does happen.
So you would pick one of the two docs and put it in there. If there is a
dtest that could also test for this you would create a JIRA instead and
link it to CASSANDRA-9012.

By linking these issues to the solutions we are also making it easier to
see how much not having a given piece of test functionality is costing
which will increase bang for the buck post 3.0 when we go to decide what to
work on.

Thanks,
Ariel


Re: Discussion: reviewing larger tickets

2015-07-08 Thread Ariel Weisberg
Hi,

I really like github’s workflow. If you don’t abuse it you get a history of the 
entire review process.

Right now some people have a workflow that involves force pushing and deleting 
branches. If you delete branches I think the pull requests are still valid so 
people can still do it (although I don’t), but force pushing on the branch 
containing your work history is not compatible with using github pull requests. 
We don’t need to use github to merge we can just close the pull request.

I don’t see value in having code review process in JIRA because it’s not good 
at it. Having discussion mixed in JIRA and a code review tool is not great, but 
the status quo is unreadable for me. I also leave fewer comments because of 
how difficult it is to comment on code. Maybe this is less terrible if you use 
github’s issue tracker (haven’t done it), but that is not an option.

Maybe what we want is to use the Atlassian review tool to get “proper” 
integration with JIRA?

Regards,
Ariel

> On Jul 8, 2015, at 3:21 PM, Josh McKenzie  wrote:
> 
> As some of you might have noticed, Tyler and I tossed around a couple of
> thoughts yesterday regarding the best way to perform larger reviews on JIRA.
> 
> I've been leaning towards the approach Benedict's been taking lately
> w/putting comments inline on a branch for the initial author to inspect as
> that provides immediate locality for a reviewer to write down their
> thoughts and the same for the initial developer to ingest them. One
> downside to that approach is that the extra barrier to entry makes it more
> of a 1-on-1 conversation rather than an open discussion via JIRA comments.
> Also, if one deletes branches from github we then lose our discussion
> history on the review process which is a big problem for digging into why
> certain decisions were made or revised during the process.
> 
> On the competing side, monster comments like this
> 
> (which
> is one of multiple to come) are burdensome to create and map into a JIRA
> comment and, in my experience, also a burden to map back into the code-base
> as a developer. Details are lost in translation; I'm comfortable labeling
> this a sub-optimal method of communication.
> 
> So what to do?
> 
> -- 
> Joshua McKenzie



Re: Discussion: reviewing larger tickets

2015-07-08 Thread Ariel Weisberg
Hi,

If you navigate in an IDE how do you know if you are commenting on code that 
has changed or not?

My workflow is usually to look at the diff and have it open in an IDE 
separately, but maybe I am failing hard at tools.

Ariel
> On Jul 8, 2015, at 4:00 PM, Josh McKenzie  wrote:
> 
> The ability to navigate a patch in an IDE and add comments while exploring
> is not something the github PR interface can provide; I expect I at least
> would end up having to use multiple tools to perform a review given the PR
> approach.
> 
> On Wed, Jul 8, 2015 at 3:50 PM, Jake Luciani  wrote:
> 
>> putting comments inline on a branch for the initial author to inspect
>> 
>> I agree and I think we can support this by using github pull requests for
>> review.
>> 
>> Pull requests live forever even if the source branch is removed. See
>> https://github.com/apache/cassandra/pull/4
>> They also allow for comments to be updated over time as new fixes are
>> pushed to the branch.
>> 
>> Once review is done we can just close them without committing and just
>> commit the usual way
>> 
>> Linking to the PR in JIRA for reference.
>> 
>> 
>> On Wed, Jul 8, 2015 at 3:21 PM, Josh McKenzie 
>> wrote:
>> 
>>> As some of you might have noticed, Tyler and I tossed around a couple of
>>> thoughts yesterday regarding the best way to perform larger reviews on
>>> JIRA.
>>> 
>>> I've been leaning towards the approach Benedict's been taking lately
>>> w/putting comments inline on a branch for the initial author to inspect
>> as
>>> that provides immediate locality for a reviewer to write down their
>>> thoughts and the same for the initial developer to ingest them. One
>>> downside to that approach is that the extra barrier to entry makes it
>> more
>>> of a 1-on-1 conversation rather than an open discussion via JIRA
>> comments.
>>> Also, if one deletes branches from github we then lose our discussion
>>> history on the review process which is a big problem for digging into why
>>> certain decisions were made or revised during the process.
>>> 
>>> On the competing side, monster comments like this
>>> <
>>> 
>> https://issues.apache.org/jira/browse/CASSANDRA-6477?focusedCommentId=14617221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14617221
 
>>> (which
>>> is one of multiple to come) are burdensome to create and map into a JIRA
>>> comment and, in my experience, also a burden to map back into the
>> code-base
>>> as a developer. Details are lost in translation; I'm comfortable labeling
>>> this a sub-optimal method of communication.
>>> 
>>> So what to do?
>>> 
>>> --
>>> Joshua McKenzie
>>> 
>> 
>> 
>> 
>> --
>> http://twitter.com/tjake
>> 
> 
> 
> 
> -- 
> Joshua McKenzie
> DataStax -- The Apache Cassandra Company



End June retrospective, begin July retrospective

2015-08-03 Thread Ariel Weisberg
Hi all,

Thanks for the participation in the June retrospective. I linked in a few
issues that people brought up, but didn't push into the CVH doc. Some of
you did your own filing which is great.

The July 2015 retrospective doc is now available.

There were some issues with color coding comments. Since this is a doc we
can't easily tell who is writing something unless it's labelled. So pick a
color.

In July we released 2.2 and 2.1.8 and here is a link to the bugs you worked
on for those releases.

Here is the performance harness doc and the Cassandra validation harness doc.

Regards,
Ariel


Proposal, add Epic to the set of issue types available in ASF Jira for Cassandra

2015-08-04 Thread Ariel Weisberg
Hi all,

I am playing with using an Agile board to track what goes into each
Cassandra release. What slips from release to release, as well as what is
added after the initial set of tasks for a release is started.

You can see the SCRUM agile board I created here

.

The board has two ways to bucket issues. One is the release in which the
issue is supposed to be fixed. The other is Epics. Epics are not associated
with a release so a feature like 8099 might have an epic. Epics can be used
to bucket issues within a release or across releases.

I would characterize Epics as being a lot like labels that integrate with
the Agile board. You can use labels with Agile boards by adding quick
filters to select on labels.

The current set of issues types associated with the C* project in ASF JIRA
doesn't include Epic or Story. I don't use Story, but Epic would be useful
for further categorizing things.

When I asked ASF Infra about it they said that this needs discussion and
approval by the PMC.

Thanks,
Ariel


Re: Proposal, add Epic to the set of issue types available in ASF Jira for Cassandra

2015-08-05 Thread Ariel Weisberg
Hi,

At this stage I wasn't going to propose a process change. My goal is to
observe and report mall cop style so I can present what happens the way we
currently operate. Right now Epics are just a way for me to bucket and then
rank things inside a release based on what they are, enhancement, core to
the release (Materialized Views, 8099), bugs or failing tests.

Regards,
Ariel

On Wed, Aug 5, 2015 at 11:43 AM, Gary Dusbabek  wrote:

> Who would have the burden of assigning and managing epics?
>
> Thanks,
>
> Gary.
>
>
>
> On Tue, Aug 4, 2015 at 3:08 PM, Ariel Weisberg <
> ariel.weisb...@datastax.com>
> wrote:
>
> > Hi all,
> >
> > I am playing with using an Agile board to track what goes into each
> > Cassandra release. What slips from release to release, as well as what is
> > added after the initial set of tasks for a release is started.
> >
> > You can see the SCRUM agile board I created here
> > <
> >
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=83&view=planning.nodetail&selectedIssue=CASSANDRA-9908&epics=visible
> > >
> > .
> >
> > The board has two ways to bucket issues. One is the release in which the
> > issue is supposed to be fixed. The other is Epics. Epics are not
> associated
> > with a release so a feature like 8099 might have an epic. Epics can be
> used
> > to bucket issues within a release or across releases.
> >
> > I would characterize Epics as being a lot like labels that integrate with
> > the Agile board. You can use labels with Agile boards by adding quick
> > filters to select on labels.
> >
> > The current set of issues types associated with the C* project in ASF
> JIRA
> > doesn't include Epic or Story. I don't use Story, but Epic would be
> useful
> > for further categorizing things.
> >
> > When I asked ASF Infra about it they said that this needs discussion and
> > approval by the PMC.
> >
> > Thanks,
> > Ariel
> >
>


Re: Proposal, add Epic to the set of issue types available in ASF Jira for Cassandra

2015-08-06 Thread Ariel Weisberg
Hi,

It's not instead of, it's in addition to. The presence of Epics doesn't
prevent the use of labels, and you can filter with labels on the agile
board, it's just really gross. You have to create quick filters or type
in JQL. Epics get you drag and drop as well as some other UI niceness
like tracking how close an Epic is to completion.
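
For comparison, filtering the board by label means a quick filter built from
a JQL snippet along these lines (the label name is purely illustrative):

project = CASSANDRA AND labels = "epic-8099" AND fixVersion = 3.2

whereas an Epic gives the same grouping via drag and drop plus the completion
tracking mentioned above.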

There is also no process change where we stop using labels and start
using Epics.

Ariel

On Thu, Aug 6, 2015, at 10:04 AM, Jake Luciani wrote:
> Is the reason to use epics over labels simply because the agile board
> doesn't support it?
> 
> On Wed, Aug 5, 2015 at 12:42 PM, Ariel Weisberg
>  > wrote:
> 
> > Hi,
> >
> > At this stage I wasn't going to propose a process change. My goal is to
> > observe and report mall cop style so I can present what happens the way we
> > currently operate. Right now Epics are just a way for me to bucket and then
> > rank things inside a release based on what they are, enhancement, core to
> > the release (Materialized Views, 8099), bugs or failing tests.
> >
> > Regards,
> > Ariel
> >
> > On Wed, Aug 5, 2015 at 11:43 AM, Gary Dusbabek 
> > wrote:
> >
> > > Who would have the burden of assigning and managing epics?
> > >
> > > Thanks,
> > >
> > > Gary.
> > >
> > >
> > >
> > > On Tue, Aug 4, 2015 at 3:08 PM, Ariel Weisberg <
> > > ariel.weisb...@datastax.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I am playing with using an Agile board to track what goes into each
> > > > Cassandra release. What slips from release to release, as well as what
> > is
> > > > added after the initial set of tasks for a release is started.
> > > >
> > > > You can see the SCRUM agile board I created here
> > > > <
> > > >
> > >
> > https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=83&view=planning.nodetail&selectedIssue=CASSANDRA-9908&epics=visible
> > > > >
> > > > .
> > > >
> > > > The board has two ways to bucket issues. One is the release in which
> > the
> > > > issue is supposed to be fixed. The other is Epics. Epics are not
> > > associated
> > > > with a release so a feature like 8099 might have an epic. Epics can be
> > > used
> > > > to bucket issues within a release or across releases.
> > > >
> > > > I would characterize Epics as being a lot like labels that integrate
> > with
> > > > the Agile board. You can use labels with Agile boards by adding quick
> > > > filters to select on labels.
> > > >
> > > > The current set of issues types associated with the C* project in ASF
> > > JIRA
> > > > doesn't include Epic or Story. I don't use Story, but Epic would be
> > > useful
> > > > for further categorizing things.
> > > >
> > > > When I asked ASF Infra about it they said that this needs discussion
> > and
> > > > approval by the PMC.
> > > >
> > > > Thanks,
> > > > Ariel
> > > >
> > >
> >
> 
> 
> 
> -- 
> http://twitter.com/tjake


Re: Should we make everything blue in Jenkins?

2015-08-16 Thread Ariel Weisberg
Hi,

Thanks for bring this up Michael.

I want to elaborate on the impetus for this (or at least my take on it). When 
8099 merged, a thing happened that must never happen for our process to work. We 
introduced a large enough number of test failures that it was difficult to tell 
if you introduced a regression.

At the time we thought we could exclude the test failures prior to 8099 and 
that the test failures introduced by 8099 would get addressed promptly. What 
has happened instead is that the number of failures has snowballed to the 
point that you can hardly tell if you broke anything even if you compare test 
by test with trunk. You have to go into the history on trunk for each test and 
go back several pages to really be sure.

If you don’t have consistently passing CI you can’t avoid the addition of test 
failures by ongoing work that slip in masked by known failures.

The artery is severed, we’re bleeding out, and we’re going to have to lose the 
leg. I’m sure the prosthetic when it comes will be just as good, but the rehab 
is going to suck. There that’s my analogy.

I think the utests are in pretty good shape but the pig tests are a problem. 
They extend the job time a lot, cause aborts, and fail randomly.

Ariel

> On Aug 14, 2015, at 3:16 PM, Michael Shuler  wrote:
> 
> This is a prompt for Cassandra developers to discuss the alternatives and let 
> Test Engineering know what you desire.
> 
> As discussed a few times in person, on irc, etc., there are a couple 
> different ways we can run tests in Jenkins, particularly cassandra-dtest. The 
> Cassandra developers are the committers to unit tests, so Test Engineering 
> runs whatever is in the branch. If you'd like to make changes to unit tests 
> to make things blue, just commit those!
> 
> Currently, we run dtests as 1), but we could do 2):
> 
> 1) Run all dtests that don't catastrophically hang a server, pass or fail, 
> and report the results.
> 2) Run only known passing dtests, skipping anything that fails - make it all 
> blue on the main branch builds.
> 
> The biggest benefit is that dev branch builds should be easily recognizable 
> as able to merge, if the dtest run is passing and blue. There is no 
> comparison with the main branch build needing interpretation.
> 
> Test Eng has recently added the ability run *only* the skipped tests and has 
> a a prototype job, trunk_dtest-skipped-with-require, to dig through. This 
> could be set up for all main branch builds, moving anything that doesn't pass 
> 100% to the -skipped job. This is perhaps the drawback with 2) above: we're 
> simply not going to run all the dtests on your dev branch. I don't think it 
> makes sense to set up a -skipped dtest job on your dev branches. In addition, 
> there's another job result set to go look at to properly evaluate the true 
> state of a Cassandra branch or release. There may be other side effects - 
> feel free to chime in.
> 
> I'm on a "disconnected" holiday until Monday Aug 24, so I won't have a chance 
> to check in until then - the Test Eng team can field questions or 
> clarifications, if needed.
> 
> -- 
> Warm regards,
> Michael



End June retrospective, July retrospective will start @ C* summit

2015-09-08 Thread Ariel Weisberg
Hi all,

I am closing out July retrospective now. The retrospective doc has a single
author (me) which kind of says that doing this asynchronously by email
isn't working. At least not as a starting point.

I am not super surprised nor am I disappointed. Trying and failing is part
of eventually trying and succeeding. Process and iterating on process is a
skill you have to develop and it's hard to do when you don't have dedicated
time in your schedule for it. I'm an advocate for retrospectives and I
still don't send the emails out on time.

We are going to have a butts in seats retrospective at summit and I will
take notes and make that available. We'll have a chance to discuss where to
go from there.

Regards,
Ariel


Re: End June retrospective, July retrospective will start @ C* summit

2015-09-08 Thread Ariel Weisberg
Hi,

I made a mistake in the original email. This is actually the end of the July
retrospective. The August retrospective will start @ C* summit.

Regards,
Ariel

On Tue, Sep 8, 2015 at 2:36 PM, Ariel Weisberg 
wrote:

> Hi all,
>
> I am closing out July retrospective now. The retrospective doc has a
> single author (me) which kind of says that doing this asynchronously by
> email isn't working. At least not as a starting point.
>
> I am not super surprised nor am I disappointed. Trying and failing is part
> of eventually trying and succeeding. Process and iterating on process is a
> skill you have to develop and it's hard to do when you don't have dedicated
> time in your schedule for it. I'm an advocate for retrospectives and I
> still don't send the emails out on time.
>
> We are going to have a butts in seats retrospective at summit and I will
> take notes and make that available. We'll have a chance to discuss where to
> go from there.
>
> Regards,
> Ariel
>


Re: cassandra-3.1 branch and new merge order

2015-11-09 Thread Ariel Weisberg
Hi,

What I had thought we were going to do was branch for a feature release
every two months and then backport fixes to the last feature release (or
prior ones) as desired.

So in terms of extra merge effort you only have to backport fixes, and
if we solved the release quality issues this is something that is very
little work. Even if we don't we only have to backport a month's worth of
fixes (although I would advocate making it two months). If we do a
feature release every two months that gives us two months to put
together a bug fix release for it without falling behind or supporting
multiple older releases.
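
For reference, the per-fix merge-up being weighed against this (using the
merge order quoted below) is a sketch like the following, for a fix first
committed on cassandra-2.2:

git checkout cassandra-3.0 && git merge cassandra-2.2
git checkout cassandra-3.1 && git merge cassandra-3.0
git checkout trunk && git merge cassandra-3.1

Backporting inverts the flow: the change lands upstream first and is
cherry-picked down only where desired.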

I am pretty confident that a month is not enough time to find and fix
the bugs in a release. Not the way we currently operate where activities
are unsynchronized and we don't make an effort to get stuff to "done"
quickly as a default. Instead we opt to have a lot of stuff in flight to
hide the latency of getting things to "done". That is also a problem
when there are dependencies.

TL;DR Even minor stuff stretches out to multiple months.

Another thing that came up is how the release schedule impacts
development work. There were some voices (can't recall who) who thought
that the month after a feature release would only contain bug fix work.
I don't think that makes much sense since you won't even necessarily
know what all the bugs are and you can't leave developers idle waiting
for bugs to come in.

I think we should schedule feature work and as bugs come in do them
first. I think we should always be doing bugs first and that's how I
schedule my time (actually review, bugs, then new work). If you don't
connect the backpressure from bugs to the rate at which you do new work
you will always end up falling behind or rubber banding and not get the
benefits of passing CI.

Ariel

On Mon, Nov 9, 2015, at 12:23 PM, Jake Luciani wrote:
> I don't think there's anything wrong with it. I just think it's not
> needed.
> 
> - The max someone would wait is 4 weeks and in my mind the feature writer
> is the best person to make the changes to merge the code to the latest
> version.  Vs an unrelated change that may be improperly merged.
> - Today for every change we make we must manually merge on commit every
> time which is error prone, this goes away with a single branch.
> - We historically care about tests in the latest release branch.  We've
> never done a great job fixing issues in the trunk when they happen (look
> at
> cassci trunk vs 3.0 today).  So it's not like we gain anything by keeping
> 2
> branches there.
> - This keeps our testing focus on the task at hand so we dedicate our
> testing time in a given month to the changes of that release and don't
> need
> any parallel runs for features.
> 
> 
> 
> On Mon, Nov 9, 2015 at 12:06 PM, Jonathan Ellis 
> wrote:
> 
> > I'm not a huge fan of leaving features to rot unmerged for a couple
> > months.  What is wrong with "new features go to trunk, stable branches get
> > forked at release?"
> >
> > On Mon, Nov 9, 2015 at 10:54 AM, Jake Luciani  wrote:
> >
> > > Looking back at the tick-tock email chain we never really discussed this.
> > >
> > > Rather than having 3.1 and trunk I think we should have just trunk.
> > >
> > > I'd rather not let features sit in a branch with bugfixes going on top
> > that
> > > can decay.
> > > They should be merged in when it's time to merge features for 3.even,
> > post
> > > 3.odd.
> > >
> > > I know we have features in trunk today that aren't in 3.0 and we probably
> > > shouldn't have done that.
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Nov 9, 2015 at 11:35 AM, Aleksey Yeschenko 
> > > wrote:
> > >
> > > > With 3.0.0 vote to be over soon, tick-tock is officially starting, and
> > we
> > > > are creating a new branch for cassandra-3.1 release.
> > > >
> > > > New merge order: cassandra-2.2 -> cassandra-3.0 -> cassandra-3.1 ->
> > trunk
> > > >
> > > > - cassandra-3.0 branch is going to continue representing the 3.0.x
> > series
> > > > of releases (3.0 bugfixes only, as no new feature are supposed to go
> > into
> > > > 3.0.x release series)
> > > > - cassandra-3.1 branch will contain 3.0 bugfixes *only*
> > > > - trunk represents the upcoming cassandra-3.2 release (fixes from 3.1
> > and
> > > > new features)
> > > >
> > > > --
> > > > AY
> > >
> > >
> > >
> > >
> > > --
> > > http://twitter.com/tjake
> > >
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder, http://www.datastax.com
> > @spyced
> >
> 
> 
> 
> -- 
> http://twitter.com/tjake


Re: Use of posix_fadvise

2016-10-18 Thread Ariel Weisberg
Hi,

With compaction there can be hot and cold data mixed together. So we want
to drop the data and then warm it via early opening so only the hot data is
in the cache.

Some of those cases are for old sstables that have been rewritten or
discarded so the data is entirely defunct. The files might not get deleted
though so they do add pressure to the cache until they are evicted.

In the instance you are looking at in a tidier won't there always be a
reference held in the current view for the column family? I don't think it
would constantly be evicting them nor closing/reopening and remapping the
file.


Specifically regarding the behavior in different kernels, from `man
> posix_fadvise`: "In kernels before 2.6.6, if len was specified as 0, then
> this was interpreted literally as "zero bytes", rather than as meaning "all
> bytes through to the end of the file"."

Not ideal, but at least not actively harmful right? The cache is supposed
to be scan/flush resistant.
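
For reference, the native call under discussion boils down to roughly the
sketch below -- simplified, assuming a JNA-style direct mapping like the one
behind CLibrary; the class and method names here are illustrative, and the
real code is more involved (it resolves an fd from the file name, and
failures are non-fatal):

import com.sun.jna.Native;

public final class PageCacheAdvice
{
    static
    {
        Native.register("c"); // bind the native methods below to libc
    }

    private static final int POSIX_FADV_DONTNEED = 4; // Linux value

    // direct-mapped libc call; off_t maps to long on 64-bit Linux
    private static native int posix_fadvise(int fd, long offset, long len, int advice);

    // len == 0 means "offset through end of file" on kernels >= 2.6.6,
    // and literally "zero bytes" (a no-op) on older ones
    public static int dropPageCache(int fd)
    {
        return posix_fadvise(fd, 0L, 0L, POSIX_FADV_DONTNEED);
    }
}

Which is why the len == 0 kernel-version split matters: on a pre-2.6.6
kernel the calls in tidy() would advise on zero bytes and silently do
nothing.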

Ariel

On Tue, Oct 18, 2016 at 11:57 AM, Michael Kjellman <
mkjell...@internalcircle.com> wrote:

> Right, so in SSTableReader#GlobalTidy$tidy it does:
> // don't ideally want to dropPageCache for the file until all instances
> have been released
> CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
> CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);
>
> It seems to me every time the reference is released on a new sstable we
> would immediately tidy() it and then call posix_fadvise with
> POSIX_FADV_DONTNEED with an offset of 0 and a length of 0 (which I'm
> thinking is doing so in respect to the API behavior in modern Linux kernel
> builds?). Am I reading things correctly here? Sorta hard as there are many
> different code paths the reference could have tidy() called.
>
> Why would we want to drop the segment we just write from the page cache --
> wouldn't that most likely be the most hot data, and even if it turned out
> not to be wouldn't it be better in this case to have kernel be smart at
> what it's best at?
>
> best,
> kjellman
>
> > On Oct 18, 2016, at 8:50 AM, Jake Luciani  wrote:
> >
> > The main point is to avoid keeping things in the page cache that are no
> > longer needed like compacted data that has been early opened elsewhere.
> >
> > On Oct 18, 2016 11:29 AM, "Michael Kjellman" <
> mkjell...@internalcircle.com>
> > wrote:
> >
> >> We use posix_fadvise in a bunch of places, and in stereotypical
> Cassandra
> >> fashion no comments were provided.
> >>
> >> There is a check the OS is Linux (okay, a start) but it turns out the
> >> behavior of providing a length of 0 to posix_fadvise changed in some 2.6
> >> kernels. We don't check the kernel version -- or even note it.
> >>
> >> What is the *expected* outcome of our use of posix_fadvise -- not what
> >> does it do or not do today -- but what problem was it added to solve and
> >> what's the expected behavior regardless of kernel versions.
> >>
> >> best,
> >> kjellman
> >>
> >> Sent from my iPhone
>
>


Re: Use of posix_fadvise

2016-10-18 Thread Ariel Weisberg
Hi,

Compaction can merge some very large files together with data that may
be completely cold. So yeah caching the whole file just creates pressure
to evict useful stuff. In some theories.

In other theories the page cache is flush and scan resistant and should
just eat this stuff up without intervention. Sure it might hurt a bit,
but it's a bounded amount before the cache stops discarding useful stuff
in favor of new stuff that is unproven.

If there is a benchmark with this enabled/disabled I haven't seen it.
Doesn't mean it doesn't exist though.

Ariel
On Tue, Oct 18, 2016, at 12:05 PM, Michael Kjellman wrote:
> Within a single SegmentedFile?
> 
> On Oct 18, 2016, at 9:02 AM, Ariel Weisberg
> mailto:ariel.weisb...@datastax.com>> wrote:
> 
> With compaction there can be hot and cold data mixed together.
> 


Re: Rough roadmap for 4.0

2016-11-15 Thread Ariel Weisberg
Hi,

I think one additional issue to add to the pile is CASSANDRA-7544 "Allow
storage port to be configurable per node"

I think no matter what we land on implementation-wise it will only be
possible to make this change in a major release, as it will mean changes
to the system schema as well as the internode messaging protocol.

Having this properly done and reviewed by January is optimistic. I think
it wouldn't slip past February/March.

Ariel

On Tue, Nov 15, 2016, at 11:34 PM, Nate McCall wrote:
> Agreed. As long as we have a goal I don't see why we have to adhere to
> arbitrary date for 4.0.
> 
> On Nov 16, 2016 1:45 PM, "Aleksey Yeschenko" 
> wrote:
> 
> > I’ll comment on the broader issue, but right now I want to elaborate on
> > 3.11/January/arbitrary cutoff date.
> >
> > Doesn’t matter what the original plan was. We should continue with 3.X
> > until all the 4.0 blockers have been
> > committed - and there are quite a few of them remaining yet.
> >
> > So given all the holidays, and the tickets remaining, I’ll personally be
> > surprised if 4.0 comes out before
> > February/March and 3.13/3.14. Nor do I think it’s an issue.
> >
> > —
> > AY
> >
> > On 16 November 2016 at 00:39:03, Mick Semb Wever (m...@thelastpickle.com)
> > wrote:
> >
> > On 4 November 2016 at 13:47, Nate McCall  wrote:
> >
> > > Specifically, this should be "new stuff that could/will break things"
> > > given we are upping
> > > the major version.
> > >
> >
> >
> > How does this co-ordinate with the tick-tock versioning¹ leading up to the
> > 4.0 release?
> >
> > To just stop tick-tock and then say yeehaa let's jam in all the breaking
> > changes we really want seems to be throwing away some of the learnt wisdom,
> > and not doing a very sane transition from tick-tock to
> > features/testing/stable². I really hope all this is done in a way that
> > continues us down the path towards a stable-master.
> >
> > For example, are we fixing the release of 4.0 to November? or continuing
> > tick-tocks until we complete the 4.0 roadmap? or starting the
> > features/testing/stable branching approach with 3.11?
> >
> >
> > Background:
> > ¹) Sylvain wrote in an earlier thread titled "A Home for 4.0"
> >
> > > And as 4.0 was initially supposed to come after 3.11, which is coming,
> > it's probably time to have a home for those tickets.
> >
> > ²) The new versioning scheme slated for 4.0, per the "Proposal - 3.5.1"
> > thread
> >
> > > three branch plan with “features”, “testing”, and “stable” starting with
> > 4.0?
> >
> >
> > Mick
> >


Re: 3.10 release status: blocked on dtest

2017-01-07 Thread Ariel Weisberg
Hi,

When we say all tests passing it does seem like we are including the
upgrade tests, but there are some failures that don't seem to have
tickets blocking the release. It seems like we are also excluding any
tests decorated as resource intensive? There is also large_dtest,
novnode_dtest, and offheap_dtest which all have a few failing tests. I
think we should consider those as blockers as well.

The upgrade tests have this chicken and egg issue where they test the
previous current release against the in development release and if there
is a bug preventing upgrade you end up with a lot of failing tests that
continue to fail even after the bug is fixed.

There is also a real bug fixed by
https://github.com/apache/cassandra/commit/c612cd8d7dbd24888c216ad53f974686b88dd601
that near as I can tell isn't included in 3.10. It will continue to fail
until we release a version that addresses the issue. It kind of makes
you think that if the current version fails, but the in-development
version passes we want a quick way of filtering out the failure.

I also found an issue with the max version handling in the since
decorator that causes some of the upgrade tests to fail. I gave the fix
for that to Philip.
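
For context, that decorator gates tests on the version of the cluster
under test, roughly like the sketch below. This is a simplification, not
the actual dtest implementation, and get_version() is a made-up accessor:

    from distutils.version import LooseVersion
    from functools import wraps
    from unittest import SkipTest

    def since(min_version, max_version=None):
        def decorator(test_fn):
            @wraps(test_fn)
            def wrapper(self, *args, **kwargs):
                v = LooseVersion(self.get_version())  # hypothetical accessor
                if v < LooseVersion(min_version):
                    raise SkipTest('requires ' + min_version + ' or later')
                # A bad comparison here is exactly the kind of max version
                # bug described above: tests near the upper bound get run
                # (or skipped) when they shouldn't be.
                if max_version and v > LooseVersion(max_version):
                    raise SkipTest('not applicable past ' + max_version)
                return test_fn(self, *args, **kwargs)
            return wrapper
        return decorator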

I haven't managed to get the upgrade tests passing after addressing the
above two issues, but it's always hard to tell which failures are my
environment and which are the tests themselves.

Ariel


On Wed, Jan 4, 2017, at 01:46 PM, Michael Shuler wrote:
> Thanks! I think I was looking at a wrong JIRA, sorry 'bout that.
> 
> -- 
> Michael
> 
> On 01/04/2017 12:31 PM, Oleksandr Petrov wrote:
> > #13025 was updated yesterday. It just needs some feedback, but we know what
> > the problem is there.
> > 
> > On Wed, Jan 4, 2017 at 5:32 PM Michael Shuler 
> > wrote:
> > 
> >> On 12/20/2016 03:48 PM, Michael Shuler wrote:
> >>> Current release blockers in JIRA on the cassandra-3.11 branch are:
> >>>
> >>> https://issues.apache.org/jira/browse/CASSANDRA-12617
> >>> https://issues.apache.org/jira/browse/CASSANDRA-13058
> >>
> >> and https://issues.apache.org/jira/browse/CASSANDRA-13025
> >>
> >> CASSANDRA-13058 is unassigned, but was just updated (thanks Stefan!).
> >> The other tickets are assigned, but have not been updated in a while.
> >>
> >> JQL for 3.10:
> >>
> >> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20fixVersion%20%3D%203.10%20AND%20resolution%20%3D%20Unresolved
> >>
> >> --
> >> Kind regards,
> >> Michael
> >>
> 


Re: Per blocking release on dtest

2017-01-10 Thread Ariel Weisberg
Hi,

At least some of those failures are real. I don't think we should
release 3.10 until the real failures are addressed. As I said earlier,
one of them is a wrong-answer bug that is not going to be fixed in 3.10.

Can we just ignore failures because we think they don't mean anything?
Who is going to check which of the 60 failures is real?

These tests were passing just fine at the beginning of December; then
commits happened, and now the tests are failing. That is exactly what
they're there for. They are good tests. I don't think it matters if the failures
are "real" today because those are valid tests and they don't test
anything if they fail for spurious reasons. They are a critical part of
the Cassandra infrastructure as much as the storage engine or network
code.

In my opinion the tests need to be fixed, and people need to fix them as
they break them. We need to figure out how to get from people breaking
tests without it being noticed to people breaking a test and then fixing
it in a time frame that fits the release schedule.

My personal opinion is that releases are a reward for finishing the job.
Releasing without finishing the job creates the wrong incentive
structure for the community. If you break something, you are no longer
the person who blocked the release; you are just one of several people
breaking things without consequence.

I think that rapid feedback and triaging, combined with releases blocked
on the things individual contributors have broken, is the way to more
consistent releases, both schedule-wise and quality-wise.

Regarding delaying 3.10: who exactly is the consumer that is chomping at
the bit to get another release? One that doesn't reliably upgrade from a
previous version?
 
Ariel

On Tue, Jan 10, 2017, at 08:13 AM, Josh McKenzie wrote:
> First, I think we need to clarify if we're blocking on just testall +
> dtest
> or blocking on *all test jobs*.
> 
> If the latter, upgrade tests are the elephant in the room:
> http://cassci.datastax.com/view/cassandra-3.11/job/cassandra-3.11_dtest_upgrade/lastCompletedBuild/testReport/
> 
> Do we have confidence that the reported failures are all test problems
> and
> not w/Cassandra itself? If so, is that documented somewhere?
> 
> On Mon, Jan 9, 2017 at 7:33 PM, Nate McCall  wrote:
> 
> > I'm not sure I understand the culmination of the past couple of threads on
> > this.
> >
> > With a situation like:
> > http://cassci.datastax.com/view/cassandra-3.11/job/cassandra-3.11_dtest/
> > lastCompletedBuild/testReport/
> >
> > We have some sense of stability on what might be flaky tests(?).
> > Again, I'm not sure what our criteria are, specifically.
> >
> > Basically, it feels like we are in a stalemate right now. How do we
> > move forward?
> >
> > -Nate
> >


Re: Wrapping up tick-tock

2017-01-10 Thread Ariel Weisberg
Hi,

With yearly releases trunk is going to be a mess when it comes time to
cut a release. Cutting releases is when people start caring whether all
the things in the release are in a finished state. It's when the state
of CI finally becomes relevant.

If we wait a year we are going to accumulate a year's worth of unfinished
stuff in a single release. It's more expensive to context switch back
and then address those issues. If we put out large, unstable releases,
the time until the features in the release are usable is pushed back
even further, since it takes another 6-12 months for the release to
stabilize. Features introduced at the beginning of the cycle will have
to wait 18-24 months before anyone can benefit from them.

Is the biggest pain point with tick-tock just the elimination of
long-term support releases? What is the pain point around release
frequency? Right now people should be using 3.0 unless they need a
bleeding-edge feature from 3.X, and those people will have to give up
something to get something.

Ariel

On Tue, Jan 10, 2017, at 10:29 AM, Jonathan Haddad wrote:
> I don't see why it has to be one extreme (yearly) or another (monthly).
> When you had originally proposed Tick Tock, you wrote:
> 
> "The primary goal is to improve release quality.  Our current major “dot
> zero” releases require another five or six months to make them stable
> enough for production.  This is directly related to how we pile features in
> for 9 to 12 months and release all at once.  The interactions between the
> new features are complex and not always obvious.  2.1 was no exception,
> despite DataStax hiring a full time test engineering team specifically for
> Apache Cassandra."
> 
> I agreed with you at the time that the yearly cycle was too long to be
> adding features before cutting a release, and still do now.  Instead of
> elastic banding all the way back to a process which wasn't working
> before,
> why not try somewhere in the middle?  A release every 6 months (with
> monthly bug fixes for a year) gives:
> 
> 1. long enough time to stabilize (1 year vs 1 month)
> 2. not so long things sit around untested forever
> 3. only 2 releases (current and previous) to do bug fix support at any
> given time.
> 
> Jon
> 
> On Tue, Jan 10, 2017 at 6:56 AM Jonathan Ellis  wrote:
> 
> > Hi all,
> >
> > We’ve had a few threads now about the successes and failures of the
> > tick-tock release process and what to do to replace it, but they all died
> > out without reaching a robust consensus.
> >
> > In those threads we saw several reasonable options proposed, but from my
> > perspective they all operated in a kind of theoretical fantasy land of
> > testing and development resources.  In particular, it takes around a
> > person-week of effort to verify that a release is ready.  That is, going
> > through all the test suites, inspecting and re-running failing tests to see
> > if there is a product problem or a flaky test.
> >
> > (I agree that in a perfect world this wouldn’t be necessary because your
> > test ci is always green, but see my previous framing of the perfect world
> > as a fantasy land.  It’s also worth noting that this is a common problem
> > for large OSS projects, not necessarily something to beat ourselves up
> > over, but in any case, that's our reality right now.)
> >
> > I submit that any process that assumes a monthly release cadence is not
> > realistic from a resourcing standpoint for this validation.  Notably, we
> > have struggled to marshal this for 3.10 for two months now.
> >
> > Therefore, I suggest first that we collectively roll up our sleeves to vet
> > 3.10 as the last tick-tock release.  Stick a fork in it, it’s done.  No
> > more tick-tock.
> >
> > I further suggest that in place of tick-tock we go back to our old model of
> > yearly-ish releases with as-needed bug fix releases on stable branches,
> > probably bi-monthly.  This amortizes the release validation problem over a
> > longer development period.  And of course we remain free to ramp back up to
> > the more rapid cadence envisioned by the other proposals if we increase our
> > pool of QA effort or we are able to eliminate flaky tests to the point
> > that a long validation process becomes unnecessary.
> >
> > (While a longer dev period could mean a correspondingly more painful test
> > validation process at the end, my experience is that most of the validation
> > cost is “fixed” in the form of flaky tests and thus does not increase
> > proportionally to development time.)
> >
> > Thoughts?
> >
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced
> >

